Data Sets & Data Sharing

By Barry Pearce, Dec 2020. (Updated Feb 2021)

In my article on Improving Methodology in Musicology I suggested that we take advantage of the scientific principles of data sharing and reproducibility. In this article I would like to explain how the BSIP database can help researchers share their data, and then how we may quickly and easily access those data sets.

I have collected over 17,000 iconographic sources of bowed string instruments over six years of research, so you might think that I could easily recreate any data set from published articles or books. Yet, despite having such a large database available, it has proved frustratingly difficult to fully recreate a number of data sets.

We shall use an article by William Raymaekers, published in GSJ LXXI (March 2018), as an example. The article was chosen simply because it is recent, focuses on bowed string instruments, and uses a data set that takes significant effort to recreate. The GSJ article lists approximately 112 sources, and a number of problems arise when trying to recreate this data set. Two of the references are not of interest to us as they do not contain bowed instruments. Four other references double up, specifying additional references. Unfortunately, the list contains two impossible references (one cites "a number of vanitas still life paintings in private collections" and another simply lists inhabited place names). I have failed to properly identify or obtain five sources, and I believe I have one of the specified sources but I cannot be certain. So, out of 114 specified sources, I can confirm only 107, one of which is uncertain.
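The tally above can be checked with a short sketch. It is only one reading of the figures: it assumes that the two references without bowed instruments are excluded from the total and that each of the four doubled-up references contributes exactly one additional source. The variable names are illustrative.

```python
# One possible reconciliation of the counts given in the text.
listed = 112        # approximate number of sources listed in the article
not_bowed = 2       # references without bowed instruments (excluded)
additional = 4      # assumption: one extra source per doubled-up reference
impossible = 2      # references that cannot be resolved to a single source
unobtained = 5      # sources not identified or obtained

# Sources of interest that the article actually specifies.
specified = listed - not_bowed + additional

# Sources that could be matched to an entry in the database.
confirmed = specified - impossible - unobtained

print(f"specified={specified}, confirmed={confirmed}")
```

Under these assumptions the sketch arrives at the same 114 specified and 107 confirmed sources quoted above.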

I highlight these issues not with the intent to criticise a respected researcher who has followed current practice, but to show how even recent peer-reviewed journal articles continue to suffer from data set and referencing issues. This is not just a historical issue, but a very present and persistent problem that we need to address.

There are three steps to solving this problem. First, we need the sources to be available without barriers under Open Access. Second, we need to reference the data in a reliable manner. Third, we need to make the data set readily accessible. I shall address each step in turn, explaining how the BSIP database supports it.

Data sharing via BSIP Iconography Database

The BSIP database specialises in bowed string instrument iconography and supports Open Access, which makes it well suited to sharing sources. When processing a data set, the first step is to identify which sources are already present in the BSIP database. New sources can then be added, and existing sources can be updated with any new information or better imagery.

Reliable referencing

Once a source is available in the BSIP database it may be reliably referenced. The BSIP reference should be included in the citation when a work is referenced, and in any image captions. Similarly, should sources be listed, for example in an appendix or on a website, the BSIP reference should also appear there. Including the BSIP reference provides absolute clarity about which source is being referred to.

Making the data set readily accessible

This last step is two-fold. First, sources in the BSIP database need to be marked as being part of the data set to be shared. This is done by creating a tag set specifically for the data set and then tagging each source in the data set with a tag from that tag set. Where only simple association is required, a single tag may suffice. However, tags also support custom naming, descriptions and web links.
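The tagging arrangement described above can be pictured with a minimal sketch. This is not the BSIP database schema: the class names, field names and source references below are hypothetical and chosen only to show a tag set whose tags carry a name, a description and a web link.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Tag:
    """A tag within a tag set (hypothetical fields)."""
    name: str
    description: str = ""
    web_link: Optional[str] = None


@dataclass
class TagSet:
    """A tag set created for one shared data set."""
    name: str
    # Maps a source reference to the tag applied to that source.
    tagged: Dict[str, Tag] = field(default_factory=dict)

    def tag_source(self, source_ref: str, tag: Tag) -> None:
        """Mark a source as part of the data set."""
        self.tagged[source_ref] = tag

    def sources(self) -> List[str]:
        """Return the references of all sources in the data set."""
        return sorted(self.tagged)


# Simple association: one tag shared by every source in the set.
member = Tag(name="member")
simple_set = TagSet(name="example-data-set")
simple_set.tag_source("SRC-0002", member)  # placeholder references,
simple_set.tag_source("SRC-0001", member)  # not real BSIP references

# Richer association: a named tag with a description and a link.
detailed = Tag(
    name="ref-012",
    description="Hypothetical entry 12 in a published appendix",
    web_link="https://example.org/appendix#12",
)
simple_set.tag_source("SRC-0003", detailed)

print(simple_set.sources())
```

A single shared tag covers the simple case, while per-source tags leave room for names and links that tie a source back to a publication.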

Once marked, the data set may be readily obtained via the IRP. There are two ways of obtaining a data set: using Browse by Tag Set, or using the IRP Advanced Search and querying on the tag. The first method provides very fast access to the data set and has the advantage that the tag itself is shown and is available as a sort order. The advanced search allows further filtering to be performed within that data set but does not show the tag in the results.
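The difference between the two access methods can be illustrated with a small in-memory sketch. It does not call the IRP; the records, the `century` attribute, the tag names and both function names are hypothetical stand-ins used only to contrast the two result shapes.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical records: source reference -> attributes and tags.
# "tags" maps a tag set name to the tag name applied to the source.
SOURCES: Dict[str, dict] = {
    "SRC-0001": {"century": 15, "tags": {"example-set": "ref-012"}},
    "SRC-0002": {"century": 16, "tags": {"example-set": "ref-003"}},
    "SRC-0003": {"century": 15, "tags": {}},
    "SRC-0004": {"century": 16, "tags": {"example-set": "ref-007"}},
}


def browse_by_tag_set(tag_set: str) -> List[Tuple[str, str]]:
    """Return (tag name, source) pairs, sorted by tag name."""
    rows = [
        (record["tags"][tag_set], ref)
        for ref, record in SOURCES.items()
        if tag_set in record["tags"]
    ]
    return sorted(rows)


def advanced_search(tag_set: str,
                    extra_filter: Callable[[dict], bool]) -> List[str]:
    """Return sources in the tag set that pass a further filter.

    The tag itself is not part of the result.
    """
    return sorted(
        ref
        for ref, record in SOURCES.items()
        if tag_set in record["tags"] and extra_filter(record)
    )


browsed = browse_by_tag_set("example-set")
filtered = advanced_search("example-set", lambda r: r["century"] == 16)
print(browsed)
print(filtered)
```

Browsing returns every tagged source together with its tag, ordered by tag, whereas the search narrows the same set further but returns the sources alone.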

Continuing to use the data set from William Raymaekers' GSJ 2018 article as an example, I have ensured that all of the 107 sources I managed to identify are in the database. For this data set we use multiple tags, and each tag name ties in with the article reference number in Appendix A. This provides direct traceability between reference and source, in a data set that was published before BSIP referencing was available. We can see how both access methods work:

Browsing via Tag Set

Advanced Search on Tag

The BSIP IRP also supports permalinks for Tag Sets, making URLs succinct and dependable. The permalink for browsing this data set by tag set is:

https://bsip.org.uk/ref/ts/raymaekers2018
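As a small sketch, a permalink of this shape could be assembled from a tag set identifier. The URL pattern is inferred from the single example above, so treating the final path segment as a general tag set identifier is an assumption, and `tag_set_permalink` is a hypothetical helper rather than part of the IRP.

```python
from urllib.parse import quote

# Base path taken from the one permalink quoted in the article.
PERMALINK_BASE = "https://bsip.org.uk/ref/ts/"


def tag_set_permalink(identifier: str) -> str:
    """Build a tag set permalink from an identifier (assumed pattern)."""
    if not identifier:
        raise ValueError("a tag set identifier is required")
    # Percent-encode anything that is not safe in a URL path segment.
    return PERMALINK_BASE + quote(identifier, safe="")


print(tag_set_permalink("raymaekers2018"))
```

Keeping the identifier short and free of spaces or punctuation means the encoded link is identical to the readable one, which is what makes it easy to cite in print.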

If you are a researcher and wish to share or connect your data set, please get in contact!

Figure: Browsing the Raymaekers GSJ LXXI March 2018 data set in the IRP