International collaboration has become very common in the production of scientific knowledge (Coccia & Wang, 2016; Monastersky & Van Noorden, 2019) and contributes substantially to the research output of several countries (Adams, 2013). International collaboration can also support the operation and maintenance of widely used research infrastructures, such as research data repositories (Kindling et al., 2017).

Even though science operates in this international context, it can still be desirable in some cases to take a national, regional or local perspective on aspects of scholarly communication, for example when developing and monitoring resources. Such a location-based perspective can also be useful for research data repositories, although it can be difficult to achieve for several reasons.

For example, repositories offer services to both data providers and data users. These groups might differ in their geographical distribution: data users in particular might access and reuse datasets from all over the world, even if a repository caters mainly to data providers from specific countries. Repositories are also complex infrastructures. Several institutions can be involved in operating a repository, and sometimes partners from multiple countries take on different roles.

Despite these difficulties, some authors have successfully used re3data metadata to study research data repositories from a specific region. For example, re3data metadata has contributed to analyses of the usage and costs of research data management (e.g., von der Heyde, 2019) and has been used to study the repository landscape in a specific country (e.g., Li et al., 2022).

In this blog post, we illustrate the characteristics of location information in re3data on a map, analyze international collaborations in running research data repositories, and describe what to watch out for when using re3data metadata.

Location information in re3data

Location information in re3data is not directly associated with a repository. Instead, it refers to the legal address of the institutions responsible for a repository. This approach to modeling location information was chosen because repositories are virtual services: the data might not be stored on-site, and data are often stored somewhere other than where they are curated. This makes it difficult to determine “where a repository is”.

In the re3data Metadata Schema, this information is stored in the element institutionCountry, whose values are based on the ISO 3166-1 standard for country codes. The controlled vocabulary also includes values for international and EU institutions. The “Countries” filter in the graphical user interface of the re3data search is built on this element.
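To make this structure concrete, here is a minimal Python sketch that reads the institutionCountry values from a single record and separates national codes from international and EU values. The XML snippet is a hypothetical, heavily trimmed record, and both the namespace URI and the codes `AAA` (international) and `EEC` (EU) are assumptions for illustration that should be checked against the re3data Metadata Schema documentation.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed re3data-style record. The namespace URI is an assumption;
# the parser below matches elements in any namespace, so it does not depend on it.
SAMPLE_RECORD = """<r3d:re3data xmlns:r3d="http://www.re3data.org/schema/2-2">
  <r3d:repository>
    <r3d:institution>
      <r3d:institutionName>Example University</r3d:institutionName>
      <r3d:institutionCountry>DEU</r3d:institutionCountry>
    </r3d:institution>
    <r3d:institution>
      <r3d:institutionName>Example International Consortium</r3d:institutionName>
      <r3d:institutionCountry>AAA</r3d:institutionCountry>
    </r3d:institution>
  </r3d:repository>
</r3d:re3data>"""

# Assumed vocabulary values for international ("AAA") and EU ("EEC") institutions.
SUPRANATIONAL_CODES = {"AAA", "EEC"}


def institution_countries(record_xml):
    """Return the institutionCountry value of every institution, in document order."""
    root = ET.fromstring(record_xml)
    # "{*}" matches the element in any namespace (supported since Python 3.8).
    return [el.text.strip() for el in root.iterfind(".//{*}institutionCountry") if el.text]


def split_codes(codes):
    """Split codes into national (ISO 3166-1 style) and international/EU values."""
    national = [c for c in codes if c not in SUPRANATIONAL_CODES]
    supranational = [c for c in codes if c in SUPRANATIONAL_CODES]
    return national, supranational


codes = institution_countries(SAMPLE_RECORD)
print(codes)               # ['DEU', 'AAA']
print(split_codes(codes))  # (['DEU'], ['AAA'])
```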

It is important to note that institutional responsibilities for repositories are complex and dynamic. Multiple institutions can be responsible for a repository, and responsibilities often change over time. Therefore, if a repository has a large number of partner institutions, the re3data Editorial Team might not list all of them and instead adds to the remarks a URL to a website that lists them all. Across the 2,792 repositories currently listed in re3data, an average of 2.9 institutions are responsible for each repository.
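An average such as the 2.9 institutions per repository can be computed by counting the institution entries in each record. The sketch below uses hypothetical toy data in place of real records (the repository IDs and country lists are made up). Note that it can only count the institutions that are actually listed, so records where the Editorial Team points to an external partner list in the remarks will be undercounted.

```python
# Hypothetical toy data: repository ID -> institutionCountry of each listed
# institution (one entry per institution, so a country can appear more than once).
REPOSITORIES = {
    "repo-a": ["DEU", "DEU", "AAA"],
    "repo-b": ["USA"],
    "repo-c": ["USA", "GBR"],
}


def mean_institutions_per_repository(repos):
    """Average number of listed institutions per repository."""
    return sum(len(codes) for codes in repos.values()) / len(repos)


print(mean_institutions_per_repository(REPOSITORIES))  # 2.0
```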

Institutions associated with repositories indexed in re3data are widely distributed. However, due to the history of the service’s development and the research data repository landscape in general, there is currently an emphasis on European and North American countries (see Figure 1). We are continuously working on filling in the blank spots on the map. If you know a repository that is not listed in re3data yet, you can help us by filling out the re3data suggest form.


Figure 1: Distribution of institution countries associated with repositories indexed in re3data (highlighted countries are mentioned in re3data records; not shown: international and EU institutions).

International collaborations on running repositories

Because multiple institutions from different countries can be specified for a repository in re3data, re3data metadata can be used to study international collaborations that run research data repositories.

As of February 2022, 490 repositories are associated with international and EU institutions. When these institutions are excluded to focus on collaborations between individual countries, institutions from as many as 11 different countries can be responsible for a single repository. Overall, however, repository responsibility does not point to broad international collaboration in running the repository infrastructure: on average, institutions from 1.19 countries are involved per repository. Surprisingly, only 12.4 % of repository entries in re3data are associated with institutions from more than one individual country (excluding international and EU institutions). As shown in Figure 2, collaborations between the USA and Great Britain account for the largest share of jointly held international responsibilities for the repositories listed in re3data.
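In principle, country-level figures like these can be derived by dropping international and EU values and de-duplicating the remaining country codes per repository. The following Python sketch uses hypothetical toy data and assumes the codes `AAA` and `EEC` for international and EU institutions. The post does not state how repositories run only by international or EU institutions were treated in the averages, so excluding them here is a choice made for this sketch.

```python
# Assumed vocabulary values for international and EU institutions.
SUPRANATIONAL_CODES = {"AAA", "EEC"}

# Hypothetical toy data: repository ID -> institutionCountry of each institution.
REPOSITORIES = {
    "repo-a": ["DEU", "DEU", "AAA"],  # one national country plus an international body
    "repo-b": ["USA"],
    "repo-c": ["USA", "GBR"],
    "repo-d": ["AAA"],                # no national institution at all
}


def national_countries(codes):
    """Distinct national country codes, ignoring international and EU entries."""
    return {c for c in codes if c not in SUPRANATIONAL_CODES}


def collaboration_summary(repos):
    """Max and mean number of countries per repository, and share with more than one."""
    counts = [len(national_countries(codes)) for codes in repos.values()]
    # Repositories without any national institution are skipped (sketch assumption).
    counts = [n for n in counts if n > 0]
    multi_country = sum(1 for n in counts if n > 1)
    return {
        "max_countries": max(counts),
        "mean_countries": sum(counts) / len(counts),
        "share_multi_country_pct": 100 * multi_country / len(counts),
    }


print(collaboration_summary(REPOSITORIES))
```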


Figure 2: International collaborations of institutions associated with repositories indexed in re3data (one line represents a common occurrence of two countries in a re3data entry; each combination of countries is counted once per repository; not shown: international and EU institutions).
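The counting rule in the caption of Figure 2 (each combination of countries counted once per repository) corresponds to counting unordered pairs of distinct countries. Below is a minimal sketch, again with hypothetical toy data and the assumed `AAA`/`EEC` codes. The resulting pair counts are the kind of input a chord diagram needs; the plotting itself is left out.

```python
from collections import Counter
from itertools import combinations

SUPRANATIONAL_CODES = {"AAA", "EEC"}  # assumed codes for international / EU institutions

# Hypothetical toy data: repository ID -> institutionCountry of each institution.
REPOSITORIES = {
    "repo-a": ["USA", "GBR", "GBR"],
    "repo-b": ["USA", "GBR", "DEU"],
    "repo-c": ["DEU", "EEC"],
}


def country_pair_counts(repos):
    """Count how many repositories each unordered pair of countries shares."""
    pairs = Counter()
    for codes in repos.values():
        # De-duplicate and sort, so every pair appears once per repository
        # and ("GBR", "USA") and ("USA", "GBR") count as the same pair.
        countries = sorted({c for c in codes if c not in SUPRANATIONAL_CODES})
        pairs.update(combinations(countries, 2))
    return pairs


for pair, count in country_pair_counts(REPOSITORIES).most_common():
    print(pair, count)
```

Sorting the de-duplicated codes before calling `combinations` is what makes the pairs order-independent. Without that step, the same two countries could be counted under two different keys.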

How to use re3data location information

In this blog post, we described how location information is structured in re3data and used this information to analyze international collaborations in jointly run research data repositories.

If you are considering using re3data metadata, you should be aware that location information in re3data is not directly associated with a repository but refers to the institutions responsible for it. This is great for analyzing international collaborations around repositories, but it can be an issue if you want to restrict your analysis to repositories from a specific region. This might not always matter greatly because, as mentioned above, only 12.4 % of repositories are associated with institutions from more than one country. However, some repositories are highly international, and this can affect analyses of the repository landscape, particularly when an analysis requires distinct groups, for example in comparisons between countries.
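One way to handle this when selecting repositories by country is to make the assignment rule explicit. The hypothetical helper below (not part of any re3data tooling) contrasts a lenient rule, where at least one responsible institution is in the country, with a strict rule, where all national institutions are in the country. It uses toy data and the assumed `AAA`/`EEC` codes for international and EU institutions.

```python
SUPRANATIONAL_CODES = {"AAA", "EEC"}  # assumed codes for international / EU institutions

# Hypothetical toy data: repository ID -> institutionCountry of each institution.
REPOSITORIES = {
    "repo-a": ["DEU"],
    "repo-b": ["DEU", "FRA"],
    "repo-c": ["FRA"],
    "repo-d": ["DEU", "AAA"],
}


def repositories_in_country(repos, country, strict=False):
    """Select repositories by country.

    strict=False: at least one national institution is located in `country`.
    strict=True:  every national institution is located in `country`.
    International and EU institutions are ignored under both rules.
    """
    selected = []
    for repo_id, codes in repos.items():
        national = {c for c in codes if c not in SUPRANATIONAL_CODES}
        matches = (national == {country}) if strict else (country in national)
        if matches:
            selected.append(repo_id)
    return selected


print(repositories_in_country(REPOSITORIES, "DEU"))               # ['repo-a', 'repo-b', 'repo-d']
print(repositories_in_country(REPOSITORIES, "FRA"))               # ['repo-b', 'repo-c']
print(repositories_in_country(REPOSITORIES, "DEU", strict=True))  # ['repo-a', 'repo-d']
```

Under the lenient rule, the jointly run `repo-b` lands in both the DEU and FRA groups, so the groups are no longer distinct. The strict rule keeps the groups disjoint but drops `repo-b` entirely. Which trade-off is acceptable depends on the analysis.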

If you are interested in using the re3data API and want to explore the data yourself, you are free to do so. re3data metadata is licensed under the very permissive CC0 license, and, in the open science spirit, anyone is free to (re-)use this information for various purposes. The service also offers a well-documented REST API. For an introduction to using the API, we recommend the adaptable Jupyter Notebooks we developed, which outline example use cases step by step. Please feel free to contact us if your idea for using re3data metadata is not covered.
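As a small starting point outside the notebooks, the following Python sketch retrieves the repository list using only the standard library. The base URL, the endpoint paths (`repositories` and `repository/<id>`), and the `<list><repository><id>` layout of the list response are assumptions for illustration; please check them against the re3data API documentation before relying on them.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Assumed base URL of the re3data REST API; verify against the API documentation.
API_BASE = "https://www.re3data.org/api/v1"


def fetch_xml(path):
    """Download an API response and parse it as XML (requires network access)."""
    with urllib.request.urlopen(f"{API_BASE}/{path}") as response:
        return ET.fromstring(response.read())


def repository_ids(list_root):
    """Extract repository IDs, assuming each <repository> element has an <id> child."""
    return [el.text.strip() for el in list_root.iterfind(".//repository/id") if el.text]


# Example usage (performs network requests; endpoint paths are assumptions):
#   ids = repository_ids(fetch_xml("repositories"))
#   record = fetch_xml(f"repository/{ids[0]}")
#   countries = [el.text for el in record.iterfind(".//{*}institutionCountry")]
```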

References

Adams, J. (2013). The fourth age of research. Nature, 497(7451), 557–560. DOI: 10.1038/497557a

Coccia, M., & Wang, L. (2016). Evolution and convergence of the patterns of international scientific collaboration. Proceedings of the National Academy of Sciences, 113(8), 2057–2061. DOI: 10.1073/pnas.1510820113

Imker, H. J. (2020). Who Bears the Burden of Long-Lived Molecular Biology Databases? Data Science Journal, 19(1), 8. DOI: 10.5334/dsj-2020-008

Kindling, M., Pampel, H., van de Sandt, S., Rücknagel, J., Vierkant, P., Kloska, G., Witt, M., Schirmbacher, P., Bertelmann, R., & Scholze, F. (2017). The Landscape of Research Data Repositories in 2015: A re3data Analysis. D-Lib Magazine, 23(3/4). DOI: 10.1045/march2017-kindling

Li, C., Zhou, Y., Zheng, X., Zhang, Z., Jiang, L., Li, Z., Wang, P., Li, J., Xu, S., & Wang, Z. (2022). Tracing the footsteps of open research data in China. Learned Publishing, 35(1), 46–55. DOI: 10.1002/leap.1439

Monastersky, R., & Van Noorden, R. (2019). 150 years of Nature: A data graphic charts our evolution. Nature, 575(7781), 22–23. DOI: 10.1038/d41586-019-03305-w

Von der Heyde, M. (2019). Open Research Data: Landscape and cost analysis of data repositories currently used by the Swiss research community, and requirements for the future. DOI: 10.5281/ZENODO.2643460
