Openness may sound self-evident, but in fact it can mean different things, even within the Open Science community (Pasquetto, Sands & Borgman, 2015). To align understanding and to ensure common goals in the transition to Open Science, clear definitions are needed. The Open Knowledge Foundation defines knowledge as open "if anyone is free to access, use, modify, and share it – subject, at most, to measures that preserve provenance and openness" (Open Definition 2.1). To account for disciplinary and procedural affordances in different research endeavours, the term “intelligent openness” has been coined by Boulton et al. (2012), denoting an openness that is “as open as possible and as closed as necessary”. Overall, a key effect of (intelligent) openness pertains to making “scientific knowledge openly available, accessible and reusable for everyone” (United Nations Educational, Scientific and Cultural Organization, 2021, Annex p. 4).
Repositories are central infrastructures for opening data, enabling "broad, equitable and ideally open access to content" (European Commission, 2021, p. 155). In this capacity, repositories that provide open access to research output can also be considered "open".
This blog post is not intended to present a new definition of open repositories, but an approach to making certain aspects of openness visible in re3data, with the intention to help researchers find repositories that are suitable for publishing their data.
Figure 1: The re3data icon system.
Generally, access conditions in re3data are covered by three elements (see re3data Metadata Schema): databaseAccessType, dataAccessType and dataUploadType. databaseAccessType refers to access to the research data repository in general, indicating whether the metadata can be found and accessed by users and services. dataAccessType describes access to the datasets in the repository, and dataUploadType indicates whether researchers can upload datasets to the service. The values for these three elements are constrained by controlled vocabularies. The definitions of the values "open", "restricted" and "closed" are identical for all three elements. Restrictions include requiring fees, registration or institutional membership. The value "embargoed" is only used for the element dataAccessType.
For the re3data icon system, only the elements databaseAccessType and dataAccessType are considered to determine the openness of a repository, as shown in Figure 2.
If databaseAccessType is "open" and dataAccessType, which can occur multiple times for a repository, includes the value "open", the repository is considered open, because users are able to access metadata without restrictions and can access at least some of the repository's holdings openly (see Figure 2 A).
If databaseAccessType is either "open" or "restricted" and dataAccessType does not include the value "open", the repository is considered restricted, because users are able to access metadata with or without restrictions, but access to all data is restricted (see Figure 2 B).
If databaseAccessType is "closed", the repository is closed, because external users cannot overcome access barriers to metadata records. The values for dataAccessType do not change this categorization (see Figure 2 C).
Figure 2: Schematic overview of determining the openness of repositories indexed in re3data.
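The three rules above can be sketched as a small function. This is only an illustration of the logic as described in this post, not the code re3data actually runs; the function name `classify_openness` and its parameters are made up for the example. Note that the rules as stated do not cover the combination of a "restricted" databaseAccessType with a dataAccessType that includes "open", so the sketch raises an error for that case rather than guessing a category.

```python
from typing import Iterable


def classify_openness(database_access_type: str,
                      data_access_types: Iterable[str]) -> str:
    """Return 'open', 'restricted' or 'closed' for one repository.

    Hypothetical helper that mirrors the three rules in the text.
    data_access_types may hold several values, because dataAccessType
    can occur multiple times for a repository.
    """
    data_values = set(data_access_types)
    has_open_data = "open" in data_values

    # Rule C: closed metadata means closed, whatever dataAccessType says.
    if database_access_type == "closed":
        return "closed"

    # Rule A: open metadata and at least some openly accessible data.
    if database_access_type == "open" and has_open_data:
        return "open"

    # Rule B: metadata open or restricted, but no openly accessible data.
    if database_access_type in ("open", "restricted") and not has_open_data:
        return "restricted"

    # Anything else is not covered by the rules as stated in the post.
    raise ValueError(
        f"combination not covered: {database_access_type!r}, {sorted(data_values)!r}"
    )
```

For example, `classify_openness("open", ["open", "embargoed"])` falls under rule A, while `classify_openness("open", ["embargoed"])` falls under rule B.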
As mentioned before, this evaluation does not include the element dataUploadType. This is because some services indexed in re3data do not offer data upload in general due to the type of infrastructure they represent. For example, some services offer access to data generated in a specific project, but they do not accept data from researchers beyond that specialized scope. Users looking for services that offer data upload in addition to open content can use dataUploadType as an additional filter in the GUI or in API requests.
In November 2021, most repositories (66.6 %, in total 1845) indexed in re3data placed restrictions on data upload. Most often, registration is required.
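To illustrate how dataUploadType can act as an additional filter, the following sketch works on a few hand-written records that have already been retrieved. The dictionary layout, the repository names and the helper `is_open_with_upload` are assumptions made for this example; they do not reproduce the format of re3data API responses. The sketch treats both "open" and "restricted" upload as "offers data upload", since restricted upload typically just means registration is required.

```python
# Hypothetical, hand-written records; not real re3data entries.
records = [
    {"name": "Repo A", "databaseAccessType": "open",
     "dataAccessType": ["open"], "dataUploadType": "restricted"},
    {"name": "Repo B", "databaseAccessType": "open",
     "dataAccessType": ["open", "restricted"], "dataUploadType": "closed"},
    {"name": "Repo C", "databaseAccessType": "open",
     "dataAccessType": ["restricted"], "dataUploadType": "open"},
]


def is_open_with_upload(record: dict) -> bool:
    """True if the repository counts as open and also accepts uploads."""
    is_open = (record["databaseAccessType"] == "open"
               and "open" in record["dataAccessType"])
    # Assumption: 'restricted' upload (e.g. registration) still counts as upload.
    accepts_upload = record["dataUploadType"] in ("open", "restricted")
    return is_open and accepts_upload


matches = [r["name"] for r in records if is_open_with_upload(r)]
print(matches)
```

Here only "Repo A" passes: "Repo B" is open but does not accept uploads, and "Repo C" accepts uploads but offers no openly accessible data.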
Based on the method outlined above, the openness of repositories in terms of the accessibility of metadata and data is evaluated and displayed via the icon system in re3data. An analysis of re3data metadata gives an overview of how research data repositories are distributed across the three categories open, restricted and closed.
Figure 3 shows that currently, the majority of all repositories is considered open (85.6 %, in total 2366), some are restricted (13.7 %, in total 378), and only a few are closed (0.7 %, in total 20). Almost all (94.5 %, in total 2611) repositories make metadata records openly available (databaseAccessType = open). Only 13.7 % (in total 380) of all repositories do not offer open access to any of the datasets they store.
Figure 3: Openness of repositories indexed in re3data.
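The percentages reported above can be reproduced from the counts. The sketch below assumes that the total number of repositories equals the sum of the three categories, which the post does not state explicitly; the variable names are chosen for the example.

```python
# Counts per category as reported in the text.
counts = {"open": 2366, "restricted": 378, "closed": 20}

# Assumption: every indexed repository falls into exactly one category.
total = sum(counts.values())

# Share of each category in percent, rounded to one decimal place.
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}

# The two further figures from the text, relative to the same total.
open_metadata_share = round(100 * 2611 / total, 1)
no_open_data_share = round(100 * 380 / total, 1)

print(total, shares, open_metadata_share, no_open_data_share)
```

Under this assumption the total is 2764 repositories, and the rounded shares match the figures given in the text (85.6 %, 13.7 %, 0.7 %, 94.5 % and 13.7 %).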
The landscape of "open repositories" is diverse, as definitions of open repositories differ in the aspects they include. The icon system in re3data is designed to give users an intuitive overview of the accessibility of metadata and data.
The analysis of re3data metadata shows that it is very common among repositories to make metadata records available without access barriers. For many repositories, unrestricted access to metadata is important, for example to enable users to search for, retrieve and reuse the stored datasets. The method for determining repository openness in re3data outlined in this blog post reflects the idea that providing unrestricted access to metadata is the first step on the path to becoming an open repository.
Making data publicly available promotes transparency and collaboration in science and is recommended by funding agencies worldwide. However, some repositories restrict access to datasets, because not all datasets can or should be published openly (Levin & Leonelli, 2017). For example, some repositories store research data that are subject to access restrictions for data protection reasons. Therefore, the degree of openness of the data a repository stores is very situational.
We are currently in the process of updating the re3data Metadata Schema, and we highly value your input: What do you think about the method for describing openness of repositories based on the access to metadata and data in re3data? Are there any aspects of openness that you would like to see in the next version of the re3data Metadata Schema?
Boulton, G., Campbell, P., Collins, B., Elias, P., Hall, W., Laurie, G., O’Neill, O., Rawlins, M., Thornton, J., Vallance, P., & Walport, M. (2012). Science as an open enterprise. Royal Society. http://royalsociety.org/policy/projects/science-public-enterprise/report/ (accessed 2021-12-01)
Draft Recommendation on Open Science (41 C/22). (2021). United Nations Educational, Scientific and Cultural Organization. https://unesdoc.unesco.org/ark:/48223/pf0000378841 (accessed 2021-12-01)
European Commission. (2021). EU Grants Annotated Model Grant Agreement. https://ec.europa.eu/info/funding-tenders/opportunities/docs/2021-2027/common/guidance/aga_en.pdf (accessed 2021-12-01)
Levin, N., & Leonelli, S. (2017). How Does One “Open” Science? Questions of Value in Biological Research. Science, Technology, & Human Values, 42(2), 280–305. https://doi.org/10.1177/0162243916672071
Pasquetto, I. V., Sands, A. E., & Borgman, C. L. (2015). Exploring openness in data and science: What is “open,” to whom, when, and why? Proceedings of the Association for Information Science and Technology, 52(1), 1–2. https://doi.org/10.1002/pra2.2015.1450520100141