The promise of data sharing and the role of data sharing policies
Sharing data is a tenet of science, yet it remains commonplace in only a few subdisciplines. Publicly available datasets hold great potential to make scientific progress more efficient and effective. Unfortunately, almost all of these benefits accrue only indirectly to the stakeholders who bear most of the costs: the primary data-producing investigators (ESA Joint Working Group On Data Sharing And Archiving, 2007). As a result, authors often actively or passively resist sharing their research datasets (Blumenthal et al., 2006; Ochsner et al., 2008; Savage & Vickers, 2009).
Recognizing that a culture change is unlikely to be achieved without policy guidance, funders and journals have begun to request and require that investigators share their research datasets upon study publication (Brown, 2003; Enriquez et al., 2010). Funders are motivated by the promise of resource efficiency and synergistic progress. The motivation for journals to act as an advocate and gatekeeper for data sharing is less straightforward. Journals seek to publish “well-written, properly formatted research that meets community standards” and in so doing have assumed monitoring tasks to “remind researchers of community expectations and enforce some behaviors seen as advantageous to the progress of science” (McCain, 1995). This role has been encouraged through many letters, white papers, and editorials that advocate for strong journal data sharing policies.
Policies should only be adopted if their benefits outweigh their costs, taking into account the wide array of stakeholders (Foster & Sharp, 2007). Requiring data sharing can raise controversy (Campbell, 1999; King, 1995), increase short-term system costs (Beagrie et al., 2009), and possibly dissuade authors from publishing in journals with unusually demanding requirements.
The goal of this study is to quantify some of the short-term benefits and costs of journal data sharing policy adoption, as reflected in the practices, attitudes, and opinions of data-producing authors.
Dimensions of impact
A journal policy that requires its authors, as a condition of publication, to publicly archive supporting datasets may impact the practices, attitudes, and opinions of data-producing authors in many ways. Here, we consider the possible short-term and medium-term impacts on authors who publish in the policy-adopting journals.
The most direct impact is likely a change in the prevalence of public data archiving, and a shift to the data archiving locations specified in the policy. Early studies suggest that authors who publish in journals with data sharing policies are more likely to publicly archive their data than authors who publish similar studies in similar journals without data sharing policies (Piwowar & Chapman, 2008; Piwowar, 2010), but these investigations were correlative and did not attempt to directly measure the impact of journal policies over time. Previous work has clarified the importance of archiving data in best-practice locations: authors who share data only upon request often discriminate against uses and users (Campbell, 2000; Reidpath & Allotey, 2001). The permanence of data sharing locations is also a concern, since email addresses and website addresses for labs and projects have been shown to be transient even shortly after publication (Wren, 2008; Wren et al., 2006).
A question of particular interest is whether the adoption of data sharing mandates leads authors to perceive that data sharing has become a community norm in their field. If so, this shift in attitude may in and of itself lead to more data sharing: previous research has found that scientists “look beyond their individual interests to social cues from reference groups when deciding whether to withhold or share information requested by others” (Haas & Park, 2009). This is consistent with findings that information sharing is correlated with a perceived community norm of openness and sharing (Haeussler, 2010; Kuo & Young, 2008), as well as evidence that a scientist’s sharing decisions are influenced by the expectation of reciprocity (Gouldner, 1957) and that prior experiences with sharing predict future sharing decisions (Campbell & Bendavid, 2002).
A few previous studies have attempted to measure and estimate the consequences of sharing data for data-producing investigators (Blumenthal et al., 2006; Gleditsch & Strand, 2003; Piwowar et al., 2007; Ventura, 2005), but much remains to be learned in this area to ground expectations, accurately estimate costs and benefits, and identify ways policies can be improved to enhance positive experiences and avoid negative ones.
Journal publishers and editors may be particularly interested in whether authors view data sharing mandates more positively after the policies have gone into effect. In particular, knowing whether experience with a data sharing policy dissuades authors from choosing a similar publication venue in the future (Björk & Öörni, 2009) would be of great interest to journals considering the adoption of such policies.
Evaluating the impact of a journal data sharing policy on dataset reuse is also critical. Such an evaluation would require a longer timeframe than that of the current proposal and may be felt most strongly in a different author population; this could be pursued in future work.
Upcoming policy change is a useful opportunity for policy evaluation
Several journals plan to adopt the Joint Data Archiving Policy (JDAP) in early 2011, as described at http://datadryad.org/jdap.
This presents a timely opportunity to study the impact of journal data sharing mandates on the attitudes, behavior, and experiences of authors, and thereby to help inform future policy decisions.
JDAP requires data sharing in a public archive. Evidence suggests that stronger policies are associated with a higher frequency of data sharing (Piwowar & Chapman, 2008). Since JDAP requires data sharing as a condition of publication, studying this policy may be an opportunity to measure a strong effect. Also, since the adopting journals include almost all of the high-impact journals in the field of evolutionary science (as per the 2009 ISI Journal Citation Reports), it is unlikely that authors who dislike the policies will choose to publish in another venue instead.
We note that authors in evolution and ecology have been surveyed several times before to assess data sharing attitudes and experiences, but these questionnaires were distributed several years ago and have not been correlated with journal policy adoption (Findlay & Houlahan, 2005; Scherle et al., 2008).
Because the prevalence of data archiving itself can be accurately measured with a retrospective analysis of research artifacts, our proposed survey does not emphasize details of data archiving behavior. Instead, the questionnaire will focus on information that is difficult or impossible to collect without a timely survey of data-producing investigators: their attitudes and experiences.
Rigorous research is needed into the impact of journal policies
Although some data sharing mandates appear to have achieved almost universal sharing (Noor et al., 2006), this is not always the case (Piwowar, 2010). Several surveys have asked about authors’ perceived obstacles to sharing data (Findlay & Houlahan, 2005; Hedstrom & Niu, 2008; Scherle et al., 2008), but changes in these attitudes have not been monitored across a policy change. To our knowledge, the impact of data sharing mandates has been subject to only one evaluation: the editors of Physiological Genomics surveyed their authors and reviewers two years after instituting a policy that required public archiving of gene expression microarray data (Ventura, 2005), though unfortunately no baseline measurements of attitudes before the policy are available. Interestingly, a recent large-scale evaluation (Piwowar, 2010) of the prevalence of microarray data sharing found this same journal to have the highest rate of public archiving.
Robust studies of journal policies are infrequent, even in areas beyond data sharing. A systematic review in 2006 examined the effect of journal adoption of CONSORT requirements for standard reporting of clinical trials (Plint et al., 2006). It found that most publications about the journal policies were editorials or letters (699 of 1129 articles about the CONSORT policy). Of the studies, many were non-comparative (231 of 248), only a very few (12) were considered to be comparative studies with appropriate outcomes and sufficient detail, and only one study had a predefined control group. The authors concluded that “studies evaluating the effectiveness of the CONSORT checklist are methodologically weak” (Plint et al., 2006). This suggests that high-quality evaluations are indeed rare and thus potentially valuable contributions.
The specific research questions for this study include:
- How do authors’ attitudes, experiences, and practices around public data archiving change when the journals they publish in adopt mandatory data archiving policies?
- Are changes specific to the journals that implement the policies, or do they extend to other journals in the same subfield?
Note: relevant references are collected in a public Mendeley group for the journal data sharing policy impact project.
Abadie, A., Diamond, A., & Hainmueller, J. (2010). Synthetic control methods for comparative case studies: Estimating the effect of california’s tobacco control program. Journal of the American Statistical Association, 105, 493-505.
Beagrie, N., Eakin-Richards, L., & Vision, T. (2009). Business models and cost estimation: Dryad repository case study.
Björk, B., & Öörni, A. (2009). A method for comparing scholarly journals as service providers to authors. Serials Review, 35, 62-69.
Blumenthal, D., et al. (2006). Data withholding in genetics and the other life sciences: Prevalences and predictors. Acad Med, 81, 137-145.
Brown, C. (2003). The changing face of scientific discourse: Analysis of genomic and proteomic database usage and acceptance. Journal of the American Society for Information Science and Technology, 54(10), 926-938.
Campbell, E. (2000). Data withholding in academic medicine: Characteristics of faculty denied access to research results and biomaterials. Research Policy, 29, 303-312.
Campbell, E.G., & Bendavid, E. (2002). Data-sharing and data-withholding in genetics and the life sciences: Results of a national survey of technology transfer officers. Journal of Health Care Law and Policy, 6, 241.
Campbell, P. (1999). Controversial proposal on public access to research data draws 10,000 comments. The Chronicle of Higher Education, April 16, A42.
Enriquez, V., et al. (2010). Data citation in the wild. Poster at IDCC [accepted].
ESA Joint Working Group On Data Sharing And Archiving (2007). Obstacles to data sharing in ecology, evolution, and organismal biology: Workshop.
Findlay, C.S., & Houlahan, J. (2005). Incentives and disincentives to data sharing among ecologists. ESA Annual meeting.
Foster, M.W., & Sharp, R.R. (2007). Share and share alike: Deciding how to distribute the scientific and social benefits of genomic data. Nature Reviews Genetics, 8, 633-639.
Gleditsch, N.P., & Strand, H. (2003). Posting your data: Will you be scooped or will you be famous? International Studies Perspectives, 4, 89-97.
Gouldner, A.W. (1957). Theoretical requirements of the applied social sciences. American Sociological Review, 22(1), 92-102.
Haas, M.R., & Park, S. (2009). To share or not to share? Professional norms, reference groups, and information withholding among life scientists. Organization Science, 21, 873-891.
Haeussler, C. (2010). Information-sharing in academia and the industry: A comparative study. Research Policy [forthcoming].
Hedstrom, M., & Niu, J. (2008). Research forum presentation: Incentives to create “Archive-ready” Data: Implications for archives and records management. Society of American Archivists Annual Meeting.
King, G. (1995). A revised proposal, proposal. PS: Political Science and Politics, XXVIII, 443-499.
Kuo, F., & Young, M. (2008). A study of the intention–action gap in knowledge sharing practices. Journal of the American Society for Information Science and Technology, 59, 1224-1237.
McCain, K. (1995). Mandating sharing: Journal policies in the natural sciences. Science Communication, 16, 403-431.
Noor, M.A.F., Zimmerman, K.J., & Teeter, K.C. (2006). Data sharing: How much doesn’t get submitted to GenBank? PLoS Biology, 4, e228.
Ochsner, S., et al. (2008). Much room for improvement in deposition rates of expression microarray datasets. Nature Methods, 5.
Piwowar, H., & Chapman, W. (2008). A review of journal policies for sharing research data. ELPUB, Toronto.
Piwowar, H., Day, R., & Fridsma, D. (2007). Sharing detailed research data is associated with increased citation rate. PLoS ONE, 2.
Piwowar, H.A. (2010). Foundational studies for measuring the impact, prevalence, and patterns of publicly sharing biomedical research data. University of Pittsburgh, PhD Dissertation.
Plint, A.C., et al. (2006). Does the consort checklist improve the quality of reports of randomised controlled trials? A systematic review. Med J Aust, 185, 263-267.
Reidpath, D., & Allotey, P. (2001). Data sharing in medical research: An empirical investigation. Bioethics, 15, 125-134.
Savage, C.J., & Vickers, A.J. (2009). Empirical study of data sharing by authors publishing in PLoS journals. PLoS ONE, 4, e7078.
Scherle, R., et al. (2008). Building support for a discipline-based data repository. Poster presented at the Third International Conference on Open Repositories, 1-4.
Soreide, K., & Winter, D.C. (2010). Global survey of factors influencing choice of surgical journal for manuscript submission. Surgery, 147(4), 475-480.
Ventura, B. (2005). Mandatory submission of microarray data to public repositories: How is it working? Physiol Genomics, 20, 153-156.
Wren, J.D. (2008). URL decay in MEDLINE: A 4-year follow-up study. Bioinformatics, 24, 1381-1385.
Wren, J.D., Grissom, J.E., & Conway, T. (2006). E-mail decay rates among corresponding authors in MEDLINE: The ability to communicate with and request materials from authors is being eroded by the expiration of e-mail addresses. EMBO Rep, 7, 122-127.
Wren, J.D., et al. (2007). The write position. A survey of perceived contributions to papers based on byline position and number of authors. EMBO reports, 8, 988-991.