Data Sharing

Why share?

Sharing well-documented data in an accessible way is important across the UC Berkeley research landscape. It promotes collaboration, facilitates reproducibility, and makes you a more competitive researcher by sustaining your data’s impact. Discussing data sharing with stakeholders early in the project lifecycle is a critical step in research data management (RDM) planning. Other decisions, such as selecting appropriate metadata standards, file formats, and repositories, may all hinge on the choices you make about when, how, and with whom you share your data. If your research is supported by a federal funding agency, you may be required to share your data. The National Science Foundation, the National Oceanic and Atmospheric Administration, and the National Institutes of Health all have data sharing requirements for the research they fund. If your research is not funded by an agency with an established sharing policy, consider contacting an RDM consultant for help creating a data sharing plan that meets your needs.

Sharing data on your own website

If you choose to share your data on a personal or departmental website, consider who will be responsible for maintaining that resource in the future. Websites are powerful tools for dissemination, but they typically have minimal infrastructure for preservation. Unpaid hosting fees, inadvertent changes to the directory structure, and neglected maintenance that leaves files in obsolete formats all put data stored on websites at risk. Consider using a website to offer access to your data alongside a more secure deposit in a trusted repository.

Sharing and discovery within a repository

As a general rule, the more domain-specific a repository is, the more effective its discovery system can be. For instance, in the National Cancer Institute (NCI) Genomic Data Commons, users can search by disease and sample type. This is possible because the datasets the repository ingests have consistent structures and similar characteristics. In a repository that is not domain-specific, like Figshare, datasets are so heterogeneous that two recently ingested packages may have nothing in common beyond a project title, a researcher name, and an upload date. It may seem that depositing in a domain repository is always the better option, and for many researchers it is the best choice. However, domain-agnostic repositories have advantages of their own: they reach a broader user base outside your specialization and may offer opportunities for interdisciplinary collaboration.
