What services can I use to store data during the active phase of my research project?
Data storage during the active phase of research presents challenges, whether you are housing an acquired data set or collecting and analyzing your own data. Whether you work with text or time series, sensors or surveys, recordings or RNA sequences, many factors influence the choice of storage, and the best option may vary at different stages of your work. This guidance grid summarizes many of the active data storage options available to campus researchers. For a discussion of key issues to consider when designing a data storage strategy, see "Where can I store my data? New guidance available for researchers".
The Research Data Management program evaluates the storage needs of campus scholars and develops services to fit those needs. We can help you navigate this complex landscape; please contact RDM Consulting (email@example.com) for assistance.
Note about Data Protection and Security: The “Data protection levels” column refers to the protection levels (PL0 to PL3) defined in UC Berkeley’s Data Classification Standard. In addition to the services listed below, other services and solutions can be used for storing and managing sensitive and restricted data. Please contact RDM Consulting to discuss your sensitive or restricted data needs and options.
|Title||Description||Best suited for||Not well suited for||Data protection levels||Cost||Connection methods|
|IST Performance Storage||The IST Storage and Backup team provides highly available and highly scalable systems that are offered in two billing tiers: Performance and Utility. Managed storage for high-performance applications.||Storing data during data preparation and high performance computation; large group file sharing, retaining group or departmental ownership of files; off-site protection copies|| ||PL0, PL1, PL2||Recurring subscription||Block storage, Mountable file-based storage|
|IST Utility Storage||The IST Storage Team offers two data storage options depending on what type of access is needed: Storage Area Network (SAN) and Network Attached Storage (NAS). These are both highly available and highly scalable systems that are offered in two billing tiers: Performance and Utility. No direct web access. Managed storage for less I/O intensive needs or less frequently accessed materials.||Storage of large volumes of data with limited I/O requirements; low performance computation; file shares; web content delivery; backups||High performance applications||PL0, PL1, PL2||Recurring subscription||Block storage, Mountable file-based storage|
| ||A collaborative authoring platform where you can store files and work with a limited number of collaborators. The storage size is unlimited.||Gathering data, source materials, and documentation; parking data for later preparation and analysis; collaboration; file sharing with a limited number of collaborators; off-site protection copy||Backup; high-speed, large volume data transfer||PL0, PL1||Free to UC users||API, Syncing app, Web browser|
|Box||A cloud-hosted platform that allows researchers to store and share documents, photos, research materials and other files for collaboration. Box allows users to simultaneously edit Microsoft Office documents. Berkeley Box is available to all current faculty, staff and students, and comes with free, unlimited storage. Key features include: unlimited storage; sharing outside UCB; some additional role capabilities; collaborative editing using Microsoft Office Online. Limitations include: 15 GB maximum file size; some FTPS limitations for very large transfers.||Gathering data, source materials, and documentation; parking data for later preparation and analysis; collaboration; large group file sharing; retaining group or departmental ownership of files; off-site protection copy||Backup; high-speed, large volume data transfer||PL0, PL1||Free to UC users||API, FTP, Syncing app, Web browser, WebDAV|
|CalShare||CalShare is a tool for creating and managing web sites for collaboration purposes. You can easily create and share sites, documents, images, lists, discussions and surveys. CalShare is also approved by the Campus Information Security and Privacy Committee as an appropriate storage location for sensitive or restricted data. CalShare is run as a recharge service for departments.||Specialized collaboration for dedicated use by large or long-term projects; storing secure (Protection Level 2) data||Storing more than a small amount of data||PL0, PL1, PL2||Free to UC users||API, Syncing app, Web browser|
|Cloud Archival Storage Service (CASS) (UCLA)||CASS is the Cloud Archival Storage Service provided by UCLA. Though supported by a team at UCLA, CASS is available to researchers at other UC campuses and provides a relatively low-cost solution for basic large storage. CASS is described as a "Multi-petabyte scale service for network-based storage". Access cannot be limited by individual or group.||Gathering data, source materials, and documentation; parking data for later preparation and analysis; off-site protection copy; backup (with CrashPlan ProE)|| ||PL0||Pay per use||Globus|
|Amazon Web Services (AWS) Storage Services||Amazon offers multiple kinds of storage that cost different amounts and meet different storage use cases. Please contact RDM Consulting if you are interested in these storage options.|| || || || || |
|XSEDE Storage Services||XSEDE is a single virtual system that scientists can use to interactively share computing resources, data and expertise. People around the world use these resources and services (things like supercomputers, collections of data and new tools) to improve our planet. XSEDE resources include several services for storing research data. Please contact RDM Consulting at firstname.lastname@example.org if you are interested in these storage options.|| || || ||Free to public|| |
|Savio HPC Parallel File System||Performant "global scratch" storage close to Savio/HPC computation. Large pool (885 TB), shared among all Savio users.||Temporary read/write storage during computations on Savio with moderate to heavy I/O demands|| ||PL0||Free to UC users|| |
|Savio HPC Condo Storage||Berkeley Research Computing (BRC) offers a Condo Storage service for researchers who are Savio Condo Cluster contributors and need additional persistent storage to hold their data sets while using the Savio cluster.||Users or research groups that need to import, work on, and store large data sets to support their use of Savio||Users whose computation includes heavy I/O; these users should stage data on the parallel filesystem||PL0||Free to UC users, Pay per use|| |
|XSEDE Bridges computing and storage||XSEDE national infrastructure facility hosted at the Pittsburgh (PA) Supercomputer Center. Campus XSEDE champion is Aaron Culich (as of 2016). XSEDE offers free computing and storage to qualified researchers through a competitive application process.||Active storage when using the Bridges compute cluster provided by XSEDE|| || ||Free to public|| |
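If you want to shortlist services by protection level, the "Data protection levels" column of the grid can be treated as a simple lookup. The sketch below is only an illustration: the dictionary is a hand transcription of the rows above whose service name and protection levels both appear in the grid, and the names `GRID_PROTECTION_LEVELS` and `services_for` are hypothetical, not part of any campus tool. Rows with no listed protection level (for example AWS and XSEDE) are left out, and RDM Consulting remains the authoritative source.

```python
# Hand-transcribed from the grid above (illustrative only, not an official
# data source). Rows with no listed protection level are omitted.
GRID_PROTECTION_LEVELS = {
    "IST Performance Storage": {"PL0", "PL1", "PL2"},
    "IST Utility Storage": {"PL0", "PL1", "PL2"},
    "Berkeley Box": {"PL0", "PL1"},
    "CalShare": {"PL0", "PL1", "PL2"},
    "CASS (UCLA)": {"PL0"},
    "Savio HPC Parallel File System": {"PL0"},
    "Savio HPC Condo Storage": {"PL0"},
}


def services_for(level):
    """Return, in alphabetical order, the services that list `level`."""
    return sorted(
        name
        for name, levels in GRID_PROTECTION_LEVELS.items()
        if level in levels
    )


if __name__ == "__main__":
    for level in ("PL0", "PL1", "PL2", "PL3"):
        print(level, services_for(level))
```

No service in this transcription lists PL3, so a PL3 lookup returns an empty list; that is the case where contacting RDM Consulting, as the note above advises, is the next step.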