Active Research Data Storage Guidance Grid

What services can I use to store data during the active phase of my research project?  

Data storage during the active phase of research presents challenges, regardless of whether you are housing an acquired data set or collecting and analyzing your own data. Whether you work with text or time series, sensors or surveys, recordings or RNA sequences, many factors influence the choice of storage, and the optimal selection may vary at different stages of your work. This guidance grid summarizes many active data storage options available to campus researchers. See "Where can I store my data? New guidance available for researchers" for a discussion of key issues to consider when designing a data storage strategy.

The Research Data Management program evaluates the storage needs of campus scholars and develops services to fit those needs. We can help you navigate this complex landscape -- please contact RDM Consulting (researchdata@berkeley.edu) for assistance.  

Note about Data Protection and Security: The “Data protection” column refers to data protection levels (PL0 to PL3) that are defined in UC Berkeley’s Data Classification Standard. In addition to the services listed below, there are other services or solutions that can be used for storing and managing sensitive and restricted data. Please contact RDM Consulting to discuss your sensitive or restricted data needs and options.

Title | Description | Best suited for | Not well suited for | Data protection levels | Cost | Connection methods
IST Performance Storage

The IST Storage and Backup team provides highly available and highly scalable systems that are offered in two billing tiers: Performance and Utility. Managed storage for high-performance applications.
Hosted in UCB data center (with UCSD backup available). No direct web access.

Storing data during data preparation and high performance computation Large group file sharing, retaining group or departmental ownership of files Off-site protection copies PL0, PL1, PL2 Recurring subscription Block storage, Mountable file-based storage
IST Utility Storage

The IST Storage Team offers two data storage options depending on what type of access is needed: Storage Area Network (SAN) and Network Attached Storage (NAS). These are both highly available and highly scalable systems that are offered in two billing tiers: Performance and Utility. No direct web access. Managed storage for less I/O intensive needs or less frequently accessed materials.
Hosted in UCB data center (with UCSD backup available).

Storage of large volumes of data with limited I/O requirements Low performance computation File shares Web content delivery Backups High performance applications PL0, PL1, PL2 Recurring subscription Block storage, Mountable file-based storage
bDrive

bDrive is a collaborative authoring platform where you can store files and work on them with collaborators. It is an enterprise version of Google Drive, which means that it is used under an agreement approved by the UC Regents. UC does not accept the vendor's requirement that we waive their liability, making bDrive more secure than your personal Google Drive. bDrive is available to all current faculty, staff and students, and comes with free, unlimited storage.

Gathering data, source materials, and documentation; Parking data for later preparation, analysis; Collaboration; File-sharing with a limited number of collaborators; Off-site protection copy, especially for very large amounts of data Backup PL0, PL1 Free to UC users API, Syncing app, Web browser, WebDAV
Box

A cloud-hosted platform that allows researchers to store and share documents, photos, research materials and other files for collaboration. Box allows users to simultaneously edit Microsoft Office documents. Berkeley Box is available to all current faculty, staff and students, and comes with free, unlimited storage.

Key features include:
Unlimited storage
Share outside UCB
Some additional role capabilities
Collaborative editing using Microsoft Office Online

Limitations include:
15 GB maximum file size
Some FTPS limitations for very large transfers

Gathering data, source materials, and documentation; parking data for later preparation, analysis Collaboration Large group file sharing Retaining group or departmental ownership of files Off-site protection copy Backup High-speed, large volume data transfer PL0, PL1 Free to UC users API, FTP, Syncing app, Web browser, WebDAV
CalShare

CalShare is a tool for creating and managing web sites for collaboration purposes. You can easily create and share sites, documents, images, lists, discussions and surveys. CalShare is also approved by the Campus Information Security and Privacy Committee as an appropriate storage location for sensitive or restricted data. CalShare is run as a recharge service for departments.

CalShare includes:
Project/site dashboard
Integrates well with Microsoft Office, OneDrive for Business
Off-site disaster recovery

Specialized collaboration for dedicated use by large or long-term projects Storing secure (Protection Level 2) data Storing more than a small amount of data PL0, PL1, PL2 Free to UC users API, Syncing app, Web browser
Cloud Archival Storage Service (CASS) (UCLA)

CASS is the Cloud Archival Storage Service provided by UCLA. Though supported by a team at UCLA, CASS is available to researchers at other UC campuses and provides a relatively low-cost solution for basic, large-scale storage. CASS is described as a "Multi-petabyte scale service for network-based storage". Access cannot be limited by individual or group.

Gathering data, source materials, and documentation; parking data for later preparation, analysis Offsite protection copy Backup (with CrashPlan ProE) PL0 Pay per use Globus
Amazon Web Services (AWS) Storage Services

Amazon offers multiple kinds of storage that cost different amounts and meet different storage use cases. Please contact RDM Consulting if you are interested in these storage options.

Recurring subscription
XSEDE Storage Services

XSEDE is a single virtual system that scientists can use to interactively share computing resources, data and expertise. People around the world use these resources and services — things like supercomputers, collections of data and new tools — to improve our planet. XSEDE resources include several services for storing research data. Please contact RDM Consulting at researchdata@berkeley.edu if you are interested in these storage options.

Free to public
Savio HPC Parallel File System

Performant "global scratch" storage close to Savio/HPC computation. Large pool (885 TB), shared among all Savio users.

Temporary read/write storage during computations on Savio with moderate to heavy I/O demands. PL0 Free to UC users
Savio HPC Condo Storage

Berkeley Research Computing (BRC) offers a Condo Storage service for researchers who are Savio Condo Cluster contributors and need additional persistent storage to hold their data sets while using the Savio cluster.

Best suited for: Users or research groups that need to import, work on, and store large data sets to support their use of Savio. Not well suited for: Users whose computation includes heavy I/O; these users should stage data on the parallel filesystem. Data protection levels: PL0. Cost: Free to UC users, Pay per use.
XSEDE Bridges computing and storage

An XSEDE national infrastructure facility hosted at the Pittsburgh Supercomputing Center (PA). The campus XSEDE champion is Aaron Culich (as of 2016). XSEDE offers free computing and storage to qualified researchers through a competitive application process.

Active storage when using the Bridges compute cluster provided by XSEDE. Free to public
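
As a rough illustration of how the "Data protection levels" column can be used to shortlist services, the sketch below transcribes the levels listed in the grid above and filters them by a required level. The names `SERVICE_PROTECTION_LEVELS` and `services_for_level` are hypothetical and chosen only for this example; they are not part of any campus tool. Services for which the grid lists no level (AWS and the XSEDE services) are left out, and the grid itself remains the reference.

```python
# Protection levels transcribed from the grid above (an illustrative
# snapshot, not an authoritative source). Services with no level listed
# in the grid (AWS, XSEDE) are omitted.
SERVICE_PROTECTION_LEVELS = {
    "IST Performance Storage": {"PL0", "PL1", "PL2"},
    "IST Utility Storage": {"PL0", "PL1", "PL2"},
    "bDrive": {"PL0", "PL1"},
    "Box": {"PL0", "PL1"},
    "CalShare": {"PL0", "PL1", "PL2"},
    "CASS (UCLA)": {"PL0"},
    "Savio HPC Parallel File System": {"PL0"},
    "Savio HPC Condo Storage": {"PL0"},
}


def services_for_level(level):
    """Return the services whose listed levels include `level`, sorted by name."""
    return sorted(
        name
        for name, levels in SERVICE_PROTECTION_LEVELS.items()
        if level in levels
    )


if __name__ == "__main__":
    # Shortlist the services the grid lists as suitable for PL2 data.
    for name in services_for_level("PL2"):
        print(name)
```

Asking for "PL3" returns an empty list because no service in the grid lists that level; for such data, the note above recommends contacting RDM Consulting.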