Guide to Finding Data at Berkeley
- Conduct a literature search to determine which datasets were used in past research on the same topic, but be aware of the limitations of this approach.
  - Citation of data is not as standardized as bibliographic citation and is often incomplete or ambiguous.
  - The datasets used in published research aren't always available to other researchers. Reasons include:
    - Legitimate issues like confidentiality (e.g. patient data) or copyright.
    - Intentional decisions not to publish data, due to factors such as a lack of incentive or the absence of a culture of sharing in a discipline.
- Don't rule out personally contacting a researcher to inquire about the availability of their data.
- Seek help from a campus librarian or consultant.
- Search the web (or library databases) directly to find relevant data.
- When searching, carefully consider who is likely to collect the type of data you want and how it was likely collected. Examples of who might collect data are academic researchers, government agencies, NGOs, IGOs, or think tanks. Typical collection methods include surveys, administrative records, lab experiments, or environmental sensors.
- Try searching a specific research data repository.
- Don’t neglect your library collection (and specialist librarians).
There are a number of resources for finding data online and on campus. The list below is not exhaustive, but provides pointers to data repositories, guides, and campus resources where you can get help finding data.
- HathiTrust Research Center - The HathiTrust Research Center (HTRC) provides research access to search, collect, analyze, and visualize the full text of nearly 3 million public domain works and is intended for nonprofit and educational researchers.
- Digital Public Library of America - The Digital Public Library of America (DPLA) maintains an open API to encourage use of data contained in the DPLA platform of close to 12 million items (and growing) which range from the written word, to works of art and culture, to records of America’s heritage, to the efforts and data of science.
- UCB Libraries Data Lab - The Library Data Lab offers consultations on research involving numeric data, including finding and recommending data sources and advising on technical data issues such as file format conversion, web scraping, and basic statistical software use.
- D-Lab Data Resources - The D-Lab helps Berkeley faculty, staff, and graduate students move forward with world-class research in data intensive social science. UC Data, which is now part of D-Lab, provides access to a broad range of computerized social science data to faculty, staff, and students at UC Berkeley, and helps researchers understand the content and context of social science data, including geography, weighting, complex designs, and missing data.
- GeoData@UC Berkeley - The UC Berkeley Libraries' geoportal where users can search, preview, display, map, and download geospatial data in a variety of formats including shapefiles, KML, and raster data formats. GeoData@UC Berkeley is part of the OpenGeoportal project.
- Library data collections - The Library provides access to extensive databases and electronic resources on various subjects. Check the subject guide for your discipline for information on the resources available to you.
- Geospatial Innovation Facility Data Resources - The Geospatial Innovation Facility (GIF) at UC Berkeley's College of Natural Resources provides leadership and training across a broad array of integrated mapping technologies, including analysis and visualization of spatial data, application development, state-of-the-art geospatial and web technologies, and opportunities for researchers to learn how they can use spatial data.
- Haas School Business Library - The Thomas J. Long Business Library is a hub for business information at UC Berkeley. They source the highest quality resources -- academic and professional, print and online -- for conducting business research. A reference librarian is available to help you find the data you need.
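
As an illustration of the DPLA open API mentioned in the list above, here is a minimal Python sketch that builds an item-search request URL. The endpoint (`https://api.dp.la/v2/items`) and the `q`, `page_size`, and `api_key` parameters reflect our understanding of DPLA's public API and should be checked against the current DPLA documentation; the function name `build_dpla_search_url` and the placeholder key are hypothetical.

```python
from urllib.parse import urlencode

# Assumed DPLA item-search endpoint; verify against the current DPLA API docs.
DPLA_ITEMS_ENDPOINT = "https://api.dp.la/v2/items"


def build_dpla_search_url(query, api_key, page_size=10):
    """Return a DPLA item-search URL for a keyword query.

    The parameter names (q, page_size, api_key) are assumptions based on
    DPLA's public API; this helper only builds the URL and sends nothing.
    """
    params = {"q": query, "page_size": page_size, "api_key": api_key}
    return f"{DPLA_ITEMS_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    # "YOUR_API_KEY" is a placeholder; DPLA issues keys on request.
    print(build_dpla_search_url("california water", "YOUR_API_KEY"))
```

The resulting URL can be fetched with any HTTP client; the response is expected to be JSON describing the matching items. A campus librarian or the Library Data Lab can advise on whether an API, a bulk download, or a repository search is the better route for a given project.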
Purchasing Data: The library accepts applications for the Data Acquisition and Access Program, which assists in purchasing datasets.