Finding Data


General Tips

  • Always keep your research question in mind; this will help you determine the relevance of any data you find
  • Consider whether you need complete data or whether a sample would suffice
  • Do you need data for a single point in time, or trend/time-series data? Current or historical data?
  • Are there variables that are critical, and others that are merely ideal?
  • What is your desired unit of analysis? Humans? Industries? Specimens? Genetic information?
  • Do you want data from a specific country or other geography?

Library Resources

Discipline-specific help

Consult a subject librarian in your topic area for assistance, and/or consult these guides to start:


Advanced Keyword Search in OskiCat

  • Enter your search terms in the box(es) at the top; e.g. India and economics
  • Scroll down to Material Types and select Computer/Data Files
  • Hit Enter or Submit


Online Data Repositories

Data Repository | Description
ICPSR | The Inter-university Consortium for Political and Social Research (ICPSR) maintains and provides access to a vast archive of social and behavioral science data for research and instruction, comprising more than 250,000 files. It hosts 21 specialized collections of data in education, aging, criminal justice, substance abuse, terrorism, and other fields.
GenBank | GenBank® is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences.
dbGaP | The database of Genotypes and Phenotypes (dbGaP) was developed to archive and distribute the data and results from studies that have investigated the interaction of genotype and phenotype in humans.
Federal Statistical Research Data Center (FSRDC) | Federal Statistical Research Data Centers are partnerships between federal statistical agencies and leading research institutions. They are secure facilities that provide authorized access to restricted-use microdata for statistical purposes only.
ZTRAX/data partnerships | In cooperation with the D-Lab and with support from the Center on the Economics and Demography of Aging, the Demography Lab houses the complete ZTRAX dataset. These data cover a large subset of current and historical US real estate transactions.
UCSF Clinical Data | Research access to UCSF electronic medical record data (APeX) through the Research Data Browser (RDB), the Clinical Data Warehouse (CDW), and more.
UCSF Population Health Data | A searchable list of more than 100 datasets for population health, health services, and health equity research. The site is a repository of federal, state, local, and tribal government information made available to the public.
Dryad | The Dryad Digital Repository is a curated resource that makes research data discoverable, freely reusable, and citable. Dryad provides a general-purpose home for a wide diversity of data types.
Figshare | Figshare is an online open access repository where researchers can preserve and share their research outputs. It is free to upload content and free to access, in keeping with the principle of open data.

Other resources that can point you to subject-specific data repositories:




Campus Resources

There are a number of resources for finding data online and on campus. The list below is not exhaustive, but provides pointers to data repositories, guides, and campus resources where you can get help finding data.

    • Purchasing Data

          • Conduct a literature search to determine which datasets past research on the same topic has used, but be aware of the limitations of this approach.

            • Data citation is less standardized than bibliographic citation, and data citations are often incomplete or ambiguous.
            • The datasets used in published research aren't always available for other researchers. Reasons include:
              • Legitimate issues like confidentiality (e.g. patient data) or copyright.
              • Intentional decisions not to publish data, due to factors such as a lack of incentive or the absence of a data-sharing culture in a discipline.
            • Don't rule out personally contacting a researcher to inquire about the availability of their data.
          • Seek help from a campus librarian or consultant.
          • Search the web (or library databases) directly to find relevant data.
            • When searching, carefully consider who is likely to collect the type of data you want and how it was likely collected. Likely collectors include academic researchers, government agencies, NGOs, IGOs, and think tanks. Typical collection methods include surveys, administrative records, lab experiments, and environmental sensors.
            • Try searching a specific research data repository.
            • Don’t neglect your library collection (and specialist librarians).
        • Hathi Trust Research Center - The Hathi Trust Research Center (HTRC) provides research access to search, collect, analyze, and visualize the full text of nearly 3 million public domain works and is intended for nonprofit and educational researchers.
        • Digital Public Library of America - The Digital Public Library of America (DPLA) maintains an open API to encourage use of the data in the DPLA platform's collection of close to 12 million items (and growing), which range from the written word, to works of art and culture, to records of America’s heritage, to the efforts and data of science.
        • D-Lab Data Resources - The D-Lab helps Berkeley faculty, staff, and graduate students move forward with world-class research in data-intensive social science. UC Data, which is now part of D-Lab, provides access to a broad range of computerized social science data to faculty, staff, and students at UC Berkeley, and helps researchers understand the content and context of social science data, including geography, weighting, complex designs, and missing data.
        • GeoData@UC Berkeley - The UC Berkeley Libraries' geoportal where users can search, preview, display, map, and download geospatial data in a variety of formats including shapefiles, KML, and raster data formats. GeoData@UC Berkeley is part of the OpenGeoportal project.
        • Library data collections - access to extensive databases and electronic resources on various subjects. Check out the subject guides in your discipline for information on resources available to you.
        • Geospatial Innovation Facility Data Resources - The Geospatial Innovation Facility (GIF) at UC Berkeley's College of Natural Resources provides leadership and training across a broad array of integrated mapping technologies, including analysis and visualization of spatial data, application development, state-of-the-art geospatial and web technologies, and opportunities for researchers to learn how they can use spatial data.
        • Haas School Business Library - The Thomas J. Long Business Library is a hub for business information at UC Berkeley. It sources the highest-quality resources (academic and professional, print and online) for conducting business research. A reference librarian is available to help you find the data you need.

        Research data lifecycle: