Data Management Plan for NEH Preservation & Access program

In 2014, a small team of collaborators applied to the NEH Preservation & Access program for funding to support the encoding of less commonly used and ancient languages for inclusion in the Unicode standard. (The Unicode standard is an international standard for representing writing in computer systems.) The work is being coordinated by a researcher at Berkeley and involves collaboration (via the internet) with researchers around the world. The actual amount of data produced by this effort is very small: a description of the writing system as standardized, some computer typography files, and some documents recording the work of the project.

The DMP addresses, in prose, all the points required by NEH and prompted for by the DMPTool. Note, however, that the plan does not strictly follow the layout of the DMPTool. Interesting, even novel, points about this plan are that it documents how intermediate versions of the work will be safeguarded and that the final product will be redundantly available, both as a published resource and archived with appropriate identifiers in a repository. It also proposes a specific financial arrangement for the handling and storage of the materials.

The Winning Data Management Plan

Universal Scripts Project
PI Deborah Anderson

The deliverables resulting from this project will be:

§ final Unicode proposals and script research reports (in PDF form), both of which will be freely accessible from the Unicode Consortium website and the Merritt Repository at the California Digital Library
§ information on the status of all unencoded scripts (i.e., whether a proposal has been written or not, any contact name for the proposal, and other information), all to be made available on the project’s website.

Because the final Unicode proposals and research reports will be hosted on the Unicode website and in the California Digital Library (CDL), the project materials will have redundancy and reliable backups.
When submitted to the California Digital Library, these final proposals and research documents will be accompanied by metadata that conforms to the standard defined by the Open Language Archives Community (OLAC). (The staging area for submission of the final documents to CDL will be the UC Berkeley Department of Linguistics server, which acts as a “prearchive.”) Preliminary and incremental versions of the proposals will be stored on the Unicode website, hosted by the Unicode Consortium. The UC Berkeley Library has committed to maintaining an archived version of the project website, currently located on the Linguistics department’s web server.

The cost of permanently archiving these materials is unknown; indeed, this is a thorny question with which UC Berkeley and other agencies are currently struggling. The Merritt Repository is currently charging $390 per year per terabyte for archival storage. There is also an option to purchase five years’ worth of storage in advance at a discount ($290 per year per terabyte). While the storage required for this project’s deliverables is unknown, it is estimated to be tiny, in the dozens of megabytes. Merritt, the State of California, and the University of California have all agreed to ensure that archived materials will continue to be archived even after initial funding has been exhausted, but the mechanism for this is still under discussion. This project proposal includes a lump sum of $1500 to be paid directly to the department for the use of its archive services, staff support, and the perpetual maintenance of the materials. It will be up to posterity, of course, to make good on this contract.
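
The storage rates quoted above can be put in perspective with a quick calculation. The sketch below is only illustrative: the plan gives no exact size, so the 50 MB figure is a hypothetical stand-in for "dozens of megabytes", and the conversion assumes decimal units (1 terabyte = 1,000,000 megabytes). The function and constant names are likewise invented for this example.

```python
# Rates quoted in the plan, in US dollars per terabyte per year.
STANDARD_RATE = 390.0
PREPAID_RATE = 290.0  # applies when five years are purchased in advance

# Assumption: decimal units, 1 TB = 1,000,000 MB.
MB_PER_TB = 1_000_000


def annual_cost(size_mb: float, rate_per_tb_year: float) -> float:
    """Return the yearly storage cost in dollars for size_mb megabytes."""
    return size_mb / MB_PER_TB * rate_per_tb_year


if __name__ == "__main__":
    size_mb = 50.0  # hypothetical size; the plan says only "dozens of megabytes"
    yearly = annual_cost(size_mb, STANDARD_RATE)
    five_year_prepaid = 5 * annual_cost(size_mb, PREPAID_RATE)
    print(f"Standard rate: ${yearly:.4f} per year")
    print(f"Prepaid rate:  ${five_year_prepaid:.4f} for five years")
```

Under these assumptions, the storage charge itself comes to well under a dollar per year at either rate.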