DataBank is a scalable data repository designed for institutional deployment. It is intended to:
- provide a definitive, sustainable, referenceable location for (potentially large) research datasets
- allow researchers to store, reference, manage and discover datasets
DataBank instances will expose both human- and machine-readable metadata describing their datasets and, to aid discovery and citation, will assign Digital Object Identifiers (DOIs) to hosted datasets, obtained automatically using the DataCite API. DataStage and DataBank are both VMware virtualized services that may be deployed locally or on a variety of cloud infrastructures, and both will be SWORD-compliant, using the SWORD-2 protocol to wrap datasets for repository submission. DataStage will use SWORD to submit valuable datasets to any compliant institutional or subject-specific repository, while DataBank will provide a SWORD-compliant ingest service for datasets from DataStage or similar SWORD-compliant clients. Both the SWORD communication protocol and the DataStage data packaging protocol can be used with any data type.
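To make the SWORD-2 submission step more concrete, the following is a minimal sketch of the HTTP headers a client might send with a binary deposit. The header names and the SimpleZip packaging identifier come from the SWORD 2.0 profile; the function name `build_deposit_headers`, the file name, and the choice of SimpleZip are assumptions for illustration only, since the actual DataStage packaging format is not specified here.

```python
import hashlib

# Packaging identifier defined by the SWORD 2.0 profile for a plain zip file.
# Assumption: the real DataStage packaging format may use a different identifier.
SIMPLE_ZIP = "http://purl.org/net/sword/package/SimpleZip"


def build_deposit_headers(payload: bytes, filename: str, in_progress: bool = False) -> dict:
    """Build headers for a SWORD-2 binary deposit (hypothetical helper).

    The headers would accompany an HTTP POST of `payload` to a repository's
    collection URI; no network request is made here.
    """
    return {
        "Content-Type": "application/zip",
        "Content-Disposition": f"attachment; filename={filename}",
        "Packaging": SIMPLE_ZIP,
        # "true" tells the server that further changes are expected before
        # the deposit is considered complete.
        "In-Progress": "true" if in_progress else "false",
        # Assumption: checksum sent as a hex MD5 digest of the payload.
        "Content-MD5": hashlib.md5(payload).hexdigest(),
    }


headers = build_deposit_headers(b"example zip bytes", "dataset.zip")
for name, value in headers.items():
    print(f"{name}: {value}")
```

A real client would send these headers with the zipped dataset to the collection URI advertised in the repository's SWORD service document, and would read the deposit receipt returned by the server.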
By default, all objects are assigned a DOI and a cc-zero Open Data Waiver, and all RDF-format metadata is visible to the outside world, but other licensing/secrecy arrangements can be accommodated. Users can define an optional embargo period (making metadata visible but withholding the underlying data), add richer metadata to make their data easier to find (when searching within DataBank, or via web crawlers like Google), and revise datasets that have already been submitted (a new DOI is issued for each version, and all versions are kept in perpetuity). DataBank can also be run as a “dark” archive with metadata and data invisible to the outside world.
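The visibility rules described above (default open metadata, optional embargo, per-version DOIs, and dark-archive mode) can be sketched as a small data model. This is an illustrative assumption about how the rules fit together, not DataBank's actual implementation; the class names, the `visibility` function, and the DOI strings are all hypothetical placeholders.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional


@dataclass
class DatasetVersion:
    doi: str                               # each version carries its own DOI
    embargo_until: Optional[date] = None   # None means no embargo


@dataclass
class Dataset:
    dark: bool = False                     # True for a "dark" archive
    versions: List[DatasetVersion] = field(default_factory=list)

    def revise(self, doi: str, embargo_until: Optional[date] = None) -> None:
        """Add a new version; earlier versions are never removed."""
        self.versions.append(DatasetVersion(doi, embargo_until))


def visibility(dataset: Dataset, version: DatasetVersion, today: date) -> Dict[str, bool]:
    """Report what the outside world can see for one version on a given day."""
    if dataset.dark:
        return {"metadata": False, "data": False}
    embargoed = version.embargo_until is not None and today < version.embargo_until
    # During an embargo the metadata stays visible but the data is withheld.
    return {"metadata": True, "data": not embargoed}


ds = Dataset()
ds.revise("10.5072/example.v1", embargo_until=date(2030, 1, 1))  # placeholder DOI
ds.revise("10.5072/example.v2")                                  # placeholder DOI
print(visibility(ds, ds.versions[0], date(2029, 6, 1)))
```

In this sketch the embargo ends on the stated date itself, which is one possible convention; a deployment could equally treat the date as the last embargoed day.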
Institutions can have their own DataBank instances hosted within an external cloud (e.g. Eduserv), or can choose to deploy DataBank on local hardware at institutional, departmental or individual research group level.
DataBank can be used together with DataStage, or separately.
DataBank is a virtualized, cloud-deployable version of the databank created by Oxford's Bodleian Libraries. We are actively pursuing a variety of sustainability options for DataBank, but at a minimum the software will be maintained and developed for use by the Bodleian Libraries, with the code made available as open source under an MIT license.