We’ve done it!
It is my great pleasure to announce that the DataFlow project has just released the beta versions of DataStage and DataBank. There will be an official launch workshop, where you can learn to use the software, on 2 March in Oxford (register here). The software is currently available as pre-packaged virtual machines, and DataStage is also available as Debian packages you can install yourself.
A few notes before you dive in...
2) Back up your data. You can use live data in this system: no matter what may happen to DataStage or DataBank, if you have “root” access to the machine holding the files, you can always get your data back. But it will be quicker and easier to load a backup. Don’t forget to include your new DataStage or DataBank in your usual backup routine.
3) When it’s time to update... We expect to release the 1.0 version of the code as a standalone installation, so users will need to reinstall it as a clean copy. You will have to re-create all the accounts, but the data files themselves will not be lost. DataStage will be able to pick the data files up again and reassign them correctly, so long as the administrator sets up identical account names in the fresh installation. In the long run, we would like to use the Debian packaging to make this smoother: you would "install" the whole thing, but only the changed system files would be updated, leaving the rest untouched. We will keep you posted!
4) The easiest mistake to make: If you are using live data, make sure you understand the permission system. If you put data in the system, thinking you’ll work out how to restrict access later, you may accidentally expose it. By default, metadata held in DataBank is visible to Google and every other web crawler (although the underlying data is automatically under embargo, and cannot be accessed by outsiders).
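On the backup point in note 2: if you do not yet have a routine that covers the new machine, a timestamped archive of the data directory is a reasonable starting point. The following is a minimal Python sketch, not part of DataStage or DataBank. The function names and the example paths are assumptions made for illustration, so substitute wherever your installation actually keeps its files.

```python
import tarfile
from datetime import datetime, timezone
from pathlib import Path


def backup_directory(data_dir: Path, backup_dir: Path) -> Path:
    """Write a timestamped .tar.gz of data_dir into backup_dir; return the archive path."""
    data_dir = Path(data_dir)
    backup_dir = Path(backup_dir)
    if not data_dir.is_dir():
        raise FileNotFoundError(f"data directory not found: {data_dir}")
    backup_dir.mkdir(parents=True, exist_ok=True)

    # A UTC timestamp in the file name keeps successive backups from overwriting each other.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    archive = backup_dir / f"{data_dir.name}-{stamp}.tar.gz"

    with tarfile.open(archive, "w:gz") as tar:
        # arcname stores paths relative to the directory name, not the absolute path.
        tar.add(data_dir, arcname=data_dir.name)
    return archive


def list_archive(archive: Path) -> list[str]:
    """Return the sorted member names of an archive, as a quick check that it is readable."""
    with tarfile.open(archive, "r:gz") as tar:
        return sorted(tar.getnames())


if __name__ == "__main__":
    # Hypothetical paths: replace with your real data directory and backup location.
    created = backup_directory(Path("/path/to/data"), Path("/path/to/backups"))
    print(f"Wrote {created} with {len(list_archive(created))} entries")
```

Run it as a user who can read the data files, and keep the archives on a different machine or disk from the one holding the originals.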
You can download both virtual machines, along with installation readme files, here: https://oxfile.ox.ac.uk/oxfile/work/extBox?id=1947532ECCA1DDE10.
Debian-packaged installation files for DataStage are available here: http://apt-repo.bodleian.ox.ac.uk/dataflow/.
All software is targeted at the Ubuntu Linux 11.10 (Oneiric Ocelot) operating system, and the VMs work with VMware Fusion 4.x.