If you want to know more about depositing your research data in the Humanities Lab Corpus, contact the corpus server manager, Jens Larsson. Please read the points below first, since they answer some common questions.
Common questions from depositors
What kind of data can I store?
The corpus currently hosts images, audio, video, and various kinds of text data such as word lists and transcriptions. For reasons of long-term storage, data should be stored in uncompressed, open formats. (For example, MP3 is not allowed, since it is both a compressed and a proprietary format.)
See this page (http://www.mpi.nl/corpus/html/lamus/apa.html) for information on which file formats are allowed in the corpus.
Will my research data have to become publicly available?
No. As a depositor, you are not obliged to make your material accessible to the public; access is decided in accordance with the ethical and legal concerns specific to your collection. Sensitive material may therefore remain permanently unavailable to the public if its distribution could cause harm or distress to the participants.
Since research data may have different levels of sensitivity, each item can be given its own access specification. The depositor may also, at their own discretion, apply other conditions on access to or use of the data.
It is the depositor’s responsibility to ensure that they have sought the appropriate permissions from the creators of the materials when deciding on access levels.
What kind of markup or metadata will my data need to have?
The digital data stored in the corpus need to be marked with proper metadata in the IMDI format before they are uploaded. This is done with Arbil, a program developed at the Max Planck Institute for Psycholinguistics especially for the corpus server. The metadata contain information on such things as recording date, content type (e.g. narratives, conversations, experiment data, songs), and participants (these data can of course be anonymised).
You can read more about Arbil and its functions here (http://tla.mpi.nl/tools/tla-tools/arbil/).
Are the data backed up?
Yes. Data are backed up by LUNARC (http://www.lunarc.lu.se/), the Lund Center for Scientific and Technical Computing, which is part of SNIC (http://www.snic.vr.se), the Swedish National Infrastructure for Computing.
What does the corpus browser look like?
You can browse the Humanities Lab Corpus in any common web browser with a Java plugin. Click here to open the corpus browser in your web browser. Please refer to User Information if the corpus browser does not display correctly.