The ALLEGRA corpus for German, Romansh and Italian

Description

The ALLEGRA corpus is a new language resource for Rumantsch Grischun. It is a sentence-aligned trilingual corpus of press releases in the three official languages of the canton of Grisons: German, Romansh and Italian. Its name is an acronym for “ALigned press reLEases of the GRisons Administration”; allegra also means “hello” in Romansh. The website of the Grisons administration gives access to all press releases since 1997. Most of these releases were written in German and translated into Rumantsch Grischun and Italian.
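The distribution format of the corpus is not specified on this page. The following is only a minimal sketch, assuming a hypothetical layout of three UTF-8 text files (German, Romansh, Italian) with one sentence per line, where line i of each file is the translation of line i of the others. It shows how such sentence-aligned data can be read as (German, Romansh, Italian) triples. The names `align_triples` and `read_aligned_files` and the file layout are illustrative assumptions, not part of the ALLEGRA release.

```python
from itertools import zip_longest
from typing import Iterable, Iterator, Tuple

# Hypothetical record type: one aligned sentence in each language.
Triple = Tuple[str, str, str]


def align_triples(de: Iterable[str], rm: Iterable[str], it: Iterable[str]) -> Iterator[Triple]:
    """Pair line i of each language into one (German, Romansh, Italian) triple.

    Assumes (hypothetically) that alignment is purely positional: line i
    on one side corresponds to line i on the other two sides.
    """
    for line_no, triple in enumerate(zip_longest(de, rm, it), start=1):
        # zip_longest pads shorter inputs with None, so a None here means the
        # three sides differ in length and cannot be positionally aligned.
        if None in triple:
            raise ValueError(f"sides differ in length at line {line_no}")
        # Strip only the trailing newline; keep all other whitespace intact.
        yield tuple(s.rstrip("\n") for s in triple)


def read_aligned_files(de_path: str, rm_path: str, it_path: str) -> Iterator[Triple]:
    """Read three hypothetical one-sentence-per-line UTF-8 files in parallel."""
    with open(de_path, encoding="utf-8") as de, \
         open(rm_path, encoding="utf-8") as rm, \
         open(it_path, encoding="utf-8") as it:
        yield from align_triples(de, rm, it)


if __name__ == "__main__":
    # Placeholder strings, not actual corpus sentences.
    de_lines = ["sentence 1 (de)\n", "sentence 2 (de)\n"]
    rm_lines = ["sentence 1 (rm)\n", "sentence 2 (rm)\n"]
    it_lines = ["sentence 1 (it)\n", "sentence 2 (it)\n"]
    for de, rm, it in align_triples(de_lines, rm_lines, it_lines):
        print(de, "|", rm, "|", it)
```

Checking that the three sides have equal length, rather than letting `zip` silently truncate, is a cheap guard against misaligned files, which would otherwise pair sentences with the wrong translations.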

The ALLEGRA corpus was prepared as follows:

Reference

The ALLEGRA corpus is described in more detail in our LREC 2012 paper:

Yves Scherrer & Bruno Cartoni (2012). The Trilingual ALLEGRA Corpus: Presentation and Possible Use for Lexicon Induction. Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC 2012), Istanbul.
If you use the corpus, please cite this paper.

Downloads

Yves Scherrer & Bruno Cartoni, 7.5.2012