LINGUIST List 19.1565
Thu May 15 2008
FYI: New Release of the TueBa-D/Z German Treebank
Editor for this issue: Ann Sawyer
<sawyer linguistlist.org>
To post to LINGUIST, use our convenient web form at
http://linguistlist.org/LL/posttolinguist.html.
Directory
1. Kathrin Beck, New Release of the TueBa-D/Z German Treebank
Message 1: New Release of the TueBa-D/Z German Treebank
Date: 14-May-2008
From: Kathrin Beck <kathrin.beck uni-tuebingen.de>
Subject: New Release of the TueBa-D/Z German Treebank
The Division of Computational Linguistics at the Seminar fuer Sprachwissenschaft of the University of Tuebingen (Germany) is happy to announce the release of a referentially and syntactically annotated German corpus:

* The Tuebingen Treebank of Written German (TueBa-D/Z), fourth release

The TueBa-D/Z treebank is a manually annotated German newspaper corpus based on data taken from daily issues of 'die tageszeitung'. It currently comprises approximately 36 000 sentences (ca. 640 000 words).

The syntactic annotation scheme of the TueBa-D/Z distinguishes four levels of syntactic constituency: the lexical level, the phrasal level, the level of topological fields, and the clausal level. In addition to constituent structure, the annotated trees contain edge labels between nodes that encode grammatical functions. At the lexical level, words are annotated with inflectional morphology.

The treebank is available in three formats:
* NEGRA export format
* XML format
* Penn Treebank format

Currently, about 36 000 sentences of the treebank (about 1 700 articles) have been enriched with anaphoric and coreference relations referring to nominal and pronominal antecedents. Linking relations include coreferential relations (two NPs refer to the same extralinguistic referent), anaphoric/cataphoric relations (a definite pronoun refers to a contextual antecedent), and other relations (split antecedent, instance); expletive pronouns are marked as well. The referential annotation is available in the NEGRA export and XML formats, in a unified representation of syntactic and referential information.

What is new in the fourth release:
- about 9 000 additional sentences
- about 600 more articles with referential annotation
- cleaner versions of the trees published in the third release

The license for TueBa-D/Z is granted free of charge for scientific use.
For more information, please refer to:
http://www.sfs.uni-tuebingen.de/de_tuebadz.shtml
http://www.sfs.uni-tuebingen.de/en_tuebadz.shtml

With best regards,
Erhard W. Hinrichs
Kathrin Beck
Yannick Versley
Holger Wunsch
Heike Zinsmeister

Linguistic Field(s): Computational Linguistics; Discourse Analysis; Syntax; Text/Corpus Linguistics
Subject Language(s): German, Standard (deu)