
FYI: GerManC Corpus is Now Available


Author: Richard Whitt

Linguistic Field(s): Computational Linguistics
Historical Linguistics
Text/Corpus Linguistics

FYI Body: THE COMPLETE GERMANC CORPUS, A REPRESENTATIVE CORPUS OF EARLY
MODERN GERMAN FROM 1650 TO 1800, IS NOW PUBLICLY AVAILABLE AT THE
OXFORD TEXT ARCHIVE:
HTTP://WWW.OTA.OX.AC.UK/DESC/2544

FOLLOWING THE MODEL OF THE ARCHER CORPUS AND GIVEN THE AIM OF
REPRESENTATIVENESS, THE GERMANC CORPUS CONSISTS OF TEXT SAMPLES OF
ABOUT 2000 WORDS FROM EIGHT GENRES: DRAMA, NEWSPAPERS, SERMONS
AND PERSONAL LETTERS (TO REPRESENT ORALLY ORIENTED REGISTERS) AND
NARRATIVE PROSE (FICTION OR NON-FICTION), SCHOLARLY (I.E. HUMANITIES),
SCIENTIFIC AND LEGAL TEXTS (TO REPRESENT MORE PRINT-ORIENTED REGISTERS). IN
ORDER TO FACILITATE TRACING HISTORICAL DEVELOPMENTS, THE WHOLE PERIOD WAS
DIVIDED INTO FIFTY-YEAR SECTIONS (IN THIS CASE 1650-1700, 1700-1750 AND
1750-1800), AND AN EQUAL NUMBER OF TEXTS FROM EACH GENRE WAS
SELECTED FOR EACH OF THESE SUB-PERIODS.

THE COMPLETE CORPUS THUS CONSISTS OF 360 SAMPLES, COMPRISING
APPROXIMATELY 800,000 WORDS. APPENDIX 1 IN THE DOWNLOAD PACKAGE
CONTAINS A LIST OF THE FILES IN THE CORPUS WITH FULL DOCUMENTATION IN AN
EXCEL SPREADSHEET.
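
The sampling figures above can be cross-checked with a little arithmetic. The following is only a rough sketch, assuming the 360 samples are spread evenly over the eight genres and three sub-periods and that each sample is exactly 2000 words (the announcement says "about 2000"). The variable names are illustrative and do not come from the corpus documentation.

```python
# Figures taken from the announcement; names are illustrative only.
GENRES = [
    "drama", "newspapers", "sermons", "personal letters",
    "narrative prose", "scholarly", "scientific", "legal",
]
PERIODS = ["1650-1700", "1700-1750", "1750-1800"]
TOTAL_SAMPLES = 360
NOMINAL_WORDS_PER_SAMPLE = 2000  # assumption: "about 2000 words" taken as exact

# One cell = one genre in one sub-period.
cells = len(GENRES) * len(PERIODS)

# Samples per cell, assuming an even split across all cells.
samples_per_cell, remainder = divmod(TOTAL_SAMPLES, cells)

# Nominal corpus size if every sample were exactly 2000 words.
nominal_words = TOTAL_SAMPLES * NOMINAL_WORDS_PER_SAMPLE

print(cells, samples_per_cell, remainder, nominal_words)
```

Under these assumptions there are 24 genre/sub-period cells with 15 samples each, and the nominal size comes to 720,000 words. The stated total of approximately 800,000 words would therefore suggest that samples run somewhat longer than 2000 words on average.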

PROJECT TEAM: MARTIN DURRELL (PI), PAUL BENNETT (CO-INVESTIGATOR), SILKE
SCHEIBLE (RA), RICHARD J. WHITT (RA), AND ASTRID ENSSLIN (RA,
NEWSPAPER CORPUS).
