Workshop on
The Digitization of Language Data:
The Need for Standards

Santa Barbara, California
June 21-24, 2001

Working Group Reports & Recommendations

Working Group Assignment Responses

Members of the linguistics profession face two urgent situations: the number of languages in the world is rapidly diminishing, while the number of initiatives to digitize language data is rapidly multiplying as web technology becomes more available and more sophisticated.

The latter might seem to be an unalloyed good in the face of the former, but there are two ways things may go wrong without adequate collaboration among archivists, field linguists, and language engineers.


First, a common standard for the digitization of linguistic data may never be agreed upon, and the resulting variation in archiving practices and language representation would seriously inhibit data access, searching, and cross-linguistic comparison. Second, standards may be set without guidance from the people who best know the range of structural possibilities in human language: descriptive linguists who have done fieldwork on poorly described languages.

If linguistic archives are to offer the widest possible access to the data and provide it in a maximally useful form, consensus must be reached about certain aspects of archive infrastructure. This workshop is intended to further the informed development of archive infrastructure by promoting communication among field linguists, archivists, and language engineers.

Description of Workshop


This workshop is funded by NSF grant #BCS-0091713.
Site last updated on June 12, 2001.