LINGUIST List 18.2657

Wed Sep 12 2007

Diss: Comp Ling/Morphology: Xanthos: 'Apprentissage automatique de ...'

Editor for this issue: Hunter Lockwood <hunter@linguistlist.org>

To post to LINGUIST, use our convenient web form at http://linguistlist.org/LL/posttolinguist.html.
1. Aris Xanthos, Apprentissage automatique de la morphologie: le cas des structures racine-schème

Message 1: Apprentissage automatique de la morphologie: le cas des structures racine-schème
Date: 12-Sep-2007
From: Aris Xanthos <Aris.Xanthos@unil.ch>
Subject: Apprentissage automatique de la morphologie: le cas des structures racine-schème

Institution: University of Lausanne
Program: Department of Linguistics
Dissertation Status: Completed
Degree Date: 2007

Author: Aris Xanthos

Dissertation Title: Apprentissage automatique de la morphologie: le cas des structures racine-schème (Machine learning of morphology: the case of root-and-pattern structures)

Linguistic Field(s): Computational Linguistics

Subject Language(s): Arabic, Standard (arb)

Dissertation Directors:
François Bavaud
John A. Goldsmith
Remi J. Jolivet

Dissertation Abstract:

This dissertation is concerned with the development of algorithmic methods
for the unsupervised learning of natural language morphology, using a
symbolically transcribed wordlist. It focuses on the case of languages
approaching the introflectional type, such as Arabic or Hebrew. The
morphology of such languages is traditionally described in terms of
discontinuous units: consonantal roots and vocalic patterns. Inferring this
kind of structure is a challenging task for current unsupervised learning
systems, which generally operate with continuous units.
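To make the notion of discontinuous units concrete, here is a minimal
sketch that splits a transcribed word into a consonantal root and a vocalic
pattern. It assumes the vowel set is already known, and it uses simplified
transcriptions without vowel length. The function name split_root_pattern
and the slot marker "C" are illustrative choices, not part of the
dissertation.

```python
def split_root_pattern(word, vowels):
    """Split a transcribed word into (root, pattern).

    root    : the consonants of the word, in their original order
    pattern : the word with every consonant replaced by the slot marker "C"
    """
    root = "".join(ch for ch in word if ch not in vowels)
    pattern = "".join(ch if ch in vowels else "C" for ch in word)
    return root, pattern


# Assumption: a simplified three-vowel transcription.
VOWELS = set("aiu")

for w in ["kitab", "kutub", "katib"]:
    print(w, split_root_pattern(w, VOWELS))
```

All three forms share the root "ktb" and differ only in the pattern
("CiCaC", "CuCuC", "CaCiC"), which is the kind of regularity that a learner
restricted to contiguous prefixes, stems and suffixes cannot state directly.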

In this study, the problem of learning root-and-pattern morphology is
divided into a phonological and a morphological subproblem. The
phonological component of the analysis seeks to partition the symbols of a
corpus (phonemes, letters) into two subsets that correspond well with the
phonetic definition of consonants and vowels; building around this result,
the morphological component attempts to establish the list of roots and
patterns in the corpus, and to infer the rules that govern their
combinations. We assess the extent to which this can be done on the basis
of two hypotheses: (i) the distinction between consonants and vowels can be
learned by observing their tendency to alternate in speech; (ii) roots and
patterns can be identified as sequences of the previously discovered
consonants and vowels respectively.
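Hypothesis (i) can be illustrated with Sukhotin's algorithm, a classic
distributional heuristic that exploits the tendency of consonants and vowels
to alternate. It is named here only as a stand-in: the abstract does not say
which partitioning method the dissertation uses. The sketch below assumes
that the most frequently adjacent symbol is a vowel and that vowels tend to
neighbour consonants. The function name and the toy wordlist are
hypothetical.

```python
from collections import defaultdict


def sukhotin_vowels(words):
    """Return the set of symbols that Sukhotin's heuristic labels as vowels."""
    adj = defaultdict(int)  # symmetric adjacency counts, diagonal ignored
    symbols = set()
    for w in words:
        symbols.update(w)
        for a, b in zip(w, w[1:]):
            if a != b:
                adj[(a, b)] += 1
                adj[(b, a)] += 1

    # Every symbol starts as a consonant; its score is its adjacency total.
    sums = {s: sum(adj[(s, t)] for t in symbols) for s in symbols}
    vowels = set()
    while True:
        candidates = [s for s in symbols if s not in vowels and sums[s] > 0]
        if not candidates:
            break
        # Promote the best-scoring consonant (ties broken by symbol order).
        v = max(candidates, key=lambda s: (sums[s], s))
        vowels.add(v)
        # Adjacency to a known vowel now counts against being a vowel.
        for s in symbols:
            if s not in vowels:
                sums[s] -= 2 * adj[(s, v)]
    return vowels


# Toy wordlist in a simplified transcription (illustrative only).
toy = ["kitab", "kutub", "katib", "maktab", "darasa"]
print(sorted(sukhotin_vowels(toy)))
```

On this toy list the procedure labels a, i and u as vowels and leaves the
remaining symbols as consonants. Real corpora are noisier, and a heuristic
this simple can mislabel frequent consonants.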

The proposed algorithm uses a purely distributional method for partitioning
symbols. Then it applies analogical principles to identify a preliminary
set of reliable roots and patterns, and gradually enlarge it. This
extension process is guided by an evaluation procedure based on the minimum
description length principle, in line with the approach to morphological
learning embodied in Linguistica (Goldsmith, 2001). The algorithm is
implemented as a computer program named Arabica; it is evaluated with
regard to its ability to account for the system of plural formation in a
corpus of Arabic nouns.
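The role of the minimum description length principle can be sketched with a
toy comparison: the cost of spelling out every word in full versus the cost
of spelling out roots and patterns once and pointing to them. The cost
model below (a flat number of bits per symbol, log2-sized pointers) is a
deliberate simplification and an assumption of this example. It is not the
formula used by Linguistica or Arabica, and the nine forms are generated
mechanically rather than taken from a corpus.

```python
import math
from itertools import product

# Assumption: a flat cost per symbol, roughly a 32-symbol alphabet.
BITS_PER_SYMBOL = 5.0


def spell_cost(strings):
    """Bits needed to spell each string, plus one end marker per string."""
    return sum((len(s) + 1) * BITS_PER_SYMBOL for s in strings)


def dl_flat(words):
    """Description length when every word is listed in full."""
    return spell_cost(set(words))


def dl_root_pattern(analyses):
    """Description length when each word is a (root, pattern) pair."""
    roots = {r for r, _ in analyses}
    patterns = {p for _, p in analyses}
    # Each word costs one pointer to a root and one pointer to a pattern.
    pointer_bits = math.log2(len(roots)) + math.log2(len(patterns))
    return spell_cost(roots) + spell_cost(patterns) + len(analyses) * pointer_bits


def interdigitate(root, pattern):
    """Fill the "C" slots of a pattern with the consonants of a root."""
    consonants = iter(root)
    return "".join(next(consonants) if ch == "C" else ch for ch in pattern)


# Toy data: three roots crossed with three patterns (not all attested words).
analyses = list(product(["ktb", "drs", "qtl"], ["CiCaC", "CuCuC", "CaCiC"]))
words = [interdigitate(r, p) for r, p in analyses]

print("flat wordlist:", dl_flat(words))
print("roots + patterns:", round(dl_root_pattern(analyses), 1))
```

Under these assumptions the root-and-pattern description is shorter than
the flat list, so an MDL-guided learner would prefer it. An extension step
that added an unproductive root or pattern would lengthen the description
and be rejected.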

This thesis shows that complex linguistic structures can be discovered
without recourse to a rich set of a priori hypotheses about the phenomena
under consideration. It illustrates the possible synergy between learning
mechanisms operating at distinct levels of linguistic description, and
attempts to determine where and why such cooperation fails. It concludes
that the tension between the universality of the consonant-vowel
distinction and the specificity of root-and-pattern structure is crucial
for understanding the advantages and weaknesses of this approach.
