Index of synthesis data
Author: Hugo Cesar de Castro Carneiro
My M.Sc. thesis is called ''The function of the index of synthesis of the
languages in part-of-speech tagging with weightless artificial neural
networks''.
My motivation in this thesis is the ''like vs. gostam'' paradigm (''gostam''
is Portuguese for ''they like''). The English word ''like'' has an ambiguous
part of speech: it can be a preposition, a conjunction, a verb or some other
part of speech, so a reader needs an adjacent word such as ''they'' to know
that it is a verb in this context. On the other hand, ''gostam'' in
Portuguese is always a verb, as the ''-am'' suffix tells the reader.
So, I am testing a system I have developed on 5 languages: Mandarin Chinese,
English, Portuguese, German and Turkish (ordered from the most isolating to
the most synthetic). Once I have the information I need from these 5
languages, I will test the system on 4 others: Thai (more synthetic than
Mandarin Chinese and more isolating than English), Japanese (more synthetic
than English and more isolating than Portuguese), Italian (more synthetic
than Portuguese and more isolating than German) and Russian (more synthetic
than German and more isolating than Turkish).
But I have one problem: the indices of synthesis of these languages are
only my own estimates, and even their order may be somewhat wrong (is
Portuguese or German the more synthetic?).
Can someone help me find an index of synthesis for each of these languages?
Or does anyone know where I can get a text in each of these languages in
which every word is split into its morphemes?
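
For reference, here is a minimal sketch of how a morpheme-segmented text
could be turned into an index of synthesis. It assumes the index is taken as
the average number of morphemes per word, and that morphemes inside a word
are separated by hyphens. Both are assumptions made for illustration, and the
function name and the sample strings are hypothetical, not part of my system.

```python
def index_of_synthesis(segmented_text, morpheme_sep="-"):
    """Return the mean number of morphemes per word.

    Assumes words are separated by whitespace and morphemes inside a word
    are separated by `morpheme_sep` (an assumed annotation convention).
    """
    words = segmented_text.split()
    if not words:
        raise ValueError("text contains no words")
    # Count the non-empty pieces of every word after splitting on the separator.
    morphemes = sum(
        len([m for m in word.split(morpheme_sep) if m]) for word in words
    )
    return morphemes / len(words)


# Hypothetical hand-segmented samples based on the example above.
english = "they like"   # two words, one morpheme each
portuguese = "gost-am"  # one word, two morphemes

print(index_of_synthesis(english))     # 1.0
print(index_of_synthesis(portuguese))  # 2.0
```

With real data the samples would be whole texts, and the value would depend
on the segmentation conventions used by whoever annotated them.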
I am finishing my master's studies this year, but I need to submit a paper
to a journal before I receive my M.Sc. degree in Computer Science.