LINGUIST List, Eastern Michigan University and Wayne State University
LINGUIST List 22.625

Mon Feb 07 2011

Qs: Corpora to Compare Spoken and Written Language

Editor for this issue: Danielle St. Jean <daniellelinguistlist.org>

We'd like to remind readers that responses to queries are usually best posted to the individual asking the question. That individual is then strongly encouraged to post a summary to the list. This policy was instituted to help control the huge volume of mail on LINGUIST, so we would appreciate your cooperation whenever it seems appropriate.

In addition to posting a summary, we'd like to remind people that it is usually a good idea to personally thank those individuals who have taken the trouble to respond to the query.

To post to LINGUIST, use our convenient web form at http://linguistlist.org/LL/posttolinguist.cfm.
        1.     Alain Jambin, Corpora to Compare Spoken and Written Language

Message 1: Corpora to Compare Spoken and Written Language
Date: 01-Feb-2011
From: Alain Jambin <alain.jambinsfr.fr>
Subject: Corpora to Compare Spoken and Written Language

I would be very thankful for any type of information (research articles
or books) connected with the idea of comparing spoken formats in
English with written ones, possibly along with the emerging new forms
that stand halfway between spoken and written language (e-mails, etc.).
I would also be very thankful for any mention of websites providing a
wide range of English language corpora, especially of spoken language.

As a (retired) school inspector (modern language adviser) for the
French Educational System, I have observed that most of the teaching
of foreign languages in high schools is based on the
assumption that the spoken language is somehow an adulterated form
of the written one, which accounts for the fact that the English used by
our students (those who have some command of it) is usually
bookish. Few teachers are in fact aware that speech has a code
of its own, or rather of how that code works. Of course, a great many
linguists have demonstrated the contrary over the last decades. But I have
the feeling that, beyond the specific features of each of the two codes
as such, some specific formats/schemas also run in parallel as far as
language acts are concerned. For example, the way you
confess (orally) to a friend is slightly different from its written
counterpart in a diary. Similarly, the way you make an oral presentation
(academic though it may be) is based on language patterns that are
akin to, but different from, those you use when writing an article, etc.

At the same time, I have the feeling that it is possible to establish some
rules enabling students to bridge the gap between specific written
schemas and their oral counterparts, and the other way round. My
purpose is therefore to take advantage of a variety of corpora to analyze
the links between some speech schemas and what I deem to be
the corresponding written schemas, unless some welcome research is
already available in this respect. However, I have been unable to trace any
work based on systematic comparisons so far.

It would of course be worthwhile for language teachers to get a glimpse
of the ways they can get students to move from one skill to the other
and back, both as a means to help them reconsider their practice and
as a means to provide students with a new incentive to study languages
when their interest tends to erode after a few

If I manage to collect the relevant data, I will then try to write a
methodology book aimed at teachers, including (a) a comparison
between the two types of code, (b) an analysis of the oral and written
features of the language used for similar or closely related formats, and
(c) a series of suggestions for converting a genuine written (oral) format
into an oral (written) one. As you will understand, my work is not
academic as such (though it relies on research previously carried out), but rather

Some examples of the corpora I am looking for, for comparison purposes:
1.) Written: Tourist guide, set of rules or regulations, directions for use,
diaries, jokes, tales, advertisements, classified advertisements, news
items, biographies, encyclopedia entries, newspaper columns, letters
to the editor, leading articles, film or book reviews, letters, narratives,
serials, etc.
2.) 'Intermediate': e-mails, SMS, chatting over the Internet
3.) Oral: guided tour, travel account, news items, jokes, tales,
presentation, radio or TV advertisement, recorded testimony
(confession), debate, film or book reviews, political speeches,
speeches for the defense/the prosecution, phone conversation or
everyday conversation (argument, exchange of information, report of
events, explanation, persuasion...).

Of course, ideally the corresponding varieties could address the same
event or theme.

Thank you very much,
Alain Jambin

Linguistic Field(s): Discourse Analysis
                            Text/Corpus Linguistics

Subject Language(s): English (eng)


Page Updated: 07-Feb-2011

Supported in part by the National Science Foundation
While the LINGUIST List makes every effort to ensure the linguistic relevance of sites listed on its pages, it cannot vouch for their contents.