Summary Details

Query:   Sound-File Formats for Speech Recordings
Author:  Mario Cal-Varela
Linguistic Field(s):   Phonetics

Summary:   Regarding Query:

Dear Linguists:

Last July 24 I posted a query to the list regarding the adequacy of
different file formats for computerized speech analysis. This was the
original text of the query:

I'd like to compare digital speech samples collected from different
sources, including online radio and samples digitized by myself from
analogue sources. I'm especially interested in fundamental frequency and
formant position, as well as time-related aspects of segments (specifically
VOT and vowel duration). My questions are the following:

What features of the speech signal (and in what ways) may be affected by
the format of speech samples (MP3, WAV, stream audio...)? Are the results
of spectrographic analysis of samples with different file formats and
qualities comparable? Is there any relevant bibliography available on this
topic?

First of all, thanks very much to those people who responded to my
questions and provided very useful and relevant suggestions:

James L. Fidelholtz, Benemérita Universidad Autónoma de Puebla, MÉXICO
Mark J. Jones, University of Cambridge
Dominic Watt, University of Aberdeen
Damien Hall, University of Pennsylvania
Heriberto Avelino, University of California at Berkeley

Here is a quick summary of their comments:

Although measurements of duration and other time-related aspects of the
signal do not seem to be affected by file format, for formant and F0
analysis the consensus is that, among the usual formats, only .WAV and
.AIFF files are safe bets. The lossy compression algorithms used for MP3,
MiniDisc (ATRAC) and similar formats alter the signal in several ways and,
in effect, degrade it.

On the other hand, James Fidelholtz comments that, if properly processed,
even very noisy speech can yield to acoustic analysis. For example, he
suggests using cepstrum analysis to get the formants and F0, following
these steps:
1) digitize the signal (if it is analogue), or obtain the already
digitized signal, if available. (= S)
2) compute a spectrum of the signal. [Sp(S)]
3) compute a cepstrum of Sp(S) (a spectrum of the spectrum; this will give
you the fundamental frequency F0 for each discrete sampling point in time)
4) have the computer consider *only* the points of Sp(S) which are
'near' integral multiples of F0, and plot the result. This will give you the
formants, even for extremely noisy speech.
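
As a rough illustration of steps 2) to 4), here is a minimal Python sketch.
It is not James Fidelholtz's own procedure, only one possible reading of it.
The function names, the pitch search range, the +/- 1 bin tolerance for
'near' and the synthetic test frame are all assumptions made for the
example. The cepstrum is taken in the usual sense (a transform of the log
magnitude spectrum), and a small hand-written FFT is used only to keep the
example self-contained; in practice one would rely on an established FFT
library or on a package such as Praat.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def log_spectrum(frame):
    """Step 2: log magnitude spectrum Sp(S) of one Hann-windowed frame."""
    n = len(frame)
    windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, s in enumerate(frame)]
    return [math.log(abs(c) + 1e-9) for c in fft(windowed)]


def cepstral_f0(log_mag, sample_rate, f0_min=60.0, f0_max=400.0):
    """Step 3: F0 from the strongest cepstral peak in an assumed pitch range."""
    n = len(log_mag)
    # log_mag is real and symmetric, so a forward FFT scaled by 1/n
    # gives the same values as the inverse FFT (the real cepstrum).
    cepstrum = [c.real / n for c in fft(log_mag)]
    q_lo = int(sample_rate / f0_max)
    q_hi = min(int(sample_rate / f0_min), n // 2 - 1)
    peak = max(range(q_lo, q_hi + 1), key=lambda q: cepstrum[q])
    return sample_rate / peak


def harmonic_envelope(log_mag, f0, sample_rate, max_hz=3500.0):
    """Step 4: keep only spectrum points near integer multiples of F0."""
    bin_hz = sample_rate / len(log_mag)
    points = []
    h = 1
    while h * f0 <= max_hz:
        centre = round(h * f0 / bin_hz)
        # 'near' is taken here to mean within one bin of the harmonic
        best = max(range(centre - 1, centre + 2), key=lambda b: log_mag[b])
        points.append((best * bin_hz, log_mag[best]))
        h += 1
    return points


def synthetic_frame(f0, formant_hz, sample_rate, n):
    """Hypothetical test signal: harmonics of f0 shaped by one resonance."""
    n_harm = int((sample_rate / 2 - 1) // f0)
    frame = []
    for i in range(n):
        t = i / sample_rate
        frame.append(sum(
            math.sin(2 * math.pi * h * f0 * t)
            / (1.0 + ((h * f0 - formant_hz) / 150.0) ** 2)
            for h in range(1, n_harm + 1)))
    return frame


if __name__ == "__main__":
    rate = 8000
    frame = synthetic_frame(125.0, 750.0, rate, 1024)
    log_mag = log_spectrum(frame)
    f0 = cepstral_f0(log_mag, rate)
    envelope = harmonic_envelope(log_mag, f0, rate)
    strongest = max(envelope, key=lambda p: p[1])
    print("estimated F0 (Hz):", f0)
    print("strongest harmonic (Hz):", strongest[0])
```

Peaks in the list returned by harmonic_envelope only locate a formant to
within roughly one F0, since the envelope is sampled at the harmonics. The
test frame is clean and synthetic, so the sketch says nothing about how
well the approach holds up on noisy recordings.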

The topic seems to recur on discussion lists, so several respondents
suggest using search terms such as MP3, ATRAC, FORMANT, etc. on Google or
on discussion list search engines, for example on PHONET. Mark Jones also
sent a pointer to related material on Linguist.

The IEEE website is also mentioned by several respondents as a possible
source of further information (Institute of Electrical and Electronics
Engineers, Inc.).

For an example of a major project where digitised speech was used, Damien
Hall mentions the Atlas of North American English, which incidentally used
only WAV files.

As for bibliography on the topic, there were also a few suggestions:

- Paul Foulkes and Catherine Byrne published an article a couple of years
ago in the International Journal of Speech, Language and the Law on changes
in formant frequencies (and I think F0) brought about by the signal
transmission properties of mobile telephone lines.

- Philip Harrison's work on the comparability and relative (un)reliability
of formant frequency measurements made using different software packages
(Praat, WaveSurfer/xwaves+, Sensimetrics, SpeechStation, etc.) is possibly
also relevant here.

- Some discussion on cepstrum analysis can be found in a chapter by
Liljencrants in The Handbook of Phonetic Sciences (Blackwell), ed. by
Hardcastle & Laver, and probably also in Acoustic Phonetics, by Kenneth N.
Stevens (MIT Press).

- On acoustics in Spanish there are books and articles by, for example,
Antonio Quilis or Borzone de Manrique. I'd also add Eugenio Martínez Celdrán.

Once more, thanks very much to the five kind respondents for all the useful
information and to the whole Linguist community.

Best regards,
Mario Cal Varela
University of Santiago de Compostela

LL Issue: 17.2233
Date Posted: 03-Aug-2006