Inherent Pitch Differences and Monotonous Speech
I am in need of advice from the linguistic community!
I am designing an experiment that requires monotonous (or quasi-monotonous) natural speech stimuli.
First, let me say that this is a Statistical Learning experiment in which I am trying to see whether previous results using synthetic speech are replicable with natural speech. Given the nature of the experiment, I would like to preserve as much acoustic information in my stimuli as possible.
I am concatenating several syllable tokens (e.g. ba, to, pi) to create a "monotonous" stream of speech.
However, because the vowels in my syllables naturally vary in pitch with vowel height, some syllables still sound much more prominent than others in the speech stream (e.g. [pu] is much more prominent than [ba]). My pitch values range from 199.14 Hz for [ba] to 240.31 Hz for [pu].
The mean pitch across all of my syllables is 216 Hz, with an SD of 12.941 Hz.
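Perceptual work on F0 often reports differences in semitones (12 per octave) rather than in Hz. Expressed that way, the span between the two extreme syllables quoted above works out to a little over three semitones:

```latex
12 \log_2\!\left(\frac{240.31\ \text{Hz}}{199.14\ \text{Hz}}\right) \approx 12 \times 0.2711 \approx 3.25\ \text{semitones}
```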
Here is my question:
I am trying to preserve the integrity of the natural speech tokens as much as possible. Does anyone know of a relatively sound protocol for manipulating the pitch of a range of vowels to create monotonous speech? I may be wrong, but I suspect that setting all vowels to the same pitch (say, M = 220 Hz) is not the most methodologically sound approach, given that vowels naturally vary in pitch. Or is there a methodologically accepted way of bringing the vowels closer together in pitch so that the speech sounds monotonous?
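One possible way to "bring vowels closer together" without flattening them completely is to compress each syllable's F0 toward a reference value on a semitone scale. This keeps the high-vowel/low-vowel ordering but shrinks the spread. The sketch below only computes target F0 values. It assumes the reported 216 Hz mean as the reference, a hypothetical compression factor `k` (0 = fully flat, 1 = unchanged), and hypothetical helper names `semitones` and `compress_f0`; it uses only the two extreme values given in the query. The resynthesis itself would still need to be done in a tool such as Praat (e.g. its PSOLA-based Manipulation object), and whether a given `k` is perceived as monotonous is an empirical question this code does not settle.

```python
import math


def semitones(f_hz: float, ref_hz: float) -> float:
    """Distance of f_hz from ref_hz in semitones (12 per octave)."""
    return 12.0 * math.log2(f_hz / ref_hz)


def compress_f0(f0_by_syllable: dict[str, float], ref_hz: float, k: float) -> dict[str, float]:
    """Scale each syllable's semitone distance from ref_hz by k (hypothetical helper).

    k = 1.0 leaves the values unchanged; k = 0.0 flattens every syllable to ref_hz;
    values in between keep the high/low vowel ordering but shrink the spread.
    """
    if not 0.0 <= k <= 1.0:
        raise ValueError("k must be between 0 and 1")
    return {
        syl: ref_hz * 2.0 ** (k * semitones(f0, ref_hz) / 12.0)
        for syl, f0 in f0_by_syllable.items()
    }


# Only the two extreme values from the query are used; other syllables would be added here.
measured = {"ba": 199.14, "pu": 240.31}
# Assumption: the 216 Hz mean serves as the reference, and k = 0.5 is an arbitrary example.
targets = compress_f0(measured, ref_hz=216.0, k=0.5)
for syl, f0 in targets.items():
    print(f"[{syl}] {measured[syl]:.2f} Hz -> {f0:.2f} Hz")
```

With `k = 0.5`, each syllable lands at the geometric mean of its original F0 and the reference, so the semitone gap between [ba] and [pu] is halved while [pu] stays higher than [ba].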
Alternatively, does anyone know of an article or other resource that discusses inherent pitch differences in monotonous speech (i.e. a resource that discusses values of F0 in high vs. low vowels in speech that is perceived by listeners as monotonous)?
Thank you all very much for your time.
N.B. The vowels in my tokens also differ in loudness and duration; however, these differences are much smaller than the pitch differences and do not add any perceived rhythm to the speech stream.