A model for determining consonantal features
We are developing a model of the process whereby human listeners
extract word sequences from running speech. The model assumes
that words are represented in memory in terms of sequences
of segments each of which is specified as a bundle of distinctive
features. The model aims to develop principles for variation
in the realization of these word sequences that account for
intra- and interspeaker variation as well as language-specific variation.
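The representation described above (words as sequences of segments, each segment a bundle of distinctive features) can be illustrated with a minimal sketch. The class name, helper functions, and the small feature set chosen for /b/ and /m/ are assumptions made for illustration only; they are not the project's actual feature inventory or implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A segment specified as a bundle of binary distinctive features."""
    symbol: str
    features: frozenset  # set of (feature_name, value) pairs


def make_segment(symbol, **features):
    # Hypothetical helper: build a segment from keyword feature values.
    return Segment(symbol, frozenset(features.items()))


# Illustrative (assumed) feature bundles for two labial consonants.
B = make_segment("b", consonantal=True, sonorant=False, nasal=False,
                 voiced=True, labial=True)
M = make_segment("m", consonantal=True, sonorant=True, nasal=True,
                 voiced=True, labial=True)


def differing_features(a, b):
    """Return the sorted names of features whose values differ."""
    fa, fb = dict(a.features), dict(b.features)
    return sorted(name for name in fa.keys() | fb.keys()
                  if fa.get(name) != fb.get(name))


# A word is then a sequence of such segments (only the onset is shown).
print(differing_features(B, M))  # ['nasal', 'sonorant']
```

In this sketch, /b/ and /m/ differ only in the features nasal and sonorant, which is the kind of contrast a listener would need to recover from the signal.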
Constraints and strategies in speech production
The primary aim of this project is to contribute to our understanding
of the neural mechanisms underlying speech production through
the coordinated use of neuro-computational modeling and a
variety of experimental techniques that we have developed
in our laboratory. To this end, we have derived a set of experimental
hypotheses based on a modeling framework that accounts for
a wide range of observations on speech production, including
speech sound acquisition, articulatory variability, motor
equivalence, coarticulation, rate effects and the influence
of auditory perception on speech production. To test these
hypotheses, we carry out experiments with speakers in which
we measure articulatory movements, speech acoustics, auditory
perception, and brain activation. In these experiments, we
manipulate speech condition, phonemic context and speech sound
class, and we introduce transient and sustained perturbations.
We also perform simulation experiments, in which we adapt
a biomechanical model of the vocal tract to the morphologies
of individual speakers. The research tests model-based hypotheses
about three inter-related issues: the nature of phonemic and
syllabic goals for articulatory movements, the role of feedback
and feedforward mechanisms in the control of those movements,
and movement trajectory planning in the concatenation of phonemes
and larger units.
Effects of hearing status on adult speech production
In the Hearing Status project, we measure changes in speech
production that occur in response to changes in hearing. These
studies are focused on three populations of adults. The first
consists of participants with normal hearing. The second consists
of those who learned to speak, then lost hearing and finally regained
some hearing with a cochlear implant. The third consists of patients
with bilateral acoustic neuromas, a few of whom lost their
hearing during the course of the project. (This work was funded
by a separate NIH grant from 1991 to 1995.) As stated above,
we hypothesize that the internal model controlling speech
production is acquired in childhood with the use of auditory
feedback. In adulthood, it is used to program speech movements
essentially "open loop" - that is, without the speaker
being influenced by the sound of his or her own voice. However,
such auditory feedback does come into play to make adjustments
in the internal model that are necessitated by changes such
as growth of the vocal tract or receiving dentures. Our studies
have shown that speech intelligibility remains remarkably
intact in postlingually deafened adults, even decades after
hearing loss - consistent with open-loop control. On the other
hand, we have also shown that speech does deteriorate somewhat
with hearing loss, and acquisition of some hearing from a
cochlear implant usually leads to some normalization of speech
parameters, including measures of speech respiration, vowel
and consonant spectra and voice onset time. These improvements
are most observable in those cochlear implant users who show
improvements over time in speech perception.
Phonetic modification of function words
The goal of this project is to compare the contextual modifications
that have been observed to affect content words (e.g. nouns,
verbs, adjectives and some adverbs) vs. function words (classes
of words such as conjunctions, pronouns and prepositions,
which often signal the function of a sentence’s content
words). It is commonly observed that these two types of words
undergo different types of phonetic modification in running
speech; monosyllabic function words in particular are subject
to severe reduction processes which do not seem to occur in
monosyllabic content words, such as apparent loss of a final
single consonant (cuppa tea), an onset consonant (give ’em
a break) or an onset consonant plus nuclear vowel (he’s
done it). Extreme modifications can affect both words at a
boundary between two monosyllabic function words (gonna, wanna);
these examples also illustrate the reduction of the nuclear
vowel to a schwa, and the fact that such pronunciation variants
have been enshrined in the orthography in a way that is not
common for content word variation. We use careful acoustic
analysis of the type and degree of modification, combined
with syntactic and prosodic labels, to distinguish among three
possible accounts of these differences: 1) the Grammatical
Categories hypothesis, which posits that a different set of
modification mechanisms operate on function words vs. content
words, 2) the Prosody hypothesis, which posits that the same
set of modification mechanisms applies to both sets of words,
but affects function words more severely because they often
occur in prosodically weak contexts, and 3) the Frequency
hypothesis, which posits that both sets of words are modified
by the same mechanisms, but function words are more strongly
modified because they occur with such high frequency. We have
begun by assembling a quasi-complete list of the function
words of American English, and are currently quantifying,
for a spontaneously spoken speech corpus that has been prosodically
labelled, the pattern of loss or change in the acoustic landmarks
which cue distinctive feature contrasts among words. Comparing
the patterns of acoustic landmark modification in function
words vs. content words in different prosodic structures will
allow us to quantify and characterize the nature of the differences,
and to ask whether monosyllabic content words undergo some
of the same extreme phonetic variations observed in function
words when they occur in similarly weak prosodic locations,
and/or when they occur with very high frequency.
Physiological and acoustic studies of speech
We are studying the relation between three levels in the speech
chain: the discrete phonological representation of an utterance,
the acoustic pattern that results from the utterance, and
the articulatory gestures that create the link between the
phonological and acoustic representations. One area of research
is Quantal Theory, which states that the distinctive features
or contrasts that form the basis of the phonological representation
appear to be grounded, at least in part, in the physics of
human sound production and in properties of the response of
the auditory system to these sounds. Current projects focus
on the role of the compliant vocal-tract walls in shaping
the sound pattern for obstruent consonants, the nature of
the rapid spectrum change at the release of a nasal consonant,
and the role of the subglottal system in defining some basic
place distinctions for vowels and vowel-like sounds. A second
area of research is the variability that occurs in the process
of formulating an array of articulatory gestures from the
phonological representation of an utterance. This variability
arises in part from the introduction of context-dependent
enhancement gestures and from the overlap of gestures from
adjacent segments. Based on acoustic analysis, we attempt
to formulate principles that govern or constrain enhancement
gestures and gestural overlap. A third area of research is
developmental speech. In the early years of life, a child
is exposed to the variable acoustic pattern of speech and
to her own vocalizations, and from this and other experience
with the environment must uncover the units of the phonological
representation of language. Through acoustic analysis of utterances
produced by children in the age range 3-7 years we attempt
to provide a quantitative description and interpretation of
this development based on a set of acoustic measures that
describe a child’s emerging ability to produce patterns
of gestures for vowels, consonants, and prosodic units that
are derived from a phonological representation.
Segmental and prosodic aspects of speech planning
Models of speech production planning have to deal with many
different aspects of the sound structure of spoken utterances,
including how the speaker retrieves the sounds of the intended
words from their long-term store in the mental lexicon, organizes
the words and sounds into appropriate intonational and rhythmic
structures, and determines the articulatory movements that
are required to produce the sounds in a fluent, coordinated
and natural-sounding way. We know that these characteristics
of an utterance require a planning process, because a sequence
of words in a given sentence structure does not specify them; instead,
any such sequence can be uttered in many different ways. In
this project we study several aspects of the utterance planning
process. First, we study the serial ordering process, which
(somewhat surprisingly) is required to re-order the sounds
of words into their correct locations for each new utterance,
as suggested by sound-level serial ordering errors such as
"buddy moots" for "muddy boots". Second, we study the generation
of intonational contours and the alignment of these contours
with the words of the utterance. Finally, we study the generation
of hand and head gestures that accompany the speech. Studies
of sound-level serial ordering errors have shown that, in
American English at least, syllables are not commonly observed
as error units, while larger elements (such as morphemes)
and smaller elements (such as syllable onsets and rimes, or
individual segments) are; intensive experimentation is currently
directed at the role of individual articulatory gestures
in these errors. Studies of the effect of prosodic structure
on systematic phonetic variation have shown that phrase-onset
vowels and pitch-accented word-onset vowels are significantly
more likely to begin with non-modal phonation than are phrase-medial
and unaccented-word vowels, and that higher-level intonational
phrases show this behavior more than lower-level intonational
phrases, despite a striking degree of variation among individual
speakers. Studies of the alignment of gestures with spoken
prosody have shown that gestures with sudden sharp stops (termed
‘hits’) are aligned with pitch-accented (i.e.
intonationally prominent) syllables. Moreover, hand hits align
with accented syllables more accurately than head hits, which
(perhaps because of the greater inertia of the head) tend
to align with the syllable just after the accented syllable.
Such findings provide evidence for the role of prosody in
the speech production planning process, help to distinguish
among competing models of the human speech production planning
process, and move us closer to the goal of synthesizing natural-sounding
speech from text.
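The sound-level exchange error cited above (muddy boots produced as buddy moots) can be illustrated with a small sketch that swaps the onsets of two words. This is only an orthographic approximation under stated assumptions: the function names are hypothetical, onsets are taken to be the letters before the first vowel letter, and nothing here represents the planning models studied in the project.

```python
VOWEL_LETTERS = set("aeiou")


def split_onset(word):
    """Split a word into (onset, remainder) at the first vowel letter.

    Assumption: spelling stands in for sound, so the onset is simply
    the run of letters before the first of a, e, i, o, u.
    """
    for i, ch in enumerate(word):
        if ch in VOWEL_LETTERS:
            return word[:i], word[i:]
    return word, ""


def exchange_onsets(first, second):
    """Simulate an onset exchange error between two words."""
    onset1, rest1 = split_onset(first)
    onset2, rest2 = split_onset(second)
    return onset2 + rest1, onset1 + rest2


print(exchange_onsets("muddy", "boots"))  # ('buddy', 'moots')
```

The unit being moved here is the syllable onset rather than the whole syllable, in line with the observation above that onsets and rimes, but not syllables, commonly appear as error units.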