Speech Communication Group

Research

A model for determining consonantal features
We are developing a model of the process whereby human listeners extract word sequences from running speech. The model assumes that words are represented in memory as sequences of segments, each of which is specified as a bundle of distinctive features. The model aims to develop principles for variation in the realization of these word sequences that account for intra- and interspeaker variation as well as language-specific variation.
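
As a rough illustration of the representation the model assumes, the sketch below stores a word as a sequence of segments, each a bundle of distinctive features. The class and function names and the feature values are hypothetical placeholders chosen for illustration, not the group's actual feature inventory or implementation.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple


@dataclass(frozen=True)
class Segment:
    """A segment as a bundle of binary distinctive features (hypothetical layout)."""
    label: str
    features: Tuple[Tuple[str, bool], ...]  # (feature name, value) pairs

    def feature_dict(self) -> Dict[str, bool]:
        return dict(self.features)


def make_segment(label: str, **features: bool) -> Segment:
    # Sort the pairs so that equal bundles compare equal regardless of argument order.
    return Segment(label, tuple(sorted(features.items())))


def differing_features(a: Segment, b: Segment) -> Set[str]:
    """Return the names of features whose values differ between two segments."""
    fa, fb = a.feature_dict(), b.feature_dict()
    return {name for name in fa.keys() | fb.keys() if fa.get(name) != fb.get(name)}


# Illustrative feature values only; a real inventory would be much richer.
M = make_segment("m", consonantal=True, nasal=True, labial=True)
B = make_segment("b", consonantal=True, nasal=False, labial=True)
A = make_segment("a", consonantal=False, nasal=False, labial=False)

# A word stored in memory as a sequence of segments.
WORD_MAB = (M, A, B)

print(differing_features(M, B))
```

Under these assumed values, the two consonants differ only in the nasal feature, which is the kind of contrast a listener would need to recover from the signal.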

Constraints and strategies in speech production
The primary aim of this project is to contribute to our understanding of the neural mechanisms underlying speech production through the coordinated use of neuro-computational modeling and a variety of experimental techniques that we have developed in our laboratory. To this end, we have derived a set of experimental hypotheses based on a modeling framework that accounts for a wide range of observations on speech production, including speech sound acquisition, articulatory variability, motor equivalence, coarticulation, rate effects and the influence of auditory perception on speech production. To test these hypotheses, we carry out experiments with speakers in which we measure articulatory movements, speech acoustics, auditory perception, and brain activation. In these experiments, we manipulate speech condition, phonemic context and speech sound class, and we introduce transient and sustained perturbations. We also perform simulation experiments, in which we adapt a biomechanical model of the vocal tract to the morphologies of individual speakers. The research tests model-based hypotheses about three inter-related issues: the nature of phonemic and syllabic goals for articulatory movements, the role of feedback and feedforward mechanisms in the control of those movements, and movement trajectory planning in the concatenation of phonemes and larger units.

Effects of hearing status on adult speech production
In the Hearing Status project, we measure changes in speech production that occur in response to changes in hearing. These studies focus on three populations of adults. The first are participants with normal hearing. The second are those who learned to speak, then lost their hearing, and finally regained some hearing with a cochlear implant. The third are patients with bilateral acoustic neuromas, a few of whom lost their hearing during the course of the project. (This work was funded by a separate NIH grant from 1991 to 1995.) We hypothesize that the internal model controlling speech production is acquired in childhood with the use of auditory feedback. In adulthood, it is used to program speech movements essentially "open loop", that is, without the speaker being influenced by the sound of his or her own voice. However, such auditory feedback does come into play to make adjustments in the internal model that are necessitated by changes such as growth of the vocal tract or receiving dentures. Our studies have shown that speech intelligibility remains remarkably intact in postlingually deafened adults, even decades after hearing loss, which is consistent with open-loop control. On the other hand, we have also shown that speech does deteriorate somewhat with hearing loss, and acquisition of some hearing from a cochlear implant usually leads to some normalization of speech parameters, including measures of speech respiration, vowel and consonant spectra and voicing onset time. These improvements are most observable in those cochlear implant users who show improvements over time in speech perception.

Phonetic modification of function words
The goal of this project is to compare the contextual modifications that have been observed to affect content words (e.g. nouns, verbs, adjectives and some adverbs) vs. function words (classes of words such as conjunctions, pronouns and prepositions, which often signal the function of a sentence’s content words). It is commonly observed that these two types of words undergo different types of phonetic modification in running speech; monosyllabic function words in particular are subject to severe reduction processes which do not seem to occur in monosyllabic content words, such as apparent loss of a final single consonant (cuppa tea), an onset consonant (give’em a break) or an onset consonant plus nuclear vowel (he’s done it). Extreme modifications can affect both words at a boundary between two monosyllabic function words (gonna, wanna); these examples also illustrate the reduction of the nuclear vowel to a schwa, and the fact that such pronunciation variants have been enshrined in the orthography in a way that is not common for content word variation. We use careful acoustic analysis of the type and degree of modification, combined with syntactic and prosodic labels, to distinguish among three possible accounts of these differences: 1) the Grammatical Categories hypothesis, which posits that a different set of modification mechanisms operates on function words vs. content words, 2) the Prosody hypothesis, which posits that the same set of modification mechanisms applies to both sets of words, but affects function words more severely because they often occur in prosodically weak contexts, and 3) the Frequency hypothesis, which posits that both sets of words are modified by the same mechanisms, but function words are more strongly modified because they occur with such high frequency.
We have begun by assembling a quasi-complete list of the function words of American English, and are currently quantifying, for a prosodically labelled corpus of spontaneous speech, the pattern of loss or change in the acoustic landmarks which cue distinctive feature contrasts among words. Comparing the patterns of acoustic landmark modification in function words vs. content words in different prosodic structures will allow us to quantify and characterize the nature of the differences, and to ask whether monosyllabic content words undergo some of the same extreme phonetic variations observed in function words when they occur in similarly weak prosodic locations, and/or when they occur with very high frequency.

Physiological and acoustic studies of speech
We are studying the relation between three levels in the speech chain: the discrete phonological representation of an utterance, the acoustic pattern that results from the utterance, and the articulatory gestures that create the link between the phonological and acoustic representations. One area of research is Quantal Theory, which states that the distinctive features or contrasts that form the basis of the phonological representation appear to be grounded, at least in part, in the physics of human sound production and in properties of the response of the auditory system to these sounds. Current projects focus on the role of the compliant vocal-tract walls in shaping the sound pattern for obstruent consonants, the nature of the rapid spectrum change at the release of a nasal consonant, and the role of the subglottal system in defining some basic place distinctions for vowels and vowel-like sounds. A second area of research is the variability that occurs in the process of formulating an array of articulatory gestures from the phonological representation of an utterance. This variability arises in part from the introduction of context-dependent enhancement gestures and from the overlap of gestures from adjacent segments. Based on acoustic analysis, we attempt to formulate principles that govern or constrain enhancement gestures and gestural overlap. A third area of research is developmental speech. In the early years of life, a child is exposed to the variable acoustic pattern of speech and to her own vocalizations, and from this and other experience with the environment must uncover the units of the phonological representation of language.
Through acoustic analysis of utterances produced by children aged 3 to 7 years, we attempt to provide a quantitative description and interpretation of this development, based on a set of acoustic measures that describe a child’s emerging ability to produce patterns of gestures for vowels, consonants, and prosodic units that are derived from a phonological representation.

Segmental and prosodic aspects of speech planning
Models of speech production planning have to deal with many different aspects of the sound structure of spoken utterances, including how the speaker retrieves the sounds of the intended words from their long-term store in the mental lexicon, organizes the words and sounds into appropriate intonational and rhythmic structures, and determines the articulatory movements that are required to produce the sounds in a fluent, coordinated and natural-sounding way. We know that these characteristics of an utterance require a planning process, because a sequence of words in a given sentence structure does not specify them; instead, any such sequence can be uttered in many different ways. In this project we study several aspects of the utterance planning process. First, we study the serial ordering process, which (somewhat surprisingly) is required to re-order the sounds of words into their correct locations for each new utterance, as suggested by sound-level serial ordering errors such as "buddy moots" for "muddy boots". Second, we study the generation of intonational contours and the alignment of these contours with the words of the utterance. Finally, we study the generation of hand and head gestures that accompany the speech. Studies of sound-level serial ordering errors have shown that, in American English at least, syllables are not commonly observed as error units, while larger elements (such as morphemes) and smaller elements (such as syllable onsets and rimes, or individual segments) are; intensive experimentation is currently directed at the role of individual articulatory gestures in these errors.
Studies of the effect of prosodic structure on systematic phonetic variation have shown that phrase-onset vowels and pitch-accented word-onset vowels are significantly more likely to begin with non-modal phonation than are phrase-medial and unaccented-word vowels, and that higher-level intonational phrases show this behavior more than lower-level intonational phrases, despite a striking degree of variation among individual speakers. Studies of the alignment of gestures with spoken prosody have shown that gestures with sudden sharp stops (termed ‘hits’) are aligned with pitch-accented (i.e. intonationally prominent) syllables. Moreover, hand hits align with accented syllables more accurately than head hits, which (perhaps because of the greater inertia of the head) tend to align with the syllable just after the accented syllable. Such findings provide evidence for the role of prosody in the speech production planning process, help to distinguish among competing models of the human speech production planning process, and move us closer to the goal of synthesizing natural-sounding speech from text.

© Massachusetts Institute of Technology