Kenneth Stevens

Clarence J. LeBel Professor Emeritus of Electrical Engineering and Computer Science at MIT

RLE was saddened to hear the news of Ken Stevens's passing. Professor Stevens was an important part of the RLE community, contributing many works to his field of research and inspiring students, collaborators, and colleagues alike. He will be greatly missed.

The following is an excerpt from a 1987 article for RLE Currents:

Professor Ken Stevens, a Toronto native, came to MIT in 1948 as a Teaching Assistant in the Electrical Engineering Department after receiving his Master's in Engineering Physics at the University of Toronto. Since joining the RLE faculty in 1958, he has been central to the development of speech communication research at the laboratory.

What was the focus of your graduate research at MIT?
At first, I worked in the MIT Acoustics Lab. The speech work in the Acoustics Lab started in 1948 with Leo Beranek, who had an Air Force contract to study problems related to the intelligibility of processed speech. He worked on that project with some students over the years, and I became involved in that work as a graduate student in 1951.

At that time (about 1949 or 1950), Gunnar Fant visited MIT to study the acoustics of speech production. I became interested in the perception side of speech and worked with Leo Beranek and J.C.R. Licklider on the perception of speech-like sounds. I wrote my doctoral thesis on the perception of sounds that had speech-like characteristics. Beranek’s work, combined with Gunnar Fant's studies on the acoustics of speech production and my research on the perception of speech-like sounds, and some additional work on the intelligibility of speech, formed the beginning of speech work at the Acoustics Lab.

Did you have a mentor?
I would say that it was Leo Beranek, who was one of the directors of the Acoustics Lab. He taught courses in acoustics, and one of his interests was speech. When I first came to MIT, I hadn't thought about going into acoustics, but Beranek needed teaching assistants in his acoustics course. My background was in engineering physics, not so much in acoustics.

Why did you choose to teach?
I really liked doing research with the graduate students here at MIT and so the teaching fit in with that. It was a good place to do research, and so I did some teaching.

What was the nature of your research as a Guggenheim Fellow from 1962-1963?
I worked in Gunnar Fant’s laboratory at the Royal Institute of Technology in Stockholm. One of the things that I studied was speech movements with cineradiographic (x-ray) motion pictures. Recently, we haven't collaborated, but he did visit here in 1982, and we do keep in touch with each other.

In the early days of RLE's speech communication research, what was the focus of its investigations?
Some of the work in speech at the Acoustics Lab became part of RLE. The Acoustics Lab had disbanded, and I remember talking to Professor Wiesner at the time about the possibility of this small group of researchers working in speech coming under the umbrella of RLE. He thought that it was in line with the other communications work that was already going on at RLE. There was already some speech work being conducted at RLE, and a group of people met regularly to talk about the problems of speech. One of the individuals in this group was the Director of the Modern Languages Department, William Locke. Bob Fano was also part of this group.

If you look back at early RLE reports, you might find a section on linguistics with Noam Chomsky, and Morris Halle was there too. We've always had interaction with Morris Halle, and I guess our work could be characterized as trying to find or quantify more closely the relations between the acoustic and articulatory events in speech and the linguistic descriptions that underlie speech events. Morris Halle had a strong influence on the early directions of the speech group, although his interests centered on the phonological aspects of speech. Morris Halle has always had a strong influence on my own thinking, as has Gunnar Fant.

Even in those early days, we were interested in speech synthesis. So, apart from understanding the fundamental aspects of speech production and perception (which we are still doing), the application of speech synthesis was an early activity, even when Gunnar Fant visited in the early 1950s. That developed even further with Jonathan Allen’s and Dennis Klatt’s interests in speech synthesis. Allen and Klatt, together with the RLE students, brought the speech synthesis work to a culmination with some practical results. Then, within the last five years, there has been an increasing interest in speech recognition and the application of speech to computers. So, this brings to bear much of the basic information that has accumulated in various places over the years to the practical problem of speech recognition.

As people here work on the problems of synthesis and recognition, we realize that there are still some basic aspects of speech production and perception that we still don't understand. An example is the recent work of Dennis Klatt. He found that although he could get reasonable naturalness in the synthesis of male voices, it was a problem to achieve good naturalness for female voices. So, it was necessary to go back and study in greater detail the properties of sounds that are generated by females. Then, that basic information could be used to improve the synthesis of female voices. Similar things have happened in speech recognition.

Also, as the speech recognition work continues, we realize that we must rely heavily on what the linguists are able to come up with: phonological representations of speech that bring to light, in a natural way, some of the modifications that occur when we speak in conversation, what happens when you put speech into context, and other kinds of changes that the sounds undergo in a natural setting.

How would you characterize your research in the acoustical aspects of speech production in contrast to other RLE research groups (auditory physiology, sensory communication, and digital signal processing)? What is the nature of your interaction with these different groups?
We began to look at how sounds were generated in the vocal tract and the actual acoustic mechanisms of sound production, and in fact, we are still continuing that work. We are interested in the link between what happens in the sounds and the underlying linguistic descriptions in terms of phonemes and features. Our goal has been to join the understanding of the sound and the linguistic description. One of the big influences over the years in this area has been the people in linguistics, particularly Morris Halle and Jay Keyser.

In relation to auditory physiology, we are interested in the stages in processing of the sound, leading ultimately to a linguistic description. One of the stages through which sounds must pass is the ear, obviously. The shaping of sounds in the auditory periphery could form an initial step in the chain of processes that produce a description in categorical terms. Our concern with auditory physiology is to keep in touch with what the investigators are doing, and, where possible, to incorporate their research into our models.

In terms of digital signal processing, the speech signal has to be processed initially by digital methods. In fact, when Alan Oppenheim started on the faculty, he was in the speech communications group. Then, he branched out into digital signal processing, and it became an important field in its own right.

How would you characterize the diverse background of investigators who are attracted to the field of acoustic phonetics?
Many linguists are not concerned with the actual details of sound. Phonologists think of speech as being a sequence of sounds, and do not go beyond this characterization. They address the different kinds of regularities and constraints on patterns of sound; how a language is described in terms of constraints on the sequences of speech sounds that are allowed; and, how these sequences are changed when you place the words into context.

But, more recently, there is a group of phonologists who are becoming interested in phonetics. They are trying to explain some of these phonological regularities in terms of constraints on either the listener or the speaker, and the constraints of the actual mechanics of how these sounds are generated.

For example, certain sounds influence others. A classic example is "did" followed by "you" becomes "didju." Phonologists would simply say that there's a rule that says /d/ plus /y/ will change to /j/. Now, people are trying to explain these changes in terms of the mechanics of the ear and the vocal tract. So, there has been a coming together of people who work in the speech area and those individuals who work in that part of linguistics.

Your ongoing research involves acoustic variability and invariance in speech production. Can you explain the nature of this investigation?
When different people say a particular sound, or when one individual says the same sound in different words or sentences, it appears as though the sound undergoes a lot of change from one person to another, and from one context to another. We are interested in exploring what is common between all those productions of the sound. In spite of the variability, there are some attributes that remain invariant. That’s what we pick up on when we listen to each other. It doesn't matter who says the sound, it doesn't matter what word the sound appears in, we still hear the same sound.

Our approach is to categorize these sounds by certain properties or features, and to discover what those properties are. We believe there is an inventory of properties or features that is an integral part of the human speech production and processing system. Different combinations of properties are used in different languages, but there is a fixed inventory of properties.

Can you describe the research that you and Dennis Klatt have conducted on vocal tract modeling?
There are two sides to vocal tract modeling. One question that we are trying to answer is: by developing complex models of the vocal tract itself (including the properties of the vocal tract walls and properties of the vocal cords), can we further understand the mechanisms of the generation of individual speech sounds?

Then, there is the broader aspect of speech modeling (you might call it speech synthesis). How can we build a device that will take the printed words as an input, and put the words into speech? Not only do you have to know how to produce the individual sounds, but you also have to put these sounds together with the right sense of timing and intonation. That’s a problem that Jonathan Allen and Dennis Klatt have worked on for the last twenty years with some success.

Does your research also include the study of speaker verification and recognition?
It automatically comes out of some of our work. If you're looking for the invariants, you're also studying variability when you examine how one speaker differs from another. Over the years, I've had one or two thesis students in this area, but I haven't delved into it very much. This whole business of speaker verification using spectrograms, or by some other method, is a difficult area, and I’m not certain these methods will lead to reliable identification of speakers.

Data collection in the field of speech processing is a tremendously labor-intensive and time-consuming task. What are some of the scientific tools that help you to collect and analyze this large body of data?
With the capability to store large amounts of data in computers, it has been possible to record a database with large numbers of talkers and lots of sentences, and then label all of the sounds in that database. As a result, it is possible to access that database, request a specific sound, and perform some statistical analysis of the properties of that particular sound. Victor Zue and his group have assembled a large database for that purpose.

SPIRE is a basic tool that enables us to look at individual speech sounds in many ways - spectra and spectrograms, for example. The SEARCH program is an extension of SPIRE. It allows us to search a large database and plot distributions of different acoustic properties for speech sounds in different phonetic contexts.

Does your research involve speech aids for the handicapped?
I have worked on speech training aids for the handicapped, especially for children who must learn to speak, but cannot hear. One approach is to provide them with some type of feedback of their speech patterns by abstracting and displaying information from the spectrogram so that they can see when they speak properly. In my consultancy with BBN, we didn't use spectrograms because the technology wasn't available to generate them fast enough at the time. So, we displayed simpler patterns, like the pitch and timing of speech.

What is the nature of your consultancy at Bolt, Beranek, and Newman?
My more recent work with BBN was to develop methods for measuring people's hearing at very high frequencies, far beyond what is needed for speech. It is important to be able to do this because there are some invasive things that influence hearing. High-intensity noise or certain drugs, like aspirin, can influence some people's hearing if large doses are taken. In some cases, the effect appears first at the very high frequencies and then gradually spreads down into the lower frequencies. So, it is important to be able to measure those effects on hearing at very high frequencies.

Are you excited about a current project that you're working on?
In the past, we have tried to examine speech sounds and their properties as they occur in simple utterances (consonants, vowels, syllables, etc.). We are now interested in moving toward more natural types of speech, looking at similar properties to understand this whole process of how sounds become modified within natural speech. That's the thrust now, both in recognition and synthesis.

I'm enthusiastic about "rounding off" our previous work. We've learned a lot about individual speech sounds and how they are produced and perceived. There are still many loose ends to pull together before we move on to the next stage. At this moment, I'm interested in pulling together those loose ends and putting them in a book. Then, I would like to move on to the study of speech in a conversational context.

How do you measure success in terms of testing and developing your ideas?
One measure of success is whether the applications in speech synthesis or speech recognition can actually work and be used by people. In the case of synthesis, there has been some reasonable success. In speech recognition, perhaps not so much. Another measure of success is that you understand the concept of how this whole speech process works, and you fill in the gaps of your knowledge of the process, gradually piecing together this jigsaw puzzle. Whether or not it leads to an application is not the point, but rather whether all pieces of the puzzle fit together.

So, you could say that one measure of success is whether all of these different pieces of information, whether they be from speech physiology, speech acoustics, speech perception, or phonology, fit together into a coherent picture. Obviously, we are still trying to build that picture, and I believe that it’s beginning to fit together. To some extent, we are happy about that, and to some extent we are frustrated because there’s still so much to learn.

What has been the most challenging project that you've worked on?
One of the most challenging things is to try to uncover the basic invariant properties from the speech signal, in spite of all of its variability. Particularly for some sounds, it’s been a real challenge. For example, what distinguishes a /p/ from a /t/ from a /k/? It’s the kind of question that we still don’t have a good answer for.

During your professional career, what do you consider to be the major breakthroughs or milestones that have significantly contributed to or changed the field of acoustic phonetic research?
There is no question that the ability to use the computer to look at data conveniently and quickly, and to perform signal processing, is a major breakthrough. The computers give us access to larger databases, and allow us to test hypotheses with a much faster turnaround time. The disadvantage is that it is too easy to test ideas, and we don't spend enough time thinking about them before they are implemented, because they are so easy to implement.

More broadly, I would say that Gunnar Fant's work on acoustics and the insights of Roman Jakobson into the linguistic description of sounds have represented major milestones. In the past decades, researchers have been trying to build on these ideas.

What do you see as the direction of future research in acoustic phonetics, or speech processing in general?
In the next decade or so, we will have to understand more about these phonological/phonetic changes that occur when we speak in conversational speech. We are getting to the point where we have exhausted the study of individual speech sounds or simple utterances. We now want to move into more conversational speech, where the sounds that we generate and the ones that we hear in normal conversation have been modified quite a bit. In other words, the listener perceives only a fragment of the original sounds, which occur rapidly; the listener picks up on only some of the sound because of redundancy, and because the listener might know something about the topic that is being discussed. It is this area that we will have to work on. Up until now, acoustics and signal processing people have been the major contributors to speech research. To proceed further, we have to involve people from other disciplines more than we have in the past.

What do you like most about RLE?
The thing I like most is the proximity of colleagues who are in fields related to mine, and who are among the very best in the world - people who really understand hearing, people who understand linguistics and acoustics - and to interact with those people and with such very good students. That’s what makes the place exciting.



9 comments

  1. Haruko Kawasaki Fukumori says:

    I was a postdoc at the Speech Communication Group from 1982 to 1984. After 30 years I visited the lab just last July. As Stefanie Shattuck-Hufnagel showed me around, I could picture in my mind the faces of my former colleagues, and Ken coming out of his corner office. I am so sad that Ken is no longer with us.

    During my postdoc years, I had an extraordinary opportunity to work with Ken and Samuel Jay Keyser of the Linguistics Department. We had regular meetings to discuss what would develop into Ken and Jay’s theory on redundancy/enhancement features in speech. Our discussion was serious but relaxed, and the warm rapport between these two professors was incredible. What struck me most about Ken at these meetings was his being an exceptionally good listener.

    Ken was also a great person outside the lab. He loved Baroque music and played the harpsichord. There were “music evenings” when many of us brought our own instruments and sight-read music together. When Martha Danly wanted to test her new ice cream maker, we all played croquet in her yard while the machine was churning. Ken and I paired up in this event and won. Ken was at many such gatherings, and he seemed to genuinely enjoy them.

    Ken was so young at heart that he made us feel as if he were one of us. We would see him riding his bicycle everywhere, whether to his office or to Bread and Circus. One Christmas Eve, Sarah Hawkins and I heard the King’s College Choir on the radio, thought Ken would love it, and called him; he was already listening. One year Ken received one professional award after another in a brief time period. Shortly afterward I asked him whether he had received any more lately. “Well, this past month was rather dry…” was his reply, followed by a mischievous chuckle. Ken was someone you could have such conversations with. He was so approachable, friendly, and witty.

    I feel very fortunate to have known Ken. He was a great scholar and supervisor. Above all, he was a wonderful person.

    Haruko Kawasaki Fukumori

  2. Hwa-Ping Chang says:

    I would like to express my deep appreciation for Ken’s support and instruction during my studies at MIT from 1990 to 1995. He taught me not only the knowledge of research and science but also an attitude toward life and work in my career. I will always remember his smile and his patience in listening to my thoughts and plans, even when he might not agree with me. He is a real scholar we should respect. I am glad and lucky to have been his student. Without his support and encouragement, I would not have been able to obtain my degree at MIT or to develop further in my personal career.
    Ken, we will remember you all the time. Thank you very much for your instruction and support. You will always be a scholar in my mind and a great teacher in my career.
    HP Chang

  3. Tamás Bohm says:

    It was very hard to hear the news that Ken passed away. As I saw it, he was the absolute hub of the speech field, and we just assumed he would always be there. And actually, he was: I do not remember any occasion when he was not ready to discuss questions and help out others. I learned a whole lot from him during my year at MIT, not just about acoustic phonetics but also about how to be a great professor and a great person.

    Unfortunately, I cannot attend the memorial tomorrow. Instead, let me briefly share with you my first encounter with Ken. On my very first morning at MIT, I had a meeting scheduled with him, as he was my advisor during my year there as a Fulbright scholar. I was very excited, and to be honest, also quite nervous about meeting him; I had heard so much about him and read quite a few of his works, but I had never met such a famous professor before. I was making sure to arrive on time. So sure, in fact, that I ended up entering the Infinite Corridor more than half an hour before the appointment, so I decided to have coffee before going to Building 36. While waiting in line in the cafeteria, I noticed an elderly man ordering his decaf and, once told that he would have to wait a bit (after he had already stood in quite a long line), making a gentle remark to the lady at the counter that they still made the best coffee on campus. Later, when I showed up at the meeting, I realized that this gentleman was Ken, and, as I later experienced, this episode was quite representative of his personality.

  4. Osamu Fujimura says:

    I first met Ken at MIT/RLE, on my first trip outside Japan, I think in November 1958. After sitting up for three consecutive nights in propeller planes from Tokyo to Boston, I was picked up by Phil Lieberman at Logan Airport and taken directly to MIT for an interview with Morris Halle. The next day, Arthur House came to escort me from Morris’ office to meet Ken, whose office was also in the old wooden barracks of Bldg. 20. I knew all their names because I read all speech papers formally published worldwide at that time. Ken was eating a sandwich that he had brought from home. As everyone knows, he was a very kind, gentle man. Morris and Ken had jointly invited me to come from Tokyo, offering a monthly stipend, presumably based on what I had contributed to JASA as coauthor with Shiro Hattori, a noted linguist with whom I worked on an experiment on vowel nasalization. I did not have a graduate degree, simply a BS in physics from the University of Tokyo. This must have been a bold decision for Morris and Ken to make. I had also just published a one-page full paper on my novel speech synthesizer in JASA.

    Upon my arrival, Ken suggested that I work on a high-speed motion picture study of lip movement in plosive articulations, using Edgerton’s new 5-µsec exposure stroboscope. This set my course of experimental work in speech production. I sent in a paper based on this work to the Physics Department of the University of Tokyo and was awarded a doctoral degree, while also submitting the manuscript for formal publication in the Journal of Speech and Hearing Research. Ken kindly worked on my English exposition, along with Arthur and Morris; I learned a lot from this lavish experience. I remember Ken’s comment that he could revise further, but then the paper would have no trace of my personal character.

    This also was the point in time when Noam Chomsky published his monograph, Syntactic Structures. While Noam was away in NYC, Morris was teaching that subject at MIT and I sat in on his class. Hattori in Tokyo asked me to write a review of the book for the Journal of the Linguistic Society of Japan. Reading my draft, Morris said I could switch my course of study to syntactic theory if I wanted to, but I continued in speech research. If it had not been for Ken’s leadership, I might well have abandoned speech work at that point. It was clear to me that working for Ken at MIT on speech was not something I could afford to trade for anything else, given my serious interest in speech production as a student in experimental physics. This feeling was reinforced when Gunnar Fant visited Ken at MIT from Stockholm. Based on Ken’s recommendation, Gunnar invited me to KTH in Stockholm soon after I returned to Japan.

    Ken once invited the lab members home for a dinner party. Hiroya Fujisaki was there as a Fulbright student and Ken wanted us to make sukiyaki. We tried, but Hiroya and I could not agree on how to prepare it. After a long discussion, as in many matters, we ended up making sukiyaki in two different ways in parallel, to determine empirically which was better. The result, unfortunately, was to blow the fuses in the house, and it took us a lot of time to recover from complete darkness, with people’s hunger put aside. Ken was patient — he never complained…

  5. Patti Price says:

    I am sorry that I am not at the Ken fest today, but I am thinking of him here at home, and thinking of all of those able to make it to Cambridge and those who, like me, could not make the trip.

    I am sorry to see him gone and am thinking of his friends and family today with sympathy.

    But I am also thinking with gratitude of the gifts he gave the world. Those I benefited from most were his work in the acoustics of speech and speech production, and, more specifically, the interaction of linguistics with engineering in understanding speech perception and production. Perhaps his most important gift was fostering an encouraging environment. I remember once when I was bemoaning how much work it was dealing with a student assistant who was supposed to help on the project, but who needed a lot of assistance. He said, “Well, when you measure productivity and a student is involved, you have to count both products: the progress of the research and the progress of the student.”

    At a time when there were still very few female EE PhD candidates at MIT, Ken fostered an environment that attracted an extraordinary number of women students and postdocs. I am personally very grateful for the opportunity to do a postdoc under Ken at MIT. In terms of the knowledge from his classes, and those of others, as well as the interdisciplinary openness and collaboration, I couldn’t have been luckier. I hope we can all carry a little Ken with us. I miss him, and I miss you all.

  6. Pat Keating says:

    Everyone in the UCLA Phonetics Lab joins me in expressing our sadness at Ken’s death, and our appreciation of his achievements and contributions. His book on acoustic phonetics, his work on feature theory, his mentoring of phoneticians – so many linguists have benefited in these and other ways. Ken was even a visiting “member” of our group for a week in 1987-88, and his photo is still posted in the lab among those of all the other members that year. And of course we owe a debt to Ken for training Abeer Alwan, our UCLA colleague.

    On a personal note, I was a postdoc in the Speech group in 1979-1981, and I still benefit from what I learned then. What an incredible experience. I also remember playing music at Ken’s apartment – me on viol, Sarah Hawkins on flute, and Chris Shadle on piano – or sometimes, Ken and Chris each taking one hand of the keyboard part. I have no recollection of what music we played, but it was great fun.

    Our best wishes to Sharon and family, to Stef, and to everyone else who misses him.

  7. Björn Lindblom says:

    With deep gratitude we think of the example of excellence you set for us in the sixties – first at Gunnar Fant’s department at KTH in Stockholm, later in your own lab in building 20 at MIT. You will continue to inspire us as we now try to get used to your not being there for us anymore. Fond memories of you and your family will always relieve our sorrow.
    Björn & Ann Mari Lindblom
    Stockholm, Sweden

  8. Mario Svirsky says:

    Once I was showing Ken an electrodogram of a speech utterance. Electrodograms are the cochlear implant equivalent of spectrograms. In both cases the x-axis indicates time, and color (or grey-scale shading) is used to show the amount of energy within a bin. The difference is that in a spectrogram the y-axis indicates frequency, while in an electrodogram the y-axis indicates the position of the intracochlear electrode that is being stimulated. Because there is a monotonic relationship between frequency and stimulated electrode, patterns for the same acoustic input look similar in spectrograms and electrodograms. Even though Ken had not seen the latter type of display before, he was able to indicate how several acoustic cues were encoded in the electrode stimulation patterns. This was to be expected, of course, given his knowledge of acoustic phonetics and his ability to read a spectrogram. But I was still taken aback when, at the end of our little session, he took another look at the graph and said: “You didn’t make these recordings in a soundproof room, did you? This was just a regular room.” He was right.

  9. Sharlene Liu says:

    My time as a graduate student in the Speech Lab connotes happy times. In addition to being a kind and patient mentor, Ken shared with me a liking for sweets. Whenever we had a Speech Group gathering, we had ice cream, brownies, or some other delicious dessert. So I look back upon my graduate school years with sweet memories.

