Speech Recognition

Considerations for Use in Language Training

 

Available with DynEd's most popular courses, Speech Recognition enhances the effectiveness of the lessons. This highly motivating feature helps learners improve their oral fluency.

DynEd uses sophisticated Speech Recognition technologies: Apple Computer's PlainTalk and ScanSoft's ASR Speech Technology.

by Norman Harris
DynEd's Manager for Europe, the Middle East, and Africa

Speech Recognition (SR) technology has finally come of age, at least for language training purposes for young adults and adults. Computer programs that truly "understand" natural speech, the Holy Grail of artificial intelligence researchers, may be a decade or more away. Today's SR programs may be merely pattern-matching devices, still incapable of parsing real language or of achieving anything like "understanding." Nonetheless, they can now provide language students with realistic, highly effective, and motivating speech practice. In this article I shall try to provide a brief overview of the state of development of continuous-speech, speaker-independent SR programs, and of some of the ways that they have been adapted for use in language training.

Historically, Speech Recognition programs required mainframe computers and were very expensive, or, in cheaper PC versions, performed inadequately for serious language learning. Most early SR programs were limited to discrete speech (single words or short phrases carefully enunciated) and were usually speaker-dependent, requiring each user to train the program by reading a long list of specially selected words or phrases that familiarized the program with that speaker. These technologies were also limited by their dependence on a particular regional English accent, usually a fairly neutral American accent. These early programs allowed users to control their computers with simple oral commands, but for language training purposes they were not ideal. The essence of real language is not in discrete single words: language students need to practice complete phrases and sentences in realistic contexts. Moreover, programs which were trained to accept a speaker's individual pronunciation quirks were not well suited to helping students move toward more standard pronunciation. These technologies also failed if the speaker's voice changed due to common colds, laryngitis, and other throat ailments, rendering them useless until the speaker recovered or retrained the speech engine.

The solution to these problems came with the development of continuous-speech SR engines that were speaker-independent. These programs are able to deal with complete sentences spoken at a natural pace, not just isolated words. They require no special hardware, are small enough and fast enough to work on normal PCs, and, importantly for the typical language training environment, do not require a training period. They allow a variety of individual language learners working on the same computer to practice speaking English from the first moment they talk into the microphone.

As their name implies, these speaker-independent programs do not adjust over time to the pronunciation profile of a particular speaker, so they can be counted on always to provide a "neutral" evaluation of what a student says, according to the criteria built into their code. Such parameters can be flexible enough to accept a broad range of accents, including British, American, and non-native accents. Moreover, they can be configured to be relatively lenient toward the less-than-perfect pronunciation, slower speech, and even pauses and false starts of typical lower level language students. Such lenience is vital to the practical success of the programs with these students, whose willingness to practice speaking might otherwise be undermined. For higher level students, on the other hand, the programs can be "tuned" to require a higher level of fluency, such as fewer and shorter pauses, to provide such students with the challenge they need to progress.
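The idea of "tuning" lenience to the student's level can be illustrated with a small sketch. This is not DynEd's implementation or the API of any particular speech engine: the `LeniencyProfile` class, the `accept_utterance` function, and all threshold values are hypothetical. The sketch assumes an engine that reports a confidence score between 0 and 1 and the length of each pause in seconds.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class LeniencyProfile:
    """Hypothetical acceptance criteria for one proficiency level."""
    min_confidence: float     # lowest engine score that is still accepted
    max_pauses: int           # number of pauses tolerated in one utterance
    max_pause_seconds: float  # longest single pause tolerated


# Illustrative values only: lenient for beginners, stricter for advanced students.
BEGINNER = LeniencyProfile(min_confidence=0.50, max_pauses=4, max_pause_seconds=2.0)
ADVANCED = LeniencyProfile(min_confidence=0.75, max_pauses=1, max_pause_seconds=0.8)


def accept_utterance(confidence: float,
                     pauses: Sequence[float],
                     profile: LeniencyProfile) -> bool:
    """Return True if the utterance meets the profile's fluency criteria."""
    if confidence < profile.min_confidence:
        return False
    if len(pauses) > profile.max_pauses:
        return False
    # Every individual pause must be short enough for this level.
    return all(p <= profile.max_pause_seconds for p in pauses)


# The same hesitant utterance passes at one level and fails at the other.
hesitant = dict(confidence=0.6, pauses=[1.2, 0.5])
print(accept_utterance(profile=BEGINNER, **hesitant))  # True
print(accept_utterance(profile=ADVANCED, **hesitant))  # False
```

The profile itself never adapts to the speaker. Only the choice of profile changes, which keeps the evaluation "neutral" in the sense described above.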

There are trade-offs, of course. Such flexibility with regard to pronunciation paradigms means that today's speaker-independent SR programs are not ideal for direct pronunciation practice. Nonetheless, exercises which focus on fluency and word order, and in which a native speaker model is heard immediately after a student's utterance has been successfully recognized, have been shown to result indirectly in much improved pronunciation. Another trade-off is that the greater flexibility and leniency which allow these programs to "recognize" sentences spoken by students with a wide variety of accents also limit the accuracy of the programs, especially for similar-sounding words and phrases. Some errors may be accepted as correct. Native speakers testing the "understanding" of programs "tuned" to the needs of nonnative speakers may be bothered by this, but most teachers, after careful consideration of the different needs and psychologies of native speakers and learners, will accept the trade-off.

Students do not expect to be understood every time. If they are required occasionally to repeat a sentence which the program has not recognized or has misinterpreted, there may be some small frustration, but language students are much more likely to take this in their stride than native speakers would be. On the other hand, if the program does "understand" such students, however imperfect their pronunciation, they typically experience a huge sense of satisfaction, a feel-good factor native speakers simply cannot enjoy to anywhere near the same degree. The worst thing for a student is a program that is too demanding of perfection. Such programs will quickly lead to student frustration or to the kind of embarrassed, hesitant unwillingness to speak English typical of many classrooms.

Even if we accept that accuracy needs to be responsive to proficiency in order to encourage students to speak, we must, as teachers, be concerned that errors do not become reinforced. Higher levels of accuracy can also be expected if the task required is appropriate to the language level of the students, and if there is a language focus other than just speaking. Good SR-based courses use lesson types for which today's SR engines are optimized, i.e., lessons which focus on things like phrase discrimination, word order, key words, and/or syntax. Speech-enhanced exercises include answering and asking questions, fill-in-the-blank and sentence transformation grammar exercises, fluency reading, and branching dialogs and role plays, which can even be integrated with video sequences. Though many of these exercise types have existed in multimedia programs in the past, their transformation from mouse-click to SR exercises changes them radically.
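A phrase discrimination exercise can be sketched as choosing, from a short list of expected sentences, the one closest to what the engine heard. The sketch below is only an illustration under stated assumptions: it compares word sequences with Python's standard `difflib.SequenceMatcher`, whereas a real SR engine works on audio, and the `best_match` function and the threshold values are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Optional, Sequence


def similarity(heard: str, expected: str) -> float:
    """Word-level similarity between two sentences, from 0.0 to 1.0."""
    return SequenceMatcher(None, heard.lower().split(),
                           expected.lower().split()).ratio()


def best_match(heard: str,
               candidates: Sequence[str],
               threshold: float) -> Optional[str]:
    """Return the closest expected sentence, or None if none is close enough."""
    best = max(candidates, key=lambda c: similarity(heard, c))
    return best if similarity(heard, best) >= threshold else None


# The sentences the exercise is prepared to accept (illustrative).
choices = [
    "where does she work",
    "where did she work",
    "when does she work",
]

# A lenient threshold accepts a sentence with a small grammatical slip...
print(best_match("where does she works", choices, threshold=0.6))
# ...while a strict threshold asks the student to try again (prints None).
print(best_match("where does she works", choices, threshold=0.9))
```

Because the exercise only has to discriminate among a few known sentences, the task stays within what a lenient engine can do reliably. The same example also shows how a lenient setting can accept an error as correct, which is why the language focus of the exercise matters.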

The new interface hugely increases interactivity, student motivation, and focus. Students are also far more likely to repeat exercises, substantially increasing their effectiveness. This increased level of practice helps students achieve real mastery of the material they are studying. Perhaps most important of all, the safe, private environment helps students develop confidence and encourages them to do something most of us find very difficult to get them to do in class: speak.

Norman Harris is DynEd's Manager for Europe, the Middle East, and Africa.
He has extensive experience in language teaching in different parts of the world, especially in the use of multimedia for instructional purposes.


 


Copyright © 2008 DynEd International, Inc.