Speech Recognition: Considerations for Use in Language Training
by Norman Harris

Speech Recognition (SR) technology has finally come of age, at least for language training purposes for young adults and adults. Computer programs that truly "understand" natural speech, the Holy Grail of artificial intelligence researchers, may be a decade or more away, and today's SR programs may be merely pattern-matching devices, still incapable of parsing real language or of achieving anything like "understanding." Nonetheless, they can now provide language students with realistic, highly effective, and motivating speech practice. In this article I shall try to provide a brief overview of the state of development of continuous-speech, speaker-independent SR programs and some of the ways that they have been adapted for use in language training.

Historically, Speech Recognition programs required mainframe computers and were very expensive, or, in cheaper PC versions, performed inadequately for serious language learning. Most early SR programs were limited to discrete speech (single words or short phrases carefully enunciated) and were usually speaker-dependent, requiring each user to train the program by reading a long list of specially selected words or phrases that familiarized the program with that speaker. These technologies were also limited by their dependence on a particular regional English accent, usually a fairly neutral American accent.

These early programs allowed users to control their computers with simple oral commands, but for language training purposes they were not ideal. The essence of real language is not in discrete single words: language students need to practice complete phrases and sentences in realistic contexts. Moreover, programs which were trained to accept a speaker's individual pronunciation quirks were not ideally suited to helping students move toward more standard pronunciation. These technologies also failed if the speaker's voice changed because of a common cold, laryngitis, or another throat ailment, rendering them useless until the speaker recovered or retrained the speech engine.
As their name implies, speaker-independent programs do not adjust over time to the pronunciation profile of a particular speaker, so they can be counted on to always provide a "neutral" evaluation of what a student says, according to the criteria built into their code. Such parameters can be flexible enough to accept a broad range of accents, including British and American accents as well as those of non-native speakers. Moreover, they can be configured to be relatively lenient toward the less-than-perfect pronunciation, slower speech, and even pauses and false starts of typical lower-level language students. Such lenience is vital to the practical success of the programs with these students, whose willingness to practice speaking might otherwise be undermined. For higher-level students, on the other hand, the programs can be "tuned" to require a higher level of fluency, such as fewer and shorter pauses, to provide such students the challenge they need to progress.

There are trade-offs, of course. Such flexibility with regard to pronunciation paradigms means that today's speaker-independent SR programs are not ideal for direct pronunciation practice. Nonetheless, exercises which focus on fluency and word order, with native-speaker models which are heard immediately after a student's utterance has been successfully recognized, have been shown to result indirectly in much improved pronunciation. Another trade-off is that the greater flexibility and leniency which allow these programs to "recognize" sentences spoken by students with a wide variety of accents also limit the accuracy of the programs, especially for similar-sounding words and phrases. Some errors may be accepted as correct. Native speakers testing the "understanding" of programs "tuned" to the needs of non-native speakers may be bothered by this, but most teachers, after careful consideration of the different needs and psychologies of native speakers and learners, will accept the trade-off.
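To make the idea of "tuning" more concrete, here is a minimal sketch in Python of how a leniency setting might decide whether to accept a student's sentence. It is an illustration only, not a description of how any actual SR product works: the names `LeniencyProfile`, `BEGINNER`, `ADVANCED`, and `accept`, and all the threshold values, are assumptions invented for this example. It also assumes that the recognizer has already produced a text transcript and a list of pause lengths, whereas real programs score the audio itself.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Sequence


@dataclass(frozen=True)
class LeniencyProfile:
    """Hypothetical tuning parameters; names and values are illustrative."""
    min_similarity: float     # 0.0 to 1.0: how closely the words must match
    max_pause_seconds: float  # longest single pause that is tolerated
    max_pauses: int           # how many pauses are tolerated


# Assumed example settings for lower-level and higher-level students.
BEGINNER = LeniencyProfile(min_similarity=0.70, max_pause_seconds=2.0, max_pauses=3)
ADVANCED = LeniencyProfile(min_similarity=0.95, max_pause_seconds=0.5, max_pauses=1)


def word_similarity(expected: str, heard: str) -> float:
    """Word-level similarity between the target sentence and the transcript."""
    matcher = SequenceMatcher(None, expected.lower().split(), heard.lower().split())
    return matcher.ratio()


def accept(expected: str, heard: str, pauses: Sequence[float],
           profile: LeniencyProfile) -> bool:
    """Return True if the utterance passes under the given profile."""
    if word_similarity(expected, heard) < profile.min_similarity:
        return False
    if len(pauses) > profile.max_pauses:
        return False
    return all(pause <= profile.max_pause_seconds for pause in pauses)


target = "I would like a cup of tea"
heard = "I would like a cup of tree"   # a similar-sounding error
print(accept(target, heard, [1.2, 0.8], BEGINNER))  # lenient profile
print(accept(target, heard, [1.2, 0.8], ADVANCED))  # strict profile
```

In this sketch the lenient profile accepts the sentence even though "tea" was heard as "tree", while the strict profile rejects it. That is the trade-off described above: the same setting that spares a beginner from frustration also lets some errors pass as correct.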
Students do not expect to be understood every time. If they are occasionally required to repeat a sentence which the program has not recognized or has misinterpreted, there may be some small frustration, but language students are much more likely to take this in their stride than native speakers would be. On the other hand, if the program does "understand" such students, however imperfect their pronunciation, they typically experience a huge sense of satisfaction, a feel-good factor that native speakers simply cannot enjoy to anywhere near the same degree. The worst thing for a student is a program that is too demanding of perfection. Such programs will quickly lead to student frustration or to the kind of embarrassed, hesitant unwillingness to speak English typical of many classrooms.
The new interface hugely increases interactivity, student motivation, and focus. Students are also far more likely to repeat exercises, substantially increasing their effectiveness. This increased level of practice helps students achieve real mastery of the material they are studying. Perhaps most important of all, the safe, private environment helps students develop confidence and encourages them to do something most of us find very difficult to get them to do in class: speak.

Norman Harris is DynEd's Manager for Europe, the Middle East, and Africa.