Building machines that are capable of speaking and understanding speech has long been a goal of computer scientists working in the field of artificial intelligence. However, the effortlessness with which human beings use speech to converse belies the sophistication of the processing it requires.
Speech comprises around 40 basic speech sounds called phonemes, which arrive at the ear at a rate of about 10-15 phonemes per second.
These phonemes cannot be recognised in isolation but have to be decoded using the preceding speech context and knowledge of the speaker's accent and characteristics.
Often there may be other sounds in the environment, including other speakers, that need to be filtered out. Finally, consider that recognising words is just the beginning of the process: understanding what they mean and forming an appropriate response is also necessary.
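The role of context in decoding can be illustrated with a toy language model. In the sketch below, two word sequences consistent with nearly the same phoneme string (the classic "recognise speech" versus "wreck a nice beach" example) are scored with a small bigram model; the probabilities are invented for illustration only, not drawn from any real corpus.

```python
import math

# Toy bigram language model. These probabilities are made up for
# illustration; a real system would estimate them from large text corpora.
bigram_prob = {
    ("<s>", "recognise"): 0.10, ("recognise", "speech"): 0.30,
    ("<s>", "wreck"): 0.01, ("wreck", "a"): 0.20,
    ("a", "nice"): 0.05, ("nice", "beach"): 0.10,
}

def log_score(words, floor=1e-6):
    """Sum the log-probabilities of consecutive word pairs,
    using a small floor probability for unseen pairs."""
    pairs = zip(["<s>"] + words, words)
    return sum(math.log(bigram_prob.get(p, floor)) for p in pairs)

# Two candidate transcriptions of (nearly) the same sound sequence:
candidates = [["recognise", "speech"], ["wreck", "a", "nice", "beach"]]
best = max(candidates, key=log_score)
print(" ".join(best))  # prints "recognise speech"
```

A real recogniser combines such language-model scores with acoustic scores for each candidate, but even this toy version shows how prior knowledge of likely word sequences resolves ambiguity that the sounds alone cannot.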
What’s on offer
Despite the complexity of the challenge, huge progress has been made in recent years and speech and language technology has now truly arrived.
In the last 10 years, a number of breakthroughs in the fields of acoustic modelling, language modelling and machine learning, coupled with the advent of GPU-driven computing and terascale and petascale data processing, have finally brought performance to the point where the technology is truly applicable. The accelerating performance of speech technology has led to an explosion in commercial interest and opportunities for experts in this field.
There is a strong and growing demand for graduates with the highly-specialised multi-disciplinary skills required in speech and language processing (SLP), both as developers of SLP applications and as researchers of next-generation SLP systems. MSc courses in this area are typically designed as specialised extensions of computer science degrees.
Applicants will generally be expected to hold a strong honours degree in a relevant and related field, such as computer science, engineering, linguistics, psychology or mathematics, though some other subjects may also be considered. A good grasp of mathematics and experience of programming are desirable.
An SLP Master’s degree will typically provide a core training in areas such as phonetics and phonology, text and language processing, speech signal processing, statistical modelling and machine learning, and adaptive intelligence.
In addition to these core scientific areas, one may expect to study modules targeted at specific technologies such as speech synthesis, speech enhancement, speech recognition or machine translation and modules targeted towards application building such as software development for mobile devices or cloud computing, plus specialised programming modules.
Despite the current growth in commercial speech technology, the speech and language abilities of current systems are still primitive by human standards. With so much work still to do, there are plentiful opportunities for mathematicians and computer scientists with experience in speech and language processing. Students searching for suitable courses should look for programmes with good links to industry, which offer valuable networking opportunities and input from visiting lecturers. An MSc in SLP is also an excellent introduction to the substantial research opportunities available at doctoral level.
Space to grow
Speech and language research is a growth area in both academia and industry. The main international research conferences in the field, the International Conference on Acoustics, Speech, and Signal Processing (ICASSP) and Interspeech, have been rapidly growing in size in recent years. Each year there is a greater industrial presence at these events, with representation not only from big players such as Google, Microsoft, Apple, Amazon and Nuance, but also from small dotcom start-ups.
Speech processing is now a ubiquitous technology used in our daily lives, for example, powering personal digital assistants such as Google's "OK Google" voice search and Apple's Siri. In the last year, speech recognition has been introduced directly into our homes: Amazon's voice-controlled Echo, powered by Alexa, was very quickly followed by Google Home and then Apple's HomePod. These systems that listen to speech from a distance are likely to transform the way we live our lives. As they are used more and more widely, they will provide the data that allows learning algorithms to steadily improve in performance.
(The author is a professor at the University of