Speech-to-Text and Text-to-Speech Technologies | Secondary

Concept sheet | Science and Technology

This concept sheet will help you learn more about speech-to-text and text-to-speech, the role of AI in these technologies, and how they can help you learn.

Speech-to-text and text-to-speech are technologies that let you interact with computers and other devices.

  • Speech-to-text (STT) (also known as speech recognition) converts spoken words into text.
  • Text-to-speech (TTS) converts text into an artificial voice.
A teenager uses the speech-to-text function on her cell phone.
A teenager uses the text-to-speech function on his computer.
Example

Here are just a few examples of uses for these technologies.

Speech-to-text features can be found in writing and note-taking apps, search engines and virtual assistants, and videos that provide automatic caption generation. Text-to-speech features can be found in screen readers and web pages, GPS navigation applications, video games, and telephone menus.

What Role Does AI Play in These Technologies?

Most speech-to-text and text-to-speech technologies use artificial intelligence (AI). We’ll explain how it works.

How Speech-to-Text Works

  1. A very large amount of royalty-free text and human speech data is stored in data centres.
  2. This data is used to train the AI to match sounds with written text.

    For example, AI learns that the words "to" and "two" sound the same but aren’t spelled the same way. The more data the AI has to train on, the more accurate it becomes.
  3. Once trained, the AI follows a set of rules that allow it to make predictions. These are called algorithms.
  4. When you use speech-to-text, the AI takes your voice, runs it through the algorithms, and then predicts the text to write.
A diagram of how speech-to-text works in a search bar.
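
The four steps above can be sketched with a very small program. This is only a toy illustration, not how real speech-to-text software is built: real systems learn from huge amounts of recorded speech, while this sketch uses a short, made-up list of examples. The names TRAINING_DATA, train, and predict, and the sound label "/tu/", are assumptions chosen for this example.

```python
from collections import Counter, defaultdict

# Hypothetical training data: (previous word, sound heard, correct spelling).
# "/tu/" stands for the sound shared by the words "to" and "two".
TRAINING_DATA = [
    ("want", "/tu/", "to"),
    ("going", "/tu/", "to"),
    ("have", "/tu/", "two"),
    ("bought", "/tu/", "two"),
    ("have", "/tu/", "two"),
    ("want", "/tu/", "to"),
]


def train(examples):
    """Count how often each spelling was seen for a (previous word, sound) pair."""
    counts = defaultdict(Counter)
    for previous_word, sound, spelling in examples:
        counts[(previous_word, sound)][spelling] += 1
    return counts


def predict(counts, previous_word, sound):
    """Predict the spelling seen most often in training for this context."""
    seen = counts.get((previous_word, sound))
    if not seen:
        return None  # This context never appeared in the training data.
    return seen.most_common(1)[0][0]


model = train(TRAINING_DATA)
print(predict(model, "want", "/tu/"))
print(predict(model, "have", "/tu/"))
```

With this made-up data, the program writes "to" after "want" and "two" after "have". If it hears the sound in a context it has never seen, it cannot make a prediction, which is one reason why more training data makes the AI more accurate.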

How Text-to-Speech Works

  1. A very large amount of royalty-free text and human speech data is stored in data centres.
  2. This data is used to train the AI to match written text with sounds.

    For example, AI learns that when there is a comma, the artificial voice needs to pause.
  3. Once trained, the AI follows algorithms (sets of rules) that allow it to make predictions.
  4. Text-to-speech analyzes text using algorithms and then predicts the sounds to generate with the artificial voice.
A diagram of how text-to-speech works in a web page.
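
The comma example above can be sketched as a small program that reads a text and plans what the artificial voice should do. This is a simplified illustration only: here the pauses are written by hand as a rule, whereas a real text-to-speech AI learns them from data. The name plan_speech and the pause lengths in PAUSES are assumptions made for this example.

```python
import re

# Hypothetical pause lengths in seconds. A real voice learns these from data.
PAUSES = {",": 0.3, ".": 0.6}


def plan_speech(text):
    """Turn text into a list of steps: ("say", word) or ("pause", seconds)."""
    steps = []
    # Split the text into words and the punctuation marks we know about.
    for token in re.findall(r"[A-Za-z']+|[,.]", text):
        if token in PAUSES:
            steps.append(("pause", PAUSES[token]))
        else:
            steps.append(("say", token.lower()))
    return steps


plan = plan_speech("Hello, world.")
for step in plan:
    print(step)
```

For the text "Hello, world.", the plan contains a short pause after "hello" and a longer pause after "world".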

Quick Q&A

Which voices are used for AI training?

Which Alloprof tools offer text-to-speech?

How Can These Technologies Help You?

Speech-to-text and text-to-speech are useful for everyone, but they’re especially helpful for people who have difficulty reading or writing, for all kinds of reasons. Here are some examples.

  • Vision impairments
    Example: A person with low vision can use text-to-speech to listen to the content of a web page.
  • Hearing impairments
    Example: A person with hearing loss can read automatically generated captions in a video.
  • Temporary or permanent motor disabilities
    Example: A person with a hand injury can write text using speech-to-text.
  • Learning a new language
    Example: Two people who do not speak the same language can communicate using translation apps that include speech-to-text and text-to-speech.
  • Learning disorders (dyslexia, dysorthography, dyspraxia, etc.)
    Example: Using text-to-speech, a person can hear words as they write, which helps them catch spelling mistakes more easily.

     
High school students in a classroom. One student has a laptop on her desk.
Source: Xavier Lorenzo, Shutterstock.com

At school, technological tools can help you learn and demonstrate your learning. They can also reduce barriers related to learning disorders and other conditions.

The most commonly used school software includes WordQ (speech-to-text and text-to-speech) and Lexibar (text-to-speech). These tools don't make decisions for you: you still have to apply what you have learned. However, they can help you, among other things, to:

  • increase the number of words written
  • decode words
  • listen to text at a pace that allows full comprehension

Along with speech-to-text and text-to-speech functions, AI technologies are also used to:

  • predict the next words in a sentence
  • detect spelling errors
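
Word prediction can be sketched in a few lines. This is a minimal illustration, assuming the AI only looks at the previous word and a handful of made-up sentences. Real tools train on far more text and use more advanced methods. The names SENTENCES, train_word_pairs, and suggest_next are assumptions made for this example.

```python
from collections import Counter, defaultdict

# Hypothetical training sentences. Real tools learn from much more text.
SENTENCES = [
    "i like to read",
    "i like to write",
    "i like to read books",
]


def train_word_pairs(sentences):
    """Count which word follows each word in the training sentences."""
    following = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for current_word, next_word in zip(words, words[1:]):
            following[current_word][next_word] += 1
    return following


def suggest_next(following, word, how_many=2):
    """Suggest the words that most often followed this word in training."""
    seen = following.get(word, Counter())
    return [next_word for next_word, _ in seen.most_common(how_many)]


word_model = train_word_pairs(SENTENCES)
print(suggest_next(word_model, "to"))
```

With these sentences, the suggestions after "to" are "read" first and then "write", because "read" followed "to" more often in the training data.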
     
