Research Areas

2. Combining Audio Description, Audio Introduction and Text-to-Speech (Read-aloud) Technology


Audio Description (cf. Research Area 1) translates the visual images and sound effects of feature films and other visual media into spoken language. Alongside the original soundtrack, an audio-described film or programme offers an additional narration track intended primarily for blind and visually impaired audiences. Producing Audio Description is extremely costly and time-consuming. It is currently produced in teams that include at least one collaborator with a visual impairment.
By contrast, Audio Introduction requires significantly less effort and can be accessed at any time, independently of the visual medium. It usually provides a summary of important background information as well as formal aspects relevant to understanding the programme. To date, Audio Introduction has received little attention from the academic community (Hammer et al. 2015).

Research Area 2 will provide recommendations on when to opt for Audio Introduction as a substitute for, or in combination with, Audio Description. To this end, a catalogue of requirements for Audio Introduction will be developed in collaboration with visually impaired university students. The findings of a series of case studies will inform the development of guidelines for Audio Introduction, in particular for documentaries and visual educational content.

Furthermore, Research Area 2 will examine the use of Text-to-Speech Assistive Technology in converting pre-existing subtitles of documentaries and visual educational content into spoken word.
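As a rough illustration of the first step in such a conversion, the sketch below extracts the start time and plain text of each subtitle so that the text could be handed to a speech synthesiser. It assumes the subtitles are available in the SubRip (.srt) format, which is not specified here; the names Cue and subtitles_to_cues are hypothetical, and the speech synthesis step itself is deliberately left out.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Cue:
    """One subtitle, reduced to what a read-aloud engine needs."""
    start_seconds: float  # when the subtitle appears on screen
    text: str             # plain text to be spoken


# Assumption: SubRip timing lines such as "00:01:02,000 --> 00:01:05,000".
_TIMING = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->")
# Simple formatting tags such as <i>...</i> should not be read aloud.
_TAGS = re.compile(r"<[^>]+>")


def subtitles_to_cues(srt_text: str) -> List[Cue]:
    """Turn SubRip subtitle text into a list of speakable cues."""
    cues: List[Cue] = []
    normalised = srt_text.replace("\r\n", "\n").strip()
    # Subtitle blocks are separated by blank lines.
    for block in re.split(r"\n\s*\n", normalised):
        lines = block.split("\n")
        for i, line in enumerate(lines):
            match = _TIMING.match(line.strip())
            if not match:
                continue  # skip the numeric index line before the timing
            hours, minutes, seconds, millis = (int(g) for g in match.groups())
            start = hours * 3600 + minutes * 60 + seconds + millis / 1000
            # Join the display lines of one subtitle into a single utterance.
            text = " ".join(part.strip() for part in lines[i + 1:] if part.strip())
            text = _TAGS.sub("", text)
            if text:
                cues.append(Cue(start, text))
            break
    return cues


if __name__ == "__main__":
    sample = (
        "1\n00:00:01,500 --> 00:00:04,000\n"
        "The glacier retreats\n<i>every year</i>.\n\n"
        "2\n00:01:02,000 --> 00:01:05,000\n"
        "Meltwater feeds the river.\n"
    )
    for cue in subtitles_to_cues(sample):
        print(cue.start_seconds, cue.text)
```

Keeping the start time with each cue would allow the spoken output to be scheduled against the programme; whether the speech fits into the available time is a separate question that this sketch does not address.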

Finally, this Research Area will deliver best practice guidance on providing text alternatives for visual information in lecture presentation slides and other teaching support materials. Currently, images, formulas and other non-text content are usually made accessible only through captions, which can be vocalised by Text-to-Speech Technology for the benefit of users with visual impairments. The W3C [1], however, has long recommended that detailed descriptions be hyperlinked to non-text content. The textual structure and linguistic properties of those descriptions have yet to be clearly defined. Preliminary data will be collected through a survey of people with visual impairments.

The main objectives of Research Area 2 are therefore:

  • To test and evaluate the combined use of Audio Description, Audio Introduction and Text-to-Speech to translate educational content that is primarily visual.
  • To carry out comparative studies on, and gain insights into, a) the effectiveness of each individual method in conveying information, and b) the time and costs involved in producing adequate products.
  • To develop guidelines for best practices in teaching and learning and make them available for use in higher education settings.



1. World Wide Web Consortium. Last accessed: 23.11.2017.