Veronica With Four Eyes

Creating Audio Description for Music Videos With YouDescribe

When Taylor Swift released the music video for the song “ME!”, one of my friends immediately sent me a message asking if I had seen the video and all of the interesting visual effects. Since I have low vision, I often have to watch music videos at least a dozen times to catch all of the details, unless audio description is available. After talking to my brother about the possibility of creating audio description for music videos with YouDescribe, we created an audio description track for “ME!” and have collaborated on this guide to creating audio description for music videos and best practices for combining music and audio description.

What is audio description?

Audio description, sometimes referred to as descriptive audio or described video, is an additional narrator track that provides visual information for people who otherwise would not be able to see it. Audio description may be provided live by a narrator or pre-recorded ahead of time using either a professional narrator or synthesized voice. Audio description may be provided during natural pauses in dialogue, or the describer may choose to pause the video to deliver extended description.

For streaming or online content, open audio description is used, meaning that the audio description automatically plays and does not require a special device.


What is YouDescribe?

YouDescribe is a website and iOS app that gives blind and visually impaired users the ability to request audio description for videos, as well as search for videos that have been described and shared on the platform. Volunteers create audio description tracks using the YouDescribe web application, describing videos of their choice or videos that have been added to the wish list/request list.

Users do not need to have any specialty equipment or software to create descriptions on YouDescribe, though I strongly recommend using a pair of headphones and a microphone for best results. YouDescribe descriptions can only be recorded on the website, not in the mobile application, and recording requires a free Google account.


Considerations for describing a music video

Here are some considerations that my brother and I took into account when recording the description for a music video:

Not describing musicians

Instead of providing a visual description of Taylor Swift, my brother and I just used her name. Since she has a Wikipedia entry and there are other visual descriptions that share what she and her co-star Brendon Urie look like, we did not describe their appearance, only the clothes they were wearing.

Prioritizing description over music

Audiences generally watch a music video for the visuals, not the music itself. Viewers have the option of listening to the music track separately, and are more interested in what is going on in the video than what it sounds like.

Audio description does not describe the audio of the video; captions or transcripts do this. If the audio was in another language and the video included English captions, we read these out loud.

Inline vs Extended descriptions

There are two types of descriptions:

Inline descriptions play concurrently with the video audio and do not require any pauses. Inline description can be used where dialogue or music is not important, during natural pauses in dialogue, or during transition scenes.

For extended descriptions, the video is paused while the audio description plays, and then continues when the description is finished. I recommend using extended description for videos where dialogue or music is important, or for longer descriptions. Both types of description can be used in the same video.

For the “ME!” music video, we exclusively used inline descriptions because our descriptions were short and could be read out loud without breaking up the flow of the video.
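
For anyone who likes to plan a description script before recording, the difference between the two types can be sketched in a few lines of Python. This is only an illustration and not part of YouDescribe: the `Description` class, both function names, and the timings in the usage note are hypothetical. It shows that inline descriptions add no time to the video, that extended descriptions add their full length, and it flags inline descriptions that would talk over each other.

```python
from dataclasses import dataclass


@dataclass
class Description:
    """One planned description (hypothetical structure, not a YouDescribe format)."""
    start: float      # video time, in seconds, where the narration begins
    duration: float   # length of the narration, in seconds
    extended: bool    # True: the video pauses while it plays; False: inline


def total_playback_time(video_length: float, track: list[Description]) -> float:
    """Return how long the described video takes to play, in seconds.

    Inline descriptions play over the video and add no time.
    Extended descriptions pause the video, so each adds its full duration.
    """
    paused = sum(d.duration for d in track if d.extended)
    return video_length + paused


def overlapping_inline(track: list[Description]) -> list[tuple[Description, Description]]:
    """Return pairs of consecutive inline descriptions whose narration overlaps."""
    inline = sorted((d for d in track if not d.extended), key=lambda d: d.start)
    return [
        (a, b)
        for a, b in zip(inline, inline[1:])
        if a.start + a.duration > b.start
    ]
```

As a usage example with made-up timings: a 240-second video with one 3-second inline description and one 6-second extended description plays for 246 seconds, because only the extended description pauses the video.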

How we wrote the description

When describing a music video, my brother and I would watch the video at least five or six times to ensure that we were taking note of all relevant music video details. Here is how this process looked:

  • First, watch the music video all the way through, without taking any notes. My brother used this opportunity to quickly check the video for any strobe/flashing lights to ensure I could watch it with him safely.
  • Then, watch the video again without pauses, taking notes of themes or other recurring elements. For example, we noticed there were lots of pastel colors, costume changes, and different locations.
  • For the next watch-through, pause the video every minute or so, and write a summary of what is in each section. My brother wrote these summaries. The sections weren’t always exactly a minute long, since we split the video into segments based on setting changes.
  • Finally, to write out the most precise details, watch the video while pausing every ten seconds, to document any exciting “easter eggs” or visual elements.
  • Before finalizing the script notes, the video would be watched in full one more time, while reading the notes out loud to ensure all relevant visual information is included

From when we first started watching the video to finishing the audio recording, it took about an hour and a half to create the audio description for “ME!”.
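
The pause-every-minute and pause-every-ten-seconds passes can also be planned ahead of time. Here is a minimal sketch, assuming a hypothetical video length of 250 seconds (not the actual length of “ME!”); the function names `pause_points` and `format_timestamp` are made up for this example. It lists the timestamps at which to pause for a given interval. In practice we shifted these points to line up with setting changes, so treat the output as a starting checklist.

```python
def pause_points(video_length: int, interval: int) -> list[int]:
    """Return the timestamps (in seconds) at which to pause the video.

    Points fall every `interval` seconds, stopping before the end of the video.
    """
    if interval <= 0:
        raise ValueError("interval must be a positive number of seconds")
    return list(range(interval, video_length, interval))


def format_timestamp(seconds: int) -> str:
    """Format a number of seconds as minutes:seconds, for example 125 as 2:05."""
    minutes, secs = divmod(seconds, 60)
    return f"{minutes}:{secs:02d}"


# Hypothetical 250-second video: one pass pausing every minute, one every ten seconds.
summary_pass = [format_timestamp(t) for t in pause_points(250, 60)]
detail_pass = [format_timestamp(t) for t in pause_points(250, 10)]
```

Each timestamp can then become a row in the script notes, with the summary or detail written next to it.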

What to include in music video audio descriptions

On-screen text

Examples of on-screen text that should be included in audio descriptions include:

  • Video title cards or text that is not spoken/sung
  • English/translated subtitles for characters speaking in another language. For example, Taylor and Brendon speak in French at the beginning of the video, so the English subtitles were read out loud
  • Lyrics are the exception: if they are included as subtitles by default, or if they appear on-screen during the song, they are not read out loud or otherwise noted.

Video setting

Taylor Swift’s video moves through several settings, such as a mansion, a colorful street, and a rainbow vortex. For one of the scenes, we used the description “Taylor steps out into the street” to indicate that the location was changing.

Costumes and outfits

Costumes and outfits play a significant role in this video, since Taylor uses a very specific color palette for a lot of her videos. Instead of delivering in-depth descriptions of costumes, I would use more general descriptions, with the assumption that users could find more descriptive information about the costumes after watching the video. As an example, the script includes the line “Taylor steps out into the street wearing a yellow business suit, surrounded by dancers wearing pink and blue suits.”

Even though we didn’t include shade names such as light pink or sky blue, this information is helpful to include when creating descriptions. There’s no need to describe what the color blue looks like in general, but shade names are helpful.

Movement and dancing

A lot of movement and action in music videos has limited environmental sound or other audio cues. For example, Brendon Urie hands Taylor Swift a kitten without saying anything, and the viewer would not know he handed her a kitten if they were just listening to the video. Include descriptions of on-screen movement that relates to the plot of the music video, or that people would be likely to talk about.

Camera movements or animations

When I was watching another music video, the camera regularly switched from a close-up of just the singer to a wider shot of the singer with various dancers. Making note of these transitions is helpful, such as “camera zooms out over the city.”

More resources for creating audio description for music videos

How my brother and I worked together to create audio description for a music video, using the song "ME!" by Taylor Swift and Brendon Urie and the free YouDescribe tool