
This presentation explored the Web Content Accessibility Guidelines (WCAG) related to video and audio accessibility, including requirements for captions, transcripts, sign language interpretation, and audio descriptions.
Amber broke down essential compliance requirements, shared best practices for testing media alternatives, and provided practical tips for implementation.
Thanks to Our Sponsor
Kinsta provides managed hosting services for WordPress. It powers 120,000 businesses worldwide, and based on user reviews, it is the highest-rated managed WordPress host on G2. It has everything you need, including an unbeatable combination of speed, security, and expert support.
Kinsta is powered by Google Cloud and the fastest C3D and C2 servers, combined with a CDN and edge caching. Your sites are secured with Cloudflare Enterprise, protecting you from DDoS attacks. All plans include free migrations, and the first month of the starter plans is completely free, so you can try the service risk-free.
Watch the Recording
Read the Transcript
>> PAOLA: Welcome, everyone, to WordPress Accessibility Meetup: Practical Advice for Meeting Caption, Transcript, and Sign Language Requirements with Amber Hinds. We have a few announcements today. We have our Facebook group where you can connect between meetups. You can ask anything and everything related to WordPress and/or accessibility, and you can answer questions as well. You can find our Facebook group at facebook.com/groups/WordPress.accessibility.
You can also find upcoming events and past recordings in one place at equalizedigital.com/meetup. Yes, our meetup is being recorded and you’re going to be able to see the recording in about two weeks with a corrected transcript and a recap as well. You can join our email list to get news and event announcements at equalizedigital.com/focus-state. We send an email every week with recaps of accessibility news and anything happening in that sphere. That’s where we also announce our meetups.
You can tune into our podcast at accessibilitycraft.com. That’s where we post the audio version of our meetups and other conversations related to WordPress and accessibility. We are seeking additional sponsors for the meetup. The WordPress Foundation does not provide for captions or ASL interpretation. If this is something that you’re interested in supporting and sponsoring, you can let me or Amber know during the meetup, or you can email us at meetup@equalizedigital.com. Also, if you have any suggestions for the meetup or need any additional accommodations, you can use that same email to let us know.
We are Equalize Digital, the organizers of the meetup. We are a mission-driven organization and a corporate member of the IAAP, focused on WordPress accessibility. We have a WordPress plugin called Accessibility Checker that scans for accessibility problems and provides reports on the post-edit screen to make building accessible websites easier. You can find it at equalizedigital.com. We would like to thank our live captioning sponsor for today: Kinsta. Kinsta provides managed hosting services for WordPress. It powers 120,000 businesses worldwide. Based on user reviews, it is the highest-rated managed WordPress host on G2.
It has everything you need, including an unbeatable combination of speed, security, and expert support. It is powered by Google Cloud and the fastest C3D and C2 servers, combined with a CDN and edge caching. Your sites are secured with Cloudflare Enterprise, protecting you from DDoS attacks. All plans include free migrations, and the first month of the starter plan is completely free, so you can try the service risk-free. You can find them at Kinsta.com. We always encourage our attendees to post on X, Facebook, or wherever you can find Kinsta on social, and thank them for sponsoring WordPress Accessibility Meetup.
We also want to tease a few upcoming events that we’re going to be having. We have From Concept to Code: Communicating Accessibility in the Design Handoff with Danielle Zarcaro. There is a misspelling of the date in the slides; it’s on Monday, March 17th at 7:00 PM Central. Then we are also going to be having Accessible Firebrand: Why can’t I use my brand color there, and then where? That is with Mark Alvis and Deneb Pulsipher. That’s going to be in this same time slot on Thursday, April 3rd at 10:00 AM Central.
You can also join Amber on her Accessibility Remediation YouTube livestreams. They happen every Thursday, except when we have meetup, of course, at 11:00 AM Central. I just want to welcome today’s speaker, Amber Hinds. Amber is the founder and CEO of Equalize Digital, a website accessibility consulting and accessible website development firm. She is working to build a more equitable web for people of all abilities. She is also the organizer of WordPress Accessibility Meetup and WordPress Accessibility Day. Welcome, Amber. I am going to stop sharing so that you can share your screen. While we do that, I would like to remind everyone that if you have any questions during the presentation, please pop them in the Q&A box. It makes it so much easier for us to track them and to answer them in the Q&A portion of the talk. Without further ado, Amber, take it away.
>> AMBER HINDS: Thanks so much, Paola. Thank you, everyone, for tuning in. Obviously, we made a little switch of topics, but we are still going to talk a lot about captions and a few other things. I spent some time this morning thinking about this; of course, I can’t replace Picha and exactly what she was going to talk about. What I wanted to do with this session was talk about caption, transcript, and sign language requirements for accessibility. I’m hoping to give you all some practical advice for meeting those requirements.
We’re going to start by looking at what the Web Content Accessibility Guideline requirements are for captions, transcriptions, sign language, and we’ll talk briefly about audio description. Then we’re going to talk about how to test media alternatives and captions to ensure that they meet the requirements that are necessary. Then I’ll share some practical tips for implementation. We’ll look at a few examples, and then we’ll do Q&A at the end.
What I want to do with the Web Content Accessibility Guidelines is walk you through the ones that are relevant to video and audio. If you look at the Web Content Accessibility Guidelines, there are places where they talk about synchronized media. That means things where a visual and an audio play together. Basically, that’s a fancy word for video. There are also different instances where they differentiate between pre-recorded and live.
Different guidelines apply to something you have recorded and then, for example, uploaded to your YouTube or your Vimeo or your Wistia account to either share socially or post directly on your website, embed in your website. That would all be considered pre-recorded. Live is, of course, like this. We’re in a Zoom call. In a webinar, that’s live, or if you did live streaming to social media, then that would also count as live. It’s literally something that’s happening right now.
A video or an audio file can potentially have different requirements at different points in time. Let’s talk through the different requirements that apply to video or audio. The first one is 1.2.1, audio-only and video-only (pre-recorded). This criterion, for example, would not apply right now because we’re in a live video. It is a Level A requirement for any pre-recorded audio or video. Basically, it says that for all of them, you have to have a media alternative, except when the audio or video is already a media alternative for text and is clearly labeled as such. For pre-recorded audio, you would have to have an alternative for time-based media.
Time-based media means the words are connected to a specific timing. For example, a podcast is an example of pre-recorded audio-only. The guideline says an alternative for time-based media is provided that presents equivalent information for the pre-recorded audio-only content, which is a lot of words. It’s a fancy way of saying that if you have an audio player on your website that plays spoken audio, you have to have a transcript for it. You need to have something that is an alternative to that.
If you have pre-recorded video, so a video embedded on your website, then it says you have to have either an alternative for it or an audio track that presents equivalent information for pre-recorded video-only content. A thing that is important to note about video content is that there is frequently information presented visually in videos that is very important, not just auditorily. There might be words on a screen. For example, right now, I’m sharing my slides and there are words on the slides.
If I wasn’t reading all of these, if I said, “Okay, and now we’re all going to read this for a second,” and then I paused instead of talking, that would be a problem, because if somebody couldn’t see those words, they would miss out. There might be actions that happen. If you think about a movie where people are fighting, the sounds that you hear during a fight probably wouldn’t actually tell you who was winning. That’s the concept. You want to make sure that the important information is present in a format separate from the video.
Really, this success criterion is about making sure that any information that relies on only one sense to interpret, hearing or sight, is translated into a different medium that does not require that sense. The next one is 1.2.2, captions (pre-recorded). This is also a Level A. Captions are provided for pre-recorded audio content in synchronized media, except when the media is a media alternative for text and is clearly labeled as such. The last success criterion also talked about this media alternative for text. What does that mean?
A media alternative for text is, for example, if you installed one of those plugins that takes your blog post and uses AI to read out its content, there is going to be an audio player at the top of your blog post. That is considered a media alternative for text because it doesn’t present anything new. All it does is provide a way to listen to the content that is on the page, word for word, verbatim. It’s not changing it at all. You’ll note it says “clearly labeled as such,” so you would need to label this in a way that makes it very clear that this player isn’t playing something different. It’s a way to listen to the text on this page.
Then, in that case, any criterion with that “except for a media alternative” clause doesn’t apply to it. Stepping back to this success criterion: basically, it says that you have to have captions for any audio content in synchronized media. Remember, that means a video, unless it is just a video of you reading out the text of the blog post. Otherwise, you always need to provide captions.
I’m only going to talk about one of the audio description success criteria, which is level A, because primarily I want to focus today on how we can make media content more accessible for people who are deaf or hard of hearing. There is a group of people who are deafblind, and that is really important to keep in mind. If you were to have captions on a video, they might not be able to read them. They might need a transcript or something else because they can’t see the captions on the video.
This success criterion that is a level A is 1.2.3, audio description or media alternative pre-recorded. It says, an alternative for time-based media or audio description of the pre-recorded video content is provided for synchronized media, except when the media is a media alternative for text and is clearly labeled as such. This is basically saying that you need to have any alternative for that video that is written in text, or it can be an audio description file. Someone can listen to an alternate version of that video that includes the extra audio to describe what’s happening in the video visually so that someone can hear and interpret that. That’s what this is about.
Captions live, that would, of course, apply to us. This is 1.2.4, captions (live). It is a AA requirement, and it says captions have to be provided for all live audio content in synchronized media. Again, that is any spoken audio that is happening live in a video or in live audio. Clubhouse, I think, was a social media platform for a while that got very trendy for maybe three or four months. I don’t know if anyone remembers that. Give me a thumbs up if you remember that platform, where you just showed up and you spoke. One of the biggest complaints about it was that there were no captions in that platform. It was a social media platform that was completely closed off to someone who was deaf or hard of hearing.
1.2.5, audio description (prerecorded). We’re back to something that has been previously recorded. This is a AA level requirement. It says audio description is provided for all prerecorded video content in synchronized media. If you’re not familiar with the Web Content Accessibility Guidelines, there’s a ton of information on our website. You can also find additional talks in the library of Meetup presentations. The short of it is most people are going to need to meet anything that is A and AA. This criterion requires that audio description is provided for all prerecorded videos, so that if there’s important visual content, it can be communicated to someone who is not able to see that video.
Now we reach the AAA criteria. The first one there is 1.2.6, sign language (prerecorded). Sign language interpretation is provided for all prerecorded audio content in synchronized media. This could be sign language for video. It could also be sign language for an audio-only file. Then there is 1.2.8, media alternative (prerecorded), Level AAA: an alternative for time-based media is provided for all prerecorded synchronized media and all prerecorded video-only media. This goes back a little bit to what we saw in that single A criterion, but extends it further.
Basically, you need to have some text alternative or non-video, non-audio alternative for that content. Then 1.2.9, audio-only (live), is another Level AAA requirement. It says an alternative for time-based media that presents equivalent information for live audio-only content is provided. This is the same thing, only it goes a little further because it applies to live content. This would go back to that Clubhouse example we were talking about, or if you were doing a Twitter Spaces, maybe, as another example.
How do we meet all of these? I’ve just gone through a bunch of success criteria and spoken about them. A lot of that language, as you’ve heard me trying to explain as I go, is dense. What I have on the screen right here is a table that shows those success criteria we have just gone through, what level they are, and ways that you can meet or pass them. I’m going to read through this real quick for everyone who cannot see it. The first one, 1.2.1, audio-only and video-only (pre-recorded), Level A: you can meet this by providing an audio description file or an extended transcript. Extended transcript is my terminology. This is what I call it.
A normal transcript that you would see for a video or an audio file might have timestamps and speaker labels and the literal words that were spoken in the video, and maybe some important sounds. What a normal transcript wouldn’t have is a description of anything important that was shown visually. It just transcribes the actual spoken audio. An extended transcript is what I call it when you intersperse key written descriptions of the visuals into the transcript alongside the spoken audio, to extend it and provide all the same information that would be communicated in the video.
Obviously, visual descriptions don’t apply to an audio-only file, but they would apply to a video. 1.2.2, captions (pre-recorded), that was Level A. You can meet this by providing either closed captions or open captions, and we’ll talk about what those are in just a minute. 1.2.3, audio description or media alternative (pre-recorded), is a Level A. You can meet this by providing audio description, or by providing an extended transcript, or, in the instance that it is what WCAG calls a “talking head video,” a person who is talking with a static background. Right now, I’m a talking head video.
If it were just me and there were no slides, that would be a talking head video, because the background is irrelevant to the person and there’s no timing that matters between what’s being shown and what’s being said. With that, you can use static text that does not have any timing, a written description of the media: what happened, what was said, and what was shown. That is also considered sufficient for meeting 1.2.3.
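To make the extended transcript idea concrete, here is a minimal sketch of how one might be marked up on a page. The names, wording, and structure are illustrative only, not a required format:

```html
<!-- A hypothetical extended transcript excerpt: spoken dialogue plus
     bracketed descriptions of important visuals, published as ordinary
     page content so anyone (and any search engine) can read it. -->
<section aria-labelledby="transcript-heading">
  <h2 id="transcript-heading">Extended Transcript</h2>
  <p><strong>AMBER:</strong> What I have on the screen right here is a table
    that shows those success criteria and ways that you can meet them.</p>
  <p><em>[On screen: a three-column table listing each success criterion,
    its WCAG level, and sufficient ways to pass it.]</em></p>
  <p><strong>AMBER:</strong> I’m going to read through this real quick.</p>
</section>
```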
1.2.6, sign language (pre-recorded), AAA. You would either want to have sign language in the video stream itself, you’ve seen that if you’ve come to other meetups here or watched one of our recordings, where there’s a little square with an interpreter in it, or you can have a separate synchronized video that can be moved around and resized, timed to match the main video that doesn’t have the sign language. I’ll show you an example of one of those in just a little bit.
Then 1.2.8, media alternative (pre-recorded), AAA, can be met with an extended transcript, and 1.2.9, audio-only (live), AAA, can be met with a transcript. So how do you test media alternatives and captions? If you’re trying to figure out whether something passes the sniff test for being a valid media alternative for a video, this is what you would do. One, view the video while referring to the alternative; basically, you compare them to one another. Two, check that the transcript includes the same information that is in the video-only presentation. Nothing’s missing.
Three, if the video includes multiple people or characters, check that the transcript identifies which person or character is associated with each action described. You have to make it really clear who said what and who did what. If somebody throws a ball, it’s not just “a ball was thrown.” It’s “Amber threw the ball,” that kind of thing. Four, check that at least one of the following is true: A, the transcript itself can be programmatically determined from the text alternative for the video-only content, or B, the transcript is referred to from the programmatically determined text alternative for the video-only content.
That’s a mouthful. Programmatically determined is a fancy way of saying we have HTML semantics that make it really clear what this is, and it’s marked up appropriately. Basically, we want to know that we have the text alternative, and can we get a transcript out of that text alternative? Is it clear, and are they connected? Are they labeled appropriately so we know what they are? They need to refer to one another.
Then five, if the alternate versions are on a separate page, check for the availability of links to allow the user to get to the other versions. This is an interesting thing to note: the Web Content Accessibility Guidelines do not require that alternative versions be on the same page. For example, an audio-described video that is an alternative to a non-audio-described video, a text alternative to a video, or even a transcript for a video does not have to be on the same page to pass WCAG.
As long as you have a link that is immediately adjacent to or on the media that helps people figure out where to go to find that alternative, that would still be sufficient. Then, ideally, from the alternative, you’d want to link back to the original. Now, there are a lot of reasons why we wouldn’t want to put these on different pages. There’s a huge benefit to keeping your transcript on the page where you have the video, from a search engine optimization and a positive user experience standpoint. But there are instances where, for whatever reason, it doesn’t make sense or it might be difficult to have an audio-described video and a non-described video on the same page. You might link them to one another so people can go back and forth.
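As an illustration of that linking pattern, here is a minimal HTML sketch. The URLs and titles are placeholders; the point is that the alternative is labeled and linked immediately adjacent to the media, and the alternative links back:

```html
<!-- On the video page: the transcript link sits immediately adjacent
     to the embedded media. URLs and titles are placeholders. -->
<figure>
  <iframe src="https://www.youtube.com/embed/VIDEO_ID"
          title="Meeting caption and transcript requirements"></iframe>
  <figcaption>
    <a href="/transcripts/caption-requirements/">Read the transcript for this video</a>
  </figcaption>
</figure>

<!-- On the transcript page: link back to the original media. -->
<p><a href="/videos/caption-requirements/">Watch the video this transcript accompanies</a></p>
```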
There are two kinds of captions that I mentioned are sufficient for meeting the caption requirements: open captions and closed captions. Open captions are captions that are embedded into the video file and cannot be turned off. These are usually created by content creators in their platforms, although sometimes, and we’ll talk about this in a minute, AI might generate them. A lot of times, they’re created in whatever video editing software the creator uses. You may have seen them; they’re always very obvious because you can’t turn them off. A lot of times, they’re decorative, meaning they might have interesting fonts. They might not be white sans-serif text on a black or dark blue background. They might instead be white text on a light orange background, whatever the content creator thought looked appealing to them.
When you are looking at open captions, there are a few extra things that you have to test. When you do it, of course, you want to turn off the closed captioning if that media file happens to have closed captioning because all you’re doing is reviewing the open captions, so the visible text on the screen. Then you check that captions of all the dialogue and important sounds are visible and accurate. There needs to be written text on the video describing all dialogue and any important sounds.
If there are multiple speakers, you need to confirm that speakers are labeled, meaning that when the caption starts to talk, it says, Amber, and then whatever I said. Or that captions are positioned in a way that the viewer can understand who is speaking. This is normal, and if you’ve ever watched television with captions turned on, you might notice that sometimes the captions move around and they position captions near people on the screen. That is another way to label captions or make it clear who is speaking, and either one is sufficient.
Then you want to confirm that captions pass a minimum of AA color contrast over the background. A thing that I don’t have written on the slide, but that I think is important to mention, is highlighting, which I think is one of the things that Picha was going to talk about in her presentation. There are captions where individual words get highlighted as someone is speaking, almost like karaoke, if you think about that. This can be incredibly distracting or disorienting for people and is not recommended. I have a resource that I’m going to link you to in just a little bit that has a whole bunch more information about that.
Closed captions are actually the preferred way of captioning videos. That said, if you’re creating shorts for any social media platform, whether it’s YouTube or Facebook or Instagram or wherever, it might make sense to put open captions on them because it’s easier, it’s faster, and sometimes they can attract more attention and make people want to watch your video, which, depending upon why you’re creating the video, might be the objective. You want to have as many eyes on it as possible. Potentially, having open captions is a good solution there.
On your website, though, you really want to have closed captions because that gives people the ability to turn them off and on. It puts them in a standard format for the video player, so they’re all going to look consistent, and that is really the most user-friendly way to do captions on a website. How do you test them? You turn on the closed caption feature in the media player, obviously. A point to note on this is ensure that this can be done using the keyboard alone and does not require a mouse. You should be able to tab to a button that looks like CC. It’ll have two letter Cs, and you should be able to hit both the spacebar and the return key because it should be coded as a button in order to open and display the captions.
There might be a submenu where you select the language for the captions. That is totally fine, but you should be able to do all of this with your keyboard. It should not require a mouse. Then you need to watch the video with the closed captions turned on and check that captions of all dialogue and important sounds are visible and accurate. If there are multiple speakers, confirm that speakers are labeled, and in this case, it’s not going to be positioning usually. It’s going to be just like name, colon, at the beginning.
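To make the mechanics concrete, here is a minimal sketch of closed captions on a self-hosted video. File names are placeholders; on YouTube or Vimeo you would upload the caption file through the platform instead:

```html
<!-- Closed captions live in a separate WebVTT file attached with <track>,
     so the player's CC button can toggle them on and off. -->
<video controls>
  <source src="/media/meetup-talk.mp4" type="video/mp4">
  <track kind="captions" src="/media/meetup-talk-en.vtt"
         srclang="en" label="English" default>
</video>
```

The hypothetical caption file itself carries the timing and the “name, colon” speaker labels Amber describes:

```vtt
WEBVTT

00:00:01.000 --> 00:00:04.500
AMBER: Thanks so much, Paola. Thank you, everyone, for tuning in.

00:00:04.500 --> 00:00:07.000
PAOLA: We can see your screen now.
```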
I did see Gary’s comment that speech recognition should be considered as well, which is true. If you are able to test using a tool like Dragon speech recognition and say, “turn on captions for video,” then it should also function in that way. That is definitely more advanced accessibility testing, though. I wouldn’t expect most typical content creators to be able to do that. Is AI the answer? I know these days automated captions can be pretty easily achieved on any platform.
The answer that I have seen, both in looking at accessibility lawsuits and at some laws around the world, is that frequently it is no. In some scenarios, maybe yes, but frequently, no. AI is not going to solve your video accessibility requirements. Why is this? Auto-captions frequently make mistakes. My friend Meryl Evans, who is deaf and always watches TV with captions on, has this hilarious screen capture. I tried to find it this morning, but I couldn’t, though I’m pretty sure she shared it in the talk that I’m going to link you to in just a minute. It shows a famous actress sitting on a couch next to a male interviewer, being interviewed.
Apparently, that actress was trying to say she likes salmon, the fish, but instead the captions said, “I really like semen,” which is hilarious, but also quite awkward, if you can imagine, for that actress or personality, whoever it was. Auto-captions frequently make mistakes, and that is something that you should really be aware of. In particular, think about the content that you are putting on your website: you wouldn’t put typos in the text of your about page.
I don’t know why you would allow typos in your product marketing video, or your video that explains your services, or your customer testimonials. Why would you want that? You want your company name spelled right. You want your name spelled right. You want punctuation to be correct. Really, it’s not going to put your best foot forward. Then, as I mentioned, case law in the United States, at least, says that auto-captions are not sufficient. If you want to learn more about this, you can look up the captioning lawsuits against Harvard and MIT.
They were, for years, putting up hundreds or thousands of videos on YouTube of lectures from classes as part of a way to freely disseminate information, which is a fabulous initiative. Many of them in the beginning had no captions. Then YouTube improved, and they all had auto-captions. The auto-captions made mistakes. What happened was they got sued, and they were told, either you can put corrected captions on all of your media or you can take it offline. They chose to take it offline. That is a thing to be aware of: most laws around the world and most court cases don’t consider auto-captions sufficient on prerecorded media.
The same applies to live captions. We have a live captioner here today for this webinar from Texas Closed Captioning, which is the vendor that we use, because, obviously, we don’t have the skills to type out all of the captions ourselves. We’ve done that because, in these webinars, people talk about very technical things. Even though Zoom’s live captions can be very good, its accuracy in correctly transcribing someone talking about WordPress or coding or sharing jargon is very low in comparison to what we get when we have a human being here who is listening and able to type corrected captions.
Then the other thing that is worth thinking about with open captions, and I mentioned this briefly with color contrast and text, is that readability can really become a problem. Some open captions that are added to video by AI, where you might say, “put my video in this video editor and just have it generate all the open captions and make them all pretty for me,” or the karaoke-style highlighting, can cause flashing, which can be problematic for people with photosensitivity. They can have color contrast issues for people with low vision.
Really, you want to use your best judgment as a human being, and if you do generate things with AI, review them and make sure they still meet the criteria. Make sure it’s not going to cause motion sickness or seizures from flashing around, and it’s not going to move so fast that it’s not readable or cram so many words onto the screen that it’s not readable. There are very specific guidelines about what works and what doesn’t work for captions.
On that note, I am not the number one expert on captioning. I have a couple of resources linked in the slides, and we should have put a link to these slides in the chat, but if not, we can put another one up. First, on our website, Meryl Evans, the accessibility professional I mentioned previously, who is also deaf, gave a fabulous presentation called How to Create Accessible Caption Videos for WordPress Sites and Beyond. I highly recommend that presentation. It’s a great place to learn more.
There is also a resource called Guidelines and Best Practices for Captioning Educational Video. This is from the Described and Captioned Media Program, an independent nonprofit organization that provides education on captioning and description. They get into a lot of detail, even how to caption music or certain sounds, all of that information. If you need that, I would look to their guidelines for a standard or best practice.
Then there’s also a presentation that the FCC in the US put together for their video programming accessibility forum on online closed captioning. That is a useful presentation to reference as well. Let’s talk about practical tips for implementation, because it’s very possible that you may be thinking, “Okay, I know I need captions, but this sounds like a lot of work or really expensive. I don’t know how to make it work. And audio description or sign language, what do I do? How do I do that?”
What should you include in your media? At a bare minimum, all pre-recorded videos, so any video you are uploading anywhere on the Internet, need accurate captions. I have highlighted “accurate” because, for all the reasons we’ve already talked about, it’s very important that they be accurate. You should provide transcripts for audio and video, not the extended version with the extra written descriptions, just the transcript. Usually, whatever platform you’re using to generate your captions can very easily export a transcript for you, and then you can provide that.
You should provide written descriptions of important visuals, or, and I don’t have this written up there, do some self-description. You may have noticed during this presentation that at times I’ve said, “On this slide is.” When you create videos, you can describe as you go. If you’re creating just a 30-second marketing video, know that anytime there’s text on the screen, there should also be a voice reading it out, or maybe describing something if it’s really important.
Another one that comes up a lot in WordPress land is tutorial videos where we’re saying, “Okay, now I click here, and then I go here, and then I click on this.” That’s not very helpful. Could any of you picture what I was talking about? No. Instead, while you’re creating those tutorial videos, say, “Now I go to the WordPress admin, I click on Dashboard, and then I click Updates.” By being very specific in the wording that you use when you’re creating videos, you can self-describe as you go so that you don’t need separate audio description.
That said, if there is something very important, consider providing a written blurb that explains the video as well. In that same scenario, if I’m a plugin developer creating a YouTube video that shows how to do a thing in my plugin, I should also have written step-by-step instructions below it on the documentation page on my website. I’m going to tell you right now, and I don’t know if anyone else agrees with me, but the moment all I see is a video for how to do something in a software tutorial, I’m instantly like, “No, thumbs down.” I want to be able to just read it and find what I want. I don’t want to watch a video. Written instructions support a lot of different people, like people with different learning styles, not just someone who is blind.
Then, based upon your budget and audience, you might also consider audio description, which could be a separate track that can be played at the same time, or a separate version of the video that includes more audio description. There are different ways to do that, but you might want audio description if you know that you have a lot of people who are blind or have very low vision, who might be using screen readers or benefit from that. Extended transcripts that include audio description are another nice-to-have that might make sense for your audience, and also sign language interpretation.
Particularly in government, sign language interpretation becomes very important. Sign language is not the same as captions. Many people who are part of the Deaf community, with a capital D, so people who are culturally deaf, as they frequently describe it, may have learned sign language as their first language. It’s almost like if I were a native Spanish speaker and English were my second language. Having that sign language interpretation can be very beneficial to people in those communities.
What does this cost if you want to do this? These are numbers that we or our customers have paid, to give you an idea. Live captioning: $2.50+ per minute. Pre-recorded captioning/transcription: $1.25+ per minute. Sign language interpretation: $150 to $300+ per hour. An interesting thing to be aware of with sign language interpretation is that if it is a long live presentation, not a pre-recorded thing where someone can take breaks as they’re creating the sign language track, you will need two interpreters.
For the WordPress Accessibility Day conference that I run, we have two interpreters for every one-hour session, and they interpret for about 15 minutes, and then they switch so that they can have a break because it’s physically exhausting to do interpretation. It does become a lot more expensive to do that. Audio description can be $15 to $30 per minute, maybe more depending upon what vendor you’re hiring. Now, there’s another price on here, which is DIY transcription software, $12+ per month. Depending upon your budget and your volume of videos, it can be much less expensive to do some of this work yourself. You might just want to buy a software program.
Now, YouTube has a tool that generates a transcript with AI, and then you can go through and edit it. I don’t think theirs is the best, but it’s free. There are definitely tools that are more accurate or have easier correction abilities. Like, “Oh, they misspelled my company name here,” and in the tool we use, Descript, I could say, “Okay, fix it everywhere,” and it will just fix it everywhere. I also think Descript actually learns. It’s like it only did it wrong once and then it never did it wrong again. That is something to maybe think about in your budget as well.
Should you DIY? Here are some questions that I would ask on that front. First of all, is time or money more important to your organization? Because we all know time is also money. Do you have internal interest in creating transcripts or captions? Maybe there’s somebody who thinks it’s interesting or fun to create those for your prerecorded videos. How fast of a turnaround do you need? I’ll talk more in a minute about how we’ve broken all this down, but there are scenarios where it just wouldn’t make sense, because I am running fast and I am making a video today that I want to put on YouTube today.
I don’t have a captioning company on demand that can deliver a video transcript and caption file to me 30 minutes after I create a video and say, “Okay, it’s ready to be captioned now.” As a result, I have to do that work myself because I need it right now. That is something to keep in mind. Then, on hiring a vendor: are there scenarios where auto-captions, or “auto-craptions!” as Meryl likes to call them, which is why I put it there with an exclamation point, are sufficient? There might be cases where, because of budgetary or other reasons, you know you can’t meet that AAA for live videos, for example, and you are going to use auto-captions on a live video.
Then maybe after the fact, when you reshare it, it’s going to have corrected captions, but while you’re live, it’ll be auto-captions, for example. You’ll need to talk about that. What risk are you open to? What is the audience for that live video or that live podcast feed? Would auto-captions in that situation be sufficient? Then, are they certified? This is something really important that I’ve noticed a lot.
There are vendors or freelancers out there, you can find them on Fiverr or Upwork, who say they can do captioning, but you might get very poor quality back, because all they’re really doing is using one of those AI tools and barely correcting it. They don’t know things like what the caption length should be or how to caption certain sounds. If you are going to hire a vendor, I would find one that uses certified captioners, for sure for live, but also potentially for prerecorded, because you’re going to get better quality and it’s going to require less QA on your part.
How do they deliver files or captions? This is just a good question to ask them. Particularly with live captioning, there are different ways to handle this. Right now, our live captioner is typing directly in Zoom. This is my preferred method. The last couple of years for our WordPress Accessibility Day conference, our live captioners were typing in separate software that used an API feed to connect with Zoom. We are going to revisit that this year, because the biggest piece of feedback that we got from people was that the captions were delayed. Adding that separate feed and API connection, instead of just typing directly in Zoom, I think added to that delay.
Thinking about some of those things: what are their methodologies? Are they going to be able to work with you? There are some vendors that are like, “Oh yes, we can just upload them all to a Google Drive folder. You send me the prerecorded videos, we can set up a whole process and do that.” Whatever works for you, but make sure that how they’re going to deliver the files or the live captions will work best for your needs.
Then I would always ask how much accuracy they guarantee in a live setting. I have been in video meetings or webinars that were captioned by a human that were incredibly inaccurate. I mean, really inaccurate. We’ve been really happy with Texas Closed Captioning because of their accuracy. That’s part of why we’ve stuck with them, because I’ve seen some others, and there were a few that we experimented with in the beginning. I would definitely ask, and this goes along with: how experienced are your captioners? Are they certified? How much accuracy do you guarantee? It’s never going to be perfect. There are going to be mistakes, particularly in a live setting.
I do think trying to find a vendor that works for you and that can be more accurate is important. So, what do we do? Our company is committed to accessibility. We try to follow best practices, but we are also a very small entity. We have a small team. We’re not a huge-budget organization. This is what we have done and what has worked for us. We have live captions at the meetup for the most part. You heard a commercial for one of our sponsors at the beginning. These are covered by sponsors. There are occasionally months when we don’t have a sponsor, and we have said we’re committed to having live captions at the meetup, so we will pay for that cost.
We have figured out a way to generally offset it with sponsorships. That’s how we’ve been able to make it affordable for us, and that might be something that works for your organization too, depending upon what your organization is. We pay a vendor to provide corrected captions and transcripts of the meetup. After this meetup is over, Paola will edit the recording and get it down to just the part we want. She sends that video off to a vendor, and they send it back. We do a seven-day turnaround. They can do shorter, but it costs more money.
They send it back in seven days. Well, not the video; I can’t remember if it’s a VTT or an SRT file, but they send us the caption file, which we can upload to YouTube, and a transcript file that we can copy and paste into the website. These are long. They’re 90-minute webinars. They’re sometimes highly technical. For us, it just didn’t make sense to do that internally. What I have generally found is that creating captions myself might take about two times the duration of the actual video file: using AI to generate them, going through, correcting them, making sure all the speakers are labeled, and all of that stuff.
We have decided that it makes the most sense to pay a vendor to do that work for us for these meetups. Occasionally, we have American Sign Language, which I’ve abbreviated to ASL, if it is paid for by a sponsor. We just don’t have the budget for sign language otherwise. I would love to meet that AAA requirement all the time, but unfortunately, it’s not something that we’re able to do. I think that’s the reality sometimes for small organizations. You have to do what makes sense and what you can afford to do. We try very hard to get sponsors, and we occasionally have sign language, but more frequently do not.
All of our podcasts and other marketing content, so any other pre-recorded video that we put out there or upload to our YouTube channel, and our podcast transcripts, all have captions or transcripts that we create in-house. Either myself or Paola or my partner, Chris, will use a tool called Descript to generate the captions. What I like about Descript, as I mentioned before, is that it will correct things throughout, and you can also use it to edit. With the podcast, we record the episode, download the video, put it in Descript, and it will do things like clean up filler words.
Every time I’ve said “um” in this video, that will be in the final recording. In our podcast, we remove those. You can have Descript remove filler words, and it removes them from the transcript, but it also removes them from the video. Very handy; I like it. I don’t feel like it’s that expensive on a monthly basis, and it’s super worth it. For us, this was where we had to draw the line. Yes, it would save us more time if we paid a vendor to transcribe these. But since we found a tool that also helps us a little with video editing, and we knew we weren’t going to pay a vendor to edit our videos because we’re not that fancy yet, it just made more sense for us to do that transcription and captioning ourselves.
That’s pretty much the way any of our little marketing videos work as well. I have started, as Paola mentioned, doing some YouTube and other social media live-streaming. Those all currently use automated captions, because we don’t have a budget for live captions there, but with the goal that anytime one of those videos gets added to our website or promoted heavily beyond the day it was live, we will go back, create corrected captions, and upload those first.
That is how we have approached this as an organization. I figured it might be helpful to talk about that as an example. I have up on the screen the actual budget this year for the WP Accessibility Day conference on the accessibility services side, because I thought it would be interesting. What does this look like for a bigger event, not an incidental “I have a 20-minute video I want to put up” kind of thing?
Before I get into the live captioning services for that event, let me take a step back so everyone has the same basic understanding or framework. It is a one-day, 24-hour straight event. It runs for a full 24 hours with no stop. It has a session every hour, so there are 24 sessions, and it’s only one track. There aren’t eight different sessions happening at the same time. It’s 24 hours straight of video. We decided a couple of years ago, when we made it a nonprofit organization, that we were going to strive for AAA accessibility with everything that we do, not just with the event. We are trying to do that with the website and other things so that we can be a best-practice example of accessibility.
We do live captions, and we do sign language interpretation during the event. Our budget for live captioning services this year is $4,336.50. That’s for 24 hours of live captioning, plus, I don’t know, a 20-minute buffer before the event starts. Sign language interpreters are $6,756.75. We have miscellaneous accessibility vendor fees of $400, because occasionally we need to have additional meetings with the vendors before the event, or practice sessions, just to make sure we’re all on the same page. There’s some other stuff.
We put in a line item there in case we want to ask for those. The vendors are always like, “No, we don’t need these.” As the event organizers, we’re like, “We need them. I know you don’t need them. You’re going to show up and do great, but I need them to feel confident about how everything is working.” It’s one of those things where we said, “Okay, we’re just going to budget for this because it’ll make us feel more comfortable if we have them.” Then we pay that same vendor to do post-event video transcription. That is taking the edited videos and creating those corrected captions and transcripts. That is $1,638.
Now, I have two other items on here, which are neat, and I’m going to explain the dollar amounts that we have budgeted for them. Something we haven’t really talked about today is language accessibility. One thing that you might consider doing, depending upon your audience, is having captions for videos, or transcripts for videos, or both, that have been translated from the native language of the video into the language of your audience.
If you know that you serve a bilingual audience and you only did a webinar in English, but you really want it in Spanish, you can’t necessarily change what the speaker said, but you can provide translated captions on that video. Someone could turn them on and listen to the English, or maybe even mute the English, while reading the Spanish version, which might help them better understand the content if English is their second language.
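Mechanically, translated captions are just additional caption tracks on the same video. A minimal sketch, with placeholder file names:

```html
<!-- One <track> per language; a viewer can play the English audio while
     reading the Spanish captions. File names are placeholders. -->
<video controls>
  <source src="/media/webinar.mp4" type="video/mp4">
  <track kind="captions" src="/media/webinar-en.vtt" srclang="en"
         label="English" default>
  <track kind="captions" src="/media/webinar-es.vtt" srclang="es"
         label="Español">
</video>
```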
We spent a lot of time getting quotes for this from companies that do it professionally. I’m going to tell you, for a one-hour video, it was something like $800. It was really expensive. What we figured out was that we could budget $80 per video for both Spanish and French. We have a goal this year of translating 48 videos into Spanish and French, because we think this is a great way to get more accessibility knowledge out to people who speak those languages natively.
I left this number here because I think it’s interesting and something you might think about, but it is definitely only accurate if you have what I’m going to call volunteers. Basically, we have people who volunteer and are willing to translate for us at this price. You might be able to find people on Upwork or Fiverr who would do it for you at this price, but I’d be a little cautious. If you don’t speak that language, you can’t check what they’re doing. You just have to trust them or get reviews or something, right? If you go to any of the really big media companies, they’re not going to give you a price like this. It’s going to be way higher.
I want to show just a few examples with transcripts. This is one of the things that people always ask us and it comes up a lot when we are doing remediations on websites. People will say, “Where do I put the transcript? Do I really have to have transcripts?” and we always say, “Yes, you should always have a transcript.” “How do I do this on my website in a way that makes it look nice?”
Let me pull out of my slides for a second here and I’m just going to show you a few different examples. On our Accessibility Craft podcast website, it’s not fancy. This is super basic. I built it while I was at a WordCamp a couple years ago.
We just put the transcript. We don’t even have written summaries for these episodes. It’s a goal that we have, but because of time and capacity, we just haven’t done it yet. Basically, we have a media player and we have a transcript just in plain text on the page.
For these, we are not providing timestamps. Timestamps can be very helpful, but you don’t necessarily always have to include them. Because we don’t have a written description of these episodes, somebody might just want to read the transcript, and I don’t know if they actually care about timestamps.
They might. We’ve debated that. Do we want to do timestamps? Then you could be like, “Oh, I read this. I actually want to listen to it instead.” You could go and move the player forward to that point. We haven’t yet, but this is an example. Just put it right on the page.
Another example is on the Equalize Digital website, if you were to look at any of our past recordings. This is The Law of Accessible Websites and Applications with Richard Hunt. We have the YouTube video here, and we have a heading, so people can’t miss it, that says, “Read the Transcript,” nice and big.
We have an accordion that shows and hides the transcript. This is keyboard functional: I can tab to it, and I can use my keyboard to open and close it. We have all kinds of other text on this page, and we don’t necessarily want the literal transcript from the video visible on the page by default. We still want to provide it, so we’ve done it in that collapsible section.
The benefit of doing this is that Google can see all these words too, in either one of these scenarios. Whether the transcript is in an accordion or just visible on the page, it helps Google understand what the post is about. It’s good for SEO.
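The Equalize Digital site uses an accordion block for this, but if you want the same show/hide pattern without a plugin, a minimal native-HTML sketch might look like the following. The details/summary element is keyboard-operable by default, and the transcript text stays in the page source where search engines can index it:

```html
<h2>Read the Transcript</h2>
<details>
  <summary>Show the full transcript</summary>
  <p>&gt;&gt; PAOLA: Welcome, everyone, to WordPress Accessibility Meetup…</p>
  <p>&gt;&gt; AMBER: Thanks so much, Paola…</p>
</details>
```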
There’s a couple of examples on the WordPress Accessibility Day websites that I think are worth looking at. This is our keynote session from 2023, which you could look at if you went to 2023.wpaccessibility.day.
There is a video player. This is using the Able Player WordPress plugin, and we’ve slightly modified how it works for us in our theme. That is a free plugin you can find on wordpress.org. This theme is also open source, so if you wanted to look at what we’re doing with it or search the code for it, you can check that out on the wpa11yday GitHub, that’s our username on GitHub, and find it there.
In this instance, the player has a transcript down below. I’m going to play this video, but I don’t expect anyone to be able to hear it. I’m actually going to just turn the volume way down here. I’ve jumped ahead. There we go. We’ll just mute it. We don’t even need the mute.
The transcript in this player has an option to auto-scroll and to highlight where it is, which can be helpful for some people. I could turn off auto-scroll if I don’t like that feature. The other thing that is interesting is that all of these transcript lines function like buttons.
If I get down here and say, “Oh, I actually want to hear this,” I can click a line in the transcript, and it skips the video forward. This is a synchronized transcript that controls where you are in the video.
You could potentially come here, not play this video at all, read through the transcript and be like, “I was in this session. I remember Jennison saying this thing about what they do at LinkedIn, but I can’t remember exactly what it is,” or I want the exact quote, right?
You could come here and scroll through. Another plus of having it on the page is that I could use my browser search. I could hit Command+F and type the thing I think he said, and it would find it. Then I could jump to that part of the video using the transcript, which is super nice.
This one, for example, has been translated into French. I can change the language in the player to French, and that provides a French transcript that I can reference as an alternative to what is visible on the page.
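For reference, a rough sketch of an Able Player setup with an interactive, multi-language transcript is below. The data attributes are from my reading of the Able Player documentation, so confirm them against the plugin docs before relying on this; file names are placeholders:

```html
<!-- Able Player enhances a standard <video> element. It builds the
     clickable, auto-scrolling transcript from the caption tracks and
     renders it into the named container div. -->
<video id="session-video" data-able-player preload="auto"
       data-transcript-div="transcript-container">
  <source src="/media/keynote.mp4" type="video/mp4">
  <track kind="captions" src="/media/keynote-en.vtt" srclang="en"
         label="English" default>
  <track kind="captions" src="/media/keynote-fr.vtt" srclang="fr"
         label="Français">
</video>
<div id="transcript-container"></div>
```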
If you want to see something with sign language, if you look at the 2024– Oh wait, let me go back, sorry. On the 2023 video, you can see we have our sign language. This is an example of in-stream interpretation. In the bottom right-hand corner of the video presentation, there is an interpreter who is signing. This was edited onto the video before we sent it for transcription and before we uploaded it to YouTube.
On the 2024 website, moving forward, we’re doing something different, which is we’re not putting the sign language on the video. The reason for that is that it actually– Sometimes you might want the sign language to be positioned in a different spot, or you might want the interpreter to be much bigger. Maybe that small little rectangle in the bottom right corner doesn’t work for you.
What we have done with these is provide a way for sign language to open in a different section. It can actually be moved. You can use arrow keys or you can drag and drop it. We have a space where you could position it if you wanted to. We’ve just left that, but I could even say, “Hey, I actually want this to be–“
Let’s see. I’m going to move it. I actually want this to be really big. Wait, I don’t know if you guys can see this, now that I’m thinking about it. It might have opened in a new window, but you can go play around.
>> PAOLA: We can see it.
>> AMBER: Okay, good. You can drag this to be really big. We’ve had people tell us that they have put it on a whole second monitor. It’s full screen, and they have the interpreter full screen. Let me jump ahead in the video.
We can see, for example, that our interpreter is much bigger. This is the other way that you could handle if you wanted to have sign language interpretation. I think this is the direction we’re going because we like that it gives users more control over positioning, size, and all of that stuff.
On the transcript side, there are also some players that are doing a really nice job of embedding them. One that I think in particular has been really improving a lot is Vimeo. I’m on a website for New York eHealth Collaborative. They have this program that they call SHIN-NY and they have a video about it in English, Spanish, Chinese, and Russian.
The text buttons that open these vary: one just looks like the video, while others use both English and the language in its own characters. These are all Vimeo videos, and they all open in modals. It becomes a little more difficult when a video opens in a modal. We didn’t build this site, but we’ve been remediating it.
Do you put the transcript down below? You could maybe, but what happens on a mobile phone? There’s a lot of different conversations there. They happen to use Vimeo, and Vimeo has a feature where transcripts can be included in embeds. It also has a search feature. It also will synchronize and jump forward.
This is a really nice feature that I’ve been enjoying and appreciating a lot in Vimeo that they have been working on. This does work with a keyboard as well. We have keyboard tested it, so that is another option.
Wistia has an API that allows you to do this. YouTube, unfortunately, does not have anything that adds the transcript into the player yet, but you can use Able Player like we were doing on the other one. Those were all YouTube-hosted videos played through Able Player. Again, that’s a free WordPress plugin.
Those were a few examples. I’m going to stop sharing. I am happy to take any questions that people have about how to implement captions, sign language, any of those other things. Of course, you can always reach me if you want. I’m most commonly on X, but I’m getting on Bluesky a lot more. I think I’m just Amber Hinds on Bluesky. LinkedIn or email me or in our Facebook group. There’s lots of ways to connect.
>> PAOLA: Thank you so much, Amber, for your presentation. I’m going to pop back in for the Q&A portion. We did have a few questions come in while you were presenting. We can get started with this one. If I have a video covering a specific topic and it would be difficult to have captions that would make the video accessible, like you show some software and it would be hard to explain all the UI, could you instead have a blog post as an alternative media presenting the same topic in a text-only format, maybe with images including alternative text?
>> AMBER: Yes, you can. If you’re embedding the video on that page, then I would very clearly label the video as a media alternative to the text on the page, so it’s clear that someone who can’t watch the video isn’t missing anything.
You might still get complaints about it not having captions. It probably ought to still have captions for your spoken audio, even if it doesn’t have full description of everything that is being shown.
Wherever that video is, let’s say it’s on your YouTube channel, in the description of the video on YouTube, very high up in the description, hopefully before it collapses, I would put, “Find a written alternative for this video,” and a link to where they can find it on your website.
>> PAOLA: That’s a great answer. Gary asks, “Is there research or documentation that closed captions are preferred over open captions? Just curious.”
>> AMBER: Yes. There is quite a bit of research on that, and generally closed captions have been shown to be preferred by users. Meryl probably has some of that in her talk that she gave that I linked in the slides.
Also, if you go look at the website resource that I provided for the caption and description project, they have some research as well into what– When they talk about their best practices, it is all research-based.
>> PAOLA: Great. Can you clean up the language in the transcript? Very broad question, but you can go ahead and answer that.
>> AMBER: You do not have to caption or transcribe a video exactly word for word. Even if you don’t edit your ums out of a video, you do not need to include those in your transcript, so you definitely can clean it up. Basically, there is what they call a verbatim transcript, which is exactly that: verbatim, including every mistake or filler word that you made.
A clean verbatim is pretty much verbatim, but it’s been cleaned up to not include repeated words or filler words like um, uh, or like. Then there’s a whole other kind of transcription, and, we’ll see how fast I can do this, I feel like I can maybe find that talk that Empire Captions did for us once, where they talked about transcription for– Hold on.
>> PAOLA: I found it already. I posted it in the chat.
>> AMBER: You’re so much faster than me.
>> PAOLA: As soon as you started talking about it, I looked it up.
>> AMBER: Daniel Sommer from Empire Captions, and that’s the vendor that we used for WP Accessibility Day. They do sign language interpretation and a lot of that stuff. They also do some live captioning for students in classrooms.
They have a whole different style of captioning when they’re doing that, where it’s not verbatim. It’s modified in a way that makes it easier for a student to learn. It’s almost like writing notes for the students to a degree. I don’t know, it’s very interesting, but he presented about that.
When he did that presentation, we actually had two captioners that day. We had one doing our normal verbatim or clean verbatim captioning, which is what we’re doing right now. One of his captioners also captioned in that, so you can see the comparison between the two different styles of captions on the same talk. That’s an interesting reference.
>> PAOLA: You can also see those two different transcripts in the link in the talk.
>> AMBER: Yes. The one other thing I would say about cleaning up language, because I’m not certain if this is what the question was referencing, is, for example, you might ask, “If someone says an F word, should I type the F word out?”
This happened to us once in a meetup where we got Zoom-bombed, right before we switched to webinars. That was when we switched to webinars. There were a lot of F-bombs, a lot, and our captioner typed them all out. Not F word, F word, but actually what the word was.
The thing is, when it comes to cleaning up language, if the word can be heard in the spoken audio, you shouldn’t hide it from someone who is deaf. If you’re going to bleep it out in the spoken audio, then of course it would be bleeped out in the transcript or the captions. You should give an equivalent experience.
For example, if our captioner hadn’t started typing random F-bombs in the middle of that, a deaf person wouldn’t have known why I was making a weird face, and the other speaker was making a weird face, and we were all just like, “What do we do?” right?
They would have been confused about what was happening, so it is really important to include that and not– If your video is for adults and it can have adult language in the video, then the caption should also have the same adult language because deaf people aren’t children.
>> PAOLA: Yes. At the end of the day, you want to recreate the same experience for someone who’s reading the presentation as for someone who’s hearing it. Going to the next question, should you always caption lyrics in songs, even when showing the video as a visual example of some idea you’re discussing?
>> AMBER: Yes. Song lyrics should be captioned unless they’re overlaid with spoken audio. In movies, sometimes there’s background music with lyrics, but what’s important is what the speakers are saying. Then the captions might just convey the tone of the music or describe the music playing in the background, but they wouldn’t caption all the words of the lyrics.
>> PAOLA: The next question would be, “I have a client who uploads meeting recordings to SoundCloud. There are no captions. They have no idea how to provide captions and there is no budget for captioning. If captioning cannot be provided for audio-video files, would it be best not to provide the audio recordings at all?”
>> AMBER: Yes. If you’re uploading something to SoundCloud and you are not providing any text alternative, then it would be better to not provide the SoundCloud file from a legal and an ethical standpoint.
Harvard and MIT, which I mentioned earlier, took thousands of videos off their YouTube channels because they said, “We don’t have the budget or the time, the man-hours, to caption all of these lectures.” They took them down.
Now, without knowing this client and what they’re doing, what I would question is: is it for a very specific audience, such that the recordings could be password protected? Maybe they don’t have to be public. If it’s for their members, and they have literally asked all of their members, and their members say, “Hey, we don’t need captions,” then they could avoid posting these publicly.
They could post them on their website in a password-protected area, potentially. There are exceptions in the federal requirements in the United States that say, for example, if a professor is posting something for their students, but not publicly, and they know that their students don’t need captions, then it’s fine.
I think that’s how I would look at it, but they’re really missing out. I think I would potentially talk to them about, if they don’t have budget to pay a vendor, can they do some of that transcription themselves?
Honestly, if I had to choose between nothing and I really needed it up, then this might be a situation where auto captions are okay for a time. It depends on how risk averse they are.
>> PAOLA: Yes. Just a clarification of something that you mentioned earlier, extended transcriptions. What are they?
>> AMBER: Extended transcriptions are transcripts that contain all of the spoken audio and important sounds, plus descriptions of anything important that was shown visually. It’s not just a written summary; it’s a transcript that maybe even has timestamps and is written out in the same order as the media itself.
>> PAOLA: Okay. I do see one question in the chat. “We have some music performance videos that are instruments only. Any guidelines for how to navigate accessibility for that?”
>> AMBER: If it is instrumental only, then you would just start at the beginning of the captions and caption [music] or the type of music. If it’s one song, then you could probably just give the name of the song. If it’s multiple, let’s say it was a whole concert, then you would want to insert a new caption each time the song changes.
You can also describe tone for music if you want. I think it depends on if it’s literally just a video of an orchestra playing or if there’s other things happening where the tone of the music might– Knowing that might help explain the facial expression of the person, right? Like, “It’s scary now,” or something.
I would look at that. Definitely look at that reference for best practices because that’s going to have a whole bunch on captioning and transcribing music.
>> PAOLA: I think with that, that is all the questions that we had in the queue. Thank you, Amber, for your presentation today. It was very last minute, but it was a great presentation regardless. Do you have any closing thoughts and would you want to let everyone know where to find you?
>> AMBER: Sure. I’ll close and say that we are going to try and follow up with Pesha and get her back on the schedule when she is feeling better. I know someone had asked about that. If you have any questions and you want to follow up with us, either our WordPress Accessibility Facebook group, or you can always email meetup@equalizedigital.com. That goes to me and Paola. Thank you, everybody, for coming in and sitting through my impromptu presentation.
>> [01:22:18] [END OF AUDIO]
Links Mentioned
- Audio Description: If Your Eyes Could Speak: Joel Snyder
- Descript
- WP Accessibility Day
- AblePlayer
- Transcription Types for Real-time Settings and Recorded Media: Daniel Sommer
About the Meetup
The WordPress Accessibility Meetup is a global group of WordPress developers, designers, and users interested in building more accessible websites. The meetup meets twice per month for presentations on a variety of topics related to making WordPress websites accessible to people of all abilities. Meetups are held on the 1st Thursday of the month at 10 AM Central/8 AM Pacific and on the 3rd Monday of the month at 7 PM Central/5 PM Pacific.
Learn more about WordPress Accessibility Meetup.
Summarized Session Information
In this session, Amber Hinds provides guidance on meeting accessibility requirements for captions, transcripts, and sign language interpretation. She breaks down the WCAG success criteria related to video and audio content, explaining the differences between pre-recorded and live media and how to ensure compliance at different levels (A, AA, AAA).
Amber walks through methods for testing media alternatives, including how to evaluate transcripts, open and closed captions, and audio descriptions for accuracy. She also addresses the limitations of AI-generated captions and emphasizes the importance of human review.
The session includes practical implementation tips, covering cost-effective strategies for incorporating live captions, transcripts, and sign language interpretation based on budget and audience needs.
By the end of this session, participants will have a clear understanding of how to create and test accessible media content to meet WCAG requirements and improve digital inclusivity.
Session Outline
- WCAG requirements
- How to Test Media Alternatives and Captions
- Practical Tips for Implementation
- Examples
WCAG requirements
Amber Hinds began the presentation by outlining the Web Content Accessibility Guidelines (WCAG) relevant to video and audio content. She explained the distinction between pre-recorded and live content, noting that different accessibility requirements apply depending on whether the media is pre-recorded and shared on platforms like YouTube, Vimeo, or Wistia or whether it is being broadcast live via Zoom or social media streaming.
1.2.1 Audio-only and video-only (prerecorded)
This Level A requirement mandates that all pre-recorded audio and video must have a media alternative unless the audio or video is already a media alternative for text and clearly labeled as such. For example, podcasts must have a transcript available.
1.2.2 Captions (prerecorded)
This is another Level A requirement that states captions must be provided for pre-recorded audio content in synchronized media unless the media is a media alternative for text and clearly labeled as such. This means a video embedded on a website must include captions unless it is solely an audio representation of existing text content.
1.2.3 Audio description or media alternative (prerecorded)
This Level A criterion requires an extended transcript or an audio description for any pre-recorded video content where crucial visual information is conveyed without audio cues. Captions alone are not enough to convey all information to users who are blind or have low vision.
1.2.4 Captions (live)
For live content, 1.2.4 Captions (Live) at Level AA requires captions for all live audio in synchronized media, such as webinars or live streams. Platforms that fail to provide captions, such as the now-defunct Clubhouse, effectively exclude people who are deaf or hard of hearing.
1.2.5 Audio description (prerecorded)
This Level AA requirement states that an audio description must be provided for all pre-recorded video content in synchronized media. This is essential so that crucial visual information can be communicated to users who cannot see the video.
1.2.6 Sign language (prerecorded)
This Level AAA criterion requires that sign language interpretation be provided for all pre-recorded audio content in synchronized media, whether that audio is part of a video or an audio-only file, ensuring accessibility for users who rely on sign language for comprehension.
1.2.8 Media alternative (prerecorded)
This is another Level AAA requirement where an alternative for time-based media is provided for all pre-recorded synchronized media and all pre-recorded video-only media. This means there must be a fully equivalent text alternative or a non-video, non-audio alternative to ensure access to all users.
1.2.9 Audio-only (live)
This is also a Level AAA requirement, mandating an alternative for time-based media that presents equivalent information for live audio-only content.
How to meet WCAG success criteria
| Success Criterion | Level | How to meet |
| --- | --- | --- |
| 1.2.1 Audio-only and Video-only (Prerecorded) | A | Audio description or extended transcript |
| 1.2.2 Captions (Prerecorded) | A | Closed captions or open captions |
| 1.2.3 Audio Description or Media Alternative (Prerecorded) | A | Audio description, extended transcript, or static text (“talking head” videos) |
| 1.2.6 Sign Language (Prerecorded) | AAA | Sign language in the video stream or a separate synchronized video |
| 1.2.8 Media Alternative (Prerecorded) | AAA | Extended transcript |
| 1.2.9 Audio-only (Live) | AAA | Transcript |
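To make the “media alternative for text” exemption in the Level A criteria concrete, here is a minimal HTML sketch of a “talking head” video clearly labeled as an alternative to the on-page text. The headings, file names, and wording are illustrative, not from the presentation:

```html
<!-- A "talking head" video that presents the same content as the page text.
     Because it is clearly labeled as a media alternative for the text,
     it falls under the exemptions in 1.2.1-1.2.3, although captions are
     still good practice for the spoken audio. -->
<article>
  <h2>How to add alt text in WordPress</h2>
  <p>
    The following video is a media alternative for the text on this page.
    It presents the same information in spoken form and nothing more.
  </p>
  <video controls src="alt-text-walkthrough.mp4">
    <track kind="captions" src="alt-text-walkthrough-en.vtt"
           srclang="en" label="English">
  </video>
  <p>The full written version of the content continues below the video…</p>
</article>
```

As Amber notes in the Q&A above, labeling the video this way on the page, and linking back to the written version from the video’s YouTube description, keeps the relationship discoverable in both directions.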
How to test media alternatives and captions
How to test media alternatives for videos
Testing media alternatives involves ensuring that users who cannot see or hear the video still receive all essential information. To do this, one must:
- View the video while referring to the provided alternatives, such as a transcript or audio description.
- Confirm that all spoken dialogue and significant visual elements are captured accurately in the transcript.
- Ensure that if multiple speakers are involved, the transcript properly identifies who is speaking and describes their actions when necessary.
- Check that any provided text alternatives are programmatically accessible, meaning they can be understood and navigated by assistive technologies.
- If alternative versions of the content are hosted on a separate page, verify that links to them are clearly visible and correctly labeled.
WCAG does not require alternative versions to be on the same page, but they must be easily discoverable through adjacent links.
While keeping transcripts on the same page improves user experience and SEO, linking to them from the video is still considered compliant.
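Both patterns can be expressed in plain HTML. The sketch below uses placeholder file names and URLs; the native `<details>` element is one simple way to build the collapsible pattern described under Examples below, not necessarily how the sites mentioned there implement it:

```html
<!-- Video with an easily discoverable transcript. -->
<video controls src="episode-42.mp4">
  <track kind="captions" src="episode-42-en.vtt" srclang="en" label="English">
</video>

<!-- Option 1: transcript on the same page, collapsed by default.
     <details>/<summary> is keyboard-accessible with no JavaScript. -->
<details>
  <summary>Read the transcript</summary>
  <p>>> SPEAKER 1: Welcome, everyone…</p>
</details>

<!-- Option 2: transcript on its own page, linked adjacent to the player. -->
<p><a href="/episode-42-transcript/">Read the transcript for this episode</a></p>
```

Keeping the transcript in the page markup, even collapsed, also makes it searchable with the browser’s find-in-page command and indexable by search engines.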
How to test open captions
Open captions are embedded directly into the video and cannot be turned off. Testing them involves:
- Turning off closed captions, if available, to focus only on open captions.
- Checking that all dialogue and important sounds are visible and accurately transcribed.
- Ensuring proper speaker labeling through explicit name tags or positioning captions near the corresponding speaker.
- Verifying color contrast and readability, ensuring the captions stand out against the background and adhere to AA contrast requirements.
- Avoiding distracting animations or effects, such as excessive text highlighting, which can disorient some viewers.
While stylized captions may be visually appealing for social media, they should prioritize accessibility over design.
How to test closed captions
Closed captions, which can be toggled on or off, are generally preferred for accessibility. When testing closed captions:
- Activate the closed captioning feature in the media player, ensuring it functions using only a keyboard (i.e., users can tab to the “CC” button and press the spacebar or enter to enable captions).
- Watch the video with captions on, verifying that all dialogue and key sounds are accurately transcribed.
- Confirm proper speaker identification, with names or clear formatting, to distinguish different speakers.
- Test synchronization, making sure captions appear at the right time and stay on screen long enough to be read comfortably.
Proper closed captions enhance usability by allowing users to adjust settings according to their needs, such as font size and background opacity.
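For reference, this is roughly what toggleable closed captions look like at the markup level; the file names here are placeholders:

```html
<!-- Native HTML5 closed captions: browsers render a CC control that
     keyboard users can reach with Tab and toggle with Enter or Space. -->
<video controls src="meetup-recording.mp4">
  <track kind="captions" src="meetup-recording-en.vtt"
         srclang="en" label="English" default>
</video>
```

The referenced WebVTT file begins with a WEBVTT header followed by timed cues. Speaker identification is written into the cue text (for example, “>> PAOLA:”) and important non-speech sounds appear in brackets (for example, “[applause]”), which is exactly what the checklist above asks you to verify.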
Is AI the answer?
With advancements in AI-generated captions, these are some things to consider:
- AI-generated captions often contain errors.
- Legal precedent has determined that auto-captions are insufficient. In the Harvard and MIT lawsuits, institutions were required to either provide corrected captions or remove the content entirely.
- Live AI captions, like those in Zoom, struggle with accuracy for technical terminology. For highly specialized topics, relying on a human captioner ensures better comprehension.
- AI captions may lack punctuation and speaker differentiation, making transcripts harder to follow.
While AI can be useful as a starting point, human review and correction are necessary for compliance and professionalism.
Recommended resources for learning
To further improve captioning and transcription skills, several resources were recommended:
- How to Create Accessible Captioned Videos for WordPress Sites & Beyond: Meryl Evans’ presentation that provides in-depth guidance from a deaf accessibility expert.
- “Guidelines and Best Practices for Captioning Educational Video” from the Described and Captioned Media Program
- FCC Video Programming Accessibility Forum – Online Closed Captioning: includes best practices for captioning across different media formats.
Practical tips for implementation
To provide accessible media, these are the key recommendations:
- All pre-recorded videos need accurate captions.
- Provide written descriptions of important visuals or incorporate self-audio descriptions when creating videos.
- For instructional videos, ensure clear verbal instructions accompany on-screen actions (e.g., instead of saying “Click here,” say “Click the Dashboard tab in WordPress”).
- Consider budget constraints when deciding between DIY captioning or hiring professional services.
- Use captioning software like Descript for in-house transcription and editing.
- For live events, hire certified captioners rather than relying on automated live captions, which often struggle with technical terms and jargon.
- Explore sponsorships to cover the cost of live captions or sign language interpretation, as done for the WordPress Accessibility Meetup.
Accessible media costs
Regarding costs, these are estimates based on industry rates:
- Live captioning: $2.50+ per minute
- Pre-recorded captioning/transcription: $1.25+ per minute
- Sign language interpretation: $150–$300+ per hour
- Audio description: $15–$30 per minute
- DIY transcription software: $12+ per month
It’s important to balance budget and accessibility needs. Organizations can prioritize captions and transcripts before considering sign language or audio descriptions.
Examples
Some examples of how transcripts can be incorporated into websites effectively:
- Plain text transcripts: displayed directly on the page, such as on the Accessibility Craft podcast site.
- Collapsible transcript sections: used on the Equalize Digital website, where transcripts are hidden within an expandable section to keep the page clean while making the text searchable by Google.
- Synchronized transcripts in video players: implemented on the WordPress Accessibility Day site using the Able Player plugin, allowing users to click on transcript text to jump to specific points in a video (see the sketch after this list).
- Sign language interpretation options:
  - In-stream interpretation (a small embedded window within the video)
  - A separate movable and resizable video window for user control
- Vimeo’s built-in transcript feature: enables embedded transcripts and a searchable text feature within the video player.
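Several of these examples come together in Able Player. Based on the attributes in Able Player’s documentation (data-youtube-id, data-sign-src, and data-transcript-div), a setup like the WordPress Accessibility Day one might look roughly like the sketch below. The IDs, file names, and video ID are placeholders, and this is an illustration rather than that site’s actual markup; it assumes the Able Player script and styles are already loaded, for example via the free WordPress plugin mentioned in the talk:

```html
<!-- Able Player wrapping a YouTube-hosted video. -->
<video id="meetup-video" data-able-player
       data-youtube-id="YOUTUBE_VIDEO_ID"
       data-sign-src="sign-language-interpretation.mp4"
       data-transcript-div="meetup-transcript">
  <!-- Caption tracks double as the source for the interactive transcript. -->
  <track kind="captions" src="meetup-en.vtt" srclang="en" label="English">
  <track kind="captions" src="meetup-fr.vtt" srclang="fr" label="Français">
</video>

<!-- Able Player renders the clickable, synchronized transcript here;
     data-sign-src adds the separate, draggable sign language window. -->
<div id="meetup-transcript"></div>
```

This is the combination described in the transcript above: users can click transcript text to jump within the video, switch the transcript language by switching caption tracks, and move or resize the sign language window independently of the player.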