NOTE: Much of my writing published here has taken a broad look at the industry with a fairly philosophical tone. This post is going to be super geeky - it's about specific technical systems for live stream video captioning. If it's not your jam, feel free to skim - but you might like the photo at the end. Even if this isn't your cup of tea, I hope you'll come back for my future writing - I'll be jumping between tech-heavy and more general posts.
In late October, I was working with a long-time client on the latest virtual iteration of their recurring live developer conference series. This is a major tech company with a strong commitment to web, product, and event accessibility. I've been involved in event and video accessibility for almost a decade, so having a client who shares that commitment makes them a pleasure to work with. I've been designing live closed caption workflows for this client for about five years; the scope and systems have evolved over time, but their dedication to accessibility in general, and to captioning in particular, is admirable.
For this event broadcast, we were airing a mix of live and pre-recorded content, all of it captioned live by a human stenographer. When designing the streaming and captioning workflow, we had two key criteria: one, we needed redundant internet signal paths from the studio all the way to YouTube, and two, we needed the ability to ingest live closed caption data as CEA-608 or CEA-708 metadata.
Brief definition time - “Closed Captions” are so-called because they are off (closed) by default and the viewer can choose to turn them on. They are metadata that is rendered by the player or viewer’s television. “Open Captions” are an always-on part of the video itself and are not user controllable.
For this event, the requirement was closed captions - open captions were only a fallback.
With YouTube in particular, there are two ways to ingest live captions: broadcast-standard CEA-608/CEA-708 caption metadata in the stream, or YouTube's caption embedding API.
In our prior experience, YouTube's built-in caption embedding API has proven unreliable, with inconsistent display timing, so we chose to use CEA-608 captions instead.
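For context on the road not taken: the embedding API path works by HTTP-POSTing timestamped caption text to a per-stream ingestion URL from YouTube's Live Control Room. Here's a minimal sketch of that pattern in Python - the URL, sequence parameter, and body format reflect my understanding of YouTube's documented POST format, and the caption ID is a placeholder, so treat this as illustrative rather than production code.

```python
# Minimal sketch of pushing one caption cue to YouTube's HTTP caption
# ingestion endpoint (the "caption embedding API" path we decided against).
# CAPTION_URL is the per-stream ingestion URL copied from the Live Control
# Room; the ID below is a placeholder and the format is an assumption
# based on YouTube's documented POST behavior.
import time
import urllib.request

CAPTION_URL = "http://upload.youtube.com/closedcaption?cid=YOUR-CAPTION-ID"
seq = 0  # monotonically increasing sequence number, one per POST


def post_caption(text: str) -> None:
    global seq
    seq += 1
    # Body is a UTC timestamp line followed by the caption text.
    timestamp = time.strftime("%Y-%m-%dT%H:%M:%S.000", time.gmtime())
    body = f"{timestamp}\n{text}\n".encode("utf-8")
    req = urllib.request.Request(
        f"{CAPTION_URL}&seq={seq}",
        data=body,
        headers={"Content-Type": "text/plain"},
        method="POST",
    )
    urllib.request.urlopen(req)


post_caption("Hello and welcome to the keynote.")
```

Even when this works, the display timing is at the mercy of YouTube's player rather than locked to the video frames - which is exactly the inconsistency that pushed us toward CEA-608.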
Regarding the redundant signal paths: YouTube, like many other streaming services and CDNs, offers both a primary and a backup RTMP ingest URL. We fed the primary ingest URL from an encoder on hardline fiber internet for a robust connection, and fed the backup ingest URL from a separate encoder on a cellular LTE connection. That way, if the primary network went out, we could keep streaming and YouTube would automatically fail over to the backup.
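To make the encoder side concrete, here's a rough sketch of pushing one program feed to both ingest points with identical encoder settings, which is what lets the failover be seamless. In the actual workflow these were two separate encoders on two separate ISPs - running both from one machine, as below, would defeat the redundancy - and the ingest URLs, stream key, and input file are placeholders.

```python
# Illustrative only: push the same program feed to YouTube's primary and
# backup RTMP ingest points. In production these ran on two separate
# encoders over two separate ISPs; the stream key and URLs below are
# placeholders.
import subprocess

STREAM_KEY = "xxxx-xxxx-xxxx-xxxx"  # hypothetical
INGESTS = [
    "rtmp://a.rtmp.youtube.com/live2/" + STREAM_KEY,           # primary
    "rtmp://b.rtmp.youtube.com/live2?backup=1/" + STREAM_KEY,  # backup
]

# Identical encoder settings on both paths so YouTube can switch between
# them without a visible change.
procs = [
    subprocess.Popen([
        "ffmpeg", "-re", "-i", "program_feed.mp4",
        "-c:v", "libx264", "-b:v", "6000k", "-g", "60",
        "-c:a", "aac", "-b:a", "128k",
        "-f", "flv", url,
    ])
    for url in INGESTS
]
for p in procs:
    p.wait()
```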
To embed the CEA-608 closed captions in the live stream while maintaining signal and ISP redundancy, I engineered a stream and caption signal path that used two instances of EEG's cloud-based caption encoder, Falcon - one per signal path. Falcon ingests caption data via EEG's iCap protocol, combines an incoming RTMP stream with that caption data, and pushes the result on to the RTMP destinations of choice.
See the signal flow drawing below for more visibility into how we made that possible.
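To complement the drawing, here's the same topology sketched as data, with a quick check of the invariant the whole design hangs on: the two paths share a caption source (the stenographer's iCap feed fans out to both Falcon instances) but no stream components. The labels are descriptive placeholders, not actual product configuration.

```python
# The two parallel signal paths, studio to YouTube. Component names are
# descriptive labels, not EEG or YouTube configuration.
PRIMARY_PATH = [
    "studio encoder A",
    "hardline fiber ISP",
    "Falcon instance 1 (muxes CEA-608 from iCap)",
    "YouTube primary RTMP ingest",
]
BACKUP_PATH = [
    "studio encoder B",
    "cellular LTE ISP",
    "Falcon instance 2 (muxes CEA-608 from iCap)",
    "YouTube backup RTMP ingest",
]

# The stenographer's iCap feed goes to both Falcon instances, so the
# caption source is shared by design - but no stream component is.
shared = set(PRIMARY_PATH) & set(BACKUP_PATH)
assert not shared, f"single point of failure: {shared}"
print("No shared stream components; failover-safe end to end.")
```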
And a bonus photo of my home studio/WFH setup for this event.
Tim Kerbavaz (he/him) is the founder and Technical Director of Talon Entertainment Audio Visual as well as a freelance Technical Producer with over a decade in corporate event production after an early career in live music production management. As both a live event technology professional and a creative geek, Tim serves as an event technology sherpa, guiding clients through production and technology decision-making and delivering events to production bliss.