Post-Production · 15 min read

Closed Captions and Subtitles for Film Delivery: Formats, Standards, and Mistakes to Avoid

*Image: subtitle editing software with caption timing tracks in a post-production workflow*

The Caption File That Blocked a Netflix Release

A post supervisor delivers a completed indie feature to Netflix. Video, audio, and metadata all pass QC. The subtitle file -- provided by a captioning vendor as a standard SRT -- fails compliance. Netflix requires closed captions in a specific TTML format (Timed Text Markup Language) with pixel-based positioning, specific font encoding, and an embedded frame rate declaration. The SRT file has none of those elements. The vendor who provided the SRT insists it is "industry standard." Netflix's QC system disagrees.

Re-delivery takes 11 days. The release date slips. The cost of the corrected captions -- $800 from a vendor who actually understands Netflix's spec -- was not in the budget.

This scenario is extraordinarily common, for two reasons: "subtitles" and "closed captions" are treated interchangeably in production even though they're technically and legally distinct, and platform delivery specifications differ enough that a caption file that works on YouTube will fail on Netflix, Amazon, and most broadcasters.

This post covers the major subtitle and caption formats, the specific requirements of the platforms that matter most to indie filmmakers, and the workflow for creating and converting caption files without introducing timing errors.

Caption format specifications referenced in this post are drawn from the Netflix Partner Help Center Open Content Delivery Specification, the Amazon Prime Video Delivery Specification, the W3C TTML standard (TTML2), and the SMPTE Timed Text standard (SMPTE ST 2052-1).

Closed Captions vs. Subtitles: The Distinction That Matters for Delivery

Closed captions are a transcription of all audible content in the film -- dialogue, sound effects, music descriptions, speaker identification. They're designed for viewers who are Deaf or hard of hearing. In the US, the 21st Century Communications and Video Accessibility Act (CVAA) requires closed captions on internet-distributed video that previously aired on television, and Americans with Disabilities Act (ADA) case law has extended captioning obligations to streaming services more broadly. Streaming platforms comply by requiring closed caption files from distributors.

Subtitles are a translation or transcription of dialogue only, designed for viewers who can hear but don't understand the spoken language -- or for situations where the dialogue is in the same language as the viewer but hard to follow. Subtitles typically don't include sound effect descriptions or music cues.

The practical difference for delivery: Netflix requires closed captions (SDH -- Subtitles for Deaf and Hard of Hearing -- in their terminology) for all English-language content. These must include sound effect and music descriptions. A dialogue-only subtitle file will fail their QC. Most other platforms follow similar requirements for US English content.

Caption and Subtitle Format Reference

| Format | Full Name | Extension | Used By | Positioning | Styling | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| SRT | SubRip Text | .srt | YouTube, Vimeo, most players | Limited | None | Simplest format; no styling or precise positioning |
| VTT | WebVTT | .vtt | Web browsers, YouTube, Vimeo | Yes | Limited | HTML5 video standard; supports positioning and basic CSS |
| SBV | SubViewer | .sbv | YouTube (legacy) | None | None | YouTube-specific legacy format; largely replaced by SRT/VTT |
| TTML | Timed Text Markup Language | .ttml or .xml | Netflix, Amazon, Apple TV+ | Yes (pixel) | Full | W3C standard; supports complex styling and pixel positioning |
| IMSC1 | Internet Media Subtitles and Captions | .ttml | Netflix (preferred) | Yes (pixel) | Full | Subset of TTML; Netflix's preferred spec |
| SCC | Scenarist Closed Captions | .scc | US broadcast, legacy systems | Yes | Limited | Broadcast legacy format; required by some US broadcasters |
| EBU-TT | EBU Timed Text | .xml | BBC, European broadcast | Yes | Full | European broadcast standard; similar to TTML |
| CAP / STL | Various broadcast formats | .cap / .stl | Broadcasters | Yes | Limited | Platform-specific; requires specialist software |

The critical insight from this table: SRT is the most widely readable format but the least capable. Netflix and Amazon don't accept SRT as a final deliverable. They accept TTML or IMSC1. YouTube accepts SRT as the simplest option for upload but supports VTT for better control. A caption workflow that produces only SRT will fail on professional distribution platforms.
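To make the capability gap concrete, here is the same hypothetical cue in SRT and in WebVTT. SRT carries only a sequence number, timing (with a comma as the decimal separator), and text:

```
1
00:01:22,500 --> 00:01:25,000
[DOOR SLAMS]
Where were you last night?
```

WebVTT adds a file header, period-separated milliseconds, and optional cue settings for positioning and alignment:

```
WEBVTT

00:01:22.500 --> 00:01:25.000 line:85% align:center
[DOOR SLAMS]
Where were you last night?
```

Neither sample carries the frame rate declarations, font metadata, or pixel positioning that TTML-based platform deliverables require, which is exactly why SRT alone is a dead end for Netflix or Amazon delivery.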

Three Real-World Caption Delivery Scenarios

Example 1: Short Film Festival Circuit

A 34-minute documentary short submitting to 25 festivals including Sundance, SXSW, and DOC NYC. The filmmaker used a transcription service to create captions and received an SRT file.

Platform requirement: Most festival submission platforms (FilmFreeway, Cinando, and formerly Withoutabox) accept SRT files for the submission screener. Festivals that screen digitally typically use their own caption system and may not use the filmmaker's file at all for in-theater screening. However, a DCP submission requires burned-in subtitles (hard subtitles encoded into the image) for a captioned theatrical screening, not a separate file.

Action taken: The filmmaker kept the SRT for screener submissions. For the DCP, the post house burned the captions into the picture as hard subtitles using DaVinci Resolve. The SRT was also converted to VTT using the Captions Converter for Vimeo private screener links where programmers requested an accessible version.

Key lesson: SRT is sufficient for the festival circuit screener process. The DCP theatrical version requires hard subtitles burned at the DCP stage -- not a separate caption file -- unless the DCP is being screened with a specialized caption system like Sony's Access Caption system.

Example 2: Feature Film, Netflix Delivery

A 102-minute English-language feature delivering to Netflix. The post supervisor commissioned closed captions from a broadcast captioning house expecting standard output. The vendor delivered an SCC file (Scenarist Closed Captions, the legacy broadcast format).

Netflix requirement: Netflix accepts IMSC1 (TTML) with specific attributes including: xml:lang metadata, tts:fontFamily declarations, tts:fontSize values, pixel-based tts:origin and tts:extent attributes for positioning, and a ttp:frameRate declaration matching the video frame rate.
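As a sketch of what those attributes look like in place, here is a minimal TTML/IMSC1-style skeleton. This is illustrative only -- the element values are invented and it would not pass Netflix QC as-is; the exact required values come from the current Netflix Partner Help Center spec:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:tts="http://www.w3.org/ns/ttml#styling"
    xmlns:ttp="http://www.w3.org/ns/ttml#parameter"
    xml:lang="en"
    ttp:frameRate="24" ttp:frameRateMultiplier="1000 1001"
    tts:extent="1920px 1080px">
  <head>
    <layout>
      <!-- Pixel-based region: origin is the top-left corner, extent the size -->
      <region xml:id="bottom"
              tts:origin="192px 864px" tts:extent="1536px 150px"/>
    </layout>
  </head>
  <body>
    <div>
      <p begin="00:00:05:00" end="00:00:07:12" region="bottom"
         tts:fontFamily="proportionalSansSerif" tts:fontSize="100%">
        [DOOR SLAMS]<br/>Where were you last night?
      </p>
    </div>
  </body>
</tt>
```

Note the elements an SRT file simply has nowhere to store: the xml:lang tag, the frame rate declaration (24000/1001 = 23.976fps here), and the pixel-based region geometry.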

Conversion process: The SCC file was converted to TTML using EZTitles software. The conversion required manual verification of: timing accuracy (SCC stores timing as frames; the conversion must correctly handle 29.97fps drop-frame vs. non-drop-frame), sound effect and music cue descriptions (the original SCC was dialogue-only and required 2.5 hours of additional work to add SDH content), and positioning (default centered positioning was acceptable for Netflix's spec, but side positioning for two-speaker scenes required manual adjustment).
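The drop-frame subtlety in particular is easy to get wrong in conversion. A minimal sketch of what a converter has to do with a 29.97fps drop-frame SCC timecode (HH:MM:SS;FF) -- the function name is hypothetical, not part of any conversion tool:

```python
def df_timecode_to_ms(tc: str) -> float:
    """Convert a 29.97fps drop-frame timecode (HH:MM:SS;FF) to
    real elapsed milliseconds, as a TTML conversion must."""
    hh, mm, rest = tc.split(":")
    ss, ff = rest.split(";")
    hh, mm, ss, ff = int(hh), int(mm), int(ss), int(ff)
    total_minutes = hh * 60 + mm
    # Drop-frame timecode skips frame labels 0 and 1 at the start of
    # every minute except each 10th minute; subtract the skipped labels.
    dropped = 2 * (total_minutes - total_minutes // 10)
    frame = (hh * 3600 + mm * 60 + ss) * 30 + ff - dropped
    return frame * 1001 / 30  # one frame lasts 1001/30 ms at 29.97fps
```

A converter that treats the same timecodes as non-drop-frame drifts by roughly 3.6 seconds per hour of program -- the kind of error that only surfaces in real-time QC playback.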

Total additional cost: $1,400 in vendor labor beyond the original SCC deliverable. The lesson: specify the exact platform format requirement to your captioning vendor before commissioning work, not after receiving a non-compliant file.

Example 3: Multi-Language Documentary, Amazon Prime Video

A 76-minute documentary acquiring Amazon Prime Video rights in five territories: US, UK, Canada, France, and Germany. Each territory requires a separate subtitle file -- English SDH, English for D/HH (UK variant), French translation, French SDH, German translation, and German SDH.

Amazon requirement: Amazon Prime Video accepts TTML for all subtitle and caption deliverables, plus SRT as a fallback for some territories. Each file must include the correct language tag in the xml:lang metadata.

Workflow: The production hired a localization company that works natively in TTML. Translation files were built in TTML from the start rather than being translated to SRT and then converted up -- avoiding the double-conversion timing errors that accumulate when SRT timestamps are parsed and re-encoded. All six files were QC'd against the locked picture in the localization company's preview tool before delivery.

Total caption cost: Approximately $4,200 for all six files across five territories. This was budgeted in the post-production budget from the start, based on a per-minute rate of approximately $12-18 per language per minute for TTML deliverables.

Step-by-Step: Caption Workflow from Transcript to Delivery

Step 1: Create your master transcript before editing is locked. A verbatim transcript of all dialogue is the source document for all caption work. Generate it during the editing process -- not after -- by having the editor or a transcription service (Rev, Verbit, Otter.ai) create a time-stamped transcript from your cut. This transcript is then timed to picture in the caption session.

Step 2: Choose your caption format based on primary delivery platform before commissioning caption work. If Netflix is the primary distributor, specify IMSC1 TTML from the start. If Amazon, specify TTML. If the delivery is YouTube-only, SRT or VTT is sufficient. If broadcast delivery is planned, confirm with the broadcaster which format they accept -- SCC for US broadcast legacy systems, EBU-TT for European broadcasters.

Step 3: Time captions to locked picture, not a rough cut. Caption timing is frame-accurate. Any change to the edit after captions are timed -- even a single frame trim -- requires the captioning vendor to re-time every caption after the trim point. Commission caption work only after picture lock is confirmed.

Step 4: Verify that the caption file includes sound effect and music descriptions if SDH is required. A standard subtitle file for a hearing audience includes only dialogue. An SDH or closed caption file must also include: significant sound effects ("[GUNSHOT]", "[THUNDER]"), music descriptions ("[TENSE ORCHESTRAL SCORE]"), and speaker identification when speakers are off-screen or unclear. Review the complete file against the picture before delivery -- not just the dialogue lines.
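As a short illustration (content invented for the example), SDH entries in SRT-style form carry non-dialogue information that a plain subtitle file omits:

```
14
00:03:12,400 --> 00:03:14,100
[THUNDER RUMBLING]

15
00:03:15,000 --> 00:03:17,800
MARGARET: We need to leave. Now.
```

Cue 14 and the speaker identification in cue 15 are exactly what a dialogue-only subtitle vendor leaves out -- and what a platform SDH QC pass checks for.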

Step 5: Convert formats using the [Captions Converter](/tools/captions-converter) with frame rate verification. When converting between formats (SRT to VTT, SRT to TTML), confirm that the source file's frame rate matches the target frame rate. An SRT file created from a 25fps edit must be re-timed if the delivery version is at 23.976fps. The Captions Converter handles common frame rate conversions automatically, but verify by spot-checking 3-4 timing values against the picture on playback after conversion.
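The 25fps-to-23.976fps re-timing in this step can be sketched in a few lines, assuming a straight speed conform (the same frames played at a different rate) and hypothetical helper names:

```python
import re

def srt_ms(ts: str) -> int:
    """Parse an SRT timestamp 'HH:MM:SS,mmm' into milliseconds."""
    h, m, s, ms = map(int, re.match(r"(\d+):(\d+):(\d+),(\d+)", ts).groups())
    return ((h * 60 + m) * 60 + s) * 1000 + ms

def ms_srt(ms: int) -> str:
    """Format milliseconds back to an SRT timestamp."""
    h, rem = divmod(ms, 3600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def retime(ts: str, src_fps: float = 25.0,
           dst_fps: float = 24000 / 1001) -> str:
    """Rescale an SRT timestamp for a speed-conformed delivery,
    e.g. a 25fps PAL edit delivered at 23.976fps."""
    return ms_srt(round(srt_ms(ts) * src_fps / dst_fps))
```

The scale factor is src_fps / dst_fps, so a cue at 10 minutes in the 25fps edit lands about 25 seconds later in the 23.976fps delivery -- which is why simply copying timestamps across frame rates produces captions that drift visibly out of sync.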

Step 6: QC the final file against the locked picture in real-time playback. Import the final caption file into your NLE or a media player that supports your caption format (VLC supports SRT and VTT; Resolve supports TTML and SRT via the subtitle track). Play through the entire program with captions active. Confirm that every caption appears at the correct frame, reads at a comfortable pace (typically 17-21 characters per second for SDH), and disappears before the next caption appears.
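The reading-pace part of this QC pass can be automated before the real-time viewing. A minimal sketch using the ~21 cps upper bound mentioned above (the function name is illustrative):

```python
def reading_speed_ok(text: str, start_ms: int, end_ms: int,
                     max_cps: float = 21.0) -> bool:
    """Check a caption's reading speed in characters per second.
    Counts visible characters (line breaks excluded), per the common
    SDH guideline of roughly 17-21 cps."""
    chars = len(text.replace("\n", ""))
    duration_s = (end_ms - start_ms) / 1000
    return duration_s > 0 and chars / duration_s <= max_cps
```

Running a check like this over every cue before the playback pass means the real-time QC can focus on timing against picture rather than arithmetic.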

Pro Tips and Common Mistakes

Pro Tip: For a production that will deliver to multiple platforms, create a master TTML file and derive SRT/VTT versions from it. TTML to SRT conversion is lossless for timing data -- the conversion strips styling and positioning but preserves the text and timestamps. SRT to TTML conversion requires adding formatting attributes that aren't present in the source. Working from the richer format downward is safer than working from the minimal format upward.

Pro Tip: Netflix's caption spec requires a minimum caption duration of 5/6 of a second (exactly 20 frames at 24fps) and a maximum of 7 seconds. Very rapid dialogue may produce captions shorter than 20 frames. Netflix's QC will flag these as violations. When commissioning captions for Netflix, specify these minimum and maximum duration rules explicitly to the captioning vendor. Most professional vendors know this; smaller vendors may not.
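These duration rules are easy to pre-check before delivery. A sketch, assuming the 5/6-second minimum and 7-second maximum stated above and a simple list-of-tuples representation of the cue timings:

```python
MIN_MS = 5 / 6 * 1000   # minimum event duration: 5/6 second
MAX_MS = 7000           # maximum event duration: 7 seconds

def duration_violations(events):
    """Flag caption events outside the assumed duration limits.
    `events` is a list of (index, start_ms, end_ms) tuples."""
    flags = []
    for idx, start, end in events:
        d = end - start
        if d < MIN_MS:
            flags.append((idx, "too short", d))
        elif d > MAX_MS:
            flags.append((idx, "too long", d))
    return flags
```

Handing a vendor a list of flagged cue indices is far cheaper than discovering the same list in a platform QC rejection notice.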

Pro Tip: Build caption delivery cost into your post-production budget as a percentage of total post spend. A rough guide: English SDH captions for a feature typically cost $400-900 depending on the format and vendor. Each additional language adds $500-1,500 for translation plus captioning. A film delivering to five territories in three languages should budget $4,000-8,000 for caption and subtitle work as a standard post line item, not an afterthought.

Common Mistake: Submitting SRT files to Netflix or Amazon because "subtitles are subtitles." Both platforms have automated QC systems that reject non-compliant formats at intake. The rejection notification arrives with a list of errors that can be opaque to anyone unfamiliar with TTML specifications. The fix requires a specialist vendor, not just a format conversion tool. Specify the correct format in your deliverables checklist before production ends and the post budget is closed.

Common Mistake: Using video-player burned-in captions (where captions are encoded into the video image rather than delivered as a separate text file) for streaming platform delivery. Burned-in captions cannot be turned off by the viewer, cannot be repositioned for accessibility, and do not meet ADA or streaming platform closed caption compliance requirements. Burned-in subtitles are appropriate only for: DCP theatrical screenings where a separate caption system is unavailable, and foreign-language subtitles in film markets where on-screen text is the expected format. For streaming delivery, always deliver caption data as a separate text-based file.

Frequently Asked Questions

What's the difference between SDH subtitles and closed captions?

SDH (Subtitles for Deaf and Hard of Hearing) and closed captions serve the same audience and contain the same content -- dialogue, sound effects, and music cues. The technical difference is in the delivery mechanism: traditional closed captions (as defined by the EIA-608 and CEA-708 standards) are encoded as data in the video signal's blanking interval and can be decoded by a television's built-in caption decoder. SDH subtitles are text-based overlay files delivered separately from the video, displayed by the media player's subtitle engine. For streaming platform delivery, both terms refer to the same deliverable: a text-based file containing full transcription including non-dialogue audio information.

Why does converting from SRT to TTML sometimes introduce timing errors?

SRT stores timing as HH:MM:SS,mmm (hours, minutes, seconds, milliseconds). TTML can store timing as either clock time or frame-accurate timecode. When an SRT created from a 25fps edit is converted to TTML for a 23.976fps delivery, the millisecond values don't convert to whole frames at 23.976fps -- there's a small rounding error at each caption entry. Over 1,500 captions in a feature film, these rounding errors accumulate. The safe workflow is to have captions created natively at the correct frame rate from the start, or to use a conversion tool that handles the 25fps-to-23.976fps remapping explicitly rather than a simple format conversion.
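A worked example of that rounding: a cue that sits exactly on a 25fps frame boundary does not sit on a 23.976fps frame boundary, so snapping it to the nearest target frame moves it:

```python
FPS = 24000 / 1001  # 23.976fps

def snap_to_frame(ms: float) -> float:
    """Round a millisecond value to the nearest 23.976fps frame boundary."""
    return round(ms * FPS / 1000) * 1000 / FPS

# Frame 125 of a 25fps edit starts at exactly 5000 ms, which falls
# between frames 119 and 120 at 23.976fps; snapping moves the cue
# by about 5 ms. Per-cue errors like this, compounded across a
# feature's worth of captions, are what playback QC catches.
shifted = snap_to_frame(5000.0)  # roughly 5005.0 ms
```

This is why native-frame-rate caption creation, or an explicit remap, beats a naive millisecond-preserving format conversion.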

How many characters per line should a subtitle have?

The broadcast and streaming industry standard is a maximum of 32 characters per line for most delivery formats, with two lines maximum per caption entry. Netflix's spec allows up to 42 characters per line in some cases. Reading comfort research suggests that at 17-20 characters per second reading speed, a 32-character line lasting 1.6 seconds is at the boundary of comfortable reading. For captions targeting wide audiences including lower literacy viewers, 28-32 characters per line at a 14-character-per-second reading speed is safer. Always prioritize readability over caption efficiency.

What software do professional captioners use?

Professional broadcast captioners typically work in Lexi, CaptionMaker, EZTitles, or Cavena. For streaming platform deliverables, Ooona and EZTitles both support TTML output to Netflix and Amazon specs. For independent filmmakers doing their own captions, DaVinci Resolve has a built-in subtitle track that exports SRT, WebVTT, and basic TTML. Aegisub is a free, cross-platform tool popular in the subtitling community that supports SRT, ASS/SSA (used for anime), and basic conversions. The Captions Converter handles the common format conversions (SRT to VTT, SRT to TTML) without requiring dedicated caption software.

When do I need to deliver captions in multiple languages?

Caption requirements vary by territory: US distribution requires English closed captions (ADA and CVAA compliance). UK distribution requires English SDH under Ofcom accessibility rules. French distribution requires French captions (a legal requirement under CSA -- now ARCOM -- regulations). German captioning is required for publicly accessible content in Germany. Most streaming platform agreements specify which territories require localized captions as part of the deliverables package. If your distribution deal covers multiple territories, review the deliverables appendix carefully -- missing a required language is a contractual breach, not just a QC issue.

The Captions Converter converts between SRT, VTT, and TTML formats with frame rate handling, letting you produce platform-compliant files from a single master source without specialist caption software. For understanding the full post-production deliverables chain of which captions are one part, the post on audio delivery standards for film and television covers the parallel requirements for audio file formats, LUFS targets, and channel configurations. The Post Production Timeline Estimator helps you schedule the captioning step correctly -- after picture lock but with enough lead time before your delivery deadline to allow for QC and corrections.

Conclusion

Caption format compliance is not a technical nicety -- it's a legal requirement in the US and a contractual requirement in virtually every distribution deal. The cost of getting it wrong is a delayed release and an unbudgeted re-delivery. The cost of getting it right is a line item in the post-production budget and a clear format specification in the captioning vendor brief.

This post covers single-language and multi-language text-based caption files for digital delivery. Audio description tracks (a separate narration channel for visually impaired viewers) are required by some broadcasters and larger streaming platforms and involve an entirely different workflow beyond the scope of this guide.

Have you had a caption file fail QC at a major platform, and was the issue a format problem, a timing problem, or a content compliance issue?