DvdSubML draft specification
My original subtitle format looks a bit like HTML, so not surprisingly someone suggested that I go all the way and come up with a proper XML-based subtitle format. The specification below is the result. It's quite a bit cleaner than my original format, and to my surprise it's also nearly as terse; lack of terseness was the main reason I didn't go XML in the first place.
This spec is currently not implemented, and before I implement it I'd like to get some feedback from people who are interested in writing or reading this format. Please use the form at the bottom of the page. Thanks!
Here's an example of a DvdSubML document: [I need to come up with a better name...]
<?xml version="1.0"?>
<dvdsub>
<styles subaspect="4:3">
<font name="dlg" face="Arial" size="27" weight="bold" spacing="-3"/>
<font name="dlg_i" face="Arial" size="27" weight="bold" spacing="-3" italic="yes"/>
<color name="dlg" text="#FFFFF00" halo="#F000000" highlight="#F00FF00"/>
<box name="dlg" left="64" right="656" top="40" bottom="440" align="2L"/>
<stream name="main" lang="en" id="21" vts="02" basefield="0" scale="24to25"/>
</styles>
<body stream="main">
<div font="dlg" color="dlg" box="dlg">
<p on="0000" off="1000">This is an auto-wrapped line of dialogue.</p>
<v on="1000" fadeout="1200" off="1300">
<l>This line of dialog is</l>
<l><i>not</i> auto-wrapped,</l>
<l>and will fade away.</l>
</v>
</div>
</body>
</dvdsub>
A document consists of one or more styles elements, which define font faces, colors, and the like, followed by a body element which contains text to be rendered and timing information.
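Since nothing here is implemented yet, the following Python sketch is only an illustration of the top-level layout just described: a dvdsub root holding one or more styles elements followed by a body. The function name check_top_level and the wording of the messages are made up for this example, and treating "a body element" as "exactly one body" is an assumption.

```python
import xml.etree.ElementTree as ET


def check_top_level(xml_text):
    """Return a list of layout problems; an empty list means none were found."""
    root = ET.fromstring(xml_text)
    if root.tag != "dvdsub":
        return ["root element is not <dvdsub>"]

    tags = [child.tag for child in root]
    problems = []

    # "one or more styles elements"
    if tags.count("styles") < 1:
        problems.append("expected at least one <styles> element")

    # "followed by a body element" (assumed here to mean exactly one)
    if tags.count("body") != 1:
        problems.append("expected exactly one <body> element")
    elif tags[-1] != "body":
        problems.append("<body> must come after every <styles> element")

    # Anything else at the top level is not mentioned in the draft.
    for tag in sorted(set(tags) - {"styles", "body"}):
        problems.append("unexpected top-level element <%s>" % tag)

    return problems
```

Calling check_top_level on the example document above should return an empty list, since it has one styles element followed by one body.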
- A styles element contains zero or more font, color, box, and stream subelements. It does not contain any character data. The subelements are empty and (therefore) cannot be nested.
- All subelements of styles have a required attribute name, which gives the style a name that can later be used to refer to it. The different style types live in different namespaces, so you can declare a font style and a color style with the same name, for example.
- A font element declares a font style. It has two required attributes besides name: face, which specifies the typeface, and size, which specifies the font size in pixels. The optional attributes are weight (which if specified must be "bold"), italic (which if specified must be "yes"), and spacing (which behaves like the linespacing directive in the original spec).
- A color element declares a color style. You can specify any combination of text, halo, and highlight colors. The text color is used to draw the text itself; the halo color is used for the border around the text; and the highlight color is used for karaoke highlighting (see below). If any color is omitted it defaults to transparent. [Should I be using the leading "#", given that the color format isn't really the same as HTML's?]
- A box element declares a text placement style; see the textbox directive in the original spec. An omitted boundary defaults to the corresponding edge of the frame: 0 for left and top, 720 for right, and 480 (NTSC) or 576 (PAL) for bottom. The alignment cannot be omitted. [Should alignment be specified here, or somewhere else?]
- A stream element declares a subpicture stream "style". The attributes are:
- lang (optional): Specifies the subpicture language as a two-letter ISO 639 language code (for example "en"). If omitted the language is unspecified.
- id (optional): Specifies the ID of an existing subpicture stream to be replaced. If it is omitted a new stream will be added to the existing streams.
- vts (required): Specifies the video title set.
- angle (optional): Specifies the angle to which subtitles will be applied in a multi-angle DVD. If omitted the default is 1.
- basefield (optional): Specifies an offset in video fields from the beginning of the title set to the field which will be considered as number 0. If omitted the default is 0, natch.
- scale (optional): Specifies a scaling factor to be applied to the video field numbers. (The scaling factor is applied before the basefield offset is added.) Possible values are:
- 24to25: Converts 23.976 fps NTSC film offsets to 25 fps PAL film offsets (by multiplying by 4/5).
- 60to50: Converts 59.94 fields/sec NTSC video to 50 fields/sec PAL video (by multiplying by 1001/1200).
- 25to24 and 50to60: the reciprocals of those (multiplying by 5/4 and 1200/1001, respectively). [Should arbitrary scaling factors be allowed?]
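To make the order of operations for scale and basefield concrete, here is a small Python sketch of the field-number arithmetic. The function name to_disc_field and the SCALE_FACTORS table are hypothetical, and rounding the scaled value to the nearest whole field is an assumption; the draft does not say how fractional results should be handled.

```python
from fractions import Fraction

# Factors taken from the scale values listed above; the last two are
# simply the reciprocals of the first two.
SCALE_FACTORS = {
    "24to25": Fraction(4, 5),
    "60to50": Fraction(1001, 1200),
    "25to24": Fraction(5, 4),
    "50to60": Fraction(1200, 1001),
}


def to_disc_field(field, scale=None, basefield=0):
    """Map a field number from the subtitle file to a field in the title set.

    The scaling factor is applied first, then the basefield offset is added.
    """
    factor = SCALE_FACTORS[scale] if scale else Fraction(1)
    # Assumption: round to the nearest field (Python rounds exact halves
    # to the nearest even integer).
    return round(field * factor) + basefield
```

With the stream from the example document (scale="24to25", basefield="0"), to_disc_field(1000, "24to25") gives 800, so a subtitle timed at field 1000 of the NTSC film transfer would land on field 800 of the PAL one.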
- If there is more than one styles element, the attributes of each element are used to choose the most appropriate one, and the styles in the others are ignored. The attributes are:
- subaspect: Specifies the aspect ratio of the subpicture overlay, either 16:9 or 4:3. This is not necessarily the same as the aspect ratio of the video stream. For example, when displaying 16:9 video on a 4:3 screen, some players apply the subpictures before adding the black bars (subaspect="16:9") and others apply them after (subaspect="4:3").
- [I haven't really hashed this part out yet. What other attributes should there be? It should be possible to make a single subpicture file work automatically on different DVD releases of the same movie. What else? Hearing impaired/non? How should the user choose a style set when that's necessary?]
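The rule that each style type has its own namespace can be pictured as one lookup table per type. The Python sketch below builds those tables from a single styles element; load_styles and STYLE_KINDS are names invented for this illustration, attribute values are kept as raw strings, and letting a repeated name overwrite the earlier declaration is an assumption, since the draft does not say what should happen in that case.

```python
import xml.etree.ElementTree as ET

STYLE_KINDS = ("font", "color", "box", "stream")


def load_styles(styles_elem):
    """Return {kind: {name: attributes}} for one <styles> element."""
    tables = {kind: {} for kind in STYLE_KINDS}
    for child in styles_elem:
        if child.tag not in tables:
            raise ValueError("unknown style element <%s>" % child.tag)
        attrs = dict(child.attrib)
        name = attrs.pop("name", None)
        if name is None:
            raise ValueError("<%s> is missing the required name attribute" % child.tag)
        # Separate tables mean a font and a color can share a name.
        # Assumption: a repeated name within one kind replaces the earlier one.
        tables[child.tag][name] = attrs
    return tables
```

For the example document, load_styles would put "dlg" in the font, color, and box tables without any conflict, and "main" in the stream table.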
- The body element contains a description of what text to render and when. It contains zero or more div, p, or v elements.
- The body element and all its subelements may have any of the following attributes where they make sense (the box attribute would not make sense on an l element, for example):
- on and off specify the first field at which the affected text will become visible and the first field at which it will no longer be visible, respectively.
- fadein specifies the first field at which the affected text will be fully visible; if this is specified the text will smoothly fade in between the on and fadein times.
- fadeout specifies the first field at which the affected text will no longer be fully visible; if this is specified the text will smoothly fade out between the fadeout and off times.
- highlight and unhighlight specify the field at which the text color will be replaced by the highlight color and the field at which the original color will be restored. This is intended for highlighting in karaoke subtitles. [Does anyone actually need/want this? It's easy enough to implement.]
- font, color, box, and stream set the corresponding styles by reference to a name declared in the styles section. New styles cannot be created in the body section.
- These attributes are inherited by all subelements unless overridden. The box attribute is "consumed" at the p or v level, while the others apply to individual characters of text. (Yes, even stream. I'm not suggesting that this is useful.)
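As an illustration of how on, fadein, fadeout, and off fit together, here is a Python sketch that computes an opacity for a given field. The draft only says the fade is "smooth", so the linear ramp is an assumption, as are the function name opacity and the 0.0 to 1.0 scale. The ramp is arranged so that the text is already partly visible at on and already partly faded at fadeout, matching the "first field at which" wording of the definitions.

```python
def opacity(field, on, off, fadein=None, fadeout=None):
    """Return the opacity (0.0 to 1.0) of text at the given field number."""
    # Not yet visible, or no longer visible.
    if field < on or field >= off:
        return 0.0
    # Fading in: partly visible at `on`, fully visible from `fadein`.
    if fadein is not None and field < fadein:
        return (field - on + 1) / (fadein - on + 1)
    # Fading out: no longer fully visible from `fadeout`, gone at `off`.
    if fadeout is not None and field >= fadeout:
        return (off - field) / (off - fadeout + 1)
    return 1.0
```

For the v element in the example (on="1000" fadeout="1200" off="1300"), this gives full opacity from field 1000 through 1199, a steady decrease from field 1200 onward, and zero at field 1300.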
- The subelements of body are:
- A div element is simply a container for more div, p, or v elements; its purpose is to define inheritable attributes (see above).
- A p element contains character data which will be wrapped if necessary to fit in the width of the screen. Along with character data it can contain span and i elements. The contents of different p elements are rendered independently.
- A span element is simply a container with attributes, like div. It can contain character data and span and i elements.
- i is like span, except that it adds "_i" to the name of the current font style if it didn't have it already, or removes it if it did. This is just a shorthand for what is likely to be the most common use of span -- namely, italicizing or de-italicizing part of a line of text.
- v contains one or more l elements, each of which is rendered on its own line. The contents of l elements are not auto-wrapped. They may contain span and i elements, just like the contents of p. ["v" comes from "verse", as in poetry, which typically has hard line breaks. Anyone have any better naming ideas for these one-letter tags?]
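The inheritance rule and the i shorthand can be sketched together as a walk over the body tree that yields each run of character data with its effective attributes. This is a rough Python illustration, not a renderer: text_runs, toggle_italic, and INHERITED are hypothetical names, whitespace-only text between tags is simply skipped, and letting an explicit font attribute on an i element win over the toggled name is an assumption the draft does not address.

```python
import xml.etree.ElementTree as ET

INHERITED = ("on", "off", "fadein", "fadeout", "highlight", "unhighlight",
             "font", "color", "box", "stream")


def toggle_italic(font_name):
    """Add the "_i" suffix to a font style name, or strip it if present."""
    if font_name.endswith("_i"):
        return font_name[:-2]
    return font_name + "_i"


def text_runs(elem, inherited=None):
    """Yield (text, attrs) for each run of character data under elem."""
    attrs = dict(inherited or {})
    # <i> flips the italic variant of whatever font is currently in effect.
    if elem.tag == "i" and "font" in attrs:
        attrs["font"] = toggle_italic(attrs["font"])
    # The element's own attributes override anything inherited.
    attrs.update({k: v for k, v in elem.attrib.items() if k in INHERITED})

    if elem.text and elem.text.strip():
        yield elem.text, attrs
    for child in elem:
        yield from text_runs(child, attrs)
        # Text after a child's end tag belongs to the parent element.
        if child.tail and child.tail.strip():
            yield child.tail, attrs
```

Run over the example body, the word "not" comes out with font "dlg_i" while the rest of that line keeps font "dlg", and every run inside the v element carries on="1000" and off="1300" from its parent.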