API tutorial: Let’s import pysubs2

This tutorial will show you how to use most of what pysubs2 library has to offer. If you are familiar with Python and Aegisub, you will hopefully find the API quite intuitive.

If you want to follow along in your REPL, make sure to have Python 3 and pysubs2 installed.

Reading subtitle file

Let’s settle on a simple subtitle file first.

>>> SIMPLE_FILE = """\
... 1
... 00:00:00,000 --> 00:01:00,000
... Once upon a time,
...
... 2
... 00:01:00,000 --> 00:02:00,000
... there was a SubRip file
... with two subtitles.
... """
>>> with open("subtitles.srt", "w") as fp:
...      fp.write(SIMPLE_FILE)

Now that we have a real file on the harddrive, let’s import pysubs2 and load it.

>>> import pysubs2
>>> subs = pysubs2.load("subtitles.srt")
>>> subs
<SSAFile with 2 events and 1 styles, last timestamp 0:02:00>

Note

By default, pysubs2 uses UTF-8 encoding when reading and writing files, with surrogate pair escape error handling. This works best if your file is either:

  • in UTF-8 encoding or

  • in a similar ASCII-like encoding (line latin-1) and you don’t need to work with the text (only convert subtitle format, shift time, etc.).

Use the encoding and errors keyword arguments in the pysubs2.SSAFile.load() and pysubs2.SSAFile.save() methods in case you need something else, or you can do the processing yourself and work only with str using pysubs2.SSAFile.from_string() and pysubs2.SSAFile.to_string().

If you use the default settings, you can get the input bytes for a particular subtitle using:

>>> subs[0].text.encode("utf-8", "surrogateescape")

Now we have a subtitle file, the pysubs2.SSAFile object. It has two “events”, ie. subtitles. You can treat subs as a list:

>>> subs[0].text
"Once upon a time,"
>>> for line in subs:
...     print(line.text)
Once upon a time,
there was a SubRip file\\Nwith two subtitles.

Individual subtitles are pysubs2.SSAEvent objects and have the attributes you’d expect, like start, end and text. Notice that the second subtitle text doesn’t contain a newline, but literal “backlash N”, which is how SubStation represents newlines. There could also be override tags like {\i1} for italics.

Tip

If you don’t entertain SubStation, there is also a pysubs2.SSAEvent.plaintext property which hides override tags and translates newlines for you. Be warned, however, that writing to this property throws away any override tags.

>>> for line in subs:
...     print(line.plaintext)
Once upon a time,
there was a SubRip file
with two subtitles.

Working with timing

Let’s have a look at the timestamps.

>>> subs[1].start
60000

That is 60,000 milliseconds, or one minute. pysubs2 uses plain int milliseconds for time. Since you probably don’t want to convert all times to milliseconds by hand, there is a handy function called pysubs2.make_time(). You can use this function to give times in minutes and seconds, but also in frames.

>>> subs[1].start == pysubs2.make_time(s=2)
True
>>> subs[1].start == pysubs2.make_time(frames=50, fps=25)
True

Tip

pysubs2.SSAEvent objects define ordering with respect to time, meaning you can sort them chronologically. There is pysubs2.SSAFile.sort() method for this purpose.

Let’s write a function to retime a subtitle file by adding a constant to all timestamps!

>>> def shift(subs, ms):
...     for line in subs:
...         line.start += ms
...         line.end += ms
shift(subs, 500)

Well, it turns out the library can already do this with pysubs2.SSAFile.shift(), which takes the same arguments as pysubs2.make_time(). Let’s shift the subtitles back.

>>> subs.shift(s=-0.5)

Note

You can have negative timestamps in your subs. They are assumed to be zero for purposes of export.

Working with styles

As you’ve seen already with the newlines, pysubs2 works with SubStation, meaning our SRT file actually has a “Default” style associated with its subtitles.

>>> subs.styles["Default"]
<SSAStyle 20.0px 'Arial'>

Let’s add one more style, with italics, and let the second subtitle have it.

>>> my_style = subs.styles["Default"].copy()
>>> my_style.italic = True
>>> subs.styles["MyStyle"] = my_style
>>> subs[1].style = "MyStyle"

Notice that the subtitle object (pysubs2.SSAEvent) and the style object (pysubs2.SSAStyle) aren’t really connected. Instead, styles are referred to by their name in the pysubs2.SSAFile.styles dictionary.

You can also create a pysubs2.SSAStyle directly, specifying the attributes you want, while the other attributes will use values from the default style:

>>> top_style = pysubs2.SSAStyle(fontsize=10, alignment=pysubs2.Alignment.TOP_CENTER)
>>> subs.styles["Top"] = my_style
>>> subs[1].style = "Top"

Tip

Renaming a style is a little difficult, because you also have to fix all references to the old name. The pysubs2.SSAFile.rename_style() method does what’s needed behind the scenes.

Saving subtitle file

Now that the second subtitle uses “MyStyle”, it should appear in italics. Let’s export to SRT again to see if that’s the case!

>>> modified_srt = subs.to_string("srt")
>>> modified_srt
"""\
1
00:00:00,000 --> 00:01:00,000
Once upon a time,

2
00:01:00,000 --> 00:02:00,000
<i>there was a SubRip file
with two subtitles.</i>

"""

Indeed it is. Of course, since SubRip has no concept of styles, the italics will get converted to inline tags and styles will be lost if we load this exported file:

>>> modified_subs = pysubs2.SSAFile.from_string(modified_srt)
>>> modified_subs[1].text
"{\\i1}there was a SubRip file\\Nwith two subtitles.{\\i0}"
>>> modified_subs[1].style
"Default"

It’s better to save the file as ASS so that style information isn’t lost.

>>> subs.save("modified_subtitles.ass")
>>> with open("modified_subtitles.ass") as fp:
...      print(fp.read())
[Script Info]
; Script generated by pysubs2
; https://pypi.python.org/pypi/pysubs2
WrapStyle: 0
ScaledBorderAndShadow: yes
Collisions: Normal
ScriptType: v4.00+

[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding
Style: Default,Arial,20.0,&H00FFFFFF,&H000000FF,&H00000000,&H00000000,0,0,0,0,100.0,100.0,0.0,0.0,1,2.0,2.0,2,10,10,10,1
Style: MyStyle,Arial,20.0,&H00FFFFFF,&H000000FF,&H00000000,&H00000000,0,-1,0,0,100.0,100.0,0.0,0.0,1,2.0,2.0,2,10,10,10,1

[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
Dialogue: 0,0:00:00.00,0:01:00.00,Default,,0,0,0,,Once upon a time,
Dialogue: 0,0:01:00.00,0:02:00.00,MyStyle,,0,0,0,,there was a SubRip file\Nwith two subtitles.

Example: Creating top/bottom bilingual subtitles

Let’s say you have English and Italian subtitles for the same movie and you wish to create combined subtitle file with Italian subtitles located at the bottom and English at the top, with different colors for each language. The following code creates this combined file and saves it in SubStation Alpha format (this is important, you need to use a format that supports positioning):

import pysubs2
from pysubs2 import Alignment, Color, SSAFile, SSAStyle

subs_en = pysubs2.load("subs.en.srt")  # read input subtitles in SRT format
subs_it = pysubs2.load("subs.it.srt")

subs = SSAFile()
subs.styles = {
    "bottom": SSAStyle(alignment=Alignment.BOTTOM_CENTER, primarycolor=Color(255, 255, 0)),
    "top": SSAStyle(alignment=Alignment.TOP_CENTER, primarycolor=pysubs2.Color(0, 128, 128)),
}
for e in subs_it:
    e.style = "bottom"
    subs.append(e)
for e in subs_en:
    e.style = "top"
    subs.append(e)

subs.save("subs.ass")  # write subtitles in ASS format (supports formatting)

And that’s it! Now you should be a little familiar with pysubs2. Have a look at the API Reference to see what’s there.

Some final thoughts, in no particular order:

  • The library tries its best to read given file. Format detection and actual parsing is rather benevolent.

  • Only basic SubRip/MicroDVD tags are supported.

  • If you are unsure about SubStation, get familiar with the Aegisub subtitle editor. You can also use the SubStation specification for reference.

  • When working with MicroDVD, you sometimes have to specify the fps argument when loading and saving. There is a convention to specify framerate in the first subtitle, which pysubs2 handles transparently.

  • If your goal is to create complex effects with frame-perfect timing, you may want to check out the PyonFX library which is focused on this use case. This discussion of timestamps vs. frames may also be relevant to you.