15 : Creating a Screencast

Screencasts are an increasingly common way of explaining software products. I probably preferred Turbogears to Django because of the 20 minute wiki screencast by Kevin Dangoor – http://files.turbogears.org/video/20MinuteWiki2nd.mov. So, this month, we will create a movie. We have chosen a programmed slide show as a simple illustration. The ideas can be expanded and generalised for the creation of effective and compelling screencasts. The same concept can transform your digital images into an exciting audio-video treat for your parents.

A movie is really a sequence of images displayed at a predefined rate, with the images synchronised to audio. A group of images and the corresponding audio are grouped together and regarded as a scene in a movie. Independently created scenes can be pieced together to create the illusion of a longer movie.
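As a back-of-the-envelope check, the arithmetic behind this is simple: at a fixed frame rate, the number of frames in a scene is the rate multiplied by the duration. A minimal sketch (the 25 fps figure and the example durations are assumptions, matching the rate used with mencoder later):

```python
FPS = 25  # frames per second -- the rate we will pass to mencoder later

def frames_needed(duration_secs):
    """Number of still frames to show a slide for the given duration."""
    return int(FPS * duration_secs)

# three hypothetical scenes whose narration lasts 4, 2.5 and 3 seconds
scene_durations = [4, 2.5, 3]
total_frames = sum(frames_needed(d) for d in scene_durations)
print(total_frames)  # 100 + 62 + 75 = 237
```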

Defining the Screencast

In this tutorial, you will take a sequence of screen shots you wish to explain. You will write a script for each of the screen shots. The application festival or espeak will be the actor, converting the script into a voice. Each scene will be the displaying of a screen shot for as long as it takes to speak the associated script.

Create a set of screen shots for the product you wish to explain in a directory, numbering them sequentially, e.g. PhotoApp00.png, ..., PhotoApp05.png.

You could write the script in a separate text file for each slide, or in a single file. In this tutorial, you will write it in a single file: for each of the slides, in order, a header line followed by the script and a blank line. The first two characters of the header will be the image number.


00
Start the python application my_photos from the terminal

01
A new image will be displayed to you.

02
Type the text you would like to appear as a caption in the text box.

03
Once you press enter, the text will be displayed on the image as you can see.

04
Now, click on the save and next button. The image will be saved.

05
And you will be shown the next picture.
Repeat the steps until all the photographs are processed.
Note that if you do not wish to put a caption on a picture
and save it, you can press the next button.
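For reference, a small helper can generate a file in this format from a list of (image number, text) pairs. This is a sketch, assuming the two-digit header convention described above; the helper name and sample texts are only illustrative:

```python
def script_text(scenes):
    """Render (image_number, text) pairs in the Script.txt format:
    a two-digit header, the script line(s), then a blank line."""
    chunks = []
    for number, text in scenes:
        chunks.append('%02d\n%s\n\n' % (number, text))
    return ''.join(chunks)

demo = script_text([
    (0, 'Start the python application my_photos from the terminal'),
    (1, 'A new image will be displayed to you.'),
])
print(demo)
```

Write the returned string to Script.txt to feed the code below.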

The Implementation

The core logic of your application will be:

#!/usr/bin/env python
import os, sys
import wave
import Image, ImageTk, ImageDraw

script_file = open('Script.txt')
# iterate over each scene
for scene_id, image, text in scene_data(script_file):
    duration = text_to_speech(text)
    # create frames assuming 25 frames per sec
    for frame_no in range(int(25*duration)):
        image.save(scene_id + "%03d" % frame_no + ".jpg")
    # convert the frames into a scene
    os.system('mencoder -audiofile ' + scene_id + 'text.wav -oac mp3lame "mf://' \
        + scene_id + '*.jpg" -mf fps=25 -o out_' \
        + scene_id + '.avi -ovc lavc -lavcopts vcodec=mpeg4')

# Create an animated scene to end using the last image
animated_scene(image)

# combine the scenes into a single film
os.system('mencoder -ovc copy -oac mp3lame -o output.avi out_*.avi')

The script file is opened. It is best for the scene id to be a numeric string with a fixed number of digits. That will ensure that the order of the scenes is easily maintained.
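To see why the fixed number of digits matters: file names are sorted as strings, so '10' sorts before '2' unless the ids are zero padded. A quick illustration:

```python
# scene ids without and with zero padding
unpadded = ['1', '2', '10']
padded = ['%02d' % int(s) for s in unpadded]

print(sorted(unpadded))  # ['1', '10', '2']  -- '10' jumps ahead of '2'
print(sorted(padded))    # ['01', '02', '10'] -- the intended order
```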

An image and the corresponding text are selected. The text is converted to a speech file. The image is copied as many times as the number of frames will be needed for the duration of the speech file.

The speech file and the images (referenced via the mf://xx*.jpg URL) are combined and converted into an avi file by mencoder. The sound file is converted to mp3. If you are familiar with ffmpeg, you may use that instead of mencoder.
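Since the mencoder command line is long, it can help to build and inspect the command string before running it. The sketch below only constructs the command used above; pass the result to os.system to actually encode the scene:

```python
def scene_command(scene_id):
    """Build (but do not run) the mencoder command that combines a
    scene's numbered frames and speech file into an avi clip."""
    return ('mencoder -audiofile ' + scene_id + 'text.wav -oac mp3lame'
            ' "mf://' + scene_id + '*.jpg" -mf fps=25'
            ' -o out_' + scene_id + '.avi'
            ' -ovc lavc -lavcopts vcodec=mpeg4')

print(scene_command('03'))
```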

Finally, all the avi files are combined into a single avi file.

The code for the generator for fetching the image and the text file will be:

def scene_data(script_file):
    while True:
        # the first two characters in the script are the scene id
        scene_id = script_file.readline()[:2]
        # readline will return an empty string after EOF
        if scene_id.strip() == '':
            return
        # the images are png files in the screencast subdirectory
        im_file = 'screencast/PhotoApp' + scene_id + '.png'
        image = Image.open(im_file)
        frame = image.resize((640,480))
        # read lines until an empty line
        text = ''
        while True:
            line = script_file.readline()
            if line.strip() == '':
                break
            # append, replacing the new line by a space
            text += line.replace('\n', ' ')
        yield scene_id, frame, text

The script file structure was explained above. The code keeps reading the script file until there is no more data. The first two characters of the header line are the scene id. The images must be named in a fixed format, with two of the characters being the scene id.

The image is resized to a fixed size. The generator yields the values of the scene id, the resized image and the text associated with that image.
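You can exercise this parsing logic without any image files. The simplified variant below, a sketch rather than the full scene_data, reads the same header/text/blank-line structure from an in-memory file and yields just the scene id and text:

```python
import io

def script_scenes(script_file):
    """Yield (scene_id, text) pairs from a Script.txt-style file."""
    while True:
        scene_id = script_file.readline()[:2]
        if scene_id.strip() == '':
            return  # readline returns '' at end of file
        text = ''
        while True:
            line = script_file.readline()
            if line.strip() == '':
                break  # a blank line ends the scene
            text += line.replace('\n', ' ')
        yield scene_id, text

sample = io.StringIO('00\nfirst scene text\n\n01\nsecond scene text\n\n')
for scene_id, text in script_scenes(sample):
    print(scene_id, text)
```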

The next step is the code to convert the text to speech.

def text_to_speech(text):
    # uncomment the ESPEAK command and system call to use espeak instead
    #ESPEAK = 'espeak -w text.wav -s120 "%s"'
    #os.system(ESPEAK % (text))
    FESTIVAL = 'echo %s | text2wave -o text.wav -F 44100 -scale 2.0'
    os.system(FESTIVAL % (text))
    win = wave.open('text.wav')
    # modify the wave file to add a short silence
    # before the start and at the end
    # (scene_id is the current value of the module-level loop variable)
    wout = wave.open(scene_id + 'text.wav', 'w')
    # create the wave file with the same parameters as the input file
    wout.setparams(win.getparams())
    # half a second of silence
    silence_frames = win.getframerate()/2
    # mono 16bit sound frames
    silence_data = silence_frames*'\x00\x00'
    data = win.readframes(win.getnframes())
    # write the silence, the speech and the silence again
    wout.writeframes(silence_data + data + silence_data)
    # divide the number of frames by the frame rate
    duration = float(wout.getnframes())/wout.getframerate()
    win.close()
    wout.close()
    return duration

The code to get a wave file of the speech is a mere two lines. You can use the espeak command or the text2wave command from the festival package. The latter's voice quality is better. (I needed the frequency of the wave file to be 44100 for the sound for various scenes to be synchronised after conversion to mp3 audio.)

You can use the wave module to improve the presentation by inserting short silences at the start and the end of the wave file. This makes the presentation more natural.
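The padding needs nothing beyond the standard wave module. The sketch below uses a second of synthetic 16-bit mono samples standing in for festival's text.wav, pads it with half a second of silence on each side (written to an in-memory buffer here), and derives the duration the same way the function above does:

```python
import io
import wave

RATE = 44100  # samples per second; 44100 keeps the scenes in sync after mp3 conversion

# one second of synthetic mono 16-bit "speech", standing in for festival's output
speech_frames = RATE * b'\x00\x01'
# half a second of silence -- 16-bit zero samples
silence = (RATE // 2) * b'\x00\x00'

buf = io.BytesIO()
wout = wave.open(buf, 'wb')
# mono, 2 bytes per sample, 44100 Hz -- the parameters the input file would have
wout.setparams((1, 2, RATE, 0, 'NONE', 'not compressed'))
wout.writeframes(silence + speech_frames + silence)
# divide the number of frames by the frame rate to get the duration
duration = wout.getnframes() / wout.getframerate()
wout.close()

print(duration)  # 2.0 seconds: one of "speech" plus two half-second pauses
```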

A Little Animation to End

On the final image, a red square with the text 'The' written on it moves from left to right. A green circle with the text 'End' written on it moves from right to left. The two merge at the centre. The image is then frozen for a second. You add the logout sound of the desktop to the scene. As it is the final image, you can ignore differences in the duration of the sound file and the video.

def animated_scene(bg_image):
    """A square with 'The' and a circle with 'End'
    float across a background image from opposite sides
    and merge.
    """
    duration = 2
    nframes = 25*duration
    box_size = (100,100)
    x_step = (640 - 100)/(2*nframes)
    # create a red square 100x100 with the text 'The'
    im_square = Image.new('RGB', box_size)
    draw_s = ImageDraw.Draw(im_square)
    draw_s.rectangle([(0,0), box_size], fill='RED')
    draw_s.text((35,45), 'The', fill='WHITE')
    # create a green circle with diameter 100 and the text 'End'
    im_circle = Image.new('RGB', box_size)
    draw_c = ImageDraw.Draw(im_circle)
    draw_c.ellipse([(0,0), box_size], fill='GREEN')
    draw_c.text((35,45), 'End', fill='WHITE')
    # create a mask to show only the (green) circle
    r, mask, b = im_circle.split()
    # create the frames
    scene_id = '99'
    x_s = 0
    x_c = 640 - 100
    for frame_no in range(nframes - 1):
        image = bg_image.copy()
        image.paste(im_square, (x_s,200))
        image.paste(im_circle, (x_c,200), mask)
        x_s += x_step
        x_c -= x_step
        image.save(scene_id + "%03d"%frame_no + ".jpg")
    # freeze the final frame for a second
    for frame_no in range(nframes, nframes + 25):
        image = bg_image.copy()
        image.paste(im_square, (270,200))
        image.paste(im_circle, (270,200), mask)
        image.save(scene_id + "%03d"%frame_no + ".jpg")
    # convert the frames into a scene, using the logout sound
    os.system('mencoder -audiofile /usr/share/sounds/logout.wav \
        -oac mp3lame "mf://' + scene_id + '*.jpg" -mf fps=25 -o out_' \
        + scene_id + '.avi -ovc lavc -lavcopts vcodec=mpeg4')

Even a trivial animation takes some code. The core concept is that you create the foreground images which will appear to move. Make a copy of the original image, which will serve as the background. Paste the foreground images at a new location and save the resulting image as a frame.
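The motion itself is linear interpolation: over two seconds (50 frames at 25 fps) the square's x coordinate climbs from 0 while the circle's falls from 540, and both end at the centre, (640 - 100)/2 = 270. A sketch of just that position arithmetic:

```python
FPS = 25
duration = 2                           # seconds
nframes = FPS * duration               # 50 frames
x_step = (640 - 100) / (2 * nframes)   # 5.4 pixels per frame

# x coordinate of each shape at a given frame, rounded to whole pixels
def square_x(frame_no):
    return round(frame_no * x_step)

def circle_x(frame_no):
    return (640 - 100) - round(frame_no * x_step)

print(square_x(0), circle_x(0))            # 0 540 -- opposite edges
print(square_x(nframes), circle_x(nframes))  # 270 270 -- they meet at the centre
```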

In our restless world, time is at a premium. So, go ahead and create 30-second just-in-time tutorials for your application, and they are sure to be a hit.
