7 : Programming in Python for Mobile Gadgets using the Web

The web can be viewed from two perspectives – making information available on the net and consuming the information available. In this article, let us look at the latter.

Everything on the web is expected to be accessed through the browser. If you are restricted to the screen size of a smart phone, though, browsing is not much fun. Most web pages are not designed for the small screen, and navigating to what you need is hard. Hence, little applications that extract and display just what you want can be very useful. Applications like stock tickers are available for many stock exchanges. However, you may not find a little applet for your particular needs. For example, you may have invested in schemes from several mutual funds and may wish to know the net asset value of each in a simple table. So, you should be able to develop one yourself.

Probably the first smart phone family to offer Python for developing applications was the Nokia S60, http://wiki.opensource.nokia.com/projects/Python_for_S60. The site http://www.maemo.org provides the open source platform and tools for the Nokia 770/800 tablets.

Openmoko does not come with a Python interpreter by default, but can be customised to include one; see http://wiki.openmoko.org/wiki/Application_Development_Crash_Course.

Once the interpreter is available with the required modules, you just need to copy your Python source to the phone/device and run it.

Getting Started

If you need to extract information from the web, the first thing your application will need to do is to access a web page. After a little research, you will find that urllib2 is the appropriate module. The method you need is urlopen. After establishing a connection to a website, you will want to read the page. So, start Python and try the following code:

>>> import urllib2

>>> dir(urllib2)

>>> lfy_home=urllib2.urlopen("http://www.lfymag.com")

>>> dir(lfy_home)

>>> lfy_home.read()

The above code is equivalent to looking at the page source after going to “http://www.lfymag.com”. You need to parse the page source so that you can extract only what you need.

The obvious option is htmllib. But htmllib itself uses sgmllib, and if you are not really interested in formatting the page or following links, sgmllib is the easier option. It also has a built-in test mode.

Do the following. Save a page in which you are interested, e.g. http://google.co.in, one of the simplest pages on the web. You can get started with understanding the structure and content of the page by trying the following:

$ python /usr/lib/python2.5/sgmllib.py Google.html

Replace lib with lib64 on a 64-bit OS, and use the appropriate Python directory if your version is not 2.5. You will see a lot of output!

Extracting What You Want

Your first job is to find the tag and the data in which you are interested. Then find a suitable pattern so that you can select it using a program.

You are most likely to want information from a financial or sports site. But let us take a simple example: you love movies and would prefer to decide your evening plans after checking the films on television. So, you can write an application to extract just the film name and the starting time from the web page.

Go to the URL of a channel's current schedule http://www.utvworldmovies.com/WeeklyListing.php and save this page as WeeklyListing.html. The local page will help you understand the content and the fields you need – the name and the time of the show.

Use an HTML editor, e.g. Quanta, to examine WeeklyListing.html. Combine that with looking at the page and at the output of the test mode of sgmllib to identify what you need. Usually, the data you are interested in is in an HTML table within td tags, possibly enclosed in a div tag. In this case, the div tag with id list0 contains the schedule for the current day.
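If you prefer to do this scan from Python itself, a quick (and admittedly crude) regular expression can list the div ids present in the saved file. This is a hypothetical helper for exploration only, not part of sgmllib, and the sample markup is made up:

```python
import re

def list_div_ids(html):
    # Return the id attribute of every div tag in the page source.
    # A rough heuristic to spot the block holding the schedule;
    # it assumes double-quoted attribute values.
    return re.findall(r'<div[^>]*\bid="([^"]+)"', html)

ids = list_div_ids('<div id="list0">schedule</div><div class="menu">x</div>')
```

Running it on the real WeeklyListing.html would show whether list0 is the only candidate block.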

You are now ready to write your code. The nice thing is that all your development can be done on the desktop and then moved to the device. You can do some testing using the device image and running it on the desktop using Qemu. Write the following code in film_schedule.py:

from sgmllib import SGMLParser

class selector(SGMLParser):
    def reset(self):
        SGMLParser.reset(self)
        self.wanted = False

    def start_div(self, attrs):
        if ('id', 'list0') in attrs:
            print "Found the div"
            self.wanted = True

    def end_div(self):
        if self.wanted:
            print 'End of div'
            self.wanted = False

def page_test(html_page):
    f = open(html_page)
    parser = selector()
    parser.feed(f.read())
    parser.close()

SGMLParser calls the reset method when it is initialised. If there is a method start_tagname, it will be called at the start of a tag named tagname, with the tag's attributes passed as a list of name and value pairs. You will need to look at other tags only once you are inside the desired block. So, use a flag, self.wanted: set it to True once the desired div starts and reset it to False when the end of that tag is reached.
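As an aside, sgmllib was later removed in Python 3, but the same flag technique works with the standard HTMLParser class, which dispatches all tags through handle_starttag and handle_endtag instead of per-tag methods. A minimal, self-contained sketch (the markup fed in is made up for illustration):

```python
try:
    from html.parser import HTMLParser   # Python 3
except ImportError:
    from HTMLParser import HTMLParser    # Python 2

class DivSelector(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        self.wanted = False      # True while inside the desired div
        self.texts = []

    def handle_starttag(self, tag, attrs):
        # attrs is the same list of (name, value) pairs as in sgmllib
        if tag == 'div' and ('id', 'list0') in attrs:
            self.wanted = True

    def handle_endtag(self, tag):
        # Note: this simple version does not handle nested divs
        if tag == 'div':
            self.wanted = False

    def handle_data(self, data):
        if self.wanted and data.strip():
            self.texts.append(data.strip())

p = DivSelector()
p.feed('<div id="other">skip this</div><div id="list0">My Girl</div>')
p.close()
```

After feeding the sample markup, p.texts holds only the text from the list0 div.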

While testing, you may feed the parser the saved html file. Later you will call the actual web page using urlopen. Now you can try this code as follows:

>>> from film_schedule import *

>>> page_test('WeeklyListing.html')

Found the div

End of div

>>>

So, there is only one occurrence of the div in which you are interested. The film name and time are the data in td tags with class listcontent01. So, you will need to handle td tags, but only within the desired div. Each row can be identified by a tr tag. Further, you will need to capture the data in a method handle_data. So, your code in film_schedule.py should look like:

from sgmllib import SGMLParser

class selector(SGMLParser):
    def reset(self):
        SGMLParser.reset(self)
        self.wanted = False
        self.pick_data = False
        self.films = []

    def start_div(self, attrs):
        if ('id', 'list0') in attrs:
            self.wanted = True

    def end_div(self):
        if self.wanted:
            self.wanted = False

    def start_tr(self, attrs):
        if self.wanted:
            self.film = []

    def end_tr(self):
        if self.wanted and self.film:
            self.films.append(self.film)

    def start_td(self, attrs):
        if self.wanted:
            if ('class', 'listcontent01') in attrs:
                self.pick_data = True

    def handle_data(self, data):
        if self.pick_data:
            self.film.append(data)
            self.pick_data = False

def page_test(html_page):
    f = open(html_page)
    parser = selector()
    parser.feed(f.read())
    parser.close()
    return parser.films

handle_data is the method called to process the data found between tags. Now, run the following code:

>>> from film_schedule import *

>>> for film in page_test('WeeklyListing.html'):

... print film

...

['8:15 am', 'Three Colours Red']

['10:30 am', 'The Triangle 1']

['12:30 pm', 'The Triangle 2']

['2:15 pm', 'The Triangle 3']

['5:45 pm', 'Sophia Loren\xe2\x80\x99s Birthday: Boccaccio 70']

['8:30 pm', 'Sophia Loren\xe2\x80\x99s Birthday: A Special Day ']

['11:00 pm', 'Liven Up Nights: My Girl']

>>>

The desired data is now very compact.
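You might also want to print the list a little more neatly before showing it on the small screen. A hypothetical formatting helper, assuming the [time, name] row structure shown above:

```python
def format_schedule(films):
    # Turn the parsed [[time, name], ...] list into aligned lines,
    # padding the time column to a fixed width.
    lines = []
    for show in films:
        lines.append('%-10s %s' % (show[0], show[1]))
    return '\n'.join(lines)

table = format_schedule([['8:15 am', 'Three Colours Red'],
                         ['10:30 am', 'The Triangle 1']])
print(table)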

Working with the Web Data

You will now want to read directly from the web. So, add the following method in film_schedule.py:

import urllib2

def get_films(url):
    page = urllib2.urlopen(url)
    parser = selector()
    parser.feed(page.read())
    parser.close()
    return parser.films

Now, run the program:

>>> from film_schedule import *

>>> for film in get_films('http://www.utvworldmovies.com/WeeklyListing.php'):

... print film

...

['8:30 am', 'Animation Attack: Rock-A-Doodle']

['10:15 am (World Movies Platinum Collection)', 'World Movies Platinum Collection: Leon']

['12:45 pm', '50 Movies To See Before You Die- Mahesh Bhatt\xe2\x80\x99s Choice: The Great Dictator']

['4:00 pm', 'World Movies for World Peace: The Great Land Of Small']

['6:00 pm', "World Movies for World Peace: Winky's Horse"]

['8:30 pm', 'World Movies for World Peace: Viva Cuba']

['11:00 pm', 'World Movies for World Peace: Iberia']

>>>

The results differ because the test file was saved on an earlier day.

Is this a perfect solution? Of course not. The site may change the page structure and your program will stop working. However, a little programming effort is worth it if you are browsing on a small screen over a slow connection. If a site offers an API to access the data, that would be the better option.
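A mobile connection also drops often, so it is worth making the fetch fail gracefully. One way is a small wrapper around urlopen that returns None instead of raising an exception; this is a hypothetical sketch, with a try/except import so it runs on both Python 2 and 3:

```python
try:
    from urllib2 import urlopen          # Python 2
except ImportError:
    from urllib.request import urlopen   # Python 3

def fetch_page(url):
    # Return the page source, or None on any failure, so a site
    # outage or a malformed URL does not crash the little application.
    try:
        return urlopen(url, timeout=10).read()
    except Exception:
        return None
```

The caller can then check for None and show a "schedule unavailable" message rather than a traceback.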

You can display the results using the GUI options available on the specific mobile environment. It is important to realise that, conceptually, this is no different from programming on the desktop, except that screen real estate is a serious constraint.
