Web Dynamism w/Python+jQuery - Part 3  April 26th, 2008

This is part 3 of a 3 part series. Check out part 1 and part 2.

The blurb and talk of requirements at the top of part 1 also apply to this article.

The Example

Let's do something more interesting this time. Remember hit counters? Years ago they were the way to measure traffic to your site. These days having a hit counter makes you look... dated, to say the least.

Today we have several options to keep track of site traffic: full hosted analytics services like Google Analytics, raw server logs like Apache's access logs, and the hit counters of old.

None of these are perfect solutions. Google Analytics works well enough for this blog and most of the websites I've designed for others, but an option lighter than Google's and heavier than Apache access logs wouldn't hurt. That's what we'll build now.

The Requirements

The requirements are the same as in the previous articles, with one difference and one addition. We'll use SQLite to store our usage information, which requires either Python 2.5 or a separate library. This site is hosted by BlueHost, which only offers Python 2.3, so I had to install PySqlite. As a result, the code is ever so slightly different; I'll be listing the Python 2.5 version since that's really the better way to go. You might not even have the ability to install third-party Python libraries, so check with your web host for those details.

Also, as you might have guessed, we'll need SQLite itself. BlueHost supplies binaries already, so I didn't have to install it.

The Setup

Since we'll be using a database to store the data, we'll need to make sure that one exists. Write a little Python script (which you can run as a CGI if you don't have shell access to the server) that looks something like this:
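A minimal sketch of such a setup script. The table name `hits` and its columns are my assumptions, guessed from the page name, timestamp, and IP address fields logged later in the article:

```python
#!/usr/bin/env python
# One-time setup script: create the stats database and its table.
# Table/column names are assumptions, not the article's exact code.
import sqlite3  # bundled with Python since 2.5; PySqlite on 2.3

conn = sqlite3.connect('stats.sqlite3')
# IF NOT EXISTS makes it safe to run this script more than once.
conn.execute("""CREATE TABLE IF NOT EXISTS hits (
                    page TEXT,
                    timestamp INTEGER,
                    ip TEXT
                )""")
conn.commit()
conn.close()
print("Content-Type: text/plain\n\nDatabase ready.")
```

Running it from the shell creates `stats.sqlite3` in the current directory; as a CGI it creates the file wherever the web server's working directory is.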

You'll notice there's now a file called stats.sqlite3 wherever you ran the script. If you ran it from the shell, go ahead and copy it into your cgi-bin (or wherever, really; you can supply a relative path to the connect method).

Now you can either delete this script or leave it in place; since it only creates the table if it doesn't already exist, there's no serious security concern here.

The Python CGI

Now to the real work. There are two approaches here: write two scripts, one to log visits and one to return results (for our 'reporting' pages), or stick it all in one script. It's not especially long, so I'm just going to put it all in one.

First thing to do is handle the POST condition, when we're logging a visit:
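A sketch of what that POST branch might look like. I've pulled the insert into a helper function, and the script/table names are assumptions; I'm also using `urllib.parse` here as a modern stand-in for the old `cgi` module's form parsing:

```python
#!/usr/bin/env python
# Sketch of the visit-logging (POST) half of the CGI.
# Names ("hits" table, "page" field) are assumptions.
import os
import sqlite3
import sys
from datetime import date
from urllib.parse import parse_qs  # stand-in for the old cgi module

DB = 'stats.sqlite3'

def log_hit(page, ip, db=DB):
    """Insert one visit row and return what was stored."""
    conn = sqlite3.connect(db)
    conn.execute("CREATE TABLE IF NOT EXISTS hits "
                 "(page TEXT, timestamp INTEGER, ip TEXT)")
    # Timestamps are normalized to the date -- no time of day.
    stamp = date.today().toordinal()
    conn.execute("INSERT INTO hits VALUES (?, ?, ?)", (page, stamp, ip))
    conn.commit()
    conn.close()
    return page, stamp, ip

if __name__ == '__main__' and os.environ.get('REQUEST_METHOD') == 'POST':
    form = parse_qs(sys.stdin.read())
    page = form.get('page', [''])[0]
    print("Content-Type: text/plain\n")
    if page:
        # Echo back what we stored; the client discards this anyway.
        print(' '.join(str(v) for v in
                       log_hit(page, os.environ.get('REMOTE_ADDR', ''))))
```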

Pretty straightforward. Do some imports of the stuff we'll need and connect to the database. Then make sure we got a page name via the POST, and insert a row into the table. We just return a string matching what we saved to the table; users won't see this, because we simply discard the result of the call on the client side.

There is a potential problem here: if a user keeps refreshing the page (or worse, constructs their own page to make the call 1000 times), we could end up with garbage data. We could deter this kind of abuse by making sure the IP address doesn't already have a record in the database for this timestamp... a malicious user could still get around it by faking their IP, but it would help.
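That deduplication check could be as simple as a hypothetical helper like this (the `hits` table and its columns are assumptions carried over from earlier):

```python
import sqlite3

def already_logged(conn, page, stamp, ip):
    """Return True if this IP already has a hit for this page on
    this timestamp -- a hypothetical dedup guard, not the article's
    exact code. Call it before inserting and skip the insert if True."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM hits WHERE page=? AND timestamp=? AND ip=?",
        (page, stamp, ip)).fetchone()
    return count > 0
```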

Now that we've got it logging hits we can actually test it. Create or edit a page. Include jQuery and the following in the head:
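Something along these lines should do it; the script paths and the `page` parameter name are my assumptions:

```html
<script type="text/javascript" src="jquery.js"></script>
<script type="text/javascript">
$(document).ready(function() {
    $.ajax({
        type: "POST",
        url: "/cgi-bin/stats.py",
        data: "page=" + location.pathname,
        success: function(msg) { alert(msg); }  // for testing only
    });
});
</script>
```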

Hit that page in your browser and you should get an alert box with a page name, a large number (the timestamp), and your IP address. Success! This data is now in the database; of course, we have no way of getting it back out and viewing it yet, and that's the bulk of the rest of the example. Remember that when putting this code on a 'real' page, make sure to remove the success: callback (and the trailing , on the data: line).

The really useful part of web usage statistics like Google Analytics isn't seeing log files and noting that "Oh, someone went to index.html on the 23rd at 6pm... How interesting!" It's being able to visualize that data over time. You want to be saying things like "Ah! My new content has increased my visits by 15% over the last month!"

The Visualization

Luckily for us, we're using jQuery already, and there's a really nice plotting library for it called flot. Go get it, and include it along with jQuery into a new page called something like view_stats.html.

We won't be able to visualize the data unless we can actually get at it, so add the following code to your CGI, which will (on a GET request) return JSON describing the data in our database table.
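A sketch of that reporting branch, following the description below (distinct timestamps, then a count per timestamp). The `hits` table name is an assumption, and I'm using the stdlib `json` module, which only landed in Python 2.6; in the 2.3/2.5 era this would have been simplejson:

```python
# Sketch of the reporting (GET) branch: build [[timestamp, count], ...]
# and emit it as JSON. Table name "hits" is an assumption.
import json
import sqlite3

def visit_counts(db='stats.sqlite3'):
    conn = sqlite3.connect(db)
    data = []
    # One datapoint per distinct day in the table.
    for (stamp,) in conn.execute(
            "SELECT DISTINCT timestamp FROM hits ORDER BY timestamp"):
        (count,) = conn.execute(
            "SELECT COUNT(*) FROM hits WHERE timestamp=?",
            (stamp,)).fetchone()
        data.append([stamp, count])
    conn.close()
    return data

# In the CGI's GET branch, something like:
# print("Content-Type: application/json\n")
# print(json.dumps({'data': visit_counts()}))
```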

This code is a bit more complicated. First it gets a list of DISTINCT timestamps in the database (i.e. no duplicates). Then it uses that to get a count of the number of visits on each timestamp (timestamps are normalized to dates, and don't include times, as per the call to date.today() in the previous code). It appends that data to a list which gets output as JSON to the client.

Now that we can get the data out of the database we can write some JavaScript on our stats page to get it and use it. Our page should have a head that looks like this:
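Roughly like this; the flot filename, the CGI path, and the exact options are assumptions on my part:

```html
<script type="text/javascript" src="jquery.js"></script>
<script type="text/javascript" src="jquery.flot.js"></script>
<script type="text/javascript">
$(document).ready(function() {
    $.getJSON("/cgi-bin/stats.py", function(json) {
        var data = [{ data: json.data,
                      lines: { show: true },
                      points: { show: true } }];
        var options = {};  // nothing too exciting
        $.plot($("#plot"), data, options);
    });
});
</script>
```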

If you care about your visualization page being usable in Internet Explorer you'll need to fake support for the canvas element using the excanvas JavaScript implementation supplied with flot.

So first we have the data variable being set according to the flot API. json is the actual JSON object we got back from Python, so we can refer to it like any JavaScript object, and .data is the list of data we generated in the CGI. Then we set some options (nothing too exciting) and draw the plot in an element with an id of plot... Oops, better create that.
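In the body of the page, something like:

```html
<div id="plot" style="width: 500px; height: 250px;"></div>
```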

Flot requires a hard-coded size for its containing element, so we make it 500×250.

The Extras

Done! Going to the page should produce a graph of all data collected. Unfortunately since there's only one data point (the one we just generated in the test) the graph will be quite ugly and useless. We can do lots of dynamic things with flot such as constraining (or expanding) the xaxis, changing how it looks and much more. If you want to test it with some 'real data' you could use a Python script along the lines of:
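A sketch of such a seeding script, again assuming the `hits` table from earlier (the page name and IP are just filler values):

```python
# Seed ~1000 fake hits spread over the last three months for testing.
# Table name and filler values are assumptions.
import random
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect('stats.sqlite3')
conn.execute("CREATE TABLE IF NOT EXISTS hits "
             "(page TEXT, timestamp INTEGER, ip TEXT)")
for _ in range(1000):
    # A random number of days before today, 0..90 (0 == today).
    day = date.today() - timedelta(days=random.randint(0, 90))
    conn.execute("INSERT INTO hits VALUES (?, ?, ?)",
                 ('index.html', day.toordinal(), '127.0.0.1'))
conn.commit()
conn.close()
```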

This will insert 1000 datapoints into the table, each with a timestamp a random number of days before today (including 0, i.e. today) within the past 3 months or so. Now when you view the graph it should look a lot more... like a graph.

The Rest

This sort of thing should be faster than Google Analytics and more useful than Apache access logs. There are limitations, however. Simultaneous writes to the database will cause Python exceptions ('database is locked', generally), but since this is an asynchronous request it won't cause anything worse than a JavaScript error. You could harden this a bit by putting the insert in a try...except and either waiting and trying again or just discarding the hit.
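That retry-or-discard idea might look like this hypothetical wrapper (not the article's code):

```python
import sqlite3
import time

def safe_insert(conn, row, retries=3):
    """Try the insert a few times, sleeping briefly on a locked
    database, then give up and drop the hit. Hypothetical helper;
    the "hits" table is assumed from earlier."""
    for _ in range(retries):
        try:
            conn.execute("INSERT INTO hits VALUES (?, ?, ?)", row)
            conn.commit()
            return True
        except sqlite3.OperationalError:  # e.g. 'database is locked'
            time.sleep(0.1)
    return False  # discard the hit rather than error out
```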

Also, you'll quickly notice that if there were 0 visits on a day, the graph doesn't show the line dipping to the bottom; it just skips that datapoint and draws a line to the next one. The system could easily be modified to accommodate this. The idea would be to skip selecting distinct timestamps from the database and instead loop through all dates in the range you care about (say, 3 months), outputting the number of records found with each timestamp (meaning that '0' would be a valid number of records). This would probably be at least a little bit slower. Maybe a better approach would be to add a feature to flot that optionally treats missing datapoints as 0.
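The loop-through-all-dates variant could be sketched like so (again assuming the `hits` table from earlier):

```python
# Alternative report: one datapoint per calendar day in the range,
# including days with zero visits. A sketch, assuming the "hits" table.
import sqlite3
from datetime import date, timedelta

def counts_with_zeros(conn, days=90):
    today = date.today()
    data = []
    for offset in range(days, -1, -1):  # oldest day first, today last
        stamp = (today - timedelta(days=offset)).toordinal()
        (count,) = conn.execute(
            "SELECT COUNT(*) FROM hits WHERE timestamp=?",
            (stamp,)).fetchone()
        data.append([stamp, count])  # count may legitimately be 0
    return data
```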

That said, this is supposed to be a simple, lightweight system for small websites (like most of these articles) that just need some painless 'dynamism'. If you need full-fledged reporting on site usage, there are already services like Google Analytics. I think something like this could be useful as a bit of a happy medium.
