Web Dynamism w/Python+jQuery - Part 3 April 26th, 2008
This is part 3 of a 3-part series. Check out part 1 and part 2.
The blurb and talk of requirements at the top of part 1 also apply to this article.
The Example
Let's do something more interesting this time. Remember hit counters? Years ago they were the way to measure traffic to your site. These days having a hit counter makes you look... dated, to say the least.
Today we have several options to keep track of site traffic:
- Sign up for Google Analytics. This has the advantage of offering tons of different metrics, but it can be overkill, and the JavaScript they stick on your page has a reputation for being a bit slow.
- Check the web server logs yourself. Obviously not ideal because there are much better ways to spend your time than sifting through raw text logs. You could also write a program to get the relevant data out of the logs for you, but it still lacks the 'cool factor'.
- Use whatever your web hosting provider supplies you with. Personally, I find those tools more painful than scrolling through logs.
- Completely ignore your site usage statistics. This one is probably not the best idea.
None of these are perfect solutions. Google Analytics works well enough for this blog and most of the websites I've designed for others, but an option lighter than Google's and more informative than raw Apache access logs wouldn't hurt. That's what we'll build now.
The Requirements
The requirements for this are the same as in the previous articles, with one difference and one addition. We'll use SQLite to store our usage information, which requires either Python 2.5 or a separate library. This site is hosted by BlueHost, and they only offer Python 2.3, so I had to install PySqlite; as a result, my code is ever so slightly different. I'll be listing the Python 2.5 version since that's really the better way to go. You might not even have the ability to install third-party Python libraries, so check with your web host for those details.
Also, as you might have guessed, we'll need SQLite itself. BlueHost supplies binaries already, so I didn't have to install it.
The Setup
Since we'll be using a database to store the data, we'll need to make sure that one exists. Write a little Python script (which you can run as a CGI if you don't have shell access to the server) that looks something like this:
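Something like this should do it (a sketch; the hits table and its columns are just the names I'll use throughout these examples):

```python
#!/usr/bin/env python
# One-off setup script: creates the stats database and table.
# On Python 2.3/2.4 with PySqlite, the import would instead be:
#   from pysqlite2 import dbapi2 as sqlite3
import sqlite3

conn = sqlite3.connect('stats.sqlite3')
# One row per hit: page name, date-normalized timestamp, visitor IP.
conn.execute('''CREATE TABLE IF NOT EXISTS hits
                (page TEXT, timestamp INTEGER, ip TEXT)''')
conn.commit()
conn.close()

# Header and output so this can also be run as a CGI.
print 'Content-type: text/plain'
print
print 'Done.'
```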
You'll notice there's now a file called stats.sqlite3 wherever you ran the script. So if you ran it from the shell, go ahead and copy it into your cgi-bin (or wherever, really; you can supply a relative path to the connect method).
Now you can either delete this script or leave it there; since it will only create the table if it doesn't already exist, there's no serious security concern here.
The Python CGI
Now to the real work. There are two approaches here: write two scripts, one to log visits and one to return results (on our 'reporting' pages), or just stick it all in one script. It's not especially long, so I'm going to put it all in one script.
First thing to do is handle the POST condition, when we're logging a visit:
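A sketch of that half of the script (call it stats.cgi; the hits table comes from the setup script above):

```python
#!/usr/bin/env python
# stats.cgi -- the logging (POST) half of the script.
import cgi
import os
import time
import sqlite3
from datetime import date

print 'Content-type: text/plain'
print

conn = sqlite3.connect('stats.sqlite3')

if os.environ.get('REQUEST_METHOD') == 'POST':
    form = cgi.FieldStorage()
    if 'page' in form:
        page = form['page'].value
        # Normalize the timestamp to midnight so hits group by day.
        timestamp = int(time.mktime(date.today().timetuple()))
        ip = os.environ.get('REMOTE_ADDR', '')
        conn.execute('INSERT INTO hits VALUES (?, ?, ?)',
                     (page, timestamp, ip))
        conn.commit()
        # Echo what we stored; the client will just discard this.
        print page, timestamp, ip

conn.close()
```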
Pretty straightforward. Do some imports of stuff we'll need and connect to the database. Then make sure we got a page name via the POST, and insert a row into the table. We just return a string echoing what we saved to the table; users won't see this because we will simply discard the result of the call on the client side.
There is a potential problem here: if a user just keeps refreshing the page (or worse, constructs their own page to do the call 1000 times) we could end up with garbage data. We could help deter this kind of abuse by making sure that the IP address doesn't already have a record in the database on this timestamp, as sketched below. A malicious user could still get around it by faking their IP, but it would help.
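For instance, something like this in place of the plain insert above (a sketch, reusing the page, timestamp and ip variables from the script):

```python
# Only record the hit if this IP hasn't already been logged
# on this timestamp (i.e. today).
already = conn.execute(
    'SELECT COUNT(*) FROM hits WHERE ip = ? AND timestamp = ?',
    (ip, timestamp)).fetchone()[0]
if not already:
    conn.execute('INSERT INTO hits VALUES (?, ?, ?)',
                 (page, timestamp, ip))
    conn.commit()
```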
Now that we've got it logging hits we can actually test it. Create or edit a page, include jQuery, and add the following to the head:
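Something along these lines (assuming stats.cgi lives in cgi-bin, and with index.html standing in for the page name):

```html
<script type="text/javascript" src="jquery.js"></script>
<script type="text/javascript">
$(document).ready(function() {
    // Log a hit for this page; the response just gets alerted for now.
    $.ajax({
        type: 'POST',
        url: '/cgi-bin/stats.cgi',
        data: { page: 'index.html' },
        success: function(result) { alert(result); }
    });
});
</script>
```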
Hit that page in your browser and you should get an alert box with a page name, a large number (the timestamp) and your IP address. Success! This data is actually in the database now; of course, we have no way of getting that information back out and viewing it yet, and that's the bulk of the rest of the example. Remember, when putting this code on a 'real' page, to remove the success: callback (and the trailing , on the data: line).
The really useful part of web usage statistics like Google Analytics isn't seeing log files and noting that "Oh, someone went to index.html on the 23rd at 6pm... How interesting!" It's being able to visualize that data over time. You want to be saying things like "Ah! My new content has increased my visits by 15% over the last month!"
The Visualization
Luckily for us, we're using jQuery already, and there's a really nice plotting library for it called flot. Go get it, and include it along with jQuery in a new page called something like view_stats.html.
We won't be able to visualize the data unless we actually have the data, so add the following code to your CGI, which will (on a GET request) return JSON describing the data in our database table:
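Here's a sketch of the reporting branch, shown standalone; in the one-script version it runs whenever the request isn't a POST. Since json isn't in the Python 2.5 standard library, the JSON string is just built by hand:

```python
#!/usr/bin/env python
# The reporting (GET) half: outputs {"data": [[timestamp, count], ...]}
import sqlite3

print 'Content-type: application/json'
print

conn = sqlite3.connect('stats.sqlite3')
points = []
# Every distinct (date-normalized) timestamp, oldest first.
for (ts,) in conn.execute(
        'SELECT DISTINCT timestamp FROM hits ORDER BY timestamp'):
    count = conn.execute('SELECT COUNT(*) FROM hits WHERE timestamp = ?',
                         (ts,)).fetchone()[0]
    points.append('[%d, %d]' % (ts, count))
conn.close()

print '{"data": [%s]}' % ', '.join(points)
```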
This code is a bit more complicated. First it gets a list of DISTINCT timestamps in the database (i.e. no duplicates). Then it uses that to get a count of the number of visits on each timestamp (timestamps are normalized to dates, and don't include times, as per the call to date.today() in the previous code). It appends that data to a list which gets output as JSON to the client.
Now that we can get the data out of the database we can write some JavaScript on our stats page to fetch it and use it. Our page should have a head that looks like this:
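Something like the following (jquery.flot.js is the file flot ships with; the series label and options here are just reasonable guesses at defaults):

```html
<head>
<script type="text/javascript" src="jquery.js"></script>
<script type="text/javascript" src="jquery.flot.js"></script>
<script type="text/javascript">
$(document).ready(function() {
    $.getJSON('/cgi-bin/stats.cgi', function(json) {
        // One flot series built from our [timestamp, count] pairs.
        var data = [ { label: 'Visits', data: json.data } ];
        var options = { lines: { show: true }, points: { show: true } };
        $.plot($('#plot'), data, options);
    });
});
</script>
</head>
```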
If you care about your visualization page being usable in Internet Explorer, you'll need to fake support for the canvas element using the excanvas JavaScript implementation supplied with flot.
So first we have the data variable being set according to the flot API; json is the actual JSON object we got back from Python, so we can refer to it like any JavaScript object, and .data is the list of data we generated in the CGI. Then we set some options, nothing too exciting, and do the plot in an element with an id of plot... Oops, better create that.
Flot requires a hard-coded size for its containing element, so we make it 500×250.
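In the body of the page:

```html
<div id="plot" style="width: 500px; height: 250px;"></div>
```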
The Extras
Done! Going to the page should produce a graph of all data collected. Unfortunately, since there's only one data point (the one we just generated in the test), the graph will be quite ugly and useless. We can do lots of dynamic things with flot, such as constraining (or expanding) the xaxis, changing how it looks, and much more. If you want to test it with some 'real' data you could use a Python script along the lines of:
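A sketch along those lines, reusing the hits table from before:

```python
#!/usr/bin/env python
# Fill the hits table with fake visits so the graph has data to show.
import random
import time
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect('stats.sqlite3')
for i in range(1000):
    # A random day within roughly the past 3 months, today included.
    day = date.today() - timedelta(days=random.randint(0, 90))
    timestamp = int(time.mktime(day.timetuple()))
    conn.execute('INSERT INTO hits VALUES (?, ?, ?)',
                 ('index.html', timestamp, '127.0.0.1'))
conn.commit()
conn.close()
```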
This will insert 1000 datapoints into the table, with the timestamp being a random number of days before today (including 0, a.k.a. today), within the past 3 months or so. Now when you view the graph it should look a lot more... like a graph.
The Rest
This sort of thing should be faster than Google Analytics and more useful than Apache access logs. There are limitations, however. Simultaneous write hits to the database will cause Python exceptions ('database is locked', generally), but since this is an asynchronous request it won't cause anything worse than a JavaScript error. You could reinforce this a bit by putting the insert into a try...except and either waiting and trying again, or just discarding the hit.
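Something like this around the insert from earlier (a sketch; one timed retry, then give up):

```python
import time
from sqlite3 import OperationalError

try:
    conn.execute('INSERT INTO hits VALUES (?, ?, ?)',
                 (page, timestamp, ip))
    conn.commit()
except OperationalError:
    # Probably 'database is locked': wait briefly and retry once.
    time.sleep(0.2)
    try:
        conn.execute('INSERT INTO hits VALUES (?, ?, ?)',
                     (page, timestamp, ip))
        conn.commit()
    except OperationalError:
        pass  # Give up and silently drop this hit.
```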
Also, you'll quickly notice that if there were 0 visits for a day, the graph doesn't show a line going to the bottom; it just skips that datapoint and draws a line to the next. The system could easily be modified to accommodate this: skip selecting distinct timestamps from the database and instead loop through all dates in the range you care about (say, 3 months), outputting the number of records found with each timestamp (meaning that '0' would be a valid number of records), as in the sketch below. This would probably be at least a little bit slower. Maybe a better approach would be to add a feature to flot to optionally treat missing datapoints as 0.
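A sketch of what that replacement loop might look like in the reporting branch (reusing conn from there, and assuming a 3-month window):

```python
from datetime import date, timedelta
import time

points = []
day = date.today() - timedelta(days=90)
while day <= date.today():
    ts = int(time.mktime(day.timetuple()))
    # 0 is a perfectly valid count here, unlike with DISTINCT.
    count = conn.execute('SELECT COUNT(*) FROM hits WHERE timestamp = ?',
                         (ts,)).fetchone()[0]
    points.append('[%d, %d]' % (ts, count))
    day = day + timedelta(days=1)
```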
That said, this is supposed to be a simple, lightweight system for small websites (like most of these articles) that just need some painless 'dynamism'. If you need crazy full-fledged reporting on site usage then there are already services like Google Analytics. I think something like this could be useful as a bit of a happy medium.