Akismet Spam Stats

Sunday, November 23, 2008 0:28 | Filed in Scams & Spams

Akismet Stats, and where to find 'em (flickr)

I’ve just noticed a new feature on my blog, the Akismet spam stats.

From looking at the Akismet blog, I’ve realised that it had been there for about a month before I got round to noticing it. Still, it has pretty graphs on it.

It shows a breakdown of your comments by spam (unwanted comments); ham (genuine comments); spam missed by Akismet (stuff you manually marked as spam); and genuine comments marked as spam which you later saved.

Akismet claims to have an overall accuracy rate of 99.718% for my blog, which is pretty darn impressive. It also claims not to have had any false positives this year, though, when as far as I can recall there have been at least two comments initially identified as spam which I later rescued (although I guess it’s possible that these were mis-identified by one of my other spam filters, so maybe I shouldn’t blame Akismet).

The sheer number of spam comments caught is quite impressive:

Akismet Stats: Daily Spam (flickr)

Again, I’m not 100% sure of these statistics; they would seem to suggest that I’m averaging somewhere in the region of 100 spams per day (with a drop more recently to around 50/day). This could just about be possible, and I’ll have to investigate my spam filters thoroughly next time I unclog ’em, but it seems like too many.

Akismet Stats: Ham-to-spam ratio (flickr)

It’s also interesting to see the ratio of ham-to-spam (ham, for those uninitiated into geekspeak, is used to represent ‘legitimate messages’, as a sort of opposite to spam).

One of the interesting things about this is that it claims my ratio is 9.69% ham and 90.31% spam, approximately what I would have estimated. However, and this is where it gets a little weird, in total it has found 58,223 spam and 2,519 legitimate comments, so the overall ratio is something like 4.1% ham to 95.9% spam.
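For anyone who wants to check the overall ratio, here is a minimal Python sketch using the two all-time totals quoted above. The function and variable names are my own, purely for illustration; nothing here comes from Akismet itself.

```python
# All-time totals as reported on the Akismet stats page (figures from this post).
spam_total = 58_223
ham_total = 2_519


def ham_share(ham, spam):
    """Return ham as a percentage of all comments (ham + spam)."""
    return 100 * ham / (ham + spam)


overall = ham_share(ham_total, spam_total)
print(f"ham: {overall:.1f}%  spam: {100 - overall:.1f}%")
# prints "ham: 4.1%  spam: 95.9%"
```

That is a long way from the 9.69% ham shown in the pie chart.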

Fair enough, it says that the pie charts just represent stuff caught since May 2008, so it’s possible that the ratio has changed. Yet if I look at the stats it shows me…

Month            Spam Detected   Legitimate Comments
November 2008            1,385                    89
October 2008             4,179                    84
September 2008           4,563                    90
August 2008              4,547                    64
July 2008                2,604                    90
June 2008                2,708                    70

Now if you can be bothered to do the maths, this would show you that since the start of June, there have been 20,473 comments posted in total, of which only 487 have been legitimate. My mental arithmetic would give me a figure of “around 2.5% ham” by approximation, and indeed by using a calculator I come out with a figure of about 2.38% ham.
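If you can’t be bothered to do the maths by hand, a rough Python sketch like the one below totals the two columns of the table and works out the ham percentage. The dictionary layout and names are just my own choice for illustration.

```python
# Monthly figures copied from the table: month -> (spam detected, legitimate comments)
monthly = {
    "November 2008": (1_385, 89),
    "October 2008": (4_179, 84),
    "September 2008": (4_563, 90),
    "August 2008": (4_547, 64),
    "July 2008": (2_604, 90),
    "June 2008": (2_708, 70),
}

spam = sum(s for s, _ in monthly.values())  # total spam since June
ham = sum(h for _, h in monthly.values())   # total legitimate comments since June
total = spam + ham
ham_pct = 100 * ham / total

print(f"spam: {spam:,}  ham: {ham:,}  total: {total:,}")
# prints "spam: 19,986  ham: 487  total: 20,473"
print(f"ham share: {ham_pct:.2f}%")
# prints "ham share: 2.38%"
```

Either way, the monthly figures come out at well under 3% ham.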

So where did the figure of 9.69% come from in the pie chart? Are the Akismet stats completely and utterly buggered up if you use any additional spam filters, or are they just buggered up in the first place? Do they maybe just produce some random, pretty looking graphs, and are they secretly waiting to see whether or not anyone has actually noticed that they are incorrect?

Who knows? And who cares? Let’s face it, they do look pretty.
