SuperMondays: Databases

Tuesday, September 29, 2009 7:53 | Filed in Local Interest, Technology

Super Mondays

Super Mondays is a “strong and vibrant IT community based in the North East Of England”. At least, that’s what it says on the tin. But having not been to a Super Mondays event before, I didn’t know that myself for sure. Which was one of the reasons (the others being networking, and an interest in the subject) that I decided to go along to the Super Mondays event on the 28th Sept about databases.

I had been booked to attend one of these things before, but had to cancel quite late on, and so with due trepidation I set off to make my way to the Bedson Teaching Centre at Newcastle University for the SuperMondays event on databases. Unlike the last two events I attended in Newcastle, I neither had any problems finding the location, nor did I discover at the last minute that the location had changed, and so I actually managed to get there without difficulty and soon found myself in a lecture theatre with around 60 or so other people, and coffee and sandwiches.

Eschewing (rather than chewing) the sarnies, I plumped for a nice cup of coffee and was soon sat down at one of the lecture tables waiting for the thing to start. Looking around the room, I saw a wide variety of people, ages, and professional appearance: from men in suits to men in hats, probably-still-in-college to approaching retirement, and of course the usual disparity of the sexes, with only about 10% women.

And I couldn’t help but think to myself: I wonder who these people are. What do they do? Since I’m now self-employed, are we looking at people who are games designers, entrepreneurs, investors, or (shudder) potential competitors?

Another thing I didn’t know about these events is how technical they are. Would I be presented with some form of high-level overview, or were people quickly going to get into the technical details of the different forms of databases? And what were other people expecting — and hoping — to get out of the events?

One thing I did find surprising was that when the event started, I was the only one with a laptop out. Was no-one else planning to blog about this? Were they all iPhoned up? Or did they just not bring their gadgets along to this sort of thing? Call themselves techies…

Anyway, ’twasn’t long before the event proper kicked off… as usual for this kind of thing, these are my impressions of what was said, with a few random thoughts of my own thrown in. It’s always possible I may have misunderstood what someone has said…! You may also want to read Daniel Swan’s take on it… or Steven Woods’, who reveals that there was in fact at least one more laptop…

A History Of Databases — Ross Cooney

Where did databases come from? Why are new databases needed and what is the theory behind them?

  • In the 60s, databases were huge and monolithic, only used by government and large corporations
  • The 70s saw the development of much database theory
  • The 80s saw the development of SQL, and stuff such as dBase III and IV
  • The early 90s saw more complex systems with less competition
  • The late 90s saw the .com boom, with massive investment and stuff like .asp, .cfm and .php launched
  • The 00s (post boom) saw investment down but data use up: now pretty much sewn up by 4 providers — IBM, Oracle, Microsoft and mySQL

Then there was some talk about the idea of the ACID database, which is generally what most people think of when they think about databases: it is Atomic (a transaction is either written in full or not at all: there is no ‘nearly’ state); Consistent (every transaction leaves the data in a valid state); Isolated (transactions running at the same time do not interfere with each other); and Durable (once written, data stays written and can be re-read until removed by a later query).

The idea is that this type of database is solid and reliable — they were written for banks and governments.
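To make the "Atomic" part concrete, here is a minimal sketch using Python's built-in sqlite3 module. The accounts table, the names and the amounts are all made up for illustration; nothing like this was shown on the night. The credit is deliberately applied before the debit, so that when the debit breaks the "no negative balance" rule the database has to undo the credit as well.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move money between two accounts: both updates happen, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            # Credit first, so a failed debit leaves something to roll back.
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance.
        return False

def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))

# Hypothetical data: an in-memory database with two accounts.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # fits within alice's balance
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice
print(ok, failed, balances(conn))
```

After the second call, bob's balance is unchanged: the credit that had already been applied was rolled back along with the failed debit, which is the "no ‘nearly’ state" idea in practice.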

What it doesn’t do as well is online transactions on a massive scale — it is simply not scalable enough. This brought us onto the CAP Theorem which basically states that you can only have any two of these:

  • strong consistency
  • high availability
  • partition tolerance

Coping without high availability is not a problem for banks (indeed, they frequently have overnight downtime); but it is a problem for high volume global applications: the likes of eBay, Facebook, Amazon and Google. It can be solved in different ways — apparently Facebook use a combination of almost 2000 mySQL databases to run — but in particular for those where database consistency is less important, you have BASE databases.

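In one of these, uptime and scalability are the important factors, but database consistency not so much… they have eventual consistency: different copies of the data may briefly disagree, but they converge once the updates have propagated.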
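For the curious, eventual consistency can be sketched as a toy in a few lines of Python. This is only an illustration under loud assumptions: the Replica class, the version numbers and the "newest version wins" rule are my own simplification, not how Amazon, Google or anyone else mentioned on the night actually does it.

```python
class Replica:
    """One copy of the data; each key maps to a (version, value) pair."""

    def __init__(self):
        self.data = {}

    def write(self, key, value, version):
        # Assumed rule: keep an entry only if it is newer than what we hold.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def sync(a, b):
    """Exchange entries in both directions so each side keeps the newest."""
    for key, (version, value) in list(a.data.items()):
        b.write(key, value, version)
    for key, (version, value) in list(b.data.items()):
        a.write(key, value, version)


# Two hypothetical data centres holding the same shopping basket.
london, dublin = Replica(), Replica()
london.write("basket:42", "1 book", version=1)
sync(london, dublin)

london.write("basket:42", "2 books", version=2)
stale = dublin.read("basket:42")  # the update has not reached dublin yet
sync(london, dublin)
fresh = dublin.read("basket:42")  # after syncing, both replicas agree
print(stale, "->", fresh)
```

Both replicas stay available for reads and writes the whole time; the price is that a read in between may return the older value.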

Having said that, the actual degree of scalability is important: for example, with the right hardware infrastructure, mySQL and other enterprise systems can handle very large quantities of queries (> 100,000 a day). But for the big big scale stuff (Amazon, Google etc), it ain’t good enough. What BASE databases are for is “hugely scalable but pretty much disposable data”.

And then there was a statement which seemed to apply…

If you are a startup…

Yes.

…and have plans of taking over the world…

Yes.

…and plans of being the next google…

Bugger off! You think I’m wasting my time answering people’s questions? No, they can damn well do what I tell ‘em, and like it. I was more planning to go for the hollowed-out-volcano-in-the-Pacific-with-death-ray-missiles route to world domination…

But maybe I’ll wait and see what the next speaker has to say first.

Amazon SimpleDB and the Google App Engine — David Lavery

This looked as though it was going to start off with a minor technical hitch: it appeared that the projector was only willing to show the left-hand half of each slide, but fortunately this was resolved before we started. David started off by saying he was going to tell us how he got there. Well, I dunno about him, but I got the bus.

But then he brought up a screen shot for his next slide, and it was all I could do not to applaud. It was a screenshot of some COBOL code, showing the Identification Division down through to the Working Storage Section. For those of you who had their first introduction to computing for business with COBOL, you’ll probably be feeling a warm glow of sentimentality at this point. And then if you think how clunky the language was, possibly nausea.

Interestingly enough, David’s route through databases seems to be much the same as mine: starting from COBOL keyed files, then on to CODASYL databases (in my time at Northern Rock, we invariably used IDMS), with all the data definitions and Bachman diagrams. But still, we ain’t seen nothin’ yet.

From there to relational databases: the use of SQL, with simpler concepts than the whole CODASYL type stuff (foreign keys instead of child and parent records). From here to open source databases, with web friendly, open licensing: mySQL, PostgreSQL and so on.

And that’s where we are. But they are still relatively monolithic so cause backup/continuity issues, escalating costs, difficult schema changes and so on when you try to scale up significantly. Which is why some people are looking to use other options.

So what’s attractive about Amazon Simple DB and Google App engine?
  • Schemaless
  • Simplified feature sets
  • Some mimicry of RDBMS — sql type queries
  • distribution of data
  • automatic indexing
  • scalable
  • pay per use with free quotas
  • high availability

Google supports a full stack, partially-ACID system, with a few different data types, and with APIs for Python, Java, Ruby and stuff. Data may be stored anywhere. In comparison, SimpleDB offers European storage as a possibility (worth noting from the point of view of data protection?), is non-ACID (offering eventual data consistency), supports anything with an HTTP type call, and everything is string data.
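The "schemaless" and "everything is string data" points have a practical consequence worth a quick sketch. The Python below does not call SimpleDB at all: it just models items as dictionaries of strings (the part names and prices are invented) and shows why numbers compared as strings need zero-padding before range queries behave.

```python
# Hypothetical items: no fixed schema, and every value is a string.
items = {
    "part-1": {"name": "bolt", "price": "9"},
    "part-2": {"name": "nut", "price": "10", "colour": "silver"},  # extra attribute
}

def cheaper_than(items, limit):
    """Compare prices as strings, the way a string-only store would."""
    return sorted(key for key, attrs in items.items() if attrs["price"] < limit)

def pad(number, width=6):
    """Zero-pad a non-negative integer so string order matches numeric order."""
    return str(number).zfill(width)

# Naive comparison: as strings, "9" sorts after "10", so nothing matches.
naive = cheaper_than(items, "10")

# Store padded values instead, and pad the limit in the same way.
padded = {
    key: {**attrs, "price": pad(int(attrs["price"]))}
    for key, attrs in items.items()
}
fixed = cheaper_than(padded, pad(10))

print(naive, fixed)
```

The fixed width of six digits is an arbitrary choice here; it needs to be at least as wide as the largest value you ever expect to store.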

And this then took us to the final database piece for the evening…

RAQUEL Database Management System – David Livingstone

I don’t know whether David has a PhD, but given his role at Northumbria University it seems reasonable to assume he does, so it’s Dr. Livingstone, I presume. You know, someone who turns up and gives some of their time to telling the rest of us about their database system deserves so much better than that, don’t they?

RAQUEL is an open source project, so follows the standard plan:

  1. produce an interesting (not necessarily fully working or bug-free) prototype
  2. publish prototype
  3. grow user-developer group to evolve into commercially successful project
  4. set up company to support users of the project (software still open source however)

David explained that they had finished the second stage, and were just moving onto the third: they are actively seeking contributors to the user-developer group. Of course, like no doubt many people in the room, I’m thinking what’s in this for me? Why should I join this developer group — go ahead and convince me that skills in RAQUEL will prove to be more commercially (or ‘out of personal interesty’) useful than standard SQL type stuff.

So then, David moved on to telling us what RAQUEL was all about.

  • It’s a database programming language based on relational algebra
  • It sticks strictly to relational theory (in a way SQL doesn’t always)
  • It is a database management system, composed from lego-like building blocks
  • It is intended to be far more powerful and simpler than SQL, allowing greater DBMS functionality and greater programmer productivity
  • It is intended to have a flexible architecture, so you can tune the deployment to your needs, and it should have reduced DB maintenance
All well and good, in theory. I was thinking that statements such as “far more powerful and simpler than SQL” seemed to smack a little of arrogance, when you’re looking at something with a massive community, which has developed over time and has some massive companies involved in it, and here’s Northumbria Uni saying “we can do better”. Mind you, I’m not saying they can’t do better, merely that it would be quite some piece of work to convince people of this…

RAQUEL has only three things: operators, assignments and relations. David showed us some sample pieces of code.

Insert:
Parts <--Insert {some value}

Retrieve:
s <-- Retrieve Parts Restrict[PType=1] Project[PName]

    RAQUEL code examples
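As a rough illustration of what the Retrieve statement appears to be doing, here are the two relational algebra operators it names, Restrict and Project, sketched in Python. This is my own reading of the slide, not RAQUEL's implementation: the function names, the PNo attribute and the sample rows are all invented for the example.

```python
def restrict(relation, predicate):
    """Keep only the rows for which the predicate holds."""
    return [row for row in relation if predicate(row)]

def project(relation, *attributes):
    """Keep only the named attributes, dropping any duplicate rows."""
    seen, result = set(), []
    for row in relation:
        key = tuple(row[name] for name in attributes)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result

# Hypothetical contents for a Parts relation.
parts = [
    {"PNo": 1, "PName": "bolt", "PType": 1},
    {"PNo": 2, "PName": "nut", "PType": 2},
    {"PNo": 3, "PName": "bolt", "PType": 1},
    {"PNo": 4, "PName": "washer", "PType": 1},
]

# Roughly: s <-- Retrieve Parts Restrict[PType=1] Project[PName]
s = project(restrict(parts, lambda row: row["PType"] == 1), "PName")
print(s)
```

Note that "bolt" appears only once in the result. In relational theory a relation is a set, so a projection cannot contain duplicates, whereas a plain SQL SELECT keeps them unless you ask for DISTINCT.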

At this point I briefly looked around and noted that there were still no other laptops on show. Nor does anyone appear to be tweeting regularly. Tcch. You may be techies, but your geek quotient is obviously way down. There did not even appear to be anyone dressed as a Vulcan (note, while not a trekkie, I still qualify for full-on geek points as I am a fan of a different science fiction programme instead). Although to be fair, Ross does seem to at least be recording the event on some little gadgetty thing precariously balanced on the front desk.

David is obviously quite inspired by his RAQUEL tool, telling us various things which it offers — different join types and so on. Unfortunately, I found myself unable to get similarly inspired about this: yes, I know they’ve gone to great trouble to develop it, it can’t be easy to develop something like this from scratch, and I’m sure they’ve done a fine job, but I can’t really get excited finding out that they have developed a feature which I would expect as standard in pretty much any database system I wanted to use.

But just because I’m not enthused doesn’t mean other people might not be. They are looking for:

  • Users with interesting data storage problems
  • People who would like to test the prototype
  • Developers with C++ skillz for new modules, or to review, test and debug existing ones
  • People with experience of open source/SourceForge

Again, forgive my cynicism, but this is sounding to me like “come and develop our application for us, so we can set up a company to make money out of it when it is finished”. I’m sure that wasn’t the intention, particularly since David seems to focus very much on the “strictly relational theory” bit, to the extent that it sounds at times more like an academic exercise than an attempt to develop a commercial product.

There were also a few questions from the audience.

Have you implemented an equivalent to LIKE for where-type clauses?

No, but there’s no reason why this can’t be implemented later: it is just a small prototype at present.

What’s the motivation? Why build an entirely new system from scratch instead of layering it on top of some existing system?

[Can't remember the exact answer, but it was along the lines of 'pure relational theory is much more powerful than SQL']

That’s as maybe, but when you consider that everyone already uses SQL, you’re going to have a significant amount of inertia: the project is going to have to be pretty kick-ass to encourage people to use it instead. And it’s not yet making me sit up and take notice.

Do you see this more as an academic exercise or something that would be used in the real world? [I had been wondering the same thing myself]

It is seen as something to be used commercially, at least in the long run: the language is simpler and more powerful, and it is also more modular.

And that was about it for databases. The final topic was to tell us about SuperMondays itself: where has it been, and where is it going?

SuperMondays Now And Next

This was the 12th SuperMondays event in 12 months: it is growing and becoming more popular. There are challenges: how to keep it fresh; how to keep feeding the event, but it’s important to reference everything back to the original goals:

  • to provide a meeting place for techies to meet up, come together and learn
  • to act as an advocate for open source software

[that second point is interesting: I'm quite happy to use open source, and I will indeed recommend some open source products, but this is on a product-by-product basis, rather than assuming open source is inherently better than closed-source products -- so it's worth noting that there is likely to be an open source bias at these events]

Alex and Ross took the time to thank people who had contributed over time: speakers, who had given up their time, sponsors, who had provided sandwiches, Newcastle University for providing the venues, and of course the people for turning up (as the events wouldn’t have worked quite so well if no-one turned up).

The plans for the future are to:

  • run more events around the region
  • run different events
  • run larger events
  • get national and international speakers

Obviously, this sort of plan requires resources in terms of events management (which is why my old mates at Sailor Girl are now involved), marketing, PR and so on — it all needs money. Sponsorship requires a corporate entity (much as Ross and Alex would presumably be quite happy for companies simply to hand them a pile of cash), and structure and direction are required for stability and continuity.

So it’s been set up as a not-for-profit Community Interest Company (CIC). It’s a limited company, specifically for community interest and financially has ‘asset lock’ (i.e. monies can only be used for the community interest, not to provide beer for the board). There are currently two directors and four further board members but at present there does not appear to be any representation from the public sector in the North East, which is a bit of a downer — yes, there may be a vibrant IT community with entrepreneurs and games designers in the private sector, but there are quite a lot of people with quite a lot of skills in the public sector and it’s a shame to leave this untapped.

Having said that, this doesn’t appear to be the fault of Super Mondays: there did not seem to be anyone attending the event directly from the public sector. Why not? Everyone else has given up their time to come — I can’t believe that no-one from the public sector would be willing to come, so that would rather suggest that maybe it’s just not as well known there.

SuperMondays are going to be running an open book on the finances. I did have a mental image that they would be offering odds on how well the company’s finances were doing, but it would appear that instead they are simply going to be putting the accounts online at some point.

The costs to host the event are currently quite low (around £300 for the grub) as Newcastle University were not charging for the venue. Other events at more commercial venues would obviously cost more…

SuperMondays are held on the last Monday of every month. Do keep your eyes and ears open.

[Note to SuperMondays people: I have hotlinked to your image. If there is any problem with this, just let me know and I'll remove it]


8 Comments to SuperMondays: Databases

  1. Ross Cooney says:

    September 29th, 2009 at 10:08 am

    Hi Jack,
    Thanks for coming along and writing this detailed review of the event, would you mind if I used some of the content in this post on the supermondays.org website? I will obviously reference you as the author.
    Ross Cooney

  2. Steve Woods says:

    September 29th, 2009 at 11:14 am

    I too expected a zillion laptops and thought I was the only one who’d brought one. I was fully kitted out with a PDA, eePC, digital camera … I did walk faster than usual back to the car park I have to say :)

    Excellent overview – well written :)

  3. Zack says:

    September 29th, 2009 at 5:31 pm

    Jack, well written summary – I feel like I really have the flavour of discussion even though I wasn’t there. Sounds like a really interesting event with plenty of real-world discussion – data model meets data centre is always a fun time and always the hotspot of a delivery, whether greenfield or upgrade. This is generally because the two are considered in isolation until far too late in the cycle, although this is changing (I like to think in part because I’ve done so much shouting about it over the years.)

    Because I wasn’t there, I’ll have to interject here instead…

    “Coping without high availability is not a problem for banks”

    Untrue as we approach the Teenies (if this is the Noughties…) for two reasons.

    Firstly, there is an increasing demand for day+1 or day+0 data, and eliminating the batch window altogether is an aspiration for many financial organisations.

    (Or at least it was, before the focus changed to hoarding cash – the perception is unfortunately changing back to seeing “IT” as a pure cost rather than an opportunity to manage other costs and generate value. A cynic might say this was due to blind panic in the boardroom.)

    Secondly, there are two issues of scale. Moore’s law brings down cost-per-TB and brings up general capability, but in data warehousing there is a strong tendency for Parkinson’s law to work in the reverse direction, and so the batch window stands still like the Red Queen. We still have 2-4 hours in most organisations and rather than the time shrinking, the window is treated as the constraint and there is a desire to do as much data movement as possible in the time.

    The other issue IS one of good old fashioned “backup.” If you have 20TB of data to back up every night, how the hell do you do it?

    My answer is to divine what you actually need to back up because it does not exist elsewhere, and which other portions could be recreated from the source within the RTO. HSM and other creative disk-based solutions can also be a partial answer, but banking-proof bandwidth between sites remains expensive. Never underestimate the bandwidth of a lorry full of tapes; but the scale of warehouses is outpacing the growth of tape technology.

    So the concept of a backup window remains with us, because divining what NEEDS to be protected at the raw data level, and then backing up only changes, is still in the “too hard” pile for many organisations.

    So – the RAQUEL model and similar ideas, such as Kognitio’s technology, are fantastic opportunities, but the operational implications are complex to get right and if we are not careful will kill the concept altogether for the corporate world. Google and Amazon have demonstrated that you can do it right, and in the case of the latter, Amazon have got scalability so right they sell it to other people now as a technology company, rather than as a purveyor of books and various other FMCG.

    Many banks, on the other hand, have demonstrated how wrong you can get it and have spent an awful lot of money with IBM, Oracle, Teradata, and co for very little real gain.

  4. Candice says:

    September 30th, 2009 at 2:36 am

    I think this article made some interesting points. I read a textbook directly related to this topic; it’s called Introduction to Languages and the Theory of Computation, by John Martin.
