Robert on Code: Lesson 1 – Don't put structured data in a database in an unstructured way

Greetings Geeklings

As a solitary programmer working from my suburban home, I generally get to avoid the idiocy that is 'other people'.

Alas, from time to time the foolish behaviour and general incompetence of others manages to reach into my domain and entirely screw up my day. Sit back in your chair and let me tell you the story of one of these invasions.

For some reason I had the misfortune to need to venture out from my office (read home) and go into THE OFFICE (read salary payer HQ). There was some kind of 'vital meeting' or some other such nonsense. I happened to overhear some colleagues discussing a performance issue they had encountered on a client site. Being nosey and helpful in equal measure I offered the following snap diagnosis – 'configuration issue'. I then cemented my fate by offering to go to said client site and make the necessary configuration amendments.

On site it quickly became apparent that this was no configuration issue. Damn.

I spent a whole day on site trying one thing, then another, then ANOTHER until I wanted to die just so SOMETHING happened.

Eventually I had enough 'information' to take home and replicate the issue and hopefully sort this bastard out.

Half a day of solid hand ringing and desk bashing later I nailed the issue.

So what was it? I hear you scream with unparalleled curiosity and desperation. Well, as they say on Facebook 'it's complicated'.

Deep breath.

Here goes.

First up, suspect A, the SQL. What was the SQL doing all that time? Well it turned out it was executing a criminally inefficient LOOKUP. Bad SQL. Go to your room.

Moving on to suspect B, the Data Structure. What kind of higgledy piggledy system were we dealing with? Is it really just cobbled together from old boots and string? Well, in a way. All the data we needed for this LOOKUP was held in a string. No, you read that right. A string. A bloody string. No wonder the SQL was struggling.

So I've diagnosed the issue. That leaves me with a new problem; fixing it.

So how do you fix something so monumentally irritating? Well what you would ideally do is re-structure the data and re-write the query. Pretty time consuming, but not too tricky.

Unfortunately we do not live in an ideal world and this is not an ideal situation. This application is not mine to fiddle about with no matter how much I want (need) to. This application is owned by another company so really, they need to fix it. Obviously they won't. I don't need to get into the politics here, or mention how great I am, but they are finding it all a bit taxing. Bless.

So instead we have the following situation. Our client needs this performance issue resolved. It is causing major headaches. The most obvious solution is impossible, so instead we have to get a little bit more creative with our solutions.

First up on the ideas list – database engine tuning. Is there a way of making that query fly faster? Answer – no. Fiddlesticks

Adding an index, of any kind, failed. Indexes and clustered indexes both made no difference because the query used a wildcard at the start. This forces the database engine to read the full string of each record not already ruled out by a different clause. This is as slow as it sounds.

A full text index isn't supported by the 'legacy' database engine they're using, SQL Anywhere (v9). So that's obviously not going to work either. So no indexes.

Due to my lack of experience with this database engine I couldn't think of any other feasible optimisation method. So, what to do?

It seemed like fiddling with the application was the only possible solution. To keep the fiddling to a minimum I looked into creating functionally equivalent SQL to replace ‘Suspect A’.

The database engine had to give me some clues so unfortunately some dull research was required. The documentation eventually gave me the idea to replace the LIKE statement with the LOCATE function. This allowed me to determine if a value existed in the string. This magic wand gave me 3x better performance. Great you say. NOT GOOD ENOUGH says I.

This battle is not won yet.

I tried to create a VIEW in the hope that the database engine would be able to cache the results and ultimately improve the speed of the query. The VIEW worked like this: break the string down so that each key in the string represents a field and place the paired value in its relevant record. What a waste of time. No difference in performance.

This is getting annoying.

I eventually decided to use a series of computed fields, as they would allow me to index the values. I took the code from the VIEW I created earlier and used each field to create a 'computed field version' which I placed in the original table. I subsequently updated the queries to use the new computed fields. This took the query performance to sub 1/10th of a second. Hurrah!

So all finished? Standing ovation time? Well no. It's not deployed yet, and it's going to be a bugger to maintain. So it's not all rainbows and unicorns just yet. The final decision is in the hands of 'the business' as to whether this (pretty classy) fix is handed over the the owners of the application or not. I'll keep you posted.

Robert on Code

24 July 2011

Lesson 1 – Don't put structured data in a database in an unstructured way

No comments:

Post a Comment