Ok, so maybe in a perfect world databases would simply be for storing data. In the real world, however, tons and tons of business logic has been shoehorned into stored procedures, views and functions.
So how should the pragmatic coder approach properly testing these guys?
The right answer lies in basic TDD methodology. Define the functional requirements for the procedure. Write a test for the procedure, then write the procedure itself. Part of the functional requirement should be a listing of all external objects (tables, views, other procedures) that the object uses. If the object suddenly, say, joins to another table which is not in that list - the test should fail, as the code does not meet the functional design any longer.
To enforce this, the best approach is to create a sandbox database, populate it with only the objects the stored procedure depends on - mocking those objects if need be - and run the stored procedure there.
To illustrate: let's say we have a stored procedure "sp_GetPhoneNumber". This procedure takes a last name and first name as arguments, and returns a phone number. Now let's imagine the database has a table called "contact_info", which contains columns for last name, first name, and phone number. Let's further imagine this contact_info table has a foreign key constraint on another table, called "customers".
We then say the functional design of "sp_GetPhoneNumber" is something like: "selects a record in contact_info by matching on the columns lastname and firstname, and returns the data in the column phone_number". From this simple design, we see there is only one external reference needed - "contact_info".
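To make that design concrete, here's a rough sketch in code form. This is Python with sqlite3 standing in for the real database engine, and sp_GetPhoneNumber modeled as a plain function; the table and column names come from the example above, but the sample row is made up.

```python
import sqlite3

# sqlite3 stands in for the real database engine in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE contact_info (
        lastname     TEXT,
        firstname    TEXT,
        phone_number TEXT
    )
""")
# Made-up sample row for illustration.
conn.execute("INSERT INTO contact_info VALUES ('Smith', 'Jane', '555-0142')")

def sp_GetPhoneNumber(conn, lastname, firstname):
    """Stands in for the stored procedure: exactly one external
    reference, contact_info, per the functional design."""
    row = conn.execute(
        "SELECT phone_number FROM contact_info "
        "WHERE lastname = ? AND firstname = ?",
        (lastname, firstname),
    ).fetchone()
    return row[0] if row else None

print(sp_GetPhoneNumber(conn, "Smith", "Jane"))  # 555-0142
```

If the function ever needed to touch a second table, that would be a change to the functional design - which is exactly what the test should catch.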
Now, if we approach this by testing against a full "live" database, it means we would need to populate "customers". Maybe customers has a foreign key constraint on "account", so we need to populate that too, and so on, and so on. If the schema of "account" changes now, the unit test fails - even though we don't give a flying fig about it. And if somebody comes along and dinks with the stored procedure so that it references some other table, the test will pass - even though the code no longer meets its design requirements.
Now, on top of that, let's imagine this database we're testing against is pre-populated with helper "baseline" data. Now we're really only testing a perfect-world scenario. What if there are no records in the contact_info table? That case never gets tested. In this scenario, we end up testing the "baseline test data population" scripts, not so much the code itself. So you basically just pat yourself on the back knowing that your code works under ideal, and static, conditions. Hoorah.
A better approach for the above would be to create a scratch database and create only the object "contact_info", without any foreign key constraints. We could even create it as a view which just spits out constant values; the stored proc doesn't care - it only wants a recordset-returning object with the expected interface (in this case, the interface is the column names and data types). Changes to unrelated schema don't affect the test, and the introduction of new object references to the stored procedure breaks the test.
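A minimal sketch of that mock, again using Python and sqlite3 in place of the real engine: the real contact_info table is replaced by a view spitting out one constant row, and the test runs the exact SELECT the functional design calls for. The constant values are made up for illustration.

```python
import sqlite3

# Scratch database: no real tables, no foreign keys, no baseline data.
conn = sqlite3.connect(":memory:")

# Mock contact_info as a view of constants. The code under test only
# sees the interface: column names and types.
conn.execute("""
    CREATE VIEW contact_info AS
    SELECT 'Smith'    AS lastname,
           'Jane'     AS firstname,
           '555-0142' AS phone_number
""")

# The exact SELECT the procedure is designed to run.
row = conn.execute(
    "SELECT phone_number FROM contact_info "
    "WHERE lastname = ? AND firstname = ?",
    ("Smith", "Jane"),
).fetchone()
assert row == ("555-0142",)
```

Because the view defines the entire visible schema, any new table the procedure starts joining to simply won't exist in the scratch database, and the test fails loudly.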
What if this proc inserts into a table with a trigger on it, you ask? Well, the trigger is a separate unit of code, and should be tested separately.
It boils down to a philosophy of a test only being able to access the data and dependencies it needs.
One could take the "mocked objects" approach to the next level, and set up an expected/actual type of situation. The table could be mocked by a view which, upon execution, records in some metadata table somewhere when it was called. If the stored proc calls other functions or stored procs, they could be mocked to return a constant, say, and log how and when they were called. They could even be implemented fully, with the logging code tacked on. The point is, these call-out stored procs aren't under test.
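Here's one way that logging-mock idea might look, sketched with Python and sqlite3 standing in for the real engine; sp_FormatNumber and the call_log schema are hypothetical names invented for illustration.

```python
import sqlite3

# A metadata table records which mocked dependencies got exercised.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE call_log (proc_name TEXT, called_at TEXT)")

def mock_sp_FormatNumber(raw):
    """Hypothetical mocked helper proc: returns a canned constant
    and logs the call. The real implementation is NOT under test."""
    conn.execute(
        "INSERT INTO call_log VALUES ('sp_FormatNumber', datetime('now'))"
    )
    return "555-0142"  # constant, canned result

result = mock_sp_FormatNumber("5550142")

# Expected/actual: the test asserts the dependency was called exactly once.
calls = conn.execute(
    "SELECT COUNT(*) FROM call_log WHERE proc_name = 'sp_FormatNumber'"
).fetchone()[0]
assert result == "555-0142"
assert calls == 1
```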
You don't run the entire application from its entry point, sending keystrokes and mouse clicks, to unit test a particular function call in a particular assembly - you mock the objects that assembly uses, set up your expectations, and compare them to the results of the call. Why would you treat business logic on the database any differently?
As a side rant, people tend to look at failing unit tests as a bad thing. They're supposed to fail; they should be as fragile as you can possibly make them. If anything changes that affects the functional design of the code under test, it should break. If the metric managers use to gauge unit testing is "number of passing tests written that always pass no matter what happens", they really end up with a bunch of clever permutations of "Assert.IsTrue(MyFartsDontStink)". In a project of any complexity, it should be nigh impossible to hack around shooting from the hip without breaking a bunch of unit tests.
At least that's how I roll.
Nintendo news: I don't really have any. I've been screwing around with the MAME code wrt reading the instruction EPROM on the PlayChoice-10 board; hopefully I'll be able to program my own instruction screen and game title soon, so it will show up as "NES Cart" or something on the menu. I'm committed to making it work without any changes to the board ROMs, or BIOS, if it's fair to call it that (it's not). I haven't abandoned the project, so be patient, PC10 fans.
Wednesday, March 26, 2008