Databases

Databases pose an interesting problem. They are used to store large quanitities of similar data. Take a customer database for example. We want to keep a record of all of our customers’ information.

The first step is to determine what data is available to us. Then we look at how to organize our data into tables. There is a lot of discussion on 3rd normal form in terms of database orgainzation. Generally I agree with this, but there are practical reasons for diverging slightly from that in practice even those it is consistent with the concept. That is the use of ENUM fields instead of mini-tables (also referred to as code tables) when the list of values is relatively small.

In our customer database we can group information in the following ways:

  • data referring to the customer such as:
    • name
    • billing address
    • shipping address
    • phone
    • account number
    • primary contact name
    • account balance
  • data referring to customer transactions such as:
    • transaction number
    • order
    • returns
  • data referring to customer comments
  • data referring to customer notes

This isn't intended as an exhaustive list, but simply one we can use for this example.

Notice anything in this information? Some of the information seems to be similar in format to other information. Shouldn't these be grouped independently? Yes, they should.

So we can start to break this data into logical tables.

  • A customer table:
    • id
    • customer name
    • account balance
  • An address table:
    • id
    • customer id
    • type
    • street
    • city
    • state
    • postal code
  • A contact table:
    • id
    • customer id
    • type
    • name
    • work phone
    • cell phone
    • home phone
  • An order table:
    • id
    • order date
    • customer id
    • ship date
    • address id
  • A product table:
    • id
    • name
    • description
    • price
  • An order detail table:
    • id
    • order id
    • product id
    • quantity

OK, that's enough to work with.

Now you are asking yourself, "Didn't this just get really complicated?" The answer is a qualified "yes." OK, so why did we do it then? Aren't we always striving for simplicity? Of course, but sometimes you need to break things apart to put them together more easily.

Imagine for a moment we built a chair out of a solid piece of wood. If any part were to break it would be difficult at best to fix the chair, and the resulting fix would be obviously different from the original chair. This is true in a database as well. We want maintainability and flexibility over pure simplicity. At the same time, we've made each of the original blocks of data simpler as well.

This type of "deconstruction" of the data is commonly referred to as putting things in "Third Normal Form." In a pure implementation of this approach, the contact table "type" field would be a link to another table which contained all of the valid types. Being more practical, and since there is an SQL field type known as an ENUM, I generally use this instead to document small "code tables" that aren't used more than once.

OK, let's step through one example of retrieving an order for a given customer and the detail records for it. We'll specify the customer id and the order id for this example. It is not my intention to explain the details of SQL, especially since there are plently of resources that can do this quite adequately. I'm not going to explain all the details of connecting to a specific database or how to process queries, since those details are both language and database specific. Instead I will use generic function names in the examples so that those details don't get in the way of the discussion at hand.

Given a customer id of say 237 and an order id of 105, let's construct the first query.

SELECT * FROM customer WHERE id = 237');

This is good, right? Get all the fields from the customer record for this customer. Well, it's OK in this case, since likely we will need all of them, but generally I prefer to list the fields so I am certain of exactly what I'm getting. In addition, the code I use to extract the fields returns them as a keyed array. I do this for two reasons. First, the data is not order dependant, which means that if someone adds a field to this table, my code doesn't break. Secondly, the data is self decriptive, which means that I can return this data up three levels in function calls and the code that receives it can figure out what it has without knowing how the information was obtained in the first place.

Additionally I like short table names in SQL statements so that if I refer to two tables in a query, I can distinguish which field is which easily. That being said, he's is my new and improved query:

SELECT c.name FROM customer as c WHERE c.id = 237');

Great, we have the customer's name. Would be good for an order to have their address as well, so:

SELECT c.name, a.street, a.city, a.state, a.postalcode FROM customer as c, address as a WHERE c.id = 237 AND c.id = a.customer AND a.type = 'street' ");

Hopefully you can see how this provides a fairly clear view of what we are doing.

Now we could do another query to get the order information, by doing:

SELECT o.date, o.ship, a.street, a.city, a.state, a.postalcode FROM customer as c, order as o, address as a WHERE c.id = 237 AND c.id = o.customer AND o.id = 105 AND o.address = a.id AND a.type = 'ship'

But doing that is inefficient. Let's do it in one query:

SELECT c.name, ca.street, ca.city, ca.state, ca.postalcode, o.date, o.ship, sa.street as 'ship-to-street', sa.city as 'ship-to-city', sa.state as 'ship-to-state', sa.postalcode as 'ship-to-postal code' FROM customer as c, address as ca, order as o, address as sa WHERE c.id = 237 AND c.id = ca.customer AND ca.type = 'street' AND c.id = o.customer AND o.id = 105 AND o.address = sa.id AND sa.type = 'ship'

Cool huh?

Although we could write a query that could also capture all the order detail records as well, that becomes very confusing and complex. Not my style, so instead we execute a second query like:

SELECT p.id as 'item-number', p.name, p.price, p.description, od.quantity FROM order as o, product as p, orderdetail as od WHERE o.customer = 237 AND o.id = 105 AND od.order = o.id AND od.product = p.id

There you have it. In two (not terribly complex) queries we have both the customer information and the order detail and could build a nice order detail page. All of the data is self-descriptive so the code doesn't need to understand the database to obtain and use the data.

By cleanly structuring the data, we can obtain whatever information we need in fairly straightforward steps. We could put these two queries into a single function called getOrderDetail($customer, $order) . We need not concern ourselves with how this data is stored where we use this function since it is all returned as self documenting data.