Q&A with Louis Davidson
Relational Database Design Workshop
Q: If your pre-con had a theme song, what would it be and why?
A: “The Imagination Song” from the Journey into the Imagination ride at EPCOT. First, because it reminds me that once I get past Summit, the rest of the year is holidays and vacation.
More importantly, though, imagination is a big part of the design process. To really get a great design that will be close to what gets implemented, you need to mentally visualize and simulate the processes that will eventually be put in place. As often as possible, once I have a model that feels right, I write out a script of actions I expect users to take and then mentally store the rows in the tables corresponding to each of those steps. I frequently find small details that were missed because they aren’t really a major part of the design, and I also find entire parts of the process that weren’t well thought out during requirements gathering and initial design.
My alternative song was “Won’t Get Fooled Again,” but mostly for the title, since one of the major themes of the pre-con is to know what a good design looks like so you don’t get fooled as often and can defend your stances.
Q: What excites you most about the topic of relational database design?
A: Generally the best part of “design,” regardless of the type of design you’re talking about, is that you get a modicum of control over what gets created. In the best cases, you are fed well thought-out requirements to design from, and you sit back and think about how to translate requirements into something “tangible.” Then you put that design to the test mentally, first with simulations, followed by the implementers writing code, and ultimately with actual users doing actual business. (For all the world’s a QA lab, and all the users testers.)
The relational database is the backbone of almost any business/accounting-oriented system and will almost certainly remain so for many years to come. The principles of accounting come into play for every company that takes orders, does business, and bills for that business. And the guarantees that a well-designed relational database provides make it easy to implement multi-user data storage that can be constantly read from and changed concurrently, with data remaining in as consistent a state as you planned for.
Some people may fret that relational database technology is on the outs, but I actually think that as time passes, the need for relational systems will continue to grow rapidly – at least somewhat enabled by the up-and-coming data storage paradigms. As people get the taste of what a great database backbone can do for them, they start to desire more and more data.
Consider dimensional design and Big Data as two additional paradigms. Data warehousing has not only allowed us to take the strain of reporting off our OLTP servers (leading to better reporting capabilities), but it has also given us the ability to support larger relational databases capturing more and more business data. Big Data paradigms like Hadoop and NoSQL will alleviate the temptation people have to try to use the relational database in unnatural ways. Extremely large quantities of unstructured data are not exactly the relational database’s strong point. And while SQL Server continually gets better at handling them, the massively parallel capability to deal with data that doesn’t have to be continually consistent lets you capture data volumes that would be silly in a relational database.
Q: Where does your workshop take attendees beyond what you might cover in a 60- or 75-minute session?
A: The biggest difference comes in two places: the volume of examples and the participation aspects. I have always struggled even to get a decent start on what I want to cover in 75 minutes. One thing that ends up shortchanged is examples and discussing them in detail. In the pre-con, I have examples of normalization that we will discuss, and then we will work through a second big set of examples to discuss multiple ways of implementing uniqueness, hierarchies, images, documents, user-specified schemas, and more.
Most of the slides in the pre-con are “director’s cut” versions from other presentations I have done. The patterns slides are from a session I will have presented at SQLSaturday and Devlink conferences this year. I am tightening up the lecture bits from previous editions of the pre-con so that we cover just the most essential material and have plenty of interaction time. I don’t think a discussion of database design can be done without some baseline discussion of theory, but the pre-con will balance the fundamental “why” we do what we do with plenty of straightforward “how” examples.
Most regular sessions are a bit like any lecture. If you get lost early in the session, you may never catch up, even though what I am talking about might be really simple and you only stepped out for 10 minutes to answer a call. In the pre-con, we will spend 40%-50% of the time doing designs as a complete group and/or as teams. Getting a chance to participate can make all the difference in what you take away from a class, particularly as not everyone gets the opportunity to do designs in their day jobs. So we will do whatever participation we can do based on the size of the group and the configuration of the room.
Q: What’s the most surprising statement attendees might hear you say during the pre-con?
A: “We’re done already? It feels like we have only been at this for a few minutes.” This will be made even more surprising because I will probably say it after 10 minutes have passed to see if attendees will take the bait. :)
But seriously, as I said in my Q&A for this pre-con at SQLRally Dallas, the most surprising statement might be: “The fourth and fifth normal forms are actually important in every design.” If there is one preconceived notion that is most prevalent in relational databases, it is that anything beyond third normal form is harmful to your performance. Ironically, the reason third normal form gets all the attention is that violations of fourth and fifth normal form are semi-rare. So while most designers who take the time to reach third normal form end up satisfying the higher forms without realizing it, sometimes issues are missed that don’t show up until you are writing a complex query and subtle errors start appearing in the results.
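To make this concrete, here is a minimal sketch of a fourth normal form violation, using SQLite (via Python’s standard library) as a stand-in relational engine. The `InstructorSkill` table, the instructor, and the course and book names are all hypothetical: the point is that when two independent multi-valued facts share one table, a single missed cross-product row makes queries return different answers depending on how you reach the data.

```python
import sqlite3

# Hypothetical example: an instructor's courses and textbooks are
# independent multi-valued facts. Storing both in one table violates 4NF.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE InstructorSkill (instructor TEXT, course TEXT, book TEXT)")

# To record that Kim teaches two courses and uses two books, the single
# table forces us to store the full cross product of the two facts.
cur.executemany("INSERT INTO InstructorSkill VALUES (?, ?, ?)", [
    ("Kim", "SQL101", "Intro to SQL"),
    ("Kim", "SQL101", "Advanced Joins"),
    ("Kim", "DBA200", "Intro to SQL"),
    ("Kim", "DBA200", "Advanced Joins"),
])

# If one row of the cross product goes missing (an easy, subtle mistake)...
cur.execute("DELETE FROM InstructorSkill "
            "WHERE course='DBA200' AND book='Advanced Joins'")

# ...then "which books does Kim use?" yields different answers depending on
# which course you happen to join through -- the subtle query error above.
cur.execute("SELECT DISTINCT book FROM InstructorSkill WHERE course='SQL101'")
print(sorted(r[0] for r in cur.fetchall()))  # ['Advanced Joins', 'Intro to SQL']
cur.execute("SELECT DISTINCT book FROM InstructorSkill WHERE course='DBA200'")
print(sorted(r[0] for r in cur.fetchall()))  # ['Intro to SQL']

# The 4NF fix: one table per independent multi-valued fact.
cur.execute("CREATE TABLE InstructorCourse (instructor TEXT, course TEXT)")
cur.execute("CREATE TABLE InstructorBook (instructor TEXT, book TEXT)")
```

With the two-table design, each fact is stored exactly once and the anomaly cannot occur.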
Q: What’s the biggest myth in database design that you’d like to debunk?
A: Normalization makes database code slower. In almost all cases, the opposite is true. What truly hurts performance is when the design doesn’t match the needs of the users. When the problem could be solved using N tables, and you give them N / 100 tables, they are going to be annoyed with the fragility of your design. On the other hand, if you give them N * 100 tables, the extra flexibility can make doing even simple tasks impossible.
Normalization helps us match the system to the real-world needs of the users. Get normalization and requirements done correctly, and the resulting design will generally run a lot faster than you will initially expect. And this will lead to a lot less tuning than your DBA team will probably expect. So normalization should actually improve performance if done right.
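A small sketch of the myth in action, with hypothetical tables and SQLite (via Python’s standard library) standing in for any relational engine: the under-normalized design repeats the customer’s address on every order row, so an address change must touch many rows and can silently miss some.

```python
import sqlite3

# Hypothetical under-normalized design: the customer's address is repeated
# on every order row instead of being stored once.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE DenormOrder (order_id INTEGER, customer TEXT, address TEXT)")
cur.executemany("INSERT INTO DenormOrder VALUES (?, 'Acme', '1 Old Rd')",
                [(i,) for i in range(1, 4)])

# Update anomaly: the address lives in three places; an update that misses
# one row leaves the database contradicting itself.
cur.execute("UPDATE DenormOrder SET address='9 New St' WHERE order_id < 3")
cur.execute("SELECT DISTINCT address FROM DenormOrder WHERE customer='Acme'")
print(sorted(r[0] for r in cur.fetchall()))  # ['1 Old Rd', '9 New St']

# Normalized version: the address is stored exactly once, so every order
# sees a single truth and updates touch a single row.
cur.execute("CREATE TABLE Customer (customer TEXT PRIMARY KEY, address TEXT)")
cur.execute("CREATE TABLE CustomerOrder (order_id INTEGER PRIMARY KEY, "
            "customer TEXT REFERENCES Customer(customer))")
```

The normalized version is not only safer; the narrower rows and single-row updates are usually faster too.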
Q: What still trips you up in your adventures with normalization?
A: The biggest thing is matching up my desired design with the user’s needs and the programmer’s ability to produce an acceptable UI to support it. The short answer is that requirements (and usually the lack of consistent requirements) make it nearly impossible to determine what a properly normalized solution is until users are actually using the database, which is the point in time where changes to the database structure are at their most difficult.
The natural tendency of designers is to sit back in their glass houses, make assumptions about what is needed based on their experience and their understanding of other systems, and produce a system that they think will work without really understanding the needs of the users.
By definition, normalization demands that we break down all of the data storage to the most basic bits of data possible. The problem is that the lowest form of the data that one could attain isn’t always what is needed for the application. I commonly use the example of a Microsoft Word document. If your goal is to store a Word document in a database, you would need to decide whether to store the document in one column in a table or place every character and formatting mark in a row of its own. Duh, one column, right? In 99.9% of cases, one column is absolutely right. But in that 0.1% case, you actually may need to break the data up into smaller units to support an operation the user needs to perform in a very natural manner. For example, if your application counts the number of uses of the individual characters in documents as its primary output, parsing the document once and storing it as one row per letter would be optimal.
Unlikely scenario? Sure. But you have to keep an open mind when doing design so as not to end up with the system you want and miss creating the system that is needed.
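Here is a toy sketch of that 0.1% case, with a hypothetical `DocumentChar` table and SQLite (via Python’s standard library) as the engine: once the document is parsed into one row per character, the character-count output the application exists to produce becomes a trivial `GROUP BY`.

```python
import sqlite3

# Hypothetical table for the rare case where per-character counts are the
# system's primary output: one row per character of the document.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE DocumentChar (doc_id INTEGER, position INTEGER, ch TEXT)")

# Parse the document once, storing each character with its position.
text = "banana"
cur.executemany("INSERT INTO DocumentChar VALUES (1, ?, ?)", list(enumerate(text)))

# The application's primary question is now a plain aggregate query.
cur.execute("""
    SELECT ch, COUNT(*) FROM DocumentChar
    WHERE doc_id = 1
    GROUP BY ch ORDER BY COUNT(*) DESC, ch
""")
print(cur.fetchall())  # [('a', 3), ('n', 2), ('b', 1)]
```

For the 99.9% case, of course, the whole document stays in one column and none of this table exists.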
Q: If attendees could start putting into practice just one thing after your pre-con, what would you want that to be?
A: What I like to call “design testing.” I mentioned imagination at the start of this interview, and it will really change your way of certifying that the design is ready for coders. It is difficult to do at times, because it will seem like wasted time to a manager demanding that the application programmers get started coding (that manager has his own manager demanding results as well). However, the more you understand the processes you are implementing for, the better the design and the less rework you will have to do later in the process.
This sort of testing will not only help you prove your design makes sense for the requirements, but also that the requirements cover all the needs for the eventual overall system. As you start to think in terms of the overall process, everything really comes together and you can be more certain that your design will work. Doing this with the lead programmers can be a great use of time because they know the limitations of the tools they are working with and can help you shape the data needed for each step in a process.
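Design testing of this sort can even be written down rather than done purely in your head. Below is a minimal sketch, with a hypothetical two-table order design and SQLite (via Python’s standard library) as the engine: script the actions you expect users to take, store the rows each step produces, then check that the questions the business will ask are easy to answer. Any step that is awkward to express is a hint the design missed something.

```python
import sqlite3

# Hypothetical candidate design to be "design tested" before coding starts.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE CustomerOrder (
    order_id INTEGER PRIMARY KEY,
    customer TEXT NOT NULL,
    taken_at TEXT NOT NULL
);
CREATE TABLE OrderLine (
    order_id INTEGER NOT NULL REFERENCES CustomerOrder(order_id),
    line_no  INTEGER NOT NULL,
    product  TEXT NOT NULL,
    quantity INTEGER NOT NULL CHECK (quantity > 0),
    PRIMARY KEY (order_id, line_no)
);
""")

# Step 1: a customer places an order.
cur.execute("INSERT INTO CustomerOrder VALUES (1, 'Acme Co', '2012-11-06')")
# Step 2: two products are added to the order.
cur.execute("INSERT INTO OrderLine VALUES (1, 1, 'Widget', 10)")
cur.execute("INSERT INTO OrderLine VALUES (1, 2, 'Gadget', 3)")

# Step 3: ask the questions the business will ask. If one is hard to
# express against the design, the requirements or the model need work.
cur.execute("""
    SELECT o.customer, COUNT(l.line_no), SUM(l.quantity)
    FROM CustomerOrder o
    JOIN OrderLine l ON l.order_id = o.order_id
    GROUP BY o.customer
""")
print(cur.fetchone())  # ('Acme Co', 2, 13)
```

Walking a script like this with the lead programmers is exactly the shared exercise described above: everyone sees the same rows land in the same tables for each step of the process.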
I don’t want to give the impression that this type of testing takes a tremendous amount of time either. The time generally is measured in hours, not days, and will be largely based on the sort of iterations your project teams work with. When the project takes years, then days or weeks probably ought to be spent, with lots of people involved storyboarding the process. But in more modern agile processes that iterate over a few weeks, it probably will add a few hours of time for your team to make sure you all are on the same page.
These few hours can save you greatly later in the process, when everyone is saying “great job” instead of “what is this mess?” I think this is the kind of result that would entice someone to sit in my class on database design for 7 hours.
Check out our other Q&As with PASS Summit 2012 pre-con speakers.