Stored Procedures aren't all that

Now and then I check into The Daily WTF, partly because it’s funny and partly because it makes me feel a little better about some of the crazy code I’ve observed or been asked to maintain.

One in particular from about seven years ago amused me greatly, but not just in the train-wreck sense for a change. It addressed the very serious subject of putting all your database code into stored procedures. The comments are actually pretty illuminating.

Alex Papadimolous, actually being serious, wrote:

Though they take a bit more time to develop upfront, using database stored procedures are definitely the way to go for most information systems. They provide a looser coupling to the “data layer” by limiting the points of entry and offer stronger cohesion by breaking operations into reusable functionality.

I don’t think so, if by “most information systems” you mean the typical multi-tier business and financial databases that most of us maintain in our day jobs. I find that the looser coupling is almost always theoretical; it breaks down when you have to look at code from a year or two ago and figure out what the underlying stored procedure is actually doing instead of what you expect.

For example, I might get a defect report indicating a particular calculation is showing an incorrect result on an ASP.NET page. I’ll go to the page markup and find that calculated value in a C# variable, let’s say. In Visual Studio I right-click on that variable name and pick “Find All References” from the context menu. That almost always gives me enough information immediately to say “Oh, so I must be getting the wrong value in crunkfoo from the GetUserCrunkValues stored procedure!”

Now I’m looking for crunkfoo in the text of the stored procedure and… wait, that means I’m opening up SQL Management Studio and scrolling through a potentially very long list of stored procedures, then displaying the text of that stored procedure in a new tab. And now I’m tracing code that isn’t even really the code I’m working on.

It might not match the version that’s in production. I might be diagnosing what amounts to a database deployment fail. (That happened a lot at one recent site.)

The alternative? Just hit the database with your SELECT command from the client code. That’s right. Almost anything you can do in a stored procedure can be done just as easily in an embedded SQL string. In my experience, probably upwards of 90% of database hits are SELECTs with zero or more JOIN clauses, and not much else. Why not just see what the code is doing instead of creating an extra layer of tight data coupling?

Here are some objections to my approach:

Objection 0: Security! We need to limit points of entry to the database.

This is so bogus. Your application can’t limit points of entry to the database. Only the database can do that. Write all the stored procedures you like; it won’t prevent someone from going around your application with a rogue client.

Objection 1. Our corporate policy says we have to use stored procedures exclusively.

I know. I’m saying that’s a bad policy. Let’s not have that policy.

Objection 2: Stored procedures are pre-compiled with an execution plan, so they’re way more efficient than random SQL statements.

That would be relevant if in fact “random SQL statements” were on the table. But they’re not. What I keep seeing is applications that do the same SELECTs, perhaps with JOINs, over and over again. The parameters change, but the code doesn’t. And modern databases? Guess what, they know about caching and query optimization.

In any case, how much optimization do you really need on something like this?

select id, firstname, lastname, curbal from customer where id = @id

But yes, I see stored procedures like that all the time. How is that even worth maintaining? It’s busy work! Stop doing it.

Finally, since when is your application’s performance bottleneck found in a simple SQL query? Far more often, from what I see, the inefficiency is in big datasets being loaded repeatedly just in case they’re needed. Or in using unindexed columns in WHERE clauses. Or in ASP.NET code that shoves everything into View State. Stored procedures don’t do anything to solve those problems. They just make you feel like you’re optimizing something. Think bigger than that.

Objection 3: Scattering your SQL commands throughout your client code creates an undisciplined, unmaintanable mess.

Huh? Who said anything about scattering?

Yes, it’s good to keep all your database access in a set of classes that constitute a Data Access Layer (DAL). Do that.

That’s not a reason for your database access to use stored procedures exclusively. It’s not the same thing at all.

(Aside from the point of this blog post, it’s also not necessarily a reason for the DAL to be a separate DLL. If you need to do that because it makes deployment less painful, or because you’re doing something fancy with connection pooling, or because your DAL is shared by more than one application, go right ahead. But don’t do it just because you saw it in a book once.)

Objection 4: Stored procedures prevent duplicate code.

No they don’t. I’ll go farther and say in many cases I’ve seen widespread use of stored procedures enabling duplicate code. How’s that? Lack of transparency again.

Suppose I’m adding a new feature to an application. Let’s say it pulls customer balances for a report. So I essentially want to code something like select id, firstname, lastname from customer where curbal <> 0 order by lastname, firstname, right? But I’m supposed to get that from a stored procedure, because that’s the rule.

What’s the first thing I have to do? If you said something other than “Go through all the stored procedures on that server and look for one that does that SELECT for you,” you are part of the problem. It might be called from a different application that uses the same shared DLLs. There might be an existing stored procedure that pulls additional columns, or a different subset of columns, or one that gets the same data with a different sort order.

But I have to know it’s there! And the systems that have a stored procedure for every single data access are of course the systems that have a lot of stored procedures. My chances of finding the one that already does what I need are pretty slim. In fact, looking for the stored procedure that already matches what I need to do almost always dwarfs the time it takes to write the access method and a redundant stored procedure to go with it.

Look at it the other say. Suppose I know I’m trying to execute essentially the SELECT I describe above. I can quickly search in Visual Studio for the table name or an unusual column name. If I’m structuring the code reasonably well, it’s probably all in a “data access” module anyway, so I can limit the search. And behold, there’s an existing method that already executes select id, firstname, lastname, lotsofaddressinformation, mothersmaidenname from customer. Hey, can I add an ORDER BY clause to that and not break anything? Can I add a WHERE clause that’s… maybe only executing when I pass a flag into the method?

Now that’s a judgement call of course, but the point is that I have the information at hand to make it. If you hide all that logic behind a poorly-searchable stack of stored procedures, I won’t find it. Or I won’t find it easily enough to get the feature done efficiently, which is just as bad.

I am far more likely to find “matching” SQL code where it’s already in my Visual Studio solution, and easily searchable with no additional tools. And if I don’t find the matching code, I can’t avoid duplicating it.

So yeaaaaah, I’m not sold on the awesomeness of doing all database access through stored procedures. It’s not particularly efficient, it creates a lot of extra work for tiny benefits, and it cultivates code that’s duplicated and hard to find.