Stored Procedures aren't all that
Now and then I check into The Daily WTF, partly because it’s funny and partly because it makes me feel a little better about some of the crazy code I’ve observed or been asked to maintain.
One in particular from about seven years ago amused me greatly, but not just in the train-wreck sense for a change. It addressed the very serious subject of putting all your database code into stored procedures. The comments are actually pretty illuminating.
Alex Papadimolous, actually being serious, wrote:
Though they take a bit more time to develop upfront, using database stored procedures are definitely the way to go for most information systems. They provide a looser coupling to the “data layer” by limiting the points of entry and offer stronger cohesion by breaking operations into reusable functionality.
I don’t think so, if by “most information systems” you mean the typical multi-tier business and financial databases that most of us maintain in our day jobs. I find that the looser coupling is almost always theoretical; it breaks down when you have to look at code from a year or two ago and figure out what the underlying stored procedure is actually doing instead of what you expect.
For example, I might get a defect report indicating a particular calculation is showing an incorrect result on an ASP.NET page. I’ll go to the page markup and find that calculated value in a C# variable, let’s say. In Visual Studio I right-click on that variable name and pick “Find All References” from the context menu. That almost always gives me enough information immediately to say “Oh, so I must be getting the wrong value in crunkfoo from the GetUserCrunkValues stored procedure!”
Now I’m looking for crunkfoo in the text of the stored procedure and… wait, that means I’m opening up SQL Management Studio and scrolling through a potentially very long list of stored procedures, then displaying the text of that stored procedure in a new tab. And now I’m tracing code that isn’t even really the code I’m working on.
It might not match the version that’s in production. I might be diagnosing what amounts to a database deployment fail. (That happened a lot at one recent site.)
The alternative? Just hit the database with your SELECT command from the client code. That’s right. Almost anything you can do in a stored procedure can be done just as easily in an embedded SQL string. In my experience, probably upwards of 90% of database hits are SELECTs with zero or more JOIN clauses, and not much else. Why not just see what the code is doing instead of creating an extra layer of tight data coupling?
Here are some objections to my approach:
Objection 0: Security! We need to limit points of entry to the database.
This is so bogus. Your application can’t limit points of entry to the database. Only the database can do that. Write all the stored procedures you like; it won’t prevent someone from going around your application with a rogue client.
Objection 1. Our corporate policy says we have to use stored procedures exclusively.
I know. I’m saying that’s a bad policy. Let’s not have that policy.
Objection 2: Stored procedures are pre-compiled with an execution plan, so they’re way more efficient than random SQL statements.
That would be relevant if in fact “random SQL statements” were on the table. But they’re not. What I keep seeing is applications that do the same SELECTs, perhaps with JOINs, over and over again. The parameters change, but the code doesn’t. And modern databases? Guess what, they know about caching and query optimization.
In any case, how much optimization do you really need on something like this?
select id, firstname, lastname, curbal from customer where id = @id
But yes, I see stored procedures like that all the time. How is that even worth maintaining? It’s busy work! Stop doing it.
Finally, since when is your application’s performance bottleneck found in a simple SQL query? Far more often, from what I see, the inefficiency is in big datasets being loaded repeatedly just in case they’re needed. Or in using unindexed columns in WHERE clauses. Or in ASP.NET code that shoves everything into View State. Stored procedures don’t do anything to solve those problems. They just make you feel like you’re optimizing something. Think bigger than that.
Objection 3: Scattering your SQL commands throughout your client code creates an undisciplined, unmaintanable mess.
Huh? Who said anything about scattering?
Yes, it’s good to keep all your database access in a set of classes that constitute a Data Access Layer (DAL). Do that.
That’s not a reason for your database access to use stored procedures exclusively. It’s not the same thing at all.
(Aside from the point of this blog post, it’s also not necessarily a reason for the DAL to be a separate DLL. If you need to do that because it makes deployment less painful, or because you’re doing something fancy with connection pooling, or because your DAL is shared by more than one application, go right ahead. But don’t do it just because you saw it in a book once.)
Objection 4: Stored procedures prevent duplicate code.
No they don’t. I’ll go farther and say in many cases I’ve seen widespread use of stored procedures enabling duplicate code. How’s that? Lack of transparency again.
Suppose I’m adding a new feature to an application. Let’s say it pulls customer balances for a report. So I essentially want to code something like select id, firstname, lastname from customer where curbal <> 0 order by lastname, firstname, right? But I’m supposed to get that from a stored procedure, because that’s the rule.
What’s the first thing I have to do? If you said something other than “Go through all the stored procedures on that server and look for one that does that SELECT for you,” you are part of the problem. It might be called from a different application that uses the same shared DLLs. There might be an existing stored procedure that pulls additional columns, or a different subset of columns, or one that gets the same data with a different sort order.
But I have to know it’s there! And the systems that have a stored procedure for every single data access are of course the systems that have a lot of stored procedures. My chances of finding the one that already does what I need are pretty slim. In fact, looking for the stored procedure that already matches what I need to do almost always dwarfs the time it takes to write the access method and a redundant stored procedure to go with it.
Look at it the other say. Suppose I know I’m trying to execute essentially the SELECT I describe above. I can quickly search in Visual Studio for the table name or an unusual column name. If I’m structuring the code reasonably well, it’s probably all in a “data access” module anyway, so I can limit the search. And behold, there’s an existing method that already executes select id, firstname, lastname, lotsofaddressinformation, mothersmaidenname from customer. Hey, can I add an ORDER BY clause to that and not break anything? Can I add a WHERE clause that’s… maybe only executing when I pass a flag into the method?
Now that’s a judgement call of course, but the point is that I have the information at hand to make it. If you hide all that logic behind a poorly-searchable stack of stored procedures, I won’t find it. Or I won’t find it easily enough to get the feature done efficiently, which is just as bad.
I am far more likely to find “matching” SQL code where it’s already in my Visual Studio solution, and easily searchable with no additional tools. And if I don’t find the matching code, I can’t avoid duplicating it.
So yeaaaaah, I’m not sold on the awesomeness of doing all database access through stored procedures. It’s not particularly efficient, it creates a lot of extra work for tiny benefits, and it cultivates code that’s duplicated and hard to find.
Allen White (@SQLRunr)
March 18, 2013 @ 9:34 am
You’re wrong, BTW, for none of the reasons you mentioned. You left out the big word: Optimization. You’re not a DB Developer. I will never tell you how to write C#, C++, etc. code, but I’ve taken queries that run for hours and reduced them to seconds. And I can’t do that if your queries are buried in your Cxx code. I can if they’re in stored procedures. (copypasta from Twitter).
Mark W. Schumann
March 18, 2013 @ 1:31 pm
Allen and I went around the block a few times on this on Twitter. You can search for any tweets today that have both @sqlrunr and @MarkWSchumann in them to see.
I think it ended up like this: Allen’s saying a database development specialist is going to write more efficient SQL, which is probably true. I’m saying… I don’t get a lot of database development specialists on my teams.
The scenario I’m talking about is when I’m not the DBA, but I am the one writing the stored procedures. It’s still me at both ends. The embedded SQL I’m writing is the exact same code as what I would have built into the SP.
Now if you’re talking about an SP that’s more than a few commands long, with a lot of temp tables or maybe a fair amount of branching logic, at some point it’s just a lot more maintainable to quit messing around on the client/middle tier and just keep it all on the database server.
And I’m definitely not advocating doing joins or row filtering anywhere but on the server! I’ve seen some harrowing home-coded client-side JOIN implementations, and a lot of ASP.NET code that pulls a large DataSet and picks through the resulting rows to find the one it wants.
But still, look here, the issue isn’t so much whether you wrap everything in an SP. The issue is how much sort/select/join work you let the database do, when it’s REALLY GOOD at that sort of thing. It certainly does no good to force every database hit into stored procedures, only to find your app developers are going all thick-client on the results by abusing LINQ.
Eric Wise (@eswise)
March 18, 2013 @ 10:13 pm
Seems to me that you’re trying too hard to put solutions into a box.
Sometimes it makes sense to do joins and filtering on the application server. You have a big fat expensive database server and you’re thinking about scaling up or out which requires expensive SQL Licenses, sometimes it makes sense to make use of aggressive caching on the application tier and do filtering etc there.
Sometimes you have concerns about tuning and want all the queries in sprocs for easier access, so that you can change queries without redeploying the application.
Sometimes you want loose coupling. I actually completed a project where the entire data structure changed but the code didn’t, because the code used sprocs and we could return the same fields with a radically different data structure.
Sometimes you just want to be consistent, so because many of your queries are in sprocs it makes sense to just put everything there so it’s easy to find.
Sometimes you have PCI data, or PII data, and using stored procedures with AD restrictions are hands down the most secure way of doing it. (I invite you to check out the stig guidelines for sql server that the DoD uses)
If your app is simple, straightforward selects, no secure data concerns, and not large enough to be in need of tuning or optimization. Yes, you’d be silly to go the all sproc route via policy. The real answer though is that “it depends”
Eric Wise (@eswise)
March 18, 2013 @ 1:04 pm
This… I must disagree with.
As Allen said, optimization becomes a real problem when you have to redeploy your source code to make changes. You are correct that stored procedures have nothing to do with proper indexing or caching strategies… except that they make it easier to implement _consistent_ methods of accessing data.
From a security perspective, the main benefit of using stored procedures is to limit access… to your underlying tables. For you to allow ad-hoc querying against your database you either creating a bunch of views (at that point why not use sprocs?), or you are giving direct access to the tables through your application. If you have given direct access and the account gets compromised the hacker can do some pretty significant damage. If you follow the best practice of only giving the application access to utilize procedures that you have defined you can limit the amount of damage someone can do with unauthorized access.
As for dynamic searching, I tend to allow that within the code, but for the security reasons above only against a view.
I make a mighty fine living these days cleaning it up developer messes in databases. You will find that direct access and in code queries are fine… until they aren’t. When you hit that wall you will find that fixing the issues is way more time consuming and expensive to fix.
Alex Papadimoulis (@apapadimoulis)
March 18, 2013 @ 11:46 pm
The same applies to generating bits of SQL code. Most of the code is probably select statements, but complex stuff (like select statements with crazy joins, multi-statement queries) don’t work so well.
You seem to advocate this boundry through a Data Access Layer, but that’s exactly what stored procs are – a well-defined API that sits on top of the tables and does all the data access things like ensuring data are correct (can’t update a closed-out order), valid (can’t delete the only child), transactional, etc.
Mark W. Schumann
March 20, 2013 @ 12:22 pm
Short answer, Alex: It’s actually unusual for shops to do database changes in the way you (correctly) consider “proper.” You’re right though. A further refinement of my thesis might be: If all you’re doing is requiring all data access to go through stored procedures, you’re doing a crappy job of security *and* you’re making the developers work harder for nothing.
Yup. I’m going to question your examples of database integrity though–aren’t those things better done with triggers and restraints? They seem to be orthogonal to the sproc requirement.
Finally, about learning SQL? I think that’s still a fundamental skill for business application developers, except *possibly* for the ones who really, really only do pretty front-end work. I’ve read a LOT of other people’s database code, some of it from SQL specialists, and… um, I’m really not seeing the big deal. 80% of the code is very basic: lots of two-way and three-way JOINs with ORDER BY and perhaps a GROUP BY here and there.
If you can train yourself to think in sets, even some of the gnarly 20% is accessible, although I agree that you can get to the point where you’re better off with a SQL specialist.
It’s just that… ugh. The “no direct SQL” rule leads to so many *absolutely trivial* stored procedures that are just plain busywork to maintain and pay attention to.
Eric Wise (@eswise)
March 20, 2013 @ 8:18 pm
Here you go, now your devs don’t have to waste time with crud sprocs. 🙂
Mark W. Schumann
March 20, 2013 @ 9:12 pm
I urge everyone to get and use Eric’s script above, just for shock value at the very least. I’m flummoxed.
Eric Wise (@eswise)
March 20, 2013 @ 9:23 pm
INFORMATION_SCHEMA.Columns and INFORMATION_SCHEMA.Tables has everything you need to roll your own code generation. It’s nothing to be flummoxed about.
Alex Papadimoulis (@apapadimoulis)
March 22, 2013 @ 9:19 am
Yes, though the bigger they are, the less unusual it becomes. Worth noting… in the 2003 days at small web shops, I remember people complaining that ASP.NET was bad because it you had to recompile, and if you wanted to change just a single .asp file, you couldn’t.
Of course (and those should be preferred). I was thinking, more “higher level” things that would be very difficult (perhaps impossible) to put in a trigger. Maybe, on the Invoices_AddLineItem sproc, it’d be clearer if you kept stuff like “IF @InvoiceStatus ‘Open’ RAISERROR” and with the rest of the code. A hard rule to generalize, but I think maybe “state-dependent multi-relation validation” or something.
And within the 20%… 80% of it you could probably maintain if you’re careful, and 20% you’d better leave with the guru. Takes a big system for that 4% to actually exist… let alone matter.
Oh it sure does! At least 50%.
Loose coupling does not come cheap; it’s overhead. The trick is to know when the overhead is worth it, and then maximize the value-add of the non-core activity while minimizing the added cost/pains.
Eric Wise (@eswise)
March 20, 2013 @ 9:31 pm
Last comment in regards to:
“It’s just that… ugh. The “no direct SQL” rule leads to so many *absolutely trivial* stored procedures that are just plain busywork to maintain and pay attention to.”
There are plenty of dependency tracking tools for SQL Server, so when you add/remove/rename columns you can identify all the impacted objects much easier than trying to do a find all through your source. In my experience there is a lot more busywork for schema changes if you don’t use sprocs.
Mark W. Schumann
March 21, 2013 @ 7:36 am
I’m not thinking of schema changes so much as the “maintenance” of naming and finding all those trivial sprocs. Maybe I’m complaining more about a nomenclature problem here; I see a lot of shops that name these things haphazardly, update them inconsistently, and define them in nonorthogonal ways. As in, the GetFoo procedure should return the exact same thing from the FOO table as GetBar returns from the BAR table, but noooOOOooooo. One gives you all the rows by primary key and the other gives you just the rows that match the value of an input parameter. It makes a lot of boring research out of something that shouldn’t take a moment’s thought.
The fact that all the pushback on this blog post has come from database specialists suggests that my real problem here is environments where nobody’s really in charge of the data. That’s always a bad thing, but it’s terribly common. You guys don’t see it because your own presence is what fixes it.
Out in the unmanaged-data wild, where data integrity rules are “too much trouble” and nobody bothers to define foreign key constraints because you’re supposed to know to match up column names in your mind? The requirement to use nothing but stored procedures is a wasteful and obnoxious instance of burden-shifting.
Eric Wise (@eswise)
March 21, 2013 @ 8:11 am
My recommendation with the naming is to slowly start refactoring. Over the years, I have come to really appreciate the Noun-Verb approach to naming. Example:
CustomerGetById, StatesGetList, CustomerInsert, CustomerUpdate, etc.
Then in your ssms viewer all the customer* sprocs will be nice and grouped together. In the case of more complex joins, name them after the dominant noun that the query is trying to solve ex: BillingInquirySearchByAccount
Cost of convergence | Critical Results
March 25, 2013 @ 8:35 am
[…] skeptical of my having used only inline SQL in my C# code; up to then the shop standard had been to use stored procedures for everything. We agreed that migrating all that inline code to procedures would happen, just not in time for […]