The W part of DTSTTCPW
A couple of weeks ago, a question turned up on the Basecamp site we use to coordinate one of my projects. One programmer asked what we thought of a certain calculation he was setting up in the database. It had to do with accumulating “rating” points of an item in a tree-shaped threaded discussion.
Stakeholders weren’t really sure how ratings would work, but at least for now they were okay with saying an item’s rating could be based on the total of the ratings of all descendant records.
I’m not positive that’s the best idea, but it’s a reasonable one anyway. If your comment spawns a tree of comments-upon-comments that overall garner some high ratings, that may suggest your comment was pretty interesting. By the way, in this application you can have Comments, Questions, and Answers, as well as uploads of PDFs, video, and the like. They’re all interspersed in the Conversation tree. They can all get ratings.
The glaringly obvious solution
Well, you’d read a main record, then all its child records, then all of those records’ children, and so on. “But,” said our star Ruby guy, “that’s not gonna work.”
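The obvious version he was objecting to looks something like this in plain Ruby, with in-memory stand-ins for the database records (every name here is my illustration, not the project’s actual schema):

    # In-memory stand-in for a Conversation item; in real life each
    # "children" lookup would be another database query.
    Item = Struct.new(:rating, :children)

    # Total the ratings of every descendant of an item, recursively.
    def descendant_rating(item)
      item.children.sum { |child| child.rating + descendant_rating(child) }
    end

    root = Item.new(0, [
      Item.new(3, [Item.new(5, []), Item.new(1, [])]),
      Item.new(2, []),
    ])
    descendant_rating(root)  # => 11

One query per node: trivially easy to write, and trivially easy to see how it gets slow when the tree gets deep and busy.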
Rather, he went on, what if we cached some intermediate results along with something like a cache timestamp, and (if I recall correctly, and I’m sorry if I don’t) fed some of the ratings up a level or two as a kind of precalculation. Posting the comment might be a little slower, but at least the ratings display gets a lot quicker in the bargain.
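If I’m reconstructing it fairly, his idea had roughly this shape. (He may have meant pushing points only a level or two; this sketch pushes them all the way up, and the cached_rating and cached_at fields are my invention.)

    # Each node carries a cached rollup plus a timestamp; new points get
    # fed up the ancestor chain at posting time, so the ratings display
    # is a plain column read instead of a tree walk.
    Node = Struct.new(:rating, :cached_rating, :cached_at, :parent)

    def add_rating(node, points)
      node.rating += points
      ancestor = node.parent
      while ancestor
        ancestor.cached_rating += points
        ancestor.cached_at = Time.now   # the "cache timestamp" part
        ancestor = ancestor.parent
      end
    end

    root  = Node.new(0, 0, nil, nil)
    child = Node.new(0, 0, nil, root)
    add_rating(child, 4)
    root.cached_rating  # => 4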
Trying to pull rank and be Mr. Super Agile, I said something like, “Dude, Do The Simplest Thing That Could Possibly Work. DTSTTCPW!” And I suggested we just skip the caching and pull all those rating records on demand, because a tree scan is drop-dead easy to code and doesn’t require a lot of deep thinking.
Also, I have to admit that the other part of my skepticism had to do with my faith in modern database engines. We hadn’t even chosen a production platform, but something like Oracle can optimize the heck out of a massive self-join operation. Why optimize it in Ruby code if the database knows best?
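For instance (purely a sketch, since we hadn’t picked a platform): with a Postgres-style recursive query, the engine can walk the whole subtree and sum it in one round trip, and Oracle’s CONNECT BY is the same idea. The items table, its columns, and the connection details below are all illustrative.

    require 'pg'  # assuming the pg gem and a Postgres-flavored database

    # Let the engine walk the subtree and do the arithmetic server-side.
    SUBTREE_RATING = <<~SQL
      WITH RECURSIVE subtree AS (
        SELECT id, rating FROM items WHERE id = $1
        UNION ALL
        SELECT i.id, i.rating
        FROM items i JOIN subtree s ON i.parent_id = s.id
      )
      SELECT COALESCE(SUM(rating), 0) FROM subtree WHERE id <> $1
    SQL

    conn  = PG.connect(dbname: 'conversation_demo')  # made-up database
    total = conn.exec_params(SUBTREE_RATING, [42]).getvalue(0, 0)
    # => total rating of every descendant of item 42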
Our developer shot back: “But that can’t possibly work! You’re forgetting the W part!”
It’s not “Do The Simplest Thing,” full stop.
It has to be the simplest thing that could actually possibly work. And our guy was saying my admittedly oversimplified implementation concept couldn’t. Which brought me back to the idea of iterative releases.
Remember I said this isn’t a Scrum shop?
Well, it’s not. That means we don’t have Scrum Planning meetings, we don’t have Scrum Retrospective meetings, and we don’t have a true Scrum-style product backlog. And we don’t have a release at the end of each sprint, because there are no sprints.
There are intermediate development milestones, though. When the star Ruby guy brought up the definition of “working,” I went “Aha!” because he was thinking in terms of a completed, rolled-out, popular, busy website. When you have thousands of active users, caching queries can be important. I, by contrast, was thinking in terms of iterations, and this iteration was going to be used by… like four people, maybe. Query speed wasn’t their most pressing concern at that point. And they might still come up with a totally different rating algorithm, one that would invalidate the whole caching strategy.
I don’t know. I’m not saying that throwing together something slow and crummy is “good enough” just because it’s a limited demo situation. On the other hand, I’m nervous about anything that smacks of premature optimization. But when you’re trying to do the simplest thing that could possibly work, it’s important to know what you mean by “working.” And I guess it’s important to gauge how much the project is likely to change after the next iteration.
Where do you draw the line between slapdash and overkill? Is it an iteration thing? What do you consider “working” when doing the simplest thing?
Anthony
September 2, 2010 @ 1:29 am
I don’t think there is an easy answer here generally. However, I think it’s important to bring issues of load in early and to remember to tackle the risky work first. We try to have performance tests early on for those areas of the application that experience tells us are going to suffer under load. In terms of application user experience, the responsiveness of the system is just as critical as the actual functionality. You need to understand as a team the impact of choosing the quick solution, maybe because you want to try out the UI, versus the solution that will actually work with your expected load. If the risk is small, i.e., it’s a simple, well-understood change, then defer it; if it’s a big unknown, better to build that in from the start through your non-functional specifications, e.g., for 1,000 concurrent users every page should return in under four seconds.
Mark W. Schumann
September 7, 2010 @ 2:10 pm
Anthony,
Ah, you’ve hit on the main point then. Maybe I could reword this whole blog post as “In this particular application, scope and interface decisions were so unstable that I considered load one of the less risky aspects.” We’d recently gone through a couple of re-specs that caused the project scope to change pretty radically. And the client-ish people (more like our parallel partners in a mixed tech & social venture startup) had seen just a little of our completed code so far.
In my mind, the biggest risk was that the partners would see the part-done site and go, “Whoa, that is so not what we meant!”
Anthony, I’m wondering: what if we rephrased “Do The Simplest Thing That Could Possibly Work” as “Do The Simplest Thing That Could Possibly Address The Highest Risks”?