The W part of DTSTTCPW

A couple of weeks ago, a question turned up on the Basecamp site we use to coordinate one of my projects. One programmer asked what we thought of a calculation he was setting up in the database. It had to do with accumulating “rating” points for an item in a tree-shaped, threaded discussion.

Stakeholders weren’t really sure how ratings would work, but at least for now they were okay with saying an item’s rating could be based on the total of the ratings of all descendant records.

I’m not positive that’s the best idea, but it’s a reasonable one anyway. If your comment spawns a tree of comments-upon-comments that overall garner some high ratings, that may suggest your comment was pretty interesting. By the way, in this application you can have Comments, Questions, and Answers, as well as uploads of PDFs, video, and the like. They’re all interspersed in the Conversation tree. They can all get ratings.

The glaringly obvious solution

Well, you’d read a main record, then all its child records, then all of those records’ children, and so on. “But,” said our star Ruby guy, “that’s not gonna work.”

Rather, he went on, what if we cached some intermediate results along with something like a cache timestamp, and (if I recall correctly, and I’m sorry if I don’t) fed some of the ratings up a level or two as a kind of precalculation? Posting a comment might be a little slower, but the ratings display would get a lot quicker in the bargain.
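
To make that trade-off concrete, here is a minimal Python sketch of one way to precompute. It is a simplified variant of what he described, not his actual design: instead of timestamps and pushing ratings up a level or two, it pushes every new rating all the way up to the root at the moment the rating is recorded. The class and method names (RatedTree, add_rating, descendant_total) are hypothetical, and a real app would keep these totals in database columns rather than a Python dict.

```python
class RatedTree:
    """Hypothetical in-memory model: each item caches the sum of its
    descendants' ratings, updated whenever a rating is recorded."""

    def __init__(self):
        self.parent = {}            # item id -> parent id (None for a root)
        self.descendant_total = {}  # item id -> cached sum of descendants' ratings

    def add_item(self, item_id, parent_id=None):
        self.parent[item_id] = parent_id
        self.descendant_total[item_id] = 0

    def add_rating(self, item_id, points):
        # Write-time cost: walk from the rated item up to the root,
        # adding the new points to every ancestor's cached total.
        ancestor = self.parent[item_id]
        while ancestor is not None:
            self.descendant_total[ancestor] += points
            ancestor = self.parent[ancestor]

    def rating(self, item_id):
        # Read-time cost: one lookup, no tree scan.
        return self.descendant_total[item_id]


tree = RatedTree()
tree.add_item(1)               # a top-level comment
tree.add_item(2, parent_id=1)  # a reply to 1
tree.add_item(3, parent_id=1)  # another reply to 1
tree.add_item(4, parent_id=2)  # a reply to 2
tree.add_rating(2, 5)
tree.add_rating(3, 1)
tree.add_rating(4, 4)
print(tree.rating(1))  # 10
print(tree.rating(2))  # 4
```

Each new rating now costs one update per ancestor, and every read is a single lookup. The catch is that if the stakeholders change the rating rule, every cached total has to be recomputed.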

Trying to pull rank and be Mr. Super Agile, I said something like, “Dude, Do The Simplest Thing That Could Possibly Work. DTSTTCPW!” And I suggested we skip the caching and just pull all those rating records on demand, because a tree scan is drop-dead easy to code and doesn’t require a lot of deep thinking.
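
For comparison, the on-demand tree scan really is that easy. Here is a minimal Python sketch, assuming a hypothetical in-memory list of items with id, parent_id, and rating fields standing in for the database table, and assuming an item’s own ratings are not counted in its total (adding them would be a one-line change).

```python
from collections import defaultdict

# Hypothetical stand-in for the Conversation tree: each item has an id,
# an optional parent id, and its own rating points.
items = [
    {"id": 1, "parent_id": None, "rating": 2},  # top-level comment
    {"id": 2, "parent_id": 1, "rating": 5},     # reply to 1
    {"id": 3, "parent_id": 1, "rating": 1},     # reply to 1
    {"id": 4, "parent_id": 2, "rating": 4},     # reply to 2
]

def build_children(items):
    """Index items by parent id so each item's children can be looked up directly."""
    children = defaultdict(list)
    for item in items:
        children[item["parent_id"]].append(item)
    return children

def descendant_rating(item_id, children):
    """Sum the ratings of every descendant of item_id, one level at a time.
    In a real app, each pass of the while loop would be one query."""
    total = 0
    frontier = [item_id]
    while frontier:
        next_level = []
        for parent in frontier:
            for child in children[parent]:
                total += child["rating"]
                next_level.append(child["id"])
        frontier = next_level
    return total

children = build_children(items)
print(descendant_rating(1, children))  # 10
print(descendant_rating(2, children))  # 4
```

Nothing is cached, so nothing can go stale, but every read walks the whole subtree.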

I have to admit that the other part of my skepticism came from my faith in modern database engines. We hadn’t even chosen a production platform, but something like Oracle can optimize the heck out of a massive self-join. Why optimize it in Ruby code if the database knows best?
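
Here is roughly what leaning on the database could look like: one recursive query instead of a loop of per-level queries. This sketch uses SQLite through Python’s built-in sqlite3 module only because it is self-contained; the items table and its columns are hypothetical. Oracle supports a similar recursive WITH clause, as well as its older CONNECT BY syntax, though the exact syntax differs from engine to engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (
        id        INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES items(id),
        rating    INTEGER NOT NULL DEFAULT 0
    );
    INSERT INTO items (id, parent_id, rating) VALUES
        (1, NULL, 2), (2, 1, 5), (3, 1, 1), (4, 2, 4);
""")

DESCENDANT_RATING_SQL = """
    WITH RECURSIVE descendants(id, rating) AS (
        -- Seed: direct children of the requested item
        SELECT id, rating FROM items WHERE parent_id = :root
        UNION ALL
        -- Step: children of anything already found
        SELECT i.id, i.rating
        FROM items AS i
        JOIN descendants AS d ON i.parent_id = d.id
    )
    -- SUM over zero rows is NULL, so fall back to 0 for leaf items
    SELECT COALESCE(SUM(rating), 0) FROM descendants
"""

def descendant_rating(conn, item_id):
    """Let the database walk the subtree and total the ratings."""
    (total,) = conn.execute(DESCENDANT_RATING_SQL, {"root": item_id}).fetchone()
    return total

print(descendant_rating(conn, 1))  # 10
print(descendant_rating(conn, 3))  # 0
```

The tree walk now happens entirely inside the engine, which is free to plan and index it however it sees fit.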

Our developer shot back: “But that can’t possibly work! You’re forgetting the W part!”

It’s not “Do The Simplest Thing,” full stop.

It has to be the simplest thing that could actually possibly work. And our guy was saying my admittedly oversimplified implementation concept couldn’t. Which brought me back to the idea of iterative releases.

Remember I said this isn’t a Scrum shop?

Well, it’s not. That means we don’t have Scrum Planning meetings, we don’t have Scrum Retrospective meetings, and we don’t have a true Scrum-style product backlog. And we don’t have a release at the end of each sprint, because there are no sprints.

There are intermediate development milestones, though. When the star Ruby guy brought up the definition of “working,” I went “Aha!” because he was thinking in terms of a completed, rolled-out, popular, busy website. When you have thousands of active users, caching query results can be important. I, on the other hand, was thinking in terms of iterations, and this iteration was going to be used by… like four people, maybe. Query speed wasn’t their main concern at the moment. And they might well come up with a totally different rating algorithm that would invalidate the caching strategy.

I don’t know. I’m not saying that throwing together something slow and crummy is “good enough” just because it’s a limited demo situation. On the other hand, I’m nervous about anything that smacks of premature optimization. But when you’re trying to do the simplest thing that could possibly work, it’s important to know what you mean by “working.” And I guess it’s important to gauge how much the project is likely to change after the next iteration.

Where do you draw the line between slapdash and overkill? Is it an iteration thing? What do you consider “working” when doing the simplest thing?