“What is it, really?”

Those four words launch a boondoggle.

It starts with smart software engineers. Smart and bored. They’re using a software library or tool to solve a problem. They have a lot of options to choose from: multiple open source and commercial solutions, high quality, lots of customers or users. But it just isn’t perfect. It’s missing a few things they need and has a few things they don’t.

So the engineers begin to ponder the nature of this tool. And they start with a gross oversimplification.

What is a name server, really? It’s basically a lookup table.
What is an object cache, really? It’s… well, it’s another lookup table.
What’s an ORM, really? It’s a map between SQL result sets and object fields.

Everything is just a map! And those are all real examples. I’ve seen companies that:

– ignored OpenLDAP, Netscape Directory Server, and Active Directory to write their own name server
– ditched Ehcache to write their own cache, which crashed the app under any serious load, and nobody knew why (multiple offenders)
– ditched their ORM to write every query by hand, seemingly unaware that they could keep the ORM and drop down to hand-written SQL for the 20% of queries that needed optimization. And since they supported MSSQL, Oracle, and MySQL, they wrote their DAOs 3 times. Cut and paste, baby!
– wrote their own version of Struts with some extra features, then were stuck on a proprietary Struts 1 clone long after Struts 2, Spring MVC, etc. came out
– wrote their own terribly designed version of portlets/JSF/etc. that nobody in the company understood after the creator left (and even he was shaky on it)

I’ll admit, often it’s less boredom than intimidation. You request the feature and the maintainers respond, “That sounds great! The source is over there, let us know when you’ve added it.”

You don’t even look at the source. I mean, it’s gotta be crazy complex. It already does so much. You’re not sure where to start. The developer contribution guide is scant and/or years old.

So you start rationalizing. You’re using just part of this thing. How hard would it be to recreate that? You’d understand all that code because you wrote it. And you could add those extra features you needed.

But you’re vastly underestimating the problem. To start with, the corner cases. I remember a story from Jamie Zawinski about the Netscape/Mozilla rewrite.¹ A couple of devs were reimplementing the FTP functionality. They had spent a few weeks on it and had a question about an edge case. He helped them, but the real issue was that the original code was gnarly because its authors had needed 6 months to find and handle all the edge cases. And the new devs were ignoring the original code because it looked icky. The same has been said about search code, Unix utilities, ORM, caching, anything with serious concurrency, etc.

Enterprise software companies seem particularly prone to all this. Perhaps because the sales division loves proprietary tools and lock-in.

What I am not saying

I am not saying don’t innovate. Or that you can’t improve things or come up with better products.

– If you want to create a new open source competitor, go for it. A number of ORMs came before and after Hibernate, both open source and commercial. More will come.
– If you can build a product and sell it, even though there’s competition, go for it.
– If you need a small piece of a bloated dependency, and you can knock this out with unit tests pretty quickly, go for it.
– Are you brilliant, working with other brilliant folks who will vet this idea? And it’s for something of massive scale, like Google, FB, Amazon, or MS would need? Go for it.

What I am saying is that building a one-off of a sizeable, complex component, for just your project, will waste tons of money and become a huge regret for all involved. And it’s always done because of ignorance.

Another way

As a manager, if I have the budget for a new, complex subsystem, I have the money to go to the maintainers of the project causing you grief and say, “Hey, if you agree this is a good idea, how much would I have to pay you to implement it? Are there committers who are available and want to be paid fairly to make this better?” Almost certainly yes. Maybe there is commercial support. The work would be blessed in advance and fast-tracked for review.

At a minimum, you can hire them to write a proper contribution guide and code walkthrough so your devs don’t crap their pants at the prospect of contributing.

Companies pay for features this way all the time in projects like Linux. It’s cheaper and causes fewer problems in the long run. But when it comes to developer frameworks and libraries, reinventing the wheel seems like too much fun to pass up.

If you liked this, you’ll appreciate What’s the Developer Experience?

Thanks to Dave Ford and Kiran Manur for hilarious, head-shaking discussions about this. And to Joel Spolsky for probably writing about this 15 years ago.

¹ I’m almost certainly misremembering this, but some fine young cannibal will correct me. I couldn’t find a reference, so it was probably in a book. Coders at Work?

2 Comments

David Ford on December 17, 2017 at 10:38 pm

Here is the problem: sometimes the decision to reinvent vs reuse is not obvious until it’s too late. You may not know that the thing you are reinventing is a Pandora’s box until you’ve gotten in too deep. Also, the reverse is sometimes true. I recently spent an excessive amount of time bending a 3rd party UI component to meet my needs. Only in hindsight did I realize it would have been easier to build the component myself.

So what we need is a list of things that are much more complex than they first appear. Your list is a good start: ORM, cache, LDAP server. I would add: JS UI frameworks (like React or Angular), a database engine, a PDF rendering engine (I reinvented that once), a DI framework.

Also a subtle disagreement with one of your points: “ditched ORM to write all their queries by hand”. I would say that reinventing an ORM from scratch is a monumental task and rarely advisable. But choosing to *not* use an ORM is an entirely appropriate decision for many situations.

Finally, in retrospect, I’m not sure I would take back all of my stupid reinvents. Those unproductive reinvent sidetracks turned out to be super productive in one way: the education of Dave 🙂

Philip Yurchuk on December 18, 2017 at 8:23 am

Thanks, as usual you lend insight to a complex issue. For sure, hindsight is 20/20; I couldn’t have written this without the benefit of many people’s hindsight, including yours. Said another way, we’re bad at predicting the future, and this is affected by how optimistic we developers are. We’re optimistic that…

– this will be a huge success with tons of traffic!
– this will be a little thing I can bang out and get back to something more interesting
– this will save me tons of time in the future

I would like to hear what situations are appropriate to ditch the ORM. The two that come to mind are:

– most queries are using advanced, proprietary DB features
– you have a full-time DBA on the payroll with a lot of spare cycles, so why not get her to write all the queries? They will be optimized and the right indexes will be created.

I think the second is more common than the first and is the one I’d choose as a manager. But I don’t think you’re disagreeing with the case I gave where there were many DAOs written for 3 DBs. I do find that most developers vastly overestimate how hard it is to override the ORM, either by writing a different DAO implementation or simply using the ORM’s facility for this. You’re a Hibernate expert, but even I’ve done a one-off to use PostgreSQL’s geo features and it was trivial and well documented.

And yeah, all we can hope to do is learn from our mistakes and not repeat them.
