samkgg Lessons Learned

This is a regularly updated page where I document what I’ve learned while building samkgg, a demo app for AWS SAM, Kotlin, GraalVM, Gradle, and other technologies as the project progresses.

Read the samkgg blog post first to understand what I’m doing. You can find the working code here:

https://github.com/madeupname/samkgg

Builds and Dependencies

It’s important to consider how you organize your code. There are a couple options that give you flexibility. A critical point is that you don’t need to use Graal for everything! The main advantage of Graal is fast cold startups. If you are serving users interactively, use Graal. But some Lambda functions run asynchronously or in batch and it doesn’t matter much then. It might save you developer time to build a standard JVM function and not worry about reflection. SAM supports this.

When you create your project with “sam init” as described in the README, it will create a single Lambda function in a single Gradle project. You can put multiple Lambda functions in that same directory and build/deploy them without error by configuring the path for each function in template.yaml. However, each function will have the same executable and hence the combined dependencies for all functions in your project. This is a little better with Graal, but not recommended.

There is no way to make this a traditional Gradle multi-project build. Each function needs its own Gradle build:

https://github.com/aws/aws-sam-cli/issues/3227

In lieu of a multi-project build, you can do two things. The first is to follow Gradle guidelines for large projects. This is definitely not a large project, but we organize it like one. The second is to add more build logic via Makefiles. Yes, Makefiles:

https://makefiletutorial.com

Other notes:

  • Despite the GraalVM term “native image” the SAM package type is still zip. Hence, Lambda layers are allowed.
  • Similarly, the use-container flag passed to “sam build” means it is going to build the functions inside a Docker container. Since it’s building a binary for a specific platform, the build runs in that target platform with the necessary dependencies.
  • The Kotlin stdlib JDK 8 dependency threw me, but that’s the latest version. It still works with JVM 17 targets.
  • AWS SDK v2 is recommended, most importantly since they are claiming it is GraalVM-safe. It was also reworked for faster Lambda cold starts. However, it is possible to include both v1 and v2 in the same project, which could be required given v2 still has not reached feature parity.

 

Windows

You want to run WSL 2 to support Docker and run Ubuntu locally (more below). This is possibly just my machine, but Docker Desktop makes the “WMI Provider Host” process suck up CPU, blasting the fans, and none of the solutions I’ve read fix it. YMMV.

Git Bash (MINGW64) seems to support the most commands well and doesn’t need Docker, so that’s my go-to shell. Of course, IntelliJ can run Gradle directly and that’s handy, too.

However, Ubuntu is required for running the tracing agent, which you need to make sure your GraalVM resource and reflection config is complete. Unfortunately, that needs Docker Desktop running.

GraalVM

Note: I believe you can/should always use the latest stable version because you are creating an actual binary in a custom runtime. Even though you choose a GraalVM JDK version (11 or 17) when creating the app via the CLI, there’s no separate VM/runtime supplied by AWS. Native images are self-contained.

Minimum docs to read as of this writing. I’ve seen docs change significantly between versions.

https://www.graalvm.org/22.2/reference-manual/native-image/

https://www.graalvm.org/22.2/reference-manual/native-image/basics/

https://www.graalvm.org/22.2/reference-manual/native-image/metadata/

Skipping the docs will lead to a bad time. It may also lead to ignoring viable libraries just because they use reflection.

GraalVM can use code with reflection; it just needs to know about it beforehand. In particular, it needs to know all of the possibilities in advance. If your code dynamically instantiates any of a number of classes, Graal needs to know which classes those are so they are included in the executable.
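As a minimal sketch of the kind of code this applies to, the snippet below instantiates a class from a name that is only known at run time. The `Greeter` types and `loadGreeter` are hypothetical names invented for illustration; they are not part of samkgg.

```kotlin
// Hypothetical types, used only for illustration.
interface Greeter {
    fun greet(name: String): String
}

class EnglishGreeter : Greeter {
    override fun greet(name: String) = "Hello, $name"
}

class SpanishGreeter : Greeter {
    override fun greet(name: String) = "Hola, $name"
}

// The class is chosen by name at run time, so static analysis alone cannot
// tell which implementations have to be kept in the native image.
fun loadGreeter(className: String): Greeter =
    Class.forName(className)
        .getDeclaredConstructor()
        .newInstance() as Greeter

fun main() {
    // In real code the name would come from configuration or input; the class
    // reference here only keeps the sketch independent of any package name.
    val greeter = loadGreeter(SpanishGreeter::class.java.name)
    println(greeter.greet("samkgg"))
}
```

On a regular JVM this runs as is. In a native image, every class that could be passed to `loadGreeter` would need a reflection metadata entry, or the lookup can fail at run time.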

Reachability Metadata and the Tracing Agent

One of the biggest challenges with working with Graal is configuring it to include what will be needed at runtime, including dynamic features and resources. native-image does its best to discover this at build time, but it can’t predict everything. Hence you must give it supplemental instructions via “reachability metadata” commonly shortened to metadata. Key files are:

  • resource-config.json – this forces the build tool to include these files so that they may be loaded at run time when it’s not obvious to the tool at compile time
  • reflect-config.json – specify what will be called reflectively so the VM can prepare this code for execution in the binary

Some libraries supply or document this metadata, but most don’t. This is somewhat eased by using the Tracing Agent (see metadata docs above), a runtime Java agent that tracks every use of reflection or other dynamic behavior and stores everything in special config files, including the two listed above.

However, the agent can only detect these usages during the run. If your run with the agent skips any classes, methods, even conditional branches, and they would have used reflection, the config files will not get updated and your code can have an error during runtime. A solution to this is to run the agent when you run your tests, assuming your tests have good coverage. You want to add exclude filters for your test libraries.

When I ran it on this simple project, though, the output was enormous (41KB) because it’s listing everything individually. It’s like importing every single class from a package instead of using a wildcard. A small binary is a high priority for Graal. The good news is there is a flag to merge metadata from multiple runs into a single file.

Given all that, you can see why all libraries and frameworks seeking to be GraalVM-friendly (like Micronaut and Quarkus) seek to avoid reflection like the plague. Sadly, I was using Spock for tests (huge Groovy fan), and discovered the agent will include everything for Groovy and Spock, which is way too much to wade through. I then understood why plain old JUnit was chosen for the template.

Environments/Stacks

Environment is an overloaded term in SAM. You have:

  • SAM CLI environment
    • the “sam” command has a config file named samconfig.toml and this file divides configuration settings among environments
    • the default environment is literally called default; create others as needed
    • you specify the environment for the command with --config-env (--config-file points the command at a different config file)
  • environment variables
    • there is a block for this in template.yaml
  • environments where the function runs
    • local or AWS

Finally, what a programmer might think of as a deployed environment (qa, production), CloudFormation (and SAM by extension) calls a stack.

Per SAM best practices, you create a stack per environment, such as development, staging, or production. So together with environments, your code can be:

  • deployed to AWS in different stacks
    • development
    • production
  • local
    • Docker container (sam local invoke)
      • development
      • production
    • test (gradle test, no container)

Each of these has subtle differences that are not always obvious/documented. I do my best to document the surprises here.

SAM CLI can be passed a JSON file and/or command line options that contain overrides for the Environment variables in your template.yaml. Two critical points:

  • You have to create environment variables in template.yaml or the Lambda environment won’t pass them to your function, even if they exist.
  • One very misleading issue is that the Parameters block of the JSON file is for global environment variables, not parameters! My global env vars were not being overridden even though they were declared in template.yaml and specified for the function. For safety’s sake, I duplicate them in the JSON file.

Logging

I’m not the first engineer to find logging with Lambda and Graal to be surprisingly challenging. My initial choice for logging was AWS Lambda Powertools. It looks like a solid library to help you troubleshoot. However, it relies on Log4j 2, which is not GraalVM friendly. According to that thread, the maintainers say it won’t be ready for Graal until Log4j 3 at the earliest.

Graal officially supports java.util.logging (AKA JUL), which is nice because it’s one less dependency (setup docs here). However, for reasons I don’t yet understand, log messages directly from JUL did not show up when run from Lambda. They worked fine when testing outside the container, which you’ll find is a common challenge with this stack.

The solution was adding SLF4J with the JDK 1.4 logging support library (not the “bridge” library – that sends all JUL logs to SLF4J). SLF4J should also enable logging from AWS Java SDK, and I have seen an instance of an AWS SDK log message in my console.

The next challenge was determining per-environment configuration given the differences:

  • development and prod should have different logging.properties files to set different log levels
  • code deployed to AWS uses CloudWatch

My first attempt was to configure the loggers at build time so Graal didn’t have to include the files in the native image. But Graal considers creating an input stream from a resource to be unsafe during initialization. Logging is still configured during class initialization, but at runtime, not build time. I use AWS-provided environment variables to determine if it’s in a container and if that container is AWS-hosted.
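A minimal sketch of that environment check follows. The post does not name the variables, so this assumes `AWS_SAM_LOCAL` (set to `true` by `sam local`) and `AWS_LAMBDA_FUNCTION_NAME` (set by the Lambda runtime). The enum, the function names, and the properties file names are hypothetical.

```kotlin
// Hypothetical classification of where the function is running.
enum class RuntimeEnvironment { AWS_LAMBDA, SAM_LOCAL_CONTAINER, PLAIN_JVM }

// Takes the environment as a map so it can be exercised without touching
// the real process environment.
fun detectEnvironment(env: Map<String, String> = System.getenv()): RuntimeEnvironment = when {
    // Assumption: "sam local" marks its containers with AWS_SAM_LOCAL=true.
    env["AWS_SAM_LOCAL"] == "true" -> RuntimeEnvironment.SAM_LOCAL_CONTAINER
    // Assumption: the Lambda runtime always sets the function name.
    env.containsKey("AWS_LAMBDA_FUNCTION_NAME") -> RuntimeEnvironment.AWS_LAMBDA
    else -> RuntimeEnvironment.PLAIN_JVM
}

// Hypothetical file names: one logging config per environment.
fun loggingConfigFor(environment: RuntimeEnvironment): String = when (environment) {
    RuntimeEnvironment.AWS_LAMBDA -> "logging-aws.properties"
    RuntimeEnvironment.SAM_LOCAL_CONTAINER -> "logging-local.properties"
    RuntimeEnvironment.PLAIN_JVM -> "logging-test.properties"
}

fun main() {
    val environment = detectEnvironment()
    println("$environment -> ${loggingConfigFor(environment)}")
}
```

The local check comes first because a local container may also carry the Lambda variables, so the order of the branches matters.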

The next issue is that Lambda logs everything in the console to CloudWatch, which is good, but CloudWatch sees every newline as the start of a new log message. I have created a CloudWatchFormatter that replaces newlines with carriage returns (which CW doesn’t mind) if the code is running on AWS. My next goal is creating a JSON formatter to allow better use in CloudWatch Insights.
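The idea can be sketched with a small JUL formatter. This is not the CloudWatchFormatter from the repository; the class name, the `onAws` flag, and the line layout are assumptions made for illustration, and it ignores stack traces.

```kotlin
import java.util.logging.ConsoleHandler
import java.util.logging.Formatter
import java.util.logging.LogRecord
import java.util.logging.Logger

// Hypothetical formatter: keeps one log record on one CloudWatch line by
// turning embedded newlines into carriage returns when running on AWS.
class NewlineCollapsingFormatter(private val onAws: Boolean) : Formatter() {
    override fun format(record: LogRecord): String {
        val message = formatMessage(record)
        val line = "${record.level} ${record.loggerName ?: "root"}: $message"
        val body = if (onAws) line.replace("\r\n", "\r").replace('\n', '\r') else line
        // A single trailing newline terminates the record.
        return body + "\n"
    }
}

fun main() {
    val handler = ConsoleHandler().apply {
        formatter = NewlineCollapsingFormatter(onAws = true)
    }
    val logger = Logger.getLogger("demo").apply {
        useParentHandlers = false
        addHandler(handler)
    }
    logger.info("first line\nsecond line")
}
```

Passing the flag in through the constructor keeps the environment check in one place instead of repeating it for every record.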

Another interesting concept is Mapped Diagnostic Context (MDC), which is like a thread-bound map. It is not supported by JUL, but SLF4J offers a basic adapter. You can put any key/value into the map and it will be visible to all methods that get the MDC adapter. I added the AWS Request ID from the context so it can be logged with messages from any source.
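To make the "thread-bound map" idea concrete, here is a minimal sketch built on `ThreadLocal`. It is not SLF4J's MDC API; `RequestContext`, `withRequestId`, and the `awsRequestId` key are hypothetical names.

```kotlin
// Hypothetical stand-in for an MDC: each thread gets its own map.
object RequestContext {
    private val values = ThreadLocal.withInitial { mutableMapOf<String, String>() }

    fun put(key: String, value: String) {
        values.get()[key] = value
    }

    fun get(key: String): String? = values.get()[key]

    fun clear() = values.get().clear()
}

// Any code on the same thread can pick up the request ID without it being
// passed down through every method signature.
fun withRequestId(message: String): String =
    "[requestId=${RequestContext.get("awsRequestId") ?: "-"}] $message"

fun main() {
    RequestContext.put("awsRequestId", "example-request-id")
    println(withRequestId("handling request"))

    // A different thread sees its own, empty map.
    val other = Thread { println(withRequestId("from another thread")) }
    other.start()
    other.join()

    RequestContext.clear()
}
```

Since Lambda can reuse an execution environment between invocations, clearing the map at the start or end of each invocation avoids leaking one request's ID into the next.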

I owe much of this to Frank Afriat’s blog post and GitHub project. You may wish to use his implementation, which is more robust than mine, but is also marked alpha and relies on SLF4J’s simple logger, which is not as robust as JUL.

DynamoDB

I found the DynamoDB Guide an excellent supplement to the docs.

I don’t see any library that handles schema and data migrations for DDB like Flyway and Liquibase do for SQL. CloudFormation can build tables for you, but it can’t handle data migrations that naturally occur in an active project with a changing schema. Luckily, you can implement the basics of Flyway pretty trivially.
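A minimal sketch of what "the basics of Flyway" could look like: versioned migrations applied in order, with the applied versions recorded so each runs only once. All names here are hypothetical, and the in-memory store stands in for what would presumably be a DynamoDB table in a real project.

```kotlin
// Hypothetical migration: a version number, a label, and the work to do.
class Migration(val version: Int, val description: String, val action: () -> Unit)

// Where applied versions are recorded. A real implementation would persist this.
interface VersionStore {
    fun appliedVersions(): Set<Int>
    fun markApplied(version: Int)
}

class InMemoryVersionStore : VersionStore {
    private val applied = mutableSetOf<Int>()
    override fun appliedVersions(): Set<Int> = applied.toSet()
    override fun markApplied(version: Int) {
        applied.add(version)
    }
}

// Runs every migration that has not been applied yet, lowest version first,
// and returns the versions it applied.
fun migrate(migrations: List<Migration>, store: VersionStore): List<Int> {
    require(migrations.map { it.version }.toSet().size == migrations.size) {
        "Duplicate migration versions"
    }
    val done = store.appliedVersions()
    val pending = migrations.filter { it.version !in done }.sortedBy { it.version }
    for (migration in pending) {
        migration.action()
        // Recorded only after the action succeeds, so a failure is retried next run.
        store.markApplied(migration.version)
    }
    return pending.map { it.version }
}

fun main() {
    val store = InMemoryVersionStore()
    val migrations = listOf(
        Migration(2, "backfill new attribute") { println("running v2") },
        Migration(1, "create table") { println("running v1") },
    )
    println(migrate(migrations, store)) // first run applies both, in version order
    println(migrate(migrations, store)) // second run has nothing left to apply
}
```

This leaves out what a real tool adds, such as checksums, locking against concurrent runs, and rollback.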

A particular challenge not shown by the docs is a very common scenario: a class has a list of custom objects. Mapping this in the enhanced client (their ORM) is not at all obvious. You have to map it to a list of JSON documents, possibly with a custom attribute converter. I think they didn’t bother making this easy because with DDB, this relationship is only useful when it’s a composition relationship, meaning the objects in the collection don’t exist outside their parent. Otherwise, they would be stored in a separate table with any relationship managed entirely in code since DDB doesn’t support joins.

In adding the DynamoDB dependencies, I found I had to update the reflection config due to a transitive dependency on Apache HTTP Client.

Kotlin

I’m learning Kotlin from scratch and it’s less intuitive than I thought, especially (ironically?) coming from Groovy. Running into issues where things don’t work and it’s not clear why. IntelliJ is a big help here, offering fixes that work and shine a light on where to look for help. I think a lot of the challenge comes from how strict the typing is. But there have also been a few times where I fired up the debugger to find out why something isn’t working… and then it just does. To be clear, it was 1) run test, which fails 2) debug run test (no changes) it now works. Like it was really an IntelliJ state problem, which I’d experienced in the past with Gradle. Refreshing the Gradle project might help, YMMV.

Kotlin has some useful features like data classes that automatically give you equals and hashCode methods, but you have to mind the constructors.
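One reading of "mind the constructors" is that a data class only generates equals, hashCode, toString, and copy from the properties in its primary constructor. A minimal sketch, with a hypothetical `Skill` class:

```kotlin
// Only `name` is in the primary constructor, so only `name` takes part in
// the generated equals/hashCode/toString/copy.
data class Skill(val name: String) {
    // Declared in the class body: ignored by the generated methods.
    var invocationCount: Int = 0
}

fun main() {
    val a = Skill("weather").apply { invocationCount = 1 }
    val b = Skill("weather").apply { invocationCount = 99 }

    println(a == b)                       // true: invocationCount is not compared
    println(a.hashCode() == b.hashCode()) // true
    println(a)                            // Skill(name=weather)
}
```

If a property should count toward equality, it has to be declared in the primary constructor.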

 

samkgg: AWS SAM + Kotlin + GraalVM + Gradle

I am starting a new open source project called samkgg. It’s a learning in public project that I hope will help others who want to adopt these technologies. You can find it here:

https://github.com/madeupname/samkgg

I thought I’d explain a bit about why and how I chose these technologies. The main driver is an Alexa skill that I’m building. This is not for fun, but what I hope will be a solid commercial product. And even if you’re not building an Alexa skill, you’ll see how to build a production serverless app.

TL;DR

  • Lambda because Alexa docs mostly assume this
  • GraalVM to eliminate Lambda cold start delays
  • SAM is way more mature than Amplify
  • Kotlin because I can’t use Groovy
  • DynamoDB because it fits in with the serverless model

I am a fan of the philosophy “two ways to win, no way to lose.” If my product doesn’t take off, I still win by adding the serverless architecture to my tool chest. I also find voice UIs/ambient computing to be very interesting.

Why Lambda/serverless?

If you look into how to build Alexa skills, there are various ways you can do it. Technically, I could just write my own REST API in any framework. But almost all the documentation assumes you are building on Lambda. I prefer modular monoliths for most projects, especially to start. But I’m less excited about choosing a poorly documented route. Just using Java is hard enough.

Amazon does have a hosted development platform for Alexa, but this only supports JavaScript and Python.  My background is JVM technologies and I have no desire to throw away that experience and switch ecosystems. I think most developers are far too cavalier about that and I also believe in the power of “skill stacks” – building on your experience.

Luckily, Java on Lambda is no problem. I was greatly helped by the book Programming AWS Lambda, which is Java-specific.

Well, when I say Java is “no problem,” I mean “possible.” In reality, it has an infamously slow startup/load time. This is why most Lambda functions – and microservices in general – are written in interpreted languages like JavaScript and Python. Java is faster if the process stays up long enough, but with serverless you’re frequently starting a new server, so users can experience very slow responses. This can be especially painful for a voice UI – there is no spinner.

Why GraalVM?

The solution to slow cold starts is GraalVM. It allows you to compile Java (and other languages) into native binaries that start up super fast. If you haven’t investigated it, you should know:

  • some years ago, a very high level Oracle exec told me they considered it the future of Java
  • first production release was over 3 years ago
  • it’s used in production today by big companies like Twitter
  • it recently got first-class support from Lambda (JDK 11 and 17)

Why SAM?

My next decision was what technology to implement the Lambdas in. I first looked at Amplify, since AWS was touting this as a low code solution, practically no-code with the Figma builder tool. But after spending a lot of time, I came away deeply disappointed. Politely, this is the common conclusion. I think only JS developers will be comfortable and I didn’t love that GraphQL was the main technology.

But the biggest problem was no easy path to learn. Much of the docs frequently lagged the tooling, even the “Getting Started” material. Tutorials should be flawless, especially for a commercial tool. Due to its pricing, I think it has tremendous potential for startups, but needs a docs-first approach. I’ll look again in a couple years.

I learned of SAM from a Hacker News discussion of Amplify where it was repeatedly recommended over it. After looking into it, it seemed the right choice for me. Like Amplify, it’s a productivity layer on core AWS technologies (mainly CloudFormation).  Both hit the Law of Leaky Abstractions, but SAM has a smaller surface area. That said, with SAM you will still need to understand:

  • AWS fundamentals
  • CloudFormation
  • API Gateway
  • Lambda (duh)

and more, at a deep level. And the stack I’m building includes:

  • DynamoDB
  • Cognito
  • CodeDeploy
  • Kotlin
  • Gradle
  • GraalVM

and possibly CodeBuild and Cloud Development Kit (CDK). In short, about the same as you needed for Spring/Hibernate or Groovy/Grails (plus your DB, CI/CD, etc.).

As Frederick Brooks said, there is no silver bullet.

Why DynamoDB?

I’ll add I was on the fence a bit with DynamoDB, having not done NoSQL before. But what I really like about SQL is Hibernate/JPA (and really, GORM). And Hibernate does not work in GraalVM. OK, there is a Quarkus plugin for it, and I found a library to make it work, but I did not find a definitive statement: “Hibernate is guaranteed to work in production on GraalVM.” I more found, “This is all stuff we had to do to get it to work.” I love Hibernate, but I need a guarantee.

In fact, I couldn’t find any Java ORM that was officially compatible with GraalVM native images. This makes sense because GraalVM native images restrict reflection (anything accessed reflectively has to be declared at build time), and reflection is an obvious feature to use in an ORM.

DynamoDB has a mapper, which is a very lightweight ORM, but at least it works with GraalVM. Well, other than schema creation, but you probably want to do that programmatically to handle migrations anyway. Unfortunately, there is no migration library like Flyway for DynamoDB.

For a small schema, it feels more natural to work directly with an object graph, which is what you get by persisting to JSON. But for a sizeable app I’d consider Aurora Serverless PostgreSQL and Flyway. Going back to mapping result sets does not sound fun, though.

Why not Micronaut? Or Quarkus? Or…

It’s natural to consider one of the serverless/microservice frameworks. I know excellent people on both the Micronaut and Quarkus teams. But they seem an even further abstraction from AWS. If I were building generic microservices, not an Alexa skill, I’d consider them. I would definitely look if I were trying to be cloud agnostic, but that’s silly for an Alexa skill.

So if this sounds interesting, please check it out:

https://github.com/madeupname/samkgg

What is the Mautic Plugin Published setting?

If you’ve installed Mautic and head over to plugins, you notice there is a setting called “Published” with choices Yes or No. Inexplicably, there is absolutely no documentation on this.

Published means enabled. Apparently, they thought it was obvious since the tab is called Enabled/Auth. But given that some of these plugins might publish something it’s a poorly chosen name since a new user might not be clear about it.

Java’s New Pricing

I recently attended a talk by Georges Saab, Java head honcho at Oracle. The following is an executive summary, simply explained so you can understand the changes and plan accordingly. If you use Java commercially, the odds of you reading this and saying, “We’re fine, don’t need to change anything,” without doing any checking, are very low.

I’ll add the same “safe harbor” statement Georges added: any planning/spending you do should not rely solely on this article – things can change, I might have misheard something, etc. Do your own research.

Background

  • The JDK, or Java Development Kit, is a versioned specification. JDK 5, 6, … 11. There are also editions, such as Java SE, Java EE, etc.
  • For 5 through 9, major versions were released every 2-5 years. Updates (e.g. 8u20) came out about every 6 months. “Updating” means back-porting security, bug fixes, and possibly other improvements that are guaranteed to have no breaking changes.
  • The JDK specification has implementations, which are downloadable binaries. They come from various providers (companies, organizations, or individuals) and may be
    • under various open source or commercial licenses
    • free or paid
  • The most common one is Oracle JDK.
  • OpenJDK is a community project which provides the reference implementation of the JDK. 
  • It is a collaboration between several companies, but >90% of the contributions come from Oracle. 
  • Anyone can create their own build/distribution of OpenJDK and many do – including Oracle.
  • These builds can have code changes. For a while, Linux vendors replaced code that wasn’t GNU-compatible so it could legally be distributed with Linux, but is/was still called “OpenJDK.”

Today and the Future

  • Starting with JDK 10, a major version of Java will be released every 6 months. Far fewer changes, but still major versions, so not guaranteed to be backward-compatible. 
  • JDK 11 introduces the concept of long-term support (LTS) versions. These are the ones that are going to get updates after the next version is released. Example:
    • JDK 12 is released and bug and security fixes are back-ported to 11.
    • JDK 13 is released and bug and security fixes are back-ported to 11, but not 12.
  • The big news is that Oracle is going to stop updating Oracle JDK 8 for commercial use in January 2019 and personal (desktop) use in December 2020 unless you pay for support.
  • Oracle will only provide updates to the free version of Oracle JDK, and OpenJDK, while it is current (versions released in the last 6 months). Meaning as soon as JDK 12 is released – 6 months later – they will stop providing updates to Oracle JDK 11 unless you pay.

Let me clarify that with an example that shows you your options:

  • JDK (Java) 11 is released and you adopt the Oracle JDK in a commercial setting. You’re paying nothing as usual.
  • Six months later, JDK 12 is released, with fixes and new features.
  • Your options:
    1. Do nothing. Continue to use Oracle JDK 11 for free forever, legally. The license does not expire, it will just never be updated by Oracle.
      • Maybe you’re running on a closed/air-gapped system and there is nothing else you need.
    2. Upgrade to Oracle JDK 12. Still free and likely can run everything you built under 11 without issues.
    3. Switch to OpenJDK 11 from a provider other than Oracle, who is updating it.
    4. Switch to another commercial JDK.
    5. Pay Oracle for updates of 11. I was told the current price for this is $25/processor/month, with volume discounts, run on the honor system. I understand this is a significant discount from earlier pricing models.

It will be interesting to see how this plays out. I totally get that Oracle has a big staff of developers making Java better and they need to pay them. Giving the product away for free makes that more difficult.

On the other hand, many people have gotten used to paying nothing. A number of companies are already planning to provide updates for older versions, but we’ll see what kind of toll this puts on other contributors and who does what for free. Some have extensive Java experience and their own (often commercial) distributions: Red Hat, Azul Systems, IBM, etc. Those may be more cost effective for you. Maybe you’re already using one.

For the record, I have no problem with companies charging money for software development. Mine does and I’m very fond of the rent and coffee it pays for.

CIO/CTO Action Items

  1. Identify every system that is running Java in a commercial environment. This includes desktops.
    • Remove Java from systems that don’t need it.
  2. Identify where those distributions came from: Oracle, Red Hat, GNU, etc.
  3. Determine their update schedule. Be aware of when your Java versions will be out of date. Put it on your maintenance schedule or potentially get pwned. 
  4. Switch to JDKs that will be updated or plan for regular upgrades.
  5. Budget accordingly.

For further reading, Java Champions have created a document summarizing the changes in a bit more depth.

Hope you found that useful! Please comment with any corrections. 

Thanks to Marco Villalobos for pointing out some issues in the first version.

What Is It, Really?

“What is it, really?”

Those four words launch a boondoggle.

It starts with smart software engineers. Smart and bored. They’re using a software library or tool to solve a problem. They have a lot of options to choose from. Multiple open source and commercial solutions, high quality, lots of customers or users. But it just isn’t perfect. It’s missing a few things they need and has a few things they don’t.

So the engineers begin to ponder the nature of this tool. And they start with a gross oversimplification.

  • What is a name server, really? It’s basically a lookup table.
  • What is an object cache, really? It’s… well, it’s another lookup table.
  • What’s an ORM, really? It’s a map between SQL result sets and object fields.

Everything is just a map! And those are all real examples. I’ve seen companies who:

  • ignored OpenLDAP, Netscape Directory Server, and Active Directory to write their own name server
  • ditched Ehcache to write their own that crashed the app on any serious load, and nobody knew why (multiple offenders)
  • ditched ORM to write all their queries by hand, seemingly unaware that they could easily use SQL for the 20% of queries that needed optimization. And since they supported MSSQL, Oracle, and MySQL, they wrote their DAOs 3 times. Cut and paste, baby!
  • wrote their own version of Struts with some extra features; then they were stuck on a proprietary Struts 1 clone long after Struts 2, Spring MVC, etc. came out.
  • wrote their own terribly designed version of portlets/JSF/etc. that nobody in the company understood after the creator left (and even he was shaky on it)

I’ll admit, often it’s less boredom than intimidation. You request the feature and the maintainers respond, “That sounds great! The source is over there, let us know when you’ve added it.”

You don’t even look at the source. I mean, it’s gotta be crazy complex. It already does so much. You’re not sure where to start. The developer contribution guide is scant and/or years old.

So you start rationalizing. You’re using just part of this thing. How hard would it be to recreate that? You’d understand all that code because you wrote it. And you could add those extra features you needed.

But you’re vastly underestimating the problem. To start with, the corner cases. I remember a story from Jamie Zawinski about the Netscape/Mozilla rewrite [1]. A couple devs were reimplementing the FTP functionality. They had taken a few weeks and had a question about an edge case. He helped them, but the real issue was that the original code was gnarly because it had taken them 6 months to find and handle all the edge cases. And they were ignoring the original code because it looked icky. The same has been said about search code, Unix utilities, ORM, caching, anything with serious concurrency, etc.

Enterprise software companies seem particularly prone to all this. Perhaps because the sales division loves proprietary tools and lock-in.

What I am not saying

I am not saying don’t innovate. Or that you can’t improve things or come up with better products.

  • If you want to create a new open source competitor, go for it. A number of ORMs came before and after Hibernate, both open source and commercial. More will come.
  • If you can build a product and sell it, even though there’s competition, go for it.
  • If you need a small piece of a bloated dependency, and you can knock this out with unit tests pretty quickly, go for it.
  • Are you brilliant, working with other brilliant folks who will vet this idea? And it’s for something of massive scale, like Google, FB, Amazon, or MS would need? Go for it.

What I am saying is that building a one-off of a sizeable, complex component, for just your project, will waste tons of money and become a huge regret for all involved. And it’s always done because of ignorance.

Another way

As a manager, if I have the budget for a new, complex subsystem, I have the money to go to the maintainers of the project causing you grief and say, “Hey, if you agree this is a good idea, how much would I have to pay you to implement it? Are there committers who are available and want to be paid fairly to make this better?” Almost certainly yes. Maybe there is commercial support. The work would be blessed in advance and fast-tracked for review.

At a minimum, you can hire them to write a proper contribution guide and code walkthrough so your devs don’t crap their pants at the prospect of contributing.

This happens often in projects like Linux. It’s cheaper and causes fewer problems in the long run. But when it comes to developer frameworks and libraries, reinventing the wheel seems like too much fun to pass up.

If you liked this, you’ll appreciate What’s the Developer Experience?

Thanks to Dave Ford and Kiran Manur for hilarious, head-shaking discussions about this. And to Joel Spolsky for probably writing about this 15 years ago.

 

  1. I’m almost certainly misremembering this, but some fine young cannibal will correct me. I couldn’t find a reference, so it was probably in a book. Coders at Work?

What’s the Developer Experience?

This is an excerpt from Enterprise Software Confidential, my white paper on buying enterprise software without making a huge mistake.

In this article, you will learn:

  • solid business reasons to care about developer happiness – even when they’re not your employees
  • why so many enterprise vendors produce piles of garbage that still sell

When making a major software purchase, you have a lot of questions. But there’s one that nobody asks, yet is so critical: what’s the developer experience like?

As you read earlier, enterprise software is all about configuration and customization. Your company has proprietary processes to which the software must conform. Or it should. Granted, many times the software is in line with industry best practices and you are not. Changing your processes to match the software is a win/win. But if you’ve invested time and money to create competitive advantages, you don’t throw them away to save engineering dollars.

So no matter the core feature set, there will be a bunch of custom programming to get this thing off the ground. Yet nobody asks if it will be pleasant for the developers. Yes, pleasant! Maybe you don’t care if the programmers like their job. Maybe you don’t like your job. Misery loves company, right?

Kidding! Of course, you care about the happiness of your employees. But in most cases, these are not your employees. They’re at the system integrator or vendor, and you’re paying them a lot to power through it without complaining.

In reality, there are solid business reasons to care about developer happiness, even if they’re not “your” developers. (Note: I’m going to use the word integrator to denote anyone customizing your software. Short for system integrator, but often referred to as professional services or solutions consultants in the biz.)

Golden Handcuffs

Golden handcuffs are what we call the high salaries paid to good engineers so they don’t leave because the work sucks and they are not learning anything new. This cost is passed on to you, of course. And sooner or later, they are going to leave. Maybe in the middle of your project. Probably to a better platform in the same industry. A lateral move pay-wise, but a raise in enjoyment.

Ill Communication

The experience could be bad because the vendor doesn’t support system integrators well. Poor documentation, no access to the bug tracker, and firewalls between the integrators and the product team. When something goes wrong, it’s on the integrator to prove it to the product team, which is a long and painful process when you can’t talk to them directly. Many eye-rolling questions. (“Have you tried turning it on and off again?”) And that time is billed to you!

A Pile of Garbage

Oftentimes, the reason it’s a bad developer experience is that it’s bad software. It didn’t start that way (I assume). The 1.0 version was pretty good, used the latest and greatest frameworks and incorporated smart design decisions.

Then the sales division took over. If the vendor has to spend a lot on sales, and that team is credited with driving revenue, they amass power. When that mindset grows to the point where engineering can’t make a stand for quality, you’re on a path to a pile of garbage.

Sales will push for new features over fixing bugs. Marketing will mandate deadlines to meet their initiatives (trade shows, commercials, etc.), rather than how long it takes to do it right.

Have you heard of the term technical debt? That’s what you get when you rush developers instead of giving them time to do it right. Quick and dirty is expected every once in a while, but they need time to clean up the mess. In sales-driven organizations, that rarely happens.

Unfortunately, piles of garbage can have pretty long shelf lives. Prohibitive sales costs keep out challengers, and customers rely too much on social proof. Waaay too much.

From a technical perspective, however, I have not seen a product recover from pile-of-garbage status [1]. They become notorious among engineers. The problem you’ll have is that engineers aren’t terribly confrontational. Notice that I have not called out any bad actors. Engineers love the truth and want to be honest, but they’re unlikely to openly bash products, especially the ones paying their salary. The good ones simply find better work.

What to do?

I started writing a comprehensive list of questions you want to ask when choosing software. But I don’t want to mislead you into thinking a list of questions off the internet prepares you for shelling out megabucks on software, with answers from dubious sources. If you have a software development background, and bought that type of software before, and oversaw its implementation and use, you’re in good shape. Otherwise, hire a consultant. Doesn’t have to be me, just someone with that experience.

Then get them to act like a seedy private eye. Wine and dine developers to get them to open up about how much they enjoy the platform. You will be shocked at how many vendors you eliminate this way.

You can comment on Hacker News.

  1. If you have, I’d love to hear about it.