samkgg Lessons Learned

(This is a regularly updated page where I document what I’ve learned while building samkgg, a demo app for AWS SAM, Kotlin, GraalVM, Gradle, and other technologies as this progresses.

Read the samkgg blog post first to understand what I’m doing. You can find the working code here:

https://github.com/madeupname/samkgg

Builds and Dependencies

It’s important to consider how you organize your code. There are a couple options that give you flexibility. A critical point is that you don’t need to use Graal for everything! The main advantage of Graal is fast cold startups. If you are serving users interactively, use Graal. But some Lambda functions run asynchronously or in batch and it doesn’t matter much then. It might save you developer time to build a standard JVM function and not worry about reflection. SAM supports this.

When you create your project with “sam init” as described in the README, it will create a single Lambda function in a single Gradle project. You can put multiple Lambda functions in that same directory and build/deploy them without error by configuring the path for each function in template.yaml. However, each function will have the same executable and hence the combined dependencies for all functions in your project. This is a little better with Graal, but not recommended.

There is no way to make this a traditional Gradle multi-project build. Each function needs its own Gradle build:

https://github.com/aws/aws-sam-cli/issues/3227

In lieu of a multi-project build, you can do two things. First is to follow Gradle guidelines for large projects. This is definitely not a large project, but we organize it like one. Second is to add more build logic in via Makefiles. Yes, Makefiles:

https://makefiletutorial.com

Other notes:

  • Despite the GraalVM term “native image” the SAM package type is still zip. Hence, Lambda layers are allowed.
  • Similarly, the use-container flag passed to “sam build” means it is going to build the functions inside a Docker container. Since it’s building a binary for a specific platform, the build runs in that target platform with the necessary dependencies.
  • The Kotlin stdlib JDK 8 dependency threw me, but that’s the latest version. It still works with JVM 17 targets.
  • AWS SDK v2 is recommended, most importantly since they are claiming it is GraalVM-safe. It was also reworked for faster Lambda cold starts. However, it is possible to include both v1 and v2 in the same project, which could be required given v2 still has not reached feature parity.

 

Windows

You want to run WSL 2 to support Docker and run Ubuntu locally (more below). This is possibly just my machine, but Docker Desktop makes the “WMI Provider Host” process suck up CPU, blasting the fans, and none of the solutions I’ve read fix it. YMMV.

Git Bash (MINGW64) seems to support the most commands well and doesn’t need Docker, so that’s my go-to shell. Of course, IntelliJ can run Gradle directly and that’s handy, too.

However, Ubuntu is required for running the tracing agent, which is required to ensure you have all your GVM resource and reflection config. And unfortunately that needs Docker Desktop running.

GraalVM

Note: I believe you can/should always use the latest stable version because you are creating an actual binary in a custom runtime. Even though you choose a GraalVM JDK version (11 or 17) when creating the app via the CLI, there’s no separate VM/runtime supplied by AWS. Native images are self-contained.

Minimum docs to read as of this writing. I’ve seen docs change significantly between versions.

https://www.graalvm.org/22.2/reference-manual/native-image/

https://www.graalvm.org/22.2/reference-manual/native-image/basics/

https://www.graalvm.org/22.2/reference-manual/native-image/metadata/

Skipping the docs will lead to a bad time. It may also lead to ignoring viable libraries just because they use reflection.

GraalVM can use code with reflection, it just needs to know about it beforehand. In particular, it needs to know all of the possibilities in advance. If your code dynamically instantiates any of number classes, Graal needs to know which classes those are so they are included in the executable.

Reachability Metadata and the Tracing Agent

One of the biggest challenges with working with Graal is configuring it to include what will be needed at runtime, including dynamic features and resources. native-image does its best to discover this at build time, but it can’t predict everything. Hence you must give it supplemental instructions via “reachability metadata” commonly shortened to metadata. Key files are:

  • resource-config.json – this forces the build tool to include these files so that they may be loaded at run time when it’s not obvious to the tool at compile time
  • reflect-config.json – specify what will be called reflectively so the VM can prepare this code for execution in the binary

Some libraries supply or document this metadata, but most haven’t. This is somewhat eased by using the Tracing Agent (see metadata docs above), which is a runtime Java agent that tracks every time reflection or dynamic behavior is used and stores everything in special config files. The two key ones are

However, the agent can only detect these usages during the run. If your run with the agent skips any classes, methods, even conditional branches, and they would have used reflection, the config files will not get updated and your code can have an error during runtime. A solution to this is to run the agent when you run your tests, assuming your tests have good coverage. You want to add exclude filters for your test libraries.

When I ran it on this simple project, though, the output was enormous (41KB) because it’s listing everything individually. It’s like importing every single class from a package instead of using a wildcard. A small binary is a high priority for Graal. The good news is there is a flag to merge metadata from multiple runs into a single file.

Given all that, you can see why all libraries and frameworks seeking to be GraalVM-friendly (like Micronaut and Quarkus) seeks to avoid reflection like the plague. Sadly, I was using Spock for tests (huge Groovy fan), and discovered the agent will include everything for Groovy and Spock, which is way too much to wade through. I then understood why plain old JUnit was chosen for the template.

Environments/Stacks

Environment is an overloaded term in SAM. You have:

  • SAM CLI environment
    • the “sam” command has a config file named samconfig.toml and this file divides configuration settings among environments
    • the default environment is literally called default; create others as needed
    • you specify the environment for the command with –config-file
  • environment variables
    • there is a block for this in template.yaml
  • environments where the function runs
    • local or AWS

Finally, what a programmer might think of as a deployed environment (qa, production) CloudFormation (and SAM by extension) calls a stack.

Per SAM best practices, you create a stack per environment, such as development, staging, or production. So together with environments, your code can be:

  • deployed to AWS in different stacks
    • development
    • production
  • local
    • Docker container (sam local invoke)
      • development
      • production
    • test (gradle test, no container)

Each of these has subtle differences that are not always obvious/documented. I do my best to document the surprises here.

SAM CLI can be passed a JSON file and/or command line options that contain overrides for the Environment variables in your template.yaml. Two critical points:

  • You have to create environment variables in template.yaml or the Lambda environment won’t pass them to your function, even if they exist.
  • One very misleading issue is that the Parameter block of the JSON file is for global environment variables, not parameters! I was not getting my global env vars overriden even though they were declared in template.yaml and specified for the function. For safety’s sake, I duplicate them in the JSON file.

Logging

I’m not the first engineer to find logging with Lambda and Graal to be surprisingly challenging. My initial choice for logging was AWS Lambda Powertools. It looks like a solid library to help you troubleshoot, however, it relies on Log4j 2 which is not GraalVM friendly. According to that thread, the maintainers say it won’t be ready for Graal until Log4j 3 at the earliest.

Graal officially supports java.util.logging (AKA JUL), which is nice because it’s one less dependency (setup docs here). However, for reasons I don’t yet understand, log messages directly from JUL did not show up when run from Lambda. They worked fine when testing outside the container, which you’ll find is a common challenge with this stack.

The solution was adding SLF4J with the JDK 1.4 logging support library (not the “bridge” library – that sends all JUL logs to SLF4J). SLF4J should also enable logging from AWS Java SDK, and I have seen an instance of an AWS SDK log message in my console.

The next challenge was determining per-environment configuration given the differences:

  • development and prod should have different logging.properties files to set different log levels
  • code deployed to AWS uses CloudWatch

My first attempt was to configure the loggers at build time so Graal didn’t have to include the files in the native image. But Graal considers creating an input stream from a resource to be unsafe during initialization. Logging is still configured during class initialization, but at runtime, not build time. I use AWS-provided environment variables to determine if it’s in a container and if that container is AWS-hosted.

The next issue is that Lambda logs everything in the console to CloudWatch, which is good, but CloudWatch sees every newline as the start of a new log message. I have created a CloudWatchFormatter replaces newlines with carriage returns (which CW doesn’t mind) if the code is running on AWS. My next goal is creating a JSON formatter to allow better use in CloudWatch Insights.

Another interesting concept is Mapped Diagnostic Context (MDC), which is like a thread-bound map. It is not supported by JUL, but SLF4J offers a basic adapter. You can put any key/value into the map and it will be visible to all methods that get the MDC adapter. I added the AWS Request ID from the context so it can be logged with messages from any source.

I owe much of this to Frank Afriat’s blog post and GitHub project. You may wish to use his implementation, which is more robust than mine, but is also marked alpha and relies on SLF4J’s simple logger, which is not as robust as JUL.

DynamoDB

I found the DynamoDB Guide an excellent supplement to the docs.

I don’t see any library that handles schema and data migrations for DDB like Flyway and Liquibase do for SQL. CloudFormation can build tables for you, but it can’t handle data migrations that naturally occur in an active project with a changing schema. Luckily, you can implement the basics of Flyway pretty trivially.

A particular challenge not shown by the docs is a very common scenario: a class has a list of custom objects. Mapping this in the enhanced client (their ORM) is not at all obvious. You have to map it to a list of JSON document, possibly with a custom attribute converter. I think they didn’t bother making this easy because with DDB, this relationship is only useful when it’s a composition relationship, meaning the objects in the collection don’t exist outside their parent. Otherwise, they would be stored in a separate table with any relationship managed entirely in code since DDB doesn’t support joins.

In adding the DynamodDB dependencies, I found I had to update the reflection config due to a transitive dependency on Apache HTTP Client.

Kotlin

I’m learning Kotlin from scratch and it’s less intuitive than I thought, especially (ironically?) coming from Groovy. Running into issues where things don’t work and it’s not clear why. IntelliJ is a big help here, offering fixes that work and shine a light on where to look for help. I think a lot of the challenge comes from how strict the typing is. But there have also been a few times where I fired up the debugger to find out why something isn’t working… and then it just does. To be clear, it was 1) run test, which fails 2) debug run test (no changes) it now works. Like it was really an IntelliJ state problem, which I’d experienced in the past with Gradle. Refreshing the Gradle project might help, YMMV.

Kotlin has some useful features like data classes that automatically give you equals and hashcode methods, but you have to mind the constructors.