samkgg Lessons Learned

This is a regularly updated page where I document what I’ve learned while building samkgg, a demo app for AWS SAM, Kotlin, GraalVM, Gradle, and other technologies, as the project progresses.

Read the samkgg blog post first to understand what I’m doing. You can find the working code here:

https://github.com/madeupname/samkgg

Builds and Dependencies

It’s important to consider how you organize your code. There are a couple of options that give you flexibility. A critical point is that you don’t need to use Graal for everything! The main advantage of Graal is fast cold starts. If you are serving users interactively, use Graal. But some Lambda functions run asynchronously or in batches, where cold starts matter much less. It might save you developer time to build a standard JVM function and not worry about reflection. SAM supports this.

When you create your project with “sam init” as described in the README, it will create a single Lambda function in a single Gradle project. You can put multiple Lambda functions in that same directory and build/deploy them without error by configuring the path for each function in template.yaml. However, each function will have the same executable and hence the combined dependencies for all functions in your project. This is a little better with Graal, but not recommended.
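For illustration, a trimmed template.yaml with two functions might look something like this (logical IDs, paths, handlers, and runtimes below are made up; the details depend on whether each function is a native image or a plain JVM build):

    Resources:
      HelloWorldFunction:
        Type: AWS::Serverless::Function
        Properties:
          CodeUri: HelloWorldFunction
          Handler: helloworld.App::handleRequest
          Runtime: provided.al2
      ReportFunction:
        Type: AWS::Serverless::Function
        Properties:
          CodeUri: ReportFunction
          Handler: report.App::handleRequest
          Runtime: java11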

There is no way to make this a traditional Gradle multi-project build. Each function needs its own Gradle build:

https://github.com/aws/aws-sam-cli/issues/3227

In lieu of a multi-project build, you can do two things. First, follow the Gradle guidelines for structuring large projects. This is definitely not a large project, but we organize it like one. Second, add build logic via Makefiles. Yes, Makefiles:

https://makefiletutorial.com
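As a rough sketch of how SAM and make fit together, a function can declare a makefile build method in its Metadata block and supply a build-<LogicalId> target that copies the artifacts into $(ARTIFACTS_DIR), which SAM provides. The Gradle task and file paths here are invented, and remember that make recipe lines must be indented with tabs:

    # template.yaml (excerpt)
    #   HelloWorldFunction:
    #     Type: AWS::Serverless::Function
    #     Metadata:
    #       BuildMethod: makefile

    # Makefile in the function's CodeUri directory
    build-HelloWorldFunction:
    	./gradlew buildNativeImage
    	cp build/native/nativeCompile/bootstrap $(ARTIFACTS_DIR)/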

Other notes:

  • Despite the GraalVM term “native image,” the SAM package type is still zip. Hence, Lambda layers are allowed.
  • Similarly, the --use-container flag passed to “sam build” means it will build the functions inside a Docker container. Since it’s building a binary for a specific platform, the build runs on that target platform with the necessary dependencies.
  • The Kotlin stdlib JDK 8 dependency threw me, but that’s the latest version. It still works with JVM 17 targets.
  • AWS SDK v2 is recommended, most importantly because AWS claims it is GraalVM-safe. It was also reworked for faster Lambda cold starts. However, it is possible to include both v1 and v2 in the same project, which could be required given v2 still has not reached feature parity.
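If it helps, a minimal Gradle (Kotlin DSL) dependency block for SDK v2 looks something like this; the version is only an example, and the commented v1 line is just to show the two can coexist:

    dependencies {
        implementation(platform("software.amazon.awssdk:bom:2.20.26"))
        implementation("software.amazon.awssdk:dynamodb-enhanced")
        // A v1 artifact can sit alongside v2 if a feature hasn't reached parity yet.
        // implementation("com.amazonaws:aws-java-sdk-s3:1.12.400")
    }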

 

Windows

You want to run WSL 2 to support Docker and run Ubuntu locally (more below). This is possibly just my machine, but Docker Desktop makes the “WMI Provider Host” process suck up CPU, blasting the fans, and none of the solutions I’ve read fix it. YMMV.

Git Bash (MINGW64) seems to support the most commands well and doesn’t need Docker, so that’s my go-to shell. Of course, IntelliJ can run Gradle directly and that’s handy, too.

However, Ubuntu is required for running the tracing agent, which you need to ensure you have all your GVM resource and reflection config. And unfortunately that needs Docker Desktop running.

GraalVM

Note: I believe you can/should always use the latest stable version because you are creating an actual binary in a custom runtime. Even though you choose a GraalVM JDK version (11 or 17) when creating the app via the CLI, there’s no separate VM/runtime supplied by AWS. Native images are self-contained.

Minimum docs to read as of this writing. I’ve seen docs change significantly between versions.

https://www.graalvm.org/22.2/reference-manual/native-image/

https://www.graalvm.org/22.2/reference-manual/native-image/basics/

https://www.graalvm.org/22.2/reference-manual/native-image/metadata/

Skipping the docs will lead to a bad time. It may also lead to ignoring viable libraries just because they use reflection.

GraalVM can use code with reflection; it just needs to know about it beforehand. In particular, it needs to know all of the possibilities in advance. If your code dynamically instantiates any number of classes, Graal needs to know which classes those are so they are included in the executable.
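For example, a call like this (the class name is invented) only works in a native image if the class was registered in the reflection metadata at build time:

    // Fails at runtime in a native image unless com.example.PlainHandler
    // is listed in the reflection metadata (or registered programmatically).
    val handler = Class.forName("com.example.PlainHandler")
        .getDeclaredConstructor()
        .newInstance()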

Reachability Metadata and the Tracing Agent

One of the biggest challenges of working with Graal is configuring it to include what will be needed at runtime, including dynamic features and resources. native-image does its best to discover this at build time, but it can’t predict everything. Hence you must give it supplemental instructions via “reachability metadata,” commonly shortened to metadata. Key files are:

  • resource-config.json – this forces the build tool to include these files so that they may be loaded at run time when it’s not obvious to the tool at compile time
  • reflect-config.json – specify what will be called reflectively so the VM can prepare this code for execution in the binary
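As a minimal sketch, a reflect-config.json entry for the hypothetical class above might look like this:

    [
      {
        "name": "com.example.PlainHandler",
        "allDeclaredConstructors": true,
        "allDeclaredMethods": true,
        "allDeclaredFields": true
      }
    ]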

Some libraries supply or document this metadata, but most don’t. This is somewhat eased by using the Tracing Agent (see metadata docs above), which is a runtime Java agent that tracks every use of reflection or other dynamic behavior and stores everything in special config files; the two key ones are the resource-config.json and reflect-config.json listed above.

However, the agent can only detect these usages during the run. If the run skips any classes, methods, or even conditional branches that would have used reflection, the config files will not include them and your code can fail at runtime. A solution is to run the agent when you run your tests, assuming your tests have good coverage. You will want to add exclude filters for your test libraries.

When I ran it on this simple project, though, the output was enormous (41KB) because it lists everything individually. It’s like importing every single class from a package instead of using a wildcard. A small binary is a high priority for Graal. The good news is there is a flag to merge metadata from multiple runs into a single set of config files.
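As a rough sketch, the agent can be attached to the Gradle test task and pointed at the standard metadata directory. The path is just an example, and note that config-merge-dir expects the files to already exist, so use config-output-dir on a first run:

    tasks.test {
        jvmArgs(
            "-agentlib:native-image-agent=" +
                "config-merge-dir=src/main/resources/META-INF/native-image"
        )
    }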

Given all that, you can see why libraries and frameworks seeking to be GraalVM-friendly (like Micronaut and Quarkus) avoid reflection like the plague. Sadly, I was using Spock for tests (huge Groovy fan) and discovered the agent will include everything for Groovy and Spock, which is way too much to wade through. I then understood why plain old JUnit was chosen for the template.

Environments/Stacks

Environment is an overloaded term in SAM. You have:

  • SAM CLI environment
    • the “sam” command has a config file named samconfig.toml and this file divides configuration settings among environments
    • the default environment is literally called default; create others as needed
    • you specify the environment for a command with --config-env (see the samconfig.toml sketch after this list)
  • environment variables
    • there is a block for this in template.yaml
  • environments where the function runs
    • local or AWS
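For illustration, a samconfig.toml with a second environment might look like this (stack names and region are invented), and you would select it with something like “sam deploy --config-env prod”:

    version = 0.1

    [default.deploy.parameters]
    stack_name = "samkgg-dev"
    region = "us-east-1"
    capabilities = "CAPABILITY_IAM"

    [prod.deploy.parameters]
    stack_name = "samkgg-prod"
    region = "us-east-1"
    capabilities = "CAPABILITY_IAM"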

Finally, what a programmer might think of as a deployed environment (qa, production) CloudFormation (and SAM by extension) calls a stack.

Per SAM best practices, you create a stack per environment, such as development, staging, or production. So together with environments, your code can be:

  • deployed to AWS in different stacks
    • development
    • production
  • local
    • Docker container (sam local invoke)
      • development
      • production
    • test (gradle test, no container)

Each of these has subtle differences that are not always obvious/documented. I do my best to document the surprises here.

SAM CLI can be passed a JSON file and/or command line options that contain overrides for the environment variables in your template.yaml. Two critical points:

  • You have to declare environment variables in template.yaml or the Lambda environment won’t pass them to your function, even if they exist elsewhere.
  • One very misleading issue is that the Parameters block of the JSON file is for global environment variables, not parameters! I was not getting my global env vars overridden even though they were declared in template.yaml and specified for the function. For safety’s sake, I duplicate them in the JSON file.
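As a sketch, an override file passed with “sam local invoke --env-vars env.json” might look like this (the variable and function names are invented); the Parameters block holds the global environment variables and the function’s logical ID holds its own:

    {
      "Parameters": {
        "LOG_LEVEL": "DEBUG"
      },
      "HelloWorldFunction": {
        "TABLE_NAME": "samkgg-dev-table"
      }
    }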

Logging

I’m not the first engineer to find logging with Lambda and Graal surprisingly challenging. My initial choice for logging was AWS Lambda Powertools. It looks like a solid library to help you troubleshoot; however, it relies on Log4j 2, which is not GraalVM-friendly. According to that thread, the maintainers say it won’t be ready for Graal until Log4j 3 at the earliest.

Graal officially supports java.util.logging (AKA JUL), which is nice because it’s one less dependency (setup docs here). However, for reasons I don’t yet understand, log messages directly from JUL did not show up when run from Lambda. They worked fine when testing outside the container, which you’ll find is a common challenge with this stack.

The solution was adding SLF4J with the JDK 1.4 logging support library (not the “bridge” library – that sends all JUL logs to SLF4J). SLF4J should also enable logging from AWS Java SDK, and I have seen an instance of an AWS SDK log message in my console.
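A minimal sketch of the Gradle (Kotlin DSL) dependencies, assuming the standard SLF4J artifact names and a placeholder version:

    dependencies {
        implementation("org.slf4j:slf4j-api:2.0.7")
        // slf4j-jdk14 routes SLF4J calls to java.util.logging;
        // jul-to-slf4j (the bridge) would do the opposite.
        implementation("org.slf4j:slf4j-jdk14:2.0.7")
    }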

The next challenge was determining per-environment configuration given the differences:

  • development and prod should have different logging.properties files to set different log levels
  • code deployed to AWS uses CloudWatch

My first attempt was to configure the loggers at build time so Graal didn’t have to include the files in the native image. But Graal considers creating an input stream from a resource to be unsafe during initialization. Logging is still configured during class initialization, but at runtime rather than build time. I use AWS-provided environment variables to determine whether the code is in a container and whether that container is AWS-hosted.
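A minimal sketch of that idea, assuming you key off AWS_LAMBDA_FUNCTION_NAME and AWS_SAM_LOCAL (the resource names are invented and this is not the exact samkgg implementation):

    object LoggingConfig {
        fun init() {
            val inLambdaContainer = System.getenv("AWS_LAMBDA_FUNCTION_NAME") != null
            val isSamLocal = System.getenv("AWS_SAM_LOCAL") == "true"
            // Pick a logging.properties at runtime, not build time.
            val resource = when {
                inLambdaContainer && !isSamLocal -> "/logging-prod.properties"
                inLambdaContainer && isSamLocal  -> "/logging-dev.properties"
                else                             -> "/logging-test.properties"
            }
            LoggingConfig::class.java.getResourceAsStream(resource)?.use {
                java.util.logging.LogManager.getLogManager().readConfiguration(it)
            }
        }
    }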

The next issue is that Lambda sends everything written to the console to CloudWatch, which is good, but CloudWatch treats every newline as the start of a new log message. I created a CloudWatchFormatter that replaces newlines with carriage returns (which CloudWatch doesn’t mind) when the code is running on AWS. My next goal is a JSON formatter to allow better use of CloudWatch Insights.
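A minimal sketch of such a formatter (not the exact samkgg implementation):

    import java.util.logging.Formatter
    import java.util.logging.LogRecord

    // Replaces newlines with carriage returns so CloudWatch keeps a
    // multi-line message as a single log event.
    class CloudWatchFormatter : Formatter() {
        override fun format(record: LogRecord): String {
            val message = "${record.level} ${formatMessage(record)}"
            return message.replace('\n', '\r') + "\n"
        }
    }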

Another interesting concept is Mapped Diagnostic Context (MDC), which is like a thread-bound map. It is not supported by JUL, but SLF4J offers a basic adapter. You can put any key/value into the map and it will be visible to all methods that get the MDC adapter. I added the AWS Request ID from the context so it can be logged with messages from any source.
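As a sketch (the handler signature and key name are my own choices), putting the request ID into the MDC from a Lambda handler looks something like this:

    import com.amazonaws.services.lambda.runtime.Context
    import org.slf4j.MDC

    fun handleRequest(input: Map<String, Any>, context: Context) {
        // Visible to any code that reads the MDC adapter, e.g. for logging.
        MDC.put("AWSRequestId", context.awsRequestId)
        // ... handle the event ...
    }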

I owe much of this to Frank Afriat’s blog post and GitHub project. You may wish to use his implementation, which is more robust than mine, but is also marked alpha and relies on SLF4J’s simple logger, which is not as robust as JUL.

DynamoDB

I found the DynamoDB Guide an excellent supplement to the docs.

I don’t see any library that handles schema and data migrations for DDB like Flyway and Liquibase do for SQL. CloudFormation can build tables for you, but it can’t handle data migrations that naturally occur in an active project with a changing schema. Luckily, you can implement the basics of Flyway pretty trivially.
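A minimal sketch of that Flyway-style idea, assuming a “migrations” tracking table and the SDK v2 client (table, attribute, and class names are invented):

    import software.amazon.awssdk.services.dynamodb.DynamoDbClient
    import software.amazon.awssdk.services.dynamodb.model.AttributeValue

    // Run each migration once and record it so later deployments skip it.
    class Migrator(private val ddb: DynamoDbClient) {
        fun applyOnce(version: String, migration: () -> Unit) {
            val key = mapOf("version" to AttributeValue.fromS(version))
            val applied = ddb.getItem { it.tableName("migrations").key(key) }.hasItem()
            if (!applied) {
                migration()
                ddb.putItem { it.tableName("migrations").item(key) }
            }
        }
    }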

A particular challenge not shown by the docs is a very common scenario: a class has a list of custom objects. Mapping this in the enhanced client (their ORM) is not at all obvious. You have to map it to a list of JSON documents, possibly with a custom attribute converter. I think they didn’t bother making this easy because with DDB, this relationship is only useful when it’s a composition relationship, meaning the objects in the collection don’t exist outside their parent. Otherwise, they would be stored in a separate table with any relationship managed entirely in code, since DDB doesn’t support joins.
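As a sketch of a custom attribute converter (the Track class and attribute names are invented), storing the list as a DynamoDB list of maps might look like this:

    import software.amazon.awssdk.enhanced.dynamodb.AttributeConverter
    import software.amazon.awssdk.enhanced.dynamodb.AttributeValueType
    import software.amazon.awssdk.enhanced.dynamodb.EnhancedType
    import software.amazon.awssdk.services.dynamodb.model.AttributeValue

    data class Track(val title: String, val seconds: Int)

    // Converts List<Track> to/from a DynamoDB list of map (document) values.
    class TrackListConverter : AttributeConverter<List<Track>> {
        override fun transformFrom(input: List<Track>): AttributeValue =
            AttributeValue.fromL(input.map {
                AttributeValue.fromM(mapOf(
                    "title" to AttributeValue.fromS(it.title),
                    "seconds" to AttributeValue.fromN(it.seconds.toString())
                ))
            })

        override fun transformTo(input: AttributeValue): List<Track> =
            input.l().map {
                Track(it.m()["title"]!!.s(), it.m()["seconds"]!!.n().toInt())
            }

        override fun type(): EnhancedType<List<Track>> =
            EnhancedType.listOf(Track::class.java)

        override fun attributeValueType(): AttributeValueType = AttributeValueType.L
    }

The converter would then typically be attached to the parent class’s property with the @DynamoDbConvertedBy annotation.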

In adding the DynamoDB dependencies, I found I had to update the reflection config due to a transitive dependency on Apache HTTP Client.

Kotlin

I’m learning Kotlin from scratch and it’s less intuitive than I expected, especially (ironically?) coming from Groovy. I keep running into issues where things don’t work and it’s not clear why. IntelliJ is a big help here, offering fixes that work and shining a light on where to look for help. I think a lot of the challenge comes from how strict the typing is. But there have also been a few times where I fired up the debugger to find out why something wasn’t working… and then it just worked. To be clear: 1) run the test, it fails; 2) debug the same test with no changes, it passes. It seems it was really an IntelliJ state problem, which I’d experienced in the past with Gradle. Refreshing the Gradle project might help, YMMV.

Kotlin has some useful features like data classes that automatically give you equals and hashCode methods, but you have to mind the constructors.
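For example (the class is invented for illustration), only properties declared in the primary constructor participate in equals/hashCode:

    data class Product(val id: String, val name: String) {
        // Not in the primary constructor, so it is ignored by
        // equals(), hashCode(), toString(), and copy().
        var description: String = ""
    }

    fun main() {
        val a = Product("1", "Coffee").apply { description = "whole bean" }
        val b = Product("1", "Coffee").apply { description = "ground" }
        println(a == b) // true: description is not compared
    }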

 

samkgg: AWS SAM + Kotlin + GraalVM + Gradle

I am starting a new open source project called samkgg. It’s a learning in public project that I hope will help others who want to adopt these technologies. You can find it here:

https://github.com/madeupname/samkgg

I thought I’d explain a bit about why and how I chose these technologies. The main driver is an Alexa skill that I’m building. This is not for fun, but what I hope will be a solid commercial product. And even if you’re not building an Alexa skill, you’ll see how to build a production serverless app.

TL;DR

  • Lambda because the Alexa docs mostly assume it
  • GraalVM to eliminate Lambda cold start delays
  • SAM is way more mature than Amplify
  • Kotlin because I can’t use Groovy
  • DynamoDB because it fits in with the serverless model

I am a fan of the philosophy “two ways to win, no way to lose.” If my product doesn’t take off, I still win by adding the serverless architecture to my tool chest. I also find voice UIs/ambient computing to be very interesting.

Why Lambda/serverless?

If you look into how to build Alexa skills, there are various ways you can do it. Technically, I could just write my own REST API in any framework. But almost all the documentation assumes you are building on Lambda. I prefer modular monoliths for most projects, especially to start. But I’m less excited about choosing a poorly documented route. Just using Java is hard enough.

Amazon does have a hosted development platform for Alexa, but it only supports JavaScript and Python. My background is JVM technologies and I have no desire to throw away that experience and switch ecosystems. I think most developers are far too cavalier about that, and I also believe in the power of “skill stacks” – building on your experience.

Luckily, Java on Lambda is no problem. I was greatly helped by the book Programming AWS Lambda, which is Java-specific.

Well, when I say Java is “no problem,” I mean “possible.” In reality, it has an infamously slow startup/load time. This is why most Lambda functions – and microservices in general – are written in interpreted languages like JavaScript and Python. Java is faster if the process stays up long enough, but with serverless you’re frequently starting a new server, so users can experience very slow responses. This can be especially painful for a voice UI – there is no spinner.

Why GraalVM?

The solution to slow cold starts is GraalVM. It allows you to compile Java (and other languages) into native binaries that start up super fast. If you haven’t investigated it, you should know:

  • some years ago, a very high level Oracle exec told me they considered it the future of Java
  • first production release was over 3 years ago
  • it’s used in production today by big companies like Twitter
  • it recently got first-class support from Lambda (JDK 11 and 17)

Why SAM?

My next decision was what technology to implement the Lambdas in. I first looked at Amplify, since AWS was touting it as a low-code solution, practically no-code with the Figma builder tool. But after spending a lot of time with it, I came away deeply disappointed. Politely, this is the common conclusion. I think only JS developers will be comfortable with it, and I didn’t love that GraphQL was the main technology.

But the biggest problem was that there is no easy path to learn it. The docs frequently lagged the tooling, even the “Getting Started” material. Tutorials should be flawless, especially for a commercial tool. Due to its pricing, I think it has tremendous potential for startups, but it needs a docs-first approach. I’ll look again in a couple of years.

I learned of SAM from a Hacker News discussion of Amplify where it was repeatedly recommended over it. After looking into it, it seemed the right choice for me. Like Amplify, it’s a productivity layer on core AWS technologies (mainly CloudFormation).  Both hit the Law of Leaky Abstractions, but SAM has a smaller surface area. That said, with SAM you will still need to understand:

  • AWS fundamentals
  • CloudFormation
  • API Gateway
  • Lambda (duh)

and more, at a deep level. And the stack I’m building also includes:

  • DynamoDB
  • Cognito
  • CodeDeploy
  • Kotlin
  • Gradle
  • GraalVM

and possibly CodeBuild and Cloud Developer Kit (CDK). In short, about the same as you needed for Spring/Hibernate or Groovy/Grails (plus your DB, CI/CD, etc.).

As Frederick Brooks said, there is no silver bullet.

Why DynamoDB?

I’ll add I was on the fence a bit with DynamoDB, having not done NoSQL before. But what I really like about SQL is Hibernate/JPA (and really, GORM). And Hibernate does not work in GraalVM. OK, there is a Quarkus plugin for it, and I found a library to make it work, but I did not find a definitive statement: “Hibernate is guaranteed to work in production on GraalVM.” I more found, “This is all stuff we had to do to get it to work.” I love Hibernate, but I need a guarantee.

In fact, I couldn’t find any Java ORM that was officially compatible with GraalVM native images. This makes sense because GraalVM restricts reflection (it has to be configured ahead of time), which is an obvious feature to use in an ORM.

DynamoDB has a mapper, which is a very lightweight ORM, but at least it works with GraalVM. Well, other than schema creation, but you probably want to do that programmatically to handle migrations anyway. Unfortunately, there is no migration library like Flyway for DynamoDB.

For a small schema, it feels more natural to work directly with an object graph, which is what you get by persisting to JSON. But for a sizeable app I’d consider Aurora Serverless PostgreSQL and Flyway. Going back to mapping result sets does not sound fun, though.

Why not Micronaut? Or Quarkus? Or…

It’s natural to consider one of the serverless/microservice frameworks. I know excellent people on both the Micronaut and Quarkus teams. But they seem an even further abstraction from AWS. If I were building generic microservices, not an Alexa skill, I’d consider them. I would definitely look if I were trying to be cloud agnostic, but that’s silly for an Alexa skill.

So if this sounds interesting, please check it out:

https://github.com/madeupname/samkgg

Free Streaming and Downloads of Movies, TV, and Books

No, I haven’t been hacked. I found two nice services to download stuff for free, both powered by your library card. OK, technically you are “borrowing” stuff.

Hoopla lets you borrow movies, TV shows, audiobooks, ebooks, etc., all for free. You get 15 borrows per month. If you’ve resorted to watching the dregs of Netflix, you need to check this out.

https://www.hoopladigital.com

Kanopy is focused on movies. It gives you 9 borrows per month from their catalog. When you arrive on the site, you’ll notice that all the recommendations appear to come from undergrads plugging their friends’ films. However, I see a lot of gems. 50 films from the Criterion Collection alone, plus modern greats like Ex Machina and Lady Bird. Tons of world cinema. Like Hoopla, all free.

https://kanopy.com

Shipt: An In-Depth Review

UPDATE 10/5/2020:

TL;DR – 3rd party grocery delivery services cost 50% more than delivery directly from the supermarket.

I ordered directly from the supermarket (Vons, $100 total) and compared it to the same order from Shipt. The store had:

  • Same number of delivery slots.
  • Much bigger selection.
  • Fewer items sold out, even though the order was double my usual.
    • I assume supermarket employees filled the order from the stock room, not the shelves.
    • Store doesn’t sell you items it knows are sold out or that it doesn’t carry at that location.
    • This means way fewer substitutions, and the couple I got were good.
  • Fewer items missing/forgotten.
    • Personal shoppers are rushing to maximize hourly rate and it shows.
  • Big savings. Even with a $10 delivery fee, I saved over $50 vs Shipt and got rewards points on my card. Food markup plus a 15% tip (not required, but expected) adds 50% to your total price.

 

In most cases, it doesn’t make sense to use Shipt/Instacart even with the flat yearly fee.

ORIGINAL POST:

I recently solicited opinions on the shopping service Shipt but didn’t get much in response, so I thought I’d try them out. I was running out of a few essentials (coffee is an essential) and since I don’t have a trustworthy face mask, I figured I’d try Shipt. It didn’t hurt that I walk to the grocery store and it was raining.

Basic facts:

  • $99/year or $14/month
  • adds about 15% to most grocery store prices
  • a minimum order size of $35 (or a surcharge)
  • multiple addresses allowed
    • nationwide
    • save Mom a trip
  • you shop through their app or web site (except Target)
  • doesn’t use your rewards card/account (except Petco and Target)
    • no points, gas discount, etc.
    • no digital coupons
    • no mix and match, “$5 for 5,” etc.
  • Target is special
    • integrated into site, including search and checkout
    • charges you a $6 delivery fee on top of Shipt, so considerably more than the 5% surcharge until you hit $120
    • does credit you for the purchase (rewards work)
  • Many big chains supported. Enter ZIP code to learn which, since someplace you consider close, Shipt considers too far. List includes:
    • Costco
    • CVS
    • HEB
    • Kroger
    • Meijer
    • Office Depot/OfficeMax
    • Petco
    • Publix
    • Ralphs
    • Sur la Table
    • Target
    • Vons/Pavilions
    • Winn Dixie

Here’s the result of my experience.

Pros:

  • I am still safe and healthy and now have coffee.
  • Cheaper than store delivery, even if you just use it monthly.
  • Saved ~45 minutes on a one-bag trip.
  • If it’s not available, shopper will send you a photo of a substitution for your approval.
  • Items dropped right at my door.
  • Friendly shopper.
  • I have not heard of Shipt shoppers walking off the job in protest.

Cons:

  • Time slots are currently all taken.
    • You can only choose for today or tomorrow.
    • No booking a week out.
    • No notification when a slot opens up.
    • Yes, you literally have to keep hitting refresh to see if a slot was added/freed.
  • Got a 7-8PM delivery window for the next day, shopping didn’t start until 8:30PM.
    • I opted for possible early delivery, which gave me an 8AM-8PM window. Consider that a pro, even though it didn’t happen.
  • Many items are missing from their website, even though you know the store carries them. Some stores have a lot more missing than others.
    • I now see that Ralphs seems to have integrated their inventory system, but not Vons/Pavilions. I made the mistake of choosing Pavilions. Test things out yourself.
  • If it’s missing, you can make a special request. However, when I got my bill they charged me list price plus an extra dollar, when I know it was on sale. So an extra $3.50 for two items.
    • Adding insult to injury, one of the two was the wrong variety.
  • Substitutions were sometimes way off.
    • She could think of no substitution for sesame seed sandwich rolls (really?), so they were just not included.
    • Suggested buying a bottle of olive oil 3x larger. Luckily I corrected.
    • I approved an 8 oz bag of mozzarella of a different brand. Then she bought a huge 32 oz. bag of a different type. Why not the one she suggested?
    • Conspiracy theory: they are trying to make me fat(ter).
  • Online help/FAQ practically nonexistent.
  • Chat agents take literally 3 hours to respond.
    • They admit they are taking a week to respond to support emails.

Let’s be clear: in the scheme of things, it’s not a big deal. If you sent a family member out to get your groceries, the result would probably be just as bad. Especially if you sent them out at the end of a long day in the middle of the apocalypse. I gave her a good rating and a big tip. But yeah, when this madness is over I expect them to do a lot better.

Would I do this again? Over doing it myself? In the middle of a pandemic? Of course! I will just make sure I start earlier and get a morning slot.

If you think this is worth it for you, you’ll save $10 (and I’ll get $10 credit) if you use my sign up link:

http://share.shipt.com/RGgVs

Stay safe!

What is the Mautic Plugin Published setting?

If you’ve installed Mautic and head over to Plugins, you’ll notice there is a setting called “Published” with choices Yes or No. Inexplicably, there is absolutely no documentation on this.

Published means enabled. Apparently, they thought it was obvious since the tab is called Enabled/Auth. But given that some of these plugins might actually publish something, it’s a poorly chosen name; a new user might not be clear on what it does.