LangFlow on AWS EC2

LangFlow is a React front end for LangChain that makes it easier to visualize what you are doing. My team is currently using it for a hackathon, so we wanted a hosted version we could all play with.

I don’t know of any security holes in LangFlow, but it’s always safest to assume there are, so act accordingly.

I’ll add that I first tried to get this running on Replit and completely gave up during the install process; I assume it’s an issue with NixOS. It specifically broke trying to install llama-cpp-python. There is a previous version deployed to Replit, but even forking that and trying to upgrade did not work.

So I spun up an EC2 t3.small to install LangFlow, but ran into problems. The first is that the AWS defaults do not offer enough disk space. Worse, no swap is configured, and for some reason pip moves files through memory, so installing a larger package like torch will kill your process due to OOM, taking your SSH connection (and shell history, etc.) with it. It’s not super obvious that this is the cause. My first time, I had to restart the whole instance.

Easy mode would be to just run a bigger machine; a t3.medium with 4 GB RAM would probably do it. But I kept at it, recreating the instance with more disk space (16 GB) and a second volume for swap (2 GB).

Referring to:
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-store-swap-volumes.html
https://stackoverflow.com/questions/74515846/error-could-not-install-packages-due-to-an-oserror-errno-28-no-space-left-on

$ swapon -s

Probably shows nothing, but some instance types are automatically configured with a swap partition or file.

$ lsblk -p
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
/dev/nvme0n1 259:0 0 16G 0 disk
/dev/nvme0n1p1 259:2 0 16G 0 part /
/dev/nvme0n1p127 259:3 0 1M 0 part
/dev/nvme0n1p128 259:4 0 10M 0 part
/dev/nvme1n1 259:1 0 2G 0 disk

That last one at 2G is the volume we provisioned for swap. To get the real name, you have to go to the Storage tab of your EC2 dashboard, or try:

$ sudo /sbin/ebsnvme-id /dev/nvme1n1
Volume ID: vol-04d10a66ad2521d93
sdb

The volume ID should match what the EC2 dashboard showed when you created the volume, and the device name (sdb) is what we’ll use below. Now create and verify the swap:

$ sudo mkswap /dev/sdb
Setting up swapspace version 1, size = 2 GiB (2147479552 bytes)
no label, UUID=2aff5f50-ff55-431e-a375-d2ea9edde8d9
$ sudo swapon /dev/sdb
$ free -h
total used free shared buff/cache available
Mem: 1.9Gi 189Mi 1.4Gi 0.0Ki 258Mi 1.5Gi
Swap: 2.0Gi 0B 2.0Gi
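
Note that swapon does not persist across reboots. Per the AWS doc above, you can add a line like this to /etc/fstab to make it permanent:

/dev/sdb swap swap defaults 0 0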

OK, now for the real show. Python 3 is already installed; check with:

python3 --version

We can add an alias:

alias python='python3'

pip is not installed. I followed these instructions: https://pip.pypa.io/en/stable/installation/

But don’t use ensurepip because it won’t install a pip command – you’ll have to run

python -m pip …

every time. Yes, it can be aliased, but this solves it:

curl -O https://bootstrap.pypa.io/get-pip.py
python get-pip.py

Check with:

pip --version

Several packages are required, shown here in the order in which the installation broke without them:

$ sudo yum install cmake
$ sudo yum install gcc
$ sudo yum install gcc-c++
$ sudo yum install python3-devel

 

$ pip install langflow

LangFlow is now installed and you can run it with the langflow command. By default it binds to 127.0.0.1, preventing outside access. That’s actually fine, because we want to lock it down: if you exposed it directly, nothing would be encrypted, including your API keys.

We have a couple choices here. The easy way is to set up an Application Load Balancer in AWS and let it handle certs, etc. I’m pretty sure you can point it at the instance so that if the instance’s IP changes, the target updates automatically. You get a load balancer in the free tier, too.

But I’m saving that for another project, so let’s go the cheapskate route and make it hard on ourselves. We’ll set up Nginx as a reverse proxy with Let’s Encrypt for certs, but you could use Apache HTTPD or HAProxy if you prefer.

sudo yum install nginx

Now that it’s installed, you need to add a server block to the config file for Certbot to update for you. I took the extra step of pointing a (sub)domain I own at the public IP of my EC2 box, which is the server name below. OK, full disclosure, madeupname.com is a stand-in. But you get the picture.

$ sudo nano /etc/nginx/nginx.conf

Add:

    server {
        server_name  langflow.madeupname.com;
        #root         /usr/share/nginx/html;

        # Load configuration files for the default server block.
        include /etc/nginx/default.d/*.conf;

        location / {
                proxy_pass http://127.0.0.1:7860;
                proxy_http_version 1.1;
                proxy_set_header Upgrade $http_upgrade;
                proxy_set_header Connection 'upgrade';
                proxy_set_header Host $host:$server_port;
                proxy_cache_bypass $http_upgrade;
        }
    }
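
Verify the config parses and (re)start nginx, enabling it at boot:

$ sudo nginx -t
$ sudo systemctl enable --now nginx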

Next step is getting a cert from Let’s Encrypt. It recommends installing Certbot via snapd. Snapd could be installed via yum/dnf if you had EPEL support on AL2023, but they removed EPEL support because they don’t like you.[1] So you have to install Certbot via pip, which thankfully you just installed.

Instructions: https://certbot.eff.org/instructions?ws=nginx&os=pip

sudo dnf install augeas-libs
yum search certbot # verify you don’t have this installed already
sudo python3 -m venv /opt/certbot/
sudo /opt/certbot/bin/pip install --upgrade pip
sudo /opt/certbot/bin/pip install certbot certbot-nginx
sudo ln -s /opt/certbot/bin/certbot /usr/bin/certbot
sudo certbot --nginx

That last step automatically updates nginx.conf for you and reloads, which is pretty slick.
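
The same Certbot instructions include a cron entry for automatic renewal, which is worth setting up:

echo "0 0,12 * * * root /opt/certbot/bin/python -c 'import random; import time; time.sleep(random.random() * 3600)' && sudo certbot renew -q" | sudo tee -a /etc/crontab > /dev/null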

To be extra safe, I created a langflow user to run this instead of ec2-user, since it won’t even have sudo access.

$ sudo adduser langflow
$ sudo cp -pr .local/ logs/ .config/ tmp/ .cache/ .chroma/ /home/langflow
$ sudo chown -R langflow:langflow /home/langflow
$ sudo su - langflow
$ langflow
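
Run this way, LangFlow dies when you log out. A minimal systemd unit would keep it running; this is a sketch, and the ExecStart path assumes pip put the binary in ~/.local/bin (check with: which langflow):

    [Unit]
    Description=LangFlow
    After=network.target

    [Service]
    User=langflow
    ExecStart=/home/langflow/.local/bin/langflow
    Restart=on-failure

    [Install]
    WantedBy=multi-user.target

Save it as /etc/systemd/system/langflow.service, then run: sudo systemctl enable --now langflow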

Finally, go to the EC2 dashboard, Security tab, and add a new inbound rule – or change the existing one for HTTPS – to limit it to “My IP” so it’s only accessible from your machine. At this point, you should be able to go to your (sub)domain and LangFlow should come up.

Good luck!

  1. Sorry, it’s possible they don’t like anybody.

Notes on AI / GPT / LLM

Thought I’d share my notes on GPT/AI in advance of tonight’s online discussion of GitHub Copilot. I’ve been doing a deep dive on AI as it relates to my particular consulting practice area, programmer productivity, starting with an 80-page article by Stephen Wolfram on how ChatGPT works. It was surprisingly accessible; here are a few things I learned:

  • We don’t really understand how GPT does what it does. By “we”, he means humanity. And that’s because we don’t fully understand how neural nets work. They seem to work like human brains, but as everyone knows, science knows little about how brains work.
  • Training neural nets is more art (or craft) than science. Lots of educated guesses and empirical observation. Wolfram uses the phrase “neural net lore” many times.
  • After a certain point, training with more examples degrades performance.
  • Training on text does not require tagging like it does with many image AI projects.
  • Bigger problems can, ironically, be easier for them to solve than smaller ones, like generating a paragraph vs. a sentence.
  • We’re blown away by ChatGPT’s (and Midjourney’s) performance, so we conclude computers are vastly more powerful than we thought. But the reality is that these are easier problems than we thought.
  • ChatGPT is feed-forward, so it’s only determining the next tokens (word fragments) to produce. The only thing resembling a loop is that when you respond to what it says, it rereads the entire conversation, including what it wrote, to determine what tokens come next (its response).
  • GPT-3 has 175B connections/weights in its neural net, so it has to do 175B calculations per token produced, and a token may not even be a whole word. This is why longer responses have such lag.

 
Like many, I was impressed with ChatGPT and Copilot, but not blown away. However, my brilliant friend Jim White was, and he’s far more qualified to judge, having studied this for decades. Some things he shared:

  • Neural nets can be trained to use tools. Researchers have got it to use a calculator, Wikipedia, Q&A site, search engines, a translator, and a calendar. And ChatGPT has added plugins for this, including Wolfram Alpha. This should cut down on “hallucinations”.
  • Prompt writing is a skill that must be learned. If you’re not impressed, you probably have spent very little time learning it. And ironically, you can ask GPT for help with this.
  • ChatGPT already beats humans in several things, like annotating text. This means for more specialized AI that does require tagging, you can get the AIs to collaborate and train each other. We’ve already seen this (or accusations of AIs “stealing” knowledge from other models).

 
This is important because OpenAI CEO Sam Altman has already said GPT-5 is a long way off. He doesn’t see bigger models bringing better performance, and I suppose GPT-4 costing more than $100M to train doesn’t put them in a mood to spend more. The advances will have to come from other ideas.

Regarding coding tools like Copilot and Amazon CodeWhisperer, I generally hear good things. However, they don’t take any feedback from the IDE, so they generate broken code, and I’m not sure this can be fixed. For one, given the feed-forward process, fixing it could be computationally expensive, with many iterations (“do it again, but don’t use class DoesNotExist, these methods…”). GPT has been poor at this so far; it kind of loses track of the conversation. The IDE could provide more context to start with, as it has the full dependencies. However, when you’re adding new code, you’re going to need to add dependencies to your build file, more imports, etc. Same as when you cut and paste from Stack Overflow.

Better (or more) training/weights could help. If you look at what the community has accomplished with Stable Diffusion by trading models, it’s quite impressive. I had thought Midjourney was far ahead, but a recent discussion on HackerNews suggested that was an antiquated view – from two months ago. The community’s new tools, models, etc. allow much better control.

My hope is we’ll get similar out of the open source LLaMA. Or maybe CodeWhisperer, which already has extra training on AWS APIs, will allow you to create a model weighted on your exact stack. It’s a hard problem, and maybe there will be better ways.

In the meantime, comments from legendary programmers like Antirez suggest it’s already quite good for whipping up quick and dirty code in languages you aren’t familiar with. Think build files, bash scripts, Jenkinsfiles, etc. This could up the game of full stack programmers, as just about everyone I talk to has a strong preference for (and experience with) one side of the stack.

I suspect this might also tilt the scales toward stacks that LLMs “understand” better. Obviously, that will be the current kings like Java, JavaScript, and Python. Since LLMs have a problem with recognizing conflicting API versions, more stable languages and APIs will “win” the AI wars.

What I am most curious about is if we can create new languages and frameworks that are designed to be correctly predicted. We’ve already heard talk that AI is going to kill low code platforms. But maybe low code platforms evolve to be even more productive with AI. Surely they are scrambling to figure that out.

I believe two companies are so well vertically integrated that they have the most potential: Microsoft and Amazon. Microsoft is the clear frontrunner. They are already investors in OpenAI and have GitHub Copilot. But they also own the entire .NET ecosystem and Azure. I can imagine an AI-powered, cloud-native platform based on TypeScript everywhere.

Amazon will want to compete and have already started with CodeWhisperer and Cloud9. However, the overall AWS ecosystem has a steep learning curve. Their attempts to do low code with Amplify have not been well received by the larger community. But I think they started with the goal of making software development easier for AWS admins and devs building highly scalable systems. Google had similar struggles with AppEngine. The majority needs a low learning curve and apps for maybe hundreds of thousands, not billions.

It’s early enough to be anyone’s game. Maybe Meta, shifting from the metaverse to AI, will win it all with a PHP stack?

samkgg Lessons Learned

This is a regularly updated page where I document what I’ve learned while building samkgg, a demo app for AWS SAM, Kotlin, GraalVM, Gradle, and other technologies as the project progresses.

Read the samkgg blog post first to understand what I’m doing. You can find the working code here:

https://github.com/madeupname/samkgg

Builds and Dependencies

It’s important to consider how you organize your code. There are a couple options that give you flexibility. A critical point is that you don’t need to use Graal for everything! The main advantage of Graal is fast cold startups. If you are serving users interactively, use Graal. But some Lambda functions run asynchronously or in batch and it doesn’t matter much then. It might save you developer time to build a standard JVM function and not worry about reflection. SAM supports this.

When you create your project with “sam init” as described in the README, it will create a single Lambda function in a single Gradle project. You can put multiple Lambda functions in that same directory and build/deploy them without error by configuring the path for each function in template.yaml, as sketched below. However, each function will have the same executable and hence the combined dependencies for all functions in your project. This is a little better with Graal, but still not recommended.
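
A rough sketch of per-function path configuration in template.yaml (resource names and handlers are hypothetical):

    Resources:
      HelloFunction:
        Type: AWS::Serverless::Function
        Properties:
          CodeUri: HelloFunction
          Handler: hello.App::handleRequest
      GoodbyeFunction:
        Type: AWS::Serverless::Function
        Properties:
          CodeUri: GoodbyeFunction
          Handler: goodbye.App::handleRequest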

There is no way to make this a traditional Gradle multi-project build. Each function needs its own Gradle build:

https://github.com/aws/aws-sam-cli/issues/3227

In lieu of a multi-project build, you can do two things. First, follow Gradle guidelines for large projects. This is definitely not a large project, but we organize it like one. Second, add more build logic via Makefiles. Yes, Makefiles:

https://makefiletutorial.com

Other notes:

  • Despite the GraalVM term “native image,” the SAM package type is still zip. Hence, Lambda layers are allowed.
  • Similarly, the --use-container flag passed to “sam build” means it is going to build the functions inside a Docker container. Since it’s building a binary for a specific platform, the build runs in that target platform with the necessary dependencies.
  • The Kotlin stdlib JDK 8 dependency threw me, but that’s the latest version. It still works with JVM 17 targets.
  • AWS SDK v2 is recommended, most importantly because AWS claims it is GraalVM-safe. It was also reworked for faster Lambda cold starts. However, it is possible to include both v1 and v2 in the same project, which could be required given that v2 still has not reached feature parity.

 

Windows

You want to run WSL 2 to support Docker and run Ubuntu locally (more below). This is possibly just my machine, but Docker Desktop makes the “WMI Provider Host” process suck up CPU, blasting the fans, and none of the solutions I’ve read fix it. YMMV.

Git Bash (MINGW64) seems to support the most commands well and doesn’t need Docker, so that’s my go-to shell. Of course, IntelliJ can run Gradle directly and that’s handy, too.

However, Ubuntu is required for running the tracing agent, which is required to ensure you have all your GVM resource and reflection config. And unfortunately that needs Docker Desktop running.

GraalVM

Note: I believe you can/should always use the latest stable version because you are creating an actual binary in a custom runtime. Even though you choose a GraalVM JDK version (11 or 17) when creating the app via the CLI, there’s no separate VM/runtime supplied by AWS. Native images are self-contained.

Minimum docs to read as of this writing. I’ve seen docs change significantly between versions.

https://www.graalvm.org/22.2/reference-manual/native-image/

https://www.graalvm.org/22.2/reference-manual/native-image/basics/

https://www.graalvm.org/22.2/reference-manual/native-image/metadata/

Skipping the docs will lead to a bad time. It may also lead to ignoring viable libraries just because they use reflection.

GraalVM can use code with reflection; it just needs to know about it beforehand. In particular, it needs to know all of the possibilities in advance. If your code dynamically instantiates any of a number of classes, Graal needs to know which classes those are so they are included in the executable.

Reachability Metadata and the Tracing Agent

One of the biggest challenges of working with Graal is configuring it to include what will be needed at runtime, including dynamic features and resources. native-image does its best to discover this at build time, but it can’t predict everything. Hence you must give it supplemental instructions via “reachability metadata,” commonly shortened to metadata. Key files are:

  • resource-config.json – this forces the build tool to include these files so that they may be loaded at run time when it’s not obvious to the tool at compile time
  • reflect-config.json – specify what will be called reflectively so the VM can prepare this code for execution in the binary
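
For a sense of the format, a reflect-config.json entry looks roughly like this (the class name is hypothetical); it tells native-image to keep that class’s constructors, methods, and fields available for reflection:

    [
      {
        "name": "com.example.Widget",
        "allDeclaredConstructors": true,
        "allDeclaredMethods": true,
        "allDeclaredFields": true
      }
    ]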

Some libraries supply or document this metadata, but most haven’t. This is somewhat eased by using the Tracing Agent (see metadata docs above), which is a runtime Java agent that tracks every time reflection or dynamic behavior is used and stores everything in special config files, the key ones being the two listed above.

However, the agent can only detect these usages during the run. If your run with the agent skips any classes, methods, or even conditional branches that would have used reflection, the config files will not get updated and your code can fail at runtime. A solution is to run the agent when you run your tests, assuming your tests have good coverage. You’ll want to add exclude filters for your test libraries.

When I ran it on this simple project, though, the output was enormous (41KB) because it lists everything individually. It’s like importing every single class from a package instead of using a wildcard. A small binary is a high priority for Graal. The good news is there is a flag to merge metadata from multiple runs into a single set of files, shown below.
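
A run with the agent looks something like this (the jar path is illustrative); config-merge-dir is the merge option just mentioned, while config-output-dir would overwrite instead, and metadata under META-INF/native-image is picked up automatically at build time:

    java -agentlib:native-image-agent=config-merge-dir=src/main/resources/META-INF/native-image/ -jar build/libs/app-all.jar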

Given all that, you can see why libraries and frameworks seeking to be GraalVM-friendly (like Micronaut and Quarkus) avoid reflection like the plague. Sadly, I was using Spock for tests (huge Groovy fan) and discovered the agent will include everything for Groovy and Spock, which is way too much to wade through. I then understood why plain old JUnit was chosen for the template.

Environments/Stacks

Environment is an overloaded term in SAM. You have:

  • SAM CLI environment
    • the “sam” command has a config file named samconfig.toml and this file divides configuration settings among environments
    • the default environment is literally called default; create others as needed
    • you specify the environment for the command with --config-env (see the sketch after this list)
  • environment variables
    • there is a block for this in template.yaml
  • environments where the function runs
    • local or AWS

Finally, what a programmer might think of as a deployed environment (qa, production) CloudFormation (and SAM by extension) calls a stack.

Per SAM best practices, you create a stack per environment, such as development, staging, or production. So together with environments, your code can be:

  • deployed to AWS in different stacks
    • development
    • production
  • local
    • Docker container (sam local invoke)
      • development
      • production
    • test (gradle test, no container)

Each of these has subtle differences that are not always obvious/documented. I do my best to document the surprises here.

SAM CLI can be passed a JSON file and/or command line options that contain overrides for the Environment variables in your template.yaml. Two critical points:

  • You have to create environment variables in template.yaml or the Lambda environment won’t pass them to your function, even if they exist.
  • One very misleading issue is that the Parameters block of the JSON file is for global environment variables, not parameters! I was not getting my global env vars overridden even though they were declared in template.yaml and specified for the function. For safety’s sake, I duplicate them in the JSON file, as in the sketch after this list.
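
A sketch of such a JSON file, assuming a function with logical ID MyFunction and hypothetical variable names; pass it with: sam local invoke --env-vars env.json

    {
      "Parameters": {
        "TABLE_NAME": "samkgg-dev-table"
      },
      "MyFunction": {
        "TABLE_NAME": "samkgg-dev-table",
        "LOG_LEVEL": "FINE"
      }
    }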

Logging

I’m not the first engineer to find logging with Lambda and Graal surprisingly challenging. My initial choice for logging was AWS Lambda Powertools. It looks like a solid library to help you troubleshoot; however, it relies on Log4j 2, which is not GraalVM-friendly. According to that thread, the maintainers say it won’t be ready for Graal until Log4j 3 at the earliest.

Graal officially supports java.util.logging (AKA JUL), which is nice because it’s one less dependency (setup docs here). However, for reasons I don’t yet understand, log messages directly from JUL did not show up when run from Lambda. They worked fine when testing outside the container, which you’ll find is a common challenge with this stack.

The solution was adding SLF4J with the JDK 1.4 logging support library (not the “bridge” library – that sends all JUL logs to SLF4J). SLF4J should also enable logging from AWS Java SDK, and I have seen an instance of an AWS SDK log message in my console.

The next challenge was determining per-environment configuration given the differences:

  • development and prod should have different logging.properties files to set different log levels
  • code deployed to AWS uses CloudWatch

My first attempt was to configure the loggers at build time so Graal didn’t have to include the files in the native image. But Graal considers creating an input stream from a resource to be unsafe during initialization, so logging is still configured during class initialization, but at runtime rather than build time. I use AWS-provided environment variables to determine if it’s in a container and if that container is AWS-hosted.
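
A sketch of the runtime approach, assuming dev and prod logging.properties files bundled as resources (the file names are my own invention; AWS_LAMBDA_FUNCTION_NAME is set by the Lambda runtime):

    import java.util.logging.LogManager

    object LogConfig {
        fun configure() {
            // Pick a config based on where we're running.
            val onAws = System.getenv("AWS_LAMBDA_FUNCTION_NAME") != null
            val resource = if (onAws) "/logging-prod.properties" else "/logging-dev.properties"
            LogConfig::class.java.getResourceAsStream(resource)?.use {
                LogManager.getLogManager().readConfiguration(it)
            }
        }
    }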

The next issue is that Lambda logs everything in the console to CloudWatch, which is good, but CloudWatch sees every newline as the start of a new log message. I created a CloudWatchFormatter that replaces newlines with carriage returns (which CW doesn’t mind) if the code is running on AWS. My next goal is creating a JSON formatter to allow better use in CloudWatch Insights.
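
The core idea, minus the environment check, is just a JUL Formatter along these lines (sketch only):

    import java.util.logging.Formatter
    import java.util.logging.LogRecord

    class CloudWatchFormatter : Formatter() {
        // CloudWatch treats \n as a new log event; \r keeps multi-line
        // messages together in a single event.
        override fun format(record: LogRecord): String =
            "${record.level}: ${formatMessage(record).replace('\n', '\r')}\n"
    }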

Another interesting concept is Mapped Diagnostic Context (MDC), which is like a thread-bound map. It is not supported by JUL, but SLF4J offers a basic adapter. You can put any key/value into the map and it will be visible to all methods that get the MDC adapter. I added the AWS Request ID from the context so it can be logged with messages from any source.
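
Usage is a one-liner at the top of the handler; a sketch (handler types simplified):

    import com.amazonaws.services.lambda.runtime.Context
    import com.amazonaws.services.lambda.runtime.RequestHandler
    import org.slf4j.MDC

    class Handler : RequestHandler<Map<String, Any>, String> {
        override fun handleRequest(input: Map<String, Any>, context: Context): String {
            // Anything logged on this thread can now include the request ID.
            MDC.put("AWSRequestId", context.awsRequestId)
            return "ok"
        }
    }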

I owe much of this to Frank Afriat’s blog post and GitHub project. You may wish to use his implementation, which is more robust than mine, but is also marked alpha and relies on SLF4J’s simple logger, which is not as robust as JUL.

DynamoDB

I found the DynamoDB Guide an excellent supplement to the docs.

I don’t see any library that handles schema and data migrations for DDB like Flyway and Liquibase do for SQL. CloudFormation can build tables for you, but it can’t handle the data migrations that naturally occur in an active project with a changing schema. Luckily, you can implement the basics of Flyway pretty trivially; a sketch follows.
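
Something like this, assuming a pre-created schema_info table with partition key id (all names here are hypothetical):

    import software.amazon.awssdk.services.dynamodb.DynamoDbClient
    import software.amazon.awssdk.services.dynamodb.model.AttributeValue

    fun migrate(ddb: DynamoDbClient, migrations: Map<Int, (DynamoDbClient) -> Unit>) {
        val key = mapOf("id" to AttributeValue.builder().s("schema").build())
        // Read the current schema version; 0 if the item doesn't exist yet.
        val item = ddb.getItem { it.tableName("schema_info").key(key) }.item()
        val current = item["version"]?.n()?.toInt() ?: 0
        migrations.toSortedMap().filterKeys { it > current }.forEach { (version, step) ->
            step(ddb) // apply the migration
            // Record that we're now at this version.
            ddb.putItem {
                it.tableName("schema_info").item(
                    key + ("version" to AttributeValue.builder().n(version.toString()).build()))
            }
        }
    }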

A particular challenge not shown by the docs is a very common scenario: a class has a list of custom objects. Mapping this in the enhanced client (their ORM) is not at all obvious. You have to map it to a list of JSON documents, possibly with a custom attribute converter. I think they didn’t bother making this easy because with DDB, this relationship is only useful when it’s a composition relationship, meaning the objects in the collection don’t exist outside their parent. Otherwise, they would be stored in a separate table, with any relationship managed entirely in code, since DDB doesn’t support joins.

In adding the DynamoDB dependencies, I found I had to update the reflection config due to a transitive dependency on Apache HTTP Client.

Kotlin

I’m learning Kotlin from scratch and it’s less intuitive than I thought, especially (ironically?) coming from Groovy. I keep running into issues where things don’t work and it’s not clear why. IntelliJ is a big help here, offering fixes that work and shining a light on where to look for help. I think a lot of the challenge comes from how strict the typing is. But there have also been a few times where I fired up the debugger to find out why something wasn’t working… and then it just worked. To be clear, the sequence was: 1) run a test, which fails; 2) debug the same test with no changes, and it now passes. It seems like it was really an IntelliJ state problem, which I’d experienced in the past with Gradle. Refreshing the Gradle project might help, YMMV.

Kotlin has some useful features like data classes that automatically give you equals and hashcode methods, but you have to mind the constructors.
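
A hypothetical example of the constructor caveat: only properties in the primary constructor participate in equals/hashCode.

    data class Product(val id: String, val name: String) {
        var cached: Boolean = false // declared in the body: ignored by equals/hashCode
    }

    fun main() {
        val a = Product("1", "Widget").apply { cached = true }
        val b = Product("1", "Widget")
        println(a == b) // true: 'cached' doesn't count
    }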

 

samkgg: AWS SAM + Kotlin + GraalVM + Gradle

I am starting a new open source project called samkgg. It’s a learning in public project that I hope will help others who want to adopt these technologies. You can find it here:

https://github.com/madeupname/samkgg

I thought I’d explain a bit about why and how I chose these technologies. The main driver is an Alexa skill that I’m building. This is not for fun, but what I hope to be a solid commercial product. And even if you’re not building an Alexa skill, you’ll see how to build a production serverless app.

TL;DR

  • Lambda because Alexa docs mostly assume this
  • GraalVM to eliminate Lambda cold start delays
  • SAM is way more mature than Amplify
  • Kotlin because I can’t use Groovy
  • DynamoDB because it fits in with the serverless model

I am a fan of the philosophy “two ways to win, no way to lose.” If my product doesn’t take off, I still win by adding the serverless architecture to my tool chest. I also find voice UIs/ambient computing to be very interesting.

Why Lambda/serverless?

If you look into how to build Alexa skills, there are various ways you can do it. Technically, I could just write my own REST API in any framework. But almost all the documentation assumes you are building on Lambda. I prefer modular monoliths for most projects, especially to start. But I’m less excited about choosing a poorly documented route. Just using Java is hard enough.

Amazon does have a hosted development platform for Alexa, but this only supports JavaScript and Python.  My background is JVM technologies and I have no desire to throw away that experience and switch ecosystems. I think most developers are far too cavalier about that and I also believe in the power of “skill stacks” – building on your experience.

Luckily, Java on Lambda is no problem. I was greatly helped by the book Programming AWS Lambda, which is Java-specific.

Well, when I say Java is “no problem,” I mean “possible.” In reality, it has an infamously slow startup/load time. This is why most Lambda functions – and microservices in general – are written in interpreted languages like JavaScript and Python. Java is faster if the process stays up long enough, but with serverless you’re frequently starting a new server, so users can experience very slow responses. This can be especially painful for a voice UI – there is no spinner.

Why GraalVM?

The solution to slow cold starts is GraalVM. It allows you to compile Java (and other languages) into native binaries that start up super fast. If you haven’t investigated it, you should know:

  • some years ago, a very high level Oracle exec told me they considered it the future of Java
  • first production release was over 3 years ago
  • it’s used in production today by big companies like Twitter
  • it recently got first-class support from Lambda (JDK 11 and 17)

Why SAM?

My next decision was what technology to implement the Lambdas in. I first looked at Amplify, since AWS was touting it as a low code solution, practically no-code with the Figma builder tool. But after spending a lot of time, I came away deeply disappointed. Politely, this is the common conclusion. I think only JS developers will be comfortable, and I didn’t love that GraphQL was the main technology.

But the biggest problem was no easy path to learn. The docs frequently lagged the tooling, even the “Getting Started” material. Tutorials should be flawless, especially for a commercial tool. Due to its pricing, I think it has tremendous potential for startups, but it needs a docs-first approach. I’ll look again in a couple years.

I learned of SAM from a Hacker News discussion of Amplify where it was repeatedly recommended over it. After looking into it, it seemed the right choice for me. Like Amplify, it’s a productivity layer on core AWS technologies (mainly CloudFormation).  Both hit the Law of Leaky Abstractions, but SAM has a smaller surface area. That said, with SAM you will still need to understand:

  • AWS fundamentals
  • CloudFormation
  • API Gateway
  • Lambda (duh)

and more, at a deep level. And the stack I’m building includes:

  • DynamoDB
  • Cognito
  • CodeDeploy
  • Kotlin
  • Gradle
  • GraalVM

and possibly CodeBuild and Cloud Developer Kit (CDK). In short, about the same as you needed for Spring/Hibernate or Groovy/Grails (plus your DB, CI/CD, etc.).

As Frederick Brooks said, there is no silver bullet.

Why DynamoDB?

I’ll add I was on the fence a bit with DynamoDB, having not done NoSQL before. But what I really like about SQL is Hibernate/JPA (and really, GORM). And Hibernate does not work in GraalVM. OK, there is a Quarkus plugin for it, and I found a library to make it work, but I did not find a definitive statement: “Hibernate is guaranteed to work in production on GraalVM.” I more found, “This is all stuff we had to do to get it to work.” I love Hibernate, but I need a guarantee.

In fact, I couldn’t find any Java ORM that was officially compatible with GraalVM native images. This makes sense because GraalVM restricts reflection (everything reflective must be declared in advance), and reflection is an obvious feature to use in an ORM.

DynamoDB has a mapper, which is a very lightweight ORM, but at least it works with GraalVM. Well, other than schema creation, but you probably want to do that programmatically to handle migrations anyway. Unfortunately, there is no migration library like Flyway for DynamoDB.

For a small schema, it feels more natural to work directly with an object graph, which is what you get by persisting to JSON. But for a sizeable app I’d consider Aurora Serverless PostgreSQL and Flyway. Going back to mapping result sets does not sound fun, though.

Why not Micronaut? Or Quarkus? Or…

It’s natural to consider one of the serverless/microservice frameworks. I know excellent people on both the Micronaut and Quarkus teams. But they seem an even further abstraction from AWS. If I were building generic microservices, not an Alexa skill, I’d consider them. I would definitely look if I were trying to be cloud agnostic, but that’s silly for an Alexa skill.

So if this sounds interesting, please check it out:

https://github.com/madeupname/samkgg

What is the Mautic Plugin Published setting?

If you’ve installed Mautic and head over to plugins, you’ll notice there is a setting called “Published” with choices Yes or No. Inexplicably, there is absolutely no documentation on this.

Published means enabled. Apparently, they thought it was obvious since the tab is called Enabled/Auth. But given that some of these plugins might actually publish something, it’s a poorly chosen name; a new user might not be clear on what it does.

Java’s New Pricing

I recently attended a talk by Georges Saab, Java head honcho at Oracle. The following is an executive summary, simply explained so you can understand the changes and plan accordingly. If you use Java commercially, the odds of you reading this and saying, “We’re fine, don’t need to change anything,” without doing any checking, are very low.

I’ll add the same “safe harbor” statement Georges added: any planning/spending you do should not rely solely on this article – things can change, I might have misheard something, etc. Do your own research.

Background

  • The JDK, or Java Development Kit, is a versioned specification. JDK 5, 6, … 11. There are also editions, such as Java SE, Java EE, etc.
  • For 5 through 9, major versions were released every 2-5 years. Updates (e.g. 8u20) came out about every 6 months. “Updating” means back-porting security, bug fixes, and possibly other improvements that are guaranteed to have no breaking changes.
  • The JDK specification has implementations, which are downloadable binaries. They come from various providers (companies, organizations, or individuals) and may be
    • under various open source or commercial licenses
    • free or paid
  • The most common one is Oracle JDK.
  • OpenJDK is a community project which provides the reference implementation of the JDK. 
  • It is a collaboration between several companies, but >90% of the contributions come from Oracle. 
  • Anyone can create their own build/distribution of OpenJDK and many do – including Oracle.
  • These builds can have code changes. For a while, Linux vendors replaced code that wasn’t GNU-compatible so it could legally be distributed with Linux, but is/was still called “OpenJDK.”

Today and the Future

  • Starting with JDK 10, a major version of Java will be released every 6 months. Far fewer changes, but still major versions, so not guaranteed to be backward-compatible. 
  • JDK 11 introduces the concept of long-term support (LTS) versions. These are the ones that are going to get updates after the next version is released. Example:
    • JDK 12 is released and bug/security fixes are back-ported to 11.
    • JDK 13 is released and bug/security fixes are back-ported to 11, but not 12.
  • The big news is that Oracle is going to stop updating Oracle JDK 8 for commercial use in January 2019 and personal (desktop) use in December 2020 unless you pay for support.
  • Oracle will only provide updates to the free version of Oracle JDK, and OpenJDK, while it is current (versions released in the last 6 months). Meaning as soon as JDK 12 is released – 6 months later – they will stop providing updates to Oracle JDK 11 unless you pay.

Let me clarify that with an example that shows you your options:

  • JDK (Java) 11 is released and you adopt the Oracle JDK in a commercial setting. You’re paying nothing as usual.
  • Six months later, JDK 12 is released, with fixes and new features.
  • Your options:
    1. Do nothing. Continue to use Oracle JDK 11 for free forever, legally. The license does not expire, it will just never be updated by Oracle.
      • Maybe you’re running on a closed/air-gapped system and there is nothing else you need.
    2. Upgrade to Oracle JDK 12. Still free and likely can run everything you built under 11 without issues.
    3. Switch to OpenJDK 11 from a provider other than Oracle, who is updating it.
    4. Switch to another commercial JDK.
    5. Pay Oracle for updates of 11. I was told the current price for this is $25/processor/month, with volume discounts, run on the honor system. I understand this is a significant discount from earlier pricing models.

It will be interesting to see how this plays out. I totally get that Oracle has a big staff of developers making Java better and they need to pay them. Giving the product away for free makes that more difficult.

On the other hand, many people have gotten used to paying nothing. A number of companies are already planning to provide updates for older versions, but we’ll see what kind of toll this puts on other contributors and who does what for free. Some have extensive Java experience and their own (often commercial) distributions: Red Hat, Azul Systems, IBM, etc. Those may be more cost effective for you. Maybe you’re already using one.

For the record, I have no problem with companies charging money for software development. Mine does and I’m very fond of the rent and coffee it pays for.

CIO/CTO Action Items

  1. Identify every system that is running Java in a commercial environment. This includes desktops.
    • Remove Java from systems that don’t need it.
  2. Identify where those distributions came from: Oracle, Red Hat, GNU, etc.
  3. Determine their update schedule. Be aware of when your Java versions will be out of date. Put it on your maintenance schedule or potentially get pwned. 
  4. Switch to JDKs that will be updated or plan for regular upgrades.
  5. Budget accordingly.

For further reading, Java Champions have created a document summarizing the changes in a bit more depth.

Hope you found that useful! Please comment with any corrections. 

Thanks to Marco Villalobos for pointing out some issues in the first version.