A couple of years ago, continuous integration in the JVM ecosystem meant Jenkins. Since then, a lot of other tools have been made available. But new tools don’t mean new features, just new ways. Besides that, what about continuous deployment? There’s no tool that allows deploying new versions of a JVM-based application without downtime.
The only way to achieve zero downtime is to have multiple nodes deployed on a platform. And yet, achieving true continuous deployment of bytecode on one single JVM instance is possible if one changes one’s way of looking at things. What if compilation could be seen as a change? What if those changes could be stored in a data store, and a listener on this data store could stream those changes to the running production JVM via the Attach API?
Nicolas Frankel from Hazelcast joined us to share an approach he has been experimenting with. Nicolas is accompanied in the talk by Aapo Romu, a software architect on the Eficode ROOT team.
Instead of stopping the JVM and restarting it again, when you make an incompatible change, they have two dedicated class loaders, one for your classes and one for all the libraries and stuff. And they will just scrap your classes' class loader and create a new one. And in that way, you can change your own bytecode without restarting the JVM.
Hello, and welcome to DevOps Sauna. Sometimes, great ideas start from a little experiment. Today, we're discussing an idea that a colleague of Nicolas Frankel from Hazelcast came up with. The question is whether you can achieve true continuous deployment of bytecode on a single Java virtual machine instance. What if compilation could be seen as a change? What if those changes could be stored in a data store, and listeners on this data store could stream those changes to the running production JVM via the Attach API? That's pretty exotic, isn't it? Nicolas is accompanied in the talk by Aapo Romu, a software architect on the Eficode ROOT team. As usual in our podcast, we have added the complete introductions from the participants at the end of the episode. But now, let's give the floor to Nicolas to bring us right to the heart of the game.
So the idea is that when I started my career, well, 20 years ago, there was no continuous anything. Sometimes you could test locally, and that was it. But with time, we saw there were a lot of problems when you deployed to production, even though you went through staging before, even though there were perhaps multiple steps. So first we wanted to have continuous integration: compiling and testing that all the changes from the team are okay and can be nicely integrated. And I believe nowadays everybody should be at this step. Everybody should do continuous integration. If you don't do continuous integration at your company, then perhaps you are not at the right company. But there is a whole new level, and I'm a bit disappointed that sometimes, even most of the time, we conflate the two terms CI and CD. CI is continuous integration. CD depends: it could be continuous delivery or continuous deployment. And there is really a huge step between continuous integration and either one of them.
Continuous delivery is the idea that you make your package ready to be deployed, but the deployment is still a business decision. So you say, "Hey, we are ready to deploy." You ask the business. The business says yes. You press a button. Then it's deployed. And continuous deployment is the idea that there is no human intervention. It's a completely, fully automated process from the commit to the deployment and everything in between. You have, as I mentioned, the compilation if it's a language where compilation is necessary. You have the testing. Perhaps you have unit testing. Perhaps you have integration testing. You have quality gates if necessary. You have everything. Everything is automated. And in the end, your software is deployed in production. And in general, people like me are used to: hey, we want to deploy to production, so we will stop the application and put up a static page saying, "Dear customers or dear users, we are down for a few minutes. Please come back in 15 minutes," 20 minutes, whatever.
And the problem with that approach is that, depending on your context, it can have a direct impact on your revenue. For example, if you're an e-commerce shop and you close your shop for 15 minutes, well, that's time during which you don't sell. And meanwhile, you have Amazon, which sells. So customers will probably switch from your shop, which is down every now and then, to Amazon, which is never down. And then there is the indirect impact. Even if we are not talking about direct sales, it has an impact on your image. The reason is that most users who don't work in IT expect applications to always be up, because Amazon is always up, Google is always up, Facebook is always up. Most, if not all, of the widespread applications are always up. And so zero downtime becomes not so much a requirement as just something that is expected. And the problem, especially with JVM applications, is that in general, whether it's a JAR or a WAR, you need to switch off your virtual machine, deploy your new JAR or your new WAR, and then start it again. And there was this colleague who told me, "Hey, perhaps we could do something smart like stream the changes. And in that case, we would have zero downtime." And I said, "Hey, let's do that. Let's try that."
How did you come up with that idea, given that there is a huge range of multi-node setups that would already enable that? You would deploy the whole application as you used to and just load balance between the running nodes and the nodes under upgrade. And sorry, I want to go back a bit to your point about continuous deployment and continuous delivery. As you said, it is a huge step. And yes, it is a technical problem too, but normally it is more of a cultural problem: the customers are simply not ready for it. Even if they would like to have it, they have processes or they have opinions that basically make it impossible.
I completely agree with you. Most of the time, as I mentioned, and that's the reason why I left consulting, the technical team might be ready or might be willing to invest some time, but the business wants to keep control, because they're afraid that if they go the fully automated way, they will lose control over IT and it will be a mess. Whereas, you know, all the big players I mentioned wouldn't be able to do what they do without this fully automated pipeline. Sorry, I forgot there was a question before.
Yeah. The question was, how did you come up with the idea of hot loading the bytecode into the JVM instead of leveraging existing approaches for zero-downtime deployments, like multi-node setups and load balancing and that kind of stuff?
The problem with an infrastructure setup like that is that we are assuming it's completely stateless. You have the same thing with Kubernetes. Kubernetes allows you to do rolling upgrades, and when you first come across rolling upgrades, if somebody explains them to you, you say, "Well, it's amazing. It's magical. It works out of the box." And most of the time it is very easy to do, as long as everything is stateless. The complexity lies in the state. When you handle state with a multi-node setup, perhaps you can keep the sessions, for example. You keep the sessions; you share the session data. That's fine. But how do you cope when you need to change the database schema, for example? That is very hard with a multi-node setup. You need code that is able to handle both the new schema and the old schema, and then you get into really, really big trouble. And this idea was not mine, actually, as I mentioned. It was a colleague of mine who told me, "Hey, this could be a cool idea." And I said, "Okay, let's make it work." Of course, I didn't want to go all the way to full production usage, because that would be a whole product, but I wanted to check whether the basics, the foundations, would work. And they do, actually. They do pretty well.
Okay. You do have good points there, related to the data schema, for example. I agree with that. I'm just trying to think through how to solve that in real life.
So again, it's an experiment, so it could benefit from a lot of improvements. But the basic setup is: I have a running JVM, and I consider it a production JVM. Then I have somebody who codes and produces bytecode. And there is a job, a streaming job, that reads from the place where the classes are compiled and that, let's say, injects the bytecode into the running JVM. It is as easy as that. We can make it more reliable. We can make it more complex. But the basic idea is that you have a place where you generate bytecode, and you have a running JVM. It's as simple as that.
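To make that setup concrete, here is a minimal sketch of such a job in plain Java. This is not the actual experiment's code: the JDK's WatchService stands in for the streaming job, and the directory and file names are purely illustrative.

```java
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;

public class ClassWatcher {
    public static void main(String[] args) throws Exception {
        // Stand-in for the compiler's output directory.
        Path classesDir = Files.createTempDirectory("classes");

        WatchService watcher = FileSystems.getDefault().newWatchService();
        classesDir.register(watcher,
                StandardWatchEventKinds.ENTRY_CREATE,
                StandardWatchEventKinds.ENTRY_MODIFY);

        // Simulate the compiler dropping a fresh .class file
        // (0xCAFEBABE is the class-file magic number).
        Files.write(classesDir.resolve("Greeter.class"),
                new byte[]{(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE});

        WatchKey key = watcher.take(); // blocks until the filesystem event arrives
        for (WatchEvent<?> event : key.pollEvents()) {
            // The real job would read the bytes here and publish them
            // to the data store instead of printing.
            System.out.println("changed: " + event.context());
        }
    }
}
```

In a real setup, the watched directory would be the build tool's class-output folder, and the event handler would push the bytecode into the registry described next.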
Yeah. So you are keeping a registry of the compiled classes somewhere.
Exactly, mm-hmm (affirmative).
Then I suppose you change the classes and upload it. Yeah?
Yeah. Yeah. That's the second step, if we want to go a bit deeper. I have a registry, an in-memory registry. I'm using Hazelcast. I work for a company called Hazelcast, so I'm using the tools that I know. Hazelcast is an in-memory data grid, meaning that you have a whole lot of different distributed data structures, and a registry is just a simple key-value store. So in this case, I'm using the IMap, Hazelcast's distributed map. The fully qualified class name is the key, and the bytecode is the value. And on the production JVM side, I subscribe to changes in this Hazelcast IMap, and every time there is a change, I reload the class using one of the JVM APIs.
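Here is a rough sketch of that key-value pattern, with a plain in-process map and listener list standing in for the Hazelcast IMap and its entry listener. The class and method names are illustrative, not Hazelcast's API.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.BiConsumer;

// A plain in-process map standing in for the distributed IMap: the key is
// the fully qualified class name, the value is the compiled bytecode.
class BytecodeRegistry {
    private final Map<String, byte[]> classes = new ConcurrentHashMap<>();
    private final List<BiConsumer<String, byte[]>> listeners = new CopyOnWriteArrayList<>();

    // The production JVM subscribes, like an entry listener on the IMap.
    void addEntryListener(BiConsumer<String, byte[]> listener) {
        listeners.add(listener);
    }

    // The streaming job publishes freshly compiled bytecode here.
    void put(String fqcn, byte[] bytecode) {
        classes.put(fqcn, bytecode);
        listeners.forEach(l -> l.accept(fqcn, bytecode));
    }
}

public class RegistryDemo {
    public static void main(String[] args) {
        BytecodeRegistry registry = new BytecodeRegistry();
        registry.addEntryListener((fqcn, bytecode) ->
                // In the real setup, this is where the class would be
                // reloaded, e.g. via Instrumentation.redefineClasses.
                System.out.println("would redefine " + fqcn
                        + " (" + bytecode.length + " bytes)"));
        registry.put("com.example.Greeter", new byte[]{(byte) 0xCA, (byte) 0xFE});
    }
}
```

With Hazelcast itself, the same shape is an `IMap<String, byte[]>` plus an entry listener on the production side; the point of the sketch is only the key-value-plus-listener structure.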
Okay. That sounds interesting. I can imagine it actually working really well for small hotfixes. I have some doubts about upgrading libraries, though. Let's say you're running, for example, a Spring Boot-based application and you want to upgrade the whole Spring Boot release. Have you given any thought to how that could be done with this approach? Or can it be done at all?
I'm not sure it can be done, because in that case I'm reloading the running bytecode using the instrumentation API, and the instrumentation API is pretty limited. You cannot add or remove attributes of a class. You can, however, change the body of a method. That is part of the instrumentation API. Now, if you want to go a bit further and change the whole application and not only hotfixes: you mentioned Spring Boot, and Spring Boot uses such a mechanism in DevTools. Basically, if you want to change just your code in development and you make an incompatible change, meaning you add or remove an attribute, instead of stopping the JVM and restarting it again, what they do is this: they have two dedicated class loaders, one for your classes and one for all the libraries and stuff. And they will just scrap your classes' class loader and create a new one. In that way, you can change your own bytecode without restarting the JVM. If you want to change a library, however, you need to restart the whole thing.
So perhaps, if you want to go further and add new features, you could come up with such dedicated class loaders for your running JVM. But I believe it would be something much more complex and much more involved. That's not an experiment anymore; then we are talking about creating a dedicated product.
Yes. That's exactly what I'm interested in: how that approach would work in a live production environment. As for Spring Boot, I was using the DevTools hot reloading a few years back, and I dropped it, partly because Spring Boot startup times have improved a lot, and partly because after just two or three reloads of the bytecode into the JVM, we ran out of memory. That was partly because Tomcat does not release all the resources it's using on a reload. Have you seen this kind of problem with your approach, or is it irrelevant here?
I had another issue recently, and it was not related to this approach at all. I was using DevTools, and when it restarted, I had this "class A cannot be cast to class A" problem. So basically, there were two class loaders in conflict. This is what I noticed recently, but I didn't investigate further. For my usage, as I mentioned, I'm not doing real-world projects anymore, big involved ones. I'm doing demos and prototypes. And I just removed DevTools altogether because, well, it was not necessary.
Okay. Yeah. For this to become a real production level product, there are a lot of open questions to be answered.
Completely, but I know people are using this method to send hotfixes to production. And it's in a bank.
Yeah. I can really see the benefit of doing that, and I believe it is possible to do that kind of thing. That brings me to a question. How do you then make sure that you always know what's inside your production and what's inside your version control, so that you are in sync when you deploy a new complete version over that hot-patched version?
That's a great question. That's a very, very good question. And I believe it has to do with the way we see the world right now. Right now, we see the world as: we make a release, we deploy this release, and this release is tagged. We know exactly how to associate it with the version control. However, when you start doing continuous deployment, this idea of having a release with a fixed number completely goes away. What we are doing instead is deploying the commit; the commit is the thing that is deployed. There is no more release as we know the term. And this is not specific to bytecode deployment; it is inherent to continuous deployment. In continuous deployment, you don't have releases anymore. You just deploy the latest commit.
As I mentioned, in my experiment, because I'm doing that on my own computer, I stream the changes from the developer's machine to production. But we could do the exact same thing by streaming the changes from a continuous integration pipeline folder, where it generates the bytecode, to production. So there could be a chain of traceability from production back to the pipeline that created it. And you could also stream tags and any metadata along with it. So it's not an issue per se.
Yeah. That could actually work. Yes.
Hi. It's Lauri again. I have to admit that the topic Nicolas and Aapo are discussing sounds pretty, well, experimental to an uneducated ear. But because we're talking about continuous deployment, I wanted to enlighten you about our pipeline game. In this free online pipeline game, you and your colleagues can learn the perks of continuous delivery in a fun way. You can find the link to the game in the show notes. If you are a member of a local DevOps community, or you run a team that you would like to see learn the secrets of continuous delivery and continuous deployment, we offer facilitated workshops where you and your folks can learn how to improve your software production. Just get in touch with us, and let's talk more. Now, let's get back to our show.
Then again, you probably need to do that larger update at some point, and then you'll need mechanisms for handling that as well. So two deployment methods will be required anyway.
Probably, yes. You're correct on that.
As you said, you are uploading directly from your development computer, and in real life it would probably be, as you said, from the CI server to production, after certain quality gates or security testing and that kind of thing. The way businesses usually work today is that you develop some features, and then you have pull requests to your master branch. They get reviewed, then they are processed in the CI and deployed. Is that the approach here as well for the development flow? Or do you see any other approaches for ensuring codebase security, so that not just any developer can push some arbitrary code to your production and perhaps introduce some malicious code or something?
Well, my approach is not prescriptive in how you want to work. It depends on how you want to run things. It can be from a developer's machine, and of course that would be very, very risky. Or it could be from a pipeline, and this pipeline could be run only from the master branch, once the whole team has reviewed. Everything is possible. It's not prescriptive. However, regarding security, that's a whole new world. By default, you could run the production JVM with static agents. That in general is the usual way to run agents: when you start the application, you run java -jar blah, blah, blah with -javaagent:whatever. That's how most monitoring or instrumentation libraries and frameworks and products work. But that's the realm of static agents.
There is a whole different category of agents, dynamic agents, meaning that you can run the JVM, and then you have another JVM that attaches itself to the running JVM and loads a new dynamic agent into it. And that is very, very interesting for bug fixes and the like. So you can run a normal JVM, not in debug mode, in standard mode, and you have this other JVM that launches and loads spy code into it. And of course, you can say, "Oh, but this is very dangerous. This is a security issue." Well, the problem is that most, if not all, JVMs in production in the world run like that. It is the default behavior. The only way to prevent this is to use a security manager. And if you don't know what the security manager is and you are working in the JVM world, please just check it out. The security manager is an important piece of infrastructure.
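To illustrate how open this is by default, here is a small sketch using the JDK's Attach API (module jdk.attach, so it needs a JDK, not a bare JRE). The attach-and-load sequence itself is left in comments because it needs a real target pid and agent JAR; those values are placeholders.

```java
import com.sun.tools.attach.VirtualMachine;
import com.sun.tools.attach.VirtualMachineDescriptor;

public class AttachDemo {
    public static void main(String[] args) throws Exception {
        // By default, every attachable JVM on this machine is discoverable,
        // no debug flags required.
        for (VirtualMachineDescriptor vmd : VirtualMachine.list()) {
            System.out.println(vmd.id() + " " + vmd.displayName());
        }
        System.out.println("scan complete");

        // Attaching and loading a dynamic agent into a running JVM is then
        // just a few calls (pid and agent path are placeholders):
        //
        //   VirtualMachine vm = VirtualMachine.attach(pid);
        //   vm.loadAgent("/path/to/agent.jar"); // agentmain() runs in the target
        //   vm.detach();
    }
}
```

The loaded agent receives an `Instrumentation` instance in its `agentmain` method, which is what makes redefining method bodies in the running process possible.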
And that's very concerning. Right now, there is a JEP that wants to remove the security manager. So first problem, you probably don't know about the security manager; second problem, you probably don't use the security manager; third problem, because you probably don't know it and probably don't use it, Oracle wants to remove it. And if you are afraid that what I'm showing you here is going to be a security issue, well, even if you don't use it, your JVM has the Attach API by default and the instrumentation API by default. If you want to prevent this kind of stuff, you need to do something against it. So it's not a security issue, because you probably have all the security issues already without even using this.
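As a hedged sketch of the countermeasure being discussed: enabling the security manager with a deny-by-default policy at launch. The file names are illustrative, and a real policy must grant every permission the application legitimately uses; note also that JEP 411 proposes removing this mechanism, which is exactly the concern raised here.

```shell
# Write a deliberately minimal policy file (everything not granted is denied).
cat > app.policy <<'EOF'
grant {
    // Grant only what the application legitimately needs, for example:
    permission java.io.FilePermission "/app/data/-", "read,write";
};
EOF

# Start the JVM with the security manager enabled and that policy applied.
# app.jar is a placeholder for your application.
java -Djava.security.manager -Djava.security.policy=app.policy -jar app.jar
```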
Yeah. So you are saying that most production JVMs are actually running with the Attach API enabled and the security manager not configured appropriately.
Exactly. That's exactly my point.
Yeah, that is unfortunately probably so true and something that you really should look at when hardening your production servers.
Exactly. And more of you should be concerned about this new JEP. It's JEP 411. The goal of the JEP is to remove the security manager and to study what an alternative could be, but one of the non-goals of the JEP is to provide a replacement. So if you're an Ops or DevOps person working with a JVM, please check this out. It's very, very important. I have a talk called Securing the JVM, and it shows all the bad stuff you can do with an insecure JVM. For example, you could change the type of attributes of Java classes. That is now fixed in the latest JDKs; I think from JDK 12 you cannot do this anymore. But in JDK 8, you could have a class A with an attribute of type int, and with reflection, you could change it to type String. That's for fun. That's not very interesting. But by default, your application, because it runs on the JVM, knows how to read Java code from a file, compile it to real Java bytecode, and load it on the fly. You have that running in production right now.
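That last capability is a standard JDK API and is easy to demonstrate. The following self-contained sketch (the file and class names are made up for the demo) compiles a Java source string at runtime and loads the resulting bytecode on the fly; it needs a JDK, since a bare JRE returns no system compiler.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

public class CompileOnTheFly {
    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("otf");
        Path src = dir.resolve("Hello.java");
        Files.writeString(src,
                "public class Hello { public static String greet() { return \"hi\"; } }");

        // The JDK ships a full compiler behind this standard API; any
        // application running on a JDK can use it at runtime.
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        int result = compiler.run(null, null, null, src.toString());
        System.out.println("compiled: " + (result == 0));

        // Load the freshly produced bytecode on the fly.
        try (URLClassLoader loader =
                     new URLClassLoader(new URL[]{dir.toUri().toURL()})) {
            Class<?> hello = loader.loadClass("Hello");
            System.out.println(hello.getMethod("greet").invoke(null)); // prints "hi"
        }
    }
}
```

Nothing here requires debug flags or special permissions; without a security manager, any code already running in the process can do the same.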
It has been an interesting conversation, and we are beginning to wrap up. My two remaining questions to you, Nicolas, are: if somebody listening to this would want to apply the crawl, walk, run approach to it, doing something small first, then something bigger, and finally perhaps something grandiose, where should they start? And associated with that, what tools would you be looking at to accomplish it?
The first thing is that this is an experiment. It was never meant to be something grandiose. The idea behind it was: folks, we are living in a world where we are used to thinking about discrete things happening. There is a release. There is a tag. There is a deployment. But actually, this is just how we think about time, as a series of discrete things happening. We can change our way of looking at things and see that everything is a stream of events, meaning there is a start, but it potentially never ends. In this experiment, I'm using Hazelcast Jet, which is an in-memory stream processing engine, to do just that. And the idea is that once you start seeing the world through the prism of streaming, you can apply it to a whole lot of different stuff. In general, this is used for analytics: you have data somewhere, and you want to transform it into another format that can be more easily processed. But it can also be for classes. Compilation is actually an event, and the data is the bytecode itself. So you can apply streaming and event streaming to a whole bunch of things you never thought of before.
So I encourage you to look at Hazelcast Jet, the in-memory stream processing engine. It's distributed by default, and you can do a whole lot of really different things with it. Of course, you can do data visualization: you read data from a web service, you store it, you transform it, then you display it on a webpage. I'm using it to do zero-downtime deployment. As I mentioned when we were talking about deployments, the real hard part in continuous deployment is the database, the state. So you can read from one database, your, let's say, blue database if we are talking about blue-green deployment, and write the changes from the blue database to a green database. Then you can have true zero-downtime deployment, and so on and so forth. It never stops. So just start to think about how your discrete events can be a stream. That's what I want you to remember from this talk.
Thank you. Really, really interesting, and I'm sure inspirational for our audience. To wrap up, let me first give the word to Aapo to finish off on his part, and then hand over to you, Nicolas, for your final words.
This has been a really interesting discussion. And I do now understand that there actually might be really good use cases for this kind of approach. I'm still not totally convinced of all the possibilities for it, especially because large-scale deployments, like updating a whole application, would probably not be possible with this. But for hotfixes, I could actually try this out once it is a product of some form.
Thank you. And over to you, Nicolas.
Well, thanks a lot for the invitation. That was an interesting talk. I completely agree. As I mentioned, it's really meant to be an experiment, but I believe if you start experimenting in different domains, perhaps it will spark some new ideas, and perhaps in some way it will lead to a product. In that case, hotfixes, definitely; large-scale deployment, probably not. For that, as I mentioned, you would do another kind of zero-downtime deployment. But still, it's fun. And besides, it gives you a lot more understanding about what you can do with the JVM and what you cannot do. And you can do a lot by default.
Thank you for listening. Nicolas gave us tons of excellent references to learn more about the subject. You can find all of them in the show notes. Also, if you want to continue the conversation with Nicolas and Aapo, be sure to check out their social media profiles in the show notes. If you haven't already, please subscribe to our podcast and give us a rating on your platform. It means the world to us. Also, check out the other episodes for interesting and exciting talks. Finally, before we sign off, I would like to give the floor back to Nicolas and Aapo to introduce themselves properly. For now, take care of yourselves, and remember to secure your software delivery chain.
So I'm Nicolas Frankel. I've been in IT for 20 years this year. For a very long time, I was a consultant, so I was going to customers and helping them. And, well, I found out that most of the problems I had to solve were not technical problems. They were organizational or people problems, and let's say I felt a bit frustrated. So two years ago, I decided that I wanted to focus more on the technical side, and I became a developer advocate.
This summer, it will be 15 years that I've been with Eficode. When I started my career, I was in product development. After that, I did a lot of consulting, and at Eficode, mostly consulting. I'm currently a software architect, back in product development. I am part of ROOT R&D, and we provide our customers with Eficode's solutions for user management and data visualization, which are part of ROOT, a platform for software development that includes all the third-party tools.