The journey of Groke and Eficode as they aim to push the boundaries of current platforms and tackle the challenges presented by continuous deployment on the high seas.

Enjoy a replay of The DEVOPS Conference sessions conveniently through the DevOps Sauna podcast. This talk is by Darren Richardson, Lead Cloud / DevOps Architect at Groke Technologies.

Lauri (00:05):

Hi there and welcome to the DevOps Sauna podcast. Eficode has organized The DEVOPS Conference and its predecessors a number of times. Last time, in April 2021, over 10,000 people registered for the event. The DEVOPS Conference is a community event where attendees do not have to pay for participation, and we are happy to organize it again on March 8th and 9th, 2022. You are all wholeheartedly welcome to join the event and register at thedevopsconference.com. If you are a DevOps, Agile or cloud practitioner or a decision maker, I also encourage you to review our call for papers. We are introducing a whole new community track to give more stage time to people with their personal experiences and insights. All of the speeches from the previous time are available online to watch, but to make it easier for people to enjoy these great speeches, we are also sharing them through our podcast. I hope you like them as much as I do.

Linda (01:08):

This keynote is by Darren Richardson, who works as the Lead Cloud / DevOps Architect at Groke Technologies. He will present us with a case study involving continuous deployment on the high seas.

Darren (01:21):

Before I get started, I'd like to ask a question, and that is: how would you handle continuous deployment in an isolated system with no guarantee of connection, that can access heterogeneous USB devices, leverage the power of GPUs, and can update and roll back on request? So here is my presentation, Continuous Deployment in Isolated Systems, or perhaps with a subtitle: How Groke Technologies and Eficode put Kubernetes somewhere we were told it wouldn't work, and it kind of did.

Darren (01:55):

So my name is Darren Richardson. I'm the Lead Cloud and DevOps Architect for Groke Technologies, and I'm here to present our DevOps journey. Is everyone enjoying the convention so far? I have been. I particularly liked Cheryl Hung's predictions for cloud native in 2021, because it felt like she was talking to me: she was predicting more edge and IoT use cases for Kubernetes in 2021, and now here I am, 24 hours later, talking about exactly that.

Darren (02:24):

This will be more of a story about how we at Groke Technologies pushed from the inception of the idea to the pilot in 18 months, how we leveraged Eficode's ROOT platform to build a software factory, and perhaps most importantly, how we tackled the challenges of deployment in some of the most difficult situations.

Darren (02:45):

Firstly, a little history about Groke Technologies. We are a startup, founded in late 2019 with an aim to bring autonomous navigation systems to maritime use. So think Tesla self-driving cars, but for massive ships. Our initial solutions are tailored for the Japanese maritime industry. We launched our first pilot in Finnish waters last week, and we're launching a second in Japan in May with the support of our strategic investor, Mitsubishi Corporation. And my concern in this whole affair is getting our software onto the advisory systems of the ships.

Darren (03:27):

So, to meet the team: we had a few people from Eficode in a number of capacities. On the top shelf we have Kalle, who you've probably seen interacting with the chat throughout this event; he's been answering a lot of questions there. Kalle has been doing what we might call a lot of the brainwork for the plan. And then we have me and Juha, who were mostly doing the heavy lifting; we provided the muscle in the equation. Then there is Hamzah from the ROOT side, who kind of coordinated our toolkit. This will be a bit of a shock for people from Groke to hear, because most of them have never talked to him, but I shot a lot of emails back and forth with this guy to get our ROOT platform doing what we wanted it to. And then we have Malnor who, at the risk of sounding like a 90s heist movie, kind of acted like a fixer, in that I would make weird requests of him and then he would come back two days later with whatever I'd requested.

Darren (04:32):

And of course he was kind of supervising the efforts. You've also seen him engaging with the chat; you might be talking to him as well. And this is the team on the DevOps side. There was also a lot of support from the internal Groke staff, integration teams and such. So let's move on and have a quick discussion of our product. Now, I have to be a bit general here because we are still in a research and development phase, so a lot of what we are doing is not public knowledge yet, but this has been cleared, so I will tell you what I can. Picture that you're a ship captain. You stand on the bridge and you direct 50,000 tons of metal.

Darren (05:19):

Now, you do this by assimilating data from various systems on your ship and your own observations, and then you make decisions on how best to operate. As a long-term goal, this is what Groke Technologies is intending to emulate: an AI captain, autonomous vessel control. Now, unlike with self-driving cars, things like Tesla, there are far different rules governing a ship. There's a lack of clear delineation, for example. With a car, you know exactly where you're supposed to drive: it's on the road. It's very clear for everyone, unless you live in Finland during the winter, when everything is covered in snow and you kind of have to guess where the road is. You just kind of point between the trees and hope, and that's the situation we're dealing with at Groke, because on a ship you don't have these clear lanes of navigation, and when you do, they are definitely not delineated within the ocean.

Darren (06:17):

So firstly, we have awareness. That's what we needed to build initially. Our awareness system, which went out for the first pilot last week, sits on the ship, gathers data, gives advisory notices and generally improves situational awareness. So, normally you'd have ships that gather the normal data you would expect on the bridge, and our system fuses it together. You have GPS data, which tells you where you are. You have AIS data (Automatic Identification System), a marine protocol which tells you where all the ships are. You have visual data from a wide field-of-view camera. Basically, this data and some others are all fused together and delivered to a custom UI, which picks out and identifies objects on the horizon based on this data. And then obviously it gathers this data, it gathers how the captain responds to the situations that might come up, and delivers them to our machine learning platform.

Darren (07:23):

Now, technologically I can't go through a lot, but on a very high level, the awareness system consists of two machines networked together. We have a sensor unit, to which all these sensors are connected, running an ARM processor and NVIDIA GPU, and a central unit, which is just an industrial PC. I want to talk about this a little bit, because it's important to highlight the processes that go into producing software, but it's not the primary focus here, because the process is actually pretty simple. The big challenge was this: how do we keep an updated application stack onboard? And to do that, we need GitOps, obviously. We'll get into it a bit later, but we have to make the up-to-date stack available. We don't have a connection to the vessels at all times, as we discussed at the start, so we can't push updates in the traditional manner.

Darren (08:18):

We can't try and time them. We can't insist upon them. And frankly, if we could, we wouldn't want to. We don't want to be in a situation where someone is using our system in the middle of a storm in the ocean and we pull it down because we pushed an update. This decision has to come from the captain. So, a quick overview. It's a more or less linear flow: we code, this goes into the pipelines, the images are built, and we push them to binary storage in Artifactory. Now, here is where I would've said something like, this is how we allow remote updates to the most inaccessible areas you can think of, but then on February 18th of this year NASA landed Perseverance on Mars and immediately performed a software update. So they kind of took a little bit of the wind out of my sails here, but I can say this is how we're planning to enable some of the most inaccessible areas on this planet to get software updates.

Darren (09:26):

I would also like to take a moment to talk about the Eficode ROOT platform. Basically, because we're a startup, we've had the luxury of being able to try and do things right from the start. That is a situation that not a lot of companies find themselves in, and having one centralized DevOps platform made life a lot easier for us. This is kind of without an agenda, just because I think it's kind of cool: Eficode ROOT allowed us to go from zero to 60 very quickly, and that has saved us a lot of time and allowed us to develop elsewhere. Why Kubernetes? It's a fair question, in a way. There are two mindsets when it comes to it. One is: Kubernetes is not designed for this, so we shouldn't use it. And the other is: Kubernetes might work for this, so let's give it a try. And it all comes down to the fact that we wanted orchestration. So we have to ask not why Kubernetes, but why orchestration. We have to take one step back.

Darren (10:32):

So, why orchestration? The idea, as I've told you, is that we have these two distributed systems on the ship, and we wanted, from our side, to see those as one connected system. We don't want to have to deal with updating the sensor unit on one side and the main unit on the other; we want to update both. And that obviously leads us to containers and orchestration, because, well, there are other ways to go about it, but this is frankly, in my opinion, the simplest and the most straightforward. We knew we wanted to use containers anyway; we knew that's how we wanted to store and deliver the software. So it became a very easy choice. And this led to a research phase, in which we basically did some comparisons, and it led to the argument of Kubernetes versus Docker Swarm. I think anyone who has run Docker Swarm recently knows these are not exactly on even ground.

Darren (11:36):

They're not running the same race anymore. On Docker Swarm's side is simplicity, and on Kubernetes' side is basically everything else. And it breaks my heart a little bit to say that, because I love Docker. It's so simple, it's so streamlined, and it's so good at what it does. But there is a feature gap, and frankly, there's a feature gap in the whole DevOps infrastructure when it comes to IoT, and that is device support. Device support in container orchestration is lacking in almost every way. So I had to cast a critical eye on Docker. Docker supports devices. Docker Compose supports devices. Docker Swarm doesn't. It's a command that's mysteriously and inexplicably missing from the Docker Swarm toolkit. And I'm sure there are technical reasons for that, which I don't want to go into now. But from a management standpoint, for a team of founders, this was kind of a stumbling point.

Darren (12:40):

Obviously, if we can't use devices, we can't use it. So it kind of leads to Kubernetes. And I think this is not really a recent discussion. If you Google Docker Swarm devices, you'll find a post on the SwarmKit GitHub from 2016, and it's basically saying this is not possible. There is a workaround provided in 2018, and it will admittedly work in some cases. And then silence. Three years later, Docker Swarm is no closer to providing support for devices. So it's kind of shut itself out of this race, forcing us into Kubernetes. So we looked into Microk8s by Canonical. I don't know how familiar you are with this, but I will say I have a complicated relationship with Canonical products. I think a lot of people who ran Ubuntu in 2011, when they switched over to the Unity interface, know what I'm talking about.

Darren (13:41):

I think the people who've been questioning the snapd package manager probably understand. So when I joined the company and heard we were based around a Canonical product, I had to cast a critical eye over it. And one of the reasons we started to consider Microk8s as our choice of Kubernetes is its marketing. On their own website, it says: low-ops, minimal production Kubernetes, for devs, cloud, clusters, workstations, Edge and IoT. That sounds like it was built for us, right? Low overhead, easy operation, made for IoT. Nope. Delving into the documentation, you'll find at least 20 gigabytes of disk space required. That's a lot. It doesn't sound like a lot, because we're used to terabytes, we're used to huge hard drives, but in the IoT space 20 gigabytes is a huge amount of storage. For example, our development units were the Jetson TX2 Nano, with 16 gigabytes of flash memory.

Darren (14:49):

So even before we add any containers to the installation, we are running well below the minimum requirement in IoT. And frankly, 16 gigabytes feels like the Cadillac of IoT devices, because there are sizes that run all the way down to kilobytes. Now, don't get me wrong, Microk8s is a lightweight Kubernetes distribution, but at a base 200-megabyte installation size before any additions, I don't know if we can consider it fully IoT compliant. How many microcontrollers do you know that have hundreds of megabytes to spare? By comparison, Rancher's K3s installs at less than 40 megabytes. That's a fifth of the size, and I'd have liked to measure how much space would've been saved by not installing via Canonical's snapd package manager, but they state in their GitHub: "I'm sorry, there is no way to install Microk8s without snapd."

Darren (15:53):

That's kind of the reason I have a complicated relationship with Canonical. That's such a weird prospect for me. Think of any other software: go to the documentation, find it on GitHub, check the installation section, and you'll see install from source. You'll see ready-built deb packages. They might even have prebuilt Docker images. They might have uploaded things to apt repos so you can easily pull them. And it kind of says: use our software in any way that suits you. And I think that's a great principle. To discuss it in a less technical manner, imagine you wrote a book. You want people to read your book. You print it, you make a PDF, a Kindle ebook, you sell it on Amazon, you put it in bookstores. And it says: I made this, please use it.

Darren (16:41):

But Snappy's approach has been kind of like selling this book you've put your time into writing in one physical bookstore in one physical city. And it's a weird prospect. I don't want to paint a negative picture of Microk8s. We are using it, I will state that right now, and it does a lot of great things for us. Whether we'll use it indefinitely, I can't say. At the moment it gives us the big advantage of being simple to deploy, with simple-to-enable additional options like the local image registry and networking add-ons. Those things saved us a huge amount of time when we were going full steam ahead. Like I say, going from inception of the idea and formation of the company to a pilot in 18 months, obviously we need to have some kind of pace to pull that off, and Microk8s helped out with that a lot.

Darren (17:39):

So it's kind of a two-sides-of-the-same-coin thing. But I can see a future where we switch to K3s, and it's not to save on the disk space, but to cut ties with the snapd package manager, which has its forced automatic updates of packages. When we can't guarantee a connection, we have to prioritize the traffic coming out of the ship and going back to it. We can't have a third-party component deciding it wants its traffic now. We have to be completely in control of that. The challenge: continuous deployment. The connectivity of the devices prevents standard push deployment mechanics, so the ship has to reach out itself for updates. It's GitOps at its core, but how do you actually facilitate that? And for us, it was really a very simple answer: Bash. And it's like, this is what, now, a 30-year-old technology?

Darren (18:36):

It's very simple, very standard, but it suits our purposes perfectly. A small Bash script sits on the device, so we have the Bash script on the vessel, and basically we are using an old technology to keep this whole stack up to date. Updates are one of the big reasons for using Kubernetes, because it allows us to create this small cluster inside the vessel. So it has its control plane inside the vessel, and the Bash script grabs images from our Artifactory and pulls them in when it has a connection, then stores them, storing that application stack for when the captain decides he wants to upgrade. We can basically use that to store months of updates if we need to, but the update has to happen at the request of the captain. He knows his situation better than we do. We are just coding the software.

Darren (19:36):

He knows where he is in the world and when he wants to update. So the magic Bash script pulls an application stack, parses it for image references, pulls the images and pushes them to a local registry. That's another one of the additional features of Microk8s that's really great: it can enable a local registry with one command, for us to store these images. No problem. This is how we manage to have the application stack ready for an upgrade, and a downgrade, at a moment's notice. And then we had a minor speed bump: devices. I say Microk8s here, but it's actually Kubernetes in general, I would say. I don't know if anyone's tried to use devices in Kubernetes. It's not a simple matter. We had a long discussion of how we would get this to work, and I remember tasking the guys from Eficode with this.
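To make the flow concrete, here is a minimal sketch of the kind of on-vessel update script described above. This is not Groke's actual implementation; all URLs, paths, and the registry port are illustrative assumptions (port 32000 is the default for the Microk8s registry add-on).

```shell
#!/usr/bin/env bash
# Sketch of an on-vessel update script: fetch the latest application
# stack manifest, parse it for image references, and mirror each image
# into the local registry so an upgrade (or rollback) can happen offline.
# All names below (URLs, registry port, file paths) are hypothetical.
set -euo pipefail

STACK_URL="https://artifactory.example.com/stacks/latest/stack.yaml"  # hypothetical
LOCAL_REGISTRY="localhost:32000"        # Microk8s registry add-on default port
STACK_FILE="/tmp/stack.yaml"

# Extract image references from a Kubernetes manifest, i.e. lines like
#   image: registry.example.com/groke/awareness:1.2.3
parse_images() {
  grep -E '^[[:space:]]*image:' "$1" \
    | sed -E 's/^[[:space:]]*image:[[:space:]]*"?([^" ]+)"?.*/\1/' \
    | sort -u
}

# Mirror every referenced image into the vessel-local registry.
mirror_images() {
  while read -r image; do
    local name="${image#*/}"                  # strip the source registry host
    docker pull "$image"
    docker tag "$image" "$LOCAL_REGISTRY/$name"
    docker push "$LOCAL_REGISTRY/$name"       # now available with no connection
  done < <(parse_images "$1")
}

# Only attempt the transfer when we actually have connectivity; the stored
# stack then waits until the captain requests the upgrade.
if curl -fsS --max-time 10 "$STACK_URL" -o "$STACK_FILE"; then
  mirror_images "$STACK_FILE"
fi
```

The key property is that the script only ever pulls: the shore side never pushes anything to the vessel, so connectivity loss mid-voyage simply means the stack stays at its last stored version.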

Darren (20:29):

And we had a meeting in Teams maybe a couple of weeks later, where we looked at the options, and I will present these options to you now. We had four options, basically. The first: code custom resource interfaces ourselves. This would require a lot of resources dedicated to an area we weren't entirely specializing in, because we are trying to make the software. Kubernetes is kind of a step on the journey; it's not where we're trying to get, it's just something we need to do. And for example, we have, let's say, six or seven physical components we would have to code a resource interface for. That would be a considerable task, and it's not something we could have feasibly done with our manpower. But the thing I will say for it: it is a solution. The other three options we have here are not. They are weird workarounds.

Darren (21:33):

So the second is using symbolic links to mount devices from the host OS into the containerized operating system. And that's, I'd say, a little questionable. It's kind of a weird workaround, probably a bit janky, and definitely not suitable for production use. And then we get to using privileged containers, which is slightly worse, in my opinion, on the scale here. For example, we have security flaws introduced through this: we have direct access from the container to the host OS, which is not something we want. And then the fourth option is pretty much to give up on having the vessel as one system. And to me, that's the biggest problem of all. We've come this far wanting this one localized solution; we want to look at the system on the vessel as one thing. To then split it into two gives us, I'd say, kind of a negative feeling.
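For context, the privileged-container workaround mentioned above typically looks something like the pod spec below. The pod name, image, and device path are illustrative assumptions, not anything from Groke's stack; it shows why the option is unattractive, since `privileged: true` hands the container unrestricted access to the host.

```yaml
# Illustrative only: exposing a host device node to a pod via a hostPath
# volume plus a privileged security context. Names and paths are examples.
apiVersion: v1
kind: Pod
metadata:
  name: sensor-reader
spec:
  containers:
    - name: reader
      image: example.com/sensor-reader:latest   # hypothetical image
      securityContext:
        privileged: true      # the security flaw: full access to host devices
      volumeMounts:
        - name: camera
          mountPath: /dev/video0
  volumes:
    - name: camera
      hostPath:
        path: /dev/video0     # device node on the host OS
        type: CharDevice
```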

Darren (22:41):

So we started working with these options, and well, things weren't that great, I would say. It was kind of a tripping point for Groke Technologies. Until we got this new hope, and this is something I've actually not seen discussed anywhere yet. In my opinion, this is one of the enablers for exactly what Cheryl Hung was predicting, which is Kubernetes in IoT. We have Akri, which is a Kubernetes resource interface. Exactly from their website: Akri lets you expose heterogeneous leaf devices, such as IP cameras and USB devices, as resources in a Kubernetes cluster. It is exactly what we were looking for. And it's like, why didn't I stumble across this myself? Why am I just now hearing of this, after discussing with Polar Squad about this task we were thinking about, this tripping point we had, and they pointed us to Akri? At this point I'd been looking for the solution for five months.
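To give a flavor of what Akri looks like in practice, a Configuration resource tells Akri's udev discovery handler which device nodes to expose. Note this is a sketch only: Akri's schema has changed across releases, and the name, udev rule, and broker image here are illustrative assumptions, so consult the Akri documentation for the exact fields of your version.

```yaml
# Illustrative sketch of an Akri Configuration: discover video devices
# via udev and deploy a broker pod next to each discovered device.
apiVersion: akri.sh/v0
kind: Configuration
metadata:
  name: akri-udev-video          # hypothetical name
spec:
  discoveryHandler:
    name: udev
    discoveryDetails: |+
      udevRules:
      - 'KERNEL=="video[0-9]*"'  # match /dev/video* device nodes
  brokerSpec:
    brokerPodSpec:
      containers:
        - name: broker
          image: example.com/camera-broker:latest  # hypothetical broker image
  capacity: 1                    # one consumer per device at a time
```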

Darren (23:53):

Why have I not stumbled across this? And the answer is actually timing. The initial commit for Akri was the 14th of October, almost a full year after Groke's project started, and over six years since the initial release of Kubernetes. I do know features can't evolve immediately, and this was probably a ton of work from Deis Labs. But for a feature that's been requested for as long as container orchestration has been a thing, it's a very low turnover rate. And again, it sounds a bit negative. Obviously this is a great step in the right direction, but IoT is growing, and if we want the container infrastructure to be part of that, we have to react quickly. And this is the thing: six years for Kubernetes, and still counting for Docker Swarm. It's a long time to wait for what has been a requested and quite simple feature, and a requirement of IoT, because if we don't have these devices, it's just the internet.

Darren (25:00):

There's nothing interesting about it. So, it came out quite late for the container industry, but for us at Groke it was actually provident, because this is exactly what we needed at the time we needed it. In summary, I want to talk about a few things. One thing would be: never underestimate the importance of Bash. I make a joke about it, but thinking of our CI/CD pipelines, it's used a lot. It's used a lot in our building of, well, containers. It's used to grab the images. It's one of those things that, frankly, a lot of CI/CD is just a pretty wrapper on Bash. So be familiar with it. This is the lesson I learned. I wasn't all that familiar with it when I started in this position, and now I'm looking at Bash quite a lot. So it's been a nice crash course.

Darren (25:56):

Second takeaway: if containers want to remain relevant in the IoT space, we have to address the feature gap faster. IoT is growing exponentially. There are billions of devices being made each year with random sensors in them. You'll have some in your fridge; you'll carry them around with you. If we want this whole containerization system to thrive in IoT, especially in industrial IoT, in isolated IoT, in these systems that require a high level of control, we have to address these feature gaps quicker. And it's something I personally would've liked us to be able to handle inside Groke, but with a small team focusing in a different direction, it starts to become problematic.

Darren (26:51):

And one more takeaway that I'd like to raise is: take chances. I think we put technology somewhere it wasn't supposed to go. That's what I heard from basically every person I discussed this project with, because people hear that Kubernetes is a high-availability system. They think it's supposed to be used for web applications, or running your databases, or what have you. They don't think about it being applied in these kinds of small, localized use cases, but it gave us the opportunity to consider our ship exactly how we wanted it: as one individual system that we can schedule across the entire ship, across both nodes, as we need to. So, this is the thing. If people tell you that a technology can't be put somewhere, question that and take the chance. Sometimes, yes, you might fail, but that's kind of the core of DevOps: to fail, and fail quickly, and learn. But there are times when you will be surprised by the results, where you will not fail; you will actually find a niche use case for a technology that hadn't been considered in that way before.

Lauri (28:14):

Thank you for listening. To register for The DEVOPS Conference, go to thedevopsconference.com. If you have not already, please subscribe to our podcast and give us a rating on your platform. It means the world to us. Also check out our other episodes for interesting and exciting talks. Now, take care of yourself, and see you at The DEVOPS Conference.

Darren Richardson
Lead Cloud / DevOps Architect, Groke Technologies
LinkedIn: linkedin.com/in/greatbushybeard

Watch all recordings from The DEVOPS Conference 2021 on the event's page: https://www.thedevopsconference.com/speakers