Interview: The art of fixing a build with Zan Markan, CircleCI

The build process is one of the most important parts of the software delivery lifecycle. Whenever it breaks down for whatever reason, and your CI/CD system is suddenly showing a red cross ❌ on your main branch, it’s usually all hands on deck trying to get the build back to a green ✅ passing state. This will be a practical talk on how to go about debugging and fixing a broken build, featuring some live coding and an explanation of what’s going on.

This talk will cover tips and tricks for:

fixing builds fast and efficiently.
treating your build system well, to avoid common pitfalls that lead to flaky and often broken builds.
learning practical approaches to debugging and logging (and interpreting logs), both on a local machine and on a remote CI/CD service.

We're happy to have a guest from CircleCI, Zan Markan (@zmarkan), who is a developer advocate on a mission to educate and inspire developers on the topics of CI/CD, DevOps and software quality. We also have Anton Podkletnov, a full-stack developer from Eficode, and the discussion is facilitated and enriched by our CTO Marko Klemetti (@mrako).

Further to our discussion, you can watch the recorded talks from our insightful speakers at The DEVOPS Conference for free.

May your builds be green!

Zan (00:06):

DevOps, at its core, is a cultural thing. It's all about getting people to automate the things that machines should be doing, essentially. The first step would be to get everyone the basic knowledge and equipment to be able to contribute, and then encourage them to contribute to those processes and setup, ultimately owning the entire CI/CD pipeline as a team as opposed to relying on one person for it.

Lauri (00:34):

Hello and welcome to DevOps Sauna. The DevOps Conference is happily behind us and we are back to the grind. We had over 10,000 registered attendees and we saw over 6,500 unique visitors over the two days. It was a fantastic event and you can still watch all the speeches online for free. You can find the link in the show notes.

Lauri (00:56):

This time we're happy to have a guest from CircleCI. Zan Markan is a developer advocate at CircleCI on a mission to educate and inspire developers on the topics of CI/CD, DevOps and software quality. He's passionate about serverless technologies, mobile development, and developer experience. As he said, outside of work he enthuses over airplanes, craft beer, and the Oxford comma.

Lauri (01:21):

We also managed to snatch Anton Podkletnov from Eficode for the show. Anton is a full-stack developer and has been working on several projects involving CI pipelines, both at Eficode and at customers. As often, the discussion is facilitated and enriched by our CTO Marko Klemetti.

Marko (01:44):

Welcome everybody, and welcome to listen to the podcast together with CircleCI. Today, we're going to be talking about CI/CD, both in big and small organizations, and how the builds work, and tips and tricks for fixing builds fast and efficiently. I'm Marko Klemetti, CTO of Eficode, and together with me today, we'll be talking with Zan Markan from CircleCI and Anton Podkletnov from Eficode. If you may, let's just start by putting CircleCI on the map. If you could Zan, just give a short introduction to CircleCI.

Zan (02:26):

We are a CI/CD platform first and foremost. We call ourselves the largest shared CI/CD platform out there, and yeah, we enable teams all around the world to build and push changes to their software to their customers as quickly, as efficiently, and as painlessly and successfully as possible.

Marko (02:50):

As far as I've understood, CircleCI is offered both as a cloud offering and also on-premises. How would you see that area? If you look at CircleCI today, what are ... is this a shared emphasis, or is it something that you're leaning towards in the future?

Zan (03:06):

Yeah. Obviously, teams use us in many different ways and yeah, a lot of our users come from the cloud, which is, I believe, how we started as a company as well. But server is another prominent part of our business. We actually just released server 3.0, a new version that's installable to Kubernetes and has basically feature parity with the cloud offering. It's very popular with our more enterprise-y customers that have different kinds of requirements for running CI/CD pipelines on their own infrastructure, behind their own firewalls and so on.

Zan (03:51):

We also have a hybrid version. It's called the Runner. It essentially brings the benefit of the cloud, so everything is managed through the cloud-based version, but the builds or certain aspects of builds can be executed on your own infrastructure. Again, if you have hardware requirements, that are quite specific, or if you have security requirements, you can actually run those kinds of jobs isolated on your own infrastructure with having all the benefits from the cloud-based solution as well.

Marko (04:25):

Wow, interesting! Can you also put yourself on the map of the competition? Is this something that's very specific to CircleCI, where you see the competitive edge coming from? Or is there something else? If you look at, for example, Jenkins, the cloud providers' own CI tools, or GitHub Actions, how would you place yourself on the competitive map?

Zan (04:52):

Yeah. Absolutely. You mentioned Jenkins first. Jenkins is the open-source behemoth, essentially. It's probably one of the most popular CI/CD tools, the one people most often know when they think about CI/CD. But obviously, because it's an open-source tool, you have to do a lot of heavy lifting yourself. You have to maintain all that infrastructure, you have to deal with the whole plug-in ecosystem, maintain everything yourself.

Zan (05:26):

If you don't have this capacity in your team, it's actually like a full-time job for one or more people. On the other hand, CircleCI is a managed solution. You either use our cloud-based service, which is a SaaS, so you don't have to DevOps anything. You just use it. Or obviously install it on your infrastructure. Again, we provide all the tools, everything that you need for that. That's the distinction between using open-source tools versus using our managed or hosted software.

Zan (05:59):

With the other cloud vendors, they often have CI/CD as a feature that's complementing their entire ecosystem of tools. But that's to say they don't necessarily have the ability to focus on CI/CD, whereas CircleCI is all about CI/CD. That's what we do. Every single engineer we have is focused on making this CI/CD pipeline experience better for the developer. Yeah. Essentially cloud-based providers, and also DevOps ecosystems like GitHub for example, are competing with other kinds of providers in that space and offering features because their competitors have them. We can be focused on just providing that CI/CD excellence, essentially.

Marko (06:55):

Perfect. Sounds reasonable. I'm going to start by throwing the ball of what is the difference between small and big organizations and the teams working over the CI/CD in small and big organizations?

Zan (07:11):

In my experience, to my knowledge, the difference is usually just the number of developers that are working on things. But when you drill down to it, developers are, more often than not, still working in small teams within those organizations. There's obviously Amazon’s two-pizza team or something similar like an agile team, however you want to call them. Those kinds of small clusters, small teams of developers are very similar across the board, I would say.

Zan (07:47):

They can use the same tools, they can use the same programming languages, same programming paradigms, and once you get down to that level, to the team level, it's quite universal. Sometimes obviously larger orgs will have dedicated people who maintain those pipelines like CI/CD engineers or developer experience engineers that make sure that the tooling across the org is standardized. Some start-ups, some smaller teams may not have this ability, so more effort is spread on sharing that across the entire organization. But once you get to the team level, it's pretty much the same or equivalent.

Marko (08:35):

Sounds reasonable. Anton, do you want to add in from your experience?

Anton (08:42):

Yes, actually wondering that assuming you have a small team like let's say a maximum of 10 developers, what are the benefits of using CircleCI compared to let's say how about even self-hosted GitLab?

Zan (08:54):

So a small team, what's the benefit of CircleCI? Obviously, it's a tool that does the job. I think it does it really well. It will do the job for that team. We have some features that are very, very I would say our competitive advantage. For example, like speed or just the flexibility that we offer versus some other competitors. If a team is comfortable in a tool, if a team is comfortable in this kind of CircleCI paradigm, that's what they should essentially be ... that essentially should be using I suppose.

Anton (09:31):

Yes. Thank you. I've just been wondering: there are several projects where you actually have the CI pipeline, but the actual deployment is still not continuous deployment. You still have releases, and I realize that you can still use even CircleCI, so you just push the button when you are ready to deploy. Everybody's talking about CI/CD, but in reality, there's a lot of CI and not that much CD still. Because a lot of the enterprise solutions still want to have a scheduled release date, which means that you can still have the automation, and you can still have the development, but if you go to a sales manager and say, "We have this cloud-based solution," he says, "How can you be sure that it will work at this and this time?"

Anton (10:25):

There's still this somehow strange mistrust that the automated system will not work, and I think it's time to change that. Everybody should understand that having an automated tool doesn't mean that you can't have scheduled releases.

Zan (10:41):

Yeah. I think so too. But the reality is as an industry, we're just not there across the board. Some orgs obviously, some teams are way more advanced or way more invested into the CD aspect, and some teams are just getting started. But everyone can benefit from the same tools, everyone can benefit from having tests run, or some automation. Especially last year, we've seen that this is really, really crucial for pretty much every team too, especially when we started going all remote and so on.

Zan (11:23):

But yeah, organizations will come to this adopting the CD paradigm at some point at their pace. Some enterprises are slower, or later, some enterprises are sooner. Our job is essentially to educate and to enable anyone who wants to get there to be able to get there as quickly and painlessly as possible.

Marko (11:51):

Yeah. I'm a huge advocate of completely separating deliveries and releases from each other, so as you speak about continuous delivery, it should mean that you deliver continuously. But the delivery doesn't necessarily mean a release. I've seen that this is also a big change for many big organizations to realize that okay, we can actually deliver working software. We just haven't switched on the features that are new or for certain customers only or whichever.
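
To illustrate the idea of delivering code without releasing it, here is a minimal Python sketch of a feature switch. The `FeatureFlags` class, the `new_checkout` flag, and the customer name `acme` are hypothetical and were not mentioned in the conversation; a real system would typically load its flags from configuration or a flag service rather than keep them in memory.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class FeatureFlags:
    """In-memory flag store (assumption: real flags would come from config)."""
    enabled_for_all: Set[str] = field(default_factory=set)
    enabled_for_customers: Dict[str, Set[str]] = field(default_factory=dict)

    def is_enabled(self, feature: str, customer: Optional[str] = None) -> bool:
        # A feature is on if it is released to everyone...
        if feature in self.enabled_for_all:
            return True
        # ...or switched on for this particular customer only.
        allowed = self.enabled_for_customers.get(feature, set())
        return customer is not None and customer in allowed


def checkout(flags: FeatureFlags, customer: str) -> str:
    # The new code path is already delivered to production,
    # but it is only released where the flag is switched on.
    if flags.is_enabled("new_checkout", customer):
        return "new checkout flow"
    return "old checkout flow"


flags = FeatureFlags(enabled_for_customers={"new_checkout": {"acme"}})
```

With this setup, `checkout(flags, "acme")` takes the new path while every other customer stays on the old one until the flag is added to `enabled_for_all`.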

Marko (12:23):

Actually, you're delivering the software all the time; it might even be failing in production all the time. But the failures happen only in the areas that are either not enabled or are just the small ones. So I think we've settled on the CI/CD leaning towards the CD. Can you both talk a bit about why it is important to resolve issues with builds as quickly as possible? Actually, I see that CD is one of the reasons, but if we think that we're still working in the releases world, where the delivery isn't necessarily happening, is it still important to fix the builds immediately?

Zan (13:05):

In my experience, yes. Absolutely. It's crucial, because we take all the human factors out of the equation, and that's where most of the failures actually happen when humans make small but avoidable mistakes that computers would probably not make. As you said, software is constantly being delivered, and even though it's not automatically deployed to your Kubernetes clusters to wherever you are running it, it should still always be in a state where you can say, "Okay, I'm going to take this and I'm going to go with deployment. I'm going to go with releasing this to the end-users and customers." Yeah.

Zan (13:50):

That's the crucial bit. As often as possible, or as constantly as possible, you need to be in that state where you can say, "Okay, we can build this. We can go through the rest of the process." Whether that's automated or not is its own question. Obviously I would say ideally it would be automated, but yeah, even if it's not, that's still deliverable. Yeah. If a build is broken, if the tests aren't running, or if something else has broken, like something with the infrastructure or something with the setup, so it doesn't even compile, you obviously can't make and build any changes, and that software is then not deliverable. That's my angle on this.

Anton (14:35):

Yeah. I have similar notes regarding that. In fact, I myself have once in a while pushed some changes into the repository without actually testing whether they even compile, blindly trusting my own logic, which is of course not good practice. Sometimes the problem is not in the build but on the development side: developers did not check their changes well enough before pushing, even into their own branch. But then again, let's say your build fails: is there some sort of hidden knowledge, or some sort of hidden parameters, which you are using in your local build and which have not been documented?

Anton (15:26):

Once you have a working automated system, and there is no longer this ... Let's say the team changes or the guy who was developing this feature moves away to a different project, or even a different company, and then half a year later, somebody realizes this component no longer builds, because we are not sure what he was using. And perhaps, the developer no longer has access to the code. Let's say he deleted the repository, and he maybe forgot what his hidden commented debug lines were.

Anton (15:57):

If you have an automated script, it's either working or it's not working. You can see all the script phases in the script, there are no hidden commands that you have to input. For several customers, we have realized that moving to an automated system generally improves this transparency.

Marko (16:18):

I noticed that when moving towards continuous delivery, as you work in the cloud, Kubernetes, Docker, container world, there are two things that promote continuous delivery. The first one is secrets management: once you have your build settings in place and automated, you can usually forget about the secrets management, because the delivery then just includes the secrets management, as it should. The other part of course is that if you do, for example, Kubernetes blue/green releases or similar, it's really hard to do manually. Actually, having the automation do it for you is much better than trying to do separate CI and releasing.

Zan (17:02):

Yeah. Secrets management especially is a very interesting and potentially a tricky topic. Even though you have everything automated, let's say a developer then leaves a company, and their keys are revoked, and they are the ones who set up everything. Sometimes that can still happen even though you have the whole pipeline scripted, automated, and well-documented. Obviously, the way to get around that is to have accounts just for that service set up, so that you don't rely on a personal secret versus automated ones.

Anton (17:41):

And then another small comment related to this security. Let's say you have a project which has all the, let's say, database passwords and so on stored in the actual code, which means they're in the repository, which means some random external developer getting access to the repository will get production passwords. If you have this automated deployment, then you should design, or maybe change your design, in such a way that this kind of sensitive information is not stored in the codebase. Because if you're building it by hand, you just know: okay, I have this file here, I can use it from here.

Anton (18:20):

But then the deployment tool needs access to these files, so you should perhaps sometimes redesign your software in such a way that you no longer need them during the building phase. Maybe the secret is copied later from some config file, or is read from some config file, and not stored in the code. Which improves the security aspect.
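
The point about reading secrets from outside the code can be sketched in Python as follows. This is only an illustration under assumptions: the variable name `DB_PASSWORD`, the JSON key `db_password`, and the idea of a JSON config file are hypothetical choices, not anything the speakers specified.

```python
import json
import os
from pathlib import Path
from typing import Mapping, Optional


def load_db_password(env: Mapping[str, str] = os.environ,
                     config_path: Optional[Path] = None) -> str:
    """Return the database password from the environment or a config file,
    never from a value committed to the repository."""
    # 1. Prefer an environment variable injected by the CI/CD system.
    password = env.get("DB_PASSWORD")
    if password:
        return password
    # 2. Fall back to a config file that lives outside the repository.
    if config_path is not None and config_path.exists():
        config = json.loads(config_path.read_text())
        if config.get("db_password"):
            return config["db_password"]
    # 3. Fail loudly instead of silently using a hard-coded default.
    raise RuntimeError("DB_PASSWORD is not set and no config file provided it")
```

Failing loudly when the secret is missing makes a misconfigured pipeline visible at the step where it happens, rather than later as a confusing connection error.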

Zan (18:42):

Yeah. Absolutely. And one that I kind of forgot to mention earlier was just like if you have your build lingering for a long time that's not fixed, you're still accumulating ... It very often happens that you're still accumulating these changes, new bug fixes, new features while someone, some poor sod is essentially working on getting that thing moving, and it reminds me of that ship that was stuck in the Suez Canal for a while a few weeks ago.

Zan (19:17):

All those ships that couldn't really move because of the Ever Given going sideways had to decide, "Okay, we're going to go around Africa to get to Europe." That just seemed like, yes, this is what you do when you can't deploy or deliver with CI/CD and you have to rely on manual processes instead.

Marko (19:43):

Yeah. Luckily they were eventually able to unlock the queue, so a bit of a relief there. It's not always roses on the software development side though.

Marko (19:53):

Going back to having the CI/CD pipeline established, if something fails, how would you go about investigating a possible build problem?

Zan (20:03):

Yeah. What I would do is essentially turn it off and on again, which means trigger it again. See if it magically fixes itself. It usually doesn't, but it will give me a signal whether it's a non-deterministic thing. Then I would obviously start looking at why something fails, like look at the logs where it fails. Maybe it's a failing test, maybe it's something infrastructural. Maybe it's a key that's been missing, or an environment variable that has been revoked or mis-set or something like that.

Zan (20:44):

The component: try to really pinpoint what we're dealing with, and which step it happened in. After that point you start looking at that specific step and seeing, okay, that's something that could be looked into. Tests obviously are very different to more procedural or scripted parts. What else do we have? Yeah. If it's scripts, then obviously try to add some more logging, seeing if there is ... yeah, some environment stuff that's mis-set or just misconfigured. Maybe it was something that's been ... obviously, if it's CI/CD, and it's happened now and did not happen with the previous commit, see the differences between those two commits, and that usually gives you a good idea of what might have gone wrong and how.
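
One of the checks mentioned here, looking for environment settings that are missing or mis-set, could be sketched in Python like this. The variable names in the usage note are hypothetical placeholders; the function itself only reports which required names are unset or blank.

```python
import os
from typing import Iterable, List, Mapping


def missing_env_vars(required: Iterable[str],
                     env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variable names that are unset or empty."""
    # A variable set to an empty or whitespace-only string is treated
    # as missing, since that is a common way for a secret to be mis-set.
    return [name for name in required if not env.get(name, "").strip()]
```

Running something like `missing_env_vars(["DEPLOY_TOKEN", "REGISTRY_URL"])` as the first step of a job, and logging the result, narrows down whether a failure is an environment problem before anyone reads the rest of the log.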

Zan (21:45):

If it's tests, obviously, especially when you're dealing with emulators and external device integrations, then that can bring in some more flakiness, which is an art form itself four minutes on its own. Flaky tests, how to fix those. How many retries is enough? Do we try to avoid this and rewrite the tests themselves to get them back to working? Yeah. That's kind of my thing, and yeah, once you know where it is, then you can start looking into more advanced ways to pinpoint it.
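
The "trigger it again" signal and the question of retries can be made concrete with a small Python sketch. It assumes a step can be represented as a callable that returns `True` on success; the function name and the three labels are illustrative, not a feature of any particular CI tool.

```python
from typing import Callable, List


def classify_failure(run_step: Callable[[], bool], attempts: int = 3) -> str:
    """Rerun a step a fixed number of times and label the outcome."""
    results: List[bool] = [run_step() for _ in range(attempts)]
    if all(results):
        return "passing"
    if any(results):
        # Mixed results with unchanged code point to non-determinism.
        return "flaky"
    # Failing every time suggests a deterministic cause worth debugging.
    return "failing"
```

A "flaky" result suggests looking at timing, emulators, or external services, while "failing" suggests comparing the current commit with the last one that passed.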

Anton (22:29):

Yeah. I guess I can continue right from there, so again, the most important thing is to realize what was the problem? Did it compile? Was it the test? Was it the runner itself? Quite a common issue is actually let's say it's some sort of long-term project, and independent of the language whether it's JavaScript or Python or even Java sometimes. Some library changes. Your code didn't change, but because your build is running in a container and the new version of the library, stops either supporting some deprecated piece of code, or perhaps just changes the way it works.

Anton (23:10):

Then you're like, "I didn't change this piece of code, but it no longer compiles. What the hell?" Then you just have to realize that you have not updated your code. You have not removed some deprecated parts. Or sometimes it's even the configuration, because sometimes you have incorrectly configured the repositories to get the external tools from. Like you're using the wrong Docker image, or you're using the correct Docker image, but let's say Debian moved, or Red Hat moved to some new version, and you're using the latest one, and now it behaves differently.

Anton (23:46):

This specifically relates to, let's say, JavaScript or C++; they are infamous for strangely changing the behavior of a few libraries. Was it user error? Because once in a while, I run into a situation where I was running a Jenkins job, and then it said, "No changes." Because I didn't realize it takes a while for the Jenkins job to realize that there were actually changes in the repository, and I canceled the job. Then somebody saw that the build was failing, and there was an alarm. Why is the master branch build failing?

Anton (24:23):

You should not push crap into production, but you can sometimes just let the build finish and see it from there, because in the logs you might get some strange results if it was interrupted manually. Again, there's this attitude that red is bad: if you have a failing build, it seems like, oh no, the world is ending. The problem is if you've noticed the problem in some build and it hasn't been fixed. It's not a problem if one of the feature branches is not working, perhaps. It's still being developed.

Anton (25:03):

Because again, they have this kind of branch-like approach. Let's say you have even several developers working on the same piece of code, like the same feature. If it's not the main branch, it's okay that the build fails, because it means the developer realized that, okay, maybe my code compiles, but the tests are not running, and then keeps fixing it. Having a failing build is not the end of the world.

Marko (25:32):

Surprisingly often I notice that library changes affect the build result, and it's actually funny if you look at Alpine Linux, currently one of the de facto container bases for Docker, for example. Alpine is infamous for changing its libraries if you're just running Alpine latest. You might have bumped into, for example, Python dependency changes on the fly, and there was also a huge security flaw involving root in the Alpine Docker images for many years. It's kind of funny how, if you define your modern development pipeline relying on the library versions not changing, and then use latest, or you have npm packages, for example, for JavaScript updating themselves, it might cause huge trouble.

Marko (26:25):

What many of our customers have done is that they have a parallel build running on the side, so you have one with the locked versions, and then the other one with the versions updated in real-time. Your production or mainline build isn't failing. But then you see immediately if your so-called latest build branch is failing, and you can react there without having to compromise your mainline code.
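
As a small illustration of the locked-versions idea, the following Python sketch flags dependency lines that are not pinned to an exact version. It assumes a pip-style requirements file and deliberately uses a simplified rule (only an exact `==` version without wildcards counts as pinned); it is not a full parser for the requirements format.

```python
import re
from typing import List

# Simplified rule: "name==version" with no wildcard counts as pinned.
_PINNED = re.compile(r"^[A-Za-z0-9._\[\]-]+\s*==\s*[^\s*]+$")


def unpinned_requirements(requirements_text: str) -> List[str]:
    """Return requirement lines that could change between two builds."""
    unpinned = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        if not _PINNED.match(line):
            unpinned.append(line)
    return unpinned
```

A mainline build could fail when this list is non-empty, while the parallel "latest" build intentionally runs with floating versions to surface upcoming breakage early.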

Anton (26:52):

And yet another comment. Sorry to interrupt. Do you have let's say your previous build is working, and now the latest build is not working. Do you have some sort of rollback procedure? Is the automation in such a state that it's possible to either get back to the previous version or to actually deploy the previous version? Because in several projects the update is automated, but rollback is not. If something fails, then you need to call the guy who actually knows how to do the commands.

Zan (27:26):

Yeah. I mean, rollbacks are definitely very interesting, just because you don't think about them until you need them. The way it should work, I think, in an ideal world, is that you switch to a different commit that has been proven to work, and then the rollback will essentially redeploy everything to that version. The world is not an ideal place, so sometimes, and often, you need to get that person who knows the environment inside and out and get them to help out, or manage that. But hopefully that doesn't happen, or doesn't happen often.

Zan (28:12):

Also, shouldn't happen if you have the ability to really deploy and deliver changes, which are very, very small. Because if the changes are small, you're relying on big bad rollbacks a lot less frequently than you would if you were deploying three months worth of code essentially each time.
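
The ideal-world rollback described here, switching to a commit that has been proven to work, can be sketched in Python as picking the newest passing build before the current one. The `Build` record and the short commit labels are hypothetical; a real pipeline would read this history from its CI system.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Build:
    commit: str
    passed: bool


def rollback_target(history: List[Build]) -> Optional[str]:
    """Given builds ordered oldest to newest, return the newest passing
    commit that precedes the latest build, or None if there is none."""
    # Skip the latest build itself: it is the one being rolled back.
    for build in reversed(history[:-1]):
        if build.passed:
            return build.commit
    return None
```

For a history of `a1` (passed), `b2` (passed), `c3` (failed), `d4` (failed), the target is `b2`. This covers only the code side; database state, as discussed next, needs its own handling.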

Marko (28:36):

Yeah. Also, the rollback is usually easier said than done, so 90, 95% of rollbacks might be just redeploying the previous version, but when you have database migrations in place, how do you go about debugging either the build or the rollback procedures? Do you have any experience there?

Zan (28:57):

Luckily, not a lot. Yeah. I imagine databases are a whole different beast, and anything that deals with that usually ... ideally you have databases that are ... you take snapshots as frequently as possible, and then go to that version of the database. Potentially losing some data in the process, but ideally, you can still retrofit that data that has accumulated between a snapshot and when it stopped working, and you roll that back, but at a later stage, not as a part of that kind of rollback. Because you're mostly interested in integrity up until that snapshot. Obviously, the schemas are all working. So, I think that's a good starting point at least.

Anton (29:54):

I've worked on several projects where it's been noted that if you really have to do a database rollback, basically, we have the snapshot system and it works, but try not to do that, because it takes a while as it's a big database. The only time I actually ran into a situation where some data was lost was in my own game development. It was a multiplayer game with thousands of players, and I decided to change some item systems, and then I realized that I treated some strings wrong, and some people lost some value, but I mean, it was a game.

Anton (30:35):

The backup unfortunately was from a week ago, so I made some sort of script which managed to interpolate what they could realistically have had, and then contacted them, so if it's mismatching you can contact support. But in actual enterprise production, again, if you have financial databases, remember to keep recent backups. At least make a backup the day before release if it's already a big database, or maybe even several hours before if it's possible.

Marko (31:08):

Yeah. Many cloud services already provide the live backup, which is also good. But then as I said if you use say messaging systems like Kafka or then have real-time databases which you just simply cannot roll back, then it is definitely one of the big issues for big organizations. I will only add here that from my experience starting continuous delivery, in a big organization usually starts from avoiding these pitfalls, and trying to start a continuous delivery in areas where you don't have, for example, database migration or some other schema migrations in place.

Marko (31:48):

But instead, build for example microservice platform and then separate the databases completely and even then, if you have to use databases, use either local Redis or MongoDB, whichever solution, locally and then try to use message queues.

Lauri (32:05):

Hello again, CI/CD has many aspects to consider and trying to solve them on your own can be a tough job. We recently introduced a free online pipeline game where you and your colleagues can learn the perks of continuous delivery in a fun way. You can find the link to the game in the show notes. If you are a member of a local DevOps community, or you run a team who you would like to better learn the secrets of continuous delivery, we offer facilitated workshops where you and your folks can learn how to improve your software production. Reach us at marketing@eficode.com. Now, let's get back to our show.

Marko (32:46):

Ages ago, I had a few commits in CruiseControl, which is probably one of those that started the CI movement, like halfway through the 2000s. I bumped into Electric Cloud back then, I think it's now part of CloudBees, but they had a concept of preflights. Preflight means that you run the build before entering it into the CI machine. My question to you is, now that you're doing commits and you're checking the code locally in order for it to work before you enter it into the CI machine, how do you make sure that it actually builds, and do we have some sort of a modern concept for the preflight? Which means essentially using the CI machine before actually entering it into the CD machine.

Zan (33:44):

Obviously, it's all down to scripting it, I would say. You kind of build your own workflow in the way that it works for you. Usually, if you rely on branching to develop your software, you're working on a feature branch or even a sub-feature branch, or something else, you can say don't do the deployment unless it's on main or whichever few branches that you want. You can usually kind of filter based on that and in the workflows at least with CircleCI you can do that.
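
The branch filtering described here is normally expressed in the pipeline configuration itself. Purely to show the logic, here is a minimal Python sketch; the function name and the `release/*` pattern are hypothetical, and this is not CircleCI's own filter syntax.

```python
from fnmatch import fnmatch
from typing import Iterable


def should_deploy(branch: str,
                  allowed_patterns: Iterable[str] = ("main",)) -> bool:
    """Deploy only when the branch matches one of the allowed patterns."""
    # Feature branches still build and test; they just never reach deployment.
    return any(fnmatch(branch, pattern) for pattern in allowed_patterns)
```

For example, `should_deploy("feature/login")` is false, while `should_deploy("release/1.2", ("main", "release/*"))` is true.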

Zan (34:22):

There is also the preflight checking of the config file as well, so just the validation that your pipeline is correctly configured, done by a CLI, and parses the script and that obviously checks it for you, and you can actually do that in a pre-commit hook with Git. You can just do that automatically. Then when it comes to running unit tests, running any type of tests, I tend to still run unit tests locally just because that's how I grew up with, and I'm old-fashioned, so I still like to make sure that it's all verified. But that's very much a manual process for me.
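
A pre-commit hook of the kind mentioned could look roughly like the Python sketch below. It assumes the CircleCI CLI is installed locally and uses its `circleci config validate` command; the function names and the error message are illustrative choices rather than an official hook.

```python
import subprocess
import sys
from typing import Sequence

# Assumption: the CircleCI CLI is on PATH and run from the repository root.
VALIDATE_COMMAND = ("circleci", "config", "validate")


def validate_config(command: Sequence[str] = VALIDATE_COMMAND) -> int:
    """Run the validation command and return its exit code.
    Git aborts the commit when a pre-commit hook exits non-zero."""
    try:
        result = subprocess.run(list(command), capture_output=True, text=True)
    except FileNotFoundError:
        print(f"{command[0]} not found; install the CLI or skip this hook",
              file=sys.stderr)
        return 1
    if result.returncode != 0:
        # Show the validator's own output so the author can fix the config.
        print(result.stdout + result.stderr, file=sys.stderr)
    return result.returncode


def main() -> None:
    # A hook script saved as .git/hooks/pre-commit would end by calling main().
    sys.exit(validate_config())
```

Because the command is a parameter, the same wrapper can be reused for other fast checks, such as a linter, without changing the hook's structure.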

Zan (35:09):

As far as any kind of longer-running, more integration or functional type tests, I don't want to run them locally, because they take a lot of time. That's the point of CI: to take that and do it while I'm focusing on other things, and give me obviously the signal as fast as possible, but not keep me bogged down with just waiting for it to pass while I'm trying to concentrate on getting this feature shipped. Yeah. As far as deployment, you can always add more deployment avenues. You know how ... I think Heroku popularized that kind of staging PR deploys essentially, so that was a pretty cool feature back in the day, where you were able to just spin up a new environment for your PR to check whether something works, and for a website that's just beautiful.

Zan (36:07):

Because you can again offload all of that in an asynchronous way, and still protect your main deployment pipelines. That's kind of my take on preflight. You can do it, but building those things into your existing pipelines is probably a better way than to separate it.

Anton (36:32):

Yeah. Very similar notions again. Also, when you are developing, whatever IDE you are using, well, let's hope you are not developing in Vim; I guess this is a controversial topic. But let's say you're using some sort of graphical editor: most of them come with at least error checking, perhaps even linting tools, so you can see. Let's say you are using Visual Studio Code and you are doing some commits; you will see that your code has some sort of errors even before you commit to the branch, so you realize that something is still wrong.

Anton (37:10):

On a similar note, let's say you have this sort of branch-based development before you merge your branches to main. Let's say you have a team of five or maybe 10 or maybe even 20 developers working on several features, which are all scheduled for some sort of monthly release. A good practice, after all the branches have been merged together, is to still run the checks not on the individual branches, but on the merged version. Because sometimes even though the code in the different features works, once it's merged together, one ...

Anton (37:48):

Change A breaks change B, and then before you push to master, it's not very easy to decide what the fix is. Because you might need to spend some time fixing it, so having this kind of actual environment helps. Let's say you have a web-based application, as most of them are: you will have to see that it still works in some sort of UAT environment. It can even be temporary, like you can run the tests on this environment, but it can also be accessible to your developers.

Marko (38:19):

Yeah. I've seen that many of the best performing organizations have already moved away from the so-called staging environment. For example, Smartly is directly deploying their pull requests to a production-like environment, where the users can then just go and both demonstrate the new feature and test that it works as it should. Once it's merged, you just go into production and run it from the merge. Now that we have, for example ... say GitHub has the pull requests that are building on top of each other, it's actually a good way of doing such deployments, for example using CircleCI for the purpose.

Marko (39:02):

I would like to talk next a bit about the skills needed for running CI/CD, and what kind of new skills a developer needs in order to be able to do modern software development. What kind of skills can we get from somewhere else, for example from DevOps or Ops people? What skills are needed, and how do you see the separation between the developer's and someone else's responsibilities?

Zan (39:31):

I mean the most important skill for CI/CD is definitely very good eyes to make sure that you see all the white space in YAML. Jokes aside, obviously there is a lot of YAML, and understanding how that works definitely made me a lot more proficient with CI/CD once I really started using Visual Studio Code for example, which shows you those kinds of lines to make sure that indentations are all in the right order. I mean the most ops-y type of skill is definitely reading the logs: reading the build logs, reading the test logs, any kind of stack traces. Just kind of understanding how your platform is spitting out information about what's happened, or especially what's gone wrong, is crucial.

Zan (40:27):

If you're a Java developer, you obviously want to know how stack traces work, but also how and where to look when you see one. That's obviously specific to each platform, each tool that you're using, but that's like the main thing. Building on that, obviously you have the logs, you have the ability to see them, and it's great to be able to kind of filter through them very easily. So grep, and piping them to grep, and just kind of understanding those basic terminal commands to work with that kind of large amount of textual data is great.

Zan (41:15):

I think that's more ... historically been more of an Ops skill as well. Now it's definitely quite common, and it's one I do encourage every junior developer to really start picking up, or at least invest a little bit into. I'm not talking about understanding in depth how awk and sed work for example, because I don't know how they work. Usually, you can find that on Stack Overflow if you need a particular command. But yeah, at least using the basic terminal commands, like grepping and that kind of stuff, for regular expressions in logs and finding things in what's happening.
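Editor's sketch: for readers who have not used grep, the idea of filtering a log with a regular expression can be shown in a few lines of Python. This is a rough stand-in for what grep does, not a replacement for it. The function name `grep_lines` and the sample log lines are invented for illustration.

```python
import re


def grep_lines(lines, pattern, ignore_case=False):
    """Return (line_number, text) pairs for lines matching the regular expression."""
    flags = re.IGNORECASE if ignore_case else 0
    regex = re.compile(pattern, flags)
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if regex.search(line)
    ]


# Hypothetical build log, made up for this example.
sample_log = [
    "Step 1/3: install dependencies\n",
    "Step 2/3: run unit tests\n",
    "ERROR: test_login failed: expected 200, got 500\n",
    "Step 3/3: skipped\n",
]

# Roughly what a case-insensitive grep with line numbers would show.
for number, text in grep_lines(sample_log, r"error|failed", ignore_case=True):
    print(f"{number}: {text}")
```

On a real build log the same function could be fed an open file object, since iterating over a file yields its lines one at a time.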

Zan (42:00):

Yeah. The other tools are obviously for when you get to SSH into somewhere: you want to know how to use a text-based editor. I personally rely on visual tools for everything, and I'm still very slow and very clunky with Vim or Nano or other tools, but at least I know how to edit things and get to where I need to be, with some Googling sometimes. But that's a good skill to have as well.

Anton (42:31):

Yeah. Continuing on that topic exactly: even if you don't know how to do something, at least have some general understanding that you can Google or use Stack Overflow, or at least understand what the tools are that are available on every system. Basically use that. And of course, you don't need all the esoteric Git commands, but at least one thing which I hadn't used for a long time before ... I think it was actually git blame, to understand what the purpose of something is. You understand who has made this change and what it is related to, because very often you realize, okay, there is this feature which was undocumented and changed a few releases ago, and now something broke. You can see where it came from, and now you have some better logic to find it. And also git diff, when you want to compare two files to see what has actually changed.
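Editor's sketch: the "what has actually changed" question that git diff answers can be illustrated with Python's standard `difflib` module. Note that this is `difflib`, not Git itself, and the two file contents and the function name `show_changes` are made up for illustration.

```python
import difflib


def show_changes(old_text, new_text, old_name="before", new_name="after"):
    """Return a unified diff between two versions of a text as a list of lines."""
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile=old_name,
            tofile=new_name,
            lineterm="",
        )
    )


# Hypothetical contents of one settings file at two points in time.
before = "timeout = 30\nretries = 3\n"
after = "timeout = 60\nretries = 3\n"

# Removed lines start with "-", added lines with "+", unchanged context with a space.
for line in show_changes(before, after):
    print(line)
```

The output uses the same unified format that git diff prints, so reading one helps with reading the other.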

Anton (43:24):

There are visual tools for that. Nowadays you don't really need Git commands, but at least you should know that there are these Git commands. Because at some point, you might actually have to go to a command line, so don't be afraid of using the command line, and at least understand that you can press up to get the old commands. Surprisingly enough, even nowadays most junior developers are still typing their curls every single time. I was like, "How many minutes are you wasting every day?" But you should understand that the old command history is there, that it's a build machine, and how SSH works and how SCP works. SCP is one of the things which I still Google; once a year or so I forget which way it works. But if you need to copy something from the machine or to the machine, it's still ... we are not moving away from the command line.

Marko (44:22):

Yeah. That's for sure. I would add that for Git commands, the listener should at least find out about git stash and git cherry-pick, which are two of my very favorite things to do alongside git blame if you run into any trouble once you've developed something.

Zan (44:42):

So in essence: knowing how editors work, knowing how the command line works. Also there is a cool tool called Postman. So, if you are doing internet-based development, you can use either the browser developer tools or Postman to pick up the actual command line commands, such as curl, from Postman. That in a sense also helps you start learning the command line, working with examples coming from some other tool that you can first use graphically and then start moving towards the command line.

Anton (45:19):

And of course, holy war question: I use Insomnia, because I like the Insomnia interface, but it's an alternative to Postman.

Marko (45:26):

Well said. Well said. The next question would be how do you share understanding and skills for development across the whole team?

Zan (45:38):

Obviously, it's important. I think the best way to learn is to be involved, and it's important to really get everyone involved and equipped to be involved in the process, so we talked about the core skills being the terminal and Git and all of that. I very much encourage every single developer to understand those kinds of core terminal and Git skills, so that they can then be more involved in the rest of the process.

Zan (46:12):

Same goes for reading logs. It's a skill that needs to be not only taught but also fostered so that you keep practicing it. Once you have that, you can essentially get the whole team on top of how CI/CD is configured, and they can actually get benefit from looking at the scripts, looking at what happens in a build, what happens in a deployment, and understanding that.

Anton (46:44):

Actually, another comment. It's also the other way around. Let's say you have a dedicated DevOps guy who is doing the deployments. If he or she totally doesn't understand what the new features are, what's going on, what is going to change, it might cause a problem like: okay, this is not building, but do we have a reason? Do we have a big new feature? Are we changing some sort of libraries? Are we changing some logging? Is it expected to fail? So make sure that, let's say, your dedicated DevOps team knows what the new features are and what the upcoming features are.

Anton (47:24):

If we have a known problem, let's say we're using some library which is becoming deprecated, that at some point this build is going to break. That it's clear that maybe this is a temporary solution. I mean again, we are not in a perfect world. Sometimes we have deadlines to meet, and once in a while even if the code is good, like this solution might be changed later. But it's clear that it will break eventually, this mostly has to do with let's say outdated libraries or some questionable JavaScript or Python developments.

Anton (47:56):

Yet again, developers should make sure that the build is working before it's actually committed. If you know this is not going to compile, why are you committing? Why are you saying that this is done? The pull request itself ... even if somebody checks your pull request, you have to make sure that this is compiling before you actually do the pull request.

Marko (48:19):

That is one of the things: when you create a branch, I think the branch can fail. As Zan already said, you run your unit tests, you make sure that it compiles and builds, and you have integration and acceptance tests, maybe end-to-end tests, that you cannot run on your own computer or in the local environment. Then having the builds failing in your branch is quite okay. But at the latest when you go to create a pull request, it should pass. And of course, a red mainline is a really deep red in comparison to the branch builds that you would do with your team or as an individual.

Marko (48:59):

Before wrapping up, I would like to ask Zan: if we wanted to now go about and start using CircleCI, and find out how it works or even migrate our current builds to CircleCI, how would you go about doing it?

Zan (49:14):

If you want to start using CircleCI, the best way to do it is CircleCI.com, or CircleCI.com/developer for the developer and documentation portal. The tool is actually pretty handy in the way that once you create an account, log in, and connect your VCS provider, let's say GitHub, it gives you a list of projects that are accessible to you to pick from. Then you basically pick a project, and it usually detects what language and what framework it is, and gives you a sample configuration to get started. That usually runs some build, maybe runs a linter, maybe runs some unit tests. It's not very optimized, and it's not tweaked to your needs or to your project, but it should give you the starting point for going ahead with the rest. If you're looking to migrate from Jenkins, we actually do have a Jenkins converter tool, which essentially converts your Jenkins configuration into a CircleCI one, and that's another thing to give you a bit of an edge. It helps you to make the first couple of steps.

Zan (50:32):

Yeah. That's kind of a good place to start I suppose with CircleCI.

Marko (50:40):

Sounds good. Yeah. I remember when the Jenkinsfile was introduced, I had to create a Groovy engine for myself to actually see what was going on in the build file. Having this kind of converter definitely comes in handy. I also noticed in the CircleCI documentation that there are a number of other migration guides for various environments too, on how to get up and running with CircleCI.

Marko (51:06):

Let's do a wrap-up. What kind of advice would you, Zan and Anton, give to our listeners so that they can create and deliver apps more efficiently and, most importantly, more enjoyably?

Zan (51:21):

Yeah. For me, it's essentially get the whole team on board, get the whole team equipped and able to contribute. Because DevOps at its core, it's a cultural thing. It's all about getting people doing the things to automate what machines should be doing essentially. The first step would be to get everyone kind of the basic knowledge and equipment to be able to contribute, and then encourage them to contribute to those processes and to the setup, and ultimately owning the entire CI/CD pipeline as a whole team, as opposed to having one person relying on it.

Anton (52:06):

Yeah. Not much to add to that point. Again, the team should be, I guess, interested in ... don't blame, like "this is the frontend guy's fault" or "this is the database guy's fault". You are trying to develop an application or whatever system you are developing, but it's a team effort. You don't say the build doesn't work because the DevOps guy doesn't know how to write the pipelines. We're running a bit out of time, but again, just make sure the whole team is interested in delivering reliably.

Marko (52:40):

I'm going to add that, as Zan said, culture is the most important, an essential part of DevOps, so working in a blameless environment where everybody is working towards the same direction is the number one thing. It's often very difficult to achieve, but if you go for continuous delivery, it's essential.

Marko (53:03):

I'm going to add in that you cannot improve what you are not measuring. I know it sounds harsh after talking about culture. But the fact is if you want to start improving your current practices, you have to start also measuring. One of the ways to measure is of course starting to do CI/CD.

Marko (53:23):

CircleCI actually has a really good podcast from six months ago, "How to measure DevOps success: four key metrics". I had this article as part of my speech at The DEVOPS Conference just a month ago, and this blog article will be featured in the text below the podcast. You should go and read that as an additional read. For this podcast, many thanks to you, Zan. Many thanks to you, Anton, for joining in for the discussion over CI and CD.

Zan (53:58):

It was a pleasure.

Anton (53:59):

Thank you.

Zan (54:00):

Thanks for having me.

Lauri (54:01):

Thank you for listening. If you want to continue the conversation with Zan, Anton, and Marko, be sure to check out their profiles on Twitter and on LinkedIn. You can find the links in the show notes. If you have not already, please subscribe to our podcast, and give us a rating on your platform. It means the world to us. Also, check out our other episodes for interesting and exciting talks. All I'm saying to you now is take care of yourself, and keep up the Zero-day delivery.

Published: May 18, 2021

DevOps CI/CD Sauna Sessions

Eficode
