A week before The DEVOPS Conference, Andy Allred, Lead DevOps Consultant at Eficode, meets Mike Guthrie, Engineering Team Lead at NetFoundry, to discuss zero-trust in a DevOps environment. Will you sleep better at night if your DevOps environments are dark?

Mike (0:07):

Stop assuming trust with your internal systems. As we've adopted a culture of security within the company that I'm at, I've actually watched the teams start to think more in that way. I think I would challenge the community as well: you can move fast, but we can also be secure. And I think sometimes we get a bad rap for running fast and loose. With some of the new technology available, I don't think we have to do it that way anymore. I would just challenge the community to step up the way we handle security, especially with the tools that we're using every day.

Lauri (0:41):

Hello, and welcome to the DevOps Sauna Podcast. I wonder if you have already heard about The DEVOPS Conference. It is coming again on March 8th and 9th, and you are invited to join our event. At the conference, you have almost 40 awesome speeches to listen to and select from, and one of the themes is security and compliance, which is what we're talking about today. The topic of today is: will you sleep better when your DevOps environment is dark? What does it mean to make your DevOps environment dark? Well, we'll find out. We are joined today by Mike Guthrie, Engineering Team Lead at NetFoundry, and Andy Allred, Lead DevOps Consultant at Eficode. Let's tune in.

Lauri (1:30):

Thank you both Mike and Andy for joining the DevOps Sauna Podcast.

Mike (1:34):

Thank you, guys. It's fun to be here.

Lauri (1:36):

Thanks. The topic today, if I'm not mistaken, is zero-trust, right?

Mike (1:42):

Yep. Talking about zero-trust and, you know, will you sleep better at night if your DevOps environments are dark? With any DevOps environment you've got, you kind of have your production app that everybody works hard on: we harden it, we do security audits against it, we do pen-testing against it, all these things. But then we've got all this peripheral stuff, right? All the weird systems that you build to support the main system, and we don't necessarily put the same rigor around those, right? And it's like, we think, "Oh, yeah. Go ahead and try to break into the front door of the production app." We've hardened that thing, it's solid. But then you say, "Hey, pen tester. Come try to beat up on my monitoring system. Come beat up on my developer support portal or, you know, come beat up on the CI/CD system." I think most people would get a lot more uncomfortable. And as we started to see major hacks across the industry, they were getting into the DevOps systems, you know, they were getting into CI/CD. They were getting in through monitoring and things like that. That just kind of got us thinking. I know for me there were a few systems where it was like, "Man, the door for this particular system is really not as solid as I would like it to be." It actually started with our data warehouse, where we had cloud stuff that needed to talk to the data warehouse, and I was like, "Yeah. The door for that is just not as good as what I would be comfortable with. It's a lot of data. It's a really high-value target." And so, that's when I decided, "Let me take a stab at it and put a zero-trust networking technology in front of it. Let's see what that does for it."

Mike (3:17):           

So, it started with the data warehouse, because we've got, you know, end-users that need to talk with it and interact with it directly, but they're all over the globe. We're a remote company, so some people might be at a coffee shop, some people might be at home. So, it didn't really make sense to do a classic, you know, IP whitelisting type of model. The zero-trust model simply lent itself much better, because in a zero-trust world you're not worried about IPs and ports so much as you are about services and identities. Anybody who's accessing it is a known entity, a known identity. They have to enroll, and they are granted explicit access. So, we tried putting zero-trust in front of our data warehouse so that people could access it. And typically, coming from an Ops background, anytime somebody says, "Okay, we're going to lock this thing down or we're going to, you know, add new access controls," to me that was always synonymous with, "Okay. Now we're going to massively break things for like two weeks, and it's going to take a long time to figure out all the roadblocks that we've introduced for ourselves." And what I found was that introducing zero-trust networking was actually so much easier than I was expecting it to be, because we were able to kind of pre-validate all the changes and pre-validate everybody's access before we made everything dark.

Mike (4:34):            

And so, we got everybody enrolled. We got everybody set up with the proper access. And I'm speaking from the context of OpenZiti and our enterprise offering at NetFoundry here. One of the things that we have with it is quite a bit of metrics and visibility, so that for anybody who's accessing a service, we can actually see, you know, what identity was accessing what services, and when. And so, we were actually able to pre-validate when we did the migration and say, "Okay. Let's put these users into the zero-trust network and have them start intercepting and going through the dark network, basically, to get to the data warehouse." We could actually see, "Okay. Yeah, they're able to access it just fine," we could see their traffic flowing, everything was working, and we were able to do that before we made the service dark. It actually made it really easy, so that when it came to the switchover day, we were like, "Okay, kill the public access," and it was basically just a non-event. We killed it, and I was expecting fireworks just because of my history working with this stuff, and then it was nothing. Everything just worked and we were like, "Okay, cool. We're done. Let's go do another one." It just opened our eyes: once you kind of understand the process of what you need to do, and you get everybody set up and you have the infrastructure in place, especially once you've done it once, adding additional services, making them dark, and putting them behind the zero-trust network is easy. We were like, "This is actually way easier than I was expecting it to be, and it went a whole lot smoother than I guessed it would."

Andy (5:49):  

And when you say you made the service go dark, I have quite an operations background myself. Many times, my CI/CD pipeline or some of my monitoring tools were completely dark to the security guys because they didn't know about them. That was the way I was able to maintain them and use them from anywhere I wanted, from whatever coffee shop, and everything went just fine. But I'm guessing, you know, when we get to zero-trust, going dark means something a little bit different.

Mike (6:19):   

Yeah. And you know, when I say going dark in the zero-trust space with the OpenZiti technology, what it means is that you'll have a fabric mesh that typically is hosted in a public cloud. You'll have an endpoint, or an identity. Let's say my laptop is an identity, but also my cell phone might be a separate identity. And maybe I want to implement separate access controls for each of those, because maybe I don't need access to the same stuff on my cell phone.

Andy (6:50):  

So, do you have agents? Do you have an agent running on your laptop and on your cellphone?

Mike (6:55):   

Yes. We have what's called a tunnel that will kind of live on whatever that identity is, and that’s…

Andy (7:02):   

It's kind of a similar idea to like just a VPN client or something like that that is just an agent that routes some network connections?

Mike (7:11):   

Yeah, and so what it does is that it'll pick up and intercept the traffic. The couple of major differences it has over a VPN are that with a VPN, you're essentially opening the door and opening the gate to the whole private network, whereas this, in addition to having the encryption and so forth and having that identity-based access, also lets you implement least privilege, because you can get very explicit in terms of what that identity has access to. So at the network level, they will only have access to the things that are explicitly granted for that service. So, instead of exposing the entire network and whatever ports and IPs are available inside of it, you're only granting them access to given services. And say, for the desktop clients, you can actually see which services you've been granted access to and so forth.

Mike (8:01):            

On the other side of the equation, in terms of how you make something dark, you can place what we call an edge router. You can actually place that inside of a private network space. The way that works is it actually calls outward and maintains a persistent connection to the mesh out in the cloud. And because it's an outbound connection, a punch-out so to speak, you can actually close all ingress to that network space. So, in terms of traditional networking, you can close the door entirely so that nothing can get in. It's actually that persistent connection that allows things to go in, but only as you give explicit permission. So, essentially, you could have an entire, you know, VPC or an entire security group be completely dark with no ingress at all.
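
To picture the punch-out Mike describes, here is a rough sketch of the pattern, not NetFoundry's actual fabric protocol: an agent inside the private network dials out to a broker and only handles work that arrives over that one outbound connection, so the network space needs no inbound rules at all. The broker address and message framing below are invented for illustration.

```python
# Sketch of the outbound-only pattern: nothing listens inside the private network.
# The agent dials OUT to a public broker and handles work delivered over that
# single persistent connection. Host, port, and protocol are hypothetical.
import socket
import time

BROKER_HOST = "fabric.example.com"  # hypothetical public mesh/broker address
BROKER_PORT = 443

def handle(request: bytes) -> bytes:
    # Placeholder for forwarding to the one explicitly permitted internal service.
    return b"ok\n"

def run_agent() -> None:
    while True:
        try:
            # Outbound connection only; the VPC security group can have zero ingress rules.
            with socket.create_connection((BROKER_HOST, BROKER_PORT), timeout=10) as conn:
                conn.sendall(b"REGISTER data-warehouse-edge\n")
                while True:
                    request = conn.recv(4096)
                    if not request:
                        break  # broker closed the connection; re-dial
                    # Requests arrive over the outbound link and are answered on it.
                    conn.sendall(handle(request))
        except OSError:
            time.sleep(5)  # back off and re-dial; the agent never accepts inbound connections

if __name__ == "__main__":
    run_agent()
```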

Andy (8:46):  

Okay, but then from the user's point of view, you don't do anything different than install this client, and conceptually you think of it like a VPN client type of thing, and your services are just granted for you. But then, behind the scenes, all the networking is kind of magically, I'm waving my hands here, magically taken care of for you, and you just get connections to the services you're allowed to see.

Mike (9:09):   

Yep, that's correct. In terms of the access as well, it's managed through what we call attributes. Think of them like tags. I don't know if you've worked with Active Directory, where you put people in and out of groups and that controls their access at an application level. It's the same kind of concept, where you simply give somebody an attribute, and that might represent, like, an access group or things like that. And that's essentially granting access at the network level. So, in addition to whatever authentication your application might have, this is granting them access at the network level just so that they can even talk to that network service.
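
To make the attribute idea concrete, here is a minimal sketch of attribute-based access with made-up identities, services, and attribute names. It illustrates the concept of matching identity attributes against service policies, not the actual OpenZiti policy engine.

```python
# Hypothetical illustration of attribute-based access (not the real OpenZiti API).
# An identity carries attributes (tags); a service policy grants access to any
# identity holding a matching attribute, instead of whitelisting IPs and ports.

identities = {
    "mikes-laptop": {"attributes": {"#data-team", "#ops"}},
    "mikes-phone":  {"attributes": {"#ops"}},
    "ci-runner-01": {"attributes": {"#ci"}},
}

service_policies = {
    "data-warehouse": {"allowed_attributes": {"#data-team"}},
    "grafana":        {"allowed_attributes": {"#ops", "#data-team"}},
    "deploy-target":  {"allowed_attributes": {"#ci"}},
}

def can_dial(identity_name: str, service_name: str) -> bool:
    """True if the identity holds at least one attribute the service policy allows."""
    identity = identities[identity_name]
    policy = service_policies[service_name]
    return bool(identity["attributes"] & policy["allowed_attributes"])

if __name__ == "__main__":
    print(can_dial("mikes-laptop", "data-warehouse"))  # True
    print(can_dial("mikes-phone", "data-warehouse"))   # False: the phone lacks #data-team
```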

Andy (9:46):   

Right. Okay, and I guess you do integrate with some Active Directory or user directory somehow?

Mike (9:52):

Yeah. So, for the enterprise offering, yes. We have something that lets you sync your Active Directory groups, and those groups can become attributes and so forth, so you can maintain that. In that way, you're using Active Directory to manage the network access as well.

Andy (10:07):

Right. So, you're not setting up a new set of data for all the users. You're just kind of reusing what's already there and mapping groups to services and such.

Mike (10:16):

Yep. For the customers that we have that are larger, they might have a couple thousand users, and that's kind of the only feasible way that it works. The other thing that we're doing as well, and we should hopefully have an article coming out about this pretty soon, is we're trying to document and demonstrate how to do basically network as code, where you can actually take a YAML definition and, you know, tie that to your CI/CD system and so forth. And basically, you have declarative networking, where everything is defined in a YAML file. You simply submit that YAML file, and it'll use some libraries underneath to basically diff and manage that.

Mike (10:55):

So if you say, "Okay, these 10 endpoints, they need these attributes," you can commit that to your version control, open a pull request, and then, you know, your CI/CD system handles actually applying that change for you. That way you don't end up with the same challenges that you have with, say, AWS, where people are clicking around on the console and making changes and you don't know what they did. So…
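
A rough sketch of what a network-as-code apply step like that might look like: a YAML file declares which endpoints should carry which attributes, and a small plan step diffs it against the current state. The file layout and field names here are invented for illustration; the article Mike mentions would show the real format.

```python
# Hypothetical "network as code" plan step: declare endpoint attributes in YAML,
# diff against current state, and print what would change. Field names are made up.
import yaml  # pip install pyyaml

DESIRED_YAML = """
endpoints:
  mikes-laptop: ["#data-team", "#ops"]
  ci-runner-01: ["#ci"]
"""

current_state = {
    "mikes-laptop": {"#ops"},   # missing #data-team
    "old-build-box": {"#ci"},   # no longer in the desired file
}

def plan(desired_yaml: str, current: dict) -> list[str]:
    desired = {name: set(attrs) for name, attrs in yaml.safe_load(desired_yaml)["endpoints"].items()}
    changes = []
    for name, attrs in desired.items():
        have = current.get(name, set())
        if attrs - have:
            changes.append(f"ADD {sorted(attrs - have)} to {name}")
        if have - attrs:
            changes.append(f"REMOVE {sorted(have - attrs)} from {name}")
    for name in current.keys() - desired.keys():
        changes.append(f"DELETE endpoint {name}")
    return changes

if __name__ == "__main__":
    for change in plan(DESIRED_YAML, current_state):
        print(change)  # a CI/CD job would then apply these via the management API
```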

Andy (11:15): 

Okay. So then, the big advantage of this over, like, a traditional firewall or even, you know, network firewall rules, is that traditionally you just block the unneeded ports, but here you're blocking even the needed ports unless you already have a client and are part of the mesh.

Mike (11:33): 

Yep, exactly. Because you can make connectivity work with no open ports at all, at least, you know, within your traditional firewall and things like that, it creates a lot of new possibilities in terms of creating that secure connectivity. You can do it quickly without creating, you know, security holes. I remember at the previous company I was in, the first job when I moved from being a developer to a DevOps guy was to revamp the monitoring infrastructure, because I had a background in that. So, of course, I stand up the new infrastructure, but one of the first things I needed to do was go sit down with a Linux admin and be like, "Okay. I need you to punch these 26 holes in the firewall because I need to monitor everything." And so, I had my list of all the things. And although that was necessary, it's not ideal either, because of course now I'm just punching new holes in the firewall and creating new points of entry.

Andy (12:23):

It's known tools that you’re using so the ports are well-known and kind of everybody on the internet is using the same ports for the same services. So, it's easy to figure out which ports to try checking and whatnot.

Mike (12:35): 

Yeah. And of course you still do your best to try to lock down access to only known sources and things like that. But even with that, anytime you're leaving a port open, and I think we're seeing this emerge more and more, you're still subjecting yourself to network-level attacks, and you're still exposed to any zero-day vulnerability that's ever going to get found for the thing that's on the other side of that port. And we've seen a few of those in the last few years too. There are a couple of tools that I really like that had zero-days come out, where you're almost like, "Wow. That's horrendously bad that that particular tool had a zero-day," because, you know, people are using it to manage their infrastructure and automate a lot of things. And I think, as I've started working with this, particularly for us in the DevOps space, you know, we don't like to administer systems. So, our way around that is we try to automate everything. We build a system to administer the systems for us. And so, in order to do that, we of course give it all these superpowers to do all these incredible, awesome things. But then, if we stop and think about it, it's like, "Well, what happens if that really cool automation system gets compromised?" Think about the level of damage that that can do, because we've now replaced a human operator with an automation tool, you know, that's full of code. It's a CI/CD pipeline.

Mike (13:58):          

So, of course, what does it do? It runs database migrations. It deploys and executes code in all the places that actually matter and, you know, runs configuration management for us, and it does all these things. It's like, "Well, that's great," but I mean, why do you think some of the major compromises we've seen in the last few years are going after CI/CD? Well, it's a perfect venue. They're not going to come in the front door. You've got, you know, encryption all the way up to your load balancer. It's like, "Well, they're not going to try to crack your encryption. That's hard." They're going to go around the back door and be like, "No, let me just get inside your private network space," because back there, you know, your security posture is weaker and you're not trying so hard to protect everything.

Andy (14:38):

Yeah. And conceptually, you kind of think, well, this is in my CI/CD pipeline, so I only have the secrets or the tokens or the users in there. It's one place I need to keep secure. So, this is already better, but then it's also one place to attack, one place to look at. And when you get that, you get the keys to the kingdom.

Mike (14:57): 

Exactly. I think that's why, for us in the DevOps space, we need to be thinking more in this way. I think that where we've seen really bad system compromises, it's because we have assumed trust in a lot of our internal tooling. With a public API, we don't assume trust because it's public. It's the front door. We promote it, we sell it. But our internal tools, our data warehouses, our CI/CD systems, our monitoring systems, we don't promote them publicly, and we don't protect them with the same rigor either. And I think these are the systems that we need to really start thinking about how to secure better.

Mike (15:25):          

And these are all systems that really lend themselves well to the zero-trust model, because generally it's a very finite list of people and entities, and only known and trusted people should be accessing these systems. And therefore, I think they lend themselves really well to zero-trust, because there's no reason, you know, a port needs to be open to anybody everywhere for a given system like this. It's like, only the people I know personally and have granted access to explicitly should be able to access this. So, I think these types of systems lend themselves really well to zero-trust.

Andy (16:08):

Yeah, and I guess if you're granting trust at a per-service, per-environment, per-user kind of level and blocking all connections except those, then it's much more comfortable to give kind of temporary access, or to give developers more access than you typically would, and whatnot. And I guess that has some speed benefits as well for developing, debugging issues, and whatnot.

Mike (16:36): 

Well, we've probably all been put in the position where it's like, this developer, something's broken in prod, and this problem only exists in prod, we can't reproduce it anywhere else. We need to give this developer access to talk to prod so they can debug it. But we've all had that person where it's like, I don't know if I want to give this person access to, like, all the things. I'm a little nervous about that; maybe they're new, or maybe they're just not familiar with the other 80% that's up there. And you're just like, "Gosh." You know, it can be a little disconcerting for things like that, or operations versus developers. Or you've got one developer that is more focused on build-related stuff and another developer that's more focused on, you know, operations-centric stuff, things like that.

Mike (17:18):          

You can separate and isolate their access within even the same environment. You don't have to open up the whole door to everything. And as we think about securing our systems, that'd be the thing I'd challenge people in the industry, in our space, to start thinking about: stop opening the door to the entire private network space, because it's a massive opening that we leave, and I think that's what hackers are targeting these days. It's like, I just need to find a way to get inside, because once I'm inside, you're basically inside the city walls. You can wreak havoc, and we don't put in the same protections there.

Andy (17:54):

Yep, yep. Exactly.

Lauri (17:55):

I was thinking about when you said that basically all the systems themselves are automated, and then we are sort of putting our effort into the automation systems themselves. That raises the question of whether performing security testing on the actual output of the CI/CD is as important as performing security testing on the CI/CD itself.

Mike (18:21): 

Yeah. Typically, you know, when you have a security audit or compliance checklist, and this is just, you know, real-world having to deal with it, at some point you have to put a scope to it. You have to say, "Okay, the boundary lines in terms of what we're going to audit, they start and stop here." And typically, we'll put those within, you know, only our production accounts or only our production application space and so forth. We just typically don't extend those security checks beyond that, because, just from a practical standpoint, it is very difficult to audit literally everything in your entire ecosystem. It's not very practical. So, of course, we don't do it, but as a result, we also kind of shoot ourselves in the foot, because there might be some glaring vulnerabilities in that space, and those systems are actually deploying and executing code in the production space. It's kind of a gap that I think, just out of being practical and people needing to get their jobs done, they've maybe willfully overlooked. And maybe for audits and things like that, you know, maybe that's what they need to do and so forth.

Mike (19:24):          

But it's more that we just need to think about securing these tools better. Having done it, and I started doing that with the data warehouse and was able to do that pretty smoothly, I'm like, "Well, this works. Cool. Let me do that to our CI/CD as well." I was able to lock that down and secure it as well. Having gone through it a couple of times, now we're looking at it for SSH access, you know. The common practice in the cloud is you've got, you know, a bastion to get into, which serves as the door, and it's like, "Well, there's no reason that bastion port 22 needs to be open to the world anymore. We're just going to put it behind Ziti and we're going to, you know, make it zero-trust now. So, only those people can even hit that port, or basically, we just close off the public port and we put the tunneler on the bastion itself."

Mike (20:05):          

And so, it tunnels and actually exits on localhost on the bastion and then forwards on from there. So, we're in the process of moving our bastions to zero-trust and making them completely dark. Once you've done the model once, you realize, "Okay. This actually isn't so bad." I stood up a Grafana server recently. Who doesn't love Grafana in the DevOps space? But I realized, "This is connecting to a lot of data sources." So, why not, from day one, just make it dark, because we know how to do this now. For us, it's becoming kind of the new standard: let's just make it dark, because there's no reason to open this up. The only people who ever need access are internal, and they should be trusted parties. We're starting to do that now from day one, which is kind of cool, that we're able to launch new tools that are completely dark, have them just work seamlessly, and that's part of our internal processes now.

Lauri (21:00): 

It's Lauri again. We had a webinar with Bankdata, one of the largest financial companies in Denmark. In this webinar, they share their DevOps journey and tooling choices, and focus on how DevOps practices and tools help integrate security and compliance requirements in software development. You can find the link to the webinar in the show notes. Now, let's get back to our show.

Andy (21:29): 

We talked a lot about users getting access, and developers and groups, but does this apply the same way when we're talking about services that talk to other services?

Mike (21:40):  

It'll depend on your situation, in terms of what you have control over. What some of the customers that are most interested in Ziti care about, simply because it's open-source and so forth, is the fact that you can actually do a fully app-embedded implementation. So, you can actually have an application itself be the identity, and then an application on the other end be the identity. You can actually do an endpoint-hosted service, so to speak. So it's fully encrypted, like truly end-to-end, where from the accessing application all the way to the terminating application, your SSL termination is application to application. There's no point where your traffic is ever unencrypted. It's been hard to get people thinking that way in terms of zero-trust, but for us, when we say zero-trust as a company, that's actually what we're really talking about: you truly don't trust anything. Not even, you know, your Kubernetes pod or your ECS cluster, anything. We just assume everything is hostile and we want everything encrypted completely. And there are customers looking at us simply because they have incredibly strict security requirements, and that's actually what's most appealing to them, because they're like, we never have to decrypt our stuff, ever, and there's never a point where that traffic can be intercepted, because you're encrypting application all the way to application.
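
For the app-embedded case, the OpenZiti project publishes language SDKs. As a hedged sketch, assuming the Python SDK's load() and monkeypatch() helpers work roughly as its examples show (check the SDK docs for exact usage), embedding a client might look something like this, with the identity path and service name as placeholders.

```python
# Hedged sketch of an app-embedded client. This assumes the openziti Python SDK
# ("pip install openziti") exposes load() and monkeypatch() roughly as in its
# examples; verify against the SDK docs before relying on it. The identity path
# and the service URL are placeholders.
import urllib.request

import openziti

# The app enrolls once (out of band); the resulting identity file is "who it is".
openziti.load("/path/to/app-identity.json")

# Patch the socket layer so ordinary HTTP clients dial Ziti service names
# instead of routable IPs; traffic is encrypted from this app to the far app.
openziti.monkeypatch()

with urllib.request.urlopen("http://warehouse.ziti/health") as resp:  # made-up dark service
    print(resp.status)
```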

Mike (23:01):           

And it's a fully app-embedded implementation, and it's truly a zero-trust situation, where it's this application talking to this application. There's no middleman anywhere. You're not always going to have that. You might only be able to control one side of the application, or you might not be able to control either side, in which case that's where the tunnelers come in. And the idea is that you try to get those tunnelers as close as possible to where the traffic is intercepted, and then of course, to where it is exiting the fabric on the other side. You generally want to assume, in your ECS cluster or even your Kubernetes pod, that the other containers inside it are hostile and so forth, and so you want to try to get that tunneler as far in and as close to the actual destination as possible.

Andy (23:46):

Right. Okay. So then, for like external SaaS services or things like that, you would want to try to expose them on the inside of your really private, locked-down network and put the tunneler there, so that people then connect to them via the tunneler?

Mike (24:01):

Yep. Yeah, we did have to, like for our data warehouse, for example. We did have a cloud service that needed to talk to it, because we used it for data visualization. So, we did have to create a whitelist, basically. You go find the outbound whitelist from that particular service provider, and we did have to whitelist those IPs, because we can't control that side of the equation. You know, we could control the end-user access and so forth, but we couldn't control the cloud provider, because they were also a public service. So, there are times where your hands are a little bit tied and the best you can do is, you know, create a whitelist for those types of situations. We've had that for a few things; I had that for webhooks coming from our version control provider. We needed those webhooks coming in for our CI/CD, so we still had to create a whitelist for that. Whereas everything else is using zero-trust access to get to it.

Mike (24:58):          

Sometimes your hands are tied and you do the best you can with the options that are in front of you, but I kind of look at it as: let's be practical. Let's be realistic. Let's do what we need to do to raise the bar on our security, but we also need to get our jobs done too. I think what was fun for me, and you may have experienced this too, you talked about keeping the CI/CD systems hidden from the security guys, is that there are times where DevOps feels like we're the troublemakers, because part of our job is to make systems talk to each other and automate things and wire stuff together. So there are times where it's like, "Oh, the security guys hate us," because we have to go walk around and be like, "Hey, I need you to open the doors to all these things so that I can automate stuff." And so, you feel like you're walking around punching holes in security all the time, because of the nature of wiring stuff together; you have to open doors and make things talk. And that was one thing where, being able to work with this model, because it lets you connect things together quickly, you're not that problem child anymore, opening all these doors. It's like, "No. I'm actually elevating our security posture for our internal systems and moving fast, and I'm still able to wire stuff together quickly."

Andy (26:07): 

And then you're able to walk to the security team and give them a blank piece of paper and say, “Here's all the IPs and ports you need to scan.”

Mike (26:14): 

Yeah, exactly. That's what they look for, and they're like, "Well, there aren't any. Good luck," you know. I once had a friend that I worked with who was a fantastic security engineer. He kind of helped change my perspective about security, where he said, "You know, most hackers, they don't want to work hard. If something's difficult, they're going to move on and go find low-hanging fruit," because it's just easier. If they see low-hanging fruit, they're going to go get it, simply because they can. You make it hard enough, they're going to move on. And I think about that and it's like, "Okay. If you don't have any open ports as ground zero, well, good luck." They might get somebody's system, but they're not going to get mine, because there's not even a point of ingress. So, have fun trying to get into that. You always have to be diligent and keep thinking about, you know, what spaces are kind of left out in the open and so forth. But at least for me, when I talk about how I sleep better at night, it's like, "Well, okay. I no longer have my data warehouse open to the public internet where somebody can brute force and try to get in." No. It's dark, you can't get in without trust, and that I can live with. I'm comfortable with that. I'm not worried about that problem anymore.

Lauri (27:24):

And since everything's going through this fabric network, I guess it's pretty easy to have an audit of everything that's happened, and then who did it.

Mike (27:33): 

Yeah. I always like to talk about this, because I got to be a part of helping build it, actually, getting to be at the ground level for the OpenZiti technology and so forth. I looked at traditional network monitoring technology, and it's actually why I came to work at NetFoundry: I was so frustrated with traditional network monitoring. Because I looked at it and I'm just like, "Nobody's done anything cool and interesting in this space for like 20 years." We're still using SNMP to talk to the routers and get the utilization. This is horrendous, like it's the worst. That's, you know, just a simple networking protocol. There's not much new happening in that space in terms of where the world is evolving to. And so, being able to be a part, even at the design level, of having those discussions of, like, how do we create visibility in a networking space that you simply can't get with traditional hardware?

Mike (28:24):          

That was really exciting. So, getting to build that and also being a part of bringing that out with our enterprise product, because it's all identities and services. We don't care about IPs and ports anymore. So, we're not using NetFlow and things like that, which is just, you know, a massive lake of data that's barely useful. No, for every flow of data that goes through the network, we know exactly who it was, we know what service they were accessing, and we even know which edge routers it passed through on the way. It makes it so that if we even have a question like, "Hey, who went through the bastions in the last week? Who was actually logging into the pod?" If you have that question, or that concern, or if you ever have that day where it's like, "Hey, we think somebody internally did something. Can we go check?" If you have your system set up as zero-trust, you have all that accountability too, because you can actually go back and look at the traffic and see, "Oh, yeah. These three people used it in the last week, and we can see that the use cases are all valid. Here's what they were going in to do." You have basically a traffic audit at the network level of who specifically accessed what services internally.
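
The kind of after-the-fact question Mike describes, "who went through the bastions in the last week?", becomes a simple filter over identity-aware flow records. The record shape below is invented for illustration; the enterprise product exposes its own metrics and event APIs.

```python
# Toy illustration of answering "who accessed what?" from identity-aware flow
# records. The record fields are made up for the example.
from datetime import datetime, timedelta

flows = [
    {"identity": "mikes-laptop", "service": "bastion-ssh",   "at": datetime(2022, 2, 21, 9, 15)},
    {"identity": "ci-runner-01", "service": "deploy-target", "at": datetime(2022, 2, 22, 3, 2)},
    {"identity": "alice-laptop", "service": "bastion-ssh",   "at": datetime(2022, 2, 23, 14, 40)},
]

def who_accessed(service: str, since: datetime) -> set[str]:
    """Return the identities that dialed a given service after `since`."""
    return {f["identity"] for f in flows if f["service"] == service and f["at"] >= since}

if __name__ == "__main__":
    last_week = datetime(2022, 2, 24) - timedelta(days=7)
    print(who_accessed("bastion-ssh", last_week))  # {'mikes-laptop', 'alice-laptop'}
```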

Mike (29:29):          

And that's something that I think is unique. I still remember that at a previous place I worked, I got a question one day. It was, "Hey, we had to let this person go." It was kind of that awkward question of, "We maybe have some concerns. Can you go check for anything weird or unusual, you know, related to this person, around your systems?" And of course, that's a horrible question, because, number one, you don't have adequate logging to go on, because you assumed trust inside of your space and so forth. So this person could have gotten into anything; you don't know. And you don't have clear visibility into what they were talking to. All you can do is go mine logs and look for anything weird, and things like that. But again, looking for anything weird, you don't even know what you're looking for, because really what you needed was to see what that individual was accessing and whether any of it was concerning. In the zero-trust world, you actually have network-level auditing of literally all the services that they interacted with, because you have all of that traffic captured. It kind of solves that problem too, because you have accountability right out of the gate; everything's zero-trust, and you can see exactly who was accessing what at any given time. In the same way that you have accountability with version control, now you have that with your network access too.

Andy (30:45): 

Are you capturing also what they're doing with the service or just that they're connected to a service?

Mike (30:50):

Just that they're connected and that the traffic passed through it. Because there's a point where, once it exits the fabric, you know, now you're doing whatever that application or service is doing and so forth. So, there's a point where it's hands-off and it's no longer seeing that, but it does at least record, you know, who's accessing what and so forth.

Andy (31:11):  

Yeah, okay. And then, it'd be up to application logs and whatever for anything further, but at least you'd be able to see these are the services that were connected to and at what time and…

Mike (31:24):  

Yeah, because with certain applications, you know, cloud providers, service providers, some of them do an okay job and some of them do a great job of auditing within their application. But I think where the gap was, historically, is that there's not a great way at the network level to just simply see who's accessing network services. Let's say you've got a Redis instance and you've set up developer access to it; you don't know who's accessing Redis, or some sort of shared dependency, things like that. If it's a more traditional network dependency, a lot of these things don't have any effective auditing whatsoever in terms of who's been accessing them directly. And for those types of scenarios, it's great, because you can actually see who's accessing that kind of stuff. So, for a developer access scenario, if you need that kind of accountability, you have it out of the box.

Lauri (32:19): 

I have a question about zero-trust, specifically when we look at the CI/CD pipeline, or rather the toolchains. It often happens that the more complicated the business environment is, the more independent tools that toolchain comprises. And then there's a whole architecture question about how we basically construct this toolchain. On one end, you have version control and code analysis. On the other end, you have the artifact repository and scanning, and then automated testing, deployment, and then environments. And if I think of the traditional data center network logic, you have the lobby, and then you have the frontend for the applications, basically web servers and application servers, and then you have the backend for databases. And you could say that it's like an increasing level of security alertness when you go from the public internet to the lobby to the frontend to the backend. And if I'm thinking zero-trust, I mean…

Andy (33:15): 

At least in theory.

Lauri (33:16): 

Well, surprisingly enough, also in practice. That's a separate episode, and I speak from my experience. It was surprisingly complicated to do, but it was achievable. So, my question is, when you think of this from a zero-trust perspective, first: does this sort of idea of a layered, increasing level of security awareness towards the backend still exist, and would you then have to design your model in a way that you are always trusting from the inside to the outside? So, how do you approach this from a toolchain standpoint, where the toolchain comprises multiple tools?

Mike (33:54): 

I think, in terms of that, you actually start to think of these individual tools as identities of their own. Again, they're still known. You can assign an identity to them, you can assign explicit trust to them, and still grant least privilege around them. I think that's a strength of it too, just being able to say, "Okay, this testing toolchain, it only needs access this far." You can assign it a known identity and so forth. There's auditing around that, you know, when it's running and so forth. So, in terms of the expectations of even when it's running, you can audit that, you can review that, with the varying layers and things like that. The fact that you can get very explicit and very layered with the access control puts you in a position to manage a situation like that and actually create that kind of trust.

Mike (34:39):

In terms of zero-trust, it's not that nothing has access to anything; it's that your baseline is that nothing can talk to anything. It's the idea of, "Hey, you can only come into my house if I know you and I know who you are. Yeah, you're good. You check out, you can come in, because you're a trusted identity; I know exactly who you are." And it's granting access in that way. It's a different way of thinking about it, because a lot of times previously, you'd set things up where you might have two data centers, and you have to create a bridge between the two data centers. Well, now if data center A gets compromised, basically data center B is also compromised, because you've had to create a bridge, open the door between them, and link them. That's the piece where, in a zero-trust world, we've got to get away from that thinking. No: assume anything inside data center A is hostile, and assume everything inside data center B is also hostile, but certain trusted entities inside data center A can talk to certain explicit services inside data center B. It's trying to narrow the access and create a trust model around that, where we're isolating access at the network level. We're naming things explicitly, so it's not just an entire subnet that gets access. No, it's a specific thing. It might be a server. It might be a container, it might be a specific application or Python script. Do you know what I mean? Where that Python script has an app-embedded implementation, where inside of the code it encrypts the traffic and sends it across, because it supports that type of model. And so, it's trying to get away from opening the entire floodgate in terms of, you know, I need data center A to talk to data center B, so now the door is just open and anything in A can talk to B. From a security model, that's not so good, at least in the world that I think we're having to move towards.

Andy (36:32): 

So, you're kind of setting up that entity A talks to entity B, and the fact that they're in different data centers or regions or whatever is almost irrelevant.

Mike (36:43): 

Exactly. And that's the speed element, where you actually can move faster this way, because you're not having to go talk to the network guys, who are grumpy at you because now they've got to, you know, do more peering and open more stuff. You haven't had to create any new doors anywhere for entities to talk. I can connect my systems, wire stuff together, I can get my job done, and yet I have not created new openings anywhere from a security perspective. It kind of doesn't matter where entity A is or where entity B is, because the other thing that you can do with OpenZiti is you can create phony DNS names, and it will intercept on those. So you can even mask the actual destination address. You can say, you know, sandbox.database.ziti, and it'll intercept that address, ship it across, and send it to wherever that database actually is. And you're not even revealing the actual address on the other side.
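
The phony-DNS idea can be pictured as a mapping from a made-up intercept address to a real destination that only the far side of the fabric knows. The config shape below is illustrative, not OpenZiti's actual service configuration schema.

```python
# Illustrative only (not OpenZiti's real config schema): the phony name that
# clients dial is mapped, on the far side of the fabric, to the real destination,
# which is never revealed to the client machine.
service = {
    "name": "sandbox-db",
    "intercept": {"address": "sandbox.database.ziti", "port": 5432},  # what clients dial
    "host":      {"address": "10.20.3.17",            "port": 5432},  # real target, private side only
}

def client_view(svc: dict) -> str:
    # The local tunneler answers DNS for the intercept address itself, so the
    # client never learns the real IP behind the fabric.
    return f'{svc["intercept"]["address"]}:{svc["intercept"]["port"]}'

def hosting_side(svc: dict) -> tuple[str, int]:
    # Only the edge router or tunneler hosting the service knows the real address.
    return svc["host"]["address"], svc["host"]["port"]

print(client_view(service))   # sandbox.database.ziti:5432
print(hosting_side(service))  # ('10.20.3.17', 5432)
```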

Lauri (37:38): 

Right. When you secure databases, do you have plugins for MySQL and Postgres and whatever to enable this, or is it something you add in your container as a sidecar type of thing?

Mike (37:52): 

There are different places that you can intercept; it kind of depends on your setup and so forth. We've done a lot of work around a JDBC driver, basically so that it can be somewhat generic and we can use that. That's one of the things that we are continuing to develop and build out. But MySQL and Postgres were kind of frontline targets that we wanted to hit, because of course those are extremely common technologies that people use, and that allows people to embed Ziti into their application pretty easily if they can use a JDBC driver. But it's also allowed us to do things like, basically, a Zitified JDBC driver for IntelliJ, for example. So, you can actually use that in your IntelliJ app itself; that's actually where it's embedded, and it'll intercept within the application, as opposed to on your laptop.

Andy (38:44):

Yeah, yeah. Okay. This is really intriguing. And something I've taken a look at briefly here and there but want to take a look deeper. How would I get started? How does one install a zero-trust network?

Mike (38:58): 

I think the easiest way: there are two offerings that we've got. There's OpenZiti, which is the open-source offering, and there's some quick-starter material that lets you stand up a network locally. And then there's the enterprise offering NetFoundry has. We're just revamping our plans, but there's basically, I forget the name of it, a starter plan where, if you've got fewer than ten entities in your network and so forth, it's the free tier model. That'd probably be the quickest way to start, because in the same way that the AWS console lets you kind of click a few buttons and you've got some stuff running, it's the same kind of deal, where it tries to get you started standing up a network. There will typically be the cloud-hosted fabric portion, and then you install either tunnelers or edge routers in your infrastructure and start making things dark.

Andy (39:48):

Okay, and how feasible is it to actually set it up in real-life the first time outside of hello world? Usually, the hello world getting started guide is like click-click, click “Oh, hey, this is great.” And then you do it with a real app and it's sometimes a different story. 

Mike (40:04): 

Yep. After the first time, I was relying a little bit on pre-installed infrastructure, but imagine clicking a button which initializes your network, then you click another button to initialize your fabric mesh in the cloud. Then we have a VM image that's available for most major cloud providers; you drop that somewhere inside of your private network space, whatever your cloud provider is, and then you download a tunneler client for your laptop. And that gives you, end to end, all the components to kind of get started. And then from there it's defining, you know, what you want your access controls to be. So, you'd have to define what's the actual service that I want to talk to on the other end, that I want to be locked down, and you give your endpoints, your identities, the attributes necessary to talk to it.

Andy (40:48): 

Right, okay.

Mike (40:49): 

But getting started from ground zero, you'll probably spend more time familiarizing yourself with the UI than with the actual product setup. Most of that learning time is just understanding the concepts, but in terms of actually setting up the infrastructure, it's very lightweight. It's like, well, how long does it take you to download a VM image and fire it up in your infrastructure, and then download something for your laptop? In terms of the setup and installation, that's about how long it takes. The rest of the time is simply defining your policies and access.

Andy (41:17):

Right. Okay, and then synchronize with AD and do all the other stuff later on as you kind of develop your knowledge and understanding of how it all goes. Okay. Sounds good.

Mike (41:30): 

I think for me, it's been a fun journey, because I joke with my guys internally that I'm not a Kool-Aid drinker. I'm not a marketing guy, and I probably wouldn't be one. But I do like tools in the DevOps space that I can get a lot of mileage out of and that help me solve a variety of problems, because once I land on a tool like that, I kind of adopt it into my repertoire. And in terms of that connectivity problem, as a DevOps guy wiring stuff together, this one does a great job of really solving it for me: I still get to move fast, I can get my job done, and I don't make the security guys hate me every time I set up something and wire systems together. That piece of it, once I started using it myself, was pretty exciting for me.

Andy (42:13): 

So, you get to do the move fast and break things without exposing them?

Mike (42:17):  

Exactly. Yes.

Andy (42:19): 

Cool.

Lauri (42:20): 

The shared secret. So guys, are there any other thoughts that come to mind that we haven't discussed about this topic and that you absolutely have to say before we wrap up?

Mike (42:32): 

I think for me, it's just a challenge to the DevOps community to stop assuming trust with your internal systems. As we've adopted a culture of security within the company that I'm at, I've actually watched the team start to think more in that way. I think I would challenge the community as well: you can move fast, but we can also be secure. And I think sometimes we've gotten a bad rap for running fast and loose. With some of the new technology available, I don't think we have to do it that way anymore. And I would just challenge the community to step up the way we handle security, especially with the tools that we're using every day.

Lauri (43:14): 

Thank you. What about you, Andy?

Andy (43:16): 

I quite often have been the one who's like punching holes here and exposing services there just because I wanted to move fast. And then, I almost always regret it later on when I realize what I've done. So, I think this is a good idea. Definitely, I'm going to be playing around with it some more and see how I can move fast and break things internally without exposing them.

Mike (43:42): 

There is a natural tension with our job position. And because we sit between two worlds, inevitably somebody is always waiting on us. So, it's like yes. That push to move fast is very real.

Lauri (43:55):

Thank you for listening. As usual, we have enclosed links to the social media profiles of our guests in the show notes. Please take a look. You can also find links to the literature referred to in the podcast in the show notes, alongside other interesting educational content. If you haven't already, please subscribe to our podcast and give us a rating on your platform. It means the world to us. Also, check out our other episodes for interesting and exciting talks. Finally, before we sign off, I would like to invite you personally to The DEVOPS Conference happening online on March 8th and 9th. Participation is free of charge for attendees. You can find the link to the registration page, where else, than in the show notes. Now, let's give our guests an opportunity to introduce themselves. For now, take care of yourselves and see you at The DEVOPS Conference.

Mike (44:46): 

Hey everybody, this is Mike Guthrie. I am a senior engineer at NetFoundry, and I lead our RAV team, which is reliability, automation, and visibility. I'm here today to talk about how to make your DevOps environment dark.

Andy (44:59): 

Hi, I'm Andy Allred. I started my career as an electronic warfare specialist in the US Navy, working on nuclear-powered fast attack submarines, which is something that's always kind of unique and gets people's attention. After that, I moved into telecoms and worked there mostly, where I started on the radio side but then shifted more to the IT side of telecom companies, and worked there for quite a number of years. Recently, I've moved into the consultancy space and am working as a consultant on DevOps and cloud projects.