

GenAI best practices and use cases | Justin Reock

In this session, Justin Reock, Deputy CTO at DX, explores best practices and proven use cases for integrating Generative AI (GenAI) into software development workflows. Discover practical techniques like meta-prompting, multi-shot prompting, and system prompts to enhance engineering productivity and consistency. Through live demos and real-world examples, learn how AI tools excel in tasks such as stack trace analysis, auto-documentation, and code refactoring, along with strategies to overcome common challenges in AI adoption for engineering teams.

Transcript

00:00:03:24 - 00:00:19:11 Unknown Thanks, everybody. Very happy to be here. We have a lot of content to get through in a relatively short amount of time, so unfortunately we won't have time for questions, but I will be hanging out here. I'll also be at the Stockholm show, so hopefully we'll get a chance to hang out and talk a little bit more about GenAI. 00:00:19:11 - 00:00:36:12 Unknown So we're going to talk briefly about the current impact of AI. We're going to look at some challenges that a lot of organizations are facing right now when it comes to adopting this technology. We'll look at some adoption strategies, some things that, especially as leaders, we can take away and hopefully implement with our teams. 00:00:36:14 - 00:00:53:16 Unknown We'll look at a few interesting SDLC agent integration use cases. A lot of what you're going to learn in the meat of this session is applicable both to coding assistants like Cursor and Copilot, and to agents, when you build prompts for agents. So keep that in mind when you're looking at some of these use cases. 00:00:53:18 - 00:01:12:11 Unknown We're going to go over some prompting best practices and some high-impact use cases. This was part of a study: a guide that we actually built based on interviews and surveys of developers, to try to figure out what some of these really high-value use cases were. So it's all based on data, and I'll briefly explain that study methodology. 00:01:12:11 - 00:01:36:05 Unknown And then we'll go into some next steps. Okay. So GenAI is impacting development, right? We don't necessarily know if it's good or bad yet, but we look at some of these industry reports, like, for instance, the DORA report that came out in April of this year, which saw modest but positively leaning results when we looked at industry averages. 00:01:36:10 - 00:02:02:15 Unknown Okay. So we saw that a 25% increase in overall AI adoption was associated with a 7.5% increase in documentation quality. Okay, good. A 3.4% increase in code quality: modest, but at least not trending in the opposite direction, right? A 3.1% increase in code review speed, a 1.3% increase in overall approval speed, and a 1.8% decrease in code complexity. Okay. 00:02:02:15 - 00:02:26:00 Unknown Seems fairly innocuous until we break it down by company. These are per-company metrics looking at the DORA metric for change failure rate. So what you're seeing here on the top are companies who actually increased their change failure rate by 2%, which doesn't sound like a lot until you realize that the industry benchmark for CFR is 4%. 00:02:26:02 - 00:02:47:06 Unknown So that means 50% more defects than the industry benchmark. But then you also have these companies on the bottom who have decreased the amount of defects that they're shipping, so 50% fewer defects. We can't trust the industry average data that we're seeing right now. But what we can do is work really hard to be the type of culture and company that's on the top of this graph, okay? 00:02:47:06 - 00:03:05:12 Unknown And measure, and do our best to be there. It's just not evenly distributed right now. Some organizations are seeing very positive impacts to KPIs. Others are struggling with overall adoption and even seeing these negative impacts that we saw. And we really wanted to set out and discover the differences: what is the difference, and what can we do to help companies improve?
00:03:05:15 - 00:03:25:05 Unknown At DX we measure developer experience and productivity, so we have a nice wide view of how a lot of companies are doing. Obviously we're now looking at correlating AI usage to these foundational productivity and experience metrics. But we also want to be able to provide materials and education that can help people move those metrics in the right direction. 00:03:25:07 - 00:03:46:00 Unknown One of the biggest indicators that we found, though, in talking to companies that are struggling, is that there's just a lack of enablement and overall education on best practices and use cases. Not only do we have to provide training for our engineers, we have to give them time to learn and experiment. This is new tech; you can't just turn it on and expect everything to just work great, right? 00:03:46:00 - 00:04:06:04 Unknown There's actually a lot of nuance to the way that this technology works. We've also found that some companies just don't really know what to measure or how to measure, and of course that's something that DX is very good at. Some overall adoption strategies, some things to think about: beyond code generation, I would say, integrate across the SDLC. 00:04:06:05 - 00:04:30:15 Unknown Who's familiar with Eli Goldratt, right? Theory of Constraints. Okay, a few of you. An hour saved on something that's not the bottleneck is worthless. Okay? And code generation has not necessarily been the bottleneck. So we need to seek out other areas of the SDLC where we have true bottlenecks, where we can apply agents and this technology to try to increase automation and improve throughput. 00:04:30:17 - 00:04:49:19 Unknown We want to unblock usage as much as possible. I hear a lot of companies saying, well, I'd love to use Cursor, but we can't because of data residency. Get creative, right? There are ways to self-host these models. There are ways to run these models on private infrastructure. Think around some of these problems. We want to evangelize the metrics: when teams are doing well with this, 00:04:49:19 - 00:05:11:07 Unknown we should share that with the organization. When teams have had a successful experiment, those teams should be allowed to teach the rest of the organization what they've done. We need to reduce the fear of AI. Okay, this technology is not ready to replace jobs. There was already the vanguard of companies that tried this, and they failed, and they're clawing it back. 00:05:11:07 - 00:05:35:01 Unknown And no one wants to work for them now because they fired all their engineers. Okay. So we need to move into the next phase. We are seeing 8% improvements, 10% improvements. Zapier is hiring like crazy right now because they know that every new engineer they bring in will be more productive as a result of this technology. That's the way we need to be thinking about this: as something that can augment our teams. 00:05:35:03 - 00:06:02:17 Unknown That said, AI is probably not coming for your job, but somebody really good at AI might take your job. Okay? So it behooves us all to learn about how this technology works and get better at it. And we need to tie this to employee success. And of course, all of this means we need to establish compliance and trust in the outputs that are being created by this technology, which we can do through better testing, better quality engineering, and really just understanding where these models make mistakes and how we can work around them.
00:06:02:17 - 00:06:19:12 Unknown And we'll talk about some strategies for that. But first, let's talk about some kind of fun use cases that we're seeing. Morgan Stanley: there's a great write-up on this in the Wall Street Journal and also Business Insider. They have a lot of legacy code, including mainframe code, including, and I hate to say it because I'm an old Perl head, 00:06:19:12 - 00:06:40:01 Unknown I've been writing code since the 90s, but Perl legacy code, COBOL legacy code, mainframe natural language. And they built an agent that goes through, grabs context, and builds developer specifications to allow developers to help modernize the code. Right? So it's not a full end-to-end solution; again, we're not there yet, but it reads legacy code and creates developer specs. 00:06:40:07 - 00:07:09:05 Unknown And they're saving about 300,000 hours annually just by eliminating the reverse-engineering step of understanding how that legacy code works. Faire has built an automated code review system that they trigger off of a pull request. It pulls context, surrounding code, and documentation, and provides pull request comments as a first-pass code review. And they're completing about 3,000 reviews a week of relatively low-risk stuff right now with that agent. Canva is using this for PR generation. 00:07:09:07 - 00:07:30:09 Unknown So they actually have project managers that are able to just say, this is what I want to change in the system, and it generates very engineering-friendly specifications, which is often a problem, right? Getting a good, clean handoff from product into specs that developers can actually understand. It has MCP servers that expose context and documentation, connects to Jira, and even creates mockups in Figma. 00:07:30:11 - 00:07:52:20 Unknown Spotify is handling 90% of their incidents right now using agents that understand SRE runbooks. They can detect problems and monitor for alerts, and then they can suggest runbook steps directly in SRE channels in Slack, handling 90% of the incidents at Spotify. In fact, they're starting to call their new AI-enabled, or AI-native, SDLC the Spotify 2.0 model. 00:07:53:01 - 00:08:10:11 Unknown That's kind of what we're moving into. Okay, let's get into the meat of this thing. So first of all, again, what we wanted to do is seek out these companies that were at the top of that graph I showed you. We wanted to figure out what they were doing well, and how we could help other people move to that state. 00:08:10:13 - 00:08:35:23 Unknown So we wanted to determine the highest-value prompting practices and best practices, and also high-value use cases, and we used data to do this. So we interviewed a lot of senior leaders from these companies who were doing well and just asked them: what are you enabling your engineers to do? What are some of the prompting best practices and things like that that your engineers have become reflexive in their use of? And where we found overlap in those interviews, 00:08:35:23 - 00:08:52:19 Unknown we put them in this guide. Then we also surveyed developers directly. We wanted to look for developers that were saving at least an hour a week; by the way, the average time savings has moved up to about 3.5 hours a week right now. Again, not the bottleneck, because we're losing hours and hours to meetings and context switching and all kinds of other stuff. 00:08:52:21 - 00:09:13:18 Unknown But time is being saved.
And so we surveyed these developers, and we asked them to stack rank: what are your top five use cases for AI that you're getting really reflexive about? And we created sort of a top ten list that went in this guide. So this is all, sort of, data-driven. Now, if you'd like a copy of this guide, totally free, it's a 65-page PDF, 00:09:13:20 - 00:09:29:21 Unknown you can download it here. And it goes through the same practices and use cases that we're going to go through in this workshop. I do need to move on; if you don't get a chance to capture it, though, don't worry, I will show this slide again at the end of the session so you'll have another opportunity. 00:09:30:01 - 00:09:47:05 Unknown If you miss that opportunity, come talk to me and I'll show you where you can get the guide. I am proud to say that this guide has become required reading now for a number of engineering organizations, so we're hoping that it's been very helpful for folks. Okay. So let's talk about some prompting best practices first. 00:09:47:07 - 00:10:10:01 Unknown Meta prompting. All right, this is the idea of actually paying a lot of attention to the structure of the way that you put together your prompt. These are still just probabilistic machines. They do way better when the context that's provided to them and the prompt that's provided to them are really well structured, when you give them a lot of clues about how you want them to execute their workflow, 00:10:10:05 - 00:10:26:13 Unknown and then, moreover, the way that you want them to structure their output, right? So you can just go in and put in sort of a lazy prompt and go back and forth and tune the thing, or you can be thoughtful about the way that you want the model to execute its work from the very beginning, and you'll save yourself some time in that back and forth. 00:10:26:18 - 00:10:42:10 Unknown So you can see here, this is a very simple example. Instead of just saying fix this Spring Boot error and then giving it the error, we're saying debug the following Spring Boot error, pasting the error details in, and then we're giving it some bullet points. We're saying, okay, identify and explain the root cause; that's the first part of your workflow. 00:10:42:16 - 00:11:00:05 Unknown Then provide a fixed version of the problematic code, and then suggest best practices to prevent similar issues in the future. Now, I could end there, and it might be able to infer the kind of output that I want if I just stopped the prompt there, but it will definitely give me the kind of output that I want 00:11:00:05 - 00:11:18:05 Unknown if I take this one step further and am very explicit about how I want the output to look. So I'm saying: give me, number one, the error message and stack trace, summarized; number two, a root cause analysis; number three, fixed code with comments; and number four, some preventative measures. And then it will definitely output the way that I want it to going forward. 00:11:18:11 - 00:11:37:21 Unknown And this can work well for assistants and also work well for agents. So meta prompting: this came up number one, overwhelmingly, from the people that we interviewed. Now, this next one, though. I mentioned that I've been writing code professionally since the late 90s; I wrote my first bit of code in the 80s on an old Tandy Radio Shack model, in BASIC.
00:11:37:23 - 00:12:11:12 Unknown I'm on this journey too, okay, but this is one of the use cases that I have become really reflexive about, and it has absolutely changed the way that I work: this is called recursive prompting, or prompt chaining. Now, at a high level, this just means having one kind of conversation with a model, taking that conversation transcript, passing it into a different kind of model to get a different type of output, like a specification, and then maybe even taking that specification and asking another model to scaffold code. And what it's really good for is having brainstorming sessions, right? 00:12:11:12 - 00:12:33:02 Unknown So you can see here in this example, this is just a silly example, I'm saying: I'm a mobile developer, you're a senior React Native architect. And this sentence has to appear: let's have a brainstorming session where you ask me one question at a time about the following requirements. And then I'm going on to say that I have two separate iOS and Android repos, and I want to convert them all into React Native. 00:12:33:02 - 00:12:52:12 Unknown And this isn't about whether you like React Native or not, it's just an example. But the point is that the bot will then go on to ask me one question at a time about the problem that I'm trying to solve, and we'll have a very comprehensive conversation. It's almost like a two-way rubber-ducking kind of session, where I'm saying what I want to do, but the bot is actually asking me questions very comprehensively. 00:12:52:12 - 00:13:11:13 Unknown Now, I don't know about you, but I always forget stuff in the planning phase. Whenever I'm sitting down, there's always some gap somewhere, or something I didn't consider. This has been instrumental in helping me make sure that from the very beginning, I get the right type of spec. So you're seeing it ask me, okay, number one, what are the main features and functionalities of the existing mobile app? 00:13:11:15 - 00:13:35:12 Unknown And then it's asking me some more questions, and these all came out word for word from the example when I was generating it. Final question for now: are you targeting just phones, or does the app need to support tablets or other form factors? Right, so you get the point. I could then take that full conversation and feed it some meta prompting as well and say, okay, now take this whole chat transcript and come up with a step-by-step specification for how to implement this. 00:13:35:18 - 00:13:54:23 Unknown Give me a step number, a summary of converting the feature, an example prompt, and maybe even example unit tests for what I'm trying to do. And then give it the full transcript from before. Then you'll end up with a spec, and you could stop there and hand that spec to a developer and go on, or you could pass it into a code-generating model and ask it to scaffold out your project. 00:13:55:02 - 00:14:14:05 Unknown Now, I used to have to do this pretty manually, and you still kind of do in Cursor and Claude Code and some of the other solutions. But Amazon Kiro, I got a chance to try the public preview for that. Anybody else tried Kiro yet from Amazon? It's really cool, and there it's a first-class citizen. So when you start a new project, it actually takes you through this whole brainstorming workflow first. I was very pleased to see that.
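To make the chaining mechanics concrete, here is a minimal sketch of the conversation-to-spec-to-scaffold handoff described above. The ChatModel interface, the prompts, and the stub in main are hypothetical stand-ins, not any particular vendor's API; in practice this happens inside whatever assistant or agent you actually use.

```java
// A minimal sketch of prompt chaining: conversation transcript -> spec -> scaffold.
// "ChatModel" is a hypothetical interface standing in for a real model client.
import java.util.List;

public class PromptChainSketch {

    interface ChatModel {
        String complete(String prompt);
    }

    static String chain(ChatModel specModel, ChatModel codeModel, List<String> conversationTranscript) {
        // Step 1: the brainstorming conversation has already happened; join the transcript.
        String transcript = String.join("\n", conversationTranscript);

        // Step 2: hand the transcript to a second model and ask for a specification.
        String spec = specModel.complete(
            "Take this chat transcript and produce a step-by-step implementation specification. "
          + "For each step give a step number, a summary, an example prompt, and example unit tests.\n\n"
          + transcript);

        // Step 3: hand the spec to a code-generating model and ask it to scaffold the project.
        return codeModel.complete(
            "Scaffold a project that implements the following specification:\n\n" + spec);
    }

    public static void main(String[] args) {
        // Stub model so the sketch runs without any real API behind it.
        ChatModel echo = prompt -> "[model output for: "
            + prompt.substring(0, Math.min(60, prompt.length())) + "...]";
        System.out.println(chain(echo, echo,
            List.of("Q: What are the main features?", "A: Two repos, iOS and Android...")));
    }
}
```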
00:14:14:05 - 00:14:42:02 Unknown Okay, one-shot or few-shot prompting: similar to meta prompting, except in this case we're actually providing example code, right? So beyond just coming up with a particular structure, we're actually saying, okay, here's an example of the code that I want you to produce, right? Stylistically and semantically. So as opposed to a lazy zero-shot prompt, which gives it nothing, just saying, hey, build this for me, we're actually saying, okay, here's an example of a structured REST API design. 00:14:42:04 - 00:15:01:19 Unknown Then I'm giving it a simple @RestController annotation and a @RequestMapping. Then I'm saying, okay, now generate a HelloController that uses a HelloService layer, returns a data transfer object instead of plain text, and follows RESTful semantics. Right? So you see, this is kind of like meta prompting, except we're very specifically saying: this is the way that I want you to write the code 00:15:01:24 - 00:15:24:04 Unknown when you output it. All right. I mentioned before, we're going to look at some things to make these models behave better. One of them is having a good feedback loop for keeping your system prompt up to date. In Cursor these are called Cursor rules, and there's also your agents markdown file; there are various versions of this. The idea is that you want to find the one that extends to as many people on your team as possible, right? 00:15:24:04 - 00:15:42:20 Unknown For a couple of reasons. First of all, it helps everybody to have a system prompt that's up to date and tuned well for the way that you want it to behave, as opposed to just helping you. I've definitely seen engineers who keep a local rules file, and certainly some of these solutions have markdown that will let you do the same thing. That increases your context window, and it doesn't help the whole organization. 00:15:42:20 - 00:16:02:01 Unknown So you want to come up with a feedback loop where, when the model does something it's not supposed to do, you have a way of providing that feedback to somebody who owns the system prompt and can keep it up to date for everybody. I've seen people put this in source control for a more decentralized organization, and just have a system prompt that's in source control that everybody can contribute to, and it has its own version control. 00:16:02:01 - 00:16:23:12 Unknown Any of that's fine, whatever works for your culture. But the point is, do this, right? Build this feedback loop. So you can see the example here. It's basically saying, hey, you've been observed making the following errors: you're providing outdated Spring Boot versions, you're suggesting or using deprecated methods, you're returning snippets that have syntax errors. So going forward, always provide code snippets that are Spring Boot 3 or greater, 00:16:23:12 - 00:16:46:02 Unknown in this case; verify that you're not using deprecated methods; double-check that any code snippets are syntactically valid. And then I'm even giving it a specific output that I want. So I'm saying, okay, when you're responding, give relevant explanations of your code choices, and other bits of meta prompting there, right? So, good meta prompting inside of the system prompt, one that you keep up to date, because it can get stale as rules change within the organization. It's just mostly important to have a feedback loop to keep it up to date and have a way to provide that feedback to whoever's keeping it up to date.
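For reference, here is a sketch of the kind of structure the few-shot REST prompt above is steering the model toward: a controller delegating to a service and returning a DTO rather than plain text. It assumes Spring Boot 3 on the classpath; the class names, route, and message are illustrative, not the talk's actual slide code.

```java
// Illustrative target output for the few-shot REST prompt: controller -> service -> DTO.
import org.springframework.stereotype.Service;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

// Data transfer object instead of a raw String (serialized to JSON by Spring).
record HelloDto(String message) {}

@Service
class HelloService {
    HelloDto greet() {
        return new HelloDto("Hello, world");
    }
}

@RestController
@RequestMapping("/api/hello")
class HelloController {
    private final HelloService service;

    HelloController(HelloService service) { // constructor injection
        this.service = service;
    }

    @GetMapping
    HelloDto hello() { // RESTful: GET /api/hello returns JSON, not plain text
        return service.greet();
    }
}
```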
00:16:46:04 - 00:17:06:04 Unknown Okay, multi-model adversarial engineering. This was one of the first ones: can we actually have multiple models evaluate each other's solutions, to see which one they think is better? 00:17:06:06 - 00:17:30:03 Unknown The nice thing is, at least right here in 2025, these models have absolutely no ego about what they're doing. They will totally admit when they've done something wrong. Although, and your mileage may vary, I've given this workshop a few times now, and I have had people say, hey, if you tell Copilot that this is a ChatGPT solution, maybe the marketing people have gotten to it a little bit and it starts guarding its own brand. 00:17:30:05 - 00:17:49:11 Unknown So I don't know, it may work better if you just don't mention where the solution came from, but you get the idea, right? We're saying: here's a solution from ChatGPT, evaluate its correctness, note any potential improvements. We're feeding that to Copilot. Then we're doing the opposite thing, where we're saying, well, here's what Copilot said. Now, this is very manual, 00:17:49:17 - 00:18:14:18 Unknown but we're already moving into architectural solutions for this with things like agent-to-agent protocols and AI gateways. And some of these solutions, too, like the quote-unquote reasoning models, are already doing some of this work behind the scenes as well. But it's just something to be aware of: if you're not quite sure, and you want to make sure you're driving the best solution, and you already have multiple solutions available, like all of us do, because everything's moving so quickly we can't really land on one anyway, 00:18:14:20 - 00:18:32:07 Unknown you may as well just go and have the models test each other and see which one they think is the best solution. Okay, multi-context prompting. I spoke to one engineering leader who said that when their engineers use Whisper or voice-to-text, or just click the microphone button in the solution, they're 30% faster, right? 00:18:32:09 - 00:18:56:12 Unknown It's almost like ultimate vibe coding, because you're talking to it and things are just appearing, right? The point is, don't be afraid to move beyond text with these models. Think about interacting with voice. Certainly think about uploading pictures; they're getting better and better at understanding pictures. I was talking to a company that's already set up an MCP loop between their browser tools and the back end. 00:18:56:12 - 00:19:12:23 Unknown So if something fails in the front end, it literally just automatically screencaps developer tools and sends it into the agent, so that it can understand the browser error just by reading the screenshot. I've been doing that too with front-end work. This is a silly example here where we have a decision tree. It's: what do I do today? 00:19:12:23 - 00:19:32:05 Unknown Am I indoors or outdoors? And if I'm indoors and alone, I should read a book, and if I'm in a group and indoors, I should play a board game. And if I'm outside and alone, I should take a run, and if I'm in a group, I should go play pickleball. That's a big thing in the US right now. But the point is, instead of trying to type out this full decision tree, don't be afraid to just upload a picture. These models are really getting better at this, so think about the different contexts in which you can engage the model.
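As a concrete illustration, this is roughly what a model might hand back after reading that decision-tree image, rather than you typing the tree out by hand. It is a trivial sketch of the tree exactly as described in the talk, nothing more.

```java
// A sketch of the "what do I do today?" decision tree from the uploaded picture.
public class WhatDoIDoToday {

    static String decide(boolean indoors, boolean alone) {
        if (indoors) {
            return alone ? "Read a book" : "Play a board game";
        }
        return alone ? "Take a run" : "Play pickleball";
    }

    public static void main(String[] args) {
        System.out.println(decide(true, true));    // Read a book
        System.out.println(decide(false, false));  // Play pickleball
    }
}
```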
00:19:32:09 - 00:19:52:10 Unknown All right. So beyond the system prompt, we also have another lever here we can pull for making the model more or less deterministic. This is called altering the temperature of the model. Temperature is heat, is entropy, is randomness; that's how you connect the dots there. 00:19:52:10 - 00:20:11:05 Unknown The thing is, when these models go and predict the next token to give you, they're not just saying, okay, this is the token. There's a big matrix of them, and the randomness that's applied when picking the token from that matrix is the temperature. So you actually have some control over whether it's going to pick the first one, or whether it might pick one that's later in the list. 00:20:11:07 - 00:20:31:08 Unknown It's kind of a crude way of explaining it, but it's sort of how it works. The temperature moves between zero and one, so 0.0001 would be considered a low temperature. Don't use exactly 0 or 1; weird stuff happens when you set those absolutes, so you want to stay somewhere in the middle. And 0.9 would be considered a high temperature. 00:20:31:10 - 00:20:54:15 Unknown So you can see, if you combine a system prompt with a good deterministic setting for the agent, or even for the assistant... And you can experiment with this, by the way. Anyone using LM Studio or Ollama or Docker Model Runner? Highly recommend playing with these things. You can run a lot of open source models right on a decent laptop or even a gaming machine 00:20:54:17 - 00:21:13:09 Unknown and get out some interesting stuff. And you can always set the temperature in these and see what's happening. I have a low temperature in this example here on the right, where I'm saying: create a JavaScript method to render a gradient of colors from blue to red. And I'm not sure how well you can see that code, but character for character, it's exactly the same when I set a low temperature. 00:21:13:09 - 00:21:33:02 Unknown So I ask it that same prompt twice, and I get exactly the same output both times with a low temperature. Now I crank that temperature up to 0.9, and look at this. They're both valid solutions for the same problem, but you can see here on the left it's starting with an HTML block and it's using some CSS to do the rendering, 00:21:33:07 - 00:21:54:06 Unknown whereas on the right it's taking a wildly different approach: it's just using JavaScript, it's pulled in a canvas object, and it's manipulating that. Okay, so again, both of these are valid, right? But they are very, very different depending on the temperature that I've given it. So you can now see where the system prompt and temperature working together can give you more determinism, more predictable behavior from these models.
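Here is a minimal sketch of where that temperature knob actually lives when you call a model yourself. It assumes a local OpenAI-compatible chat endpoint of the kind LM Studio or Ollama can expose; the URL, port, and model name are placeholders for whatever you actually run, and are not part of the talk.

```java
// Minimal sketch: send the same prompt to a local OpenAI-compatible endpoint
// with an explicit temperature. Run it twice at 0.1 and twice at 0.9 and
// compare the outputs, as in the gradient example above.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class TemperatureSketch {
    public static void main(String[] args) throws Exception {
        double temperature = 0.1; // low = near-deterministic; try 0.9 for more varied output

        String body = """
            {
              "model": "local-model",
              "temperature": %s,
              "messages": [
                {"role": "system", "content": "You are a concise JavaScript assistant."},
                {"role": "user", "content": "Create a JavaScript method to render a gradient of colors from blue to red."}
              ]
            }
            """.formatted(temperature);

        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("http://localhost:1234/v1/chat/completions")) // placeholder local server
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}
```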
00:21:54:06 - 00:22:07:08 Unknown All right, so how about some high-impact use cases, then? I mentioned that the practices really just came from interviews, figuring out what SVPs are doing when they've been rolling this stuff out successfully. This one's way more straightforward: we just spoke to engineers that are already saving time with this stuff. 00:22:07:14 - 00:22:23:09 Unknown Give me your top five use cases that you're using right now that you think save you the most time, and we turned it into a top ten list. Now we're going to go through each of these. So I'm not going to go through all of these right now, but number one was stack trace analysis. All right, an interpretive use case, not even a generative use case, 00:22:23:14 - 00:22:39:07 Unknown and another one that I've had to learn to be more reflexive about. I'm an old-school Java developer; I was doing J2EE stuff. I am used to 200-line stack traces that I have to go through line by line and try to figure out what's wrong. And now, if you put some of these things in agent mode, like Cursor for instance, and the build fails, 00:22:39:11 - 00:22:55:16 Unknown it's already doing this for you, it's already interpreting this. It will save you a lot of time, and it's usually pretty right. So you can see here this example. This is just one where I had a build fail, and I asked it to analyze the stack trace, and look what it did. It returned: oh, you're missing dependencies, 00:22:55:16 - 00:23:13:00 Unknown and here are the dependencies that you're missing. And by the way, this is a Gradle Java project, so here's what you should update in build.gradle. And it's giving me some clues to make sure that those dependencies are going to be available. So the next time you find yourself going, oh, the build failed, let me go through and figure out what's wrong: 00:23:13:03 - 00:23:31:01 Unknown give it to the agent and see what it does. Give it to the assistant and see what it does. I know that this is hard sometimes, because we like solving puzzles, we like figuring out problems. This is why we became engineers, right? But we also have to start asking ourselves philosophically, what is toil now? 00:23:31:03 - 00:23:49:02 Unknown Right? And if toil is something that the agent can do accurately for us, and that saves us time, it's our responsibility to use it that way, right? Because we have it. So I know it's tough, but this one actually works really well. This is another one that I've tried to become more reflexive about: refactoring existing code. Yeah, sure, 00:23:49:04 - 00:24:06:22 Unknown why not? You've got maybe legacy code, or you have code that was written by another engineer, that you want to make more efficient or more readable. And that's effectively what I've done here. Now, silly example: I'm just having it calculate a sum up to a certain limit. So I'm like, okay, if I put in three, I want it to add one plus two plus three; 00:24:06:22 - 00:24:26:13 Unknown it should spit out six. Great. But then I ask it to improve it for readability and efficiency, and look what it does. It reminds me that I've got this great IntStream library in Java, and I could be using that instead of creating a for loop. Now I've got one line of code, right? As long as you know what an IntStream is, it makes the code a little more readable. 00:24:26:15 - 00:24:46:03 Unknown All right, mid-loop generation. So I mentioned before that it's a cool brainstorming workflow to go from conversation to spec to scaffolding code. Oftentimes that scaffolding will give you a functional layout as well; sometimes it'll just start filling in the functions for you too. But the next time you've got to write something that's at least fairly boilerplate, 00:24:46:03 - 00:24:59:22 Unknown think about it: could I just write the function header and maybe give a little comment about what this function is supposed to do, and allow the bot to just fill in the middle? I already know what I want the function to do, right?
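A small sketch of that fill-in-the-middle pattern: you write the signature and the comment, and the assistant proposes the body. It uses the Fibonacci example that comes up next; the body shown is the kind of completion you might get, not a verbatim tool output.

```java
// Fill-in-the-middle sketch: the signature and comment are yours,
// the body is what an assistant might suggest.
public class FibonacciExample {

    /** Return the first n numbers of the Fibonacci sequence. */
    static long[] fibonacci(int n) {
        // --- everything below is the assistant's suggested fill-in ---
        long[] result = new long[n];
        for (int i = 0; i < n; i++) {
            result[i] = (i < 2) ? i : result[i - 1] + result[i - 2];
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(fibonacci(10))); // [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
    }
}
```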
So maybe I can have a suggestion from the agent that'll fill it out. 00:24:59:22 - 00:25:17:15 Unknown It'll save me some time, and I can always go back and tweak anything I don't like, right? But this is another one to try to get reflexive about, especially when you're doing something that's quasi-generic, like, in that case, generating a Fibonacci sequence. Why not? And it does a pretty good job. Good old computer science 101 Fibonacci sequence, 00:25:17:17 - 00:25:40:12 Unknown and there it is, calculating that for me. Okay, controversial topic here: test case generation. But it came up in the survey. It's just the data; I'm just reporting on the data, don't shoot the messenger. But honestly, this has become a first-class citizen in some of these solutions now. For instance, Copilot's had this for a while, where it's just: generate test cases based on this. 00:25:40:14 - 00:25:58:21 Unknown Look, with plenty of exceptions, this is not always our favorite thing to do, right? We want to write the code, we want to write the functionality. We need the tests, don't get me wrong, we can't skip them and we've got to have them, right? I mean, that's how we keep our code working well. We especially need them now, when we're going to be generating some code with AI 00:25:59:02 - 00:26:19:18 Unknown and we want to be able to trust the outputs that we're putting into production. But again, not always our most favorite thing to do, right? So see what the assistant does in terms of writing unit test cases. You can even, and I've experimented with this one using front-end languages too, ones that use Jest, you can even say: take this class and give it 90% code coverage, right? 00:26:19:18 - 00:26:34:08 Unknown Not even just generate the unit tests, but: here's the coverage target, generate unit tests until you get there. And it'll do it. It'll go into a loop: it'll start generating tests, it'll run Jest in coverage mode, it'll spit out the output, see how close it is, read the output and be like, oh, I needed 90%, 00:26:34:08 - 00:27:00:06 Unknown I'm at 85, I'd better generate some more tests. So an interesting thing to experiment with. Okay, learning new techniques came up on this list as well. Yeah, you know, I mean, the thing can sort of understand the code in front of you, and you can ask it some questions when you're trying to learn a new thing. And this isn't really any different than what we've been doing for a while anyway, which is going to Stack Overflow or going to Google or whatever. 00:27:00:09 - 00:27:19:14 Unknown It's really the same thing, just a little bit faster data retrieval, kind of meeting us where we're at without having to go to a browser. But we're saying in this case, okay, I've got five years of experience writing Java and Spring. Great, I've given it some context about what I know. Show me how to create a Java 24 virtual thread in Spring. New thing, kind of new thing in 24. And it does it. It's like, okay, 00:27:19:14 - 00:27:36:03 Unknown well, to create a virtual thread you need to use an executor, newVirtualThreadPerTaskExecutor, and it's giving me a little bit more context about that. It's even giving me some example code. And this is here for me at 3:00 in the morning if the mood strikes me, right? I don't have to wait to get the answers that I want, 00:27:36:09 - 00:27:54:09 Unknown and it also understands the context of the code in front of me.
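For context, the executor mentioned in that answer is a standard Java API, Executors.newVirtualThreadPerTaskExecutor(), available since virtual threads landed in the JDK. Below is a minimal, framework-free sketch of it; how you wire it into a Spring app is left out of the example.

```java
// Minimal virtual-thread sketch: one virtual thread per submitted task.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class VirtualThreadSketch {
    public static void main(String[] args) {
        // try-with-resources: the executor waits for submitted tasks on close
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 5; i++) {
                int taskId = i;
                executor.submit(() ->
                    System.out.println("task " + taskId + " on " + Thread.currentThread()));
            }
        }
    }
}
```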
Now, I get asked about this one a lot. People ask me: but is that learning? You know, if you just let the AI do it for you? And it's a valid question, I think, but it's also the same question as: is going to Stack Overflow learning, when I'm just copying that in and trying to do something from it? 00:27:54:09 - 00:28:13:22 Unknown Right, that's on you. That's your responsibility, whether you want to just take whatever the agent gives you and go on autopilot, or you actually want to take the time to have it explain what's happening and learn from it, right? You have that choice, and you've had that choice long before AI. And I think there are always people who will be happy to hit the easy button, and always more curious people who want to understand what's going on, 00:28:13:22 - 00:28:30:04 Unknown and that tends to be this crowd, right? We're engineers, we like to understand what's going on. So take the time to learn; nothing's stopping you from doing that, and the agent can be a really great research assistant. Okay, I love this one. I'm often doing this in my office, and I'll point, on Zoom, for effect: 00:28:30:04 - 00:28:46:21 Unknown my Mastering Regular Expressions book is right over there on my shelf, and I haven't had to touch it in a year, which is true. Because if you have something that you're not doing every day, you kind of forget: what was this wildcard, or how did I do this pattern match? Generating regexes with these things 00:28:46:21 - 00:29:02:05 Unknown is pretty awesome, right? And it knows how to do it in most mainstream languages, and it does a pretty good job. So here I'm giving it some SLF4J log output, and I'm saying: all right, create a regex that will parse timestamps, usernames, IP addresses, user actions, and transaction IDs from log data in the following format. 00:29:02:07 - 00:29:24:04 Unknown And it does it. It gives it to me, and it creates a Pattern object for me to compile the regex, and it creates a Matcher object, so I can actually use it inline in my code. If you're comfortable either creating a system prompt or giving these things access to schema, they actually do a pretty good job traversing schema, understanding foreign keys, and generating SQL statements, right? 00:29:24:04 - 00:29:51:10 Unknown Again, if it's just not a thing that you do every day, this can be really helpful.
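To make the regex use case concrete, here is a sketch of the kind of Pattern-and-Matcher code that prompt tends to produce. The log line format and field names here are invented for the example; they are not the format from the talk's slide.

```java
// Sketch: compiled Pattern plus Matcher pulling timestamp, username, IP,
// action, and transaction ID out of a (hypothetical) log line format.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogParseSketch {

    private static final Pattern LOG_LINE = Pattern.compile(
        "^(?<timestamp>\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2})\\s+" +
        "user=(?<username>\\w+)\\s+" +
        "ip=(?<ip>\\d{1,3}(?:\\.\\d{1,3}){3})\\s+" +
        "action=(?<action>\\w+)\\s+" +
        "txn=(?<txn>[A-Z0-9-]+)$");

    public static void main(String[] args) {
        String line = "2025-06-01 14:32:07 user=jdoe ip=10.0.0.12 action=LOGIN txn=TX-98431";
        Matcher m = LOG_LINE.matcher(line);
        if (m.matches()) {
            System.out.println(m.group("timestamp") + " | " + m.group("username") + " | "
                + m.group("ip") + " | " + m.group("action") + " | " + m.group("txn"));
        }
    }
}
```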
Code documentation. Yeah, whether you're generating AsciiDoc or LaTeX or GitHub Markdown, or you just want more comments inline in the code because you didn't feel like writing them when you were writing the code, right? There are lots of ways that the agents and assistants can help us with documentation, because again, with plenty of exceptions, this is not a thing we always love to do, right? 00:29:51:10 - 00:30:07:20 Unknown Oftentimes we're in the flow and we're doing great, we're in an excellent flow state, everyone's leaving us alone and we've got focused work, and then it's like, oh, I have to document this thing now so the next person can read it. Right? And actually, again, these things are pretty good at interpreting; they're still better at interpreting than they are at generating. 00:30:07:22 - 00:30:25:02 Unknown And so they're able to generate inline comments or create AsciiDocs, or you can even have agents, and I'm seeing a fair amount of this now, where you'll have an agent trigger off of a pull request, and then it'll go and gather some context. It'll look at the code diff, it'll look at the surrounding code, and it'll update documentation, or at least put a draft of the documentation in front of you. 00:30:25:02 - 00:30:40:14 Unknown Then you can say, yeah, that basically looks right, and send it on its way. It can save you a lot of time here too, so I'm not surprised to see this come up. Okay, brainstorming and planning. I know we already covered this in the recursive prompting section, but it also came up in the survey, right? 00:30:40:14 - 00:31:09:20 Unknown So I had engineers who also answered that they felt like this was a very high-value use case for them. So just a different example, hopefully reinforcing the same point. In this case I've got a totally different problem, and I'm saying: I am a product manager and you're a senior software architect. Let's have a brainstorming session where you ask me one question at a time about these requirements (that sentence has to be in there again), and then come up with a specification that I can hand to a software developer. In this case, I'm saying I want to design an app that'll create an Elasticsearch index for a large table stored in Cassandra; 00:31:09:20 - 00:31:28:02 Unknown help me design a bulletproof, zero-loss system to do this. This is definitely one of those things where I usually forget something in the planning phase, because there's so much to designing a zero-loss, bulletproof data system, but look what it's doing. Question number one: are we optimizing for full-text search, fast lookups, analytics, or something else? 00:31:28:04 - 00:31:46:03 Unknown Will the data in Cassandra be static, append-only, or frequently updated? How large is the data set: number of rows, data volume, expected growth? These are definitely the questions that I would be asking myself if I was going through a planning phase, right? So again, a really nice, comprehensive way of getting that early planning, you know, just right 00:31:46:03 - 00:32:04:15 Unknown and bulletproof. Okay, we talked about this a bit: the second thing that we can do is scaffold out initial code. Yeah, I mean, sometimes just getting started is hard. It doesn't have to be a perfect scaffold, but putting out, okay, these are my base classes and this is how I want to lay out my project, all this kind of stuff, that can take time. 00:32:04:17 - 00:32:24:02 Unknown And we get too nitpicky sometimes, you know, and oftentimes a decision just needs to be made, and then we can start flowing and coding after that point and go fix whatever needs to be fixed later. So I'm saying: create a code outline for a Java app to listen on a Kafka topic and create a multicast pattern to three different endpoints. 00:32:24:02 - 00:32:44:03 Unknown So, good old Postgres, a RESTful POST endpoint, and an SMTP endpoint. Great. And look what it did. It structured out my subscriber class for Kafka, there it is, KafkaConsumerService. It created the three producer endpoints with PostgresService, RestService, EmailService, and created an AppConfig. I said I wanted it to be Java; 00:32:44:05 - 00:33:02:06 Unknown I didn't specify whether I'd be using Maven or Gradle, so it actually gave me options for both. It's like, here's a pom.xml if you want to use Maven, here's a build.gradle if you want to use Gradle, and I can go from there. Okay, now I kind of lose that initial paralysis at the beginning.
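Here is a rough structural sketch of that scaffold: one Kafka subscriber fanning each message out to three endpoints. The class names follow the talk's description, the bodies are stubs, and the actual Kafka, JDBC, HTTP, and SMTP wiring is deliberately left out so the outline stays self-contained.

```java
// Structural sketch of the Kafka multicast scaffold described above.
import java.util.List;

interface MessageSink {
    void deliver(String message);
}

class PostgresService implements MessageSink {
    public void deliver(String message) { /* insert into Postgres here */ System.out.println("postgres <- " + message); }
}

class RestService implements MessageSink {
    public void deliver(String message) { /* POST to the REST endpoint here */ System.out.println("rest <- " + message); }
}

class EmailService implements MessageSink {
    public void deliver(String message) { /* send via SMTP here */ System.out.println("smtp <- " + message); }
}

class KafkaConsumerService {
    private final List<MessageSink> sinks;

    KafkaConsumerService(List<MessageSink> sinks) {
        this.sinks = sinks;
    }

    // In the real scaffold this would be driven by a Kafka listener on the topic;
    // here we only show the multicast step.
    void onMessage(String message) {
        sinks.forEach(sink -> sink.deliver(message));
    }
}

public class AppConfig {
    public static void main(String[] args) {
        KafkaConsumerService consumer = new KafkaConsumerService(
            List.of(new PostgresService(), new RestService(), new EmailService()));
        consumer.onMessage("example payload from the Kafka topic");
    }
}
```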
Similar to learning new techniques: explaining code. 00:33:02:08 - 00:33:17:14 Unknown Yeah, sure. I mean: explain the purpose of each annotation in this Spring Boot controller class, and then I give it some code. Maybe this is code that a developer wrote five years ago and then left. Maybe it's legacy code. Maybe it's just something I haven't seen before, or haven't looked at in a long time, and it's breaking it down for me. 00:33:17:14 - 00:33:36:08 Unknown It's like, okay, here's each annotation: you've got your @SpringBootApplication, you've got your @RestController, you've got your @GetMapping annotation. Then it's reminding me that the @SpringBootApplication annotation is actually just sugar that's comprised of three annotations, configuration, auto-configuration, and component scanning, and it goes on from there. So again, a great way of learning new things. 00:33:36:13 - 00:33:55:03 Unknown It's up to you whether you actually want to learn, but you certainly can with these tools. Okay, so some next steps. First of all, use this guide as a reference for integrating AI into your various workflows; I'll show the PDF again in a moment, and you can come find me. Determine a method for measuring and evaluating GenAI 00:33:55:03 - 00:34:16:15 Unknown impact, not just utilization metrics, not just, okay, who's using this: daily active, weekly active. That's okay for understanding how the tech is proliferating through the culture, but we really want to correlate that utilization to the metrics that actually matter. It's so easy to get caught up in the hype and forget that this is still about developer productivity and developer experience, 00:34:16:19 - 00:34:29:07 Unknown and so those foundational metrics that have served us well are still very much in play now. So we want to track and measure the adoption and iterate on those best practices and use cases. One more opportunity, if you didn't already get a chance to grab this. 00:34:29:07 - 00:34:35:20 Unknown So how should we measure? What should we be looking at? Well, there are kind of three categories of metrics available to us right now, 00:34:35:20 - 00:34:57:03 Unknown and we've got to use all three, because they're all going to tell us different stories, right? We have our telemetry metrics. This is the stuff coming out of the API. So Copilot provides this, Cursor provides this; we helped the Anthropic folks actually build their API, so they have some good metrics as well. They're kind of good for measuring the impact on developer output, like how often stuff is being used, 00:34:57:05 - 00:35:18:08 Unknown but they don't tell a complete story. Like, for instance, everybody wanted acceptance rate versus suggestions, until they realized that you have to click accept in the IDE for the API to actually know about that. So it's not necessarily going to be accurate. Or maybe you accepted the code and then you refactored every line of it, right? So that's also not telling the same story. 00:35:18:10 - 00:35:37:07 Unknown Maybe it made a suggestion and you just typed it: oh, that's a good idea, I'm going to do that. Or maybe you copied and pasted it. The API doesn't know about any of that stuff. And we also need to be able to correlate that utilization to what actually matters: the foundational productivity metrics that aren't just telling us what's happening with the AI, they're telling us whether it's working or not.
00:35:37:09 - 00:35:55:06 Unknown So this is where we need experience sampling: the ability to get small bits of data by catching developers in their workflow, so that we don't force them out and make them context switch off to something else. This could be issuing a PR and having a new checkbox in the PR that says: I used AI to work on this PR, or I enjoyed using AI to work on this PR. 00:35:55:07 - 00:36:08:03 Unknown Whatever it is, we can get these small bits of data. Not great for collecting large amounts of data at once, though; that all has to be aggregated. We don't want to do a ten-minute survey in every pull request, but we do want to do surveys; we just want to do those periodically. That's the last bit of data here. 00:36:08:07 - 00:36:28:01 Unknown We want highly effective surveys, and I have a very strict definition of that. We maintain 95%-plus participation rates in most of our customer surveys, and 90% in all of them, even with thousands of engineers. Yeah, I know, I know, and we do. It's really hard, and if you want to learn about how we do that, I'm always happy to talk about it. 00:36:28:05 - 00:36:54:06 Unknown But once you get participation rates like that, you can really trust a lot of these qualitative metrics, especially when you cross-reference them with the experience sampling and the telemetry. And so with that in mind, we actually did build the first one of these: this is our DX AI Measurement Framework. We were first to market with this. It took what we already knew from our Core 4 metrics, which are comprised of DORA, the SPACE framework, and the DevEx metrics, all distilled into a single metric framework. 00:36:54:11 - 00:37:14:18 Unknown So we're using the same stuff. We're using, in this case, the dimensions of utilization, impact, and cost, and we have a number of prescribed metrics that fall into those dimensions. This is sort of a maturity curve, too: most organizations will start on the left, capturing basic utilization metrics, and then immediately start wondering, okay, well, is this actually working? 00:37:14:24 - 00:37:34:11 Unknown In which case they'll start moving into these impact metrics that you see here: how is utilization correlated to improving velocity, improving quality within the organization? And then, yes, cost. Although I do love to point out that 15 years after the last hype cycle, cloud, ended, we still have brand-new companies telling you they'll help you control your cloud costs, 00:37:34:13 - 00:37:45:00 Unknown so we'll see how long that takes. On the flip side, I have already heard horror stories about people burning through $2,000 worth of tokens a day, so we probably do have to do something, but I think it's going to be kind of on the right of this continuum. 00:37:45:00 - 00:37:52:11 Unknown I had a Q&A slide that came after that and I had to take it out, and I probably should have said something. Anyway, thank you. I hope that was helpful.