

GenAI best practices and use cases | Justin Reock

In this session, Justin Reock, Deputy CTO at DX, explores best practices and proven use cases for integrating Generative AI (GenAI) into software development workflows. Discover practical techniques like meta-prompting, multi-shot prompting, and system prompts to enhance engineering productivity and consistency. Through live demos and real-world examples, learn how AI tools excel in tasks such as stack trace analysis, auto-documentation, and code refactoring, along with strategies to overcome common challenges in AI adoption for engineering teams.

Transcript

00:00:03:24 - 00:00:19:11 Unknown Thanks, everybody. Very happy to be here. We have a lot of content to get through in a relatively short amount of time, so unfortunately we won't have time for questions, but I will be hanging out here. I'll also be at the Stockholm show, so hopefully we'll get a chance to hang out and talk a little bit more about GenAI. 00:00:19:11 - 00:00:36:12 Unknown So we're going to talk briefly about the current impact of AI. We're going to look at some challenges that a lot of organizations are facing right now when it comes to adopting this technology. We'll look at some adoption strategies, some things that, especially as leaders, we can take away and hopefully implement with our teams. 00:00:36:14 - 00:00:53:16 Unknown We'll look at a few interesting SDLC agent integration use cases. A lot of what you're going to learn in the meat of this session is applicable both to coding assistants like Cursor and Copilot, and to agents, when you build prompts for agents. So keep that in mind when you're looking at some of these use cases. 00:00:53:18 - 00:01:12:11 Unknown We're going to go over some prompting best practices and some high-impact use cases. This was part of a study: a guide that we actually built based on interviews and surveys of developers, to try to figure out what some of these really high-value use cases were. So it's all based on data, and I'll briefly explain that study methodology. 00:01:12:11 - 00:01:36:05 Unknown And then we'll go into some next steps. Okay. So GenAI is impacting development, right? We don't necessarily know if it's good or bad yet, but we look at some of these industry reports, like, for instance, the DORA report that came out in April of this year, which saw modest but positively leaning results when we looked at industry averages. 00:01:36:10 - 00:02:02:15 Unknown Okay. So we saw that a 25% increase in overall AI adoption was associated with a 7.5% increase in documentation quality. Okay, good. A 3.4% increase in code quality: modest, but at least not trending in the opposite direction, right? A 3.1% increase in code review speed, a 1.3% increase in overall approval speed, and a 1.8% decrease in code complexity. Okay. 00:02:02:15 - 00:02:26:00 Unknown Seems fairly innocuous until we break it down by company. These are per-company metrics looking at the DORA metric for change failure rate. So what you're seeing here on the top are companies who actually increased their change failure rate by 2%, which doesn't sound like a lot until you realize that the industry benchmark for CFR is 4%. 00:02:26:02 - 00:02:47:06 Unknown So that means 50% more defects than the industry benchmark. But then you also have these companies on the bottom who have decreased the amount of defects that they're shipping, so 50% fewer defects. We can't trust the industry average data that we're seeing right now. But what we can do is work really hard to be the type of culture and company that's on the top of this graph, okay? 00:02:47:06 - 00:03:05:12 Unknown And measure, and do our best to be there. It's just not evenly distributed right now. Some organizations are seeing very positive impacts to KPIs. Others are struggling with overall adoption and even seeing these negative impacts that we saw. And we really wanted to set out and discover the differences: what is the difference, and what can we do to help companies improve?
00:03:05:15 - 00:03:25:05 Unknown At DX we measure developer experience and productivity, so we have a nice wide view of how a lot of companies are doing. Obviously we're now looking at correlating AI usage to these foundational productivity and experience metrics. But we also want to be able to provide materials and education that can help people move those metrics in the right direction. 00:03:25:07 - 00:03:46:00 Unknown One of the biggest indicators that we found, though, in talking to companies that are struggling, is that there's just a lack of enablement and overall education on best practices and use cases. Not only do we have to provide training for our engineers, we have to give them time to learn and experiment. This is new tech; you can't just turn it on and expect everything to just work great, right? 00:03:46:00 - 00:04:06:04 Unknown There's actually a lot of nuance to the way that this technology works. We've also found that some companies just don't really know what to measure or how to measure, and of course that's something that DX is very good at. Some overall adoption strategies, some things to think about: beyond code generation, I would say, integrate across the SDLC. 00:04:06:05 - 00:04:30:15 Unknown Who's familiar with Eli Goldratt, right? Theory of Constraints. Okay, a few of you. An hour saved on something that's not the bottleneck is worthless. Okay? And code generation has not necessarily been the bottleneck. So we need to seek out other areas of the SDLC where we have true bottlenecks, where we can apply agents and this technology to try to increase automation and improve throughput. 00:04:30:17 - 00:04:49:19 Unknown We want to unblock usage as much as possible. I hear a lot of companies saying, well, I'd love to use Cursor, but we can't because of data residency. Get creative, right? There are ways to self-host these models. There are ways to run these models on private infrastructure. Think around some of these problems. We want to evangelize the metrics: when teams are doing well with this, 00:04:49:19 - 00:05:11:07 Unknown we should share that with the organization. When teams have had a successful experiment, those teams should be allowed to teach the rest of the organization what they've done. We need to reduce the fear of AI. Okay, this technology is not ready to replace jobs. There was already the vanguard of companies that tried this, and they failed, and they're clawing it back. 00:05:11:07 - 00:05:35:01 Unknown And no one wants to work for them now because they fired all their engineers. Okay. So we need to move into the next phase. We are seeing 8% improvements, 10% improvements. Zapier is hiring like crazy right now because they know that every new engineer they bring in will be more productive as a result of this technology. That's the way we need to be thinking about this: as something that can augment our teams. 00:05:35:03 - 00:06:02:17 Unknown That said, AI is probably not coming for your job, but somebody really good at AI might take your job. Okay? So it behooves us all to learn about how this technology works and get better at it. And we need to tie this to employee success. And of course, all of this means we need to establish compliance and trust in the outputs that are being created by this technology, which we can do through better testing, better quality engineering, and really just understanding where these models make mistakes and how we can work around them.
00:06:02:17 - 00:06:19:12 Unknown And we'll talk about some strategies for that. But first, let's talk about some kind of fun use cases that we're seeing. Morgan Stanley: there's a great write-up on this in the Wall Street Journal and also Business Insider. They have a lot of legacy code, including mainframe code, including, and I hate to say it because I'm an old Perl head, 00:06:19:12 - 00:06:40:01 Unknown I've been writing code since the 90s, but Perl legacy code, COBOL legacy code, mainframe natural language. And they built an agent that goes through, grabs context, and builds developer specifications to allow developers to help modernize the code. Right? So it's not a full end-to-end solution; again, we're not there yet, but it reads legacy code and creates developer specs. 00:06:40:07 - 00:07:09:05 Unknown And they're saving about 300,000 hours annually just by eliminating the reverse-engineering step of understanding how that legacy code works. Faire has built an automated code review system that they trigger off of a pull request. It pulls context, surrounding code, and documentation, and provides pull request comments as a first-pass code review. And they're completing about 3,000 reviews a week of relatively low-risk stuff right now with that agent. Canva is using this for PR generation. 00:07:09:07 - 00:07:30:09 Unknown So they actually have project managers that are able to just say, this is what I want to change in the system, and it generates very engineering-friendly specifications, which is often a problem, right? Getting a good, clean handoff from product into specs that developers can actually understand. It has MCP servers that expose context and documentation, connects to Jira, and even creates mockups in Figma. 00:07:30:11 - 00:07:52:20 Unknown Spotify is handling 90% of their incidents right now using agents that understand SRE runbooks. They can detect problems and monitor for alerts, and then they can suggest runbook steps directly in SRE channels in Slack, handling 90% of the incidents at Spotify. In fact, they're starting to call their new AI-enabled, or AI-native, SDLC the Spotify 2.0 model. 00:07:53:01 - 00:08:10:11 Unknown That's kind of what we're moving into. Okay, let's get into the meat of this thing. So first of all, again, what we wanted to do is seek out these companies that were at the top of that graph I showed you. We wanted to figure out what they were doing well, and how we could help other people move to that state. 00:08:10:13 - 00:08:35:23 Unknown So we wanted to determine the highest-value prompting practices and best practices, and also high-value use cases, and we used data to do this. So we interviewed a lot of senior leaders from these companies who were doing well and just asked them: what are you enabling your engineers to do? What are some of the prompting best practices and things like that that your engineers have become reflexive in their use of? And where we found overlap in those interviews, 00:08:35:23 - 00:08:52:19 Unknown we put them in this guide. Then we also surveyed developers directly. We wanted to look for developers that were saving at least an hour a week; by the way, the average time savings has moved up to about 3.5 hours a week right now. Again, not the bottleneck, because we're losing hours and hours to meetings and context switching and all kinds of other stuff. 00:08:52:21 - 00:09:13:18 Unknown But time is being saved.
And so we surveyed these developers, and we asked them to stack rank: what are your top five use cases for AI that you're getting really reflexive about? And we created sort of a top ten list that went in this guide. So this is all, sort of, data-driven. Now, if you'd like a copy of this guide, totally free, it's a 65-page PDF, 00:09:13:20 - 00:09:29:21 Unknown you can download it here. And it goes through the same practices and use cases that we're going to go through in this workshop. I do need to move on; if you don't get a chance to capture it, though, don't worry, I will show this slide again at the end of the session so you'll have another opportunity. 00:09:30:01 - 00:09:47:05 Unknown If you miss that opportunity, come talk to me and I'll show you where you can get the guide. I am proud to say that this guide has become required reading now for a number of engineering organizations, so we're hoping that it's been very helpful for folks. Okay. So let's talk about some prompting best practices first. 00:09:47:07 - 00:10:10:01 Unknown Meta prompting. All right, this is the idea of actually paying a lot of attention to the structure of the way that you put together your prompt. These are still just probabilistic machines. They do way better when the context that's provided to them and the prompt that's provided to them are really well structured, when you give them a lot of clues about how you want them to execute their workflow, 00:10:10:05 - 00:10:26:13 Unknown and then, moreover, the way that you want them to structure their output, right? So you can just go in and put in sort of a lazy prompt and go back and forth and tune the thing, or you can be thoughtful about the way that you want the model to execute its work from the very beginning, and you'll save yourself some time in that back and forth. 00:10:26:18 - 00:10:42:10 Unknown So you can see here, this is a very simple example. Instead of just saying fix this Spring Boot error and then giving it the error, we're saying debug the following Spring Boot error, pasting the error details in, and then we're giving it some bullet points. We're saying, okay, identify and explain the root cause; that's the first part of your workflow. 00:10:42:16 - 00:11:00:05 Unknown Then provide a fixed version of the problematic code, and then suggest best practices to prevent similar issues in the future. Now, I could end there, and it might be able to infer the kind of output that I want if I just stopped the prompt there, but it will definitely give me the kind of output that I want 00:11:00:05 - 00:11:18:05 Unknown if I take this one step further and am very explicit about how I want the output to look. So I'm saying: give me, number one, the error message and stack trace, summarized; number two, a root cause analysis; number three, fixed code with comments; and number four, some preventative measures. And then it will definitely output the way that I want it to going forward. 00:11:18:11 - 00:11:37:21 Unknown And this can work well for assistants and also work well for agents. So meta prompting: this came up number one, overwhelmingly, from the people that we interviewed. Now, this next one, though. I mentioned that I've been writing code professionally since the late 90s; I wrote my first bit of code in the 80s on an old Tandy Radio Shack model, in BASIC.
00:11:37:23 - 00:12:11:12 Unknown I'm on this journey too, okay, but this is one of the use cases that I have become really reflexive about, and it has absolutely changed the way that I work: this is called recursive prompting, or prompt chaining. Now, at a high level, this just means having one kind of conversation with a model, taking that conversation transcript, passing it into a different kind of model to get a different type of output, like a specification, and then maybe even taking that specification and asking another model to scaffold code. And what it's really good for is having brainstorming sessions, right? 00:12:11:12 - 00:12:33:02 Unknown So you can see here in this example, this is just a silly example, I'm saying: I'm a mobile developer, you're a senior React Native architect. And this sentence has to appear: let's have a brainstorming session where you ask me one question at a time about the following requirements. And then I'm going on to say that I have two separate iOS and Android repos, and I want to convert them all into React Native. 00:12:33:02 - 00:12:52:12 Unknown And this isn't about whether you like React Native or not, it's just an example. But the point is that the bot will then go on to ask me one question at a time about the problem that I'm trying to solve, and we'll have a very comprehensive conversation. It's almost like a two-way rubber-ducking kind of session, where I'm saying what I want to do, but the bot is actually asking me questions very comprehensively. 00:12:52:12 - 00:13:11:13 Unknown Now, I don't know about you, but I always forget stuff in the planning phase. Whenever I'm sitting down, there's always some gap somewhere, or something I didn't consider. This has been instrumental in helping me make sure that from the very beginning, I get the right type of spec. So you're seeing it ask me, okay, number one, what are the main features and functionalities of the existing mobile app? 00:13:11:15 - 00:13:35:12 Unknown And then it's asking me some more questions, and these all came out word for word from the example when I was generating it. Final question for now: are you targeting just phones, or does the app need to support tablets or other form factors? Right, so you get the point. I could then take that full conversation and feed it some meta prompting as well and say, okay, now take this whole chat transcript and come up with a step-by-step specification for how to implement this. 00:13:35:18 - 00:13:54:23 Unknown Give me a step number, a summary of converting the feature, an example prompt, and maybe even example unit tests for what I'm trying to do. And then give it the full transcript from before. Then you'll end up with a spec, and you could stop there and hand that spec to a developer and go on, or you could pass it into a code-generating model and ask it to scaffold out your project. 00:13:55:02 - 00:14:14:05 Unknown Now, I used to have to do this pretty manually, and you still kind of do in Cursor and Claude Code and some of the other solutions. But Amazon Kiro, I got a chance to try the public preview for that. Anybody else tried Kiro yet from Amazon? It's really cool, and there it's a first-class citizen. So when you start a new project, it actually takes you through this whole brainstorming workflow first. I was very pleased to see that.
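To make the chaining mechanics concrete, here is a minimal sketch of the conversation-to-spec-to-scaffold handoff described above. The ChatModel interface, the prompts, and the stub in main are hypothetical stand-ins, not any particular vendor's API; in practice this happens inside whatever assistant or agent you actually use.

```java
// A minimal sketch of prompt chaining: conversation transcript -> spec -> scaffold.
// "ChatModel" is a hypothetical interface standing in for a real model client.
import java.util.List;

public class PromptChainSketch {

    interface ChatModel {
        String complete(String prompt);
    }

    static String chain(ChatModel specModel, ChatModel codeModel, List<String> conversationTranscript) {
        // Step 1: the brainstorming conversation has already happened; join the transcript.
        String transcript = String.join("\n", conversationTranscript);

        // Step 2: hand the transcript to a second model and ask for a specification.
        String spec = specModel.complete(
            "Take this chat transcript and produce a step-by-step implementation specification. "
          + "For each step give a step number, a summary, an example prompt, and example unit tests.\n\n"
          + transcript);

        // Step 3: hand the spec to a code-generating model and ask it to scaffold the project.
        return codeModel.complete(
            "Scaffold a project that implements the following specification:\n\n" + spec);
    }

    public static void main(String[] args) {
        // Stub model so the sketch runs without any real API behind it.
        ChatModel echo = prompt -> "[model output for: "
            + prompt.substring(0, Math.min(60, prompt.length())) + "...]";
        System.out.println(chain(echo, echo,
            List.of("Q: What are the main features?", "A: Two repos, iOS and Android...")));
    }
}
```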
00:14:14:05 - 00:14:42:02 Unknown Okay, one-shot or few-shot prompting: similar to meta prompting, except in this case we're actually providing example code, right? So beyond just coming up with a particular structure, we're actually saying, okay, here's an example of the code that I want you to produce, right? Stylistically and semantically. So as opposed to a lazy zero-shot prompt, which gives it nothing, just saying, hey, build this for me, we're actually saying, okay, here's an example of a structured REST API design. 00:14:42:04 - 00:15:01:19 Unknown Then I'm giving it a simple @RestController annotation and a @RequestMapping. Then I'm saying, okay, now generate a HelloController that uses a HelloService layer, returns a data transfer object instead of plain text, and follows RESTful semantics. Right? So you see, this is kind of like meta prompting, except we're very specifically saying: this is the way that I want you to write the code 00:15:01:24 - 00:15:24:04 Unknown when you output it. All right. I mentioned before, we're going to look at some things to make these models behave better. One of them is having a good feedback loop for keeping your system prompt up to date. In Cursor these are called Cursor rules, and there's also your agents markdown file; there are various versions of this. The idea is that you want to find the one that extends to as many people on your team as possible, right? 00:15:24:04 - 00:15:42:20 Unknown For a couple of reasons. First of all, it helps everybody to have a system prompt that's up to date and tuned well for the way that you want it to behave, as opposed to just helping you. I've definitely seen engineers who keep a local rules file, and certainly some of these solutions have markdown that will let you do the same thing. That increases your context window, and it doesn't help the whole organization. 00:15:42:20 - 00:16:02:01 Unknown So you want to come up with a feedback loop where, when the model does something it's not supposed to do, you have a way of providing that feedback to somebody who owns the system prompt and can keep it up to date for everybody. I've seen people put this in source control for a more decentralized organization, and just have a system prompt that's in source control that everybody can contribute to, and it has its own version control. 00:16:02:01 - 00:16:23:12 Unknown Any of that's fine, whatever works for your culture. But the point is, do this, right? Build this feedback loop. So you can see the example here. It's basically saying, hey, you've been observed making the following errors: you're providing outdated Spring Boot versions, you're suggesting or using deprecated methods, you're returning snippets that have syntax errors. So going forward, always provide code snippets that are Spring Boot 3 or greater, 00:16:23:12 - 00:16:46:02 Unknown in this case; verify that you're not using deprecated methods; double-check that any code snippets are syntactically valid. And then I'm even giving it a specific output that I want. So I'm saying, okay, when you're responding, give relevant explanations of your code choices, and other bits of meta prompting there, right? So, good meta prompting inside of the system prompt, one that you keep up to date, because it can get stale as rules change within the organization. It's just mostly important to have a feedback loop to keep it up to date and have a way to provide that feedback to whoever's keeping it up to date.
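For reference, here is a sketch of the kind of structure the few-shot REST prompt above is steering the model toward: a controller delegating to a service and returning a DTO rather than plain text. It assumes Spring Boot 3 on the classpath; the class names, route, and message are illustrative, not the talk's actual slide code.

```java
// Illustrative target output for the few-shot REST prompt: controller -> service -> DTO.
import org.springframework.stereotype.Service;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

// Data transfer object instead of a raw String (serialized to JSON by Spring).
record HelloDto(String message) {}

@Service
class HelloService {
    HelloDto greet() {
        return new HelloDto("Hello, world");
    }
}

@RestController
@RequestMapping("/api/hello")
class HelloController {
    private final HelloService service;

    HelloController(HelloService service) { // constructor injection
        this.service = service;
    }

    @GetMapping
    HelloDto hello() { // RESTful: GET /api/hello returns JSON, not plain text
        return service.greet();
    }
}
```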
00:16:46:04 - 00:17:06:04 Unknown Okay, multi-model adversarial engineering. This was one of the first ones: can we actually have multiple models evaluate each other's solutions, to see which one they think is better? 00:17:06:06 - 00:17:30:03 Unknown The nice thing is, at least right here in 2025, these models have absolutely no ego about what they're doing. They will totally admit when they've done something wrong. Although, and your mileage may vary, I've given this workshop a few times now, and I have had people say, hey, if you tell Copilot that this is a ChatGPT solution, maybe the marketing people have gotten to it a little bit and it starts guarding its own brand. 00:17:30:05 - 00:17:49:11 Unknown So I don't know, it may work better if you just don't mention where the solution came from, but you get the idea, right? We're saying: here's a solution from ChatGPT, evaluate its correctness, note any potential improvements. We're feeding that to Copilot. Then we're doing the opposite thing, where we're saying, well, here's what Copilot said. Now, this is very manual, 00:17:49:17 - 00:18:14:18 Unknown but we're already moving into architectural solutions for this with things like agent-to-agent protocols and AI gateways. And some of these solutions, too, like the quote-unquote reasoning models, are already doing some of this work behind the scenes as well. But it's just something to be aware of: if you're not quite sure, and you want to make sure you're driving the best solution, and you already have multiple solutions available, like all of us do, because everything's moving so quickly we can't really land on one anyway, 00:18:14:20 - 00:18:32:07 Unknown you may as well just go and have the models test each other and see which one they think is the best solution. Okay, multi-context prompting. I spoke to one engineering leader who said that when their engineers use Whisper or voice-to-text, or just click the microphone button in the solution, they're 30% faster, right? 00:18:32:09 - 00:18:56:12 Unknown It's almost like ultimate vibe coding, because you're talking to it and things are just appearing, right? The point is, don't be afraid to move beyond text with these models. Think about interacting with voice. Certainly think about uploading pictures; they're getting better and better at understanding pictures. I was talking to a company that's already set up an MCP loop between their browser tools and the back end. 00:18:56:12 - 00:19:12:23 Unknown So if something fails in the front end, it literally just automatically screencaps developer tools and sends it into the agent, so that it can understand the browser error just by reading the screenshot. I've been doing that too with front-end work. This is a silly example here where we have a decision tree. It's: what do I do today? 00:19:12:23 - 00:19:32:05 Unknown Am I indoors or outdoors? And if I'm indoors and alone, I should read a book, and if I'm in a group and indoors, I should play a board game. And if I'm outside and alone, I should take a run, and if I'm in a group, I should go play pickleball. That's a big thing in the US right now. But the point is, instead of trying to type out this full decision tree, don't be afraid to just upload a picture. These models are really getting better at this, so think about the different contexts in which you can engage the model.
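As a concrete illustration, this is roughly what a model might hand back after reading that decision-tree image, rather than you typing the tree out by hand. It is a trivial sketch of the tree exactly as described in the talk, nothing more.

```java
// A sketch of the "what do I do today?" decision tree from the uploaded picture.
public class WhatDoIDoToday {

    static String decide(boolean indoors, boolean alone) {
        if (indoors) {
            return alone ? "Read a book" : "Play a board game";
        }
        return alone ? "Take a run" : "Play pickleball";
    }

    public static void main(String[] args) {
        System.out.println(decide(true, true));    // Read a book
        System.out.println(decide(false, false));  // Play pickleball
    }
}
```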
00:19:32:09 - 00:19:52:10 Unknown All right. So beyond the system prompt, we also have another lever here we can pull for making the model more or less deterministic. This is called altering the temperature of the model. Temperature is heat, is entropy, is randomness; that's how you connect the dots there. 00:19:52:10 - 00:20:11:05 Unknown The thing is, when these models go and predict the next token to give you, they're not just saying, okay, this is the token. There's a big matrix of them, and the randomness that's applied when picking the token from that matrix is the temperature. So you actually have some control over whether it's going to pick the first one, or whether it might pick one that's later in the list. 00:20:11:07 - 00:20:31:08 Unknown It's kind of a crude way of explaining it, but it's sort of how it works. The temperature moves between zero and one, so 0.0001 would be considered a low temperature. Don't use exactly 0 or 1; weird stuff happens when you set those absolutes, so you want to stay somewhere in the middle. And 0.9 would be considered a high temperature. 00:20:31:10 - 00:20:54:15 Unknown So you can see, if you combine a system prompt with a good deterministic setting for the agent, or even for the assistant... And you can experiment with this, by the way. Anyone using LM Studio or Ollama or Docker Model Runner? Highly recommend playing with these things. You can run a lot of open source models right on a decent laptop or even a gaming machine 00:20:54:17 - 00:21:13:09 Unknown and get out some interesting stuff. And you can always set the temperature in these and see what's happening. I have a low temperature in this example here on the right, where I'm saying: create a JavaScript method to render a gradient of colors from blue to red. And I'm not sure how well you can see that code, but character for character, it's exactly the same when I set a low temperature. 00:21:13:09 - 00:21:33:02 Unknown So I ask it that same prompt twice, and I get exactly the same output both times with a low temperature. Now I crank that temperature up to 0.9, and look at this. They're both valid solutions for the same problem, but you can see here on the left it's starting with an HTML block and it's using some CSS to do the rendering, 00:21:33:07 - 00:21:54:06 Unknown whereas on the right it's taking a wildly different approach: it's just using JavaScript, it's pulled in a canvas object, and it's manipulating that. Okay, so again, both of these are valid, right? But they are very, very different depending on the temperature that I've given it. So you can now see where the system prompt and temperature working together can give you more determinism, more predictable behavior from these models.
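Here is a minimal sketch of where that temperature knob actually lives when you call a model yourself. It assumes a local OpenAI-compatible chat endpoint of the kind LM Studio or Ollama can expose; the URL, port, and model name are placeholders for whatever you actually run, and are not part of the talk.

```java
// Minimal sketch: send the same prompt to a local OpenAI-compatible endpoint
// with an explicit temperature. Run it twice at 0.1 and twice at 0.9 and
// compare the outputs, as in the gradient example above.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class TemperatureSketch {
    public static void main(String[] args) throws Exception {
        double temperature = 0.1; // low = near-deterministic; try 0.9 for more varied output

        String body = """
            {
              "model": "local-model",
              "temperature": %s,
              "messages": [
                {"role": "system", "content": "You are a concise JavaScript assistant."},
                {"role": "user", "content": "Create a JavaScript method to render a gradient of colors from blue to red."}
              ]
            }
            """.formatted(temperature);

        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("http://localhost:1234/v1/chat/completions")) // placeholder local server
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}
```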
00:21:54:06 - 00:22:07:08 Unknown All right, so how about some high-impact use cases, then? I mentioned that the practices really just came from interviews, figuring out what SVPs are doing when they've been rolling this stuff out successfully. This one's way more straightforward: we just spoke to engineers that are already saving time with this stuff. 00:22:07:14 - 00:22:23:09 Unknown Give me your top five use cases that you're using right now that you think save you the most time, and we turned it into a top ten list. Now we're going to go through each of these. So I'm not going to go through all of these right now, but number one was stack trace analysis. All right, an interpretive use case, not even a generative use case, 00:22:23:14 - 00:22:39:07 Unknown and another one that I've had to learn to be more reflexive about. I'm an old-school Java developer; I was doing J2EE stuff. I am used to 200-line stack traces that I have to go through line by line and try to figure out what's wrong. And now, if you put some of these things in agent mode, like Cursor for instance, and the build fails, 00:22:39:11 - 00:22:55:16 Unknown it's already doing this for you, it's already interpreting this. It will save you a lot of time, and it's usually pretty right. So you can see here this example. This is just one where I had a build fail, and I asked it to analyze the stack trace, and look what it did. It returned: oh, you're missing dependencies, 00:22:55:16 - 00:23:13:00 Unknown and here are the dependencies that you're missing. And by the way, this is a Gradle Java project, so here's what you should update in build.gradle. And it's giving me some clues to make sure that those dependencies are going to be available. So the next time you find yourself going, oh, the build failed, let me go through and figure out what's wrong: 00:23:13:03 - 00:23:31:01 Unknown give it to the agent and see what it does. Give it to the assistant and see what it does. I know that this is hard sometimes, because we like solving puzzles, we like figuring out problems. This is why we became engineers, right? But we also have to start asking ourselves philosophically, what is toil now? 00:23:31:03 - 00:23:49:02 Unknown Right? And if toil is something that the agent can do accurately for us, and that saves us time, it's our responsibility to use it that way, right? Because we have it. So I know it's tough, but this one actually works really well. This is another one that I've tried to become more reflexive about: refactoring existing code. Yeah, sure, 00:23:49:04 - 00:24:06:22 Unknown why not? You've got maybe legacy code, or you have code that was written by another engineer, that you want to make more efficient or more readable. And that's effectively what I've done here. Now, silly example: I'm just having it calculate a sum up to a certain limit. So I'm like, okay, if I put in three, I want it to add one plus two plus three; 00:24:06:22 - 00:24:26:13 Unknown it should spit out six. Great. But then I ask it to improve it for readability and efficiency, and look what it does. It reminds me that I've got this great IntStream library in Java, and I could be using that instead of creating a for loop. Now I've got one line of code, right? As long as you know what an IntStream is, it makes the code a little more readable. 00:24:26:15 - 00:24:46:03 Unknown All right, mid-loop generation. So I mentioned before that it's a cool brainstorming workflow to go from conversation to spec to scaffolding code. Oftentimes that scaffolding will give you a functional layout as well; sometimes it'll just start filling in the functions for you too. But the next time you've got to write something that's at least fairly boilerplate, 00:24:46:03 - 00:24:59:22 Unknown think about it: could I just write the function header and maybe give a little comment about what this function is supposed to do, and allow the bot to just fill in the middle? I already know what I want the function to do, right?
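A small sketch of that fill-in-the-middle pattern: you write the signature and the comment, and the assistant proposes the body. It uses the Fibonacci example that comes up next; the body shown is the kind of completion you might get, not a verbatim tool output.

```java
// Fill-in-the-middle sketch: the signature and comment are yours,
// the body is what an assistant might suggest.
public class FibonacciExample {

    /** Return the first n numbers of the Fibonacci sequence. */
    static long[] fibonacci(int n) {
        // --- everything below is the assistant's suggested fill-in ---
        long[] result = new long[n];
        for (int i = 0; i < n; i++) {
            result[i] = (i < 2) ? i : result[i - 1] + result[i - 2];
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(fibonacci(10))); // [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
    }
}
```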
So maybe I can have a suggestion from the agent that'll fill it out. 00:24:59:22 - 00:25:17:15 Unknown It'll save me some time, and I can always go back and tweak anything I don't like, right? But this is another one to try to get reflexive about, especially when you're doing something that's quasi-generic, like, in that case, generating a Fibonacci sequence. Why not? And it does a pretty good job. Good old computer science 101 Fibonacci sequence, 00:25:17:17 - 00:25:40:12 Unknown and there it is, calculating that for me. Okay, controversial topic here: test case generation. But it came up in the survey. It's just the data; I'm just reporting on the data, don't shoot the messenger. But honestly, this has become a first-class citizen in some of these solutions now. For instance, Copilot's had this for a while, where it's just: generate test cases based on this. 00:25:40:14 - 00:25:58:21 Unknown Look, with plenty of exceptions, this is not always our favorite thing to do, right? We want to write the code, we want to write the functionality. We need the tests, don't get me wrong, we can't skip them and we've got to have them, right? I mean, that's how we keep our code working well. We especially need them now, when we're going to be generating some code with AI 00:25:59:02 - 00:26:19:18 Unknown and we want to be able to trust the outputs that we're putting into production. But again, not always our most favorite thing to do, right? So see what the assistant does in terms of writing unit test cases. You can even, and I've experimented with this one using front-end languages too, ones that use Jest, you can even say: take this class and give it 90% code coverage, right? 00:26:19:18 - 00:26:34:08 Unknown Not even just generate the unit tests, but: here's the coverage target, generate unit tests until you get there. And it'll do it. It'll go into a loop: it'll start generating tests, it'll run Jest in coverage mode, it'll spit out the output, see how close it is, read the output and be like, oh, I needed 90%, 00:26:34:08 - 00:27:00:06 Unknown I'm at 85, I'd better generate some more tests. So an interesting thing to experiment with. Okay, learning new techniques came up on this list as well. Yeah, you know, I mean, the thing can sort of understand the code in front of you, and you can ask it some questions when you're trying to learn a new thing. And this isn't really any different than what we've been doing for a while anyway, which is going to Stack Overflow or going to Google or whatever. 00:27:00:09 - 00:27:19:14 Unknown It's really the same thing, just a little bit faster data retrieval, kind of meeting us where we're at without having to go to a browser. But we're saying in this case, okay, I've got five years of experience writing Java and Spring. Great, I've given it some context about what I know. Show me how to create a Java 24 virtual thread in Spring. New thing, kind of new thing in 24. And it does it. It's like, okay, 00:27:19:14 - 00:27:36:03 Unknown well, to create a virtual thread you need to use an executor, newVirtualThreadPerTaskExecutor, and it's giving me a little bit more context about that. It's even giving me some example code. And this is here for me at 3:00 in the morning if the mood strikes me, right? I don't have to wait to get the answers that I want, 00:27:36:09 - 00:27:54:09 Unknown and it also understands the context of the code in front of me.
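For context, the executor mentioned in that answer is a standard Java API, Executors.newVirtualThreadPerTaskExecutor(), available since virtual threads landed in the JDK. Below is a minimal, framework-free sketch of it; how you wire it into a Spring app is left out of the example.

```java
// Minimal virtual-thread sketch: one virtual thread per submitted task.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class VirtualThreadSketch {
    public static void main(String[] args) {
        // try-with-resources: the executor waits for submitted tasks on close
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 5; i++) {
                int taskId = i;
                executor.submit(() ->
                    System.out.println("task " + taskId + " on " + Thread.currentThread()));
            }
        }
    }
}
```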
Now, I get asked about this one a lot. People ask me: but is that learning? You know, if you just let the AI do it for you? And it's a valid question, I think, but it's also the same question as: is going to Stack Overflow learning, when I'm just copying that in and trying to do something from it? 00:27:54:09 - 00:28:13:22 Unknown Right, that's on you. That's your responsibility, whether you want to just take whatever the agent gives you and go on autopilot, or you actually want to take the time to have it explain what's happening and learn from it, right? You have that choice, and you've had that choice long before AI. And I think there are always people who will be happy to hit the easy button, and always more curious people who want to understand what's going on, 00:28:13:22 - 00:28:30:04 Unknown and that tends to be this crowd, right? We're engineers, we like to understand what's going on. So take the time to learn; nothing's stopping you from doing that, and the agent can be a really great research assistant. Okay, I love this one. I'm often doing this in my office, and I'll point, on Zoom, for effect: 00:28:30:04 - 00:28:46:21 Unknown my Mastering Regular Expressions book is right over there on my shelf, and I haven't had to touch it in a year, which is true. Because if you have something that you're not doing every day, you kind of forget: what was this wildcard, or how did I do this pattern match? Generating regexes with these things 00:28:46:21 - 00:29:02:05 Unknown is pretty awesome, right? And it knows how to do it in most mainstream languages, and it does a pretty good job. So here I'm giving it some SLF4J log output, and I'm saying: all right, create a regex that will parse timestamps, usernames, IP addresses, user actions, and transaction IDs from log data in the following format. 00:29:02:07 - 00:29:24:04 Unknown And it does it. It gives it to me, and it creates a Pattern object for me to compile the regex, and it creates a Matcher object, so I can actually use it inline in my code. If you're comfortable either creating a system prompt or giving these things access to schema, they actually do a pretty good job traversing schema, understanding foreign keys, and generating SQL statements, right? 00:29:24:04 - 00:29:51:10 Unknown Again, if it's just not a thing that you do every day, this can be really helpful.
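To make the regex use case concrete, here is a sketch of the kind of Pattern-and-Matcher code that prompt tends to produce. The log line format and field names here are invented for the example; they are not the format from the talk's slide.

```java
// Sketch: compiled Pattern plus Matcher pulling timestamp, username, IP,
// action, and transaction ID out of a (hypothetical) log line format.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogParseSketch {

    private static final Pattern LOG_LINE = Pattern.compile(
        "^(?<timestamp>\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2})\\s+" +
        "user=(?<username>\\w+)\\s+" +
        "ip=(?<ip>\\d{1,3}(?:\\.\\d{1,3}){3})\\s+" +
        "action=(?<action>\\w+)\\s+" +
        "txn=(?<txn>[A-Z0-9-]+)$");

    public static void main(String[] args) {
        String line = "2025-06-01 14:32:07 user=jdoe ip=10.0.0.12 action=LOGIN txn=TX-98431";
        Matcher m = LOG_LINE.matcher(line);
        if (m.matches()) {
            System.out.println(m.group("timestamp") + " | " + m.group("username") + " | "
                + m.group("ip") + " | " + m.group("action") + " | " + m.group("txn"));
        }
    }
}
```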
Code documentation. Yeah, whether you're generating AsciiDoc or LaTeX or GitHub Markdown, or you just want more comments inline in the code because you didn't feel like writing them when you were writing the code, right? There are lots of ways that the agents and assistants can help us with documentation, because again, with plenty of exceptions, this is not a thing we always love to do, right? 00:29:51:10 - 00:30:07:20 Unknown Oftentimes we're in the flow and we're doing great, we're in an excellent flow state, everyone's leaving us alone and we've got focused work, and then it's like, oh, I have to document this thing now so the next person can read it. Right? And actually, again, these things are pretty good at interpreting; they're still better at interpreting than they are at generating. 00:30:07:22 - 00:30:25:02 Unknown And so they're able to generate inline comments or create AsciiDocs, or you can even have agents, and I'm seeing a fair amount of this now, where you'll have an agent trigger off of a pull request, and then it'll go and gather some context. It'll look at the code diff, it'll look at the surrounding code, and it'll update documentation, or at least put a draft of the documentation in front of you. 00:30:25:02 - 00:30:40:14 Unknown Then you can say, yeah, that basically looks right, and send it on its way. It can save you a lot of time here too, so I'm not surprised to see this come up. Okay, brainstorming and planning. I know we already covered this in the recursive prompting section, but it also came up in the survey, right? 00:30:40:14 - 00:31:09:20 Unknown So I had engineers who also answered that they felt like this was a very high-value use case for them. So just a different example, hopefully reinforcing the same point. In this case I've got a totally different problem, and I'm saying: I am a product manager and you're a senior software architect. Let's have a brainstorming session where you ask me one question at a time about these requirements (that sentence has to be in there again), and then come up with a specification that I can hand to a software developer. In this case, I'm saying I want to design an app that'll create an Elasticsearch index for a large table stored in Cassandra; 00:31:09:20 - 00:31:28:02 Unknown help me design a bulletproof, zero-loss system to do this. This is definitely one of those things where I usually forget something in the planning phase, because there's so much to designing a zero-loss, bulletproof data system, but look what it's doing. Question number one: are we optimizing for full-text search, fast lookups, analytics, or something else? 00:31:28:04 - 00:31:46:03 Unknown Will the data in Cassandra be static, append-only, or frequently updated? How large is the data set: number of rows, data volume, expected growth? These are definitely the questions that I would be asking myself if I was going through a planning phase, right? So again, a really nice, comprehensive way of getting that early planning, you know, just right 00:31:46:03 - 00:32:04:15 Unknown and bulletproof. Okay, we talked about this a bit: the second thing that we can do is scaffold out initial code. Yeah, I mean, sometimes just getting started is hard. It doesn't have to be a perfect scaffold, but putting out, okay, these are my base classes and this is how I want to lay out my project, all this kind of stuff, that can take time. 00:32:04:17 - 00:32:24:02 Unknown And we get too nitpicky sometimes, you know, and oftentimes a decision just needs to be made, and then we can start flowing and coding after that point and go fix whatever needs to be fixed later. So I'm saying: create a code outline for a Java app to listen on a Kafka topic and create a multicast pattern to three different endpoints. 00:32:24:02 - 00:32:44:03 Unknown So, good old Postgres, a RESTful POST endpoint, and an SMTP endpoint. Great. And look what it did. It structured out my subscriber class for Kafka, there it is, KafkaConsumerService. It created the three producer endpoints with PostgresService, RestService, EmailService, and created an AppConfig. I said I wanted it to be Java; 00:32:44:05 - 00:33:02:06 Unknown I didn't specify whether I'd be using Maven or Gradle, so it actually gave me options for both. It's like, here's a pom.xml if you want to use Maven, here's a build.gradle if you want to use Gradle, and I can go from there. Okay, now I kind of lose that initial paralysis at the beginning.
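Here is a rough structural sketch of that scaffold: one Kafka subscriber fanning each message out to three endpoints. The class names follow the talk's description, the bodies are stubs, and the actual Kafka, JDBC, HTTP, and SMTP wiring is deliberately left out so the outline stays self-contained.

```java
// Structural sketch of the Kafka multicast scaffold described above.
import java.util.List;

interface MessageSink {
    void deliver(String message);
}

class PostgresService implements MessageSink {
    public void deliver(String message) { /* insert into Postgres here */ System.out.println("postgres <- " + message); }
}

class RestService implements MessageSink {
    public void deliver(String message) { /* POST to the REST endpoint here */ System.out.println("rest <- " + message); }
}

class EmailService implements MessageSink {
    public void deliver(String message) { /* send via SMTP here */ System.out.println("smtp <- " + message); }
}

class KafkaConsumerService {
    private final List<MessageSink> sinks;

    KafkaConsumerService(List<MessageSink> sinks) {
        this.sinks = sinks;
    }

    // In the real scaffold this would be driven by a Kafka listener on the topic;
    // here we only show the multicast step.
    void onMessage(String message) {
        sinks.forEach(sink -> sink.deliver(message));
    }
}

public class AppConfig {
    public static void main(String[] args) {
        KafkaConsumerService consumer = new KafkaConsumerService(
            List.of(new PostgresService(), new RestService(), new EmailService()));
        consumer.onMessage("example payload from the Kafka topic");
    }
}
```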
Similar to learning new techniques: explaining code. 00:33:02:08 - 00:33:17:14 Unknown Yeah, sure. I mean: explain the purpose of each annotation in this Spring Boot controller class, and then I give it some code. Maybe this is code that a developer wrote five years ago and then left. Maybe it's legacy code. Maybe it's just something I haven't seen before, or haven't looked at in a long time, and it's breaking it down for me. 00:33:17:14 - 00:33:36:08 Unknown It's like, okay, here's each annotation: you've got your @SpringBootApplication, you've got your @RestController, you've got your @GetMapping annotation. Then it's reminding me that the @SpringBootApplication annotation is actually just sugar that's comprised of three annotations, configuration, auto-configuration, and component scanning, and it goes on from there. So again, a great way of learning new things. 00:33:36:13 - 00:33:55:03 Unknown It's up to you whether you actually want to learn, but you certainly can with these tools. Okay, so some next steps. First of all, use this guide as a reference for integrating AI into your various workflows; I'll show the PDF again in a moment, and you can come find me. Determine a method for measuring and evaluating GenAI 00:33:55:03 - 00:34:16:15 Unknown impact, not just utilization metrics, not just, okay, who's using this: daily active, weekly active. That's okay for understanding how the tech is proliferating through the culture, but we really want to correlate that utilization to the metrics that actually matter. It's so easy to get caught up in the hype and forget that this is still about developer productivity and developer experience, 00:34:16:19 - 00:34:29:07 Unknown and so those foundational metrics that have served us well are still very much in play now. So we want to track and measure the adoption and iterate on those best practices and use cases. One more opportunity, if you didn't already get a chance to grab this. 00:34:29:07 - 00:34:35:20 Unknown So how should we measure? What should we be looking at? Well, there are kind of three categories of metrics available to us right now, 00:34:35:20 - 00:34:57:03 Unknown and we've got to use all three, because they're all going to tell us different stories, right? We have our telemetry metrics. This is the stuff coming out of the API. So Copilot provides this, Cursor provides this; we helped the Anthropic folks actually build their API, so they have some good metrics as well. They're kind of good for measuring the impact on developer output, like how often stuff is being used, 00:34:57:05 - 00:35:18:08 Unknown but they don't tell a complete story. Like, for instance, everybody wanted acceptance rate versus suggestions, until they realized that you have to click accept in the IDE for the API to actually know about that. So it's not necessarily going to be accurate. Or maybe you accepted the code and then you refactored every line of it, right? So that's also not telling the same story. 00:35:18:10 - 00:35:37:07 Unknown Maybe it made a suggestion and you just typed it: oh, that's a good idea, I'm going to do that. Or maybe you copied and pasted it. The API doesn't know about any of that stuff. And we also need to be able to correlate that utilization to what actually matters: the foundational productivity metrics that aren't just telling us what's happening with the AI, they're telling us whether it's working or not.
00:35:37:09 - 00:35:55:06 Unknown So this is where we need experience sampling: the ability to get small bits of data by catching developers in their workflow, so that we don't force them out and make them context switch off to something else. This could be issuing a PR and having a new checkbox in the PR that says: I used AI to work on this PR, or I enjoyed using AI to work on this PR. 00:35:55:07 - 00:36:08:03 Unknown Whatever it is, we can get these small bits of data. Not great for collecting large amounts of data at once, though; that all has to be aggregated. We don't want to do a ten-minute survey in every pull request, but we do want to do surveys; we just want to do those periodically. That's the last bit of data here. 00:36:08:07 - 00:36:28:01 Unknown We want highly effective surveys, and I have a very strict definition of that. We maintain 95%-plus participation rates in most of our customer surveys, and 90% in all of them, even with thousands of engineers. Yeah, I know, I know, and we do. It's really hard, and if you want to learn about how we do that, I'm always happy to talk about it. 00:36:28:05 - 00:36:54:06 Unknown But once you get participation rates like that, you can really trust a lot of these qualitative metrics, especially when you cross-reference them with the experience sampling and the telemetry. And so with that in mind, we actually did build the first one of these: this is our DX AI Measurement Framework. We were first to market with this. It took what we already knew from our Core 4 metrics, which are comprised of DORA, the SPACE framework, and the DevEx metrics, all distilled into a single metric framework. 00:36:54:11 - 00:37:14:18 Unknown So we're using the same stuff. We're using, in this case, the dimensions of utilization, impact, and cost, and we have a number of prescribed metrics that fall into those dimensions. This is sort of a maturity curve, too: most organizations will start on the left, capturing basic utilization metrics, and then immediately start wondering, okay, well, is this actually working? 00:37:14:24 - 00:37:34:11 Unknown In which case they'll start moving into these impact metrics that you see here: how is utilization correlated to improving velocity, improving quality within the organization? And then, yes, cost. Although I do love to point out that 15 years after the last hype cycle, cloud, ended, we still have brand-new companies telling you they'll help you control your cloud costs, 00:37:34:13 - 00:37:45:00 Unknown so we'll see how long that takes. On the flip side, I have already heard horror stories about people burning through $2,000 worth of tokens a day, so we probably do have to do something, but I think it's going to be kind of on the right of this continuum. 00:37:45:00 - 00:37:52:11 Unknown I had a Q&A slide that came after that and I had to take it out, and I probably should have said something. Anyway, thank you. I hope that was helpful.