

Supercharging Cross-platform UI Development with Gen AI | Peter Schneider

UI development tomorrow will be nothing like today. Every step in the UI creation process will be accelerated by generative AI. This presentation outlines how software makers can optimize time to market by utilizing AI and computer vision. Peter Schneider describes how AI-powered capabilities can help software developers do what they love most: be creative and write great code. The presentation covers UI development acceleration areas such as UI design-to-code conversion, coding, and unit/GUI test case generation, with practical examples.

Transcript

Let's talk a little bit about accelerating UI development. I work for the Qt Company, and we got kind of obsessed with what happens with GenAI and software development. We are powering roughly a million developers: we build a UI framework, IDEs, and other tooling for a million developers across 70 industries. So if you have a premium German car with a touchscreen, it's probably powered by Qt. The ultrasound device at your doctor's office is probably powered by the Qt UI framework. The coffee machine in your office, too. My colleagues sometimes say that in the developed world it's hard to be 50 meters away from anything powered by Qt, because most touch devices, embedded devices, and some high-performance cross-platform applications like CAD programs are powered by Qt. Just to set the scene: we are, and have to be, obsessed with software development and what comes next, because that's our core business. We are the biggest software company in Finland on the Helsinki stock exchange, so we need to be obsessed with what happens next. I'm a product manager, and I had the fortune of being allowed to play around with GenAI in the software development cycle and to gather advice and experience on what is happening in this area, particularly with large language models and code generation. So I want to talk about that a little bit.

You all know about the GitHub Copilots of this world, but in UI development GenAI plays a big role in at least these three areas: UI design, code generation, which we all hear about, and increasingly also quality assurance. If you look around, you will find applications that, based on a human prompt, create a complete UI. You just say "create me an app for..." and describe an Airbnb kind of app, and it is able to write you a whole software application: the UI, the different screens, the navigation in between. Of course their marketing departments are very good; if you try it yourself you will notice that some human intervention is still necessary, and you sometimes need to tell it "do this and that". So the world is not quite perfect yet, but you can already see where this is heading. In terms of UI design it's a revolution: going from a human prompt, just English describing what you want, to a first prototype, a first ideation. As a product manager I'm suddenly capable of communicating my vision, even to the UI designer, with a single prompt. These things are happening; they're not perfect yet, and if you try them out you might sometimes be disappointed, but you can see where the train is heading.

The next topic is, of course, writing software. I guess everybody in the room has heard about GitHub Copilot, and Cursor is the new kid on the block in terms of the coolest hype in code generation. These tools are important, they're adding value, and we'll talk more about that through this presentation; for us at Qt it's a super relevant area. Last but not least, Qt is nowadays also very invested in quality assurance: we do static code analysis and GUI automation testing. So we are also looking at GenAI not only writing unit test cases, but in the future potentially also writing GUI test cases: push that button, move that slider, change that stack.
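To make the prompt-to-UI-code step concrete, here is a minimal sketch of what it can look like in practice. It assumes the openai Python package with an OpenAI-compatible endpoint and an API key in the environment; the model name and the prompt are illustrative, not something the speaker describes using. The workflow, not the specific model, is the point: a natural-language prompt goes in, a first QML draft comes out, and a human reviews the result.

```python
# Minimal sketch: ask a chat model to draft QML for a UI element from a plain-English prompt.
# Assumes the `openai` Python package and an OPENAI_API_KEY in the environment; model name is illustrative.
from openai import OpenAI

client = OpenAI()

prompt = (
    "Write a QML component with a Slider from 0 to 100 and a Label "
    "that always shows the current slider value."
)

response = client.chat.completions.create(
    model="gpt-4o",  # any chat-capable model; swap in whatever you have access to
    messages=[
        {"role": "system", "content": "You are an assistant that writes idiomatic Qt Quick / QML code."},
        {"role": "user", "content": prompt},
    ],
)

# The generated QML still needs human review before it goes anywhere near production.
print(response.choices[0].message.content)
```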
So these are three areas where a lot is happening, and these are the things we've been interested in. But you will notice that these are only three areas; in reality GenAI will impact the whole software development cycle. As the Qt Company we are mainly interested in the part from the handover from design all the way to the automation of user interface testing, because that is our core business, so we sit on the right-hand side of the cycle. But you can already see GenAI playing a significant role in other areas. If you go over to the other booth, you will see a gentleman showing how, for example, you can generate user stories; it will create the security requirements for you, do the product owner's work, and create the tasks for those. So you will see agentic solutions, with a lot of different AI assistants coming together and accelerating the whole product life cycle.

Where will that get us? If you think about building an embedded device, say a car cockpit, a cluster UI with an IVI screen, that used to take two years. Nowadays the Chinese are doing it in six months. With agile software development and the automation of different processes we have moved from months to weeks for a single iteration; we all know the agile sprint, so we can do things much faster. But this is all with existing technology. This is how software has been developed for the last 30 years: yes, we have agile, we have certain automation capabilities, but how will software development look 30 years from now? We have done software pretty much the same way for 30 years. We have learned new things, we've made things a bit more complicated, we've added things like code documentation and unit test creation, we've automated a lot, and we've organised our teams in small and big formats, but essentially we are still doing the same thing I did 30 years ago in terms of software development. So where will we go? I don't have the answer, but certainly the iterations will be faster than weeks. From the moment you say you want a certain UI to function, to having a prototype, to running that prototype on your actual hardware, this will potentially happen within days. We are going from months to weeks to days.

We don't know how fast, because what is happening is actually rather interesting. There was a recent study from McKinsey: when they asked thousands of company employees who among them is using GenAI at work, about 68% said yes. That's the large majority, almost two-thirds of people saying GenAI is in use. But when the same McKinsey study asked how many of them are using it for software engineering, we lost a lot of them: 13%, only a marginal fraction. So what happened to the roughly 50 percentage points of people who claim to use GenAI but don't use it for software? There are challenges. One of them is that GenAI is not aware of your product. If you are into proteins, or you are mixing paints, or you are studying metals, or forecasting your retail automation, GenAI and these chatbots don't know your product. And for a lot of people who write software, unless they are writing in Python, these may not be the best daily tools: if they work in other languages such as Lua, COBOL, or Rust, all of a sudden these coding assistants are not that good anymore. That might explain it a little bit.
But there are also a lot of companies we talk to that are worried about IPR, about IP leakage. All these big players, whether it's Cursor, GitHub Copilot, or Anthropic, are black boxes. You are routing your code to them so they can predict what the next line of code should be; you are giving them your code so they can write a test case for you. They are all big black boxes, and companies that live from differentiated UI design, companies whose way of serving their customers is their competitive advantage, are understandably wary. They have all played around with GitHub Copilot, but many say that this stuff doesn't go into their production code quite yet. There is another reason they don't want it in their production code: they don't know what is behind that black box, what has been done, and what training data has gone into these models. Will there be a challenge two or three years down the road, where it turns out that some of the code your LLM generated actually infringes someone else's intellectual property? Even GitHub Copilot admits that about 1% of all responses are a direct copy from the training content, and you don't want to be in that 1%. So that is another reason to look at these code generators with a critical mind, and maybe it explains why so few people are actually using them in their daily life for code generation.

But like I mentioned, we are Qt, and we also needed to think about what we tell our customers: what is the right model? I spent a lot of time in the last nine months looking at different models and their capabilities, and I want to tell you a little bit about our journey in finding that out, because I have the same problem: general AI models don't understand our product very well. The Qt Company has a product, the UI framework, and we actually have a programming language called the Qt Modeling Language, or QML. It's like HTML for touch devices: essentially lots of buttons, sliders, and stacks. It runs very well and very efficiently on devices with limited power, and it's also very easy to write. It's a programming language known by a lot of developers, like I said, we have a million developers out there, but it's our language, a bit like a derivative of HTML. So we needed to check how good these common large language models are at programming QML. It's like learning Finnish: how well can this large language model speak Finnish? Most of them will fail, some of them will be good. So we needed to study how good they are, and we did. We created test tasks: write me the code for a button, a slider, a mandatory text input field, a stack view, a busy indicator. We created 100 tasks that are typical for writing UI software, gave those 100 tasks to the models, and benchmarked the different kinds of models that are out there.
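As a rough illustration of what such a benchmark loop can look like (this is not Qt's actual QML100 harness), each task is a natural-language prompt, the model's answer is written to a .qml file, and a static check such as Qt's qmllint tool gives a crude pass/fail signal. The generate_qml function is a placeholder for whichever model is under test, and the scoring is deliberately simplistic.

```python
# Sketch of a QML code-generation benchmark loop (illustrative, not the real QML100 harness).
# Assumes Qt's `qmllint` tool is on PATH and `generate_qml(prompt)` wraps the model under test.
import subprocess
import tempfile
from pathlib import Path

TASKS = [
    "Write a QML component with a Button labelled 'Start' that prints a message when clicked.",
    "Write a QML Slider from 0 to 100 with a Label showing the current value.",
    "Write a QML TextField that is mandatory and shows an error state when left empty.",
    # ... 100 such tasks in the full benchmark
]

def generate_qml(prompt: str) -> str:
    """Placeholder: call the model under evaluation and return its QML answer."""
    raise NotImplementedError

def passes_static_check(qml_source: str) -> bool:
    """Very crude scoring: does the generated file at least pass qmllint?"""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "answer.qml"
        path.write_text(qml_source, encoding="utf-8")
        result = subprocess.run(["qmllint", str(path)], capture_output=True)
        return result.returncode == 0

def run_benchmark() -> float:
    passed = sum(passes_static_check(generate_qml(task)) for task in TASKS)
    return passed / len(TASKS)  # fraction of tasks producing lint-clean QML
```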
What you see here is StarCoder, for example, one of the open-source models; then there is Mistral; and then there is GitHub Copilot, which is certainly the market leader, at least in terms of spread and in what people have tried, and it is nowhere close to being the quality leader. If I had to advise a customer who is willing to pay $40 per developer for an AI assistant, I would currently point you to Anthropic's Claude 3.5 Sonnet, because it is clearly better than GitHub Copilot at creating QML code. So we studied and tested these systems in a quantitative way, and you can already see that what is on everybody's mind, GitHub Copilot, is not the leader. I can't say "use this one"; if you have access and a choice for code generation, use somebody else. Even ChatGPT o1-preview, supposedly the best thing since sliced bread in terms of chain-of-thought reasoning, is actually not that much better; it scores, I think, just two or three percent better than the original GPT-4o model. The incremental improvements are very limited.

But that's not the only thing in that chart and that benchmark data. There is a lot we need to remember, and one question is: guess how many European models are in there? If the EU AI Act comes into full force, there is going to be more of a battle between the US and Europe over models. Already today we can't access the latest Llama models anymore, because Meta forbids it. So what if Meta, Anthropic and the others say in 2025 or 2026 that we won't get any of their models because they don't like our legislation or regulation? There is only one European model in here, and that is Mistral, the French one. If that kind of trade conflict were to escalate, all of a sudden we are down to one model, and in particular not the strongest one. There is also another problem in this chart: for how many models can we actually figure out what went into the training data? Again only one, and that is StarCoder. It is the only one where you can check which training data has gone in, so you can be reasonably sure there is no IP violation in there. The problem is that, out of the box, it is actually the weakest one at creating QML code for my customers. But it is the only fully transparent option, just as Mistral is the only European one. So there is a lot in this chart, and a lot we need to consider when we tell our customers what to use.

Also, as they typically say, size matters, and this is no different: the bigger, the better. It's simple math, and we tried it out, taking the small version of a certain model series and the bigger one, and, surprise surprise, the bigger one always does better. But there is also a sustainability impact: the bigger the model, the more energy it burns and consumes. The inference time, the time before it can predict something, is also longer, so the user experience is of course weaker. If you have a mega-big model like Llama 70B and you ask it something, it will take much longer to respond than if you ask the much smaller Llama 30B. So when you are choosing the right model for your product, and it doesn't need to be software, it can be anything you need to predict, you need to keep these things in mind: user experience, transparency, fit to your legal requirements, and the actual quality of the output, which is largely what we can quantify with our benchmark. These are the things we need to study.

But then of course we asked ourselves the next question: okay, we found the right model, how can we make it better? We are not Meta, we're not Apple, we're not Google; we don't have 50 billion lying around to pre-train a model of that size. There is, however, a method called LoRA fine-tuning, which is essentially fine-tuning on the last mile, so this is what we tried out, just to see how it works. We fine-tuned, for example, the Llama model and the European model, Mistral, and you can see how much better they got: roughly a ten percentage point improvement. Maybe that's a lot, maybe it's not, but if you think about how many billions Meta, the good old Facebook and Instagram company, put into building Llama in the first place in terms of compute, and then the few bucks we put in with what was essentially a research project, going up by ten percent is significant. With a fine-tuned open-source model like Llama we are now getting very close to what I showed you earlier, to what the largest models like Claude 3.5 Sonnet can do, just by providing good data. I think I've heard three presentations today saying that data is important, and that is absolutely right: when we do fine-tuning we use about a million tokens of human-validated, human-written code, so high-quality code. A million tokens is not much, but it can improve model performance significantly, and that helps. Again, this is just for our product. When your product is different, when you are creating paint mixes, studying proteins, or predicting some medical outcome, a forecast, or a banking investment decision, you again need to provide the training data. I'm just telling it from our point of view and what we can do for our customers.
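For readers who want to see what fine-tuning on the last mile looks like mechanically, here is a minimal LoRA sketch using the Hugging Face transformers, datasets, and peft libraries. The base model name, the LoRA hyperparameters, and the one-sample dataset are placeholders; this is not the speaker's training setup, only the general shape of such a run.

```python
# Minimal LoRA fine-tuning sketch with Hugging Face transformers + peft (illustrative settings).
from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base_model = "codellama/CodeLlama-7b-hf"  # placeholder; pick the base you actually benchmarked
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base_model)

# LoRA adapters: only a small set of extra weights is trained while the base model stays frozen.
lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                  target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # typically well under 1% of the base model's parameters

# Stand-in for ~1M tokens of human-validated QML; real, curated training data would go here.
samples = ["import QtQuick\n\nRectangle { width: 100; height: 100; color: \"steelblue\" }"]
dataset = Dataset.from_dict({"text": samples}).map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
    batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="qml-lora", num_train_epochs=1,
                           per_device_train_batch_size=1, learning_rate=2e-4),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("qml-lora-adapter")  # saves only the adapter weights, a tiny fraction of the base
```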
So there is a lot we have learned. There is the actual quality of output, how good a model is on our QML100 benchmark; there is what is relevant for transparency and what is relevant in terms of legislation, where you need to think about where the training data comes from; and there is the ability to train the model further with additional data. There are a lot of dimensions to think about just for code generation, and you can multiply that across different use cases: if you want to create unit test cases, if you want to automate UIs, if you want to create user interfaces for your industry, you need to keep all of these in mind and go through the same journey of benchmarking and identifying. Of course, companies like us will help you, but the use case also matters. Everything I showed so far in terms of coding performance is what is called instruct performance, in other words chat performance: I ask the model, can you write me a slider for a user interface that does this and this and this? In reality, a lot of code generation is not about talking English at all; it is only talking code, and there are models that can only talk code, they don't speak English. When you see GitHub Copilot suggesting the next line, the next button, or something like that, that is what is called code completion, and for that you actually need a different model. So all of a sudden the model we had settled on, Llama 70B being the best open-source one and the one we fine-tuned, turned out not to be the best here, and we had to switch models and go to Code Llama: the same herd of llamas, but a different model for a different use case.
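To make the instruct-versus-completion distinction concrete, the sketch below shows the two prompt shapes side by side. The chat_generate and complete_code helpers are hypothetical stand-ins for an instruct model and a code-completion model; fill-in-the-middle models such as Code Llama additionally wrap the code context in special sentinel tokens, which is omitted here.

```python
# Illustration only: the same QML need expressed as a chat request vs. a completion request.
# `chat_generate` and `complete_code` are hypothetical wrappers, not real library calls.

# 1) Instruct / chat style: the model is asked in English and answers with code plus prose.
chat_messages = [
    {"role": "system", "content": "You write idiomatic Qt Quick / QML."},
    {"role": "user", "content": "Can you write me a Slider from 0 to 100 whose value is shown in a Label?"},
]
# answer = chat_generate(chat_messages)

# 2) Code-completion style: the model only ever sees code and predicts what comes next.
code_prefix = """import QtQuick
import QtQuick.Controls

Column {
    Slider {
        id: volume
        from: 0
        to: 100
    }
    Label {
        text: """
# completion = complete_code(code_prefix)  # expected continuation: volume.value, closing braces, etc.
```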
We actually had to study which model fits which use case. If I have a GitHub Copilot chat kind of experience and I'm asking for expert advice, I need to use one model; but if I want to do code completion on the fly, while somebody is typing, and suggest the next line, I actually need to use a different model. Again, these are the things you need to learn: think about the use case and then find the right model. We noticed that expert advice on QML and code completion need different models, because the big Llama model doesn't really speak code; it speaks English, but it doesn't speak code, and that's a problem.

And here is my last piece of inside information: Cursor Small. Like I mentioned, Cursor is one of the most hyped coding assistants, and it is really, really good; I don't want to talk it down. But there is also something funky in there. How is it possible that when I configure Cursor so that the only model it's allowed to use is Cursor Small, their own in-house model, Cursor Small shows the same performance as Anthropic's Sonnet, 68%? That can't be a coincidence. There is always some variation in these benchmarks; run them twice and one or two test cases will pass or fail differently, but not that much. So I checked the privacy terms behind Cursor, and Cursor itself says in its privacy documentation that even if you choose to use only one model, because you're comfortable with it in terms of transparency and who runs and holds the data, Cursor is still allowed to route your requests anywhere else, and it will do so on your behalf. So I don't know whether they actually didn't use Cursor Small, which I had set up in the configuration, and sent my requests somewhere else, but the outcome was really odd, that close to the runner-up and the best one in terms of QML performance. Again it shows, and they honestly say, that they will route it the way they want. So if you own IPR and you want to be sure where your data goes, again you need to…