Pragmatic LLM usage with Nicholas Carlini
Nicholas, you're here.
Nicholas Carlini:I am.
Bryan Cantrill:This is great. You're, like, a regular now.
Nicholas Carlini:Yes. Very surprising to me.
Bryan Cantrill:Morton Kondracke. Am I wrong to be making McLaughlin Group references to myself here, Adam? Is that a very small demographic? Yeah.
Adam Leventhal:I am a 100% out on that one. Yes.
Bryan Cantrill:Oh, come on. We can't go to McLaughlin Group around here? You know what? What references can one make, if not to Jack Germond, Morton Kondracke, actually, Pat Buchanan, and, who's the, in addition to John McLaughlin.
Bryan Cantrill:Sorry. I'm making it worse. I'm missing the social cue that this was a reference you should not be making. I'm sorry.
Nicholas Carlini:I did
Bryan Cantrill:not ask you to elaborate on this. I asked you to stop, please. So I know. Nicholas, welcome back. As you can see, nothing has changed.
Nicholas Carlini:Thank you.
Bryan Cantrill:We're still jackasses. But this was a great piece. And for those who haven't listened to it, we had you on in March to talk about adversarial machine learning. I love that episode. That was so good. Thank you.
Bryan Cantrill:And it was great to talk about your research. I think we also kind of hit this when we had Simon Willison on: we have been trying to hit that pragmatic middle ground with respect to LLMs in particular, being wary of the potential downsides but also seeing this as a potential tool. And I think your blog entry just absolutely hit that right in the bull's eye. What was the impetus for writing this, if you don't mind me asking?
Nicholas Carlini:Yeah. I mean, it's basically this. I feel like most people who I talk to are on one extreme or the other, and are either, like, I'm terrified of these things, they're going to end the world, or nothing good has ever come from this. This is only hype. It's the latest thing that the blockchain people moved on to, and they just wanna do this to, like, make a quick buck and move on to whatever the next thing is.
Adam Leventhal:Gotta keep some of those GPUs. Right?
Bryan Cantrill:Right. This is like GameStop. This is like a meme stock. Yeah.
Nicholas Carlini:This is
Bryan Cantrill:not I agree. Right. This is like I know.
Nicholas Carlini:In some sense, they're not wrong about some of these people. Right? There certainly are a lot of people who are here only for that reason. But I feel like because the most vocal voices are one of these two people, someone who's normal kind of has to decide, like, which of these two futures do I believe in, and just ends up getting very put off by the whole thing. And I guess what I was trying to do is just say, like, there is a middle ground here. You can both accept that these models are brittle.
Nicholas Carlini:They don't work in all kinds of situations. There are real harms that may come from them. There are reasons you may not want them to exist in the first place. And yet, they also have some value. And I think it's important to understand this, especially, you know, for the security person.
Nicholas Carlini:You know, studying the security of something that no one uses doesn't matter. And so acknowledging that these things have value is important to be able to say, I do think people will use them in the future, and so we should actually study them the more because we believe people will actually use them.
Bryan Cantrill:Right. Which is to say, look at these things as a tool, and, like, what can this tool do? If I can just get this off of my chest here: I really do feel that part of the problem is in the nomenclature. The very idea of artificial intelligence, part of its allure, but also its challenge, is that it draws us in to anthropomorphize it.
Bryan Cantrill:And by doing that, we end up in one of these extremes. Because, sorry, if I'm meeting my replacement, it either needs to be my overlord or I need to dismiss this thing as a dunce. You're introducing me to this kind of alien intelligence. Like, okay.
Bryan Cantrill:Well, what if we called it something else?
Nicholas Carlini:Yeah. I agree. But, I mean
Bryan Cantrill:Well, and, actually, I thought it was interesting, because, look, we're clickbaity around here. You know? Game respects game when it comes to clickbait. And I like the fact that with "How I use AI" there's almost a juxtaposition in there. Right?
Bryan Cantrill:Where you're on the one hand using this term, on the other hand, like, really indicating that you're using it as a tool. And then really throughout the piece, you refer to LLMs, which I think is great.
Nicholas Carlini:I put a footnote the first time I used the word AI where I said something essentially like, I've intentionally used the word in the title because I wanted people who like AI to read this and be, like, excited by this. And I also wanted people who hate the word to rage click on it and be like, what is this idiot going to tell me? And so, like, hopefully get both of these people to at least look at the thing.
Bryan Cantrill:You know, I swear I did not read that footnote before making that comment, but yeah, the footnote is exactly that. I love the footnote. You're like, look, I'm doing it for marketing. Okay? I'm doing it for the SEO, doing it for the clickbait.
Bryan Cantrill:And again, we would be complete hypocrites; we've done this many, many a time. And I also love that you've got AI in kinda air quotes there. I mean, man, there's so much to love about this. I am gonna almost open with one of your conclusions because, Adam, I think this kinda reminds me of when people are evaluating, and, again, I'm gonna anthropomorphize it here, so I apologize; you know, the policing mind is becoming the criminal mind. But interviews go sideways when you are asking how this person could fail as opposed to how they could succeed.
Bryan Cantrill:And I love the fact that you're, like, evaluate what LLMs can do rather than what they can't. It's too easy to focus on what they can't do. And for any tool, you can focus on what it can't do. It was just interesting.
Adam Leventhal:So, Nicholas, that was such a great line, and that resonated for me as well, Bryan. And in particular, you could have followed up with some other examples, where people are saying, you know, this LLM can't even add two two-digit numbers. It's like, yeah. Well, you humans can't, like, divide 64-bit numbers in your head, either.
Adam Leventhal:So, you know, you're not good at everything yourself.
Nicholas Carlini:Yeah. I thought I thought
Adam Leventhal:it was a great imperative to see the value for what it is.
Nicholas Carlini:I know. Like, I think it's important for research to look at what models can't do and ask how could we improve them. Like, I understand the value, why someone would write a paper that says this. I mean, it makes sense. You want the things to be better.
Nicholas Carlini:So let's measure what they can't do. But I just feel like online, there's so many people who, okay, this was on Hacker News, and there were some people who were like, but the model can't tell me whether 0.8 or 0.11 is bigger. And I'm like, when was the last time they honestly needed to ask a language model if 0.8 was bigger than 0.11? Like
Bryan Cantrill:This this toothbrush can't even drive a nail.
Nicholas Carlini:I know. Yeah. And, like, I understand it's surprising to you that the model can't do this thing that humans see as trivial. Yeah. But tools have uses, and you figure out what the tool is useful for, and if it can solve your problem, I don't care if it can't do something else.
Nicholas Carlini:And it's kinda wild to be like, you know, 10 years ago, machine learning could do nothing. And I think part of the problem with the marketing of language models and everything is that now people expect they can do everything. So when they point out something that it can't do, they're surprised, which, I don't know, from my perspective, you know, it's amazing they can do anything. And so pointing out the things they can do is kind of impressive, but, yeah, this is sort of the
Adam Leventhal:Right. Your example of writing a poem only with words that start with the letter a. It's another one, right, where you say the fact that it can even give it a shot is radically impressive.
Nicholas Carlini:Yeah. So I don't know.
Bryan Cantrill:Yeah. It's I think it's
Nicholas Carlini:just an important thing to keep in mind. And, like, I'm I don't wanna dismiss this, but yeah.
Adam Leventhal:It's such a lazy critique as well to say, you know, it can't do these seemingly trivial things because I think there are deeper critiques.
Nicholas Carlini:Yes.
Adam Leventhal:You know, around, like, hallucination or pointing you in the wrong direction or giving you things that are so plausibly true. And one of the questions I have for you, Nicholas, is if you had instances where you felt like it wasted your time, because you gave so many terrific examples of it saving you time. And I agree. Some of these things, a critique could be, well, that's easy. It's like, yeah. Sure.
Adam Leventhal:It's easy. It would only take me an hour or 2, but it took me 2 minutes. So that was really valuable.
Nicholas Carlini:Yeah. I mean, it happens all the time where, you know, it tells you here's the answer and it's just wrong. But this is again a case where, like, how many times have you searched for some error message and hit some Stack Overflow thread, and you found the person who suggested the answer, and it's, like, accepted, and then you later realize, no, that person was absolutely incorrect. It had nothing to do with whatever the error message was. Someone just decided to, like, I'm on.
Nicholas Carlini:I understand how things could be incorrect on the Internet. Like, this is news to me. And so
Bryan Cantrill:Have you seen, Adam, Theo Schlossnagle's closing plenary from Surge 2011?
Nicholas Carlini:This is,
Bryan Cantrill:like, one of my favorite talks of all time.
Adam Leventhal:I don't think so.
Bryan Cantrill:Alright. So Theo was at Surge, which is this great conference that he ran for a couple years, then it just got to be too much and he stopped doing it, but Surge was great. And so he is gonna give the closing plenary, but he had no idea what he's gonna talk about. And meanwhile, he's chewing my ear off on how, like, miserable his previous 3 days had been.
Bryan Cantrill:And I'm like, Theo, this is your closing plenary. Like, right now, you're giving me your closing plenary. So you just need to go up on stage and just, like, give that. And that's what he did, and it was great. And it is so funny.
Bryan Cantrill:And just to your point, he wasted, like, a day of hard intellectual labor because he found a fix for his problem on the Internet, and someone had replied, that works for me. Thanks. He's like, okay. So clearly, this is the right fix. It actually was the wrong fix, and he came to the conclusion that, like, whatever asshole said that works for me.
Bryan Cantrill:Thanks. It's just wrong. And it's, like, if you really wanted to waste humanity's time, and maybe this is what'll happen when the singularity has actually arrived, the bots are gonna run around finding incorrect fixes and replying to them being like, that exactly fixed my Linux audio problem. Thank you. And meanwhile so yeah.
Bryan Cantrill:No. Exactly. You can easily get this stuff. And I think the and, Adam, that's a great question. And I guess, Nicholas, my follow-up would be it.
Bryan Cantrill:I think you kind of answered it. When it is pointing you in the wrong direction, how far down that path did you go? Because, like, Theo went really far down that path because he was convinced that he was wrong. Whereas I feel like when it points me in the wrong direction, it's like, okay, that's that's pretty good and wrong. Like, that's fine.
Bryan Cantrill:I'm not gonna but, Nicholas, what's your experience? Like, how often have you ended up down that path?
Nicholas Carlini:I think for the most part, I never really trust the thing very much. You ask for what you think it should do and it gives you some instructions, and if they sound plausible and they're like the kind of thing you might find online, then I'll try it. But, you know, maybe 1 or 2, 3 turns down the path, if it ends up that it's not solving the problem, I just move on. I'll go back to searching the Internet for whatever the random thing I'm looking for is, or just debug the problem myself, or whatever the case may be. There may be a possible future in which going 4 or 5, 10 rounds deep with these things, telling them to fix problems, will make them actually be fixed.
Nicholas Carlini:But for right now, if it hasn't figured it out in 1 or 2 rounds, I just give up on it and go back to something else. Yeah.
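The give-up-after-a-round-or-two habit described above can be written down as a small loop. This is only a minimal sketch under stated assumptions: `solve_with_budget`, `ask_model`, and `works` are hypothetical names invented for illustration, `ask_model` stands in for whatever chat interface is being used, and `works` stands in for actually trying the suggestion (compiling, running, and so on). Nothing here is taken from the blog post or from any real API.

```python
from typing import Callable, Optional


def solve_with_budget(
    ask_model: Callable[[str], str],   # hypothetical: sends a prompt, returns the model's answer
    works: Callable[[str], bool],      # hypothetical: tries the answer, reports whether it fixed things
    problem: str,
    max_rounds: int = 2,               # "1 or 2 rounds", per the conversation
) -> Optional[str]:
    """Ask the model a bounded number of times, then give up.

    Returns the first answer that works, or None, meaning: stop asking and
    go back to searching or debugging by hand.
    """
    prompt = problem
    for _ in range(max_rounds):
        answer = ask_model(prompt)
        if works(answer):
            return answer
        # Feed the failure back once, rather than trusting an endless
        # chain of apologies and rewrites.
        prompt = f"{problem}\n\nThis suggestion did not work:\n{answer}\nPlease try again."
    return None
```

The only design choice of note is the small default budget: a `None` result is treated as a normal outcome that costs a minute or two, not as an error.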
Adam Leventhal:One of the pieces of advice you had given was, you know, if it gives you an answer, for example to a compiler error, and you try it and say that didn't work, it gives you a different answer. And I've had that experience too. The other day, I was looking for a Rust crate that did a particular thing, and I described the thing. And it said, this Rust crate will do it. Here's some example code. And I typed in the code, and it did something completely different.
Adam Leventhal:Like, did not do what it described at all. And I just said, well, that didn't work. And then it's amazing how quickly it folds, saying
Bryan Cantrill:It totally folds.
Adam Leventhal:Oh, I'm sorry. Like, that was my mistake. Yeah. Right.
Nicholas Carlini:Yeah. No. And this happens quite a lot. And if it gets it right on the first or second try, then I think it's reasonable. But, you know, I've found for the current generation of things that, yeah, it won't go.
Nicholas Carlini:And so I feel like if you were to just trust this thing blindly, you could end up in some truly terrible cycles. I've seen this from some research stuff, just asking models to self-correct many, many, many times deep. And sometimes they can just keep on going and finding all kinds of things. If you tell them it's wrong, it's always gonna be very sorry. It's like, oh, I'm sorry.
Nicholas Carlini:I made a mistake. You know, it turns out that I'm gonna do this other way. It'll just rewrite the entire program in a completely new way, not fixing any of the problems that were actually problems, and just try to do something entirely different. And you could just keep saying wrong, and it will continue to be sorry the whole way through and never fix anything. And so I feel like, yeah, 1 to 3 times, and then I just stop.
Bryan Cantrill:It is deeply apologetic, which I actually really appreciate. You know, the other thing, and this may just be a placebo effect: I love to ask it if it can do what I'm about to ask before I ask it to do it. Do you do this, Adam? Like
Nicholas Carlini:Oh, yeah.
Bryan Cantrill:I I like
Adam Leventhal:to I say, hey. I'm using this thing. Are you familiar with this thing? And it's like, yes. Just ask your question, idiot.
Bryan Cantrill:It's always so enthusiastic. It's like, oh, I would love to help you do this. Yes. Yes. Upload the image for me right now.
Bryan Cantrill:I could yeah. That's great. I would love to I would love to do that. Can you help? Yeah.
Bryan Cantrill:Yeah. I'll put the PDF. Just do that, and I will I'll help you. It's like, oh, great. I just like you know, I like the kind of the the please and thank you, and it's very apologetic.
Bryan Cantrill:So I guess that anthropomorphization, I like.
Adam Leventhal:For something raised on the Internet, it is amazingly polite.
Bryan Cantrill:Well, it was raised on the Internet, but then it was sent away to a military academy, where it was very strictly disciplined. I mean, if Microsoft Tay was these LLMs in their teen years, they were definitely sent upstate for a while, and they've emerged rehabilitated. Or at least polite. That's right. At least apologetic when they send you to your death.
Bryan Cantrill:So let's go through some of these examples. Because the other thing that I really loved about this is that you uploaded, well, I'm curious whether these are more or less your entire transcripts. But it was really great, because I feel like we don't see that as much, where, you know, people can say, like, I built this using an LLM, but you don't get to see, okay, what was all the give and take? What were the follow-up questions? What did all of that dialogue look like?
Bryan Cantrill:And is this pretty I mean, how much editing did you do before
Nicholas Carlini:you Yeah. This is Really?
Bryan Cantrill:Oh, that's
Nicholas Carlini:pretty cool. So I also did not intend for anyone to ever see these. Like, I don't think this far ahead. I was not, like, last year being like, let me type some chats to these models so that in a year, I can, you know, release them to the world and sort of make these models look artificially good or bad.
Nicholas Carlini:Like, these are things that I wanted to get solved. This was how I used the thing, and then I just, yeah, posted completely unedited versions of these things. I will say that there are some that I did not release because they solved problems for me, but I was maybe 10% less kosher than I should have been, and they have, you know, like, IP addresses of internal servers or something like that, which I maybe don't want to be revealing to the Internet.
Nicholas Carlini:But, all of the things that I did post are unedited.
Bryan Cantrill:That's great. And then which LLM are you using here? Not that it particularly matters.
Nicholas Carlini:These are with GPT-4, but Claude 3.5 Sonnet has recently also been really, really good. And, yeah, I basically try to use whatever I have. I try to use all of them. They're all high quality. I feel like, on the margin, if you look at the current state of the art, Meta's Llama 3.1 405 billion, Gemini 1.5, Claude 3.5 Sonnet, and GPT-4o are, like, you know, within percentage points of each other depending on how you benchmark which one's better, but they're all pretty good.
Bryan Cantrill:Yeah. That's that's amazing, which is great. I mean, that's amazing. What a what a luxury that we all have, that we are able to Yes. I mean, it really is extraordinary.
Bryan Cantrill:And some of these transcripts are really long and interesting in terms of like, you know, as you are kind of, and I mean, honestly, pretty amazing that you're able to kinda fine tune the things that that it's giving you without it completely losing its mind. I mean, that's that's really Yeah.
Nicholas Carlini:No. This is, like, the thing I think is, yeah, really surprising to me where, like, it'll give you a thing and I'll just be, like, that's great, but, like, I realized I told you something that like was actually wrong. I didn't want that, I wanted something different and you don't even have to like be super explicit with them. You just like make this more like this other thing. And it's like, okay.
Nicholas Carlini:And if it turns out it didn't do it right, then you just tell it like, oops, my bad. I actually wanted this other thing.
Bryan Cantrill:That's a good point, because, again, it's easy to dwell on the things that it gets wrong. I definitely have had things where I'll ask it a question, and I'm like, that's a terrible question, Bryan. Like, you can't answer that. That's a dumbass question. And then it's like, oh my god.
Bryan Cantrill:It actually did. Like, I did not give it enough information to give me a good answer, and it gave me a good answer anyway. I mean, there have definitely been moments like that that have been really,
Adam Leventhal:yeah, quite the same. Where I'm like, how could I I mean, no human would infer context from the blather I just gave this thing. Like, how like, what am I what kind of optimistic?
Nicholas Carlini:But, yeah, it figures it out. Why I really like it is that often framing a good question is hard. This is a skill that I think most people don't remember they had to learn: how to look for something on the internet. Most of us can just do this, but it's sort of like you're playing this weird game of, like, what are the keywords that the answer is going to have? Like, most people are pretty good at finding these things now.
Nicholas Carlini:But, like, being able to just be in the middle of a problem and just be, like, here's the thing, it's not working, fix it for me, or like whatever and just like getting a competent answer back without having to take your mind away from the task you're solving. I find really valuable because every time you have to like go look something up it like breaks your flow just a little bit and just like the every little bit of barrier you can reduce here is just very valuable for me.
Adam Leventhal:The the though, that's
Bryan Cantrill:a very good point because a lot of what you're doing here is, you know, we always talk about the friction in software development. And that's what you're really hitting over and over again: just reducing that coefficient of friction, where you'd otherwise be like, oh, god, I gotta ramp up on this new framework, or I'm getting this error message. Just being able to reduce that friction allows you to go faster, which is what you've kinda done over and over.
Nicholas Carlini:Definitely. And, like, a couple of the examples I put here are things I just would not have built if not for the ability to use these things. Like, okay. So one of the examples: last year I made some, like, quiz thing of, like, guess how well GPT-4 can solve these questions. And I did that because I thought at the time people were wildly underestimating the capabilities of models.
Nicholas Carlini:And it's a web app. Like, you know, it has a quiz where you sort of drag a slider. Do you think it's gonna get it right? Yes or no. And you have a next button.
Nicholas Carlini:Like, this is not interesting. This is not fun programming, like, you know, HTML with these kinds of things. But it has to get done. Like, I want to do the quiz. But I have to write, you know, a back end in some programming language, I have to have a front end that looks passable for a person, I have to do all these things.
Nicholas Carlini:And I don't wanna spend my time doing all of that. I wanna do the 10% which is the interesting piece, which is, like, let's demonstrate some interesting capabilities of these models to people, and let the model take away all of the stuff that would have been painful for me. You know, the last time I learned a new thing on the internet was, like, jQuery and the box model and being told, you know, don't use tables, use divs. Like, I'm very well out of date on this thing. I don't wanna have to learn the new thing. Just tell me how to do it; it's not important to me right now.
Nicholas Carlini:The important thing is something else. Make the rest of it work for me.
Bryan Cantrill:That's right. And you've got a lot of examples like this. You talk about boring tasks, or automating things, where it's like, look, realistically, if you, large language model, didn't exist, this would just be painful or worse or absent. It's not like the world would stop. But, boy, if you can do this for me, great.
Bryan Cantrill:I mean, I actually do want to do this. It's just something that is boring. And I also feel like I've got a lot of that stuff where, like, yes, I can go figure out this framework. As you said, back in the day, you'd buy the O'Reilly book and have it open on your lap and type in the examples. And, yes, if I wanted to do this framework for a living, that's what I would do.
Bryan Cantrill:But, actually, I just wanna do this thing to prove some other point. Yeah. I mean, like,
Nicholas Carlini:I would still do that today. Right? Like, you know, if I wanna learn some real thing, I would get the book and actually read it and, like, actually learn the thing. But there are so many things that you want to do where this is just something off the path that you need to get done in order to have the thing be solved for you. And, you know, that's yeah.
Bryan Cantrill:I also find that large language models can often take something that is well documented and give you a better narrative around it. You know? Like, I've definitely used LLMs to write gnuplot. Gnuplot is very well documented, but the documentation is a little idiosyncratic.
Nicholas Carlini:Yeah. All of my plotting now, except for the final plots that I put into papers, almost all of the intermediate plots for things are definitely generated by these things. Because you're just like, I want this thing in red, I want this thing in blue, just make the plot for me. I don't need to know if, like, is ds.size, is it like point 1 or should I be doing 1?
Nicholas Carlini:Like, I don't really remember. I guess I could look it up, if this is, like, in em or px, but just make a plot for me, and then I will come back and tell you what you got wrong and then fix it.
Bryan Cantrill:And meanwhile, Alistair's being like, you guys are not allowed to do another episode on gnuplot. There is more than one complete episode on gnuplot.
Bryan Cantrill:You would
Adam Leventhal:like Surprise. It's a gnuplot episode.
Bryan Cantrill:Yeah. It actually turns out, you thought every episode was about LLMs. Actually, every episode is ultimately about gnuplot. But I also think that when you're dealing with that kind of tedium, you are also most likely to be distracted. You know?
Bryan Cantrill:I mean, I think that we live in an age where there's so much distraction all the time. I think it's one of the challenges that we have with modernity, the fact that we're all working from basically a television set. And, you know, there's plenty to be distracted by on the Internet and everything else. And when you're doing something tedious, it can require a lot more energy, but not in kind of an uplifting way. And so when you've got something that can actually really help jump start you on that, you are able to go so much faster.
Nicholas Carlini:Yeah. The blank page problem is just, like, very hard. You have this thing, you need it to go, and you know it's gonna be painful and you just don't wanna do it. But if you have something that can just get things started and get things going for you, I just find it so much easier to get moving with these things.
Bryan Cantrill:Yeah. That was a great
Adam Leventhal:point you made, that that blank page problem. You know, not not just taking the tutorial or the quick start or whatever, but the next, like, 3 steps in the direction that you're trying to go. And as you said, like, you know how to program. You can kinda take it from there, but getting a little more pointed in the direction you actually wanna
Bryan Cantrill:go. And, Adam, I find that when I'm using, like, a new crate, the examples that GPT will give me for that crate are often much better than the examples in the crate's documentation.
Adam Leventhal:Oh, yeah. I mean, I use it a bunch with clap. I don't know if you use clap much, but it's the Rust crate for building CLIs with arguments and stuff like that. When I'm trying to do weirdo stuff like making various arguments depend on other ones and different combinations being invalid, I find it just knocks it out for me. It's so much easier.
Bryan Cantrill:And it was your question of do you use clap? Was that a question to Nicholas? Was that, like, a question to the chat, or was that an actual question to me?
Adam Leventhal:I don't know why I asked you.
Bryan Cantrill:I mean, I feel like I'm, like, a clap OG at this point.
Adam Leventhal:I don't know why why did
Nicholas Carlini:I say that? Of
Bryan Cantrill:course, you Yeah. You're all clap all day. I'm all clap all day. I mean, I've been using clap for so long that I was using clap before GPT was available. But it's great to be able to use an LLM for clap because there are yeah.
Bryan Cantrill:There are a lot of interesting options you can put on there. So, yeah, I'm all in on clap. Yeah. So, one that you did not have on here, and I guess you do mention this because you talk about using it to simplify code: do you use LLMs at all as code review, just like, hey, could you take a look at this code and, you know, give me your comments on it?
Nicholas Carlini:Yeah. This would be, okay. So I write research code, which means the code that I write usually does not need to run in 6 months. So this is, I guess, one of the big caveats of everything I'm saying: the things that I'm doing are usually one-offs that need to work for a small amount of time, and for the few things that I'm gonna release publicly, I have time to commit to make these things better. So yeah, I tend to not do this.
Nicholas Carlini:Another really good example of this is writing test cases. A lot of people really like these things for that, because half of all test cases are the same, you know: pass the empty string, pass the thing that's, like, almost out of bounds. These models are great at doing these kinds of things. But I don't write test cases, and so that's not a problem I have used these for. But, you know, maybe I should. Maybe research code would be better if everyone just used these things to automate writing test cases for them.
Nicholas Carlini:But at the moment that's not a thing that I I do.
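To make the "half of all test cases are the same" point concrete, here is a minimal sketch of the boilerplate edge cases described above (empty string, right at the boundary, just past it). The function `truncate` and the case table are hypothetical, invented purely for illustration; nothing here comes from the blog post.

```python
def truncate(text: str, limit: int) -> str:
    """Hypothetical function under test: cut text to at most `limit` characters."""
    if limit < 0:
        raise ValueError("limit must be non-negative")
    return text[:limit]


# (input text, limit, expected result) for the usual boilerplate edge cases.
EDGE_CASES = [
    ("", 5, ""),             # empty string
    ("hello", 5, "hello"),   # exactly at the boundary
    ("hello!", 5, "hello"),  # one character past the boundary
    ("hello", 0, ""),        # zero-length limit
]


def run_edge_cases() -> int:
    """Check every case and return how many were checked."""
    checked = 0
    for text, limit, expected in EDGE_CASES:
        assert truncate(text, limit) == expected, (text, limit)
        checked += 1
    return checked
```

Calling `run_edge_cases()` walks the table; the table itself is the kind of repetitive material a model can plausibly draft and a person can skim and verify quickly.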
Bryan Cantrill:So which of these surprised you the most? And actually, let me put my thumb on the scale. The one where you uploaded 500 lines of C++ and it simplified it was mind blowing to me.
Nicholas Carlini:Yeah. I did not expect... okay. Maybe there's 2 that really, sort of, yeah. I didn't expect it to work. This one was one of the most surprising ones to me.
Nicholas Carlini:Yeah. So what I did for this one, for people who haven't read it, is: there's a program called Golly. It simulates the Game of Life. I like the Game of Life.
Nicholas Carlini:I wanted to be able to call into Golly, which is C++, from within Python. And Golly has a command line interface, so I just took the entire 600 lines of Golly, or whatever it was, that's the CLI, and just said, like, here's the thing. I want to be able to run this one function that would be called with these particular command line arguments. Get rid of the rest of it. Just make it that.
Nicholas Carlini:And it gave me basically perfectly running code that did the thing, which, yeah, I very much did not expect would work. But I guess the other thing that I've become pretty used to doing is, like, let me just take a task that I don't know if it's gonna work at all and just give it to the model and hope. You know, if it gives me the answer, great. I saved myself 2 hours. If it doesn't work, fine.
Nicholas Carlini:Like, I accept the fact that these things are imperfect, and then I'll do it myself.
Adam Leventhal:Right. You wasted 30 seconds.
Nicholas Carlini:Yeah. Then and then
Adam Leventhal:The one that blew my mind was when you typed in a bunch of assembly code and said write some Python that corresponds to this assembly. Yeah.
Nicholas Carlini:That was for some other project where some people didn't wanna release the code for some defense that they had written, so they had a compiled Python binary, but they didn't release the source code. You know, Python is bytecode. It's not that hard to go from Python bytecode to Python source. Like, it's obnoxious, but it's not that hard. And that one, okay.
Nicholas Carlini:I'll admit, I had gotten halfway through doing it myself. Like, I had gone through, I don't know, several hundred lines of code, a couple of files, before I was like, why am I doing this? At first, I tried to take an off-the-shelf decompiler that would do it for me automatically, and it did 1 or 2 files. But the decompiler was for Python 3.7, and I had Python 3.9.
Bryan Cantrill:Oh my god. I was going to guess that. I'm like, the decompiler is gonna have some dependency that is completely... oh, my God. Right. Exactly.
Bryan Cantrill:Right.
Nicholas Carlini:Of course. And so I just copied the disassembled Python bytecode and said, here's the thing, just tell me what the function is. And it basically got it perfectly. I recently actually did this, just as a test, for some complicated row reduced echelon form thing that I have, some math thing. It was like 300-something opcodes, and the most recent Claude 3.5 Sonnet can even do that one, which is, yeah, really impressive.
Nicholas Carlini:Like, on one hand, it's really impressive. On the other hand, going from Python bytecode to Python source is not hard. You know, if I gave you the thing, you could do it. It's mostly a matter of keeping track of where the jumps are and which variables are currently on the stack and all these kinds of things. It's not hard, it's just painful and annoying to do. Not the kind of thing you wanna be spending your time doing, but, yeah, the models can automate these kinds of tasks for you just great.
Nicholas Carlini:And I really like them for these kinds of things.
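For readers who have not looked at Python bytecode, here is a small sketch of the raw material being described, using the standard library `dis` module. The function `clamp` is a made-up example, not the code from the project discussed; the helpers only list opcodes and jump targets, which is the bookkeeping ("where are the jumps") that makes manual decompilation tedious. Opcode names differ between Python versions, so no output is shown.

```python
import dis


def clamp(x, lo, hi):
    """Hypothetical example function with two branches."""
    if x < lo:
        return lo
    if x > hi:
        return hi
    return x


def opnames(func):
    """Return the opcode names of func's bytecode, in order."""
    return [ins.opname for ins in dis.get_instructions(func)]


def jump_targets(func):
    """Return the bytecode offsets that some jump instruction lands on."""
    return sorted(
        ins.offset for ins in dis.get_instructions(func) if ins.is_jump_target
    )


if __name__ == "__main__":
    dis.dis(clamp)               # human-readable disassembly listing
    print(jump_targets(clamp))   # offsets where control flow rejoins
```

The listing printed by `dis.dis` is roughly the kind of text that would be pasted into the model with a request to reconstruct the source.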
Adam Leventhal:I think where I was impressed with that one was that it stayed on task and, like, didn't hallucinate some incorrect answer. That's one where it felt like context was required, and, you know, it might give you something plausible but wrong. So I was pretty impressed that it nailed it.
Nicholas Carlini:Yeah. And to be fair here, sometimes it is wrong. But, you know, anyone who's been around any decompiler understands that most decompilers work in 1 of 2 ways. Either the decompiler gives you a technically correct but entirely useless decompilation, which converts the assembly jump to a C goto and just does this instruction by instruction. It's like, you know, variable x gets variable y plus z, goto label 5.
Adam Leventhal:If I wanted assembly I would have left it in assembly.
Nicholas Carlini:Exactly. Or there are the decompilers that try and give you reasonable source code, and oftentimes these are just not sound. The decompiler tries to do the right thing, it does the best it can, but it may not be completely sound. And oftentimes the reason why I'm decompiling this as a security person is not because I need the thing to be perfect. It's because I wanna find the vulnerability in the thing and I don't wanna read the assembly.
Nicholas Carlini:I wanna read something that is roughly human interpretable. And so I'll decompile it, and if it turns out that it has a couple of minor bugs, that's fine, I'll fix it. But what I want is to be able to read the thing to know what it's doing, so I can figure out where I might want to look further. And in particular, I wanna be able to discard as much of the code as irrelevant as possible and focus only on the pieces that are relevant. Being able to get a high-level idea of everything that's going on is very, very nice.
Bryan Cantrill:Yeah. That's amazing. So another question I've got for you is, you know, part of the reason that these LLMs are great at these tasks is because there's a lot of information they can consume out there on the Internet that is germane to Python and Python bytecode or, you know, C++ and so on. When people are developing software systems, one of the questions I've had for myself is, what does this do to documentation? You know, we might think that documentation is kinda less important in this world, but maybe documentation is actually more important in this world, to give future LLMs something canonical that they can go actually interpret.
Bryan Cantrill:I mean, what's your read on that?
Nicholas Carlini:No. I definitely agree. I think this might be a case where more documentation might be even better, because if you can give as much help to the model as possible to let it help you do these things, that might be great. I mean, other people have pointed out other potential problems with these models. Suppose that someone creates a really amazing package, you know, next year, but no one uses it. And suppose that because no one uses it, there's not a bunch of really great examples of it online.
Nicholas Carlini:And so the models don't learn how to use this package really well because no one's using it. You might end up in a world where this thing is great, it exists, and in a normal world people might be using it. But if everyone is relying on these models to use only the things that are popular, you might have things which are legitimately very, very great and better than everything else that exists, but the models just aren't good at them. Therefore, people don't use it.
Nicholas Carlini:Therefore, there's no training data. Therefore, the models don't use it. Therefore, you get in this loop, and it might become very hard to release these kinds of new things that are actually much better, because people don't use them for this reason. So I don't know. I think there's...
Adam Leventhal:People rely on LLMs, and then it's harder to break through because the LLMs can't be helpful. That's interesting.
Bryan Cantrill:Okay. But, you know what? I gotta say, I think that we have already had that problem. We've had that problem for 2 decades plus, where people adopt something because other people are using it, not based on its own merits.
Bryan Cantrill:I mean, how many times have we heard, like, I'd love to use illumos, but I also wanna be able to Google my stack trace, and I can do that with Linux. Totally.
Adam Leventhal:Agreed that we have the hurdle. It just may be raising that hurdle.
Bryan Cantrill:Well, but what I think is amazing is actually that, you know, Nicholas, we've got our own operating system, illumos, and an illumos-derived system, Helios. And, you know, I don't know if I can talk about it here or not, but, like, have you asked GPT illumos-based questions? It's amazing.
Nicholas Carlini:I mean,
Bryan Cantrill:it's really, oh, it's extraordinary. It's very good. Okay. And that's not because there's necessarily a ton of stuff on the Internet about it. In fact, the stuff that's out there is actually much more canonical, and there's much less just kind of, like, randos out there.
Bryan Cantrill:So, I mean, it's kind of interesting. I think it's gonna make it easier to adopt some of these platforms on their merits. So I actually wonder if it's actually better for smaller platforms.
Adam Leventhal:Oh, yeah.
Nicholas Carlini:No. This could very much be true too. It's hard to predict which way these things are gonna win out. I think there's a very legitimate argument that it might be that, you know, someone writes a new framework, they release really good documentation for it, the model's trained on the framework's training data of how it works, and all of a sudden it's great at it.
Nicholas Carlini:And now all of a sudden everyone can use this new thing because they don't have to learn how to do it themselves and, you know, they can sort of get used to it piece by piece. I think that's also entirely possible. I just feel like lots of people in the space have one thing that they believe to be true, and then they see evidence to the contrary and they refuse to change their mind. I just sort of want people to accept that either of these two things may happen. We don't know yet, and I would like to see evidence towards 1 or the
Bryan Cantrill:other. Totally. Yeah. In a way, I mean, I think it's just gonna be interesting, because I think it is gonna change things, but it's not clear to me that it's gonna change them for
Bryan Cantrill:the worse.
Nicholas Carlini:It may be better
Nicholas Carlini:It may be worse.
Bryan Cantrill:So, p5commmit, I see that you joined us up here. Do you have a question for Nicholas or a comment on your own usage?
p5commmit:Yes. Since everybody is using so many LLMs for programming, how do we know we're writing the right thing?
Bryan Cantrill:Well, yeah.
Bryan Cantrill:I mean, so that's a great question, and I think this kinda goes to Adam's question at the top. Like, how do you know that it's pointing you the right way? And, Nick, I don't wanna speak for you, but I think for a lot of these tasks that you're looking for it to do, one, it's kinda pretty clear if it's completely wrong off the jump. And 2, if it's totally wrong, as long as it doesn't waste a day of my time, it's actually fine. I don't know.
Bryan Cantrill:What would be your answer to that?
Nicholas Carlini:No. I'd say the same thing. Yeah. Like, either it solves it for me and, you know, you just check if it's right. And there's so many things that are easier to verify than to write yourself, where you could sort of look at the code and be, like, yeah, that's directionally correct.
Nicholas Carlini:Like, maybe there's a bug, maybe there's an off-by-one or whatever, but I'll figure it out. I can check the output, especially for things where checking the output is easy, where, you know, it's creating an HTML page for you or something. You look at the page: does it have the thing that I wanted in the right location? Yes. Okay, great.
Nicholas Carlini:Like, I'm done. This is one of these cases where, if it's gonna be for some case where I'm writing some critical piece of logic, where if it's wrong in some subtle way then everything's gonna fail, I just won't ask it to do that. You have a choice to use the thing or not use the thing, as the case may be.
Bryan Cantrill:And I do wonder if this is one of these kind of interesting consequences because, you know, you're asking it to rewrite things in different languages and so on. Boy, when you ask it to rewrite it in Rust, you'll get a lot more compile-time guarantees that you have something that's approaching correctness. I would be much more nervous about deploying JavaScript that was written by an LLM, because it's so easy to have an issue that's lurking in the weeds that we won't find until we execute that code. With Rust, you're gonna find that a lot earlier. And I wonder if, you know, one of the changes that this could bring is more... I mean, certainly, if you are interested in adopting Rust, you should absolutely be whipping out these LLMs to help you translate programs.
Nicholas Carlini:Certainly. Yeah. This has been great. I started off by asking them to write C because they weren't quite good enough at Rust, and have switched to Rust for a lot of things I write. Like, here's another thing I do a bunch.
Nicholas Carlini:One of the ones I mentioned was: I have some Python program. It would give me the answer correctly. It might take an hour or 2 to run because, you know, I have like half a terabyte of data to process, but I'm not gonna spend my time fixing it because my program is gonna run exactly once. So I'll hit go and I'll go do something else. But now I can just write my Python program, copy it to the model, say please rewrite in Rust, and it'll just give me the Rust, which is, you know, whatever, a thousand times faster, because basically it's just a for loop.
Bryan Cantrill:Yeah. Interesting.
Nicholas Carlini:And it compiles. And what's especially great is, so the machines that I work on are often pretty beefy. Like, for some things I'm doing, I have a terabyte of RAM and 196 cores. These are the kinds of things where doing it single-threaded loses you a factor of 200 in performance. And there is absolutely no way that I'm going to ask this thing to write a parallel C implementation, because if there's a race condition, I am never going to catch it.
Bryan Cantrill:But if
Nicholas Carlini:I ask it to write it in Rust, there's just not gonna be a race condition. It's gonna be correct. Like, either it won't compile or it's right.
Bryan Cantrill:Right.
Nicholas Carlini:And this is amazing, to be able to have this sort of, not complete guarantee, because, you know, it's possible to shoot yourself in the foot. But the number of cases where something actually could go wrong is so much smaller.
Bryan Cantrill:Well, it's a lot harder, especially if it's not gonna give you... I mean, if it starts giving you a bunch of unsafe Rust, that would
Nicholas Carlini:be, you know... Okay. Yeah. Right. But at least then you know.
Nicholas Carlini:Like, you look at the output, and it's, like, unsafe for the entire program.
Bryan Cantrill:Like, it's, like, okay.
Nicholas Carlini:Nice try. No thanks.
Bryan Cantrill:Back to military school for you until you come back with less unsafe Rust. But yeah. I mean, if you've got purely safe Rust. And then also, like, okay, so you get some compile-time message that you may or may not understand.
Bryan Cantrill:You've got something you can iterate on: okay, I tried to do this thing, and here's the error I got. Can you explain this error message? Or can you give me more context about it? I mean, there's just so much opportunity to kind of understand what it is actually trying to do, and then get some actual confidence in it.
Adam Leventhal:Nicholas, for all the stuff that you're doing, are you kinda pasting it into one window, looking at the response, copying and pasting into another? Or are you using it integrated into... I think you said Emacs was your editor of preference.
Nicholas Carlini:Yeah. I'm one of those people.
Adam Leventhal:So I used to be one of you too.
Bryan Cantrill:The you're in a safe place. Welcome. Welcome.
Nicholas Carlini:You're in a safe place.
Bryan Cantrill:In the spirit of world peace, we welcome you.
Nicholas Carlini:Yeah. So for most of these ones, this is actually being pasted into an editor because I wanted to have things that I put online... well, so there's two reasons. One is I wanted to be able to have proof that I actually asked a model and didn't edit the outputs. So all of the things that I did have a link to the LLM that it came from, where you can see on their servers that I actually did run it like this. So that's half of the reason.
Nicholas Carlini:And the other half of the reason is for starting new projects, I oftentimes will start from the editor just to talk to the thing for a little bit. But I do have, okay, you know, I'm an Emacs person. I have my own set of macros that I can type to query the model interactively in my editor. But those queries are very hard to talk about with other people because it's, like, I'll highlight 5 lines of my code and be, like, make this change. And it's very hard to show someone that.
Nicholas Carlini:Right. It's much easier to show someone, here's this thing that I did from scratch. You can look at it and understand what I'm doing, and it's not just, like, make these 5 lines of code different.
Adam Leventhal:Yeah. I I I confess I'm asking because I'm a bit LLM curious in this regard. There's some of my colleagues, and I've seen folks having it real integrated into their editor, and I'm very tempted. So, just seeing if you had any tips there.
Nicholas Carlini:Yeah. No. I like it. One of the things I think I mentioned here is, so I think I have 3 different modes in the way I have it set up for Emacs.
Nicholas Carlini:One is I have one key command which will look at the entire file, and I can just ask something about the file and go to a conversation. Another one does it with the current selection, so I can select something and it will replace the selection with whatever the output of the model is. I use this a lot, being like, here's the thing, it does this, oops, please make this change. And the final one is one that will look at the entire Emacs window and just feed in whatever I currently see, which is great for when you hit an error, where, you know, something has gone wrong. I already have open the relevant code and the error message.
Nicholas Carlini:Just look at everything. I'm not gonna reformat it. Here's everything I see, please fix. And it will just go and do this and, yeah, I like this. I think it's useful for me, again, on the reducing friction side. Maybe it's wrong, but I oftentimes will just ask for the answer and start looking myself. And I can go skim what the model tells me and, you know, maybe it's right, maybe it's wrong.
Nicholas Carlini:It's probably wrong more often than not in these cases where, you know, you have some complicated error. But I feel like part of the reason why I do this is because currently these models are still getting better, you know, with every new release, and I want to be in the habit of trying. If it's giving you the correct answer every time that you're asking it, it means you're not asking it to do enough things. And so I want to be in the habit of being surprised on occasion so I know where the limits fall. But
Bryan Cantrill:That is good guidance. And also, alright. So you've got, you know, hot keys to allow it to... did you have an LLM write the Elisp?
Nicholas Carlini:Some of it. So, yes. I know Common Lisp, but not Elisp that well. So I had it write it. Like, I don't know all of the... this is the thing that it's great for.
Nicholas Carlini:Like, what is the API to, you know, move the cursor up to the beginning of the last mark in Elisp? I don't know. I don't even know how to ask that question. Like, how do you find this on the Internet? But you just type this sort of, you know, completely random thing and it will give you what the API is.
Nicholas Carlini:And so yeah.
Bryan Cantrill:But also, at least for me, and I guess I'm revealing myself here. Okay. Fine. Forget world peace. Welcome to Vim town. The "e" as in Elisp.
Bryan Cantrill:I actually don't want to know the answer. I do not want to crowd out any other fact in my brain with any Elisp. So if you could please give me this answer and I could just, like, cut and paste it so I never have to see it, that would be great. It's because it's things where I'm actually not curious about this. I'm a curious person.
Bryan Cantrill:I'm interested in many things, but, like, not this, not right now. So just give me the Elisp macro.
Nicholas Carlini:Yeah. No. Again, another thing that these things are great for is, like, yes, just make it happen, please. I don't need to see how it works.
Nicholas Carlini:I don't need to know where the bodies are buried. Just make the thing happen.
Bryan Cantrill:Well, I think so. So, Adam, I know that you didn't ask me this question. You know, I don't wanna sound hurt over it, but I guess I am a little. But, as you know, I am, of course, living in my own filth with respect to this, and I've got no in-editor integration whatsoever. Maybe this is why the question was not asked.
Bryan Cantrill:But I use basically the web interface to ChatGPT as my kinda go-to. I will say one of the advantages is, of course, it has your full history there. And, Nicholas, I don't know if this is the way you did this: just go through my history and see what I've been asking this thing.
Bryan Cantrill:But your blog entry did prompt me to go back through my history, which I really haven't done. And, god, I've asked it some very... and, of course, there's no context for some of the things I've asked it. So, Adam, I've told you about asking ChatGPT for life advice. Like, I really recommend this. Have you ever done this at all?
Adam Leventhal:I have too, actually. And Mike Cafarella is in the audience. We were talking about this when he was... oh, now he's up on stage. But when he was on, you were talking about getting parenting advice, and he described it as giving the normcore answer, which I thought was just a great term and exactly what it's good
Bryan Cantrill:for. Well, and so in the spirit of that, and, again, I've got no context around this, so I do not know. I actually need to figure out the exact date I asked this, but I apparently, at some point in the last 30 days, asked ChatGPT 4: during a meal, what is the best time to bring up a potentially awkward subject? And it gave me an extremely good answer, but I'm like, what?
Bryan Cantrill:ChatGPT, can you tell me why I asked you this? Why did
Nicholas Carlini:I ask
Adam Leventhal:to sleep. Oh, that's wonderful.
Bryan Cantrill:I've also asked it to explain tweets
Bryan Cantrill:to me. This is really pathetic.
Bryan Cantrill:I feel... You
Adam Leventhal:need a hot key for that. You need your Chrome extension. Ask it to build you 1.
Bryan Cantrill:Totally. A Chrome extension, yes. Although, that said, on tweets, of course, on anything that's kind of Internet-current and memey, it's gonna struggle, but it will also kinda give you some good places to go look. So sorry, Nicholas. I didn't mean to drag us down into the gutter here.
Nicholas Carlini:No. This is fine.
Mike Cafarella:Hey, guys. I have a question for Nicholas. How's it going?
Nicholas Carlini:Hey.
Mike Cafarella:Hey. So, Nicholas, you know, a lot of the stuff that you're talking about in terms of using LLMs for coding, I do a lot of myself. In particular, I was happy to hear someone else is doing the decompilation and annotation of the results. We've done a little bit of that ourselves. But there's one question I had, which was around, like, marshaling the context.
Mike Cafarella:I think you answered this a little bit when you were talking about just sending to the LLM anything that you see currently in your Emacs window. One use case that I would like to do a lot of, but I don't really know how to do a great job of, is when I'm in some project where most of the files that are relevant to some question are not in the editor window, or they're in some kind of repository. What I would like to have is some kind of system, maybe a controllable one, that will marshal the right headers or marshal just the right external docs to answer the question when I pose it. Have you played with this at all? Like, have you tried to be experimental or interesting in how you choose the evidence to provide to the model when you ask the question?
Nicholas Carlini:No. This is a great question. I think this is one of the cases where what I do is different enough from most people that, yeah, I haven't explored this. The code that I tend to work with, as research code, is usually fairly small, self-contained kinds of projects, where it's not, you know, a big giant behemoth of things. And I think this is one of the areas where there's a large aspect of language models that doesn't need the models to become better for the things to become a lot more useful. And this is one of them, where I think there are now starting to be people who are making plugins for editors that try and do this.
Nicholas Carlini:Like, you know, the model has a 100k context window, it can look at roughly 100,000 words, but every token costs some amount of money to show the model. So show the model the minimum amount of things so that you're not gonna blow the budget on whatever you're making, but it still can answer your question. And how do you do this? Do you look at all of the recursive dependencies? That seems expensive. Maybe only one level deep. Maybe you try and be smart and ask the model, should I look at this file?
Nicholas Carlini:You know, there's lots of standard programming engineering that takes this as a tool and tries to make things a lot better using it. It's an entire area where, I don't know, I'm surprised that there are not more startups that are just doing this now. Maybe they are and I haven't seen them. But this as a direction, of, like, take the model as a tool and build a big giant thing around it that uses the, for lack of a better word, intelligence, or, you know, the ideas from the model, and puts the rest of the programming stuff around it to make something interesting. I would really like to see more people doing this thing.
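As one way to picture the "one level deep, within a budget" idea described above, here is a minimal sketch. Everything in it is an assumption made for illustration: the project is modeled as an in-memory dict of module name to source text, imports are found with a simple regular expression, and `rough_tokens` counts whitespace-separated words as a crude stand-in for a real tokenizer. It is not how any existing editor plugin works.

```python
from __future__ import annotations

import re

# Matches the first module name in "import x" or "from x import y" lines.
IMPORT_RE = re.compile(r"^\s*(?:from|import)\s+([A-Za-z_][A-Za-z0-9_]*)", re.MULTILINE)


def rough_tokens(text: str) -> int:
    """Crude token estimate: number of whitespace-separated words."""
    return len(text.split())


def direct_deps(source: str, known: set[str]) -> list[str]:
    """Project modules imported directly by `source`, in order, without repeats."""
    found: list[str] = []
    for name in IMPORT_RE.findall(source):
        if name in known and name not in found:
            found.append(name)
    return found


def gather_context(files: dict[str, str], entry: str, budget: int) -> list[str]:
    """Pick the entry module plus its direct imports that fit within the budget."""
    chosen = [entry]
    used = rough_tokens(files[entry])
    for dep in direct_deps(files[entry], set(files) - {entry}):
        cost = rough_tokens(files[dep])
        if used + cost > budget:
            continue  # skip anything that would exceed the budget
        chosen.append(dep)
        used += cost
    return chosen
```

For example, with a hypothetical project `{"main": "import util\nimport big\nx = 1", "util": "def f(): pass", "big": "word " * 1000}`, calling `gather_context(files, "main", 50)` keeps `main` and `util` and skips `big`. A smarter version might rank files or ask the model which ones to open, as suggested in the conversation.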
Mike Cafarella:So I think when you use Copilot in Visual Studio, or in Visual Studio Code, Copilot will give you different answers when you have different tabs open. So, like, if you have, you know, files a, b, and c loaded at the same time that you're working in window d, it will use that evidence, but I don't think it's doing anything beyond that. So there's something very basic going on there, but you'd like it to be a little smarter. Maybe, as you say, it follows dependencies to some distance. Maybe it goes and grabs, like, the recent test results or last night's unit tests or something like that.
Mike Cafarella:There's all sorts of clever stuff you could do. I'm not sure what the absolute latest and greatest does, but there seems to be a lot of flexibility that's possible.
Nicholas Carlini:No. I agree. I feel like, you know, this is definitely the case. Looking back in 10 years at what I have, I will almost certainly cringe, like, that's all you were doing? You were just literally copying the entire text that was visible in the Emacs buffer and pasting that to the model?
Nicholas Carlini:Why didn't you try something more clever? But, yeah, I'm sure that people will have a bunch of things that are doing this once they sort of get around to putting it all together. I just don't happen to be doing that right now.
Bryan Cantrill:Yeah. And this is the kind of thing that will actually really help. I mean, as practitioners, we're often in these big systems trying to make small what seemingly simple small changes to large systems, which can often take too long because of the tedium. So it and or just the the need to comprehend the system. And the the more we can get some help on that, and and especially just like ramping up on a new system.
Bryan Cantrill:Can you help me comprehend this thing? Can you, I mean, I would love to be able to just, like, give it code basis and, you know, can you give me some design documentation for this? So maybe like, listen. If I give you the Python bytecode, you're able to give me the Python. Why can't you give me the design documentation from the system?
Bryan Cantrill:That kinda that that would be, I think, really, really helpful for the practitioner.
Nicholas Carlini:They're starting to get close to this. I mean, there used to be a technical limitation. Last year or so, the most you could fit into the models at a time was, I don't know, let's call it 1,000 lines of code, something like that. Now it's, you know, 100,000. It's getting to be the case that these models technically support the ability to pass large amounts of input. The question is just that we need people to start building the things that actually give the models the ability to do this in a way that is not enormously expensive. And to be clear, the cost you should be comparing against is the cost that the person would have taken to do the thing. So, you know, if you could save an hour for a programmer who you're paying whatever salary you're paying, you should be willing to pay a reasonable fraction of that in LLM costs.
Nicholas Carlini:I think people would be very aghast at doing this right now, but I feel like at some point people will become more used to this. It will just take some time for that to happen.
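To make that comparison concrete, here is a minimal sketch of the arithmetic, not anything presented on the show. The function names, the 25% default standing in for "a reasonable fraction", and the dollar figures in the usage note are all illustrative assumptions.

```python
def llm_budget(hourly_rate: float, hours_saved: float, fraction: float = 0.25) -> float:
    """Spend ceiling: a chosen fraction of the labor cost the tool would save.

    All three inputs are assumptions you supply; 0.25 is an arbitrary
    placeholder for "a reasonable fraction", not a recommended value.
    """
    if hourly_rate < 0 or hours_saved < 0:
        raise ValueError("hourly_rate and hours_saved must be non-negative")
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    return hourly_rate * hours_saved * fraction


def is_worth_it(llm_cost: float, hourly_rate: float, hours_saved: float,
                fraction: float = 0.25) -> bool:
    """True when the LLM bill stays within the budget computed above."""
    return llm_cost <= llm_budget(hourly_rate, hours_saved, fraction)
```

With a hypothetical $100 hourly rate and two hours saved, `llm_budget(100, 2)` gives a ceiling of 50.0, so a $5 run would pass the check and a $60 run would not.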
Bryan Cantrill:Yeah. And right now, anyway, these things are, I feel like, all operating at a loss at some level. I mean, Adam, do you feel bad at all when you can see that it's really grinding on something? You're like, ugh, god. Sorry.
Bryan Cantrill:I'm just generating a lot of greenhouse gases with what I'm asking you to do. Because, I mean, you can start uploading images and asking it to process them, and you can really get it to start grinding.
Adam Leventhal:I mean, when I start going on some tangents where I'm like, this is this is irresponsible. When I'm having it, like, generate images in the style of the Simpsons on a thing that I think would be funny, like, to exactly me, like, what am I doing? Like, not just with my life, but with everyone else's.
Nicholas Carlini:Totally. This is gonna be like
Bryan Cantrill:I mean, or, you know, obviously, Adam, you and, sorry, Mike and I are exactly the same vintage, where it's just like, you used to have Big Macs? You used to eat Big Macs out of styrofoam? Are you sure that's right?
Nicholas Carlini:It's like,
Bryan Cantrill:no. No. We did. No. I'm I'm I'm sure that's right.
Bryan Cantrill:I definitely we definitely did, and I agree with
Nicholas Carlini:you. It's like, it does not
Bryan Cantrill:make any sense at all. Are you really, like, you would have it just write poetry as Homer Simpson? Like, did you not realize the amount of society's precious resources that we're
Adam Leventhal:I did. I did. I kinda knew. Yeah.
Bryan Cantrill:I did.
Adam Leventhal:We knew we were doing the wrong thing, but it was so funny. I mean, we thought it would be, but it wasn't, actually. It wasn't that funny.
Bryan Cantrill:Yeah. So, Chris, you got do do you have a question or a comment on how you've been using these things?
Speaker 5:So, I am definitely a self-proclaimed Copilot addict. Tab is my new favorite key in the editor. But there's a different use case that I would like to inject. Like, you know, code completion is really cool, and that's changed the game. But a big one for me: I have disabilities with my hands, and I feel like it's not talked about enough that the speech-to-text models, combined with something like ChatGPT to do post-processing and editing, have been totally game changing. You can just say technical documentation and words and code edits, and it can do it like it's never done before. It's a step function change from using Dragon or one of these other softwares.
Speaker 5:Really, it's been an incredible productivity boost.
Bryan Cantrill:And so when you say speech to text, is this your speech? Are you talking to it? Or are you getting something else that's spoken, like having it watch a presentation or what have you?
Speaker 5:So this is, like, I'm speaking, and it's producing the code. And then what you can do is hook this up.
Bryan Cantrill:Yeah.
Speaker 5:You know, that's pretty good, but it lacks a lot of, like, sentence structure and grammar sometimes. But then you can, you know, handcraft custom prompts. You feed that through ChatGPT with, like, some macros, and then you get out beautifully formatted text that's proper English, and it knows how to say all of your technical words. And you can even put in the prompt all of your technical company information, in terms of, like, you know, if I say something silly like this, correct it for me, and these kinds of things.
Speaker 5:And so, in Slack or when you're writing documents for work, you can just start recording, ramble for 2 minutes, and get a couple of nice, really succinct paragraphs that are edited for you.
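The workflow described above (raw dictation, then a handcrafted prompt carrying known corrections) could be sketched roughly as follows. This only assembles the prompt text; it does not call any speech or chat API. The function name, the instruction wording, and the example mis-hearings are all hypothetical, not the speaker's actual setup.

```python
def build_cleanup_prompt(raw_dictation: str, corrections: dict[str, str]) -> str:
    """Assemble a text prompt asking a model to tidy raw dictation.

    `corrections` maps a phrase as the recognizer tends to hear it to the
    term the speaker means. The instruction wording is a guess at a
    reasonable prompt, not the speaker's actual one.
    """
    glossary = "\n".join(
        f'- "{heard}" should be "{meant}"'
        for heard, meant in sorted(corrections.items())
    )
    return (
        "Rewrite the dictated text below as clear, well-punctuated prose.\n"
        "Keep the meaning and do not add new facts.\n"
        "Known mis-hearings:\n"
        f"{glossary}\n\n"
        "Dictated text:\n"
        f"{raw_dictation.strip()}\n"
    )


if __name__ == "__main__":
    # Hypothetical dictation and glossary, purely for illustration.
    print(build_cleanup_prompt(
        "  so i asked chat tpt to review the are f d  ",
        {"chat tpt": "ChatGPT", "are f d": "RFD"},
    ))
```

Keeping the glossary in the prompt, rather than doing find-and-replace afterward, matches the idea of telling the model "if I say something silly like this, correct it for me"; the resulting string would then be sent to whichever chat model you use.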
Bryan Cantrill:Yeah. That's amazing. So I gotta say, Adam, are you comfortable talking to the computer?
Adam Leventhal:I'm getting there. You know, it's weird. I'm more comfortable talking to ChatGPT on my phone, for whatever reason, than I would be sitting at my desk talking to my computer.
Bryan Cantrill:Because I am not. I mean, this may be a generational thing, because my kids are extremely comfortable talking to it. Even though they complain about Siri, for example, they still use it all of the time. So I think that's it. I need to start living in the future, at least, professor.
Adam Leventhal:Have you tried the ChatGPT, like, interactive audio mode on the phone?
Bryan Cantrill:I haven't. And you were raving about that, right? You tried that.
Adam Leventhal:Raving, in a way. So when Simon was on the show, he was talking about how he'd go for these long walks with his dog, and when he'd get back to his desk, he would have... Simon's in the audience, but I'm gonna hyperbolize and say, you know, thousands of lines of code written, whatever. So I was taking the dogs for a walk, like, the night after that show, maybe that same night, and ended up getting into a 45 minute argument with ChatGPT about JSON schema, trying to prove some incongruence that it was insisting was true. So I'm not raving about that experience necessarily.
Adam Leventhal:But I've had other good ones where, you know, actually doing some trip planning and some travel planning, it's been terrific, and I had a good interaction with it.
Bryan Cantrill:Is it maybe less apologetic? Normally, I feel like it backs down. I don't
Adam Leventhal:I I
Bryan Cantrill:Is it like, hey, listen, asshole. If you wanted the answer, why are you asking?
Bryan Cantrill:You're you're asking me for the answer. I'm gonna give you the answer.
Adam Leventhal:It does back down. I think I've gotten into some of these situations where either I'm not being firm enough with it when I'm trying to prove it wrong, or I'm just giving it too much rope, but it just digs in its heels. I had another one. I think I told you this. I think it was after last time, Nicholas, when you were describing some use of LLMs, and I decided I was gonna have an LLM write the show notes, because writing the show notes for these podcasts is kind of a pain in the neck for me. And it totally gaslit me.
Adam Leventhal:It said, wow, this is a really long transcript. It's gonna take me several hours to do this. It's gonna take a little while, but I'll go work on it. I'll let you know as soon as possible. And I thought, oh, wow. That's terrific.
Adam Leventhal:So I asked, do I need to do anything else? And it said, no. No. No. I'm good.
Bryan Cantrill:I'm good. I I I yeah.
Nicholas Carlini:And and
Adam Leventhal:And I said, how long is it gonna take? It's like, well, it's gonna take a while. The next morning, I checked in, and I was like, you know, how'd it go? Like, when
Bryan Cantrill:I would go.
Adam Leventhal:is it gonna be done? And then it just explained. Like, I don't work like that. Like, I don't work when you're not here. You think I'm working when you're not here?
Adam Leventhal:No. I am call and response. What did you think I was doing exactly?
Bryan Cantrill:The night manager told me I should check in in the morning. Like, I don't know who that is. My shift started at 8. I can tell you that no one would tell you that.
Bryan Cantrill:Yeah. I
Adam Leventhal:It said, I must clarify that as an AI developed by OpenAI, I don't have the capability to perform real time processing or analysis that takes hours or days. It's like, why didn't you tell me that last night, buddy? Like,
Bryan Cantrill:was it apologetic then? That's this is God. This is great.
Adam Leventhal:It said, I must clarify. I don't detect an apology in there. I feel like it's
Bryan Cantrill:God, this sounds like an apology written by me. Like, this reads more like a clarification than an apology. It's more like you're restating your position more emphatically. So what was that on? Was that on GPT
Nicholas Carlini:4 or were
Adam Leventhal:I think it was 4o.
Bryan Cantrill:4o. Yeah. I mean, maybe we'll try with these others, because that'd be so helpful to have. And we must be close, because one thing I had not tried: have you tried uploading RFDs to it and getting it to
Nicholas Carlini:No.
Adam Leventhal:I haven't done that. These are the Oxide requests for discussion, where we go on, sometimes briefly but sometimes at great length, on focused topics.
Bryan Cantrill:Yeah. Honestly, Nicholas, some of these examples just caught me off guard in terms of the amount of things you were giving it. I just don't think I realized that the windows were this large.
Nicholas Carlini:Yeah. This is the thing I just think people should try. It's like, just, I wonder if it can do this thing.
Bryan Cantrill:Well, let's just Yeah.
Nicholas Carlini:Let's give it a couple of examples. And if it turns out that it does the thing great for you, then great. You know, you've learned a new thing. If it turns out that it doesn't, then accept the fact that the tool is imperfect and, you know, wait for the next one and try again. And I'm sure this will be one of these things.
Nicholas Carlini:I'm not looking forward to this, where people who grow up with these are so much better at just knowing and trusting which kinds of things these models can do and which kinds of things they can't. Whereas, you know, I have to remind myself now: this is a task that seems like it should be easy, but I don't want to do it. I should ask the model and see if it gives me the right answer. That's not my natural response right now.
Nicholas Carlini:I have to remind myself to do this, and I'm sure that the people who are growing up with these things will treat this as just another tool that any of us would immediately jump to. You know, instead of just opening the debugger, the first reaction is, let's ask the model. And, you know, maybe it's wrong, maybe it's not. But this is the thing I think people should do more, just to get experience. Like, yeah, maybe it works, maybe it doesn't.
Bryan Cantrill:Well, I think that's a very good point, because, again, I watch my kids use Siri all the time without frustration, because they've got a very good intuition for the kinds of things that it can do and the kinds of things that it can't. Whereas anytime I interact with it, like, where I'm speaking, I feel like... have you seen the scene from Curb Your Enthusiasm where Larry David is interacting with Siri?
Mike Cafarella:I don't
Adam Leventhal:Yes. I
Bryan Cantrill:I'll probably drop that in the chat. I mean, I feel like all of my interactions with Siri just end in absolute combat. And I think, Nicholas, you're making a very good point in terms of that rising generation. You know, there's been a lot of concern that, like, oh, god, LLMs do everything.
Bryan Cantrill:So, you know, these were gonna be the helpless Charlie Brown heads, in James Mickens's classic parlance. Whereas I actually think that folks who are kinda growing up with this, or coming into software engineering with this, are actually much better at viewing it as a tool. I mean, my 17 year old ended up getting a paid GPT account, because he's like, I'm actually asking these things frequently enough that I'm really seeing the material differences between GPT 4 and GPT 3, and not just to cheat on schoolwork.
Adam Leventhal:Not merely. Not merely to do that.
Bryan Cantrill:And not merely. I think I've said this here before, but if I haven't, I guess I'm gonna out him: for writing Nextdoor posts and trolling Nextdoor. Which, I mean, admittedly, it is so good at that. If you ask it to write a Nextdoor post as an indignant adult about a ridiculous issue, it's so good at it, and it gets so much reaction.
Adam Leventhal:Accurate corpus there.
Bryan Cantrill:Oh, and, I mean, the teenagers literally got bored with it. They're just like, I'm fishing in a stocked pond. There's just no sport to this. This is not actually that interesting. I mean, at one point, they were just showing me Nextdoor posts.
Bryan Cantrill:It's like, this post has got 50,000 engagements. I'm like, oh my god. Okay. Yeah. But they're really integrating it into their lives in ways that... I mean, it really feels like it's much more in the spirit of your post, of really using it as a tool.
Bryan Cantrill:And for I mean, for us, like, I mean, Adam, when was the point where you really needed a search engine to write code? I mean, if we're, because I was, like, a little bit
Adam Leventhal:Like, when? Like, every day. What are you talking about? I don't know.
Bryan Cantrill:No. No. But, like, when in your career? Like, that obviously wasn't the case at the start of the nineties. I mean, you weren't writing code using HotBot. Right? I mean, that was there or whatever. But I feel like by the
Bryan Cantrill:I mean, pretty early 2000s? I mean, you know, by 2010? I don't know.
Adam Leventhal:I think 2010, that feels like for sure. I remember I was running an interview process, you know, hiring folks, and the question came up around that time: well, why do we expect people to have memorized all of these APIs? Right? Like, I don't memorize these APIs. The search engine is part of your development process.
Adam Leventhal:Like, why would you test someone's ability to write these things in a way that is totally unnatural for the way they actually operate?
Bryan Cantrill:Totally. And where it's like, I would use a search engine for this. It's like, why? Yeah. Use an LLM for that.
Bryan Cantrill:Yeah. No. That's interesting. I mean, it does really in terms of how it changes the hiring process.
Adam Leventhal:Yeah. Yeah. Nicholas, reading your post, and on this topic, it made me think: how should we be teaching differently? Whether it's CS or general
Nicholas Carlini:Very hard question. Totally. I do not envy the current academics who have to teach.
Adam Leventhal:No. It's something that's totally taught. Like, I mean, I see this through my own, you know, recent high school graduate, where he's like, well, you know, half the class is just cheating all the time. And he has an English teacher who is pretty aggressive, and her answer to it was, everyone writes by hand in my classroom. Right?
Adam Leventhal:She doesn't accept papers that were turned in overnight or whatever. It's just all the all the writing is live and all the reading is at home. So there are certainly some reactions, but I guess really what I was thinking of was with regard to computer science curricula. How does it change? Like, what are you teaching differently?
Adam Leventhal:What what kind of skills
Nicholas Carlini:Yeah. Are are
Adam Leventhal:are, you know, changed?
Nicholas Carlini:I mean, this is an excellent question. I think I'm no more qualified to answer this than anyone else, but if I had to speculate, I would say this is the kind of thing where it's hard to know. Because, you know, why do we teach young kids how to add and multiply? We don't do it because this is the thing you're gonna have to do in your daily lives. The reason you do this is to teach them the rigor of mathematics and all these kinds of things. Yeah.
Nicholas Carlini:You teach people to do things not because you want to teach them that skill, but because having learned that skill makes them better at being able to do other kinds of things. And I do worry that it's possible that it will be important to learn how to program, not because this is something that we expect you will be doing every single day, but because the clarity of thought, and the ability to debug something, depends on your ability to have written it. On the other hand, maybe it turns out that what you need to teach people is how you interact with the models and how you debug code when the model gets stuck, and maybe it's okay if people don't know how to write something completely from scratch, but are okay jumping into a code base that they didn't write and just fixing the problems. I really don't know. I think this is one of these very uncertain questions that, you know, a lot of people who I work with are gonna have to figure out, because they're the ones who are actually teaching the students, but I fortunately am not doing that.
Bryan Cantrill:Let me suggest a third path, because, you know, back when we were doing a lot of university recruiting, you would see really strong contrasts in undergraduate curricula based on how lab intensive they were. And, you know, Adam, you and I are blessed with an alma mater that always had a philosophy of being very lab intensive, and I feel like it always showed. In order to be lab intensive, I mean, Adam, talk about tedious, monotonous code: TA code. You know? And for those lab intensive courses, there were undergraduate TAs that were doing a heroic amount of work to allow the course to be lab intensive.
Bryan Cantrill:It's really hard to make a course that's lab intensive. And, you know, one thing I might suggest is, like, hey, maybe these LLMs allow a lot more courses to be a lot more lab intensive. Yeah. Because I do think, Nicholas, to your point, that when you debug these systems, you are forced to learn them. And so let's get a lot more aggressive about what we expect undergraduates to do from a lab perspective, and have them really do this stuff, make this stuff on their own.
Bryan Cantrill:And I mean, I feel like there's never been an excuse, especially when you're paying a top dollar price for your undergraduate education, and you should be able to demand real, intense labs. I think this gives us an opportunity to build those labs on the cheap. And I think that you could actually do a lot more, and we should have higher expectations for those courses in terms of having students do way, way, way more and giving them the apparatus
Nicholas Carlini:where they
Bryan Cantrill:can go do it. I mean, Adam, I don't know what you think.
Adam Leventhal:I think it's a great perspective. And I think you're right that we were blessed to have gone to a very fancy Ivy school that was able to provide these things. But do LLMs democratize that significantly? I mean, even beyond the way that information has been democratized in the last, you know, 25, 30 years since we were going to school, does it make it so much more approachable for any school to replicate that, or for folks to do that outside of a school environment?
Bryan Cantrill:Totally.
Nicholas Carlini:This is one of the things where, you know, just the ability to scale up some of these things is something I'm very excited about. I'm a little cautious, though. I don't know to what extent I would fully trust these things to actually be good teachers in some way. Like, some people are very excited by this. Andrej Karpathy, I think, who's a very important ML person, but okay.
Nicholas Carlini:He has recently started something where he's trying to do ML for teaching. And I think Khan Academy is trying to do this too. And I think if it works, it could work very, very well. It could also work very poorly. You know, we'll see how that whole thing goes.
Nicholas Carlini:But I guess one of the things I am very excited about for programming in particular is that it's really hard to get new people into programming, with the complexity of all the things you have to do to make something that looks interesting today. Like, imagine 30 years ago: you could write a little text game that people would find fun. It doesn't require crazy, crazy stuff. Now, if you wanna write something that looks exciting, it's legitimately very hard, so it's gonna be hard to get people interested and show them a fun demo that they can actually make. One of the things I'm very excited about the opportunity for is, you could imagine people who don't know that much about programming, but who can express, like, make me a game that does this kind of thing.
Nicholas Carlini:Totally. Like, I want to make the person have a weird hat. Give them a weird hat. And they can get interested in doing the thing, and then it will get stuck, and then they'll have to start figuring out, like, okay, now I have my thing, my idea is on the screen. Now I just need to start making small little tweaks to it to make things better. And this is how I got into programming. Yeah.
Nicholas Carlini:Both my parents are programmers, and my father would have some program that he would help me write, which basically meant he would write it. And then I would make small little tweaks that I found, you know, amusing as a little kid, and from there, it gave me a reason to want to write a larger program. And not everyone has parents who can do this for them. And the ability of just getting the first version of a thing that gets people, even if they can't do anything more after that, just excited by the idea that they can have a program, that the computer can do a thing for them: this is a thing that I'm very hopeful will work.
Bryan Cantrill:100%. And I think it gets us back to an era that Adam and I really came up in, where my first programs, like, actually, many of my generation's, were from Enter magazine, where you had a BASIC program that was a game, and you would literally type the BASIC program in, and you would get the free game. You know, you'd get the Space Invaders or whatever. And when it was broken, you had to figure out why. And, you know, I don't think anyone was concerned, like, oh my god.
Bryan Cantrill:Like, you know, the youth of today, when you're on the job, are you just gonna be waiting for the next Enter magazine? But you were able to build something. So, Nicholas, just exactly your point. Because I think we did go through this era where, I don't know, actually, my concern honestly with the Chromebook, as great as the Chromebook is, is that it's kind of impenetrable by design. And the idea of actually allowing these LLMs to go build something fanciful for a kid that they're gonna find interesting. And then they're like, okay.
Bryan Cantrill:I wanna tweak this, so I need to understand it a little bit better. I mean, you know, your children are much more intrinsically motivated, Adam, but I've always felt that if I could somehow make getting the Wi-Fi password, like, an escape room that would also teach them something in the process. Do you know what I mean? Or like
Adam Leventhal:As your proof of work. Yeah. Right.
Bryan Cantrill:Yeah. Yeah. Yeah. Right. Or or like the other thing, like whenever we get a new device in the house, my kids race to the parental controls to lock me out.
Bryan Cantrill:So, like, basically, if you don't discover your parental controls, your teenagers will discover them for you. And then next thing you know, you're asking their permission to change, like, wait a minute. What happened here? I think that, you know, this allows us to kinda cut through those layers of abstraction and show how stuff works. And I also think then getting individual help, and you mentioned this, Nicholas, in your post, as a tutor. Sal Khan has got a terrific piece on this, and I'm gonna drop in
Bryan Cantrill:I'll drop in a link to his TED talk on it. But Sal Khan is talking about how this can completely revolutionize tutoring. And, you know, if you've got someone who needs additional help, just being able to iterate, I think it's quite possible that this is gonna really uplift way more people.
Nicholas Carlini:Especially from the perspective of, like, how many people have a question and don't ask it because they think they're gonna make themselves look stupid.
Bryan Cantrill:Totally.
Nicholas Carlini:But, like, you can just ask the computer now. It's just not gonna judge you. Like, this is one of the things. It's nice that it's apologetic. You know, you ask for an answer, you say, I don't understand.
Nicholas Carlini:It's gonna go, oh, sorry. Like, let me try and explain it to you another way. And, like, you can just keep saying, I don't understand it. It will be entirely pleasant.
Bryan Cantrill:It will continue to blame itself. It will at no point be like, Jesus Christ, you're thick.
Nicholas Carlini:You are
Bryan Cantrill:I mean, come on.
Nicholas Carlini:Right. And this ability of, like, you know, I do this for many things, where it's like, here's a very basic question. I don't know the answer. I'm sure that this is entirely trivial. Just help me understand this thing that I just don't get.
Nicholas Carlini:And you can just ask. You have to be a little careful, because we're currently in a world where, you know, you can say, explain this thing to me, and it may just lie to you. But I'm hopeful that as they get better, they will do this less. And also it's true that for easier things they do this much less. And, yeah, no, I'm very optimistic about this as the thing that will just allow people to ask the questions that they want to ask, because they don't understand something very simple and they don't wanna bother someone. Like, you know, a lot of these questions, like, when I yeah.
Nicholas Carlini:Okay. So I've been doing some electronics stuff. And the last time I did electronics was, like, 2009 or something. I don't know. There's so much stuff that I don't know that is entirely trivial.
Nicholas Carlini:Any person who has studied electronics in the world can answer this. And at the same time, I'm not gonna call someone up and be like, so I have this resistor and it says, you know, 601 on it. What does that mean? Right. But, like, I could.
Nicholas Carlini:Like, I know someone who knows the answer. I'm not gonna do this. Like, why would I? No one does this. But, you know, you could just ask the model, like, here's an incredibly dumb question. Just tell me what the answer is. I'm sure that this is easy and it's known.
Nicholas Carlini:Just, like, help me out here. And, like, they're great for that.
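For the resistor question itself, here is a small sketch that assumes the "601" is a standard three-digit surface-mount marking, where the first two digits are significant figures and the third is a power-of-ten multiplier. Whether that is the scheme on the part in question is an assumption; the function name is made up for illustration, and other marking schemes (four-digit codes, EIA-96, an "R" for a decimal point) are not handled.

```python
def decode_three_digit_marking(marking: str) -> int:
    """Return resistance in ohms for a plain three-digit resistor marking.

    Assumes the common scheme: two significant digits followed by a
    power-of-ten multiplier. Anything else is rejected rather than guessed.
    """
    if len(marking) != 3 or not (marking.isascii() and marking.isdigit()):
        raise ValueError("expected exactly three ASCII digits, e.g. '103'")
    significant = int(marking[:2])   # first two digits
    exponent = int(marking[2])       # third digit is the multiplier exponent
    return significant * 10 ** exponent


if __name__ == "__main__":
    for code in ("601", "103", "470"):
        print(code, "->", decode_three_digit_marking(code), "ohms")
```

Under that assumption, "601" reads as 60 × 10, or 600 ohms.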
Bryan Cantrill:Right. Absolutely terrific. And I also think, and actually, Adam, I'm just going through my own history. I'd totally forgotten that I had done this: I did actually upload an entire RFD I'd written, and I asked for its comments on it.
Adam Leventhal:Oh, nice.
Bryan Cantrill:And it gave me some pretty thoughtful comments. And, I mean, I disagreed with some of the things. Like, it gave me comments in a direction that I kinda disagreed with. So I kind of explained, like, no. No.
Bryan Cantrill:Those would be fine comments, but here's why we're kinda asking for this. It's like, no. No. That makes sense, and it gave me different feedback that I thought was good. I mean, I don't think it was life changing feedback, but it also wasn't hallucinating.
Bryan Cantrill:And I think it allowed me to write effectively better docs, or get a doc that was tighter. I'm getting more eyes on it. Right? Yeah. I guess I should have done this earlier.
Bryan Cantrill:Man, my ChatGPT 4 history is definitely wild. I feel like it's even weirder than my search history. At one point, you know, Adam, we talk about determining the NICs to use in the Oxide rack. NIC as in a network interface card. And our colleague Robert Mustacchi wrote a terrific RFD called the Four NICs of the Apocalypse. Broadcom being War, of course.
Bryan Cantrill:I think Intel was Pestilence. Marvell, I can't remember what Mellanox was, and Chelsio was Famine. But at one point, I actually wanted to know, like, what are the additional NIC vendors? And the thing is that there are just not many companies that make enterprise-grade NICs, high-speed NICs. And I was actually wondering if you do this at all, but a technique that Simon talked about is, like, the name 10 more.
Bryan Cantrill:And then you just keep saying, like, name 10 more, name 10 more, name 10 more.
Adam Leventhal:Oh, it must've started making... did it start making up NIC vendors?
Nicholas Carlini:Oh,
Bryan Cantrill:and then it gets so deep into weirdness. I mean, it's great. It's just like, at no point
Adam Leventhal:is it like, look, pal. There are not 10 more.
Bryan Cantrill:There are not 10 more. Like, that's not actually a reasonable request. It's like, okay. Here I go. And it's pulling things that are like, wow.
Bryan Cantrill:Yeah. That's a super obscure company that actually has been out of business for 5 years. But, I don't know. Do you use that technique?
Nicholas Carlini:Oh, definitely. This is the thing I do all the time. Like, you know, here's the thing. So I do this a lot for writing paper titles now. Like, I don't know.
Nicholas Carlini:Paper titles, it's such a hard problem. Because on one hand, I wanna be descriptive. On the other hand, you know, I want it to be something people are gonna wanna click on, but I don't want it to be clickbait. So I'll give it the abstract of the paper I've written and, like, 4 or 5 titles and be, like, just give me 50 paper titles. And I'm not gonna take any of them, probably.
Nicholas Carlini:But I just wanna see, like, what's it gonna come up with? And, you know, yeah, the first couple are gonna be almost exactly what I've written, but slightly different. And at some point, it'll come up with something that's a little interesting. And then I'll have seen more ideas, and then I'll close it. I'll go write the ones that I want and come up with a thing that I want.
Nicholas Carlini:But yeah. No. This is, I guess, one of the cases where I found this really useful for, like, a very obscure use case. But it's helpful for me. And I do this all the time for these kinds of things, where you want it to give you an answer to a thing, but you know the first couple are gonna be just boring and generic.
Nicholas Carlini:And so you just ask it for, like, yeah, a large number of them. And by the 40th, it's really going off into the wild. It can do some truly crazy things. But, yeah, if that's what you want, I think this is a great use case for that.
Bryan Cantrill:Yeah. Okay. So then, another question for you, of course. And I think you mentioned this even in the piece, that you used the LLM to kinda write some of the apparatus for the blog entry.
Bryan Cantrill:Did you use the LLM at all as a part of writing the blog entry?
Nicholas Carlini:I don't. Yeah. Two reasons why. One is, for writing code, oftentimes I just want it to be correct. And I don't care about the style of it, because I'm gonna get rid of the code pretty quickly anyway or whatever. But for writing, I want a particular tone and a particular voice to be consistent throughout.
Nicholas Carlini:And so because of that, I still want to write all of the words myself. That is something I think models are not yet good enough at, but they could become better at. I'm sure it'd be just fine in lots of contexts where you just need text. But for many of these things, I care about how it sounds, especially for text that I'm gonna put online. I want people to be engaged with it, and I think I have a good idea of what people are going to find interesting and not get bored by.
Nicholas Carlini:And so I want to do that myself. But at the same time, what I will do, yeah, is ask this question of: here is what I just wrote.
Bryan Cantrill:Yeah. What do you think?
Nicholas Carlini:Tell me some things that, you know, a random person on the Internet might be confused by. So I tend to write mostly on security or on machine learning. And so one of the things I will do when I'm trying to attract both kinds of people is I will copy and paste my thing and say: you are a regular computer security researcher. Tell me what machine learning words are confusing to you. Because I will just use things that are obvious to me now, because I forgot that I learned them. And then I'll repeat the exact same question but just say, you are the other person, what things are confusing to you? And I find this can be very helpful. Occasionally I write things where I've broken something, and I'll ask, like, how's the tone of this?
Nicholas Carlini:Like, am I just, am I being nasty? Or like, is this like, okay?
Bryan Cantrill:Oh, interesting.
Nicholas Carlini:Because it's completely divorced from, you know, all my feelings. It will tell me, like, yeah, that's probably taking it too far. Because I don't wanna be mean, but, you know, it can be hard to tell. And I can do this before I go ask some friends to read it for me, and just get an objective view. Like, it has no skin in the game here.
Nicholas Carlini:It will tell me if the tone of this is off.
Adam Leventhal:The security and the rest of this device.
Bryan Cantrill:If it gives you advice that's terrible or that you disagree with, just disregard it. I mean, I think...
Nicholas Carlini:Exactly. Right. You know?
Bryan Cantrill:...as an editor is a really good use of it.
Nicholas Carlini:Yeah. Right. Like, everything that I write, it always tells me, you mix text that is both too formal and too casual. And I'm like
Bryan Cantrill:Go to hell.
Nicholas Carlini:Yeah. I know. Like, the reason why is because in the introduction I'm being casual, and then I slip into academic writing mode in the middle. And I'm like, this is not important enough for me to go and make the entire thing completely consistent. But, yeah, if you don't like it, you just ignore it.
Nicholas Carlini:But, you know, at least it tells me the thing. And, you know, another example: I did something that I knew some people were not gonna like, and the model told me, you know, some people might not like this. And then inevitably, there was a comment online somewhere where someone complained about exactly that thing. Like, I knew that they were gonna not like it and, like, I don't care. This is the way the thing is. I've made my decision. But, yeah, for lots of other things, it will find cases where things are unclear.
Nicholas Carlini:There's things like this. And, yeah, I find it useful for that. Not as useful as for coding, for me. But, you know, maybe this is also a fact of, like, I use it mostly for coding, so I know how to use it for that. I think it'd be very interesting to ask someone who writes if they've found tricks that work for them in particular.
Nicholas Carlini:Like, what is the equivalent of boring writing for someone who does something like this, where they would actually find it interesting? I don't know what that would be. But yeah.
Adam Leventhal:It's a great middle ground that you're describing, just in terms of still having it be your own craftsmanship. Right? Your own tending of your garden, while still getting that feedback. It reminds me a little bit of, I don't know if you guys have been watching the Olympics at all, but Google has this awful ad where it's saying, you know, Gemini is gonna help my daughter write a letter to her hero. And it just drives me bananas.
Bryan Cantrill:Oh, but I thought I'd seen all of the awful ads in the Olympics. I definitely watched the Olympics with these, like, nonstop ads. And they are also nonstop AI. Nonstop.
Bryan Cantrill:Oh my god. The IBM AI ad is terrible. The Reuters AI ad is terrible. I love, like, they have these ads talking about how AI could revolutionize their business, and it's like they literally should have used AI to generate the ad. It would have generated that ad because, like, you've got people in the office that have got, like, this 3-ring binder that they're looking at.
Bryan Cantrill:It's like, oh, you know what? Look. Before we get to AI, how about you just bring the information age to your own office? Like, actually, I've got some ideas. I've got some other ideas.
Bryan Cantrill:Oh, so okay. So this is a Google Gemini ad. I think Google ads are really pretty good, but this one is not. This is not.
Adam Leventhal:It is. I mean, it's sort of, like, well crafted, but the premise of having, you know, your daughter craft a letter to her hero using Gemini, I just find gross. I just find that gross.
Bryan Cantrill:That is the worst possible ad.
Adam Leventhal:Yeah. So there you go. It's a winner.
Bryan Cantrill:Now I'm really curious. I don't know if I, like, wanna ask my kids who their heroes are. I mean, I know for the 17 year old, it's gonna be some UFC person I've never heard of. But I'll have to ask the 12 year old who her hero is. But, yeah, that is not a very good use of that. Which sucks because, you know, Nicholas, I think as you pointed out, there are so many good uses for this stuff.
Bryan Cantrill:And I think the other thing that I would just add on that is, you know, I think we can see the hype cycle really turning on this quickly, and we are going to see companies disappear for sure. It looks a lot like the dot-com bust thing. And, you know, all of the things that you talked about, like, how many of those would you pay a startup to build something for you on? At least for me, I look at those and I'm like, not many, honestly. Like, actually, what I'm paying for is, I'm gonna pay OpenAI, or I'm gonna pay for Mistral, or what have you. I'm gonna pay for one of these models to use.
Bryan Cantrill:That's definitely money well spent. But, like, beyond that, no thanks. The power of this thing is that I can use it to build up the things. So there's gonna be a bit of an extinction event here.
Nicholas Carlini:Yeah. No. Again, I'm not the person who has the most intelligent take on this, but, yeah, it's hard to believe that the only way that things can go is up.
Nicholas Carlini:I mean, we've already seen this. So many people say, this thing is using AI, and it turns out, nope, they just hired a bunch of people and told them to do the thing. And it bothers me when they do this, because all it means is the people who wanted to not like it can just point out that this is another case where the thing has not actually delivered. And, you know, just tell me when you actually do the thing. Give me the actual use cases. But the problem is, a lot of the use cases are just not yet as glamorous as they want.
Nicholas Carlini:And no one's gonna run an ad like, watch this person write code a little faster because it did this thing that was boring for them.
Bryan Cantrill:Oh, it's like, did you see that lovely Google ad where the little girl wanted her Python bytecode to be rewritten in terms of NIF assembly? That was so touching. I mean, it was like...
Nicholas Carlini:And so a bunch of what people try and talk about is some aspirational, like, this is what the future might be, or here's a fun thing that might be, and then people just look at it and go, but that's not what I actually want out of it. And it's very hard to pitch some of these things. And, you know, it's the same thing as the internet, you know, 25 years ago or whatever. I truly imagine that these things will become a lot better, but sometimes it can be hard to really pitch the value.
Bryan Cantrill:And in the dot-com bust, you know, there were a lot of things that happened in the dot-com boom that made no economic sense that actually did make sense in the limit. You know, the idea that you would have groceries delivered to you or have pet food delivered to you, with Webvan or Pets.com, but we do that today. And so I think people should not confuse the coming bust for the technologies themselves not being useful. Because I think as you pointed out, they are useful today. If you're not using them in your workflow, you should be.
Bryan Cantrill:And I loved your advice that you should be getting over your skis a little bit. Ask it to do something that is beyond its capabilities so you know where the capabilities lie. I think that is very, very good advice. Well, Nicholas, this has been awesome. This is a great blog entry.
Bryan Cantrill:We love it. Again, thank you so much for being willing to join us here, and it was a lot of fun to see that discussed yesterday. I gotta tell you, I really appreciated it because I was very off the grid. I was backpacking with the scouts last week on the John Muir Trail, so I was actually like, god, what are we gonna talk about on Monday? And I'm like, oh, thank god.
Bryan Cantrill:Nicholas... Oh, god. Please. I hope he's willing to come on Oxide and Friends. Oh, thank god he is.
Bryan Cantrill:Oh, thank god. So I really, really appreciate it. I will say, by the way, just on that note, I did ask ChatGPT about the ascent routes on a particular... we climbed Split Mountain, which is a fourteener here in California. And do not ask these large language models about routes. I mean, it very confidently gives you information that is absolutely wrong, such that if you were to not use your eyes or not look at a map, you would actually potentially die. So, for those of you following along at home, the south summit of Split Mountain is actually highly technical.
Bryan Cantrill:It is not a walk-up. The north summit is the walk-up. That's what I'd like to say. So LLMs, please interpret my speech and get it right. But, Nicholas, thank you very much.
Nicholas Carlini:Sure. Of course. Yeah. Great to talk about these kinds of things.
Nicholas Carlini:It's nice to try and get people to understand some amount of middle ground on these things, that, you know, they have utility, but they're not perfect. Maybe they'll be better in the future, but at least, as programmers, we should try and use whatever tools we have. And, you know, there was a time when people were like, you're using a garbage collected language. You're not managing your own memory. And I feel like this is the same thing that you're gonna hear. There are gonna be people who are gonna be saying, like, you should never use a language model.
Nicholas Carlini:You'll only write code that you have personally, you know, verified correct and have done all of the things by hand. And, you know, someone else is gonna be using the model, and they'll be doing the things a lot faster. And, you know, in some cases, it's true that you want to do it by hand, but in other cases, it'll be good enough to just ask the model to write it for you. And I feel like this is the next version of that kind of thing happening.
Bryan Cantrill:Totally. Totally. Well, thank you again for joining us. Adam, thank you as well. And good to be able to wedge gnuplot into yet another episode.
Nicholas Carlini:Yep. Thanks. Always.
Bryan Cantrill:Alright. Thanks, everyone. Talk to you next time.