What's a bug? What's a debugger?

Speaker 1:

So while Brian's doing that, and, Brian, I know you have this problem, and I hope the Twitter spaces folks are, like, hearing this, but I am, like, unable not to click when I see a space happening. Like, I've realized that this is actually, like, a deficiency I have now.

Speaker 2:

So that you you always go like stay shopping. So you always go like.

Speaker 1:

Oh, yeah. Yeah. Yeah. No matter how tedious the title, like, I'll click. Like, you don't even need a clickbaity title.

Speaker 1:

Like I I am

Speaker 2:

But you you get this, like, halfway state where you kinda, like, stick your head in the room and see who's there and then decide whether to stick around or not.

Speaker 1:

Oh, yeah. I mean, my my actually, my my other first of all, I do this typically while trying to attend to a 3 year old. So that doesn't go so good. And then second, and Brian, I know you have this problem too. I find myself ducking in and out.

Speaker 1:

Right? Like, I try to put it down. Like, that's it. I'm not interested. And then I set down the phone.

Speaker 2:

Well, okay. So I also am, like, petrified. I wish that I mean, I obviously get a lot of requests for enhancement for them. But the I also feel I find that I become accidentally load bearing in spaces that I enter. Where and and then all of a sudden, like, I feel like I you know, actually, I kinda was just here for a second.

Speaker 2:

I really but now I feel like I kinda can't leave. I gotta, like, stick it out and, which I think has made me a little bit too selective on the on the spaces. Kinda I I should I if I had a 3 year old, of course, my, my existing options would be so poor that I would be going into every space I can find. I mean, that, I think, is a very natural toddler parenting technique. It's like, please Oh, you guys are

Speaker 1:

you guys are talking about licenses? Oh, yeah. Yeah. Yeah. This is

Speaker 3:

where I Oh,

Speaker 2:

this is wayward's sake. Oh, yeah. No. Go on. This is like I absolutely.

Speaker 2:

I if someone is reading Hacker News comments aloud to me, yes. I that's I'm I'm here for this. This is much better than a 3 year old.

Speaker 1:

Don't get me wrong. Some of my best friends are 3 year olds.

Speaker 2:

Oh, man. And you gotta, like, you've got a real, like, a real 3 year old. Like, you've got I I I I Adam's I mean, god love Joshua. But so ever first of all, you should know that whenever I'm on the phone with Adam talking about work or whatever, Joshua is usually berating you from the other room asking if you're talking to me. Like, do I

Speaker 4:

Who who are you talking to?

Speaker 1:

You talking to that guy again?

Speaker 2:

Yeah. Like, I get, like, hang up on him. You're like, hey. Listen, kid. Like, I know this is I know.

Speaker 2:

It's

Speaker 1:

I mean, the number I mean, fortunately, we're we're sort of emerging from this pandemic life. But, the number of times I've needed to apologize to people I'm interviewing, explaining that my naked 3 year old really needs to wave before we can move on with the interview.

Speaker 2:

And it's just easier to let the Wookie win on this one. Like, I just, like, look, if you could just let the naked 3 year old wave to you, please, it's gonna be faster. Faster than you'll turn it off. Well, as always, we wanna make sure we get, like, new people in here, new voices, what have you. So, just like, we always we kinda call on folks who we know are here that that that, we know from other spaces, but definitely don't hesitate to raise your hand and hop in here because we're we definitely don't have much of a set agenda.

Speaker 1:

And so, Brian, we said we'd start with this "Writing a Linux Debugger" series. You know, I assume that you had plucked this off of the top page of Hacker News today or something, but I didn't see

Speaker 2:

it there. I saw this. I okay. So I I is this a trap? Do I have to, like, review where I came across this?

Speaker 2:

Because I didn't I didn't mean it to be Okay. So I like, look. I've been spending some time on lobsters. You know? Like, see that what's, you know, what's wrong?

Speaker 2:

I feel like, for whatever reason, Hacker News feels like a gateway drug to Lobsters. Lobsters feels like the much more distilled, hardcore Hacker News. Anyway, yeah, I saw it on Lobsters.

Speaker 1:

And I think I also saw it

Speaker 2:

on the on the Twitters. I feel like people were tweeting about it recently.

Speaker 1:

Yeah. But I I, you know, I've I've been on both of those places.

Speaker 2:

I don't

Speaker 1:

think there's any shame in Twitter or Lobsters. You know? We we've all been there.

Speaker 2:

We've all been there. So, yeah, I saw this. I I it's an older piece, but it was just making the rounds. And it's it's good. It's a good little series, for a bunch of reasons.

Speaker 2:

I mean, it starts out with ptrace, of course, because you have to, but I feel like it we can we can get in and out of that pretty quickly. Ptrace sucks. Next. Moving on.

Speaker 1:

Yeah. Yeah. We promised ourselves this wouldn't just be the /proc evangelism space again.

Speaker 2:

No. I think we already did that one. But That's right. Maybe maybe twice.

Speaker 1:

But that's okay. Not not a third time for sure. But I I don't know if you had this so you you wrote a debugger. I mean, arguably, multiple debuggers.

Speaker 2:

Multiple debuggers, I feel. Maybe, actually, I feel that I'm on at least my 3rd debugger. No. 4th. Yes.

Speaker 2:

Oh. I'm on my 4th debugger.

Speaker 1:

I wasn't even counting the one you were working on currently. Jeez.

Speaker 2:

Yeah. Copy, please.

Speaker 1:

But was MDB the first debugger you worked on?

Speaker 2:

No. I mean, honestly, I feel that ThreadMon in school was the first debugger I ever worked on. Oh, ThreadMon. Right. Right.

Speaker 2:

Right. And having to, yeah, I just, like, I've spent my career writing software to understand software. Yeah. Yeah. Yeah.

Speaker 2:

I can't get that out of my, like, I can't get that out of my marrow. I think that that's just like I I and I I don't know. I I think other people have got a better ability to deal with complexity in their heads. I just have to, like, understand what the software is actually doing. So I think I'm just, like

Speaker 1:

I don't think that's probably true, Brian. I mean, I think that, like, it is very unlikely that people are actually holding these things in their head. I think it's much more likely, I don't know, that people are using these tribal patterns or this kind of cargo-cult debugging rather than, you know, getting to the root cause of these things, or being satisfied until they get

Speaker 2:

to the root cause. Well, I do think that with software systems, it's really hard to know what they're actually doing. And, I mean, I do feel like my first exposure to this was actually as an undergrad. So I worked on a debugger, in fact, my first debugger. Although, actually, you know, this is my second debugger.

Speaker 2:

You know my first debugger. I wrote something that I thought was extremely clever at the time called Sift that plowed through the PLT, the procedure linkage table. I would have this thing that you would LD_PRELOAD, and it would overwrite the PLT with its own indirection table so you could see all of the dynamic library calls you were making, which I thought was fun. And useful. But so the threading tool, I built that as part of my thesis project to understand this whole multilevel threading model where you have many user-level threads on top of fewer kernel LWPs, lightweight processes.

Speaker 2:

You know, all these assertions being made, and no one had written any tooling to actually understand what the binding was between a thread and an LWP. And I don't understand how anyone could actually, like, build a system or make all the assertions they were making without having built that tooling. And indeed, not hugely surprisingly, when I built that tooling, it revealed... of course, you turn on the light, and it's like, yeah, this thing is not doing at all what anyone thought it was doing.

Speaker 2:

So, anyway, that was I would say that was the that was the first debugger. And it I was having to do all sorts of just dirty things to get to I mean, dirty, dirty, dirty, dirty. Dirty dog. Dirty, dirty. I like I was and I was like dirty in a way that felt exhilarating as an undergraduate.

Speaker 2:

You know what I mean? Like, whenever Yeah. Yeah. Yeah. Yeah.

Speaker 2:

I've got a 16 year old now that, like all 16 year olds, has an impaired brain and is unable to make proper decisions. And I try to remember that, like, actually, I've got my own track record of terrible decision making, and I feel a lot of it is back there at that project. But in particular, I can't even remember how I thought this was a good idea, but I ended up mapping /dev/kmem read-write and effectively participating in the TNF locking process. I really could just kinda, like, plow ahead, which

Speaker 1:

Wow. Yeah. That's exciting.

Speaker 2:

Oh, it's very exciting and so stupid. And I can't even remember why I thought that that was the only way. But I

Speaker 3:

was so I

Speaker 2:

was using TNF, which is this Trace Normal Form. Did you ever use TNF? I know we've had this conversation. We've had this exact conversation. No. No.

Speaker 2:

No. No. I I I

Speaker 1:

feel like I was only in TNF to get directions to get away from TNF. I think I was only there literally as we were, like, retrofitting DTrace components into some of those existing, like, #define'd invocations?

Speaker 2:

TNF, which stands for Trace Normal Form, was a facility in the operating system that Adam's charmed life meant that he never actually had to use. But it was really, really rocky. It was very hard to parse. It was also, it was written, and I think this is, like, part of the problem with debuggers. And this is, like, everyone can raise their hands and just dogpile me where you disagree or where I'm being overly provocative.

Speaker 2:

Debuggers are historically written by compiler folks and not systems folks. And not to be, like, over not not to create kind of an overly false dichotomy. Not to, like, turn us against one another into tribal warfare. But I do feel that the that debuggers as a result are designed to debug the problem that compiler folks have the most familiarity with, and that's a compiler.

Speaker 1:

I think that's absolutely true. Irrefutably true, because those are the problems they're most familiar with and probably the problems that they're facing on a day-to-day basis. Yeah. And I think that, you know, I wandered into the middle of, like, your journey in Solaris through a bunch of these debugging phases, you know, of which, like, DTrace was one, but before that, MDB and CTF and some other pieces. But all of those really motivated by the unobservable problems that you had seen.

Speaker 1:

I mean, you that you had experienced. Pardon me.

Speaker 2:

Right. Well, and I think that, like, debuggers are just designed for, like, reproducible problems, way too frequently. You know? I mean, and, like, I love this blog series that we're kinda kicking this off with. I think it's great.

Speaker 2:

Yeah. And I mean, I really like it, but it is definitely designed around in situ breakpoint debugging. And I just view in situ breakpoint debugging as kinda, like, one sliver of debugging that's useful for one particular, and somewhat unusual, class of bugs. That's actually not the kind of debugger that I wanna use most of the time.

Speaker 1:

Well, and in particular that interactivity, like, you know, where there's a human in the loop on every decision point. And I think in the last section, the advanced topics, he starts to allude to the scriptability or automation within some of this debugging. And I think that that's where, I don't know.

Speaker 1:

That's where things get really interesting. Like truss. When you're running truss, it's using breakpoints to examine a userland process, but you're still able to do those breakpoints programmatically where things are happening in multiple threads, without the human having to

Speaker 2:

stop the system. And as soon as you stop the system, there are certain problems you can no longer reason about. Yeah. But alright. So what what was your what was your first debugger?

Speaker 2:

As long as we're talking about first debugger so we can

Speaker 1:

Yeah. I mean, I think I think it really was the work in in MDB. I think that when I You

Speaker 2:

know, I'm trying to give you the libdis alley oop.

Speaker 1:

No. No. No. No. No.

Speaker 1:

Oh, well, you know, okay.

Speaker 2:

That's not a debugger? Wait. Okay. Okay. No.

Speaker 2:

Let's just see: what is a debugger, then? Honestly, what is a bug?

Speaker 1:

No. I'll take it now. I'll take it now. I'll take it now.

Speaker 2:

You know, I I

Speaker 1:

think I have complicated feelings about libdis. So libdis was a, and and here's the alley oop back. You know, Brian's great idea that was my internal project I

Speaker 2:

wanna really spend a bit.

Speaker 1:

In in in 2000. So, you know, a long time ago. But the the the concept was, we've got all this program text laying

Speaker 2:

Are you there? I muted you. I actually hit the wrong button. Goddamn it, Twitter spaces. I tried to scroll down and I hit mute everyone.

Speaker 5:

Oh, that's good. Okay.

Speaker 1:

So libdis: the idea was, rather than just taking the binary, the bits associated with instructions, and dumping them out as ASCII for humans to understand, to instead interpret them in some structural form, so that you've got, like, these components that you can manipulate, and then try to infer different things about the program. So, for example, watch where values flow through registers and are transferred in and out of memory and passed to different functions, to be able to do stuff like, say: where did this value come from? What did it used to be? And not rely on the compiler, not relying necessarily on the compiler leaving around those tidbits in DWARF or in other places, but rather to be able to infer that just from, like, what you saw in the program text. Which is what

Speaker 2:

we see. Like, it's like Ghidra. Have you played with Ghidra yet, by the way, Adam?

Speaker 1:

No. No. I thought Laura was here.

Speaker 2:

I was trying to scroll down because Laura was here, and then that's when, like, literally, Twitter spaces, the button I need to click to see the additional people that are here is underneath the mute everyone button in some, like, act of total cruelty. I cannot see who else is here, but Laura was here, and she has used Ghidra a bunch, and used that, ultimately, to really aid in this vulnerability that we found, the LPC55 vulnerability. But I feel like this is, like, a proto-Ghidra coming. It's like the RE community, the reverse engineering community, is doing a lot of really interesting stuff that I think we should be using in debugging a lot

Speaker 1:

more. Oh, yeah. I'm struggling to remember, but there was this reverse engineering tool that had to do with, like, memory analysis that a colleague of mine at Delphix, you know, used for debugging purposes and submitted to their conference. But it really had not been used in that way before. But I agree that there's a ton of crossover, in particular when it comes to these hacking tools, applying them for understanding complex pathologies.

Speaker 2:

Okay. So let me ask this. I'm asking this question earnestly: what is a debugger? Because I realized that... because I feel DTrace is a debugger, but maybe I'm the only one.

Speaker 2:

Do you view it that way?

Speaker 6:

DTrace is absolutely

Speaker 2:

Okay. Good. Alright. So at least 2 of us do it that way. But I don't think, like, most people view it that way.

Speaker 1:

Well, then what is debugging?

Speaker 6:

No. That's what I mean.

Speaker 2:

But this is what I mean. Like, I feel like it is a debugger. I I feel like we like, it's a kind of a regrettable term, actually.

Speaker 1:

That's interesting. You're right. Because it does connote a certain activity, which is, like, the software engineer building the code and trying to understand, in some ways, de minimis problems as they do that. Right. Whereas it's kind of a very natural stepping stone to go from that to more complex issues and more complex environments and so forth.

Speaker 2:

But I just, I feel that, like, I wish we had a term that was about aiding us in the understanding of what software is doing. I mean, I agree that's, like, complicated. That's too many words. I mean, "debugger" is a much shorter term. But I feel like, as I'm thinking about it, I think that's kinda, like, part of the problem.

Speaker 1:

You're right. And it's not necessarily the moth flapping its wings on the relay or whatever. Because it implies a problem when there may not be a problem. It may just be: I want to understand how the system's operating, independent of whether it's doing it badly.

Speaker 2:

That that's right.

Speaker 4:

Is that the joke about introspection?

Speaker 2:

Right. Yeah. I mean, like, yeah. Introspection or or or, like, you know, it's it is, you know, what is a a CT and an MRI and a PET scanner are all what? Those are

Speaker 1:

all Diagnostic tools?

Speaker 2:

There you go. Yeah. I feel like we... so, yeah. I just, I don't know. I feel that, like, the term is gonna hang us up.

Speaker 1:

But we can't really take observability because it means something a little I mean, I don't know. It's nice and it's close enough, but, those folks have ever really owned it.

Speaker 2:

Yeah. You're kinda trolling the hell out of me on this one on observability. I want just because I feel that, like, if someone was using observability to talk about software before I was, I don't know who it was. You know? Like, I don't know where I got that.

Speaker 2:

And I'm not, no, I'm not trying to be, like, self-aggrandizing about it. But people go to observability, and then they go to the Wikipedia article for observability, which is a control theory article. And so they talk about observability, which is a mathematical property, and that's not what we're talking about. This is not a mathematical property.

Speaker 2:

Observability is our ability to see software, as far as I'm concerned. So, I mean, to me, like, you know, ps is a diagnostic tool.

Speaker 1:

Yeah. Is that a debugger? Yeah. I mean, I think... yeah, I agree it's a diagnostic tool.

Speaker 1:

It definitely stretches my mental model for what is debugging. But you're right. Like, as part of the debugging endeavor, you're running ps. You're running ptree. Like, all of these things let you see what the system is doing.

Speaker 2:

Well, and I swear, I mean, with the debugger that I'm currently writing, which is for our embedded all-Rust system, Hubris, appropriately enough, because we're doing a de novo operating system. I am writing the debugger, appropriately enough, named Humility. And just, like, the ability to get a task list out of the system has been really valuable. You can debug many problems by getting an annotated list of tasks. And, like, with MDB, being able to do a ::ps was super valuable.

Speaker 2:

It's super valuable.

Speaker 1:

But, you know, the analogy with MRI or CT or whatever, I feel like it's really appropriate, because it must have been, in the nascent days of those technologies, that you could find all kinds of pathologies and etiologies that were just not observable before. I mean, in that case, literally. But each one of these new tools... I remember, you know, a formative moment in my career. I was probably 22, 23, using DTrace with a Sun customer on their application. And just, you know, I didn't even understand what I was looking at, but it was so valuable to them.

Speaker 1:

They're just their jaws were on the floor.

Speaker 2:

Were we together at Walmart? Doing a demo. We're doing a demo, and we are, like, DTracing on their system, which is great. Right? As you're pointing out, it's always so much fun to, like, look at someone else's app, because you're like, I don't know what I'm looking at.

Speaker 2:

Like, yeah. I don't know. And they're like, oh my god. We've never seen this before. Like, that calls this other thing?

Speaker 2:

Like, that's it. And so I was doing it on their live system, and they were guiding me a little bit about, like, you know, how to aggregate, where strings were hiding out. And we got to the point where all I remember is, like, the result of this aggregation that we had was, like, departments in a department store. So they were, like, lawnmowers, like, ladies' lingerie, 15. You know?

Speaker 2:

Like, raincoats, 142. I was like, wow. This seems really cool. And one of the guys at the back there was like, there's a bug in DTrace, because that output is wrong. And I'm like... like, men's raincoats is not, like, output that we generate, by the way.

Speaker 1:

Like, that's not... If you run strings, if you run strings on the DTrace binary, you're not gonna find any men's raincoats.

Speaker 2:

There'll be no men's raincoats. He's like, well, it's impossible because, like, those two systems don't talk to one another, and there's no way that system can be... I think we were aggregating by IP address in that one. He's like, those systems are talking to one another, so your thing is generating the wrong IP address. I'm like, that feels unlikely.

Speaker 2:

I mean, I'm not gonna say it's impossible that it's a detrace bug. And then you could see his wheels just, like, grinding for, like, 5 minutes in the back. And, buddy, he's like, actually, I know what's going on. That's actually a really serious issue that we need to understand. He's like, but I think I know what's going on, and those systems should not be talking to one another.

Speaker 2:

So it was, like, one of these things where it's just like and, you know, I had a really, interesting conversation with Cliff Moon. I don't know if you remember him, but he had this company Boundary

Speaker 1:

Yeah.

Speaker 2:

That was doing a bunch of network observability. And the and it also is, like, the super basic observability that we are really, that is still too uncommon about, like, just what's talking to what. Because I think you can learn so much. I'm curious to know to what degree people use those kind of tools on a regular basis because it seems extremely valuable to be able to determine what's talking to what because you can see, like, wait a minute. Like, we this database rollout that we thought is happening is not happening or it's not being phased.

Speaker 2:

It's happening too quickly or it's happening as we thought it would, which is very reassuring, you know. You know,

Speaker 1:

one of the interesting lessons that occurs to me is that as as the debugging tools get more sophisticated and customizable and require more,

Speaker 2:

I don't

Speaker 1:

know, sort of programmability or intervention, it also opens the door to really drawing the wrong inferences, or to, you know, thinking you're seeing something but actually having measured completely the wrong thing. And certainly, I mean, I've seen this, you know, as recently as yesterday with DTrace, where I was working with someone who had written a script. They thought they were looking at one thing, but in fact were looking at another. It also calls to mind, like, bad debuggers I've worked with in the past, which have violated what I think of as the cardinal rule for debuggers, which is: don't kill the patient.

Speaker 2:

Oh, okay. Yeah. Yes. Don't kill the patient. That is actually the I okay.

Speaker 2:

Yes. I thought you're gonna say the cardinal rule was to not lie. But, actually, there is a rule that's more important than not lying, which is not killing

Speaker 5:

the patient.

Speaker 2:

Like, actually That's right.

Speaker 1:

If you

Speaker 2:

have to choose between killing the patient and lying, maybe you should lie. But, hopefully, yeah. Don't kill... So did it kill the patient? What killed it? Oh,

Speaker 1:

well, this was years ago. I mean, for I was using the debugger that I'm not gonna mention because it I I feel like I'm sure it has evolved since then because this is kind of the early days of Go. But I had some process that was spinning out of control. I had no idea what it was. Oh, man.

Speaker 1:

And then, at the time, you know, it was one of these batch jobs I was running that was, like, in hour 7 of 23 or something like that. And, oh no, you know, someone was like, use this debugger, and I poked at it, and immediately all the walls started crashing down.

Speaker 2:

Oh, no.

Speaker 1:

You know what

Speaker 2:

that

Speaker 1:

like it didn't just pull down itself right it it pulled down everyone else with it.

Speaker 2:

So that is bad. What happened? Do you know? I mean, unfortunately, like, when something like that happens, you're like, I don't even want to root-cause it, because, like, I'm just never doing that again.

Speaker 1:

Honestly, that's where I was. I was like, you know what? Like, fuck this thing. Like, I I'm never running this tool again, and and I may never write go again.

Speaker 2:

Well, okay. So not killing the patient. This is actually a really important thing, because I think this is something that was always an ur-principle for us. Yeah. And you don't get any free kills of the patient in production.

Speaker 2:

Like, if if if you kill the patient, no one is ever gonna run that again. And it actually doesn't matter how much and you and it's just like you're saying. It's, like, I'm sure it's improved since then. It's like, well, maybe it has, maybe it hasn't. But you're not gonna know because you're not gonna run it again ever.

Speaker 1:

That's right. It's improved for someone else because it hasn't improved for me.

Speaker 2:

Right? Yeah. It's like, I'm not gonna let it kill... I can't do that. And not killing the patient can actually be harder than it sounds when you're trying to do production debugging, not in situ debugging in development. I mean, it sounds obvious, but it's not like the debugger is electing to kill the patient.

Speaker 2:

It is controlling the process. And then, presumably, it's the debugger that died, I would assume. Yeah. The debugger crashed while its target was in some, like, nonrunning state, or it died in such a way that it brought its target down with it.

Speaker 1:

I mean, there are trivial examples of this. Right? Even in this great blog post we were talking about, it talks about how you write breakpoints. And you write breakpoints by scribbling over some program text with 0xCC, a breakpoint instruction, or an illegal instruction, whatever, and remembering what instruction was supposed to be there.

Speaker 1:

Well, now if you, the debugger, die, then along with your death goes the knowledge of what those instructions were supposed to be. Yes. And it may even be that I've set no breakpoints, but the debugger is interested in knowing when dynamic libraries are loaded, or when you fork a process, or when system calls happen. And so once the debugger is dead, if any of those turds are left around, then, like, I've got a time bomb of a process.

Speaker 2:

That's right. And you've got a SIGTRAP, a signal that you don't see very frequently, because you're not supposed to see it, which is that you hit a breakpoint when you've got no process manipulating you. Yeah. So the kernel kills you.

Speaker 2:

And you're like, what the hell just happened? Yeah. Which, I mean, it's just one of these... it's super easy to kill the patient, is the problem. The problem is, when we're debugging the patient, we are actually taking the patient in and out of death all the time.

Speaker 1:

it does, I mean, in some ways, the paradigm of debuggers, you know, from ptrace and its descendants, then carried into lots of other places, is a little bit busted. Right? Like, the notion that the debugger has now become load-bearing in the execution of the program Yeah. is a pretty grave responsibility. And obviously with, like, with DTrace, I mean, both because we had to and because it was the right way to build it.

Speaker 1:

You know, stuff like the original instructions associated with a particular address in a program live in the kernel. But there's no reason why there couldn't be breakpoint facilities or debugging facilities that are, you know, built into the kernel, or sort of glued onto the side of the process, so that it was its own fault boundary.

Speaker 2:

Well, so you're highlighting another challenge, I think, for debuggers, one that I think this series also highlights, which is part of the challenge. I mean, this is one of the things I ask myself, and you must ask yourself this as well: why is our debugging technology not better than it is? Or why is the better technology that we have not more widespread than it is? And I think, I mean, to make this very concrete: it is a tragedy of our domain that we do not debug postmortem routinely.

Speaker 2:

The fact that we would... go ahead, fully vaccinated Nate.

Speaker 3:

Yeah. I didn't mean to interrupt, but I kind of forked off on this thought a little bit earlier when you were talking about something else, and then you've kind of come back around to the same thing. So it's actually pretty good timing. But my thought was how you're talking about the early debuggers and things that you worked on many years ago. And those of us that are old enough remember when all computing was single threaded on a single machine.

Speaker 3:

Now most computing is not. Distributed systems, and debugging them, is obviously a whole different ballgame. But exactly what you're talking about, how debugging technology has not kept up with that: these are always viewed as, like, transient problems. They're horrible to try and reproduce the conditions for after it's been observed. You know, there's telemetry for tracing, you know, the path of a call through a distributed system, and that helps a lot.

Speaker 3:

But it can still be really, really complicated and really difficult to set that up. But debuggers and introspection tools in general are automating exactly that. And we just haven't kept pace with the current way that we do architecture with those tools. And I'm wondering if anybody has any experience with things like that in a really complicated environment. And the thing that started me on that path was thinking about what you said, you know, the cardinal rule being debuggers should not kill the patient.

Speaker 3:

Well, what if interfering with it does kill the patient? Because it's waiting for a heartbeat or an interaction with some other system. Yep.

Speaker 2:

Absolutely.

Speaker 3:

And those are fiendishly difficult to work around when you're trying to observe it. And it may be very Heisenberg, in the way that when you go to observe it, it literally avoids the problem.

Speaker 2:

Absolutely. Which is part of the reason that I have always wished we spent more of our caloric budget understanding the carcasses of dead programs. Because the program has died — it has panicked in the Rust sense or in the Go sense, or it has thrown an exception. The program has incontrovertibly encountered a programmer error.

Speaker 2:

We throw away that state writ large.
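As a concrete sketch of hanging on to some of that state: Rust's panic machinery hands you the payload and location just before the program dies, via a panic hook. This is a minimal illustration, not a substitute for a real core dump, and the helper name here is made up.

```rust
use std::panic;
use std::sync::{Arc, Mutex};

// Hypothetical sketch: record what a panicking program would otherwise
// throw away. The hook runs before the process dies.
fn install_postmortem_hook(log: Arc<Mutex<Vec<String>>>) {
    panic::set_hook(Box::new(move |info| {
        // The payload may be a &str (from panic!("...")) or a String
        // (e.g. the message from an out-of-bounds index).
        let msg = info
            .payload()
            .downcast_ref::<&str>()
            .map(|s| s.to_string())
            .or_else(|| info.payload().downcast_ref::<String>().cloned())
            .unwrap_or_else(|| "<non-string payload>".to_string());
        let loc = info
            .location()
            .map(|l| format!("{}:{}", l.file(), l.line()))
            .unwrap_or_default();
        log.lock().unwrap().push(format!("panicked with '{msg}' at {loc}"));
    }));
}

fn main() {
    let log = Arc::new(Mutex::new(Vec::new()));
    install_postmortem_hook(log.clone());
    // An incontrovertible programmer error, caught so we can inspect the log.
    let _ = panic::catch_unwind(|| vec![1, 2, 3][10]);
    println!("{:?}", log.lock().unwrap());
}
```

A real postmortem workflow would of course preserve far more than the message — the whole address space — but even this much is state that is usually discarded.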

Speaker 3:

We do. And I think part of the reason is that at some point, those things are driven by, you know, business metrics, and at some level, all problems are transient. All problems are ephemeral at some level.

Speaker 6:

Right?

Speaker 3:

I mean, this computer architecture is gonna go away. This program is gonna go away. This platform is gonna go away one day.

Speaker 4:

Yeah, I don't know. You've got credit card processing code that's been running since the sixties.

Speaker 2:

Yeah. And I also get it — like, can't you just go the existential-crisis route of explaining away anything? Can't you be like: well, yeah, it's true, the plane crashed, but our lives are meaningless.

Speaker 2:

I mean, we're

Speaker 1:

That's right. On a long enough timescale, all of our survival goes to zero. Right?

Speaker 6:

Yeah.

Speaker 3:

But I guess what I'm saying is, when you go to ask for money and time, you know, which is—

Speaker 2:

Yeah — no, this is a very good point. I think that engineers do not feel empowered to build or buy or invest in the tooling needed to debug future problems, because they feel that they can't justify it.

Speaker 3:

Right. And it's actually really sad to think that there are problems that may just be practically impossible for us to ever solve, because the world will have moved on before it's justifiable to solve that particular problem.

Speaker 2:

I hear Dan again has an idea. Go for it.

Speaker 4:

We also have to kind of define what exactly you mean by a bug. Right? Because, you know, an off-by-one error where you go off the end of an array in C or something like that — in some sense that's kind of trivial. If you have a stack trace, you can usually just figure out what's going on and fix it.

Speaker 4:

But if your bug is "gee, my program doesn't run fast enough," then that's a whole different can of worms. And to my mind, a lot of that former category of bug is better addressed through aggressive testing and better engineering practices that we've developed, you know, more or less from scratch in the last 20 years. The industry has definitely changed from when I was a young pup running around on VAXes and things like that. Now that stuff is done as a matter of course.

Speaker 2:

Yeah. That's a good point. Yeah.

Speaker 4:

So a lot of bugs just don't even make a dent. But the sort of visibility tools that you're talking about, like DTrace — to me, these are much more useful for addressing that latter category of bugs. It's like: my thing is slow, or there's some random slowdown — what the hell is going on in the system? Give me some visibility into, you know, the kernel, to let me know what's happening so I can try to figure out where the performance is going.

Speaker 3:

I think there's a difference between performance and unexpected behavior, though. That's where I would define a bug: unexpected and undesirable behavior.

Speaker 4:

Oh, absolutely. Absolutely. Performance was just a good example.

Speaker 2:

Yeah. Performance, you know, it's just—

Speaker 3:

an example. Make it work, then make it fast, then make it beautiful. Right? So you kind of expect that it's in a working state by default, and it might just not be as fast as we'd like.

Speaker 4:

Yeah. Performance was just an example. I mean, there are any number of such things where it's like: what the hell is going on in the system? It's not crashing. Right?

Speaker 4:

It's not like there's a core dump sitting there that I can go poke at. But the system is behaving in a way that I didn't expect it to, and I want to know why. And tooling around that is really valuable.

Speaker 3:

Interesting. But tooling, like

Speaker 2:

Yeah — what I was saying about your off-by-one error, though: on the one hand, there's certainly a class of errors for which a stack backtrace can be enough to understand what's going on. There are many other classes of errors where the stack backtrace ends up being symptomatic of a deeper problem — you're actually off by one for deeper reasons. In other words, you can address the symptom quickly, but to understand the root cause, you actually need more of the surrounding state from when the program failed.

Speaker 2:

I also think you're making a very important point about all of this, because it's true that CI/CD — the whole idea, all of our preproduction work — has gotten way better than it was when we were all pups. And I think that's compensation for the fact that we can't understand these systems when we deploy them. Which is good — in other words, that is the only way we've been able to build systems that just work at all, ever.

Speaker 1:

That's interesting, to draw that— I mean, look, I hear what

Speaker 4:

you're saying, but honestly, I'm not

Speaker 2:

seeing— this is definitely not a substitute for unit tests. And it's not a substitute for CI/CD, which are great developments. It's more that I think part of the reason that development has been so rich and productive is that it's been the only way of assuring that we don't introduce new defects into production. Yeah, I see what you're saying.

Speaker 6:

It it has been

Speaker 1:

it has been a very significant change in the last 10 or 15 years. Totally agreed. And I think that's a great point, Brian: the absence of understanding has driven, or accelerated, the need for doing early integration and unit testing in a more comprehensive way. And it is a disappointing gap that we haven't seen tooling come along with it — tooling to understand some of these pathologies that we expect to see in these environments. Both general tooling and specific tooling for understanding some of the more specific pathologies.

Speaker 2:

Have I somehow talked myself into a position where a lack of debugging has actually helped advance civilization? I'm very concerned with where I—

Speaker 4:

But I'm gonna go back to my question: what is a bug? Right?

Speaker 2:

Oh, totally. I mean, I

Speaker 3:

would think it'd be

Speaker 2:

the— you know, it is undesirable behavior. Part of my problem with the nomenclature of a debugger is that you don't necessarily know the undesirable behavior in your system if you can't observe it, if you can't see it. If you can't look at it, you don't actually know what's wrong with it.

Speaker 1:

Yeah, I think that's right. It's any unintentional behavior, but it can have many different outcomes. One of those outcomes is no effect at all — it may be totally unobservable, with no consequence to the program.

Speaker 1:

Others might be crashes or performance problems or correctness problems — all kinds of things.

Speaker 4:

If a tree falls in the woods... if a computer crashes and nobody observes it.

Speaker 3:

I was gonna say, this is where Brian's earlier field-work example — showing somebody what their system is doing that they weren't aware of — it's not debugging anymore, it's introspection, but it's no less valuable. Because, like you said, the way we develop now is very different from 15 years ago, where you're talking about controlling your process, your unit tests. Well, the code that you write is like 5%, maybe, of the code you ship.

Speaker 3:

You know, people are pulling in a dependency chain from npm — I'd like to know what that means just with my skeleton program, before I start doing anything crazy.

Speaker 4:

And in

Speaker 3:

interest, have some idea of what that looks like, and then actually do a comparative study: well, what if I use this dependency instead of that one? Oh my god, that's a hundred times better. Literally a hundred. And that happens all the time.

Speaker 4:

Absolutely. Absolutely. And an interesting thing about that is that, with the prevalence of tests, those become interesting examples of how you would use a third-party library. They have, you know, pedagogical value beyond just asserting that some modicum of correctness holds in the tightly controlled unit-testing environment.

Speaker 3:

Sure. But then, like you said, there's unit testing, and then there's integration testing, and then there's real-world testing with live conditions — and those are completely different things.

Speaker 1:

Absolutely. But what I I

Speaker 4:

I think what I'm trying to drive at is that the move to testing has in fact obviated some of the need for what we would consider traditional debuggers.

Speaker 2:

Oh. Oh, wow.

Speaker 1:

Oh, yeah. Oh, yeah.

Speaker 4:

I I'm going there. The gauntlet is thrown, Brian.

Speaker 2:

It is.

Speaker 4:

Feel free to hear anything.

Speaker 2:

Do I have— I need to find an emoji I can use over here in the Twitter space. Is there a mallet emoji that I can—

Speaker 4:

Yeah — Brian's in the process of firing me at the moment.

Speaker 2:

Exactly. And Jess has it hooked up to an API, so you'd be amazed how fast it is. No — I mean, it's interesting. I definitely don't agree with it, because I feel that I've just discovered too many pathologies.

Speaker 2:

I think it's too easy to say that. When you can't turn on the light in that system that's deployed, you honestly don't know the problems that you don't know about. So what you are finding is one very important class of problems, but you are then leaving totally dark another extremely important class of problems — the ones that emerge in systems when they're more mature, when they're deployed in production, and when they're doing the most damage. I mean, Adam, I don't think we can get out of here without mentioning AADEBUG. The— Oh, yeah.

Speaker 2:

So there's a conference — was a conference; RIP, AADEBUG — the Automated and Algorithmic Debugging conference, AADEBUG. And Adam and I were extremely excited to go to this conference.

Speaker 2:

And they only had it every couple of years, which I thought I

Speaker 1:

was— I was thinking of it today because of HOPL, and it's like— Oh, yeah.

Speaker 2:

Oh, yeah. Yeah. Yeah. Yeah. Totally.

Speaker 1:

And its cicada-like frequency.

Speaker 2:

Okay, so — AADEBUG. So HOPL has this cicada— I love that analogy. It has a cicada-like frequency because HOPL — the History of Programming Languages conference — is such an important endeavor for humanity.

Speaker 2:

If we have it too frequently, we will spoil ourselves. And I felt the same way about AADEBUG. It's like the Olympiad — you can't have the Olympics every year; it doesn't stay special.

Speaker 2:

We need to have it every— and what we didn't realize is, like, no: this poor, hapless academic community is just being pooped on by everybody. They can't find a venue every year; they couldn't even get it together every year. And so Adam and I went to AADEBUG expecting to find this glorious paragon of academic virtue, and we found a very strange room.

Speaker 2:

Adam, you wanna describe what we found at AADEBUG?

Speaker 1:

Well, the thing that I remember most starkly is there being this sort of test suite of excellence when it comes to automated program debugging. It was some kind of pile of C programs, and there would be a lot of, you know, slapping each other on the back over that. Really contrived — focused on the simplest of simple bugs, and debugging them in an automated fashion, which—

Speaker 3:

I I

Speaker 1:

don't know. I don't deny their right to pursue that, but I question the value of it.

Speaker 2:

Hey. Don't forget that.

Speaker 4:

That's kinda what I was driving at, though: addressing those sorts of bugs has become uninteresting.

Speaker 2:

It is absolutely uninteresting — and it was uninteresting at the time. It was really unfortunate. And then half the room were Prolog people. Don't forget that.

Speaker 2:

We had the— oh yeah, the ardent Prolog folks. And a lot of interesting tooling was being spent on Prolog. But it was clear that it was a hapless community that did not feel valued in broader academia. Debugging is not something that is viewed as academically interesting.

Speaker 1:

But, Dan, the problems that you're talking about — no longer being problems, or now being easier — they were sort of always easy problems. Well — but they weren't, right?

Speaker 4:

I mean, I had a bug in a very large Lisp program one time that was only detectable at run time, because somebody tried to add, like, a number to a string. Right? In a strongly, statically typed language like, say, Rust, that's a compile-time error. Right? And in a weakly typed language like C, you get... a pointer.

Speaker 4:

Right? So better tooling and better languages and better practices have led to entire categories of bugs just disappearing from our landscape. And that's the interesting thing. Those are the things that used to be, like: oh, shit, my program dumped core, gotta fire up GDB and figure out what's going on.

Speaker 4:

It's like, now you don't do that anymore. But in many cases, you don't have

Speaker 2:

to do that anymore.

Speaker 4:

You know? And that leaves the more interesting landscape of these pathologies you guys are talking about as the really core, interesting domain of these visibility tools. I don't want to call them debuggers, because I feel like that has a connotation which isn't completely accurate.

Speaker 1:

Oh, okay. Well, on that topic of observability tools that are not necessarily traditional debuggers — Brian, I'm gonna lob a softball up to you: tell us why we were at AADEBUG, and about your paper in

Speaker 2:

that conference. Well, yeah. I was very excited to, because I have always believed that a dead process has a lot to teach us. In particular, when the kernel dies — and especially if the kernel dies with memory corruption — we often give up on it. And, Dan, your off-by-one example is fine if the entity that was writing off the end of that array is the one that induced death; that's easy to diagnose.

Speaker 2:

If it did not induce death, and actually some other thread died when its array was plowed by that off-by-one error, that's exceedingly difficult. And one of the things we observed was that you'd often be looking at a memory buffer that had come out of a pool of memory — out of a kmem cache, in this case. And you could see the buffer in front of it that plowed it. And so the question is: who has this thing? Who has the buffer that happens to be next to mine?

Speaker 2:

Who is my neighbor in memory? Because my neighbor just burned down my house, basically. And we would do all sorts of just dirty stuff — we had, and still use, something that just iterates over the entire dump looking for where this pointer might be. Who has this address in memory, and what is it? And— I need to pause you there just because

Speaker 1:

it's so crazy, and I just wanna emphasize that he means what he's saying: we look for the 64-bit value and see where we find it. Okay? This is a game of bingo across the entire address space.
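The "bingo" scan can be sketched in a few lines: walk a raw dump in word-sized steps and report every aligned offset holding the value. A real tool does much more — it walks the kernel's actual address segments, and may scan at byte rather than word granularity — so the `kgrep` function below is a simplified, hypothetical stand-in.

```rust
/// Scan a raw memory dump for every word-aligned occurrence of a 64-bit
/// value -- the "who holds this pointer?" game described above. Assumes a
/// little-endian dump and only checks 8-byte-aligned positions.
fn kgrep(dump: &[u8], needle: u64) -> Vec<usize> {
    dump.chunks_exact(8)
        .enumerate()
        .filter_map(|(i, w)| {
            let v = u64::from_le_bytes(w.try_into().unwrap());
            (v == needle).then_some(i * 8) // byte offset of the match
        })
        .collect()
}

fn main() {
    // Fake dump: 4 zeroed words, with the "pointer" 0xdead_beef at word 2.
    let mut dump = vec![0u8; 32];
    dump[16..24].copy_from_slice(&0xdead_beefu64.to_le_bytes());
    println!("found at offsets: {:?}", kgrep(&dump, 0xdead_beef));
}
```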

Speaker 2:

Okay. So you— because you feel that that idea is so knuckleheaded that people would assume their understanding was incorrect, because it can't be that knuckleheaded.

Speaker 1:

And I agree. Like, I've used it a ton of times to save my ass.

Speaker 2:

Right. It's, quote unquote, kgrep. It's very useful. Yeah. And that was useful.

Speaker 2:

And so what we observed is: actually, we can know a lot about what the pointer graph is — and I'd love to do this for Rust, by the way, because we do so much in Rust now. We can actually know the types of things, and we can propagate those types through the system. We start with the things that are in our modules — we know what those types are — then follow those pointers and propagate types, and then you actually have a chance of determining what this thing in memory is, which was super useful.

Speaker 2:

It also had to do very dirty things to work around C. In particular, you hit a union, and you've got no idea what it is. This is why I love Rust: when you hit an object in memory and it's an algebraic type, the discriminant is actually part of the DWARF info for the object — you know what object it is you're looking at. So it's kind of tragic that we can do all sorts of things from a dump with Rust, and we'll probably need it less, because we will have less rampant memory corruption in Rust-based systems than in C-based systems.
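A small illustration of the point about algebraic types: the compiler always knows which variant of a Rust enum is live, and rustc emits that discriminant into the debug info (DWARF variant parts), so a postmortem tool can identify the variant from raw memory. The enum below is a made-up example; `std::mem::discriminant` shows the same distinction at the language level.

```rust
use std::mem;

// A Rust enum is a tagged union: unlike a C union, the live variant is
// tracked by the compiler, and that tag lands in the DWARF debug info.
// (This enum is hypothetical, purely for illustration.)
#[derive(Debug)]
enum Message {
    Heartbeat,
    Data { len: usize },
    Error(String),
}

fn main() {
    let a = Message::Data { len: 42 };
    let b = Message::Error("neighbor plowed my buffer".into());
    // The discriminants are distinguishable at the language level, too:
    assert_ne!(mem::discriminant(&a), mem::discriminant(&b));
    println!("{:?} vs {:?}", a, b);
}
```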

Speaker 2:

I think that that's a pretty

Speaker 4:

Oh, you're absolutely right about that. When we were working on a hypervisor back at Google, one of the first times I realized we had really made the right decision in writing the thing in Rust was when we walked off the end of an array. We were accumulating some data structure, and on our test system there were 5 of them, and on a real system there were 500 — and the array had space for 15 elements or something like that. I still remember the hair on my arms standing on end when I saw it: we got a panic, and the panic said, sorry, you're indexing one past the end of this array.

Speaker 4:

And it was exactly the phenomenon you're describing — where, in C, you'd walk off the end of the array and not crash, but corrupt something, and the system would keep running. Having worked in the world of weird research kernels for a bunch of years, and then all of a sudden coming to Rust, where Rust is like: no, hey, you just shot yourself in the foot — it was like, oh my god.

Speaker 4:

Wow. You can tell me that? That's awesome. It it

Speaker 2:

is really, really nice. And it does mean — Dan, the point you're making, I definitely agree with: for certain classes of bugs, we have found other ways. It's great to have the forensic debugging, but Rust eliminates a big class of bugs where you would have needed it. Now, it leaves the really nasty ones intact.

Speaker 2:

So I think the need for tooling remains, but the focus of that tooling necessarily needs to be on the nasty stuff rather than the easy stuff.
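The hypervisor anecdote boils down to Rust's bounds check: indexing past the end of a slice panics with the exact length and index rather than silently plowing a neighboring allocation. A toy sketch — the function name and sizes here are invented:

```rust
use std::panic;

// Hypothetical probe of a slot that may be out of range. In C, an
// out-of-range index silently reads (or plows) a neighbor; Rust's bounds
// check turns it into an immediate panic naming the length and index.
fn probe(slots: &[u32], idx: usize) -> Result<u32, String> {
    panic::catch_unwind(|| slots[idx])
        .map_err(|_| format!("index {idx} out of bounds for len {}", slots.len()))
}

fn main() {
    let slots = vec![0u32; 15]; // room for the 5-element test system...
    println!("{:?}", probe(&slots, 3));   // in range
    println!("{:?}", probe(&slots, 500)); // ...but the real system has 500
}
```

In real code you would not catch the panic — the point is precisely that the program dies loudly at the moment of the error, instead of limping on with corrupted memory.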

Speaker 5:

Yeah. I actually and no more

Speaker 4:

of this weird GDB thing of: I'm gonna allow you to modify memory and rerun it — oh look, my program returned the correct result now. It's like: don't do that.

Speaker 4:

Write a unit test instead.

Speaker 7:

I think there's a really weird trade-off that happens here, as Rust pushes people towards those more difficult bugs and gets rid of a lot of the easier ones. I think one of the biggest issues with debuggers is actually kind of a human thing — comparing something like a debugger to something like println or printf. Right? Everyone knows how to use printf. printf is always there. printf works across operating systems.

Speaker 7:

People know how to use it. Debuggers are obviously more powerful, but they're more complex. And so there's this weird mismatch: when you're faced with really, really difficult-to-debug issues, the hurdle you have to climb is learning a bunch of tooling first. I think that's a barrier that keeps a lot of people from reaching for debuggers as the first, or even second, tool in a lot of cases.

Speaker 2:

Sean, I think you're exactly right — especially when the thing you're trying to debug is the software that you yourself are developing. And I think it's a mistake for people to denigrate printf debugging. printf debugging is great. If you've got a situation that you can debug quickly with printf debugging, you should debug it with printf (or println) debugging.

Speaker 2:

The challenge is more— actually, as an undergrad I got into a huge, gigantic, department-wide fight about this, because this was during the object-oriented programming era. Object-oriented programming gave rise to these fundamentalists who believed there was the OOP way to do it, and everything else was the wrong way to do it. And in particular, adding printf to your code to debug it was the wrong way to debug your program.

Speaker 2:

So they would tell these introductory computer science students: you may not add printfs to your code to debug it; you must use the debugger. Which was by Steve — by a professor who liked to unleash it, kind of the Doctor Frankenstein of programming tools, unleashing his monster on the village.

Speaker 2:

And this debugger was incredibly slow and incredibly buggy, and would often kill the patient. And these poor students were weeping in the Sun lab at 2 in the morning because they couldn't debug their programs, because they weren't allowed to use printf. So we had this huge blow-up, because those of us on the systems track were like: you're doing wrong by these students. printf is a valid and important debugging tool, because you are modifying your program to emit a datum that says that you executed this code. And that's an important tool.

Speaker 2:

And, Sean, to your point: that is a tool where, if you know how to write the program, you know how to use the technique. You don't need to ramp up on anything else; nothing needs to understand your program; you don't need any initial tooling. And indeed, being able to do that quickly is important.

Speaker 2:

It's just that it's a tool that is not useful for all classes of problems. And, by the way, Rust makes this amazingly powerful.
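For what it's worth, Rust's built-in take on this technique is the `dbg!` macro: printf debugging that prints the file, line, expression text, and value to stderr, and returns the value unchanged, so it can wrap any subexpression without restructuring the code. The `checksum` function here is a made-up example.

```rust
// Made-up function, just something to wrap with dbg!.
fn checksum(data: &[u8]) -> u32 {
    data.iter().fold(0u32, |acc, &b| acc.wrapping_add(b as u32))
}

fn main() {
    // dbg! prints something like: [src/main.rs:8] checksum(b"hello") = 532
    // to stderr, and hands the value back so the code is unchanged.
    let total = dbg!(checksum(b"hello"));
    assert_eq!(total, 532);
}
```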

Speaker 6:

Got it. So, apologies for just stumbling into the discussion, but there's one thing I wanted to say: when I was doing the Go port for arm64, there was a particular bug I was stumbling into, and it was very difficult to reproduce. Basically, the only way we could actually fix the bug was to attach a debugger to a particular test that was running for about 3 months — the bug took about 3 months to reproduce.

Speaker 6:

So, yeah, in some sense you could do printf debugging, because you could recompile the code and run it for 3 months. But the whole idea was that in those 3 months, you could observe the process — what it was doing — and decide what to investigate next. Essentially, if you have a proper debugger, you have a sort of dynamic printf: you don't have to recompile anything. You can keep running your test for 3 months or whatever, and observe it for those 3 months.

Speaker 6:

The fact that you don't have to recompile your code is quite a powerful aspect of having a proper debugger for that kind of problem.

Speaker 7:

To be clear, I'm not trying to downplay the power of debuggers. I totally agree with you: they are objectively more powerful than printf-style debugging. You have more tools at your disposal.

Speaker 7:

You have more control over the program; there's no need to recompile. I just think it's worthwhile looking at the — I hesitate to call it empirical evidence, but I really do think that people reach for printf- and logging-style debugging before actual debuggers. And I do think that human UX situation is a major, major aspect of why.

Speaker 6:

I would definitely agree that it's a human UX problem, and that the best way to deal with it is to actually implement better interfaces for debuggers and educate people in how to use them. Because GDB itself—

Speaker 2:

Oh, god.

Speaker 6:

It's it's yeah.

Speaker 2:

Oh, god. It's just

Speaker 5:

That kinda runs up against a problem we were talking about earlier: just because better tools are made doesn't mean the better tools are adopted. One time I had the opportunity to sit next to somebody on a delayed flight coming back from O'Hare — it was cancelled, and I just looked around, grabbed the 3 nearest people, and said, let's split a car, because we're all going back to the same place. And it just so happened that he was somebody I had seen issue a patch on the PSPP statistical software mailing list.

Speaker 2:

Were you coming back from a PSPP conference? How does it— first of all, I've got so many follow-up questions.

Speaker 3:

How do you discover you have the same background? I mean, it's like— oh.

Speaker 5:

Yeah. So I used to work at a social science place — I was the open source nerd. We learned about SPSS, which is a statistical package; I immediately went to the open source version, and that's how I found out about that patch.

Speaker 5:

He was a professor at the University of Wisconsin, and he was coming back from receiving an award for a tool he had developed. Hearing about it was the first time I questioned what a debugger was. It would instrument your program so that every memory write also logged its values out to a stats table. And the paper he was presenting was about inferring bugs from abnormal writes. My first reaction was: well, that can't be a debugging tool, because you don't know about the bug. Right?

Speaker 5:

There has to be a bug to be debugged. I was 23, I think, at that point; since then I've changed my position. But I thought this was the future — I thought I had glimpsed the future, but it was some sort of mirror world that somebody else got to live in. Because that was maybe 2005 or 2006, and right after that, everything hit the distributed issue

Speaker 2:

that we were talking about earlier. Hey — do you have a— I would love to put a pointer to that paper in our spaces notes here. Do you remember the author? We'll obviously try to find it based on how you described it, but how would we find that— Seems

Speaker 5:

like 15 years ago. His name was Ben something. I can try to Google it. I'm sure he was a professor at UW

Speaker 4:

for a while.

Speaker 2:

Okay. That should be enough for us to go on; we should be able to figure that out. Yeah. Interesting.
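The tool described above — instrument memory writes, then infer bugs from abnormal values — can be caricatured as simple outlier detection over sampled values. The sketch below (all names and thresholds invented, and a far cry from the real research tool) just flags samples more than a chosen number of standard deviations from the mean:

```rust
/// Toy version of statistical debugging: given the values recorded at one
/// probe point, return the indices of writes that fall more than `sigmas`
/// standard deviations from the mean. (The real tool instrumented every
/// memory write; this only post-processes one probe's samples.)
fn abnormal_writes(samples: &[f64], sigmas: f64) -> Vec<usize> {
    let n = samples.len() as f64;
    let mean = samples.iter().sum::<f64>() / n;
    let var = samples.iter().map(|v| (v - mean).powi(2)).sum::<f64>() / n;
    let std = var.sqrt();
    samples
        .iter()
        .enumerate()
        .filter(|&(_, &v)| (v - mean).abs() > sigmas * std)
        .map(|(i, _)| i)
        .collect()
}

fn main() {
    // Nine ordinary writes and one wild one, as an off-by-one might produce.
    let writes = [10.0, 11.0, 9.0, 10.0, 12.0, 10.0, 9.0, 11.0, 10.0, 9000.0];
    println!("suspicious write indices: {:?}", abnormal_writes(&writes, 2.0));
}
```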

Speaker 2:

Well, in terms of thinking that you've glimpsed the future and then not — yeah. I think this is the challenge with this kind of tooling: it requires so much specificity to make sophisticated tooling that you end up with this least common denominator of GDB. And, man, I try to be charitable to GDB, but GDB makes it so hard to be charitable to it, because it's such a mishmash — there's some stuff in there that's super valuable, but a lot of stuff in there that kills the patient, unfortunately.

Speaker 6:

It's always the same problem: it's never a lack of features, it's always a lack of abstraction. If you don't have the proper abstraction — if it's not programmable — then you cannot do your thing even though it has all the features. Like, I'm absolutely sure that MDB has fewer features than GDB.

Speaker 6:

It's just the fact that it's actually programmable — you can actually write a shared object that programs MDB — that makes it strictly more powerful.

Speaker 2:

Yeah, that's interesting. The other challenge with all of this — and, Sean, you were kind of making reference to this in terms of the UX lift — is that you do need to get people to find their first bug using the tooling, and hopefully that comes quickly, because then the disposition towards the tooling changes. Once you have found your first bug with this stuff, you begin to reach for it earlier and earlier, and there are more and more classes of problems you can use it to find.

Speaker 1:

It's a great point, Brian. And one of those things is that it's so hard to motivate learning a tool that has the perception of a high ramp without an actual burning need. Once you get people past that first experience, you've demonstrated that the investment is worthwhile. But when you don't have a bug to debug, it can be really hard for folks to grab on to new technologies that have a ramp.

Speaker 2:

Yeah, and we've seen this over and over again — I've seen it with everything I've ever developed. It's been fun to replay history, with humility, with this current debugger I'm developing: watching my coworkers use it for the first time to debug a bug that they wouldn't have debugged otherwise.

Speaker 2:

But it takes a while because it's, you know, the time to learn something new, it doesn't it's like, no. I'm dealing with a house fire right now. It's not time for me to learn something new. Like, my my house is burning. I wanna focus on that.

Speaker 2:

It's like, no. No. No. I know your house is burning, but, like, we actually have a more structural way of understanding some of this stuff. But it's tough.

Speaker 2:

It takes a long time for people to kind of get there. And justifiably so.

Speaker 4:

It also requires a fair amount of infrastructure. I mean, one of the you know, going back to the whole idea of printf debugging, there was a time we were playing around with the Nova hypervisor, which had kind of atrophied. It's been on GitHub for a long

Speaker 5:

time. It's 32-bit only.

Speaker 2:

Right. Yeah. Yeah. Right. Right.

Speaker 4:

Well, we're yeah. Because we were playing around with it and trying to get it to go, and it was crashing, and we weren't exactly sure what was going on. And one of the most powerful debugging techniques, especially early on in boot, before anything was really set up, was basically, you know, an asm volatile halt, and then, from the monitor, info registers. Which I think is very similar to the type of debugging that one would do with, like, a DTrace or an mdb or something along those lines.

Speaker 4:

But in the sense that you're you're inspecting the state of the system. But, you know, it's like you you just didn't have any of the infrastructure to be able to do anything like that, you know, because the system is still in this embryonic state.

Speaker 2:

Absolutely. And you've gotta be able to think about, you know, what can you add to the system to make that a faster and better experience to extract state, not necessarily dynamically. But, like, you know, early on in boot is a great example where, yeah, you don't have I mean, often you just have, like, you know, LEDs, or you've got GPIOs. Right? You're just kinda pulling GPIOs in various directions and trying to infer your state that way. And that is where the in situ debugging can be useful.
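
[Editor's note: a toy sketch of the "GPIO as printf" idea above. On real hardware, gpio_out would be a memory-mapped register and the pulse count would come from a logic analyzer on the pin; both are plain variables here so the sketch runs anywhere, and the names and pin number are invented.]

```python
PIN = 1 << 5

gpio_out = 0   # stand-in for the GPIO output register
pulses = 0     # stand-in for rising edges counted on a logic analyzer

def mark_progress(checkpoint):
    """Pulse the pin `checkpoint` times: the count says how far boot got."""
    global gpio_out, pulses
    for _ in range(checkpoint):
        gpio_out |= PIN    # drive the pin high
        pulses += 1
        gpio_out &= ~PIN   # drive it low again

mark_progress(3)   # "we reached checkpoint 3"
print(pulses)
```

The whole trick is that one wiggling pin, watched from outside, carries enough state to tell you where the embryonic system died, long before there's a console to print to.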

Speaker 2:

But I would say we have certainly seen this. It is always interesting to watch someone hit that point where they are starting to use the tooling, because the enthusiasm level changes quite a bit, at least historically.

Speaker 1:

Yeah. I gotta share. I had a great experience this week, or last week, where I was showing a colleague mdb for the first time,

Speaker 7:

on

Speaker 1:

a real bug, a real problem they had. And running commands that I, like, forgot were in my fingers. I hadn't run them in, like, 5 years. And then being able to step back and ask the question, how would we have seen this with other tools? And it may just not have been observable.

Speaker 1:

It may have been the kind of thing where you kinda read the tea leaves and make some changes and hope things change on the other side and they were related to the changes you made. But, there's nothing quite like driving one of these unknown issues actually to the root cause, and it's so satisfying.

Speaker 2:

It's very satisfying. The proof of a debugger is in the debugging, for sure, and it's, like, actually finding issues learning things about your software that you wouldn't have found otherwise. I also do think I mean, we'd be remiss not to mention all the OpenTracing efforts that have happened. I mean, we have seen an explosion in software observability that I think is all extremely positive. I think it's still a challenge to actually use all of that stuff, but it's all steps in the right direction.

Speaker 2:

It feels I don't know. Adam, I really

Speaker 1:

Yeah. No. I think that's absolutely right. And I was thinking through this whole conversation that I buy Dan's argument or the argument that we've come to that some of the lack of observability, or the lack of folks being able to understand their systems, has been one of the strong motivators for rooting out some of these problems earlier with CI/CD and test-driven development and all these kinds of practices. And I've been wondering whether you have the same observation.

Speaker 1:

We see stuff like Observe or LightStep or whatever. But it just doesn't feel like we're quite over the precipice where debugging and debugging infrastructure and tooling has become just part of the process the way that testing is.

Speaker 2:

You're right. We have not hit the CI/CD point. I'm not sure when we hit that with CI/CD, but we're indisputably past that kind of fulcrum. And we're not there on OpenTracing, I think. I

Speaker 6:

I must confess one thing. The reason I ported Go to illumos in 2013 was simply so that I could run DTrace on Go.

Speaker 2:

Yeah. God bless you.

Speaker 5:

That's great.

Speaker 2:

Yeah. That's it is actually it is very nice. Actually, with the static languages I mean, the dynamic languages make it really, really hard to dynamically determine what's going on. And dynamically instrumenting dynamic languages effectively requires VM cooperation. Adam, do you remember our brief love affair with Parrot?

Speaker 1:

Yes. Yes. Absolutely.

Speaker 2:

You remember? So Parrot was this VM that was gonna rule all VMs, and we were like, this is the VM we're gonna make debuggable. And I became a huge Parrot fanboy, and then Parrot seems to have Parrot seems to have died.

Speaker 1:

Yeah. I remember I got off a flight having read, like, the Perl 6 and Parrot book. Like, I was bringing the good news. Like, I really felt like this was the gospel.

Speaker 2:

Have you heard the good news about Parrot? No. That's right. Parrot seems to have and it's not Perl 6 anymore. Right?

Speaker 2:

It's whatever. What are they calling it? Perl 6 is it's been rebadged to Roku? No. Roku is what my kids watch.

Speaker 2:

Right. I'm in the same spot. What is it called?

Speaker 1:

Rokey? Raku.

Speaker 2:

Right? Is it Raku? Raku? Is it? God. It's like, hey, Perl 6.

Speaker 2:

I'm sorry. It's still Perl 6. If only there was

Speaker 4:

some way that we could query the Internet. That's right. It's Raku, r a k u. I just looked it up.

Speaker 2:

The r a k u. I don't know. I'm sure it's me. That one's not sticking with me.

Speaker 2:

I think yeah. For whatever reason, Roku is squatting on those synapses. I'm sorry not that it, you know, not deservedly necessarily. But I think we, you know, we just wanna keep these to about an hour.

Speaker 2:

So I think we probably wanna wrap it up, but this has been it's been great as always. Thank you, everyone. Adam, any closing thoughts?

Speaker 1:

My closing thought, here's my shot, is that the end of Moore's Law is gonna be the thing that motivates us to understand our systems better. Oh. Because we're gonna need to start squeezing where previously we could just be lazy and wait for Moore's Law to do the squeezing for us. Oh.

Speaker 2:

And that gets to Dan's point too. Dan is preaching your gospel in terms of understanding systems that don't perform very well.

Speaker 4:

But it's not just performance. I wanna emphasize that. That was just an example. It's like I want to understand the behavior of my system.

Speaker 2:

And the end of Moore's Law is gonna force us to that. I like it. Adam, how long have you been saying that? The entire time? About 5 minutes. Yeah.

Speaker 4:

Don't lie, Adam. He's prepped. He's got his little bullet points on a sheet of paper before

Speaker 2:

he's No. No. His house exactly burned down this morning.

Speaker 1:

When I got up this morning.

Speaker 2:

Wait a minute. Adam was the one that said, have you read Lobsters this morning? Wait a minute. I think he's like the Spanish prisoner. Alright.

Speaker 2:

With that Alright. Alright. Thanks everyone and, I will talk to you next week.

Speaker 5:

Thank you.

Speaker 6:

Alright. See you. Cheers.
