Oxide and Friends | Transcript: A Crate is Born

A Crate is Born

March 6, 2025 / 01:43:09/S5 E8

Adam Leventhal: 00:00

Alright. Five o'clock.

Bryan Cantrill: 00:02

Five o'clock. Okay. So before we get started, you know, obviously, we can't just rush into context for this conversation. I do feel, is it gauche to do a little checkup on my Intel prediction? Because, I mean, I'm sure you could just... Gauche

Adam Leventhal: 00:16

or no? I think we need to I think we need to do it.

Bryan Cantrill: 00:20

So, recall that my one year prediction was that the Intel co-CEOs would still be in place after the end of the year. And I also further, I can't remember if it was in that episode or a later episode, I believe I was gonna claim partial credit for this month by month. So we've entered a new month. I figure it's time to do a checkup on this. And indeed, the checkup is apt because Intel has actually just enshrined them.

Bryan Cantrill: 00:49

I actually saw the news headline of, like, Intel names new CEO. I'm like, oh my god. Alright. I'm only gonna be one sixth correct. But as it turns out, it's just the filing that is enshrining the co-CEOs, Michelle Johnston Holthaus and then her co-CEO on the finance side who's, I don't know, the finance guy.

Bryan Cantrill: 01:09

Sorry. Somebody's got a name.

Adam Leventhal: 01:13

She probably even knows it.

Bryan Cantrill: 01:15

You know, but did you look at that letter that they sent her? Yeah. I mean, congratulations, bang. On behalf of Intel Corporation, I am pleased to promote you to the position of chief executive officer, comma, Intel products.

John Gallagher: 01:31

I love

Bryan Cantrill: 01:31

Comma, reporting to the chief executive officer of Intel once named.

Adam Leventhal: 01:36

I love that the process was so formalized. Nobody was like, hey. Do you think we should just, you know, double check it? It's like, no. No.

Adam Leventhal: 01:45

No. Just let the workflow do its thing. Like, if we don't let the workflow do its thing, then, why have the workflow?

Bryan Cantrill: 01:50

You can imagine, like, being in HR and being like, hey. I just wrote the sentence I was told to write, and I'm rereading it. This does not, I don't know. How many CEOs are we gonna have? I mean, are we... Oh, I don't think we have VPs anymore.

Bryan Cantrill: 02:02

Like, we could have, you know, I'm like the I'm the CEO of energy. Oh, really? I'm the CEO of technology. Oh, okay. Alright.

Adam Leventhal: 02:08

You don't think it was just like a Mad Lib, like just fill in the blank, like no human actually laid hands on it? Hilarious.

Bryan Cantrill: 02:13

I can't wait to read this one. Oh, this is you guys are gonna love this one. Yeah. Yeah. It's and I would just like to and then also naming her salary, I'd like to say, good for her.

Bryan Cantrill: 02:23

Get that bag for this kind of public humiliation of being the co-CEO reporting to a CEO to be named later. You deserve a million bucks. That's what I gotta say.

Adam Leventhal: 02:32

$33,000,000. Right? I mean, but who's counting?

Bryan Cantrill: 02:35

I guess that's true. And that's

Adam Leventhal: 02:37

before stock?

Bryan Cantrill: 02:38

I guess that's true. The APB payout goal, subject to eligibility, other program conditions. Again, I feel they're both being kind of publicly humiliated with this. So I feel that they, you know what, go get it, is what I gotta say. Yeah.

Bryan Cantrill: 02:54

So I observed this on Bluesky, and I'm not sure if Ben Tucker is a listener to the podcast or not, but Ben Tucker had the reply: but which are the E-CEOs and which are the P-CEOs?

Andrew Stone: 03:09

I saw that.

Adam Leventhal: 03:10

That was delightful.

Bryan Cantrill: 03:11

I LOL'd. I actually snorted. I guffawed. I just thought that was, I mean, SKU stack humor, man. You can't argue with SKU stack humor.

Bryan Cantrill: 03:19

That's great. This being a reference, for those who need the joke explained, to Intel's notion of E-cores, efficiency cores, and P-cores, performance cores. So I love turning their SKU stack into a little deserved mockery of this ridiculousness in play, though.

Adam Leventhal: 03:42

I know that we're just doing the check-in on your prediction. But I did love The Register article headline: Wanna save Intel? Fire the board. Bring back Pat, says ex-CEO Craig Barrett.

Adam Leventhal: 03:56

And the whole article just sounds so delusional. Just sounds so bananas.

Bryan Cantrill: 04:03

I mean, you have, like, literally everyone in the village has a different opinion about what to do with Intel. I mean, yeah, is Paul Otellini... Wait. Wait. Was Paul Otellini involved?

Bryan Cantrill: 04:17

Is Paul Otellini dead? Paul Otellini is dead. Paul, what's that? We're not gonna get Paul. Paul Otellini got a little... sorry.

Bryan Cantrill: 04:23

Awkward.

Andrew Stone: 04:24

Got it.

Bryan Cantrill: 04:24

Paul Otellini's widow? Maybe? Am I maybe making it worse? No? Why not Paul Otellini's widow, actually?

Bryan Cantrill: 04:31

Let's get, I mean, why not? You know what? Alright, you know what, I think we should stop before I say anything more inappropriate about the dead. Yeah, about the dead, which is to say, Intel. Oh, I'm here all week.

Rain Paharia: 04:44

Hey yo.

Bryan Cantrill: 04:47

The jokes write themselves. That's not why we're here. We are not here merely to celebrate the one sixth completion of my prediction. But we are here to celebrate a crate being born and kind of the thought process behind that. So we're gonna get some context here, but I wanna kind of give the why I'm really excited about this discussion.

Bryan Cantrill: 05:16

Because I think, you know, Adam, to me what makes software so interesting is that there is so much ambiguity. You know, you're kind of imposing this really rigorous structure, in terms of software, onto something that is like this kind of structureless void. And as a result, as a software engineer, you have so much latitude for design that it can be debilitating. Right? I mean, I can find it

Adam Leventhal: 05:40

debilitating. Totally. Blessing and a curse. Right? The fact that you can do anything and that you have to sift through the infinity of possibilities to find the thing you're actually gonna do.

Bryan Cantrill: 05:48

That's right. And I feel that, you know, a former colleague of ours, George Cameron, had a great line. I don't know if it's original to him or not, but it feels like it only could be. Every line of code is a business decision, which I always loved. That's part of the reason why you need to have trust in a software organization, because you're relying on software engineers to kind of constantly make the right decision about what to do when, and it's not always clear.

Bryan Cantrill: 06:13

In fact, it's often not clear. And there are some kind of big subjects. One of them, by the way, that I feel we will need to talk about at some point is, like, when do you rewrite something? You know what I mean? Like, when do you conclude that every step is a further step down the wrong path and I actually need to rewrite it?

Bryan Cantrill: 06:30

Like, that's a really big decision. And we're kinda onto something that's not wholly unrelated to that, which is, like, when does the specific problem that I'm solving transcend the specific and merit a more generic solution? And on the one hand, I think we've all seen that false genericism can be a real problem, where you make something falsely generic too early. But I think much more often it's that genericism comes, arguably, a bit too late. And that genericism can be extremely powerful.

Bryan Cantrill: 07:04

And, you know, most of the most powerful abstractions we have came from people that were trying to solve a more specific problem. I think that's fair to say. Yeah. Well said. And I just love this concrete example and kinda telling the story of this example, which seems like, you know, kind of a problem that wouldn't immediately be obvious to be generic.

Bryan Cantrill: 07:32

In fact, it wasn't. So Andrew, I wonder if you could kind of give us the lead-in here, and maybe let's start with the problem that we're trying to solve in terms of updating the system and the abstractions that we had developed for that, if you don't mind.

Andrew Stone: 07:49

Yes. So we are building an update system, and we are shipping a product where the product is a distributed system, and we are trying to update that distributed system remotely and automatically while keeping things running, those things being users' VMs, users' virtual machines. And so finding the right abstractions and patterns to do that has ended up being tricky. And so what we ended up coming up with is this planner-reconciler pattern. I think that's what Dave Pacheco calls it. And we basically have this semi-offline step where we take these inputs, planning inputs, inventory, basically feedback information about the world, and we generate a plan.

Andrew Stone: 08:38

And then

Bryan Cantrill: 08:39

That plan is a plan for how we're going to get to our desired state.

Andrew Stone: 08:43

Exactly. It is a plan. The plan is actually reified in a blueprint, which is essentially our desired state. Right. At this point in time, based on these planning inputs

Bryan Cantrill: 08:58

And just to put a little flesh on that, like, what would be kind of involved in that state? What are some of the things involved in a blueprint state?

Andrew Stone: 09:07

So the number of, say, zones running on each sled and which zones those are, and where given datasets live. We use ZFS. So, where given datasets live. Those datasets are usually attached to zones, but not always. We have, like, the debug datasets, and we may want to spread them across a few U.2s on a few sleds.

Andrew Stone: 09:28

We have anti-affinity decisions coming soon, which will allow us to place instances differently, although instance placement is not actually part of the blueprint. It's a different story. And we also do things like configure disks and generate what we call control plane disks, which are really annotated and, like, identified or located in our centralized database, instances of the actual physical disks.

Bryan Cantrill: 10:07

And this kind of abstraction of blueprints is really, I feel, kind of an important abstraction breakthrough for us, to split the problem a little bit into this planning phase and then the thing that would act on it. Is that a fair statement?

Andrew Stone: 10:21

Yeah. So I mean, a key issue is that anytime you make a plan, like, "humans plan, God laughs," right? Like, you make a plan, it's gonna change. And so it can change, like, immediately after you generate it. The inputs can change while you're literally doing the computations to generate the plan.

Andrew Stone: 10:44

And so we can create a plan, and it doesn't automatically become the target to be executed until it's officially made the target. And so we build up a linear chain of plans, and there's an atomic step in the database, just like a compare-and-swap transaction, where each blueprint is based on its parent, and then we swap it out. We install a new blueprint if that constraint currently holds, if its parent is still the existing target. Right? And so right now, we do this manually through a tool called OMDB, but eventually, it's gonna be done automatically based on the environment and timing and other decisions.
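
The compare-and-swap step Andrew describes can be sketched in Rust. This is a minimal sketch, assuming an in-memory stand-in for the database; `Blueprint`, `TargetStore`, `SetTargetError`, and the integer ids are hypothetical names for illustration, not Omicron's actual types or schema.

```rust
/// Hypothetical, simplified blueprint: an id plus the id of the
/// blueprint it was derived from (its parent).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Blueprint {
    pub id: u64,
    pub parent_id: Option<u64>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum SetTargetError {
    /// The candidate's parent is no longer the current target, so the
    /// candidate was planned against a stale view of the world.
    StaleParent { current_target: Option<u64> },
}

/// Assumed stand-in for the database row recording the current target.
#[derive(Debug, Default)]
pub struct TargetStore {
    current_target: Option<u64>,
}

impl TargetStore {
    /// Compare-and-swap: install `candidate` as the target only if its
    /// parent is still the current target. A real system would do this
    /// inside a single database transaction.
    pub fn set_target(&mut self, candidate: &Blueprint) -> Result<(), SetTargetError> {
        if candidate.parent_id != self.current_target {
            return Err(SetTargetError::StaleParent {
                current_target: self.current_target,
            });
        }
        self.current_target = Some(candidate.id);
        Ok(())
    }

    pub fn current_target(&self) -> Option<u64> {
        self.current_target
    }
}
```

If two planners both derive a blueprint from the same parent, only the first `set_target` call succeeds; the second fails with `StaleParent` and has to be re-planned against the new target, which is what keeps the chain of blueprints linear.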

Andrew Stone: 11:31

And so automated update is where we're tending to go. But in order to get there, we first wanna figure out what the structure of our desired state of the whole rack looks like, and eventually multiple racks. And then we take that plan and we execute it. And so the executor can deterministically look at the steps in the target blueprint and do things like deploy zone images to remote sleds. Right?

Andrew Stone: 11:59

And so both the blueprint planner and the executor run in Nexus, which is one of our core control plane abstractions. And so it distributes zones and disk information and dataset information to sled agents, which run on every sled. And so you have this separation where the planner creates a blueprint, sets it as the target, and then the executor picks it up and starts making it real. Right? And there's this whole other problem where we actually have multiple instances of Nexus.

Andrew Stone: 12:33

So multiple planning steps can be running at the same time, as well as multiple execution steps. And so an executor can still be executing an old blueprint, and we have to make sure that that's safe. And we mainly do that via generation numbers or versions or epochs, whatever you wanna call them. And oh, where was I going with this?

Adam Leventhal: 12:54

Well, Andrew, if I may, just to orient again, this is to get us I mean, there's a huge huge amount of machinery to get us from the previous version of all these different components to the next version of all these different components. And I think another key aspect of what you're describing is making it comprehensible. So it's not just the computer executing automation that if it screws up, it's inscrutable to the humans trying to figure out what's going on, but to add some level of understanding and comprehensibility at every step.

Andrew Stone: 13:25

Precisely. Yeah. There's a

Bryan Cantrill: 13:27

And as you said, it's not just update. Right? It's like, you wanna be able to remove a sled that you wanna take out for life cycle management or because it's got bad hardware. To remove a sled, to add a new sled, you're gonna reconfigure the system and get to a new state. Sorry, John.

Bryan Cantrill: 13:43

I I cut you off there.

John Gallagher: 13:45

No. No. It's fine. I was just gonna say, you mentioned this earlier, this concept of, is a blueprint a description of the steps you're supposed to take? And it's intentionally not that.

John Gallagher: 13:55

It is the declarative state of the entire system that you want to be in. And then it's the responsibility of execution to make reality match that blueprint. This is important because, as Andrew said, we might have multiple executions running in parallel. Things can change over time. We went through, you know, a lot of design discussion about what the representation should look like.

John Gallagher: 14:16

And if you do something like these are the steps you should take, like evaluating that concurrently across three different Nexus instances while the world may change out from under you becomes like basically impossible to debug and understand what's going on when things go wrong. But if your blueprint says this is the entire state of the system as it should be, and if you generate a new blueprint, you don't generate the steps to get from a to b, you just say the new state of the world is b and you need to go and make that happen.
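
The distinction John draws, a blueprint as desired state rather than a list of steps, can be illustrated with a small reconciler. This is a minimal sketch, assuming state is just "which zones run on which sled"; `State`, `Action`, and `reconcile` are hypothetical names, and the real blueprint carries far more than zones.

```rust
use std::collections::{BTreeMap, BTreeSet};

pub type SledId = u32;
pub type ZoneSet = BTreeSet<String>;
/// Assumed representation of state: the set of zones on each sled.
pub type State = BTreeMap<SledId, ZoneSet>;

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Action {
    Start { sled: SledId, zone: String },
    Stop { sled: SledId, zone: String },
}

/// Derive the actions needed to make `actual` match `desired`. The
/// desired state never encodes steps; they are recomputed from
/// whatever the world currently looks like.
pub fn reconcile(desired: &State, actual: &State) -> Vec<Action> {
    let empty = ZoneSet::new();
    // Visit every sled mentioned on either side, in sorted order.
    let sleds: BTreeSet<SledId> = desired.keys().chain(actual.keys()).copied().collect();
    let mut actions = Vec::new();
    for sled in sleds {
        let want = desired.get(&sled).unwrap_or(&empty);
        let have = actual.get(&sled).unwrap_or(&empty);
        for zone in want.difference(have) {
            actions.push(Action::Start { sled, zone: zone.clone() });
        }
        for zone in have.difference(want) {
            actions.push(Action::Stop { sled, zone: zone.clone() });
        }
    }
    actions
}
```

Because the steps are derived from the current state on every pass, running `reconcile` again after the world has already converged yields an empty list, which is what makes concurrent or repeated execution tractable to reason about.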

Bryan Cantrill: 14:44

Yeah. Which feels like, John, this is kind of a pattern that I feel a bunch of us have come to independently from our scar tissue with distributed systems, where it's much easier to build a reliable system when you know the end state, as opposed to trying to mutate the system without actually completely understanding what its state is.

John Gallagher: 15:05

That's right. We spent sixteen hours last week debugging a problem in a different component, related to blueprints, that ultimately boiled down to: I know what the state of the world is supposed to be, but I've applied it as a series of deltas over the last fifteen days or something. And I ended up with an off-by-one error. If instead we could apply the entire state, because we always know what the entire state is supposed to be. So anyway, yeah, agreed.

John Gallagher: 15:29

Scar tissue there as recently as a cut from last week,

Bryan Cantrill: 15:31

and we're trying to not do It's more of an open wound, actually. I'm kinda looking forward to this one being scar tissue as soon as the bleeding stops.

Andrew Stone: 15:39

Yeah. I mean, another important point about this being the declarative state, the blueprint, is that the execution can make decisions about what to do. It's not supposed to make literal decisions. Right? But it's running.

Andrew Stone: 15:52

It's figuring out what steps to take, how to get from the current state of the world to the next state of the world. And that in almost all cases relies on running idempotent operations, like sending a new configuration down to the sled agent with a given generation number. And so if that thing's out of date because a new plan has been made the target and another executor has pushed down a newer configuration, it just gets rejected by the sled agent. So it's a really nice pattern. So we can treat this blueprint.

Andrew Stone: 16:19

We can get a list of blueprints and see the transformation of what the system is supposed to be, the direction it's supposed to be going, or at least what the planner thinks it's gonna be going. And if we have other information like inventory, or what the actual state of the system is, we can perform some tests. Right? Or we can, like, reproduce the state of the system and figure out what went wrong in the planner. And so we have all these mechanisms, and we need a way to understand what the changes are in these blueprints, so we can see, like, why did the planner make this decision to generate this as the new state versus another state?
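
The generation-number check Andrew describes can be sketched as follows. This is a minimal sketch under assumptions: `SledConfig`, `SledAgent`, and `ApplyError` are hypothetical names, and the rule for an equal generation (accept only an identical config) is an assumption for illustration, not a description of the real sled agent.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SledConfig {
    pub generation: u64,
    pub zones: Vec<String>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ApplyError {
    /// A newer configuration has already been applied.
    Stale { requested: u64, current: u64 },
    /// Same generation but different contents (assumed to be an error).
    Conflict { generation: u64 },
}

/// Hypothetical stand-in for the per-sled agent.
#[derive(Debug, Default)]
pub struct SledAgent {
    current: Option<SledConfig>,
}

impl SledAgent {
    /// Idempotent apply: re-sending the current config succeeds, a newer
    /// generation replaces it, and an older generation is rejected.
    pub fn apply(&mut self, config: SledConfig) -> Result<(), ApplyError> {
        if let Some(current) = &self.current {
            if config.generation < current.generation {
                return Err(ApplyError::Stale {
                    requested: config.generation,
                    current: current.generation,
                });
            }
            if config.generation == current.generation {
                return if config == *current {
                    Ok(())
                } else {
                    Err(ApplyError::Conflict { generation: config.generation })
                };
            }
        }
        self.current = Some(config);
        Ok(())
    }

    pub fn current_generation(&self) -> Option<u64> {
        self.current.as_ref().map(|c| c.generation)
    }
}
```

An executor still working from an old blueprint simply gets `Stale` back and does no harm, while retries of the current configuration are harmless no-ops.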

Andrew Stone: 16:57

And yeah, I think that kind of leads us into the topic.

Bryan Cantrill: 17:00

Got it. Okay. So that kind of gives you the idea of some of the abstractions that we are trying to implement here. So kind of walk us into how you ended up encountering this problem that ultimately was solved with Daft.

Andrew Stone: 17:17

Yeah. So, okay. So Dave wrote OMDB, which is currently the mechanism. Because building all this infrastructure is so complex, and given the current stage of the software and where we are, we knew that we wanted to be able to run these planning steps manually and generate plans manually, and we wanted to be able to look at them offline, and we built a bunch of tooling to do that.

Andrew Stone: 17:46

Right? And Dave wrote the OMDB tooling, which is our Omicron debugger. And, essentially, that's an access point that we can access for now, usually during installs, or updates rather, when we log into one of the sleds. Right? And so we can access all the sleds there, SSH in, the special keys for that, whatever.

Andrew Stone: 18:10

Yes. Our customers know about that. And so we log in, we can generate these plans. And so immediately, we started essentially saying, okay, we wanna diff blueprints. And so a bunch of handwritten code got written to do that.

Andrew Stone: 18:23

And so our blueprints are pretty complicated. They're getting less complicated in a way before they get more complicated again, but we have a bunch of separate maps that maintain all the zones, all the disks, all the datasets, a bunch of other metadata, like ClickHouse and Cockroach information, you know, where those replicas live, how far along they are, etcetera. And so we have this manual diff information, and at first the structure of the OMDB diff output basically only showed zones. Right? And so I went ahead and added some of that for disks, I think, and then Sean went and added it for datasets, and we had just a bunch of nested loops that were a mess.

Andrew Stone: 19:07

Right? And I finally got around to adding some support for ClickHouse, which is our metrics database, which essentially serves our metrics across the rack. And that support was pretty complicated, because administering multi-node ClickHouse is more complicated than doing it for Cockroach. And so you have to actually implement multiple planning phases to get to a deployment of a multi-node ClickHouse cluster. And so I started writing the diff for that, and that became quite complex.

Andrew Stone: 19:40

And I was thinking, okay, I really don't want to get to automated tooling for this, you know, right now. Like, after talking with Rain, I was like, I think we really wanna be able to have some automated solution for diffing things and then just convert that diff to a printable output. And Rain did point out the Diffus crate, but at the time I was like, let me just wrap this up. I haven't had enough pain yet. I'm frustrated.

Andrew Stone: 20:06

I haven't had enough pain yet. That code went in and then started

John Gallagher: 20:10

the bug.

Bryan Cantrill: 20:10

Actually, that code went in. You mean, like, this is code that effectively does

Andrew Stone: 20:13

For ClickHouse. Yeah. I added some manual diffing code.

Bryan Cantrill: 20:17

Manual diffing code for ClickHouse. That is basically gonna kinda manually walk these structures and figure out where the differences are.

Andrew Stone: 20:24

Yeah. And then convert those manual differences, in whatever structure I came up with, into a printable, displayable table, in the way that we do for zones and disks and things like that. And so that took a day and a half to get it all working and set up. Like, it was annoying code to write. I'm sure it wasn't a full day and a half, right?

Andrew Stone: 20:48

But, like, given other things and meetings and stuff, it was a day and a half to get that in, which just seems excessive for what theoretically should be pretty simple. Like, take these two structures and give me a third structure: what is the difference between them? Right? And then you should be able to have some generic code that you can reuse for all these structures to print them out.
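
The shape Andrew is asking for, two values in, a third "difference" value out, then generic rendering, can be sketched by hand for a single struct. All names here (`Zone`, `ZoneDiff`, `FieldChange`, `render`) are hypothetical and chosen for illustration; this is hand-written code showing the idea, not the output or API of any crate discussed in the episode.

```rust
/// A before/after pair for a single field.
#[derive(Debug, PartialEq, Eq)]
pub struct FieldChange<'a, T> {
    pub before: &'a T,
    pub after: &'a T,
}

impl<'a, T: PartialEq> FieldChange<'a, T> {
    pub fn is_changed(&self) -> bool {
        self.before != self.after
    }
}

/// Hypothetical, heavily simplified zone description.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Zone {
    pub kind: String,
    pub image_version: u32,
}

/// The "third structure": one `FieldChange` per field of `Zone`.
#[derive(Debug, PartialEq, Eq)]
pub struct ZoneDiff<'a> {
    pub kind: FieldChange<'a, String>,
    pub image_version: FieldChange<'a, u32>,
}

impl Zone {
    /// Field-by-field diff. This is the boilerplate one would want a
    /// derive macro to generate, so that a new field shows up in the
    /// diff automatically.
    pub fn diff<'a>(&'a self, other: &'a Zone) -> ZoneDiff<'a> {
        ZoneDiff {
            kind: FieldChange { before: &self.kind, after: &other.kind },
            image_version: FieldChange {
                before: &self.image_version,
                after: &other.image_version,
            },
        }
    }
}

/// Rendering is kept separate from diffing: list only changed fields.
pub fn render(diff: &ZoneDiff<'_>) -> Vec<String> {
    let mut lines = Vec::new();
    if diff.kind.is_changed() {
        lines.push(format!("kind: {} -> {}", diff.kind.before, diff.kind.after));
    }
    if diff.image_version.is_changed() {
        lines.push(format!(
            "image_version: {} -> {}",
            diff.image_version.before, diff.image_version.after
        ));
    }
    lines
}
```

Keeping the diff as a structured value, rather than printing directly from nested loops, means tests can assert on fields of the diff while the display code evolves independently.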

Andrew Stone: 21:08

And so, like

Bryan Cantrill: 21:09

But I mean, on the one hand, obviously, the line here is: it took me a day and a half. It shouldn't take me a day and a half. What I should do is spend several months writing something generic, and then it would be immediate. I mean, it's kinda like the right disposition, right? But I think this is something that we battle with all the time, of like, this feels a little dirty, but to make this generic would actually be much more time consuming.

John Gallagher: 21:35

Yeah. There's history here. Right? So like the very first version of Blueprints, all that it controlled was zones. Right?

John Gallagher: 21:41

Because we ship the system with the static configuration. And if we need to rebalance things, the very first thing that we need to rebalance is like the control plane services. So when the initial blueprints were written, the only thing it really had in it was a single map of you know, the key is the sled and the value is the set of zones that's supposed to be running on that sled. So obviously you wanna know what changes from one blueprint to another. You've got a single map on one side and a single map on the other side.

John Gallagher: 22:07

So you just write some handwritten code that shows you the difference between those two maps, right? These were added, these were removed, these were changed, right? And then, like, over time, right? That was eighteen months ago or something. And every six weeks, something else gets added to the blueprint. And we've already got these tests that check for diffs.
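
The kind of handwritten map diff John describes (added, removed, changed) might look like this. This is a minimal sketch, assuming a `BTreeMap` keyed by sled; `MapDiff` and `diff_maps` are hypothetical names, not the code that was in Omicron.

```rust
use std::collections::BTreeMap;

/// Result of comparing two maps: keys only in `after`, keys only in
/// `before`, and keys in both whose values differ.
#[derive(Debug, PartialEq, Eq)]
pub struct MapDiff<'a, K, V> {
    pub added: BTreeMap<&'a K, &'a V>,
    pub removed: BTreeMap<&'a K, &'a V>,
    pub changed: BTreeMap<&'a K, (&'a V, &'a V)>,
}

pub fn diff_maps<'a, K: Ord, V: PartialEq>(
    before: &'a BTreeMap<K, V>,
    after: &'a BTreeMap<K, V>,
) -> MapDiff<'a, K, V> {
    let mut diff = MapDiff {
        added: BTreeMap::new(),
        removed: BTreeMap::new(),
        changed: BTreeMap::new(),
    };
    for (key, old) in before {
        match after.get(key) {
            None => {
                diff.removed.insert(key, old);
            }
            Some(new) if new != old => {
                diff.changed.insert(key, (old, new));
            }
            Some(_) => {} // present on both sides and unchanged
        }
    }
    for (key, new) in after {
        if !before.contains_key(key) {
            diff.added.insert(key, new);
        }
    }
    diff
}
```

This is easy enough for one map; the pain described in the conversation comes from repeating it, with bespoke printing, for every new map and field added to the blueprint.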

John Gallagher: 22:25

So when you add something new to the Blueprint, you see the changes from A to B. So somebody adds a new field to the Blueprint, they go back and look at the diffing code and they're like, oh, this is kind of messy, but like I only added this one new field. So I'll update this manual code and do this one new field, right? And Andrew is sort of underplaying his own role in this because he sort of got stuck holding the bag in a bunch of those cases where he was the one that had to go and update this diffing code. So he had to live this pain.

John Gallagher: 22:49

It wasn't that he got to ClickHouse and was like, oh, I had to spend a day doing this. It's that he got to ClickHouse and was like, oh, I spent a day doing this, but I spent two days doing it three months ago. And I spent a day and a half doing it the month before that, you know, over the last eighteen months. Right?

Adam Leventhal: 23:02

Right.

Bryan Cantrill: 23:02

And

Andrew Stone: 23:03

yet I still punted on the ClickHouse decision until I started getting around to another pull request, which was to clean up some of the execution software. And that change ended up requiring fixes to a bunch of tests, and all those tests were looking at the hand-rolled diffs. And at that point I was like, I need to stop and step back, because this is gonna drive me nuts. And I actually think that would have taken less time than the ClickHouse changes when I look back on it, but I was just frustrated enough at that point that, like-

Bryan Cantrill: 23:34

Okay. So this is interesting because I feel like there's always like that breaking point where you hit like, this is the point that is causing me to kind of reassess everything. And so this point is like, I'm having to go update all these tests effectively. Help me understand that breaking point.

Andrew Stone: 23:54

Yeah, so essentially that's what it was. In order to write the tests that I wanted to write, we were looking at diff outputs, this hand-rolled diff, and looking at the structure, and I needed to be able to look in the diff and be like, was this zone, or actually for this one it was, was this disk expunged? Meaning, was it removed from the system? Not physically, but, like, hard removed. Like, we're not gracefully removing it. We're kicking it out of the system from the control plane point of view.

Andrew Stone: 24:29

And, you know, any zones that were on there are gone, and we're gonna kick them as well, etcetera, etcetera. Well, there was no expunged field in the diff. So now I had to go through and add the expunged field, but that's gonna change every single, like, expectorate test. We have these golden file diffs, so all those are gonna change. Fine. We automate updating for that.

Andrew Stone: 24:46

Maybe that's fine, but I also would have had to go through another 20 or so tests to change how the comparisons are working. And it just seemed wrong, when that field should just be in the diff automatically. Like, we add the field, it gets added to the diff. I'm like, why can't I just get that?

Bryan Cantrill: 25:06

And you also have to have that moment of like, okay, this is not just tedious for me now, but this is going to continue to be a problem. Like, this problem is only gonna grow as time goes on. This is not the last time we're hitting this. I've hit this, you know, a couple more times, I was left holding the bag on Cockroach, and you're beginning to get this kind of mass of, like, it actually merits a slightly more generic solution here, or a more elegant solution.

Andrew Stone: 25:35

There's actually a couple of other things that drove me towards this. So we weren't actually just diffing blueprints. We were diffing blueprints with inventory collections, and they have different structures. And that is straight up madness. Like, it made no sense.

Andrew Stone: 25:53

Like, we were diffing things that were incapable of being diffed, because the blueprint had more information than the real state of the system on some And so, like, those fields didn't exist. And so then when we wanted to print those out, we had to come up with a structure for the print output. And, like, Iliana did this at one point, and like, I ended up copying Zerich Zerich stuff just because it was a frustrating mess. And at that point, I'd been so irritated by these collections, especially with the ClickHouse stuff, that I was like, you know what? I'm just gonna remove this diff. Like, I will come back to this, but I'm removing all the diffing between ClickHouse and, sorry, between inventory collections and blueprints.

Andrew Stone: 26:34

Right? And, like and when we have an automated solution that can generate this, we will generate a unified type that is the like, essentially the the union of these two types, and then we will diff those.

Rain Paharia: 26:47

And then

Andrew Stone: 26:47

we will always be self-consistent. And so, like, that has not been implemented yet. I've moved on to other things just because nobody was using that stuff. Like, we weren't using it anywhere, really. We have some homemade tools, reconfigurator, I mean, all this is homemade.

Andrew Stone: 27:03

We have a reconfigurator CLI where we can actually download blueprints and inventory from customer systems, and we can also do that in tests and stuff so we can look at those.

Bryan Cantrill: 27:13

I I I love that. Yeah. Yeah.

Andrew Stone: 27:16

It's great. And, like, you can edit the blueprint that way. It uses the real builder, which is a subset of the planner. It works great. It allows us, interactively as a human, to edit a blueprint so you can change it in just the way you want.

Andrew Stone: 27:28

And you can build tests specifically by building your inputs that way. But we really didn't use the diff functionality there, but we will wanna use it, and we're especially gonna wanna use it when we do online planning. We're gonna wanna show why we made decisions. You wanna have a justification when your automated system is making decisions. It's gonna go, no, I saw this in the inventory.

Andrew Stone: 27:52

This was my diff here. This is what I saw, and this is why I made this decision. And so you end up with this kind of reasoning system. And so I was like, alright. The first step is just figuring out how to diff two structures automatically and then convert them to the output that we want.

Andrew Stone: 28:07

And so the structures we want are blueprints. And so Rain actually proposed a library called Diffus, which does this sort of semantic diffing. And so I was able to, like, it's got a nice

Bryan Cantrill: 28:22

grab I mean, and cue, like, Rain's oneness with the crate ecosystem. Right? I feel like you're always like, hold on. I'm feeling a crate coming on. Just a moment.

Bryan Cantrill: 28:34

I need total silence, please, while I summon this crate from the other world. It's like, you know, we're all gonna hold hands and have a seance. And, oh, a crate has appeared. I mean, I don't know how you do it.

Bryan Cantrill: 28:45

I mean, you stay, like, very in tune to what's going on in the community in that regard.

Rain Paharia: 28:51

Well, this one, I think, was just something that I was I don't know. Like, I'll sometimes browse crates.io as a hobby. Do you not browse crates.io as a hobby?

Bryan Cantrill: 29:02

Though I wish I did. If I could tell my brain that, like, hey, you know that time we spent on YouTube from one in the morning to three in the morning where we were looking at, you know, esoteric private plane crashes in Alaska? Maybe we could instead be on crates.io browsing Rust crates.

Rain Paharia: 29:21

So, like, yeah. So this one, I think I actually in a prior role, I wanted to do semantic diffs of, I want to say, like a two level nested structure as well. And I was like, Okay, is there some automated way to do that? And so I found this crate called Diffus. Diffus is very interesting.

Rain Paharia: 29:51

So this is built by another company, it seems like, right? And they clearly needed that kind of semantic diffing as well, just as we do. And so I kind of proposed that to Andrew. And then Andrew, I think, spent a fair bit of time with it and came away with the impression that Diffus has some sound principles, but it also kind of does things a bit too aggressively. Andrew, do you wanna share some of the things you found about it?

Andrew Stone: 30:34

Yeah. So, like, first off, let me say, like, Daft would in no way exist without Diffus, right? Like, I wouldn't have even thought to create a procedural macro to do this. And like, yeah, so I started playing with Diffus, and it was super easy to basically put the Diffus derive macro on all the types we wanted. It has like a few strange things.

Andrew Stone: 30:57

So it doesn't use Eq, right, equality, to determine equality between types. And so it uses something called Same. And so you can implement Same on floats, and the way it actually implements it on floats is to print their representations and then compare the printed representations. Excuse me. Kinda weird.
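For readers following along, here is a minimal sketch of the idea Andrew describes. The trait below is a hypothetical stand-in written for illustration, not code copied from Diffus, and its name and signature are assumptions. The background fact is that Rust floats implement `PartialEq` but not `Eq`, because `NaN != NaN`; comparing printed forms is one way to get a total notion of "same" on floats.

```rust
/// Hypothetical stand-in for the trait described above (assumed name and shape).
trait Same {
    fn same(&self, other: &Self) -> bool;
}

impl Same for f64 {
    fn same(&self, other: &Self) -> bool {
        // Compare the printed representations instead of using `==`.
        self.to_string() == other.to_string()
    }
}

impl Same for u32 {
    fn same(&self, other: &Self) -> bool {
        // Types with a total equality can simply defer to `==`.
        self == other
    }
}
```

Under this sketch, two NaN values count as the same even though `==` says they are not equal, which is presumably the point of going through the printed form.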

Bryan Cantrill: 31:20

Oh god. God. Wait a minute. I didn't write that code. Where I I have a I have an airtight alibi when that code was written, by the way.

Bryan Cantrill: 31:26

I did not that's not me. That's all what? Sorry. I I feel so defensive.

Andrew Stone: 31:30

But like that was so minor. Right? But I did have to like manually implement Same on a few types, like, you know, like three out of like 150 or something. It wasn't a big deal. So like, I just thought that was kind of weird.

Andrew Stone: 31:41

And like it also had, like there's this, it kind of has tiers of wrappers around its diffs. And so every type was wrapped in either a What do they call it? A copy or change. Right? And then if you're in a map, like if you're looking at a map, the map change is, like, nested in, like, did this map change?

Andrew Stone: 32:03

And so, yes, it changed. So you go to the change variant of the enum. And then under that, you have to loop through the map and look at each individual element and see which ones, like, changed. Right? And there it'll tell you if it was added and removed or whatever.

Andrew Stone: 32:17

But like you now have this process where you have to like step through this hierarchy. And so you're doing a bunch of matches when you're there and it gets super tedious. And I was like, I was gonna just use that directly and implement the diffing, like the diff output based on walking those. And as soon as I started doing it, I was like, this is horrible. I was like, Rain, like, show me an easier way.
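To make the "tiers of wrappers" concrete, here is a rough sketch of that shape. The enum and function names are hypothetical, loosely modeled on the description above rather than on the actual Diffus API. The outer enum says whether the map changed at all, and only inside the change variant do you find a per-key edit that has to be matched again.

```rust
use std::collections::BTreeMap;

// Hypothetical shapes for illustration; not the Diffus types.
enum Edit<T, D> {
    Copy(T),   // nothing changed
    Change(D), // something changed; look inside for details
}

enum EntryEdit<V> {
    Same(V),
    Inserted(V),
    Removed(V),
    Changed { before: V, after: V },
}

type MapEdit<'a> = Edit<&'a BTreeMap<String, u32>, BTreeMap<&'a str, EntryEdit<u32>>>;

fn diff_map<'a>(
    before: &'a BTreeMap<String, u32>,
    after: &'a BTreeMap<String, u32>,
) -> MapEdit<'a> {
    if before == after {
        return Edit::Copy(before);
    }
    let mut entries = BTreeMap::new();
    for (k, v) in before {
        let edit = match after.get(k) {
            None => EntryEdit::Removed(*v),
            Some(a) if a == v => EntryEdit::Same(*v),
            Some(a) => EntryEdit::Changed { before: *v, after: *a },
        };
        entries.insert(k.as_str(), edit);
    }
    for (k, v) in after {
        if !before.contains_key(k) {
            entries.insert(k.as_str(), EntryEdit::Inserted(*v));
        }
    }
    Edit::Change(entries)
}

// The consumer has to match at every tier: first the map, then each entry.
fn describe(edit: &MapEdit<'_>) -> Vec<String> {
    match edit {
        Edit::Copy(_) => Vec::new(),
        Edit::Change(entries) => entries
            .iter()
            .filter_map(|(k, e)| match e {
                EntryEdit::Same(_) => None,
                EntryEdit::Inserted(v) => Some(format!("+ {k}: {v}")),
                EntryEdit::Removed(v) => Some(format!("- {k}: {v}")),
                EntryEdit::Changed { before, after } => {
                    Some(format!("~ {k}: {before} -> {after}"))
                }
            })
            .collect(),
    }
}
```

With only one map of integers this is manageable; the tedium described above comes from repeating the same two-level match for every nested field of a large structure.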

Andrew Stone: 32:37

Like, just like, the format of the output was so strange. Not strange. Like, it makes sense self contained. Right? But when you want to use it for what we wanted to use it for, given our existing diffing code, it just was harder than, like, it should be.

Andrew Stone: 32:51

And I didn't want like anytime anybody added a new structure to have to walk these Diffus types. I was like, how can I essentially wrap over these? So I said like, Rain, like what's a way to do this? Right?

Adam Leventhal: 33:04

What's your other crate? What's your backup?

Rain Paharia: 33:08

Think, you know, think it was

Bryan Cantrill: 33:09

different. Name 10 more.

Rain Paharia: 33:10

Oh God. So, you know, I think one of the design principles of Diffus seems to be like it's very opinionated, right? Like for example, with like strings, right? Like if you have two strings that aren't equal, it will actually do like an LCS on the strings, right? It will actually like compute, you know, actually try and diff the strings themselves, right?

Rain Paharia: 33:33

Rather than, you know, like with strings, often like you have multiple different ways you might want to diff them. Like for example, you might want to diff line by line, or you might want to diff like just all you care about is whether they're the same or not. And I think one of the things about Diffus is that, I mean, they're clearly building it for their needs. Like, it just seemed to be like very, very opinionated in a way that just did not quite match some of our needs.

Andrew Stone: 34:01

But it's also like comprehensive, right? Like it does things that Daft, which we'll get into, like does not do, right? It does those LCS comparisons. And like, we just wanna know if the strings are different, that, we we don't need to know, like, where they are different per se. Like, we can we can figure that out on our own later.

Andrew Stone: 34:16

And more importantly, if we want to, we can provide, like, custom hooks so, you know, people can provide their own mechanism to do that, like, diffing functionality if they want that. And so like they try to take every type in Rust and make it diffable and we just don't do that.

Bryan Cantrill: 34:31

And so Andrew, your experience with Diffus is like, okay, like I'm seeing some of the power of a generic abstraction here, but this one isn't quite the right one for what we need to do with the kind what we need out of Yeah.

Andrew Stone: 34:43

But it took me about a month between there and Daft. And that month was me implementing, like, using Diffus and building a library of types to convert from the Diffus types to what we want. Right? And so I asked Rain, like, Rain, like, what's a good way to, like, parse these, like, an automatic way? And Rain, can you talk a bit about, like, the hierarchy or the, like, heterogeneous, like, tree parsing problem and, like, what you recommended to me?

Andrew Stone: 35:18

Because like this is like I ended up doing a bunch of like tedious boilerplate work that we ended up throwing out. But like without doing that work, I'm not sure again that like I would have gotten to Daft.

Bryan Cantrill: 35:29

Yeah. Interesting.

Andrew Stone: 35:30

So yeah. Like I essentially did get the automated diffing working before Daft was written, but I threw it out because like it was Friday night, I got it working and like the PR was like good, it could have been merged, but I was like, I just had this icky feeling. Like, I spent so long doing this and it still wasn't quite complete. So, like, part of the problem is this heterogeneous, like, tree walking problem, which you get with, like, parsers and compilers and things like that. So, like, Rain could explain it better than I can.

Bryan Cantrill: 36:04

Yeah. Yeah. Rain, elaborate on that. What what does it

Rain Paharia: 36:07

So, you know, like, usually, like, you know, in college or whatever, when people talk about you want to do a graph traversal, or a tree traversal or something, people generally think in terms of homogeneous graphs. So these are graphs where all nodes have the same type, and all edges have the same type, right? And so that's the first thing that comes to mind, right? In the real world, that is sometimes the case, right? You do see homogeneous graphs a lot.

Rain Paharia: 36:40

But for parsers and compilers and things like this, you often have lots of different types. And the nodes and the edges between them have different shapes and so on, right? So it tends to vary a lot in the real world in a lot of different problems. So if folks are familiar with the syn crate in Rust, Oh, yeah. That is the crate that is a full reimplementation of the Rust parser, right?

Rain Paharia: 37:14

That crate actually has a really good abstraction called the visitor trait. And so the visitor trait, the idea behind that is that they actually list out every single node type. And then you can essentially have hooks for, like, where you want to do things. And the way the visitor trait works is really interesting. It actually is a borrowing from functional programming languages, which are the first ones to really deal with that.

Rain Paharia: 37:49

So if folks have been around functional programming, they've probably heard of like lenses or optics. Those are words that get thrown around. Visitor actually is like an example of a lens. So David Tolnay has very clearly been inspired by this stuff, but also does not use that word anywhere, which is really fun. But so I think the way we were thinking about it at first was, can we bring a visitor trait into the diffing code so that the idea would be that next to the diff that gets generated, we also have this visitor trait.

Rain Paharia: 38:36

And this visitor trait has these walking functions, and then you can insert hooks into those functions to do things either in a preorder or postorder basis. So that is actually a strategy that works really well. I mean, I've used syn's visit a bunch. I also added a visitor API to toml_edit. Yeah, you know, it turned out that in this case, there were just so many different variants, and there were just so many different cases that I know Andrew was like, it just kind of blew out of proportion, like the extent to which you had to model.
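As a rough illustration of the pattern Rain is describing, here is a small hand-written visitor over a heterogeneous tree. The node types, hook names, and walk functions are all hypothetical and much simpler than syn's real visitor API; the sketch only shows the general idea of one hook per node type, default no-op hooks, and walk functions that call hooks in preorder and postorder positions.

```rust
// Hypothetical node types: each level of the tree is a different Rust type.
struct Field { name: String }
struct Item { name: String, fields: Vec<Field> }
struct File { items: Vec<Item> }

// One hook per node type. Defaults are no-ops, so an implementation
// overrides only the hooks it cares about.
trait Visit {
    fn pre_item(&mut self, _item: &Item) {}
    fn visit_field(&mut self, _field: &Field) {}
    fn post_item(&mut self, _item: &Item) {}
}

// The walk functions own the traversal order and call the hooks.
fn walk_file(v: &mut impl Visit, file: &File) {
    for item in &file.items {
        walk_item(v, item);
    }
}

fn walk_item(v: &mut impl Visit, item: &Item) {
    v.pre_item(item); // preorder hook
    for field in &item.fields {
        v.visit_field(field);
    }
    v.post_item(item); // postorder hook
}

// A visitor that records everything it sees.
struct Trace(Vec<String>);

impl Visit for Trace {
    fn pre_item(&mut self, item: &Item) {
        self.0.push(format!("enter {}", item.name));
    }
    fn visit_field(&mut self, field: &Field) {
        self.0.push(format!("field {}", field.name));
    }
    fn post_item(&mut self, item: &Item) {
        self.0.push(format!("leave {}", item.name));
    }
}

// A visitor that only cares about one node type.
struct CountFields(usize);

impl Visit for CountFields {
    fn visit_field(&mut self, _field: &Field) {
        self.0 += 1;
    }
}
```

With three node types the trait stays small. The blow-up described here comes from needing a hook for both the copy and the change case of every type in a large structure.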

Rain Paharia: 39:16

Because again, with Diffus, it's like copy and change, and so you want to model both the copy variant and the change variant. And you want to model both the copy and the change variant for every single type. Andrew ended up having a copy, a change. It was just such a big, you know, such a big trait. It ended up being probably the biggest trait we have at Oxide, and we have a couple of really big

Bryan Cantrill: 39:40

traits at Oxide. Wow. Yeah. I feel sort of saying something. That's like in kind of the side trade Olympic games.

Bryan Cantrill: 39:47

I mean, the fact that you're taking the podium there is pretty impressive.

Andrew Stone: 39:51

Yeah. Yeah. Rain warned me. They're like, yeah. You know, this is gonna be, like, brutal. Like, I know this is gonna be bad, but it's like a one time, like, just do it.

Andrew Stone: 40:01

And then like everybody can use the visitor, right? And like, yeah, because we're handwriting these visitors, like, there are crates that can auto generate them, but because we're handwriting these visitors, like, okay, when you add a new field to the diff, you will have to add essentially a callback method. But like, you're just gonna be copying the stuff that's there. It's pretty straightforward.

Andrew Stone: 40:23

Right? And so I did. I went through, I got all that work done and like wrote a bunch of visitors. I actually wrote them for individual types and then had a trait that composed the other traits with some associated types so that you can slide them in. Like you could if you just wanted to get part of a blueprint, you could do that and have a visitor for it because some like the blueprints, you know, contain a lot of other large types.

Andrew Stone: 40:51

And so

Bryan Cantrill: 40:52

I mean, and so are you kind of thinking like, is the juice worth the squeeze here? I mean, or

Andrew Stone: 40:58

the Yeah, like, I'm I'm thinking like, is this the right way to do it? So like, here's what got me to like, I got this done and like the whole time, like, I'm like, okay, I'm like, I'm a week out. It turned out I was two weeks out, but like, whatever. Double, like, double all my estimates. That's fine.

Andrew Stone: 41:14

And like, I finished it, and I looked at the visitor, and I was like, you know what? Like, this is not a mechanical visitor. Like, I'm not taking the types and then making callbacks that are purely mechanical, because those callbacks would also force ourselves to use those Diffus types that were somewhat oddly named or like things that we didn't care about. And so I ended up like massaging them into like other structures as part of the visitor. So I was essentially parsing and restructuring in the code in the callback to make it more uniform.

Andrew Stone: 41:53

And so I did that and I got it working, but that meant that any time anybody added something, they weren't just doing this mechanical transformation. They'd have to figure out what the right structure should look like and put them in the types that I created. And so now they have to know the Diffus types and my new types, and that's gonna be worse. Right? I'm trying to do this to, like, create less work for people.

Andrew Stone: 42:11

And I'm writing justifications in the pull request saying, no. No. This is gonna be better. And I'm reading this after I'm done, like literally five minutes after I tell Bryan, like, I'm done, it feels good. I go, You know what?

Andrew Stone: 42:22

It doesn't feel good. And like, I'm basically writing these justifications and I'm gonna get like, I'm gonna torture people with this. Like, it may be worse than the original. And because like, what we were doing was walking these visitors and like, the cool thing about visitors is if you're in a parser like syn, you only want callbacks on a few of the methods or a few of the structures, right? Like in some cases you just don't care about things.

Andrew Stone: 42:44

Like, so it's really cool for writing tests actually to have a visitor. But with diffs, we actually want to diff every single thing. So you are like, what I was doing was taking this visitor and then writing an implementation of the visitor for the blueprint diff. That would go through and walk the full tree using all the callbacks and put it into the blueprint diff struct that we wanted so we could use that format to display the things we wanted. So this intermediate state.

Andrew Stone: 43:12

Right? And so then, like, Rain said something about automating visitors. And I looked at them and I realized, yeah. Yeah. Like, I'm not doing this rote transformation.

Andrew Stone: 43:22

I can't use any of these crates that will, like, automatically derive a visitor. And like, what I did is also horrible. It's just a crime. And so I'm like, you know what? What I really want is just this end structure.

Andrew Stone: 43:33

And like, I did modify, like as part of this, I added, I think like a bunch of new features to Diffus. Like, I changed the proc macro to do different things and to, like, report more information than it was reporting, like, to give you the actual, like, before and after types when it detected a change and not just, like, force you to recurse. So you could get back the original types, which is what you want or at least what we want in our case. And I was like, well, what if we had a proc macro that did the diff, but then actually spit out that structure that we want instead, and we don't have to do any of this bullshit. Then we could just take that structure directly and print it out.

Andrew Stone: 44:07

And so that's how Daft came about. Yeah, and so I did that. Like, the initial implementation of that I did that weekend. Like I started that Friday night, I started playing around with proc macros, never written a proc macro. Shout out actually to one of my former VMware colleagues, Edamir.

Andrew Stone: 44:27

He he wrote a little tutorial and it just got me feeling like, hey, you can do this. So that was really nice. It was nice to see. Was like, I know that guy.

Adam Leventhal: 44:35

How was the taste of the forbidden fruit?

Bryan Cantrill: 44:38

Absolutely forbidden fruit.

Andrew Stone: 44:39

You're just like, I got it in

Bryan Cantrill: 44:40

a mood and I love it. I can't like, get me. Lie. Yeah. I'm ready to go sin.

Andrew Stone: 44:47

I will say that after like, when I came in on Monday, I was like, oh, I have a new power. Like, I'm gonna, I'm just gonna like, I'm gonna mesh it up. Like, we're gonna, like, we're gonna, we're gonna go. Yeah. We're gonna get weird.

Andrew Stone: 45:00

We're gonna get real weird. I

Bryan Cantrill: 45:03

almost feel like your first Monday after proc macros should be a Monday that you're forced to take off. It's like you need a mandated waiting period, a seventy-two-hour waiting period from proc macro discovery to actually being, it's just too dangerous in those moments. You're just

Adam Leventhal: 45:19

like Until you're no longer testing positive for proc macro,

Bryan Cantrill: 45:23

you should just stay home.

John Gallagher: 45:24

So Andrew Andrew had to show up. Right? Because if you look back at his PR, he opened it late Friday night and tagged a bunch of people for review. And then first thing Monday morning, he's like, nobody review this. Never mind.

Andrew Stone: 45:33

This is not the way we're happening.

Bryan Cantrill: 45:35

I was, I just was in like Leaving Las Vegas for proc macros and I am now strung out on syn and everything else.

Andrew Stone: 45:42

It's, like, a bunch of things that I've done, not just at Oxide, but everywhere. I was like, I'll put something out for review. And I was like, I wanna try something else before people get in on Monday to review this. And like, yeah, this was like me, like race against the clock. Like if we're gonna like merge this thing in or wait a day, whatever, get some reviews, or I'm gonna figure out something different.

Andrew Stone: 46:05

And this different thing is gonna work or it's not. And so, yeah, I went like

Bryan Cantrill: 46:10

actually throttle. That's really interesting, Andrew, because I actually do advise people that like, if you are thinking like, god, I think this other route may be the route and I'm beginning to like feel like I wanna go down this different path, but I don't know if that's the right decision or not. One thing that I always advise people is like, come up with an amount of time that it's gonna take you to explore that other path and then give yourself that amount of time and say like, okay, I'm gonna give myself, you know, in your case, it was a weekend, but it might be like, I'm gonna give myself a week to go down this path. And after that week, I will know one of two things. That this path that I'm exploring is the right path or is looking promising, and I will have many more reasons why I wanna continue down that path.

Bryan Cantrill: 46:56

Or I will know that like, no, no, this is a it looked enticing, but it's actually a dead end. And now I can go back to that other path I was gonna go down with a lot more certainty. And the only thing like you're you're kind of bounding the amount of time you've lost on it to a week if it ends up being the wrong path. And I have found that that's been because I I also feel that by the time a software engineer is kind of like verbalizing this, the path that feels enticing is almost always the right path. You know what I mean?

Bryan Cantrill: 47:25

Like, subconscious knows that, like, I I I and it is only the sunk cost fallacy that is preventing you from just diving into it. And so I think it was the I mean, I but you I knew

Andrew Stone: 47:41

like I knew as

Rain Paharia: 47:42

soon as

Andrew Stone: 47:42

I had the thought that everything I did was garbage. Like I was like, yeah, this is this is bad. Like I just, I don't know if I get a feeling like that after writing, like it was a lot of code too. It ended up being like 5,000 lines of code. Right?

Andrew Stone: 47:54

And so it just some of the parts of it were already merged in because I was writing the different visitors for the smaller sub structures and merging them in so we could get smaller reviews. Right? It's when I got to that final boss stage and put everything together that I was like, this is horrible.

Adam Leventhal: 48:10

Bryan, what you're describing is such an important piece of advice, I think, because at the top, you talked about the breadth of possibility. And I think for me and I think for a lot of folks, that sort of doubt, am I working on the right thing? Right? Like, I have this speculative idea in mind. I'm a day in.

Adam Leventhal: 48:27

Did I waste my day? But if you just kind of put that time horizon at just, I'm gonna spend two days on it. I'm gonna spend a week, whatever it seems appropriate and say, I'm not gonna ask myself if I'm doing the right thing for this amount

Bryan Cantrill: 48:38

of time. That's right.

Adam Leventhal: 48:39

It's very liberating.

Bryan Cantrill: 48:40

That's right. Yeah. Because I actually that voice in your head, at least, I mean, for me, it sounds like for you as well. I think for most offenders, that voice in your head of like, I don't know. This feels like a shouldn't we be getting back to the trail here?

Bryan Cantrill: 48:51

Like, we seem to be, like, off of the thicket. It it's like that voice slows you down. You know? You you end up being like, no. Like, I need to actually, like, just barrel through this thicket, and I need to, like, silence that voice.

Bryan Cantrill: 49:06

And I can't do that if you are kind of like taking it like day by day. You need to be like, no, I'm gonna give myself a week or whatever it is. Whatever it kind of takes you to go all the way into that. And I feel like, I mean, god, I've done this so many times in my career. I mean, I'm actually reminded of back in the Fishworks days, the appliance had this shell that had become this Perl monstrosity. And it would, you know, the way every Perl monstrosity becomes a Perl monstrosity, like one line at a time, you know?

Bryan Cantrill: 49:46

I mean, it's just like, and I had let the sunk cost fallacy just push me further down this path. I'm like, this is the wrong path. This is the wrong path. This is the wrong path. And and I it was like really me having to like reason with myself of like, I and I knew that like, I I wanna rewrite this thing in JavaScript and I'm gonna give myself a week to go do it.

Bryan Cantrill: 50:06

And then after that week, I was nowhere near done. But after that week, I had total clarity that that's the right path to go down. And God, it was so liberating. And fortunately, Andrew, you figured that out very quickly that like, okay, this is and I I'm also

Andrew Stone: 50:24

mean, basically it was like literally like six to eight weeks of like working on the wrong thing. I'm like, we have major things that need to get done. And I'm off here, like figuring out how to like display something nicely in our like non customer visible tool.

Bryan Cantrill: 50:40

Well, and I think that was also really tough, right? Because it's like, you know, this is an extreme, like this more the kind of aggregate problem of updating the system and being able to do that autonomously, being able to do that in a way, and this is a distributed system that is gonna be operating over an air gap or is operating over an air gap. And it's like this is our top priority is to get this kind of this seamless update. And on the one hand, like you're working on that top priority. The other hand, you're like, god, am I like, you know, I'm over here.

Bryan Cantrill: 51:09

She kind of like, I am in the kind of the thicket. But I've always I mean, it's like we know that this is because the way we got here is like so earnest in terms of like trying to get this thing to work and kind of being burdened by this technical debt. So I mean, I don't know. When you and I were talking about this, I'm like, I just feel like now is the time to do this. This is only gonna get worse as time goes on.

Bryan Cantrill: 51:31

So let's get this thing.

Adam Leventhal: 51:32

Yeah. Two more pieces of advice along those lines. One is, I think I think we've all had this feeling. When you feel like you're you're reluctant to add anything to your existing kind of direction because you feel like every step you take is in the wrong direction, like, it's a it's an another step down the wrong path. That's one shibboleth.

Adam Leventhal: 51:49

And then the other, it sounds like you did it here, is when you go articulate it to someone else, often it could be helpful to have someone else feel like they're giving you that permission. They're like, no. 100% go take a week. You know? Even if that person doesn't really have a great handle on it, but even the passion with which you talk about walking down

Bryan Cantrill: 52:09

I mean Or that person is giving you really sage thoughtful advice.

Adam Leventhal: 52:13

That's right. That's right. It just happens to be the same advice they dispense to everybody.

Andrew Stone: 52:18

We were really trying to figure out, like, whether we were like, update is a super important problem for us. It's critical to get it right. It's also critical to like, you know, get it done in less than ten years. And so like, you know, maybe even fewer than that.

Bryan Cantrill: 52:33

Any investor listening to this is just like vomited into a trash can. They also, I mean, many, many time figures for less than ten years. So yes.

Adam Leventhal: 52:42

Yeah.

Andrew Stone: 52:43

But like, yeah, we had a meeting and we were like discussing it. I'm like, I'm like 70% done with this visitor implementation, but like I can totally move on to like other stuff and come back to it. And everybody's like, yeah, like it's fine. Like just keep working on it. Bryan was pretty emphatic. Like just finish it out.

Andrew Stone: 52:58

Like you hit a real problem. You started doing it. It's taken longer than expected, but like you're close. Like, to be fair, I was close, except I was close on the wrong implementation. But luckily, it only took like an extra week to get the real thing done.

Andrew Stone: 53:12

And then So

Bryan Cantrill: 53:13

then the the so so tell me about that. Was it so that weekend is the, okay. I've got this PR out, but I think it's down the wrong path. I'm now gonna do is Daft kind of born that weekend of like, I'm gonna do my own thing.

Andrew Stone: 53:27

Yes. Yeah, absolutely. And I think I call it was, I wanna generate, I basically don't wanna do all the extra stuff that's going on, and it was like, let me see. Yes. I mean, in short short answer is yes.

Andrew Stone: 53:41

It was exactly that. It was like that Friday night. It was like, I want a proc macro. I want the proc macro to look different from what Diffus does. I want it to output different types of structures.

Andrew Stone: 53:50

I'm not sure what I wanna do is even possible, but I'm gonna, like, try to implement it. And so, yeah, I worked on it that weekend, and I did. Like, I had it diffing maps and sets and, like, normal structs and enums by, like, Monday. So, like, it was ready. Like, I had a library that was available to look at by that Monday morning.

Bryan Cantrill: 54:17

Right? That also must have been great, by the way.

Andrew Stone: 54:19

Yeah. I mean, like, it felt really good.

Bryan Cantrill: 54:21

I mean, that feeling is so good when on the one hand, you've been kinda wandering in the thicket for a while, but when you kinda break through and you're like, no. Now I've got total clarity and every line of code is kinda telling me like, nope, this is the right direction. This is the right direction. That just feels great.

Andrew Stone: 54:38

And like changing it to like use it. So I had added functionality to Diffus to, like, ignore certain fields. You couldn't do that with Diffus originally. So I added that to the proc macro. That was my baby step intro to editing proc macros.

Andrew Stone: 54:53

I was talking to John and he's like, It would be really nice if we could just ignore these fields. That seems important if we're gonna have a derive macro. Okay, let me figure out how to do that. It was actually surprisingly easy. But yeah, I added that support.

Andrew Stone: 55:06

Then like the other thing was just being really explicit. So Daft took a different approach in terms of not parsing everything ahead of time, so not fully recursing. You could mark things as leafs. So like, Diffus, when it diffs enums, does different things depending on whether it's a change in the variant type or not. And so, like, if it's a change in the variant type, it gives you the before and after.

Andrew Stone: 55:33

So, like, the the wrapped type, essentially, which is, like, kinda what I wanted. I wanna know, like, what

Bryan Cantrill: 55:38

the actual name Yeah. Yeah.

Andrew Stone: 55:40

When, like, they're the same variant, it recurses, and it recurses automatically, and there's no way to get the original out. So that was another thing I added to Diffus, to be able to give me the original when it recurses. But then there's also like you're matching on these different things. And so I just said, you know what, all I want is this thing called a leaf, and this leaf will get spit out whenever like, when we wanna bottom out. And sometimes like, those types can be diffable.

Andrew Stone: 56:06

And so like you can still diff them, but you can stop. So you have this laziness. You stop at a leaf, and then, if you want to, you can continue to diff into it. And so like by default, all enums, I mean, yeah, all enums are just leafs, like they just diff to a leaf. And, like, if the inner types are comparable, the caller can just call diff on them directly. So, like, right there, it makes it usable in a few more scenarios than Diffus was.
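A minimal hand-written sketch of the leaf idea follows. It is not Daft's actual API or the code its derive macro generates; the `Service` and `State` types, the `diff_service` function, and the helper names are hypothetical. The point it tries to show is the laziness: the struct diff stops at a leaf holding the before and after values, and the caller decides whether to look further inside.

```rust
// A leaf just carries both sides; it does not recurse on its own.
#[derive(Debug, PartialEq)]
struct Leaf<T> {
    before: T,
    after: T,
}

impl<T: PartialEq> Leaf<T> {
    fn is_modified(&self) -> bool {
        self.before != self.after
    }
}

// Hypothetical types being diffed.
#[derive(Debug, PartialEq)]
enum State {
    Running,
    Stopped { reason: String },
}

struct Service {
    id: u32,
    state: State,
}

// Roughly the shape a derive could produce: one diff field per struct field.
// The enum field bottoms out at a leaf instead of being recursed into.
struct ServiceDiff<'a> {
    id: Leaf<&'a u32>,
    state: Leaf<&'a State>,
}

fn diff_service<'a>(before: &'a Service, after: &'a Service) -> ServiceDiff<'a> {
    ServiceDiff {
        id: Leaf { before: &before.id, after: &after.id },
        state: Leaf { before: &before.state, after: &after.state },
    }
}

// Continuing past the leaf is the caller's choice. Here we only look deeper
// when both sides are the same variant.
fn reason_change<'a>(leaf: &Leaf<&'a State>) -> Option<Leaf<&'a str>> {
    match (leaf.before, leaf.after) {
        (State::Stopped { reason: b }, State::Stopped { reason: a }) => {
            Some(Leaf { before: b.as_str(), after: a.as_str() })
        }
        _ => None,
    }
}
```

Because the leaf keeps the original values on both sides, a caller can also hand them to existing hand-written diffing code instead of recursing.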

Rain Paharia: 56:40

Yeah. I think I think, you know, one of the insights that I think we feel like we got from Diffus is that fully eager diffing is, like, in at least for our use case was not quite correct.

Andrew Stone: 56:54

And Yeah. Interesting.

Rain Paharia: 56:55

Being able to have an interplay between eager and lazy diffing, which is where Daft kind of ends up, actually ends up being, like, really suited for our domain, with Blueprints at least.

Bryan Cantrill: 57:06

You could I mean, because we kinda want a sloppier diff, a kind of a programmer dictated sloppy. Like I don't need to do like a I don't want you to actually automatically recurse on all this stuff. I wanna ask the programmer. I wanna dictate some of that stuff.

Rain Paharia: 57:21

Yeah. Yeah, exactly.

Andrew Stone: 57:22

Yeah. And so like I mentioned ClickHouse just as part of that, like the ClickHouse cluster config is very large. It's a very large structure and I just tagged it with a leaf annotation. And so we don't actually diff it. Like, we still use the manual diffing code for ClickHouse because that's unlikely to change in the near future.

Andrew Stone: 57:40

And I just didn't wanna like write it. And so I was like, just call it a leaf and then we use the manual code. So you can just like escape out to that manual code. Right?

Bryan Cantrill: 57:47

Oh, that's interesting. Yeah. That's okay. So that's actually kinda capturing something that I think is always, like, clever when you can find a way to get from the old world to the new world iteratively. Right?

Bryan Cantrill: 58:01

And that's where you can, like, hey, this code works. There's no reason to rip it out. And so you can actually have the old world, in terms of your manual ClickHouse diff, coexist with the new world in terms of Daft.

Andrew Stone: 58:13

Yeah. And like, to be fair, it's actually not that hard. Like the whole point of this is to make it pretty easy to diff, but the way I had structured the printing code for that is somewhat tricky, and I just didn't wanna mess with it at that time. I just wanted to get this out and get it merged in so we had something to move forward with. And I was like, I can always go back.

Andrew Stone: 58:30

Like, if if I'm gonna if this structure ends up changing, then I'll, like, do the proper thing. But like there's no need to do it immediately. So it's it's nice to be able to like just ease your way into these type of things, the backwards compatibility.

Bryan Cantrill: 58:44

Yeah. Interesting. And so Rain, what did you kind of swing in here in terms of like taking this really promising start that Andrew had in terms of Daft and making it a bit more generic?

Rain Paharia: 58:56

Yeah. Andrew started on this, and it was just really deeply inspiring because this is, this for me it felt like, oh, this is also a way you can do things. And especially the level of programmer control that Andrew envisioned with Leafs and so on was just truly, you know, it blew my mind, right? So you know, I was like, okay, hey, let me see how we can kind of improve things and make them even more generic. So there were a few little things that I realized just as kind of downstream from what, like I really credit Andrew for having that realization that you can just like pause recursion at particular points and then continue them later.

Rain Paharia: 59:47

So one of the realizations that I had was that, you know, with Diffus, we talked about the Same type, right, which was the Same trait, which is like Eq except it's also on floats, right? That was, like, all the Same trait did. Right? Eq except it also

Bryan Cantrill: 01:00:04

exists. Right.

Rain Paharia: 01:00:07

Andrew kind of replaced the Same trait with Eq because we don't really have floats in our blueprint structures, which, you know, I guess is what you would expect, but, you know, we don't have any floats. But then, while thinking about it, I realized that you don't need to have Eq at all. Like, you don't need to have any kind of expectations on what the value should be like with this leaf structure, right? So I think one of the first things I did was to actually drop the Eq requirement from values. So map keys still need Eq.

Rain Paharia: 01:00:52

Right? And that is just an inherent property of keys. Right? And and that is a property that is giving back

Bryan Cantrill: 01:00:57

to And also, like, if you're using floats to index into a map Yeah. Like, I I I mean, I I like, you're what are you doing? You're asking for trouble. I mean, you're you're a mischief maker, and you deserve exactly the world you're you're getting. So right.

Rain Paharia: 01:01:11

And so, but for values, we actually don't have any restrictions at all. So I think, you know, one of the things that Andrew did was, with, say, an eager diff of, like, a BTreeSet, right, Andrew introduced four maps. So it was added, removed, modified, and unchanged. Right? And now added and removed are, like, pretty easy to do, but modified and unchanged, the way you decide that is via the Eq implementation.

Rain Paharia: 01:01:47

What I did was I actually combined those, modified and unchanged, into a single map, common, and then if the value type implements Eq, then it provides iterators, basically generators, for the modified and unchanged maps. So what that does is that the Eq is kind of like a value add, right? And so it means that if you have any arbitrary type that doesn't have Eq, you can kind of just get this for free. So you know, I think that was one of the things I did. Another thing I realized

Bryan Cantrill: 01:02:25

And so that would force these, so if you did have, let's say, a structure of floats. Right. You'd be able to get the common, and, I mean, you're gonna have a compile-time error, effectively; the programmer's gonna have to resolve this at compile time.

Rain Paharia: 01:02:43

Correct. Yeah. And honestly, this is one of the things I really like about Rust, which is that you can kind of define these additional methods or, you know, additional functions that are only available under certain constraints that maybe don't hold for the type overall, and it's one of my favorite bits of Rust. And so in this case, you bring your own Eq implementation, right? Floats have a few different ways to compare them, right?
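A rough sketch of the shape being described, assuming a `BTreeMap` and using only the standard library. The names `MapDiff`, `diff_maps`, `modified`, and `unchanged` are chosen for illustration and are not claimed to match Daft's actual API. Building the diff needs only `K: Ord`; the `modified` and `unchanged` iterators exist only in an `impl` block that requires `V: Eq`.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Leaf<T> {
    pub before: T,
    pub after: T,
}

/// Borrowed diff of two maps. No bound on `V` is needed to build it.
pub struct MapDiff<'a, K, V> {
    pub common: BTreeMap<&'a K, Leaf<&'a V>>,
    pub added: BTreeMap<&'a K, &'a V>,
    pub removed: BTreeMap<&'a K, &'a V>,
}

pub fn diff_maps<'a, K: Ord, V>(
    before: &'a BTreeMap<K, V>,
    after: &'a BTreeMap<K, V>,
) -> MapDiff<'a, K, V> {
    let mut diff = MapDiff {
        common: BTreeMap::new(),
        added: BTreeMap::new(),
        removed: BTreeMap::new(),
    };
    for (k, v) in before {
        match after.get(k) {
            Some(new) => {
                diff.common.insert(k, Leaf { before: v, after: new });
            }
            None => {
                diff.removed.insert(k, v);
            }
        }
    }
    for (k, v) in after {
        if !before.contains_key(k) {
            diff.added.insert(k, v);
        }
    }
    diff
}

/// These methods only exist when the value type implements `Eq`.
impl<'a, K: Ord, V: Eq> MapDiff<'a, K, V> {
    pub fn modified(&self) -> impl Iterator<Item = (&'a K, Leaf<&'a V>)> + '_ {
        self.common
            .iter()
            .filter(|(_, leaf)| leaf.before != leaf.after)
            .map(|(k, leaf)| (*k, *leaf))
    }

    pub fn unchanged(&self) -> impl Iterator<Item = (&'a K, Leaf<&'a V>)> + '_ {
        self.common
            .iter()
            .filter(|(_, leaf)| leaf.before == leaf.after)
            .map(|(k, leaf)| (*k, *leaf))
    }
}
```

With `f64` values, `diff_maps` and `common` still work, but calling `modified()` is a compile-time error because `f64` is not `Eq`. The caller then walks `common` with whatever float comparison suits them, such as a tolerance check.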

Rain Paharia: 01:03:08

Bring your own, right? So that's the kind of thing. That's one of the things that I realized, yeah.

Bryan Cantrill: 01:03:15

I also love this, Rain, because, I mean, one of the challenges with the approach that Diffus took, not to take anything away from it, is that it made every single programmer pay the penalty for the fact that floats have this property that makes them difficult. Right? And anyone who does floating point work knows that these properties exist, but you're making, like, everybody else pay the tax for it, which kinda sucks. And you really wanna only have to pay that tax if you're attempting to do something that necessitates it. And so I really love that: you wanna make the simple thing be simple while still allowing for the complicated thing to be implemented.

Rain Paharia: 01:04:04

Exactly. Exactly. And so that kind of also extends to things like, what if your struct has a type that does not implement, you know, Daft, or, like, that Daft doesn't know about, right? You can just annotate that type with leaf. And so what that means is that that just becomes a leaf.

Rain Paharia: 01:04:25

That is a point of lazy recursion, and then you can, like, bring your own diffing algorithm, just like you would otherwise. So I agree that the model where it's like, yeah, if you're doing something weird like floats, then you kinda have to pay the price for that. Right? It's just inherent. But, like, I agree that, like, you know, other people maybe should just, like, you know, not have to do that.

Bryan Cantrill: 01:04:50

Wait. I just I love that also in general because it just says that you can have look. We're not this is you can use Daft and have different ways of diffing structures. Just have those be leaves, then you can do whatever you want within those leaves.

Rain Paharia: 01:05:02

Yeah. So, you know, a practical example is that we implement this for BTreeMaps, sets and maps and so on, because with sets and maps, there's really just one way to diff them. But we don't implement it for vectors, because a vector can be diffed in many ways, right? Maybe you only care about equality, right, if it's a Vec<u8>, right? Maybe it's a sorted vector and you wanna treat it as a set or whatever.

Rain Paharia: 01:05:30

So, you know, we don't we we don't try to be opinionated in that case, and we just, like, let it be a leaf.
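To illustrate why a vector is left as a leaf, here is a small sketch with two different strategies applied to the same kind of leaf. The `Leaf` struct and both helper functions are hypothetical and written only for this example.

```rust
use std::collections::BTreeSet;

/// A leaf over two slices; nothing has been compared yet.
pub struct Leaf<T> {
    pub before: T,
    pub after: T,
}

/// Strategy 1: for raw bytes, equality may be all the caller cares about.
pub fn bytes_changed(leaf: Leaf<&[u8]>) -> bool {
    leaf.before != leaf.after
}

/// Strategy 2: treat the vector as a set and report (added, removed).
pub fn set_changes<'a, T: Ord>(leaf: Leaf<&'a [T]>) -> (Vec<&'a T>, Vec<&'a T>) {
    let before: BTreeSet<&T> = leaf.before.iter().collect();
    let after: BTreeSet<&T> = leaf.after.iter().collect();
    let added = after.difference(&before).copied().collect();
    let removed = before.difference(&after).copied().collect();
    (added, removed)
}
```

Neither strategy is more correct than the other, which is the argument for the library not picking one.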

Bryan Cantrill: 01:05:36

Yeah. Interesting. And I mean, I assume that a lot of what we from a practical getting back to the blueprints perspective, I mean, sets and maps cover up. I mean, that's a that's a bunch of it, I assume.

Rain Paharia: 01:05:48

Yeah. Yeah. There is a I mean, there is just so much here that is kind of driven by a very concrete and very nontrivial use case. And for me, I think that is just really important as a general principle, right? When you're designing something generic, it's kind of important to not get lost and be grounded in real specifics, right, about a concrete use case that you have, and then you kind of work from there rather than just like designing something, you know, in in an ivory tower or whatever.

Bryan Cantrill: 01:06:23

Well, and that's, like, the trick of designing something generic: you want that specific use case, but you don't wanna kind of overfit for that use case either. You don't wanna have something that is falsely generic and only really fits the one use case. And I think it's part of the reason why, as software engineers, there's this kind of constant struggle of, am I hitting the right level of genericism here? And I think in this case, it honestly helped because we had something that provided some level of genericism, but then was instructive about how, okay, actually this is not what we want. We want something that is, in some ways, it's like, Diffus was almost too generic.

Rain Paharia: 01:07:05

Yeah.

Bryan Cantrill: 01:07:05

Or in that it didn't leave this kind of additional latitude that the programmer needs around leaves.

Rain Paharia: 01:07:13

Yeah. I mean, I yeah. I I guess I think of diffus as a maybe just like too opinionated.

Bryan Cantrill: 01:07:19

Yeah. I guess yeah.

Rain Paharia: 01:07:21

Yeah, I would I would consider it to be a little too specific in that sense. Right? Like where

John Gallagher: 01:07:26

Okay. Clearly. Yeah.

Rain Paharia: 01:07:26

Yeah. Yeah. Like LCS is not always what you want, for example. Right? Yeah.

Bryan Cantrill: 01:07:33

And then could you talk about the fact that it's a no_std crate, which is definitely an interesting kind of artifact that fell out of this?

Rain Paharia: 01:07:39

Yeah. I mean, you know, this is just, again, like, one of those things where I kind of worked with Diffus for a bit, and then I realized that there is absolutely nothing here that absolutely requires std. I mean, if you have a map, then you kind of need alloc for BTreeMap or std for a HashMap, right? But if you're just diffing structs, these can be arbitrarily nested structs, and if they exist on no_std, then we can also generate diffs on no_std. And I think Rust has this lovely reference system with lifetimes and stuff, and so, you know, we just have one lifetime parameter, but we end up using that.

Rain Paharia: 01:08:24

So there's no clones or anything. It is just, you know, it is, like, it is all, like, it is a diff that just points into various parts of the original data structures. And and, you know, that's that's one of my favorite things about designing Rust software.
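A sketch of what a borrowed, allocation-free struct diff can look like. The `Sled` and `Firmware` types and the hand-written `*Diff` structs are invented for illustration; they approximate the kind of code a derive macro could generate and are not Daft's actual output. Nothing here uses anything outside `core`.

```rust
pub struct Leaf<T> {
    pub before: T,
    pub after: T,
}

// Hypothetical nested input types.
pub struct Firmware {
    pub major: u16,
    pub minor: u16,
}

pub struct Sled {
    pub serial: u64,
    pub firmware: Firmware,
}

// Diff types with a single lifetime parameter; every field is a pair of
// references into the original structs.
pub struct FirmwareDiff<'a> {
    pub major: Leaf<&'a u16>,
    pub minor: Leaf<&'a u16>,
}

pub struct SledDiff<'a> {
    pub serial: Leaf<&'a u64>,
    pub firmware: FirmwareDiff<'a>,
}

pub fn diff_firmware<'a>(before: &'a Firmware, after: &'a Firmware) -> FirmwareDiff<'a> {
    FirmwareDiff {
        major: Leaf { before: &before.major, after: &after.major },
        minor: Leaf { before: &before.minor, after: &after.minor },
    }
}

pub fn diff_sled<'a>(before: &'a Sled, after: &'a Sled) -> SledDiff<'a> {
    SledDiff {
        serial: Leaf { before: &before.serial, after: &after.serial },
        // Nested structs recurse eagerly; there is still no cloning.
        firmware: diff_firmware(&before.firmware, &after.firmware),
    }
}
```

Because the diff only stores references, it cannot outlive the two values it was built from, and the borrow checker enforces that.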

Bryan Cantrill: 01:08:38

So this this is really cool. This wasn't necessarily a goal. Yeah. Right? This is good because we're not.

Bryan Cantrill: 01:08:45

But, you know, we definitely have got a lot of no_std software. I mean, that's obviously all of Hubris is no_std. All of our kind of embedded firmware is no_std. So, boy, is it a nice attribute to have when you can have it. I mean, and because I think it's even, I don't know, like, it sounds like this is your personal disposition as well.

Bryan Cantrill: 01:09:04

It's like, when a crate is no_std, it just tells you a lot about, I mean, that it does not perform any allocation. There are all sorts of things I feel I know about that crate because it's lived in this more constrained world.

Rain Paharia: 01:09:19

Yeah. Yeah. And in fact, you know, there's kind of no_std and no alloc. And I think, you know, we kind of hit both no_std and no alloc. So, you know, we work with just core, right, which I think is really cool, and that does show that there are no allocations.

Rain Paharia: 01:09:36

For me, I think, you know, so I guess if I wasn't working at Oxide, I would not care as much about no_std, I will be honest, right? Oxide is the first time I've been dealing with no_std. It's great! You know, you learn things.

Bryan Cantrill: 01:09:52

Listen, I appreciate your candor, but I find it heartbreaking.

Rain Paharia: 01:09:56

But I agree that once you have accepted the constraint of no_std for your core library, there are certain decisions that you end up making, and a lot of those decisions often lead to better library design. Not doing any clones, for example, that is just core to the library. And so, like, there's a lot of Rust support for that kind of model as well, which, you know, is great. And so, I agree that there's definitely a synergistic kind of thing there.

Bryan Cantrill: 01:10:38

Well, yeah. And I I I just think that there is a a kind of like a, you know, a stoicism in the in the classical sense of the stoics when you are this kind of like deprivation that you force yourself when you have when you get no standard. I think there's there's value that comes, you know, maybe I I I may or should not I'm I'm not gonna pull that metaphor too hard, but the I think it's great that it and because it wasn't a designed goal, but it would and it's a I mean, and especially given that you've now revealed the fact that it is only your affiliation with Oxide by which you care about no standard crates. I was was this just because the Oxide performance review season is coming up and you're trying to I'm I'm kidding.

John Gallagher: 01:11:16

It's a joke.

Bryan Cantrill: 01:11:17

That's a joke. But when was it through here that you realized, like, oh, wait a minute. Like, we should just make this no_std. Like, there's no reason for it to require std.

Rain Paharia: 01:11:27

I think it was actually the moment where I realized that we could slap leaf, like the daft leaf annotation, on maps and so on, if you want to make them unique.

Bryan Cantrill: 01:11:43

Yeah, interesting.

Rain Paharia: 01:11:44

That is the point I was like, oh, you could just treat anything as a leaf. And that kind of led to, Oh, leaves don't allocate, they just have references. Then that led to kind of this.

Andrew Stone: 01:11:57

Rain's, like, underselling it. So I said it was my first proc macro, but there was also a lot of funky things inside there, and getting the lifetimes right in the proc macro. I made my lifetimes invariant, and so it wasn't causing a big problem to use it, but I eventually did hit a problem. And I hit that problem after Rain told me that I was going to run into a problem. Lifetimes need to be covariant. And I was like, oh.

Andrew Stone: 01:12:26

And by that point, Rain had already started working on the fix for that. And so it all came together at about the right time and was able to resolve

Bryan Cantrill: 01:12:33

the problem you hit. Yeah. So that

Andrew Stone: 01:12:36

Oh, I don't remember. I mean, it's essentially adding more lifetime annotations, and you have to add another annotation that says this lifetime lives at least as long as this lifetime. And, like, then adding extra lifetimes on all the, like, structures that follow. Like, you just start infecting things with, like, extra lifetimes. Totally.
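A small sketch of the variance point, under the assumption that the diff type only holds shared references. `BorrowedDiff`, `shorten`, and `pick` are hypothetical names for this example, and this is not a reconstruction of the actual fix that went into Daft. The `'long: 'short` bound is the "this lifetime lives at least as long as that one" annotation.

```rust
/// Holds only shared references, so it is covariant in `'a`: a value
/// borrowed for a long lifetime can be used where a shorter one is expected.
pub struct BorrowedDiff<'a> {
    pub before: &'a str,
    pub after: &'a str,
}

/// `'long: 'short` reads "'long outlives 'short". Covariance is what lets
/// the body simply return `d`.
pub fn shorten<'short, 'long: 'short>(d: BorrowedDiff<'long>) -> BorrowedDiff<'short> {
    d
}

/// Both arguments must share one lifetime; covariance lets the compiler
/// shrink the longer-lived one to fit.
pub fn pick<'a>(x: BorrowedDiff<'a>, y: BorrowedDiff<'a>) -> BorrowedDiff<'a> {
    if x.before.len() >= y.before.len() { x } else { y }
}

// If a field were `std::cell::Cell<&'a str>` instead of `&'a str`, the
// struct would be invariant in `'a`, and `shorten` would no longer compile.
```

With an invariant type, callers end up threading additional lifetime parameters and outlives bounds through every structure that holds the diff, which matches the "infecting things with extra lifetimes" experience described above.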

Bryan Cantrill: 01:13:00

Right.

Andrew Stone: 01:13:00

And so yeah. And, like, I didn't know. I think I've tried, like, a couple times to read Rain's blog post on invariance and covariance and stuff, like, variance in Rust. It's actually a little, like, markdown book, and I have read it. But, like, I never had a problem where I needed to apply that directly.

Andrew Stone: 01:13:21

And so I finally went back and reread it and I was like, all right, this makes sense now. So, like, it's another thing, like having a problem where you're applying it, like it wasn't abstract at that point, it was like, no, this is a real problem. And so, like, Rain fixed that. And it was a good amount of, like, stuff to fix, along with a bunch of other stuff. Like, I also originally didn't give the user as much choice. For instance, for sets, in order to not have to force the element types to implement certain things, I just put them in Vecs.

Andrew Stone: 01:14:00

So when you diff sets, you got, like, Vecs instead of sets back, right? And that was kinda wonky. Like for maps, you got maps back, either a BTreeMap or HashMap. For sets, you got Vecs back. And, like, it was kinda weird.

Andrew Stone: 01:14:14

And so Rain came up with the modified and the iterator and stuff that they've talked about. Like, there's a lot of changes, but it's very funny. Like, if you've ever looked at the contributions on GitHub, this one's particularly funny because I started it and you see the point where I think I'm done. And then, at that point, Rain has, like, touched it a little bit and, like, added some CI stuff and, like, a couple minor things. And then it goes on for another, like, week.

Andrew Stone: 01:14:40

And now Rain has written, I think, like, twice as much code as I have in Daft. So I was not done, it turns out.

Bryan Cantrill: 01:14:47

Yeah. Right.

Rain Paharia: 01:14:48

I just wanted to say, the Git contributions are a little bit of a lie because most of that is, like, fixtures I added for testing.

Andrew Stone: 01:14:57

But you can do it. Like, users will also, it's also proper errors. Like if there was a problem with, like, the Daft macro, who knows what you would have gotten before. Like, I don't know, go look at it and then, like, read the code and figure out what went wrong, because that's what I was doing. But Rain made it return, like, proper errors and stuff, and none of that stuff existed and I didn't know how to do it. So it would not have existed.

Andrew Stone: 01:15:20

I was also too lazy.

Bryan Cantrill: 01:15:23

Rain, it must have been fun to kind of, like, take this thing and, like, okay, we can now get this. You know, especially when you've got some of these things falling out, like no_std, and, I mean, you say you were kind of energized by some of Andrew's original abstractions, and just getting that totally nailed into something that, like, feels right in a lot of ways.

Rain Paharia: 01:15:43

Yeah. Yeah. So, know, just just the I think just the it just gave me so much energy. So I I know we talked about proc macros earlier and I have like a long I don't know. I have very I go with like very, very hot and cold about proc macros.

Rain Paharia: 01:15:59

In particular, error handling with proc macros is absurdly difficult to do correctly.

Bryan Cantrill: 01:16:05

Yeah.

Rain Paharia: 01:16:06

And keep in mind that error handling is not just for rustc, it is also for rust-analyzer. And it is very important when you're writing a proc macro that it behaves well with rust-analyzer.

Bryan Cantrill: 01:16:18

Okay. So, Rain, let's elaborate on this stuff a little bit. First of all, when you say error handling, you mean programmer error, in terms of, like, the programmer has an error in the way that it's been invoked, or?

Rain Paharia: 01:16:31

Yeah. So I guess there's two kinds of errors, and we're talking here about programmer errors, right? So there's bugs in the macro itself, and we will set those aside. The macro is perfect.

John Gallagher: 01:16:47

Getting better every day.

Rain Paharia: 01:16:48

Yeah. So syntactic errors and semantic errors, right? And both of those things are kind of important, and both of those things need to be handled with a lot of care when you're writing proc macros. So, you know, it's funny we're talking about the eager and lazy thing. Actually, one of the things I'm pretty unhappy about with syn, and to be clear, I love syn, but one of the things I'm unhappy about is that it is an eager parser, right?

Rain Paharia: 01:17:20

It will, like, parse everything. And if anything is invalid, it will just fail at the first error, I think. Right? Or maybe at the first two errors. And so, you know, that's fine for rustc, but it makes rust-analyzer deeply unhappy.

Rain Paharia: 01:17:40

So whenever there's a syntax error, go on.

Adam Leventhal: 01:17:42

Yeah, sorry. I just mean it's sort of not as fine for rustc, I think, right, because I was going to draw this distinction where syn will give up immediately. rustc does a great job of sort of trying its best to then give you some information about what went wrong. Whereas syn, it's just, like, unhappy with the inputs. As you say, rustc, I actually think, does a little bit better, but rust-analyzer does even a step better than that.

Rain Paharia: 01:18:10

Yeah. Yeah. I think, you know, that is just one of those things: the eagerness is just baked into syn's design. There are actually other ways to do parsers, and I've been reading up a little bit.

Rain Paharia: 01:18:22

There's an interesting way to do parsers where people kind of essentially have this homogeneous set of nodes, and then you kind of say that this node parses like and so on. There's some interesting work there. But as it stands, you know, I think you have to be really careful with syn about that. So that is kind of syntactic errors. I've written a bunch of stuff around that.

Rain Paharia: 01:18:49

So the place where I ran into this was Dropshot, which is our REST HTTP server. So last year, I added a macro which lets you generate an API from a trait. That was a whole adventure. That also I did on a weekend, just like Andrew did this, and then he wrote it on Monday, and everyone was like, this is amazing. But, you know, one of the things I realized while actually turning that demo into something real was that the rust-analyzer error handling was just kind of pretty bad if you just take the standard path with syn.

Rain Paharia: 01:19:27

Yeah, I think the even worse bit, though, is when there are semantic errors. One of the things that's just, like, extraordinarily important is that you highlight the actual part of the code that was wrong. This is a thing that in Rust proc macros you do via, like, this whole span system.

Bryan Cantrill: 01:19:51

Yeah, yeah,

Rain Paharia: 01:19:52

yeah. And so you are supposed to carry around span information whenever you're writing anything around proc macros. And it's like so you're not just writing what you want to do, but you're also carrying around exactly which bytes of the file that this kind of identifier comes from and so on. Tracking all that span information is like, it's actually like really annoying. Like, it's actually really difficult.

Rain Paharia: 01:20:19

And so, you know, that was the other bit that I ended up kind of working on with Daft a bit, just making sure all the spans were right, and then also writing a lot of tests to make sure that we didn't regress.

Bryan Cantrill: 01:20:32

And we just need echo Sorry, Adam, go ahead.

Adam Leventhal: 01:20:36

If I can just echo that point, Rain, you were making, it's one of the things we did in Dropshot earlier. I mean, one of the benefits of Rust is that when you labor to have a really precise span to point out where specifically the error was from, rustc does a really good job

Bryan Cantrill: 01:20:51

of Yeah.

Adam Leventhal: 01:20:52

Guiding the user. So it rewards that effort. And then, Rain, there was that new annotation too, right, that diagnostic annotation. Is that right?

Rain Paharia: 01:21:00

Yes. Yes.

Adam Leventhal: 01:21:01

That can further guide users if you are, say, missing a trait implementation, you need it, and it gives you additional description. But so that's something in Dropshot. I would kind of say that when proc macros drive off the rails, the error messages you get by default are completely inscrutable. We worked really hard not just to have spans that are appropriate, but even to generate code that forced errors with greater levels of comprehensibility. But to your point, I think that a kind of novice approach to proc macros, or kind of the first tier of proc macros, is to just write the code, emit the code.

Adam Leventhal: 01:21:44

It's great when it works and terrible when it doesn't. But there's, like, 10 times as much code, 10 times more code rather that you could write to have really precise and helpful errors.

Rain Paharia: 01:21:56

Yeah. And and, you know, I feel like this actually mirrors compilers in general, where I think there's, like, compiler errors, I guess that's that's what they go by. But, like, I think one of the things I pointed out is that you like, a good a production compiler is, like, 80% error handling. And Yeah. You know, a production proc macro is, like, it's maybe not 80%, but it's maybe like 50% error handling.

Rain Paharia: 01:22:25

It's just hard. Users can type in whatever they want, and you kind of have to behave, like, at least somewhat reasonably.

Bryan Cantrill: 01:22:32

And when you say semantic errors, what would be an example of a semantic error when using Daft?

Rain Paharia: 01:22:41

A semantic error might be something like, for example, so you try and diff a struct, right, and the struct has a particular field. That field does not implement the Diffable trait, right? Right. And so when there is an error, you want to actually annotate the field, saying that that doesn't implement Diffable, rather than just, you know, pointing to the macro invocation itself. So that is an example of a semantic error.

Rain Paharia: 01:23:15

So that's not a syntax error. That is just a, it is resolved after proc macros are generated during the type check.

Bryan Cantrill: 01:23:27

Right, got it. Okay, and you wanna be able to highlight that for the programmer for rustc, but then also you've got this rust-analyzer piece to it.

Rain Paharia: 01:23:33

Yep, yep. All of those depend on doing the right thing here, and doing the right thing here is hard. Yeah.

Adam Leventhal: 01:23:40

And just on that rust-analyzer part, I've come to appreciate, Rain, and I'd love to get your take on this, that the best, kind of most complete proc macro would actually generate lots and lots of code even in a failure case. It would generate compiler errors as well that get emitted, but it would also just try its best to generate all the code it could, to minimize the impact to something like rust-analyzer. Because when your proc macro fails and generates nothing, then rust-analyzer gets very sad. Right? Like, all of a sudden, all kinds of far-flung parts of your code are no longer valid because, say, you haven't implemented the trait after all.

Adam Leventhal: 01:24:21

So even generating a trait implementation whose implementation is a todo may help rust-analyzer.

Rain Paharia: 01:24:29

Yeah. That is very true. I think with rust-analyzer in particular, it's like, you know, even if there is a syntax or semantic error, you really expect that you can control-click on things. You really expect, you know, find references, like, all of those things to work. And with a failing proc macro, you really have to care a lot about that.

Rain Paharia: 01:24:53

And the thing is that usually when people write Rust code, they may be returning a Result type, right? And the Result type has an Ok variant and an Err variant. And it turns out that that is actually exactly the wrong model here, because what you want to do is you want to collect all of the errors. You don't just want to bail on the first error. And you also want to generate as much code as possible.

Rain Paharia: 01:25:19

So you actually want to retain as much as possible rather than just giving up on the first error and generating nothing. That's a model that is completely different from the usual way you write Rust, and it's closer to the way you write compilers.
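The contrast between the two models can be sketched without any proc-macro machinery. Everything below is hypothetical: `Field`, the `leaf` attribute check, and the generated strings are stand-ins for real token streams, and no `syn` or `proc_macro` APIs are used or implied.

```rust
/// A parsed field, reduced to plain strings for this sketch.
pub struct Field {
    pub name: &'static str,
    pub ty: &'static str,
    /// Optional attribute written on the field, e.g. `leaf`.
    pub attr: Option<&'static str>,
}

fn check_attr(field: &Field) -> Result<(), String> {
    match field.attr {
        None | Some("leaf") => Ok(()),
        Some(other) => Err(format!(
            "field `{}`: unknown attribute `{}`",
            field.name, other
        )),
    }
}

fn render(field: &Field) -> String {
    format!("pub {}: Leaf<&'a {}>,", field.name, field.ty)
}

/// The usual Rust style: `?` returns on the first problem and nothing
/// is generated.
pub fn expand_bail(fields: &[Field]) -> Result<Vec<String>, String> {
    let mut out = Vec::new();
    for f in fields {
        check_attr(f)?;
        out.push(render(f));
    }
    Ok(out)
}

/// The compiler style: keep every error and every piece of output.
pub struct Expansion {
    pub generated: Vec<String>,
    pub errors: Vec<String>,
}

pub fn expand_collect(fields: &[Field]) -> Expansion {
    let mut exp = Expansion { generated: Vec::new(), errors: Vec::new() };
    for f in fields {
        if let Err(e) = check_attr(f) {
            exp.errors.push(e);
        }
        // Still emit a best-effort line so later code keeps resolving.
        exp.generated.push(render(f));
    }
    exp
}
```

In a real macro, the collected errors would be emitted alongside the best-effort output, so the user sees every problem at once and tooling still has definitions to navigate.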

Bryan Cantrill: 01:25:37

Right. Well, because when you're writing a proc macro, like, you are writing part of the compiler. Yeah. That's true. You've kind of crossed the Rubicon and you're no longer writing the system.

Bryan Cantrill: 01:25:48

You're writing the compiler.

Rain Paharia: 01:25:50

Yeah.

Bryan Cantrill: 01:25:50

So we we we get the point where with, you know, Rain, you're able to to kinda take this thing and and and I think extend it a little bit, polish it in some important dimensions. But then we get to this artifact that is like, actually, this is that now we've got something that feels like we can really begin to use in a bunch of different ways. And, John, I don't think it was too long before you found a new use for this thing.

John Gallagher: 01:26:16

Yeah. I don't know that new use is the way I'd describe it so much as, like, earlier we were talking about how you start down a path and you realize pretty quickly that this is the right way to go. And that was sort of the experience I had. Like, Andrew had landed this thing with Daft and then I had to make this change. I was like, oh yeah, Daft was absolutely the right way to go.

John Gallagher: 01:26:32

So I think what happened was I was the first person that had to make a nontrivial structural change to the blueprints after this Daft code had landed. Oh, interesting. So before I started in on what I was working on, the blueprints had four different maps that were all keyed by sled identifiers for different parts of their configuration. So when blueprints were first created, all that they had in them was zones. There was a map of: this sled has this set of zones.

John Gallagher: 01:26:59

As new things were added, we added new top-level maps to the blueprint. So here's a different map that says which disks are supposed to be active on each sled. And here's a different map that says which datasets are supposed to exist on each sled. And, like, I think at the time we thought that putting those in separate maps was reasonable. In hindsight, it was definitely a mistake, because you end up with these four maps that are all keyed by sled and there's no static requirement that their keys are the same.

John Gallagher: 01:27:27

Like you might have a sled present in one of the maps and not present in another. And in fact, that was true because each map sort of made different choices about when it was okay to drop a sled out based on when it was decommissioned or removed.

Bryan Cantrill: 01:27:37

Oh, yeah.

John Gallagher: 01:27:37

So it just became this sort of kind of a mess to reason about. So the change I was trying to make, or the change I did make, was to merge those maps down into a single map. So it's now just, like, a sled configuration. So we have one map, and then inside of the values for that map, we have the zones, disks, datasets, etcetera for each sled.
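The restructuring described here can be sketched as follows. The type and field names (`SledId`, `SledConfig`, `merge_by_sled`) are hypothetical and much simpler than the real blueprint types, and only three of the maps are shown.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical identifier type for this sketch.
pub type SledId = u32;

/// Everything known about one sled, gathered in a single value.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct SledConfig {
    pub zones: BTreeSet<String>,
    pub disks: BTreeSet<String>,
    pub datasets: BTreeSet<String>,
}

/// Merge separately keyed maps into one map keyed by sled. A sled that
/// is missing from one input simply ends up with an empty set there.
pub fn merge_by_sled(
    zones: BTreeMap<SledId, BTreeSet<String>>,
    disks: BTreeMap<SledId, BTreeSet<String>>,
    datasets: BTreeMap<SledId, BTreeSet<String>>,
) -> BTreeMap<SledId, SledConfig> {
    let mut out: BTreeMap<SledId, SledConfig> = BTreeMap::new();
    for (id, z) in zones {
        out.entry(id).or_default().zones = z;
    }
    for (id, d) in disks {
        out.entry(id).or_default().disks = d;
    }
    for (id, d) in datasets {
        out.entry(id).or_default().datasets = d;
    }
    out
}
```

With a single map there is one key set, so a diff of two blueprints has exactly one notion of a sled being added, removed, or present in both.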

John Gallagher: 01:27:57

So in landing the Daft code, Andrew had to do a bunch of workarounds, is maybe the right way to put it, to account for these maps that might have different content. So, like, if you're trying to diff one blueprint to the next and a sled is present in the disks map in one and not present in the disks map in the second, but it is present in the zones map for both, you can imagine that being confusing for the diff output, right? There was all this work of saying, like, if it's present in the zones map on both, then include the sled in the diff, even if it's not present in any of the other maps, etcetera, right? So I did the sort of structural work to land the map. I didn't have to touch any of the diffing code to start.

John Gallagher: 01:28:41

But as I was going through it, you'd have to look at the commit history on the PR, but it's like, I did this work, the tests are all passing. And then there's this TODO about all this manual workaround, where essentially Andrew had to eagerly do some extra diffing, like some of this recursing into leaves that we talked about earlier, in order to account for discrepancies between these maps. So as I made my first pass through it, I left this TODO to come back to later. I was like, maybe we can just get rid of all of this, actually. Now that these maps are merged, we may not need to evaluate any of this at all.

John Gallagher: 01:29:13

And so after getting the thing, you know, mostly working, I went back to this and started to delete these eager recursions, and that required updating all of the tests, because they're basically the only user of this thing, right? There's OMDB, which is the production use, and that essentially just emits human-readable output. But for every line of code in OMDB related to diffs, we have 30 lines of code in tests scattered across the code base. Like, the planner has tests, the executor has tests, the builder has tests to say, if I make this change, what changes in the diff? What changes from one blueprint to the next? And in tests, you're not looking at the human-readable output.

John Gallagher: 01:29:51

You're trying to say, I tried to change the configuration on this sled. If I look at the diff, does it show that that configuration actually changed in the way that I thought it should have? So I'm removing all of these eagerly diffed things that we used to have to build ahead of time, and I'm going back and replacing them with Daft code. I'm like, okay, so now the top-level thing is just a sleds map. So the thing I get from Daft is just a leaf node that says, for any given sled, I have a before config and an after config.

John Gallagher: 01:30:24

And I go into the test and I start reading it, like, this test doesn't care about the top-level sled config. It cares about what the changes were inside the zones. So how am I gonna do that? And I landed on, like, we alluded to this earlier, but this is my favorite function in Daft. So I'll put this in the text chat, but it's a method on Leaf called diff_pair.

John Gallagher: 01:30:45

So if you look at the Leaf definition, as we said earlier, it's just a before and after with no requirements on the types. They don't have to implement Eq, nothing. It can be any type at all. But diff_pair is implemented if that type is itself diffable, and then it just runs Daft again on that value and gives you a diff of the before and after values. So in the case of these tests, right?

John Gallagher: 01:31:10

The test was expecting that a sled was modified, and then it wanted to look at how the zone configuration, for example, was modified. So I can grab the leaf that says the sled was modified. Here's the before and after sled configuration. Then I call diff_pair on that, and it recurses into the sled and says, here's how the zones were modified. Here's how the disks were modified.

John Gallagher: 01:31:30

Here's how the datasets were modified. In the tests, basically you add this method call and change a couple of field names. And it works just like it did before, when it was looking at all of this manually compiled sort of mess that Andrew had to put together to land this stuff in the first place.
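The pattern described in this stretch can be sketched as follows. This is a simplified, self-contained imitation and does not use the daft crate itself: the Diffable trait, the Leaf struct, SledConfig, SledConfigDiff, and modified_sleds below are stand-ins written for illustration, and the real crate's definitions may differ in detail. The one piece taken directly from the conversation is the body of diff_pair, which is just the before value diffed against the after value.

```rust
use std::collections::BTreeMap;

// Stand-in for a "diffable" trait (an assumption, not the crate's definition).
pub trait Diffable {
    type Diff<'a>
    where
        Self: 'a;
    fn diff<'a>(&'a self, other: &'a Self) -> Self::Diff<'a>;
}

// A leaf is only a before/after pair, with no requirements on T.
#[derive(Debug, PartialEq)]
pub struct Leaf<T> {
    pub before: T,
    pub after: T,
}

// Only when T happens to be diffable can a leaf be recursed into lazily.
impl<'a, T: Diffable> Leaf<&'a T> {
    #[inline]
    pub fn diff_pair(self) -> T::Diff<'a> {
        self.before.diff(self.after)
    }
}

// Hypothetical per-sled configuration.
#[derive(Debug, PartialEq)]
pub struct SledConfig {
    pub zones: Vec<String>,
    pub disks: Vec<String>,
}

// Hypothetical diff type: one leaf per field.
pub struct SledConfigDiff<'a> {
    pub zones: Leaf<&'a Vec<String>>,
    pub disks: Leaf<&'a Vec<String>>,
}

impl Diffable for SledConfig {
    type Diff<'a> = SledConfigDiff<'a>;
    fn diff<'a>(&'a self, other: &'a Self) -> Self::Diff<'a> {
        SledConfigDiff {
            zones: Leaf { before: &self.zones, after: &other.zones },
            disks: Leaf { before: &self.disks, after: &other.disks },
        }
    }
}

// Top-level map diff, reduced to the "modified" part: sleds present on both
// sides whose configs differ. Each value is an unexpanded leaf.
pub fn modified_sleds<'a>(
    before: &'a BTreeMap<u32, SledConfig>,
    after: &'a BTreeMap<u32, SledConfig>,
) -> BTreeMap<u32, Leaf<&'a SledConfig>> {
    before
        .iter()
        .filter_map(|(id, b)| {
            let a = after.get(id)?;
            (a != b).then(|| (*id, Leaf { before: b, after: a }))
        })
        .collect()
}
```

A test that only cares how many sleds changed can stop at modified_sleds(..).len(). A test that cares about zones takes one leaf, calls diff_pair() on it, and inspects the zones field of the result, leaving disks untouched.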

Bryan Cantrill: 01:31:49

Yeah. Interesting. So this is a real vindication of what Andrew is calling a lazy approach. I mean, I guess as programmers, we view laziness as one of the, I mean, what are the three, I guess

Andrew Stone: 01:32:03

the Yeah.

Bryan Cantrill: 01:32:03

Is it Larry Wall? Laziness, impatience, and hubris, are they

John Gallagher: 01:32:08

that are the

Andrew Stone: 01:32:09

Great virtues.

Bryan Cantrill: 01:32:11

The three virtues, right, of software engineering. And it's that laziness, right, John, I mean, this is a real embodiment of what one of the design parameters was for Andrew. Like, okay, now we're actually seeing the code that can be ripped out, because we actually only care about this bit. And then I wanna dive deep on this bit and ignore these other things where I don't care whether they changed or not.

John Gallagher: 01:32:40

Yeah, that's right. There are a bunch of tests that just care about how many sleds were modified. And in those cases you just look at the map diff you get back from Daft. But if you do care about how the sleds were modified, you can then choose to recurse into the particular fields that you care about. I don't know, Rain, if you want to talk about where the idea of diff_pair came from. But I feel like it's one of these things where it's like, oh, we can be generic in this way.

John Gallagher: 01:33:07

But also if you happen to also be diffable, then we can give you everything you could possibly want via this one method.

Rain Paharia: 01:33:15

Yeah. Yeah. So, you know, for me, if you click through and look at the source for diff_pair, it is simply self.before.diff(self.after).

Rain Paharia: 01:33:26

It is something you could do yourself, right? But the thing I really wanted to do is hint that this is a thing you can do, right? There have been a couple of other times where I've seen APIs, or added APIs, which you can do yourself, but it is not obvious.

Bryan Cantrill: 01:33:49

Yeah. I love that. I love that.

Rain Paharia: 01:33:52

And so diff_pair is just that. It is just a one-line function, right, that you can implement yourself. Like, just giving you a hint that, oh, this is a thing you can do is, at least for me, a moment of enlightenment where it's like, yeah.

Bryan Cantrill: 01:34:05

The fact that you also marked it as inline too, to be like, look, this is basically an alias for this thing, but it kind of teaches you something about the way this thing is implemented.

Rain Paharia: 01:34:15

Right.

Bryan Cantrill: 01:34:16

By giving, yeah, I, I really like that. That's, that's really interesting.

John Gallagher: 01:34:20

It does. I mean, this is kind of a minor point, but it is a tremendous improvement in readability because especially in tests,

Rain Paharia: 01:34:26

like you don't, you don't, you tend

John Gallagher: 01:34:28

to just sort of shove a bunch of stuff onto one line, right? Like, you're like blueprint.diff.sleds.modified.blah, blah, blah. So you end up with, like, five or six identifiers all chained together. And then that is what self is in this context, right? So if you wanted to do this yourself, you know, call self.before.diff(self.after), you'd replace self with, like, five chained function calls that you now need to stick into a local, and you've got an extra one.

Bryan Cantrill: 01:34:52

Stick in a local. Yeah, yeah, right. Yeah, to make it readable at all.

John Gallagher: 01:34:56

So having the method, it's not just a hint that you can do this, but it's also like a non trivial readability improvement, especially in tests.

Rain Paharia: 01:35:04

Yeah. Yeah. Yeah. That actually, I think I ended up needing that. When I was, like, using it, I was like, oh, yeah.

Rain Paharia: 01:35:10

I think being able to chain methods in a fluent style would be useful. So, yeah, diff_pair came from that.

Andrew Stone: 01:35:18

I wanna be clear. I think Rain just, like, proposed this to me and I was like, yeah, sure. It sounds fine. Like, I don't know, it's before and after, can't you just do it yourself? So this idea was so far outside.

Andrew Stone: 01:35:29

And then, as soon as it was implemented and I saw John use it, I immediately pinged him when I was reviewing that PR. And I was like, oh, this is awesome. Like, this is really powerful. This is very funny to me, because it is just one line. And I was like, why can't they do this themselves?

Andrew Stone: 01:35:44

Now you see the Frost analyzer

Bryan Cantrill: 01:35:45

as well. And it must have been very vindicating all around, John. When you say it was very validating of the design decisions we've made, like, okay, this is definitely the right direction. And we've got a lot of work still to come on all of this, and this is stuff we're gonna be making a lot of use of, so having this kind of really robust foundation underneath us now just feels like, you know, Rain, when you and I were talking about this, we were talking about Paul Erdős, you know, a mathematician that not only lived to a long age, but was on amphetamines his entire life. He didn't have kids, so he basically just did math his entire life. And he would talk about proofs being, like, in the book.

Bryan Cantrill: 01:36:31

And not to imply that we're advancing math here necessarily, but it feels like this is a crate that's in the book, that we've got something that, for our use case, is really a tight expression of exactly what we needed.

Rain Paharia: 01:36:53

Yeah. And and, you know, I feel like a lot of

Bryan Cantrill: 01:36:55

the

Rain Paharia: 01:36:55

process, again, not to say we're advancing math here, but a lot of the process that we all went through was not dissimilar from a process that a mathematician goes through, right? Where you're kind of poking around the unknown, you're trying to figure things out, and then you suddenly have some clarity, and then, you know, it goes from there, and it just feels right from there.

Bryan Cantrill: 01:37:18

Yeah. And I think, you know, Emily had said this earlier in the chat, I think it's really worth underscoring that, you know, people are talking about how we can use LLMs as part of software engineering. And yes, there are some mechanical parts where it's great to have LLMs assist us, but this is the art of software engineering. This is really the art and the craft and the stuff that comes from the wisdom of software engineering. This is not something that you're gonna get an LLM telling you about. Or, I don't know, maybe you would.

Bryan Cantrill: 01:37:53

I don't know. It'd be interesting, but I think that, you know, where do you draw those lines of abstraction? That is truly what is still required of software engineering. It's still why software engineering is hard, but also delightful: you have so many degrees of freedom, and it is so abstract at some level, but yet very concrete. It's really what makes it interesting.

Rain Paharia: 01:38:23

Yeah. I just feel so much of this just comes from experience and, like, you know, building this judgment and understanding of these things. So much of this is, you just have to keep doing it, you know, for years and years and years, and then, you know, hopefully get better at it. It's just, it's hard.

Rain Paharia: 01:38:41

It's really difficult.

John Gallagher: 01:38:43

Yeah. I remember messaging Andrew as I was putting my PR up. I said, hey, I know that you were struggling with, like, should I still be working on this? How much more should I put into this? If you had any lingering doubts that this was the right way to go, check this out.

John Gallagher: 01:38:54

And it's like, all this manual, error-prone stuff is now gone. And we get to just use the thing where the computer is doing the right thing from the get-go.

Bryan Cantrill: 01:39:03

Yeah, that is great. And I mean, John, great for you to kind of feel that, but then also, Andrew, that must have just been really vindicating, like, okay, yes, this is all

Andrew Stone: 01:39:16

It was amazing, because it was really a struggle. Like, it wasn't a struggle in terms of, like, I'm doing this work. It was a struggle in that I had other outstanding PRs that were waiting on this work. And we hit a couple of bugs in testing a few times where my outstanding PRs would have fixed that. Right?

Andrew Stone: 01:39:36

And it's just like, oh, like, am I really working on the right thing? So it did feel good to see that. And like, now that I'm done, sure. It's the right thing. Yeah, why not?

Bryan Cantrill: 01:39:46

Well, I think this also kind of gives insight into why it can be hard to get total forward visibility into how long software takes, and that, you know, it can take some kind of winding paths, and that you can have these kind of lingering doubts, but then, once you get to that foundation, it's so clearly the right call. And again, we will be using this over and over and over and over again. It's part of what makes it interesting, but it also makes it tough, makes it hard. You know, putting up a PR on Friday and then realizing that, like, shit, that's all wrong. I've got a totally different way of doing it now.

Bryan Cantrill: 01:40:34

Well, this was awesome. And, you know, really there's so much to appreciate here. I also just have to say that I love the collaboration that we've had, with a bunch of different people jumping in on this and kind of picking it up at various moments. And obviously, Andrew, you know, a lot of that kind of fell to you, but then also having, you know, Rain pick up the baton, and then John, having you actually, you know, begin to realize this in terms of the way we actually use this in the real system, which I think is great. It's really inspiring when you get a team that comes together.

John Gallagher: 01:41:16

Yeah. I could not I could not ask for better coworkers.

Rain Paharia: 01:41:19

Yeah. Same.

Bryan Cantrill: 01:41:21

Well, it was a lot of fun. I love when a plan comes together, and in this case, it's the right plan, even though I'm not sure what the plan was other than keep grinding on these abstractions until they feel right. And in this case, we got one that is really terrific. So great stuff, and really excited to see this, again, on our most pressing problem.

Bryan Cantrill: 01:41:48

So really excited to see how we're gonna use it. And the fact that it's no_std, I think, Rain, at some point, you know, we're gonna be using Daft in Hubris.

Rain Paharia: 01:41:58

That was my goal.

Bryan Cantrill: 01:41:59

Yeah. Yeah. There you go. It just feels like because again, this is one of these tools that once you have it, you find new uses for it. So really exciting stuff.

Bryan Cantrill: 01:42:12

Well, thank you all. Really appreciate it. And thanks for walking us through the tale. And of course, this is all open source. Everything we've been talking about is open source, by the way. Of course, I think people know that.

Bryan Cantrill: 01:42:27

But so you can go check out blueprints yourself in Omicron. I also made, and I hope this is gonna be okay with Dave, but I did make RFD 457 and 459 public. So people can go check those out too, if they wanna understand a little bit of the context for what blueprints are, kinda how we got here, and then you can begin to pair that with some of the implementation. But of course, you don't need to know any of that to go use Daft yourself.

Bryan Cantrill: 01:42:53

It's just a crate. It's a no_std crate. You can use it in just about any context. So have at it. Alright.

Bryan Cantrill: 01:43:02

Thank you everybody. And thanks again for joining us, and we'll see you next time.

Creators and Guests

Host

Adam Leventhal

Host

Bryan Cantrill
