Software Verificationpalooza

Speaker 1:

One question I had, like Lollapalooza is still a thing, I guess. I

Speaker 2:

was just, I, as I was reading that, yeah, I was like, is this going to be the best reference?

Speaker 3:

Yeah. Yeah.

Speaker 4:

Yeah. It is a thing. And it is the most mainstream music festival that exists right now, as far as I know. So Oh, really?

Speaker 2:

Way to be mainstream, Bryan.

Speaker 1:

No, mate. Okay. Because I mean, it's definitely a very Gen X phenomenon. But it has continued. Because this was, like, a Jane's Addiction thing way back in the day, and then it was a festival that brought in a bunch of different, eclectic kinds of music.

Speaker 1:

And then, like, the idea of, like, Palooza as a suffix. I kinda went down that rabbit hole.

Speaker 2:

Dude, I was right there with you too, and I was like, is this maybe as durable as -gate as a suffix? Maybe slightly less dirty? Oh, we need a Palooza Gate.

Speaker 1:

Or is it Gatepalooza?

Speaker 2:

There we go.

Speaker 1:

We could go.

Speaker 2:

Why not both, but very different things.

Speaker 1:

Very different things. Yeah. Which would you rather go to? Which do you prefer, Palooza Gate or Gate Palooza? Gate Palooza sounds kind of like, oh, like, a

Speaker 2:

multidimensional... It's like all these categories. Oh, yeah. I think it's like a scandal-ridden administration. It would be

Speaker 3:

this is

Speaker 2:

just the gate palooza.

Speaker 1:

Yeah, it's Gate Palooza, like, many different stages with different scandals on every stage. Like, oh, man. Go to the Pizzagate stage. They absolutely nuked it. That was just crazy over there.

Speaker 1:

Exactly. Crushing it over there. Yeah. So I think I would prefer to go, but then, clearly, Palooza Gate is obviously the scandal that brings down Lollapalooza. I assume.

Speaker 1:

Right? I mean, we agree on this.

Speaker 2:

Kinda be.

Speaker 3:

Got it.

Speaker 1:

Right. And then the other thing, and I'm wondering if you went down this rabbit hole as well: the, like, validation versus verification thing. Did you notice that I kinda, like, changed it up on you as we were talking about this? It went from

Speaker 3:

Who?

Speaker 1:

Yeah. It went from, validation palooza to verification palooza.

Speaker 3:

Mhmm. And

Speaker 2:

I didn't notice that.

Speaker 1:

And, amazingly, this is something that people have strong opinions about. So I don't even know which is which. So, anyway, here we are, at Palooza Gate. I went with verification, Verificationpalooza. So now, I know, we've gone into more detail than anyone wanted to know.

Speaker 1:

We come up with our dumbass title, and it's like, can't you jokers just, like, stick to Simpsons references and be done with it? So we've got Greg and Rain here, and we're talking about software verification, because I think there's a lot here and a lot that different folks have done, certainly at Oxide, but well beyond Oxide as well. And I wonder if maybe we could start, Greg, with you, because a couple of months ago, you were demonstrating for the team how you used TLA+ to actually do some model verification of some software that you were working on. And the thing that I loved about this is you got there from a concrete issue that you were working on.

Speaker 1:

So do you wanna describe a little bit about the issue and kind of where you got to, and what model checking is, and how it was valuable for you?

Speaker 3:

Yeah. Sure. There's a whole lot to unpack there. I wish it was only a couple of months. Believe it or not, it was all the way back in January that we did the demo that you're thinking of.

Speaker 3:

And it's just been trying to implement all of these things since then. But, yeah. So I have been working for quite a while on various aspects of live migration of instances in our system. And on last week's show, we talked about some of the, well, not we, I wasn't here, but y'all talked about some of the various aspects of what it takes to take a running virtual machine and move it from one virtual machine monitor to another, potentially on a different machine.

Speaker 3:

So there's all the mechanical pieces of that, and I've worked on some of those. But lately, I've been focused a lot on the orchestration process for this. So given that we want to move a running VM from one compute sled to another, how do we get our control plane to coordinate all of that and stand up the appropriate VMM processes and get everything to invoke the right things at the right times and monitor all the state and everything like that that needs to be monitored in order to get one of these things moved over? So the process for doing this gets to be a pretty gnarly distributed systems problem in relatively short order. You've got two separate virtual machine monitors running around as part of your migration process.

Speaker 3:

Those are both reporting status to two separate processes that we call the sled agents. These are the processes that run one per compute sled and coordinate all of the activities on the sled. Those are reporting status back up to Nexus, which is a separate application that does all of the main control plane business logic and hosts all of the external Oxide API endpoints that you go and invoke. So you've got all these different things talking to each other. Nexus is off talking to a database to which it writes and from which it reads the state of individual instances, so that it can coordinate the various things that are happening and respond to API requests correctly.

Speaker 3:

So you get all these different actors running around. They can crash at various times. They can go away at various times. Their responses to each other could be delayed by arbitrary amounts of time. Just, you know, all kinds of interesting concurrency in there.

Speaker 3:

So when I started designing the live migration procedure that the control plane runs, I did one draft, and it was a pretty shallow draft. Just, here's generally what the steps look like. Okay. That looks pretty good. So then it is

Speaker 1:

And the second draft is, like, in your notebook? Okay. Right.

Speaker 3:

It's in my... not yet. Yeah. Yeah. So this was the draft of the design document that turned into one of our Requests for Discussion. So that for the

Speaker 1:

So you're not yet writing code. You're just kinda writing down. Okay. Got it. Yep.

Speaker 3:

Yep. That's right. Yeah. So just trying to get all of this designed on paper so that, you know, I could put this into a Request for Discussion and have folks at Oxide review it, because it's a lot cheaper, of course, to deal with design issues much earlier in the design process than, hey, look.

Speaker 3:

I implemented this thing, and it's 5,000 lines of code, and none of it's right, and the algorithm is fundamentally broken, and that sort of thing. Go ahead, Bryan.

Speaker 1:

Well, so this is kind of interesting, because, I mean, I think that there are some things that you really need to just, like, get in and prototype to really iterate on. And then there are other things where you're just gonna get into trouble: when you get this distributed system and you've got a lot of pieces moving and a lot of possible failure modes, getting in there and, quote, just getting something working sometimes sends you off in the wrong direction. You really do need to, I think, sit down in your notebook and be like, what actually are the steps that need to happen, and what can go wrong in each of these steps? And you're almost disturbed by a prototype sometimes.

Speaker 3:

Yeah. I think that can be true in some cases. And I think that for the sort of system that we're talking about here, too, there's also just a lot of things that we knew upfront from other operations that we had written and other code that we'd done; we knew that there were certain requirements that we were gonna have to meet. So this gets into things like, we use a scheme called distributed sagas to do certain long-running operations in the control plane. And the idea is that you take a complex task and you break it into individual units that can be done idempotently and that can be undone.

Speaker 3:

And that allows you to set things up in such a way that you can guarantee that your entire task either completes or gets undone and the prior state gets restored, even in the face of your executor crashing in the middle of what it is that you're trying to do. So we have a bunch of requirements around that that we knew we were gonna have to uphold. And I wanted to be able to go through and say, yeah, every step of this migration procedure is going to uphold all of these properties that I know we're gonna have to uphold, because, like, when I was writing this, a few weeks earlier there had been a control plane meeting where we'd sat down and gone, yeah, we really gotta be careful about this and make sure that this stuff's getting verified correctly for all of the things that we're writing.
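For a concrete picture of the saga pattern Greg is describing, here is a minimal Rust sketch. It is only an illustration of the idea: the trait and names are hypothetical, this is not Oxide's actual saga machinery, and the persistence that lets a real saga resume after an executor crash is omitted.

```rust
// Hypothetical sketch of the distributed-saga idea: a task is a sequence of
// steps, each idempotent and undoable, so the task either completes or is
// fully unwound. (Real sagas also persist progress so they can resume after
// the executor crashes; that part is left out here.)
trait SagaStep {
    /// Perform the step. Must be idempotent: safe to replay after a crash.
    fn execute(&self) -> Result<(), String>;
    /// Compensating action that restores the prior state.
    fn undo(&self);
}

/// Run steps in order; on failure, undo the completed ones in reverse order.
fn run_saga(steps: &[Box<dyn SagaStep>]) -> Result<(), String> {
    for (i, step) in steps.iter().enumerate() {
        if let Err(e) = step.execute() {
            for done in steps[..i].iter().rev() {
                done.undo();
            }
            return Err(e);
        }
    }
    Ok(())
}
```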

Speaker 3:

So, yeah, this definitely had a lot of design-upfront aspects to it, because I knew that there were a lot of things we had to get right, and I definitely wanted to try to make sure that I was thinking them through correctly before trying to get into having too much fun in the code itself.

Speaker 1:

Yeah. Totally. Yeah. You gotta know when to prototype and when to waterfall, and this is an example of when you wanna waterfall.

Speaker 3:

Yeah. So when I started doing the design document for this, and just making sure that I had what I thought was a reasonable thing in my head: like I said, I did a first draft of this, and it was pretty straightforward, and it looked okay. So then I did a second draft where I started fleshing it out. And then I wanted to do, like, the thing that I was actually gonna put up for peer review.

Speaker 3:

And the third draft started out okay, but about halfway through, I pulled it up to look at it again, I still have this around, about halfway through, it starts to devolve into this long list of, like, TODO XXX: there is no way this design could possibly work. Here is an 11-step race condition involving three different sleds that causes us to leak the fact that we're migrating this instance, and we never think it's running again.

Speaker 3:

And here's a different 11-step race condition that makes it so that the instance is running in two places at once. And eventually, as I was looking at these, I realized, hey, wait a second, a variant of one of these races is actually a problem that we have today. We have a bug where, due to the actions of the, I mentioned the distributed sagas earlier, the saga that's responsible for creating and starting a new instance, it's possible to have a sequence where we invoke one of these, and it starts your VM.

Speaker 3:

And then the control plane process that's doing this immediately crashes before it records that the VM was successfully started. But your VM has started, so then the guest software running in there shuts the VM down. Now it's supposed to be stopped, and we go back and replay the step that starts it, because we didn't get to the end of that step yet. And now you have a VM that's running even though the last expressed user intent was to stop the virtual machine. So I remember, like, I came up with this on a Friday and came back on Monday and kinda, you know, started nagging some people about, hey.

Speaker 3:

Is this a real thing? Have I identified a real issue? But that was about the point where I decided, alright. You know what? Thinking about it really hard is not gonna be sufficient here.

Speaker 3:

Like, this system is too complex. There's too many ways to interleave all of these concurrent operations. It's time to break out the big guns. It's time for formal methods. And that's

Speaker 1:

It's formal times.

Speaker 3:

And that's what led me to get out all the TLA+ material again, and what led to the demo you were talking about at the beginning.

Speaker 1:

So could you speak to your own origins with respect to TLA+? Because I think you got exposed to this at Microsoft. Is that right?

Speaker 3:

Yeah. So just as background, the whole discipline I'm gonna be talking about here is this discipline of formal methods, which is the practice of describing systems in a very mathematically rigorous way so that we can design them and specify them and hopefully verify them. Not that I'm gonna go all in on "verify" here. I'll probably slip up and say "validate", like, a third of the time, but we'll go with "verify". Using the tools of mathematics and formal logic that are available to us to reason, you know, very formally and powerfully about these systems.

Speaker 3:

Yeah. In terms of my own history with it, I think I first came across TLA+ specifically back in, like, 2010, 2011, something around there. I was working on the Windows kernel team at the time, and was working on synchronization objects up in user mode, and had a colleague who worked on some of the kernel-mode counterparts to these, some of the kernel-mode synchronization objects. I was building something new and having a lot of trouble, like, getting everything to work right and getting it not to just, you know, crash at various points. And my colleague was like, hey.

Speaker 3:

You know, here's this thing for formally, you know, specifying the different steps that these algorithms can take and, you know, trying to come up with stronger inferences about them. I can't remember if he mentioned that it had a model checker at the time or not. But I was a couple years into my career, and I was basically like, yeah, you know, I'm really stressed out about getting this thing delivered. So Yeah.

Speaker 3:

I'm like, nope, I'm gonna forget about that for now. Like, I'm way too busy to deal with this. Some years later, I forget how, but I first came across Hillel Wayne's writing. And Hillel is a consultant and writer who specializes in formal methods and in TLA+ specifically, and wrote the TLA tutorial at learntla.com.

Speaker 3:

And in addition to all of his writing about formal methods, there's a lot of interesting stuff that he's written about software verification more generally and about engineering practice and engineering as a profession, and it's just, like, bread-and-butter stuff for me that I find really interesting. So that was kind of a nice return to it, a reminder that this stuff existed. Then the first time I think I actually used TLA for anything reasonably practical was in 2019. I was on leave and had a little bit of downtime, and used it to reproduce a bug that I had once written into the Win32 critical section that could cause a critsec to be not held, but have a waiter who was blocked on it.

Speaker 3:

So the issue was this complicated ABA problem in the state of the lock where, like, you get somebody who acquired the lock and then dropped it, and they thought that the lock state hadn't changed and it was somebody else's job to wake waiters. But in fact, the somebody else had left the lock long ago. And what I remember about this bug was that I had written it, and then it had gone through, like, the first step in the Windows engineering and integration process to get to a slightly broader internal audience. And then we started getting, like, these occasional reports of, hey, you know, in stress testing, this thing is failing.

Speaker 3:

We're seeing these locks get abandoned, so back the thing out. And it took myself and two colleagues on that team a few days, really, to figure out what was going on. Like, one of my teammates came in a few days later and said, okay, I was thinking about this last night. I think I finally got it.

Speaker 3:

What if, like, this and this and this and this? And we looked at the code and we realized, hey. That's exactly right. Like, that's what's going on here. We fixed the bug.

Speaker 3:

So that was a couple years prior, but in 2019, I decided, okay, as an exercise, I'm gonna try to reproduce this and see what we would have gotten out of a TLA+ model for it. And I was really struck by the fact that the model checker took the bug that took probably 50 engineer-hours to find and pinpointed it in about 25 minutes, and then verified the solution to it in another 25 or 30 minutes. That made a big impression. I was really struck by what a powerful tool this was for reasoning about all of these different ways that we could get these different interleavings of different threads working on this lock, and that we could make sure that there wasn't any deadlock and all of our appropriate invariants were upheld at each and every step.

Speaker 2:

Greg, I love that in particular, because you started from, you know, "I don't have time to learn this thing" to "holy crap, if I had learned this thing, it would have saved me so much

Speaker 3:

time." Oh, man. Yeah. I definitely kicked myself a few times after realizing, boy, this would have saved me a whole lot, like, back when we ran into this. But, youth: for one thing, I was pretty early in my career when that happened.

Speaker 3:

So, you know, I will give myself a little forgiveness there, and better late than never. It's nice to be able to use all of these tools now.

Speaker 2:

Well, after the ignorance of youth comes the hubris of experience. So, some of us transition just, you know, from one to the next.

Speaker 3:

Well, so, I mean, a lot of my philosophy with these tools, too, is that I wanna use these because I know I'm gonna make mistakes, and I know that there is no way I'm gonna be able to think hard enough about any of these problems to be able to actually, like, suss any of this stuff out to begin with. And so, for me, I really like having the extra assurance that comes from having the computer do the really hard work of thinking about all of the different ways that things can be interleaved and can interact with one another, and having it, you know, hold up the various problems that we can get into in designing systems like this. So if nothing else, experience has taught me that, yep, I write a lot of bugs, and it is nice to be able to head as many of those off at the pass as I can manage, and these are nice tools for doing some of that.

Speaker 1:

Okay. So with that, I mean, do you want to explain a little bit what TLA+ is, and what is a model? How do you generate it? What are we doing when we're actually checking them?

Speaker 3:

Sure. Yeah. So TLA is the Temporal Logic of Actions. The plus is for some set theory stuff that I, you know, candidly did not take enough discrete math to understand. Is the plus in superscript?

Speaker 1:

It looks like the plus is in superscript, which is, like, up there with UNIX being in small caps as a way of just angering people from a typesetting

Speaker 3:

perspective. But I mean, it might be typeset in superscript in some places, but no one that I have seen is gonna get particularly antsy about just having it in regular script.

Speaker 1:

I'm sure there's someone who only uses the superscript. Like, I told you, these people are slobs. These people who don't put it in superscript, they're slobs. Anyway, sorry. Pretty indicative.

Speaker 3:

No. No. No. No. No.

Speaker 3:

That's fine. So the idea is that, you know, we take a combination of set theory and formal logic, propositional logic, and add some notational affordances to be able to talk about how a system evolves over time. And in doing that, we are able to generate, the way that I think of it is like big graphs, big execution traces of the way that a system is allowed to proceed through a set of executions. So the idea is that what a model really kinda boils down to is a big old logical predicate. It's the disjunction of a whole bunch of different individual predicates that specify, you know, I have the universe, and I have some set of variables in the universe that I care about.

Speaker 3:

And the variables can, in theory, take any arbitrary value. I can have some variable that is 3, or it's some string, or it's some, you know, unique model value that is not equal to anything except itself, or it's a set, or something along those lines. There's a few different, you know, individual, quote, unquote, types of things that we can have in the system. And what our model is is a set of predicates that tell us what values belong to, and I'm really kind of struggling for the words here, but what values fit into our model. Like, which ones are part of the set of states that can be reached by the system that we have specified.

Speaker 3:

So usually the way that you would write that is you would say, in the initial state, my variables can have any of the following values, and that could be, this is guaranteed to be null, or it always starts out as 0, or it's anything from 1 to 10, or something along those lines. And then you take the disjunction of that with a special predicate that says something of the form of: if the variables have the following values or meet the following conditions in some prior state, then they take on the following values in the next state, and that next state becomes part of the universe of states that are allowed by the system that you have specified. So you're writing a set of rules, essentially, for how you can proceed from one set of variable values to another at each individual step. And that's your specification of your system. Beyond that, what you can do is express sets of predicates over those variables that are supposed to be invariant.

Speaker 3:

So you can say things like, you know, this thing should always be an integer of some kind, and if it's not, then I have screwed something up in the model. You can get into things like, for the example I gave earlier, we can have a rule that says, if the instance is currently running, like, if the variable that I have in my model that says what the state of the instance is is set to running, then the user intent, the last thing I heard from the user, had better not be that it is stopped. Like, if I see that, then that's a problem. And then there is software that takes this thing and enumerates all of the different possible states and executions and interleavings thereof, and will check for you to see if your invariants are invalidated at any particular point.

Speaker 3:

And if they are, then it will scream. And it will scream in an especially useful way, where the TLA model checker, TLC, will show you the set of steps that it took to reach the broken invariant. So you can look through there and see, like, oh, this and then this and then this, and these, like, 15 steps together led to this particular invariant being violated, and this is the thing that I have to go figure out how to fix.
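To make that concrete outside of TLA+ itself, here is a tiny, purely illustrative Rust sketch of what an explicit-state checker like TLC is doing under the hood, using a toy version of the instance-start example. The state variables and transitions are invented for illustration and are not the actual RFD model.

```rust
// Toy explicit-state model checking: enumerate every state reachable from
// the initial state via a next-state relation, and check an invariant on
// each one. The state space is a made-up miniature of the instance-start
// bug described earlier.
use std::collections::{HashSet, VecDeque};

#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
enum VmState { Stopped, Running }

#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct State {
    vm: VmState,          // what the VM is actually doing
    user_intent: VmState, // the last thing the user asked for
    saga_recorded: bool,  // did the start saga record that it started the VM?
}

// Next-state relation: every state reachable in one step from `s`.
fn next(s: State) -> Vec<State> {
    let mut out = Vec::new();
    // The start saga (re)runs its "start the VM" step if it never recorded it.
    if s.vm == VmState::Stopped && !s.saga_recorded {
        out.push(State { vm: VmState::Running, ..s });
    }
    // The saga records completion (unless it crashed before doing so).
    if s.vm == VmState::Running && !s.saga_recorded {
        out.push(State { saga_recorded: true, ..s });
    }
    // The guest can shut itself down, which makes the user intent "stopped".
    if s.vm == VmState::Running {
        out.push(State { vm: VmState::Stopped, user_intent: VmState::Stopped, ..s });
    }
    out
}

// Invariant: a running VM must not contradict a last-known intent of Stopped.
fn invariant(s: &State) -> bool {
    !(s.vm == VmState::Running && s.user_intent == VmState::Stopped)
}

fn main() {
    let init = State { vm: VmState::Stopped, user_intent: VmState::Running, saga_recorded: false };
    let mut seen = HashSet::from([init]);
    let mut queue = VecDeque::from([init]);
    while let Some(s) = queue.pop_front() {
        if !invariant(&s) {
            println!("invariant violated in {:?}", s);
            return;
        }
        for n in next(s) {
            if seen.insert(n) {
                queue.push_back(n);
            }
        }
    }
    println!("all {} reachable states satisfy the invariant", seen.len());
}
```

Running this toy finds the interleaving Greg describes: start, crash before recording, guest shutdown, then a replay of the start step. TLC does the same kind of exhaustive enumeration over a real specification and prints the offending trace.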

Speaker 1:

And let me ask, how often is it a bug in the model? Because as you're developing the model, presumably you also have things where it's identifying one of these interleavings and you go, oh, actually, I realize that the model does not represent the system accurately in this regard.

Speaker 3:

Yeah. I mean, so, sorry. Go ahead and finish, please.

Speaker 1:

No. No. No. Yeah. Go ahead.

Speaker 1:

So, like, how often is it a bug in the model, versus the model is correctly representing the program, and therefore it's a bug in the program rather than in the thinking?

Speaker 3:

I mean, so if you're doing a translation, at least sometimes that happens. I have not, like, translated a

Speaker 1:

whole heck of a lot

Speaker 3:

of code, at least as I have been doing these. It's mostly been sort of new design. So I tend to think of a bug in the model as something more along the lines of, like, you

Speaker 1:

know, but once I referred to the wrong Yeah.

Speaker 3:

I referred to the wrong variable there. And so now I have an invariant being violated, but it's because I screwed up. Right? Like, this wasn't what I meant to express. I think that a common thing that can happen, the common failure mode for me, has been oversight of one kind or another.

Speaker 3:

Like, you know, there's this particular case and I just, like, didn't think of that. And so I specified this behavior that I thought was going to work, but it's missing some particular condition that needs to be checked in order for everything to hold up correctly. So, like, I was working on one of these today. I'm gonna try to find the example that I had in here. Yeah.

Speaker 3:

Something along the lines of, oh, yeah, it turns out that we have to check to make sure that the thing that is currently in the table is the one that we're trying to remove from the table, because just doing this, you know, route replacement is not semantically correct for the particular thing that I am modeling. My experience is that this sort of mistake, I wouldn't even call it a bug in the model necessarily. I tend to think of it more as just, yeah, you didn't think of that quite right. You kinda left it underspecified, or, you know, here's the thing that you didn't picture just yet.

Speaker 3:

One thing that is fun about this is getting into a dialogue with the model checker, almost, where, you know, you come up with something and you have your set of invariants and, alright, here's my proposal for how we're gonna uphold these. And the checker says no. And the checker says, oh, yeah, you know, you messed this up in this particular way.

Speaker 3:

Okay. Well, you know, I'll add this and this, and add this check here and this step here, and deal with that. And you send it back to the checker, and the checker says, yeah, no, but what about this other thing?

Speaker 3:

And for a while, while I was doing all the design work for migration orchestration, I was just sitting here, like, having a back-and-forth with the model checker about, okay, what about this design? What about this design? What about this design? And it's actually, like, even in the commit history for the models in our internal docs, where it's like, okay.

Speaker 3:

This is still failing because of this thing. Alright. We fixed that, but now it's failing because of this thing. And it's honestly kind of fun to have the automated, you know, automated conscience sitting there telling you what is wrong with all of the stuff that you've done so far.

Speaker 1:

Where you're able to do very rapid iteration, but it's rapid iteration at kind of the highest level of the system, where you're actually thinking about the system at its most abstract, before you can make much more costly mistakes by actually building it the wrong way.

Speaker 3:

Oh, yeah. Absolutely. I mean, one of the things that is really, you know, valuable about this, like, one question that I think you asked, Bryan, when I first showed some of this stuff was what the procedures are for translating one of these things into actual code. And it depends kind of on what system you're working in, but generally, automation of turning a formal specification into a piece of executable code is a very, very difficult problem. It would've been especially difficult in a case like this.

Speaker 3:

Like, the prototyping that we'd be talking about here: in order to ultimately implement the design that I settled on, there were a bunch of changes to our internal database schema, and I had to update all the code that was dealing with all the types that deal with the schema. And I had to add all the APIs that dealt with all of these internal interactions between these various things, and, like, thousands of lines of Rust ultimately to implement all of this stuff, and a lot of time spent on it. And, yeah, trying to do all of that just to get to a point where, you know, we could find out that here is this particularly subtle issue, we may not even have found any of the subtle issues.

Speaker 1:

Totally. Oh, god. Yeah.

Speaker 3:

And I think, too, that's a point that is worth emphasizing. With the original bug that led to this whole affair, in order to hit it, you would have needed a very precise set of racing steps to happen, where a Nexus process crashes at exactly the right time, and while it is coming back up, your customer stops a VM, and the VM stop gets recorded, like, just recorded enough but not quite enough, in order for this whole thing to get replayed and create the condition that we were trying to avoid. Can you imagine writing the integration test for that? Like, the

Speaker 1:

the set of controls that

Speaker 3:

you would need in order to be able to get, like, this specific sequence of crashes and everything to happen. And that's if you know that this bug is in there in the first place. So

Speaker 1:

Yeah. I was saying that that doesn't mean you can debug it. I mean, this is the kind of bug you're gonna see exactly once. And you've gotta hope that you've got enough information that's been left postmortem for you to be able to figure out what actually happened here, and then rely on kind of putting it all together in terms of what has happened, and then the ability to reproduce it. So it's much, much easier to think about it in terms of the model.

Speaker 3:

And with the rare ones, too, you know, as I'm sure we all kind of appreciate: you write your fix and you go to test it and the bug doesn't happen. And is that just because you took it and made it even more rare than it was to start with? Or is it really gone and your fix is really working? And, yeah, being able to just, you know, iterate over all of this stuff exhaustively is a very, very powerful set of techniques for dealing with these things.

Speaker 1:

And it's also, especially if you've got a fix, I mean, having just had this happen recently to myself, where I had a fix that was almost complete, but not actually complete, but it made the bug so much less likely to hit that I deluded myself into thinking that I'd actually completely resolved the issue. And when you don't have that higher level of abstraction to be able to reason about, it's easy to do that. It's easy to think that you fixed it, because the bug used to be here, and now it's gone away. It's, like, well, actually, now you've made it much, much, much less likely. Congratulations.

Speaker 1:

But you actually haven't completely fixed it, and there's still a window, and that's where your model checking can really come to the rescue. So can you just talk about the mechanics of this a little bit? Because you were using, I think, PlusCal. Right? Am I remembering that correctly?

Speaker 3:

Yeah. For the model I demoed, yes. So pure TLA+ is what I described earlier, which is, you know, you go and you write all of the predicates by hand. There is a domain-specific language called PlusCal that is more procedurally oriented, in the sense that there's a Pascal-ish dialect and a C-ish dialect, and it looks a lot more like a straight-line set of procedures and functions that you have written.

Speaker 3:

And I think it can be a little bit more familiar, certainly it was for me, getting into modeling for the first time. So you write one of these things in this more procedural style according to the rules of the DSL, and then there's a translator that turns that into actual TLA+ for you. It does the insertion of all of the, like, I have this procedure, and it's at this particular point in the procedure; it does all of the program-counter stuff and generates that for you so that you don't have to think about it as much. The Learn TLA resource I talked about earlier is PlusCal-oriented for, I'd say, the first 80% of it or so. It's a good way to get started, with more familiar concepts, if you were brand new to this sort of thing.

Speaker 3:

And it is also, I think, especially good for taking existing code and trying to translate it into something that can be model checked, or taking an existing procedure and turning it into something that can be model checked. I have done a little bit since then of just doing things in pure TLA+. There is a set of examples, the collection is called Pragmatic Formal Modeling, and I've got links to all this stuff that I'll kick over and we'll make sure we get in the show notes. But there's some examples of things that you can do to model systems, and these examples are written just in pure TLA, no PlusCal in sight.

Speaker 3:

And that kinda put me on to some of the power that you get out of expressing things in just this pure form. If you were asking me to pick one: if I were doing something that was very clearly procedurally oriented, I find it easier to reason about, like, locks and waiting for conditions, and translating code that looks like that, in PlusCal. For things that are, you know, sort of interlocking, interacting state machines, where you have a bunch of, like, if the widget is in this particular condition, then it can go into this state or that state or the other state, I found that can be really natural to just write in TLA. And it turns out, actually, that for the migration design specifications that I mentioned, I ended up just doing those that way, because it turned out to be the easiest to reason about. Like, oh, yeah.

Speaker 3:

You know, this is how all of these individual pieces of state are allowed to change given the different things that are going on

Speaker 1:

Oh, interesting.

Speaker 3:

At some particular point. So, yeah. I mean, it took reading some additional examples to really get a sense for, like, oh, yeah, you know, here's how I would express all of this stuff, and here's what a TLA model kind of looks like. And that was really helpful at figuring out how to actually put those things together.

Speaker 1:

And so with the model, were you able to reproduce the bug that you thought you had found as you were beginning to design? Like, I think I've got this bug in the extant system. Were you able to reproduce that bug using your model?

Speaker 3:

Yeah. So the first, and it feels a little bit like, so I was able to do that. It feels a little bit like cheating, because we had already looked at it, and it's like, yeah, you know, this is kinda already there. So am I writing this model in a way that is gonna get the model checker to show me just the bug that I already think is there, or am I doing this right?

Speaker 3:

Like, am I being fair about all this? But I think we decided it was the latter. And the nice thing about it was that it was, oh, yeah. Sorry. Go ahead.

Speaker 1:

Well, you've used this pattern twice, right? I think it was kind of interesting that you used this when you were first kind of experimenting with this in 2019. And maybe that's just a good way, you know, as you're kinda getting your feet wet with these things: actually going and reproducing something that you're already aware of and letting this thing find it is actually a pretty good way to understand the toolchain, I imagine.

Speaker 3:

Yeah. I think that's true. And, certainly, when you're learning to do something like this, there's no substitute for practice. You can do all the reading you want, but eventually, you kinda gotta, I think, anyway, just do some of them to really get a sense for how everything works and what works well and what makes things fit together nicely. And, certainly, doing that with an existing system that you already understand well is easier for a first go-around than doing something out of whole cloth.

Speaker 3:

So, yeah, I wouldn't necessarily say that the parts of our control plane that I was working in were things that I knew very well at the time that I was doing this. So, you know, I was a little worried that I was misrepresenting something. But to your point, yeah, I mean, the model checker was pretty quick to throw up the bug that we thought we had, and was pretty quick to verify a solution, too. Like, there was a solution that we'd suggested that was essentially, yeah, what if we break this into two phases?

Speaker 3:

And we, you know, have one step that says, hey, I'm gonna start an instance here, and then another step that actually starts the instance, so that if the second one gets replayed, then it doesn't get started again, because it already started once and then got stopped. And in principle, yeah, you know, the checker's happy, and we went off and implemented that and have generally been pleased with it since. I mean, to be fair, I don't know that we had ever actually seen this issue in practice, but it was certainly something that appeared to be there on inspection. And, you know, hopefully, it will not crop back up as we continue doing more empirical testing and stress testing of the system and things like that.
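A purely illustrative sketch of that two-phase shape, not Omicron's actual code or API (all names here are made up): the first step only registers the VMM, and the second step starts it only if it has never run, so a crash-and-replay cannot resurrect a VM the guest already shut down.

```rust
// Hypothetical two-phase start: "register" records intent, "run" only acts
// on a VMM that has never run, so replaying either step is harmless.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum VmmState {
    Registered, // phase 1 ran: the VMM exists but has never been started
    Running,    // phase 2 ran: the VMM was started
    Stopped,    // the guest (or user) shut it down after it ran
}

struct Sled {
    vmm: Option<VmmState>,
}

impl Sled {
    // Saga step 1: record the intent to start. Replaying is a no-op.
    fn ensure_registered(&mut self) {
        self.vmm.get_or_insert(VmmState::Registered);
    }

    // Saga step 2: actually start the VMM, but only if it has never run.
    fn ensure_running(&mut self) {
        if self.vmm == Some(VmmState::Registered) {
            self.vmm = Some(VmmState::Running);
        }
    }
}

fn main() {
    let mut sled = Sled { vmm: None };
    sled.ensure_registered();           // step 1
    sled.ensure_running();              // step 2: the VM starts
    sled.vmm = Some(VmmState::Stopped); // the guest shuts itself down
    sled.ensure_running();              // saga replay after a crash: no-op
    assert_eq!(sled.vmm, Some(VmmState::Stopped));
}
```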

Speaker 1:

Well, and I think that's also super satisfying, and it's a good omen as a software engineer when you are discovering preexisting bugs as part of your work. You're like, okay, I'm beginning to test this thoroughly enough, or think about this rigorously enough, that I'm discovering preexisting things that were already here. Again, I feel it's a good omen. It's not necessarily a certainty, but, you know, you're going in the right direction.

Speaker 1:

And you're thinking about the system in the right ways, or you're thinking about it in a way that hasn't been thought of previously. So, did the models end up, did you end up effectively checking them in side by side with the software? Or what happens to the artifact that is the model?

Speaker 3:

So they're not in with the software, although they probably should be. I will have to get over my embarrassment at my probably unidiomatic and generally noobish TLA+ modeling. But they are at least checked into our internal Request for Discussion repo. So for the Oxide folks, at least, playing at home, they're present alongside RFD 361. For the migration one, I would like to get it transferred over into the Omicron repository, into the control plane repo, so that it will just be there as part of the repo docs.

Speaker 3:

I also am looking to do that. There's an issue I've been working on today, actually, where we're doing some of the same modeling tasks to try to verify that, oh, yeah, this proposed fix for this particular problem that I've seen is probably gonna work out alright. And I would like to get that spruced up and put at least into the issue comments, if not just checked into the repo as well.

Speaker 3:

But, yeah, I think it can be really helpful to have that stuff alongside. These things function in part as documentation of the way that the system is actually supposed to behave at a high level. And, you know, let me see if I can dig up one of these things. Which one is this? So this is the configuration file.

Speaker 3:

I wanna get a sense of, like, how long one of these things is. So this is Yeah.

Speaker 1:

I think it'd be very helpful to see one of the

Speaker 3:

Totally. Yeah. So, I mean, this TLA model, this one here, I'm looking at one of the ones that I did for the migration orchestration. The entire model file, including, like, lots and lots of block comments just to try to explain what's going on, is about 700 lines. And I remember, like, when the actual code was originally done, it was about 10 times that in total in the delta.

Speaker 3:

And spread across just many, many, many files and many components. And like I mentioned earlier, buried in the weeds of all of this stuff around, how exactly do we write all the ORM code to go manipulate the database in the right way? And what do all of these individual, like, API endpoints look like? It's not boilerplate. Like, it's important stuff that we have to do in order to have a shippable artifact.

Speaker 3:

Like, no one's gonna run the VMs on the model. But I think it is a much more concise way of expressing the intent of the system here.

Speaker 1:

And maybe that's a good segue, Rain, to you. Because, you know, you've been using the model to kind of envision the system from whole cloth, and now we actually have a system, and now we need to actually verify the extant system. And then we need to be able to improve it, and we need to be able to fix a bug in it and be able to know that we haven't kind of violated its invariants in the actual implementation, as opposed to the model. And, Rain, testing is something that is very near and dear to your heart.

Speaker 1:

And you've been working on things that use property-based testing, which is something that I have not used. So maybe this is a good segue to some of the work that you've done in that regard.

Speaker 4:

Yeah. So it's funny, because the first time I was introduced to property-based testing was in the context of a model being defined, and then we needed to test that the implementation actually worked. So I used to work at Facebook, and I was on the source control team at Facebook, building the next-generation source control server called Mononoke. And, as you might've heard, Facebook stores all of its code in a single giant repository with hundreds of millions of lines of code, last time you

Speaker 1:

And we just need the mega-monorepo. There's gonna be a different term. This is, like, yeah, monorepo, but, like, no, much larger. I mean, it's amazing, the scale.

Speaker 4:

It's pretty incredible. It leads to certain interesting trade-offs. So one of those trade-offs is that we had to build new source control data structures to support this, because the existing Git-like data structures didn't meet some performance characteristics that we needed them to meet. So we did some design by hand, and my coworker, Jeremy Fitzhardinge, actually built out the specification for the structure in Alloy, which is a cousin of TLA+, I believe, Greg. And, actually, Hillel, I think, has some good writing on Alloy as

Speaker 3:

well. Yeah. Alloy is definitely in the same family, and Hillel does have some writing about it. I haven't played around with it a whole heck of a lot, except, like, a little bit last week, just to sort of see what it was like before sitting down here.

Speaker 3:

So the thing about Alloy, I think, is that it has historically not been as strong at modeling concurrency. Like, it's always been kind of a bag on the side, until its most recent release, where it's gained some different powers for reasoning about concurrent systems and the flow of time and that kind of thing. So it is something that I'd like to revisit at some point. But, yeah, Alloy seems cool in that it seems like it has a much stronger object model, shall we say, than TLA does. And I bet that for the kind of system you're talking about here, Rain, the sorts of powers that it has in that respect were probably particularly useful.

Speaker 4:

Yeah. That sounds right. Yeah. That's my understanding too, that Alloy is good for modeling data at rest, I think is the way my coworker described it. But, anyways, so, you know, we'd written up this model, and it seemed to check out.

Speaker 4:

And we'd, you know, ironed out all the issues with it, but then when we actually went to write the code, we realized that the model was fiendishly complicated. And I think the code to translate back and forth between Mercurial, which is what we were using, and this model was, I wanna say, at least 1,500, probably 2,000 lines of code. So we needed a way to actually make sure of things like: you needed to be able to do a round trip, and you needed to be able to exercise all sorts of edge cases. And so that's where I started looking into property-based testing. And we've used property-based testing in a couple of ways.

Speaker 4:

One of them was, we took random subsets of our gigantic monorepo, and we replayed them, and we made sure that, like, all the invariants were upheld. We also synthesized random repositories to make sure that, you know, in all these sorts of edge cases, all the invariants that we wanted to uphold actually, you know, held. So I think, overall, that was my first experience of property-based testing.

Speaker 3:

And, Rain, maybe

Speaker 1:

could you just explain kind of what property-based testing is as an approach, and kinda its origins?

Speaker 4:

Yeah. So property-based testing: originally, my understanding is that the first versions of it were written in Haskell. I think this comes from the 1990s, I wanna say. And the idea behind property-based testing is that most of the tests we write in our normal use are, like, you know, you have a list that's, like, 1, 4, 2, 3, 5. You sort it, and the output you get is 1, 2, 3, 4, 5.

Speaker 4:

Right? And the idea behind property-based testing is that rather than testing specific examples, you generate a bunch of random test cases that you hope will exercise all the different code paths you care about, and then you run them through your code under test, and you make sure that the answers there are correct. Now, there's a huge asterisk next to "how do you make sure that the answers are correct", but that's the overall view of property-based testing. I think there's kind of two different generations of property-based testing libraries. One of them is one that I'm sure a lot of folks have heard of.

Speaker 4:

It's called QuickCheck. QuickCheck is kind of the original property-based testing library. It has been in use since the nineties, and I know that it's in use for things like automotive design, where you wanna make sure that your car software behaves the right way under edge cases and so on. The next generation of property-based test systems was originally, actually, in Python. It is called Hypothesis.

Speaker 4:

And in Rust, the port of Hypothesis to Rust is called proptest. I'm going to offend some people by saying this, but I think that nobody should use QuickCheck at this point. And I'm gonna go into why. But if there's one takeaway that you have from this, it is that, if you're using Rust, you use proptest.

Speaker 4:

Haskell has Hedgehog for the same thing. And in Python, of course, there's Hypothesis. And all of them have a design that is better, and I'm gonna talk about how it's better. So with property-based testing, I mentioned earlier that it's all about generating random values. Now, those random values tend to be pretty big, so often, like, for example, for sorting a list, you might generate a value that has, like, 500 elements or something like that.

Speaker 4:

Now, when you're actually trying to debug test failures, if you're just suddenly presented with, here's 500 elements, and you need to go, you know, figure out what's going on here, then that can be really hard. So the other bit that property-based test frameworks provide is called shrinking. And the idea behind shrinking is that after the test framework finds a failing case, it tries to reduce the size of the input, such that it ultimately finds the smallest input. And the algorithm for that is pretty straightforward: you just kind of keep throwing smaller and smaller inputs at it until you find the smallest input that fails.

Speaker 4:

So the modern property-based testing libraries like proptest are a lot better at shrinking than the previous generation. And so they require a bit more effort to use, but I think you get so much more out of them with shrinking that it ends up being a pretty clear win.

Speaker 1:

And so then how do you use them? If I've got a Rust crate and I wanna use proptest, like, how do I get started?

Speaker 4:

Yeah. So with proptest, there's a library, it's called proptest. You import the library, and I'm gonna just paste a code sample here. And so let me see.

Speaker 4:

Wow. Okay. So it ate up all the formatting. But here we have a couple of different examples of property-based tests for a sort function. So in this case, we are testing a function called my_sort, and we have two property-based tests here.

Speaker 4:

One of them is that, once you've sorted the input, for every pair of adjacent elements, using .windows(2), we make sure that the previous element is less than or equal to the next element. So that's a pretty straightforward invariant that you want a sort function to uphold. I think that this sort of invariant, if you can spot it, is great, but often it ends up being pretty rare, in my experience. So for a sort function, I think, you know, it's pretty clear, but I've found a lot of other real-world use cases where it's very hard to find these sorts of invariants. I think the more interesting kind of invariant, and the one that I think really exposes how powerful property-based testing is, is the second kind of example, which is where we test our sort against, in this case, a bubble sort.

Speaker 4:

So over here, what we do is we have our system under test, which is my_sort, and we also have some sort of simplified model of how the code should work. In this case, it is a bubble sort. It could also be, for example, the sort in the standard library, which is probably correct. And so we run our input through our function, and we run it through the function that forms our model, and we compare that our output is the same as the bubble sort output. And I think that this general pattern of testing is much more common than most people think.
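Since the pasted sample doesn't survive in the transcript, here is a sketch of roughly what those two tests could look like with proptest. my_sort is a stand-in for the function under test, and the strategies and input sizes are arbitrary choices.

```rust
use proptest::prelude::*;

fn my_sort(input: &[u32]) -> Vec<u32> {
    // Stand-in for the code under test.
    let mut v = input.to_vec();
    v.sort();
    v
}

fn bubble_sort(input: &[u32]) -> Vec<u32> {
    // Deliberately naive oracle: obviously correct, even if slow.
    let mut v = input.to_vec();
    for i in 0..v.len() {
        for j in 0..v.len().saturating_sub(1 + i) {
            if v[j] > v[j + 1] {
                v.swap(j, j + 1);
            }
        }
    }
    v
}

proptest! {
    // Invariant-style test: every adjacent pair of the output is ordered.
    #[test]
    fn output_is_sorted(input in prop::collection::vec(any::<u32>(), 0..500)) {
        let sorted = my_sort(&input);
        for pair in sorted.windows(2) {
            prop_assert!(pair[0] <= pair[1]);
        }
    }

    // Oracle-style test: my_sort agrees with the trivially correct bubble sort.
    #[test]
    fn matches_bubble_sort(input in prop::collection::vec(any::<u32>(), 0..500)) {
        prop_assert_eq!(my_sort(&input), bubble_sort(&input));
    }
}
```

If either property fails, proptest shrinks the failing vector before reporting it, which is the shrinking behavior described above.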

Speaker 4:

And as someone who's used property-based testing for a long time, I'm, like, always on the lookout for, okay, what other preexisting work of art can we compare this to in order to make sure that our system behaves in a certain way?

Speaker 1:

Yeah. It's really interesting. So when you say you're on the lookout, like, what things are you looking for? Because, I mean, clearly, if I were taking one system and replacing it with another implementation, for example, it's very useful, because my oracle is the extant implementation. Exactly. But it gets rockier when we're kinda doing new stuff.

Speaker 1:

But so when you're on the lookout, what are you looking for? Are you looking for kind of subsystems that we have analogs for elsewhere, or what are you looking for exactly?

Speaker 4:

Yeah. Usually, there's, like, smaller components that might have preexisting solutions. Often, what I've found is that there is some other way to express a particular algorithm or data structure that has a worse time complexity, or, you know, is less efficient in some way. So I will put in the effort to actually write a second implementation that is, like, facially correct, even if it has a worse time complexity or space complexity or whatever, and then run tests through both of those to make sure that they're the same.

Speaker 4:

So often, you know, you do have to put in a lot of extra work. But in my experience, it's work that has paid off over and over again. Like, I've found probably over a hundred bugs via property-based testing at this point over my career.

Speaker 1:

Wow. That is real. As you're saying that, one thing that I'm thinking about is, you know, we write a lot of no_std code, and you're operating under certain constraints under no_std where we are actually trying to implement things that are available in the standard library. And that would be an example where it's very easy: your oracle is the thing that actually uses the standard library, and you can compare that very directly and know that your no_std implementation is correct before you actually have to debug it in an embedded environment.

Speaker 4:

Yeah. That sort of, like, model implementation is a pretty common, pretty common approach.

Speaker 1:

Yeah. That's really interesting. So maybe it's worth describing the kind of concrete example that you were showing us in terms of prop testing: where this came up, and kind of the software that you were writing, and how you found something that could act as that kind of oracle?

Speaker 4:

Yeah. This is actually a really cool example, because I thought that this is just a really fun and interesting use of property-based testing. So I wrote this crate called buf-list, which I just posted a link to. So this represents a segmented list of byte chunks. So the context here is that usually when you're working with byte buffers in Rust, you have something like a Vec<u8>, or a Bytes if you're fancy.

Speaker 4:

You can use the bytes crate. All of them are contiguous byte arrays. So for example, if you're storing, like, 20 gigs of data, say a disk image, you would need to do a contiguous 20 gig allocation. That can be pretty inefficient. So a different way to store byte data is as a list of byte chunks, or in particular, a queue of byte chunks.

Speaker 4:

And the idea is that you can read data off of the beginning of the thing and go on until the end. The data structure internally keeps track of making sure that you're reading the right bytes and you're not skipping over bytes and so on. So for this data structure, I needed a cursor type. For those who are unfamiliar, in Rust there's this wonderful notion of cursor types. If you look at std::io::Cursor, that is a type that wraps any sort of contiguous byte array, and it provides a Read implementation and a Seek implementation on top of it.
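
For reference, a small sketch of what std::io::Cursor gives you over a contiguous buffer, using only standard-library APIs:

```rust
use std::io::{Cursor, Read, Seek, SeekFrom};

fn main() -> std::io::Result<()> {
    // A contiguous byte buffer wrapped in the standard cursor.
    let mut cursor = Cursor::new(vec![1u8, 2, 3, 4, 5]);

    // Read the first two bytes through the Read implementation.
    let mut buf = [0u8; 2];
    cursor.read_exact(&mut buf)?;
    assert_eq!(buf, [1, 2]);

    // Seek relative to the end through the Seek implementation, then read again.
    cursor.seek(SeekFrom::End(-2))?;
    cursor.read_exact(&mut buf)?;
    assert_eq!(buf, [4, 5]);

    // The cursor tracks a logical position into the underlying data.
    assert_eq!(cursor.position(), 5);
    Ok(())
}
```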

Speaker 4:

It logically represents some sort of pointer into the data. If you look at the implementation, it's actually pretty straightforward, and it's a really useful type that works across any sort of contiguous byte buffer. However, what we have here is a discontiguous byte buffer, so Cursor doesn't work for that. So I needed to write my own cursor, and a cursor for discontiguous byte buffers is actually significantly harder to write than std::io::Cursor because... Totally.

Speaker 4:

Because you have to keep track of all sorts of fiddly indexes. You have to make sure that, you know, you aren't reading past the end of something. You need to keep track of all of those things. So I needed a principled way to make sure that my cursor was correct. And the thought that came to my mind was: why don't I compare my cursor against std::io::Cursor?

Speaker 4:

So what I did to address that is I wrote a test which generates a random discontiguous buffer, like a BufList. It combines that into a single bigger contiguous buffer, which is very inefficient, but is okay to do in a test. Right? So that's kinda where you start seeing the power of proptest. After that

Speaker 1:

And this is a very concrete example of what we were talking about, where you've got different constraints: the whole reason buf-list exists is because it's gonna be processing these galactic amounts of data, but we actually don't need the galactic amounts of data to be able to test all of these various edge conditions.

Speaker 4:

Exactly. And so that's where, you know, for smaller inputs, you can be sure that buf-list works properly. I just pasted a link to the test. What it does is it generates this BufList, combines it into a single contiguous buffer, and then it also generates a set of operations on cursors.

Speaker 4:

So the set of operations is set position, seek from start, seek from end, seek from current, read, and so on. You have all of these different operations you can generate. And then what it does is it applies these operations to the system under test, which is our cursor, and it applies them to the oracle, which is std::io::Cursor, and it makes sure that it all checks out and both return the same answers. And this ended up finding 6 separate bugs. I remember fixing one bug and it's like, oh wow.
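
A hedged sketch of the operation-based comparison described here, assuming the proptest crate. The Op enum, op_strategy, and the MyCursor alias are illustrative stand-ins rather than buf-list's actual API; in the real test, the system under test would be the discontiguous cursor built directly from the chunks, while the oracle is std::io::Cursor over the flattened copy.

```rust
use std::io::{Cursor, Read, Seek, SeekFrom};
use proptest::prelude::*;

// Hypothetical stand-in: in the real test this would be the cursor over the
// discontiguous buffer (the system under test). For the sketch it's an alias
// so the code compiles.
type MyCursor = Cursor<Vec<u8>>;

// One operation the test can apply to both cursors.
#[derive(Debug, Clone)]
enum Op {
    SetPosition(u64),
    Seek(SeekFrom),
    Read(usize),
}

fn op_strategy() -> impl Strategy<Value = Op> {
    prop_oneof![
        (0u64..1024).prop_map(Op::SetPosition),
        (0u64..1024).prop_map(|p| Op::Seek(SeekFrom::Start(p))),
        (-512i64..512).prop_map(|p| Op::Seek(SeekFrom::End(p))),
        (-512i64..512).prop_map(|p| Op::Seek(SeekFrom::Current(p))),
        (0usize..256).prop_map(Op::Read),
    ]
}

proptest! {
    #[test]
    fn cursor_matches_std_cursor(
        chunks in prop::collection::vec(prop::collection::vec(any::<u8>(), 1..128), 0..32),
        ops in prop::collection::vec(op_strategy(), 0..64),
    ) {
        // System under test: in reality, built from the chunks directly.
        let mut mine: MyCursor = Cursor::new(chunks.concat());
        // Oracle: std::io::Cursor over a flattened copy of the same data.
        let mut oracle = Cursor::new(chunks.concat());

        for op in ops {
            match op {
                Op::SetPosition(p) => {
                    mine.set_position(p);
                    oracle.set_position(p);
                    prop_assert_eq!(mine.position(), oracle.position());
                }
                Op::Seek(from) => {
                    prop_assert_eq!(mine.seek(from).ok(), oracle.seek(from).ok());
                }
                Op::Read(n) => {
                    let (mut a, mut b) = (vec![0u8; n], vec![0u8; n]);
                    prop_assert_eq!(mine.read(&mut a).ok(), oracle.read(&mut b).ok());
                    prop_assert_eq!(a, b);
                }
            }
        }
    }
}
```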

Speaker 4:

So this ended up finding 6 separate bugs. And then after the 6th bug, it was like, you know, it passed. And I was like, yeah, at that point it's pretty likely correct. Right?

Speaker 1:

I mean, boy, that is a lot. At some point, and I know I've sort of had this, you're just like, okay, did I do anything right with this code? Like, how many bugs are there here? It's like, alright.

Speaker 1:

But this is exactly the kind of code that is so hard to get right: as you say, it's all these fiddly little edge conditions, where it's easy to envision how you could have... yeah, that is really powerful. Can I ask you, what do these strategy decorators denote?

Speaker 4:

Yeah. So this is getting into a little bit of the weeds, but the way proptest works is that the way you generate random values is by creating strategies for that type. A strategy is simply a way to generate random values, but it also carries information on how you can shrink values into smaller ones. It integrates both of those things together, so a strategy is a combination of those two things.

Speaker 4:

So in this case, if you look at line 19, there is a BufList strategy. A BufList strategy is a strategy that generates a random BufList, and the function is right below there. The way it does it is it generates vectors of 1 to 128 bytes each, then it generates 0 to 32 of those vectors, and then it combines them all into a single BufList. So that's what that is.

Speaker 4:

Now, the fancy thing about proptest is that this is actually more clever than it looks, because the prop::collection::vec function that is called here also carries information on how you can reduce the size of the vector and reduce the size of the elements inside it. So all of that information is carried along with the constructor. And this is the thing that distinguishes proptest and Hypothesis and the other new-generation property-based testing libraries from QuickCheck: in QuickCheck, you don't carry that information along with the generator, and so it becomes much harder to do shrinking.
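
A sketch of the kind of composed strategy being described, assuming the proptest crate. The chunk sizes mirror the numbers mentioned above; the final step of collecting the chunks into a BufList is left generic, since the exact constructor isn't shown in the transcript.

```rust
use proptest::prelude::*;

// Each chunk is a vector of 1 to 128 bytes; a value is 0 to 32 such chunks.
// In the real test the chunks would be combined into a BufList; here we keep
// the Vec of chunks to stay library-agnostic.
fn chunks_strategy() -> impl Strategy<Value = Vec<Vec<u8>>> {
    prop::collection::vec(
        prop::collection::vec(any::<u8>(), 1..=128), // one chunk: 1 to 128 bytes
        0..=32,                                      // 0 to 32 chunks
    )
}

proptest! {
    #[test]
    fn generated_chunks_respect_bounds(chunks in chunks_strategy()) {
        // Because prop::collection::vec carries size information, shrinking can
        // reduce both the number of chunks and the bytes inside each chunk.
        prop_assert!(chunks.len() <= 32);
        for chunk in &chunks {
            prop_assert!((1..=128).contains(&chunk.len()));
        }
    }
}
```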

Speaker 2:

I mean, how would you contrast this with fuzzing? Because it seems like there's some kind of similar properties here.

Speaker 4:

Yeah. So property-based testing and fuzzing are actually very closely related in many ways. Usually when people refer to fuzzing, they mean what is sometimes called mutation fuzzing, where there is something instrumenting the test and trying to explore every code branch. Property-based testing, by default, doesn't try to explore every single code branch. Instead, it uses structured data rather than the unstructured data that fuzzing works with.

Speaker 4:

So there's trade-offs in both directions. I've found that those two things can often be used simultaneously. One use that we used pretty extensively in my past life was taking all of the work that you did for property-based testing and using it to generate random corpuses for fuzzing, because all of these strategies generate random values, those random values are very high quality, and mutation fuzzing really depends on a high-quality corpus to work well. So you can generate pretty good random inputs from which a mutation fuzzer can get going. And I remember that being really effective, because we were trying to fuzz a deserializer, and deserializers can often be really hard to make sure are correct. So we generated valid values and then gave them to the fuzzer, and that got bugs in like a few seconds.

Speaker 4:

Whereas if you just ran a fuzzer with no corpus at all, it would take like hours to find those edge cases.
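
A hedged sketch of the corpus-seeding idea: reusing a proptest strategy, via TestRunner, to write seed inputs for a mutation fuzzer such as cargo-fuzz. The output directory, the file naming, and the choice of raw bytes as the value type are assumptions made for illustration; a real setup would generate the structured type the deserializer expects and serialize it.

```rust
use proptest::prelude::*;
use proptest::strategy::ValueTree;
use proptest::test_runner::TestRunner;
use std::fs;

fn main() -> std::io::Result<()> {
    // Any proptest strategy works here; raw byte vectors keep the sketch small.
    let strategy = prop::collection::vec(any::<u8>(), 1..1024);
    let mut runner = TestRunner::default();

    // Assumed corpus location; cargo-fuzz, for example, reads seed files from a
    // corpus directory.
    fs::create_dir_all("corpus/seed_from_proptest")?;
    for i in 0..256 {
        // Generate one structured value from the strategy.
        let value = strategy
            .new_tree(&mut runner)
            .expect("strategy should always produce a value")
            .current();
        // Write it out so the fuzzer starts from valid-looking inputs.
        fs::write(format!("corpus/seed_from_proptest/{i:04}.bin"), &value)?;
    }
    Ok(())
}
```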

Speaker 2:

That's really cool. I really like this example too, Rain, and it did get me thinking. I don't want to start a debate here, but it seems like we could use TLA+ for this too, you know, for modeling this kind of system with, again, all these fiddly edge conditions, encoding the logic into TLA+. I don't know. What do you think, Greg?

Speaker 2:

Like, would this be a good, you know, formal methods example as well? Or do you think it would just be stepping on the toes of proptest?

Speaker 3:

Well, I think they're separate. I don't think it's stepping on the toes of proptest at all; I think they're very complementary. One thing about schemes like TLA+ is that, depending on the kind of data that you are working with, you might find that TLA+ is just not amenable to the particular problems that you're working on for one reason or another. For example, if you're trying to do something with floats: no.

Speaker 3:

Sorry, you're gonna have a bad time. There's not really a native... like, floats have complicated semantics, and that's not something that TLA+ does especially well. I mean, there are some cases where things can overlap.

Speaker 3:

I think one of the things to keep in mind with the formal modeling stuff in particular is that the models aren't code, and they're not the shippable artifact. So you also wanna make sure that you're bringing all of your powers to bear on making sure that the stuff that actually implements the design you've got upholds all of the properties and invariants you expected it to uphold in that operating environment. So I think it can kinda be both-and. I'm sure there are some cases where, like, if you were doing a brand-new algorithm, you can specify it in a model-checked kind of way; you can certainly get TLA+ to do a lot of things in that respect.

Speaker 3:

But there's still a lot of room too, I think, for good exhaustive test frameworks and a broad spectrum of things, especially when it comes time to verify the things that you actually mean to hand over to your customers.

Speaker 1:

Adam's attempts to generate formalism fight club have failed.

Speaker 2:

Well, you gotta try. Right?

Speaker 1:

Exactly. But I think it's actually Yeah.

Speaker 3:

I mean, yeah. The models are great. The point is that they only get you so far. Right?

Speaker 3:

And especially with very limited, you know, power for automated translation into code. You wanna make sure you didn't make a mistake there either, and that's where you wanna have as many powerful testing techniques in your arsenal for your production code as you have for the designs up front.

Speaker 1:

Yeah. And I love the fact that these are kinda complementary. Because also, like, when you're using prop testing, you're gonna be bug-for-bug compatible with the thing that you are using as the oracle, right? So it'd be interesting to know: in doing this, have you ever found that there's actually a bug in the thing that you're using as the canonical implementation?

Speaker 4:

So many times. Honestly, there's so many places in which, you know, I've written an article about it, and, yeah. So I think, just like with TLA+, you're often in dialogue with the oracle and with the system under test, and you're like, oh, actually, I didn't think about this edge case. And is this an issue in the system under test, or is it in the oracle?

Speaker 3:

And that's where you

Speaker 4:

have to think about it. I've found that sometimes the oracle can even be more complicated than the system under test, and that's where it gets spicy. So as an example, I wrote this Rust library called guppy, and guppy is a way to track and query Cargo dependency graphs. One of the things it can do is simulate Cargo builds. So we can throw it a given list of packages in our workspace, what features are enabled, whether dev-dependencies are enabled or not, and all of these other flags, and it will tell you which packages get built in the end.

Speaker 4:

And that's a really cool thing you can do, but how do you make sure it's correct? Well, we have the oracle, and the oracle is Cargo itself. Right? So you generate a random query, you throw it at Cargo, you throw it at guppy, and you make sure that they produce the same results.
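
A sketch of the shape of that differential test. The simulate_with_guppy and simulate_with_cargo helpers, and the BuildQuery type, are hypothetical placeholders rather than the real guppy or Cargo APIs; only the compare-the-two-results structure is the point.

```rust
use std::collections::BTreeSet;

// Hypothetical description of a randomly generated build query: which packages
// are selected, which features are turned on, whether dev-dependencies count.
#[derive(Debug, Clone)]
struct BuildQuery {
    packages: Vec<String>,
    features: Vec<String>,
    include_dev_deps: bool,
}

// Hypothetical helpers: one would ask guppy to simulate the build, the other
// would invoke `cargo` itself and collect what it builds. Neither body reflects
// the real guppy or Cargo interfaces; they are placeholders for the sketch.
fn simulate_with_guppy(query: &BuildQuery) -> BTreeSet<String> {
    unimplemented!("call into guppy's build simulation for {query:?}")
}

fn simulate_with_cargo(query: &BuildQuery) -> BTreeSet<String> {
    unimplemented!("shell out to cargo and collect the built packages for {query:?}")
}

// The differential check: for any generated query, guppy (the system under
// test) and Cargo (the oracle) must agree on the resulting set of packages.
fn check_one_query(query: &BuildQuery) {
    assert_eq!(simulate_with_guppy(query), simulate_with_cargo(query));
}
```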

Speaker 4:

Now, it turned out that that ended up finding way more bugs in Cargo than in guppy. Totally. There's a whole list of bugs in Cargo that it found. And so, you know, who's the oracle now?

Speaker 1:

Well, but it's really interesting. I mean, I think that is actually kind of the power of having these different implementations that are operating under different constraints. In guppy's case, it's actually trying to do something different than what Cargo is doing, but it's trying to use Cargo to validate that. I mean,

Speaker 3:

correct me

Speaker 1:

if I'm misspeaking there, but it sounds like, in other words, these are different things, but they have a very important intersection point where they can be used

Speaker 3:

to check

Speaker 1:

the other.

Speaker 4:

Exactly. They're completely different data structures with their own models and their own places where they're meant to be used. But there are places where they do the same thing, and so you can compare those things to make sure that if a bug slips through, it would have to be a bug in both of them. Right?

Speaker 1:

So I'm kinda asking you both this purely selfishly, because it's a problem that's in front of me at the moment. But one of the challenges that we've got is in kind of dynamic systems. Adam, I'm thinking about some of the age-old challenges around write transaction sizing with ZFS, the txg size of the transaction group. We see a lot of these things in a dynamic system where you've got throttling behavior and so on. Can formalisms offer us anything where we've got these very dynamic systems, where it's not a question of correctness but one of optimality and of kind of eliminating queuing delays? And sorry to just be freewheeling here.

Speaker 1:

I'm really asking you to do my homework for me, because this is a problem that's right in front of us as we tackle some of the performance issues ahead of us. Wait, is Discord doing that again? Yeah.

Speaker 2:

Yeah. That's right. You've disconnected. Don't worry.

Speaker 1:

I disconnected. Don't worry. Exactly. I don't know, it's like, what is he even talking about?

Speaker 1:

Like, what is this? How do I even tell him he's not even wrong? It's like... yeah.

Speaker 2:

Maybe a related point: can we express constraints that are related to, you know, activity or things of that notion? Is that kinda what you're getting at, Brian?

Speaker 1:

Yeah. I mean, just, how do we model some of these more runtime aspects of it, that are not necessarily invariants, but... this is not a well-formed question, I'm realizing. Maybe we would have been better off if Discord had cut out. Discord did die on me earlier.

Speaker 1:

The app actually croaked. I'm like, you know, Greg is not getting my audio, actually. But, Adam, you can hear me. We've gone split-brain. Weird.

Speaker 3:

I can hear you. We've seen this before.

Speaker 1:

Are we being haunted by Twitter Spaces? Is it possible that the...

Speaker 2:

ghost of Twitter Spaces?

Speaker 1:

Musk is stabbing his Discord voodoo doll right now, right in the split brain. We've had all sorts of... or is it... yeah. Greg, are you trying to make some meta point about verification? Is this a verification joke? Is this verification humor? Matt's pointing out... yeah.

Speaker 1:

It is strained, yeah. Actually, maybe we do need some audio problems. Just trying to mix things up around here, you know?

Speaker 1:

It did get a little boring with the audio being so reliable.

Speaker 2:

That's right. No random crash of the Twitter app that brought everyone down.

Speaker 1:

Oh my god. Especially going back and relistening. I know you have, Adam, too, going back and relistening to some of our old spaces where it's like, okay, got another audio problem. Alright?

Speaker 1:

It's like, okay, guys. You have anything else to talk about other than audio problems?

Speaker 2:

Yeah.

Speaker 1:

Yeah. If there's 30 seconds of silence, man, that was definitely me making what I thought was an interesting observation and then... But

Speaker 2:

that was only the first 10 seconds. The next 20 seconds was everyone in stunned

Speaker 1:

silence. It's stunned silence, right, exactly. Look, I've been known to have conversations with nothing plenty of times.

Speaker 1:

So, where even are we? What were we even talking about? Oh, you know what? Actually, we're gonna, like, never mention my question again. Let's just pretend I didn't ask it.

Speaker 1:

And I'm gonna do better. Wait, where are we? I think we're back on your, like, your moronic question, I think, is where we were. It's like, you know what?

Speaker 1:

What's that? What's off of that?

Speaker 2:

Well, we can come back in a couple of weeks when you solve that bug and explain how formal methods helped you.

Speaker 1:

Explain how formal... Well, I mean, you know, what are some of the things to look for? Because, Rain, you were mentioning that you came into this project at Facebook, and you've got this model, and you're kind of getting into prop testing as a way of checking the model. What are some things that people should be looking for and thinking about, like, boy, I need to go use prop testing, or I need to think about formalisms with respect to TLA+ and its derivatives? What's the kind of spidey sense you get when it's time to go to this particular tool?

Speaker 4:

Yeah. I do wanna bookend this by saying that, you know, proptest is a tool, right? It is not the be-all and end-all of testing. There are other kinds of testing, and they all have their own uses.

Speaker 4:

But for me, I think, when I have an algorithm which has fiddly bits, where there are a lot of numbers and indexes where things can go wrong, then I'll be actively on the hunt for prop testing. I think another really common use case is anything that feels like it should round-trip. That is another really, really good use of property-based testing, and that's where serializers and deserializers come in. So as an example, you know... yeah.

Speaker 4:

So it's pretty common to have a serializer and a deserializer, especially if they're handwritten. This is a bit less of an issue with serde, but often if you're handwriting a deserializer or serializer, you might wanna make sure that everything round-trips correctly: that you're not losing any data in the middle and that the resulting value is the same. There's a really good list of properties in this blog post that I really like, because it covers the general set of things you wanna think about. And I think for me, the round-tripping and also the "oh, actually, you need to write a second implementation that has different constraints" thing tend to be the most common places where I'll spot opportunities for property-based testing.
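
A minimal sketch of the round-trip property, assuming proptest, serde (with the derive feature), and serde_json as dev-dependencies; the Message type is made up for illustration.

```rust
use proptest::prelude::*;
use serde::{Deserialize, Serialize};

// A made-up type standing in for whatever your serializer handles.
#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
struct Message {
    id: u64,
    tags: Vec<String>,
    body: String,
}

fn message_strategy() -> impl Strategy<Value = Message> {
    (
        any::<u64>(),
        prop::collection::vec("[a-z]{1,8}", 0..4),
        ".*",
    )
        .prop_map(|(id, tags, body)| Message { id, tags, body })
}

proptest! {
    #[test]
    fn message_round_trips_through_json(msg in message_strategy()) {
        // Serialize, then deserialize, and require that nothing was lost.
        let encoded = serde_json::to_string(&msg).expect("serialization should succeed");
        let decoded: Message = serde_json::from_str(&encoded).expect("deserialization should succeed");
        prop_assert_eq!(msg, decoded);
    }
}
```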

Speaker 2:

Okay. Now, Brian asked his selfish question. My selfish question is: I've got a collection of crates in this weird JSON Schema universe,

Speaker 1:

where I

Speaker 2:

take, you know, Rust types, generate JSON schema, and then reconstruct Rust types out of the JSON schema. A good case, do you think, for this kind of round tripping?

Speaker 4:

That seems, I think

Speaker 2:

Or too hairy because we're generating Rust types in bizarre ways.

Speaker 4:

Yeah. There needs to be some notion of equality, I think, that you need to define between the existing Rust types and the new ones. One way you could do it, potentially, is by generating random values of those types and defining equality on them. Though I'd have to look at the details to know if this is the right approach.

Speaker 4:

Okay. But, you know, there needs to be some way of comparing equality there. And as long as you can define that, I think you

Speaker 2:

can use this. Gotcha.

Speaker 3:

Cool. Thanks.

Speaker 1:

Yeah. I love the idea of looking for round-tripping: when I send the data through the system, I basically should expect the same thing out. That's a good candidate.

Speaker 3:

Yeah.

Speaker 1:

And then, Greg, it sounds like, from your perspective on using either TLA+ or, again, its derivatives, PlusCal, what have you, it feels like it's a natural fit, obviously, for distributed systems. But what are the kinds of problems that are a particularly good fit for that kind of formalism?

Speaker 3:

So I think anything dealing with concurrency. The techniques here are really, really powerful. TLA+ in particular is a really great set of tools for being able to reason about concurrent systems with lots of things happening in lots of different orders, and the various interleavings that you have between them. Just to give a sort of spitballed number for that, I've got a model up here that I've been working on today that, at its widest point, had, like, 86 distinct steps that it took and 3,252,318 distinct variable values that it iterated over.

Speaker 3:

So lots and lots of different configurations of the system and lots of different things that it can go and do. I do think that it's not limited necessarily to concurrency. One of the nice things about having to specify a model formally is that you have to be very precise and very formal about the things that you're trying to express in the model. You have to make sure, for example... PlusCal will help take care of some of this for you, but in pure TLA+, you have to be very explicit: all of your variables, even if they are not changing, have to be referred to properly in each predicate that you write and in each step that you take.

Speaker 3:

And it becomes apparent to you, I think, as you write more of these things, what assumptions you were making and what things you were baking into the system that you were designing. Oh yeah, I have expressed this in such and such a way that I'm assuming this particular thing is atomic, or that this happens as a transaction, or something like that. And just the act of writing that out, even if you just end up putting a comment there about, oh yeah, we're assuming such and such, and so this condition has to hold in order for this to operate this way.

Speaker 3:

It can be very valuable, I think, to have your thinking clarified by tools that are gonna be strict with you about what things you say to them, what sort of notation you use, and how you've actually gone and expressed everything. So even if you're not dealing with a tremendously concurrent system and you have a more straightforward algorithm that you're working on, it can be very helpful to go through the specification exercise just as a way of being forced to think about your problem very clearly and precisely before you sit down to write any code.

Speaker 1:

That's great. I gotta say, something I love about Rust and the kind of rigor in Rust is that it really attracts people who are attracted to other rigorous approaches. And I don't think it's an accident that Rust is kind of a common thing binding a lot of this. If you're someone who likes the rigor in Rust, you're gonna like the rigor found in these other formalisms as well.

Speaker 1:

And then, especially with proptest, being able to pull that in and really synthesize it, Rain, as you've done, is really great. It allows us not just to generate more reliable, more robust software, but to verify that what we have written is what we intended to write; it verifies the way we're thinking about the system. It's just terrific stuff, honestly. And I also have to say, I love the fact that you're kinda coming from 2 different folkways, if you will, from Microsoft to Facebook, 2 different companies... I think people don't realize the degree to which the hyperscalers, these larger companies, the FAANGs, whatever we call them these days, really are adopting a lot of formalisms, and it's something that Microsoft has really been leading the charge on in a lot of ways. So it's really been fun for me personally to see the way you're using it at Oxide, and it's been super educational.

Speaker 1:

And I need to find my own... it reminds me, Adam, of when I was looking longingly at Rust, trying to find the right problem to use Rust on. I feel like I'm looking for the right problems to use proptest on, or to use TLA+ on.

Speaker 2:

So, Yeah. Totally.

Speaker 1:

Because I think you've not used either of these either.

Speaker 3:

No. I haven't. I'm

Speaker 2:

already thinking about round-tripping my way through my dumb, you know, JSON Schema stuff. But, yeah, definitely looking for the next one.

Speaker 1:

Yeah. And I think, you know, just to channel Greg's younger self being counseled by a more senior engineer at Microsoft to really explore this stuff: if and when you find the opportunity, use these formalisms. I wanna really embrace it, because it's really powerful stuff, and it's just fun to see again what the 2 of you have done. So thank you very much.

Speaker 1:

Greg, Rain, thank you both for joining us. It's been a lot of fun, complete with some audio problems. So just like old times here; it's been great.

Speaker 4:

Thanks for having me.

Speaker 1:

Yeah. Good. Thanks for

Speaker 3:

having me as well. It's a lot of fun.

Speaker 1:

You bet. Keep Oxide and Friends weird, as Matt was saying in the chat. Alrighty. Well, we will see. I don't know if it's gonna be next time, but at some point, Adam, I think we're gonna have to revisit async.

Speaker 1:

I think, Rain, we'd love to have you back for that as well. We've had some async challenges as of late, and I think it may turn into an async therapy session. So maybe we look to an Oxide and Friends coming near you for some async love slash therapy. Alright.

Speaker 1:

Thanks, everyone. See you next time.
