Threads, async/await, Promises, Futures

A problem has been eating at Adam: we use async/await in many languages and yet we're not so good at explaining the moving parts. Bryan and the Oxide Friends therapeutically explore the space.
Speaker 1:

Well, no.

Speaker 2:

I feel that we need to have some esteemed guest who has never been on Twitter Spaces, who has nothing but WhatsApp on the phone, the phone that

Speaker 3:

you, I did wanna start with, it's not a mispronunciation, but I did think,

Speaker 1:

you

Speaker 3:

know, the drinking game for tonight should be: people are welcome to use the word performant, but they will need to define it after they use it. So that's the threat.

Speaker 2:

You're welcome to use the word performant. Adam's gonna make you wear this hat on your head that says in large letters, I think performant

Speaker 4:

is a word.

Speaker 2:

No. And then it's like This

Speaker 3:

is what I think it means. Right?

Speaker 2:

Right. So, you know, I mean, I do not use performant, obviously.

Speaker 3:

No. I mean, I don't know if that's obvious, but I'm relieved. I don't use performant. I try not to use per... I don't use performant except when I'm, like, accidentally mocking someone.

Speaker 2:

And because, do we feel the same way about learnings? Right?

Speaker 3:

Well, so let's see. I also don't use the word learnings. I think we maybe feel differently, but I try to only use the word utilize very narrowly, like, when I'm talking about utilization, like, as a fraction of the whole. But performant in particular, I just feel like people sprinkle around to mean nothing in particular, just sort of goodness, abstractly. Whereas learnings is a different personality defect: referring to things that, you know, have a much less horrible name.

Speaker 2:

I think learnings sounds ridiculous, and I think, I'm sorry. And with no offense intended, I honor those who use it, I guess. But I think it sounds

Speaker 1:

Do you?

Speaker 2:

No. I don't. I think it sounds ridiculous, and I think it is especially ironic that you are speaking to the quality of being educated using a word that sounds so made up. And I accept that, well, do I accept that language is evolving? I am instructed that I have to accept that language is evolving. By whom?

Speaker 2:

Because the person who instructed me of this was you. I instructed you that language is evolving only to make my own point about something. I was only trotting that one out for my own very narrow self-interest. And this I have inherited from my mother. If anyone knows that language should be evolving only at the time that is advantageous to themselves, I learned that from my mother.

Speaker 2:

That's

Speaker 3:

Yeah. That's fine. I mean, it makes sense. That tracks.

Speaker 2:

Right. At all other times, language is actually not, in fact, evolving. So, right. And so we're gonna try to avoid the word performant, or people are going to have to define it. Sure.

Speaker 2:

Okay. We'll just get through all these and then be done with it. We can go on to async and everything else. Leverage, on the other hand: I know people get very upset about leverage as a verb. That one doesn't bother me.

Speaker 3:

I think leverage, like utilize, to me, there are simpler, more precise stand-ins. So I think it just says more about the speaker.

Speaker 2:

Alright then. Okay. Okay. Okay. Alright.

Speaker 2:

So, and I'm sure that we've got other classes of these as well. We wanna get on to the main event. Yeah.

Speaker 3:

That's right.

Speaker 2:

Which is, so, your tweet. It actually reminds me of, you know, Feynman had a line that when I cannot explain something to my freshman physics class, this kind of famous accelerated physics class at Caltech of very bright 18 year olds. And Feynman, I'm sure I'm mutilating the quote, but Feynman conveyed that when I can't explain something to that class, it's because we don't completely understand it. I think he was using that in relation to quantum electrodynamics, if I recall correctly. Which I thought was always an interesting kind of metric for our own understanding of something

Speaker 5:

we were talking

Speaker 3:

a bit about kind of designing products. And we're in a phase now at Oxide where, I think, we're less, or I'm feeling less, in design documents and more, like, how do we explain it to our users?

Speaker 2:

Yeah. Exactly. Yeah.

Speaker 3:

Right. And writing documentation for the, you know, varyingly educated user is always humbling. Because if you realize that it's hard to explain a thing,

Speaker 2:

Totally. It's

Speaker 3:

like, well, maybe you built the wrong thing.

Speaker 2:

Well, and you and I both believe this, and definitely this is a deeply held Oxide value: that engineers should be documenting their own ideas, because in that documentation of your own ideas, you learn how the ideas themselves have flaws. Flaws that won't necessarily be seen by other people, by the way. The subtle flaws. And we kinda got here because you've got a precocious son, who is a model for all children, with his natural experimentation with C++, and you're trying to explain async to him. Is that right?

Speaker 2:

Is that

Speaker 4:

is that a fair assumption?

Speaker 3:

And I should back up for a sec, because we've referred to my sons variously on this program.

Speaker 2:

Wait. Your 5 year old was not, you're not a, No.

Speaker 3:

That's what I mean. That's what I just wanna emphasize. The one we talk about, the reason I need to behave in a hurry, is the 5 year old, not the 16 year old. And these are 2 very, very different humans.

Speaker 2:

Very different.

Speaker 3:

But what's but they do have one unifying characteristic, which is when I say something like, based on my years of experience, the thing that you're about to do is gonna end in disaster. They both have the exact same reaction, which is more or less to tell me to get fucked.

Speaker 2:

Yeah. Yeah. Yeah. Get that. Yeah.

Speaker 2:

Yeah. Totally. Yeah.

Speaker 3:

Exactly. I mean, there's a lot unifying them, a unified front in that regard. But yeah. So, the older boy, learning C++, mostly by himself despite my many offers to help, which

Speaker 2:

is fine. It's good. And that's age appropriate. Age appropriate.

Speaker 3:

That's good. Exactly. I'm not delighted. But there are these rare delightful moments where he comes and asks me a particular thing. Or, you know, I walked into his room, and the book that I discovered that he had bought without my knowledge was one on systems programming.

Speaker 3:

I mean, talk about being a proud father on that one.

Speaker 2:

Absolutely. These are feelings that, actually, I don't know anything about. Because, is now the time to talk about what my parenting experience was over the weekend? I feel like we can't get through this without

Speaker 6:

Oh, I

Speaker 3:

don't know. Absolutely.

Speaker 2:

Okay. So, and I didn't tell him, and he didn't make me promise that I wouldn't put photos on the Internet. But I'm not gonna put photos on the Internet of my 15 year old, who decided that he wanted to get a buzz cut from a friend over the weekend. It's like a dumbass 15 year old decision: in exchange for an Oculus. So a friend offered it. This makes no sense.

Speaker 2:

And it's like, this is who you are. You're a human guinea pig. Like, have some self respect. But he was, like, he gets an Oculus out of it. Join the marine corps. Pretty much.

Speaker 2:

So his friend is giving him a buzz cut, and the clippers break halfway through this haircut. Leaving him with truly, and, Adam, you've seen the photos, truly the worst haircut I've ever seen in my life.

Speaker 3:

I mean, it does look like an industrial accident or

Speaker 1:

And to which

Speaker 2:

my counter is, an industrial accident would have more symmetry than this haircut. That a machine that has whirling blades, and whatever tragedy that this thing would do to you, it would actually have more symmetry than this haircut, which looks like it is a deliberately bad haircut. It's amazing. So while you are, like, trying to explain the subtleties of asynchronous systems to your son, walking him through the beauty of systems design,

Speaker 2:

I am literally, like, trying to help my 15 year old figure out how he's going to, because he's like, I'm just gonna have this cleaned up professionally. I'm like, walk me through that. Are you gonna wear a towel on your head when you go into a professional establishment? Because when you take that towel off your head, everybody is going to stop and laugh very hard at you. So there we go. That's off my chest.

Speaker 3:

Yeah. So, my summer. But, you know, Will came to me and was asking about threads, because the C++ book had come to threads. And, Bryan, I mentioned this to you as well, but we actually were in Providence, so I was able to introduce him to our systems professor, Tom Doeppner.

Speaker 2:

Oh, that's so nice.

Speaker 3:

Who, actually, I'm gonna bring up again later in this space, from one of your collaborations with him, Bryan, years ago. But, so, you know, real interest in systems, asking about threading, and I was talking about blocking IO and threading. And then, one of the things I've been doing a lot of in Rust at Oxide is async programming. So I started trying to explain that, again, to someone who has, you know, dozens of lines of code to their name. Right?

Speaker 3:

Like, not hundreds of lines of code. Like, dozens. So there's not a lot to hold on to. And I realized that, you know, it was tricky to explain. So I started looking through a systems book and a C++ book and a bunch of other places. And everything was sort of like, well, a task is like a thread, but it's different than a thread.

Speaker 3:

And there are these things called futures. And, of course, to practitioners, you know, we get into it, it makes sense eventually. But I found it really

Speaker 2:

To explain threading, or to explain the async?

Speaker 3:

Pardon me. To explain Yeah.

Speaker 2:

I think threading is actually much more intuitive. It's much easier to explain.

Speaker 3:

That, at least in my experience, you know, explaining it to him and seeing his understanding of it, makes sense. Right? I think there are analogies that seem very straightforward. Cooks in the kitchen is my go-to on that one. But, but yeah.
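
For what it's worth, the cooks-in-the-kitchen version of threads really is about as small as the analogy suggests. A minimal sketch in Rust (the `cook_dishes` function and its doubling "recipe" are invented for illustration):

```rust
use std::thread;

// Each "cook" (thread) works on one dish independently; the OS schedules
// them, and join() gives us a well-defined point where each is finished.
fn cook_dishes(dishes: Vec<u32>) -> u32 {
    let cooks: Vec<_> = dishes
        .into_iter()
        .map(|d| thread::spawn(move || d * 2)) // one thread per dish
        .collect();
    // Each thread is at exactly one place at one time, with its own stack,
    // and we simply wait for each one and sum the results.
    cooks.into_iter().map(|c| c.join().unwrap()).sum()
}

fn main() {
    println!("{}", cook_dishes(vec![1, 2, 3])); // 2 + 4 + 6 = 12
}
```

Part of why this is easy to explain is that all the state of each "cook" lives on that thread's own stack, and the blocking `join` maps directly onto the intuition of waiting for someone to finish.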

Speaker 3:

But then, you know, async, and the distinction between them, some of the definitions of these terms just were squirrely. It is.
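
One way to pin down those squirrely terms: a future is, roughly, a state machine that records where it left off, rather than a thread of control with its own stack. A hand-rolled sketch of what an `async fn` with one await point desugars to (names like `Transfer` and `poll_with_chunk` are invented; real Rust futures use `std::task::Poll` and a `Waker`, elided here to keep the sketch small):

```rust
// The enum variant records "where we are"; the data inside each variant is
// exactly the state that must survive across the wait.
enum Transfer {
    Start,
    Waiting { bytes_so_far: usize },
    Done,
}

// Each poll advances the machine a little; None means "still waiting".
fn poll_with_chunk(state: &mut Transfer, chunk: usize, total: usize) -> Option<usize> {
    match state {
        Transfer::Start => {
            // Kick off the "I/O" and record that we're now waiting.
            *state = Transfer::Waiting { bytes_so_far: 0 };
            None
        }
        Transfer::Waiting { bytes_so_far } => {
            *bytes_so_far += chunk; // another chunk of the long-latency operation
            if *bytes_so_far >= total {
                let done = *bytes_so_far;
                *state = Transfer::Done;
                Some(done)
            } else {
                None
            }
        }
        Transfer::Done => None, // polling after completion is a no-op here
    }
}

fn main() {
    let mut state = Transfer::Start;
    let mut result = None;
    while result.is_none() {
        // The caller is free to do other work between polls; the transfer's
        // progress lives entirely in `state`, not on any thread's stack.
        result = poll_with_chunk(&mut state, 4, 10);
    }
    println!("{:?}", result); // Some(12): 0 -> 4 -> 8 -> 12
}
```

That is the crux of the thread/task distinction: a thread's "where am I" is its program counter and stack, while a task's "where am I" is just data in an enum that some executor polls.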

Speaker 2:

Yeah. I mean, it's hard. And also, it's like, you've got synchronous elements in any system and asynchronous elements in any system. And you're not gonna have a system that is purely asynchronous, and, well, you do have systems, I guess, that are purely synchronous. But not every single operation in a system, I don't think, is gonna be asynchronous.

Speaker 2:

You're gonna reserve these for your relatively longer operations. And how do you kinda keep track of the state that you've got while you're waiting for this thing, this longer-latency thing, to happen, for some value of latency? Yeah. Yeah. And so, you know, there are a bunch of places I wanna go here, but one of them, Bryan, and

Speaker 3:

you can take it now or later, is ThreadMon. So, you know, in, I think, pre-Solaris 9, and correct me if I'm wrong, there was a 2 level scheduling model. That is to say, there were kernel threads and there were user threads, multiplexed on top of each other. And I know that async/await is not exactly that, but I think there's a bunch of stuff in common.

Speaker 2:

Yeah. And I think a bunch of that, the multilevel threading, I think, is a historical accident, first and foremost. I think that, like, a lot of that actually comes from people having operating systems, we're talking the late eighties, early nineties, in which multiple threads are not supported. Multiple processes are supported, but threads are not supported.

Speaker 2:

And if you have an operating system that only supports a single thread of control and, you know, you're a wily programmer that is aware of John von Neumann's gift, you can implement threads on top of that single kernel-level entity. And I think that's where a lot of this is born, honestly. Because you have to unpack the arguments for having these different levels of control. And what we had, by the time I walked up to it in the mid nineties, was this very ornate monstrosity where you have many kernel-level scheduled entities, which, in Solaris parlance, were called lightweight processes, LWPs. And then you had some other number of user-level threads, and this is the M-to-N nature: that comes from you have M threads and N LWPs.

Speaker 2:

And, like, why would you do it this way? And I remember as an undergraduate just finding, like, these arguments are not really, like, I don't get them, actually, is what I remember thinking. I'm like, I don't understand them. And I mean, there's a big difference between the M-to-N thread scheduling model and these kind of pure async systems and async/await and so on, so I don't wanna cover them all with the same brush. But the arguments around the multilevel threading were that it would make synchronization primitives much lighter, and that it would be faster.

Speaker 2:

And I'm like, not always. And then there were also these ideas that you could create millions of threads. Well, I'm like, alright. Well, I'm gonna go create a million threads. And, of course, you cannot create anywhere close to a million threads.

Speaker 2:

It's like the whole system, kernel included, would grind to a halt when you created, like, on the order of low thousands of threads. And it was very clear that the arguments just were not supported. They were not evidence-based arguments. They were not data-based arguments.

Speaker 2:

It was kind of, it was Conway's law. It was a lot of things going on. So, yeah, that was my, you know, early intro to some level of asynchrony. But, you know, I'm not sure how much of that carries over. How much of the, so, yeah.

Speaker 2:

Well, you know, one of

Speaker 3:

the things that it got me thinking. Well, first, I started this, you know, kind of thought experiment of, like, how would I design an operating system that was optimized around async/await? Like, what would that look like? And I started thinking, like, is that that different from threads, or from these 2 level models? And, you know, I also got to thinking back to the 2 level scheduling model. There were these problems, right, that came up from multiple schedulers unaware of each other, or only tenuously aware of each other, where, you know, one would be trying to assert affinity that the other was breaking. So, for example, in those, you know, Solaris bad old days, you'd have the kernel working really hard to keep the schedulable entity it knew about, the LWP, on a particular socket, back in those days.

Speaker 3:

And the user-level scheduler might be swapping around these different contexts. So, like, you know, any locality or affinity was lost.

Speaker 2:

Absolutely. And we saw that. And you also had this problem of, and actually, in the bad old days of Solaris, it was an even more pernicious problem. There was a single scheduler lock that was held at user level. And that lock, because it didn't have global system visibility and it couldn't pull some of the tricks of the kernel, you had certain operations that simply were never gonna scale at user level, because it was really hard to implement the system with that kind of limited visibility.

Speaker 2:

And then you're operating, as you say, at cross purposes with the system beneath you. So do you think that that is a more general theme, though?

Speaker 3:

I mean, it's, like, I just haven't seen that addressed. Right? I think the bigger question that I kept on running into is, I don't think anyone's saying that async/await is always better than threads, or that threads are always better than async/await. I don't think people are making that argument. Although, I guess, in JavaScript, you don't have a lot of options. Like, you don't have multiple threads.

Speaker 3:

Yeah. So then, you know, like, I haven't seen folks talking about when one approach is more appropriate, or how to examine a problem to infer whether async/await or threads is going to be a better solution. And then I also haven't seen, you know, critiques of async/await that say, you know, you're losing some of these attributes. Right? The kernel would normally work hard to give you this locality, and you're losing that.

Speaker 3:

And so if your problem has a regular shape, you might not wanna do that.

Speaker 2:

Yeah. And what do you think about comprehensibility, too? I mean, and this is where

Speaker 3:

my gosh. Yes. I mean, this is a place where async I mean, now I'm talking a little bit more about Rust async.

Speaker 2:

Yeah.

Speaker 3:

But one of the great things about threads is I can walk up to a process, and I can say, what are all the threads that you have, and where are they? Yeah. And this also got me thinking about the definition of a thread versus a task. But one of the neat things about a thread is that it's only at one place at one time.

Speaker 2:

Yes. And and that sort

Speaker 3:

of that may seem obvious that it

Speaker 2:

This is really, really important. And Cliff's got a great line about this: the program counter is a very important piece of state. And that's exactly it. You can only be in one spot at one time, and so all of your threads of control can only be in one spot at one time. And we use that all the time in the Hubris implementation.

Speaker 2:

Because, you know, like, we implement asynchrony in Hubris by having a series of, actually, in fact, purely synchronous tasks. Each task itself is purely synchronous. You don't have multiple threads of control in that memory-protected region, but the tasks are asynchronous with respect to one another.
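
As a rough std-Rust illustration of that structure (emphatically not Hubris's actual API: here std threads stand in for memory-protected tasks, an mpsc channel stands in for the kernel's synchronous message passing, and `run_client_server` is invented):

```rust
use std::sync::mpsc;
use std::thread;

// Two "tasks", each a straight-line synchronous program internally, that are
// asynchronous only with respect to each other: they interact purely by
// sending and receiving messages.
fn run_client_server() -> Vec<u32> {
    let (req_tx, req_rx) = mpsc::channel::<u32>();
    let (resp_tx, resp_rx) = mpsc::channel::<u32>();

    // The "server" task: one thread of control, one program counter,
    // blocking on receive, computing, replying. Purely synchronous inside.
    let server = thread::spawn(move || {
        for n in req_rx {
            resp_tx.send(n + 1).unwrap();
        }
    });

    // The "client" task: send a request, then block for the reply.
    let mut replies = Vec::new();
    for n in [10, 20, 30] {
        req_tx.send(n).unwrap();
        replies.push(resp_rx.recv().unwrap());
    }
    drop(req_tx); // hang up; the server's receive loop ends
    server.join().unwrap();
    replies
}

fn main() {
    println!("{:?}", run_client_server()); // [11, 21, 31]
}
```

The debuggability point carries over directly: each of these tasks is always at exactly one place, so "where is everything and what is it waiting on" has a crisp answer.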

Speaker 5:

It may be interesting to note that this is true in almost every operating system. Right? I mean, UNIX has been multithreaded since UNIX 4th Edition. And what I mean by that is that every process, even in ancient versions of UNIX that didn't support user-space-visible threading, was backed by a thread of control in the kernel.

Speaker 2:

Right.

Speaker 5:

And so when you would trap into the kernel, you were automatically running in this multithreaded context. Much of this discussion comes from the fact that the UNIX design specifically was meant to isolate asynchrony. Doug McIlroy put it explicitly. You know, IO was not asynchronous as it was in many other contemporary operating systems at the time. So Adam said, hey, what would an operating system look like if it was designed for, basically, async/await?

Speaker 5:

The answer is basically VAX/VMS and then Windows NT. Because VAX/VMS specifically had queued IO as a sort of native primitive, and then they had this notion of an asynchronous software trap. So you would queue an IO. When the IO completed, your process would get a trap. And effectively, everything was like you're writing async/await code.

Speaker 2:

Yeah. And IO completion ports are kind of the thing that VMS-slash-NT does, like, pretty well, actually. They've got good abstractions to that. Yeah. Yeah.

Speaker 2:

It's a good point. And, Dan, very good point about the history of Unix. And I feel that, like, every Unix goes through this. Certainly, I can say this happened for us for sure, and this obviously happened in Linux, where someone says, like, hey. Wait a minute.

Speaker 2:

So I've been told that I need to implement threads, but what are threads if they aren't just processes that share memory? And you end up with this kind of threading implementation where they are actually processes that are just now gonna share memory, kind of an rfork. IRIX's old rfork. I guess it's a Plan 9, isn't it?

Speaker 5:

They were

Speaker 2:

Plan 9. Yeah. Right. Yeah.

Speaker 2:

Yeah. And that ends up having its own baggage associated with it. I mean, I think it's like the thing that Unix was, to a certain degree, denying. And I think, Adam, part of the quest is that we do have to acknowledge that user-level programs have a right to their own internal parallelism, their internal concurrency. And, you know, the traditional UNIX model really didn't allow that at all. Like, that concurrency is all external.

Speaker 5:

Exactly. Actually, that is fundamentally one of the big reasons that this always breaks down in the UNIX model. You have these M-to-N threading models, but then you have a user-space thread which, oh, golly, executes a blocking system call. Well, okay.

Speaker 5:

Now one piece of your parallelism has basically gone away. Yeah. Right?

Speaker 2:

Right. Totally. Totally. And this gets to, so, do you know how, oh, god. This is so gross.

Speaker 2:

This is also disgusting. I wish I could purge it from my memory and retain more important facts about life. But do you know how Solaris dealt with this? Like, how do you deal with this? Just as you say, Dan, you've got a blocking system call; it takes that LWP out, effectively.

Speaker 2:

So now this user-level thread, and in this kind of Fantasia, I have millions of user-level threads, and one of these hordes has taken this valuable LWP and is now blocked in the kernel. And now, you know, we have kind of an N minus one of these things. Do you know how Solaris would deal with the last one?

Speaker 5:

I'm just gonna guess, but it sends a POSIX signal to

Speaker 2:

the thread

Speaker 3:

that would Ding

Speaker 2:

ding ding ding ding ding. We have a winner, or we're all losers, depending on how you wanna interpret that.

Speaker 1:

But yes. Exactly.

Speaker 5:

You know, like, again, I think you go back to the deficiencies in the UNIX design. And at the time, these were not considered deficiencies. I don't mean to paint them in a light that is inappropriate, but they purposely decided, we will not expose asynchronous IO operations or

Speaker 2:

any you know,

Speaker 5:

you're gonna block, period, end of story. And they just didn't expose any sort of mechanism where you could do anything other than send a signal to the thread.

Speaker 2:

Right. And so that's what I mean. So it was SIGWAITING. And, actually, Adam, I don't know if you knew that ThreadMon actually had the ability to drop a SIGWAITING on the target process, which is, like, totally violating the prime directive of the debugger. And it would actually, like, generate LWPs. It's pretty silly. Because, I mean, the problem with overloading that mechanism is there was nothing to prevent you from dropping a SIGWAITING signal on a process.

Speaker 2:

It's like, here, have this message from the kernel that says you should go

Speaker 5:

create a

Speaker 2:

new LWP. It's like, oh, okay. What's going on? Okay. I guess I'll make an LWP.

Speaker 2:

Like, nothing to do, but you're right.

Speaker 3:

But yeah. Right. There there is

Speaker 5:

a place where this mechanism lives on today. And actually, can you guess what it is?

Speaker 2:

Uh-oh. No. Where?

Speaker 5:

The Go runtime.

Speaker 2:

Oh, this is where okay. The Go runtime does. If you execute

Speaker 5:

a system call, that's shunted over onto a special thread. They have

Speaker 2:

Yeah.

Speaker 5:

They keep these running threads to do system calls. But, you know, like, there are some system calls where you don't know whether it's gonna block or not. Yeah. Right? And what they do is they set a timer, and if that timer expires, then you send a signal to the

Speaker 3:

thread. Oh,

Speaker 4:

jeez. That's

Speaker 2:

right. Man. And then, you know, even Rust async has, like, similar pathologies. Right? Like, it

Speaker 3:

it does a pretty good job, or a good job, of making sure that it's calling non-blocking IO. But if you call some blocking thing, then, again, one Nth of your thread pool is dedicated to waiting.
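
To make that failure mode concrete, here is a deliberately minimal, hand-rolled sketch, std only, not Tokio (`run_both`, `demo`, and the 50 ms "blocking I/O" are all invented for illustration), of why a blocking call inside an async task starves a single-threaded executor:

```rust
use std::cell::Cell;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::task::{Context, RawWaker, RawWakerVTable, Waker};
use std::thread;
use std::time::{Duration, Instant};

// A waker that does nothing: fine for a busy-polling toy executor.
fn noop_waker() -> Waker {
    fn clone(_: *const ()) -> RawWaker {
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}

// The smallest possible single-threaded "executor": round-robin two futures.
fn run_both(
    mut a: Pin<Box<dyn Future<Output = ()>>>,
    mut b: Pin<Box<dyn Future<Output = ()>>>,
) {
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    let (mut a_done, mut b_done) = (false, false);
    while !a_done || !b_done {
        if !a_done {
            a_done = a.as_mut().poll(&mut cx).is_ready();
        }
        if !b_done {
            b_done = b.as_mut().poll(&mut cx).is_ready();
        }
    }
}

// Returns how long future B waited before it first got to run.
fn demo() -> Duration {
    let start = Instant::now();
    let b_ran_at = Rc::new(Cell::new(Duration::ZERO));
    let recorder = Rc::clone(&b_ran_at);
    run_both(
        // A: a blocking call inside an async task. The executor's one thread
        // is stuck inside this poll, so B cannot be polled meanwhile.
        Box::pin(async {
            thread::sleep(Duration::from_millis(50));
        }),
        // B: needs no I/O at all; it just records when it actually ran.
        Box::pin(async move {
            recorder.set(start.elapsed());
        }),
    );
    b_ran_at.get()
}

fn main() {
    // B was runnable the whole time, but only ran after A's blocking sleep.
    println!("B first ran after {:?}", demo());
}
```

Real runtimes mitigate this with multiple worker threads and escape hatches like Tokio's `spawn_blocking`, but the underlying hazard is the same: poll must return quickly, and nothing in the type system forces it to.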

Speaker 5:

Well, I mean, again, I go back to some deficiencies in the UNIX system interface. Like, you really, truly do not know when things are gonna block. I mean, so for example, the open system call takes basically three arguments. Right? There's a path name, which is a string, which you specify as a pointer; it takes some flags; and then it potentially takes, like, a permission mode or something like that if you're creating a new file.

Speaker 5:

However, it is always synchronous with respect to resolving the file name into a file descriptor. And so if the file name argument exists in memory that's been paged out, because it was part of the read-only text segment or read-only data segment or something like that, or if the kernel has to walk through the directory components and those aren't in the directory cache, then you're potentially blocking many, many times.

Speaker 7:

Yeah. Right?

Speaker 3:

Yeah. But if that's some read-only data that's paged out, and you have to fetch it over NFS or whatever, yeah. Like, that's a long

Speaker 4:

way away.

Speaker 3:

Yeah. I mean, so, you know,

Speaker 5:

just the things that we don't necessarily think of as being blocking can really block these days.

Speaker 2:

Well, yeah. I mean, it's basically the halting problem. Right? The blocking problem basically is the halting problem at some level. I mean, obviously, they're not the same, but it's very hard to reason about, especially across a system call boundary where you really don't have any idea. And then, also, a very subtle change in implementation can completely change behavior.

Speaker 2:

And I think that this gets to something that I am not in love with about async systems, and, Adam, I'd love your take on this. One of the challenges in async systems is they work well until they don't. And when they don't, you get this nonlinearity of response time where, all of a sudden, you hit queuing delays and response time just explodes. And because the queues in an all-async system are often hidden, they're architecturally hidden, it can be really hard to reason about why the hell are we not making forward progress. Who is responsible for progress right now, and where the hell are you? It's something that I find can be really frustrating in an async system.

Speaker 3:

No. Absolutely. And I think that debugging these systems, both in those kinds of specific cases but also generally, is hard. And, you know, this is speaking for now about Rust, which is where I've been spending most of my time. I'm a little disappointed by the amount of effort put into making these systems debuggable before shipping them.

Speaker 3:

And also even making them comprehensible. I'll tell you a point of confusion that I ran into, which was when we were building the DTrace, like, the USDT, interface for Rust. So Ben Naecker and I were working on this. And one of the things that is very common in DTrace is to key data based on the current thread. But most of the stuff we were looking at was async.

Speaker 3:

So like the thread didn't really matter if you use the current Yeah.

Speaker 2:

If you

Speaker 3:

use the current thread, it's almost as good as random. Right? Like, because the current thread will switch next time you're on the CPU. So I spent a bunch of time trying to negotiate with Tokio, not, like, the community, just the source code, to try

Speaker 2:

to infer, like, a task ID. Because surely,

Speaker 3:

I thought there must be this task ID. But now I've realized that even if there were a task ID that was readily accessible, and they take the reasonable position of not making that accessible, even if it were accessible, it would still be wrong, because a future can effectively be in multiple places at the same time. So using the task ID would be sort of meaningless in a way that having a thread ID is very meaningful.

Speaker 2:

Right. Because you've got no way, and because there's another really key bit that now no longer makes sense, and that's the stack backtrace. I mean, it is actually really nice. A stack backtrace is great. Glory be to the stack backtrace, which tells you a lot of contextual information, from an understandability perspective, of, like, where am I, and how did I get here?

Speaker 2:

And that, I think, is one of the big challenges. By the way, I mean, I definitely agree with you that there's a lot more to be done for the debuggability of async/await in Rust, but, oh my god, I mean, this was Node with futures. Futures were completely undebuggable. In particular, you could have something that's executed in the future that actually hits, like, a typo. Like, if the object doesn't exist, undefined, you have an exception, and it is absolute bedlam trying to correlate that back to what actually induced it.

Speaker 2:

This is ultimately what induced the crack-up between Joyent and Node.js. Like, this is the reason we got divorced. It's over that exact issue. I mean, of course, as with many divorces, it's like, that was the

Speaker 3:

it's complicated.

Speaker 2:

It's complicated. Right? It's like, it's not actually just about this act of infidelity. It's about infidelity more broadly and what that represents, and, you know?

Speaker 3:

Yeah. But so I remember that you guys went your own path and were sticking with callbacks. But, presumably, those have some of the same attributes. Right? I mean, sure, futures, Oh, yep.

Speaker 3:

Complicate the matter, but just, like, it doesn't get you out of some of the challenges of the intrinsic asynchrony. You're not gonna

Speaker 5:

get a good back trace out of that stuff.

Speaker 2:

You're not gonna get a good back trace. We got better on some of that stuff. But yeah. No. It's still I mean, there's a look.

Speaker 2:

There's a reason that we're in Rust. And I think that part of the real challenge with Node is you are combining all of the laxity of JavaScript with this very kind of potent weapon in terms of actual asynchrony, and the results are kinda predictable. I mean, you're definitely dropping handguns off at the preschool, and then, you know, you come back a couple hours later and it's a mess. So, yes, that did not work out well for us. Yeah. So the other aspect of kind of asynchrony, though, that was ultimately much more gutting, and I don't know how this kinda relates. Because I know that, like, look.

Speaker 2:

I know Alex Wilson was threatening to Erlang-bomb this Twitter Space, and in fact, there may be plenty of you who like it. Have you used Erlang at all, Adam?

Speaker 3:

No. Never. Have you? Is that something you've you've kinda used in anger?

Speaker 2:

I would say it used me in anger. So I have not ever really cut Erlang code for a living, but I was deploying a reasonably large Erlang system, RabbitMQ. And when that thing would misbehave, it would go into what we called the rabbit hole. And the rabbit hole was undebuggable, undiagnosable. It was very hard to determine.

Speaker 2:

And I think that one of the problems with kind of the church of Erlang, if you will, is there's this received smugness that the Erlang model, and this kind of actor model that actually has a lot to be said for it, yields a system that is so robust that we tautologically do not need to debug it. And that was my kind of experience there. I'm sorry to be so bitter about the Erlang experience. I may have had some production outages that were really, really, really painful.

Speaker 7:

Yeah. I mean, I had similar problems with running RabbitMQ in production. It was definitely challenging, when we did run into problems, to be able to work out exactly how to, like, pull ourselves out of them. As well as just feeling like we had competence and understanding of how the thing worked under the hood. We just did not have that knowledge to be able to feel confident in, like, moving forward with the debugging process and being able to understand what was happening so that we could find a way out of it.

Speaker 7:

The solution for us was to move away from that as a technology.

Speaker 2:

Yeah. And I don't know. And I'm Yeah. We can hear you. Yes.

Speaker 1:

And I

Speaker 2:

know you've got some thoughts on on your way, so hop in here. Yeah. I I was just gonna I was just gonna mention the call. Yeah. I it is very important to

Speaker 3:

Sorry. You just faded to nothing. Like, we heard you just the you,

Speaker 2:

and then it faded.

Speaker 6:

Can you hear me?

Speaker 3:

Yes.

Speaker 6:

Okay. Better now. Okay. I was good. Okay.

Speaker 6:

I think it's very unfortunate that a lot of people's first experience with Erlang is deploying RabbitMQ because that is also my worst experience with Erlang, and I've had very good experiences with Erlang.

Speaker 2:

Yeah. There you go. I totally okay. Yeah. Absolutely.

Speaker 2:

I mean, for whatever it's worth, I do feel, even at the time, I'm like, I know that this and actually, I don't think it was a Rabbit issue. I think I had a node-amqp issue. The problem is the system didn't help me debug it. So, yeah, go on. Talk to us about how you actually implement large systems in Erlang.

Speaker 6:

Honestly, I'm not the best expert to talk about this, but what I can tell you is one of the things you brought up earlier about, like, thousands to millions of threads. Erlang runs on a VM that is designed to be able to handle that kind of situation. It calls them processes, not threads. And it is very pervasive in all Erlang code that you write your code as processes.

Speaker 6:

Processes are isolated. You do very small things with them. You might write a process in Erlang where you would normally write a function in other languages.

Speaker 2:

Right. And a process is strictly synchronous, right? Everything a process is gonna do is going to be effectively synchronous with respect to itself. Is that correct?

Speaker 6:

Essentially, yes. And then messages are, I hope I'm remembering it right, like, it's been a few months. But yeah. Yeah.

Speaker 6:

Essentially, yes. And then everything you do between different processes is all through message passing. There's no other way to accomplish anything.
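
A rough sketch of the shape being described, isolated "processes" that share nothing and interact only through message passing, rendered here in Python with threads and queues purely as an illustration (Erlang's BEAM processes are far lighter than OS threads, and all names here are hypothetical):

```python
# Actor-style message passing: each actor owns its state and a mailbox;
# the mailbox is the only way to interact with it.
import queue
import threading

class Actor:
    def __init__(self, handler):
        self.mailbox = queue.Queue()   # messages are the only way in
        self._thread = threading.Thread(
            target=self._run, args=(handler,), daemon=True)
        self._thread.start()

    def _run(self, handler):
        while True:
            msg = self.mailbox.get()
            if msg is None:            # shutdown sentinel
                return
            handler(msg)

    def send(self, msg):
        self.mailbox.put(msg)

replies = queue.Queue()
# Where another language would use a function, write a tiny process:
doubler = Actor(lambda n: replies.put(n * 2))
doubler.send(21)
answer = replies.get(timeout=1)
doubler.send(None)                     # shut the actor down
assert answer == 42
```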

Speaker 2:

Okay. So let me ask you this, because then you end up with these kind of millions of entities, and I totally agree that the BEAM supports this well. Have you created a system that is almost biological in nature, though, in terms of its emergent behavior?

Speaker 6:

And That is so easy to accidentally do. Yes. Even in small projects.

Speaker 2:

Yeah. Interesting. Well, and pretty good, because I feel like, you know, what is the system's reaction to constrained resources? And when that reaction is, I'm going to create more resources, it's like, I don't know what else to do.

Speaker 2:

I'm gonna create, you know, more processes, more tasks, more actors, more entities, more computational entities for this. You're down the path to ruin, I think.

Speaker 6:

I can't disagree with you there.

Speaker 2:

And so that's it. But I think there's a lot to be said for it, because the flip side is, like, really powerful things do fall out. Right? Because, like, Riak was a system that was able to be much more easily distributed because it was in Erlang. I think that's a fair statement, and we used Riak too, and Riak was nowhere near as dramatic as Rabbit.

Speaker 2:

Also, actually, the other thing, Ian, I don't know, for those of you, it sounds like you deployed Rabbit. Let me just say, as a quick aside, the other thing that drove me nuts about Rabbit: you would have a problem with Rabbit, and you can't figure out what's going on inside of Rabbit, because there are only, like, three tools, and you've already run them all.

Speaker 2:

It's like, yes, I have no queue depth. I know that my queues are all, like, zero. That's not the issue. Like, there's an issue internal to Rabbit, and you're trying to Google the symptoms that you're seeing, effectively.

Speaker 2:

And all the newsgroups that discuss Rabbit feature very prominently how reliable RabbitMQ is. So, I mean, if you Google RabbitMQ, it's like, you know, RabbitMQ is messaging that just works. And it's like, you are not messaging that just works. That's why I'm here. If you were messaging that just works, I wouldn't be googling you right now.

Speaker 2:

You don't I don't know. I don't know. You had this happen.

Speaker 3:

Yeah. I mean, you're kind of getting gaslit by the I'm Feeling Lucky Google search there.

Speaker 2:

Totally. And you're just like, I really need could we just be not quite so smug for a second, because it's not working? And it's like, I've got an outage right now that I'm trying to debug.

Speaker 3:

Yeah. Like, let let's get past the marketing material and into the debugging guide such as it is.

Speaker 2:

Yeah. And if if you've gotten sorry. Go ahead.

Speaker 6:

I'm sorry. I just remembered, earlier, on debugging. So I've said this on Twitter while I was having issues, but the first time I understood async/await was actually when I used the model that the Zig programming language has.

Speaker 2:

So that's pretty interesting. Could you elaborate on that? I thought it was a very interesting comment but I didn't Yep.

Speaker 6:

Yeah. So it doesn't do futures. It doesn't do promises. It doesn't do anything like that. When you do an async call on a function, what you get back from that is a stack frame.

Speaker 6:

You save that stack frame, and later, when you want the value of whatever you had executed, you await that stack frame, and that's what turns into the value. It's essentially the same idea as a future or promise, but it's explicit about what's under the hood.
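
That "call hands you a frame, await drives it" model maps loosely onto Python generators, where calling a generator function gives back a suspended frame object; a hypothetical sketch, not Zig itself:

```python
# A generator object is literally a suspended frame you hold onto and
# resume later, which is the shape the Zig model makes explicit.
def fetch_value():
    # Suspend here; nothing runs until someone resumes this frame.
    yield "suspended"
    return 42  # the value produced once the frame is driven to completion

frame = fetch_value()   # "async call": grab the frame, run nothing yet
state = next(frame)     # resume up to the first suspension point
assert state == "suspended"

def await_frame(frame):
    # "await": drive the frame to completion and pull out its value.
    try:
        while True:
            next(frame)
    except StopIteration as done:
        return done.value

value = await_frame(frame)
assert value == 42
```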

Speaker 2:

Okay. So that is really interesting, because that is basically taking one layer of magic away. That's saying, like, look, I'm gonna make this marginally less magical for you, but then, by putting you kind of explicitly in charge of it, you're gonna have a better idea of where these frames are and when they get

Speaker 6:

re-executed. And that is exactly that is very much the Zig philosophy. Everything Zig does is kind of in that realm. It gives you control. It takes away a layer of magic so you can understand what is actually happening.

Speaker 6:

And then another thing about that is, by default, when you run a program in Zig and you're using async/await, if you don't use the evented I/O system, it's all single threaded. You can use async and await, but you only have one thread of execution. If you do something that's a blocking call, I believe under the hood it will go and execute async things that you haven't awaited on yet. But normally, what happens is whatever async call you do is suspended until you actually await on it again. So it's essentially lazy evaluation.

Speaker 6:

Or you can use that to basically also do coroutines. Under the hood, it also has suspend and resume, which are actually used to implement async/await, and which are essentially the same thing as coroutines would give you.

Speaker 2:

Interesting. Of course, insert the path of ruin that Dan described earlier, of not being able to reason about those things, especially the calls into the kernel that actually block. But

Speaker 6:

Yeah. But here's the thing. Because it's stack frames, and because of how Zig handles some of its debugging stuff, when you break on something and you look at a stack trace, you can actually get much more of an idea. Or if it breaks and dumps to the console, you actually still do get an idea of what's going on, even with the async and await and the coroutines in there.

Speaker 2:

That's interesting. And is debuggability kind of one of their design centers for this?

Speaker 6:

They do try to make things very debuggable. Yes. I mean, the whole idea is it's supposed to be a replacement for C that doesn't have C's pitfalls, in a similar way to how Rust kinda replaces C++. So a lot of it is about making sure you don't do the wrong thing, and that you're able to debug when something does go wrong.

Speaker 2:

Yeah. That is really interesting. Hey, quick aside: has Twitter reduced the number of speakers we can have, Adam?

Speaker 3:

Oh, really? Can we only can we only have a few now?

Speaker 2:

Do we So, Jimmy, I'm trying to approve you.

Speaker 3:

Yeah. Yeah. I I was able to approve him.

Speaker 2:

Okay. That's good. Alright. Yeah.

Speaker 3:

I don't know. It still says five spots open, but now there's an error adding Jimmy. Yeah. Let's try again.

Speaker 2:

What secret console are you on? Are you, like, on the Twitter control plane? Where are you seeing this?

Speaker 3:

Yeah. Yeah. You don't have the control. No. It's, like, just on the

Speaker 6:

I have to wonder if Twitter thinks there's, like, 5 of me on or something.

Speaker 2:

No. No. No. I think we always see we'll see one of you. I think it's just that I can't see you.

Speaker 3:

Alright. Yeah. Yeah.

Speaker 1:

Sorry, Jimmy.

Speaker 3:

I don't know if you're, like, home. You got your,

Speaker 2:

you got your hand up. I'm sorry, Adam.

Speaker 1:

Oh, I

Speaker 3:

was just saying I don't I

Speaker 2:

don't know.

Speaker 3:

Jimmy, if you're calling in from your desktop or something, but I don't know. We're unable to approve you. Sorry. Go ahead, Matt.

Speaker 8:

Yeah. So, getting back to the topic of how to introduce async to a bright but inexperienced programmer like your son, and at the risk of getting off onto a bit of a tangent about my misadventures as a young programmer discovering async, oh, about 21 years ago. Well, first of all, I wonder if it might help to start by, instead of working at a high level of abstraction where you look at tasks and futures or promises, maybe start with a callback-based approach like, like old-school Node, so he can understand a little more of what's actually going on. And now, of course, if you wanna really go down to a low level, you would write your own event loop using select or what have you. But, so that
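
The low-level version mentioned here, an event loop built directly on select, can be made concrete in Python; the toy loop below just echoes data back in upper case, and uses socketpair so it needs no real network (all names are illustrative):

```python
# The skeleton of a select()-based event loop: the "what's actually
# going on" underneath callback frameworks like old-school Node.
import select
import socket

def echo_once(server_sockets):
    # select() blocks until at least one socket is readable; this single
    # call is the multiplexing heart of every callback framework.
    readable, _, _ = select.select(server_sockets, [], [], 1.0)
    for sock in readable:
        data = sock.recv(1024)        # won't block: select said ready
        sock.sendall(data.upper())    # the "callback": handle the event

# socketpair() gives us connected sockets without real networking.
a_client, a_server = socket.socketpair()
b_client, b_server = socket.socketpair()
a_client.sendall(b"hello")
b_client.sendall(b"world")
echo_once([a_server, b_server])       # one turn of the loop serves both

reply_a = a_client.recv(1024)
reply_b = b_client.recv(1024)
assert reply_a == b"HELLO" and reply_b == b"WORLD"
```

A real loop would run `echo_once` forever and also register callbacks for writability and timers, which is essentially what frameworks like Twisted wrap up.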

Speaker 2:

is an interesting approach. That does require, though, and I always felt that JavaScript is not really taught very well, because they don't get into closures early enough. That requires you to really teach about closures pretty early, and about how state, like, where is this variable's state coming from, and what does it mean when you modify this variable's state?

Speaker 8:

Well, when I first stumbled into async, in 2001, it was in Python, and I briefly looked at the what?

Speaker 2:

Is this Twisted Python?

Speaker 8:

Yes.

Speaker 2:

Yes. God. So I've never have you ever used Twisted? No. I only know Twisted from the wounded coming back from the front.

Speaker 3:

Okay. What what is what is twisted Python?

Speaker 2:

Go ahead, Matt. You wanna explain what twisted is?

Speaker 8:

Sure. So Twisted is this whole async framework for Python, and they've got implementations of several protocols: HTTP, of course. They also have, like, a DNS client and server, and implementations in varying levels of maturity of several other protocols. But it's basically a, you know, an event loop framework for Python, mainly for writing network servers and clients, although they have integration with GUI toolkits as well. Bryan, you've got me curious now about your tales from the wounded.

Speaker 2:

And maybe I'm just, like, self-selecting for the wounded by getting involved in Node in 2010, but everybody coming to Node in 2009, 2010, especially 2010, was coming from Twisted and from EventMachine in Ruby, and just having tried to build systems in Twisted that became, like, absolute monstrosities that they could not reason about, with emergent behavior in production that was just incomprehensible.

Speaker 8:

Oh, well, I could tell you a story about some of my own emergent behavior back in 2001. And, in particular, I was just thinking about you talked about accidental blocking earlier. And I had to deal with an elusive production bug in my first Twisted-based server, because it was spawning a process and communicating with it using pipes, and there was a bug in Twisted where I think they forgot to actually set one of the pipes into non-blocking mode. So it was hanging the one and only thread.

Speaker 3:

Whoops.

Speaker 8:

So, yeah, like I said, I started on this when I was about 20. So a few years older than Will, and I had possibly been programming for much longer than he has so far at that point. But still, writing servers was new to me. And I think my very first serious attempt at writing a server was so, back then, I was really into the SHOUTcast protocol for, like, Internet radio stations, you know, like streaming audio.

Speaker 8:

Well, there's two sides of it. There's the sending side, where someone running Winamp with the SHOUTcast plug-in is sending an audio stream to the server, and that's basically just, you know, send, like, a pseudo-HTTP. No. No. No.

Speaker 8:

It's a custom TCP protocol, and I don't wanna get too off into the weeds here, but, you know, send, like, metadata about your stream and then just start sending MP3

Speaker 2:

on And do I understand correctly that the shout in SHOUTcast is actually all caps?

Speaker 8:

Yes. Yes.

Speaker 2:

The shout is actually shouted?

Speaker 8:

Yes. And fun fact: the mixed-case processing heuristic that some screen readers use pronounces that as "shoo tcast," because it's S-H-O-U, and then it thinks that the second word is capital-T "cast."

Speaker 1:

Right.

Speaker 3:

So

Speaker 2:

Yeah. Maybe that's the best decision to make yeah. Like, interesting. Well, that's interesting in terms of, like, you were getting kinda started in programming, writing servers for stuff that you wanted to go do. Is this what

Speaker 8:

you mean?

Speaker 2:

The the official

Speaker 8:

I mean, SHOUTcast, of course, had an official server, but there was this community-based Internet radio station that I was involved with. And we didn't like the buffering behavior of the official SHOUTcast server. So I thought, I'm gonna write my own. And go ahead.

Speaker 2:

Well, I was just gonna say, it is kind of interesting, because I do feel that your introduction to async is when you have these extremely long latency events. And the longer the latency of the event, and the more of them you wanna do concurrently, the more likely you are to appreciate what asynchronous operations could do. So, Adam, I guess one question would be, like, are there things that the youth do today? Like, when the youth are not actually getting buzz cuts from their friends wait. How does the other half live?

Speaker 2:

How does the productive youth live?

Speaker 3:

You know, he's mostly doing stuff with games, so I don't know that there is, like, a really strong motivating use case. But I do think, Matt, you're right that actually, I mean, I think even my own use of async, like, the fact that I can use async productively all day, you know, just empirically, without being able to rattle off a useful definition, I think is a testament to what you're describing. But it also just made me uncomfortable to think, you know, sure, I can show you how to do it. I can even show you why it's better than other approaches. And just disappointed that it's sort of not well documented and not well understood, and maybe not even covered in systems textbooks.

Speaker 3:

I think this is a place where the practice has outrun academia, at least in terms of coverage.

Speaker 8:

I wanna see if I can articulate why I thought that async was worthwhile for this project, because my async implementation was actually my second iteration of that server. My first, very naive implementation of a SHOUTcast-compatible server was multithreaded. So, like, a thread for the source connection, that's the client that's sending the audio stream, and then a thread per listener. And the thread per listener was just kind of sitting there in a read call, waiting for the listener to disconnect, after it had received the, like, pseudo-HTTP request and sent the pseudo-HTTP headers. And then the thread for the source would, like, it would receive some data from the source, and then this is where the very naive part came in.

Speaker 8:

It would go and sequentially make blocking write calls to all the listeners, in sequence, to send out the audio. I said sequentially, didn't I? And what we found when my friends and I were testing out this server was that if someone who had, like, a flaky dial-up connection, because this was a thing in 2001, would tune in to the audio stream, then they would kind of mess it up for everyone else, because, you know, blocking write syscalls in sequence to send the audio out to all the listeners. So I thought, okay.

Speaker 8:

I need to do something different here. And then combine that with the fact that my second iteration of the server was going to be more complicated, because in addition to just replicating what SHOUTcast itself did, but with no upfront buffer, because we wanted lower latency, it was going to, like, send the stream off to an instance of the LAME MP3 encoder. That's where our pipes to another process came in, so that we could re-encode it down to a lower bitrate for those pesky dial-up users. And so I thought I'd

Speaker 7:

be curious: you're using LAME as

Speaker 2:

a proper noun, not an adjective there. Right.

Speaker 4:

Right.

Speaker 6:

Right.

Speaker 8:

Yeah. Yeah. The most popular MP3 encoder, then and now, is called LAME.

Speaker 2:

Yeah. Interesting.

Speaker 8:

So anyway, I thought, okay, I'm gonna have some concurrency here, and do I really wanna try to get my head around threads and locks and condition variables and whatever synchronization primitives I had back then? And somehow, fortuitously, and I don't remember how, I stumbled upon, well, first, just the concept of async, through the asyncore module in the Python standard library. And then I found Twisted, which was better because it would let me do things with I think the big deal was that I could set timers, so I could more easily do timeouts. But I thought, okay.

Speaker 8:

If I have just one thread, with mutable state and callbacks, I thought that I could get my head around that more easily than, you know, actual multiple threads.
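
The failure mode Matt described, one slow listener stalling a sequential blocking broadcast, has a standard fix: give each listener its own bounded queue and mark or drop laggards instead of blocking the source. A Python sketch, with all names illustrative:

```python
# Per-listener bounded queues decouple the source from slow consumers:
# the source thread never blocks, and a laggard only hurts itself.
import queue

class Listener:
    def __init__(self, capacity=4):
        self.buffer = queue.Queue(maxsize=capacity)
        self.lagging = False

    def offer(self, chunk):
        try:
            self.buffer.put_nowait(chunk)   # never block the source
        except queue.Full:
            self.lagging = True             # slow client: mark, don't stall

def broadcast(listeners, chunk):
    # The source makes one non-blocking pass; contrast with sequential
    # blocking write() calls, where one flaky dial-up user stalls everyone.
    for listener in listeners:
        listener.offer(chunk)

fast, slow = Listener(), Listener(capacity=1)
for i in range(3):
    broadcast([fast, slow], b"mp3-frame-%d" % i)

assert not fast.lagging
assert slow.lagging    # fell behind, but never blocked everyone else
```

In a real server each listener's queue would be drained by its own writer (a thread, or a writability callback in an event loop).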

Speaker 2:

Yeah. That's interesting. That's a good way to think about it. And a very good kind of motivating example. We do have a couple of hands that are up, so I wanna get to them.

Speaker 2:

Okay. The other thing: we are gonna need to end closer to the hour, so Adam can get to his, not the teenager. I mean, there are certain things that all teenagers have in common. I don't think your teenager, like, you could be on here all week, for, you know,

Speaker 5:

like, you know Absolutely.

Speaker 2:

But but you've got a 5 year old who's gonna want dinner. There we go.

Speaker 3:

Get a

Speaker 2:

it can go to his mother. So, Ian, I wanna get to you and then to Jimmy. Ian, go for it.

Speaker 7:

Yeah. Just briefly on RabbitMQ: the main challenges we had were around network partitions and general cluster interruptions. We luckily had a bit of an out there, in that this was used to feed messages from the Trello monolith through to the WebSockets service. So we could recover, mostly, by kicking everyone from WebSockets and forcing them to reconnect and catch up.

Speaker 3:

Yes, I do.

Speaker 7:

And the catch-up mechanism was not using RabbitMQ as a storage device. So Right. Right. We did have a way out, but it was not pleasant. In terms of Twisted, HipChat was written on

Speaker 2:

Twisted. Yeah. There you go.

Speaker 7:

But its successor was not, so that may give you some indication as to how the experience went.

Speaker 2:

Right. Exactly.

Speaker 7:

Instead of using Node, though, they did move to Golang. But, again, I think that was motivated by our desire to have a kind of more baked-in story as to how to handle concurrency.

Speaker 2:

Totally.

Speaker 7:

In terms of answering Adam's original question, I think that the HHVM slash Hack description of async operations is pretty solid in terms of the why, talking about hiding I/O latency and data-fetching latency. I think this was one of the big motivating factors for Facebook to fork away from PHP, beyond the main, like, straight-line performance reasons. This actually unlocks quite a few performance benefits for them.

Speaker 2:

Yeah. Interesting.

Speaker 7:

But, yeah, overall, the diagrams are good. The code example is not so good, but the diagrams are good, as is the high-level explanation. The challenge with a large Node.js code base like Trello's, kind of, with async/await, is that event loop starvation problem, where we hit a certain number of outstanding async operations, and each of them can only get a small slice of the event loop at a time, and no one is making meaningful forward progress. And when programmers first encounter that in production, often the reaction is, oh, we'll increase our concurrency, and that does not help the situation.

Speaker 7:

Right. So there's a need for programmers to understand that underlying event loop model, and potentially dip into a CPU profile or something, to be able to realize what's going on

Speaker 6:

there.
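
The usual remedy for that starvation pattern is to bound the number of in-flight operations rather than raise it, for example with a semaphore; a minimal asyncio sketch (the `sleep(0)` workload is a stand-in for real I/O):

```python
# Bounding concurrency with a semaphore: 100 requests arrive, but at
# most 8 are ever in flight, so each finishes instead of starving.
import asyncio

async def fetch(item, semaphore):
    async with semaphore:            # at most N operations in flight
        await asyncio.sleep(0)       # stand-in for real I/O
        return item * 2

async def main():
    semaphore = asyncio.Semaphore(8)   # the knob: cap it, don't add more
    tasks = [fetch(i, semaphore) for i in range(100)]
    return await asyncio.gather(*tasks)

results = asyncio.run(main())
assert results == [i * 2 for i in range(100)]
```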

Speaker 2:

Yeah. Interesting. And this is a great resource. I don't know, Adam, if you've seen the HHVM stuff.

Speaker 6:

Yeah. I was

Speaker 2:

just taking a look at that. Yeah. That's really I really like their visual approach to explaining what's going on.

Speaker 3:

Yeah. And, you know, a friend of the show, Keith Adams, had a big hand in HHVM.

Speaker 2:

Yeah. Okay. Although, I I I'm not detecting Keith in this documentation, I have to say. Jimmy.

Speaker 4:

Alright. Can everyone hear me?

Speaker 6:

Yes.

Speaker 4:

Yep. Excellent. I believe the reason why I couldn't join before: literally, the iOS 16 update. Nice. But, yeah, I just rebooted it right after that, turned it off and on again.

Speaker 4:

Everything's fine.

Speaker 6:

Oh.

Speaker 4:

Yeah. Hearing how, like, the Zig concurrency story goes, I could just, like, hear the Vietnam helicopters in the background, being like, oh my gosh. No. Like, this is exactly kind of how I was dealing with, like, Unicorns. You're kind of taking the current stack, and even things like gunicorn, which is, like, the Python clone of Unicorn, the Ruby one, it's basically, like, freeze the stack, pause it, and then multiplex based on different, like, I/O events.

Speaker 4:

And so all of this is, like, kind of similar hacks.

Speaker 1:

So

Speaker 4:

I just, like, got immediately triggered when I heard that, and was reminded even more so of what I think is the number one async hack I've seen in any programming language ever. So I know that's, like, a lot of buildup, but it's a thing called Trollius, which came into the Python ecosystem right after

Speaker 2:

Trollius?

Speaker 3:

Like like a troll?

Speaker 4:

Okay. Yeah. T-r-o-l-l-i-u-s? I think.

Speaker 4:

Yeah. I-u-s. And so what it was is, someone decided that they were, like, crazy enough to try to backport asyncio to Python 2.7. Oh. So

Speaker 2:

Oh, no. How do you

Speaker 4:

do that? And I actually built a production system on this for years. It was truly awful, because it was easier than updating Python to Python 3. Yeah. So here we go.

Speaker 4:

Someone realized that you could get multiplexing of, like, different coroutines by abusing the exception model in Python.

Speaker 2:

Oh, no. No. No. No.

Speaker 4:

You would literally raise. You would raise when you wanted to pause the system and, like, yield to another coroutine.

Speaker 2:

Oh, gotcha.

Speaker 4:

And then there was a core event loop that worked by catching exceptions and, like, iterating to the next, like, state machine to run. And
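
The hack being described, yielding between "coroutines" by raising exceptions that a central loop catches, can be caricatured in a few lines of Python; this is an illustration of the shape, not Trollius's actual implementation, and every name here is made up:

```python
# (Ab)using exceptions as control flow: a step raises to "pause", and
# the loop catches the exception and round-robins to the next task.
class Yield(Exception):
    """Raised by a task step to unwind back to the event loop."""

def make_task(name, steps, log):
    state = {"step": 0}                   # an explicit state machine
    def run_step():
        if state["step"] >= steps:
            return False                  # finished: return normally
        log.append((name, state["step"]))
        state["step"] += 1
        raise Yield                       # "pause" by raising
    return run_step

def run_loop(tasks):
    pending = list(tasks)
    while pending:
        task = pending.pop(0)
        try:
            finished = task() is False
        except Yield:
            finished = False              # caught: reschedule at the back
        if not finished:
            pending.append(task)

log = []
run_loop([make_task("a", 2, log), make_task("b", 2, log)])
# The two "coroutines" interleave, one raised exception per step:
assert log == [("a", 0), ("b", 0), ("a", 1), ("b", 1)]
```

The cost is exactly the complaint that follows: every "stack trace" you see is just the loop catching an exception, with no connection to the task that scheduled the work.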

Speaker 2:

and there should be a word to describe this, because I'm sure to whomever this occurred, it's like, oh, we can reuse this mechanism for this radically different purpose. And it's like, you can but now it is

Speaker 6:

That's going to give me nightmares.

Speaker 2:

Right. Exactly. I know. And it can feel elegant. It kinda reminds me of, like, oh, I'm gonna have, like, threads by having processes share memory.

Speaker 2:

It's like, it may feel elegant, but not having the purpose-fit abstraction there, and reusing this other abstraction that doesn't really mean the same thing, it's not intended to be the same thing, it can be just absolutely brutal. Yeah. Jimmy, that must have been combat pay for sure.

Speaker 4:

To be fair, it did make well, so I actually stopped working on this project. This is actually Quay.io, which was the first private Docker registry. The build system.

Speaker 2:

Right. Yeah. Yeah. I remember that. Yeah.

Speaker 4:

Yeah. So this was what was building everyone's containers. We had this Python, like, build manager and You

Speaker 2:

tipped your pronunciation. You pronounced that quay.io, not key.io.

Speaker 4:

It's it's it's complicated. But

Speaker 2:

because I was always pronouncing that key. It's q-u-a-y, which I would, I mean, you know, I would pronounce "key," but I guess that was alright.

Speaker 4:

It is it is key. The founders are both American, so they, they say key.

Speaker 3:

I I was like Oh, dear. Retconning.

Speaker 5:

Oh, god.

Speaker 4:

Key. It's the quay for your containers, the first private registry. Like, come on. That's a good pun. Like but no.

Speaker 4:

But, yeah. So it had a build manager. We were just orchestrating building Dockerfiles for people, and that build manager was kind of like a distributed process. It ran on every single node that was running Quay, and it basically talked to etcd to maintain state and, like, farmed jobs out to Kubernetes, which then built people's containers in a VM. So, like, there's a lot of this async state happening in each of them.

Speaker 2:

Was this a pleasant system to be in? This sounds like a This was an awful system. Okay. I mean, I'm gonna wake up in, like, a cold sweat. This sounds like a system that would need some production-level feeding.

Speaker 2:

That would be really frustrating.

Speaker 4:

You would get a stack trace, and there was no debugging the stack trace. Because you would get a stack trace that was the state of the event loop of,

Speaker 6:

like, the previous three coroutines.

Speaker 4:

It was, like, the current coroutine and, like, the four behind it. Like, there's nothing for you to look at.

Speaker 3:

Just like, is this your card? No. That's not my card.

Speaker 2:

No. It's not my card. Helpful.

Speaker 6:

Hey.

Speaker 2:

How about these other 3 cards? I'm like, I don't even know who these those people are. One of those is attached to a severed finger, by the way, and I've got no idea who any of those people are. So yeah. That's, yeah.

Speaker 2:

Oh, man. That sounds brutal.

Speaker 4:

But it did eventually make migrating it to Python 3 quite simple.

Speaker 2:

So I do think there is an important point here, in that the abstractions have improved over time. Like, we are getting better at these abstractions, and one would not, I would love to believe, build that system de novo today. Today, there are better things, whether it's Rust or Zig or even Go. There are better things to pick from today.

Speaker 4:

I have to say, there is, like, one thing that kind of frustrates me about all these systems. Like, you were saying earlier, and I don't know if you talked about this because I tuned in late, but basically, once you start hitting the runtime's queuing limits on throughput, everything starts to fall apart. Mhmm. Why the hell has no one given us a handle just to nice things? Like, we have nice

Speaker 2:

Okay. Because okay. That's a great question. And it's because you are so actually, Adam, I'd like to say: if you need to go address Yeah. Yeah.

Speaker 2:

Yeah. I I no. No. That's fine. You can be around it.

Speaker 2:

I will be Adam has warned me that I'll be solo when we hit the top of the hour, which is fine, because I wanna get to the other hands that are up. But I think that this is a great question. And it's because the asynchronous systems are, to a certain degree, telling a bit of a fib with respect to what the system can actually do. And once you have exhausted the fib and you are hitting queuing delay, I don't really know what the system can do. It's like you overbooked the flight.
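Bryan's overbooked-flight analogy maps directly onto basic queuing behavior. A toy sketch (mine, not anything from the episode): while requests arrive slower than the server drains them, the queue stays empty, but the moment arrivals exceed service capacity, the depth, and with it the queuing delay, grows without bound.

```python
# Toy discrete-time queue: `arrivals` requests enter per tick,
# the server completes `service` requests per tick.
def queue_depth_over_time(arrivals: float, service: float, ticks: int) -> list[float]:
    depth = 0.0
    history = []
    for _ in range(ticks):
        depth = max(0.0, depth + arrivals - service)
        history.append(depth)
    return history

# Under capacity: the queue never builds up.
print(queue_depth_over_time(arrivals=8, service=10, ticks=5))   # [0.0, 0.0, 0.0, 0.0, 0.0]
# Over capacity ("overbooked"): depth grows every tick, and so does delay.
print(queue_depth_over_time(arrivals=12, service=10, ticks=5))  # [2.0, 4.0, 6.0, 8.0, 10.0]
```

The fib is that an async runtime accepts work far faster than it can finish it; until the queue is observable, you only find out when latency explodes.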

Speaker 2:

You know? It's like, yeah, some of these people can't travel to Denver today, because we don't actually and the result is gonna be I mean, to me, that is endemic, and the question is more: can you please help me debug all this stuff? And don't let these queues be hidden. I mean, one of the problems that we had was with Postgres replication, a true asynchronous nightmare, where you've got the WAL, the write-ahead log, being shipped effectively asynchronously and then applied synchronously.

Speaker 2:

And once that backs up, you've got no visibility into it, first of all. When that backs up into the primary, it is really hard to figure out what the hell's going on and why. And now you've got a multi-system, asynchronous system that's misbehaving. I wanna get to Rob, and then to Brantley. So, I love Kelsey Hightower's

Speaker 2:

Unmute yourself and introduce yourself. But, Rob, you there? Yeah.

Speaker 1:

Hi. Can you hear me?

Speaker 2:

Yep.

Speaker 1:

Awesome. Yeah. I could, through my experience, speak a little bit to, like, what async kind of is or what we use it for, which is a question that I don't think has a clear answer. So, my first week in university, I got right into chat.

Speaker 1:

So, like, talking to people and, you know, some such things. And I became, like, how does this work? And so I got a copy of Unix Network Programming, the Stevens book, which was, like, a significant outlay for me at the time as a grad student, but, you know, I saved up for it. It's still on my shelf. I refer to it at least once a month for some detail of TCP. But, you know, I started writing the very basic things: start with the echo server and these sorts of things.

Speaker 1:

And there is a section in there about, well, here you know? You have found that if you want to read from 2 sockets at the same time, one of them will block, and here is a bunch of strategies for working around it. You know, this is old enough that it's really talking about select, and this was just after, like, POSIX threads appeared. So it talked a little bit about that.

Speaker 1:

And so I spent a lot of time learning these things, but the question it was always trying to answer was: how do I not block? And that was the only thing. I didn't have parallel computation to do. I didn't have you know, this is just chat. This is like, you are paused most of the time.
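The Stevens-era problem Rob describes, wanting to read from 2 sockets when a blocking read on either would hang, can be sketched with the stdlib's select(); the socketpair setup here is just a stand-in for real network connections.

```python
# Waiting on two sockets without blocking on either: select() tells us
# which one actually has data, so we never issue a read that would hang.
import select
import socket

a_recv, a_send = socket.socketpair()
b_recv, b_send = socket.socketpair()

b_send.sendall(b"hello from b")  # only socket b has data pending

# A blocking recv() on a_recv would hang forever; select() reports that
# only b_recv is readable (with a timeout so we never wait indefinitely).
readable, _, _ = select.select([a_recv, b_recv], [], [], 1.0)
data = readable[0].recv(1024)
print(data)  # b'hello from b'

for s in (a_recv, a_send, b_recv, b_send):
    s.close()
```

This is the whole "how do I not block?" question in miniature: the program's job is just to ask the kernel what is ready and only touch that.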

Speaker 1:

This is just making sure you don't block. And from there, I, you know, went on to learn about, like, Dan Kegel and the C10K problem, and learned lots about that: how do we make systems deal with thousands, tens of thousands, hundreds of thousands of connections? My career has mostly been as a sysadmin adjacent to that. So now, you know, you're running your nginx, which is, you know, here are a 100,000 connections that I'm proxying through to somewhere else.

Speaker 1:

The entire purpose of this thing is to accept the connection, create a back-end connection, and get out of the way and let the kernel proxy that. So, you know, I mean, these are all good things, and, Bryan, I know you've had some experience with epoll. We don't need to talk about it specifically, but the

Speaker 2:

But that's the epoll implementation. I mean, I actually like the model; the model is very useful. I think epoll or event ports or I/O completion ports, they're a very useful model.

Speaker 1:

Oh, absolutely. Yes. And I mean,

Speaker 5:

this And, Twitter Spaces just transcribed epoll as Ebola. Is that what it is?

Speaker 4:

You know,

Speaker 2:

you know, Twitter Spaces, maybe I've misjudged you. I know.

Speaker 1:

It's very good. So I guess and now, you know, I'm still running those kinds of systems. But for me, yeah, the story of, like, async interfaces to the kernel, at least, has been about getting my user-space program out of the kernel's way. And I mean, on Linux recently I don't claim it's the first time; it's just what I'm familiar with.

Speaker 1:

There's recently been work on a thing called io_uring, which is

Speaker 8:

Yep.

Speaker 1:

Basically an alternate syscall layer that has some memory shared between the kernel and user space that lets you submit requests directly into this memory pool and get the results of those back out. So it's an alternative syscall interface, and it kind of splits the syscall in half. So it's kind of futures-looking if you squint a little bit. But just that idea of getting into the kernel as fast as I can. So that's been, like, most of my career.
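A toy model of that split-syscall idea (deliberately not the real io_uring or liburing API, just an illustration of the shape): user space enqueues submissions into one shared ring, the kernel drains them asynchronously, and completions are reaped from a second ring whenever the program gets around to it.

```python
# Toy model of io_uring's split syscall: two queues stand in for the
# shared submission and completion rings.
from collections import deque

submission_queue: deque = deque()   # user space -> kernel
completion_queue: deque = deque()   # kernel -> user space

def submit(op: str, data: bytes) -> None:
    """First half of the 'syscall': enqueue the request, don't wait."""
    submission_queue.append((op, data))

def kernel_drain() -> None:
    """Stand-in for the kernel processing submissions asynchronously."""
    while submission_queue:
        op, data = submission_queue.popleft()
        completion_queue.append((op, len(data)))  # e.g. bytes written

def reap() -> list:
    """Second half: collect completed results whenever we're ready."""
    results = list(completion_queue)
    completion_queue.clear()
    return results

submit("write", b"hello")
submit("write", b"io_uring")
kernel_drain()
results = reap()
print(results)  # [('write', 5), ('write', 8)]
```

The point of the split is batching: many submissions and many completions can cross the user/kernel boundary per transition, instead of one blocking syscall each.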

Speaker 2:

Yeah.

Speaker 1:

When futures and promises became, you know, a big idea in languages, the developers I work with started looking at that and introducing it, and I did not get it. And, like, I got there eventually, but it didn't fit onto any model I had. And I'm still, you know, this year, coming to realize that it's because in some ways we're kind of talking about different things. Like, why we actually want to do async, what concurrency means, what we're trying to do. They're trying to do work: what's the next thing, and what's the next thing, and what's the next thing, and kind of complete this multi-step task as they're ready to do the next phase. Whereas I'm just like: accept connection, get the hell out of the way. I'm not making a point.

Speaker 1:

I'm just speaking to that idea that there were kind of different views into this, and maybe there's something about how we teach it in that

Speaker 2:

Yeah. You

Speaker 1:

domain. The

Speaker 2:

interesting. Appreciate the contribution. So, Brantley, you're gonna take us home here. Cool. I just wanted to,

Speaker 9:

say a little bit about the history of Python and Yeah. I think, which is, you know, back in 2-point-whatever, I don't know, the people from EVE Online came together and made Stackless Python, which I don't know if anybody remembers, but it's basically, like, taking the stack out of C, and they were able to do a lot of concurrency based on that and green threads.

Speaker 8:

Basically, coroutines. Right?

Speaker 9:

Yeah. Exactly. And

Speaker 2:

then some really smart people figured out, well, we can just do that in Python, and

Speaker 9:

they made eventlet, and then gevent came out of that. And I was using I mean, gevent, I think

Speaker 2:

I mean, eventlet was just amazing. You could do these async operations without having to

Speaker 9:

deal with, you know, all of this threading stuff. And so from my perspective, async has always been sort of counter to threading. These threads, you know, you're going along, and then one just steals the processor away from you, and it's doing something else. And you have to lock and do all these things. So from my perspective, a really good way to teach it is maybe teaching threading and then teaching async as cooperative threading.

Speaker 9:

Right? The idea that I'm giving up execution when I'm done, or when I'm waiting on I/O, I'm giving up execution.
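Brantley's framing, async as cooperative threading where each task volunteers the CPU at an await, can be seen directly in a few lines of asyncio: nothing is preempted, so each task runs uninterrupted up to its explicit yield point.

```python
# "Async as cooperative threading": each task runs until it explicitly
# gives up execution at an await; the scheduler never interrupts it.
import asyncio

order = []

async def task(name: str):
    order.append(f"{name}: start")
    await asyncio.sleep(0)      # the cooperative yield point
    order.append(f"{name}: resume")

async def main():
    await asyncio.gather(task("a"), task("b"))

asyncio.run(main())
# Each task runs uninterrupted up to its await, then the other gets the CPU:
print(order)  # ['a: start', 'b: start', 'a: resume', 'b: resume']
```

Contrast with preemptive threads, where the interleaving between those appends would be up to the OS scheduler and could differ on every run.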

Speaker 2:

Yep.

Speaker 9:

And then you can come in.

Speaker 2:

Yeah. No, I think it's a good point. And I also think it's interesting that it was invented by EVE Online, which I swear I only hear about EVE Online when there's some apocalyptic event happening in what feels like this parallel universe. It's very Yeah.

Speaker 2:

Right. I mean, if there's, like, some grand act of, like, intergalactic warfare happening, and I'm like, wow. This is a very ornate world.

Speaker 9:

Imagine what's happening in their servers at the same time.

Speaker 2:

I mean, right. Totally. No. That's it. It's like yeah.

Speaker 2:

That's like I could see why you'd have to invent new kinds of concurrency mechanisms.

Speaker 8:

So what you might not know about Twisted is, Twisted originally came out of a multiplayer game project. The original author of Twisted was working on a game called Twisted Reality, and I don't know whatever became of that, but the framework came out of that project. And I gotta see if I can dig up his earliest writings about async versus threads, because I

Speaker 2:

I mean, he

Speaker 8:

I remember reading a fairly convincing explanation of why he thought async was better, and, of course, because we had to make a religion out of it, I was a convert.

Speaker 2:

Right. Exactly. Alright. Somebody Yeah. Yeah.

Speaker 2:

See if you can dig that up. Meanwhile, unfortunately, we're kinda out of time here. I'm gonna have to split myself here. I know Adam's already taken off.

Speaker 3:

Bryan, do you wanna tease Losing the Signal?

Speaker 2:

I definitely wanna tease Losing the Signal. Yes. Thank you. Okay. Thank you for re-emerging long enough for that. So, we are really excited.

Speaker 2:

We talked about Losing the Signal last week. This is this terrific book by Jacquie McNish and Sean Silcoff on the rise and fall of RIM. It is mesmerizing. It is very well written, it's very well researched, and I'm very happy to announce that Sean is gonna join us in 2 weeks.

Speaker 2:

So no Twitter Space next week. I'm gonna be traveling to the Open Source Firmware Conference. But in 2 weeks, Sean is gonna join us, and we're gonna talk Losing the Signal. And, Adam, you and I got to preview that conversation today, and I think that's gonna be a banger.

Speaker 3:

Oh, it's gonna be awesome. And so give it a read if you wanna participate. And it's soon to be the basis of a major motion

Speaker 2:

picture. Exactly. So

Speaker 3:

lots of lots of reasons to check it out.

Speaker 2:

Yeah. It's good. An outstanding book, really. I think, actually, that particular rise and fall has a lot to teach us. So really interesting stuff.

Speaker 2:

That's gonna be in 2 weeks, and we'll look forward to it. Adam, this is a great topic. Obviously, a lot to be said. I feel like you and I rightly predicted that this one could go on for days and nights. Absolutely.

Speaker 2:

I think there's a lot to be said about this. So thank you, everyone, for participating, and we'll talk to you in 2 weeks. Thanks, everybody.

Speaker 3:

Thanks. See you.
