The Pragmatism of Hubris

Speaker 1:

Hello, Brian. Can you hear me? Hello, Adam. So we had a bit of a failed mic check. God, I want the green room.

Speaker 1:

The Twitter green room would be so useful.

Speaker 2:

Was Cliff unable to participate?

Speaker 1:

Cliff was unable to participate. So, we'll see. Cliff is now exploring alternative methods. I wish I knew how Twitter Spaces worked. Like, why does it not work on the desktop?

Speaker 1:

And why doesn't it work on kind of arbitrary devices with a mic? It's, the

Speaker 2:

What was he trying it on? Unclear. His keyboard firmware that he wrote himself?

Speaker 1:

But, like

Speaker 3:

You're not wrong.

Speaker 1:

That's not impossible.

Speaker 4:

Like, why

Speaker 1:

is this not working?

Speaker 2:

I wrote this firmware myself.

Speaker 1:

Right. Exactly. This firmware, it looks exactly like an Android device. So I don't understand. Yeah.

Speaker 1:

I unclear unclear. Laura, I don't know if you had a sense of what he was actually He described it as a burner phone. For background,

Speaker 3:

Cliff doesn't install a lot of apps on his phones, and so he does not have the Twitter app installed on his phone. So he couldn't really participate in this, so he's been trying to set the thing up, and so maybe it will work out. But that's kind of the situation we're working on.

Speaker 1:

I hope so. I feel that, like, if it doesn't work out, we should just sit here and just talk about Cliff for a while. Oh, my goodness gracious. Here he is. Oh, wow.

Speaker 1:

Okay. That's so exciting. Cliff, are you there? Laura, are you there? Laura I am.

Speaker 1:

Laura's here, and I think Cliff might be there. Cliff, you there?

Speaker 3:

Cliff, you're currently muted, but if you hit the button, it should unmute you now.

Speaker 2:

Now you're unmuted, but we hear nothing.

Speaker 1:

We hear nothing. It's so tantalizing. Cliff is like, I actually found a bug in my firmware, and I pushed a fix. So I'm still calling in from my keyboard with a microphone on it. But no.

Speaker 1:

It mostly works. Okay. You've already so, Cliff, you show up right now as muted. But you did

Speaker 2:

show yeah. Backup record I mean, old school recording, which I still think is the better recording.

Speaker 1:

It is the better recording. No, I think, actually, the old school recording is definitely necessary because they haven't generated a way for you to download an MP3 out of it that I have seen. Yeah.

Speaker 1:

Yeah. Yeah. I think all they have done is allowed you to effectively play the recording that they always make.

Speaker 2:

Oh, yeah. I mean, I don't know. It makes sense.

Speaker 1:

Like, they don't want you to they

Speaker 2:

don't wanna put you on someone else's platform. They wanna keep you on their platform. That's fine.

Speaker 1:

You know, I understand that. But then, yeah, the problem is that after 3 days, it gets deleted.

Speaker 2:

Oh, that's a huge problem.

Speaker 1:

Yeah. Right. So yeah. I was like, Adam is being awfully cheerful about what I think is a kind of a limited facility. Yeah.

Speaker 1:

That's Jake. Okay. Cliff is now reconnecting, and Cliff is gone. So, yeah.

Speaker 2:

Hold my phone up, like, phone to phone.

Speaker 1:

Oh, the oh, we have definitely sorry if you weren't in some of the brainstorming conversations about this.

Speaker 3:

I was talking about OBS setups. Like, we were starting to get a little in detail. Nice.

Speaker 1:

Yeah. We had a lot of ideas. So I'm hoping that Cliff can bounce back in. I still, like, prefer the idea of Cliff speaking through Steve as his representative on Earth. But Steve only speaking in his own voice.

Speaker 1:

So Steve is like, I will convey that question to Cliff. Uh-huh, mhmm, uh-huh. Okay. Cliff has said

Speaker 4:

This is,

Speaker 2:

like, 2 middle schoolers who refuse to speak. Okay, Cliff. Brian wants you to know.

Speaker 3:

All those years paying attention in CCD are about to pay off big time, finally. I mean

Speaker 1:

That's right. I was so Now

Speaker 5:

I was thinking Steve, is no. Steve, Steve. Is Cliff in the room with you right now as we're talking?

Speaker 1:

Or Right.

Speaker 3:

So for background, for those of you who've joined or who don't know this yet, one of our folks, who is, like, kind of the architect of a lot of this, is having some technical difficulties. So we're trying to get him going and sort of

Speaker 1:

talking about the analysis. Difficulties is the glass cabinet.

Speaker 2:

Cliff, feel free to shout and interrupt at any point.

Speaker 1:

Can Cliff speak? All eyes turn to the gray egg that has muted itself. Gray egg. Unmute your oh, connecting. The, I so yeah.

Speaker 1:

So, alright. Your mind went to, like, middle school, playing, like, social in-between. I was viewing it more like Old Testament. Steve goes to the mount.

Speaker 1:

So Oh,

Speaker 2:

hey. We got you, Cliff.

Speaker 4:

Yeah. Twitter on Android seems to not like you to have a headset plugged in.

Speaker 1:

Why would it?

Speaker 2:

Oh. Seems like pilot error.

Speaker 1:

Oh, god. Well, I was just gonna say, when you said that Cliff has some technical difficulties, that's the glass half empty version. The glass half full version that I prefer is that Cliff is rightfully a Twitter conscientious objector. So, Cliff, thank you very much for being willing to join us. We obviously are very excited to have you here.

Speaker 1:

I'm also gonna be entertained by how many followers person with mouth, also known as okay spaces law, ends up with out of this, because it's a terrific account. Oh. Oh. Hopefully, he's back. Cliff, are you back?

Speaker 4:

If you plug in headphones after joining, your mic will never work again.

Speaker 1:

Twitter Spaces. I am sorry. I don't really know what to say. Hey. It's

Speaker 4:

all good. I mean, so what are we talking about?

Speaker 1:

So what are we talking about? Alright. Well, right now we're talking about technical difficulties with Twitter Spaces. So we are talking about Hubris and Humility, a system that we're super excited about.

Speaker 1:

We open sourced it coming up on 2 weeks ago. And I think some of the folks have not read it. We've talked about some of the history of it. Cliff has a, honestly, Adam, did you watch Cliff's presentation on this, by the way? I don't know.

Speaker 1:

Have you seen it? Yeah. Yeah. Yeah. I did.

Speaker 1:

It is outstanding. Really good. It is really, really good.

Speaker 2:

Yeah. Really great.

Speaker 1:

And, I mean, obviously, I'm biased, but I thought it was really, really good. Because it is very hard to get pacing right where you've got something that has dense technical content but isn't just, like, overwhelming people with a fire hose. It's a failure mode that I may have fallen into.

Speaker 2:

Cliff, if I may, it was annoyingly good. It was like, why have I never achieved anything close

Speaker 4:

to this?

Speaker 2:

It was delightful.

Speaker 1:

It was very good.

Speaker 4:

Oh, thanks.

Speaker 1:

And it was also a lot of work. So it was good to see so many people react so positively to it. Cliff also posted a transcript of it along with a pointer to a FAQ, which folks should also read: the fervently anticipated questions, which were great. But, Cliff, one thing we have not talked about is the prehistory of Hubris, namely your experience prior to Oxide that kind of informed it. I'm not sure where you think the prehistory of Hubris starts, but I would love to know, because I've got my own kind of guesses, but I'm not even sure I know the answer to this, about the systems that you had used that informed some dissatisfaction or where you saw some room for improvement.

Speaker 4:

Gosh. Well, in the beginning, when there were only adding machines, no. I think most of the formative influences on this came from writing the firmware for Project Loon at Google. We were in a high reliability context that wasn't a human safety context. It wasn't a medical devices context.

Speaker 4:

But it was a case where once you let go of the device and it flies up into the stratosphere, no one can reach the reset button, and you're kind of hosed if anything goes wrong.

Speaker 1:

So could you give a little context on Loon? Because I didn't really know about Loon before working with you, and it's an amazing project, honestly. So I think people would love sort of a refresher on that.

Speaker 4:

Yeah. I mean, the short version is, it's a project that started with a couple of people in 2011 trying to extend the reach of Internet access to more underserved areas using high altitude balloons. And I originally joined the project because I thought it wouldn't work, and I wanted to prove that, and I failed. So the project was escaping. Yeah.

Speaker 4:

I

Speaker 1:

I have a follow-up question that I'm afraid to ask. So just go on, Loon.

Speaker 4:

But I wound up specializing, as the project got bigger, on electronics and firmware, and then eventually running the firmware team. So that's the context for the need for high reliability stratosphere firmware. It's kind of like going to space, except that, honestly, space is a lot better understood.

Speaker 1:

So, yeah, what were some of the similarities? I mean, I guess the clear similarities are to going to space, but what were some of the differences about high altitude flight versus space?

Speaker 4:

Well, so I've never done space professionally, but I'm friends with a bunch of people who have. And the main thing that we were able to take advantage of because we weren't doing space is that our launches were cheap. So if you just have to fill the thing with helium and release it from some unpopulated area far from an airport, you can iterate faster than if you have to pay tens of millions of dollars per launch. So in that sense, we had it easy in many ways. We could iterate faster.

Speaker 4:

We could launch new versions of the firmware more often than you could have in a space situation where you pretty much get one launch, I mean, until the Starlink stuff started happening, obviously. But, so that was nice. The downside is that we were building systems that were gonna be constantly buffeted by wind. We couldn't control which direction they were going. They don't stay in a constant orbit.

Speaker 4:

They're moved around by chaotic atmospheric forces. And so we had to build large weather simulators to be able to steer our things around. And that eventually worked. But, you know, that wasn't my department. I was more concerned with the power and thermal and communications happening in the actual device.

Speaker 1:

Okay. And what were some of the problems that you had in the system software on that? Some of the challenges you had running the firmware for that thing?

Speaker 4:

I mean, there's the problems that we solved, and then there's the problems that we never really solved that led to Hubris. The problems that we solved were that we had a relatively small team, many of whom were not trained in the area, because I was sort of deliberately recruiting people with non-aerospace backgrounds. Because I started at the top of the software stack doing UI, and I've been gradually working down. And I find that there's a lot of software engineering process and principles that I picked up on the way down that people who were trained mostly from the bottom up may not have picked up. So in many ways, it was easier to pick up people with UI experience who already know things like unit testing and design patterns and teach them firmware than the other way around on our accelerated schedule.

Speaker 4:

So we were working with a bunch of people who didn't have a lot of firmware experience, and we were working in C and subsequently C++. So this meant that we needed to build frameworks where a person relatively new to the concept of writing drivers could knock out drivers quickly, with a minimum of shooting themselves in the foot. And so we built a framework called Major Tom on Loon that was in C++ and was not memory protected, that ran on similar processors to what we're using at Oxide. And that worked fairly well, and Loon was still using it up until they shut down earlier this year. But we had a bunch of frustrations with it because we had memory corruption bugs in flight that turned out to be stray pointer writes from some of the C code. We had stack overflows and clashes.

Speaker 4:

We had buffer overflows. The sort of things you normally get when you're writing a large C application, except it's really frustrating when you can't get to the thing to either talk to it on the console or look at the memory through JTAG or pull a crash dump, because the thing is on the other end of an incredibly narrow satellite link. So we had to sort of work with the constraints of our tooling. And the whole time, I was wanting a different system. And what that different system looked like evolved over time, but Hubris is sort of the latest run at the ideal of trying to build something that enables fast iteration on a more robust platform, for contexts where you can't get to the system to reboot it.
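
As a minimal sketch of the class of bug described here (this is not Major Tom or Hubris code; the `TelemetryBuffer` type, the `WriteError` enum, and the `append` method are hypothetical names invented for illustration), a bounds-checked write in Rust turns what would be a silent buffer overflow in C into an error that the caller has to handle:

```rust
/// Hypothetical fixed-size buffer, standing in for a driver's scratch space.
struct TelemetryBuffer {
    data: [u8; 8],
    len: usize,
}

#[derive(Debug, PartialEq)]
enum WriteError {
    /// The write would run past the end of the buffer.
    Overflow,
}

impl TelemetryBuffer {
    fn new() -> Self {
        TelemetryBuffer { data: [0; 8], len: 0 }
    }

    /// Appends `bytes`, refusing (rather than corrupting adjacent memory)
    /// if they do not fit.
    fn append(&mut self, bytes: &[u8]) -> Result<(), WriteError> {
        let end = self
            .len
            .checked_add(bytes.len())
            .ok_or(WriteError::Overflow)?;
        // `get_mut` returns None for an out-of-range slice instead of
        // writing out of bounds.
        let dst = self
            .data
            .get_mut(self.len..end)
            .ok_or(WriteError::Overflow)?;
        dst.copy_from_slice(bytes);
        self.len = end;
        Ok(())
    }
}

fn main() {
    let mut buf = TelemetryBuffer::new();
    assert_eq!(buf.append(&[1, 2, 3, 4, 5]), Ok(()));
    // Only 3 bytes remain, so a 4-byte write is rejected and nothing changes.
    assert_eq!(buf.append(&[6, 7, 8, 9]), Err(WriteError::Overflow));
    assert_eq!(buf.len, 5);
}
```

The equivalent unchecked `memcpy` in C would compile and run, and the damage would only show up later, which is the situation that is so hard to debug across a narrow satellite link.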

Speaker 2:

You know, Cliff, that was one of the questions I had for you, because when you joined Oxide, we were down a pretty different path. Yeah. And I was curious the degree to which you came in sort of knowing the system that you ideally wanted to build. And to be clear, I think you gave that other path a pretty earnest shot. But did you have that strong sense walking in?

Speaker 2:

He came to Loon certain that it was gonna fail

Speaker 1:

and he just wanted to watch.

Speaker 6:

And then prove it. Well, and I was on the frenemy project to Loon at Google, one of many. That's right. And Major Tom was actually successful, whereas everything that we built in GlobalBit completely failed. You know?

Speaker 6:

So when we talk about the difficulties of space and things, the traditional approach, which is more or less what GlobalBit was trying to do, really didn't even get to the point of a single launch, where Major Tom definitely made it much further. And so while Major Tom had flaws, it definitely demonstrated there were alternate approaches that you could take. And, you know, Hubris definitely feels like the next version of that, definitely improving some of the known failure modes and things that you do encounter. It still doesn't address things like, if you're in space, rad events, you know, single-event upsets and stuff like that happen, and it's hard. There's other things you need to do to deal with that.

Speaker 6:

That's right. This is a step in the direction of, there's a lot of things that you can just do to make your day-to-day debugging experience and overall reliability due to code quality be much better.

Speaker 4:

And a belt and suspenders approach can help a lot there, and that's what we did with Hubris. But to answer Adam's question, parts of Hubris closely resemble other prototype systems that I designed in the years since I left Loon. Some of them were on GitHub. But I didn't really have a clear idea. And honestly, you know, when I arrived, we were trying to use Tock.

Speaker 4:

And Tock actually meets a lot of my criteria. It lets you write memory isolated tasks that can fail separately. Tock even goes so far as to let you do in-system upgrades of individual tasks, which I consider to be kind of an anti-feature for our application, but it's of use to other applications. And, you know, we gave that a run. And, I mean, honestly, you build the system with the team that you have.

Speaker 4:

So Hubris' design, and I wrote it up in an RFD that we haven't published externally yet. But this was the proposal that, oh, shit, maybe we should write an operating system. The system design was based a lot on all y'all, actually. You know?

Speaker 4:

Like, trying to keep things to technologies that people have previous experience with and build a system that the team could work on. And the fact that we had people with previous QNX experience and other, you know, Linux kernel experience and things like that was really enabling in that sense.

Speaker 1:

You are walking the fine line between damnation and praise, and I can't tell which one it is. I actually don't know. That's interesting, because, Laura, you'd done some of the early work as Cliff was coming to Oxide in terms of scoping out the different operating systems. And I just remember you having a line in your RFD of being like, well, writing our own operating system is clearly, like, everyone's dream, but is impractical for a bunch of good reasons that you outlined.

Speaker 7:

Yeah. And, I mean, I'm pretty sure that it's going to haunt me for, you know, as long as I continue to work at Oxide, the, yeah, we're never gonna write our own operating system. But I think more than anything, in reflecting on that and how we ended up getting to Hubris, I think writing it out was supposed to be an evaluation of all available operating systems and system-like things that we could potentially use for the service processor and root of trust. And I think that was certainly a good exercise, and at the time, we were still leaning towards Tock, and I think Tock had a lot of things appealing.

Speaker 7:

But I think ultimately what it was is, when we eventually decided to shift away from Tock, I think we concluded, having already done this work, that there really wasn't anything out there. So, you know, I put that writing your own was a last resort, but, I mean, it really was just because there wasn't anything out there, which is where we ended up, you know, jumping to Hubris.

Speaker 1:

Totally. And you should not be haunted by that. First of all, I certainly agreed with that when you wrote it. It makes total sense.

Speaker 1:

I actually think to the contrary. Just as you said, you made a good case for why, if there's something extant off the shelf, of course we should use it. And I feel we went really far. We really tried to get it to work, and it just was increasingly clear. And I was trying to remember, like, the ordering in which it was clear to whom that we really should give our own thing a shot. Cliff, I feel like you had the first realization.

Speaker 1:

I feel like I was a late adopter. I'm not sure.

Speaker 4:

Yeah. I wasn't actually sure how well it was gonna go over. Right? Because this is the sort of thing that can go over like a lead balloon. And so I put together RFD 41, which is the original design proposal, and floated that.

Speaker 4:

And I feel like that was sort of the tipping point. But I think that the process that we went through was absolutely the right process, because had there been something off the shelf, it would have saved us time. And over the years, because I do embedded and I like good software, and I like having the computer check my mistakes so that I can think about harder problems, I've tried basically everything I can get my hands on. There are a few I can't get my hands on.

Speaker 4:

Green Hills Integrity is pretty high on that list, but I hear good things. But among the things that are available, open source or trial, there just kind of isn't anything covering this corner of the space. Tock comes the closest. And I think that the evaluation that Laura and Patrick and you were well underway on when I showed up was critical.

Speaker 1:

Yeah. And then I think that sorry, Steve. Go ahead.

Speaker 3:

There also is, like, as somebody who showed up after this was already done, Hubris existed by the time I showed up on the scene. But there was one comment on the Internet that I think exemplified some of this and showed this due diligence, because somebody said, like, it's impressive how this is, like, the pragmatic choice for you all. And the only thing I could really say was, well, it's a little hard to sell writing your own OS as the pragmatic choice, but, like, you all did your homework, and it turns out that just actually is the case. Like, that was in fact the right call. So

Speaker 2:

I mean, there were 2 things on the other side of the ledger too that I recall. One was, like, taking on a massive compiler project, and the other was trying to turn an open source project that didn't wanna be turned the way we wanted to turn it. So it was a pragmatic choice.

Speaker 1:

And one of the

Speaker 4:

things that I appreciate about Oxide is that we had, from the beginning, language for expressing that. So the fundamental problem with trying to lead Tock in a direction that isn't consistent with their goals is that we understood that value alignment and alignment in motivations and goals is critical. And that's the thing that we, like, actively reflect on and talk about. So it became clear that we were gonna be pushing Tock in a way that is effectively against their values, because Tock prizes a number of things, including understandability of the system and teachability of the system to undergraduates. Neither of those are goals for our firmware, but they're still laudable goals, and pushing Tock away from those goals would have done them a disservice.

Speaker 1:

Yeah. That's exactly right. And I would love to say that I learned that lesson the easy way, but I didn't. That all comes out of my reflection on the Joyent divorce with Node.js. And in all honesty, I gotta give credit to currently one of Oxide's investors, who was also an investor in Joyent, Charles Beeler, who really insisted that I speak at NodeConf after this divorce, effectively. And I'm like, Charles, I'm not gonna speak at NodeConf because I have nothing to say about Node that's gonna be productive. And he's like, oh, wait. Now I definitely want you to speak. And I'm like, oh my god.

Speaker 1:

That's so broken.

Speaker 3:

Get your finest tomato repellent outfit and

Speaker 1:

get ready to be jeered at. Totally. But I have to say, I really thank him for that because it really pushed me, because I had not really processed it very well, I think. I think I was still in that mode of just, like, not really thinking about what had happened. And it was actually helpful for me to go back.

Speaker 1:

Like, wait. What did actually happen here? And, actually, there was this values divergence, which I think is important, Cliff, for exactly the point you're making. It's not that the values that Node aspired to were wrong or the values that Joyent aspired to were wrong. In fact, it made it worse that they were both laudable values, because we were both trying to make a relationship work when we actually weren't totally committed to the other's values.

Speaker 1:

And in that talk, I talk about how really JavaScript wants this kind of ubiquity of programming. It wants everyone to be able to program in JavaScript, which is an extremely laudable goal. And Joyent really wanted Node to be completely debuggable, also a very laudable goal. And there were points where those just came into conflict with one another, and you kinda had to pick one. And we kinda made different choices, and then those choices became flashpoints.

Speaker 1:

And, Steve, I know for you, in terms of processing, for me, like, coming to Rust, I love the fact that Rust was so upfront about its values. And, Steve, you've got a great talk too that people should see talking about that.

Speaker 3:

Yeah. I appreciate it. It's also, on that note, something that Rick said earlier that I wanted to mention too, with the choice of using Rust, and also sort of what Cliff was saying about how it's easier to teach high level people firmware sometimes. I used to joke, like, back in the day, early on in Rust, my job as, like, a Ruby person who's now in a low level language is just to convince the other low level people that they are allowed to have nice things. Like, when Yehuda and Carl first wrote Cargo, it almost didn't happen.

Speaker 3:

Like, Cargo actually was a thing that the people at the time were very against investing time into. And the only reason that Cargo exists is because Dave Herman was in charge of Mozilla, relatively speaking. He was, like, you know, director at the time. And he said, I think this needs to exist and I'm gonna put budget into it and I'm gonna make it happen and I don't care what you guys say. And a lot of the people at the time said, like, cool, you guys can use that toy or whatever, but we'll keep writing makefiles.

Speaker 3:

And then when they saw it, they were like, oh my god, I'm never writing a makefile ever again. And Cargo is definitely not perfect, and we've wrangled a bunch of Cargo to make Hubris work. If you've checked out any of that stuff, you know there's kind of a giant pile of things there. But, like, the point is that it can be really, really hard sometimes, and Humility is not just, like, from the angle of using a language that wants to make these things nicer, but just also, like, I don't know.

Speaker 3:

I barely know how to use GDB. Like, it's terrible, and the interface is bad and it's confusing and I mess up all the time. But, like, Humility has a lot of really cool stuff in it. It's totally achievable with GDB if you're willing to, like, fight the lion or whatever. I don't know what that meant.

Speaker 3:

That's not even a metaphor. You get what I'm saying?

Speaker 1:

Yeah, definitely. Sorry. Go ahead.

Speaker 4:

Oh, I was just gonna agree that that's not a metaphor because there is an actual biological lion living in the Yes.

Speaker 3:

At a meeting earlier today, I mixed, like, 5 metaphors into 2 sentences, so I've just been on a weird kick today anyway.

Speaker 2:

And, Brian, I mean, you cut the first code in Humility. Do you wanna talk about that since we've alluded to it a bit?

Speaker 1:

Yeah. So, I mean, that kind of honestly came out of our need to debug Tock. And the other thing that was making things even more complicated, I would say, in terms of how we came to Tock, is that we were looking at OpenTitan, which was based on RISC-V, in an FPGA or an ASIC, still unclear, and we never ponied up the half a million dollars to be able to get those questions answered. But OpenTitan was using Tock as its system software. So that's kind of how we came into it.

Speaker 1:

And then you and Laura were engaged in some of the things we needed to go do to RISC-V to get ROPI and RWPI to work. And what we needed that for, I'm trying to remember if, like,

Speaker 2:

if It was to make it relocatable so that we could compile things without knowing a priori where they were gonna land in memory.

Speaker 1:

And why did we want that? What was driving us towards that? I mean, clearly, for a bunch of good reasons. Why would you not want that? You're right.

Speaker 1:

You're right.

Speaker 4:

Like, yeah. So the read-only position independence means that you can compile these binaries and then toss them together into a memory image without having to aggressively relocate or rewrite them, which is great if you're an image builder like Tock. And this sort of thing can mostly work on ARM, and support at the time was a little behind on LLVM. But the read-write position independent stuff is so that you can use a single RAM image in Flash to support multiple tasks with different RAM areas, which would be a huge space saver and be great. Right?

Speaker 4:

So, we absolutely wanted those, and we still don't have those.

Speaker 5:

And and

Speaker 1:

then we were also at the same time looking at different FPGAs, really thinking that the root of trust would be a secure FPGA. The problem is that the secure FPGA, it's not quite a contradiction in terms, but it's close. There are not a lot of vendors. And as it turns out, they kinda rely on security through obscurity.

Speaker 4:

Yeah. If you wanna buy security oriented chips, do not decap them and inspect them, and do not disassemble their ROMs, because you will be very disappointed and end up where we are today.

Speaker 1:

You're right. It's because it turns out these things have these ARM hard cores, and that then led us down the road of, like, well, actually, what do we need reconfigurable computing for? Maybe we should just go through and use hard cores. And then, Cliff, correct me, because I'm probably misremembering all this. But at some point, you're like, you know, the STM32F4 on the F407 Discovery board is a part that I, Cliff, know really, really well. This is a well understood part.

Speaker 1:

Everything's been reverse engineered on it. We should use this as a platform to go explore a new system. It's kinda how I remember that happening.

Speaker 4:

Yeah. I was approaching it from a slightly different perspective: it seems like this soft logic, RTL component of the project is going to be a significant risk to our ability to ship the product. Because good RTL people, and particularly good RTL people who are willing to interface with high level languages, are worth their weight in gold and can be difficult to find and hire. And we've got a few now, but at the time, we had one and a half, and that was gonna be a problem. So, yeah, I suggested maybe hard logic would be a less risky path, and that's where we are right now.

Speaker 2:

Hey, Cliff. This is probably a flip comment, but I think at the time you also observed like, why do we even need to run multiple tasks on this thing? Just give me 7 ARM cores, I'll just run 7 different

Speaker 4:

tasks. Yeah. I mean, there's something to be said for that. If you control the silicon, you can do silly things like that. We don't control the silicon, however.

Speaker 4:

So

Speaker 1:

As we learned the hard way, see Laura and Rick's vulnerability in the LPC55.

Speaker 7:

Yeah. And also, I think to the FPGA thing, I think we also ran into issues just because we're dedicated to trying to do everything open, and the state of the open source toolchain for FPGAs is just not there yet. And I think we're all eagerly awaiting the day when that actually comes.

Speaker 1:

Absolutely, Laura. Very good point. And the vendor that is the furthest ahead on the secure FPGA seems to be the furthest behind on open source tooling, in that we couldn't even get I mean, Arien was trying to I saw Arien in here earlier. Maybe he can hop in and offer his perspective.

Speaker 1:

But Arien was just trying to get it to work on Linux. And they're like, we don't really support yeah. It should work, but you're kind of on your own. Just to get their proprietary tools to work on Linux. Oh, come on.

Speaker 1:

Let alone the the true open source ecosystem allowing us to actually put our own bitstream down. And we so Cliff, you, so you started, how long did that process take from kind of initial, like, I'm gonna take a swing at this to having something working? Because I remember that as being remarkably short, but maybe I'm misremembering.

Speaker 4:

Well, the thing people forget when they talk about writing an operating system is that you have a tremendous amount of flexibility in defining what an operating system is. So if you're writing a deliberately minimal kernel, you can get something up in a couple of weeks, which I think is what we did. And, you know, from there, it's just the death of a thousand paper cuts of all the other things that you need to do, like the debug tooling and the build system and the bug.

Speaker 3:

I mean, we're coming up on that, the whole 2,200 lines of code thing, right, that we were joking about. Nobody got mad about it, but we thought they might. The core kernel itself today is, like, 2,200 lines of code. So getting that written is, like, not impossible. Obviously, you need more than just the kernel in a system like this, but, you know, it's smaller than you would think.

Speaker 4:

It's also, like, it would be closer to 1,500 lines of code if rustfmt didn't really like using vertical space.

Speaker 1:

I just love the fact that we can't talk about Hubris without possibly linking to our rustfmt issue. But, hey. We wouldn't do this if they would close the issues. So, Cliff, you went in there for a couple of weeks. Someone actually asked me on Hacker News, like, hey, what was the process for this?

Speaker 1:

Like, did you have some big document where you described all this in advance? And my memory of this was we were hitting not just 1 or 2, but quite a few of these roadblocks. It was like, this doesn't feel this is gonna be Laura, I remember in particular when you, Adam, and I were talking about you going seriously into LLVM to, like, get all the ROPI/RWPI stuff working. We're just like, wow. This is gonna be rough sledding.

Speaker 1:

Like, this is it's all software, but this is complicated software for sure. But so, again, my recollection of this was Cliff is gonna go down for a couple of weeks, and let's feel if we're on the right path or not. And, Laura, I'd love your recollection of this, but my recollection is that after Cliff's, like, only a couple of weeks, it felt like this is definitely the right path. It just felt really, really clear that the budding Hubris was the right path.

Speaker 7:

You know, I'm honestly trying to remember this at the same time, but that sounds about right that, like, I think we continued trying to do stuff with Tock in parallel and trying to figure out what was actually going to make sense. I mean, I think I did end up pulling down LLVM and also starting to do a little bit of looking at it, but also trying to figure out exactly what would be a path forward. But, I mean, I mostly just remember Hubris coming up and then you just being really excited to play with it and actually see what happened with it.

Speaker 1:

And then somewhere along the line about this time, we did a journal club, I think. Is that am I remembering this correctly? Because we did a journal club where, Cliff, you picked a, the the I think it was the Jonathan Shapiro paper. Right? I'm trying to remember the the actual papers that we had for that.

Speaker 4:

Yeah. We did "Vulnerabilities in Synchronous IPC Designs" by Jonathan Shapiro, which is a good sort of primer on how not to do synchronous IPC wrong. And then we did the Elphinstone paper on, like, 20 years of L4 learnings and recollections that goes through the L4 family tree and things that they've learned, which is another great source of information to mine if you're designing a system based on message passing. And both of those papers are actually linked from the Hubris reference documentation in the bibliography in the last section, if anybody's curious. But they're a great sorta jump starter. It's like a concentrated shot of 30 years of learning in the microkernel space.

Speaker 3:

And for some small context, journal club is basically, like, an Oxide-internal "everyone reads a paper and then we get together to talk about it" situation.

Speaker 1:

Yeah. And we deliberately have a shrink to fit process on that. I think that a bunch of us had different experiences that we pulled together of journal clubs that kinda hadn't worked exactly right, or had been, like, too much or too little, and what we opted to do at Oxide, which certainly in this case worked great. I think it generally works pretty well. It's pretty shrink to fit in that.

Speaker 1:

If someone comes up with a paper that they think is interesting, they create effectively as, like, hey. You're I want I'd like to discuss this paper. And as soon as I think you say what? 3 people have read it, then you schedule a conversation. And the idea is that to, everyone reads the paper before having the conversation, so it doesn't end up being a recap.

Speaker 1:

And that journal club, I remember being super excited. I mean, it was great for me because, you know, obviously, given the QNX background. But I've always believed that microkernels had been kind of disregarded prematurely. And, especially the L3, L4 stuff is really, really interesting. Liedtke, I think did you ever meet Liedtke, Cliff?

Speaker 4:

I did not. I did, however, work for Brad Chen, the man who killed Mach.

Speaker 1:

Yes. Yeah. Brad Chen. So elaborate on the man who killed Mach actually because that's

Speaker 4:

So microkernels were fashionable in the early nineties, late eighties. And Mach was a microkernel out of Carnegie Mellon that is actually still around in your Mac and iOS devices as part of the kernel. And it was it was sort of the hotness at the time, but it had some performance problems. And so a couple of papers came out that analyzed the performance problems and made the argument that these performance problems are inherent in microkernel design. That if you were following these principles, you would never have a system that could outperform a certain asymptotic level and the monolithic kernels would always be faster.

Speaker 4:

And this caused 2 things to happen. 1, it caused the Mach, let's put Mach in everything, microkernel bubble to sort of burst roughly overnight in academic terms. At the same time though, it really pissed off a guy at IBM, who was Jochen Liedtke, who wrote L4, because he had written L3 previously, and L3 is also a microkernel, but looks nothing like Mach. And so L4 came out along with an angry sounding paper from Liedtke, describing basically, the thesis is microkernels can be really fast if you don't do them wrong. And Hubris is heavily L4 inspired.

Speaker 4:

I think it's a great system

Speaker 1:

to look at. That is I did not know that Liedtke had effectively the same react I saw that's it is so funny that L4 had the same kind so Liedtke, by the way, for those who are unaware, Liedtke has passed away, died at a really young age. And I had read his L3 paper as an undergraduate, and he has a paper on IPC performance in L3 and some of the tricks that he used in L3 that were, I thought, incredibly clever, to make IPC fast and showing that you could make IPC fast if you just dedicated yourself to the have you read that paper, Adam?

Speaker 1:

I did the No. Never. Oh, man. That really spoke to me at kind of, like, a critical time because I think part of the frustration that I was feeling as an undergraduate in computer science at that time is that the implementation was being dismissed as an implementation detail. That it was the architecture that was important and that the implementation was for little people, to use a little Leona Helmsley-ism. And as someone who really valued the implementation, I kinda felt that, like, academic computer science was, you know, taking a dump on me to a certain degree.

Speaker 1:

And here was this paper that was actually enshrining the importance of the implementation, which really spoke to me. But I did not realize that L4 was then of I mean, of course, it all makes sense. Because in addition to this paper that really was disparaging of microkernels, they were also disparaging of memory protection and pointed to the commercial success of Windows as evidence that people don't want memory protection in a system. And Shoot. QED.

Speaker 1:

That reading that paper was the moment I decided not to go to graduate school. That the like, I actually was so, like, just out of my mind in reading that that I'm like, I'm actually going into industry because I just was and it's probably unfair, probably an overcorrection. But so, Cliff, that is really fascinating to know that Liedtke had that direct inspiration on L4. And I think that Liedtke's work is outstanding, and Shapiro also references Liedtke's work a lot if I recall correctly. Yeah.

Speaker 4:

I mean, he's eminently referenceable. The papers are fairly accessible even without an academic background. And, Hubris's IPC mechanism is not fast, to be clear. It is not currently fast, but it's designed as sort of a shadow of L4's mechanism such that I'm confident that if we have the time, we can make it fast. You know, it uses a somewhat awkward register ABI that I know we can make fast in context switch passing on ARM, and we just haven't actually taken the time to write

Speaker 1:

the fast paths yet. But

Speaker 4:

my hope is that we won't have any. Is that that will be straightforward of TVs.

Speaker 3:

We are not yet a true micro kernel, but we will be a true micro kernel when we feel like becoming a true micro kernel.

Speaker 4:

And the reason why the FAQ that I wrote for Hubris goes out of its way to not use the m word is, somehow in my career, I've worked on a series of teams where microkernel has been kind of a contentious or bad term. And I came to Oxide from the Fuchsia project at Google, which depending on who you ask, either is or is not a microkernel. And I think that sentiment breaks down very cleanly along organizational boundaries within the team. But,

Speaker 1:

you know, it's easier

Speaker 4:

to build software if we don't get hung up on these kinds of terminological distinctions. And,

Speaker 3:

I should just point out that the terminology microkernel falling along organizational lines is hilarious, due to, you know, the structure of both microkernels and yeah.

Speaker 1:

There is something that there's some some something karmic about it, isn't there? That, like, the is, like, your organization is structured as a micro kernel, so every organization can have its own feuds within it. I mean, there is something. Yeah. It's interesting.

Speaker 3:

See, I don't wanna pull the, like, who is the coolest kid kind of thing, but, like, I was obsessed with exokernels in college, which I know is a thing that you want. But that was that was my the equivalent of, like, no one believes that my preferred style of operating system is real, but I'm gonna make it real and show them. And yeah. They haven't that's that's a whole different kind

Speaker 1:

of beans.

Speaker 5:

Oh, yeah. Well, no. Okay.

Speaker 1:

So the exokernel is actually interesting. And first of all, actually, Cliff, I wanna go back to the wisdom that you had because I think it's important where, like, labels in software can be helpful, but they can also serve to be really corrosive where, oh, I am actually just gonna move this label onto this thing, a label that is something I don't like, as an excuse to not like this new thing. I mean, I'm paraphrasing, but I think that's part of the reason that you've deliberately wanted to avoid characterizing Hubris as a this or a that or something else.

Speaker 4:

I think use case

Speaker 8:

matters, like, so much. One of the things that I've reflected on, from Fuchsia, like, our IPC mechanism could be a lot faster as well, but it's working just great where we're deploying it. I think one of the fastest ways to kill a microkernel is to try and self-host the C compiler for it on top of a microkernel. Because C compiler file system IO operations are extremely short lived, so they're like a pathological case, whereas that's not most of the workloads that we are building that we care about.

Speaker 5:

And I would also go to the point of saying that no truly useful system ends up being purely within any category. Right? I mean, Linux, you could call a monolith, and then you've got FUSE floating over there, and you've got a handful of other things. And, you know, if you're not touching categories, it's probably not working out too well.

Speaker 4:

That's right. And I'd I'd actually like to amplify that point. The we computer people, if I can speak in gross generalizations, like to stuff things into categories. We like to have labels, and we like our labels to be mutually exclusive from each other. And in general, this is a complete fabrication that is for our own intellectual and communication convenience.

Speaker 4:

These labels are not real. We are making them real by putting emphasis on them. And if it's not evident that I was a liberal arts major, that'll become obvious. But the fight over whether something is or is not a microkernel is like a complete waste of energy because there's likely nuanced reasons why, well, in some respects, this resembles a microkernel, and in these other respects, it doesn't resemble a traditional microkernel. And in these other areas, it seems to have lifted ideas from monolithic systems.

Speaker 4:

And I've tried to be careful writing the Hubris FAQ to be clear that these categories aren't necessarily meaningful, but it's useful to draw sort of more nuanced comparisons to other things.

Speaker 1:

Yeah. Indeed. And I think that there are so many attributes of Hubris that are orthogonal to those distinctions that are, I think, really important. And that there are a bunch of different things that Hubris has done that at least I haven't experienced in another system. One of them that I think because to me, one of the early indicators that the Tock path was going to be really, really brutal was not just even when we were talking about going, you know, rappelling down into LLVM and ROPI and RWPI and all these other things.

Speaker 1:

It all kinda felt, like, somewhat attainable. What felt like it was gonna be really, really difficult was dealing with the fact that Tock is built, as, Cliff, you said earlier, around dynamic program loading, where I wanna load a program that I've never seen before because it's a student's program. And that student is going to be iterating many times over reloading a new program. And that makes sense for a teaching system.

Speaker 1:

And it does not make sense for a system that we're trying to attest, where we actually want everything at and and I just remember thinking, like, boy, that is gonna be just brutal. And I because I remember at one point you being like, you know, like, in our system, like, we don't want any of that. And that's gonna we're gonna have to do some totally new mechanism to deal with this. And it's like, oh, boy. This is gonna be and one of the things you did with Hubris that I that was certainly one of the early things that I that, personally, I was like, wow.

Speaker 1:

That is very clearly the right path for us. Is this knowing all tasks at compile time, not actually having dynamically loaded tasks and not having a loader at all. And it just eliminates a big class of problems. I would what what was your kind of inspiration there? Because that to me was was new and exciting about Hubris.

Speaker 4:

Well, I mean, the deeply embedded systems that I've worked on have always been structured that way, but their operating systems usually didn't support it. They contained like like, FreeRTOS is probably the simplest easily available RTOS. I guess Zephyr is getting more popular, but FreeRTOS has been around longer. FreeRTOS has dynamic task creation and, and destruction routines. And this is how you make your tasks.

Speaker 4:

And if I'm the one writing the application, these routines are called in a loop at the very start and then never called again. But they are still living there, and the data structures have been designed in support of that use case. And, fundamentally, as embedded programmers, we get to cheat. We are not building a laptop. We are not Chrome OS.

Speaker 4:

We are not the Google Home Hub. We know everything that the system is going to do in advance. So let's shrink-wrap effectively the kernel to the application and shrink-wrap it around the tasks that we need. And when we were using Tock, I was reading the Tock kernel code and imagining, like, oh, no. What mechanism am I going to have to build into here to turn off the dynamic program loading?

Speaker 4:

Because, it is possible to do a remote code execution exploit without dynamic program loading, but, boy, it sure is easier if there's an API for it.
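As a rough illustration of the "shrink-wrap the kernel around the tasks" idea described above, here is a minimal sketch in Rust. It is not Hubris's actual data structure: the `TaskDesc` type, the task names, and the RAM budget are all made up for the example. The point is only that when the task list is a compile-time constant, there is no create or destroy API at all, and some checks can move from boot time to build time.

```rust
// Hypothetical sketch: every task is declared at build time, so the kernel
// has a fixed-size table and no task create/destroy routines.

#[derive(Debug, Clone, Copy, PartialEq)]
struct TaskDesc {
    name: &'static str,
    priority: u8,
    stack_bytes: usize,
}

// In a real build this table would be generated from a configuration file.
// Here it is written by hand, and the task names are invented.
const TASKS: [TaskDesc; 3] = [
    TaskDesc { name: "supervisor", priority: 0, stack_bytes: 1024 },
    TaskDesc { name: "i2c_driver", priority: 2, stack_bytes: 2048 },
    TaskDesc { name: "blinky", priority: 3, stack_bytes: 512 },
];

// Because the table is a constant, totals like this are computed by the
// compiler rather than at boot.
const fn total_stack(tasks: &[TaskDesc]) -> usize {
    let mut sum = 0;
    let mut i = 0;
    while i < tasks.len() {
        sum += tasks[i].stack_bytes;
        i += 1;
    }
    sum
}

const TOTAL_STACK: usize = total_stack(&TASKS);
const RAM_BUDGET: usize = 8 * 1024; // assumed budget, purely illustrative

// If the tasks cannot fit, the build fails instead of the device.
const _: () = assert!(TOTAL_STACK <= RAM_BUDGET);

// Lookup by name: the only "task management" left is reading the table.
fn find_task(name: &str) -> Option<usize> {
    TASKS.iter().position(|t| t.name == name)
}

fn main() {
    println!("{} tasks, {} bytes of stack", TASKS.len(), TOTAL_STACK);
    let idx = find_task("i2c_driver");
    println!("i2c_driver is task index {:?}", idx);
    if let Some(i) = idx {
        println!("it runs at priority {}", TASKS[i].priority);
    }
}
```

Note what is absent: nothing in this sketch can add a task at runtime, which is the property being described. There is no loader to turn off because none was ever built.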

Speaker 1:

Right. Well, James, it goes to your point too about how viewing self hosting as a constraint of an operating system kinda guides you to building an operating system designed to compile other things. Hubris will never be self hosting, I mean, virtually tautologically. And that is a total non goal. I mean, I'm not intended

Speaker 4:

to say challenge accepted in response to that.

Speaker 1:

That's right.

Speaker 4:

It's certainly not intended to be self hosted.

Speaker 1:

It's not intended to be self hosted. Yeah. Exactly. But so, as you say, it's that cheat, Cliff, where we are really taking our design from the application, namely a deeply embedded system. And then that realization then unlocks, I think, a bunch of things.

Speaker 1:

So, Adam, you were asking about the origin of Humility. I had been building some debugging support for Tock. You and I both were working on Tockilator. Tockilator. RIP Tockilator.

Speaker 1:

It's where we were taking because when you synthesize RISC-V in Verilator, Verilator will give you a complete instruction trace. And I'm like, this is gold. We should be using this. No one in the Tock community or the OpenTitan community was using this.

Speaker 1:

We used the instruction traces to actually show code flow, used it to actually debug a real problem, which was very gratifying. So that was kind of my entree into debugging Hubris. And I think to a certain degree, that kinda overcorrected me because, Cliff, you remember we were really focused on, like, ETM early. And the ETM is the Embedded Trace Macrocell in Cortex parts. So it turns out it's not in the M7.

Speaker 1:

It's only in the M4, and I'm not sure which variants include ETM. But so I was kinda starting with the idea of, like, wow. We really need instruction traces because that's what had been useful in Tock. You know, you're just kinda fighting the last war in that regard. But then seeing what Cliff had done and then, Laura, you were starting to develop I mean, you were effectively the first Hubris developer, if I recall correctly.

Speaker 1:

And you were starting to actually develop Hubris code for the LPC55. Is that a correct recollection?

Speaker 7:

Yeah. So I think I ended up, the initial port for Hubris was done on the STM chip. And then I think at that point, we had, like, finally decided, I think, pretty concretely that we were going to be going with the LPC55. And so I ended up just sort of picking it up just because no one had done it yet, just sort of an experiment to try and get it going. This was also actually the first port to go from ARMv7-M on the STM chip to ARMv8-M on the LPC55.

Speaker 7:

So this was also an experiment to see about how hard it would be to it to deal with quirks like that, and it turned out to be pretty easy so far.

Speaker 1:

And you were then developing I'm trying to remember whether it was the SPI support that you were developing earlier. I was trying to remember, you were developing something early, and I remember, like, asking myself the question, like, what is gonna be helpful to Laura as she's developing this? I don't know if you remember this, Laura, but me asking you, like, would it be helpful to have, like, a task listing, for example?

Speaker 7:

Yeah. I think I did a lot of initial drivers kind of speculatively just to try and get it after the initial, you know, blink LED type thing to try and figure out what exactly just to try and I think even get to learn about what was in the LPC55 and figure out how to make it work. So

Speaker 1:

And so I I remember asking you, like, would that be useful? You'd be like, yeah. That would be yeah. That's it seems like that would be useful. So I'm like, alright.

Speaker 1:

So I'm gonna now use this great cheat that that Cliff has of us knowing the entire system. And what does it look like to actually get because we now we know everything about the system. So with actually very little cooperation from the target system, we can understand what tasks are doing, for example.

Speaker 7:

This is also about the history of debugging. You know, we mentioned GDB before. I think also previously, our debugging option was semi hosting to be able to get print output. And that was pretty slow on some of the targets, which I think was annoying you to no end. And I was okay with it.

Speaker 7:

It was actually long after humility was out. But,

Speaker 1:

yeah. The I mean, semi hosting I mean, god bless semi hosting. It's important. Cliff, he

Speaker 3:

It's also just literally too slow when you're trying stuff. So, like, as somebody who's new to this, I remember when I was trying to turn one of the I2C drivers to be interrupt driven, and I used semi hosting for my initial, like, print out stuff to debug it. And it's like, cool, it works. Let me remove all the semi hosting stuff. And I removed it and it wouldn't work anymore.

Speaker 3:

I'm like, oh, no. And it's because the timing of the waiting for the semi hosting made it work appropriately. So it's it's like a problem also even if it's not, like, a just a preference, I guess. That's what I learned anyway.

Speaker 1:

Yeah. I I yeah. Sorry, Sumeet. Go ahead.

Speaker 9:

I wanted to point out that I get the impression that stuff like semi hosting is the kind of things that people who are coming down the stack towards embedded systems are like, yeah, that that's nice. We can have nice things. Let's let's have something like that. Whereas, I don't know. It's like I've I've read some embedded programming books, and, like, the common wisdom is you have a UART, and that is how you interact

Speaker 5:

LED. You know, it's also that debuggers and embedded systems are built by people who don't believe debuggers work. And therefore, pathologically, they don't work.

Speaker 4:

I enthusiastically agree with that.

Speaker 5:

And, like, you know, if anyone has ever tried to bring up, like, JTAG debug, you know, you're probably on Windows. You've got Eclipse running. You're crossing your fingers that all this, like, tower of vendor Babel is attached to a random STMicro JTAG adapter, which may or may not be trustworthy. It's really a complete nightmare and will probably take you longer than, you know, inspecting every line of your code to get working.

Speaker 1:

So you're reminding me of one of several holes that Oxide nearly fell in, or arguably did fall in. So the on chip debugger is a separate chip effectively. It's got its own firmware that you use to debug the target system. And these things, even the ones that are putatively open, leave quite a bit to be desired. And, Cliff, maybe it's still our fate to design our own because I feel like we came super close to I just remember a period where we got so frustrated with the stuff that's out there.

Speaker 1:

It being because, actually, Cliff, would you mind giving people a just letting them know how semi hosting works? Because I think that and what semi hosting is.

Speaker 4:

Yeah. So semi hosting is an amazing hack perpetrated by Keil, which is a tools company that Arm now owns. And it is an answer to the question of, I've got this embedded system, and I've got this debug link to it over JTAG or SWD, which is Arm's serial debug protocol. And I would like to blank. I'd like to run some unit tests maybe and get the output from them.

Speaker 4:

Or, you know, maybe I just would like my printfs or, you know, printlns to come out somewhere. So what they've done is, one of the things that a debugger, when it's attached to a chip like this, can do is halting debug. It can notice if the chip reaches certain points or satisfies certain conditions and stop the program. And in particular, on ARM, like on most instruction sets, there's a breakpoint instruction that you can insert. So semi hosting is a protocol, not in the network sense, but in the sense of a set of agreements and rules that your embedded software and the debug tools follow to allow breakpoints to serve kinda like system calls.

Speaker 4:

So your program executes a breakpoint that is formatted in a special way that your debugger on your laptop recognizes and says, oh, this breakpoint, it's not really a breakpoint. This is a printf. I'm going to look at the machine registers and figure out where the block being sent is and then print it to the screen and then resume the program. So this is cool not only because it's kind of a ridiculous hack. It's it's also cool because you can use it on a system when you have nothing attached to it other than, like, 3 wires of debug connection, which is nice.

Speaker 4:

Maybe all of your UARTs are tied up doing product things and can't be used for printf debugging. But it's it's also nice because it's almost entirely independent of the chip. Your semi hosting code will work on an ARM Cortex chip made by essentially anyone. You don't need to know the clock frequency it's running at. You don't even need to know if the clock is stable necessarily, which you do for a for UART.

Speaker 4:

So it it can be really enabling. The downside is that, as people have mentioned, it's super slow because the whole processor has to come to a screeching halt, wait for your laptop to go out over USB and notice that it halted, and then go and slurp some data out of its memory, and then tell it to start again. And then your software runs a little bit longer and then does another print call. So it's not perfect, but it it it works surprisingly well for how weird it sounds.
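To make the breakpoint-as-system-call handshake described above concrete, here is a small host-only simulation in Rust. No hardware or debug probe is involved: the `Target` struct is a made-up stand-in for a halted CPU, and `service_halt` plays the role of the debugger on the workstation. The operation number `0x04` (`SYS_WRITE0`, write a NUL-terminated string) and the `BKPT 0xAB` immediate follow Arm's semihosting convention for Cortex-M as I understand it; everything else is an assumption for illustration.

```rust
// Host-side simulation of the semihosting handshake.

const SYS_WRITE0: u32 = 0x04; // semihosting op: write NUL-terminated string
const SEMIHOST_BKPT_IMM: u8 = 0xAB; // immediate used by `BKPT 0xAB` on Cortex-M

// Stand-in for the embedded CPU as seen over the debug link.
struct Target {
    r0: u32,                    // operation number
    r1: u32,                    // pointer to the operation's argument
    memory: Vec<u8>,            // target RAM
    halted_on_bkpt: Option<u8>, // Some(imm) while stopped at a breakpoint
}

impl Target {
    // Firmware side: place the string in memory, load the registers, and
    // execute the breakpoint. The CPU is stopped until someone resumes it.
    fn semihost_print(&mut self, addr: u32, text: &str) {
        let start = addr as usize;
        let end = start + text.len();
        self.memory[start..end].copy_from_slice(text.as_bytes());
        self.memory[end] = 0; // NUL terminator
        self.r0 = SYS_WRITE0;
        self.r1 = addr;
        self.halted_on_bkpt = Some(SEMIHOST_BKPT_IMM);
    }
}

// Debugger side: notice the halt, decide whether it is "really" a print,
// read the registers and memory, emit the text, then resume the target.
// Returns true if the halt was handled as a semihosting call.
fn service_halt(target: &mut Target, console: &mut String) -> bool {
    let halt = target.halted_on_bkpt;
    match halt {
        Some(SEMIHOST_BKPT_IMM) if target.r0 == SYS_WRITE0 => {
            let start = target.r1 as usize;
            let bytes: Vec<u8> = target.memory[start..]
                .iter()
                .copied()
                .take_while(|&b| b != 0)
                .collect();
            console.push_str(&String::from_utf8_lossy(&bytes));
            target.halted_on_bkpt = None; // resume execution
            true
        }
        // Any other halt is an ordinary breakpoint; leave the CPU stopped.
        _ => false,
    }
}

fn run_demo() -> String {
    let mut target = Target {
        r0: 0,
        r1: 0,
        memory: vec![0; 64],
        halted_on_bkpt: None,
    };
    let mut console = String::new();
    target.semihost_print(16, "hello from the target\n");
    service_halt(&mut target, &mut console);
    console
}

fn main() {
    print!("{}", run_demo());
}
```

The simulation also shows why this is slow and why it hangs with nothing attached: between `semihost_print` and `service_halt`, the target does no work at all, and if `service_halt` is never called, `halted_on_bkpt` simply stays set.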

Speaker 1:

It would be

Speaker 3:

the opposite of a turbo button.

Speaker 1:

Right? It is. It is. Yes. It is the opposite of a turbo button.

Speaker 1:

It is also, like honestly, I think it's essential. I mean, it's a very good little facility. Do not leave it on in shipping code though. Because if you do an hprintln in shipping code, it will stop. Like, the target has no way of knowing that, oh, by the way, like, we're on a balloon or we have been deployed in a deeply embedded context.

Speaker 1:

Like, if we stop, there's no JTAG header here even. Like, nobody can unstop us.

Speaker 4:

Yeah. If you're really lucky, you can get the processor to deliver a debug monitor exception, but you have to do some setup to make that work. Out of the box, semi hosting will just halt your CPU if there's nothing attached, which is unfortunate.

Speaker 1:

It is unfortunate, and it is undebuggable. Which is obviously one of the challenges with semi hosting. But, I mean, honestly, I and also a strange name. Although, Cliff, the way you gave such a good explanation of Well,

Speaker 4:

It's derived from the concept of hosting your software on your workstation during development so that it can reach out through an emulator or whatever and do you know, read test vectors from the file system or what have you. So semi hosting is halfway to that. It's running parts of the code on your workstation and parts of the code on the embedded processor.

Speaker 1:

Right. Yeah. There you go. So I guess it does make sense. And there's a faster way, instead of actually stopping the CPU. There's something called ITM, the Instrumentation Trace Macrocell, which is much faster, but it's also lossy, which is a huge problem.

Speaker 1:

One thing about that's great about semi hosting is it's not lossy. Like, you stop and you so you you're pretty much guaranteed that if you have semi hosting output, you're gonna see it. Whereas, you are not guaranteed for ITM output. I am really concerned that I just dropped. Have I dropped?

Speaker 1:

Am I still here? I'm still

Speaker 2:

here. You're still here. Okay.

Speaker 1:

You're here.

Speaker 4:

Yeah. Yeah. Just all the rest of us are gone.

Speaker 1:

No. I the the the Twitter Spaces is doing that thing that it likes to do. What if I just freeze everybody for right now?

Speaker 8:

Oh, did

Speaker 4:

you plug in a headset by any chance?

Speaker 1:

That's right. That's right. I'll plug in a headset. But so we were looking at ITM, and we are using ITM. That's part of how we got to the "we wanted to do our own debugger" path, because the ITM support is really not great.

Speaker 1:

But to me, like, it was clear that we could be much more valuable by understanding the system at large and being able to show you much larger context. And there are a couple of key moments in this. One, Cliff, was, like, what you did with respect to the archive and just the amount of things that the system knows about itself that are in the archive that we don't have to load.

Speaker 4:

Yeah. We haven't documented that, have we? It might be worth unpacking that, so to speak.

Speaker 1:

Yeah. Please.

Speaker 4:

So the Hubris build system produces a thingy at the end. A thingy that contains the firmware that you can use to flash another thingy. But the details of the thingy are kind of important. So rather than simply producing, like, a binary image that gets blown into Flash or an ELF binary that is basically a binary image plus a bunch of metadata. We realized early on that, so our build process produces 1 ELF file per task and then another one for the kernel.

Speaker 4:

And that's important because we keep those around because they have the debug symbols on them. And you can't really merge the debug symbols between tasks because they're memory isolated. So if they both declare a static variable called x, they're separate variables called x. They are not the same, which is, you know, important. So we were dealing with these collections of ELF files, and, we did what I think a lot of people in software in the past 20 years have wound up doing is, say, gosh.

Speaker 4:

Wouldn't it be more convenient if this was all one file? So the output of the Hubris build system is actually a zip file now that contains a well-defined directory structure with all of our task ELF files, the kernel ELF file, but also the configuration files that were used to drive the build system and, soon, the interface definitions for all the IPC messages that can be sent in the system. So you wind up with this one file that you can hand to the flashing tool or to Humility that contains the entire state of the system that you might need to pull in

Speaker 1:

different interface definitions.

Speaker 4:

That would be super annoying. And having a build archive that we can slip sort of arbitrary metadata into keeps us from needing to do this. And then Brian did something kinda weird with the Humility core dump support, where the core dump support blows out an ELF file that looks like a core file on a UNIX system.

Speaker 4:

But in one of the sections of the ELF file, he stuffs the build archive. So if your field tech sends you a core dump from the embedded processor, you can feed this to the debugger. The debugger can take it apart, pull out the build archive, and be confident that it's using the right symbol set for the right version of the firmware that the tech was interacting with. So you can't get that wrong, which is nice.
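
The point about per-task ELF files can be sketched in a few lines of Rust. This is only an illustration of why the symbols cannot be merged: the task names, addresses, and the `lookup` helper are hypothetical, and a plain `HashMap` stands in for the symbol tables a debugger would actually read out of the ELF files in the build archive.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the symbols pulled from one task's ELF file.
type SymbolTable = HashMap<&'static str, u32>;

/// Look up a symbol by (task, name). The task is part of the key because
/// tasks are memory isolated and may each define their own `x`.
fn lookup(
    archive: &HashMap<&'static str, SymbolTable>,
    task: &str,
    symbol: &str,
) -> Option<u32> {
    archive.get(task)?.get(symbol).copied()
}

/// Build a toy "archive" with two tasks that both declare a static `x`.
/// The addresses are made up for the example.
fn example_archive() -> HashMap<&'static str, SymbolTable> {
    let mut archive = HashMap::new();
    archive.insert("task_a", SymbolTable::from([("x", 0x2000_0000)]));
    archive.insert("task_b", SymbolTable::from([("x", 0x2000_4000)]));
    archive
}

fn main() {
    let archive = example_archive();
    // Same name, two different variables at two different addresses.
    println!("task_a x: {:#x?}", lookup(&archive, "task_a", "x"));
    println!("task_b x: {:#x?}", lookup(&archive, "task_b", "x"));
}
```

Flattening both tables into one map keyed only by symbol name would silently lose one of the two `x` entries, which is the problem that keeping one ELF file per task avoids.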

Speaker 1:

It is nice. It's okay. It's kinda weird, but it's nice. Isn't it nice? I It's

Speaker 4:

kinda weird, but, like, so is semihosting.

Speaker 1:

Yeah. There you go. Okay. Right. Okay.

Speaker 1:

Yeah.

Speaker 5:

I

Speaker 1:

should It it it is it is not that

Speaker 2:

weird in particular because how many times, you know, back in the ancient times, Brian, did we spend time trying to sync up a crash dump with a particular set of source files?

Speaker 4:

I am still confused right now.

Speaker 1:

That's because

Speaker 3:

I I feel like that was a child trolling.

Speaker 1:

That was a unfortunately, you know, Adam, you you and I, we try to be in the respect that that one of us is not dealing with a parenting situation at any given moment. And, you had an unmuted parenting situation, but I had a

Speaker 9:

muted parenting situation in

Speaker 1:

exactly the same moment. So yeah. Exactly. So CloudVic, thrive.

Speaker 3:

It's yeah. It's also I mean, it's not weird to eventually end up with a file that's totally not a zip file that's actually a zip file. Like, this has happened in almost every project I've ever worked on. Like literally cargo packages are like a dot crate file that's like, oops it's secretly a zip file. So it's just like what always happens eventually.

Speaker 3:

You want a bunch of things to be one thing so you make a zip file out of it.

Speaker 4:

Well, and you can also see that with Java. JAR files are secretly zip files. And so is every GIF file that I receive from a security job applicant.

Speaker 1:

And I think, like, once we started putting things in the build archive, I feel like a bunch of other things. That that There's

Speaker 3:

a lot of things.

Speaker 1:

There's a lot of things.

Speaker 3:

Many things.

Speaker 1:

Yeah. So you wanna elaborate on that, Steve? Because I think it's actually we've now used it as a way of solving a bunch of things.

Speaker 3:

This is a thing that I found super interesting. Again, coming from more higher level things and then kinda doing this professionally for the first time is like, there's a lot of people complain about Rust binary sizes, for example. And a lot of that is because they use ls instead of the tool that, like, actually says how much code there is because we include a lot of debug info. And so, you know, there's kind of the split between the stuff that's actually in your files versus what goes on the device. And so we're able to include all this rich debugging information that you like you already said, individually in these, you know, files that are in the zip files.

Speaker 3:

That way it doesn't go on your device, but you're able to look it up separately. I think that's definitely really cool. Small shout out to Windows where this is the way it works all the time. You have a separate file with the debug info instead of putting it in the binary. So that's a whole separate thing.

Speaker 3:

But just in general, yeah, there's like all the tasks individually. There's like a text file that represents the memory map. There's like a couple other random things. But it's just been super useful to be like, yeah, are you ever gonna need this later? Shove it in the zip file.

Speaker 3:

It's not going on a device anyway. It'll be really easy to check out

Speaker 1:

later. So as long as you're on the point of Windows, actually, Steve, could you elaborate on that a little bit? Because actually one of the things that was super interesting to me about having used Rust for all of this stuff is that you were the first, but are by no means the only person at Oxide using Windows. I saw Nathaniel here earlier. I mean, a lot of the EEs have to use Windows because there's tooling that's only available there for them.

Speaker 1:

We kinda got the Windows support for free for all of this tooling, which to me, I I still kinda marvel at.

Speaker 3:

I so when I joined, Hubris didn't build on Windows because Brian had used string concatenation in one place instead of the actual path.

Speaker 1:

Jesus Christ, that is so personalized.

Speaker 3:

That was it. That was

Speaker 1:

all this. That's so cool. No. What I'm saying is, this is

Speaker 3:

an important point. If I wrote some C code, I would not expect it to run on Windows most of the time. But it was literally only one thing. Like, I think it was a 6 character diff or something. And then it all just worked on Windows.

Speaker 3:

Like I'm saying minimal. I'm not trying to call you out. I'm saying like you almost a 100% got it correct by accident. And so like, you know that's like part of the thing. I I've been using Windows for the last 5 or 6 years because I used to joke it's like the more hipster option.

Speaker 3:

At this point Microsoft has reformed their image to the point where maybe that's not actually literally true anymore. But I sort of showed up being the only Windows user on the Hubris team at the time anyway. We've had more people sort of join who are also using Hubris on Windows inside of Oxide. And like 99% of it just absolutely worked. It's continued to mostly just work.

Speaker 3:

Honestly, the most annoying thing about using Windows with Hubris is not Hubris itself, but the fact that GitHub Actions is so slow for Windows builds. That's, like, the only thing that's actually kind of an annoyance.

Speaker 4:

I I disagree with you on that. I think the core annoyance is that Windows, and to a lesser degree, Mac, are the operating systems we interact with that don't have package managers. So

Speaker 1:

Yeah. That's fair too. We almost

Speaker 3:

killed almost all that. There's only one. You know, we just need to get rid of this dependency on, what is it? objcopy or something?

Speaker 4:

objcopy or something. Yeah.

Speaker 3:

Yeah. Yeah. Just one tool and then it'll just be all the Rust tools and it'll be totally fine. But yeah, like stuff stuff totally works on Windows. And it's it's also interesting and weird coming from that perspective because so much of embedded seems to be only on Windows.

Speaker 3:

Like there's a lot of people, when you see embedded stuff on Hacker News people are like, oh well, you know, this doesn't even work on Windows. And people are like, are you kidding? Like a lot of the embedded stuff is like Windows only actually. Because vendors give you this code that's like, well, we support it on Windows and that's it. Like, so it just sort of depends.

Speaker 3:

But, we're lucky to have a really good cross platform development story. And, you know, for the most part, it just works, other than the the couple dependencies we haven't finally killed yet.

Speaker 1:

It's pretty remarkable how well it works. And the path separator issue that you're highlighting, I mean, again, I feel very deeply personalized, of course. But, whoa. It's good. I mean, honestly, Rust could not have done any more to try to get me to use the platform-independent separator.

Speaker 1:

And I, like, insisted on doing the wrong thing, basically. I just didn't know about it, actually. I didn't I didn't realize that they had abstracted that. And as you say, actually, it did not even occur to me that this is going to be the one thing. I kinda had that feeling of, like, well, there's gonna be lots of other reasons we don't work on Windows.

Speaker 1:

It won't be the path. If the path separator is our problem, like, give me a call. It's like, okay, here we are with a call.
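
For readers who have not hit this, here is a minimal sketch of the difference between concatenating path strings and using the standard library's path type. It is not the actual Hubris diff; the function names and the `.elf` file layout are made up for illustration.

```rust
use std::path::{Path, PathBuf};

/// Fragile: hard-codes `/` as the separator by building the path as a string.
fn image_path_concat(out_dir: &str, task: &str) -> String {
    format!("{}/{}.elf", out_dir, task)
}

/// Portable: `Path::join` inserts whatever separator the host platform uses.
fn image_path_joined(out_dir: &Path, task: &str) -> PathBuf {
    out_dir.join(format!("{}.elf", task))
}

fn main() {
    println!("{}", image_path_concat("target", "idle"));
    println!("{}", image_path_joined(Path::new("target"), "idle").display());
}
```

On Linux and macOS both functions print the same thing, which is why this kind of bug tends to stay hidden until someone builds on Windows.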

Speaker 3:

Even then, Humility, which was basically entirely your own code base at that point I think, worked perfectly. Like, it was literally just one small bug in Hubris. Other than that, totally cross platform, like, no big deal.

Speaker 2:

That happened to be introduced by Brian. Yeah. We get it.

Speaker 5:

I I was gonna say that if you told me it went off without a hitch, I would not have believed you. I'd have believed you.

Speaker 1:

Right. Exactly. No. It's been it it's been great. And, again, you know, we you Nathaniel was, you know, in the lab as we were doing a bunch of, like, of of hubris and humility work together and along with doing a bunch of FPGA work.

Speaker 1:

And it's just like it didn't even occur to me that, like, Cliff, you and I were building on Linux and he was building on Windows and everything was just kinda working, which was which was remarkable.

Speaker 4:

Yeah. I mean, he he built on Windows and flashed from there onto the product and then, like, take a core dump and send it to us, and we're on our Linux laptops, taking it apart and then sending a patch back. And, like yeah. I don't know. It just wasn't wasn't no big thing.

Speaker 1:

I yeah. It was it was not a big deal, which is which was amazing.

Speaker 4:

Good. The bug reports that we're getting, most of our platform support issues now, are actually on Mac. Adam ran into this. There appears to be a class of Rust installs on Macs that I think might be a result of mixing the package managers that are distributed for macOS, where it looks like rustup and Cargo are speaking the same language, but they're secretly not. So you get a version mismatch.

Speaker 4:

And I don't understand it, but we we had another report come

Speaker 2:

in. It's insidious. And it and I never understood it. I just started deleting files that were dated 2016 until it worked. Yeah.

Speaker 1:

But it did work. You ultimately got it working, it sounds like. Yeah. Yeah. Yeah.

Speaker 1:

I got I

Speaker 2:

got it working, but it was it was, like, Cargo would say it would run one version. It would promise it would run one version, and it would run something completely different. It was bizarre.

Speaker 1:

That is really annoying. That's not nice at all. So also, like, cargo xtask, I feel, has done a lot of lifting for us. I mean, I don't know. Cliff or Steve, you wanna talk about, like, how we've used xtask? Because I think that's been

Speaker 3:

Yeah. This is a bunch of stuff I did, so I guess I'll talk about it and save Cliff the time. So basically, there's this pattern called xtask that matklad, who's the primary force behind the rust-analyzer project, had developed. And basically it's just an easy way for you to write essentially build scripts in Rust proper and have Cargo be able to use them. So there's a bunch of tools that kinda layer additional state on top of Cargo, like just and cargo-make and all these things.

Speaker 3:

You have to install them before you can get going, which is kind of an extra step that's sort of annoying. And so xtask is kinda this pattern that sort of abuses the Cargo alias functionality to let you write custom scripts inside your project, and then they can get executed. So instead of writing cargo build, you write cargo xtask build, and now you have a fully scriptable environment around your build

Speaker 7:

or whatever. And so a lot of

Speaker 3:

the early work that I did on Hubris is like moving stuff in this kind of direction, and also just in general, I don't know, improving the build system stuff. But that's basically the idea. So it's not as fully featured as a make clone would actually be. It pretty much just drops you into a function main and you have to write up your own thing, but it does mean that you can script additional things. And since we're building an entire OS, with all the tasks being built and assembled together, Cargo on its own just really doesn't cut it because it was not really designed for that.

Speaker 3:

It's designed to build, you know, one library and and binaries, not like an entire OS all at once. So there has to be some sort of layer on top of it.
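
A minimal sketch of the xtask pattern described here, assuming a workspace with a binary crate named `xtask`. The alias line in the comment follows the commonly published form of the pattern; the subcommand set and the printed messages are hypothetical and are not the Hubris build system.

```rust
// xtask/src/main.rs (sketch).
//
// The Cargo alias lives in `.cargo/config.toml` at the workspace root:
//
//   [alias]
//   xtask = "run --package xtask --"
//
// With that in place, `cargo xtask build` compiles and runs this program.
use std::env;
use std::process::ExitCode;

/// Hypothetical subcommands for the example.
#[derive(Debug, PartialEq)]
enum Command {
    Build,
    Help,
}

/// Turn the arguments after `cargo xtask` into a command.
fn parse(args: &[String]) -> Result<Command, String> {
    match args.first().map(String::as_str) {
        Some("build") => Ok(Command::Build),
        Some("help") | None => Ok(Command::Help),
        Some(other) => Err(format!("unknown subcommand: {other}")),
    }
}

fn main() -> ExitCode {
    let args: Vec<String> = env::args().skip(1).collect();
    match parse(&args) {
        Ok(Command::Build) => {
            // A real xtask would invoke Cargo for each piece here and then
            // package the results; this sketch only prints a message.
            println!("building tasks, then the kernel, then packaging");
            ExitCode::SUCCESS
        }
        Ok(Command::Help) => {
            println!("usage: cargo xtask <build|help>");
            ExitCode::SUCCESS
        }
        Err(msg) => {
            eprintln!("{msg}");
            ExitCode::FAILURE
        }
    }
}
```

Because the "script" is an ordinary Rust binary in the workspace, nothing has to be installed beyond the Rust toolchain, which is the property being described above.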

Speaker 1:

And Steve, importantly, when you say scriptable, you you mean scriptable in Rust. I mean, scriptable is Yeah.

Speaker 5:

Yeah. Yeah.

Speaker 1:

We're we're writing Rust programs to build the system, and it has made it really easy to extend the system, I think. And it's been really nice to actually have your build system be built as part of your system is really nice, actually.

Speaker 4:

I cut a section about this from my talk when I learned that I had 10 minutes fewer than they said I did. But I've tried to do this in C projects. And C projects, if I can make generalizations, have, I think, a very healthy fear of doing this. To the point that if you look at OpenSSL, their build system and generation stuff is in Perl. And it's in Perl because people probably already have Perl.

Speaker 4:

Perl doesn't have to be compiled. So you don't have to invoke the build system to figure out how to run the build system, effectively. And Cargo managed to do this really well, to the point that having Cargo responsible for building a bunch of tools that then are responsible for running the build system that calls into Cargo to build your other code kinda just works without really thinking about it. And, I mean, I've had to write build systems in previous jobs, and the fact that they got this right without conflicts or, like, surprise, I built your library for the wrong target architecture, is pretty cool, actually, and it was really enabling.

Speaker 1:

It was very cool. I personally had no idea how any of that worked. I don't know. Like, okay. I just run, like, cargo xtask dist.

Speaker 1:

Like, I knew I knew what to type, but I had zero idea what was going on. And then I needed to go extend it in some way. I'm like, oh, wait a minute. Like, we we actually generate and you go in there. You're like, wait a minute.

Speaker 1:

This all just works? Like, this seems so complicated, and it all just

Speaker 3:

1995. I say with my Perl tattoo, RIP Perl. But like, yeah. Like, you need something to get it started, but then it's not a big deal. And so it's definitely a little unwieldy in the sense that you have to kind of invent the universe, but also you get to do whatever you want, which is not true for most of Cargo, and so that's helpful.

Speaker 3:

I still think there's a bunch of things that could be made a lot better with it, but it just kind of, it is what it is.

Speaker 1:

Yeah. That's been pretty great, honestly. And I feel like it's been another one of these things that I feel we've used more and more rather than less and less. I feel we have seen kind of more and more opportunity. I mean, Rick, Rick, the stuff that you did, with the the with the task slots, I feel like really beginning to leverage all of this and getting out of kind of that, you know, in in part being cross platform necessitates this, but getting out from underneath calling kinda random utilities to do things as part of the build and doing all of that effectively in Rust has been really powerful, I

Speaker 6:

think. Yeah. When I started implementing task slots, it was one of those interesting periods where I was like, wait, I can do things by packing it into special linker segments. But that means that I have to go write the other side that actually parses these files and does things with it. And then I looked at the xtask stuff again and went, oh, wait.

Speaker 6:

We already have whole frameworks that deal with parsing these things. It's actually really trivial now. So, you know, it's kind of that whole thing of, because of us leveraging that pattern and using existing Rust libraries to deal with parsing ELF and DWARF information and things, it suddenly becomes a lot easier to build more intricate tools for more special purposes.

Speaker 1:

Yeah. And we have leaned on the DWARF information heavily, which has been, I mean, honestly, and, Steve, remind me, there's basically a single individual that is responsible for the quality of the DWARF information that Rust emits. And boy, I'm grateful for it.

Speaker 3:

I I it's definitely very close to it's like maybe 1 or 1 and a half people that that do a

Speaker 7:

lot of that work and so it's

Speaker 3:

it's just classically true in open source. There's always like one person somewhere who's doing all the stuff in that one niche that you actually need to rely on. So something super

Speaker 1:

helpful. And, I mean, honestly, when I went into the DWARF support, I kind of thought, like, well, there's gonna be a lot of stuff that's missing here because it just isn't generally important in most projects. And, you know, DWARF is weird, and there are definitely weird DWARF-isms that one has to deal with. But, basically, there's a lot of information there that we're able to use in lots and lots of different ways, and I feel like especially when you kinda accept that as a new constraint, that, like, I know the DWARF information is gonna be there when I'm going to build my debugging infrastructure or what have you, then there are a bunch of things that become possible, or easy

Speaker 3:

even. Yeah. Most of the stuff that we run into that's a problem, and not we in the Hubris and Humility sense, but we in the Rust sense, is that DWARF was definitely designed for the C ecosystem, and so representing Rust concepts in DWARF sometimes can get confusing, in my understanding. Like, it's not like it has a native understanding of what a trait is, for example. So you kinda gotta do some stuff but, you know, it works and it's better than nothing.

Speaker 4:

All I can say there is thank goodness for Ada, because the DWARF representation that Rust uses to describe data-bearing enums, enums with fields, is support that was added to the DWARF standard for Ada, which got something vaguely similar in 1983. And so it uses all the terminology, like variant part keeps appearing in the DWARF standard without exception.

Speaker 1:

That's an Ada standard term. Oh, that is so good to know. That's so good to know because I was actually having those same positive feelings that someone came first, and I was having to ascribe those to C++, which was leaving me with very complicated feelings of, like, I think I'm grateful for C++ here, but it's such a relief to be able to move those over to Ada, which I feel not conflicted at all about. But so, Cliff, those are the data-bearing enums, which are so important in Rust.

Speaker 1:

That's an Ada-ism. I did not realize that.

Speaker 4:

Well, so the way they're expressed in DWARF is an Ada-ism. Right. So Ada has variant records, which are kind of like Rust enums but are different in ways that we can go into if you're really bored. They're not exactly the same thing, but it did mean that the Rust team was able to reuse the existing definitions in DWARF and not have to, like, add their own extensions to the DWARF standard for representing enums,

Speaker 1:

developing hubris tasks. And one of the things that I personally wanted was the and that I've I feel like I've coded up a gazillion times in kernel development is a quick little ring buffer where you are storing you're storing data into a, a memory buffer that is going to be circular that you're gonna deliberately overwrite. And it's effectively an event log of the, you know, the last n things I'm interested in. And kind of the realization that, like, wait a minute. I can actually use data bearing data bearing enums for this and use the dwarf information to parse it.

Speaker 1:

And now I can make it super quick to go sprinkle a couple of ring buffer entries in some code of interest and be able to debug it. That to me was like a big light going on in terms of, like, wow. This is this is actually really powerful.

Speaker 4:

Yeah. The ability to derive a Debug instance in-system for a complicated enum type, or to do a similar pretty-printing of a complicated enum type through the DWARF information, means that you have this opportunity to use types defined in Rust as a user interface mechanism. Types that you may never actually parse in the application; you're writing them into a ring buffer purely so they can be printed by the debugger. And this is a thing that would be very difficult to pull off without dynamic dispatch or dynamic allocation in a language that did not have Rust-style sum types, or data-bearing enums.
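
A small sketch of the idea, not the Hubris ring buffer implementation: a fixed-size array of a data-bearing enum, written in a circle with no heap allocation. The `Trace` variants and the `Ringbuf` type are hypothetical, and `derive(Debug)` stands in here for the pretty-printing that a debugger would do from the DWARF information.

```rust
/// Hypothetical events a driver might want to leave behind for debugging.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Trace {
    None,
    Start { addr: u8 },
    Read(u8, u16),
    Error(u32),
}

/// Fixed-size ring of the last N events. N must be non-zero.
struct Ringbuf<const N: usize> {
    entries: [Trace; N],
    next: usize,
}

impl<const N: usize> Ringbuf<N> {
    const fn new() -> Self {
        Self { entries: [Trace::None; N], next: 0 }
    }

    /// Store an event, deliberately overwriting the oldest one when full.
    fn record(&mut self, event: Trace) {
        self.entries[self.next] = event;
        self.next = (self.next + 1) % N;
    }
}

fn main() {
    let mut ring: Ringbuf<4> = Ringbuf::new();
    ring.record(Trace::Start { addr: 0x48 });
    ring.record(Trace::Read(0x48, 0x1234));
    ring.record(Trace::Error(7));

    // The enum is never parsed by the program itself; it only exists to be
    // printed for a human.
    for (slot, entry) in ring.entries.iter().enumerate() {
        println!("{slot}: {entry:?}");
    }
}
```

Adding a new kind of event is one more enum variant and one `record` call, with no format strings and no dynamic dispatch, which is what makes it cheap to sprinkle entries into code of interest.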

Speaker 1:

Yeah. It is. It's tremendously valuable, and I think it also gets to another thing that I think has been a valuable decision. Not necessarily by fiat. It's like it it would in principle be possible to support tasks not in Rust.

Speaker 1:

But as, I think, in your phraseology earlier, Cliff, it's a non-goal. And one of the things that is a goal is Rust for tasks as well as for the kernel. And being able to kind of assume that we've got Rust everywhere allows us to build abstractions that I think are more powerful, and allows us to develop the system faster.

Speaker 4:

Well, and to go in a big circle, that actually brings me to one of the issues that we ran into with Tock that caused us not to pursue Tock further, which was that at the time, Tock didn't have userland support for Rust. They wrote their programs that ran in unprivileged mode in C. They used Rust for the kernel and wrote their drivers in Rust, in safe Rust, through some stuff that they do. And that wasn't really what we wanted to do because we knew the bulk of our code was going to be in tasks. And we wanted to write that code in a memory-safe language because life is too short to debug pointer problems all the time.

Speaker 4:

And, like, I wanna limit my comparisons to Fuchsia as well, but it's interesting to note that Fuchsia has exactly the opposite combination. Fuchsia uses memory-unsafe languages only in the privileged TCB and allows for memory-safe languages like Rust in processes that run outside of the kernel. And this was literally the first thing I asked my boss about after I joined the team. And they have their reasons, but we wanted to do something different, because the kernel seems like the last place that I want to have memory unsafety problems. Yeah.

Speaker 5:

Reasons sounds like a very charitable take. Or It

Speaker 1:

is a very it's

Speaker 3:

a it's diplomatic.

Speaker 1:

Well, you know what? You know, Cliff is a very charitable person. He's also a very diplomatic person. It may surprise you to learn.

Speaker 4:

I mean, I I actually think that, I mean, it's like I mentioned early on in this in this conversation that I designed Hubris in part because of the people we had around. You know? You you design a system that your engineers can work on. And,

Speaker 1:

I think you

Speaker 4:

Fuchsia, to some degree, did the same thing, and that's not an unreasonable choice to have made. It's just not the choice we made here.

Speaker 1:

Yeah. Interesting. And so, Cliff, I I want maybe to to get to some of the the things that we see, the the kind of the big next problems in in Hubris. Certainly, you're working on on one of them right now that I'm personally very excited about that maybe you wanna expand on a little bit.

Speaker 4:

So due to a misunderstanding early on in the design of Hubris at Oxide, possibly with Brian, I honestly don't remember because, like, Hubris started right at the beginning of the pandemic, so my memory is all scrambled. But I thought somebody really hated IDLs. Interface description languages, these languages that people use to define, like, RPC message formats between head names. So Hubris didn't have one. I wanted to see how far I could get just using the Rust type system to model messages.

Speaker 4:

And the answer is, you can get pretty far, but modeling the messages isn't actually the part that hurts. It's writing the client and server stubs for handling them by hand, which introduces bugs. So we just hired Matt Keeter recently, and he was approaching a project with fresh eyes and said, hey, if I wanna start a server, do I need to copy all this boilerplate? He said very diplomatically.

Speaker 4:

And I said, sorry.

Speaker 1:

I didn't

Speaker 4:

Sorry. And we got to talking about it. So now we're doing an IDL, so that we can generate client server stubs to make writing tasks easier. And, that's coming along well.
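
To make the pain point concrete, here is a sketch of hand-written stubs. Nothing in it is the Hubris IPC API or the IDL's output: the `Request` enum, the byte layout, and the function names are invented for the example, and `Vec` is used only for brevity where firmware would use a fixed buffer. Modeling the message as an enum is the easy part; the two functions that must stay in lockstep are the boilerplate an IDL can generate.

```rust
/// Hypothetical operations for a small sensor server.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Request {
    ReadTemp { sensor: u8 },
    SetLimit { sensor: u8, limit: u16 },
}

/// Client-side stub: flatten a request into bytes for sending.
fn encode(req: Request) -> Vec<u8> {
    match req {
        Request::ReadTemp { sensor } => vec![0, sensor],
        Request::SetLimit { sensor, limit } => {
            let [lo, hi] = limit.to_le_bytes();
            vec![1, sensor, lo, hi]
        }
    }
}

/// Server-side stub: must mirror `encode` exactly. Every operation number,
/// field order, and byte order is repeated here by hand, which is where
/// copy-and-paste bugs creep in.
fn decode(bytes: &[u8]) -> Option<Request> {
    match bytes {
        &[0, sensor] => Some(Request::ReadTemp { sensor }),
        &[1, sensor, lo, hi] => Some(Request::SetLimit {
            sensor,
            limit: u16::from_le_bytes([lo, hi]),
        }),
        _ => None,
    }
}

fn main() {
    let req = Request::SetLimit { sensor: 2, limit: 300 };
    let wire = encode(req);
    println!("{wire:?} -> {:?}", decode(&wire));
}
```

Generating both sides from one interface description removes the need to keep `encode` and `decode` consistent by inspection.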

Speaker 1:

Cliff, I would like to say that I I would like to praise you in contrast to Adam for you think it only may have been me disparaging IDLs. Whereas Adam would have said with absolute confidence, no. We were in an argument, and you shouted me down over using IDLs. But I don't think I've been anti IDL. Adam, if I have an anti IDL, would I I'm just this has given me a complex that I am, like, that I have, like, somehow anti IDL.

Speaker 1:

Oh, you don't remember you.

Speaker 2:

I I'm just kidding. I'm just kidding. No. I I don't remember you

Speaker 1:

being anti IDL. I'm not

Speaker 2:

gonna chuck you, throw you into that bus.

Speaker 4:

So one of the really awkward things about using humans as code generators instead of computers is that we all come into our jobs carrying all of our scars and trauma from all of our past jobs and the rest of our life. Right? So I'm honestly not even sure that this conflict happened at this job. Like, it could be scars from a previous IDL argument 15 years ago that then manifested. Right? It's really

Speaker 1:

fun. This reminds me of my my mother had a coworker that was just being absolutely vicious to her for reasons that she couldn't figure out. She's like, I think I must look like his ex wife or something. I dislike and I feel like that's I feel like we have that a lot where we have something that reminds us of a past trauma, and then we don't actually anyway, I I hope it wasn't me. I I'd like to say for the record, I'm pro IDL.

Speaker 1:

I'm super excited about this IDL, which is very on brand with Cliff, has got a terrific name. Cliff, do you wanna do the reveal?

Speaker 4:

Well, so, I mean, I don't think it's that funny, but Hubris has an unofficial mascot, becoming gradually more official. So after I named the project, I was looking for an icon for the chat channel at work. So I went and found a picture of a broken statue, which was referencing the Percy Bysshe Shelley poem Ozymandias about, you know, the grandiose claims of a long dead king. So Ozzy is our bot that does reformatting and some stuff.

Speaker 1:

And you have to say what it says when it does the reformatting.

Speaker 4:

Oh, yeah. So it it, the commit message when it reformats your code is look upon my reformat and despair in all caps, which is great because these commits are, like, the most worthless thing.

Speaker 3:

It did until earlier today, when I killed it in Humility, and I will kill it in Hubris tomorrow morning. But yes. Right. One more day of those commits happening.

Speaker 1:

Because we had a bug where it apparently, in the all public world, there are things that now broke where that that used to work. And in particular, it was doing these commits to the wrong repo, which I even I I love that even more. That it's actually doing

Speaker 4:

the wrong branch. Not repo. It was doing them on the wrong branch. So someone would pull up the PR and Ozzy would try to commit to their branch, be denied, and, like, throw a fit and go mess up another branch. Just

Speaker 5:

I will say: look upon my works, ye mighty, and despair.

Speaker 1:

And despair, all caps, like wrong branch. I just feel like god. It it felt very it all felt very poetic. So Ozymandias, I will I will miss your commits.

Speaker 4:

Well, so the IDL, in reference to Ozymandias, is called Idol, I-D-O-L, and the repo is called idolatry because I have this habit of using slightly disparaging terms for projects.

Speaker 1:

And, I mean, it's the false idol of the IDL. It's good. And that, I think, is gonna leverage a bunch of the stuff that we've done. And, Cliff, actually, maybe this is actually good.

Speaker 1:

Because the only thing I wanna just touch on briefly, and I'm not sure if this is another thing that was in, like, the missing minutes of your Hubris talk, is the power of build.rs, which is something that took me a while to appreciate.

Speaker 4:

It is, in fact, in the missing minutes of the talk. So, gosh. Where to start? One of the things that I really like about the Rust ecosystem is that it tends to be very pragmatic. And I realize that that probably sounds ridiculous to C programmers looking at Rust and all of its moving parts, but Rust has three ways of doing code generation.

Speaker 4:

There's macro_rules, the kind of matchy macros that are written right there in a source file, which was the original thing that shipped with the language. And then there's the sort of so-called right way of doing code generation, which is the proc macros that can derive traits automatically for your structs, which, side note, is probably the number one thing I would miss if I had to go back to

Speaker 1:

C++ right now.

Speaker 4:

But then there's this build.rs file. And the build.rs file's been around since the 1.0 release or before. And it looks like a weird afterthought kind of thing. The gist of it is you can provide this file that Cargo will build and run before building your project, which sounds silly or, like, is this a security problem or things like that. But fundamentally, this is the equivalent of putting some shell commands in your makefile in a C program, except that you can have dependencies on external packages.

Speaker 4:

You can have your build script pull in Serde and read some files formatted in JSON and then, like, gzip compress them into a binary that it deposits into the build directory, where your project can pull it in as an array. It's just incredibly powerful and totally unstructured. So there's a few cases where we're using build.rs files to do various kinds of code generation or paper over things that we're doing that Cargo didn't see coming. For example, Cargo has this notion of features that you can turn on and off that provide sort of a limited form of conditional compilation. In embedded contexts, it's incredibly common to need effectively features with values.

Speaker 4:

Like, you know, you need to define something to tell which board you're targeting, and you don't want a Boolean for every board. You want a name and possibly three different names for, like, the processor type and the board model and the revision. And so you can actually do that from a build.rs, and we do.
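One way a build script can express "features with values" is by printing `cargo:rustc-cfg=KEY="VALUE"` lines, which Cargo turns into `cfg` key/value pairs for the crate. The sketch below assumes a made-up `processor:board:revision` spec string and made-up cfg names (`board_processor`, `board_name`, `board_rev`); it is not how any particular project encodes its boards.

```rust
/// Split a hypothetical "processor:board:revision" spec into its three names.
fn parse_board_spec(spec: &str) -> Option<(&str, &str, &str)> {
    let mut parts = spec.split(':');
    let processor = parts.next()?;
    let board = parts.next()?;
    let revision = parts.next()?;
    // Reject extra fields and empty names.
    if parts.next().is_some()
        || processor.is_empty()
        || board.is_empty()
        || revision.is_empty()
    {
        return None;
    }
    Some((processor, board, revision))
}

/// Build the lines a build script would print so that Cargo passes
/// key/value `cfg` settings to the compiler.
fn cfg_directives(processor: &str, board: &str, revision: &str) -> Vec<String> {
    vec![
        format!("cargo:rustc-cfg=board_processor=\"{}\"", processor),
        format!("cargo:rustc-cfg=board_name=\"{}\"", board),
        format!("cargo:rustc-cfg=board_rev=\"{}\"", revision),
    ]
}

// In build.rs, each directive would be printed with `println!`, and the
// crate could then write, for example:
//
//     #[cfg(board_name = "board-y")]
//     mod board_y_pins;
```

Compared with one Boolean feature per board, the value lives in a single key, so two boards cannot accidentally be enabled at once.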

Speaker 1:

And that was kind of my entree to build.rs. I'm like, alright. This is just some file. I've gotta go edit some boilerplate to be able to get board definitions. I guess I didn't really appreciate what it did at all.

Speaker 1:

And then, Cliff, you had turned me onto it when I was doing some code generation for PMBus. You were like, yeah, you may wanna look into this. And then, like, holy god. I can do what I mean, forget proc macros.

Speaker 1:

I can do whatever I want and create whatever source I want. I don't know if you have played with build.rs or not. I mean, it is amazing.

Speaker 2:

And you can do I mean, you really can. You could, like, read in the whole source tree and make some

Speaker 1:

you can, like, take an FPGA bitstream and compress it and actually then deposit it in a binary that can then be included by your Rust program, which is how we

Speaker 4:

That's the thing we're literally doing. Right? So That's

Speaker 1:

the way we're literally doing. Yeah.

Speaker 4:

The firmware for our server board picks up an FPGA bitstream, which will eventually be built from RTL, but right now it's checked in as a bitstream. And it builds a crate implementing a simple compression algorithm for your host architecture, usually Intel, although less so with time, and deposits the result of that into the source tree. The firmware files then include that using Rust's include_bytes! directive, which is really nice.

Speaker 1:

Amazing. Amazing. include_bytes! is so amazing. include_bytes! and include_str! are amazing.

Speaker 4:

And then the firmware depends on that same crate, but Cargo knows to build the crate for your target embedded architecture when the firmware references it. So the build.rs and the firmware are using the same crate to sort of pass data between them in a compressed form, which is super handy and would have been a giant pain
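To make the shared-crate idea concrete, here is a sketch of a tiny codec that a build script could call to compress and firmware could call to decompress. The conversation does not name the algorithm actually used, so run-length encoding is a stand-in chosen purely for illustration, and the function names are hypothetical. Firmware would more likely decode into a fixed buffer than into a `Vec`.

```rust
/// Encode `input` as (count, byte) pairs. Runs longer than 255 are split.
/// In this sketch, this is the half the build script (host) would call.
pub fn rle_compress(input: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut iter = input.iter().copied().peekable();
    while let Some(byte) = iter.next() {
        let mut count: u8 = 1;
        // Extend the run while the next byte matches and the count fits in a u8.
        while count < u8::MAX && iter.peek() == Some(&byte) {
            iter.next();
            count += 1;
        }
        out.push(count);
        out.push(byte);
    }
    out
}

/// Decode (count, byte) pairs. Returns `None` if the input length is odd.
/// In this sketch, this is the half the firmware (target) would call.
pub fn rle_decompress(input: &[u8]) -> Option<Vec<u8>> {
    if input.len() % 2 != 0 {
        return None;
    }
    let mut out = Vec::new();
    for pair in input.chunks_exact(2) {
        out.extend(std::iter::repeat(pair[1]).take(pair[0] as usize));
    }
    Some(out)
}
```

Because both sides compile the same source, the encoder and decoder cannot drift apart; Cargo simply builds the crate once for the host and once for the target.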

Speaker 1:

to do

Speaker 4:

in most other build systems.

Speaker 1:

It is remarkable. And then you're leveraging all that same stuff for Idolatry. Right? For taking these Idol files and being able to generate

Speaker 4:

The way it is coded is you have your build script depend on the Idol crate, and it just calls in and says, I would like you to generate me a server stub, and please put it here, and then you include it from your source

Speaker 1:

file. And then the actual definition itself is in RON, right, the Rusty Object Notation, which is Yeah. There are definitely some things about it that I definitely like. I think at some point, we're gonna have to improve the error messages.

Speaker 4:

Yeah. You're gonna have to rewrite the parser at some point. It probably won't stay in RON, to be honest. So we're at a stage in the project where we're iterating rapidly. We don't know what information needs to be present in the interface definitions yet.

Speaker 4:

We're still learning. I'm porting all of our existing IPC interfaces over to this to sort of learn how I did it wrong. And during that stage, it's really nice to be able to iterate by only updating the data structures, the structs and enums and whatnot that you've got in the code generator, and then use Serde to load them from files. So right now, our IDL files are just data structures expressed in, currently, RON that we load with Serde to be structs in Rust and then manipulate. And this is fantastic for iterating, but it does mean that, like every other case nowadays where you've got a configuration file stuffed into somebody else's meta syntax, like JSON or, increasingly, YAML, it's never quite right.
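A rough sketch of "interface definitions as plain data structures that a code generator manipulates". The `Interface` and `Operation` types and the generated trait shape are invented for this example and are not the real Idol schema or output. Loading the structs from a RON file with Serde is left out so the sketch stays dependency-free.

```rust
/// One operation in a hypothetical interface definition.
struct Operation {
    name: String,
    /// (argument name, Rust type) pairs.
    args: Vec<(String, String)>,
    reply: String,
}

/// A hypothetical interface definition: a name plus its operations.
struct Interface {
    name: String,
    ops: Vec<Operation>,
}

/// Generate Rust source for a server-side trait from the definition.
fn generate_server_trait(iface: &Interface) -> String {
    let mut out = format!("pub trait {}Server {{\n", iface.name);
    for op in &iface.ops {
        // Every generated method takes `&mut self` followed by the declared arguments.
        let mut params = vec!["&mut self".to_string()];
        params.extend(op.args.iter().map(|(n, t)| format!("{}: {}", n, t)));
        out.push_str(&format!(
            "    fn {}({}) -> {};\n",
            op.name,
            params.join(", "),
            op.reply
        ));
    }
    out.push_str("}\n");
    out
}
```

Adding a field to `Operation` is all it takes to carry new information through to the generator, which is the kind of cheap iteration described here.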

Speaker 4:

You know? It doesn't feel quite right. And so I suspect eventually we're gonna want a parser. But for now, being able to just pull a crate off the shelf and parse complex data structures is hugely enabling. And, like, RON itself is popular in the Rust game engine community for doing asset definitions and engine configuration.

Speaker 4:

So it's getting a lot of attention from there, but it also has a bad habit of failing in ways that say line 0, column 0. And I know why that is, but I haven't gotten in to fix it yet.

Speaker 1:

I feel that error message should say, like, hey. You might as well start commenting out half of your RON, because that's That is the new development

Speaker 4:

process. Yes.

Speaker 1:

Yeah. Right. Right. Moving

Speaker 4:

the comment characters around until it stops failing.

Speaker 1:

But RON has comment characters. God bless it. So I Yep. There's a lot to be said for it. So that's gonna be exciting.

Speaker 1:

That's gonna and then once we have that, once we actually know, oh, by the way, this task has this interface definition, we can leverage all the things that we have built to then say, oh, so now if I see a message in, say, the debugger, in Humility, from this task to that task, I actually know how to interpret it, and I can actually show it to you, which is something that I have always wanted. I mean, haven't we all wanted the ability to see these messages?

Speaker 4:

Yeah. Message trace was pretty, specifically. And having all of the Idol definitions in the build archive means that in theory, once we finish this mechanism, you can hand the build archive to Humility and have it pretty print messages it's never heard of before that exist in your application, which is really hot. It's also weird knowing that, like, per the wheel of reincarnation that constantly turns in our industry, when I joined Fuchsia, they were solving the same problem.
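The idea of a debugger pretty-printing messages it has never seen, given only the interface definitions, can be sketched as a table-driven decoder. The wire format below (a little-endian `u16` operation code followed by little-endian `u32` arguments) and the `OpDef` type are assumptions made up for this example; they are not the actual Hubris message layout or the Humility implementation.

```rust
/// A hypothetical operation description, as might be recovered from
/// interface definitions stored alongside the build.
struct OpDef {
    code: u16,
    name: &'static str,
    arg_names: &'static [&'static str],
}

/// Decode a raw message using the definitions. Returns `None` for unknown
/// operation codes or for payloads whose size does not match the definition.
fn pretty_print(defs: &[OpDef], message: &[u8]) -> Option<String> {
    let code = u16::from_le_bytes([*message.get(0)?, *message.get(1)?]);
    let def = defs.iter().find(|d| d.code == code)?;
    let payload = &message[2..];
    if payload.len() != def.arg_names.len() * 4 {
        return None;
    }
    // Pair each argument name with its 4-byte little-endian value.
    let fields: Vec<String> = def
        .arg_names
        .iter()
        .zip(payload.chunks_exact(4))
        .map(|(name, raw)| {
            let value = u32::from_le_bytes([raw[0], raw[1], raw[2], raw[3]]);
            format!("{}: {}", name, value)
        })
        .collect();
    Some(format!("{}({})", def.name, fields.join(", ")))
}
```

The decoder contains no knowledge of any specific interface; everything it prints comes from the definition table handed to it.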

Speaker 1:

And they they did it in

Speaker 4:

a really nice way that's different from what we're doing, but I just think it's interesting that we're building the same thing. Does the hand icon on somebody mean someone's raised their hand? Yeah.

Speaker 1:

Yeah. It means Simeon, go for it. He's got a question. So, this is

Speaker 9:

a question which is kind of like, I've heard a phrase mentioned on a few occasions by Oxide folks, you, Brian, and on one or two other occasions, and that is hardware/software co-design. And if I hadn't heard that phrase 19-odd years ago, I would have thought, oh, that means, you know, we have a small team. We don't have big silos. We have our hardware and software people work together.

Speaker 9:

You know, smart move. Okay. Done. Move on. And, you know, hang in there.

Speaker 9:

This has got something to do with code generation. But I happen to have heard of that phrase 19 years ago. And the connection there was a series of papers that came out of a research group at Berkeley who did work on basically the idea that you model your entire system, perhaps software, hardware, you know, your RTL for your FPGAs or whatever. And then you have, like, this big button that you press in this amazing software suite that generates all your code for you. Now, you know, that's not something that I think necessarily was going to work or would work.

Speaker 9:

But, you know, there's an interesting story arc that kind of connects these things together. But, oh, sorry. And they called that co-design. But I'm kind of curious, you know, when Oxide folks say that, what do they mean?

Speaker 4:

It probably differs for each of us. You should separate us into separate rooms and ask us individually just to make sure.

Speaker 5:

So you're suggesting he take the microkernel approach like this?

Speaker 1:

Exactly. Well played. I also feel that, like, Cliff now, I've got, like, each of us in our own interrogation room. Be like, no. No.

Speaker 1:

I'm not gonna tell you what hardware It's like, Leventhal already told us everything. We actually already know it all. We just wanna confirm that you yeah. So

Speaker 4:

I mean, you just yeah.

Speaker 1:

I mean Yeah. Cliff, give your answer to this. I I I'm actually I'm very curious on your answer to this. I've got something in my own, but I'd love to hear your perspective.

Speaker 4:

I mean, this is gonna sound like every Cliff answer ever. But, you know, it really depends on what you mean. So it's a spectrum, yada yada. No. What we are doing concretely, call it co-design or not, is closer to the first thing that you said.

Speaker 4:

We have the same people concerned with software and hardware. We are hiring hardware-fluent software people when possible and software-fluent hardware people when possible, and making sure everybody talks to each other across their own organizational boundaries so that we can make compromises across the software and hardware stacks to make the product better. Rather than, say, having the hardware platform handed to you and you have to write the software for it, we can make a better product if we can do that in both directions, back-annotate, if I can abuse some schematic terms, from the software back to the hardware, and that produces a better product. I'm also familiar with the sort of academic end of co-design that you're referencing.

Speaker 4:

And the notion that you could have some sort of common modeling language or system and have a machine decide where to draw the lines is compelling. You know? It sounds great. I worry that, I mean, we've gotten very good at teaching machines to do massive scale applied statistics, which is what most of what people call machine learning actually is. And the thing that the machines tend to be kind of bad at is the intuitive leap.

Speaker 4:

And I feel like one of the things that we can contribute as engineers, based on experience and resonance with each other and different backgrounds and stuff, is even just the decision on where to draw that seam between the hardware and software. Even if everything was generated other than that, I think that is incredibly powerful and can make the difference between a product that works and a product that doesn't.

Speaker 9:

Yeah. So one of the purported benefits of that, let's call it the Berkeley definition of hardware/software co-design, is this idea that you simulate your entire system and then you have new knowledge and you use that new knowledge to partition. So you decide, you know, this part of the data path we're gonna do in software, this part we're gonna do in hardware, because we've now learned, we've understood resource requirements, flexibility requirements, you know, how things fail and that kind of thing. I kinda like the idea.

Speaker 9:

I don't wanna take up too much of people's time, but the sort of thing that connects the two is it turns out that 19 years ago, I was dating a girl whose dad was building his own hardware and writing his own software for it. And so I did some work for him, and he was like, okay. So I have my own RTOS, and this is how it works. And I learned all of these techniques from these Berkeley papers. But then the really cool thing about that is that Keith made it pragmatic.

Speaker 9:

He said, okay. I know I'm not going to have this amazing tool where I model the entire system. I press the big button that generates everything. But his approach was to say, I'm going to mock out all my hardware interfaces within a single binary on my desktop system. You know, in his case, it was, he needs to speak to a GPS.

Speaker 9:

So he wrote a virtual GPS that would generate, you know, GPS strings on a serial port, for example. And then he'd write his firmware to run as a desktop system that essentially simulated the target. And then he had the knowledge to say, okay. Now I'm going to go and build the hardware. And, of course, it was all in, you know, in his head.
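The technique described here, mocking a hardware interface so the firmware logic runs on a desktop, might be sketched in Rust with a trait standing in for the serial port. Everything below is hypothetical: the `SerialPort` trait, the `VirtualGps` type, and the `$FAKEGPS` sentence format, which is deliberately not real NMEA.

```rust
/// The hardware boundary: firmware logic only sees this trait.
trait SerialPort {
    fn read_line(&mut self) -> Option<String>;
}

/// Desktop stand-in for a GPS receiver. It replays canned fixes as text
/// lines in a made-up format.
struct VirtualGps {
    fixes: Vec<(f64, f64)>,
    next: usize,
}

impl SerialPort for VirtualGps {
    fn read_line(&mut self) -> Option<String> {
        let (lat, lon) = *self.fixes.get(self.next)?;
        self.next += 1;
        Some(format!("$FAKEGPS,{:.4},{:.4}", lat, lon))
    }
}

/// "Firmware" parsing logic, unaware of whether the port is real or mocked.
fn parse_fix(line: &str) -> Option<(f64, f64)> {
    let mut parts = line.split(',');
    if parts.next()? != "$FAKEGPS" {
        return None;
    }
    let lat = parts.next()?.parse::<f64>().ok()?;
    let lon = parts.next()?.parse::<f64>().ok()?;
    Some((lat, lon))
}

/// Drain the port and count the lines that parse as valid fixes.
fn count_valid_fixes(port: &mut dyn SerialPort) -> usize {
    let mut count = 0;
    while let Some(line) = port.read_line() {
        if parse_fix(&line).is_some() {
            count += 1;
        }
    }
    count
}
```

On real hardware, a second implementation of `SerialPort` backed by the UART would replace `VirtualGps`, and `parse_fix` and `count_valid_fixes` would be unchanged.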

Speaker 9:

He was, you know, one guy doing all of this. So he has a lot of advantages of being able to you know, he doesn't have a communication gap and that kind of thing. But I thought that it was quite a cool technique. It's not something that I've seen elsewhere. So, you know, every time I hear people say, what are we talking about?

Speaker 9:

It's like, wow. Okay. You know, maybe maybe somebody else is doing this too.

Speaker 6:

I will definitely say that the theory behind it applies: treating the problem, from all the lower layers of the hardware up through the highest levels of software, as one problem space, and then trying to understand what challenges you run into, and then figuring out where in the stack is the appropriate place to deal with those challenges. It's just that most folks who talk about hardware/software co-design take an approach where it's more humans solving the problem rather than making a big modeling system and having it try to solve it. But it's definitely that, instead of feeling limited by the boundaries that already exist in the system, you know, I don't need to respect the EFI boundary as a boundary. I can change both sides rather than develop workarounds for limitations in those interfaces.

Speaker 9:

I feel like that is, to me, the way that I've understood the story of Oxide from the start. You know, it is that vertical control through the stack.

Speaker 4:

Well, and not having a PC handed to you that needs to be able to run Windows 3.1.

Speaker 1:

You know, if we're rid of that, we have

Speaker 4:

a lot more options on how we can build the system and which abstractions we can discard. And

Speaker 3:

For the service processor being totally not a BMC, in the sense that we don't support any of that stuff that, like, is traditionally necessary, but it serves the same function,

Speaker 1:

but Right.

Speaker 3:

In a totally different way because, like, we don't have to care about people connecting their existing I don't remember exactly what the acronyms are. VNC or whatever it is to be able to, like, log in to your stuff.

Speaker 1:

Right. Yeah. And it's about having all these tools in the toolbox. And just as Rick says, and as Cliff is saying too, the difference is we're taking a similar approach. It's just that we've got human judgment and engineering that's actually making those decisions instead of a single unified software system.

Speaker 1:

But to me, having the ability to come up to a problem and say, like, well, what is the right solution to this? Is it an FPGA? Is it a microcontroller? Should this be something that's done in the host CPU? I mean, just that ability. And to me, like, I think the way we do power sequencing really represents that, where we kinda have an option.

Speaker 1:

Do we do this in the service processor? Do we do this in an FPGA? And in this instance of the product, we're doing it in an FPGA. An FPGA whose bitstream is loaded via, you know, the build.rs mechanism that Cliff alluded to. In the future, we might change that, and we will change that if it makes sense.

Speaker 1:

But this absolutely makes sense as the way to do it now. And the ability to have all those tools in the toolbox, hardware and software, that to me is what defines our co-design.

Speaker 3:

The way I Okay, Rick.

Speaker 6:

Oh, I was just gonna say, we did narrowly avoid using Concept HDL. As we were trying to figure out why Concept HDL, the Cadence product, is the way it is, we stumbled upon some of the origins of that, back to one of the system design projects, and it was originally based around building supercomputers through hardware/software co-design in the more modeling sense. And, you know, we kinda looked at that and said, hey. That's really cool. Let's not do that because that's really complicated for reasons.

Speaker 3:

The way that I think about this as a former web programmer is, like, you know, historically, you'd, like, see this system boundary and be like, oh, we can't do better than CGI. Like, I can only write something against the CGI interface. But then, you know, eventually other people came along and invented FastCGI and WSGI and all this stuff. And, like, what we're able to do is both say, like, are those specs better, but also just, like, do we need them at all? Let's build the web server and the web application together because that interface is overhead, and make the appropriate trade-off for whichever part of the system that we're building.

Speaker 3:

And sometimes that is conforming to existing interfaces, and sometimes that is eliminating the distinction and coding on both sides of it entirely and just, you know, not even care.

Speaker 4:

And, like, part of the reason that this is important to me, other than that I think it makes better products, is that you could look at this from an interpersonal and organizational dynamics perspective as a value for counteracting Conway's Law, Conway's Law being the notion that the structure of a product or a software application tends to reflect the structure of the organization that created it. And I think that in general, that's a bad thing. And keeping things fluid and keeping the product from reflecting a siloed organization in a siloed design with narrow interfaces, if you can pull it off, gives you the ability to not only iterate faster, but produce a better product in the end.

Speaker 1:

Totally. Yeah. Repealing Conway's Law is definitely something that you can do when you have all those tools in the toolbox. So I know we're kinda coming up on the two yeah. Sorry, Mark.

Speaker 1:

Go ahead. Go ahead.

Speaker 5:

Yeah. So just one of the other things that I have to say, as someone who's been sort of watching the whole Hubris presentations occur here, is that it's incredibly interesting to me to see what happens when people who are typically used to sitting high in the stack and have all these nice things encounter embedded development and all the rough edges people have been putting up with. And, you know, things like just having gone through and put in the humility diagnose, which at least in the web demos looks phenomenal. Like, it's like, oh, something's wrong. Well, okay. Just tell me what it is.

Speaker 5:

That is fairly alien to anything that I've ever seen in an embedded system, at least. Just like this is, basically a strip line of phenomenal ideas or at least different ideas.

Speaker 1:

Yeah. I mean, we are and I don't know, Cliff, if you wanna give a little bit of backstory on humility diagnose. That's definitely Cliff's invention as we were beginning to see some of the raw parts we had in front of us. Yeah.

Speaker 4:

I had a script on Loon called autocliff so that when I wasn't around, somebody could run into a problem and then execute autocliff, and then I could go on vacation. But

Speaker 1:

Alright. Then doesn't autocliff just say it depends or it's complicated? I mean, the the

Speaker 4:

I mean, more or less. But keep in mind, I was younger and, you know, may have seen things a little more black and white at that point. But, like, GDB is a great example of this. GDB notionally supports ARMv7-M and the Cortex-M processors. But frequently, if it takes a fault, it can't tell you what fault it took.

Speaker 4:

And, like, that's kinda table stakes. That's the first thing I wanna know, and it's in the architecture manual. It's not a secret. So we didn't wanna wind up like that, and I wanted to reduce round trips between engineers that were using Humility and familiar with Hubris, but not deeply familiar with the processor architecture, like our excellent electrical engineering folks, and me. You know?

Speaker 4:

I'd like to be asleep sometimes. So providing things like that in response to user demand, this is also one of the really nice things about doing all of the testing, bring-up, and QA ourselves: we're getting customer reviews. We're getting feedback. We're hearing what tools we need in the field, and we can roll them out. And, yeah, I honestly put a lot of the credit for this at Rust's feet, not because Rust as a language uniquely enables people to write diagnosis scripts or anything like that, but because the list of things that I'm not thinking about is really, really long right now.

Speaker 4:

And a lot of the things that I would have been thinking about and trying to handle through conventions and code reviews in a large C++ code base, I'm just not thinking about. The type system has me. So I can fritter my days away writing diagnosis scripts instead.

Speaker 1:

Well, and I think humility diagnose taps into the fact that if a task is failing, it is very likely panicking. It is explicitly panicking. That can be hard, when the system is automatically restarting tasks, to figure out why is this thing panicking. And humility diagnose allows you to, like, okay. Well, let's just do the autocliff thing, and let's actually see if this thing is restarting.

Speaker 1:

And if it is, here's the panic message and so on, which is really, really helpful.

Speaker 6:

And, definitely, I learned at Google that there are a lot more people who know a programming language than are very, very, very intimately experienced with microcontrollers and applications or things. And

Speaker 4:

Oh my god. Yes.

Speaker 6:

That definitely there's this whole aspect of, it's not necessarily about making the current developers' lives easier. It's that the next group of people who are building on top of this don't have to gain all that deep experience.

Speaker 4:

Some people will still have to have a hard time break. Say again?

Speaker 5:

I think we lost him. But, you know, it is great in that, like, it's both very familiar in the sense of, like, production validation and test code. You know? Like, that ends up getting written for every board that ends up being produced in substantial volume. But taking that same philosophy and applying it to the operating system, and to embedded code is something I definitely intend to steal.

Speaker 4:

Please please. Please

Speaker 1:

do it. Please. Please. Can we give you a list of other things that you can steal so we're not the only ones doing it? So can you please steal memory protection as well?

Speaker 1:

Would you mind? Also, can you please steal the need for secure silicon?

Speaker 5:

Definitely. This is something we badger vendors about at length. Anyway, but yeah. You know? And also, like, you know, the debugging ring buffers, which are, you know, floating around in real kernels, but not so much in RTOSes all the time.

Speaker 5:

Those are getting stolen for even more embedded things for sure.

Speaker 4:

Also, the ring buffers that are floating around in most kernels that I have experience with, you either get an unformatted string of bytes or you're doing string formatting at runtime.

Speaker 5:

What could possibly go wrong with throwing printfs everywhere in your debug code?

Speaker 4:

Yeah. The ability to just throw enums with meaningful field names into a ring buffer, it blew my mind. This was Brian's work. This was not me.

Speaker 1:

Well, and the ability to do, for example, like, just one of the challenges that we were having with Hubris is if you, for example, just have an hprintln equivalent where you are formatting a floating-point number, you have added an indeterminate amount of text and stack consumption. Because as it turns out, formatting a floating-point number is very fucking hard. And I mean, it's no slight against the code that does that. But, boy, is it nice to be able to do all that floating-point processing in Humility and not have to do it at all in Hubris.
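A minimal sketch of the pattern being described: the firmware side stores small typed enum entries in a fixed-size ring buffer and does no string formatting, and a separate host-side tool renders them as text later. The `Trace` variants, the field names, and the `RingBuf` type are invented for illustration and are not the actual Hubris ring buffer implementation.

```rust
/// Typed trace entries with meaningful field names. Storing one costs only
/// a copy of a small value.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Trace {
    Empty,
    I2cRead { addr: u8, len: u8 },
    Fault { code: u32 },
}

/// Fixed-capacity ring buffer: no allocation, and the oldest entry is
/// overwritten once the buffer wraps. `N` must be greater than zero.
struct RingBuf<const N: usize> {
    entries: [Trace; N],
    next: usize,
}

impl<const N: usize> RingBuf<N> {
    const fn new() -> Self {
        RingBuf { entries: [Trace::Empty; N], next: 0 }
    }

    /// Firmware-side operation: record an entry without formatting anything.
    fn record(&mut self, entry: Trace) {
        self.entries[self.next] = entry;
        self.next = (self.next + 1) % N;
    }
}

/// Host-side rendering: formatting happens in the tool reading the buffer,
/// not in the code that recorded the entry.
fn render(entry: &Trace) -> String {
    format!("{:?}", entry)
}
```

Keeping `render` off the target is the point: the recording path has a small, fixed stack cost, however expensive the eventual formatting turns out to be.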

Speaker 5:

What, Brian? You mean you don't always wanna go %06.3f?

Speaker 1:

Well, actually, that I am totally cool with. Maybe it's because I've, like, lowered my standards for myself. I got no problem with that. What I've got problems with is, like, oh, by the way, because you did that hprintln, that println has now overflowed your stack. And now what was a working task is now a dead one.

Speaker 5:

Well, clearly, just don't write bugs. But, you know

Speaker 1:

Exactly. Alright. Well, so we are at the two hour mark. And, Cliff, thank you so much for joining us. This has been a really exciting conversation.

Speaker 1:

These are recorded, obviously, so people can go back and listen to it. But, you know, I think on behalf of all of us, thank you for Hubris. I mean, we've all been really excited to participate in it. Very excited to have it opened and out there. And it's been really exciting for the rest of the world to see what we saw.

Speaker 1:

And I think that, you know, it'd be interesting to know your take on how it's been received. But to me, it's been received really well, and people have seen a lot of the same value that we have seen. So thank you very much.

Speaker 4:

I'm just delighted that people seem excited about it. You know, that could've gone way worse.

Speaker 1:

It could've. That could've gone way worse. And it's been really exciting to see. Again, it's all open source. All of Hubris and Humility is open source. If you haven't had a chance to check it out, one of my favorite videos is by, Rick, a friend of yours who had some free time on the day we announced.

Speaker 1:

He's like, I got an eval board lying around here, and I'll install this, I think, on an F3 or F4. But it was really fun. That's also a great video to check out. We'll put a recording in the show notes. Because it's someone who is kind of, like, peeling the onion back as they are installing Hubris and appreciating Humility and so on.

Speaker 1:

And then, Rick, you obviously did a great job talking about why we had made the decisions we had made. So that's also really good recommended viewing. And then also, man, the docs are great. Huge tribute to a lot of folks. Cliff, you especially, but, Steve, you as well, and others.

Speaker 1:

Check out the documentation as well. And with that, I guess, Adam, we'll leave the option open to do it next week, but we may cancel next week because we're coming up on the holidays. And if we don't see you until the new year, Adam, this is what? Is this our 25th or 26th?

Speaker 2:

26. 26.

Speaker 1:

26. And it's been a lot of fun. We are loving getting to know folks this way, really enjoying it. So thank you again. Thank you for a great 2021.

Speaker 1:

And, Cliff, thanks again for joining us today.

Speaker 2:

Yeah. Thanks,

Speaker 9:

Cliff. Holidays. And Wednesday.

Speaker 1:

Great. Take care, everybody.

Speaker 3:

See you all later. Bye.
