CrowdStrike BSOD Fiasco with Katie Moussouris

Bryan Cantrill:

Oh, I'm so excited for this. I'm so excited for this. Katie, thank you so much for joining us. Sure. So Katie is here.

Bryan Cantrill:

Yeah.

Katie Moussouris:

You might hear in the background a kitten or 2 kittens. I just wanna let you know that I don't know if you heard that.

Bryan Cantrill:

Oh, no. On cue. On cue. You know, so Katie, you and I actually share a very important property. You and I both have cats with nerdy names.

Katie Moussouris:

Yeah. Actually, my nerdy named cat passed away in December. He was

Bryan Cantrill:

I'm not sure that.

Katie Moussouris:

Yeah. Scappy. Scappy the cat.

Bryan Cantrill:

Scappy. Yeah.

Katie Moussouris:

He's not a dumb mother. Yeah.

Bryan Cantrill:

It's such a great name for a cat. I'm so sorry. Scappy. I'm so sorry that Scappy passed away. Have you continued the tradition by naming your cats after intrusion detection programs, or other programs, or anything else for that matter?

Bryan Cantrill:

Or, what are your cats' names? I feel so direct and personal.

Katie Moussouris:

Yeah. No. These cats, they're 7-month-old kittens, you know, brothers from different mothers. And the little one who's being vocal right now is called Mochi because he was a little bite-sized snack. And then the other one is called Selkie, S-E-L-K-I-E, because he's a blue kitten and he looks like he's smooth and shiny like a seal, like a baby seal. And no, none of these are security questions. So you can't

Bryan Cantrill:

Oh, you know, and here I thought that I... god. Okay. I'll check that off the list. Darn it. Well, we need to talk about, like, your birthday is a very special day

Katie Moussouris:

to you. So it could be tough.

Adam Leventhal:

And who could forget their first car?

Katie Moussouris:

The street I grew up on.

Bryan Cantrill:

And your mother's maiden name always has a special place in your heart. Well, so I have a cat named after a defunct computer company, Royal McBee. This was a... Oh,

Katie Moussouris:

I was gonna say, is it Digital Equipment? Because, I mean

Bryan Cantrill:

You know, bless you for referencing DEC. So, you know, we have these logos in the office where we reenvisioned the Oxide logo as the logos of defunct computer companies. And many people stare blankly at what is the DEC logo reenvisioned as the Oxide logo, which is to say, like, Oxide in that DEC type. And I'm like, oh, man. This was such a dominant company, and it's so gone.

Bryan Cantrill:

It's not you. Oh, as I tell people, it's like, it's not you. You're just a little too young. But you and I are of similar generation and vintage, along with Adam.

Katie Moussouris:

Yeah. No. We've seen things. We've seen some things.

Bryan Cantrill:

So, Adam and I are trying to get better about doing intros. So we've got with us today Katie Moussouris, who... okay, I love the fact that so many of the intros of you on the podcasts you've been on are: if you haven't heard of Katie, you should have. Basically, shame on you. But a real infosec pioneer, in many different dimensions.

Bryan Cantrill:

And you've been on kind of the hacking side of this. You've definitely been on the infosec side of this, the security side of this: pioneer in bug bounty programs, pioneer in the way vulnerabilities were disclosed. And I feel so lucky to have you here because we are talking about an absolute whopper today. And you do wonder if someone at CrowdStrike didn't, like, persuade Nancy Pelosi, like, hey, now is the time, because if you could scare us off the news, that would be great. I think anything could happen to just get us, like, I don't know.

Bryan Cantrill:

You know? I don't wanna tell you how to do it, but, you know, you could get maybe the president not seeking reelection, maybe. But so, obviously, that news broke over the weekend, but this CrowdStrike outage is ginormous. And, Katie, I would love to get your perspective on this, obviously, but I think you can see all of compute from this outage. It is the intersection of so much.

Bryan Cantrill:

It is, I think, the largest IT outage in history. Just to give, and we'll give a little more context on the outage, but it is something that has affected airlines. It has affected hospitals. It has affected 911 centers. It's affected a lot of, like, real-life stuff.

Bryan Cantrill:

It is the result of a distributed system gone amok. It is the result of a whole bunch of technical issues, and we've got a lot of open questions about that. It is not exactly a cyber attack, but I love the fact that, like, the first thing that CrowdStrike says is, like, this is not a cyber attack. It's the classic, like, my T-shirt is raising questions that are clearly answered by my T-shirt. It's, like, this is not a cyber attack.

Bryan Cantrill:

Okay. But it does look like one. Right? So it's part of the reason that you're kind of the perfect person to walk us through this, because this is intrusion detection software that has gone amok, and this is where, I guess, the hunted has become the hunter, where the thing that was designed to prevent us from having these kinds of problems has induced a very, very large problem. And there are technical failures here.

Bryan Cantrill:

There are operating system issues and bootloader issues, and there are virtual machine issues. There are CI/CD issues and lots of technical issues and lots of organizational issues. There are crisis management issues. I think there are gonna be legal issues. So, like, you can just see it all from here.

Bryan Cantrill:

It is a Cheesecake Factory menu of fail. We have sat down, we 3 have sat down, and they've put a phone book in our lap. Like, god, I don't even know where to start. It's, like, 16 pages of appetizers.

Bryan Cantrill:

So here's where I would like to start. Because this outage affected so many people in so many different ways, could you take us through kind of your sequence of learning about this thing and beginning to realize, like, oh my god. This is really big, and it's at the intersection of a bunch of things that I know. And, I mean, I'm sure you had some thinking of, you know, is this an attack or not?

Bryan Cantrill:

So can you kinda walk us through that? We sort of first started getting reports of this, like, on Thursday night, I think, or very early Friday morning Pacific time, about these blue screens of death.

Katie Moussouris:

Yeah. Well, I mean, you know, lucky for me, I have insomnia, so I was awake when all this started coming out. And I also, you know, I'm time zone challenged because I just got back from vacation, and my vacation was, you know, in several different time zones that were over the Pacific. So, anyway, a lot of things were going on with me and the time zones at the time that this all started coming up. But, like many of us, I found out via, you know, one of my little Signal group chats.

Katie Moussouris:

So this was where I first heard about it, and it was just, you know, a Signal group chat of a bunch of friends, nerdy friends, and it linked to some of the first reports showing up on Twitter, or, we're not gonna call it the other name. But

Bryan Cantrill:

You're not gonna call it the other name. You're in good company. No. Only Twitter.

Katie Moussouris:

So yeah. So, you know, it started to emerge that way. And at the time, I think, you know, there was also that other Azure outage issue. So the first way I was looking at it was, is this a Microsoft thing, or is this a CrowdStrike thing, or is this Microsoft thing being caused by the CrowdStrike thing? So that was sort of how I was initially exposed to the news.

Katie Moussouris:

And then, you know, I think it started very quickly unraveling that, yeah, they may be separate issues. They may be related. Nobody knows. But that, yes, this was definitely a CrowdStrike thing. And the extent to which it was going to mess with everything wasn't quite known yet.

Katie Moussouris:

You know, it started in Australia, right, where it was really messing with things. And then I remember seeing a lot of people saying this is what they warned us about when Y2K didn't happen, but this is really happening. Right? And I remember Y2K because I'm of a vintage that totally remembers Y2K and was, in fact, you know, among the people who were helping to prevent the Y2K disaster from happening. So, you know, there's a

Bryan Cantrill:

lot with Y2K, and especially it starting in Australia, because Y2K had this kind of interesting property that if it was going to be bad, it was gonna be bad in New Zealand first, and then it was gonna be bad in Australia. And this was, like, I did kind of have, like, god. And Adam, I remember talking about this. Okay. So I guess you weren't at Sun yet because you were still at school in... That's right.

Bryan Cantrill:

At the rollover. Right. I remember talking to colleagues, being, like, can you imagine if, like, New Zealand just goes total post-apocalyptic mayhem, and you now have, like,

Adam Leventhal:

23 hours to unload or whatever. Hours.

Bryan Cantrill:

You have 23 hours to, like, stockpile, like, the water and ammo. I mean, it would just be, like, I think I would just get a lawn chair out and just, like, you know, kind of enjoy it. Just sit back. But so, yeah, this has very Y2K vibes: it starts in Australia, kind of interestingly enough, the same way. So, yeah, as you say, this is the thing we'd been prepared for that didn't happen.

Katie Moussouris:

Right. And the reason it didn't happen, as a lot of us old folks are pointing out, is because we worked really hard to make sure it didn't happen. Right? We had warning, we knew, you know, the Y2K-pocalypse was coming, and so, you know, a bunch of old COBOL programmers, even older than we were, were, you know, being called back out of retirement. And then there were also, you know, the folks like me who were tasked with systems administration duties, like upgrading even some of the UNIX systems that in theory shouldn't have been affected, but totally were going to be affected.

Katie Moussouris:

So, yeah, I was saying to somebody on Twitter. Yeah. Someone on Twitter was saying, oh, UNIX was fine because UNIX, you know, just went with the standards. And I was like, son, let me tell you something. No.

Katie Moussouris:

It did not. Not all UNIX. And I was working that summer of 1999 at Harvard. It was Harvard's Division of Engineering and Applied Sciences, whatever it's called now, but, you know, that's what it was called back then. And a bunch of SunOS boxes had to be upgraded to Solaris because SunOS could not handle, you know, the 4-digit date and Solaris could.

Katie Moussouris:

So that was, like, a lot of what I was doing that summer. But, anyway, that was kind of, like, my first annoyance with the reaction on Twitter: you know, people who are too young to remember downplaying that Y2K could have been as bad or even worse. And I was like, yeah. But this one came with no warning. You know?

Katie Moussouris:

This one was Right. Surprise. Something bad just happened. Yeah.

Bryan Cantrill:

So in those early hours, in your kind of Signal chats, I mean, people must be wondering, is this a cyber attack? I mean, how soon does it become clear that CrowdStrike is involved? I think I was awake when I first saw the reports. I kinda shrugged my shoulders and went to sleep, and then woke up.

Bryan Cantrill:

And it had all kind of been determined by then: it's CrowdStrike. But how quickly were people able to make that determination that this was due to CrowdStrike, or was this due to CrowdStrike raising their hand?

Katie Moussouris:

You know, so it's a little cloudy for me because I was up at that hour, and then, you know, you said you shrugged your shoulders and went to sleep. I think I shook my curmudgeonly fist about Y2K and then went

Bryan Cantrill:

to sleep.

Katie Moussouris:

So I think I went to sleep too, and then I woke up and I was like, oh, yeah. Not a cyberattack. Okay. Oh, you know, George is looking rough on the Today Show and everything. And that's what I woke up to, you know, was poor George, you know, looking kinda rough.

Katie Moussouris:

Definitely had been up all night and, and trying to explain. And so he was already out there in front of it, and they would have no good reason to make a mistake about whether it was due to a cyber attack or not. Right. Right. Right.

Katie Moussouris:

There's no benefit to them whatsoever to cover that up. So I was definitely willing to believe tired George. You know what I mean? So Totally.

Bryan Cantrill:

Totally. Yeah. And so we should talk about the failure too. So this is the blue screen of death. I feel, and this is, you know, because we are all of the same vintage, I definitely, you know, had always viewed Bill Gates as robbing me of my childhood.

Bryan Cantrill:

So I'd kinda grown up vilifying Microsoft. So it's a real struggle for me emotionally to really feel empathy for Microsoft in this situation, but, like, this one is really not their fault. Like, this is a kernel panic. With every operating system, if you're in kernel mode and you access memory you can't access, the operating system has to die, and every operating system is going to die. This is the way that these machines die.

Bryan Cantrill:

They die with what's called the blue screen of death. And it's kind of unfortunate that they're being... it's, like, not a Microsoft issue. This is an issue with software that has been loaded into the operating system and into the operating system kernel. But symptomatically, this is really rough. Right?

Bryan Cantrill:

Because this is not something that can be easily rolled back. You've gotta basically get hands on these machines to remove this file.

Katie Moussouris:

Yes. Yeah. I mean, look, I was a Linux developer in the time frame that we were just talking about, like, back in 1999, 2000. And, trust me, it was as big a surprise to me as to anyone that I ended up working for Microsoft from 2007 to 2014. But, you know, having worked there and having worked in the security response center, one, a lot of hardworking people work in security across whatever company, you know, you may see.

Katie Moussouris:

And especially a company as well resourced as Microsoft, they have some of the best and smartest people. Honestly, there were two places that I've ever worked in my life where I felt like I was surrounded by some of the smartest people on earth, and those were MIT and Microsoft. And, you know, it's completely not underestimating, like, the dedication. That being said, you know, yes. You're right.

Katie Moussouris:

Kernel panics happen to all things with kernels. And I think the issue there is that, you know, due to some antitrust settlements in Europe, Microsoft had to give kernel-level access to any other antivirus or similar security software, so that it wasn't putting itself into, you know, a noncompete violation from that EU settlement of its monopolistic hold. And so, you know, a lot of reporters came to me wanting to talk about monoculture. And, you know, is this a problem in the monoculture of Microsoft or the monoculture of CrowdStrike, you know, and do we need more variety? Will that solve the problem?

Katie Moussouris:

And I'm like, no. Not really. Not really. Because, you know, one, if you get down to it, it's, you know, Microsoft is a useful operating system for a lot of users. They are used to it.

Katie Moussouris:

A lot of business applications are running on it. And, you know, that's just what the free market has chosen for a lot of, you know, a lot of these purposes. And, there aren't that many other flavors of operating systems out there. And as you were saying, they all have kernels. Right?

Katie Moussouris:

They all have kernels, and they're going to need to give similar levels of access, you know, in kernel mode as opposed to, user land for a lot of these, security software programs to work as effectively as they can. Okay.

Bryan Cantrill:

Yeah. I've got so many follow-up questions, if you don't mind. I mean... Yes. I mean, the regulatory angle is really interesting. One thing I think it is worth clarifying is that Microsoft, when we were kids, Microsoft did not have memory protection.

Bryan Cantrill:

They were not using the memory protection of the microprocessor, and the microprocessor didn't have memory protection for some number of years too. This is true for Apple as well. So it was not true that you had to be loaded in the kernel. You could have an application that would panic the system. It would happen all the time.

Bryan Cantrill:

And part of the great revolution of getting protected mode operating systems like UNIX into people's hands was actually kind of forcing both Apple and Microsoft to develop true protected mode operating systems, which was done, like, well, you know, in kind of the 2000s. So all operating systems can definitely panic, but the question I've got for you is, like, what is this software doing? It is doing malware intrusion detection or malware detection, but why does it need to be in the kernel to do that?

Katie Moussouris:

Well, I mean, there's visibility that you have at that level that you don't get in user land and, you know, it's dangerous being that close, you know, and that able to see things. I think it's really the speed at which it needs to detect things and, you know, somebody in the chat is saying they're instrumenting syscalls directly. I mean, the fact of the matter is they want to be able to prevent some actions from happening. So unlike the old antivirus, signature-based, you know, malware detection of old, where they're basing it on known malware that is doing known things and basing it on signatures, they're doing some of that, but they're also monitoring behaviors. They're monitoring odd calls into memory, so they need to be able to intercept those, and that is why, you know, they're kind of sitting that close to the central nervous system of your computer, and that's why they're in kernel mode.

Katie Moussouris:

But yeah. I mean, there are certainly, there are different types of solutions that are out there that, you know, are trying to do things in less of an intrusive way, and some are more successful than others. But the fact of the matter is CrowdStrike isn't alone in their use of kernel level access. Right? So it's like that part, you know, is the ecosystem we live in right now.

Katie Moussouris:

And whether there will be better solutions, whether there should have been better solutions, I think that's gonna come out, you know, certainly in the coming analysis of this whole thing. We don't even have a full analysis or technical details from CrowdStrike themselves. They say what it isn't. But they haven't quite said what it is.

Adam Leventhal:

Yeah. To say we don't have a full analysis suggests we have sort of a rough analysis. The only things I've seen from CrowdStrike are very high level, like very, very, very high level.

Bryan Cantrill:

Very high level. And they are calling this like the technical details, but sometimes

Katie Moussouris:

it's not these are not the technical details. Gosh. You know, their statements yeah. They had a they had a link to a statement that had more technical details than the link you just posted in the chat with the technical details. And I was like, they really need to learn what, you know, these words mean.

Katie Moussouris:

I do not think it means what you think it means.

Bryan Cantrill:

It does not

Katie Moussouris:

mean. Definitely not. It's like they swapped the titles of those two posts, and I thought it was really odd. I mean, look. I could tell that if George had been up all night like he looked like, every single person at that company had been up all night, including the comms people, and you could definitely tell they were doing the best they could.

Katie Moussouris:

And, you know, if you think about it too, in one of my friendly Signal chats were some of the Microsoft people who were directly working on this, and they had been butt in seat for 36 hours as well. Like, literally, that was what one of them said. And so if the very well resourced, by comparison, Microsoft was also working a longer than 24-hour shift trying to deal with this problem, you better believe that everybody at CrowdStrike was all hands on deck as well. And, you know, basically, it was more than pizza that they were having shipped into that office. They probably had those hangover IVs hooked up to people.

Katie Moussouris:

You walk through. It's probably, really ugly. Yeah. But Yeah.

Bryan Cantrill:

And I think that you get into the, and, you know, we talked about this. We talked about the kind of anniversary of this big outage we had at Joyent, where you get very concerned about sleep deprivation. And, obviously, like, when you get sleep deprived, you make mistakes, you have errors in judgment. And there are just so many cascading things from this.

Katie Moussouris:

I thought

Bryan Cantrill:

it was really interesting, actually, that the Microsoft folks were really jumping in on this. I mean, even when it was, like, pretty clearly exonerated as not a Microsoft problem.

Katie Moussouris:

They were exonerated, but you also saw, like, there was a Wall Street Journal article that had, like, you know, one of the Microsoft executives' panties in a huge bunch, basically because the headline was misleading, you know, because it was pointing the finger at Microsoft, and it wasn't until, like, you know, after the fold in that Wall Street Journal article that you saw that, yes, this was due to a third-party update from CrowdStrike. It wasn't a Microsoft problem. But yeah. So I think in technical circles, Microsoft definitely, you know, was put in the right context, but I think in the broader circles. And this I witnessed firsthand when I worked at Microsoft.

Katie Moussouris:

Third-party stuff would get Microsoft thrown under the bus in the broader, you know, hearts and minds of the users. Yeah.

Bryan Cantrill:

It's like

Katie Moussouris:

And that's just because their computer runs one thing. Right? It runs either Microsoft or Apple software, and that is it. You know, that's all they really know. And I started Microsoft Vulnerability Research, which was like an early precursor to Google Project Zero.

Katie Moussouris:

So Microsoft Vulnerability Research was looking for third-party bugs in the most commonly installed applications on Microsoft's platform for that exact reason. Right? It was, you know, an Adobe crash would be blamed on Internet Explorer. Right? A Flash crash would be blamed on Internet Explorer, and the list went on.

Katie Moussouris:

So Microsoft didn't get away with it in the broad sense the way the technical community knows, you know, that they got away with it.

Bryan Cantrill:

Interesting. Well, certainly what I was just impressed with is they were jumping in arm's length. Now it should be said, this particular failure mode is really, really visible, in part because one of the many domains in which Windows absolutely dominates is these kind of external displays. Like, when you are waiting to board your flight, the one letting you know, you know, United loves to tell me, like, where exactly I am in the upgrade order, even if I'm, like, 89th. You know what I mean?

Bryan Cantrill:

You're like, oh, this is great. Okay. So, like, walk me through the scenario by which I get upgraded. I think I made the set anyway. But, like, that screen that you're looking at, that thing's running Windows.

Bryan Cantrill:

You know, all of the displays, effectively, when you're in a hotel or a hospital or apparently a 911 call center or an airport. I mean, that's Windows. I mean

Katie Moussouris:

Right. Well, you know what's funny is that I've seen average users blame a Linux boot loop, you know, screen on their entertainment consoles on a flight on Microsoft. I've seen, like, Linux failures blamed, because I think users are so used to seeing, like, oh, there's a weird screen with a bunch of code on it. Damn you, Bill Gates. You know?

Katie Moussouris:

It is Bill Gates has got the on there and everything. Right?

Bryan Cantrill:

Logo on there. There's a penguin logo on your entertainment

Katie Moussouris:

console. What that means. Like, they're just like, what? You know, it's a bunch of stuff and the computer doesn't work. It's Microsoft.

Katie Moussouris:

Right? Right. So, I mean, in a lot of ways, Microsoft is the Kleenex of operating systems, and that's what the consumers experience. You know? So, of course, they jumped in because one, even though in, you know, their blog post, Microsoft's blog post about this, they said it was 8,500,000 Windows devices affected, which was less than 1% of Windows installs, right, out there.

Katie Moussouris:

And that sounds like a small number, but, you know, 1% of all Windows devices is never a small number. It's not been a small number since 1999. It's not been a small number. Right? So they, of course, had to jump on it because their phones were lighting up, and they would no matter what.

Katie Moussouris:

Right? So, in a lot of ways, I think, yeah, they came out with better guidance, you know, earlier. They came out with, I think they came out with some scripts, you know, earlier and whatnot. But it was really to get, you know, their phones to stop lighting up.

Bryan Cantrill:

Yeah. Interesting. And I do think, I mean, on the one hand, I definitely appreciate CrowdStrike, very sleep deprived, everyone working hard, and so on. And, you know, I don't wanna take anything away from what I'm sure is gonna be the most stressful moment, or period, of many people's careers. On the other hand, there is actually lots of room for improvement here, and I do think part of the thing is, and I know based on your own history, you share this point of view.

Bryan Cantrill:

It's like we have to learn from failure. And, you know, it's one thing, but it's, like, it's okay. Like, you know, everyone, your mothers still love you, but we really do need to do better. And there's a lot of room for improvement. And I feel like one of the early missteps was trying to play down the gravity, which is always really perilous in an outage.

Bryan Cantrill:

And, like, well, it's only, like, this percentage of the machines, or it was an update that was only available for, like, an hour and 15 minutes. It's like, yeah. But, like, you know, 911 call centers downloaded that update and are offline, so it's actually just... Oh, yeah.

Katie Moussouris:

Then there are still people who are stuck. Like, that part's not over. I was at my kid's orthodontist today, and they had to walk over to one of the 2 machines that were back online in the whole dentist office and everything. So this is definitely, you know, if the airlines haven't sorted it all out yet and the dentist, you know, I mean, from the airline to the dentist, like, this still isn't sorted out. Yeah.

Katie Moussouris:

And this is having, like, long-reaching effects, and we're not even done with the getting all the machines back online part of it. I do remember Friday when I went to the Starbucks nearby, they had a sign on the door saying their Wi-Fi was down and they could only take cash or, like, the Starbucks app points. And that was because, you know, their credit card systems were down. And then inside, they had x's of tape over the credit card point-of-sale machines. And I have never seen, you know, it's like I've never seen that much, you know, just affecting regular people in regular life.

Katie Moussouris:

You know? Yeah. So, yeah, I think, to your point, the downplaying and saying it was less than 1%, I think it was useful for all of us to understand that. I don't think it was the win they thought it was, saying that, but it was useful for us to understand that because, I think I said it on Twitter, I was like, look.

Katie Moussouris:

If this is what an outage of less than 1% of Windows devices looks like, we're gonna need a bigger boat because, like, we're... Right. We're in for it, man. You know? And,

Bryan Cantrill:

yeah. Okay. So alright. So the I mean, got a bunch of things to pull on there. One is I think and you're kinda getting to a little bit of it.

Bryan Cantrill:

And, Adam, I know you saw a bunch of it too because you were trying to travel over the weekend. And just the economic damage of this is gonna be significant and quantifiable, which is something that we haven't really had with single software defects. The software

Katie Moussouris:

licensing was very clever, and the most that most people can get is a refund of their licensing fee. Somebody was saying that, you know, bigger organizations might have negotiated better terms than that, but the default terms are the most you're entitled to is fees paid, which would be how much did you pay for that CrowdStrike license.

Bryan Cantrill:

Well, so you wonder if that's gonna be stress tested, though, because, like, that has actually never been stress tested in part because the damages usually aren't significant enough to merit it. And often, it doesn't happen to everybody at the same time. It's hard to prove. And there are a bunch of reasons why you can't go litigate against that. But you could definitely go litigate against this.

Bryan Cantrill:

I really wonder. And the software industry wants to live in this kind of world where it's like, oh, like, unlike literally every other product you buy, this one can be used for no purpose. So anything that you're using this for is, like, that's on your own. Like, I can break you, but then

Adam Leventhal:

CrowdStrike is just for entertainment purposes, really.

Bryan Cantrill:

Just for entertainment purposes. And it's like, yeah, that's actually not the way, like, anything else works, and there's consumer protection law. And software's always been kind of, like, skirting in this, like, gray area about, like, oh, we don't even know what it is. Like, by the way, you didn't even buy a copy. Like, you got a, you know, there's all... And they're gonna wanna stay in that gray area, which means, I think, I mean, I don't think they're gonna be able to be like, oh, by the way, like, United Airlines, do you want a refund?

Bryan Cantrill:

That's the only thing you can get. Like, yeah. I don't think so, pal. Like, we've got tens of millions of dollars, hundreds of millions of dollars of damage across the economy. Like, that's not gonna fly.

Bryan Cantrill:

I mean, I don't know, Katie. What do you think? Maybe we're now kinda all out of our depth, I guess, but do you have a take?

Katie Moussouris:

Well, I mean, they could always try and say they were a screensaver program, you know, just keep blinking through.

Bryan Cantrill:

I think That's

Katie Moussouris:

right. No. That's right. This is right.

Bryan Cantrill:

Program is working very well, by the way. I don't know what's here. Right?

Katie Moussouris:

Yeah. No. I think, honestly, they are gonna have a lot to answer for. And, you know, I just rolled off, finishing my term on the Cyber Safety Review Board, and I had suggested, this smells like, you know, a great topic for the next Cyber Safety Review Board, you know Yeah. Scrutiny.

Katie Moussouris:

And, I think it's I think this is exactly what the executive order that the president used to create the Cyber Safety Review Board, the CSRB, was meant to examine was, like, what crippling terrible things could happen, and how can we prevent them, you know, from happening in the future? Let's look at this plane crash of software. But I think

Bryan Cantrill:

Could you please talk about the CSRB? Because I actually, ironically, learned about the CSRB for the first time on Thursday, like, 12 hours before this happened.

Katie Moussouris:

Oh, that's funny. Yeah. Yeah.

Bryan Cantrill:

So we were talking to a potential Oxide customer who is in a regulated space, and they were mentioning the CSRB. I'm like, what is that? Why have I not heard of that? And it's pretty recent. So could you walk us through the CSRB?

Bryan Cantrill:

I found this really intriguing.

Katie Moussouris:

Yes. So it was the Cyber Safety Review Board, CSRB, that was born like, you know, Athena out of the mind of Zeus. No. It was born out of an executive order. And I was asked to be part of the inaugural CSRB, which was a great honor.

Katie Moussouris:

It's about 15 board members, half of whom are government folks that are on the board by virtue of what role they play in the government. So, for example, you know, when Rob Joyce was at NSA, his role on the board was the NSA seat, and then when he retired and went into private industry, they asked him to continue, but in one of the private sector seats. So I was, obviously, in one of the private sector seats. But as for the purpose of the CSRB, it got some unfortunate early comparisons to the NTSB, which, as you know, examines plane crashes and their causes and then shares out, you know, best practices. But NTSB is also a regulatory body.

Katie Moussouris:

Like, they can compel, you know, an airline, for example. And they cover all transportation, but typically, you know, you hear about them when there's a plane crash. Right? They go and get the black box. They go figure out why that, you know, airplane door blew out of that airplane, you know, and everything. But they basically have legal powers to compel, you know, in a lot of cases, like, Boeing, right, to get information.

Katie Moussouris:

And then they do a very lengthy report, and in the report, they detail what went wrong and what, you know, the whole industry needs to do to fix it.

Bryan Cantrill:

They kind of beg the FAA to adopt the changes that they... I mean, because I know there was definitely frustration at the NTSB that, like, the FAA kinda needs a crash before they will adopt some of these regulatory changes. But they work closely with these regulators, and they've got the power of subpoena.

Katie Moussouris:

Right. And so the CSRB doesn't have any of those things nor should it with its current makeup. Right? It wouldn't be fair if Boeing had a seat on the NTSB, right, and could subpoena things out of Airbus. Similarly, you know, we we actually experienced this in the last review where I was still technically part of the CSRB, but I was recused off of the Microsoft review because I worked for Microsoft for so many years.

Katie Moussouris:

Right? Yeah. Right. And the competitors of Microsoft, you know, Heather Adkins, who's been working at Google forever, she was recused because she was working at a competitor of Microsoft. So, basically, the CSRB as it stands right now is a useful experiment.

Katie Moussouris:

I think that the three reports that have come out have been very useful, and those three were Log4j, the Lapsus$ attacks and similar attacks, and the Microsoft report. Those were the three, in that order, that came out, and I worked on the first two. And those reports have been useful because in a lot of ways, even though they are restating a lot of, you know, well-known best practices, it's all in one place. It's tied to a case study, and it's something that organizations can then look at and say, well, you know, all these changes that I wanted to push out, now I have this handy report where it says that, you know, all of these things were done poorly and that's part of what led to it. Some of these things were done well and we should do more of those things, and then in the future we need to do these other things.

Katie Moussouris:

And the other thing the CSRB reports do is they advise federal government agencies. So in the Lapsus$ one, there was a section on telecoms and, you know, SIM swapping. So, anyway, point is, this absolutely belongs in a CSRB report, this exact thing. Oh, and just incidentally, in the executive order that established CSRB, they actually were ordering us to go and do an examination of the SolarWinds attack.

Katie Moussouris:

Oh, interesting. Yeah. And, you know, the leadership in CISA, which oversees CSRB, decided to skip over SolarWinds and go to Log4j instead. And so we'll never see the SolarWinds report because too much time has passed, and it's not gonna happen. But that was essentially why it was established: this was initially to look at SolarWinds.

Bryan Cantrill:

SolarWinds. Okay. So it does have an incident focus. And we we are gonna look at it. Okay.

Bryan Cantrill:

Interesting. And so the idea is we're gonna look at an incident, and, yes, the incident is gonna have different representatives, including representatives from industry. But the objective is to, I assume, us all to learn collectively from an incident and improve the state of the industry based on the incident. Is that a fair synopsis?

Katie Moussouris:

Yes. And it's also supposed to instruct the government in what the government can do differently, whether it's regulators or, you know, something else. And if you notice, you know, the Microsoft report spawned congressional inquiries of Microsoft. Right? So, I think it I think well, we can definitely expect, hopefully, a more well rested George to show up and and have to do some congressional testimony about this.

Katie Moussouris:

But if you look at it also, you know, SolarWinds SolarWinds if we look at SolarWinds, that was interesting because another cybersecurity company was the, were the first ones to notice the attack, and that was Mandiant. Right? Remember? It was Mandiant that came forward and said, hey. We noticed that we've been attacked, and we figured out it was through SolarWinds.

Katie Moussouris:

What's interesting was, you know, they came forward. Congress loved them for it, you know, and all of that stuff, and were super grateful. But they were a cybersecurity company that was under persistent attack, and part of their job is to prevent and detect, you know, advanced attackers that come in through any means. Right? And from when it happened to them to when they finally noticed and were able to figure out what had happened was 4 months.

Katie Moussouris:

So that was a 4 month persistence inside the supply chain of of, you know, Mandiant via SolarWinds, as well as, you know, presumably every other company that was that had SolarWinds installed, and was affected by by the adversary. So in that instance, it's very funny because, you know, Congress loved Mandiant for coming forward. Nobody really questioned the fact that that company was, you know, unable to detect, this particular attacker, for 4 months. For 4 months. Right.

Katie Moussouris:

I wonder how, you know, how charitably Congress is going to act towards CrowdStrike who, ostensibly, you know, in terms of response, did the best they could in a very short amount of time. They worked with their biggest partner in this, which was Microsoft. And, you know, they did come out with actionable guidance. I mean, the thing is, this is gonna happen to every company to some extent, probably not with as much carnage as what occurred, but, like, in terms of responding to a security problem you caused, that is like an almost guaranteed thing.

Katie Moussouris:

If you ship code, it's gonna happen to you. And if we look at their response and, you know, what they were able to pull together in a short amount of time, it was pretty good. Right? You know, all told, it wasn't perfect, and I'm sure they have room to improve. But I think it was, you know, reasonable given the scope and scale of the problem, and they had a reasonable response.

Bryan Cantrill:

Yeah. And and so in well, one question about that, first of all, is the, and I guess is the would this be a question that the CSRB would take on? Because certainly a question that I've got is like, okay, could if this had been a state actor, I mean, this is clearly a great vector for a state actor. And now I've got all sorts of questions, CrowdStrike, about the way you, prosecute your own internal security because we really need to know how this thing was shipped out because this could have been I mean, great. This was like a zeroed payload that that I mean, it's my understanding.

Bryan Cantrill:

Correct me if I'm wrong, because I think the technical details are still... and, clearly, I made the mistake of reading that thing labeled "technical details" in a tweet or whatever. I mean, it feels like it's still a bit vague. Well, in particular, maybe you can answer this question about these channel files that are the configuration. Because what we have in the quote unquote technical details is: the configuration update triggered a logic error that resulted in an operating system crash.

Bryan Cantrill:

It's like, yeah, we're gonna need a lot more details on that. But do these channel files contain an intermediate representation of executable program text, effectively,

Katie Moussouris:

explanation, and it might be a number of different things. You know? I'm seeing new analysis emerge in, you know, even this afternoon where, you know, there's an argument going on between some, you know, weird idiot on Twitter, who had a very convincing, you know, and highly viewed thread.

Bryan Cantrill:

Oh. Now

Katie Moussouris:

Did you see

Bryan Cantrill:

this thing Adam?

Katie Moussouris:

And then Tavis Ormandy was like, actually, this guy's an idiot, you know, and

Bryan Cantrill:

Oh my god. Here's the funny thing.

Adam Leventhal:

Are you asking me if I saw the idiot making a thread on Twitter? Maybe is my answer.

Katie Moussouris:

Yeah. The right. You know,

Bryan Cantrill:

I I would have to go over to

Katie Moussouris:

the other side. There are still... and this is the main failure, I would say, of CrowdStrike's response: they haven't gone into gritty enough details soon enough to debunk, you know, the quasi-technical analysis that's going on, which then Tavis Ormandy has to go and, like, strike down himself. Right? But I absolutely think that it could be a number of things. And so, to your question: was there, you know, essentially executable code in these channel updates?

Katie Moussouris:

Possibly. It could also just be, you know, a parsing problem in the engine that ingests those channel files. It could be a little bit of both. But what seemed to be, you know, one of the biggest questions that people have was, how did this get out at all? No matter what caused it, technically, how did it get out at all without proper testing?
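[Editor's sketch: to make the "parsing problem in the engine" possibility concrete, here is a minimal Python example of a parser that validates a data file before acting on it. Everything in it is an assumption for illustration: the `CHNL` magic, the header and record layout, and the `MAX_FIELDS` limit are invented and are not CrowdStrike's actual channel file format, which was not public at the time of this conversation.]

```python
import struct

# Hypothetical layout, invented for illustration only:
#   header: 4-byte magic + little-endian uint32 record count
#   record: little-endian uint16 rule id + uint16 field index
MAGIC = b"CHNL"
HEADER = struct.Struct("<4sI")
RECORD = struct.Struct("<HH")
MAX_FIELDS = 20  # fields the (hypothetical) engine makes available to a rule


class BadChannelFile(ValueError):
    """Raised when a file fails validation; the caller keeps the old config."""


def parse_channel_file(blob: bytes) -> list[tuple[int, int]]:
    # Check the size before unpacking so we never read past the buffer.
    if len(blob) < HEADER.size:
        raise BadChannelFile("truncated header")
    magic, count = HEADER.unpack_from(blob, 0)
    if magic != MAGIC:
        raise BadChannelFile("bad magic (an all-zero file fails here)")
    # The declared record count must match the bytes actually present.
    if len(blob) != HEADER.size + count * RECORD.size:
        raise BadChannelFile("length does not match record count")
    records = []
    for i in range(count):
        rule_id, field_index = RECORD.unpack_from(blob, HEADER.size + i * RECORD.size)
        # Reject indexes the engine could not safely dereference.
        if field_index >= MAX_FIELDS:
            raise BadChannelFile(f"record {i}: field index {field_index} out of range")
        records.append((rule_id, field_index))
    return records


if __name__ == "__main__":
    good = HEADER.pack(MAGIC, 1) + RECORD.pack(7, 3)
    print(parse_channel_file(good))
```

[The design point is that a malformed file is rejected with an error the caller can handle, rather than being trusted and dereferenced later. In kernel-mode code the same checks matter more, because there is no exception to catch.]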

Katie Moussouris:

And were they testing on all, you know, most likely deployed versions of Windows? You know, was it this, like, intersection of, you know, they weren't testing on the right versions of Windows? Did Microsoft change something on the back end? Because Microsoft certainly pushes these little micro patches as well. Right?

Katie Moussouris:

You know, and changes things, shuffles the deck a little bit out there. But what it seems like is it could be a combination of technical errors and the testing part of it. I mean, we've all been there. Right? We've all been there in our developer days that said, works fine on my machine, and then you push it out.

Katie Moussouris:

Right? Everybody's done this, or it worked fine in test and you push it to prod, and something is different. Right? And I think there are a lot of people... Okay. So somebody just posted this Dave's Garage guy.

Katie Moussouris:

I would be cautious about this particular fella because apparently he was involved in some, lawsuits and scams, of, like, secure online. Like, those weird pop ups that say that your computer is, very insecure, and it would sell you something. His his little micro company did a lot of those. So, anyway, I've I just think Oh, man.

Bryan Cantrill:

I got milkshake ducked on that one. So I just listened to that one today and thought, I was like, okay.

Katie Moussouris:

I just screwed up. I know. He's he sounds he sounds very convincing too, doesn't he? No. I mean, the thing is he did really work at Microsoft at some point, but, and that you know, not taking away from that.

Katie Moussouris:

But it is, he definitely made you know, he makes some mistakes, technical mistakes, but he sells them with such great confidence. You know? He's

Bryan Cantrill:

I mean, the and that video honestly doesn't say very much. It spends most of the time explaining, like, what kernel mode is, and that's all fine. And then it kinda has this hypothesis about it doesn't in in particular say that that the actual payload itself is all zeros and, may it you know, then the the CrowdStrike thing has got this very strange thing. It's like it's not related to it being all zeros. It's like

Katie Moussouris:

Right. No. No. And and I don't think it is. I think so there's a couple things going on.

Katie Moussouris:

And the reason analysis differs online is because I think it's in the class of what I like to call Heisenbugs. I think it's one of those things that, you know, it changes a little bit upon observation. But, honestly, it could be, it could just be a pointer to uninitialized memory, in which case it's gonna vary. What you see when you do the analysis is going to vary. Yeah.

Adam Leventhal:

Of course.

Katie Moussouris:

And that that would also explain why it may have passed certain tests. And then the test of the real world, it varied, and it varied in a way that was catastrophic. So, Katie,

Bryan Cantrill:

one question on the rollout.

Adam Leventhal:

Like, you

Bryan Cantrill:

you talked about, you know, maybe

Adam Leventhal:

like how did this get out? One of the questions I had for you is, how did it get out everywhere seemingly simultaneously?

Bryan Cantrill:

All at once.

Adam Leventhal:

Right? Like, I guess part of my question was, you know, because the the very cautious developer I am, you know, I try to, you know, dip my toe in the water as little as possible and see how that went. Apparently, that's not the philosophy deployed here.

Katie Moussouris:

Yeah. So a couple things. Right? One, these things operate in trying to protect you from real time threats. So the model wasn't, you know, roll out cautiously and let part of your customer base be vulnerable while the other part, you know, stays flapping in the wind.

Katie Moussouris:

The model was trying to protect everybody all at once. So it was working as designed in that sense. And, similarly, IT departments can't really handle, you know, even scheduled patches as much as we would love for them to test every single patch. You know, a lot of IT houses, honestly, they will only test and deploy critical level severity patches in anything close to real time, and everything else gets postponed to a monthly or even a quarterly update rollout. And that is just for practical uptime reasons.

Katie Moussouris:

So these content updates were always viewed as safe, but they also you know, there wasn't really a deployment mechanism or, you know, either on the, you know, on the CrowdStrike side of pushing them out or on the customer receiving side of rolling them out in their own organizations that would have allowed for, you know, a staggered rollout or a ring, you know, deployment where you're you're you would have at least gotten warning. But I think that's one of the things that the CSRB examination, going back to that, that would be one of the main questions that I'd be asking if I were still on the board. Right? Is, you know, what mechanisms do you have to do a staged rollout? And, you know, can you enhance those, please, for next time?

Katie Moussouris:

Because we never wanna see that. We understand that there's a trade off. Right?
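[Editor's sketch: the staged or "ring" rollout Katie describes can be illustrated in a few lines of Python. This is a hypothetical illustration, not CrowdStrike's or any vendor's deployment mechanism; the `Ring` type, the `deploy` and `healthy` callbacks, and the 1% failure threshold are all assumptions.]

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Ring:
    name: str
    hosts: list[str]


def staged_rollout(
    rings: list[Ring],
    deploy: Callable[[str], None],
    healthy: Callable[[str], bool],
    max_failure_rate: float = 0.01,  # assumed threshold, tune per fleet
) -> list[str]:
    """Deploy ring by ring and stop at the first unhealthy ring.

    Returns the names of the rings that completed successfully.
    """
    completed = []
    for ring in rings:
        for host in ring.hosts:
            deploy(host)
        # Check health after the whole ring has the update.
        failures = sum(1 for host in ring.hosts if not healthy(host))
        if ring.hosts and failures / len(ring.hosts) > max_failure_rate:
            break  # halt: later (larger) rings never receive the update
        completed.append(ring.name)
    return completed


if __name__ == "__main__":
    rings = [
        Ring("canary", ["c1", "c2"]),
        Ring("early", ["e1", "e2", "e3"]),
        Ring("broad", ["b1", "b2", "b3", "b4"]),
    ]
    crashed = {"e2"}  # pretend the update breaks this host
    print(staged_rollout(rings, lambda h: None, lambda h: h not in crashed))
```

[The trade-off discussed in the conversation shows up directly: hosts in later rings stay on the old content, and so stay exposed to the new threat, until earlier rings have been checked.]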

Bryan Cantrill:

Well, yeah, I understand. There's tension. Coverage. Yeah. And this is a very kind of unusual tension.

Bryan Cantrill:

Adam, honestly, it's one that you and I are not totally accustomed to: there's a new threat that we have determined is out there, that we're beginning to see in the wild. And this update that we're gonna push is gonna address this new threat. So we do wanna move with urgency, but maybe with a tad more rigor the next time it gets rolled out. And I also think that there's this kind of question, and maybe, again, I got milkshake ducked here, so maybe I've got incorrect information. But you've got this issue of what Microsoft certifies as an internal component versus this kind of, well, a data file that's gonna be consumed by this thing and is now changing its behavior based on that data file.

Bryan Cantrill:

So it's, like, is the thing that you certified the thing that ultimately... and now I'm rolling that out on a much more aggressive cadence. And should we be bringing that in from the twilight and, at least, should Microsoft be in the loop on this? Because I think CrowdStrike's answer prior to Thursday night would probably be like, hell no. It's gonna be too slow, and Microsoft's like, yeah, we're too slow. But maybe it's like, actually, maybe we kinda need to meet in the middle on some of this stuff. And, again, this is to your CSRB point, Katie.

Bryan Cantrill:

This is the kind of thing that the CSRB can really take apart, hopefully.

Katie Moussouris:

Well, I mean, I think the issue is that, you know, kernel mode drivers are put through a lot of screening before they are given, like, the Microsoft blessing to even be there. You know? With these channel files, it can't necessarily happen, because they, you know, change pretty much daily and even sometimes several times a day. There aren't enough resources on the Microsoft end necessarily to suddenly start to screen all of those. But, you know, again, that might be a finding, and that might be, you know, just tough cookies.

Katie Moussouris:

Like, make it happen. Put them through as much scrutiny as you would a regular software update because, clearly, it is making, you know, material changes to the way things work. Yeah. I think that, honestly, it's like the trade off between manual testing manually testing everything in large enterprises and enabling more of an automated update in the background. The trade off is really in, you know, the the business use case.

Katie Moussouris:

Right? And, unfortunately, like, you know, what we saw in this outage was there were enough critical components of our society that had that perfect mix of running the, you know, Windows and CrowdStrike that even though it was a relatively small number of hosts, like, given the, you know, entire percentage, they were at business critical intersections. And, like, I think, you know, if you think about rolling these out in stages, that just means, you know, more rolling downtime in unpredictable ways. It could just mean that, you know, it's not all of these systems coming to grinding to a halt at the same time, and that's the trade off we're willing to accept. But, you know, I think way back in the conversation you were asking, you know, could threat actors just take take advantage of the fact that, you know, all these 3rd party AVs are are sitting pretty in in kernel mode.

Katie Moussouris:

And, like, yeah. Sure. Obviously. You know? The every piece of software on your operating system is a potential vector for threat actors.

Katie Moussouris:

And the more ubiquitous that software is, you know, the greater the threat to the masses and the more mission critical that software or the more, you know, mission critical machines that software sits on, the more attractive it is to the threat actors as well. Right?

Bryan Cantrill:

And to more sophisticated threat actors. Right? I think this was something that that in terms of, you know and sometimes people worry about state actors where it's like, hey, you don't we need to worry about state actors here. Sorry. But there are other I it feels like as you get to these super high leverage points, it's like, yeah, you kinda do need to be worried about state actors if you are CrowdStrike.

Bryan Cantrill:

And then you've got like a lot of questions about how do you do, you know, background checks on employees and then be that security has to go, like Oh,

Katie Moussouris:

yeah. So

Bryan Cantrill:

How do you sign can

Katie Moussouris:

I ask

Bryan Cantrill:

you a question? Do you think these updates are signed? I mean, they got it right?

Katie Moussouris:

Well, so not so not the content updates, but the kernel drivers were signed, you know.

Bryan Cantrill:

I know the drivers are signed, but content updates

Katie Moussouris:

have to

Bryan Cantrill:

be signed.

Katie Moussouris:

Well, the content updates aren't count you know, they don't count. No. They they're signed you know, presumably, they're signed by by CrowdStrike, but they weren't signed by Microsoft is my is my Right.

Bryan Cantrill:

No. No. They weren't signed by Microsoft. Yeah. Oh my gosh.

Bryan Cantrill:

Signed by CrowdStrike. Right?

Katie Moussouris:

Yeah. But yeah. But, well, if they're not, we're gonna find out. Right? Yeah.

Katie Moussouris:

But look, this I think it's like,

Bryan Cantrill:

we don't know. I mean, this is where and I you you've been such a tireless advocate for for transparency in your career. You must surely be like, we need transparency. The world needs transparency into what into what how the system works because you've got a vector into kinda like humanity CrowdStrike. And, like, we really need to understand, is this signed?

Bryan Cantrill:

If it's signed, one of the things that I definitely appreciated... And, Adam, I look forward to doing a future Oxide and Friends on this because I think it's so fascinating: how do you do your own key generation, the root of your own root of trust? And where is that stored? Is that stored in the CEO's drawer? Because, like, that's not good enough, you know, and, like, all of the procedures around that.

Bryan Cantrill:

And is that documented? And is there a signing ceremony that is documented that we can, you know, all of these things that we that we get kind of in the public Internet, but not I I mean, I don't know, Kate. Katie, are the are the am I am I just dreaming that we can get all this stuff?

Katie Moussouris:

Well, first of all, you know, please don't call me Shirley. You you said that like a while ago. I don't know if anybody remembers. No.

Bryan Cantrill:

Oh, bless you for making an Airplane reference. You know, thank god. It's Airplane, folks. If you've never watched Airplane, you're gonna go watch a winner tonight. Go watch Airplane.

Katie Moussouris:

Thank you very much. That was the, but the thing of, like, you know, key signing so I'm old enough to remember physical, like, key signing parties. Do you remember this? Like, are you old enough and or nerdy enough to have been to a in real life For

Adam Leventhal:

your, like, a PGP. Yeah. Yeah. Like, all your friends for your PGP keys. Yeah.

Adam Leventhal:

Absolutely.

Katie Moussouris:

Exactly. You know, there there is a practicality around this that, you know, it is it is in a lot of ways, you know, a chicken and egg problem and a practicality problem. But I would say that, you know, it really it it by the time an organization realizes that they're important enough to need every single safeguard, you know, key signing party. And, yeah, that looks about right. That looks exactly like they look like.

Katie Moussouris:

Yes. You

Bryan Cantrill:

know, I I you know, those people have all started cybersecurity companies now. I mean, I think it's like, where are they now? Those that the, yeah. So, but

Katie Moussouris:

by the time an organization realizes that they're, you know, so mission critical in this that they need every bit of best practice no matter how much of a pain it is, you know, and all of these things, it's too late. They found out the hard way. They found out because, you know, they missed a step. They missed something that was tedious, and they had, you know, earlier decided not to do it and everything. And so, honestly, you know, I don't know that we're gonna get to, you know, this magical place where everybody's doing the right things all at once, or at some, you know, point in time where all the software we rely on is doing all the right things.

Katie Moussouris:

Like, I don't I don't think we're gonna get there because it does keep changing.

Bryan Cantrill:

It keeps it and I think we got such poor visibility into it because when we did our when we did this ourselves and did our own private key generation, and we did our signing ceremony inside of Oxide where I learned a lot. I I did not know the first thing about a signing ceremony, and kind of setting up that all that process. And I mean, it it's wild. But we still have not publicly talked about it. Like, I'm really looking forward to doing it, but we need to talk about it carefully because there are a bunch of details in there that kind of a state actor.

Bryan Cantrill:

You know, how much of that do we want to reveal to a state actor? And transparency and security do end up in a kind of tension. And I think that you're able to kind of appeal to, like, no, no. I can't tell you those details because then the attacker could use those details. And I mean, how do we kinda get through that mindset? Like, no.

Bryan Cantrill:

No. Like, I know. Yes. I know you don't wanna talk about the vulnerability because you don't wanna talk about the vulnerability. But, like, talking about the vulnerability will make your software more secure.

Bryan Cantrill:

It feels like it's the there's a similar analog here in terms of these internal security processes.

Katie Moussouris:

There is, but, you know, so I think you all know a couple years ago, I I had a little accident and I stumbled across a vulnerability in, what was it? It was that audio yeah. It was Clubhouse. Yeah. And that I didn't mean to find that.

Katie Moussouris:

I just was messing around and, you know, you can't can't help yourself sometimes. But long story short was I found a way to persist in an audio chat room, with no avatar present. So I could speak, and if I had ever been given the microphone in that room, I could speak like a ghost, and I could, you know, listen in without being seen. Right?

Bryan Cantrill:

Right. Which is a serious issue.

Adam Leventhal:

I don't

Bryan Cantrill:

get to make light of it, but it is also I just love it. It it does offer some real comic possibilities. The voice of

Adam Leventhal:

God coming from nowhere. Right?

Katie Moussouris:

Exactly. Exactly. Yeah. No. I think when I demoed it for my friend, Lily Hay Newman, who wrote about it in Wired... Can I swear on this podcast?

Katie Moussouris:

Is that allowed?

Bryan Cantrill:

The I'm not sure. Yeah.

Katie Moussouris:

No. I swear. I was like

Bryan Cantrill:

with his pride and swearing. I'll encourage.

Katie Moussouris:

So, yeah, so so I disappeared out of the room, and she was, you know, in the room. And I said, see? I'm a fucking ghost. And she goes, can I use that quote in the article? And I said, yes.

Katie Moussouris:

Yes. People know, I swear. It's I'm old enough. And so but, yeah, it was it was pretty funny. But the thing was at the time, if you remember, they had an insane valuation that since, you know, has deflated.

Katie Moussouris:

But at the time, they had, like, a valuation of, like, $4,000,000,000, and they were about to get, oh, actually, you know what? If you click on that article that just got dropped in the chat for the clubhouse vulnerability, you will see a picture of the late great Scappy the cat. He totally helped me with that.

Bryan Cantrill:

Me. He's here.

Katie Moussouris:

He helped me. Yeah. He's he will be here forever. Yeah. He was 18 years old when he decided that it was time to go over the rainbow bridge, so he he had a good run.

Katie Moussouris:

But, anyway, yeah. So the funny thing was they had an insane valuation. They had $100,000,000 in the bank. And when I finally got through to a person, trying to do the nice disclosure thing that, you know, I am somewhat known for, and I was like, well, I better do it. I better not drop this 0-day.

Katie Moussouris:

I better, like, try and warn them. So when I tried to do that, you know, it was very frustrating. I had to threaten to drop the 0-day because I couldn't get anybody. And when I finally got somebody, it was one of the cofounders. It was the CTO, gets on a Zoom with me, and I was like, something is wrong here.

Katie Moussouris:

Why is a $100,000,000-in-the-bank, super popular company with 10,000,000 users... why is the CTO getting on the phone with me? It cannot be because I was like, by the way, I'm the coauthor of the ISO standard that I'm trying to follow right now, and you're making it really difficult. It can't be because he's a fan of international standards. Like, this makes no sense.

Katie Moussouris:

So, turned out they had 5 employees at the time. And I was so what I'm getting to here is the proportionality of their responsibility did not equal their investment in security.

Bryan Cantrill:

That's Yeah.

Katie Moussouris:

That's the punchline that I'm getting to in this long-winded story. But it is where we keep finding ourselves, where suddenly something gets popular or, you know, is that tiny little component in the xkcd comic that's holding up the Internet, and nobody had heard about it before it horrifically collapsed. And that's when you see, like, oh, it should have had all of these safeguards. It should have had all these resources invested. It should have been doing all these best practices, and it wasn't.

Katie Moussouris:

Right? Right. And that's where I think, you know, we're just gonna yeah. Thank you. Somebody somebody dropped the dependency xkcd comic in there.

Katie Moussouris:

But, even with things that we know that we depend on. Like, do we depend on OpenSSH? You know? Guys, gals, and non binary pals, yes. We do. Yes.

Katie Moussouris:

We do. And yet there are new vulnerabilities found in OpenSSH, and some of them are new old vulnerabilities. Right? Like that recent one, it was a regression. Right?

Katie Moussouris:

That was a regression of an old bug. It was like a marked out 18 year old, almost as old as Scappy the cat bug, you know, in OpenSSH. So the fact of the matter is we're just not collectively, as a society, up to the task of securing what we have built and what we continue to build. And that's

Bryan Cantrill:

that's the

Katie Moussouris:

that's the big problem.

Bryan Cantrill:

Which I think is but I think it's why it's really important that when you do have and first of all, you should know, Adam, I'm not sure if you're aware of this, but this Clubhouse issue that Katie found is kind of at, like, the wellspring of Oxide and Friends, because one of the first Twitter Spaces that I listened in to was Katie talking about this, and it was mesmerizing. And I was like, oh, but these Twitter Spaces, and this is when I Katie shortly thereafter, I convinced Adam to join me on a Twitter Space over the last

Adam Leventhal:

origin story. The origin story

Bryan Cantrill:

I never knew. No. No. We are at the headwaters of Oxide and Friends. It actually starts with Clubhouse, and this vulnerability that Katie found.

Bryan Cantrill:

And when I was first placed to discuss it, and the thing, the problem there is kind of the reaction to that. And I think that it's like, yes. You will kinda succeed beyond your initial ambition. You only have 5 people and you got you know, you've got $100,000,000 in the bank, whatever. But when Katie comes calling, like, that's an opportunity to be like, okay.

Bryan Cantrill:

Now I really need to become I need to grow up in this capacity really quickly, and I've got an opportunity where this wasn't serious, but it could have been much more serious. Or in the in I mean, your case, like, it was actually pretty serious what they found. What you found, the, because one question I definitely have for you is if you look at the Hacker News discussion, especially around the CrowdStrike issue, it feels like there've been a lot of near misses for CrowdStrike. It feels like they have and that to me is actually the thing that I find most troubling is, did you have other incidents where you had these kind of oh, shit moments, and we never knew about them? It was like only one customer knew about it, and you kind of told them, you know, it's like, well, I can give you your money back, but that's it.

Bryan Cantrill:

And the you know, is CrowdStrike learning from their own failures? Because that's the concern that I have is that you actually had a bunch of near misses. They gave you the opportunity to learn. And because it's like, it's a downer, you know, it's like not good news. And it's pretty natural for people to not wanna hear the bad news and kinda drive on.

Bryan Cantrill:

But then you you leave this threat of a much larger, not a near miss, but something that actually does devastating, has devastating effect. I mean, is that first of all, did you are you getting that kind of same, I mean, is that a a wildly inappropriate read? I mean, it feels like there there have been issues in the past with,

Katie Moussouris:

there definitely have there definitely have been issues in the past, and, you know, certainly, there are probably some former and even current CrowdStrike employees who are smugly going, I told you so. I told you we needed to do this, like, one thing that I, you know, that I'd advocated, this new test harness that I had advocated for or whatever it is. But, I think that every organization goes through multiple of those. Like, you know, the Environmental Protection Agency that could or could not be disbanded depending on November, in the next phase. So that agency came into existence.

Katie Moussouris:

A lot of people cite the Cuyahoga River caught fire. Right? And that's what made the EPA a thing. The Cuyahoga River had caught fire dozens of times, and it was basically the last time that somebody was like, hey, come on. You know?

Katie Moussouris:

How many more times does this river have to catch fire? And so I think to your point, yeah, there can be early warnings. There can be many warnings. There can be near misses. There can be hits of a smaller, you know, caliber of bullet, you know, bullseyeing this problem, but it takes a critical mass of these issues.

Katie Moussouris:

I definitely think they hit theirs at this point. I think that's fair.

Bryan Cantrill:

I think they hit anyone's

Katie Moussouris:

threshold of critical mass of, like, wake up call. Right? But, yeah, you know, and so I think that it's something where it's an accumulation of pain, customer pain, scrutiny, you know, etcetera, that ends up growing the scar tissue that makes them more resilient in the future. But, I mean, if you look at it, it's not like with all the investment that Microsoft has put into things and Google has put into things and Apple, that they are bug free themselves. Right?

Katie Moussouris:

They are not. Right. So it's like you can keep investing, but the, you know, the the the well of mistakes is seemingly bottomless. Oh,

Bryan Cantrill:

and that's one of the frustrating things. It's like, yeah, you're never going to achieve perfection here, unfortunately, as much as probably as we might. But that's not like, that really can't be an excuse for not learning from past failures. We really and this is where, like, the CSRB thing I'm so excited about. And maybe unnaturally so or irrationally so because it just feels like that is a real opportunity.

Bryan Cantrill:

And then I know I don't wanna go to the NTSB analogy, but I do feel that the those I mean, the findings of fact from the NTSB have done so much to improve aviation safety to the point where it's like it's a non issue in the US. It's amazing to me. When we were kids, there were major airliner crashes every single year. In many years, there were multiple. And these are, like, hundreds of people dying incidents.

Bryan Cantrill:

And people not even in an airliner crashes, it's I mean, it's always tragic, but it's tragic because you're, like, getting people who are traveling from one spot to another. Like, they're living their lives. Right? So it's a lot of kids and people in the prime of their lives. So anyway, all that tragedy, we don't have that anymore because we decided that we are gonna learn from every single accident.

Bryan Cantrill:

We're gonna have a cockpit voice recorder. We're gonna have a flight data recorder, and we're gonna learn from every single one of these things. I'm making it sound more deliberate than I think it actually was, but we did learn from each and every one of these. Right? And I feel like, yeah, there's an opportunity here.

Bryan Cantrill:

I wanna learn so much about the because I'm sure and I'm I from a CrowdStrike perspective, I'm sure there's reticence because I'm sure they know that, like, oh, man, there's a lot here that we need to improve. We are not where we wanna be. I can see and and for oxide, I can speak for oxide. Like, there's so many aspects where we're like, we know we need to be better, and we are not where we wanna go. We're we're not where we wanna be.

Bryan Cantrill:

And it's not exactly fun to talk about that stuff, but it's really important to, like, get to where do you wanna be. You gotta get off the

Katie Moussouris:

Yeah. Off the time here. So I'm listening to you, and I'm looking at the chat, and some people are saying, you know, that, no, this is, like, this is inexcusable and that, you know, really, like, the testing should have been up to par. I think the complexity level of interoperability is what a lot of people are missing, and that goes back to, you know, my statement that this may very well have been a Heisenbug, and that's how it got past all the testing that CrowdStrike certainly does. I just, you know, I wanna believe.

Katie Moussouris:

Help me believe. Right? They did some testing, But, but it's the proper authority that, oh, and Rust has entered the chat. Okay. Thank you.

Bryan Cantrill:

Oh, we got Rust. Oh, we I know. We got I oh, god.

Adam Leventhal:

Well, Katie missed eBPF earlier. As you're saying, Katie, it's hard to imagine a thing of this complexity and criticality, it being the case that we were just one decision away from avoiding this. That, like, oh, if it was just in Rust, we would have never had a problem like this. Or if we had used we would have never had a problem like this. Or if we had paid our agile consultants enough money, we would have never hit this.

Adam Leventhal:

Just seems unlikely that that it's one thing.

Bryan Cantrill:

I will also is now the opportunity to say that it is indecent during someone else's outage to volunteer your thing? I mean, it's like, just have and, like, look. I'm forgiving of this. You know? I surely have done this in my life.

Bryan Cantrill:

Adam, I definitely remember doing this early on before we'd written a line of DTrace. Like, one of our coworkers had just spent all day debugging a problem that I was certain the DTrace that was in my head, that I had not written a line of, would solve.

Adam Leventhal:

So it was still like 4 years away or something like that.

Bryan Cantrill:

Oh, yeah. No. It was a and it was actually, it was less than 4 years away because this is, like, this is 2001. This'd be, like, early 2001 because I so cheerfully volunteered. I'm like, you know, it's Tim Marsland.

Bryan Cantrill:

I'm like, you know, Tim. I he's he's like, you know, cursing. He's already angry. Like, what am I doing? Like, I have a prefrontal lobe, kid.

Bryan Cantrill:

Like, can you read the room? He's already angry, and I'm like, hey. We're going out to dinner Monday night in Australia. And I'm like, you know, Tim, DTrace would have solved that problem. With DTrace, you would have solved that problem.

Bryan Cantrill:

And he just looked at me, and, like, was spinning effectively on me. He's like, DTrace, what's your problem? Oh, right? DTrace. And I'm like, okay.

Bryan Cantrill:

I'm like, message received. I think I will go. I need DTrace. I'm sorry. And I just I did and I was like, I need to stop, like, talking about what this thing is gonna go do and go do it.

Bryan Cantrill:

And I do feel that, like, when someone else is having an outage, like, they're having a bad day, and it's not the time to volunteer cheerfully that your thing would have solved the problem. It's like it's just not helpful.

Adam Leventhal:

Couldn't you just or just like words that are not helpful in that situation.

Bryan Cantrill:

I know. That's right. Yes. Yeah. I have been told, I I in my marriage, I've been told that I'm also not to use those words.

Bryan Cantrill:

Actually, that's that's another the the you know, the the It's

Adam Leventhal:

a rare piece of crossover advice. Yeah.

Bryan Cantrill:

Crossover. You know, listen, we're, we're marriage counselors around here too. We go help you with your help you with your long term relationships. But I think it's like, don't do that. And I think that the if you are on the the out if if you are having an outage, don't minimize it.

Bryan Cantrill:

If someone else is having an outage, don't volunteer that you wouldn't have it or that they should have made different decisions a long time ago. It's just not helpful. And I feel we saw a lot of that. I mean, Katie, you must have seen a ton of that from the I mean, and it wasn't just Rust. It was a

Katie Moussouris:

lot of I mean okay. So in in enthusiastic acceptance of Rust being really good at a at a number of of things, I will say that I found it baffling when some of the guidance coming out of CISA on, like, how do we solve this whole, you know, vulnerable software problem was they were, like, use memory safe languages like Rust. And I was like, yeah. But yeah. But hold on.

Katie Moussouris:

There's a whole bunch of stuff out there that is not gonna get, you know, basically refactored in Rust anytime soon, and that is the world we currently live in. So I will say, yes. Yes. Enthusiastic support of Rust, but also the world we live in, I can, you know, I can say without even circumscribing my lifetime that I'm not gonna see memory safety be the prevalent, you know, language that is running the world's software in my lifetime just because of how big of a cost it is to reimplement technology we already use. And, so building new things, absolutely.

Katie Moussouris:

But I think the the the reliance on on so much old software is is not to be underestimated. And, yeah. I mean, I think we're we're we're in this for forever.

Bryan Cantrill:

Well, we are in this forever, and it's like you also have I mean, this is kind of another aspect of this that I find interesting is that you have these systems that are embedded systems in the way they're deployed, but not embedded systems in the way they're thought of. Right? So in other words, CrowdStrike does not think of themselves as deploying to lots of devices necessarily. They think of them as deploying to, like, Windows systems. But these Windows systems are being used in a way that, like, actually, like, you need to get up on a ladder here in order to be able to like there's no keyboard here.

Bryan Cantrill:

It is an embedded system, but it doesn't have some rollback protection. Just doesn't have a lot of these things. And do we you know, are there things that we can do just accepting the fact that this is going to occur? We are going and this is again, you know, maybe where the CSRB can help us out with some recommendations because maybe a recommendation is, like, hey. These Windows systems are gonna be used as embedded systems, and it would be and I know that there are things like rollback protection, but I don't get the sense.

Bryan Cantrill:

I mean, clearly, CrowdStrike has not really thought about rollback protection. The way we at Oxide, for example, in our embedded systems, like, rollback protection is very, very important to us because you run the risk of like a bricked box, like really bricked. And Right.

Katie Moussouris:

You

Bryan Cantrill:

know, do we need to be just honest about ourselves? Like, yeah, CrowdStrike, you're deploying new embedded systems. I know you don't think of it that way, but it means that there are different standards. And I don't

Katie Moussouris:

know what you're I yes. Absolutely, unequivocally, definitely. They do need to be looking at themselves as much more of an embedded embedded system deployment if they hadn't been thinking of themselves that way already. You know? I mean, they might have literally just thought, like, it's worked well enough so far, and, like, what would you think is okay so far?

Katie Moussouris:

You know? And they clearly hit hit their wall on, you know, that at maybe internalized hubris on their capabilities.

Bryan Cantrill:

Also, the name of our operating system is Hubris, by the way, which is really impossible to search for. People are discussing it on Hacker News. 'Cause you search for Hubris, you get a lot of things that are not operating systems. So, Katie, another question for you.

Bryan Cantrill:

Because one of the things that I is kinda remarkable to me about your career arc is you've managed to persuade a bunch of big entities to invest in things that are that they don't typically invest in. I mean, investing in security, it's getting people to buy insurance. Just like, I don't know. This just feels like doesn't feel like good things happen if I invest in this. It's just like fewer bad things.

Bryan Cantrill:

It's hard to make a case for it. And I think that there's almost you can let's assume, I think, reasonably that there's somebody, if not inside of CrowdStrike, inside of other entities, trying to make the case internally for some better practices and so on. Do you have guidance for how to make that case internally? Because you've clearly done it successfully in a couple different stops.

Katie Moussouris:

Oh, I mean, I think this was these were a series of my own personality traits that normally have not served me well, but in this case, served me very well. No. It was it was, you know, like like many of us, I have I tend to hyperfocus sometimes, and, my hyperfocus can last many years, it turns out. And so, and and just kind of, you know, feeling like, no. This is the logically correct thing to do.

Katie Moussouris:

Therefore, I'm going to find out I'm going to figure out a way to make you do it even if it's the last thing that you possibly wanna do. And that was that that's kind of like the story of Microsoft's bug bounties. They had publicly said that they would never pay hackers for bugs, and they were so sure of themselves on that that, like, executives of the company at the time were saying things like, yeah. As long as I work at Microsoft, we're never gonna do this. And I was like, you know, I've been through corporate media training, and they tell us never to say never, so you must be pretty sure.

Katie Moussouris:

But but, no, I think it's, I think it was for me, it was looking at things, you know, as logically as I could and being like, well, you know, this is the only this is the only reasonable and logical thing for this company to do. They just don't see it yet, so I'm just gonna make them. And I'll just you know, it'll take a long time and I'll have to, learn more people skills, which it cracks me up every time someone's like, diversity is so important because, you know, women, they've got those soft skills, those people skills. I'm like, which women? Not me, not I.

Katie Moussouris:

I'm not Where

Adam Leventhal:

are these women?

Katie Moussouris:

For the math. I was like, math is computers are easy, and math is easy. Like, people are hard, you know, is is my mantra. But, no, I think I I think what I really needed to do was I needed to look at the problem from the organization's perspective and disarm some of their primitives and their assumptions around the problem. And once I had figured that out, I was like, you think you're never gonna pay for bugs because you you get all these bugs for free, and you're incorrectly assuming that if you offer money on top of what you're getting for free, that you'll get flooded out and bankrupt.

Katie Moussouris:

I see why you sit there. Why you think that? You know? Oh, that's interesting. I understand.

Katie Moussouris:

Right? But it was true that you're gonna

Bryan Cantrill:

get that understanding. I mean, did you are because I I love this. I mean, this is like I this is the role of curiosity is underplayed in conflict resolution. I think curiosity is so important in terms of understanding someone else's perspective. How did you kinda get past that armor of someone who's like literally said, I will, we will never do this as long as I'm at the company, which is, I mean, that's pretty dug in.

Bryan Cantrill:

How do you kind of like tease that apart and get to like, okay, here's what you're actually afraid of?

Katie Moussouris:

Well, I had actually already gotten them to say yes to something crazy for them, which was what I mentioned earlier in the podcast, Microsoft Vulnerability Research. That was actually diverting Microsoft engineering eyeballs to looking for bugs in third party software. And internally, I had to get them over that hump, and the way I did that was I literally just brought up crash dumps that were caused by third party software. And I said, look at this. Look over here.

Katie Moussouris:

Now look at me. I am the captain now. You know? And I was like, I will show you how the third party crash dumps are hurting your brand. So it makes sense now for you to divert some of your internal security engineering to looking for third party bugs.

Katie Moussouris:

Now, we're not gonna boil the ocean. We're actually going to steer those eyeballs based on prevalence of crash dumps. So I did an analysis, and I was like, what are the most likely third party apps on the Windows ecosystem today that are causing crashes? Where there are bugs, there'll be vulns, you know. And so that's how we aimed the resources that we had to the Microsoft Vulnerability Research program.

Katie Moussouris:

So I had already figured out how to get them to care about bugs that weren't even theirs. That was step 1. And then, the next step after that was looking at it from a perspective of, okay. They're already getting over a quarter million free, you know, inquiries via email. Not all of them turned out to be bugs, but over a quarter million per year of emails from people trying to report bugs for free.

Katie Moussouris:

And why would they want to increase that volume? And I was like, they wouldn't, but wouldn't they wanna direct that volume? Wouldn't they want to direct the real researchers who are actually capable of finding things to the bugs they care about the most? And so I used the idea of traffic shaping to get, you know, to get Microsoft to be open to the idea of paying for certain vulnerabilities. And that's why the first bug bounty programs that Microsoft launched just about a decade ago, a little over a decade ago, they were for bugs in IE beta, because if you looked at the traffic of those free bugs that were coming in, they were coming in past the beta period.

Katie Moussouris:

Every researcher wanted their name in a bulletin, so why would they tell you about a bug that only affected the beta version? Because they knew they weren't gonna get their name in a bulletin. So they were inadvertently hoarding the bugs throughout the beta period. Waiting until the worst possible time to tell you and everything. And so it was very much like I pointed this out, and I said, look at this big spike of bug reports that we get for free.

Katie Moussouris:

Wouldn't you like to get them at the beginning of the beta period instead? How about we put a bounty there? And they were like, not bad. Not bad.

Adam Leventhal:

A great example of you get what you measure. That just I just love it.

Bryan Cantrill:

Well, it reminds me of the and I don't know if it's apocryphal or not, but the cobras in India, right, where they wanted to reduce the cobras in India, so they put a bounty on the heads of cobras that are brought to the government. So, of course, people started breeding cobras. It's like, not what you wanna go do. It it No. They do

Katie Moussouris:

you know that there was that old Dilbert cartoon about bug bounties, and it was like, I'm gonna start a bug bounty. And the developers were like, I'm going back to my desk, and I'm gonna write me a minivan. You know? And they talked about perverse incentives.

Bryan Cantrill:

Totally.

Katie Moussouris:

Yeah. And that was exactly it. So

Bryan Cantrill:

You know, I gotta say, you can swear and make airplane references. I don't know how I feel about Dilbert references.

Adam Leventhal:

Yeah. That one's more complicated.

Katie Moussouris:

I know. It's it's it's complicated. But

Bryan Cantrill:

Yeah. Exactly.

Katie Moussouris:

Right after I launched the bug bounty programs, and got back from, the conference where I spoke about them first, On my door, someone had put that comic on my office door, and I was like, who did this? You know? But, no. It was actually it was a mixed level of enthusiasm inside of Microsoft. Some people thought I was crazy.

Katie Moussouris:

I mean, they thought I was crazy for that reason, not for the real reasons. And, you know, some people were like, no. No. This is clearly what we need to do. But I even got a guy, you know, who, when I was trying to socialize this idea early, did one of the classic Microsoft reactions was, I will do my best impersonation of this man.

Katie Moussouris:

He goes, that's the stupidest fucking idea I've ever heard in my life. And I was like, oh, nice. I got sworn at in an Irish brogue. You know, everything. I got sworn at by a VP in Irish.

Katie Moussouris:

You know? And what was funny was as soon as it launched and it was showing, like, good results and we were getting, you know, bugs early in the beta period, we also were bountying, we were the first vendor to bounty techniques, like exploitation techniques as opposed to just 0-days. Right? And that was something that it was the highest bounty in the world at the time that was ongoing from a vendor, ongoing year round and not tied to, like, a competition like Pwn2Own or the Pwnium competition. But this was $100,000 for a new exploitation technique.

Katie Moussouris:

And I know those numbers sound tiny now because it's been a decade and, you know, the the bounties have gone up quite a bit. But, we started getting new techniques reported to us. And before, we'd only get those from actually seeing exploitation in the wild. So when these programs started, like, returning real results of what I had said, like, look, if we do this, we're gonna get better bugs earlier, faster, you know, these kinds of things. And it and the the guy who had cussed me out in the Irish brogue was one of the first ones to, email, you know, one of those big reply all emails and was like, oh, I'm so glad to see this program.

Katie Moussouris:

I've always been a big believer in what you're doing. And I'm like, oh my god. You know? Like, I'm like, this guy, this freaking guy. You know?

Katie Moussouris:

But, no, it was very, very yeah. He emailed in Irish. He did. That's great. I could hear

Bryan Cantrill:

I I I just that's great. I think I'm watching a Lucky Charms ad. It's just I love the kind of, you know that's terrific.

Katie Moussouris:

No. It was but no. It was, like, you know what is it? Success has many fathers or something like that?

Bryan Cantrill:

Totally. Totally. Well and I also have to like, it is it is kind of amusing when this happens to your own work where someone swoops in to explain to you that they were an advocate all along, and you have to be like, alright. I just need to take this as praise. This is praise.

Bryan Cantrill:

I need to, like, great. And I'm certainly and I got a couple of VCs that I have in mind when I think of this, VCs that may have passed on Oxide. I look forward to you explaining how you were always cheering for us from the sidelines.

Katie Moussouris:

Oh, yeah. Yeah. Yeah. Oh, like all the VCs that definitely turned down my company, all of them, 100% of them. But that's okay.

Katie Moussouris:

We're bootstrapped and profitable.

Bryan Cantrill:

Oh, that is even better. God, that is that is delicious.

Katie Moussouris:

Well, that's the thing you have to be when you're bootstrapped in your, you know, 8 year old company. You better be profitable because it's not like I was independently wealthy to start with. You know? So yeah.

Bryan Cantrill:

And I just love the in terms of, like, the way you got the bounties off the ground and because I I I think that one of the challenges we have, you've you've got this in reliability too when you've got a system that's kind of a mess of, like, where do you start? And it feels like, oh my god. It feels like there's so much we need to go do. And you you kind of, like, look. Let's get some early wins.

Bryan Cantrill:

Let's start over here. Let's start in, you know, going through the crash dumps, being data intensive. Like, we're gonna start with, like, there are clearly vulns over here. We're gonna start here. We're gonna get some wins.

Bryan Cantrill:

We're gonna turn my Irish enemy is now gonna become my little leprechaun friend, and we're going to actually get those wins under the belt, and we are gonna use that to kinda and you've gotta be really persistent on that. And it's just it's hard, but it is, because I feel that there are going to be folks just given the number of near misses we've heard anecdotally. I just feel that there are and as you said, they're probably totaling to a certain degree, but I think that there's value to be had in, you know, learning why some of that was not paid more attention to. And this is where you really hope a CSRB can kinda tease some of this out and potentially get I mean, the term whistleblower is loaded, but, I mean, because I think that you wanna and, actually, let me ask this. Like, are you at all concerned?

Bryan Cantrill:

Because I do feel that, like, some of the first reactions to the outage, I felt like were written by a lawyer and not by someone who wishes a mistake, and they clearly Oh,

Katie Moussouris:

your word. Well, put it this way, all of those those things, even the ones released by Microsoft that were better and more technically actionable and more detailed, they all get reviewed by lawyers. It's just a matter of how many of these things have your lawyers seen. And I had some of the best lawyers in cybersecurity when I worked at Microsoft because they had seen things too, and they also understood that protecting the company was actually it would be counterproductive for them to over legalese those releases. And so it's your lawyers have to go through a few of these incidents to learn also.

Katie Moussouris:

And, yes, just saying there's no chance that any of these release statements from any company of a certain size is not gonna pass through the lawyers. It's just a matter of how good are your lawyers at understanding what what really matters and what's going to keep you out of trouble is actually more transparency, not less.

Bryan Cantrill:

That's right. That's right. And I think that that's the thing you've really gotta convey. And I'm, you know, I'm I'm sure that, you know, the executive leadership of CrowdStrike wishes, like, you know, wish we'd probably listened to a lawyer a little bit too much in the early going of this, or or maybe not a lawyer. Maybe I'm unfairly blaming the lawyers, and they were just a little because I think the the criticisms of that kind of the first communication is, like, you're not expressing the gravity of this.

Bryan Cantrill:

And, like this is grave, and you're you're gonna work hard to to rectify it, but you really you've gotta be and then being kind of transparent about what you you're gonna get you've got more to win from that transparency. What what role does does open source, by the way, play in all this? Because I I mean, obviously, you were at Microsoft during a super interesting time where Microsoft goes from being, like, the the literal avowed enemy of open source to, being a and then some 10 to 15 years later being a a really, a very important open source contributor for many, many different projects. So that whole that that whole kind of thinking is shifting inside of Microsoft. So,

Katie Moussouris:

you know, I think that, well, for one thing, I'm a big fan and proponent of open source. Like, that was that was the only development career that I ever had. Brief briefly, when I was a developer, it was for open source. It was for Linux. It was for a Linux distro actually back in the turn of the millennium.

Katie Moussouris:

And I, but what I think is that there's an overstatement of the presumable security of open source or the fact that it's it's easier to secure. You know what? Talking to some open source maintainers, a few years back, several years back when the EU was planning to do its first bug bounty program against open source software. One, the EU regulators who had proposed this and gotten the budget for it in the in the EU budget had not let the targets know that they were about to unleash a bug bounty on them. So that was fascinating.

Katie Moussouris:

And I found that out because I reached out to the Adobe core or sorry, the Apache core maintainers. And I was like, hey, guys. Did you know that you are the lucky winners of an EU bug bounty program for open source that is mission critical to EU governments and systems? And they were like, no. Thanks for letting us know.

Katie Moussouris:

We're all for it. But, yeah, it would have been great to know. And I asked them because I wanted to help, you know, ideally structure this so that it did no harm, did did more good than harm. And I said, would it help if we offered a fix bounty, you know, double the bounty if somebody could contribute a fix? And they said, actually, not really.

Katie Moussouris:

And they they back to our perverse incentive issue, they said at the time and they may have, you know, changed their mind since then. This was many years ago, probably, like, 6 or 7 years ago. But the issue is we have a lot of volunteer maintainers, and the volunteer maintainers don't get paid, hence the name volunteer. And, they they were Interesting. They said, you know, it would be it would leave a pretty bad taste in their mouth and, you know, if they're not getting paid to do this stuff unless they find a bug, right, unless they find a qualifying bug in the bug bounty program.

Katie Moussouris:

So, you know, it was complicated. And I think the real issue with with that, with the perverse incentive that, you know, was kind of unexpected in that conversation was that, getting people to become core maintainers is something that, you know, it's something that they they absolutely need to do a better job of doing. And it's not just having more paid positions, which, you know, the mega corporations, that are big contributors, they are doing more of that. They're paying their own people to work full time, being contributors of these open source projects. But one of the things that we found in the Log4j CSRB report, for example, was that it wasn't that corporations weren't giving enough of their own employees and resources to open source projects, it was that they didn't know which ones would become important, like the one that produced the faulty Log4j code.

Katie Moussouris:

They didn't know that they needed to, supply resources to that particular piece of open source software. So, anyway, it's a long winded answer of I love open source. I was an open source developer for Linux myself. But I think solving the security problem is more than, oh, it's open source, therefore bugs are shallow, you know, etcetera. It's how many people are looking, and are they looking in the right places?

Bryan Cantrill:

Totally. And so I I mean and maybe I'm the last one to hear of the CSRB, but I feel that not enough practitioners have heard of the CSRB and Katie, I'd be interested to know if you kind of, I imagine the circles you're in, it's better known and maybe it's just my own ignorance, but I, I think this is, like, something folks should really read. It's, like, I'm now really interested to read the Log4j, like, the findings, the recommendations. Also, like, I assume that there's a complete incident report, or is there an incident report?

Katie Moussouris:

No. So I can tell you, yeah. I can tell you the CSRB reports are they're not really for you. They are for they're not for you. Womp womp.

Katie Moussouris:

They're for

Bryan Cantrill:

As my daughter tells me.

Katie Moussouris:

They're they're for they're for 2 main audiences. The first being the government agencies that, you know Okay. In the reports, it says, like, here's what this government agency should do, and here's what this other one should do. And the other part is it's for those organizations that have had a hard time justifying things that they were they're asking for, you know, and new practices and everything. So they're really for those those types of organizations.

Katie Moussouris:

And in terms of do they have incident reports, they sort of have a timeline, you know, in some of them of how things unfolded. But the timelines are built not necessarily authoritatively, like there's an authoritative source that they went to and they figured out this timeline. It's through multiple interviews of different organizations who may have responded to it, you know, to the issue, etcetera. And so it's it's like a conglomeration of viewpoints. And I don't know.

Katie Moussouris:

I think for for your listeners, for your audience, the CSRB reports will be you know, if if if your audience reads them with a grain of salt, like, okay. This is for orgs that need to justify, you know, deeper investments in security, then it will make a lot more sense to you. But I do think that for the CSRB to get, you know, become as effective as the NTSB is, going back to, you know, how it was compared to the NTSB in the first place, I think CSRB cannot have private industry members on the board. You have to lose that component because you'll lose, you know, sort of practical expertise hands, you know, unless you can lure them into full time government jobs, which don't really pay as well at all. But you need full time regulators to be on that board, and you need that board to have subpoena power. Because we didn't get I mean, for the Log4j report, we asked, but not a single organization would step forward and give us any kind of victim, you know, reporting at all.

Katie Moussouris:

So we basically yeah, so we basically had to talk to, you know, Linux Foundation and, like, all all of these, various entities and some organizations that produced fixes for you know, that included the the fixed Log4j libraries and everything, but absolutely none of them, even though they were obviously victims themselves, would talk about the victim side of of their response and everything. So that's the whole thing is that, CSRB needs to evolve into something with more teeth, and that would mean losing the industry, seats.

Bryan Cantrill:

And so but the I mean, do we do you think that this this incident is an opportunity? If this incident unprecedented in so many different dimensions, maybe an opportunity to really rethink some of this stuff or, I mean, you you

Katie Moussouris:

said I think for this one, it's too soon. They'll they'll just do the usual recusals. I mean, the obvious recusal is former CrowdStrike CTO, Dmitri Alperovitch. He would need to be recused. Right?

Katie Moussouris:

But there may be other recusals that are appropriate for other, you know, security software vendor competitors who might might sit on the board. So that I think for this round, that if they do it at all, right, which if they didn't do it, I would be very, very interested to to understand why. But if they did it, if they did a report on this, they would I think they would go through the usual recusals, because they just they just rolled 3 of us off the board and rolled 3 new members on who haven't done a single report yet. So it would be kind of awkward if they were like, actually, industry members that we just rolled on, you're not you're getting kicked off because we're gonna become a full time regulatory body right now. So, yeah.

Katie Moussouris:

So I would say that they probably won't change their makeup if they do a review of this particular incident. But But you

Bryan Cantrill:

think a review of this incident is I mean, obviously, I certainly, I would I want to understand all of what what happened here in much greater detail, and it feels like it's it it this feels like it's the right body to do this investigation and maybe the fact that it

Katie Moussouris:

Absolutely. Absolutely the right body. Yes. So that's what I was saying. If they don't do it, I'd be like, really, guys?

Katie Moussouris:

Like, what are you Yeah. What are you doing then? You know? It's like, I know I'm not on the board anymore, but, you know, seems really strange. No.

Katie Moussouris:

I mean, they absolutely should do it. It is the, it is the reason why they were created. Right? Like I said, it was an executive order pointing to the SolarWinds attack, a supply chain attack, you know, and all of that stuff. And then instead of doing that, they did Log4j, which was was not an attack per se.

Katie Moussouris:

It was an incident. And, of course, yes, attackers took advantage, you know, of the vulnerability once it came out, but it was an incident that was, you know, had global effects and ramifications, and so it met the bar for CSRB review.

Bryan Cantrill:

Well and I think it's attack it's attack adjacent too because it it's it's showing us surface area. And clearly, you could have an attacker that, if the attacker used the same vector, could do a lot of a lot of harm, and actually do a lot of things.

Katie Moussouris:

I mean, honestly, it you know, as a former pen tester, we were always told, like, don't ever test out DoS. Just infer the DoS, you know, and stuff. Put it in the report, but don't you test it. I think this was

Bryan Cantrill:

Denial of service. It is not

Katie Moussouris:

the disk operating system. The old operating system. Yeah. Exactly. So I think this was an incident that obviously caused a massive scale denial of service.

Katie Moussouris:

And, you know, so in that sense, yes, it was a security incident, in terms of if you look at the old CIA triad, right, confidentiality, integrity, and availability, it attacked the a. It attacked the availability of these systems. But it wasn't a cyberattack per se. So, yeah, I mean, I just I I think we should be pretty careful about how we characterize it, especially given the weirdo conspiracy theories that are out there. Oh my gosh, I can't believe we've been talking for like, more than an hour and 40 minutes.

Katie Moussouris:

That's insane. I thought for sure I would need to, like, leave and go to the bathroom. That's what those silences were, was just me running and going to the bathroom.

Katie Moussouris:

Audio cut out. I don't know why. She's back.

Bryan Cantrill:

Right. Well, yeah. And we we have and I know that Adam has got a, Adam has informed me to speaking about domestic strife. Adam is gonna need Where are

Adam Leventhal:

we going with this podcast? No. This podcast heals relationships.

Bryan Cantrill:

We heal relationships. Well, in order to heal relationships, we have to end roughly on time. So That's good. But this is a I mean, I I I think, Katie, first of all, thank you so much. I mean, you thank you for what you've done for the industry.

Bryan Cantrill:

I think that, you know, your tenacity and resilience inside of Microsoft turned Microsoft into a real leader here. The fact that that now I mean, these I'm so grateful for these large entities that have these research bodies, that research open source software, and find vulnerabilities. That is what I mean, with Spectre and Meltdown, and there's so many times where these folks have been that by being out front, they have found things before the actual bad guys did, which was they've been really productive, and so much of that goes back to your earlier work. And, and then definitely appreciate you being willing to to jump on with with us jokers and, and ruminate on this one because this is this is a really interesting outage. And and you've got a unique perspective on it from your eye, as you say, from the orthodontist office to the airlines, that's amazing that you it must've been amazing to also, like, see that one personally. You're just like, hey.

Bryan Cantrill:

I know what's going on here.

Katie Moussouris:

Yeah. No. I I I suppressed my urge to offer to help them because I was at the orthodontist because I was like, no. No. I wanna get out of here.

Katie Moussouris:

I I want me and my kid to leave right now. So, like, let's not let's not make the kid, you know, eye roll their way into the next decade being like, mom, stop with the hacking stuff.

Bryan Cantrill:

Mom, let's stop getting them out of a boot loop. Just leave them in their boot loop. You do this pop.

Katie Moussouris:

Let's go. Yeah.

Bryan Cantrill:

I'm so mortified. I know.

Katie Moussouris:

Yeah. No. Exact it was gonna be exactly one of those situations. So I, you know, I chose I chose family over over my urges to fix things. Right?

Katie Moussouris:

That's good. But, no, it's funny because my kids do not think I'm cool. They don't think my hair is cool. Like, I think eventually, they'll be like, oh, yeah. Mom was pretty cool.

Katie Moussouris:

She was a hacker with hot pink hair. But now they're just like, can you just pick me up, like, down the block? Can you wait Right. Down far, far away.

Bryan Cantrill:

But please and can you not please don't like that post that I just played. Can you unlike that, please? I'm just like, I'm embarrassed that you like that post. Like, okay.

Katie Moussouris:

Yeah. I know. I'm just they're, you know, they're they're normal kids, and, I'm I'm grateful that I've had a life that they feel like I'm, you know, the usual embarrassing parent. So it's pretty

Bryan Cantrill:

There you go. You know, you've yeah. You know, you've achieved something when Yeah. Certainly, I I achieve that every day with my my daughter, especially. She just reminds me what an embarrassment I am.

Bryan Cantrill:

Well, Katie, thank you again. This has been really terrific. We're gonna wanna keep so I really wanna keep an eye on the CSRB and really hope they pick this up, and we'll be interested to watch this as it continues to unfold because we're gonna continue to learn more about this. But thank you again for joining us and for all of your leadership in the industry on behalf of all practitioners. Really appreciate it.

Katie Moussouris:

No. Thank you so much for having me. This was really, really fun. And, no. I I super enjoyed it, and I love the chat.

Katie Moussouris:

I love the I love your people who come to the, you know, come to this podcast.

Bryan Cantrill:

Our people.

Katie Moussouris:

So good job, people. Thank you, people.

Bryan Cantrill:

Good job. Good oh, this is a good job, chat. This is what my

Adam Leventhal:

You got the right plan.

Bryan Cantrill:

Yeah. Right. Am I am I addressing the chat properly? That's right. You nailed it.

Bryan Cantrill:

The, this is what my my daughter does now. I'm like, okay. I'm addressing the chat. But great job, chat. Thank you again, Katie.

Bryan Cantrill:

And, Adam, I think we are out next week.

Adam Leventhal:

We are out next week. Yeah.

Bryan Cantrill:

Out next week, but back in 2 weeks. So talk to you, everyone. Thank you again. Thanks thanks again, Katie.

Katie Moussouris:

Okay. Thanks. Bye.
