I Know This! (Purpose-built systems with general-purpose guts)
Very good. I've got scheduled Spaces.
Speaker 2:Hallelujah. It is Hallelujah.
Speaker 1:It is so exciting.
Speaker 3:How does it feel though
Speaker 4:to be, you know, they do the, you know, kind of tranched rollout where they roll it out to 1% and then 50%. How does it feel to be the last, like, you know, 1%? And 99% of the world has it before you.
Speaker 1:I gotta feel that this is personalized. I mean, I know it's not personalized, but it's like, what I wanna know is, like, who needs this feature more than I do? I'm not saying there isn't anybody who needs it, but I feel like we have really, really needed this feature.
Speaker 4:I think that's that's true. Like, we have a basically consistent space that's we skip it from time to time, like last week. And, yeah. And, like, be be able to let people know and schedule it and that kind of stuff.
Speaker 1:But, like, we're we're pretty reliable about doing it. And Yeah. And then I love, like then Twitter sending me a notification to be, like, remember to start your space in 30 minutes. I'm like, Twitter, like, no offense, but, like, I've been I've been the one who's been on the goddamn wall. Like
Speaker 4:Seriously.
Speaker 5:Like like, where
Speaker 4:where have you been?
Speaker 1:Where have you been? How about you remember to give me a feature within a year of when you give it to everybody else? How about that? And then, like, don't you worry about my space. Like, we got our space figured out.
Speaker 1:We know how to
Speaker 4:Okay. It's just the equivalent of Twitter sending you a Calendly link. Alright.
Speaker 1:Are we gonna go there? Is that where you wanna start? Is that what you wanna do? No. Let's go ahead.
Speaker 1:Let's just do let's get the Calendly link thing out of the way. The and then we'll get on to our our scheduled event here. The what did you make of the Calendly thing? I mean, I know that I had to pick you up.
Speaker 4:I think I think, I I don't I don't think of the as extreme reaction, but I have had the experience of, like, oh, I get it. Like, I get to turn my schedule around your schedule. In particular, when when folks give you, like, here are the three times that I'm available. But I've also had plenty of ones that are sent to me that it's like, I'm I'm available anytime except for from 2 AM to 3 AM, so pick whatever time you want. So perhaps it's, like, how it's used rather than, in general, it being bad.
Speaker 1:It is emphatically how it's used, and I'm not even sure that I would've. There was an initial tweet, if folks didn't see this, basically saying that it is rude to send someone a Calendly invite. This is a VC, I believe, that was saying this. Yeah.
Speaker 1:They viewed it as an act of domination because, of course, everything is in the most animal terms. I think that I have had, oh, I've had 2, like, bad experiences with Calendly and plenty of positive ones. I don't use Calendly myself, but, obviously, I make a lot of appointments with people who do. And basically every experience has been positive but for 2.
Speaker 4:One of which was that the
Speaker 1:person had scheduled it all in the wrong time zone, and I didn't realize that at first. So I'm like, this person only looks to meet between the hours of, like, 1 and 6 AM. And for whatever reason, like, my first thought was not like, oh, the time zone is wrong.
Speaker 4:Well, there you go. I that says something.
Speaker 1:It does say something. It does say something. Wait. What does it say? It it Let's
Speaker 4:not get into that.
Speaker 1:Well, are we gonna get into that later? I think I think we should. I I it it it says that I may be lacking a little bit of common sense sometimes. There you go. Is that the is that the most you've missed the query?
Speaker 1:You know what it reminds me of? It reminds me of debugging this bug. I was working on a pretty deep change to the kernel, changing the way the kernel does timekeeping. And you remember this, the cyclic wad. Yeah. Yeah.
Speaker 1:So UNIX historically has this clock that fires every 10 milliseconds that increments this variable called lbolt, lightning bolt, and I was changing that to be a consumer of a much more general facility and doing a bunch of things that were very low level and had to do with the way it keeps time. And I came in, and I was on the final approach for this wad. And I came in in the morning, and one of my test machines was off, had powered off. And I'm like, goddamn it.
Speaker 1:I have a bug that is powering the machine off. And how do you debug that?
Speaker 4:And it's kind of like burning its bridges there. Not a lot of state left over.
Speaker 1:Oh, absolutely. Not a lot of state left over. So I'm thinking, like, goddamn it. First of all, what did I do to do this? Because we do actually have power control in the system, and it certainly is possible for the system to power itself off. And I've got a stray pointer.
Speaker 1:I've got memory corruption that's powering the thing off, and how am I gonna debug it? And I'm kind of I I I don't know what and I'm kind of pacing trying to figure out, like, what is even my next move? Like, where do you record data to if we're gonna we you can't record it to DRAM. And as I'm pacing the lab, I realized that another machine was off and another machine next to it was off. And in fact, all these monitors on this row were off.
Speaker 1:And in fact, it was a blown power strip. And you basically how how long you've been waiting to do that? I did you have did you have to readjust your headset to be able to prepare yourself for the slow clap? That's what I wanna know.
Speaker 4:Did I do a slow clap? I I I that was that was totally unintentional. I I was taking off one headset while I unmuted the other. Sorry if that sounded a
Speaker 1:that did sound like a slow clap, but clearly, I'm hypersensitive. Yeah.
Speaker 4:I I
Speaker 1:I was certain that was a slow clap. Was that not what that was?
Speaker 4:Nope. Sorry. My bad. But I do know that feeling. That must have been a months-long wad, or certainly a weeks-long wad.
Speaker 4:So your mind just goes to very dark places.
Speaker 1:Your mind goes to very
Speaker 4:dark places. Not like, hey, maybe someone kicked over the power strip.
Speaker 1:Right? Exactly. Someone just like, that that's exactly right. Your mind goes to very dark
Speaker 4:places. Is this a good segue of, like, you know, you know, stupid problems, into tonight's topic.
Speaker 1:So, maybe as good as we're going to get it.
Speaker 4:And so, so this, this came up because, there was this hacker news thread and featured prominently in it was this warning message that you wrote, 15 years ago, 12 years ago, something like that?
Speaker 1:Coming up on 14 years ago.
Speaker 4:Okay. Okay. Oh. And yeah.
Speaker 1:Well and so on I I
Speaker 4:don't know how I I think I
Speaker 1:found this by searching ZFS on Hacker News. Do you I mean, like, look, we're in a safe space. We're actually not a safe space. But the do you search Hacker News? What do you search on Hacker News for?
Speaker 1:Do you search Hacker News periodically?
Speaker 4:Do I search, so I, I do search Hacker News periodically. Infrequently though, I do look for things like DTrace and ZFS.
Speaker 1:And come on. I do ego search Hacker News. It's okay.
Speaker 4:No. I I look. It it look. I think that that would be too that'd be tough for me. I just I don't think I come up enough that it would be that that thrilling of a ride.
Speaker 4:But I do look for, like, folks for whom, you know, my enemies list, to make sure that they're being disparaged properly. I'm not gonna share that full list. I mean, that could be a whole different Twitter space.
Speaker 1:Even better. I know I don't search myself. I only search my enemies.
Speaker 3:Like, I I don't
Speaker 1:I don't care what people are saying about me. I wanna make sure. Yeah. Whatever. That's fine.
Speaker 1:I I I demanded attention to detail in the way that my enemies are big. Yeah. There we go. That makes sense. Okay.
Speaker 1:That makes more sense.
Speaker 4:Yeah. So what do what do you search for?
Speaker 1:I will own up to ego searching Hacker News. I've got a distinctive last name, so I do search for that. I search for DTrace, and I search for ZFS just because I'm, you know, curious, whatever. Yeah. Just because you wanna keep an eye on when people are kinda talking about technologies, maybe you can chime in.
Speaker 1:I do find that that it very much helps in online discussions if things are especially tacking super negative. It very much helps for the technologist behind the technology to participate in the thread just to humanize it. I don't know if you've found this as well.
Speaker 4:No. Agreed. I mean, I I feel like that does put the lid on some of the most toxic kind of conversation when someone shows up and, like, wave emoji.
Speaker 1:Wave emoji. Exactly.
Speaker 4:Like, this I'm I'm here actually right now. So, you know, as
Speaker 1:That that's right. And and, you know, and then you obviously want especially people being critical, you wanna make sure that you're listening to them. But I just find that, like, hey. Can we light up something your neurons? Don't have to be quite such assholes to the the people who developed this technology.
Speaker 1:The so I I I do end up searching for that kind of stuff, and I realized that we're now in direct opposition.
Speaker 4:We're saying about your enemies list. So I don't know. We're I we're just gonna have to, like,
Speaker 1:have it both ways. The so, I and after we open sourced Hubris and Humility, I actually started searching Hubris on Hacker News.
Speaker 4:That seems like there might be some some false positives unless There
Speaker 1:are a lot of false positives, and it's this kind of, like, crazy slice across Hacker News because as it turns out, Hubris is only being used to disparage things. And it is kind of like a the it it it's like a very finely tuned detector where you're actually now being trolled by things I wasn't being trolled by before by searching for the term hubris. Yeah. You
Speaker 4:know what? It's it's not surprising at all that that's a popular term on Hacker News because it's the right kind of intersection of just intellectual enough and just disparaging about it. You know, not not not too clever and not
Speaker 2:too kind.
Speaker 1:It is.
Speaker 5:And it's
Speaker 1:got I mean, hubris is I mean, it's a great word. It is a great unfortunately, it's a 6 letter word, so it'll never be are you I I I didn't mean to ask you, and I just I'm sorry to go on tangent. Can we talk Wordle for a second?
Speaker 4:Yeah. Yeah. Yeah. Yeah. I knew I knew that's where you're going.
Speaker 4:Yeah. Yeah.
Speaker 1:I I
Speaker 4:mean, I love Wordle. It's, you know, it's a competition in my house that I had the good fortune of winning a couple days in a row, which was not to my benefit, to be clear. Like, losing Wordle is the smartest thing I can do in my relationship,
Speaker 1:but that's not something you're gonna be biologically capable of doing.
Speaker 4:Well, turns out my wife is better at Wordle than I am.
Speaker 1:Really? I married
Speaker 4:up in that regard. Yeah. Yeah. She crushes.
Speaker 1:So then I mean, so folks should know that you're a very good Scrabble player. I would never play as I mean, we'll move it this way. Go ahead. Sorry.
Speaker 4:I was a I was a pretty good Scrabble player. I did spend time memorizing, like, lists of words, and I do know what some of those
Speaker 1:words And I highly recommend if you're gonna play Scrabble with Adam, bring a dictionary and a lawyer because you're gonna need them both. They
Speaker 4:Yeah. Yeah. Both of them for sure. Absolutely.
Speaker 1:Do not bring one without the other or you'll be devoured. I guess, so, wow. That's interesting. So that means you may have bred some sort of Wordle superhuman in Joshua. That's terrific.
Speaker 4:Yeah. Yeah. It could be. I mean, found him asleep last night in bed with his swim goggles on, but maybe that's a maybe that's a sign of such preternatural wordle ability too. For
Speaker 1:whatever reason, that's just such a great visual, just because he's such a I mean, he's he's a large human.
Speaker 4:He's he's a large human. He's a large human. He's a large human. He makes large decisions.
Speaker 1:Alright. So Wordle is popular in your household. You enjoy yourself in Wordle. Alright. Yeah.
Speaker 1:Wordle's been very popular in our household. So it's very well done, honestly. Everything about it is well done. It's kind of amazing that it wasn't done earlier, actually. Because it is, like, Mastermind for words.
Speaker 4:Exactly. You'd explain it like that. You're like, why wasn't it done sooner
Speaker 1:but it is it is perfect.
Speaker 4:Yeah. So you're interested there you are searching for yourself on Hacker News and then moving on.
Speaker 1:Then moving on to ZFS. So searching for hubris, finding that very strange slice. Okay. Moving on. And then yeah.
Speaker 1:So I think it must have been a search for ZFS that found this. And then it is definitely weird to, like, search for something and see this message that I wrote coming up on 14 years ago. 14 years ago on April 18th. It is. We'll get into kind of the details on that in a second.
Speaker 1:And so just to give other people context, and the context is also in the virtual suite on the space. But this is, so Adam and I worked together in a group that was developing a network storage appliance, Fishworks, at Sun. And like any appliance, any IT appliance, it's not designed to be interacted with at the operating system shell. There's a higher level command line that is more captive and more constrained. But, again, I would assume, like most of these kinds of environments, there is a way to get an arbitrary shell into the operating system.
Speaker 1:And this message is very direct. So, Adam, did you I I had not thought of this message since we wrote it.
Speaker 4:No. No. I do I do remember, like, the the lengths that we that we went to, to keep people out. I do remember there being, like, a big message, but I I did not remember the details of this
Speaker 1:for sure. Perform an out loud reading from this, please do that. So first of all, you're going to have to, dear listener, imagine the ASCII art box that contains all this. The ASCII art is a very important element of this, of course, that's kind of properly framed with ASCII art, offset with dashes and hyphens and pluses. So it says, I'm not gonna scream the bits that are in all caps, by the way.
Speaker 1:So let's just assume that the most emphatic bits are in all caps. You are entering the operating system shell. By confirming this action in the appliance shell, you have agreed that this action may void any support agreement. That's in caps. If you do not agree to this or do not otherwise understand what you are doing, you should type exit at the shell prompt.
Speaker 1:Every command that you execute here is audited, and support personnel may use this audit trail to substantiate invalidating your support contract. The operating system shell is not a supported mechanism for managing this appliance, and commands executed here may do irreparable harm. Nothing should then new paragraph. Apparently, we decided that wasn't enough. Apparently, like, you'd be like, wow.
Speaker 1:Okay. Like, that seems like enough. Apparently, it's like, no.
Speaker 4:Well, dear reader, dear reader Yeah.
Speaker 1:Exactly. Say next paragraph, Rick. Next paragraph. Next paragraph. Nothing should be attempted here by untrained support personnel under any circumstances.
Speaker 1:This appliance is a nontraditional operating system environment, and expertise in a traditional operating system environment in no way constitutes training for supporting this appliance. And this is where it gets to the sentence that I think resonated with the original poster on Hacker News anyway. Those with expert, and this is all caps. Those with expertise in other systems, however superficially similar, are more likely to mistakenly execute operations here that will do irreparable harm. Unless you've been explicitly trained on supporting this appliance via the operating system shell, you should immediately return to the appliance shell.
Speaker 1:Type exit now to return to the appliance shell, new paragraph. So Well, I would say
Speaker 5:it wasn't crafted, first
Speaker 1:of all. Group effort for sure. But when I read that, I'm like, I I'm virtually certain that was me. It was me, but I I that was definitely a group effort, I think. Yeah.
Speaker 1:Yeah. For sure. And I think I know that because I found the email that I sent to the team announcing that we had done this. And, that that email was helpful because so first of all, a couple of things about this. Did you remember anything about this message, Adam, when you saw it?
Speaker 4:No. No. No. In in fact and you and Keith were sort of, joking about the genesis of this, and that has totally like, I have no idea why. I I know that something precipitated this.
Speaker 4:This did not come out of the blue, and I don't think this is purely prophylactic. So I I did not remember the
Speaker 1:So the thing that's funny is what I thought the genesis was, the timing does not line up. I thought we had done this after we had shipped. We shipped in late 2008. This was done in early 2008. This was done before we shipped.
Speaker 1:And I what I didn't remember correctly is that the the the target audience here was not actually the customer. The target audience was actually our our fellow Sun employees who thought they were being helpful. That was really the target.
Speaker 4:And was this was this the test organization or was this was this other
Speaker 1:You know, I think it was other helpers. I think that the folks that worked with us in the test organization kinda figured out pretty quickly what they could and couldn't do. The context that we had offered was that we had people that were changing the network interface configuration from the operating system shell, which is, like, not something you, that is being managed by another body of software. And this is where you get to, I do think, kind of the interesting and slightly broader issue that we wanna hit on, which is, you know, this is not an abstraction. The operating system shell is not part of the abstraction that we were offering.
Speaker 1:It was very much an implementation detail. But there were things that that would lead you to believe that there like, in in Unix, the shell is the abstraction, and you can do things with the shell. And that gives you this kind of false sense. The, you know, the Jurassic Park line. Right?
Speaker 1:I know this. This is Unix. It's like, no. You don't know this. Like, you don't know this.
Speaker 1:This is not in particular at the time. This is not the Solaris that you you think you know. And because you actually know enough to be dangerous, quite literally, and you are way more likely to do harm by knowing a little than you would if you knew less. And that's why we were trying to
Speaker 4:be very explicit in this message. And this could be true generically. Right? I I don't know how other on prem appliance like entities do this. But this is a general problem.
Speaker 4:I was doomed. Or like built around Linux or FreeBSD or whatever it is. It's like if some naive user, or even sophisticated user, manages to navigate their way to such a shell, then they're probably gonna break things worse.
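The pattern described here (a warning banner, an explicit confirmation, and an audit trail for every command typed at the raw shell) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the appliance's actual implementation: `AuditedShellGate`, `BANNER`, and the `executor` callback are hypothetical names, and the banner text is a short placeholder rather than the original message.

```python
from datetime import datetime, timezone

# Placeholder wording for illustration only; not the original banner.
BANNER = (
    "You are entering the operating system shell.\n"
    "Every command that you execute here is audited.\n"
    'Type "yes" to continue or "exit" to return to the appliance shell.'
)


class AuditedShellGate:
    """Hypothetical gate in front of a raw OS shell: confirm first, then log every command."""

    def __init__(self, user):
        self.user = user
        self.confirmed = False
        self.audit_log = []  # entries are (timestamp, user, event, detail)

    def _record(self, event, detail=""):
        stamp = datetime.now(timezone.utc).isoformat()
        self.audit_log.append((stamp, self.user, event, detail))

    def confirm(self, answer):
        # Only an explicit "yes" opens the gate; the decision itself is audited.
        self.confirmed = answer.strip().lower() == "yes"
        self._record("confirmed" if self.confirmed else "declined")
        return self.confirmed

    def run(self, command, executor):
        if not self.confirmed:
            raise PermissionError("operating system shell not confirmed")
        # Record before executing, so the trail survives a command that breaks things.
        self._record("exec", command)
        return executor(command)


if __name__ == "__main__":
    gate = AuditedShellGate("support-engineer")
    print(BANNER)
    gate.confirm("yes")
    gate.run("ifconfig -a", lambda cmd: f"(pretend output of {cmd})")
    for entry in gate.audit_log:
        print(entry)
```

Logging before execution is the design choice that matters in this sketch: the record of what was typed exists even if the command itself leaves the system in a state nobody can explain afterwards.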
Speaker 1:They're gonna break things. And so, yeah, I think what we and I would love to get to some of these stories that I know people have of where well meaning, maybe customers, maybe support personnel, well meaning folks have made the problem much, much worse. This is like the like the James Garfield of of IT support. Do you know what? Adam, have I if we No.
Speaker 4:I don't think we've gone deep. Oh, James Garfield. So
Speaker 1:James Garfield, president, assassinated. He is shot by Charles Guiteau, bonkers guy. Charles Guiteau shoots him because he has been passed over as the ambassador to France, which I kinda love. It's, like, a little twist that I love, that, like, you gotta think, like, James Garfield, as this guy is shooting him: you know, this is exactly the kind of anger management problem that I was concerned about when I did not make you the ambassador.
Speaker 1:Like, you're doing it right now. You're doing the thing that I am concerned about right now. I don't think Charles Guiteau was a serious candidate for the ambassador to France. He shoots Garfield. Did you know the story?
Speaker 1:This is an amazing story.
Speaker 4:No. No. No.
Speaker 1:He shoots Garfield. Garfield then is taken away. He's shot in a train station, is taken away. And now 19th century medicine will take it from here. So in particular, they wanna fish the bullet out of him.
Speaker 1:But Alexander Graham Bell is here. Don't really know how that works. Like, Alexander I don't know. It's Washington DC. It's 1881.
Speaker 1:Alexander Graham Bell.
Speaker 4:This wasn't a dream. This really
Speaker 1:happened. Yes. Alexander Graham right. And then there's a walrus there, and the walrus go. No.
Speaker 1:Alexander Graham Bell was there. And he's Alexander Graham Bell has a new invention, the metal detector, and he's gonna use this. Like, I will use this to determine where the bullet is. He determines, bad news. The bullet is, like, buried in, like, near the spine.
Speaker 1:Alexander Graham Bell has detected the bed springs in the bed. Oh my god. This is not good. Yeah. So then they are fishing around for this bullet with these filthy hands.
Speaker 1:Like, there's there's not this understanding of infection. And James Garfield dies of septic shock, dies of sepsis.
Speaker 4:I feel like there's, like
Speaker 1:I mean, the story is interesting. It's captivating to me because I feel like this analog comes up a lot. I mean, I think we could kinda fill in the blanks with the Alexander Graham Bells that we have known who have detected the bed springs. And others who have fished their hands around in it.
Speaker 4:I'm reading a book that referred to this period as the one of heroic medicine. And I do think that there's a strong analog here for heroic debugging and heroic.
Speaker 1:Oh, yeah. Go on. Yeah. I like that
Speaker 4:we'll we'll just I mean so I I'm thinking of a of a customer example and uh-uh I gotta get this one off my chest. So I I was the CTO of a company called Delphix. And one of the organizations that that, that was in my purview was support. And this is I don't know if this is dated or embarrassing, but I'm just gonna get out with it. Like, the way that we supported this product, it was an on prem product, is, for these systems that were not Internet connected, is often a support person on the customer side, would start a WebEx.
Speaker 4:They would screen share. We would log into this, this kind of shell with this kind of prompt in front of it and then we would remotely type in commands while the customer watched. It kind of sucked for all the reasons you can imagine it sucking. Like the customer screensaver turning on and they were out at lunch and us losing, you know, time doing that. Like all those kinds of things.
Speaker 4:But one of the ways it really sucked is, when the, in this case, the customer broke the connection, intentionally with us and was left with the shell left with this ostensibly general purpose shell to a system they knew. And they completely fucked things up. Like in, in particular around networking stuff, as they as they, again, tried this heroic intervention. And then we finally patched. We got the connection back up.
Speaker 4:We went back online. We kept on debugging this thing. And the support people kept on reporting to me that they had no idea what was wrong with the system. That it seemed worse than when they started. And the customer did not fess up to this.
Speaker 4:We kept on hypothesizing. We think the customer, when we broke that connection, that they went and monkeyed with stuff, but how can we prove it? Unlike the Fishworks system, our commands were not logged. It wasn't audited. It was a huge oversight on our part to not do that, but we didn't have any insight into what they could have done.
Speaker 4:We didn't have any smoking gun until.
Speaker 1:Yeah. No. Because you are not yet a parent of teenagers when this is happening. I feel like this is where your teenage parenting, actually, maybe this
Speaker 4:is more
Speaker 1:like younger kid parenting where you're just like, if someone took the cookie from the cookie
Speaker 4:jar, I'm
Speaker 1:not saying you did.
Speaker 4:Well, we asked them and they said no, and we asked them again and they insisted no. And then we got an email and, you know, one of the great gifts of Gmail is not giving you the full context of the emails you're sending or forwarding. And we got forwarded an email from the customer where, you know, the first few messages as we expanded the forward chain were pretty, you know, pedestrian. But about 20 messages in, we started seeing some bickering back and forth at the customer in email saying, should we fess up? Should we tell them the ways in which we reconfigured networking?
Speaker 4:Should we not tell them? They decided not to. And and we saw all of this happening in reverse chronological order in this deeply nested forward. And, you know, and then the politics of you know carefully presenting this to the customer. Yeah, we needed to also retain But, but we're we're able to navigate past that.
Speaker 4:But we really, I mean, we we we we implemented auditing shortly after that, and it turned out to be really important. Yeah. And how do you get to that? Because I
Speaker 1:and I just think of, like, this is also this fundamental tension of, like, you know, it's your system. You're the customer. Like, you have paid for this thing. And on the one hand, you know, you've got the right at some level to screw it up. On the other hand, like, we don't want you to screw it up, and we want you to have the best experience.
Speaker 1:And, I mean, it's hard, like, there's this tension. You don't wanna be overly patronizing, but you also don't want them to accidentally do something that is gonna jeopardize their experience.
Speaker 4:Well, and when when you can't unwind what they've done. Right? Like, you can configure systems in such a way where, like, I'm not sure I can resuscitate this thing. So so, like, but but this I mean, the fact that they needed to take the reins or they decided they needed to take the reins was was also symptomatic of another problem, which is, you know, our support team, you know, great, great folks, but they had not given them the confidence they needed. Like, the customer didn't take control as the first step.
Speaker 4:They did it after they had sort of lost patience and decided, like, hey, I can fix this better than you guys. So, Interesting. That was the other lesson that we internalized there, which was, we spent a lot of time like fixing things that maybe were the problem, maybe weren't, but were certainly good for the system. You know, good beneficial, you know, good robust health kind of things. But we developed like a 3 strikes and you're out rule, which was even if you find things that may be beneficial, but you can't, but you don't think that they're necessarily, you don't have evidence that they're necessarily going to fix that problem.
Speaker 4:We deferred those kinds of changes just because of this experience of, you know, if you, if you make too many changes with the hope that it might fix things, they lose confidence after the 2nd or 3rd.
Speaker 1:Oh, god. You're bringing back bad memories. This is bad memories. Yeah. No.
Speaker 1:This is like a really important point because people want to solve the customer's problem. Customer's upset. You want to right the customer. And in that enthusiasm, people are not always completely rigorous about how they vet their own theories.
Speaker 4:Absolutely. And you start digging for the bed springs.
Speaker 1:You start digging for the bed springs.
Speaker 4:Oh, my God. Yeah. I mean, this was particularly acute when we dealt with performance problems, which, I mean, by their nature, are often much squirrelier than functional problems, than kind of clearer bugs. And, like, in particular, you know, in that product, one of the common misconfigurations that customers would have would be that they hadn't set, like, the NFS, you know, maximum transfer size to a particular value. And so our folks would say, oh, reboot the system and do this and do that.
Speaker 4:And if that didn't fix the problem, that was sort of strike 1. And you you only had a couple more opportunities, of not fixing the problem before they just gave up on us.
Speaker 1:Yeah. I mean, not to be a captive of the mother of all support issues, but God, this is bringing back so many bad e-cache parity error memories.
Speaker 4:Because because I I Sun must have just been saying
Speaker 1:Oh my god.
Speaker 4:Do this, do that, hold the antenna sideways.
Speaker 1:Oh, hold the antenna sideways. Oh my god. There were so many of these. One in particular, because we were talking about the e-cache parity error. Before we did the requiem with Tom on SPARC,
Speaker 1:I said that we weren't gonna get through it without talking about the e-cache parity error. This chip-level defect that caused all these problems. And one of the problems with this thing, and I think we've described it before, but just to ramp everyone else up: when you had a chip that had bad parity on an e-cache line, that chip was not necessarily likely to discover it. That parity error could be discovered when the line was snooped by another CPU.
Speaker 1:And it would not tell you where that came from because it wouldn't know. So CPU 10 has an e-cache parity error. So, of course, like, oh, replace CPU 10. You're like, no. You literally just killed the messenger.
Speaker 1:So we had customers where we found the bad CPU because it was the only one that hadn't been replaced.
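The inference in that last remark (the error is reported by whichever CPU snooped the bad line, so replacing the reporter never removes the fault, and the one CPU never replaced becomes the suspect) can be shown with a toy model. This is a deliberately simplified sketch, not a description of the actual hardware or of Sun's diagnosis tooling: `simulate_naive_replacements` and `never_replaced` are hypothetical names, and it assumes the faulty CPU never reports its own error, which the discussion only says was unlikely.

```python
def simulate_naive_replacements(bad_cpu, reporters):
    """Replace whichever CPU reported each parity error (the naive fix).

    `reporters` lists, per error, the CPU that snooped the bad line and
    reported it. In this toy model that is never the CPU that owns the line.
    """
    replaced = []
    for reporter in reporters:
        if reporter == bad_cpu:
            raise ValueError("toy model assumes the owner never reports its own error")
        replaced.append(reporter)  # kills the messenger; the fault stays put
    return replaced


def never_replaced(all_cpus, replaced):
    """CPUs that have never been swapped out: the remaining suspects."""
    return sorted(set(all_cpus) - set(replaced))


if __name__ == "__main__":
    cpus = [0, 1, 2, 3]
    # CPU 2 owns the bad cache line; errors surface on its neighbours.
    swapped = simulate_naive_replacements(bad_cpu=2, reporters=[0, 1, 3, 0])
    print("replaced:", swapped)
    print("suspects:", never_replaced(cpus, swapped))
```

With enough reports, every CPU except the faulty one has been swapped at least once, which is the only reason the list of suspects narrows to a single entry.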
Speaker 4:This is like that that picture of the airplane in World War 2 and where the damage had occurred.
Speaker 1:Yes. Absolutely. And the and the damage you had to infer from the and so there was that was bad. But the memory you really brought back for me was dealing with a, customer in London who I think I I I guess they told me the story just to torture me. I was not but they were well, I think, actually, to convey how upset they were at the time, which is very understandable.
Speaker 1:They were having a particularly bad rash of e-cache parity errors. Sun was claiming these were due to environmental issues in their data center. And in particular, Sun claimed that this is because they were within a quarter of a mile of the London Underground.
Speaker 4:It's like vibration or something?
Speaker 1:It's vibration. Okay.
Speaker 4:That's exciting.
Speaker 1:Right. Which, of course, like, how does vibration get you to, like, an E-cache parity error? Like, these are not connected. They had done, like, the math, and effectively 99.8% of the city of London is within a quarter of a mile of the Tube. So it's like, you know, that's ridiculous.
Speaker 1:Then they claimed that it was due to dust in the DC, that the data center had
Speaker 6:Can I just ask how many spots in London are not within a quarter mile of the London Underground?
Speaker 1:Exactly. They are all within a quarter mile of London Underground, which is part of the reason. So the customer basically kicked that one back to Sun, like, no. No. That's bullshit.
Speaker 1:But they did convince them that there was a dust issue. So the customer did a £1,000,000 upgrade to make the environment much cleaner with respect to dust. Of course, it had nothing to do with the problem, and the E-cache parity errors persisted. In part to try to repair the relationship with this very upset customer, Sun invited them up to the manufacturing facility in Scotland. And they got to the manufacturing facility, and in the manufacturing facility, these E10Ks, or at that point F15Ks, big machines, were being deboxed in the same room that they were being burned in.
Speaker 1:So there was, like, cardboard everywhere, and there was dust everywhere. And the customer described running his finger on, like, a surface, and looking at the finger, and the finger was just black with dust, and holding it up to the VP of quality at Sun saying, I paid a million quid for what, exactly? And apparently, it was just like... I'm like, what was it like in that room? They're like, oh, it was silent. It was silent.
Speaker 1:And, you know, to Sun's credit, and it's not saying very much.
Speaker 4:But to Sun's credit, it was just
Speaker 1:like, took that one right on the chin. Like, did not try it. Like, I'm not gonna say any words here. I'm just gonna, like,
Speaker 4:Yeah. Just let you shout. Just shout it out.
Speaker 1:Just shout it out, and I'm gonna beg for mercy. But it was a good reminder, you know, and this is kind of the duality that this message represents. On the one hand, it's concerned about the customer being wrong. But there are lots of times when the folks using the technology are right, and you gotta be very, very careful when you are supporting a customer. And, you know, I like that the three-strikes-and-you're-out thing kinda eventually got to work.
Speaker 1:It was like, you wanna, like, like, we don't have an infinite amount of capital to chase random theories. We really need to hone in on this thing, and then we need to get it right. And I think it's tough when the customer's upset, you know.
Speaker 4:Yeah. You want to, kind of, like, maybe this is the thing. Maybe this will, like, I can just do this right now, and and it could fix it. And it's like, it's hard to see the other side of it. But if it's not right, like, will they believe you next time?
Speaker 4:Like, or have you just lost a little bit of credibility? Even even if it's like, sort of low effort or whatever to try it out. It's like every one of those. Yeah. It's kinda surprising.
Speaker 1:And I think, I mean, my bias on kinda squaring this is transparency, I would say. But I don't think that is a perfect solution either, because I have probably been too transparent with customers at times. Where it's like, I don't know if I want my surgeon to walk me through the risks at quite this level of depth.
Speaker 4:You know what? This calls to mind another Delphix support incident where we were doing an upgrade that involved, like, moving a lot of ZFS datasets around, to and fro. And it was a pretty dicey procedure. And so for this, Eric Schrock and I, the VP of engineering and I as CTO, were in the room with the support personnel as they were running this on the first customer for the first time. And it went south and, you know, Eric and I were like, you know, how are we going to recover their data, and so forth.
Speaker 4:The customer comes on the line and says, hey, guys. You know, we're getting to the end of the outage window. Like, are we gonna make it here? And Eric and I are looking at each other like, your data may all be gone and, like, we may be losing our jobs, and the support guy just unmutes and says, you know, it's gonna take us a little bit longer than expected, and all I can do is apologize, and we're working on it.
Speaker 4:And I was, like, oh, David.
Speaker 1:Thank you. Thank you and
Speaker 4:we were able to get the customer data back, but it was deeply touch and go
Speaker 1:well. That's a skill too.
Speaker 4:I know. I mean, that's what I'm saying. Like, I would have just vomited all over the phone. And this is also, I mean, to your point about transparency: if there's a problem during surgery, to what degree do you wanna be informed of it in the moment? Or perhaps after the fact, after everything has been stitched back up?
Speaker 1:Yeah. I mean, I remember there was one customer that we were very transparent with. This is our friend, Jesse St. Laurent, who was a huge advocate of what we had done at Fishworks. I remember talking with him about a problem that a customer was having. And the diagnosis was a thundering herd: there was a CV broadcast that should have been a CV signal.
Speaker 1:And for other reasons, because of s u zero d, we had thousands of threads that were waiting on the CV, and we're broadcasting. They're all waking up. Only one of them is getting the lock. The rest of them are going back to sleep. But in the process of doing it, all of those thousands of threads are in the way of the one thread that needs to do the work.
Speaker 1:And so the system is going orders of magnitude slower than it should go. And those of you who write multithreaded code: you should use CV signal, not CV broadcast. People kinda casually use broadcast, but you really rarely need broadcast. The disposition should always be to signal rather than broadcast, or whatever the equivalents are.
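The thundering herd described here can be sketched in a few lines. This is only an illustration under stated assumptions, not the kernel code from the story: it swaps in Python's `threading.Condition`, whose `notify()` and `notify_all()` play roughly the roles of the kernel's `cv_signal` and `cv_broadcast`, and the function name `wake_waiters` and its counters are made up for the example. It parks a number of threads on one condition variable, posts a single unit of work, and counts how many threads woke up and how many of those found nothing to do.

```python
import threading
import time


def wake_waiters(num_waiters, use_broadcast):
    """Park num_waiters threads on one condition variable, post ONE unit of
    work, and report how many threads woke for it and how many woke for nothing.
    (Hypothetical helper for illustration only.)"""
    cond = threading.Condition()
    stats = {"parked": 0, "work": 0, "woken": 0, "wasted": 0, "shutdown": False}
    consumed = threading.Event()

    def waiter():
        with cond:
            stats["parked"] += 1
            cond.wait()                # lock is released while asleep, re-acquired on wake
            if stats["shutdown"]:
                return                 # woken only for teardown, so not counted
            stats["woken"] += 1
            if stats["work"] > 0:
                stats["work"] -= 1     # exactly one thread gets the work
                consumed.set()
            else:
                stats["wasted"] += 1   # woke up, took the lock, found nothing to do

    threads = [threading.Thread(target=waiter) for _ in range(num_waiters)]
    for t in threads:
        t.start()

    # Wait until every thread is asleep inside cond.wait().
    while True:
        with cond:
            if stats["parked"] == num_waiters:
                break
        time.sleep(0.001)

    with cond:
        stats["work"] = 1
        if use_broadcast:
            cond.notify_all()          # analogue of a broadcast: wake everyone
        else:
            cond.notify()              # analogue of a signal: wake one waiter

    if not use_broadcast:
        # Only one thread was woken for the work; release the rest for teardown.
        # (CPython currently wakes exactly one waiter per notify(); the docs
        # warn not to rely on that in general.)
        consumed.wait()
        with cond:
            stats["shutdown"] = True
            cond.notify_all()

    for t in threads:
        t.join()
    return {"woken": stats["woken"], "wasted": stats["wasted"]}


if __name__ == "__main__":
    print("broadcast:", wake_waiters(50, use_broadcast=True))
    print("signal:   ", wake_waiters(50, use_broadcast=False))
```

On CPython, with 50 waiters and one unit of work, the broadcast run should report 50 wakeups with 49 of them wasted, while the signal run should report a single wakeup and none wasted. Scale the waiter count into the thousands and repeat that for every event, and the wasted wakeups are all contending for the same lock as the one thread with real work to do.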
Speaker 4:This this is your PSA for the kids. I gotcha.
Speaker 1:No, PSA for the kids. Listen. You're the one with a teenager who's dabbled with C++. So, like, I'm trying to help you out.
Speaker 4:I'm trying to Oh, the shame. Yeah. I know. I know.
Speaker 3:I know.
Speaker 1:If anyone's wondering where my C++ tweets are coming from today, they're all coming from Adam. Adam is blessed with a teenager who's interested in software engineering, which is not
Speaker 4:And damned with a teenager who is only into C++ for the moment. Trying to meet him where he is.
Speaker 1:Exactly. The good news is he loves to cook. Bad news is it's all cooking crack. But, you know, I wanna encourage the interest in chemistry. But so, yeah.
Speaker 1:I mean, this could be news you could use. But so what I realized is that, like, actually, these things, CV broadcast and CV signal, are right next to one another in program text. I mean, of course, because, you know, they're part of the same file and so on. They're less than 256 bytes apart. And I could actually write the branch displacement in program text.
Speaker 1:Did you know I did not?
Speaker 4:No. Talk about heroic medicine, though.
Speaker 1:And I started asking Jesse questions that were making him very nervous. I'm like, so just, like, tell me about this system. He's like, well, it's like an extremely important system that right now, thousands of health care researchers are using.
Speaker 4:Yeah. Right. But but all systems are important. We're all I mean, come on.
Speaker 1:Right. And he Jesse's like, why are you asking me all this stuff? And I'm like, no. I like I think I can actually fix this. I think it will be it will be a single byte.
Speaker 1:Right?
Speaker 4:I have an experimental procedure that I just thought of on the moment that I'd like to try out on your grandmother. Like, how does that sound?
Speaker 1:It and listen. Granny's not doing so well. This is it is experimental medicine. It was like, you know what? What do we have to lose?
Speaker 1:We don't have anything to lose. And I'm like, would it be worse if this system bounced? Yes, it'd be much worse. Like, well.
Speaker 1:Okay.
Speaker 4:Well, like, now you're doing actuarial calculus. Like, how much worse, and how likely is it to bounce?
Speaker 1:Right. I'm holding up the I gave it a 7 in terms of pain or an 8. It feels more like a 7. Maybe it's a 6. Remember, 10 is, like, the worst pain you've ever felt in your life.
Speaker 1:The I love doing that with kids. Like, it's a 10. Like, it's not a 10. Like, no. You're trying to choose between, like, 1 through 4.
Speaker 1:So
Speaker 4:those are those are the numbers that are your options
Speaker 1:like numbers above 5 are not are not eligible for the pain you are currently enduring.
Speaker 1:But I did ask for his, like... Jesse's like, why are you still talking about this? Like, this is making me really nervous. Like, you are clearly nervous about this, and you shouldn't do this. No. No.
Speaker 1:It's fine. It's fine.
Speaker 4:So did you dig the bullet out of the bed springs?
Speaker 1:I did. I patched the byte. And, I mean, look, I know you're using heroic to be fucking disparaging, but
Speaker 4:Well, it it should be it's just be
Speaker 1:the hero.
Speaker 4:It's not clear. You're talking about in the kernel, right? Yes. And so what Brian is saying: he went to the function called CV broadcast and replaced the first byte to turn it into a branch to CV signal.
Speaker 1:Okay. Listen. That that would have been crazy. I was not doing that. What I was doing was marginally less crazy.
Speaker 1:I was going to the particular call.
Speaker 4:It was the call. It was the broker. Yeah. Yeah. Okay.
Speaker 4:So you just changed that one call site. The rate. Yeah. There you go. There you go.
Speaker 1:I have your own part.
Speaker 4:That's more narrow crazy.
Speaker 1:Yeah. It's narrow crazy. Yeah. That's awesome. It was amazing.
Speaker 1:I mean, it just shows you the incredible power of a single byte. I mean, this is, like, the glory of software systems: you've got one byte that is the difference between the system just destroying itself and just ripping. It was amazing. So, yeah, I mean, we hit the one byte and it was just like all the storm clouds were gone. Like, bam, the system was cranking.
Speaker 4:I mean, and you did have this preponderance of evidence that this was the problem, right? It wasn't like, I mean, you weren't just changing some byte or whatever. Like, it was pretty targeted.
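Why a single byte can be enough is easy to show with a small sketch, with the caveat that every specific in it is an assumption. The conversation does not say which architecture the system ran on or what any of the addresses were, so this assumes an x86 near call (opcode 0xE8 followed by a 32-bit little-endian displacement measured from the next instruction) and uses made-up addresses for the call site and the two functions. The helper names are hypothetical too.

```python
import struct

CALL_OPCODE = 0xE8   # x86 near call with a 32-bit relative displacement
CALL_LEN = 5         # one opcode byte plus four displacement bytes


def encode_call(site, target):
    """Encode `call target` as it would appear at address `site`."""
    rel = target - (site + CALL_LEN)   # displacement is relative to the next instruction
    return bytes([CALL_OPCODE]) + struct.pack("<i", rel)


def differing_offsets(old, new):
    """Byte offsets at which two equal-length encodings differ."""
    return [i for i, (a, b) in enumerate(zip(old, new)) if a != b]


# Hypothetical addresses, chosen only so the two targets are under 256 bytes apart.
CALL_SITE = 0x401000
CV_BROADCAST = 0x402A80
CV_SIGNAL = 0x402A20

if __name__ == "__main__":
    old = encode_call(CALL_SITE, CV_BROADCAST)
    new = encode_call(CALL_SITE, CV_SIGNAL)
    print("old:", old.hex(" "))
    print("new:", new.hex(" "))
    print("offsets to patch:", differing_offsets(old, new))
```

With these addresses only the low byte of the displacement changes, which matches the one-byte patch at a single call site described above. Two targets being less than 256 bytes apart does not guarantee that in general: if the subtraction borrows out of the low byte, two bytes of the displacement differ, so a check like `differing_offsets` is worth doing before touching anything.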
Speaker 1:It was extremely targeted. But I definitely remember this is where transparency was not necessarily great, because I was being totally transparent with Jesse, who was getting increasingly nervous at my level of transparency.
Speaker 4:Right. Like, moving you away from the system. Right?
Speaker 1:Exactly. Exactly. So but I still think that I gotta be biased towards I think you gotta be biased towards transparency. Right? When we're dealing with these
Speaker 4:Yeah. Absolutely. Get definitely biased towards transparency. Sometimes it's it's it's the timing of that transparency. Right?
Speaker 4:Like, Hey, your system is in a really bad state and I'm not sure we're ever gonna fix it, but we're still working on it is different than like, you know, kind of a a a more percentage wise update or an after the fact recount of of what all went on.
Speaker 1:Yes. And I think that the because I also think that the how is this different in the software as a service world? How has this changed?
Speaker 4:That's interesting. Because, I mean, in some ways, you get these outage notifications, and it's pretty opaque in most cases. And then you'll often get these postmortems, at least from the responsible SaaS vendors, that tell you what happened.
Speaker 1:That do tell you what happened. Yeah. And I would love to know. I think this is actually generally a great trend, that the SaaS outages do have now very detailed postmortems. And, you know, that's a great shift. I mean, we know right now, if Amazon or Google were to have a cloud outage that would be severe, we can count on the fact that, you know, 4, 5, 6 days later, whatever it takes,
Speaker 1:there will be, like, a pretty detailed postmortem. I know people can be critical of them, but I think they're actually really quite good. And certainly, we always strove to do that at Joyent. I'd be curious to know, like, when that really started. And maybe that just always came along with SaaS.
Speaker 1:I don't know. Maybe
Speaker 4:But it is a good kind of cultural norm, or what is becoming a cultural norm or expectation, certainly. But, you know, that's different than your interaction with Jesse, where that's like live tweeting the outage.
Speaker 1:Yes. This is more like a pre-mortem. The pre-mortem, people are less excited about.
Speaker 4:That's right.
Speaker 1:Yeah. Ian, go ahead.
Speaker 2:I think if you have sufficiently large customers, they're going to be asking you through the private support channels. With a sufficient number of those customers, it makes more sense to just post it publicly, so that everyone can refer to the same document, rather than posting, you know, individual updates to each of these interested parties.
Speaker 1:I think that that's totally right. But I have been a very large customer of things that I have not gotten any of that transparency on, I mean, even still. You know, we had a Twitter space a while ago where we were talking about the Seagate firmware problem. We were a huge customer of those drives and never got full transparency on what actually happened.
Speaker 1:So I I actually think that that the it's not just customer. I mean, I agree with you. It should be that. But I don't it's not just that. I mean, I feel that, like, these companies are more transparent than they strictly have to be.
Speaker 1:But maybe I
Speaker 2:think I think the other part here is a little bit of industry standardization around terminology and around process where Google pushing out the SRE, model in book form and starting to talk about some of these practices and practicing what they preach kind of forced a bit of a transformation in the industry such that the idea of a PIR is something that is like this is an acronym that the majority of the industry understands at this point. So, you know, there is a bit of an expectation and a a kind of rising tide in that, this is kind of standard operating procedure for any large enough vendor now to provide that kind of thing.
Speaker 1:Which is great, honestly. I mean, I think it's a terrific development in in the industry, and certainly something that that we always believed in, but not something that you got from kind of traditional certainly from traditional vendors, you don't get that. They isolate their customers from failures. And maybe it's the fact they're public too that makes it harder to run from, harder to deny. I I mean, AWS, if if there's a cloud outage, it's really hard to deny that it happened versus
Speaker 4:Say, if there's like an EMC array outage or whatever
Speaker 1:Oh my god.
Speaker 4:Then they're notorious for, like, throwing, like, throwing discounts at you instead of a, you know, post incident review.
Speaker 1:We had a parity error on an LSI PERC card, on an HBA, a Dell HBA. And Dell, I mean, I just don't mind publicly naming them because they did this over and over and over again, where it's like, there's not a problem. Like the Jedi mind trick. It's like, what do you mean there's not a problem? These machines are hitting the parity error. And then it's like, you're the only one seeing the problem.
Speaker 1:And, also, we think your software is to blame for the problem. It's like, this machine can't boot because of a parity error on the HBA. Like, if our software is somehow affecting that, there's actually deeper issues.
Speaker 4:You're like, my software is barely running yet. Like, what are you talking about?
Speaker 1:Right. Exactly. Like, you are blaming a child that you're not gonna have for another 15 years on your failed placement. Like, it doesn't even make sense. And I actually got so frustrated by this that at Surge in, was it 2010? One of those. 2011, maybe.
Speaker 1:You know, this was back in the day when the only way to speak to people was in a physical room, and I actually asked the room. There were, like, a couple hundred people in the room and a bunch of folks that were still managing their own physical infrastructure. And I remember asking the room, like, is anyone seeing a parity error on a Dell PERC on a, I think it's an 1800, whatever it was. And most of the room is like, what are you talking about? But there were, like, 20 hands that shot up in this room, and all of a sudden, all the hands start looking at one another and realizing, like, wait.
Speaker 1:Because we had all been individually told this is only happening to you. This isn't actually a problem.
Speaker 4:Talk about a CV broadcast.
Speaker 1:Alright. Talk about a CV broadcast. Right? Oh.
Speaker 4:But I do think that, like, that's the thing in
Speaker 1:these SaaS environments. You can't do that. You can't pretend that, like, you're the only one seeing this problem. It's like, no. No.
Speaker 1:It's a SaaS. Like, everyone is
Speaker 4:in this problem.
Speaker 2:Yeah. I mean, there's also the fact that if you're having a public PIR, that's only for, you know, severity 1 or severity 0. Like, the site is down, and it is very obvious that it's down. Those sorts of issues usually reach that level of, okay, we're going to do a public PIR about this.
Speaker 2:But there are almost certainly, you know, a large number of very small incidents with smaller customer accounts affected and that kind of stuff where, either no PIR will be posted because, you know, they don't happen to raise a support case, during that window in which they have an outage or, you know, the outage is short enough that nobody really notices or it's purely between those customers and in private communications. But, you know, if if you try to go to a website and it is not loading, you can't really hide from that. And you do need to to fess up. It's not 20 hands in a room. It's, you know, thousands of people on Twitter.
Speaker 1:Right. It's the Internet. Ken, you're trying to get in here.
Speaker 5:I think another reason for just general, you know, more transparency comes from compliance too. Like, compliance and security regulations that have come into effect. Like, I know they've been around for a while, but I feel like they're just more popular, like, within the last 5, 10 years than previously.
Speaker 1:Well, if compliance is part of what's generating that... and, you know, when you're using the term PIR, you're using that as post-incident review, I assume?
Speaker 2:Correct.
Speaker 1:Yeah. Yeah. I if if if that's, if compliance is generating that as a positive artifact, that's I guess, that's a very good thing, actually.
Speaker 5:Yeah. I I don't know if that's actually true. I mean, I'm more annoyed by compliance than anyone, but, it could be a reasonable cause for it. Because, I mean, you you know, there is, like, monetary loss to not following those regulations, and I don't know. It's, it it could be because of compliance.
Speaker 1:Yeah. Well, I mean, you kinda bring up a good point too in terms of the tie-in with security. And it's certainly been interesting to us to discover, to watch how vendors deal or don't deal with vulnerabilities. Which I think is the real test of the character of a company because, you know, and I would say most companies believe that. I was actually shocked to learn... I always assumed that, like, everybody participates in MITRE's CVEs.
Speaker 4:But they don't?
Speaker 1:No. I mean, I have known that it's, like, optional. These are the Common Vulnerabilities and Exposures, the database that's handled by MITRE. And it's optional in the sense that you're not legally required. But I kind of assumed that everyone felt a kind of communal responsibility to report CVEs. But that is not the case.
Speaker 1:And the so, Adam, did you think because I
Speaker 4:Yeah. No. I I I thought that that was just what everyone did. Like, I
Speaker 1:did I did
Speaker 4:realize yeah. I didn't realize I mean, this sounds, like, naive now that you're saying it.
Speaker 1:Oh, god. This is so much better, though.
Speaker 4:Of of course, some folks will, like, choose to do things in secret or do things their own way or consult with their lawyers who will forbid it for whatever reasons.
Speaker 1:This makes me feel so much better, because we found, in particular, and, like, this is just a statement of fact, which should be clear at this point: NXP, the maker of putatively secure microcontrollers, does not really believe in CVEs. So if you go to, like, the LPC55, which, we just had a Twitter space describing Laura Abbott's terrific work finding this vulnerability in the LPC55. If you go to the CVE database on the LPC55, or the database on all of NXP, there are, I think, 4 vulnerabilities. And you're like, wow.
Speaker 1:This must be the most invulnerable thing ever. It's like, no. No. They just don't believe in it. And, Adam, this makes me feel much better that you felt the same way, because I kinda felt naive.
Speaker 1:Like like country mouse being like, what do
Speaker 4:you mean they don't believe
Speaker 1:in CVEs? And our security folks were like, no. Are you serious? No. Go ahead.
Speaker 4:They want people to think what you thought, that there are only 4, so it must be the most secure thing ever.
Speaker 1:It's like, but that's wrong. It's like, oh my god. Like, seriously? Do I have to, like, do all of your education for you right now? It's like, yes.
Speaker 1:This is all you know, this reminds me of the the in terms of. And, Adam, I'd like to believe that this reflects well on our fundamentally optimistic character. Is that a is that a name? Yeah.
Speaker 4:Yeah. That sounds good.
Speaker 1:Let me tell you about this. We went to this marketing event, CIO Summit. Did you ever go to any of these at Delphix? These are speed dating with customers.
Speaker 4:Absolutely. Yeah. At at, like, boondoggle golf resorts all over the world. Oh, yeah.
Speaker 1:These are so filthy. So you have these paid-for events. And so as a vendor, you pay to go to them. And as Adam says, like, golf resorts, whatever, really nice, and there are all these customers there, people, users of technology.
Speaker 1:As it turns out, and I can't believe that this is... certainly, it doesn't feel ethical. I can't even believe it's legal. You've got employees of, like, companies that are being individually paid to be there, which feels unspeakably dirty.
Speaker 4:Absolutely. A lot of them, I mean, is it that surprising that, like, an IT exec at, like, a bank would feel fine being paid for some of these things? I mean, based on their peer executives in other divisions?
Speaker 1:But it's like, and again, this is why I sound horrifically naive, but it's like, so you've got this little cottage industry where you go and, like, sell yourself out. It just feels like... because you're clearly there because, you know, you work for this large company and people are believing that you're there representing the company, not representing yourself. And yet you're being individually paid to be there, which is like, oh my god. It's so filthy. So as you can imagine... and did you go to any of these things?
Speaker 4:Yeah. Yeah. I went to one in Turnberry, Scotland.
Speaker 1:Oh my god. Oh my god. You flew the Gulfstream out there?
Speaker 4:Yeah. Yeah. Yeah. No. I flew coach, but yes.
Speaker 1:And 'cause the other thing is, like, of course, the people you actually have out there, the kind of people that are attracted to this, like, filthy arrangement, aren't necessarily the people you wanna be speaking with anyway. I remember Steve and I were there together talking to one customer, and we're talking about our technology. He's like, oh, this is great stuff. My company will never adopt it. Like, yeah.
Speaker 1:Like, oh, no. These guys are total idiots. I can't get them to do the most basic thing. But this looks great. I mean, good luck to you, honestly.
Speaker 1:If there's any way I can help you out. Like, well, you can help us out by, like, I don't know, like, evaluating it. I mean, and he was so disaffected. And he's like, why, Kai? What a terrible way to live.
Speaker 1:So we get to the end of the day, and this is where I get unspeakably naive. We get to the end of the day, and, I don't know, it'd be interesting to know if this was this particular event, because we only did one of these, or if this is just, like, ubiquitous. There's a raffle at the end where the customers can win, like, pretty neat prizes, you know, Beats headphones, whatever.
Speaker 4:From the vendors in particular?
Speaker 1:Well, from the conference. The conference is gonna run a raffle.
Speaker 4:Got it.
Speaker 1:And our VP of marketing is like, so, who do you wanna win? So, yeah, we've got our prizes that we contributed to this kinda, like, pot of loot. And who do you, who do you wanna win? Oh my god. I don't know.
Speaker 1:Yeah. They're like, you know.
Speaker 4:Pulling I'm pulling for these guys. Like I'm
Speaker 1:pulling for these guys. Yeah. Like, you know, the the guy over here is being friendly. Yeah. Exactly.
Speaker 1:I don't know. Yeah. Right. Exactly. Like, I don't know.
Speaker 1:Like, who do I wanna win? I don't know. Yeah. Are we doing this as, like, a spectator sport? He's like, no.
Speaker 1:No. Like, who should we assign as the winner? And I'm like, Brian, we can't do that. I think it's a raffle.
Speaker 4:It's a raffle. The integrity of the raffle.
Speaker 1:Honor the the integrity of the raffle. Honor the social contract of the raffle. I just remember him giving me a look like, are you fucking serious right now? I mean, he was just like, do
Speaker 4:I really? Are you this? I brought this country mouse.
Speaker 1:I brought this country mouse.
Speaker 4:Who was like, who put you in this suit that you're in? Like, I don't Like, how have you survived in the city? Like, the how
Speaker 1:so he's like, no, Brian. We designate the winner. And I'm like, we but they're they're and they were literally drawing it out of a hat, but but they're drawing it out of
Speaker 4:a hat. Like, yeah. They draw the one that we designate. There's nothing written on those pieces of paper. Like, we need to walk you through it.
Speaker 4:And I'm like, oh my god. I'm just like, it's all been revealed,
Speaker 1:and I feel just absolutely filthy. So, of course, there was the it's not clear to me if the customers knew this. Like, maybe they did maybe they didn't. But there was in this event, which is basically just, like, filthy and the customers there, it was gross and, like, they weren't real buyers. There was one person that we'd already that we'd also spoken with from a major petroleum company who seemed really sharp and really good and had a lot of budget.
Speaker 1:Seemed like a really interesting technologist and was interested in a lot of different things. So I, to this day, do not know if this guy had just, like, faked earnestness. He definitely seemed very earnest, but he was winning everything. So I assigned him as our winner too. And so they're like, and wow.
Speaker 1:Bill wins again.
Speaker 4:It's a net set.
Speaker 1:Wow. Well, this guy had this absolute heap of... I mean, he was at, like, $2,000 worth of doodads, of gadgets.
Speaker 4:I'm like, this is nice stuff.
Speaker 1:It's like an Xbox, like
Speaker 3:you say.
Speaker 4:Man, that's amazing.
Speaker 6:It turns out he doesn't actually work there. He just shows up to these things.
Speaker 4:Oh, it's
Speaker 1:You know, that's a great point, Matt. I've got no idea how deep the deception goes. He totally could have been a ringer as part of the conference, and, I mean, the conference itself was filthy and, like, they pocket the loot. That probably is true, actually. Now that I think about it, it's like the Spanish Prisoner for vendor events.
Speaker 1:I gotta go replay everything in my head. Yeah. But was yours in Scotland as gross?
Speaker 4:I don't remember quite so many raffles, but it was pretty gross. 'Cause it's like, I mean, it's super opulent digs. You know, I don't know how much my company paid, but it was a lot. And I went with actually a very experienced salesperson who was also very green to our company. Like, he had joined not that long ago.
Speaker 4:And, you know, we had one of these speed dating sessions. One of the execs we were speaking with said, you know, what are the obstacles to, like, rolling out this technology? And the sales guy, and I emphasize his job was to sell the product, looked the guy in the eye and said, well, you kind of need to change your whole organization around this product.
Speaker 1:Oh. And and and
Speaker 4:and afterwards, I was like, you know, I'm I'm just a a country mouse, but is that really how we wanna pitch it? Is that really how we wanna position the product is turn yourself inside out to adopt this thing? So I I don't know that it was very effective for us either.
Speaker 1:And we did the, the was this a multi day thing? Or how does this work?
Speaker 4:Yeah. It was like a several day thing.
Speaker 2:Oh, yeah.
Speaker 4:And it was like, you know, some of the folks were, like, sneaking off to play golf instead of going to their sessions and stuff.
Speaker 1:Which I also can't stand. Yeah. I mean, the Linux Foundation, years ago, they do this open source summit, which one of my colleagues called open source Davos, which is basically what it is.
Speaker 1:And they were hosting it at the ski resort now known as Palisades Tahoe, in Tahoe. And I was just so furious that they were hosting this. And they're like, no. Listen. This is about getting people away from their work environment so they can really focus on these meetings.
Speaker 1:I don't actually remember this, but this was retold to me. And, again, this sounds like something I would say, so I can't deny it. Someone was like, I remember you saying, very directly, then why don't we hold it in a Motel 6 in Bakersfield? Which would have been... but needless to say... and, actually, someone is like, man, you must really hate skiing after that. I'm like, no.
Speaker 1:No. No. Like, I love to ski, but I like I ski with my kids, not with, like, not as part of a corporate boondoggle in the middle of the week. Like, if we wanna have a meeting, let's have a meeting. Let's not, anyway.
Speaker 1:It's all.
Speaker 3:Yeah. Corporate grossness.
Speaker 4:Well, I hate walking way, way back. I wanna say, I think that you thought that this Fishworks message was precipitated by some customer event, but it turned out to not be the case. What was the incident that you thought precipitated this message?
Speaker 1:Okay. So the one I thought that so as I have said repeatedly, we never lost anyone's data, but it took some very long vacations. At which I mean, I was proud of the fact that we didn't lose people's data.
Speaker 4:Yeah. Absolutely.
Speaker 1:But it did take some very long vacations. And you and I both endured some early performance pathologies of ZFS, for sure. Snapshot deletion. I remember needing to make the case for making snapshot deletion, because snapshot deletion happens, the way that happens in the transaction group historically. You could end up needing to do many, many, many random reads in order to close a transaction group.
Speaker 1:And the transaction group would effectively cork on these random reads to do the snapshot deletion. Adam, I'm sure I'm bringing
Speaker 4:No.
Speaker 1:I'm I'm I'm trying
Speaker 4:to remember if this was the same pathology that caused us like at boot single threaded to like pause for 5 days or something like that. Where we said, no, no, no, no. Your your data is still there and it is coming back. It is gonna take a minute.
Speaker 1:Yes. This is the this is similar. That was dedupe that Oh, yeah. Yeah. I think you're referring to a customer that was that he was in tears, and I was very close to tears.
Speaker 1:It's the closest I've come to crying with a customer. The, because
Speaker 4:he was like, I just want this to be the way it was last week.
Speaker 1:I'm like, I want it to be the way it was last week. It was so glorious when it worked, and now it doesn't work. And in particular, this is where dedupe was enabled. And we had lost the dedupe table. And I determined that the thing was gonna boot, it was gonna come up by Thursday.
Speaker 4:And it was my fault. That's a vacation.
Speaker 1:Yeah. It was, it was not good. It was really, it was great. But we, ultimately, and then that was the same thing about kind of creating these random reads. The incident that I was thinking of was one where, again, data went on some long vacations.
Speaker 1:There's one exception to that. There was one incident of data loss, and that data loss was created when someone attempting to support the appliance actually edited the kernel from the operating system shell and changed the internal data structures, which
Speaker 4:is Holy smokes.
Speaker 1:Do you remember this?
Speaker 4:No. No. I don't remember that at all. I mean, that makes your, like, cv_broadcast, cv_signal change look like just a, you know, outpatient procedure.
Speaker 1:Oh, absolutely. No. It it
Speaker 4:definitely like, I listen. At least my
Speaker 6:my Hang on.
Speaker 1:Yeah.
Speaker 6:When you all are doing this, is this like dd-ing or is this like
Speaker 1:mdb minus k w, my friend.
Speaker 6:Just, like, do is is this a Solaris thing?
Speaker 4:No. This is in memory, not not on disk. But he's saying that you opened up basically the kernel debugger and splatted down some bytes.
Speaker 6:Oh, okay. You're you're throwing a debugger at this thing. Okay. Carry on.
Speaker 4:Yep. Brian, are you muted? Alright. Brian may be gone.
Speaker 6:Sorry, Adam. I killed him.
Speaker 4:No. No. Now I'm, like, waiting with bated breath to know, like, what had happened here. Oh, he just DM'd me aside and said, I think I'm going to die, which means I think his system went down. While Brian is gone, I'm going to share one more anecdote from my experience on support, which was, we had a customer who was having a, you know, real hard time, and one of our support engineers was working on it with them and was very explicit, saying, you know, there are some problems with this dataset.
Speaker 4:We're gonna get back to you. Oh, Brian says he's back. Brian Brian, we're gonna finish this one anecdote. Yeah. So, this one dataset may not be recoverable, but don't touch it.
Speaker 4:We're gonna deal with it later. Next day comes around, the support engineer logs in with the customer, they have deleted the data. Now the customer is furious with us that we have told them to delete the data, that they've lost the data and so forth. And we're really at loggerheads because the customer isn't owning up to the fact that they inflicted the self-harm. And the support engineer is very clear and showed me the communication.
Speaker 4:I remember this very well because I was sitting in the room with one of my colleagues and and my my other colleague, the support engineer, Mantha, who also worked with us at, at Sun on the storage clients. Martha was was off in Colorado. And, I DM'd Monta and I said, look. I'm going to yell at you, but you did nothing wrong, and I want you to know that. And I start laying into him.
Speaker 4:Martha, how can you do this? We have procedures. You know. And the customer immediately fessed up. Like, the fact that there was this human face being, like, raked over the coals.
Speaker 4:He was like, you know, we did it. It was our fault. We did it. Now in the meantime, the colleague who I was sitting in the room with did not know that this DM exchange was happening.
Speaker 1:Oh my god.
Speaker 4:His jaw hit the floor as I started, like, laying into this colleague of ours. He
Speaker 1:had no idea that you had equipped the colleague with a cackle bladder? That's right. You actually shot him in the style of The Sting?
Speaker 4:That's right. So this was this was like a a shoot the messenger, shoot the hostage, negotiation with with the customer.
Speaker 1:Alright. If you're gonna do that to me with an Oxide customer, you're gonna give me some advance notice. Right? Yeah.
Speaker 4:Yeah. Yeah. I'll let you know why you're taking friendly fire.
Speaker 1:I mean, I'll stage my own death if that's what's required.
Speaker 4:Yeah. Actually
Speaker 1:That's right. Actually, that is, that's that's wild and troubling.
Speaker 4:Well, so yes, on both fronts. And I agree that, you know, perhaps this violates a certain amount of transparency.
Speaker 1:Your methods are unorthodox.
Speaker 4:That's true. But, like, what do you do when you have folks who are, like, ultimately being dishonest?
Speaker 1:Yeah. Yeah.
Speaker 4:Alright. So that's so back
Speaker 6:I wonder how many of the issues here just come from the business model of how support gets done. Right. Because it seems like, you know, for a lot of enterprise stuff, what you get is basically a fixed per-year support price, which is basically just paying them to neglect you, versus, like, you know, a time and materials type of support thing where if it turns out they suck, they don't make any money.
Speaker 4:But you're you're you've got a single vendor. You know? Like, you can't like, if if Dell support can't fix it, you're you're kind of beholden to them.
Speaker 6:But, like, also, at some point, you are doing things, and you're just a man without a country no matter what. Right? Like, I mean, was Dell any use to you at Joyent, Brian?
Speaker 4:Like, I
Speaker 6:mean, if if
Speaker 1:you're not
Speaker 6:doing things that are that far out, like, at some point, no one can help you anymore. You're on your own.
Speaker 1:I I mean, I
Speaker 6:crossed over the Rocky Mountains with your wagon train and, like, well, looks like you're
Speaker 4:the Donner party now. I I
Speaker 1:I definitely agree with the sentiment. I I I like the phrase this is help. It's not on the way. And so I definitely I I do appreciate that at some level. I know it's like I I definitely think, like, Dell could have helped.
Speaker 1:I mean, there are and I do think that, like, when people pay for infrastructure, whether it's in a SaaS model or if it's on prem, I think it is part of the responsibility of, like, part of what should be included in that price is standing by it and supporting it, which is not necessarily easy. But I agree with you that you get these kind of, unfortunately, perverse set
Speaker 6:perverse set of incentives where you're effectively paying Dell to neglect you, not to help you.
Speaker 4:Okay. Well, because they
Speaker 6:At the end of the day, they make more money from just telling you, oh, no one else sees this, and running away.
Speaker 1:Oh god
Speaker 4:Brian, so when last we heard from you, the customer had just in-core modified the kernel.
Speaker 1:Oh, yes. So effectively they modified the in-core vdev, and ZFS was simply not, I mean, they effectively changed the implementation of ZFS, and there was the corruption.
Speaker 4:As like, like, what else could it have done?
Speaker 1:Yeah. And the the yeah. That was definitely a wake
Speaker 4:up call. Did did the and did they own up to that?
Speaker 1:Yes. Yes. I think that they did. Again, the customer did not do this. This was done by Sun Support.
Speaker 4:Oh. Oh my goodness. Holy smokes. That's terrible.
Speaker 1:It was not good. It was
Speaker 4:not good. And it's, like, it's so bad that you're, like, okay. We actually need.
Speaker 1:There's something deeper here than just, like, a single person acting irresponsibly. Like, you when when something like that happens, you have to look to something systemic. And that that there were definitely more
Speaker 4:systemic Absolutely. I mean, it's like the, you know, the aphorism about SaaS, like, where if someone were able to, you know, detonate the production database on the first day, like, it's not the person that's the problem. It's the systems that are the problem.
Speaker 1:It's the systems that are the problem. Perhaps exacerbated or certainly exacerbated by a person acting sloppily, but there's a larger system that contains that person that has to be examined. And, yeah, there were a lot of challenges there. And I think the other thing that's important to remember is, like, they wouldn't have been doing this at all if there had not been, just what you were mentioning, I thought it was interesting, Adam, your point about the customer taking control of the shell. That was not their first move.
Speaker 1:That was like, a bunch of other things had failed. And now they're at a level of desperation where they felt that that was, and I think that that's always important to keep in mind in these cases: the levels of desperation are getting higher and higher and higher. And that was also, like, that was really rough. The first year of that product was super, super rough because it was too successful, and we sold too many of them.
Speaker 4:Yeah. It outstripped our ability to support it as well as we wanted to, to educate internally, educate externally, you know, build the systems for that kind of scale, the whole thing.
Speaker 1:Yeah. And we got there, but we we built it behind it.
Speaker 4:So yeah. Well, that's it. That was a that was a meandering walk
Speaker 1:through James Garfield and
Speaker 4:Yeah. Didn't have that on the bingo card for tonight.
Speaker 6:Was there even a plausible reason to go running around editing in memory data structures in the kernel?
Speaker 1:Like, is there ever?
Speaker 6:Well, I mean, like, clearly, there are
Speaker 1:I mean, there are
Speaker 6:definitely reasons to go hand editing anything on a system. But, like, was were the reasons good at the time to justify why this was happening? No. Or was this just like
Speaker 1:No. No. And this is one of these cases where, you know, there are things that I'm willing to do on a running system. That's not among them. And when you are and, indeed, like, when we're talking about changing that branch earlier, I was going to do that because it's a single byte.
Speaker 1:And it's program text, and you're very bounded about the failure modes. Going in and changing data structures of a running system is
Speaker 4:particular, ones that are related to persistent data where one false move, and you haven't just bounced the system, like, if, you know, if you if you had gotten the wrong byte, probably the system just detonates. If he gets the wrong byte, now forever, this is, like, the data is garbage.
Speaker 1:Yeah. And indeed, we only pieced this all together long after the fact. It was really, really brutal. Persistence is hard, as it turns out. Because defects can leap this fire line, as Adam is describing.
Speaker 1:And you it's forever. And it can be really, really brutal. So, yeah, it, it it it it's Matt, are you I
Speaker 3:I this is gonna
Speaker 1:be I'm worried now. I'm, like, with bated breath as someone who I've definitely worked with on infrastructure problems, are you Hey.
Speaker 3:Hey, Brian. So I, my first job ever was basically on the receiving end of Sun Support, and I definitely ran
Speaker 4:You're welcome.
Speaker 3:Ran a bunch of Sun machines. And, I mean, in general, everyone that I talked to was super helpful. But I gotta say, I knew what mdb minus k was, and how did I know that? Because, like, it's in, like, various docs, and, like, people will tell you to do it. Like, I never did it because I was terrified of it.
Speaker 3:But, like, I knew what it was, and I knew what it did. Like, how did I, like, how would I, lowly I mean, I was, like, a college student. And they were like, oh, yeah. Maybe you should mdb minus k that thing. It's like, I don't know.
Speaker 3:Oh, man.
Speaker 1:Okay. I would like to point out the difference between mdb minus k and mdb minus k w. The w is load bearing. So the w is what allows you to write to the system. Yeah.
Speaker 1:Yeah. mdb minus k only allows you to observe
Speaker 3:the system. I see. Fair enough. Well, anyway, I somehow people were telling me to, like, do this stuff. But, yeah, I guess, a fair point.
Speaker 3:Maybe they weren't telling me to use minus k w. But, you know, it's been a long time, and they maybe were telling me to do that, and I don't remember that.
Speaker 1:Right. But this is a good point in terms of, like, so, Matt, what's the lesson there? Should we
Speaker 4:have not documented it? Should we have not?
Speaker 3:No. I'm just saying. I think it's naive to assume that, like, no one knew what that Yes. Technology was. It's, like, kind of
Speaker 1:Oh, for sure. Kind
Speaker 3:of interesting. And, like, the support staff was definitely telling people about it. I mean yeah. Otherwise, how would I know?
Speaker 1:For sure. And in fact, one of the kind of decision points we had around DTrace was whether to allow DTrace to modify the system. And we came to what I think is kind of a classist decision, but that's surely due to me, that DTrace cannot modify the kernel, but it can modify processes.
Speaker 4:It is kind of classist.
Speaker 1:Yeah. It's kind of terrible.
Speaker 3:I don't know. I feel like that's maybe appropriate in this case.
Speaker 1:Okay. And and Matt, I don't know if you're saying that's appropriate or it's, like, apt. It's, like, not surprising given you turkeys or if it's, like, that's actually the right decision.
Speaker 3:I mean, you I think maybe a little bit of both.
Speaker 4:Apparently. Okay. Exactly.
Speaker 1:Yeah. So we've got the ability and, now, did you ever use copyout in anger, Adam?
Speaker 4:Yeah. Absolutely. And even more so on x86 because, you know, on on 32 bit x86 a lot because, you know, almost nothing was in registers. So there was lots of opportunities to change the behavior of the program almost arbitrarily.
Speaker 1:And so and you would you would actually do this as part of like
Speaker 4:Yeah. Yeah. I did it, for example, even in some sort of known debugging situations, before the is-enabled USDT probes, to kind of, like, change how things were operating. Like, which branch to take and that kind of stuff.
Speaker 1:Matt, I would like to say that I'd never used copyout on any of your processes. I just wanna, like
Speaker 4:I
Speaker 1:I don't wanna do that.
Speaker 4:On any of your processes. Right? There's always other people.
Speaker 1:Hey. Hey. He doesn't need you to parse my language. He's fine. I you know, just
Speaker 3:I'm fine. I feel like we had a very different relationship when you were personally involved in yeah. I mean, good. I don't know. I kind of, I have an increased risk tolerance.
Speaker 4:So you
Speaker 3:know what I mean? Like, I kind of like people who are like, you know what? Let's try to fix this. It might not work, but it might actually work.
Speaker 4:It might actually work. Alright. Maybe I should
Speaker 1:have used copyout. I'm sorry. I didn't. The but you don't actually never heard. Well, it also should be said that with Matt, with most of what we were debugging,
Speaker 1:those were actually JavaScript or Node processes. So it's like copyout is the Yeah. Good luck with that.
Speaker 4:Good luck with that. What are you gonna copy out to where? Yeah.
Speaker 1:Yeah. Yeah. And the things are getting moved around a bunch. Which I have to say, I don't miss. It is nice to be doing everything in Rust, where things are not GC'd, not moving around, where we can actually use copyout recklessly now.
Speaker 1:Poor Rust. You've been so safe. And you now have this completely unsafe monster that's actually instrumenting you. Well, it's been a fun tour. Adam, I wanna let you get to your get to your family.
Speaker 1:I know that you're
Speaker 4:that they're all probably I got the barbarians at the gate here at the club.
Speaker 1:Exactly. But a fun tour, and, well, we'll have to do some more of these on support. I think, like, support
Speaker 5:is I know it's something
Speaker 1:that's really important to me and it's important to you, Adam, and I would love to get more folks in here who end up needing to deal with folks that are trying to use systems when they're not working and how you navigate that because it's tricky.
Speaker 4:Oh, and it's this amazing blend of, like, psychology and technical skills. Right? Keeping the customers, like, who are who are experiencing these things happy while trying to navigate the technical details and navigating your own organization and teammates and trying to get to the root cause of these things. It's it's an incredible skill
Speaker 1:and and and staging a staging a murder of a of a colleague
Speaker 4:in the meantime and
Speaker 1:I'll I'll In
Speaker 4:the meantime, allowing your boss to, like, execute you in front of a customer doesn't work. Yeah.
Speaker 1:Hey, listen. It worked. I can't argue
Speaker 4:with that. Right. Right. Results. Results.
Speaker 4:That's it.
Speaker 1:Alright. Well, thanks for joining everyone. We'll, we'll talk to you next week.
Speaker 4:Yeah. Thanks everyone.
Speaker 1:Bye.