Supercomputers, Cray, and How Sun Picked SGI's Pocket
Speaker 1:Alrighty. Adam, are you there?
Speaker 2:Did you see the Twitter space I was in 5 minutes ago?
Speaker 1:Were you actually in that Twitter space?
Speaker 2:Yeah. I mean, only for long enough to hear someone making the claim that Bitcoin is going to a million and that... Makes sense. She was planning,
Speaker 3:if I'm
Speaker 2:I don't understand this nomenclature, on drinking a cup full of NoCoiner's tears this evening.
Speaker 1:Okay. Well, wait a minute. Am I a no-coiner? Is a no-coiner the opposite of diamond hands? Is that what I am?
Speaker 2:I don't even know what a diamond hands is.
Speaker 1:Oh. I I feel like I'm
Speaker 2:I'm probably a no coiner and not diamond hands if
Speaker 4:if I had
Speaker 1:to guess. I mean, I had to pick between no coiner and diamond hands. It feels like I'm more of a no coiner, but you know what? I I don't even know. I don't even know.
Speaker 1:Yeah. So Tom is here. Tom, we we were just talking about your very enticing tweet about the Soviet Seymour Cray.
Speaker 3:Oh, yeah. He is the character.
Speaker 1:Okay. So we gotta start with it. People want us to put the subject at the top. We are talking about a book recommendation I got from George b on Twitter, after we did our space, whatever it was, 8 weeks ago: The Supermen, the story of Seymour Cray, but of supercomputing more generally, which I just finished and loved. There's so much to talk about there.
Speaker 1:But before we talk about that, Tom, I just wanna hear about this guy. Tell me his story. How did you meet him?
Speaker 3:Well, you probably know that Sun had several things going on in Russia after the wall came down. And Dave Ditzel was driving this relationship with Boris Babayan. And, you know, people spoke of him reverentially because he got a lot of stuff done in spite of the fact that Russia really had no infrastructure to build state of the art things. And I really only got to meet him for one little meeting. But,
Speaker 1:And what was the subject of the meeting? I mean, were you good to like what was that?
Speaker 3:I think it was very early on, basically. Hi. We're from Sun. Maybe we should do something together.
Speaker 1:And this what year is this?
Speaker 3:This was 92. Oh, man.
Speaker 1:This is like I mean, this is only 3 years after or 2, like, maybe depending on when it was in 92. Yeah. And it is I mean, I Tom, I find that it is hard to express to folks who post date. I mean, Adam, do you remember the wall coming down?
Speaker 2:Sure. I remember the wall coming down, but I was but I was a kid. But I remember it as a moment of significance. But, you know, from I I think it was 10.
Speaker 1:Okay. So you may be a bit too young to appreciate how permanent the wall felt when it was up. It felt like I mean, it just felt like this was never gonna change. I don't know, Tom. That's how I felt.
Speaker 3:Oh, yeah. Yeah. And and so when when I did the trip in 92, people were still in this kind of state of euphoria about things are open, and we're gonna be normal and have great relationships and blah blah blah. And, it was a really, really fun trip to Moscow.
Speaker 2:That must have been unbelievable. Yeah.
Speaker 3:I know. Seriously. With much much vodka.
Speaker 1:Is it and is is this the winter, the summer? When is this?
Speaker 3:Summer. Beautiful summer. Oh my god. Mostly, we were talking to this other group, headed by Sasha Galitsky, who has since become an international venture capitalist. And, in fact, the guy I worked closely with at Sun, Geoff Baehr, started up this relationship with Sasha's group.
Speaker 3:Now the 2 of them are venture capitalists, and you may know them at whatever that firm is.
Speaker 1:Oh, that one. The one that passed on us.
Speaker 3:We're gonna have to
Speaker 2:be a lot more specific. They're rooting from the side.
Speaker 1:They're rooting from the side.
Speaker 3:Yeah. I'm I'm blanking on the name of the firm, but but they they specialized in, you know, east west deals until things got really bad with the relationships. And so now now they're a little bit more normal.
Speaker 2:And so, Tom, what did Russian computers, and before that Soviet computers, or Soviet computers as they emerged out of the Soviet Union, look like when you visited?
Speaker 3:Well, they mostly cloned Western computers. So they cloned the IBM 360. They cloned the PDP-11s and PDP-8s.
Speaker 2:This is like the Concordski of PDP-11s.
Speaker 3:Yeah. But the whole manufacturing infrastructure was so weak. Now the best story I heard, though, is they cloned the Sun-3. But the trouble is they couldn't get the processors, and, you know, they were building their own processors. And every processor was broken.
Speaker 3:So, in fact, when it worked, you had a lot of custom kernel hacks per processor. Not per type of processor, per processor. And so it was just pathetic.
Speaker 2:Wow. That's that's pretty incredible.
Speaker 3:So it's like what the UNIVAC I guys were doing back in the fifties. It's, you know, total lack of manufacturing prowess.
Speaker 2:So each one off the line would have its own sort of customized collection of stuff?
Speaker 3:Yeah. Whatever hack it took to kinda make it work. And I think it was coming from the ZIL limousine factory.
Speaker 6:That was the other key. That's incredible.
Speaker 3:So, you know,
Speaker 2:And so did they have any real de novo systems of their own, or or was it all just clones?
Speaker 3:Yeah. Well, Boris Babayan is known for the Elbrus family. So those are pretty architecturally interesting. Wikipedia talks a little bit about them, but, you know, superscalar way before other people were doing that. But, you know, again, they just barely worked, and not very impressive clock speeds.
Speaker 1:Yeah. That's crazy. In an act of supreme cruelty... Uh-oh. Actually, in an act of supreme cruelty, I'm having to talk with myself. In an act of supreme cruelty, Tom, the Twitter app decided to exhaust all memory as you were telling us about the Sun-3 clones.
Speaker 1:Sorry. I'm gonna have to pick that up on the recording. But this whole... I mean, Adam, wouldn't you watch a movie about Tom going to Moscow in 1992? I just feel like there's so much there. I
Speaker 6:would make that movie.
Speaker 1:Are you kidding me? I would
Speaker 2:I would I would write that screenplay if I could. That'd be that'd
Speaker 1:be so much fun. Oh, man. It just feels like that. Tom, it must have felt that way even at the time, to realize that this probably felt historic, I imagine. Is Tom there?
Speaker 3:Oh, dear.
Speaker 1:Oh, no. This is you know, Twitter spaces, you're doing better. And now oh, there's Tom, maybe. I think Tom got bounced out. Oh, man.
Speaker 1:Alright. We're gonna well, we're oh, there's Tom. Tom,
Speaker 3:you back? Testing? Testing?
Speaker 1:Yeah. Yeah. You're here. You're here. Yeah.
Speaker 3:Good. Yeah. So so anyway, I I actually wasn't one of the main people going to Moscow, but I was on my way to Geneva, and so I tagged along. And, the Geneva story is a whole another story, but that was fun too.
Speaker 1:So We how did you, so how did you end up meeting Boris? How did that what what was the context for that meeting?
Speaker 3:I don't remember. Dave Ditzel somehow knew about him and had started a relationship. And then eventually, Sun bought his whole group, and they worked for Sun for, like, 12 years.
Speaker 1:Is this the Saint Petersburg? Is this the Saint Petersburg group? Was it was this where were they based? Okay.
Speaker 3:They're based in Moscow. So there were 2 Moscow groups. Okay. There was Boris' group and there was Sasha's group. Sasha's group worked on VPN and networking stuff.
Speaker 3:And, you know, before the wall fell, Sasha was working on satellite networking technology, which was pretty amazing. But then the Saint Petersburg group was after this trip. I know somebody found them, and then that's where a lot of cool people came from.
Speaker 1:Yeah. Yeah. Certainly, we had some extremely bright colleagues out of the Sun offices in Russia. But that must have been amazing, and I would love to know more about the kind of computers that he built. And so, Tom, you said that you never met Seymour Cray.
Speaker 1:No. I remember him dying vividly, because I was with some Sun engineers when he died. I just remember John Johnson, one of our colleagues, proposing a nanosecond of silence to memorialize Seymour Cray, which at this time... I don't know. Now I can't figure out if that's appropriate or not. I don't know.
Speaker 1:But, you know, it was what it was. But Tom, have you read this book, The Supermen?
Speaker 3:Yeah. In fact, I just reread it because I saw it was coming.
Speaker 1:This book is mesmerizing to me. I felt like there's a lot. So, Adam, a couple of things that I felt I came away from the book with. So one, something that we don't really appreciate about Moore's Law is that it's not just doing more with less. It's not just about greater transistor density.
Speaker 1:It's not just about greater transistors per dollar. It's also about many more transistors per watt, much more compute power per watt, much more efficiency. And, Tom, I can't believe how power hungry these machines were. So, Adam, they had a board, like, a daughterboard in, I think, the CDC 6600 that was 3 kW. It's like, a board?
Speaker 1:A board. Yeah. And it's like, oh, no. It's like, well, how do you even cool that? Like, oh, well, no problem.
Speaker 1:We'll just use liquid cooling. You're like, okay. And so on the Cray-2, they are spraying this thing with Freon because water cooling is insufficient. It just
Speaker 3:Yeah. Yeah. I heard... back at Princeton, yeah, we had the 360/91, which this book talks about never working, but it worked fine. But it was water cooled. And one day, the IBM guy forgot to turn on the water when booting the machine, and they had a core meltdown.
Speaker 1:Okay. So that's interesting. Yeah. This book does talk about the 360 Model 90, as it ended up being the subject of a lawsuit between CDC and IBM because they were accusing that machine of being entirely paperware, but you actually had one.
Speaker 3:Well, it was way preannounced. They announced the Model 90, and then later on, changed the announcement to 91 and 92. And then 92 never shipped, but eventually a 95 shipped. You know? But it was a killer machine, but it wasn't quite a 6600.
Speaker 1:And so what was the difference between... I assume you know the 360 line well enough to know. Because I know the 85 was the top of the line when they originally announced the 360. Right? That was the machine on
Speaker 3:I think 85 wasn't in the original announcement.
Speaker 1:Oh, was it not? Okay. The 85 is famous to me because that is the first machine to have caches.
Speaker 3:Right.
Speaker 1:And I I just love I know I've retold this story before, but I just love it so much about the the fact that when they developed the cache, they were describing the approach they took for the IBM Systems Journal. And they had not as far as they were concerned, they were buffering memory. So we're gonna call this a a buffer as a memory buffer. And the the editor of the systems journal was like, wow. This seems like a really important concept.
Speaker 1:But, like, I don't know. Muffer? Like, can we do better than Muffer? And that's how like, brainstorming with the engineers. He came up with the idea.
Speaker 1:Like, it sounds like you're kind of, like, leaving something and then coming back later to it like a like a cache. Like a and like, yeah. Yeah. Like a cache. Like, alright.
Speaker 1:Let's call it a cache. Yeah. Much better name. But so the 90 is obviously... was the 85 water cooled? Or is it just the 91, 92, 95?
Speaker 3:It probably was. One of the key differentiators of the Amdahl machines in the seventies was that they were air cooled instead of water cooled. So I think most of the higher end IBM stuff was water cooled.
Speaker 2:So, Bryan, in addition to the 3 kW, can you give us some other specs? Like, what was a supercomputer?
Speaker 1:Okay. Well, so there are a couple of interesting things. One, and, you know, I don't know how much of the book to take as apocryphal, especially given what Tom is saying. But one thing that's interesting about Cray is, it actually reminds me so much of Clarence "Kelly" Johnson at Skunk Works, in that he's deliberately not trying to be on the bleeding edge of everything. So in particular, he did not really believe in the microprocessor, which, I didn't really realize that there were microprocessor malcontents.
Speaker 1:But he felt that he was gonna get better performance by using effectively discrete components than using a microprocessor, which is probably true at the time, I guess. I mean, he was certainly getting ridiculous clock speeds that were far greater than any microprocessor.
Speaker 2:Is that just because the process wasn't there yet?
Speaker 1:That's my read on it. Tom, I don't know if you... I mean, you know, you were definitely on the scene then. I mean, what was the rationale? When was Amdahl? Amdahl was also not using a microprocessor.
Speaker 1:Right?
Speaker 3:No. No. Yeah. The seventies was way too early for serious microprocessors. You know, I'd be surprised if there was any real supercomputing done on them before 1985.
Speaker 3:It
Speaker 1:on micro... so there you go. Yeah. So these are all with effectively discrete components. So he is all about getting better performance with shorter path length. And so that means he's jamming more and more components into smaller and smaller area.
Speaker 1:So a lot of what they're doing is refrigeration. I mean, that's at ERA and then CDC and then Cray and then Cray Computer. Because another big theme, I have to say, is these folks getting funding at one company and then that company becoming kind of ossified with management. And, you know, very familiar themes, then rebelling and going to a new company. And so they all leave ERA together, and they form CDC.
Speaker 1:They form CDC by, like, getting money from friends and family in the sixties. So basically, selling the company at Tupperware parties was my read of that. Tom, I don't know if you had the... Yeah. So they were able to raise money to start Control Data. Obviously, a very important company.
Speaker 1:And then Seymour Cray is at CDC. So one thing I did not know: the CDC 1604, which I think, Tom, I even showed you my beloved manual for the CDC 1604. The actual number for that comes from the ERA 1103, which they all worked on at ERA, added to their street address, which was 501 Park in Minneapolis, which feels like some things don't change. Adam, does that not feel like something
Speaker 2:That's totally wonderful. And you're right. Like, they're out to lunch and
Speaker 1:I think we have, like, a port that was 450 for 450 mission. I feel like we used 450 mission in some something that we
Speaker 2:We had we had a sentinel value in, in the Fishworks product, which had to do with the day that we all sort of dropped our books simultaneously and told management they were leaving to
Speaker 6:go do this other thing.
Speaker 1:Which was that was delicious, by the way. That was like we all all 5 of us separately told our management that we were leaving at the same moment, like Godfather style, which I think we even called it Godfather style at the time, if I remember it correctly.
Speaker 2:That's right. Or we named it the Red Wedding.
Speaker 1:And then I think that's exactly what I'm trying to think. Well, so this is a very common theme because this basically happens again and again for supercomputing in particular. And so they leave. He leaves CDC, he decides, has his way, and leaves to go start Cray Research. Then CDC has this... I think the thing with supercomputing...
Speaker 1:It's like, I swear, Adam, if you think that we at Oxide are nuts, the history of supercomputing will assure you that there are people much crazier than we are. I mean, literally, it's like Oxide, but a factor of 10 in every single dimension. They raise a factor of 10 more money with a factor of 10 more risk. And when these companies end, they fly into the mountain. Like, every single one of them.
Speaker 7:That's sort of by design and by definition, though. Right? I mean, if you are building an Oxide rack, you are by definition not building a supercomputer. Right.
Speaker 7:Like, supercomputing is just defined by the fact that it's an order of magnitude past what anyone would consider reasonable.
Speaker 3:That's right.
Speaker 1:That's right. I think, you know, that's a good point.
Speaker 7:And, like, even on the 3 kW number, that's honestly not that extreme. Right? I mean, if you think about people like Cerebras today, that wafer chip thing is 23 kW.
Speaker 1:Yes. Which is crazy. I mean, they had to solve many unsolved problems in order to be able to... I mean, dealing with the thermals there has been a huge challenge, to the point where, because we share a board member with Cerebras, I get to answer a lot of thermal questions. I have to reassure Pierre that we are thermally nowhere near as adventurous as Cerebras is. But I think you're right, Matt.
Speaker 1:There's something tautological there: it's a supercomputer, so of course they are going for 10x. The thing that is surprising to me is that these companies, when they fail, fly right into the mountainside. Like, people learn that they failed because they go to work and they've been locked out of work.
Speaker 1:Like, the badges don't work. And that I feel is a bit...
Speaker 7:Why is that so surprising? If you build a supercomputer that is not the superiest supercomputer there is, no one cares because, like, that's old news.
Speaker 1:Yeah. You know, you're right. You're right.
Speaker 3:I mean, if all you're selling is the fact that you're number 1 and you're not number 1 anymore, you're in big trouble.
Speaker 1:No. You're right. This makes total sense. It just is brutal because you keep reading about it over and over again. And I remember this happened.
Speaker 1:And, Adam, I was asking you earlier if you remember my ETA Systems phase. So ETA Systems was a skunk works effectively inside CDC to do a new supercomputer. And they did really interesting stuff. Do you remember ETA, Tom? I don't know if you had any overlap or
Speaker 3:I didn't know much at the time, but I've read a fair amount about it.
Speaker 1:And I think, actually, Larry McEvoy worked at ETA for, like, a summer. But so ETA is making a supercomputer and, just like what you and Matt are saying, Tom, they're vying for number 1 and they're not getting there. And so these guys all come to work and they've been locked out of work. The doors are locked. Buses come.
Speaker 1:They pick them all up. They go to a theater in Saint Paul. And they think that they are gonna hear that the company's been acquired. But CDC gets on the stage and it's like, you're all out of a job, basically. Like, there is no ETA.
Speaker 1:And I just remember reading about this and just being mesmerized by it. It feels so graphic. You know? I mean, Adam, we've lived charmed lives in that I have never lost my job this way, and it feels like it would be really upsetting.
Speaker 2:Oh, for, like, the company to run out of money?
Speaker 1:To run out of money like this, suddenly. It's not like, okay, yeah, things are getting worse. Like, I mean, okay.
Speaker 1:Yeah. We know we're struggling, but we're still kind of collectively believing. And then, like, bullet in the brain. It's all over.
Speaker 2:Well, I think that's part of, like,
Speaker 3:what we're doing. The story? How about the story about the ERA guys who one day wake up and the company has been sold? Like, wait. I thought I was running the company.
Speaker 1:Right. Yeah. Exactly. Reading about it in the paper. But so it so ETA and I actually remember and so, Adam, I there was I I wanted to, like, in part because I was having to go out to Minneapolis to Bertrand was going to the University of Minnesota for some reason.
Speaker 1:And I was really interested to learn as much as possible about this event. And so I talked to a couple of people that were there. And, I mean, they told me that it was even crazier than it sounds, and people didn't really see it coming. This one guy described for me going over to a now ex-colleague's house, and they are faxing resumes out to Pyramid Computer. And they're faxing resumes out, and then Pyramid is faxing back a response on every resume.
Speaker 1:And they're all hired. So they're basically, like they're all huddled together in this house, all getting jobs in California, and then everybody's packing up and leaving for California.
Speaker 2:That's crazy.
Speaker 1:Is that crazy?
Speaker 6:Yeah.
Speaker 1:I just felt like, is this, like, John Steinbeck writing a novel about computing or something? It's like this very... and, you know, because people were effectively giving up on Minnesota at the same time as they were leaving it, or had been axed from ETA. But this happens again and again. So, Adam, you're asking about some of these extremes. So Steve Chen is this guy at Cray.
Speaker 1:So one thing about Seymour Cray. Man, that guy does not like... what's the right word? He builds something and then immediately wants to go on to the next thing. So he does not wanna see things through. I really like to see things all the way, like, into a customer's hands. I think, Adam, you and I have that same disposition, certainly, of wanting to see it from initial idea all the way to running in production.
Speaker 1:And I wanna know, like, when it falls down along the way, I wanna know why and I wanna pick it back up. And Seymour Cray does not have this interest.
Speaker 2:Sounds a lot simpler.
Speaker 1:Yeah. It is kinda simpler. To me, it's like, alright. So he just goes on to the next step. So the Cray-1 is effectively not even done, and he's already decided that, like, no. No.
Speaker 1:Like, we did it all wrong there. I'm gonna go do the one true machine in the Cray-2. And he's not interested at all in the Cray-1. And in particular, he's got zero interest in software compatibility between the... it's like, software compatibility is just not something he cares about at all.
Speaker 7:I I mean, software compatibility is also not really a cultural value in supercomputing.
Speaker 1:Well, yes.
Speaker 7:You will build the software from scratch, of course. Obviously. Oh. Right? Like
Speaker 3:Well, but then, Cray would have died a lot sooner if they hadn't done the Cray X-MP.
Speaker 1:Well, that's it. And so the Cray X-MP was led by Steve... well, unclear. Steve Chen, according to the book anyway, was very clearly going to take credit for it, and it's totally unclear how much credit is due or not due. But Steve Chen becomes the face of the Cray X-MP and then later the Y-MP, which is, Adam, basically the continuation of the Cray-1, maintaining binary compatibility with the Cray-1 in particular. Meanwhile, Cray goes off to the Cray-2.
Speaker 1:He then leaves Cray, and ultimately separates into what becomes Cray Computer as opposed to Cray Research. Steve Chen then leaves Cray. And, sorry, I'm leading up to this example of just nuttiness. So he leaves Cray.
Speaker 1:IBM decides to fund Steve Chen's new supercomputer startup. This is in 1988. IBM puts $150,000,000 into the company.
Speaker 2:That's that's a lot of money. I mean, now and then.
Speaker 1:Like, that would be a ton of money now. That'd be like raising a $600,000,000 seed now. You know?
Speaker 8:It might be worth keeping in mind who buys these machines. Right? So maybe it's more useful to think of supercomputing kind of like aerospace. Right? It's like the government is going to pay whatever it needs to build the machine that's going to let it simulate, you know, material degradation in nuclear weapons.
Speaker 8:That's what these machines are built for. So, maybe it's not so surprising that these things are were so well funded.
Speaker 3:Yeah. That's what always bothered me about the HPC world: it's not actual economics.
Speaker 1:That's right. Yeah. It's not actual economics, and it doesn't force rational economic decisions. And so the Steve Chen startup, with this outlandish $150,000,000 in... so, Adam, they make a 78 layer board. Which is... That's
Speaker 2:a lot of layers. I How how how thick was that?
Speaker 1:Great question. So, definitely, folks at Oxide were kinda doing the math on that. And Rick is like, that's, like, at least, like, a half an inch thick.
Speaker 2:It's like plywood at
Speaker 1:this point. It's plywood. It's plywood. And, like, there are lots of I mean, if you gotta back drill that thing, like, that that's it's just it's just nutty. So and it and they
Speaker 7:How many layers did you say again? Exactly. 78. Maybe a quarter of an inch?
Speaker 1:78, sure. I mean, we have got a 28 and a 16 layer board, and both those things are considered to be, like, big boards. 78 is a very large
Speaker 2:I mean, the fact that we're using, like, English, you know, woodworking measurements, like quarter inch, half inch
Speaker 1:That's right. Exactly.
Speaker 2:Rather than, like, millimeters, explains
Speaker 1:how big it is. It is a it is a board that is 4 fathoms deep. Yeah. It is,
Speaker 3:You're gonna need a bigger ruler.
Speaker 1:You're gonna need a bigger ruler.
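As a rough check on the two thickness guesses above (a quarter inch and a half inch for 78 layers), the arithmetic can be sketched in a few lines. This is only a back-of-the-envelope calculation: the function name is hypothetical, the real stackup of that board is not known from the conversation, and the per-layer figure lumps copper and dielectric together.

```python
MM_PER_INCH = 25.4

def per_layer_pitch_mm(total_thickness_in: float, layers: int) -> float:
    """Average thickness per layer in mm (copper plus dielectric lumped together)."""
    return total_thickness_in * MM_PER_INCH / layers

LAYERS = 78  # layer count quoted in the conversation

for guess_in in (0.25, 0.5):  # the two guesses made above, in inches
    total_mm = guess_in * MM_PER_INCH
    pitch_mm = per_layer_pitch_mm(guess_in, LAYERS)
    print(f"{guess_in} in is {total_mm:.2f} mm total, about {pitch_mm:.3f} mm per layer")
```

Either guess puts the average layer well under a fifth of a millimeter, so the numbers by themselves do not settle which guess is closer.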
Speaker 8:Maybe it has something to do with signal integrity or something. Like, maybe every second layer is like a ground plane or something that's
Speaker 1:like that. For sure. I mean, I'm sure that there are, like, somewhat rational reasons, but it's also happening at a time
Speaker 7:What year are they building this in?
Speaker 1:That was in, sorry, 1988. Which, I mean, by the way, that's not right about the vision for the future. I mean, that's the other thing. These guys actually were all kind of... I mean, I don't wanna put too sharp a point on it, but they were kinda wrong.
Speaker 1:Right? In the end, it was pretty clear by 1988 that was probably not the trajectory. Certainly by the mid nineties, it was crystal clear. And this was a dead end, I would say.
Speaker 2:So aren't all supercomputers hot rods? I mean, necessarily hot rods, but hot rods in the sense of, like, they're one offs and they're performance art and they are never meant to be a platform or or or like the next version is supposed to be starting from scratch anyway. Is it isn't that true throughout the history of supercomputing?
Speaker 7:I mean, there are small batch things. Right? Like IBM Blue Gene. You know, there's probably a 100 of those in the world.
Speaker 6:So I worked on a Sun Constellation, which was pretty much commodity x86 hardware with a third party interconnect. There's sort of a difference between grand challenge supercomputing, where you are aiming for a scalability and a problem type that addresses some very big patrons' needs, versus service bureau supercomputing, which is what nci.org.au did for a long time and still does, although I stopped being associated with them. And so, yeah, the Sun Constellation system that we had, which was 1500 processors, you know, it was a reasonably big machine. It was biggest in Australia, but it wasn't by any means a grand challenge machine.
Speaker 1:Yeah. It was commodity CPUs strung together. And ultimately, that's a better use of kind of of humanity. It's like
Speaker 6:One of the things about that was that it existed within a software framework by then of distributed-style message passing support, rather than write, you know, screaming vectorised loops and maybe split them up across a few CPUs which all had access to the same memory, which was interleaved umpty-bazillion times, which was the old, you know, CDC 7600, and, later, the 205, you know, the 512-way interleaved memories. Well, anyway, the 512-bit memories. But anyway, crazy high bandwidth memories, but shared memories between a bunch of CPUs. When the software support for message passing got to a point where you could actually scale out across clusters, I think that was when even Seymour Cray had agreed, shortly before his passing, that he had been sort of hit over the head by microprocessors enough to kind of accept that they had a place.
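For readers unfamiliar with interleaved memory, the idea mentioned here can be sketched in a few lines. This is a minimal illustration of plain low-order interleaving, not a model of the CDC 7600 or the 205: the function names are hypothetical, and the bank count of 512 is simply borrowed from the figure in the conversation, which the speaker himself was unsure of.

```python
from collections import Counter

def bank_of(word_address: int, n_banks: int) -> int:
    """Low-order interleaving: consecutive words land in consecutive banks."""
    return word_address % n_banks

def banks_touched(start: int, stride: int, count: int, n_banks: int) -> Counter:
    """Count how many of `count` strided accesses hit each bank."""
    return Counter(bank_of(start + i * stride, n_banks) for i in range(count))

N_BANKS = 512  # illustrative only, taken from the figure mentioned above

# A unit-stride vector loop spreads its accesses over every bank...
unit_stride = banks_touched(start=0, stride=1, count=512, n_banks=N_BANKS)
# ...while a stride equal to the bank count hammers a single bank.
bad_stride = banks_touched(start=0, stride=N_BANKS, count=512, n_banks=N_BANKS)

print(len(unit_stride), max(unit_stride.values()))
print(len(bad_stride), max(bad_stride.values()))
```

The bandwidth of a memory like this comes from many banks working in parallel, which is why vectorised loops with friendly strides were the programming style it rewarded.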
Speaker 1:Yeah. Microprocessors had a place. So in particular, Adam, he goes down the gallium arsenide route: we're gonna be faster switching than silicon can offer. So that was kind of his angle, especially at Cray Computer, working on the Cray-3 and Cray-4, a company that also ends with people being locked out of their workplace.
Speaker 1:And it's like it's like the common theme across all of these. So but, Adam, your question about, like, are are these kind of one offs? I mean, I would argue that supercomputing today is in the GPGPU. That is the kind of special purpose high performance computing today. It happens to have a much, much broader commercial application than testing nuclear weapons, which is what effectively, what the commercial application was for supercomputing.
Speaker 1:It's a little bit broader than that. And so now I think, between the use of the GPGPU in DL and ML, I mean, there's a broader use of it. So I would say all that kind of zeitgeist has gone into the GPGPUs, but I would love to get the perspectives of others for sure, who are gonna be better versed on this than I.
Speaker 3:Yeah. I I I wanna mention, put in a plug for Shaheen Khan, who I see listening here, but he he has this whole, Dead Architecture Society meeting every year.
Speaker 1:Oh, nice.
Speaker 3:That which should be coming up. Shaheen, can you talk?
Speaker 1:Yeah. I I just I I invited Shaheen to speak. So, yeah, if he's if he's around. Yeah. They'll click.
Speaker 1:Hey, Shaheen. How are you?
Speaker 5:Hey. Thanks for having me up here. I'm really enjoying this. And I was just gonna say, since I have the microphone, it's also the interconnect. I think that the GPGPU is really where the action is in terms of performance, but it's also highlighting the importance of the fabric.
Speaker 5:So if you look at the latest greatest, it's like the Cray, HPE now, Shasta, Slingshot interconnect, or really the NVIDIA Mellanox interconnect. And in fact, the system that was spoken of earlier, the one in Australia, that I believe was a Mellanox interconnect too, or was it the big fat switch that Bechtolsheim built?
Speaker 1:Right.
Speaker 6:The humongous which was the copper 120 gigabit cables. This was the one that was based on, I forget how many ports. I'm gonna say 576, but I don't remember if I've multiplied that out right. But it was sort of 10-inch rack unit switches, and we had 4 of those in core, I believe, and they were all running CXP 120 gigabit trunks out to, edge, optical CXP out to, the nodes, sorry, out to the edge switches, which were basically the Constellation, the rack was the blade chassis, and then in the Constellation chassis, there were 4 slots for Mellanox, switch fabric switch. Right.
Speaker 6:Chips per tab per
Speaker 5:Crazy machine.
Speaker 3:For for It was a good machine that
Speaker 6:could put 35 kilowatts peak per rack, so cooling was a real issue, and I've got some stories about that. But, yeah, not kind of grand challenge stuff, but the previous machine, the Altix 3700 that they had, nearly died as a result of a really interesting failure mode after a thunderstorm supercell, which knocked out the UPS for the management part of the machine, but not the raw power for the interconnect. There was a raw power for the
Speaker 3:No. Poor? No.
Speaker 6:And it had also knocked out the water chill loop, because the floor was flooded and the sensors had been installed on the basis that they thought, well, the reason the floor will be flooded is because the water chill loop is leaking, and so it shut down the pumps. So they had no cooling and no way for the management stuff, which was turned off, to turn off any of the nodes, which didn't actually have that as a separate monitoring thing devolved into each node. So it was just sheer luck that the main admin lived near enough by that after the supercell, he kinda got in his Subaru and scooted through huge amounts of hail on the roads and turned off every single rack by hand.
Speaker 8:Wow. That's a fantastic story. I have been watching because I have a login on this machine. I've had a login for the last 2 years. I haven't really been using it, but I've been getting all the maintenance notifications, and I have to say that lately, basically, the failure mode of these big machines is that the Lustre file system craps itself and then a consultant comes out and kind of, like, looks at it for 4 days and then gets it back going again.
Speaker 8:But I think the current machine at the NCI, and just about every sort of run-of-the-mill machine these days, is just InfiniBand. And that's Mellanox. Right? So Mellanox EDR or whatever. But there are some start ups doing some new stuff in interconnect.
Speaker 8:So I'd be interested in hearing other people's takes on, like, what Rockport are doing and, what is it, Enfabrica is the other one.
Speaker 1:Yeah. And, Shaheen, I think, assuming this is what you're talking to, you're making a very good point about the interconnect kind of becoming everything effectively, that supercomputing kind of switches sometime in the eighties to nineties to where it is all about connecting compute elements quickly with low latency and high bandwidth rather than making the single fastest compute
Speaker 3:element. Oh, when you totally. When you go to the supercomputer show, right, people talk about the the Cray era, which was a solid, you know, 40 years from 1950 to 1990. You know, it was all single processor performance, but then after that, it's parallel and then the the interconnect dominates.
Speaker 7:Yeah. I mean Well, the Cray era, quote, unquote, kind of ends at the memory wall. And, I mean, the guy sees ahead sort of to that, to the point where he starts innovating on things like DDR memory, because you just need better signal integrity for
Speaker 3:it. Yeah.
Speaker 7:But at the same time, even a grand challenge style supercomputer only gets to innovate on maybe 2, maybe 3 dimensions. Right? You're not gonna have a whole new processor and interconnect and silicon process of, good grief, gallium arsenide.
Speaker 1:Like Or you can do all that.
Speaker 7:Even with a $1,000,000,000 seed budget, you can't, with good conscience, innovate on all those different axes at the same time and think this isn't just gonna end catastrophically.
Speaker 1:Well, yes. So you can do all that. You're just gonna find yourself locked out of work one morning, and everyone is gonna be waiting around for a bus to come pick you up. So, yeah, I mean, you definitely, I agree with you, Matt, that you Well, they Sorry, Tom.
Speaker 3:These these these days, not even the governments can afford to do the
Speaker 1:That's right. Yes. I mean, the the and that's what makes these numbers seem otherworldly because it's just not something that would ever get funding now from anybody. A government, it's probably not from the private sector. But, Shahin, sorry, I think you were saying something about the about the importance of the the interconnect.
Speaker 1:Yeah.
Speaker 5:I was saying that I actually liken it to a glue and because it is the glue and I say that the viscosity of the glue is the issue. And if you have like a really watery glue, that's ethernet. And if you have like cement, that's SMP. And you got everything in between. And and, you know, ultimately, you want address coherency, bandwidth, as well as latency.
Speaker 5:But, you have to kind of clear one hurdle before you get to the next one. And right now, it's all as was said, all distributed memory, message passing, and that was made to work. And I think Seymour didn't think that was gonna work. He was really relying on the programming environment to be an issue. And I think that's why he kept now to their credit, they did do parallel systems.
Speaker 5:So they had like, I think the C90 was like 16 CPUs. And by the way, that was a 53 layer board, if I remember correctly, and that was in production. And they also had, like, some levels of 3D stacking that is now, like, what AMD announced today. They had, like, the equivalent of that between boards. There was this thing called, you know, zero insertion force, but then they also had these, like, cloth looking things that they would put in between boards.
Speaker 5:So when the boards would be inserted and then they would be pressed on top of each other, Signals would go vertically up and down. It was like a crazy system, but it was production. And it was 16 CPUs at the time with lots of memory.
Speaker 1:And it should One of the things Sorry. Go ahead, Courtney.
Speaker 9:So one of the things I I I wanted to kind of point out there, when talking about, like, the larger ecosystem problem, that Cray faced, especially, I I they were kind of while Seymour Cray was very, advanced in in trying to drive hardware technology innovations. Right? The the software side was a huge problem for them as times changed through the seventies eighties. Like, one of the things you had, you know, in the seventies, you'd you'd sell to a national lab or to, you know, a defense program. And they'd help you develop your compilers.
Speaker 9:They would write their own OS, and you tell the machine that you could you know, your next generation didn't have to be compatible or whatever. By the late eighties, that was that was becoming less acceptable. Right? You had to have a good compiler suite. Yeah.
Speaker 9:You expected to have, you know, an OS that would run on several generations of machines.
Speaker 3:Yeah. Even in the mid eighties, you know, people were starting to demand UNIX. That's part of the ETA story, is how they had to shift gears to support UNIX.
Speaker 1:And even today, I mean
Speaker 9:by 93, you had the end of the Cold War, or 91 rather, you know, you had the end of the Cold War and all of a sudden this funding dries up. Right? And even Cray, not Seymour Cray, but, you know, Cray Research, saw the need to do MPP stuff and came out with, you know, the T3D and the T3E, which were very massively parallel machines in the kind of way that you might recognize today, right, using, you know, commodity Alpha CPUs.
Speaker 1:Well, yeah. And, Courtney, you're tacking into a question that I actually would love to ask Shaheen because, Simeon, you had this question about what was the influence of Cray on Sun. And, Courtney, you're talking about these machines, of whether they're using, you know, more commodity silicon. Of course, the logical extreme of that was the Cray Business Systems Division, BSD, making the CS6400 out of SPARC CPUs. And, Shaheen, were you at Sun when that
Speaker 5:Floating Point Systems, which was actually one of the systems that the Bulgarians had cloned. And I think Bulgaria was like one of the hotbeds of technology for the Eastern Bloc. So Floating Point Systems was building attached processors, accelerators essentially. And then they had graduated to build UNIX systems with a vector attachment. And the CPU was based on an NCR chip in the old days, and it was like a, you know, so they decided that they needed a standard CPU and standard OS.
Speaker 5:So they cut a deal with Sun to use SPARC. And that was the beginning of it. And it was like it's, you know, it was a FPS model 500 and then model 500
Speaker 3:EX. And then then then then FPS bought Cray. Is that right?
Speaker 5:Or And then, no, Cray bought FPS. It bought FPS. Okay.
Speaker 1:So that's where Cray BSD comes
Speaker 5:from? That is correct. Yeah.
Speaker 1:Okay. So and then Cray? Yeah.
Speaker 3:And then SGI bought Cray and didn't want any SPARC stuff.
Speaker 5:And then SGI bought Cray, and of course, you know, Daryl Ram, who I think is I saw I think I saw him in the audience. He was on the SGI side of it. And and the history of it is really quite funny and well, you know, it is what it is, but it is also funny. And, because SGI really had no use for this system, and they should have just killed it. But they didn't for whatever reason, despite, you know, advice to the contrary.
Speaker 5:And they decided that they were so confident of the SGI Origin that was about to come out right around then, and it was a really nice ccNUMA system. And they sold it to Sun. So when I was at Cray, we were basically told that SGI is buying the company, and we have no use for you. So go sell your division to somebody. And we started shopping it around and we did a whole prospectus.
Speaker 5:And I remember in our prospectus, we said that whoever buys us, we think we can sell like 150 systems in our 1st year. And we sold like 200 systems in the Q1 at Sun.
Speaker 1:Yes.
Speaker 5:It was, like, incredible.
Speaker 1:Okay. And so, Shaheen, the the purchase price of Cray BSD by Sun?
Speaker 5:Well, I think rumors are varying between, like, you know, $15,000,000 and $19,000,000, and we had a whole inventory. And the idea was that if you sell the inventory, then you owe me another 10. And of course, the
Speaker 1:the price was not something that was well known inside of Sun. I had heard, actually, numbers even south of that, that it was below 10. But apparently but if you it was definitely this is the best acquisition in the history of the industry without question. It it it is certainly purchased for what? Less than 25.
Speaker 1:Right, Shaheen?
Speaker 5:And I think it was certainly less than 30. I think McNealy called it or somebody called it the best acquisition since Babe Ruth.
Speaker 1:I yeah. Hi. Well, hi, Daryl.
Speaker 4:Thanks thanks for mentioning me. I I so I'm Daryl Ram. I was on the SGI side. There was, there was a faction of SGI people that were violently opposed to the sale. And there was a faction of people that wanted just to to sell it.
Speaker 4:And the want to sell CBS seemed to really come from within Cray. So the Cray supercomputer guys, arguably, you know, didn't really see a future with the business systems and wanted to get rid of it. I found out about it by literally walking into the office one morning and finding my boss there, my general manager I worked for, and saying, what's going on? And he's like, well, they wanna sell CBS. He'd already been over to the CEO and president, pounding on the table, trying to undo the deal.
Speaker 4:And he failed, and I basically accused him of failing, and he said, well, you go and do better then. And I'm like, okay. Well, fuck it. I will.
Speaker 4:So I walked across to the corporate headquarters and met with our lawyer and with the president. And I just, on the spur of the moment, made up, you know, had a list of customers that we knew where Sun was running out of power on their high end servers, and we were doing an incredible job of competing with the Challenge line. And, you know, a couple of accounts already had, like, 10 or $20,000,000 of business. I said, why would you give this to Sun? And I think the response to me was, what do you want us to do? Just buy it and bury the SPARC stuff?
Speaker 4:I said, absolutely.
Speaker 1:That's exactly what I want you to do. I'll give you a shovel.
Speaker 4:Absolutely. And in parallel with that, you know, big secret, I was trying to hire Shaheen. We were trying to hire Shaheen personally, and we just did not want this deal to happen. It was the stupidest deal ever, and it just irritates me when everybody says SGI, you know, made a bad decision.
Speaker 4:There was a group of people inside SGI that fought like hell to stop it, but we failed.
Speaker 1:It wasn't the stupidest deal ever. It was the best deal ever. What are you talking about? It was the was there
Speaker 3:a Then then then there was SGI and Windows NT.
Speaker 1:Well, yeah. Exactly. Oh, no. Come on. That's not right to do this.
Speaker 1:Poor Daryl. I mean, that's
Speaker 4:the That's that's that's unfair to bring that up.
Speaker 1:That is unfair. Exactly. We said there no one is to mention Bob Bishop.
Speaker 4:But at that point in time, I mean, Sun had done so well with these product lines, but it was running out of steam at the high end. Yeah. Silicon Graphics had accidentally fallen on this incredible product, the Challenge line, which is just killing it at the high end. And to give it away, it made no sense at all. It was just insane.
Speaker 1:So that product, on the Sun side of that, that we bought for, again, south of 30, that did a billion two of revenue in its 1st year, I believe, Shaheen. That's it. It
Speaker 5:Oh, it was. I think that the story there was that I remember we launched a product, like, January 17th. And then it's, like, you know, April 5th or something, and Sun is getting ready for their quarterly report, the financial analyst report. And we had a meeting among, you know, the marketing team, and we decided that this was our tagline. So I call up Clark Masters, my boss, and I say, 100 days, 100 systems, $100,000,000.
Speaker 5:And it was like such an irresistible tagline that I think Ed used it on the strength to strength.
Speaker 1:It was. And when that product we that product was the right product at the right time. It was the right bet in exactly the right way at exactly the right time. It was hitting it's hitting the Internet. Internet is going supernova.
Speaker 1:People are unable this is long before distributed systems are really a a thing.
Speaker 3:And So what what year what year was that? 97?
Speaker 1:That's 97. That's That
Speaker 5:was 97. Yeah. So the other big thing, Brian, was SAP r 3. Absolutely. Everybody was moving SAP r 3, and it was a shared memory hog.
Speaker 1:Yeah. Absolutely. So this is and so, Adam, you know, we're, like, we're we're within nanometers here, the origin story of p trace because this is me working on I my birthday in December 3rd 2 1997 being I basically worked 24 hours that day because we had this SAP benchmark that we were doing for for GM. Yes. Do you remember this?
Speaker 1:Yes. I do. So this is this is itself is an incredible story. So we've got this thing has and I'm I am in the poor software group. And, you know, we, of course you know, we've got maybe 1 e ten k that's gonna maybe make its way to us at some point.
Speaker 1:I have basically never seen one of these machines, 64 gigs of RAM. How much money is that? That is so 64 gigs of RAM.
Speaker 2:How much money is that?
Speaker 1:That is so much money that finance was in on the calls because they needed to know when the machines could be released for revenue recognition. So this is like you are sitting on the quarterly numbers, basically, for a multibillion dollar company running a benchmark. And, Shaheen, I don't know. I'm I'm sure you were aware of this going on when it was happening. It was definitely a big deal when it was happening.
Speaker 5:The Oh, yeah. Yeah. You know, part of the thing is coming from Cray was that it was such a techy company that customer benchmarking and industry standard benchmarking reported into marketing.
Speaker 1:Right.
Speaker 5:Because everybody was like IT enough to handle it. So Carlyle's group was heavily involved with the SAP benchmark.
Speaker 1:Well and it was group 3. My team. Yeah. Right. It was awesome.
Speaker 1:They were awesome. So I was working with all those engineers. So we we're basically working hand in glove, and so I'm meeting them for the first time and really enjoying it. And what we and, actually, I don't know if you recall these details, but the this thing is running this benchmark, and it is just clearly on the way to be beating every record out there. This thing is gonna set this world record at, GM for this SAP benchmark.
Speaker 1:And then the great sadness would happen. And there would be, like, 3 minutes of profound sadness where the machine is miserable, and it is entirely unclear what's happening.
Speaker 5:Oh, I totally remember that.
Speaker 1:Okay. I do. And this is running on Solaris 2.6. And so the only tool that I've got is lockstat, which is actually hugely valuable. And so I'm writing custom kernel modules to instrument the kernel, and that machine would take 2 hours to boot.
Speaker 1:And when I would fuck up, which I definitely did over, you know, whatever the week was that we worked on it, all of a sudden, you know, the guy who was working at Cray is like, is there something wrong with the machine? You're like, oh, fuck. The machine just bounced, and now I've got, like, 2 hours to think about what I've done wrong. But that, Shaheen, was actually, so the root cause of that, I was convinced that the root cause of that was going to be, we knew that the network stack was going insane.
Speaker 1:And I'm like, there is a software bug in here. There's a bug in the operating system, obviously. I mean, like, the operating system is suddenly chasing itself. And there was a bug in the operating system to a degree, in that there was, like, an order n-cubed algorithm. But the actual problem was that, for some reason, the operating system had been turned into a router.
Speaker 1:And what was actually happening is, in the lab in which they were doing the test, there was another router, a Cisco router, that would bounce, that had its own firmware bug. This thing would basically reset. And when it reset, the E10K was misconfigured to act as a router. So it would volunteer to be like, oh, instead of being the world's fastest SAP machine, I could be the world's slowest router.
Speaker 2:This is this is true.
Speaker 3:Blame it on the networking guys.
Speaker 2:This is your liquid cooled supercomputer volunteering to be your Linksys.
Speaker 1:That's exactly right. That's exactly right. Like, I know how to route packets. It's like, no. No.
Speaker 1:No. We really okay. Great. I'm glad you know how to route packets. No.
Speaker 1:I can route packets very poorly. Please get out of my way. Because of course this is where you're getting into, like, all of these suboptimalities in the operating system, where it's, like, not really designed to be a router. And so, Shaheen, you know, that was the root cause of that. And to me, that whole experience was really chilling because I, again, had assumed that this was going to be an OS bug.
Speaker 1:And, actually, in the end, it was a misconfigured system. And you begin to realize, like, wait a minute. If this is a misconfigured system, and we took 2 weeks to debug that. And, Shaheen, you remember that was literally around the clock to make sure Absolutely. To make sure that Yeah.
Speaker 1:Somebody was working on that problem all the time because those machines were so valuable. You could not let them sit idle. And I remember thinking, like, no. Like, you could not have more resources on a problem than this one had. And, boy, where does that leave the poor, like, just person that can't summon the these incredible resources for 2 weeks?
Speaker 1:And you realize, like, we have got to have a better way of being able to answer these questions about the system. So there you go, Shaheen. That's the origin story of DTrace.
Speaker 5:That's that's wonderful. That's wonderful.
Speaker 3:I ask someone just to ask 2
Speaker 7:can can I just ask 2 hour wait time?
Speaker 1:Yeah. That's our joke. Why is that yeah. That so
Speaker 5:We didn't have certain things at the time that we did later on. I think that the file system was not it was FSCKing itself to craziness, if I'm remembering.
Speaker 1:Yep. You're right. It's Veritas. That was the other because Courtney, you're exactly right. And part of what made the early 2000s an exciting time to be at Sun is that a bunch of us had kind of looked at the state of the world with dissatisfaction.
Speaker 1:And one of the other things that was very dissatisfying, and you're right to latch onto it, is that fsck time. And that's when Jeff was like, we've gotta take a from scratch approach. Jeff Bonwick was like, we gotta take file systems from scratch. And that was the origin of ZFS.
Speaker 9:I I think that was Shaheen, but I I did have a question, also for you, Shaheen.
Speaker 1:You were
Speaker 9:there at Sun, so so you came in with the Cray acquisition there. At the same time, also around the Thinking Machines acquisition, is there any do you have any background on that? I I was always curious.
Speaker 5:No. Well, when we were at Cray, Thinking Machines was kind of a competition. Right? And our joke was that they had the best food in the computer industry.
Speaker 1:This was before Google it's
Speaker 5:before Google got the same, you know, the mantle. But they were obviously hot, and they were very, very smart people. So the Thinking Machines acquisition, Tom would remember it better, had happened before the Cray acquisition. And that's how Greg Papadopoulos came to the company and, you know, went on to be the CTO.
Speaker 3:I I I was long gone from Sun by then, I think. So
Speaker 5:Oh, were you? Ah, okay. Okay.
Speaker 1:And that's the origin story of Greg passing on Oxide. Yeah. The great
Speaker 5:Oh, I'm misguided.
Speaker 1:No. God bless Greg. Greg also funded Fishworks. Can't I he also did that on the website. So, you know, there you go.
Speaker 1:Mixed bag.
Speaker 9:They were certainly another example, I think, to Bryan's earlier point, of, you know, extremely smart people, but from a business standpoint, just kind of all over the place.
Speaker 1:Oh, totally. All over the place.
Speaker 5:That that was their reputation. Yeah.
Speaker 1:Well, in a
Speaker 3:sense If if if we can go back and talk about the people aspect of Seymour Cray, I I mean, it it it seems like he's the kind of guy you would love to work for, and he'd be impossible to work with.
Speaker 1:Yeah. That's interesting. So did you have exposure to Cray at Cray, or was he in Chippewa Falls also?
Speaker 5:No. He actually was already at Cray Computer by the time that the
Speaker 1:creation of the
Speaker 5:BS happened. And he was doing his gallium arsenide in Colorado Springs, was it?
Speaker 1:Yeah, Colorado Springs, yeah.
Speaker 5:Yeah. And of
Speaker 1:course, you know, when he
Speaker 5:passed it was giant news at Cray, and everybody was like because, you know, he actually stayed in the hospital for like a week before he passed away.
Speaker 1:Right.
Speaker 5:And everybody was saying if anybody can pull through, he will, Because he has that kind of a will. But he was surrounded by a couple of people who really made his stuff work. Yes. Like, Les Davis.
Speaker 1:Which is like
Speaker 5:one of the unsung heroes of supercomputing, and
Speaker 7:there were like 2
Speaker 5:or 3 others around him. You know, if you went to Chippewa Falls in that era, it was just an incredible place. Extremely smart people, like, doing nothing else but building this stuff. And I think those guys were the ones who made Seymour's designs
Speaker 1:he does, I think, leave you with the impression that Davis is every bit as important to these machines as as Cray is. In fact, they were a very Absolutely. It seems like they were they had a good like, they kinda needed one another, I feel. I feel like they did their see, they seem to have done their best work when they were working together.
Speaker 5:Definitely. Definitely. And now, Les stayed with Cray Research. Of course, another joke was that Cray Research builds computers and Cray Computer does research. Yeah.
Speaker 5:That's right.
Speaker 1:Well, and actually, it's funny, I didn't realize this until reading the book, that the origin of Cray Research was deliberately wanting to get out from under the thumb of what he viewed as kind of short term machines at CDC and kind of derivative machines at CDC and doing machines that were more speculative. And that's why they deliberately put research in the name when he kinda defected from but, yeah, that's that's a funny I certainly seem to ask Shaheen.
Speaker 5:That's right. Actually, I was at IBM when that happened and when I came to I mean, within IBM, there there were a whole bunch of people who were faulting IBM Management for passing on Cray because Cray had approached them to say, will you fund me will you fund me to do this? And they said no, so he'd gone and done his his
Speaker 1:own thing. Yeah. Interesting. And so what was it like? Because I did not realize that the business systems division inside of Cray had come from an acquisition.
Speaker 1:Of course, that makes a lot of sense. Because, I mean, I felt felt like that machine was so commercially minded. It had such a good idea of who the customer was. It must have been tough to I mean, that must have been a a real juxtaposition with certainly with Cray historically. I mean, it was
Speaker 5:Yeah. The motivation for that from what I understand is that the commercial people I mean, the story was people like Walmart were coming to Cray saying, we'd like a commercial supercomputer. Now, the Cray supercomputers, they were running they were running UNIX, sorry. And in fact, between I think outside of Sun, Cray was the only other vendor that actually owned their copy of UNIX that UNICOS was a very nicely implemented UNIX at the time, including, like, hierarchical storage management native to the OS. And and and they've done all of that, but it was real memory.
Speaker 5:There was no virtual memory. It was 64 bit addressable. It was a pain in the neck to port database packages. But anyway, so they basically said, okay, we're gonna go build a commercial supercomputer, and buying Floating Point Systems was an acceleration of that, because it was already running SPARC Solaris, we had all the catalog, let's go take that and then build like a 64-way system. And that was the Cray Superserver 6400, the initial 64-way system. And then Starfire, code name Crossfire, was the UltraSPARC version of it that actually was launched under Sun.
Speaker 1:Right. But that would have been developed under because I mean, certainly, it or feels like it launched very shortly after the acquisition. It feels like nothing was ready to go.
Speaker 5:Well, actually, David Yen's group and Clark's group were already working very nicely together even with the XDBus that came from Xerox before even the x eighties. I mean, the CS6400 was already a joint development with Sun.
Speaker 1:Right. Yeah. Right. Right. Right.
Speaker 1:Yeah. And the software group did have a bunch of Dragons. The XDBus was with the Scorpions and Dragons, the sun4d's, which I've got very fond memories of, Newton, Pod, and all those that we had in our lab. Adam, did you ever work on the sun4d's at all?
Speaker 2:I never worked on that. I think I had to, like, test out some wands on them, but, never did anything serious.
Speaker 1:I I part
Speaker 2:of my
Speaker 1:problem, I get too much emotional attachment. These kind of machines are part of my problem, but I I definitely definitely remember those machines fondly.
Speaker 2:All these pats.
Speaker 1:Pretty much. Exactly. Well, and those are the ones that Roger would always have me power cycle. So, Shaheen, that must have been I mean, certainly, from your perspective as well, that just felt like an incredible, like, I would say, merger of and mergers, they almost too sac on a word. I mean, it is kind of a confluence of what the future of commercial compute should be.
Speaker 1:And it was just a fun time. It just felt very explosive.
Speaker 5:Oh, it was huge, and it was emotionally extremely strong for everybody. I mean, when Starfire was launched, people were, like, crying.
Speaker 1:Yeah. Interesting.
Speaker 5:And and and, you know, for all the FPS people who'd, like, slugged it for, like, so many years, it was like coming home. It was like we finally managed to do this.
Speaker 1:It was a big deal. A lot
Speaker 5:of a lot of, like, sweat and tears and, like, emotion into that. Definitely.
Speaker 1:That's amazing. Because certainly from I can tell you from the software side, it felt great. Because, like, this is feels like this is such a vindication for the software vision. Because the software vision was really around I mean, I came to Sun because Sun believed I felt more fervently in SMP than any other company. The, maybe SGI except that SGI was the other one that also believed heavily in SMP.
Speaker 1:So it was a real confluence of visions in that regard.
Speaker 5:Truly was. Yes. And, you know, Janpieter gets a lot of credit for that because he was running Solaris at the time. And and and I'm sorry.
Speaker 4:I had a little little strange history thing too. I don't know if everyone knows, but SGI and Sun were almost one company at a point in
Speaker 1:time. Okay. You gotta tell me that story. I actually don't know that story. When was that?
Speaker 2:So in
Speaker 4:the very, very early days, there was actually a meeting at McNealy's apartment in Mountain View, where the founders all got together. And my understanding from folks that were there was that no one could decide who was gonna be CEO. So it never happened. But
Speaker 1:So this is Clark and McNealy getting together.
Speaker 3:Well, what what I heard was that Andy and Jim Clark didn't really like each other very much.
Speaker 4:Yeah. Yeah. Yeah. No. It was it was Jim and and yeah.
Speaker 4:Yeah. And, you know, it would have been interesting. The other interesting thing was the early SGI boxes were, you know, you know, licensed some, boards from Stanford. So, you know, there was a very incestuous relationship, obviously, between the companies. But it
Speaker 5:was Oh, I didn't know that.
Speaker 4:It was interesting, yeah. Inside an early terminal was actually an Andy design, a SUN board. SUN as in the Stanford boards. The companies were really close in the early days, and it's just amusing, you know, the history between the two of them.
Speaker 1:Yeah. And I feel that I came in at probably the height of the rivalry between the two companies. So I joined Sun in 1996. And, Daryl, I would love to get your perspective on it, but in many ways the acquisition of Cray by SGI is kind of the height of SGI hubris. Unless you disagree.
Speaker 4:It was for me, man. You should've seen how red I was in that meeting with the execs, trying to stop the sale. But there was a point just around then too when, internally at Silicon Graphics, something got up my nose. We had a corporate slide deck, presented in our wonderful Showcase 3D PowerPoint-like thing, and the second slide talked about how we had half the revenue of Sun but twice the market valuation, and therefore, you know, the customer is meant to take away something about how good
Speaker 3:we are.
Speaker 4:And I thought, that's the most disgusting slide to show customers. Like, you're overhyped as a company. But so many sales guys would take this stupid corporate slide and stick it in their slide decks. That particularly got up my nose. I made sure that in any presentation I was in, we did not use that slide. But no.
Speaker 4:Yeah. I think you're right. The Cray deal, to me, just killed it. And that's when I turned around and started to look at leaving Silicon Graphics.
Speaker 1:Yeah. Well, I I've always felt that, you know, before it was either Shaheen or Tom, I think it was Tom that cruelly mentioned, Rocket Rick Belluzzo and the, the the movement to to to Windows NT, which was such a tragedy to watch. And the because it felt like and that that to me felt like it was happening broadly in the industry. I mean, that to me was what kept me certainly at Sun was the belief in system software and not mortgaging our future to
Speaker 4:to Microsoft. No. That was a disaster. That was just after my time, but watching it as an outsider, it was just like, what the hell is Silicon Graphics doing? It's sort of committing suicide.
Speaker 4:But while I was still there, you know, we had the Origin series come out to replace the Challenges, which was a NUMA machine. There's a funny little anecdote with it: as John Mashey described it, it was one of the most scalable NUMA architectures packaged in the most unscalable packaging. The first-generation packaging was just terrible. But there's a history for why, because we were targeting the supercomputer market. So the economics only worked if you were buying 100 CPUs. If you wanted to buy a 16-CPU box, it didn't work.
Speaker 4:So that was, you know, sort of the second major failure after the sale of Cray: the transition to Origin. But the day that we announced all that stuff, we also announced the low-end O2 workstation, and its marketing message was all about being not NUMA. It was all about being UMA, you know, shared memory for the GPU and the thing.
Speaker 4:And, sorry, UMA, uniform memory access, and how shared memory is great for your GPU. That was literally their marketing message. And we came out with the Origin at the same time, and our message was NUMA is good. So you had the low end and the high end of
Speaker 1:the company. Why do you even got it now?
Speaker 4:UMA versus NUMA. And, you know, did they even coordinate the marketing message with each other? And that was just another final straw. I mean, by then these things were piling up on top of each other, and it's like, this is not a good thing for a company.
Speaker 1:Yeah. I mean, you mentioned the Challenge. Shaheen, do you remember the ads that SGI had? They had very good ads that they were running very briefly about why, I think, Netscape was running the Challenge. And they had this "Because They Rock" ad with the guy with the Nerf gun. And I remember thinking, like, man, that's good. It's good copy.
Speaker 1:And it felt like that was, I think it felt like SGI having a real strategic window, but then, well.
Speaker 5:Oh, they were great at that. You all would remember this ad about, I think it was, like, the Boeing 777.
Speaker 3:Oh, yeah.
Speaker 1:And they had,
Speaker 5:like, a big photo of this giant plane and underneath it was like this tiny box. And it says, here's like the greatest airplane in the world and here's the box it came
Speaker 4:in. Yeah. That was great. It was interesting. On the commercial server side we were not connected to corporate marketing at all, so we had this big disconnect, and that sort of got solved one day.
Speaker 4:Corporate marketing did an amazing ad, which was just a field of sheep. This was when we were deciding to get into early web servers, and it's hard to remember back to when computers had trouble running web servers. But the early work was with the low-end servers just running web servers on IRIX. And we were doing quite well in that market, and this ad was tremendous: a field of sheep trying to get through a gate. And the message was something like, the worst thing that can happen is that you build a popular website.
Speaker 4:And, it was just a really good visual ad. Anyhow, that started a good discussion where we got closer to corporate marketing. We're doing more stuff with them.
Speaker 5:Yeah. Yeah. So someone mentioned the T3D and T3E, and that was another big error at Cray. Because clearly MPPs were coming, and of course Cray was very loath to do clusters, because clusters were, like, not gonna work at all, but, you know, MPPs would. So there was the whole interconnect, and space sharing instead of time sharing, and all that OS work, and they kind of got the Mach microkernel and put that into UNICOS.
Speaker 5:And of course they needed a CPU, and they ended up going with Alpha. And that was another big... you know, I think they installed a few T3Ds and it became sort of the company. And I think they lost the IP with that when SGI acquired the company. And when they released Cray again, they basically released it pretty bare bones. They didn't have a whole lot of IP when that happened.
Speaker 6:So that's when it got acquired by Tera from SGI?
Speaker 5:That is correct. I think a few years after SGI acquired Cray, it decided that it needed to spin it out again. It kept all the juicy stuff, like the hierarchical storage management stuff. It was I forget
Speaker 1:what it's called now,
Speaker 5:a DMF, Data Migration Facility. They kept that, they kept all the patents, they kept all the everything. But they basically let go of all the vector systems and maybe them
Speaker 2:It was at this moment that the Twitter space ended suddenly. Brian was disconnected, and for some reason, that brought the Twitter space down. We restarted the Twitter space a few seconds later. We're missing a little bit of the conversation, but only a few seconds.
Speaker 1:Plenty of stories to tell because, I mean, obviously, he was very intimately involved, or knew that the acquisition was happening.
Speaker 5:He was very intimately involved and we were in violent agreements because from the you know, from the Sun side, from from the Cray side, I was surprised. The whole
Speaker 1:the whole the
Speaker 5:whole SGI Cray deal was a twilight zone deal. It made no sense.
Speaker 1:I mean or it made perfect sense. It made beautiful sense. It just made no sense for them to do it. Yeah. That's right.
Speaker 1:That's right. I mean, it was glorious. And I remember, god, I mean, just so many E10K stories. I remember the
Speaker 3:do you
Speaker 1:remember boo.com?
Speaker 5:I do not know.
Speaker 1:We don't know. Okay. Boo.com was maybe the first Adam, do you remember boo.com?
Speaker 2:Yeah. They were like they were like a online market, maybe like eBay or something.
Speaker 1:You're so close. Yeah. They were gonna sell streetwear online, and it was gonna be this very kind of rich Internet experience. The problem is that everyone was still on, like, 2400 baud modems.
Speaker 1:So it was, like, a little early. And so they were a super early flameout. The Benetton family funded them, and they flamed out in, like, early 2000. But, Shaheen, they had an E10K. They had an E10K with a single CPU board in it.
Speaker 1:And it's like, oh, man. And we got the sales guy. And actually, the then-CTO of Boo, when he realized that Benetton was getting cold feet and the family was gonna pull out, started his own LLC to buy the E10K from them, and then he sold it into the gray market. I mean, he made a bunch of money flipping E10Ks.
Speaker 1:I don't know how he would have done that in history, but he definitely did.
Speaker 3:For Oh,
Speaker 5:wow. Well, there was a time when he could do that because we were production limited for pretty much the entire history when I was there.
Speaker 1:And, yeah. I mean, I remember I mean, what they told us is that they were having to ramp up a 3rd shift to get that thing. I mean, they just like the manufacturing they were manufacturing it. It was a manufacturing bottleneck ultimately.
Speaker 5:It was. And then we started one in Newark, in addition to Hillsboro, Oregon. So we had, like, two manufacturing sites, and then eventually also one in Scotland. So it was, like, a really popular system. I think at the end of the day, it was something like over 5,000 units sold around the world.
Speaker 1:Man, that's amazing. That is amazing. And you remember, Jared at Enron had a bunch of them, Adam? I mean,
Speaker 2:they were Sure. Yeah.
Speaker 1:Which was great. The Shaheen, did you Jared Jensen at Enron? Did you talk to him at all?
Speaker 5:No. No. I basically between Clark and I, we split the geo, so he was doing US and I was was doing international because he didn't like traveling long distances.
Speaker 1:Well, then, so, yeah. Obviously Enron was a crooked company that exploited California's power market, but it also had some really sharp IT folks. And Jared was in IT at Enron with a bunch of E10Ks, and fully racked-out E10Ks. He was in our platinum beta program, and there was this great moment where, you know, he has got this southern accent, and he definitely enjoys laying it on thick before people know that he's a super sharp guy. And don't you feel that he turns up the accent too? Oh, yeah.
Speaker 2:He he starts fixing to do lots of things.
Speaker 1:Fixing, he's fixing to do everything. And, you know, "I don't know about y'all, but..." and everyone's thinking, like, we're all in IT because we are big Sun customers, but who's this yokel? And finally, they go around the room asking, you know, what are people actually running? "Well, we got, I don't know, five, six E10Ks, but I'm fixing to get two more."
Speaker 1:And this is at a time when people are like, I'm not allowed to, like, look at an E10K. And this guy has got, like, six of them. But it was a great machine, honestly. And, Shaheen, talk about the interconnect. I mean, that was a very impressive interconnect on the E10K.
Speaker 5:Yeah. So the CS6400 was four parallel buses that did address and data. And that was the XDBus, which could do one bus, two buses, or four. The one was the SPARCserver 1000, the two was, like, the SPARCcenter 2000, I think, if I'm not mistaken.
Speaker 1:Yep, that's right. Yeah.
Speaker 5:And then the four was the CS6400. By the time it got to UltraSPARC, they could do point-to-point communication. So Starfire had a 16-by-16 crossbar that connected every board to every other board for data, but addresses were still, like, a round-the-single-wire sort of a thing. And then with the one after that, the Sun Fire 15K or 20K, whatever it was, the address was also a crossbar, so point to point. Now, the point-to-point connection also allowed us to partition the machine.
Speaker 5:So we had hardware partitioning, because you could isolate a subset of boards and boot a copy of Solaris on it separately. And, you know, our joke was that the E10K was multiple versions of Solaris, otherwise known as MVS.
Speaker 1:Yeah. And so because the E10K did perform well, I gotta believe that there were Cray customers who were attracted to the economics of running MPI on an E10K. I mean, I assume that there were HPC customers for the E10K. Certainly, I dealt with a couple.
Speaker 5:We certainly did. I think, if I'm not mistaken, something like 10, 15% of the install base were actual HPC. And those were bigger memory systems. And it really was more of a memory thing than a CPU thing, if I remember correctly. And of course, you know, UltraSPARC was decent.
Speaker 5:It had the, you know, combined multiply-add instruction. But it wasn't, like, exactly a supercomputer.
Speaker 1:It was not exactly a super computer. I was gonna say like, yeah, you don't feel like you need to be too heavy.
Speaker 5:No. No. No. I know. Yeah.
Speaker 1:Yeah. Yeah.
Speaker 5:Yeah. Actually, speaking of that, back to Seymour: one of the uses obviously was, like, climate modeling, weather forecasting, but the spooks were using Cray supercomputers because they had a bit count, pop count instruction. And that instruction was specifically put in to facilitate whatever it is that
Speaker 1:they do.
Speaker 2:Okay. I've always wondered that, because I know the legend of POPC, the SPARC instruction. But what is it that the spooks could do with that?
Speaker 5:Presumably, being able to quickly count the number of bits in a word that are flipped is a useful thing, because you can do logical operations, like, 64 at a time, kind of a thing. That's as close as I've been able to understand it.
Speaker 1:Adam, I've always had the same kind of question. I'm like, I get that, like, POPC is used to, like, fund civil wars in El Salvador or something. Like, POPC, I get it. It's used for, like, espionage. I don't understand.
Speaker 2:Right. That that my understanding was that it right. It could enable untold evil and that don't worry about it because it's now emulated.
Speaker 1:Right. Right. And I think they ultimately did end up adding POPC to SPARC, I believe. I mean
Speaker 5:I believe so too.
Speaker 1:I yeah. They wasn't a big part. Special.
Speaker 2:Yeah. SPARC has POPC, but it trapped for every version of SPARC that I worked with.
Speaker 5:Oh, interesting. Yeah.
Speaker 1:And the spooks never complained, as far as Adam knows.
Speaker 2:As far as I know. They they weren't calling me.
Speaker 1:Well, you know, they wouldn't. They just follow you.
Speaker 5:Daryl, do you know do you know the pop count stuff? You might know.
Speaker 4:No. I don't. Sorry.
Speaker 9:I I know it's crypto related. Right? For it's about computing Hamming distance.
Speaker 1:I I you're right. I think it's yeah.
Speaker 7:It's generically for a lot of things. It's not specifically for just crypto. Right? But anytime you're dealing with bit sets or things like Hamming distance or like, there's a lot of uses that don't involve overthrowing small countries.
Speaker 2:So you say?
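As an illustration of the point about bit sets and Hamming distance, here is a minimal Python sketch. It is not how any Cray or SPARC code did it; the function names `popcount` and `hamming_distance` are made up for this example, and on real hardware the count would be a single instruction rather than a string operation.

```python
def popcount(x: int) -> int:
    """Count the set bits in a non-negative integer (what a pop count instruction does per word)."""
    return bin(x).count("1")


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ: XOR, then pop count."""
    return popcount(a ^ b)


# Two illustrative 8-bit words that differ in exactly two positions.
a = 0b10110010
b = 0b10010110
distance = hamming_distance(a, b)

# A bit set stored in one integer: bit i set means element i is present.
bitset = (1 << 3) | (1 << 17) | (1 << 63)
members = popcount(bitset)
```

Here `distance` is 2 and `members` is 3. The XOR compares all the bit positions of a word at once, which is the "64 at a time" idea mentioned earlier; the pop count then reduces the result to a single number.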
Speaker 1:So the the other question, so, Shaheen, I guess, you guys came to Cray, said, after Cray himself was at Cray Computer. So this is also I am now dying to know more about Steve Chen and Supercomputing Systems just because every dimension of that company is just so out of sight.
Speaker 5:So Steve Chen, you know, was the guy who really made the Cray Y-MP happen. The Cray X-MP, maybe, even. Because the Cray-1 was done by Seymour. I think on the Cray X-MP, Steve Chen was, like, the chief engineer, and on the Cray Y-MP too, if I'm not mistaken. And then I think after that he branched out, and IBM decided to fund him because they'd missed out on Seymour.
Speaker 5:And he went to Eau Claire, Wisconsin, not Chippewa, like, you know, a little bit farther out, and set up shop. I remember at the time, the idea was that he was in fact taking too many risks on too many dimensions, unlike Seymour. Was a system company because I talked to him when I was at Sun, and they were doing some startup that Greg wanted me to look at. And, I actually had breakfast with him, like, a few years ago because he's in the area and he's still, like, doing some cloud oriented thing. But he's obviously another, like, you know, hall of famer in this world.
Speaker 1:Yeah. And it'd be really interesting to talk with him and get his perspective on it all. Because, yeah, that's certainly Murray's take in The Supermen. I mean, Adam, that's your 78-layer board, and it definitely feels like they were just pushing things too hard in too many directions.
Speaker 1:And then they're also, like, rewriting all of the software from scratch, which is not a recipe for shipping on time. And, Shaheen, there's a line that... actually, Adam, may I do an out-loud reading from The Supermen? Do you mind if
Speaker 7:I No.
Speaker 2:By all means, I encourage it. Yeah.
Speaker 10:Alright. So this is after... Can I throw in one comment before we get too far ahead? I just wanted to add that when you're encrypting something, it looks indistinguishable from random noise. And when you decrypt it, you are almost always going to get a significant entropy drop. So almost no matter what the encryption method is or whatever encryption you're breaking, you know you've won because you did a pop count over the output and said, oh, look, the entropy has changed dramatically from the entropy before.
Speaker 10:I actually found the right key and now I'm happy. So if I'm wasting a lot of cycles on pop count, I'm going to be able to grade a lot fewer keys that I've attempted pretty much no matter what I'm trying to decrypt.
Speaker 1:Well, there's your answer, fishbowl.
Speaker 10:So pop count is really great for the, like, Everything just got better and entropy dropped. And now all your secrets are belong to us.
Speaker 5:Way cool.
Speaker 1:There you go. Well, that makes sense.
Speaker 10:And you know, you gotta check a lot.
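The key-grading idea can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the toy stream cipher (SHA-256 in counter mode with a one-byte key), the ASCII plaintext, and the names `keystream`, `xor_crypt`, `high_bit_popcount`, and `grade_keys`. It also uses one crude stand-in for "the entropy dropped": ASCII text never sets the top bit of a byte, so a pop count over just those bit positions is zero for real plaintext and roughly half the byte count for noise.

```python
import hashlib


def keystream(key: int, n: int) -> bytes:
    """Hypothetical toy keystream (SHA-256 in counter mode); not a real cipher design."""
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(bytes([key]) + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:n]


def xor_crypt(key: int, data: bytes) -> bytes:
    """Encrypt or decrypt by XORing the data with the keystream for this key."""
    return bytes(d ^ k for d, k in zip(data, keystream(key, len(data))))


def high_bit_popcount(data: bytes) -> int:
    """Pop count restricted to the top bit of every byte."""
    mask = int.from_bytes(b"\x80" * len(data), "big")
    return bin(int.from_bytes(data, "big") & mask).count("1")


def grade_keys(ciphertext: bytes) -> int:
    """Try every one-byte key and keep the one whose output looks least like noise."""
    return min(range(256), key=lambda k: high_bit_popcount(xor_crypt(k, ciphertext)))


secret_key = 0x5A
plaintext = b"Pop count makes it cheap to grade a very large number of trial decryptions."
ciphertext = xor_crypt(secret_key, plaintext)
recovered = grade_keys(ciphertext)
```

With a wrong key the trial output is effectively random, so about half of its top bits are set; with the right key none are. The grading step is one AND and one pop count per buffer, which is why a slow pop count would cut directly into the number of keys that can be checked.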
Speaker 1:Right. Exactly. Right. Exactly. Right.
Speaker 1:By the way, we're gonna be checking that a lot. So thank you very much, Aaron. There you go. That answers that question. So, Adam, this is after... as with all these companies, Supercomputer Systems fails, and they are all locked out of their offices.
Speaker 1:Was it actually a frigid morning in January? I mean, it's a morning in January; it's described as frigid, and it probably was. Anyway, I'm reading from the book now. Two months later, one of the company's engineers was driving the backwoods of Wisconsin, miles from Eau Claire, when he spotted a familiar object: the SS-1.
Speaker 1:There, on the grounds of a small farm nestled in the northern Wisconsin forest, sat the machine's outer frame. He slammed on the brakes, veered the car to the side of the road, and jumped out. Examining the machine's skin, he spotted boxes containing more parts from the SS-1. He gently ran his hand over the parts. His heart sank.
Speaker 1:Whatever hope they'd had for resurrecting the machine was now gone. He knew there was no turning back. The SS-1 had been sold for scrap. Which is like, oh, man. But I think it also goes to, you know, Shaheen, what you were mentioning earlier about the folks at Floating Point Systems feeling that sense of exhilaration, watching their machine actually come all the way to market, and into a market that was ready for it, hitting everything right. I mean, you can see why people were in tears when their machine actually lands, because, you know, that's the alternative. The alternative is you're driving a backcountry road and your machine is on a farm somewhere, about to be turned into, you know, a tractor.
Speaker 5:Oh, heartbreaking. Absolutely heartbreaking.
Speaker 1:I am. Heartbreaking. Just like the way it's described to you, like, running his hands over the parts. It's like, oh, man. I'm right there with him anyway.
Speaker 5:Yeah. Yeah. Totally.
Speaker 1:Well, we've been wanting to keep these to about an hour. I know we went over here. Adam, my apologies to your toddler who I know gets I keep waiting for him to, like, join his space under his own account and tell you to, like, you know, feed him dinner. But this is great. Shaheen, Daryl, thank you so much.
Speaker 1:And everyone at Courtney, thank you. Jason, Simeon, everyone, thank you very much. That was a lot of fun and just an incredible for the the folks that actually lived this history to be able to to share it with us.
Speaker 6:Can we tell one last, anecdote?
Speaker 1:Do it. Yeah. But yeah. I would take that live.
Speaker 6:The machine that won the Gordon Bell Price/Performance Prize in 2000, along with the people, was called Bunyip, while the people who actually put it together and were running the software were elsewhere. It was in the computer science department at the ANU in Canberra, and it was 384 Pentium III 550 processors. This is kind of supercomputing as working an angle rather than supercomputing as massive: there were these Pentium III 550s, which were going really cheap, at a really good price point for single-precision SSE, so 32-bit floating point. And they managed to convince the judges that this problem was genuine, and I've got some issues with that. But the particulars of this story are that while this thing was running its, you know, big benchmark, and it's actually 192 nodes, with a fancy network architecture which turned out to be irrelevant, machines kept blowing up.
Speaker 6:The capacitors would actually go pop off the board, and I was running these machines up to the local vendor who had collaborated with us, and who thought that somehow they were involved in supercomputing because they just had a big supply of these PPOX dual-processor boards. It turned out in the end that the things that were dying were the Pentium III 550s based on the old discrete cache, which had the large-geometry processor and were drawing a heap more current. The Pentium III 550s that were on Coppermine, with the 256K on-die cache and drawing much less power, weren't degrading the capacitors on these boards as fast, because this was during the capacitor plague. And so, basically, we just had this cusp where half of the machine was bought with old Pentium IIIs and half of it was bought with new Pentium IIIs, and half of it just broke constantly during this mammoth run to, you know, produce these results that apparently meant that the computer science department was heralding a new age of supercomputing. But it was really Virginia Tech, only on a smaller scale. If you remember Virginia Tech.
Speaker 1:There you go. Yeah. Right. See, the the the consequences of I I love the capacitor plague. I wanna get more details on the capacitor plague.
Speaker 6:But, Virginia well, it was a stolen recipe from a Japanese capacitor electrolytic capacitor, manufacturer turned up in, I guess, Taiwanese capacitor manufacturer at that point. I don't think China was kind of there yet, but it had a missing component, which actually was part of the electrolyte that stopped it from developing gas. And so these things, when you had large, ripple currents through them, they would get hot, and then the electrolyte would start to outgas. And then those bulges that you see in the top of old electrolytic capacitors, that's the sign that they want to go kaboom, and these ones would blow themselves all the way across the the case and sort of land on them. Yeah.
Speaker 6:So so and and, you know, so it was very it was very comical, except I was writing them down and basically just swapping the organs of this machine, trying to keep them up running, that they could do this. Where was I going with all of this? Virginia Tech, you remember they had a whole bunch of, g 5,
Speaker 2:rack. Apples.
Speaker 5:Apple racks. Right?
Speaker 6:Yeah. Yeah. Apple g hotbed.
Speaker 5:Yeah. Apple absolutely quashed that. They wouldn't let anybody else do it.
Speaker 6:That's right. Because they didn't have ECC. Someone worked out that, basically, it was just a Linpack demo machine, and they had just, like, a some incredibly anemic file server at the core of it. So in terms of its ability to do real work, they just took it took it apart and replaced it with ECC G5 rack systems later on, and, you know, that was what they actually moved into production with. So it was just basically, here's $10,000,000.
Speaker 6:Get us on the top 500, and they got to number 3, but it didn't really mean very much. Anyway, yeah, Bunyip was a bit like that.
Speaker 1:There you go. Alright. Well, exploding computers is as good a way to end as any. Alright. Well, thanks again, everyone. Yeah.
Speaker 7:Now now just speaking of exploding computers and racks left in the middle of cornfields, how's the oxide rack bring up going?
Speaker 1:Exactly. There you go. Well well, we'll, we'll we'll tell you next time. No. We have actually blown none of them up exactly.
Speaker 1:We did have we we've had some excitement. No fires, but it's been it's going well. We're having fun. So, we'll we'll we can talk more about that next time. But, no no bad caps.
Speaker 1:And, unfortunately, I'm not a victim of the capacitor plague, the great capacitor plague. So alright. Thanks, everybody. We'll talk to you next week.
Speaker 5:Thank you, guys. Thanks a lot.
Speaker 1:A lot
Speaker 8:of fun.
Speaker 1:You bet. Bye.
Speaker 2:Thanks for thanks for joining, guys. Thank you.