what else can go wrong.
Seems to be my week for hardware problems. I've already fixed 2 servers (32+ CPUs and over a terabyte of RAM between them), and I'm getting ready to work on a 3rd that's starting to throw hardware events.
So, 1 server in particular. Crashed & burned early on Tuesday morning. 3rd-party maintenance company figures 2 DIMMs and a CPU. No problem, when will you be out to fix? Sorry, we're not stocking parts for that system in the Twin Cities any more (turns out my client has the last chassis of that sort here). Took until Friday morning to get DIMMs and CPU on site (that's also when I found the 3rd system with issues...). And it turns out that those aren't sufficient to fix the problem; a cell board also needs to be replaced. Amazingly enough, they had one available as of this morning. The fine gent who was sent out with it HAD NEVER TOUCHED A SYSTEM LIKE THIS BEFORE. I showed him the chassis, verified it was powered down, and said, "It's all yours, let me know if you need me to do something." I watched him out of the corner of my eye for about 5 minutes before I approached him. "I'm not trying to be offensive, but you've never touched one of these before, have you?" "Nope." "Okay, we're doin' this together." Side note: I've been working with this hardware line since about 1992, and that includes hardware that was ancient even then. Not my first rodeo. After shuffling parts and running diags, it's up and running.
No, I am NOT asking what else can go wrong. I still have to be up to move workloads off the next problematic host before it burns, falls over and sinks in the swamp. I really don't need more "opportunities."