This month’s T-SQL Tuesday is being hosted by @AirborneGeek – thanks Kerry!
Introduction and Backtrack on Original Post
I had originally wanted to talk about preventable deaths and injuries caused by school buses NOT having underride guards installed at the back of the bus. Some background: it was turning into a rant about why we still haven’t mandated underride guards on school buses, given the high-profile death of Jayne Mansfield in 1966 – 54 years ago. It was her death and the NHTSA recommendation that followed which forced the trucking industry to implement the ICC bar, also known as a Mansfield bar. Seeing the graphic images of modern-day examples, and how nothing has been done about it, just wasn’t helping me deal with my own personal experiences of seeing this firsthand. So I decided to switch gears and talk about how doing nothing is a valid option…
“Let it Fail”
Many years ago, I worked at a place which hosted its own datacenter in the building. We used high-speed, fiber-connected direct-attached storage on a database server – two of them, identically set up. I still remember their names: Ebony and Ivory, running SQL2000. We used log shipping to keep them in sync. This was a very high traffic OLTP system.
But we were running out of disk space. We were going to upgrade to SQL2008 connected to the SAN and get rid of the direct attached storage but we weren’t there yet. It was going to be a side-by-side migration.
Due to the retention and security requirements of a PCI system, we couldn’t just back up and then copy things over to the SAN to free up space. Plus they were upgrading the SAN too – we didn’t have the space yet. And we had a building and datacenter move on the horizon, so upper management was hesitant to fiddle with anything until after the move was complete.
The log shipping was a beast. It would take over a day to re-sync: restore from backup (the backup alone takes several hours), then apply all of the t-logs taken since the backup, plus all of the t-logs getting generated while the restore is happening. Then this crazy dance to get it to match at the very end – I’m talking a window of only a few minutes. And if any of this fails, guess what? You get to start ALL over again. And delete old files to free up space for the next attempt. And be sure you don’t run out of disk space during the whole activity – that’s how little space we had to work with.
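For anyone who hasn’t lived through it, the re-sync sequence looks roughly like this in T-SQL. This is only a sketch – the database name, file paths, and file names are hypothetical, not from the actual system:

```sql
-- 1. Restore the full backup on the secondary, leaving it unrecovered
--    so further log backups can be applied. (This step alone took hours.)
RESTORE DATABASE SalesDB
    FROM DISK = N'\\backupshare\SalesDB_full.bak'
    WITH NORECOVERY, REPLACE;

-- 2. Apply every transaction log backup taken since the full backup,
--    in order -- including the ones still being generated while the
--    restore above was running.
RESTORE LOG SalesDB
    FROM DISK = N'\\backupshare\SalesDB_log_0001.trn'
    WITH NORECOVERY;
-- ...repeat for each subsequent .trn file...

-- 3. The "crazy dance" at the end: catch up to the most recent log
--    backup inside that few-minute window, then leave the secondary in
--    a restoring state so log shipping can resume applying new logs.
RESTORE LOG SalesDB
    FROM DISK = N'\\backupshare\SalesDB_log_final.trn'
    WITH NORECOVERY;
```

If any `RESTORE LOG` in step 2 or 3 fails – or the disk fills up mid-restore – the chain is broken and you are back to step 1.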
It was during one of these finger-in-the-dike episodes, wrestling with the lack of disk space, that my boss said,
“Todd, let it fail.”
And I’m like, what? Every fiber of my professional being screamed “NO”. And I’m like, “How can you say that?”
“Do you want to keep putting your finger in the dike and putting the business at risk because our DR environment is not synchronized? We need more space, and letting it fail (can’t sync) is going to force upper management to get us the space we need.”
Of course I didn’t get his order in writing, but I did it – I let it fail. True to his word, upper management freaked out and said: we have to fix this immediately, we must have DR, and what do you need to fix this problem ASAP?
So they bought larger disks (with much better I/O, too), and since the volumes were set up as RAID10, all we had to do was slowly replace each disk in the arrays, one by one, letting the array rebuild onto each new drive.
Afterwards, re-syncing Ebony and Ivory was much faster, and we never had to fiddle with disk space again on that system before the SQL2008 upgrade.
I think people forget that “doing nothing” is a valid option. Oftentimes we have to go to extremes to force the issue, and honestly, I didn’t know whether it would turn into a career-limiting move. It was a gamble and it paid off. I never had to waste any more time dealing with disk space issues on those systems, or feel that sinking feeling of not having a DR system ready.