NetWorker Blog

Commentary from a long-term NetWorker consultant and Backup Theorist


Sidekick is dead – lessons in data loss

Posted by Preston on 2009-10-12

The net has been rife with reports of an extreme data loss event occurring at Microsoft/Danger/T-Mobile for the Sidekick service over the weekend.

As a backup professional, this doesn’t disappoint me, it doesn’t gall me – it makes me furious on behalf of the affected users that companies would continue to take such a cavalier attitude towards enterprise data protection.

This doesn’t represent just a failure to have a backup in place (which in and of itself is more than sufficient for significant condemnation), but a lack of professionalism in the processes. There should be some serious head kicking going on over this, most notably around the following sorts of questions:

  • Why wasn’t there a backup?
  • Where was the change control that should have prevented the work from going ahead when no backup was available?
  • Why wasn’t the system able to handle the failure of a single array?
  • When will the class action law suits start to roll in?

I don’t buy into any nonsense that maybe the backup couldn’t be done because of the amount of data and the time required to do it. That’s just a fanciful workgroup take on what should be a straightforward enterprise level of data backup. Not only that, the system was obviously not designed for redundancy at all … I’ve got (relatively, compared to MS, T-Mobile, etc) small customers using array replication so that if a SAN fails they can at least fall back to a broken-off replica. Furthermore, this raises the question: for such a service, why weren’t they running a properly isolated DR site? Restoring access to data should have been as simple as altering the paths to a snapped-off replica on an alternate, non-upgraded array.
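To make that fallback idea concrete, here is a minimal, purely illustrative Python sketch of the “snap off a replica before maintenance” approach. Every name in it (Array, split_replica, run_upgrade, service_paths) is hypothetical; in practice this is driven through the array vendor’s own replication and management tooling, not ad hoc code. It simply shows the shape of the process: split the replica first, attempt the upgrade, and if the upgrade fails, repoint the service at the untouched replica.

    from dataclasses import dataclass


    @dataclass
    class Array:
        """A stand-in for a storage array presenting data to the service."""
        name: str


    def split_replica(primary: Array) -> Array:
        # Break off a point-in-time replica onto a separate array *before*
        # any maintenance starts on the primary.
        return Array(name=primary.name + "-replica")


    def run_upgrade(primary: Array) -> bool:
        # Stand-in for the vendor's upgrade procedure; returns False on failure.
        # Simulated here as failing, so the fallback path below is exercised.
        return False


    def maintain_with_fallback(primary: Array, service_paths: dict) -> dict:
        replica = split_replica(primary)      # 1. snap off the replica first
        if not run_upgrade(primary):          # 2. attempt the upgrade
            # 3. On failure, restoring access is just repointing the service
            #    at the non-upgraded replica: only changes made after the
            #    split are lost, not the entire dataset.
            service_paths["data"] = replica.name
        return service_paths


    print(maintain_with_fallback(Array("primary-san"), {"data": "primary-san"}))
    # {'data': 'primary-san-replica'} when the upgrade fails

The point of the sketch is the ordering, not the code: the replica exists before anyone touches the primary, so a botched upgrade costs minutes of repointing rather than the users’ data.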

This points to an utterly untrustworthy system – at the absolute best it smacks of a system where bean counters have prohibited the use of appropriate data protection and redundancy technologies for the scope of the services being provided. At worst, it smacks of an ineptly designed system, an ineptly designed set of maintenance procedures, an inept appreciation of enterprise data protection strategies, and perhaps even a level of contempt for users’ data.

(For any vendor that would wish to crow, based on the reports, that it was a Hitachi SAN being upgraded by Hitachi staff and therefore it’s a Hitachi problem: pull your heads in – SANs can fail, particularly during upgrade processes where human errors can creep in, and since every vendor continues to employ humans, they’re all susceptible to such catastrophic failures.)


2 Responses to “Sidekick is dead – lessons in data loss”

  1. There are actually people who argue that data shouldn’t be backed up because there’s too much of it? How can someone even say that with a straight face?

    And yeah, the Hitachi SAN failed. Big deal. It could have been EMC, or Sun or any other storage solution. As you asked, why wasn’t there another?

    The real tragedy here is that this completely non-trivial service was allowed to run on rubber bands and duct tape.

    • Preston said

      Some sites do have enough data that planning/configuring/budgeting for backups is far more challenging – as an example, given the volumes of data that the Large Hadron Collider is meant to be able to produce when it’s running, I’d like to know that I had a bucket load of budget in order to configure a backup environment for it.

      However, in this case, there’s no excuse. The volume of data that would have needed to be backed up would have been fairly average, if not on the small side, compared to the typical large corporate. So it comes back to the “no excuse” category.

      Having seen two major failures in such a short period of time (Sidekick, and the IBM debacle for Air New Zealand), I’ve decided to start a Hall of Shame page that will cover stunning ineptness at data protection from companies that should know better…
