NetWorker Blog

Commentary from a long term NetWorker consultant and Backup Theorist

  • This blog has moved!

    This blog has now moved to nsrd.info/blog. Please jump across to the new site for the latest articles (and all old archived articles).
  •  


     


     

  • Enterprise Systems Backup and Recovery

    If you find this blog interesting, and either have an interest in or work in data protection/backup and recovery environments, you should check out my book, Enterprise Systems Backup and Recovery: A Corporate Insurance Policy. Designed for system administrators and managers alike, it focuses on features, policies, procedures and the human element to ensuring that your company has a suitable and working backup system rather than just a bunch of copies made by unrelated software, hardware and processes.
  • Advertisements
  • This blog has moved!

    This blog has now moved to nsrd.info/blog. Please jump across to the new site for the latest articles (and all old archived articles).
  •  


     


     

  • Twitter

    Error: Twitter did not respond. Please wait a few minutes and refresh this page.

Posts Tagged ‘Disaster Recovery’

How much aren’t you backing up?

Posted by Preston on 2009-10-05

Do you have a clear picture of everything that you’re not backing up? For many sites, the answer is not as clear cut as they may think.

It’s easy to quantify the simple stuff – QA or test servers/environments that literally aren’t configured within the backup environment.

It’s also relatively easy to quantify the more esoteric things within a datacentre – PABXs, switch configurations, etc. (Though in a well run backup environment, there’s no reason why you can’t configure scripts that, as part of the backup process, logs onto such devices and retrieves the configuration, etc.)

It should also be very, very easy to quantify what data on any individual system that you’re not backing up – e.g., knowing that for fileservers you may be backing up everything except for files that have a “.mp3” extension.

What most sites find difficult to quantify is the quasi-backup situations – files and/or data that they are backing up, but which is useless in a recovery scenario. Now, many readers of that last sentence will probably think of one of the more immediate examples: live database files that are being “accidentally” picked up in the filesystem backup (even if they’re being backed up elsewhere, by a module). Yes, such a backup does fall into this category, but there are other types of backups which are even less likely to be considered.

I’m talking about information that you only need during a disaster recovery – or worse, a site disaster recovery. Let’s consider an average Unix (or Linux) system. (Windows is no different, I just want to give some command line details here.) If a physical server goes up in smoke, and a new one has to be built, there’s a couple of things that have to be considered pre-recovery:

  • What was the partition layout?
  • What disks were configured in what styles of RAID layout?

In an average backup environment, this sort of information isn’t preserved. Sure, if you’ve got say, HomeBase licenses (taking the EMC approach), or using some other sort of bare metal recovery system, and that system supports your exact environment*, then you may find that such information is preserved and is available.

But what about the high percentage of cases where it’s not?

This is where the backup process needs to be configured/extended to support generation of system or disaster recovery information. It’s all very good for instance, for a Linux machine to say that you can just recover “/etc/fstab”, but what if you can’t remember the size of the partitions referenced by that file system table? Or, what if you aren’t there to remember what the size of the partitions were? (Memory is a wonderful yet entirely fallible and human-dependent process. Disaster recovery situations shouldn’t be bound by what we can or can’t remember about the systems, and so we have to gather all the information required to support disaster recovery.)

On a running system, there’s all sorts of tools available to gather this sort of information, but when the system isn’t running, we can’t run the tools, so we need to run them in advance, either as part of the backup process or as a scheduled, checked-upon function. (My preference is to incorporate it into the backup process.)

For instance, consider that Linux scenario – we can quickly assemble the details of all partition sizes on a system with one simple command – e.g.:

[root@nox ~]# fdisk -l

Disk /dev/sda: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

 Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1        2089    16779861   fd  Linux raid autodetect
/dev/sda2            2090        2220     1052257+  82  Linux swap / Solaris
/dev/sda3            2221       19457   138456202+  fd  Linux raid autodetect
/dev/sda4           19458      121601   820471680    5  Extended
/dev/sda5           19458       19701     1959898+  82  Linux swap / Solaris
/dev/sda6           19702      121601   818511718+  fd  Linux raid autodetect

Disk /dev/sdb: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

 Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1         250     2008093+  82  Linux swap / Solaris
/dev/sdb2             251      121601   974751907+  83  Linux

Disk /dev/sdc: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

 Device Boot      Start         End      Blocks   Id  System
/dev/sdc1               1      121601   976760001   83  Linux

Disk /dev/sdd: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

 Device Boot      Start         End      Blocks   Id  System
/dev/sdd1   *           1        2089    16779861   fd  Linux raid autodetect
/dev/sdd2            2090        2220     1052257+  82  Linux swap / Solaris
/dev/sdd3            2221       19457   138456202+  fd  Linux raid autodetect
/dev/sdd4           19458      121601   820471680    5  Extended
/dev/sdd5           19458       19701     1959898+  82  Linux swap / Solaris
/dev/sdd6           19702      121601   818511718+  fd  Linux raid autodetect

That wasn’t entirely hard. Scripting that to occur at the start of the backup process isn’t difficult either. For systems that have RAID, there’s another, equally simple command to extract RAID layouts as well – again, for Linux:

[root@nox ~]# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sda3[0] sdd3[1]
 138456128 blocks [2/2] [UU]

md2 : active raid1 sda6[0] sdd6[1]
 818511616 blocks [2/2] [UU]

md0 : active raid1 sda1[0] sdd1[1]
 16779776 blocks [2/2] [UU]

unused devices: <none>

I don’t want to consume realms of pages discussing what, for each operating system you should be gathering. The average system administrator for any individual platform should, with a cup of coffee (or other preferred beverage) in hand, should be able to sit down and in under 10 minutes jot down the sorts of information that would need to be gathered in advance of a disaster to assist in the total system rebuild of an operating system of a machine they administer.

Once these information gathering steps have been determined, they can be inserted into the backup process as a pre-backup command. (In NetWorker parlance, this would be via a savepnpc “pre” script. Other backup products will equally feature such options.) Once the information is gathered, a copy should be kept on the backup server as well as in an offsite location. (I’ll give you a useful cloud backup function now: it’s called Google Mail. Great for offsiting bootstraps and system configuration details.)

When it comes to disaster recovery, such information can take the guess work or reliance on memory out of the equation, allowing a system or backup administrator in any (potentially sleep-deprived) state, with any level of knowledge about the system in question, to conduct the recovery with a much higher degree of certainty.


* Due to what they offer to do, bare metal recovery (BMR) products tend to be highly specific in which operating system variants, etc., they support. In my experience a significantly higher number of sites don’t use BMR than do.

Advertisements

Posted in Architecture, Backup theory, Linux, NetWorker, Policies, Scripting | Tagged: , , , , | 2 Comments »

Aside – My take on Time Machine

Posted by Preston on 2009-03-06

Introduction

Being one of those freaky weird IT people who are passionate about backups*, when Apple first previewed Mac OS X 10.5 (aka Leopard), the number one thing I of course got excited about was Time Machine. Now, before anyone tells me that it’s “just a poor rip-off of VSS”, let me be blunt – analysts who started that talk have no clue what they’re talking about.

Yes, VSS is great on Windows systems – in fact, its great to see that standard VSS functionality has reached a point in NetWorker 7.5 that it’s just part of the Windows client for filesystem backups, rather than requiring additional licenses.

But VSS in itself is not in the same league as Time Machine for end user backup – and more importantly, recovery – and quite frankly, that’s more important when we’re talking about non-server backup systems.

Evaluating it as an end-user backup system

If you’re not fully across Time Machine, here’s how it works:

  1. You plug a new or otherwise unused hard drive into your Mac.
  2. The OS asks you if you want to use that drive for Time Machine backups.
  3. You answer Yes**.

That’s all there is to getting basic Time Machine backups running. At that point, Time Machine does a full backup, then from that point onwards does incremental backups making use of hard links, thus making very efficient use of space. Backups are taken every hour, and it manages backups such that:

  • Hourly backups are kept for 24 hours.
  • Daily backups are kept for a month.
  • Weekly backups are kept until the disk becomes full.

All pruning of space is automatically handled by the OS. For the system volume at least, Time Machine is an exclusive backup product – it backs up everything by default, and you have to explicitly tell it what you want excluded from the backup. This is a Really Good Thing. However, you can go into preferences and exclude other regions (e.g., I have a “DNB” (Do Not Backup) folder on my desktop that I drop stuff into for temporary storage), or explicitly include other drives attached to the system.

Overall the settings for Time Machine are simple – very simple:

Main preferences for Time Machine

Main preferences for Time Machine

The Options button is what allows you to manage exclusions for your backups:

Options pane for Time Machine

Options pane for Time Machine

To be honest though, who cares about backup? Desktop backup products abound, and in reality what we care about is whether you can recover. Indeed, for desktop products what we care most about is whether our parents, or our grandparents, or those people down the street who ask us for technical support simply because we’re in IT, can recover. Boy, can you recover.

Time Machine presents a visually beautiful way of browsing the backups. Unfortunately we won’t see it appear in other backup products because, well, according to Steve Jobs when it was first introduced, Apple took out a lot of patents on it***. The standard recovery browser will look like the following:

Time Machine Browsing Files

Time Machine Browsing Files

Equally importantly though, Time Machine isn’t just about facilitating file level recoveries, but also recoveries of other data that it understands – such as say, mail. Now, yes, enlightened readers will point out that Apple’s Mail.app program stores mail in files and thus is easily browseable, but the files aren’t named in such a way that say, my father could work out which file needs to be recovered.

Here’s an example of what Time Machine looks like when browsing for recovery of mail:

Browsing mail with Time Machine

Browsing mail with Time Machine

To browse and retrieve email, the user simply browses through the folder structure – and the time of the backups – to pick the email(s) to be recovered. It’s incredibly intuitive, and takes less than 5 minutes to learn for the average user. As an enterprise backup consultant, honestly, I almost cried when I saw this and thought about how much of a pain message level recovery has been for so long. (Yes, getting better now, and has been for a while.)

Browsing back in time is straight forward – just scroll the mouse over the time bar on the right hand side of the screen and select the date you want:

Selecting alternate recovery time

Selecting alternate recovery time

This, quite honestly, is the epitome of simplicity. Going beyond standard backup and recovery operations, Time Machine is also an excellent disaster recovery tool – if you have serious enough issues that you need to rebuild your machine, the Mac OS X installer actually has the option of doing a rebuild and recovery from Time Machine backups.

To be blunt – as a backup utility for end users, Time Machine is an ace in the hole, and one of the most underrated features of Mac OS X.

There are some things that I think are lacking in Time Machine at the moment that will only come in time:

  1. Support for multiple backup destinations – savvy users want to be able to swap out their backup destination periodically to take it off site.
  2. Granular control of timing – some users complain that Time Machine affects the performance of their machine too much. Personally, I consider myself a power user and have not noticed it slowing me down yet, but others feel that it does, and don’t like the frequency at which it backs up. Being able to choose whether you want your most frequent backups done hourly, 2-hourly, 3-hourly, 4-hourly, etc., would be a logical enhancement to Time Machine, and one which I hope does arrive. Personally if this were available I’d more be seeking to keep daily backups for at least a month.
  3. Better application support – this actually isn’t an Apple issue at all, but one for third party software developers. Over time, I want to see any application that does database style storage, or storage where multiple files must remain consistent, to offer Time Machine integration. (The biggest failure in this respect is Microsoft Entourage – the monolithic database format makes hourly backups via Time Machine not only impractical, but unusable.)

Still, regardless of these deficiencies, Time Machine as it currently stands was a fantastic addendum to a robust operating system, one which puts easy recovery in the hands of average users.

(I have no idea what Apple intends to do with Time Machine at the server level – while Time Machine exists on Mac OS X Server, for the most part it’s to backup the server itself plus act as a repository point for machines on the LAN, much in the same way that Apple’s Time Capsule product works. However, if they added a little bit more – say, backing up multiple clients with file level deduplication across the clients, suddenly it would be very interesting.)

Comparing it to enterprise products…

Time Machine is great for providing a backup mechanism for end users, but it pales in comparison to what enterprise backup products such as NetWorker can do for an entire environment. As such, it’s not fair to compare it against those products – it’s not in their league, and it doesn’t pretend to be there. It doesn’t support remote storage, it doesn’t support true centralisation of backups, it doesn’t support removable media, … the list goes on, and on. Most importantly for any enterprise however, it doesn’t really support native backups of other operating systems. (Yes, you can shoe-horn it into say, backing up a SMB or CIFS share, but like any such form of backup, it’s not a true, integrated solution.)

As such, Time Machine isn’t something that’s going to replace your NetWorker environment. Chances are it won’t even replace your Retrospect environment. Used correctly though, it can act as a valuable enhancement in a backup environment, but if you’re a backup administrator, it isn’t going to put you out of a job today, next week, next year, or even in the next 5 years.


* Honestly, tell someone in a different discipline in IT that you specialise in data protection and that you enjoy it, and watch their eyes glaze over…

** Or in my case, since I can never resist the temptation, you answer no, and rename the disk to TARDIS, since if it’s going to be a Time Machine, it may as well be a good one.

*** Good for them. It’s tiresome watching what sometimes seems to be the entire computer industry using Apple as a free R&D centre.

Posted in Aside | Tagged: , , , , | Comments Off on Aside – My take on Time Machine

Offsite your bootstrap reports

Posted by Preston on 2009-02-09

I can’t stress the importance enough of getting your bootstrap reports offsite. If you don’t have a bootstrap report available and you have to rebuild your NetWorker server, then you may potentially have to scan through a lot of media to find your most recent backup.

It’s all well and good having your bootstrap report emailed to your work email address, but what happens if whatever takes out your backup server takes out your mail server as well?

There’s two things you should therefore do with your bootstrap reports:

  • Print them out and send them offsite with your media – offsite media storage companies will usually store other records for you as well, so there’s a good chance yours will, even if it is for a slightly extra fee. If you send your bootstraps offsite with your media, then in the event of a disaster recovery, a physical printout of your bootstrap report should also come back when you recall your media.
  • Email them to an external, secure email address – I recommend using a free, secure mail service, such as say, Google Mail. Doing so keeps electronic copies of the bootstrap available for easy access in the event of a disaster where internet access is still achievable even if local mail isn’t possible. Of course, the password for this account should be (a) kept secure and (b) changed every time someone who knows it leaves the company.

(Hint: if for some reason you need to generate a bootstrap report outside of the regular email, always remember you can at any time run: mminfo -B).

Obviously this should be done in conjunction with local storage methods – local email, and local printouts.

Posted in NetWorker, Policies, Scripting | Tagged: , , | Comments Off on Offsite your bootstrap reports