NetWorker Blog

Commentary from a long term NetWorker consultant and Backup Theorist

  • This blog has moved!

    This blog has now moved to nsrd.info/blog. Please jump across to the new site for the latest articles (and all old archived articles).
  •  


     


     

  • Enterprise Systems Backup and Recovery

    If you find this blog interesting, and either have an interest in or work in data protection/backup and recovery environments, you should check out my book, Enterprise Systems Backup and Recovery: A Corporate Insurance Policy. Designed for system administrators and managers alike, it focuses on features, policies, procedures and the human element to ensuring that your company has a suitable and working backup system rather than just a bunch of copies made by unrelated software, hardware and processes.
  • This blog has moved!

    This blog has now moved to nsrd.info/blog. Please jump across to the new site for the latest articles (and all old archived articles).
  •  


     


     

  • Twitter

    Error: Twitter did not respond. Please wait a few minutes and refresh this page.

Posts Tagged ‘Backup’

Nybbles

Posted by Preston on 2009-11-26

If you thought in the storage blogosphere that this week had seen the second coming, you’d be right. Well, the second coming of Drobo, that is. With new connectivity options and capacity for more drives, Drobo has had so many reviews this week I’ve struggled to find non-Drobo things to read at times. (That being said, the new versions do look nifty, and with my power bills shortly to cutover to have high on-peak costs and high “not quite on-peak” costs, one or two Drobos may just do the trick as far as reducing the high number of external drives I have at any given point in time.)

Tearing myself away from non-Drobo news, over at Going Virtual, Brian has an excellent overview of NetWorker v7.6 Virtualisation Features. (I’m starting to think that the main reason why I don’t get into VCBs much though is the ongoing limited support for anything other than Windows.)

At The Backup Blog, Scott asks the perennial question, Do You Need Backup? The answer, unsurprisingly is yes – that was a given. What remains depressing is that backup consultants such as Scott and myself still need to answer that question!

StorageZilla has started Parting Shot, a fairly rapid fire mini-blogging area with frequent updates that are often great to read, so it’s worth bookmarking and returning to it frequently.

Over at PenguinPunk, Dan has been having such a hard time with Mozy that it’s making me question my continued use of them – particularly when bandwidth in Australia is often index-locked to the price of gold. [Edit: Make that has convinced me to cancel my use of them, particularly in light of a couple of recent glitches I’ve had myself with it.]

Palm continues to demonstrate why it’s a dead company walking with the latest mobile backup scare coming from their department. I’d have prepared a blog entry about it, but I don’t like blogging about dead things.

Grumpy Storage asks for comments and feedback on storage LUN sizings and standards. I think a lot of it is governed by the question “how long is a piece of string”, but there are some interesting points regarding procurement and performance that are useful to have stuck in your head next time you go to bind a LUN or plan a SAN.

Finally, the Buzzword Compliance (aka “Yawn”) award goes to this quote from Lauren Whitehouse of the “Enterprise Strategy Group” that got quoted on half a zillion websites covering EMC’s release of Avamar v5:

“Data deduplicated on tape can expire at different rates — CommVault and [IBM] TSM have a pretty good handle on that,” she said. “EMC Avamar positions the feature for very long retention, but as far as a long-term repository, it would seem to be easy for them to implement a cloud connection for EMC Avamar, given their other products like Mozy, rather than the whole dedupe-on-tape thing.”

(That Lauren quote, for the record, came from Search Storage – but it reads the same pretty much anywhere you find it.)

Honestly, Cloud Cloud Cloud. Cloud Cloud Cloud Cloud Cloud. Look, there’s a product! Why doesn’t it have Cloud!? As you can tell, my Cloud Filter is getting a little strained these days.

Don’t even get me started on the crazy assumption that just because a company owns A and B they can merge A and B with the wave of a magic wand. Getting two disparate development teams to merge disparate code in a rush, rather than as a gradual evolution, is usually akin to seeing if you can merge a face and a brick together. Sure, it’ll work, but it won’t be pretty.

Posted in Aside, General thoughts | Tagged: , , , , , , , , , | Comments Off on Nybbles

Is a “copy” a “backup”?

Posted by Preston on 2009-09-30

There’s been of discussions on various storage blogs both previously, and again now on whether a copy (e.g., a tarball, or a snapshot, etc.) is a backup. There have been arguments on both sides of the fence, and I’m going to equally contribute to those arguments now.

You see, a copy is a backup, and it’s not a backup.

It’s almost like Schrödinger’s Cat – it may be a backup, or it may not be a backup, and you won’t know for sure until you look more closely at it.

In my book, I set out early in the process to define a backup, and define it as follows:

A backup is a copy of any data that can be used to restore the data as/when required to its original form. That is, a backup is a valid copy of data, files, applications or operating systems that can be used for the purposes of recovery.

So it would seem then that I come down fairly heavily in favour of the notion that a copy is a backup. Well, yes – and no.

In the broadest sense of the term, a random copy of data such as a tarball, an rsync, a zip file, a read-only snapshot is indeed a “backup”, as it can be used, in a single instance, for the purposes of recovery. However, so too could be a binary print-out/dump of the exact state of every bit on a LUN. Few would argue though that such an arduous and manual re-entry process would really be recoverable, even though in theory it is.

The reason that it’s not really recoverable is we’re all aware of the time frames required for recovery – recoveries must be completed in a timeframe that is useful to the business (or the end user) who needs the data back. Without that, we don’t really have a backup at all – just a random copy of the data.

If we look past the broad term “backup” though, and actually evaluate the term backup system, then I would suggest that a single “backup”, unless it’s an instantiation of protection from the backup system, is not a backup at all, but instead is just a random (or pseudo-random) copy.

To me this boils down to the need to work with the notion of Information Lifecycle Protection. As you may recall, in a previous blog entry I suggested that there’s a need to break off data protection activities from ILM and define a new process that revolves around keeping data available in order to be managed by ILM. It may seem a small distinction, but it’s one which helps in these sorts of discussions. At the time I suggested that conceptually, ILP may be represented as follows:

Components of ILP

Components of ILP

Under this definition, we can cease to worry about whether a copy is a backup, because clearly, a copy will be part of an overall ILP strategy. It’s still data protection, but it doesn’t have to be backup in order to be data protection.

My personal opinion is that a single, isolated copy is technically a backup, but is logically not a backup. “Technically is” because it can be used to restore data. “Logically not” because it’s not in itself a guarantee of a correctly designed backup system. I.e., unless we can say that the copy came from the backup system, we can’t be guaranteed it’s a backup.

One last quote from my book – this time from the back page:

A well-designed backup system comes about only when several key factors coalesce: business involvement, IT acceptance, best practice designs, enterprise software and reliable hardware.

So the answer I guess to “is a copy a backup” is another question – “did the copy from a backup system?” If the answer to that question is yes, then the answer to the original question is the same. If the answer is no, we can’t reliably answer “yes” to the original question.

Posted in Backup theory | Tagged: , | Comments Off on Is a “copy” a “backup”?

What’s backup got to do with it?

Posted by Preston on 2009-09-19

Perhaps one of the most common mistakes that companies can make is to focus on their backup window. You might say this is akin to putting the cart before the horse. While the backup window is important, in a well designed backup system, it’s actually only of tertiary importance.

Here’s the actual order of importance in a backup environment:

  1. Recovery performance.
  2. Cloning (duplication) performance.
  3. Backup performance.

That is, the system must be designed to:

  1. First ensure that all data can be recovered within the required timeframes,
  2. Second ensure that all data that needs to be cloned is cloned within a suitable timeframe to allow off-siting,
  3. Third ensure that all data is backed up within the required backup window.

Obviously for environments with well considered backup windows (i.e., good reasons for the backup window requirements), the backup window should be met – there’s no questioning about that. However, meeting the backup window should not be done at the expense of impacting either the cloning window or the recovery window.

Here’s a case in point: block level backups of dense filesystems often allow for much smaller backup windows – however, due to the way that individual files are reconstructed (read from media, reconstruct in cache, copy back to filesystem), they do this at the expense of required recovery times. (This also goes to the heart of what I keep telling people about backup: test, test, test.)

The focus on the recovery performance in particular is the best possible way (logically, procedurally, best practices – however you want to consider it) to drive the entire backup system architecture. It shouldn’t be a case of how many TB per hour you want to backup, but rather, how many TB per hour you need to recover. Design the system to meet recovery performance requirements and backup will naturally follow*.

If your focus has up until now been the backup window, I suggest you zoom out so you can see the bigger picture.


* I’ll add that for the most part, your recovery performance requirements shouldn’t be “x TB per hour” or anything so arbitrary. Instead, they should be decided by your system maps and your SLAs, and instead should focus on business requirements – e.g., a much more valid recovery metric is “the eCommerce system must be recovered within 2 hours” (that would then refer to all dependencies that provide service to and access for the eCommerce system).

Posted in Backup theory | Tagged: , , , , | Comments Off on What’s backup got to do with it?

Think backup belongs in ILM? Think again

Posted by Preston on 2009-09-12

In my opinion (and after all, this is my blog), there’s a fundamental misconception in the storage industry that backup is a part of Information Lifecycle Management (ILM).

My take is that backup has nothing to do with ILM. Backup instead belongs to a sister (or shadow) activity, Information Lifecycle Protection – ILP. The comparison between the two is somewhat analogous to the comparison I made in “Backup is a Production Activity” between operational production systems and infrastructure support production systems; that is, one is directly related to the operational aspects of the data, and the other exists to support the data.

Here’s an example of what Information Lifecycle Protection would look like:

Information Lifecycle Protection

Information Lifecycle Protection

Obviously there’s some simplification going on in the above diagram – for instance, I’ve encapsulated any online storage based fault-protection into “RAID”, but it does serve to get the basic message across.

If we look at say, Wikipedia’s entry on Information Lifecycle Management, backup is mentioned as being part of the operational aspects of ILM – this is actually a fairly standard definition of the perceived position of backup within ILM; however, standard definition or not, I have to disagree.

At its heart, ILM is about ensuring correct access and lifecycle retention policies for data: neither of these core principles encapsulate the activities in information lifecycle protection. ILP on the other hand is about making sure the data remains available to meet the ILM policies. If you think this is a fine distinction to make, you’re not necessarily wrong. My point is not that there’s a huge difference, but there’s an important difference.

To me, it all boils down to a fundamental need to separate access from protection/availability, and the reason I like to maintain this separation is how it affects end users, and the level of awareness they need to have for it. In their day-to-day activities, users should have an awareness of ILM – they should know what they can and can’t access, they should know what they can and can’t delete, and they should know where they will need to access data from. They shouldn’t however need to concern themselves with RAID, they shouldn’t need to concern themselves with snapshots, they shouldn’t need to concern themselves with replication, and they shouldn’t need to concern themselves with backup.

NOTE: I do, in my book, make it quite clear that end users have a role in backup in that they must know that backup doesn’t represent a blank cheque for them to delete data willy-nilly, and that they should know how to request a recovery; however, in their day to day job activities, backups should not play a part in what they do.

Ultimately, that’s my distinction: ILM is about activities that end-users do, and ILP is about activities that are done for end-users.

Posted in Architecture, Backup theory, General Technology, General thoughts | Tagged: , , , , , , , , , | 2 Comments »

Manual backups and the “quiet” option

Posted by Preston on 2009-08-07

When you run a manual backup in NetWorker (e.g., via the “save” command), NetWorker will by default give you a file-by-file listing of what is being backed up. In theory this is helpful for manual backups, because typically we do manual backups to debug issues, not as part of the production backup process.

If you’re wanting to do a manual backup and make it as high performance as possible, there’s an option you need to use: quiet. For the save command, it’s “-q”; for the GUI, it means going and bringing up a command prompt and learning how to use save*. You can’t turn off a file-by-file listing (currently at least) in the NetWorker user backup program.

So, backing up from a Solaris system to a Linux NetWorker server, using gigabit ethernet and the same backup device (ADV_FILE) each time, here’s some examples of the impact of viewing the per-file progress of the backup. (Each backup was run three times, with the run-time averaged.)

  1. Backing up 77,822 files:
    • Without per-file listing: 53 minutes, 11 seconds.
    • With per-file listing: 55 minutes, 35 seconds.
  2. Backing up 3,710,475 files:
    • Without per-file listing: 1 hour, 38 minutes, 30 seconds.
    • With per-file listing: 1 hour, 56 minutes, 9 seconds.

The time taken to print each file will be dependent on the performance of the console device you’re using. The above tests were run from an GigE ssh session from another host to the Sun client. (For instance, this problem also occurs in recoveries: I remember once running a recovery via a Sun serial console where I waited 6 hours for the recovery to complete only to discover when all the files stopped printing that the recovery had finished hours ago.)

The simple fact is – the more intensively you want to watch status of a backup (or for that matter, a recovery), the more you directly have an impact on its performance.


* Honestly, you should anyway – see here for a good reason.

Posted in NetWorker | Tagged: , , , , | Comments Off on Manual backups and the “quiet” option

Backup is insurance

Posted by Preston on 2009-06-28

Over at The Daily WTF, there’s a story at the moment about a company that went out of business due to a developer deleting the company database for which there were no backups. Lamentably, this is still a common story. Oh, in many cases backups may actually be taken, but it’s still the case that we see situations such as:

  • Backups are never taken off-site,

or

  • Backups are never even taken out of a tape drive (i.e., constantly overwritten),

or

  • Backups are never checked.

My book is titled Enterprise Systems Backup and Recovery: A Corporate Insurance Policy. That’s how much backup, to me, represents insurance. It’s the level of insurance necessary for any business to survive a disaster.

Failing to treat backup as insurance is unfortunately still familiar. The ever obvious-stating Gartner is frequently quoted as saying that one in three companies hit by a disaster will be unprepared and lose critical data.

I’d like to hope that within my career we’ll see that percentage shrink considerably – one in three is an unacceptably high number. One in a hundred might be more acceptable, but realistically, one in twenty would be a good number to start aiming for.

How do we aim for such an improvement? It’s remarkably simple, and comes from a few basic rules:

  • Backup is insurance, it’s not an IT process.
  • Backup requires buy-in from all aspects of a company.
  • Backup budget is sourced from the entire company, not the IT budget.
  • Company policies should prohibit deployment of new systems without a backup/recovery policy.

A good backup system comprises no more than 50% IT infrastructure and operations. The rest stems from policies, procedures, planning and awareness. Paraphrasing what I state in the introduction to my book, having backup software does not mean you have a backup system.

Posted in Backup theory, Policies | Tagged: , | Comments Off on Backup is insurance

Aside – My take on Time Machine

Posted by Preston on 2009-03-06

Introduction

Being one of those freaky weird IT people who are passionate about backups*, when Apple first previewed Mac OS X 10.5 (aka Leopard), the number one thing I of course got excited about was Time Machine. Now, before anyone tells me that it’s “just a poor rip-off of VSS”, let me be blunt – analysts who started that talk have no clue what they’re talking about.

Yes, VSS is great on Windows systems – in fact, its great to see that standard VSS functionality has reached a point in NetWorker 7.5 that it’s just part of the Windows client for filesystem backups, rather than requiring additional licenses.

But VSS in itself is not in the same league as Time Machine for end user backup – and more importantly, recovery – and quite frankly, that’s more important when we’re talking about non-server backup systems.

Evaluating it as an end-user backup system

If you’re not fully across Time Machine, here’s how it works:

  1. You plug a new or otherwise unused hard drive into your Mac.
  2. The OS asks you if you want to use that drive for Time Machine backups.
  3. You answer Yes**.

That’s all there is to getting basic Time Machine backups running. At that point, Time Machine does a full backup, then from that point onwards does incremental backups making use of hard links, thus making very efficient use of space. Backups are taken every hour, and it manages backups such that:

  • Hourly backups are kept for 24 hours.
  • Daily backups are kept for a month.
  • Weekly backups are kept until the disk becomes full.

All pruning of space is automatically handled by the OS. For the system volume at least, Time Machine is an exclusive backup product – it backs up everything by default, and you have to explicitly tell it what you want excluded from the backup. This is a Really Good Thing. However, you can go into preferences and exclude other regions (e.g., I have a “DNB” (Do Not Backup) folder on my desktop that I drop stuff into for temporary storage), or explicitly include other drives attached to the system.

Overall the settings for Time Machine are simple – very simple:

Main preferences for Time Machine

Main preferences for Time Machine

The Options button is what allows you to manage exclusions for your backups:

Options pane for Time Machine

Options pane for Time Machine

To be honest though, who cares about backup? Desktop backup products abound, and in reality what we care about is whether you can recover. Indeed, for desktop products what we care most about is whether our parents, or our grandparents, or those people down the street who ask us for technical support simply because we’re in IT, can recover. Boy, can you recover.

Time Machine presents a visually beautiful way of browsing the backups. Unfortunately we won’t see it appear in other backup products because, well, according to Steve Jobs when it was first introduced, Apple took out a lot of patents on it***. The standard recovery browser will look like the following:

Time Machine Browsing Files

Time Machine Browsing Files

Equally importantly though, Time Machine isn’t just about facilitating file level recoveries, but also recoveries of other data that it understands – such as say, mail. Now, yes, enlightened readers will point out that Apple’s Mail.app program stores mail in files and thus is easily browseable, but the files aren’t named in such a way that say, my father could work out which file needs to be recovered.

Here’s an example of what Time Machine looks like when browsing for recovery of mail:

Browsing mail with Time Machine

Browsing mail with Time Machine

To browse and retrieve email, the user simply browses through the folder structure – and the time of the backups – to pick the email(s) to be recovered. It’s incredibly intuitive, and takes less than 5 minutes to learn for the average user. As an enterprise backup consultant, honestly, I almost cried when I saw this and thought about how much of a pain message level recovery has been for so long. (Yes, getting better now, and has been for a while.)

Browsing back in time is straight forward – just scroll the mouse over the time bar on the right hand side of the screen and select the date you want:

Selecting alternate recovery time

Selecting alternate recovery time

This, quite honestly, is the epitome of simplicity. Going beyond standard backup and recovery operations, Time Machine is also an excellent disaster recovery tool – if you have serious enough issues that you need to rebuild your machine, the Mac OS X installer actually has the option of doing a rebuild and recovery from Time Machine backups.

To be blunt – as a backup utility for end users, Time Machine is an ace in the hole, and one of the most underrated features of Mac OS X.

There are some things that I think are lacking in Time Machine at the moment that will only come in time:

  1. Support for multiple backup destinations – savvy users want to be able to swap out their backup destination periodically to take it off site.
  2. Granular control of timing – some users complain that Time Machine affects the performance of their machine too much. Personally, I consider myself a power user and have not noticed it slowing me down yet, but others feel that it does, and don’t like the frequency at which it backs up. Being able to choose whether you want your most frequent backups done hourly, 2-hourly, 3-hourly, 4-hourly, etc., would be a logical enhancement to Time Machine, and one which I hope does arrive. Personally if this were available I’d more be seeking to keep daily backups for at least a month.
  3. Better application support – this actually isn’t an Apple issue at all, but one for third party software developers. Over time, I want to see any application that does database style storage, or storage where multiple files must remain consistent, to offer Time Machine integration. (The biggest failure in this respect is Microsoft Entourage – the monolithic database format makes hourly backups via Time Machine not only impractical, but unusable.)

Still, regardless of these deficiencies, Time Machine as it currently stands was a fantastic addendum to a robust operating system, one which puts easy recovery in the hands of average users.

(I have no idea what Apple intends to do with Time Machine at the server level – while Time Machine exists on Mac OS X Server, for the most part it’s to backup the server itself plus act as a repository point for machines on the LAN, much in the same way that Apple’s Time Capsule product works. However, if they added a little bit more – say, backing up multiple clients with file level deduplication across the clients, suddenly it would be very interesting.)

Comparing it to enterprise products…

Time Machine is great for providing a backup mechanism for end users, but it pales in comparison to what enterprise backup products such as NetWorker can do for an entire environment. As such, it’s not fair to compare it against those products – it’s not in their league, and it doesn’t pretend to be there. It doesn’t support remote storage, it doesn’t support true centralisation of backups, it doesn’t support removable media, … the list goes on, and on. Most importantly for any enterprise however, it doesn’t really support native backups of other operating systems. (Yes, you can shoe-horn it into say, backing up a SMB or CIFS share, but like any such form of backup, it’s not a true, integrated solution.)

As such, Time Machine isn’t something that’s going to replace your NetWorker environment. Chances are it won’t even replace your Retrospect environment. Used correctly though, it can act as a valuable enhancement in a backup environment, but if you’re a backup administrator, it isn’t going to put you out of a job today, next week, next year, or even in the next 5 years.


* Honestly, tell someone in a different discipline in IT that you specialise in data protection and that you enjoy it, and watch their eyes glaze over…

** Or in my case, since I can never resist the temptation, you answer no, and rename the disk to TARDIS, since if it’s going to be a Time Machine, it may as well be a good one.

*** Good for them. It’s tiresome watching what sometimes seems to be the entire computer industry using Apple as a free R&D centre.

Posted in Aside | Tagged: , , , , | Comments Off on Aside – My take on Time Machine

How important is it to clone?

Posted by Preston on 2009-01-29

This isn’t a topic that’s restricted just to NetWorker. It really does apply to any backup product that you’re using, regardless of the terminology involved. (E.g., for NetBackup, we’re talking duplication).

When talking to a broad audience I don’t like to make broad generalisations, but in the case of cloning, I will, and it’s this:

If your production systems backups aren’t being cloned, your backup system isn’t working.

Yes, that’s a very broad generalisation, and I tend to hear a lot of reasons why backups can’t be cloned/duplicated – time factors, cost factors, even assertions that it isn’t necessary. There may even be instances where this actually is correct – but thus far, I’ve not been convinced by anyone who isn’t cloning their production systems backups that they don’t need to.

I always think of backups as insurance – it’s literally what they are. In fact, my book is titled on that premise. So, on that basis, if you’re not cloning, it’s like taking out an insurance policy from a company that in turn doesn’t have an underwriter – i.e., they can’t guarantee being able to deliver on the insurance if you need to make a claim.

Would you really take out insurance with a company that can’t provide a guarantee they can honour a legitimate claim?

So, let’s disect the common arguments as to why cloning typically isn’t done:

Money

This is the most difficult one, and to me it speaks that the business, overall, doesn’t appreciate the role of backup. It means that the IT department is solely responsible for sourcing funding from its own budget to facilitate backup.

It means the company doesn’t get backup.

Backup is not an IT function. It’s a corporate governance function, or an operating function. It’s a function that belongs to every department. Returning to insurance, therefore, it’s something that must be funded by every department, or rather, the company as a whole. The finance department, for instance, doesn’t solely provide, out of its own departmental budget, the funding for insurance for a company. Funding for such critical, company wide expenditure comes from the entire company operating budget.

So, if you don’t have the money to clone, you have the hardest challenge – you need to convince the business that it, not IT, is responsible for backup budget, and cloning is part of that budget.

Time/Backup Window

If you’re not cloning because of the time it takes to do so, or the potential increase to the backup window (or that the backup window is already too long), then you’ve got a problem.

Typically such a problem has one of two solutions:

  • Revisit the environment – are there architectural changes that can be made to improve the processes? Are there procedural changes that can be made to improve the processes? Are backup windows arbitrary rather than meaningful? Consider the environment at hand – it may be that the solution is there, waiting to be implemented.
  • Money – sometimes the only way to make the time available is to spend money on the environment. If you’re worried about being able to spend money on the environment, revisit the previous comment on money.

Backup to another site

This is probably the most insidious reason that might be invoked for not needing to clone. It goes something like this:

We backup our production datacentre to storage/media in the business continuance/disaster recovery site. Therefore we don’t need to clone.

This argument disturbs me. It’s false for two very, very important reasons:

  • If your storage/media fails in the business continuance/disaster recovery site, you’ve lost  your historical backups anyway. E.g., think Sarbanes-Oxley.
  • If your production site fails, you only have one copy of your data left – on the backups. Not good.

In summary

There are true business imperatives why you should be cloning. At least for production systems, your backups should never represent a single point of failure to your environment, and need to be developed and maintained on the premise that they represent insurance. As such, not having a backup of your backup may be one of the worst business decisions that you could make.

Non-group cloning

If you’re looking to manage cloning outside of NetWorker groups but not wanting to write scripts, I’d suggest you check out IDATA Tools, a suite of utilities I helped to design and continue to write; included in the tools suite is a utility called sslocate, which is expressly targetted at assisting with manual cloning operations.

Posted in Policies | Tagged: , , | 10 Comments »