NetWorker Blog

Commentary from a long term NetWorker consultant and Backup Theorist

  • This blog has moved!

    This blog has now moved to nsrd.info/blog. Please jump across to the new site for the latest articles (and all old archived articles).

  • Enterprise Systems Backup and Recovery

    If you find this blog interesting, and either have an interest in or work in data protection/backup and recovery environments, you should check out my book, Enterprise Systems Backup and Recovery: A Corporate Insurance Policy. Designed for system administrators and managers alike, it focuses on features, policies, procedures and the human element to ensuring that your company has a suitable and working backup system rather than just a bunch of copies made by unrelated software, hardware and processes.


Sub-saveset checkpointing would be good

Posted by Preston on 2009-07-29

Generally speaking I don’t have a lot of time for NetBackup, primarily due to the lack of dependency checking. That’s right, a backup product that doesn’t ensure that fulls are kept for as long as necessary to guarantee recoverability of dependent incrementals isn’t something I enjoy using.

That being said, there are some nifty ideas within NetBackup that I’d like to see eventually make their way into NetWorker.

One of those nifty ideas is the notion of image checkpointing – to use the NetWorker vernacular, this would be sub-saveset checkpointing. Checkpointing allows a saveset to be restarted from a point as close to the failure as possible, rather than from the start. For example, your backup may be 20 GB into a 30 GB filesystem when a failure occurs. With image checkpointing turned on in NetBackup, the backup won’t need to re-run the entire 20 GB previously done, but will pick up from the last point in the backup at which a checkpoint was taken.
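As a minimal sketch of the general idea – assuming a simple file-by-file walk, with all names (`backup_with_checkpoints`, `send`, `checkpoint`) invented for illustration rather than taken from NetBackup or NetWorker – checkpoint-restart amounts to durably recording progress and skipping completed work on resume:

```python
# Hypothetical sketch of checkpoint-restart for a file-by-file backup.
# All names here are illustrative, not any real product's interfaces.

def backup_with_checkpoints(files, send, checkpoint, resume_from=None):
    """Back up 'files', recording a checkpoint after each file.

    On failure, the caller re-invokes with resume_from set to the last
    recorded checkpoint; everything up to and including that file is
    skipped, and the backup picks up from the next file.
    """
    started = resume_from is None
    for name in files:
        if not started:
            if name == resume_from:
                started = True  # resume with the file after the checkpoint
            continue
        send(name)        # stream this file to the backup device
        checkpoint(name)  # durably note progress before moving on
```

A real implementation would of course checkpoint at data offsets rather than file boundaries, but the resume logic is the same shape.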

I’m not saying this would be easy to implement in NetWorker. Indeed, if I were to be throwing a bunch of ideas into a group of “Trivial”, “Easy”, “Hmmm”, “Hard” and “Insanely Difficult” baskets, I’d hazard a guess that the modifications required for sub-saveset checkpointing would fall at least into the “Hard” basket.

To paraphrase a great politician though, sometimes you need to choose to do things not because they’re easy, but because they’re hard.

So, first – why is sub-saveset checkpointing important? Well, as data sizes increase and filesystems continue to grow, having to restart an entire saveset because of a failure “somewhere” within the stream becomes increasingly inefficient. For the most part we work around these issues, but on ever larger and more complex filesystems, a single failure makes it that much harder to hit the backup window.

Secondly – how might sub-saveset checkpointing be done? Well, NetWorker is already capable of something similar – sort of – in the form of chunked (fragmented) savesets. Long term NetWorker users will be well aware of this: savesets once had a maximum size of 2 GB, so if you were backing up a 7 GB filesystem called “/usr”, you’d get:

/usr
<1>/usr
<2>/usr
<3>/usr

In the above, “/usr” was considered the “parent” of “<1>/usr”, “<1>/usr” was the parent of “<2>/usr”, and so on. (Parent? man mminfo – read about pssid.)
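The parent chain above can be sketched as a simple lookup. Assuming hypothetical (ssid, pssid) records in the spirit of mminfo’s report – the record layout here is invented for illustration, not NetWorker’s actual media database schema – reassembling the chunks is just a matter of following the chain parent-first:

```python
# Illustrative only: reassembling a chunked saveset by following parent
# saveset IDs (pssid), in the spirit of mminfo's ssid/pssid attributes.
# The record layout is hypothetical, not NetWorker's actual schema.

def chain_order(chunks):
    """Return chunk names ordered parent-first.

    chunks: dict mapping ssid -> (name, pssid), where pssid is None
    for the head of the chain (the original "/usr" chunk).
    """
    by_parent = {pssid: ssid for ssid, (_, pssid) in chunks.items()}
    order, ssid = [], by_parent.get(None)  # start at the chain's head
    while ssid is not None:
        order.append(chunks[ssid][0])
        ssid = by_parent.get(ssid)  # step to the child chunk, if any
    return order
```

This is the sort of bookkeeping that made chunked savesets a pain to parse by hand, but it shows the chain structure is already well defined.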

Now, I’m not suggesting a whole-hearted return to this model – it’s a pain in the proverbial to parse and calculate saveset sizes, and I’m sure there are other inconveniences to it. However, it does offer an entry point to the model we’re looking for – if a backup needed to restart from a checkpoint, it could continue via a chunked/fragmented saveset.

The difficulty lies in differentiating between the “broken” tail of the parent saveset chunk and the “correct” start of the child saveset chunk, which would likely require an extension to at least the media database. However, I think it’s achievable: given that the media database already contains details about segments within savesets (i.e., file/record markers, etc.), it should in theory be possible to include a “bad” flag, so that a run of data at the end of a saveset chunk can be declared bad, telling NetWorker to move on to the next child chunk.
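As a rough illustration of that “bad tail” idea – with field names entirely invented, not drawn from NetWorker’s media database – a recovery walk over segment records could simply skip anything flagged bad and fall through to the child chunk, which re-covers that data:

```python
# Hypothetical sketch: each chunk carries segment records, and the
# trailing segment of a restarted chunk can be flagged bad. Field
# names ("name", "segments", "bad") are invented for illustration.

def recoverable_segments(chunks):
    """Yield (chunk_name, segment_id) pairs usable for recovery.

    chunks: ordered list of dicts like
        {"name": "/usr", "segments": [{"id": 1, "bad": False}, ...]}
    A segment flagged bad is skipped; the same data is expected to
    appear again at the start of the next (child) chunk.
    """
    for chunk in chunks:
        for seg in chunk["segments"]:
            if seg.get("bad"):
                continue  # data re-backed-up in the child chunk
            yield chunk["name"], seg["id"]
```

The point is that the flag itself is cheap; the work is in making the recovery path honour it consistently.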

It’s fair to say that most people would happily accept a media database upgrade (i.e., a change to the structure as part of moving to a new version of NetWorker) in order to get sub-saveset checkpointing.
