NetWorker Blog

Commentary from a long term NetWorker consultant and Backup Theorist

  • This blog has moved!

    This blog has now moved to nsrd.info/blog. Please jump across to the new site for the latest articles (and all old archived articles).
  •  


     


     

  • Enterprise Systems Backup and Recovery

    If you find this blog interesting, and either have an interest in or work in data protection/backup and recovery environments, you should check out my book, Enterprise Systems Backup and Recovery: A Corporate Insurance Policy. Designed for system administrators and managers alike, it focuses on features, policies, procedures and the human element to ensuring that your company has a suitable and working backup system rather than just a bunch of copies made by unrelated software, hardware and processes.
  • This blog has moved!

    This blog has now moved to nsrd.info/blog. Please jump across to the new site for the latest articles (and all old archived articles).
  •  


     


     

  • Twitter

    Error: Twitter did not respond. Please wait a few minutes and refresh this page.

Routine filesystem checks on disk backup units

Posted by Preston on 2009-09-28

On Linu, filesystems typically have two settings regarding getting complete checks on boot. These are:

  • Maximum number of mounts before a check
  • Interval between checks

The default settings, while reasonably suitable for smaller partitions, are very unsuitable for large partitions, such as what you find in disk backup units. In fact, if you don’t pay particular attention to these settings, you may find after a routine reboot that your backup server (or storage node) can take hours to become available. For instance, it’s not unheard of to see even sub-20TB DBU environments (as say, 10 x 2TB filesystems) take several hours to complete mandatory checks on filesystems after what should have just been a routine reboot.

There are two approaches that you can take to this:

  • If you want to leave the checks enabled, it’s reasonably imperative to ensure that at most only one disk backup unit filesystem will be checked at one time after a reboot; this will at least reduce the size of any check-on-reboot. Thus, ensure you:
    • Configure each filesystem so that it will have a different number of maximum mounts before check than any other filesystem, and,
    • Configure the interval (days) between checks for each filesystem to be a significantly different number.
  • If you don’t want periodic filesystem checks to ever interfere with the reboot process, you need to:
    • Ensure that following a non-graceful restart of the server the DBU filesystems are unmounted and checked before any new backup or recovery activities are done, and,
    • Ensure that there are processes – planned maintenance windows if you will – for manual running of the filesystem checks that are being skipped.

Neither option is particularly “attractive”. In the first case, you can still, if you cherish uptime or don’t need to reboot your backup server often, get into a situation where multiple filesystems need to be checked on reboot if they’ve all exceeded their days-between-checks parameter. In the second instance, you’re having to insert human driven processes into what should normally be a routine operating system function. In particular with the manual option, there must be a process in place to NetWorker shutdown + checking even in the middle of the night if an OS crash occurs.

Actually, the above list is a little limited – there’s a couple of other options that you can consider as well – though they’re a little more left of field:

  • Build into the change control process the timings for complete filesystem checks in case they happen, or
  • Build into the change control process or reboot procedure for the backup server/storage nodes the requirement to temporarily disable filesystem checks (using say, tune2fs) so that you know the reboot to be done won’t be costly in terms of time.

Personally, I’m looking forward to btrfs – in reality, a modern filesystem such as that should solve most, if not all, of the problems discussed above.

Advertisements

Sorry, the comment form is closed at this time.

 
%d bloggers like this: