NetWorker Blog

Commentary from a long term NetWorker consultant and Backup Theorist

  • This blog has moved!

    This blog has now moved to nsrd.info/blog. Please jump across to the new site for the latest articles (and all old archived articles).


Dealing with low-bandwidth connection to storage nodes

Posted by Preston on 2009-05-11

The fantastic thing about NetWorker is that, because it uses a three-tier architecture, a datazone may encompass far more than just a single site or datacentre. That is, you can design a system where the NetWorker server is in Sydney, and you have storage nodes in Melbourne, Adelaide, Perth, Brisbane, Darwin and Hobart. The server would be responsible for coordinating all backups and for storing/retrieving data in the Sydney datacentre, and each storage node would be responsible for the storage/retrieval of backups local to its own site.

(Or, to use a non-Australian example, you could have a datazone where your backup server is in London, and you have storage nodes in Paris, New York and Cape Town.)

When a NetWorker datazone encompasses only a single datacentre, there’s usually very little tweaking that needs to be done to the server <-> storage node communications once they’re established. However, when we start talking about datazones that span WANs, we do have to take into account the latency and limited bandwidth of the links between the storage nodes and the backup server.

Luckily, there are settings within NetWorker to account for this. Specifically, there are three key settings, all maintained within the NetWorker server resource itself. These are:

  • nsrmmd polling interval
  • nsrmmd restart interval
  • nsrmmd control timeout

To view these settings in the NetWorker Management Console (NMC), you first have to turn on diagnostic mode. Then, right-click the server (the topmost entry in the configuration tree) and choose “Properties”. These settings are maintained in the “Media” pane:

Controlling nsrmmd settings in NMC


So, what does each of these settings do?

  • nsrmmd polling interval – This is the number of minutes between successive checks by the NetWorker master process (nsrd) that an nsrmmd is still running. You could think of it as the heartbeat parameter. By default, this is 3 minutes.
  • nsrmmd restart interval – This is how long, in minutes, NetWorker will wait between restart attempts of an nsrmmd process. By default, this is 2 minutes.
  • nsrmmd control timeout – This is the number of minutes NetWorker waits for storage node requests to be completed. By default, this is 5 minutes.
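As a rough sketch, the three settings and the defaults listed above can be written down as data, along with a helper that builds input for an nsradmin session to display them. The helper name `build_query_script` is made up for illustration, and it is an assumption that the attribute names in nsradmin match the labels shown here and sit on the resource of type NSR, so check your own server before relying on it.

```python
# Defaults quoted in this post, in minutes.
NSRMMD_DEFAULTS = {
    "nsrmmd polling interval": 3,
    "nsrmmd restart interval": 2,
    "nsrmmd control timeout": 5,
}


def build_query_script(attributes=tuple(NSRMMD_DEFAULTS)):
    """Build nsradmin input that limits output to the given attributes.

    Assumption: the attribute names match the labels used in this post
    and belong to the server resource (type NSR).
    """
    lines = [
        ". type: NSR",                    # select the server resource
        "show " + "; ".join(attributes),  # restrict which attributes print
        "print",                          # display the selected attributes
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_query_script(), end="")
```

The generated text could be saved to a file and fed to nsradmin on the backup server; consult the nsradmin man page for your NetWorker version for the exact invocation.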

Note that NetWorker is intelligent about this – the man pages, for instance, explicitly refer to “remote nsrmmd” in each of the first two options, meaning that we should expect local nsrmmd processes on the backup server itself to be dealt with faster, even if these settings are increased.

All these settings work well for regular-sized LAN-contained datazones. However, they may not be optimal in either of the following two scenarios:

  • Very busy datazones that have a large number of devices, even if they’re in the same LAN;
  • WAN-connected datazones.

In either of these scenarios, if you’re seeing periodic phases where NetWorker goes through restarting nsrmmd processes, particularly if this is happening during backups, then it’s a good idea to try to bump up these values to something more compatible with your environment.

My first recommendation, which works for most sites without any further tweaking, is to double each of the first two settings and add a couple of minutes to the third: increase nsrmmd polling interval from 3 to 6 minutes, nsrmmd restart interval from 2 to 4 minutes, and nsrmmd control timeout from 5 to 7 minutes. (I don’t think it’s usually necessary to double nsrmmd control timeout, because the delay behind such timeouts is usually caused by devices, not the bandwidth of the connection, so you don’t need to increase the value drastically.)
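The arithmetic behind that first-pass recommendation can be sketched in a few lines. This is only an illustration: the function names are hypothetical, the "+2 minutes" rule is just a restatement of the 5-to-7 suggestion above, and the generated `update` lines assume that nsradmin accepts these attribute names on the NSR resource.

```python
# Defaults quoted in this post, in minutes.
DEFAULTS = {
    "nsrmmd polling interval": 3,
    "nsrmmd restart interval": 2,
    "nsrmmd control timeout": 5,
}


def first_pass_recommendation(current):
    """Double the first two settings; add 2 minutes to the control timeout."""
    return {
        "nsrmmd polling interval": current["nsrmmd polling interval"] * 2,
        "nsrmmd restart interval": current["nsrmmd restart interval"] * 2,
        "nsrmmd control timeout": current["nsrmmd control timeout"] + 2,
    }


def build_update_script(values):
    """Render nsradmin-style update commands (attribute names are assumed)."""
    lines = [". type: NSR"]  # select the server resource
    for name, minutes in values.items():
        lines.append(f"update {name}: {minutes}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    recommended = first_pass_recommendation(DEFAULTS)
    print(build_update_script(recommended), end="")
```

Starting from the defaults, this yields 6, 4 and 7 minutes respectively. Treat the generated commands as a starting point to review, not something to run blindly against a production server.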
