NetWorker Blog

Commentary from a long term NetWorker consultant and Backup Theorist

Archive for the ‘Scripting’ Category

Validcopies hazardous to your sanity

Posted by Preston on 2009-12-04

While many of NetWorker 7.6’s enhancements centre on virtualisation or (urgh) cloud, there’s also a bunch of smaller updates that are of interest.

One of those new features is the validcopies flag, something I unfortunately failed to check out in beta testing. It looks like it could use some more work, but the theory is a good one. The idea behind validcopies is that we can use it in VTL style situations to determine not only whether we’ve got an appropriate number of copies, but also whether those copies are valid – i.e., usable by NetWorker for recovery purposes.

It’s a shame it’s too buggy to be used.

Here’s an example where I backup to an ADV_FILE type device:

[root@tara ~]# save -b Default -e "+3 weeks" -LL -q /usr/share
57777:save:Multiple client instances of tara.pmdg.lab, using the first entry
save: /usr/share  1244 MB 00:03:23  87843 files
completed savetime=1259366579

[root@tara ~]# mminfo -q "name=/usr/share,validcopies>1"
 volume        client       date      size   level  name
Default.001    tara.pmdg.lab 11/28/2009 1244 MB manual /usr/share
Default.001.RO tara.pmdg.lab 11/28/2009 1244 MB manual /usr/share

[root@tara ~]# mminfo -q "name=/usr/share,validcopies>1" -r validcopies
6095:mminfo: no matches found for the query

[root@tara ~]# mminfo -q "name=/usr/share,validcopies>1"
 volume        client       date      size   level  name
Default.001    tara.pmdg.lab 11/28/2009 1244 MB manual /usr/share
Default.001.RO tara.pmdg.lab 11/28/2009 1244 MB manual /usr/share

[root@tara ~]# mminfo -q "name=/usr/share,validcopies>1" -r validcopies
6095:mminfo: no matches found for the query

[root@tara ~]# mminfo -q "name=/usr/share,validcopies>1" -r validcopies,copies
 validcopies copies
 2     2
 2     2

I have a few problems with the above output, and am working through the bugs in validcopies with EMC. Let’s look at each of those items and see what I’m concerned about:

  1. We don’t have more than one valid copy just because it’s sitting on an ADV_FILE device. If the purpose of the “validcopies” flag is to count the number of unique recoverable copies, we do not have 2 copies for each instance on ADV_FILE. There should be some logic there to not count copies on ADV_FILE devices twice for valid copy counts.
  2. As you can see from the last two commands, the results found differ depending on report options. This is inappropriate, to say the least. We’re getting no validcopies reported at all if we only look for validcopies, or 2 validcopies reported if we search for both validcopies and copies.

Verdict from the above:

  • Don’t use validcopies for disk backup units.
  • Don’t report on validcopies only, or you’ll skew your results.
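
In the meantime, here’s a hedged workaround sketch (the saveset name is just the example from above – substitute your own query): always pair validcopies with copies in the report so the query actually returns rows, and filter out the read-only ADV_FILE instances yourself so disk backup units aren’t counted twice.

#!/bin/sh
# Workaround sketch only: report copies and validcopies together (querying
# validcopies on its own returns nothing, per the bug above), and drop the
# .RO read-only device instances so an ADV_FILE copy isn't counted twice.
SS_NAME="/usr/share"    # example saveset name
mminfo -q "name=$SS_NAME" -r "volume,ssid,cloneid,copies,validcopies" | grep -v '\.RO'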

Let’s move on to VTLs though – we’ll clone the saveset I just generated to the ADV_FILE type over to the VTL:

[root@tara ~]# mminfo -q "volume=Default.001.RO" -r ssid,cloneid
 ssid         clone id
4279265459  1259366578

[root@tara ~]# nsrclone -b "Big Clone" -v -S 4279265459/1259366578
5874:nsrclone: Automatically copying save sets(s) to other volume(s)
6216:nsrclone:
Starting cloning operation...
Nov 28 11:29:42 tara logger: NetWorker media: (waiting) Waiting for 1 writable volume(s)
to backup pool 'Big Clone' tape(s) or disk(s) on tara.pmdg.lab
5884:nsrclone: Successfully cloned all requested save sets
5886:nsrclone: Clones were written to the following volume(s):
 BIG998S3

[root@tara ~]# mminfo -q "ssid=4279265459" -r validcopies
 0

[root@tara ~]# mminfo -q "ssid=4279265459" -r copies,validcopies
 copies validcopies
 3          3
 3          3
 3          3

In the above instance, if we query just by the saveset ID for the number of valid copies, NetWorker happily tells us “0”. If we query for copies and validcopies, we get 3 of each.

So, what does this say to me? Steer away from ‘validcopies’ until it’s fixed.

(On a side note, why does the offsite parameter remain Write Only? We can’t query it through mminfo, and I’ve had an RFE in since the day the offsite option was introduced into nsrmm. Why this is “hard” or taking so long is beyond me.)

Posted in Features, NetWorker, Scripting | Tagged: , , , | Comments Off on Validcopies hazardous to your sanity

Manually Staging? Don’t forget the Clone ID!

Posted by Preston on 2009-11-13

Something that continues to periodically come up is the need to remind people running manual staging to ensure they specify both the SSID and the Clone ID when they stage. I did some initial coverage of this when I first started the blog, but I wanted to revisit and demonstrate exactly why this is necessary.

The short version of why is simple: If you stage by SSID alone, NetWorker will delete/purge all instances of the saveset other than the one you just created. This is Not A Good Thing for 99.999% of what we do within NetWorker.

So to demonstrate, here’s a session where I:

  1. Generate a backup
  2. Clone the backup to tape
  3. Stage the saveset only to tape

In between each step, I’ll run mminfo to get a dump of what the media database says about saveset availability.

Part 1 – Generate the Backup

Here’s a very simple backup for the purposes of this demonstration, and the subsequent mminfo command to find out about the backup:

[root@tara ~]# save -b Default -LL -q /etc
save: /etc  106 MB 00:00:07   2122 files
completed savetime=1258093549

[root@tara ~]# mminfo -q "client=tara.pmdg.lab,name=/etc" -r volume,ssid,cloneid,
savetime
 volume        ssid          clone id  date
Default.001    2600270829  1258093549 11/13/2009
Default.001.RO 2600270829  1258093548 11/13/2009

There’s nothing out of the ordinary here, so we’ll move onto the next step.

Part 2 – Clone the Backup

We’ll just do a manual clone to the Default Clone pool. Here we’ll specify the saveset ID alone, which is fine for cloning – but it’s often what leads people into the habit of not specifying a particular saveset instance. I’m using very small VTL tapes, so don’t be worried that in this case I’ve got a clone of /etc spanning 3 volumes:

[root@tara ~]# nsrclone -b "Default Clone" -S 2600270829
[root@tara ~]# mminfo -q "client=tara.pmdg.lab,name=/etc" -r volume,ssid,cloneid,
savetime
 volume        ssid          clone id  date
800843S3       2600270829  1258094164 11/13/2009
800844S3       2600270829  1258094164 11/13/2009
800845S3       2600270829  1258094164 11/13/2009
Default.001    2600270829  1258093549 11/13/2009
Default.001.RO 2600270829  1258093548 11/13/2009

As you can see there, it’s all looking fairly ordinary at this point – nothing surprising is going on at all.

Part 3 – Stage by Saveset ID Only

In this next step, I’m going to stage by saveset ID alone rather than by the saveset ID/clone ID combination (which is the correct way of staging), so as to demonstrate what happens at the conclusion of the staging. I’ll be staging to a pool called “Big”:

[root@tara ~]# nsrstage -b Big -v -m -S 2600270829
Obtaining media database information on server tara.pmdg.lab
Parsing save set id(s)
Migrating the following save sets (ids):
 2600270829
5874:nsrstage: Automatically copying save sets(s) to other volume(s)

Starting migration operation...
Nov 13 17:34:00 tara logger: NetWorker media: (waiting) Waiting for 1 writable 
volume(s) to backup pool 'Big' disk(s) or tape(s) on tara.pmdg.lab
5884:nsrstage: Successfully cloned all requested save sets
5886:nsrstage: Clones were written to the following volume(s):
 BIG991S3
6359:nsrstage: Deleting the successfully cloned save set 2600270829
Successfully deleted original clone 1258093548 of save set 2600270829 
from media database.
Successfully deleted AFTD's companion clone 1258093549 of save set 2600270829 
from media database with 0 retries.
Successfully deleted original clone 1258094164 of save set 2600270829 
from media database.
Recovering space from volume 4294740163 failed with the error 
'Cannot access volume 800844S3, please mount the volume or verify its label.'.
Refer to the NetWorker log for details.
6330:nsrstage: Cannot access volume 800844S3, please mount the volume 
or verify its label.
Completed recover space operation for volume 4177299774
Refer to the NetWorker log for any failures.
Recovering space from volume 4277962971 failed with the error 
'Cannot access volume 800845S3, please mount the volume or verify its label.'.
Refer to the NetWorker log for details.
6330:nsrstage: Cannot access volume 800845S3, please mount the volume 
or verify its label.
Recovering space from volume 16550059 failed with the error 
'Cannot access volume 800843S3, please mount the volume or verify its label.'.
Refer to the NetWorker log for details.
6330:nsrstage: Cannot access volume 800843S3, please mount the volume 
or verify its label.

You’ll note there’s a bunch of output there about being unable to access the clone volumes the saveset was previously cloned to. When we then check mminfo, though, we see the consequences of the staging operation:

[root@tara ~]# mminfo -q "client=tara.pmdg.lab,name=/etc" -r volume,ssid,cloneid,
savetime
 volume        ssid          clone id  date
BIG991S3       2600270829  1258095244 11/13/2009

As you can see – no reference to the clone volumes at all!

Now, has the clone data been erased? No, but it has been removed from the media database, meaning you’d have to manually scan the volumes back in before you could use them again. Worse, if those volumes only contained clone data that was subsequently removed from the media database, they may become eligible for recycling and get re-used before you notice what has gone wrong!

Wrapping Up

Hopefully the above session will have demonstrated the danger of staging by saveset ID alone. If instead of staging by saveset ID we staged by saveset ID and clone ID, we’d have had a much more desirable outcome. Here’s a (short) example of that:

[root@tara ~]# save -b Default -LL -q /tmp
save: /tmp  2352 KB 00:00:01     67 files
completed savetime=1258094378
[root@tara ~]# mminfo -q "name=/tmp" -r volume,ssid,cloneid
 volume        ssid          clone id
Default.001    2583494442  1258094378
Default.001.RO 2583494442  1258094377
[root@tara ~]# nsrclone -b "Default Clone" -S 2583494442

[root@tara ~]# mminfo -q "name=/tmp" -r volume,ssid,cloneid
 volume        ssid          clone id
800845S3       2583494442  1258095244
Default.001    2583494442  1258094378
Default.001.RO 2583494442  1258094377
[root@tara ~]# nsrstage -b Big -v -m -S 2583494442/1258094377
Obtaining media database information on server tara.pmdg.lab
Parsing save set id(s)
Migrating the following save sets (ids):
 2583494442
5874:nsrstage: Automatically copying save sets(s) to other volume(s)

Starting migration operation...

5886:nsrstage: Clones were written to the following volume(s):
 BIG991S3
6359:nsrstage: Deleting the successfully cloned save set 2583494442
Successfully deleted original clone 1258094377 of save set 2583494442 from 
media database.
Successfully deleted AFTD's companion clone 1258094378 of save set 2583494442 
from media database with 0 retries.
Completed recover space operation for volume 4177299774
Refer to the NetWorker log for any failures.

[root@tara ~]# mminfo -q "name=/tmp" -r volume,ssid,cloneid
 volume        ssid          clone id
800845S3       2583494442  1258095244
BIG991S3       2583494442  1258096324

The recommendation that I always make is that you forget about using saveset IDs alone unless you absolutely have to. Instead, get yourself into the habit of always specifying a particular instance of a saveset ID via the “ssid/cloneid” option. That way, if you do any manual staging, you won’t wipe out access to data!
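
If you want to turn that habit into a script, here’s a minimal sketch – the volume and pool names are the examples used in this post, so substitute your own – that asks mminfo for the ssid/cloneid pairs on an ADV_FILE volume and stages each instance explicitly:

#!/bin/sh
# Sketch: stage every saveset instance on the nominated ADV_FILE volume by
# explicit ssid/cloneid, so that clone copies elsewhere stay in the media
# database. Volume and pool names below are examples only.
VOLUME="Default.001.RO"
POOL="Big"
for SPEC in `mminfo -q "volume=$VOLUME" -r "ssid,cloneid" | awk 'NR>1 {print $1 "/" $2}'`
do
    nsrstage -b "$POOL" -v -m -S "$SPEC"
done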

Posted in Basics, NetWorker, Scripting | Tagged: , , , , , | 2 Comments »

Laptop/Desktop Backups as easy as 1-2-3!

Posted by Preston on 2009-10-23

When I first mentioned probe based backups a while ago, I suggested that they’re going to be a bit of a sleeper function – that is, I think they’re being largely ignored at the moment because people aren’t quite sure how to make use of them. My take however is that over time we’re going to see a lot of sites shifting particular backups over to probe groups.

Why?

Currently a lot of sites shoe-horn ill-fitting backup requirements into rigid schedules. This results in frequent violations of the best-practice approach to backup: Zero Error Policies. Here’s a prime example: at sites that need to do laptop and/or desktop backups using NetWorker, the administrators are basically resigned to failure rates in such groups of 50% or more, depending on how many machines are not connected to the network at the time.

This doesn’t need to be the case – well, not any more thanks to probe based backups. So, if you’ve been scratching your head looking for a practical use for these backups, here’s something that may whet your appetite.

Scenario

Let’s consider a site where there is a group of laptops and desktops that are integrated into the NetWorker backup environment. However, there’s never a guarantee of which machines may be connected to the network at any given time. Therefore administrators typically configure laptop/desktop backup groups to start at, say, 10am, on the premise that this is when the most systems are likely to be available.

Theory of Resolution

Traditional time-of-day start backups aren’t really appropriate to this scenario. What we want is a situation where the NetWorker server waits for those infrequently connected clients to be connected, then runs a backup at the next opportunity.

Rather than having a single group for all clients and accepting that the group will suffer significant failure rates, split each irregularly connected client into its own group, and configure a backup probe.

The backup server will probe the configured clients at regular intervals during nominated periods of the day/night. When a client is connected to the network and the probe successfully reports that (a) the client is running and (b) a backup should be done, the backup is started on the spot.

Requirements

In order to get this working, we’ll need the following:

  • NetWorker 7.5 or higher (clients and server)
  • A probe script – one per operating system type
  • A probe resource – one per operating system type
  • A 1:1 mapping between clients of this type and groups.

Practical Application

Probe Script

This is a command which is installed on the client(s), in the same directory as the “save” or “save.exe” binary (depending on OS type), and whose name must start with either nsr or save. I’ll be calling my script:

nsrcheckbackup.sh

I don’t write Windows batch scripts. Therefore, I’ll give an example as a Linux/Unix shell script, with an overview of the program flow. Anyone who wants to write a batch script version is welcome to do so and submit it.

The “proof of concept” algorithm for the probe script works as follows:

  • Establish a “state” directory in the client nsr directory called bckchk. I.e., if the directory doesn’t exist, create it.
  • Establish a “README” file in that directory for reference purposes, if it doesn’t already exist.
  • Determine the current date.
  • Check for a previous date file. If there was a previous date file:
    • If the current date equals the previous date found:
      • Write a status file indicating that no backup is required.
      • Exit, signaling that no backup is required.
    • If the current date does not equal the previous date found:
      • Write the current date to the “previous” date file.
      • Write a status file indicating that the current date doesn’t match the “previous” date, so a new backup is required.
      • Exit, signaling that a backup is required.
  • If there wasn’t a previous date file:
    • Write the current date to the “previous” date file.
    • Write a status file indicating that no previous date was found so a backup will be signaled.
    • Exit, signaling backup should be done.

Obviously, this is a fairly simplistic approach, but it’s suitable for a proof of concept demonstration. If you wish to make the logic more robust for production deployment, my first suggestion would be to build in mminfo checks to determine, even if the dates match, whether there has actually been a backup “today”. If there hasn’t, that would override and force a backup to start. Additionally, if users can connect via VPN and the backup server can communicate with connected clients, you may want to introduce some logic into the script to deny probe success over the VPN.
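
As a sketch of that first suggestion – assuming mminfo is installed with the client package and that “nox” is your backup server – the probe script could ask the server whether the client already has a saveset from today before signalling a backup:

#!/bin/sh
# Hedged sketch of the mminfo check suggested above: if the backup server
# (assumed here to be "nox") already holds a saveset created today for this
# client, signal "no backup required" (exit 1); otherwise signal a backup.
if mminfo -s nox -q "client=`hostname`,savetime>=today" -r ssid 2>/dev/null | grep -q '[0-9]'
then
    exit 1
fi
exit 0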

If you were wanting an OS-independent script for this, you might wish to code it in Perl, but I’ve held off doing that in this case simply because a lot of sites have reservations about installing Perl on Windows systems. (Sigh.)

Without any further guff, here’s the sample script:

preston@aralathan ~
$ cat /usr/sbin/nsrcheckbackup.sh
#!/bin/bash 
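#
# Probe script for NetWorker probe based backups (see the algorithm above).
# Exit status 0 tells NetWorker a backup is required; exit status 1 means
# no backup is required at this time.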

PATH=$PATH:/bin:/sbin:/usr/sbin:/usr/bin
CHKDIR=/nsr/bckchk

README=`cat <<EOF
==== Purpose of this directory ====

This directory holds state file(s) associated with the probe based
laptop/desktop backup system. These state file(s) should not be
deleted without consulting the backup administrator.
EOF
`

if [ ! -d "$CHKDIR" ]
then
   mkdir -p "$CHKDIR"
fi

if [ ! -f "$CHKDIR/README" ]
then
   echo $README > "$CHKDIR/README"
fi

DATE=`date +%Y%m%d`
COMPDATE=`date "+%Y%m%d %H%M%S"`
LASTDATE="none"
STATUS="$CHKDIR/status.txt"
CHECK="$CHKDIR/datecheck.lck"

if [ -f "$CHECK" ]
then
   LASTDATE=`cat $CHKDIR/datecheck.lck`
else
   echo $DATE > "$CHECK"
   echo "$COMPDATE Check file did not exist. Backup required" > "$STATUS"
   exit 0
fi

if [ -z "$LASTDATE" ]
then
   echo "$COMPDATE Previous check was null. Backup required" > "$STATUS"
   echo $DATE > "$CHECK"
   exit 0
fi

if [ "$DATE" = "$LASTDATE" ]
then
   echo "$COMPDATE Last backup was today. No action required" > "$STATUS"
   exit 1
else
   echo "$COMPDATE Last backup was not today. Backup required" > "$STATUS"
   echo $DATE > "$CHECK"
   exit 0
fi

As you can see, there’s really not a lot to this in the simplest form.

Once the script has been created, it should be made executable and (for Linux/Unix/Mac OS X systems), be placed in /usr/sbin.

Probe Resource

The next step is, within NetWorker, to create a probe resource. This will be shared by all probe clients of the same operating system type.

A completed probe resource might resemble the following:

Configuring the probe resource

Note that there’s no path in the above probe command – that’s because NetWorker requires the probe command to be in the same location as the save command.
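
If you prefer the command line, the probe resource can also be created via nsradmin. The sketch below is from memory – the resource name is made up, and you should verify the attribute names against an existing NSR probe resource on your own server first:

# Rough nsradmin sketch - attribute names from memory, verify before use.
nsradmin -s nox <<EOF
create type: NSR probe; name: Daily Backup Check; probe command: nsrcheckbackup.sh
EOF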

Once this has been done, you can either configure the client or the probe group next. Since the client has to be reconfigured after the probe group is created, we’ll create the probe group first.

Creating the Probe Groups

First step in creating the probe groups is to come up with a standard so that they can be easily identified in relation to all other standard groups within the overall configuration. There are two approaches you can take towards this:

  • Preface each group name with a keyword (e.g., “probe”) followed by the host name the group is for.
  • Name each group after the client that will be in the group, but set a comment along the lines of say, “Probe Backup for <hostname>”.

Personally, I prefer the second option. That way you can sort by comment to easily locate all probe based groups but the group name clearly states up front which client it is for.

When creating a new probe based group, there are two tabs you’ll need to configure – Setup and Advanced – within the group configuration. Let’s look at each of these:

Probe group configuration – Setup Tab

You’ll see from the above that I’m using the convention where the group name matches the client name, and the comment field is configured appropriately for easy differentiation of probe based backups.

You’ll need to set the group’s Autostart value to Enabled. Also, the Start Time field is relevant exactly once for probe based backups – it still seems to define the time of the very first probe. After that, the probe backups will follow the interval and start/finish times defined on the second tab.

Here’s the second tab:

Probe Group - Advanced Tab

The key thing on this obviously is the configuration of the probe section. Let’s look at each option:

  • Probe based group – Checked
  • Probe interval – Set in minutes. My recommendation is to give each group a different number of minutes. (Or at least reduce the number of groups that have exactly the same probe interval.) That way, over time as probes run, there’s less likelihood of multiple groups starting at the same time. For instance, in my test setup, I have 5 clients, set to intervals of 90 minutes, 78 minutes, 104 minutes, 82 minutes and 95 minutes*.
  • Probe start time – Time of day that probing starts. I’ve left this on the defaults, which may be suitable for desktops, but for laptops, where there’s a very high chance of machines being disconnected at night, you may wish to start probing closer to the start of business hours.
  • Probe end time – Time of day that NetWorker stops probing the client. Same caveats as per the probe start time above.
  • Probe success criteria – Since there’s only one client per group, you can leave this at all.
  • Time since successful backup – How many days NetWorker should allow probing to run unsuccessfully before it forcibly sets a backup running. If set to zero, it will never force a backup to run. Since taking the screenshot I’ve actually changed that value, setting it to 3 on my configured clients. Set yours to a site-optimal value. Note that since the aim is to run only one backup every 24 hours, setting this to “1” is probably not all that logical an idea.

(The last field, “Time of the last successful backup” is just a status field, there’s nothing to configure there.)

If you have schedules enforced out of groups, you’ll want to set the schedule up here as well.

With this done, we’re ready to move onto the client configuration!

Configuring the Client for Probe Backups

There are two changes required here. In the General tab of the client properties, move the client into the appropriate group:

Adding the client to the correct group

In the “Apps & Modules” tab, identify the probe resource to be used for that client:

Configuring the client probe resource

Once this has been done, you’ve got everything configured, and it’s just a case of sitting back and watching the probes run and trigger backups of clients as they become available. You’ll note, in the example above, that you can still use savepnpc (pre/post commands) with clients that are configured for probe backups. The pre/post commands will only be run if the backup probe confirms that a backup should take place.

Wrapping Up

I’ll accept that this configuration can result in a lot of groups if you happen to have a lot of clients that require this style of backup. However, that isn’t the end of the world. Reducing the number of errors reported in savegroup completion notifications does make the life of backup administrators easier, even if there’s a little administrative overhead.

Is this suitable for all types of clients? E.g., should you use this to shift away from standard group based backups for the servers within an environment? The answer to that is: almost certainly not. I really do see this as something that is more suitable for companies that are using NetWorker to backup laptops and/or desktops (or a subset thereof).

If you think no-one does this, I can think of at least five of my customers alone who have requirements to do exactly this, and I’m sure they’re not unique.

Even if you don’t particularly need to enact this style of configuration for your site, what I’m hoping is that by demonstrating a valid use for probe based backup functionality, I may get you thinking about where it could be used at your site for making life easier.

Here’s a few examples I can immediately think of:

  • Triggering a backup+purge of Oracle archived redo logs that kicks in once the used capacity of the filesystem the logs are stored on exceeds a certain percentage (e.g., 85%).
  • Triggering a backup when the number of snapshots of a fileserver exceeds a particular threshold.
  • Triggering a backup when the number of logged in users falls below a certain threshold. (For example, on development servers.)
  • Triggering a backup of a database server whenever a new database is added.

Trust me: probe based backups are going to make your life easier.


* There currently appears to be a “feature” with probe based backups where changes to the probe interval only take place after the next “probe start time”. I need to do some more review on this and see whether it’s (a) true and (b) warrants logging a case.

Posted in Backup theory, NetWorker, Policies, Scripting | Tagged: , , , | Comments Off on Laptop/Desktop Backups as easy as 1-2-3!

How much aren’t you backing up?

Posted by Preston on 2009-10-05

Do you have a clear picture of everything that you’re not backing up? For many sites, the answer is not as clear cut as they may think.

It’s easy to quantify the simple stuff – QA or test servers/environments that literally aren’t configured within the backup environment.

It’s also relatively easy to quantify the more esoteric things within a datacentre – PABXs, switch configurations, etc. (Though in a well run backup environment, there’s no reason why you can’t configure scripts that, as part of the backup process, log onto such devices and retrieve the configuration, etc.)

It should also be very, very easy to quantify what data on any individual system that you’re not backing up – e.g., knowing that for fileservers you may be backing up everything except for files that have a “.mp3” extension.

What most sites find difficult to quantify is the quasi-backup situations – files and/or data that they are backing up, but which is useless in a recovery scenario. Now, many readers of that last sentence will probably think of one of the more immediate examples: live database files that are being “accidentally” picked up in the filesystem backup (even if they’re being backed up elsewhere, by a module). Yes, such a backup does fall into this category, but there are other types of backups which are even less likely to be considered.

I’m talking about information that you only need during a disaster recovery – or worse, a site disaster recovery. Let’s consider an average Unix (or Linux) system. (Windows is no different, I just want to give some command line details here.) If a physical server goes up in smoke, and a new one has to be built, there’s a couple of things that have to be considered pre-recovery:

  • What was the partition layout?
  • What disks were configured in what styles of RAID layout?

In an average backup environment, this sort of information isn’t preserved. Sure, if you’ve got, say, HomeBase licenses (taking the EMC approach), or are using some other sort of bare metal recovery system, and that system supports your exact environment*, then you may find that such information is preserved and is available.

But what about the high percentage of cases where it’s not?

This is where the backup process needs to be configured/extended to support generation of system or disaster recovery information. It’s all very well, for instance, to say that for a Linux machine you can just recover “/etc/fstab”, but what if you can’t remember the size of the partitions referenced by that file system table? Or, what if you aren’t there to remember what the size of the partitions were? (Memory is a wonderful yet entirely fallible and human-dependent process. Disaster recovery situations shouldn’t be bound by what we can or can’t remember about the systems, and so we have to gather all the information required to support disaster recovery.)

On a running system, there’s all sorts of tools available to gather this sort of information, but when the system isn’t running, we can’t run the tools, so we need to run them in advance, either as part of the backup process or as a scheduled, checked-upon function. (My preference is to incorporate it into the backup process.)

For instance, consider that Linux scenario – we can quickly assemble the details of all partition sizes on a system with one simple command – e.g.:

[root@nox ~]# fdisk -l

Disk /dev/sda: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

 Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1        2089    16779861   fd  Linux raid autodetect
/dev/sda2            2090        2220     1052257+  82  Linux swap / Solaris
/dev/sda3            2221       19457   138456202+  fd  Linux raid autodetect
/dev/sda4           19458      121601   820471680    5  Extended
/dev/sda5           19458       19701     1959898+  82  Linux swap / Solaris
/dev/sda6           19702      121601   818511718+  fd  Linux raid autodetect

Disk /dev/sdb: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

 Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1         250     2008093+  82  Linux swap / Solaris
/dev/sdb2             251      121601   974751907+  83  Linux

Disk /dev/sdc: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

 Device Boot      Start         End      Blocks   Id  System
/dev/sdc1               1      121601   976760001   83  Linux

Disk /dev/sdd: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

 Device Boot      Start         End      Blocks   Id  System
/dev/sdd1   *           1        2089    16779861   fd  Linux raid autodetect
/dev/sdd2            2090        2220     1052257+  82  Linux swap / Solaris
/dev/sdd3            2221       19457   138456202+  fd  Linux raid autodetect
/dev/sdd4           19458      121601   820471680    5  Extended
/dev/sdd5           19458       19701     1959898+  82  Linux swap / Solaris
/dev/sdd6           19702      121601   818511718+  fd  Linux raid autodetect

That wasn’t entirely hard. Scripting that to occur at the start of the backup process isn’t difficult either. For systems that have RAID, there’s another, equally simple command to extract RAID layouts as well – again, for Linux:

[root@nox ~]# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sda3[0] sdd3[1]
 138456128 blocks [2/2] [UU]

md2 : active raid1 sda6[0] sdd6[1]
 818511616 blocks [2/2] [UU]

md0 : active raid1 sda1[0] sdd1[1]
 16779776 blocks [2/2] [UU]

unused devices: <none>

I don’t want to consume reams of pages discussing what you should be gathering for each operating system. The average system administrator for any individual platform should, with a cup of coffee (or other preferred beverage) in hand, be able to sit down and in under 10 minutes jot down the sorts of information that would need to be gathered in advance of a disaster to assist in the total rebuild of a machine they administer.

Once these information gathering steps have been determined, they can be inserted into the backup process as a pre-backup command. (In NetWorker parlance, this would be via a savepnpc “pre” script. Other backup products will equally feature such options.) Once the information is gathered, a copy should be kept on the backup server as well as in an offsite location. (I’ll give you a useful cloud backup function now: it’s called Google Mail. Great for offsiting bootstraps and system configuration details.)
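
As a minimal, concrete sketch for Linux based on the commands above, a savepnpc “pre” command could call something like the following, writing the results into a directory that the backup already covers (the output path here is just an example):

#!/bin/sh
# Minimal DR information gatherer (sketch) - call it from a savepnpc "pre"
# command so the output is refreshed before each backup. The output path is
# an example; put it somewhere that is included in the backup.
DRDIR=/etc/dr-info
mkdir -p "$DRDIR"
fdisk -l > "$DRDIR/fdisk-l.txt" 2>&1
cat /proc/mdstat > "$DRDIR/mdstat.txt" 2>&1
df -k > "$DRDIR/df-k.txt" 2>&1
cp -p /etc/fstab "$DRDIR/fstab.copy"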

When it comes to disaster recovery, such information can take the guess work or reliance on memory out of the equation, allowing a system or backup administrator in any (potentially sleep-deprived) state, with any level of knowledge about the system in question, to conduct the recovery with a much higher degree of certainty.


* Due to what they offer to do, bare metal recovery (BMR) products tend to be highly specific in which operating system variants, etc., they support. In my experience a significantly higher number of sites don’t use BMR than do.

Posted in Architecture, Backup theory, Linux, NetWorker, Policies, Scripting | Tagged: , , , , | 2 Comments »

Using NetWorker for Live Search

Posted by Preston on 2009-10-01

There was recently a discussion on the NetWorker mailing list regarding a situation whereby a company was skipping certain media files (e.g., *.mp3) from a fileserver, but still wanted to know when those files were present. In this case, the backup administrator didn’t have administrative rights to the fileserver, so doing a straight search of the fileserver wasn’t really an option.

One proposed solution was to change the “skip” to “null”, which does cause some index information to be stored that can be searched with nsrinfo. How useful that “some” is, though, is debatable. The reason is that if you “null” a file in a backup, it will only be reported via an nsrinfo -v command, and it won’t be reported with the full path to the file – meaning it’s necessary to walk the nsrinfo -v output, watching for each change of path, and construct the null’d file paths from there.

As an exercise for the list, I tried this out – here’s what I reported at the time:

Creating the test area we want to test with using the following commands –

# mkdir /testing
# cp bigfile /testing
# cd /testing
# dd if=/dev/zero bs=1024k count=1024 of=test.dat

Following this, configure a /testing/.nsr directive with the following content:

<< . >>
null: test.dat

Now a backup can be run of the “/testing” directory; because of the directive, “test.dat” will be excluded.

Finding test.dat in nsrinfo output however is a little more tricky:

[root@nox testing]# nsrinfo -t `mminfo -q "name=/testing" -r nsavetime` nox
scanning client `nox' for savetime 1254265148(Wed 30 Sep 2009 08:59:08 AM EST)
from the backup namespace
/testing/.nsr
/testing/bigfile
/testing/
/
4 objects found

There’s no test.dat listed there. Resorting to ‘-v’ on nsrinfo, we do get the information, but as you can see, it’s more challenging to isolate the full path to the file:

[root@nox testing]# nsrinfo -t `mminfo -q "name=/testing" -r nsavetime` -v nox
scanning client `nox' for savetime 1254265148(Wed 30 Sep 2009 08:59:08 AM EST) from the
backup namespace
UNIX ASDF v2 file `/testing/.nsr', NSR size=188, fid = 2304.586641, file size=23
UNIX ASDF v2 file `/testing/bigfile', NSR size=196195024, fid = 2304.97665, file
 size=196176812
UNIX ASDF v2 file `/testing/', NSR size=244, fid = 2304.97664, file size=4096
 ndirentry->586641    .nsr
 ndirentry->97669    test.dat          <----- Here it is ------>
 ndirentry->2    ..
 ndirentry->97665    bigfile
UNIX ASDF v2 file `/', NSR size=772, fid = 2304.2, file size=4096
 ndirentry->3677476    .vmware/
 ndirentry->846145    mnt/
 ndirentry->97633    .autorelabel
 ndirentry->11    lost+found/
 ndirentry->2343169    lib64/
 ndirentry->2701153    media/
 ndirentry->1985185    opt/
 ndirentry->2668609    etc/
 ndirentry->1431937    sbin/
 ndirentry->4133089    srv/
 ndirentry->618337    boot/
 ndirentry->97638    .bash_history
 ndirentry->1366849    bin/
 ndirentry->2766241    selinux/
 ndirentry->3417121    tmp/
 ndirentry->97644    .autofsck
 ndirentry->585793    root/
 ndirentry->1692289    lib/
 ndirentry->97647    nsr
 ndirentry->0    sys/
 ndirentry->97648    home
 ndirentry->1887553    usr/
 ndirentry->2147905    var/
 ndirentry->2245537    d/
 ndirentry->0    dev/
 ndirentry->0    net/
 ndirentry->0    misc/
 ndirentry->0    proc/
 ndirentry->2    ..
 ndirentry->97664    testing/
4 objects found

So while this solution works, I’m not convinced it’s ideal in all instances.

The other solution I came up with works, I think, a little better and more reliably, and has the advantage of doing a live search on the filesystem even if the backup administrator doesn’t have administrator privileges.

So, the other solution is to make use of the RUSER/RCMD functionality in NetWorker to “have a chat” to the client daemons and get them to do something useful. Note that this is reasonably secure – you can only ask to run commands starting with “nsr” or “save”, and those commands must reside in the same directory as the save binary. In this case, we want to invoke save. All you have to do is turn off whatever directives are in place for the client, then set up a no-save command execution for the client.

In the example below, we’re going to get the client “asgard” to do a no-save backup of its /root folder, reporting back to the server “nox” but without actually transferring any data:

[root@nox testing]# RCMD="save -s nox -n /root"
[root@nox testing]# RUSER=root
[root@nox testing]# export RCMD RUSER
[root@nox testing]# nsrexec -c asgard
Warning: Could not determine job id: Connection timed out. Continuing ...
/root/.elinks/globhist
/root/.elinks/cookies
/root/.elinks/gotohist
/root/.elinks/
/root/.bash_profile
/root/.tcshrc
/root/aralathan.pub
/root/.my.cnf
/root/anaconda-ks.cfg
/root/nmsql521_win_x86.zip
/root/nw_linux_x86.tar.gz
<snip>
32477:(pid 29247):
save: /root 155 records 32 KB header 855 MB data

save: /root 855 MB estimated

This style of command will work equally for Windows and Unix systems – indeed, I’ve done similar things on both Windows and Unix.

Obviously, once you’re done gathering the file list, it’s important to then re-enable any directives turned off for the test/file walk.
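
Putting the pieces together for the original question – finding the skipped media files – a sketch might look like the following; the path, server and client names are examples only:

# Sketch: list .mp3 files on the remote fileserver "asgard" by running a
# no-save estimate via nsrexec and filtering the file list it prints.
# Disable the skip directives first, and re-enable them afterwards.
RUSER=root
RCMD="save -s nox -n /shares/data"
export RUSER RCMD
nsrexec -c asgard | grep -i '\.mp3$'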

Posted in NetWorker, Scripting | Tagged: , , , , | Comments Off on Using NetWorker for Live Search

Of cascading failures and the need to test

Posted by Preston on 2009-07-22

Over at Daily WTF, there’s a new story that has two facets of relevance to any backup administrator. Titled “Bourne Into Oblivion”, the key points for a backup administrator are:

  • Cascading failures.
  • Test, test, test.

In my book, I discuss both the implications of cascading failures and the need to test within a backup environment. Indeed, my ongoing attitude is that if you want to assume something about an untested backup, assume it’s failed. (Similarly, if you want to make an assumption about an unchecked backup, assume it failed too.)

While normally in backup, cascading failures come down to situations such as “the original failed, and the clone failed too”, this article points out a more common form of data loss through cascading failures –  the original failure coupled with backup failure.

In the article, a shell script includes the line:

rm -rf $var1/$var2

Any long-term Unix user will shudder to think of what can happen with the above script. (I’d hazard a guess that a lot of Unix users have themselves written scripts such as the above, and suffered the consequences. What we can hope for in most situations is that we do it on well backed up personal systems rather than corporate systems with inadequate data protection!)

Something I’ve seen in several sites however is the unfortunate coupling of the above shell script with the execution of said script on a host that has read/write network mounted a host of other filesystems across the corporate network. (Indeed, the first system administration group I ever worked with told me a horror story about a script with a similar command run from a system with automounts enabled under /a.)

The net result in the story at Daily WTF? Most of a corporate network wiped out by a script run with the above command where a new user hadn’t populated either $var1 or $var2, making the script instead:

rm -rf /

You could almost argue that there’s already been a cascading failure in the above – allowing scripts to be written that have the potential for that much data loss and allowing said scripts to be run on systems that mount many other systems.
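
For what it’s worth, guarding against the empty variable case is only a one line change; here’s a minimal sketch:

#!/bin/sh
# Minimal defensive sketch: ${var:?message} aborts the script with an error
# if the variable is unset or empty, so the command can never collapse down
# to "rm -rf /".
rm -rf "${var1:?var1 not set}/${var2:?var2 not set}"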

The true cascading failure however was that the backup media was unusable, having been repeatedly overwritten rather than replaced. Whether this meant that the backups ran after the above incident, or that the backups couldn’t recover all required data (e.g., running an incremental on top of a tape with a previous incremental on top of a tape with a previous full, each time overwriting the previous data), or that the tapes were literally unusable due to high overuse (or indeed, all 3), the results were the same – data loss coupled with recovery loss.

With backups not being tested periodically, such errors (in some form) can creep into any environment. Obviously in the case in this article, there’s also the problem that either (a) procedures were not established regarding rotation of media or (b) procedures were not followed.

The short of it: any business that collectively thinks that either formalisation of backup processes or the rigorous checking of backups is unnecessary is just asking for data loss.

Posted in Policies, Scripting | Tagged: , | Comments Off on Of cascading failures and the need to test

Quibbles – A better NetWorker init script would be nice…

Posted by Preston on 2009-07-16

Coming primarily from a Unix background, I’ve remained disappointed for 10+ years that NetWorker’s init script has barely changed in that time. Or rather, the only things that have really changed in the script are checks for additional software – e.g., Legato License Manager.

It’s frustrating, in a pesky sort of way, that in all this time the engineers at EMC have never bothered to implement a restart function within the script.

For just about every Unix platform NetWorker runs on, the only arguments that /etc/init.d/networker takes are:

  • stop – stop the NetWorker services
  • start – start the NetWorker services

That’s it. Once, a long time ago, I got frustrated, and hacked on the NetWorker init script to include a restart option, one that worked with the following logic (a rough sketch of which appears after the list):

  1. Issued a stop command.
  2. Waited 30 seconds.
  3. Checked to see if there were any NetWorker daemons still running.
  4. If there were NetWorker daemons still running, warn the user and abort.
  5. If there were no NetWorker daemons still running, issue a start command.
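
For reference, here’s that rough standalone sketch of the hack; the 30 second wait comes from the list above, and the daemon check is deliberately simplistic:

#!/bin/sh
# Rough sketch of the restart logic described above, as a wrapper script.
/etc/init.d/networker stop
sleep 30
if ps -e | grep -q '[n]sr'
then
    echo "NetWorker daemons still running; aborting restart." >&2
    exit 1
fi
/etc/init.d/networker start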

Over time, I got tired of inserting this hack back into the init scripts after every upgrade or reinstall, and in time even gave up keeping it around. Call it laziness or apathy on my part, but whatever you want to label it, the same applies to first Legato, then EMC engineering for not adding this absolute basic and practically expected functionality after all these years.

Is there an RFE for this? I don’t know, but logically there should be no need for one. As systems have matured, the restart option has effectively become a default/expected setting on most platforms for init scripts. NetWorker has sadly lagged behind and still requires the administrator to manually run the stop and the start, one after the other.

A minor quibble, I know, but nevertheless a quibble.

Posted in NetWorker, Quibbles, Scripting | Tagged: , | 3 Comments »

Oracle channels and NetWorker parallelism

Posted by Preston on 2009-06-30

If you’re backing up Oracle with the NetWorker module/RMAN, there are an extremely large number of options you can choose from. RMAN, after all, is a complete backup/recovery system in and of itself, and so when you combine RMAN and NetWorker you, well, find yourself swimming in options.

One such option is the allocate channel command within RMAN. If you’ve not seen a basic RMAN script before, I should put one here for your reference:

connect target rman/supersecretpassword@DB10;

run {
 allocate channel t1 type 'SBT_TAPE';
 send 'NSR_ENV=(NSR_SAVESET_EXPIRATION=14 days,
       NSR_SERVER=nox,NSR_DATA_VOLUME_POOL=Daily)';

 backup format '/%d_%p_%t.%s/'
 (database);

 backup format '/%d_%p_%t_al.%s/'
 (archivelog from time 'SYSDATE-2');

 release channel t1;
}

You’ll note that one of the first commands used in the script is the allocate channel command. This effectively tells RMAN to open up a line of communication with NetWorker. Now, you can consider an RMAN channel to be a unit of parallelism in NetWorker parlance. Thus, if you want to backup (larger) databases with higher levels of parallelism, you need to allocate more channels.

In many NetWorker/Oracle scenarios, the NetWorker administrator has very little, if any, control over the construction and the configuration of the RMAN script. (The introduction of v5 of the module may change this.)

As a consequence, there’s often a reduced level of communication between the NetWorker administrator and the Oracle DBA which can result in reduced performance or scheduling conflicts. One particular issue that can occur though is that the Oracle DBA, eager to have the database backed up as quickly as possible, will throw a lot of allocate channel commands in. That little script above may become something such as say:

connect target rman/supersecretpassword@DB10;

run {

 allocate channel t1 type 'SBT_TAPE';
 allocate channel t2 type 'SBT_TAPE';
 allocate channel t3 type 'SBT_TAPE';
 allocate channel t4 type 'SBT_TAPE';
 allocate channel t5 type 'SBT_TAPE';
 allocate channel t6 type 'SBT_TAPE';
 allocate channel t7 type 'SBT_TAPE';
 allocate channel t8 type 'SBT_TAPE';

 send 'NSR_ENV=(NSR_SAVESET_EXPIRATION=14 days,
       NSR_SERVER=nox,NSR_DATA_VOLUME_POOL=Daily)';

 backup filesperset 4
 format '/%d_%p_%t.%s/'
 (database);

 backup format '/%d_%p_%t_al.%s/'
 (archivelog from time 'SYSDATE-2');

 release channel t1;
 release channel t2;
 release channel t3;
 release channel t4;
 release channel t5;
 release channel t6;
 release channel t7;
 release channel t8;
}

However, there’s a catch to allocating lots of channels – channel allocation has no bearing on, and is in no way limited by, NetWorker client parallelism. You see, the NetWorker client instance has a single saveset – the RMAN script name (or equivalent thereof, when using the Wizard in v5). Thus, to NetWorker, any Oracle client instance only has one saveset, and client parallelism therefore doesn’t affect the number of channels that can be allocated, but rather the number of simultaneous instances of the client that can be initiated.

The net result? Consider a client with a parallelism of 4 that has 6 databases to be backed up. This would have 6 client instances, one per database. Assuming they’re all in the same group*, then at any one time NetWorker will only allow 4 of those instances to be running. However, each instance, or each Oracle RMAN script, can start as many channels as it wants. If each RMAN script has been “tweaked” to allocate say, 8 channels like the above script example, this would mean that backing up 4 instances simultaneously would potentially see the client trying to send 32 savesets simultaneously to NetWorker.

Thus, if using multiple Oracle channels in RMAN backups with NetWorker, and particularly if backing up multiple Oracle databases simultaneously, it’s very important to have the NetWorker administrator and the DBA responsible for the RMAN scripts to communicate effectively and plan overall levels of parallelism/number of channels to avoid swamping the NetWorker server, swamping the network, or swamping the Oracle server.


* There are other considerations for starting multiple Oracle backups on the same machine and at the same time. In other words I’m not necessarily calling this best practice, just using an example.

Posted in NetWorker, Scripting | Tagged: , , , | Comments Off on Oracle channels and NetWorker parallelism

Basics – Important mminfo fields

Posted by Preston on 2009-05-27

Most NetWorker administrators with even a passing familiarity of mminfo will be aware of the “savetime” field, which reports when a saveset was created (i.e., when the backup was taken).

There are however some other fields that also provide additional date/time details about savesets, and knowing about them can be a real boon. Here’s a quick summary of the important date/time fields that provide information about savesets:

  • savetime – The time/date, on the client of the backup.
  • sscreate – The time/date on the server of the backup.
  • ssinsert – The time/date on the server of the last time the saveset was inserted into the media database.
  • sscomp – The time/date that the backup completed*.
  • ssaccess – The date/time that the backup was last accessed for backup or recovery purposes**.

Now, remembering that we can append, in the report specifications, a field length to any field, we can get some very useful information out of the media database for savesets. For instance, to see when the backups started and stopped for a volume, you might run:

[root@nox ~]# mminfo -q "volume=ISO_Archive.001" -r "name,
savetime(23),sscomp(23)"
 name                               date     time          ss completed
/d/03/share-a/ISO               05/06/2009 08:38:40 AM 05/06/2009 12:34:18 PM

So, not only do we have the date, but also the time of both the start and the finish of the backup.

To compare the client savetime with the server savetime, we’d use the sscreate field:

[root@nox ~]# mminfo -q "volume=ISO_Archive.001" -r "name,
savetime(23),sscreate(23)"
 name                               date     time           ss created
/d/03/share-a/ISO               05/06/2009 08:38:40 AM 05/06/2009 08:38:42 AM

Note that in this second example there was a 2 second skew between the backup server and the client at the time the backup was run.

I’ll leave ssinsert as an exercise to the reader – if you’ve got any recently scanned in savesets, give it a try and compare it against the output from sscreate and savetime.

However, moving on to the last field I mentioned, ssaccess, we get some very interesting results. Let’s see the output from:

[root@nox ~]# mminfo -q "volume=ISO_Archive.001,name=/d/03/share-a/ISO" -r "name,
savetime(23),ssaccess(23)"
 name                               date     time            ss access
/d/03/share-a/ISO               05/06/2009 08:38:40 AM 05/06/2009 08:42:25 AM
/d/03/share-a/ISO               05/06/2009 08:38:40 AM 05/06/2009 08:43:31 AM
/d/03/share-a/ISO               05/06/2009 08:38:40 AM 05/06/2009 08:46:00 AM
/d/03/share-a/ISO               05/06/2009 08:38:40 AM 05/06/2009 08:48:12 AM
/d/03/share-a/ISO               05/06/2009 08:38:40 AM 05/06/2009 08:49:04 AM
/d/03/share-a/ISO               05/06/2009 08:38:40 AM 05/06/2009 08:49:55 AM
/d/03/share-a/ISO               05/06/2009 08:38:40 AM 05/06/2009 08:54:10 AM
(snip)

Now, if you’ve been following the thread, the above doesn’t immediately appear to make sense. On that volume there’s only one saveset, so why are we suddenly getting entries for what appears to be multiple savesets? Well, they’re not multiple savesets – let’s try it again with SSID, rather than name:

[root@nox ~]# mminfo -q "volume=ISO_Archive.001,name=/d/03/share-a/ISO" -r "ssid,
savetime(23),ssaccess(23)"
 ssid           date     time            ss access
67158002    05/06/2009 08:38:40 AM 05/06/2009 08:42:25 AM
67158002    05/06/2009 08:38:40 AM 05/06/2009 08:43:31 AM
67158002    05/06/2009 08:38:40 AM 05/06/2009 08:46:00 AM
67158002    05/06/2009 08:38:40 AM 05/06/2009 08:48:12 AM
67158002    05/06/2009 08:38:40 AM 05/06/2009 08:49:04 AM
67158002    05/06/2009 08:38:40 AM 05/06/2009 08:49:55 AM
67158002    05/06/2009 08:38:40 AM 05/06/2009 08:54:10 AM
(snip)

An astute reader may think I’ve got some problem with my media database at this point – only one instance of a saveset can ever appear on the same volume, so the above looks like it simply shouldn’t happen.

Here’s where it gets really interesting though. NetWorker writes savesets in fragments, and each fragment of the saveset is generated and may be accessed separately – therefore, mminfo is reporting the access time for each fragment of the saveset. We can fully see this by expanding what we’re asking mminfo to report – including fragsize, mediafile and mediarec.

[root@nox 02]# mminfo -q "volume=ISO_Archive.001" -r "savetime(23),ssaccess(23),
fragsize,mediafile,mediarec"
 date     time            ss access          size file  rec
 05/06/2009 08:38:40 AM 05/06/2009 08:42:25 AM 1040 MB   2    0
 05/06/2009 08:38:40 AM 05/06/2009 08:43:31 AM 1040 MB   3    0
 05/06/2009 08:38:40 AM 05/06/2009 08:46:00 AM 1040 MB   4    0
 05/06/2009 08:38:40 AM 05/06/2009 08:48:12 AM 1040 MB   5    0
 05/06/2009 08:38:40 AM 05/06/2009 08:49:04 AM 1040 MB   6    0
 05/06/2009 08:38:40 AM 05/06/2009 08:49:55 AM 1040 MB   7    0

Now, the man page for mminfo says that the ssaccess time is updated for both backup and recovery operations, but despite various recovery tests I can’t yet get it to update. Despite this, it’s still useful – it allows us to tell how long each fragment took to back up, which lets us interrogate, at a later point, whether there were any pauses or significant delays in the data stream.
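
For instance, here’s a rough sketch of turning those per-fragment access times into the gaps between successive fragments; it assumes GNU date for the timestamp parsing, and uses the example volume from above:

# Sketch: print the elapsed seconds between successive fragment access
# times, to highlight pauses in the data stream. Assumes GNU date.
mminfo -q "volume=ISO_Archive.001" -r "ssaccess(23),fragsize" | tail -n +2 | \
while read FDATE FTIME AMPM SIZE UNIT
do
    NOW=`date -d "$FDATE $FTIME $AMPM" +%s`
    if [ -n "$PREV" ]
    then
        echo "gap before this fragment: `expr $NOW - $PREV` seconds"
    fi
    PREV=$NOW
done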

Regardless of the little discrepancy with ssaccess, you can see that there’s a great set of options available to retrieve additional date/time related details about savesets using mminfo.

(I’ve currently got a case open with EMC to determine whether ssaccess should be updated on recovery attempts, or whether the documentation has an error. I’ll update this posting once I find out.)


* The man page for mminfo does not document whether this is server time or client time. I assume, given that savetime is client time, that sscomp is also client time.

** The man page for mminfo does not document whether this is server time or client time. I assume that it’s in server time.

Posted in NetWorker, Scripting | Tagged: , , , , , | Comments Off on Basics – Important mminfo fields

Basics – Remote Control

Posted by Preston on 2009-03-03

I learned this technique several years ago before NetWorker supported running an nsrjb command from the backup server to manipulate a jukebox on a storage node. (Previously it had required that actual nsrjb commands, when run from the command line, be run from the owner of the jukebox.)

While you don’t need it any more to control remote jukeboxes, it can still be useful to know how to do remote control operations in NetWorker – particularly if you’re say, debugging backups on a client but you don’t have console access to that client.

Note that this only applies to commands that obey the following restrictions:

  • Name starts with “nsr” or “save”
  • Resides in the same directory as the “save” command on the client.

Thus, this isn’t about remote hijacking of a NetWorker client. Such commands can only be executed by an authorised administrator on a backup server listed in the client’s nsr/servers file*. Before you (quite rightly) point out that a valid NetWorker administrator could in fact hijack a client – by, say, doing a directed recovery to that client of an appropriately named file – they can do that already; that’s part of the trust relationship with the NetWorker administrator anyway.

So, with all those caveats out of the way, it’s remarkably simple. Going from a Unix host, you do the following:

# export RUSER=<user>
# export RCMD=<cmd>
# nsrexec -c <client>

Where:

  • user will typically be “root” or “administrator”, depending on whether the client is Unix or Windows.
  • cmd will be a NetWorker command you want executed.
  • client is the name of the client the command is to be executed on.

For instance, say I’ve got a NetWorker server called “nox”, and a client called “asgard” that I don’t have administrative login to, but I want to simulate a backup without firing up a savegroup. To do so, I could do the following:

# export RUSER=root
# export RCMD="save -e tomorrow -b Default -LL -q -s nox /tmp"
# nsrexec -c asgard
save: /tmp  36 MB 00:00:05     38 files
completed savetime=1235278395

Admittedly this is not a technique you should need to know often, but it’s useful to know about.


* You do populate your nsr/servers file, don’t you? If you don’t, go do it NOW. I mean it, stop what you’re doing, go and fix up the nsr/servers file on every client in your environment!

Posted in Basics, Scripting | Tagged: , | Comments Off on Basics – Remote Control