NetWorker Blog

Commentary from a long term NetWorker consultant and Backup Theorist

  • This blog has moved!

    This blog has now moved to nsrd.info/blog. Please jump across to the new site for the latest articles (and all old archived articles).

  • Enterprise Systems Backup and Recovery

    If you find this blog interesting, and either have an interest in or work in data protection/backup and recovery environments, you should check out my book, Enterprise Systems Backup and Recovery: A Corporate Insurance Policy. Designed for system administrators and managers alike, it focuses on the features, policies, procedures and human element involved in ensuring that your company has a suitable and working backup system, rather than just a bunch of copies made by unrelated software, hardware and processes.

Posts Tagged ‘nsrstage’

Manually Staging? Don’t forget the Clone ID!

Posted by Preston on 2009-11-13

Something that keeps coming up is the need to remind people running manual staging to specify both the SSID and the Clone ID when they stage. I did some initial coverage of this when I first started the blog, but I wanted to revisit it and demonstrate exactly why this is necessary.

The short version of why is simple: If you stage by SSID alone, NetWorker will delete/purge all instances of the saveset other than the one you just created. This is Not A Good Thing for 99.999% of what we do within NetWorker.
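
To make the distinction concrete before we get into it, here are the two invocation shapes (the IDs are the ones from the demonstration below, and "Big" is the destination pool used later). Staging by SSID alone:

# nsrstage -b Big -m -S 2600270829

stages one instance and then purges every other instance of the saveset from the media database. Staging by SSID and clone ID:

# nsrstage -b Big -m -S 2600270829/1258093548

stages exactly the instance named, leaving all other clones untouched.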

So to demonstrate, here’s a session where I:

  1. Generate a backup
  2. Clone the backup to tape
  3. Stage the saveset to tape, by saveset ID only

After each step, I’ll run mminfo to get a dump of what the media database says about saveset availability.

Part 1 – Generate the Backup

Here’s a very simple backup for the purposes of this demonstration, and the subsequent mminfo command to find out about the backup:

[root@tara ~]# save -b Default -LL -q /etc
save: /etc  106 MB 00:00:07   2122 files
completed savetime=1258093549

[root@tara ~]# mminfo -q "client=tara.pmdg.lab,name=/etc" -r volume,ssid,cloneid,savetime
 volume        ssid          clone id  date
Default.001    2600270829  1258093549 11/13/2009
Default.001.RO 2600270829  1258093548 11/13/2009

There’s nothing out of the ordinary here, so we’ll move onto the next step.

Part 2 – Clone the Backup

We’ll just do a manual clone to the Default Clone pool. Here we’ll specify the saveset ID alone, which is fine for cloning – but it’s often what leads people into the habit of not specifying a particular saveset instance. I’m using very small VTL tapes, so don’t be worried that in this case the clone of /etc spans 3 volumes:

[root@tara ~]# nsrclone -b "Default Clone" -S 2600270829
[root@tara ~]# mminfo -q "client=tara.pmdg.lab,name=/etc" -r volume,ssid,cloneid,savetime
 volume        ssid          clone id  date
800843S3       2600270829  1258094164 11/13/2009
800844S3       2600270829  1258094164 11/13/2009
800845S3       2600270829  1258094164 11/13/2009
Default.001    2600270829  1258093549 11/13/2009
Default.001.RO 2600270829  1258093548 11/13/2009

As you can see there, it’s all looking fairly ordinary at this point – nothing surprising is going on at all.

Part 3 – Stage by Saveset ID Only

In this next step, I’m going to stage by saveset ID alone, rather than by saveset ID/clone ID (the correct way of staging), in order to demonstrate what happens at the conclusion of the operation. I’ll be staging to a pool called “Big”:

[root@tara ~]# nsrstage -b Big -v -m -S 2600270829
Obtaining media database information on server tara.pmdg.lab
Parsing save set id(s)
Migrating the following save sets (ids):
 2600270829
5874:nsrstage: Automatically copying save sets(s) to other volume(s)

Starting migration operation...
Nov 13 17:34:00 tara logger: NetWorker media: (waiting) Waiting for 1 writable volume(s) to backup pool 'Big' disk(s) or tape(s) on tara.pmdg.lab
5884:nsrstage: Successfully cloned all requested save sets
5886:nsrstage: Clones were written to the following volume(s):
 BIG991S3
6359:nsrstage: Deleting the successfully cloned save set 2600270829
Successfully deleted original clone 1258093548 of save set 2600270829 from media database.
Successfully deleted AFTD's companion clone 1258093549 of save set 2600270829 from media database with 0 retries.
Successfully deleted original clone 1258094164 of save set 2600270829 from media database.
Recovering space from volume 4294740163 failed with the error 'Cannot access volume 800844S3, please mount the volume or verify its label.'.
Refer to the NetWorker log for details.
6330:nsrstage: Cannot access volume 800844S3, please mount the volume or verify its label.
Completed recover space operation for volume 4177299774
Refer to the NetWorker log for any failures.
Recovering space from volume 4277962971 failed with the error 'Cannot access volume 800845S3, please mount the volume or verify its label.'.
Refer to the NetWorker log for details.
6330:nsrstage: Cannot access volume 800845S3, please mount the volume or verify its label.
Recovering space from volume 16550059 failed with the error 'Cannot access volume 800843S3, please mount the volume or verify its label.'.
Refer to the NetWorker log for details.
6330:nsrstage: Cannot access volume 800843S3, please mount the volume or verify its label.

You’ll note there’s a bunch of output there about being unable to access the clone volumes the saveset was previously cloned to. When we then check mminfo, we see the consequences of the staging operation:

[root@tara ~]# mminfo -q "client=tara.pmdg.lab,name=/etc" -r volume,ssid,cloneid,savetime
 volume        ssid          clone id  date
BIG991S3       2600270829  1258095244 11/13/2009

As you can see – no reference to the clone volumes at all!

Now, has the clone data been erased? No, but it has been removed from the media database, meaning you’d have to manually scan the volumes back in before being able to use them again. Worse, if those volumes only contained clone data that was subsequently removed from the media database, they may become eligible for recycling and get re-used before you notice what has gone wrong!
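
If you do get caught by this, the recovery path is to scan the affected volumes back in so the media database once again knows about them. A minimal sketch of that, assuming the volume is loaded in a drive whose device path you’ve confirmed (the path below is purely illustrative):

# scanner -m /dev/nst0

The -m option has scanner repopulate the media database from what it reads on the volume; you’d repeat this for each affected volume, and it means re-reading each one end to end.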

Wrapping Up

Hopefully the above session will have demonstrated the danger of staging by saveset ID alone. If instead of staging by saveset ID we staged by saveset ID and clone ID, we’d have had a much more desirable outcome. Here’s a (short) example of that:

[root@tara ~]# save -b Default -LL -q /tmp
save: /tmp  2352 KB 00:00:01     67 files
completed savetime=1258094378
[root@tara ~]# mminfo -q "name=/tmp" -r volume,ssid,cloneid
 volume        ssid          clone id
Default.001    2583494442  1258094378
Default.001.RO 2583494442  1258094377
[root@tara ~]# nsrclone -b "Default Clone" -S 2583494442

[root@tara ~]# mminfo -q "name=/tmp" -r volume,ssid,cloneid
 volume        ssid          clone id
800845S3       2583494442  1258095244
Default.001    2583494442  1258094378
Default.001.RO 2583494442  1258094377
[root@tara ~]# nsrstage -b Big -v -m -S 2583494442/1258094377
Obtaining media database information on server tara.pmdg.lab
Parsing save set id(s)
Migrating the following save sets (ids):
 2583494442
5874:nsrstage: Automatically copying save sets(s) to other volume(s)

Starting migration operation...

5886:nsrstage: Clones were written to the following volume(s):
 BIG991S3
6359:nsrstage: Deleting the successfully cloned save set 2583494442
Successfully deleted original clone 1258094377 of save set 2583494442 from media database.
Successfully deleted AFTD's companion clone 1258094378 of save set 2583494442 from media database with 0 retries.
Completed recover space operation for volume 4177299774
Refer to the NetWorker log for any failures.

[root@tara ~]# mminfo -q "name=/tmp" -r volume,ssid,cloneid
 volume        ssid          clone id
800845S3       2583494442  1258095244
BIG991S3       2583494442  1258096324

The recommendation I always make is to forget about using saveset IDs alone unless you absolutely have to. Instead, get yourself into the habit of always specifying a particular instance of a saveset via the “ssid/cloneid” option. That way, if you do any manual staging, you won’t wipe out access to data!
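
If you do a lot of manual staging, it’s worth scripting the habit. Here’s a minimal sketch that stages every instance on a given volume by explicit ssid/cloneid; it assumes the default two-column mminfo output (header line first), and the volume and pool names are illustrative only:

#!/bin/sh
# Stage every saveset instance on VOLUME to POOL, one explicit
# ssid/cloneid at a time, so other clones of each saveset survive.
VOLUME="Default.001"   # illustrative - your AFTD volume
POOL="Big"             # illustrative - your destination pool

mminfo -q "volume=${VOLUME}" -r "ssid,cloneid" | awk 'NR>1' |
while read ssid cloneid
do
    nsrstage -b "${POOL}" -v -m -S "${ssid}/${cloneid}"
done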

Posted in Basics, NetWorker, Scripting | 2 Comments »

Basics – Staging

Posted by Preston on 2009-09-08

Hi!

The text of this post can be viewed over at the NetWorker Information Hub.

Posted in Basics | Comments Off on Basics – Staging

Quibbles – Cloning and Staging

Posted by Preston on 2009-08-13

For the most part, cloning and staging within NetWorker are pretty satisfactory, particularly when viewed as a combination of automated and manual operations. However, one thing that constantly drives me nuts is the inane status reporting for cloning and staging.

Honestly, how hard can it be to design cloning and staging to accurately report the following at all times:

Cloning X of Y savesets, W GB of Z GB

or

Staging X of Y savesets, W GB of Z GB

While there have been various updates to cloning and staging reporting, and sometimes it at least updates how many savesets it has done, it continually breaks when dealing with the total amount staged/cloned, in that the running total resets whenever a destination volume is changed.

Oh, and while I’m begging for this change, I’ll request one other: whenever cloning/staging occurs, include in daemon.raw the full list of ssid/cloneids that have been cloned or staged, and the client/saveset details for each one – not just minimal details when a failure occurs. It’s called auditing.
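
Until something like that exists, you can approximate it yourself. Here’s a hypothetical wrapper – my own sketch, not a NetWorker feature – that records which instances of a saveset exist before and after a manual clone; the log path, pool name and report fields are all illustrative:

#!/bin/sh
# Usage: clone-audit.sh <ssid>
# Log saveset instance details around a manual clone, since
# daemon.raw won't record them for us.
SSID="$1"
LOG="/nsr/logs/manual-clone-audit.log"

{
    echo "=== $(date) manual clone of ssid ${SSID} ==="
    echo "--- instances before ---"
    mminfo -q "ssid=${SSID}" -r "client,name,ssid,cloneid,volume"
} >> "${LOG}"

nsrclone -b "Default Clone" -S "${SSID}"

{
    echo "--- instances after ---"
    mminfo -q "ssid=${SSID}" -r "client,name,ssid,cloneid,volume"
} >> "${LOG}"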

Posted in Quibbles | Comments Off on Quibbles – Cloning and Staging

7.5(.1) changed behaviour – deleting savesets from adv_file

Posted by Preston on 2009-04-25

Yesterday I wanted to delete a few savesets from a lab server I’d upgraded from 7.4.4 to 7.5.1.

Wanting to go about it quickly, I did the following:

  • I used “nsrmm -dy -S ssid” for each saveset ID I wanted to delete, to erase it from the media database.
  • I used “nsrstage -C -V volumeName” for the disk backup unit volumes to run a cleaning operation.

Imagine my surprise when, instead of seeing a chunk of space being freed up, I got a lot of the following notifications:

nsrd adv_file warning: Failed to fetch the saveset(ss_t) structure for ssid 1890993582

I got one of these for every saveset I deleted – and since I’d run a lot of tests, that was a lot of savesets. The corresponding result was that they all remained on disk. What had been a tried and true method of saveset deletion under 7.4.x and below appears not to be so useful under 7.5.1.

In the end I had to run a comparison between media database content and disk backup unit content – i.e.:

# mminfo -q "volume=volName" -r "ssid(60)"

To extract the long saveset IDs, which are in effect the names of the files stored on disk, then:

# find /path/to/volume -type f -print

Then, for each filename, check to see whether it existed in the media database, and if it didn’t, manually delete it. This is not something the average user should do without talking to their support people, by the way – but, well, I am support people and it was a lab server…
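
For what it’s worth, here’s that comparison as a sketch of a script, assuming (as noted above) that the files on the disk backup unit are named by their long saveset IDs; the volume name and mount point are illustrative:

#!/bin/sh
# List files on an adv_file volume that the media database no
# longer knows about - candidates for manual deletion.
VOLUME="volName"            # illustrative - the volume name
VOLPATH="/path/to/volume"   # illustrative - its mount point

# Long ssids still present in the media database, sorted
mminfo -q "volume=${VOLUME}" -r "ssid(60)" | awk 'NR>1 {print $1}' | sort > /tmp/in_mdb

# File names actually sitting on disk, sorted
find "${VOLPATH}" -type f -exec basename {} \; | sort > /tmp/on_disk

# Names on disk with no media database entry
comm -13 /tmp/in_mdb /tmp/on_disk

Treat the output as candidates only – as I said, talk to your support people before rm’ing anything.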

This change is worrying enough that I’ll be running up a couple of test servers using multiple operating systems (the above happened on Linux) to see whether it’s reproducible, or whether there was just, say, some freaky accident with the media database on my lab machine.

I’ll update this post accordingly.

[Update – 2009-04-27]

Have done some more tests on 7.5.1 on various Linux servers, comparing results to 7.4.4. This is definitely changed behaviour and I don’t like it, given that it’s very common for backup administrators to delete one or two savesets here and there from disk. Chatting to EMC about it.

In the interim, here’s a workaround I’ve come up with – instead of using nsrmm -d to delete the saveset, instead run:

# nsrmm -w now -e now -S ssid

To mark the saveset as immediately recyclable. Then run “nsrim -X” to force a purge. That will work. If you have scripts that manually delete savesets from disk backup units, though, you should act now to update them.
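
If you have more than a couple of savesets to get rid of, a throwaway loop built on those same two commands will do it (a sketch only; passing the ssids as arguments is just my convention here):

#!/bin/sh
# Usage: purge-ssids.sh <ssid> [ssid ...]
# Mark each saveset immediately recyclable, then purge once.
for ssid in "$@"
do
    nsrmm -w now -e now -S "${ssid}"
done

# Cross-check the media database and reclaim recyclable savesets
nsrim -X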

[Update – 2009-04-30]

It would appear as well that if you delete and then attempt to reclaim space, NetWorker will set the “scan required” flag on the volume. Assuming you’re 100% OK with what you’ve manually deleted and then purged from disk using rm, you can probably safely clear the flag (nsrmm -o notscan). If you’re feeling paranoid, unmount the volume, scan it, then clear the flag.
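
For the paranoid route, the sequence would look something like this – the volume name and device path are illustrative, and I’m using scanner’s -n option, which verifies the volume without updating the media database:

# nsrmm -u Default.001
# scanner -n /dev/nst0
# nsrmm -o notscan Default.001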

[Update – 2009-05-06]

Confirmed this isn’t present in vanilla 7.5; it appears to have been introduced in 7.5.1.

[Update – 2009-06-16]

Cumulative patches for 7.5.1 have been released; according to EMC support these patches include the fixes for addressing this issue, allowing a return to normal operations. If you’re having this issue, make sure you touch base with EMC support or your EMC support partner to get access to the patches. (Note: I’ve not had a chance to review the cumulative patches, so I can’t vouch for them yet.)

[Update 2009-08-11]

I forgot to update earlier; the cumulative patches (7.5.1.2 in the case of what I received) did properly incorporate the patch for this issue.

Posted in Features, NetWorker | 7 Comments »