NetWorker Blog

Commentary from a long term NetWorker consultant and Backup Theorist

  • This blog has moved!

    This blog has now moved to nsrd.info/blog. Please jump across to the new site for the latest articles (and all old archived articles).

  • Enterprise Systems Backup and Recovery

    If you find this blog interesting, and either have an interest in or work in data protection/backup and recovery environments, you should check out my book, Enterprise Systems Backup and Recovery: A Corporate Insurance Policy. Designed for system administrators and managers alike, it focuses on features, policies, procedures and the human element to ensuring that your company has a suitable and working backup system rather than just a bunch of copies made by unrelated software, hardware and processes.

Posts Tagged ‘staging’

Manually Staging? Don’t forget the Clone ID!

Posted by Preston on 2009-11-13

Something that continues to come up periodically is the need to remind people running manual staging to specify both the SSID and the Clone ID when they stage. I covered this briefly when I first started the blog, but I wanted to revisit it and demonstrate exactly why it’s necessary.

The short version is simple: if you stage by SSID alone, NetWorker will delete/purge every instance of the saveset other than the one you just created. This is Not A Good Thing for 99.999% of what we do within NetWorker.
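Since staging by SSID alone leaves only the copy it has just written, it’s worth knowing exactly which instances the media database holds before you stage. The following is a minimal shell sketch, not a NetWorker tool: the function name list_instances is hypothetical, and it assumes mminfo was run with "-r volume,ssid,cloneid" (optionally followed by other columns), so that the first three columns are volume, SSID and clone ID.

```shell
# list_instances SSID
# Hypothetical helper (not part of NetWorker). Reads mminfo output on stdin,
# assumed to come from "mminfo ... -r volume,ssid,cloneid[,...]" so that the
# first three columns are volume, SSID and clone ID, and prints one
# "volume ssid/cloneid" line per volume holding an instance of SSID.
list_instances() {
    awk -v want="$1" '
        # The header row never matches, because its second column is "ssid".
        $2 == want { printf "%s %s/%s\n", $1, $2, $3 }
    '
}
```

Piping the Part 2 mminfo query into list_instances 2600270829 would print five lines (one per volume), and every one of them except the newly staged copy is what a bare-SSID stage removes from the media database.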

So to demonstrate, here’s a session where I:

  1. Generate a backup
  2. Clone the backup to tape
  3. Stage the saveset to tape by saveset ID only

In between each step, I’ll run mminfo to get a dump of what the media database says about saveset availability.

Part 1 – Generate the Backup

Here’s a very simple backup for the purposes of this demonstration, and the subsequent mminfo command to find out about the backup:

[root@tara ~]# save -b Default -LL -q /etc
save: /etc  106 MB 00:00:07   2122 files
completed savetime=1258093549

[root@tara ~]# mminfo -q "client=tara.pmdg.lab,name=/etc" -r volume,ssid,cloneid,savetime
 volume        ssid          clone id  date
Default.001    2600270829  1258093549 11/13/2009
Default.001.RO 2600270829  1258093548 11/13/2009

There’s nothing out of the ordinary here, so we’ll move onto the next step.

Part 2 – Clone the Backup

We’ll just do a manual clone to the Default Clone pool. Here we’ll specify the saveset ID alone, which is fine for cloning, but is often what gets people into the habit of not specifying a particular saveset instance. I’m using very small VTL tapes, so don’t be worried that in this case the clone of /etc spans 3 volumes:

[root@tara ~]# nsrclone -b "Default Clone" -S 2600270829
[root@tara ~]# mminfo -q "client=tara.pmdg.lab,name=/etc" -r volume,ssid,cloneid,savetime
 volume        ssid          clone id  date
800843S3       2600270829  1258094164 11/13/2009
800844S3       2600270829  1258094164 11/13/2009
800845S3       2600270829  1258094164 11/13/2009
Default.001    2600270829  1258093549 11/13/2009
Default.001.RO 2600270829  1258093548 11/13/2009

As you can see there, it’s all looking fairly ordinary at this point – nothing surprising is going on at all.

Part 3 – Stage by Saveset ID Only

In this next step, I’m going to stage by saveset ID alone, rather than by saveset ID/clone ID (which is the correct way to stage), to demonstrate what happens when the staging completes. I’ll be staging to a pool called “Big”:

[root@tara ~]# nsrstage -b Big -v -m -S 2600270829
Obtaining media database information on server tara.pmdg.lab
Parsing save set id(s)
Migrating the following save sets (ids):
 2600270829
5874:nsrstage: Automatically copying save sets(s) to other volume(s)

Starting migration operation...
Nov 13 17:34:00 tara logger: NetWorker media: (waiting) Waiting for 1 writable 
volume(s) to backup pool 'Big' disk(s) or tape(s) on tara.pmdg.lab
5884:nsrstage: Successfully cloned all requested save sets
5886:nsrstage: Clones were written to the following volume(s):
 BIG991S3
6359:nsrstage: Deleting the successfully cloned save set 2600270829
Successfully deleted original clone 1258093548 of save set 2600270829 
from media database.
Successfully deleted AFTD's companion clone 1258093549 of save set 2600270829 
from media database with 0 retries.
Successfully deleted original clone 1258094164 of save set 2600270829 
from media database.
Recovering space from volume 4294740163 failed with the error 
'Cannot access volume 800844S3, please mount the volume or verify its label.'.
Refer to the NetWorker log for details.
6330:nsrstage: Cannot access volume 800844S3, please mount the volume 
or verify its label.
Completed recover space operation for volume 4177299774
Refer to the NetWorker log for any failures.
Recovering space from volume 4277962971 failed with the error 
'Cannot access volume 800845S3, please mount the volume or verify its label.'.
Refer to the NetWorker log for details.
6330:nsrstage: Cannot access volume 800845S3, please mount the volume 
or verify its label.
Recovering space from volume 16550059 failed with the error 
'Cannot access volume 800843S3, please mount the volume or verify its label.'.
Refer to the NetWorker log for details.
6330:nsrstage: Cannot access volume 800843S3, please mount the volume 
or verify its label.

You’ll note there’s a lot of output there about being unable to access the volumes the saveset had previously been cloned to. When we check mminfo, though, we see the real consequence of the staging operation:

[root@tara ~]# mminfo -q "client=tara.pmdg.lab,name=/etc" -r volume,ssid,cloneid,savetime
 volume        ssid          clone id  date
BIG991S3       2600270829  1258095244 11/13/2009

As you can see – no reference to the clone volumes at all!

Now, has the clone data been erased? No, but it has been removed from the media database, meaning you’d have to manually scan the volumes back in order to be able to use them again. Worse, if those volumes only contained clone data that was subsequently removed from the media database, they may become eligible for recycling and get re-used before you notice what has gone wrong!

Wrapping Up

Hopefully the session above has demonstrated the danger of staging by saveset ID alone. Had we staged by saveset ID and clone ID instead, we’d have had a much more desirable outcome. Here’s a (short) example of that:

[root@tara ~]# save -b Default -LL -q /tmp
save: /tmp  2352 KB 00:00:01     67 files
completed savetime=1258094378
[root@tara ~]# mminfo -q "name=/tmp" -r volume,ssid,cloneid
 volume        ssid          clone id
Default.001    2583494442  1258094378
Default.001.RO 2583494442  1258094377
[root@tara ~]# nsrclone -b "Default Clone" -S 2583494442

[root@tara ~]# mminfo -q "name=/tmp" -r volume,ssid,cloneid
 volume        ssid          clone id
800845S3       2583494442  1258095244
Default.001    2583494442  1258094378
Default.001.RO 2583494442  1258094377
[root@tara ~]# nsrstage -b Big -v -m -S 2583494442/1258094377
Obtaining media database information on server tara.pmdg.lab
Parsing save set id(s)
Migrating the following save sets (ids):
 2583494442
5874:nsrstage: Automatically copying save sets(s) to other volume(s)

Starting migration operation...

5886:nsrstage: Clones were written to the following volume(s):
 BIG991S3
6359:nsrstage: Deleting the successfully cloned save set 2583494442
Successfully deleted original clone 1258094377 of save set 2583494442 from 
media database.
Successfully deleted AFTD's companion clone 1258094378 of save set 2583494442 
from media database with 0 retries.
Completed recover space operation for volume 4177299774
Refer to the NetWorker log for any failures.

[root@tara ~]# mminfo -q "name=/tmp" -r volume,ssid,cloneid
 volume        ssid          clone id
800845S3       2583494442  1258095244
BIG991S3       2583494442  1258096324

The recommendation I always make is to forget about using saveset IDs alone unless you absolutely have to. Instead, get into the habit of always specifying a particular instance of a saveset via the “ssid/cloneid” syntax. That way, if you do any manual staging, you won’t wipe out access to data!
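If you stage from scripts, one way to make the ssid/cloneid habit stick is to refuse to build a command without a clone ID. A minimal sketch, assuming mminfo output with volume, SSID and clone ID as the first three columns; stage_command is a hypothetical name, and it only prints the nsrstage command (using the same -b, -v, -m and -S options as the session above) so you can review it before running it.

```shell
# stage_command POOL VOLUME SSID
# Hypothetical helper (not part of NetWorker). Reads mminfo output on stdin,
# assumed to start with the columns volume, SSID and clone ID, finds the
# instance of SSID held on VOLUME and prints (but does not run) an nsrstage
# command naming that instance explicitly as ssid/cloneid.
stage_command() {
    pool=$1 volume=$2 ssid=$3
    # Take the clone ID from the first row matching both volume and SSID.
    cloneid=$(awk -v vol="$volume" -v want="$ssid" \
        'found == 0 && $1 == vol && $2 == want { print $3; found = 1 }')
    if [ -z "$cloneid" ]; then
        # Never fall back to a bare SSID: that is the dangerous case.
        echo "stage_command: no instance of $ssid on $volume" >&2
        return 1
    fi
    printf 'nsrstage -b "%s" -v -m -S %s/%s\n' "$pool" "$ssid" "$cloneid"
}
```

Fed the /tmp mminfo output above with pool Big, volume Default.001.RO and SSID 2583494442, it prints nsrstage -b "Big" -v -m -S 2583494442/1258094377, the same instance staged in the example.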


Posted in Basics, NetWorker, Scripting | 2 Comments

Staging and Connectivity Loss

Posted by Preston on 2009-10-16

For a while now I’ve been working with EMC support on an issue that’s only likely to strike sites that have intermittent connectivity between the server and storage nodes, and that stage from ADV_FILE devices on the storage node to ADV_FILE devices on the server.

The crux of the problem is that if you’re staging from storage node to server and comms between the sites are lost for long enough that NetWorker:

  • Detects the storage node nsrmmd processes have failed, and
  • Attempts to restart the storage node nsrmmd processes, and
  • Fails to restart the storage node nsrmmd processes

Then you can end up in a situation where the staging aborts in an ‘interesting’ way. The first hint of the problem is that you’ll see a message such as the following in your daemon.raw:

68975 10/15/2009 09:59:05 AM  2 0 0 526402000 4495 0 tara.pmdg.lab nsrmmd filesys_nuke_ssid: unable to unlink /backup/84/05/notes/c452f569-00000006-fed6525c-4ad6525c-00051c00-dfb3d342 on device `/backup': No such file or directory

(The above has been rendered from the raw log format for readability.)
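To spot this after the fact, you can search the rendered log for filesys_nuke_ssid unlink failures. The last component of the reported path is the saveset ID used in the mminfo query below. A minimal sketch, assuming daemon.raw has already been rendered to plain text (for example with nsr_render_log) and that the messages look like the one above; nuke_failures is a hypothetical helper name.

```shell
# nuke_failures < rendered_daemon_log
# Hypothetical helper (not part of NetWorker). Assumes rendered log lines in
# the format shown above and prints "saveset-id path" for each
# filesys_nuke_ssid unlink failure.
nuke_failures() {
    awk '/filesys_nuke_ssid: unable to unlink/ {
        for (i = 1; i < NF; i++) {
            if ($i == "unlink") {
                path = $(i + 1)
                id = path
                sub(/.*\//, "", id)      # keep only the final path component
                print id, path
                break
            }
        }
    }'
}
```

Each ID it prints can then be passed to a query of the form mminfo -q "ssid=<id>", as below.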

However, if you look for the cited file, you’ll find that it doesn’t exist. Unfortunately, that’s not quite the end of the matter: while the saveset file that was being staged didn’t stay on disk, its media database details did. So before you can restart staging, you first need to locate the saveset in question and delete the media database entry for the (failed) copy on the server’s disk backup unit. Interestingly, this entry is only ever found on the RW device, not the RO device:

[root@tara ~]# mminfo -q "ssid=c452f569-00000006-fed6525c-4ad6525c-00051c00-dfb3d342"
 volume        client       date      size   level  name
Tara.001       fawn      10/15/2009 1287 MB manual  /usr/share
Fawn.001       fawn      10/15/2009 1287 MB manual  /usr/share
Fawn.001.RO    fawn      10/15/2009 1287 MB manual  /usr/share
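Because the stale entry only shows up on the server’s RW volume, a saveset listed on that volume but missing from its .RO companion is a candidate for cleanup. A minimal sketch, assuming you rerun the query with -r volume,ssid,cloneid (the output above doesn’t include clone IDs) and that your NetWorker release accepts nsrmm -d -S ssid/cloneid to delete a single instance from the media database; check the nsrmm man page before relying on that. stale_rw_copies is a hypothetical name, and it only prints candidate commands for review; it doesn’t delete anything.

```shell
# stale_rw_copies SERVER_VOLUME
# Hypothetical helper (not part of NetWorker). Reads mminfo output on stdin,
# assumed to start with the columns volume, SSID and clone ID. For each
# instance on SERVER_VOLUME whose SSID is not also listed on SERVER_VOLUME.RO,
# prints an nsrmm command for review. Nothing is deleted; the nsrmm syntax is
# an assumption to check against the man page for your release.
stale_rw_copies() {
    awk -v rw="$1" -v ro="$1.RO" '
        # Instances on the RW volume, keyed as ssid/cloneid.
        $1 == rw { on_rw[$2 "/" $3] = 1 }
        # SSIDs on the RO companion (its clone ID differs from the RW copy).
        $1 == ro { on_ro[$2] = 1 }
        END {
            for (k in on_rw) {
                split(k, parts, "/")
                # An RW entry without an RO companion is the stale case.
                if (!(parts[1] in on_ro))
                    print "nsrmm -d -S " k
            }
        }
    '
}
```

For the saveset above you’d call it with Tara.001; Fawn.001 wouldn’t be flagged, since Fawn.001.RO still lists the saveset.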

We had hoped it was fixed in 7.5.1.5, but my tests aren’t showing that to be the case. Regardless, it’s certainly present in 7.4.x as well and, given its nature, has quite possibly been around for a while longer than that.

As I said at the outset, this isn’t likely to affect many sites, but it is something to be aware of.

Posted in Features, NetWorker | 1 Comment

Basics – Staging

Posted by Preston on 2009-09-08

Hi!

The text of this post can be viewed over at the NetWorker Information Hub.

Posted in Basics