NetWorker Blog

Commentary from a long term NetWorker consultant and Backup Theorist

  • This blog has moved!

    This blog has now moved to nsrd.info/blog. Please jump across to the new site for the latest articles (and all old archived articles).

  • Enterprise Systems Backup and Recovery

    If you find this blog interesting, and either have an interest in or work in data protection/backup and recovery environments, you should check out my book, Enterprise Systems Backup and Recovery: A Corporate Insurance Policy. Designed for system administrators and managers alike, it focuses on features, policies, procedures and the human element to ensuring that your company has a suitable and working backup system rather than just a bunch of copies made by unrelated software, hardware and processes.

Posts Tagged ‘Scripting’

Basics – Matching savesets with files on disk backup units

Posted by Preston on 2009-02-03

Generally speaking, you don’t want to be mucking around with the contents of your disk backup units except under extreme circumstances. In fact, I really recommend that you don’t do so unless you 100% know what you’re doing.

So this post is all about that 1-5% of the time when you may find it necessary to, say, search for a saveset that's reported in the media database but that you're having problems accessing from the disk backup unit.

It’s actually trivially easy, once you know how.

You may be familiar with the following style of query:

# mminfo -q "savetime>=24 hours ago" -r "volume,client,level,sumsize,savetime(22)"

The (22) at the end of the savetime report field tells mminfo to allow 22 characters when reporting the savetime. The benefit is that you get not only the date of the save, but the time as well.

NetWorker actually allows you to put that (number) postfix onto any field that mminfo can report. This can expose additional information (as above), give more room for longer fields, or even truncate fields when you don't need too much information. (E.g., if the first 4 characters of each client name uniquely identify the client, you might limit the client field to 4 characters in an mminfo report.)
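If you generate these reports from a script, the width postfix is simply part of the `-r` string. Below is a minimal Python sketch of building such a report specification. The helper names `report_spec` and `mminfo_argv` are hypothetical. The command is only printed here, not run.

```python
import shlex

def report_spec(fields):
    """Join (name, width) pairs into an mminfo -r string.

    A width of None means no (number) postfix for that field.
    """
    parts = []
    for name, width in fields:
        parts.append(name if width is None else f"{name}({width})")
    return ",".join(parts)

def mminfo_argv(query, fields):
    """Build (but do not execute) the argument list for an mminfo report."""
    return ["mminfo", "-q", query, "-r", report_spec(fields)]

argv = mminfo_argv("savetime>=24 hours ago",
                   [("volume", None), ("client", 4), ("savetime", 22)])
# prints: mminfo -q 'savetime>=24 hours ago' -r 'volume,client(4),savetime(22)'
print(shlex.join(argv))
```

Keeping the arguments as a list avoids shell-quoting mistakes if you later hand them to `subprocess.run` on a NetWorker server.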

Now, where we're heading with this is that the filenames used for savesets written to disk backup units are not some random collection of strings: they're actually the long saveset IDs.

Consider then a filename of:

/d/nsr/02/63/05/cd3a182f-00000006-7b7801de-497801de-01871a00-3d2a4f4b

This isn't just a random filename; it's the saveset ID, just in a format you may not be used to.

To get the long saveset ID in mminfo output, we use the (number) postfix on the ssid field. This would be as follows:

# mminfo -q "ssid=071462366" -r "ssid(53)"
cd3a182f-00000006-7b7801de-497801de-01871a00-3d2a4f4b

With that information in hand, you can then search for a file with the same name as the long saveset ID on disk.
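One way to do that search is a read-only directory walk. Here is a minimal Python sketch. It assumes `/d/nsr` (taken from the example filename) is the root of the disk backup unit; substitute your own path. The function name `find_saveset_file` is hypothetical. The function only locates matching files and never modifies them.

```python
import os
from pathlib import Path

def find_saveset_file(dbu_root, long_ssid):
    """Walk a disk backup unit's directory tree (read-only) and return
    every path whose filename equals the given long saveset ID."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(dbu_root):
        if long_ssid in filenames:
            matches.append(Path(dirpath) / long_ssid)
    return matches

# Example (on the backup server):
# find_saveset_file("/d/nsr", "cd3a182f-00000006-7b7801de-497801de-01871a00-3d2a4f4b")
```

On a large disk backup unit a full walk can take a while, so `find` run from the device's root would do the same job from the shell.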

You can also do a reverse lookup. Say, for instance, you know there's an issue with a particular saveset file on a disk backup unit. To find out the regular (short) saveset ID for that saveset, you can run the counter-query:

# mminfo -q "ssid=cd3a182f-00000006-7b7801de-497801de-01871a00-3d2a4f4b" -r ssid
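When you start from a file path rather than a saveset ID, the filename itself is the query value. Below is a minimal Python sketch that turns a saveset file path into that counter-query.

It rests on two assumptions:

- The pattern check assumes long saveset IDs always look like the example: six dash-separated groups of 8 hex digits.
- The function name `reverse_lookup_argv` is hypothetical.

```python
import re
import shlex
from pathlib import Path

# Assumed shape of a long saveset ID, based on the example above:
# six dash-separated groups of 8 hex digits.
LONG_SSID_RE = re.compile(r"^[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{8}){5}$")

def reverse_lookup_argv(saveset_path):
    """Build the mminfo query that reports the short SSID for a
    saveset file on a disk backup unit (the command is not run)."""
    long_ssid = Path(saveset_path).name
    if not LONG_SSID_RE.match(long_ssid):
        raise ValueError(f"{long_ssid!r} does not look like a long saveset ID")
    return ["mminfo", "-q", f"ssid={long_ssid}", "-r", "ssid"]

argv = reverse_lookup_argv(
    "/d/nsr/02/63/05/cd3a182f-00000006-7b7801de-497801de-01871a00-3d2a4f4b")
print(shlex.join(argv))
```

On the server you could pass `argv` to `subprocess.run(argv, capture_output=True, text=True)` to get the short SSID back.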

So, there you go – very easy!

Posted in Basics, NetWorker

Basics – mminfo, savetime and greater than/less than

Posted by Preston on 2009-02-01

Hi!

The text of this article should be read over at the NetWorker Information Hub.

Posted in NetWorker, Scripting

Basics – Listing files in a backup

Posted by Preston on 2009-01-27

To read this post, please go to its new location at the NetWorker Information Hub.

Posted in NetWorker, Scripting

Instantiating savesets

Posted by Preston on 2009-01-25

Following a recent discussion I’ve been having on the NetWorker Mailing List, I thought I should put a few details down about clone IDs.

If you don’t clone your backups (and if you don’t: why not?), you may not have really encountered clone IDs very much. They’re the shadowy twin of the saveset ID, and serve a fairly important purpose.

From hereon in, I’ll use the following nomenclature:

  • SSID = Save Set ID
  • CLID = CLone ID

“SSID” is pretty much the standard NetWorker terminology for saveset ID, but usually clone ID is just written as “clone ID” or “clone-id”, etc., which gets a bit tiresome after a while.

Every saveset in NetWorker is tagged with a unique SSID. Every copy of that saveset carries the same SSID, but each copy has its own CLID.

You can see this when you ask mminfo to show both:

[root@nox ~]# mminfo -q "savetime>=18 hours ago,pool=Staging,client=archon,name=/Volumes/TARDIS" -r volume,ssid,cloneid,nsavetime
 volume        ssid          clone id  save time
Staging-01     3962821973  1228135765 1228135764
Staging-01.RO  3962821973  1228135764 1228135764
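If you want to post-process output like the above in a script, the columns split cleanly on whitespace. Here is a minimal Python sketch that parses it. It also converts `nsavetime` (seconds since the Unix epoch, UTC) into a readable timestamp.

It assumes:

- volume names contain no spaces;
- the output has exactly one header line.

The function name `parse_instances` is hypothetical.

```python
from datetime import datetime, timezone

SAMPLE = """\
 volume        ssid          clone id  save time
Staging-01     3962821973  1228135765 1228135764
Staging-01.RO  3962821973  1228135764 1228135764
"""

def parse_instances(text):
    """Parse 'volume,ssid,cloneid,nsavetime' mminfo output into dicts,
    skipping the header line and any blank lines."""
    rows = []
    for line in text.splitlines()[1:]:
        if not line.strip():
            continue
        volume, ssid, clid, nsavetime = line.split()
        rows.append({
            "volume": volume,
            "ssid": int(ssid),
            "clid": int(clid),
            # nsavetime is seconds since 1970-01-01 00:00:00 UTC
            "savetime": datetime.fromtimestamp(int(nsavetime), tz=timezone.utc),
        })
    return rows

for row in parse_instances(SAMPLE):
    print(f"{row['volume']:<14} {row['ssid']}/{row['clid']}  "
          f"{row['savetime']:%Y-%m-%d %H:%M:%S} UTC")
```

Both lines print the same SSID with different CLIDs, which is exactly the SSID/CLID pairing discussed below.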

(If you must know, being a fan of Doctor Who, I call all my Time Machine drives “TARDIS”. And no, I don't back up my Time Machine copies with NetWorker; it would be a truly arduous and wasteful thing to do. I also use my Time Machine drives for database dumps from my Macs.)

In this case we're seeing not only the SSID and CLID, but also a special instance of the SSID/CLID combination: the one assigned for disk backup units. In the above example, you'll note that the CLID associated with the read-only (.RO) volume of the disk backup unit is exactly one less than the CLID associated with the read-write volume. NetWorker does this for a very specific reason, which I'll come to shortly.
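The RO/RW relationship is easy to check mechanically. Below is a minimal Python sketch that verifies it for a set of instances. It assumes the read-only volume is always named as the read-write volume plus a `.RO` suffix, as in the output above. The function name `ro_rw_clid_pairs_ok` is hypothetical.

```python
def ro_rw_clid_pairs_ok(instances):
    """Given (volume, ssid, clid) tuples, check that every instance on a
    '.RO' volume has a CLID exactly one less than the instance of the
    same SSID on the matching read-write volume."""
    by_key = {(vol, ssid): clid for vol, ssid, clid in instances}
    for (vol, ssid), clid in by_key.items():
        if vol.endswith(".RO"):
            rw_clid = by_key.get((vol[:-3], ssid))  # strip the ".RO" suffix
            if rw_clid is None or clid != rw_clid - 1:
                return False
    return True

instances = [
    ("Staging-01", 3962821973, 1228135765),
    ("Staging-01.RO", 3962821973, 1228135764),
]
print(ro_rw_clid_pairs_ok(instances))  # True
```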

So, you might wonder then what the purpose of the CLID is, since we use the SSID to identify an individual saveset, right?

I had hunted for ages for a really good analogy for SSID/CLIDs, and stupidly the most obvious one never occurred to me. One of the NetWorker Mailing List's most helpful posters, Davina Treiber, posted what is, in retrospect, the most obvious and smartest analogy I've seen: comparing savesets to books in a library. To paraphrase, while a library may have multiple copies of the same book (each copy having the same ISBN; after all, it's the same book), it will obviously need to keep track of the individual copies to know who has which copy, how many copies are left, etc. Thus, the library assigns an individual copy number to each instance of the book it holds, even if it only has one instance.

This, quite simply, is the purpose of the CLID – to identify individual instances of a single saveset. This means that you can, for example, do any of the following (and more!):

  • Clone a saveset by reading from a particular cited copy.
  • Recover from a saveset by reading from a particular cited copy.
  • Instruct NetWorker to remove from its media database the reference to a particular cited copy.

In particular, in the final example, if you know that a particular tape is bad and you want to delete it, you only want NetWorker to delete references to the saveset instances on that tape. You wouldn't want it to also delete references to perfectly good copies sitting on other tapes. Thus you would refer to each instance by SSID/CLID.

I've not been using the SSID/CLID notation randomly. When working with NetWorker in a situation where you either want to or must specify a particular instance of a saveset, you literally use that notation in the command. E.g.:

# nsrclone -b "Daily Clone" -S 3962821973/1228135764

This would clone saveset 3962821973 to the “Daily Clone” pool, reading from the saveset instance (CLID) 1228135764.

The same command could be specified as:

# nsrclone -b "Daily Clone" -S 3962821973

However, NetWorker would then pick which instance of the saveset to read from in order to clone it. The same thing happens when NetWorker is asked to perform a recovery in standard situations (i.e., non-SSID based recoveries).
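When scripting clones, the difference between the two forms is just whether the CLID is appended to the `-S` argument. Here is a minimal Python sketch that builds both command lines. The helper name `nsrclone_argv` is hypothetical. Building an argument list also sidesteps a common copy-and-paste problem: typographic quotes around the pool name, which the shell does not treat as quotes.

```python
import shlex

def nsrclone_argv(pool, ssid, clid=None):
    """Build an nsrclone command line. With a CLID, the source instance is
    pinned using SSID/CLID notation; without one, NetWorker picks it."""
    target = str(ssid) if clid is None else f"{ssid}/{clid}"
    return ["nsrclone", "-b", pool, "-S", target]

# nsrclone -b 'Daily Clone' -S 3962821973/1228135764
print(shlex.join(nsrclone_argv("Daily Clone", 3962821973, 1228135764)))
# nsrclone -b 'Daily Clone' -S 3962821973
print(shlex.join(nsrclone_argv("Daily Clone", 3962821973)))
```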

So, how does NetWorker pick which instance of a saveset should be used to facilitate a recovery? The algorithm used goes a little like this:

  • If there are instances online, then the most available instance is used.
  • If there are multiple instances equally online, then the instance with the lowest CLID is requested.
  • If all instances are offline, then the instance with the lowest CLID not marked as offsite is requested.

The first point may not immediately make sense. Most available? If, say, you have 2 copies on tape, with one tape sitting in a library and the other physically mounted in a tape drive and not in use, the tape in the drive will be used.

For the second point, consider disk backup units (adv_file type devices). In this case, both the RW and the RO “version” of the saveset are equally online, since they're both mounted disk volumes. (Remember, there's only one real physical copy on disk; NetWorker just munges some details so that it appears to the media database that there are 2 copies.) So, to prevent recoveries from automatically running from the RW “version” of the saveset on disk, when the instances are set up, the “version” on the RO portion of the disk backup unit is assigned a CLID one less than the CLID of the “version” on the RW device.

Thus, we get “guaranteed” recovery/reading from the RO version of the disk backup unit. In normal circumstances, that is. (You can still force recovery/reading from the RW version if you so desire.)
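The first two selection rules can be modelled in a few lines. Below is a minimal Python sketch, for illustration only; it is not NetWorker's internal code.

- The three availability ranks (mounted, in library, offline) are a simplifying assumption.
- The names `Instance` and `pick_online_instance` are hypothetical.
- The tape volumes and CLIDs in the second example are made up.

```python
from dataclasses import dataclass

# Simplified availability ranking (assumption): higher is "more available".
MOUNTED, IN_LIBRARY, OFFLINE = 2, 1, 0

@dataclass
class Instance:
    volume: str
    clid: int
    availability: int

def pick_online_instance(instances):
    """Rule 1: prefer the most available instance.
    Rule 2: among equally available instances, take the lowest CLID."""
    best = max(i.availability for i in instances)
    candidates = [i for i in instances if i.availability == best]
    return min(candidates, key=lambda i: i.clid)

# Disk backup unit: both "versions" are mounted disk volumes, so the
# .RO instance wins because its CLID is one lower.
dbu = [
    Instance("Staging-01", 1228135765, MOUNTED),
    Instance("Staging-01.RO", 1228135764, MOUNTED),
]
print(pick_online_instance(dbu).volume)  # Staging-01.RO

# Hypothetical tapes: the idle tape already in a drive beats the one in
# a library slot, even though its CLID is higher.
tapes = [
    Instance("TAPE01", 100, IN_LIBRARY),
    Instance("TAPE02", 200, MOUNTED),
]
print(pick_online_instance(tapes).volume)  # TAPE02
```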

As for the final point: if all copies are equally offline, NetWorker previously just requested the copy with the lowest CLID. This works well in a tape-only environment, e.g.:

  • Backup to tape
  • Clone backup to another tape
  • Send clone offsite
  • Keep ‘original’ onsite

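In this scenario, NetWorker would ask for the ‘original’ by virtue of it having the lowest CLID. However, a CLID is only generated at the time an instance of the saveset is written (e.g., when it is cloned or staged), so a copy written later ends up with a higher CLID than one written earlier. Thus, consider the backup to disk scenario: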

  • Backup to disk
  • Clone from disk to tape
  • Send clone offsite
  • Later, when disk becomes full or savesets are too old, stage from disk to tape
  • Keep new “originals” on-site.

This created a problem: in this scenario, if you went to do a recovery after staging, NetWorker would (annoyingly for many!) request the clone version of the saveset, since the clone now had the lowest CLID. This meant having it pulled back from the offsite location, doing an SSID/CLID recovery, marking the clone SSID/CLID as suspect, or mounting the “original”. However you looked at it, it was a lot of work that you really shouldn't have needed to do.

NetWorker 7.3.x, however, introduced the notion of an offsite flag. This isn't the same as setting the volume location to offsite; it's literally a new flag:

# nsrmm -o offsite 800841

This would mark volume 800841 in the media database as not being onsite, i.e., as having less desirable availability for recovery/read operations.

The net result is that, even if the offsite clone has the lower CLID, as long as it is flagged as offsite and there's an instance with a higher CLID that isn't, NetWorker will bypass its normal “use the lowest CLID” preference and request the onsite copy instead.
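The backup-to-disk scenario above can be replayed with a small model of the "all instances offline" rule. Here is a minimal Python sketch, for illustration only.

It makes several assumptions:

- The `offsite` field stands in for the flag set with `nsrmm -o offsite`.
- Volume names and CLIDs are made up; the only thing that matters is that the staged copy has the higher CLID.
- Falling back to the lowest CLID overall when every instance is flagged offsite is my assumption, not documented behaviour.

```python
from dataclasses import dataclass

@dataclass
class OfflineInstance:
    volume: str
    clid: int
    offsite: bool  # models the flag set with "nsrmm -o offsite <volume>"

def pick_offline_instance(instances):
    """All instances offline: take the lowest CLID not flagged offsite.
    If every instance is flagged offsite, fall back (assumption) to the
    lowest CLID overall."""
    onsite = [i for i in instances if not i.offsite]
    eligible = onsite or instances
    return min(eligible, key=lambda i: i.clid)

# Backup to disk, clone to tape (lower CLID), later stage to tape (higher CLID).
clone = OfflineInstance("clone-tape", 1000, offsite=False)
staged = OfflineInstance("staged-tape", 2000, offsite=False)
print(pick_offline_instance([clone, staged]).volume)  # clone-tape

clone.offsite = True  # flag the clone's volume as offsite
print(pick_offline_instance([clone, staged]).volume)  # staged-tape
```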

It would certainly be preferable, however, if a future version of NetWorker let you set read priority as an attribute of pools. That way, rather than having to bugger around with the offsite flag (which, incidentally, can only be set/cleared from the command line, and can't be queried!), an administrator could nominate “This pool has highest recovery priority, whereas this pool has lower recovery priority”, and NetWorker would pick the lowest CLID in the highest recovery priority pool.
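To make the wished-for behaviour concrete, here is a minimal Python sketch of pool-priority selection. This feature does not exist in NetWorker as described here.

- The function `pick_by_pool_priority` is hypothetical.
- The priority mapping is hypothetical.
- The pool names are made up.

Pools missing from the mapping are treated as lowest priority.

```python
from dataclasses import dataclass

@dataclass
class PooledInstance:
    pool: str
    clid: int

def pick_by_pool_priority(instances, priority):
    """Hypothetical feature: choose the lowest CLID within the
    highest-priority pool holding an instance. `priority` maps pool
    name to a rank (higher wins); unlisted pools rank lowest."""
    def rank(i):
        return priority.get(i.pool, float("-inf"))
    top = max(rank(i) for i in instances)
    candidates = [i for i in instances if rank(i) == top]
    return min(candidates, key=lambda i: i.clid)

instances = [
    PooledInstance("Offsite Clone", 1000),
    PooledInstance("Staged", 2000),
]
priority = {"Staged": 10, "Offsite Clone": 1}
print(pick_by_pool_priority(instances, priority).pool)  # Staged
```

With an empty mapping, every pool ranks equally and the choice falls back to the plain lowest-CLID rule.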

(I wait, and hope.)

Posted in NetWorker, Scripting