NetWorker Blog

Commentary from a long-term NetWorker consultant and Backup Theorist

  • This blog has moved!

    This blog has now moved to nsrd.info/blog. Please jump across to the new site for the latest articles (and all old archived articles).

  • Enterprise Systems Backup and Recovery

    If you find this blog interesting, and either have an interest in or work in data protection/backup and recovery environments, you should check out my book, Enterprise Systems Backup and Recovery: A Corporate Insurance Policy. Designed for system administrators and managers alike, it focuses on features, policies, procedures and the human element in ensuring that your company has a suitable and working backup system, rather than just a bunch of copies made by unrelated software, hardware and processes.

Posts Tagged ‘mminfo’

Validcopies hazardous to your sanity

Posted by Preston on 2009-12-04

While many of NetWorker 7.6’s enhancements revolve around virtualisation or (urgh) cloud, there remain a number of smaller updates that are of interest.

One of those new features is the validcopies flag, something I unfortunately failed to check out in beta testing. It looks like it could use some more work, but the theory is a good one. The idea behind validcopies is that we can use it in VTL-style situations to determine not only whether we’ve got an appropriate number of copies, but also whether those copies are valid, i.e., usable by NetWorker for recovery purposes.

It’s a shame it’s too buggy to be used.

Here’s an example where I back up to an ADV_FILE type device:

[root@tara ~]# save -b Default -e "+3 weeks" -LL -q /usr/share
57777:save:Multiple client instances of tara.pmdg.lab, using the first entry
save: /usr/share  1244 MB 00:03:23  87843 files
completed savetime=1259366579

[root@tara ~]# mminfo -q "name=/usr/share,validcopies>1"
 volume        client       date      size   level  name
Default.001    tara.pmdg.lab 11/28/2009 1244 MB manual /usr/share
Default.001.RO tara.pmdg.lab 11/28/2009 1244 MB manual /usr/share

[root@tara ~]# mminfo -q "name=/usr/share,validcopies>1" -r validcopies
6095:mminfo: no matches found for the query

[root@tara ~]# mminfo -q "name=/usr/share,validcopies>1"
 volume        client       date      size   level  name
Default.001    tara.pmdg.lab 11/28/2009 1244 MB manual /usr/share
Default.001.RO tara.pmdg.lab 11/28/2009 1244 MB manual /usr/share

[root@tara ~]# mminfo -q "name=/usr/share,validcopies>1" -r validcopies
6095:mminfo: no matches found for the query

[root@tara ~]# mminfo -q "name=/usr/share,validcopies>1" -r validcopies,copies
 validcopies copies
 2     2
 2     2

I have a few problems with the above output, and am working through the bugs in validcopies with EMC. Let’s look at each of those items and see what I’m concerned about:

  1. We don’t have more than one valid copy just because it’s sitting on an ADV_FILE device. If the purpose of the “validcopies” flag is to count the number of unique recoverable copies, we do not have 2 copies for each instance on ADV_FILE. There should be some logic there to not count copies on ADV_FILE devices twice for valid copy counts.
  2. As you can see from the last two commands, the results found differ depending on report options. This is inappropriate, to say the least. We’re getting no validcopies reported at all if we only look for validcopies, or 2 validcopies reported if we search for both validcopies and copies.

Verdict from the above:

  • Don’t use validcopies for disk backup units.
  • Don’t report on validcopies only, or you’ll skew your results.
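
One way to live with the second verdict is to always request both fields and throw away the one you don't need. The following is a minimal sketch, assuming the two-column output shown above; the function names are made up for illustration, and the wrapper is only meaningful on a host where mminfo is installed.

```shell
# Read the text of an "mminfo ... -r validcopies,copies" report on stdin
# and print only the validcopies column, one value per line.
# Header lines are skipped because their first field is not numeric.
validcopies_column() {
    awk '$1 ~ /^[0-9]+$/ { print $1 }'
}

# Hypothetical wrapper: $1 is an mminfo query string such as
# "name=/usr/share,validcopies>1". It never asks for validcopies alone.
validcopies_for_query() {
    mminfo -q "$1" -r validcopies,copies | validcopies_column
}
```

This does nothing about the first problem: ADV_FILE copies are still counted twice.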

Let’s move on to VTLs though – we’ll clone the saveset I just generated to the ADV_FILE type over to the VTL:

[root@tara ~]# mminfo -q "volume=Default.001.RO" -r ssid,cloneid
 ssid         clone id
4279265459  1259366578

[root@tara ~]# nsrclone -b "Big Clone" -v -S 4279265459/1259366578
5874:nsrclone: Automatically copying save sets(s) to other volume(s)
6216:nsrclone:
Starting cloning operation...
Nov 28 11:29:42 tara logger: NetWorker media: (waiting) Waiting for 1 writable volume(s)
to backup pool 'Big Clone' tape(s) or disk(s) on tara.pmdg.lab
5884:nsrclone: Successfully cloned all requested save sets
5886:nsrclone: Clones were written to the following volume(s):
 BIG998S3

[root@tara ~]# mminfo -q "ssid=4279265459" -r validcopies
 0

[root@tara ~]# mminfo -q "ssid=4279265459" -r copies,validcopies
 copies validcopies
 3          3
 3          3
 3          3

In the above instance, if we query just by the saveset ID for the number of valid copies, NetWorker happily tells us “0”. If we query for copies and validcopies, we get 3 of each.

So, what does this say to me? Steer away from ‘validcopies’ until it’s fixed.
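
As an aside, the ssid/cloneid argument handed to nsrclone earlier can be assembled from the mminfo report rather than typed by hand. This is a rough sketch that assumes the two-column "ssid, clone id" layout shown above; clone_specs is a made-up name.

```shell
# Read the text of an "mminfo ... -r ssid,cloneid" report on stdin and
# print one "ssid/cloneid" pair per saveset instance.
# The header line is dropped because its fields are not numeric.
clone_specs() {
    awk '$1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ { print $1 "/" $2 }'
}

# Possible use on a NetWorker host (left commented out here):
#   mminfo -q "volume=Default.001.RO" -r ssid,cloneid | clone_specs |
#   while read -r spec; do
#       nsrclone -b "Big Clone" -v -S "$spec" </dev/null
#   done
```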

(On a side note, why does the offsite parameter remain Write Only? We can’t query it through mminfo, and I’ve had an RFE in since the day the offsite option was introduced into nsrmm. Why this is “hard” or taking so long is beyond me.)

Posted in Features, NetWorker, Scripting | Comments Off on Validcopies hazardous to your sanity

Preventing users seeing backups from other hosts

Posted by Preston on 2009-09-24

Something I’ve seen a few people complain about, and indeed that I’ve also complained about in the past, is that in high security environments NetWorker allows end users on one host to see the backups done for other hosts. This is obviously a security concern.

After a brief discussion with EMC, it was also obviously something that is readily changeable with only a couple of clicks of the mouse button – so I feel somewhat sheepish that I hadn’t picked up on it before. All you have to do is take away the “Monitor NetWorker” privilege from the Users usergroup.

Here’s the (to some environments) offending setting:

[Screenshot: Monitor users privilege]

Once that setting is unchecked, end users won’t be able to view the backups for other hosts – just their own.

Posted in NetWorker, Security | 4 Comments

Top 5 Reflections

Posted by Preston on 2009-07-17

So this morning I was looking through the stats for this blog, and I generated the list of most popular posts thus far. I can’t say any of the results surprised me. Every single one of the top 5 comes from the “Basics” series.

Number 5 on that list was Basics – Listing Files in a backup. There are a lot of people out there who want to know how to use nsrinfo in general, and specifically want to know about pulling file lists for savesets. Net result? I think it would be greatly beneficial if in NMC users could double-click on browsable savesets and get a complete listing of files therein.

Number 4 was Basics – mminfo, savetime and greater than/less than. Now, I’m not going to pretend that every person who visited that article was looking for details about how greater than and less than works in mminfo in relation to savetimes, though I suspect a reasonable percentage of people new to mminfo found that interesting. My take on it is that it proves there’s not really enough documentation about mminfo, and that mminfo needs some expansion. My personal preference? Having a full SQL-like query engine for mminfo would greatly expand the options available to NetWorker administrators.

Number 3 on the list is Basics – Changing saveset browse/retention times. As regularly as possible I try to check the search strings that have brought people to my blog (as recorded by WordPress), and I can practically guarantee that every day there are multiple combinations to do with savesets, browse and retention times. Sometimes those combinations reference nsrmm, sometimes they don’t. Clearly, extending saveset browse/retention times in NetWorker needs to be more manageable from within the GUI as a bare minimum. I’ll get to the command line in a moment.

Moving on to number 2, we have something that I get search results for every day without fail. That’s Basics – Fixing “NSR Peer information” errors. It’s actually a reasonably simple error to fix, but sometimes finding the information about it is a bit like the old needle-in-a-haystack. I’m hoping that the posting on it has helped quite a few sites to clear out the warnings/errors in their logs and reduce the amount of clutter being reported.

Finally, for number 1, a topic I’m completely unsurprised to see at the top, we have Basics – Parallelism in NetWorker. Not because it’s difficult, but because there are no absolute rules, parallelism is a topic in NetWorker that many administrators, regardless of how long they have worked with the product, find challenging at times. Set too low, and backups may overrun. Set too high, and device contention, client slow-downs, recovery performance issues, etc., may come into play. Tuning parallelism in NetWorker has to take a lot into account.

The content of this list suggests a few things to me:

  • None of this information is out of reach in the product manuals, but, since the product manuals are (necessarily) lengthy, it is logistically out of reach for a lot of users who don’t have time to read lengthy manuals.
  • EMC product management could take a few tips from the top 5 articles on my blog – I think they represent areas where the usability of the product could be improved. While parallelism is not something that can be “solved” by changes within the GUI (it is, by necessity, complex), other options, such as improving mminfo search, making saveset contents more accessible within the GUI, etc., are readily fixable.
  • It seems there might be scope for a “Getting Started with NetWorker” style manual. I think a traditional book would (a) be too expensive and (b) be unsuitable. This is the sort of information that people want readily to hand on their desktops.

On the last point, I’m interested in writing such a manual. I obviously have some experience with writing – but more so than just the book, over the years I’ve written literally thousands of pages of NetWorker instructions as part of professional services documentation, training courses, etc.

So here’s a question: would people be interested in, say, an eBook along the lines of “Getting Started with NetWorker” that covers basic operation and usage instructions, so that rather than having to wade through the roughly 1000 pages of the official documentation they had something shorter, geared towards day-to-day operation?

Let me know what you think.

Posted in Basics, General thoughts, NetWorker | 10 Comments

Basics – Important mminfo fields

Posted by Preston on 2009-05-27

Most NetWorker administrators with even a passing familiarity with mminfo will be aware of the “savetime” field, which reports when a saveset was created (i.e., when the backup was taken).

There are however some other fields that also provide additional date/time details about savesets, and knowing about them can be a real boon. Here’s a quick summary of the important date/time fields that provide information about savesets:

  • savetime – The time/date, on the client, of the backup.
  • sscreate – The time/date on the server of the backup.
  • ssinsert – The time/date on the server of the last time the saveset was inserted into the media database.
  • sscomp – The time/date that the backup completed*.
  • ssaccess – The date/time that the backup was last accessed for backup or recovery purposes**.

Now, remembering that we can append, in the report specifications, a field length to any field, we can get some very useful information out of the media database for savesets. For instance, to see when the backups started and stopped for a volume, you might run:

[root@nox ~]# mminfo -q "volume=ISO_Archive.001" -r "name,savetime(23),sscomp(23)"
 name                               date     time          ss completed
/d/03/share-a/ISO               05/06/2009 08:38:40 AM 05/06/2009 12:34:18 PM

So, not only do we have the date, but also the time of both the start and the finish of the backup.

To compare the client savetime with the server savetime, we’d use the sscreate field:

[root@nox ~]# mminfo -q "volume=ISO_Archive.001" -r "name,savetime(23),sscreate(23)"
 name                               date     time           ss created
/d/03/share-a/ISO               05/06/2009 08:38:40 AM 05/06/2009 08:38:42 AM

Note that in this second example there was a 2-second skew between the backup server and the client at the time the backup was run.
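
Both comparisons above come down to subtracting one timestamp from another, which is easy to script. The sketch below assumes GNU date (for the -d option) and the MM/DD/YYYY HH:MM:SS AM/PM layout shown in the output; the function names are illustrative only.

```shell
# Seconds between two mminfo-style timestamps, e.g.
# "05/06/2009 08:38:40 AM". Both stamps are interpreted as UTC so that
# the local time zone of the reporting host does not affect the result.
elapsed_seconds() {
    start=$(date -u -d "$1" +%s) || return 1
    end=$(date -u -d "$2" +%s) || return 1
    echo $((end - start))
}

# Format a number of seconds as HH:MM:SS.
format_hms() {
    printf '%02d:%02d:%02d\n' $(($1 / 3600)) $(($1 % 3600 / 60)) $(($1 % 60))
}
```

Fed the savetime and sscomp values from the first report, `format_hms "$(elapsed_seconds "05/06/2009 08:38:40 AM" "05/06/2009 12:34:18 PM")"` gives the length of the backup; fed savetime and sscreate, elapsed_seconds gives the client/server skew.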

I’ll leave ssinsert as an exercise to the reader – if you’ve got any recently scanned in savesets, give it a try and compare it against the output from sscreate and savetime.

However, moving on to the last field I mentioned, ssaccess, we get some very interesting results. Let’s see the output from:

[root@nox ~]# mminfo -q "volume=ISO_Archive.001,name=/d/03/share-a/ISO" -r "name,savetime(23),ssaccess(23)"
 name                               date     time            ss access
/d/03/share-a/ISO               05/06/2009 08:38:40 AM 05/06/2009 08:42:25 AM
/d/03/share-a/ISO               05/06/2009 08:38:40 AM 05/06/2009 08:43:31 AM
/d/03/share-a/ISO               05/06/2009 08:38:40 AM 05/06/2009 08:46:00 AM
/d/03/share-a/ISO               05/06/2009 08:38:40 AM 05/06/2009 08:48:12 AM
/d/03/share-a/ISO               05/06/2009 08:38:40 AM 05/06/2009 08:49:04 AM
/d/03/share-a/ISO               05/06/2009 08:38:40 AM 05/06/2009 08:49:55 AM
/d/03/share-a/ISO               05/06/2009 08:38:40 AM 05/06/2009 08:54:10 AM
(snip)

Now, if you’ve been following the thread, the above doesn’t immediately appear to make sense. On that volume there’s only one saveset, so why are we suddenly getting entries for what appears to be multiple savesets? Well, they’re not multiple savesets – let’s try it again with SSID, rather than name:

[root@nox ~]# mminfo -q "volume=ISO_Archive.001,name=/d/03/share-a/ISO" -r "ssid,savetime(23),ssaccess(23)"
 ssid           date     time            ss access
67158002    05/06/2009 08:38:40 AM 05/06/2009 08:42:25 AM
67158002    05/06/2009 08:38:40 AM 05/06/2009 08:43:31 AM
67158002    05/06/2009 08:38:40 AM 05/06/2009 08:46:00 AM
67158002    05/06/2009 08:38:40 AM 05/06/2009 08:48:12 AM
67158002    05/06/2009 08:38:40 AM 05/06/2009 08:49:04 AM
67158002    05/06/2009 08:38:40 AM 05/06/2009 08:49:55 AM
67158002    05/06/2009 08:38:40 AM 05/06/2009 08:54:10 AM
(snip)

An astute reader may think I’ve got some problem with my media database at this point – only one instance of a saveset can ever appear on the same volume, so the above looks like it simply shouldn’t happen.

Here’s where it gets really interesting though. NetWorker writes savesets in fragments, and each fragment of the saveset is generated and may be accessed separately – therefore, mminfo is reporting the access time for each fragment of the saveset. We can fully see this by expanding what we’re asking mminfo to report – including fragsize, mediafile and mediarec.

[root@nox 02]# mminfo -q "volume=ISO_Archive.001" -r "savetime(23),ssaccess(23),fragsize,mediafile,mediarec"
 date     time            ss access          size file  rec
 05/06/2009 08:38:40 AM 05/06/2009 08:42:25 AM 1040 MB   2    0
 05/06/2009 08:38:40 AM 05/06/2009 08:43:31 AM 1040 MB   3    0
 05/06/2009 08:38:40 AM 05/06/2009 08:46:00 AM 1040 MB   4    0
 05/06/2009 08:38:40 AM 05/06/2009 08:48:12 AM 1040 MB   5    0
 05/06/2009 08:38:40 AM 05/06/2009 08:49:04 AM 1040 MB   6    0
 05/06/2009 08:38:40 AM 05/06/2009 08:49:55 AM 1040 MB   7    0

Now, the man page for mminfo says that the ssaccess time is updated for both backup and recovery operations, but despite various recovery tests I can’t yet get it to update. Even so, the field is still useful: it allows us to tell how long each fragment took to back up, which lets us interrogate, at a later point, whether there were any pauses or significant delays in the data stream.
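
To spot such pauses without reading the timestamps by eye, the gap between consecutive fragments can be calculated from the report text. This is a sketch only: it assumes GNU date and the column layout of the report above (savetime in the first three fields, ssaccess in the next three), and fragment_gaps is a made-up name.

```shell
# Read the text of an mminfo report whose columns start with
# "savetime(23),ssaccess(23)" on stdin, and print the number of seconds
# between each fragment's access time and the previous fragment's.
fragment_gaps() {
    prev=""
    while read -r _sdate _stime _smeridian adate atime ameridian _rest; do
        # Skip the header and anything else without a date in field 4.
        case "$adate" in
            [0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9]) ;;
            *) continue ;;
        esac
        now=$(date -u -d "$adate $atime $ameridian" +%s) || return 1
        if [ -n "$prev" ]; then
            echo $((now - prev))
        fi
        prev=$now
    done
}
```

A gap much larger than its neighbours, for fragments of the same size, would point to a stall in the data stream at that point in the backup.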

Regardless of the little discrepancy with ssaccess, you can see that there’s a great set of options available to retrieve additional date/time related details about savesets using mminfo.

(I’ve currently got a case open with EMC to determine whether ssaccess should be updated on recovery attempts, or whether the documentation has an error. I’ll update this posting once I find out.)


* The man page for mminfo does not document whether this is server time or client time. I assume, given that savetime is client time, that sscomp is also client time.

** The man page for mminfo does not document whether this is server time or client time. I assume that it’s in server time.

Posted in NetWorker, Scripting | Comments Off on Basics – Important mminfo fields

Finding files in your indices

Posted by Preston on 2009-05-23

Everyone has had that horror recovery scenario, where a user wants a file recovered, but they can’t tell you where the file was, or even on what machine it was stored. You can find this information out through a series of mminfo and nsrinfo commands, or, if you’re in a hurry and you have IDATA Tools installed, you can run the find-files utility to quickly locate it.

Say for instance I’ve got a user who lost the file “Safari4.0BetaLeo.dmg” somewhere between 6 weeks and 1 week ago on either the machine archon or aralathan. To find where this file may be located in backups, one would run the following command:

[root@nox nsr]# find-files -c archon,aralathan -S "6 weeks ago" -F "last week" -f Safari4.0BetaLeo.dmg
=== Probe backups ===
    aralathan
    archon

=== Search for Safari4.0BetaLeo.dmg ===
    Check aralathan, 20 savesets to check
    Check archon, 8 savesets to check

=== Results ===
aralathan:/ @ 04/24/2009 23:45 (384942702)
Volumes: Staging-01, Staging-01.RO
    /Users/preston/Desktop/* Incoming/Safari4.0BetaLeo.dmg

archon:/ @ 04/25/2009 04:27 (15860863)
Volumes: Staging-01, Staging-01.RO
    /Users/preston/Desktop/DNB/Safari4.0BetaLeo.dmg

As I mentioned before, you can run mminfo and nsrinfo queries yourself to do this, but having a tool there just waiting for you to point it in the right direction can be a time-saving boon.
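
For sites without IDATA Tools, the manual mminfo and nsrinfo route can be roughed out in a few lines of shell. This is only a sketch, and not how find-files itself works: it assumes that mminfo's nsavetime report field and nsrinfo's -t option behave as their man pages describe for your release, and the function name find_in_backups is made up for illustration.

```shell
# Hypothetical helper: list index entries matching a file name for one
# client. Assumptions to verify against your man pages: "-r nsavetime"
# prints each saveset's save time as a plain number, and
# "nsrinfo -t TIME CLIENT" lists the files indexed for that save time.
find_in_backups() {
    client=$1   # e.g. archon
    since=$2    # e.g. "6 weeks ago"
    wanted=$3   # e.g. Safari4.0BetaLeo.dmg

    mminfo -q "client=$client,savetime>=$since" -r nsavetime |
    awk '$1 ~ /^[0-9]+$/ { print $1 }' | sort -u |
    while read -r nsavetime; do
        # </dev/null stops nsrinfo from consuming the loop's input.
        nsrinfo -t "$nsavetime" "$client" </dev/null |
        grep -F -- "$wanted" |
        while IFS= read -r hit; do
            printf '%s @ %s: %s\n' "$client" "$nsavetime" "$hit"
        done
    done
}
```

Unlike find-files, this walks one client at a time, applies no upper bound on the save time, and does not report which volumes hold the saveset.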

Posted in NetWorker | Comments Off on Finding files in your indices

Basics – mmlocate vs ‘offsite’ flag

Posted by Preston on 2009-02-08

NetWorker has long supported a volume location field; this can be shown in the GUI, and can be set and reported on via the command line tool, ‘mmlocate’.

One of the most typical ways that mmlocate is used is to set that a volume’s location is “Offsite”. For example:

# mmlocate -u -n 800841 Offsite

Thus, when you look at the volume in the GUI (or run the command: mmlocate -l Offsite), you’re able to see that the volume is offsite.

However, somewhere in the 7.3.x cycle, EMC introduced an offsite flag that could be associated with a volume, and this fulfills a very different function. First, in order to set the flag, you need to use the nsrmm command, and it would work like this:

# nsrmm -o offsite volumeName

Such as:

# nsrmm -o offsite 800841

This doesn’t set the location field. (Nor, equally, does a location field of ‘offsite’ equate to a flag set for offsite.) If you want to manually clear the offsite flag, you can run the nsrmm command again, using the option ‘notoffsite’ rather than ‘offsite’. Alternatively, as soon as the volume is either (a) mounted in a standalone drive or (b) imported into a tape library, the flag is cleared.
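
Since the location and the flag are independent, a site that wants both set when tapes leave the building could wrap the two commands shown above. A minimal sketch follows; mark_offsite and the DRY_RUN variable are made up for illustration, and it would be wise to confirm interactively how nsrmm behaves on your release before running this unattended.

```shell
# Set both the location field and the offsite flag for each volume named
# on the command line. With DRY_RUN=1 the commands are printed, not run.
mark_offsite() {
    for vol in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "mmlocate -u -n $vol Offsite"
            echo "nsrmm -o offsite $vol"
        else
            # Only set the flag if the location update succeeded.
            mmlocate -u -n "$vol" Offsite && nsrmm -o offsite "$vol"
        fi
    done
}
```

For example, `DRY_RUN=1 mark_offsite 800841` prints the two commands for volume 800841 without touching the media database.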

Unfortunately, there’s currently no way of querying for volumes based on this field. (I consider this to be a silly mistake, and hope it’s rectified soon.)

So, what is this volume flag used for, if you can’t query it, and it’s not displayed anywhere? It actually fulfills an important function. I briefly covered that function in my post, Instantiating Savesets, but I’ll quickly revisit it now.

NetWorker assigns a unique clone ID to every saveset copy that is made (be that through cloning or staging). The clone ID is effectively the number of seconds past the epoch, or if not that, some other very similar number of seconds.

When NetWorker wants to use a saveset to facilitate a recovery, and there are no copies of the saveset immediately online (i.e., in a drive, or in a library), it must request a volume that holds a copy of the saveset. Previously it would always ask for the copy with the smallest clone ID. This would create problems if you backed up to disk, cloned to tape, then staged to tape later – the clone would end up with the smallest clone ID, and if neither volume was available, NetWorker would ask for the clone volume, rather than the ‘original’, staged volume.

To solve this problem we use the ‘offsite’ flag for the clone volume: if NetWorker needs to read from a saveset that has more than one copy, and one of those copies is stored on a volume that is flagged as ‘offsite’ in the media database, then it is least likely to pick that volume.
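
The effect of the flag can be illustrated with a toy model. To be clear, this is not NetWorker's actual selection code, which is not public; it simply encodes the rule as described here (prefer copies not flagged offsite, then the smallest clone ID). The input format, the clone IDs and the volume names CLONE01 and STAGE01 are invented for the example.

```shell
# Toy model: read "cloneid volume offsite" lines on stdin, where offsite
# is 1 if the volume carries the offsite flag and 0 otherwise, and print
# the volume this simplified rule would request.
pick_volume() {
    # Non-offsite copies sort first; ties are broken by lowest clone ID.
    sort -k3,3n -k1,1n | awk 'NR == 1 { print $2 }'
}

# No flags set: the clone (smallest clone ID) is requested.
printf '1000 CLONE01 0\n2000 STAGE01 0\n' | pick_volume

# Clone volume flagged offsite: the staged copy is requested instead.
printf '1000 CLONE01 1\n2000 STAGE01 0\n' | pick_volume
```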

(An alternative technique advocated by EMC (and even Legato, before the acquisition), before the development of the ‘offsite’ flag, was to temporarily mark the initially requested volume as ‘suspect’ so that NetWorker would instead request the ‘preferred’ volume. While there’s technically nothing wrong with this technique, I find marking good backups as bad, even temporarily, inelegant. With the ‘offsite’ flag now available, I’d encourage anyone still using the ‘suspect’/‘notsuspect’ flags to switch.)

Posted in Basics, NetWorker | Comments Off on Basics – mmlocate vs ‘offsite’ flag

Basics – mminfo, savetime and greater than/less than

Posted by Preston on 2009-02-01

Hi!

The text of this article should be read over at the NetWorker Information Hub.

Posted in NetWorker, Scripting | 4 Comments