NetWorker Blog

Commentary from a long term NetWorker consultant and Backup Theorist

  • This blog has moved!

    This blog has now moved to nsrd.info/blog. Please jump across to the new site for the latest articles (and all old archived articles).
  • Enterprise Systems Backup and Recovery

    If you find this blog interesting, and either have an interest in or work in data protection/backup and recovery environments, you should check out my book, Enterprise Systems Backup and Recovery: A Corporate Insurance Policy. Designed for system administrators and managers alike, it focuses on features, policies, procedures and the human element to ensuring that your company has a suitable and working backup system rather than just a bunch of copies made by unrelated software, hardware and processes.

Archive for the ‘Linux’ Category

NetWorker and linuxvtl, Redux

Posted by Preston on 2009-11-14

Some time ago, I posted a blog entry titled Carry a Jukebox with you, if you’re using Linux, which referred to using linuxvtl with NetWorker. The linuxvtl project is run by my friend Mark Harvey, who has been working with enterprise backup products for as long as I have.

At the time I blogged, the key problem with the linuxvtl implementation was that NetWorker didn’t recognise the alternate device IDs generated by the code – it relied on WWNNs, which were identical for every device.

I was over the moon when I received an email from Mark a short while ago saying he’s now got multiple devices working in a way that is compatible with NetWorker. This is a huge step forward for Linux VTL.

So, what’s changed?

While I’ve not had confirmation from Mark, I’m working on the basis that you do need the latest source code (mhvtl-2009-11-10.tgz as of the time of writing).

The next step, to quote Mark, is that we need to step away from StorageTek and define the library as SpectraLogic:

p.s. The “fix” is to define the robot as a Spectralogic NOT an L700.
The STK L700 does not follow the SMC standards too well. It looks like
NetWorker uses the ‘L700’ version and not the standards.
The Spectralogic follows the SMC standards (or at least their
interpretation is the same as mine :) )

The final part is to update the configuration files to include details that allow the VTL code to generate unique WWNNs for NetWorker’s use.

Starting out with just 2 devices, here’s what my inquire output now looks like:

[root@tara ~]# inquire -l

-l flag found: searching all LUNs, which may take over 10 minutes per adapter
	for some fibre channel adapters.  Please be patient.

scsidev@0.0.0:SPECTRA PYTHON    5500|Autochanger (Jukebox), /dev/sg2
			        S/N:	XYZZY
			        ATNN=SPECTRA PYTHON          XYZZY
			        WWNN=11223344ABCDEF00
scsidev@0.1.0:QUANTUM SDLT600   5500|Tape, /dev/nst0
			        S/N:	ZF7584364
			        ATNN=QUANTUM SDLT600         ZF7584364
			        WWNN=11223344ABCDEF01
scsidev@0.2.0:QUANTUM SDLT600   5500|Tape, /dev/nst1
			        S/N:	ZF7584366
			        ATNN=QUANTUM SDLT600         ZF7584366
			        WWNN=11223344ABCDEF02

As you can see – each device has a different WWNN now, which is instrumental for NetWorker. (Note, I have adjusted the spacing slightly to make sure it fits in.)

Finally, here’s what my /etc/mhvtl/device.conf and /etc/mhvtl/library_contents files now look like:

[root@tara mhvtl]# cat device.conf 

VERSION: 2

# VPD page format:
# <page #> <Length> <x> <x+1>... <x+n>

# NOTE: The order of records is IMPORTANT...
# The 'Unit serial number:' should be last (except for VPD data)
# i.e.
# Order is : Vendor ID, Product ID, Product Rev and serial number finally
# Zero, one or more VPD entries.
#
# Each 'record' is separated by one (or more) blank lines.
# Each 'record' starts at column 1

Library: 0 CHANNEL: 0 TARGET: 0 LUN: 0
 Vendor identification: SPECTRA
 Product identification: PYTHON
 Product revision level: 5500
 Unit serial number: XYZZY
 NAA: 11:22:33:44:ab:cd:ef:00

Drive: 1 CHANNEL: 0 TARGET: 1 LUN: 0
 Vendor identification: QUANTUM
 Product identification: SDLT600
 Product revision level: 5500
 Max density: 0x46
 NAA: 11:22:33:44:ab:cd:ef:01
 Unit serial number: ZF7584364
 VPD: b0 04 00 02 01 00

Drive: 2 CHANNEL: 0 TARGET: 2 LUN: 0
 Vendor identification: QUANTUM
 Product identification: SDLT600
 Product revision level: 5500
 Max density: 0x46
 NAA: 11:22:33:44:ab:cd:ef:02
 Unit serial number: ZF7584366
 VPD: b0 04 00 02 01 00

[root@tara mhvtl]# cat library_contents
# Define how many tape drives you want in the vtl..
# The ‘XYZZY_…’ is the serial number assigned to
# this tape device.
Drive 1: ZF7584364
Drive 2: ZF7584366
# Place holder for the robotic arm. Not really used.
Picker 1:
# Media Access Port
# (mailslots, Cartridge Access Port, <insert your favourite name here>)
# Again, define how many MAPs this vtl will contain.
MAP 1:
MAP 2:
MAP 3:
MAP 4:
# And the ‘big’ one: define your media and which slots contain media.
# When the rc script is started, all media listed here will be created
# using the default media capacity.
Slot 1: 800843S3
Slot 2: 800844S3
Slot 3: 800845S3
Slot 4: 800846S3
Slot 5: 800847S3
Slot 6: 800848S3
Slot 7: 800849S3
Slot 8: 800850S3
Slot 9: 800851S3
Slot 10: 800852S3
Slot 11: 800853S3
Slot 12: 800854S3
Slot 13: 800855S3
Slot 14: 800856S3
Slot 15: 800857S3
Slot 16: 800858S3
Slot 17: 800859S3
Slot 18: 800860S3
Slot 19: 800861S3
Slot 20: 800862S3
Slot 21: BIG990S3
Slot 22: BIG991S3
Slot 23: BIG992S3
Slot 24: BIG993S3
Slot 25: BIG994S3
Slot 26: BIG995S3
Slot 27: BIG996S3
Slot 28: BIG997S3
Slot 29: BIG998S3
Slot 30: BIG999S3
Slot 31: CLN001L1
Slot 32: CLN002L1

Note the NAA entries in the “device.conf” file – these are key!

With these changes done, jbconfig worked without missing a beat, and suddenly I had a 2 drive VTL running.

Great going, Mark!

While I’ve not yet tested, I suspect this fix will also ensure that the VTL can be configured on multiple storage nodes, which will be a fantastic improvement for library support work as well.

[Edit, 2009-11-18]

I’m pleased to say that the changes that have been made allow for the VTL to be created on more than one storage node. This presents excellent opportunities for debugging, testing and training:

LinuxVTL on server and storage node


Posted in Linux, NetWorker | Tagged: , , | 5 Comments »

NetWorker on Linux – Ditching ext3 for xfs

Posted by Preston on 2009-11-05

Recently when I made an exasperated posting about lengthy ext3 check times and looking forward to btrfs, Siobhán Ellis pointed out that there was already a filesystem available for Linux that met a lot of my needs – particularly in the backup space, where I’m after:

  • Being able to create large filesystems that don’t take exorbitantly long to check
  • Being able to avoid checks on abrupt system resets
  • Speeding up the removal of files when staging completes or large backups abort

That filesystem of course is XFS.

I’ve recently spent some time shuffling data around and presenting XFS filesystems to my Linux lab servers in place of ext3, and I’ll fully admit that I’m horribly embarrassed I hadn’t thought to try this out earlier. If anything, I’m stuck looking for the right superlative to describe the changes.

Case in point – I was (and indeed still am) doing some testing where I need to generate >2.5TB of backup data from a Windows 32-bit client for a single saveset. As you can imagine, not only does this take a while to generate, but it also takes a while to clear from disk. I had got about 400 GB into the saveset the first time I was testing and realised I’d made a mistake with the setup so I needed to stop and start again. On an ext3 filesystem, it took more than 10 minutes after cancelling the backup before the saveset had been fully deleted. It may have taken longer – I gave up waiting at that point, went to another terminal to do something else and lost track of how long it actually took.

It was around that point that I recalled having XFS recommended to me for testing purposes, so I downloaded the extra packages required to use XFS within CentOS and reformatted the ~3TB filesystem to XFS.
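For reference, the conversion was along these lines – a sketch only, with a hypothetical device name and mount point, and package names that have varied between CentOS releases:

```shell
# Assumption: /dev/sdb1 is the (empty!) partition destined for the disk
# backup unit, and /d/backup is where NetWorker expects to find it.
yum install -y xfsprogs kmod-xfs    # module package name varies by kernel

# Create the XFS filesystem and mount it.
mkfs.xfs /dev/sdb1
mkdir -p /d/backup
mount -t xfs /dev/sdb1 /d/backup

# Make the mount persistent across reboots.
echo "/dev/sdb1  /d/backup  xfs  defaults  0 0" >> /etc/fstab
```

Obviously this destroys any data on the partition, so the staging/shuffling of existing savesets has to happen first.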

The next test that I ran aborted due to a comms error (!!!) 1.8TB into the backup. Guess how long it took to clear the space? No, seriously, guess – because I couldn’t log onto the test server fast enough to actually see the space clearing. The backup aborted, and the space was suddenly back again. That’s a 1.8TB file deleted in seconds.

That’s the way a filesystem should work.

I’ve since done some (in VMs) nasty power-cycle mid-operation tests and the XFS filesystems come back up practically instantaneously – no extended check sessions that make you want to cry in frustration.

If you’re backing up to disk on Linux, you’d be mad to use anything other than XFS as your filesystem. Quite frankly, I’m kicking myself that I didn’t do this years ago.

Posted in Linux, NetWorker | Tagged: , , , , , | 8 Comments »

How much aren’t you backing up?

Posted by Preston on 2009-10-05

Do you have a clear picture of everything that you’re not backing up? For many sites, the answer is not as clear cut as they may think.

It’s easy to quantify the simple stuff – QA or test servers/environments that literally aren’t configured within the backup environment.

It’s also relatively easy to quantify the more esoteric things within a datacentre – PABXs, switch configurations, etc. (Though in a well run backup environment, there’s no reason why you can’t configure scripts that, as part of the backup process, logs onto such devices and retrieves the configuration, etc.)

It should also be very, very easy to quantify what data on any individual system that you’re not backing up – e.g., knowing that for fileservers you may be backing up everything except for files that have a “.mp3” extension.

What most sites find difficult to quantify is the quasi-backup situations – files and/or data that they are backing up, but which is useless in a recovery scenario. Now, many readers of that last sentence will probably think of one of the more immediate examples: live database files that are being “accidentally” picked up in the filesystem backup (even if they’re being backed up elsewhere, by a module). Yes, such a backup does fall into this category, but there are other types of backups which are even less likely to be considered.

I’m talking about information that you only need during a disaster recovery – or worse, a site disaster recovery. Let’s consider an average Unix (or Linux) system. (Windows is no different, I just want to give some command line details here.) If a physical server goes up in smoke, and a new one has to be built, there’s a couple of things that have to be considered pre-recovery:

  • What was the partition layout?
  • What disks were configured in what styles of RAID layout?

In an average backup environment, this sort of information isn’t preserved. Sure, if you’ve got, say, HomeBase licenses (taking the EMC approach), or are using some other sort of bare metal recovery system, and that system supports your exact environment*, then you may find that such information is preserved and is available.

But what about the high percentage of cases where it’s not?

This is where the backup process needs to be configured/extended to support generation of system or disaster recovery information. It’s all very well, for instance, for a Linux machine to say that you can just recover “/etc/fstab”, but what if you can’t remember the size of the partitions referenced by that filesystem table? Or what if you aren’t there to remember what the size of the partitions were? (Memory is a wonderful yet entirely fallible and human-dependent process. Disaster recovery situations shouldn’t be bound by what we can or can’t remember about the systems, and so we have to gather all the information required to support disaster recovery.)

On a running system, there’s all sorts of tools available to gather this sort of information, but when the system isn’t running, we can’t run the tools, so we need to run them in advance, either as part of the backup process or as a scheduled, checked-upon function. (My preference is to incorporate it into the backup process.)

For instance, consider that Linux scenario – we can quickly assemble the details of all partition sizes on a system with one simple command – e.g.:

[root@nox ~]# fdisk -l

Disk /dev/sda: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

 Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1        2089    16779861   fd  Linux raid autodetect
/dev/sda2            2090        2220     1052257+  82  Linux swap / Solaris
/dev/sda3            2221       19457   138456202+  fd  Linux raid autodetect
/dev/sda4           19458      121601   820471680    5  Extended
/dev/sda5           19458       19701     1959898+  82  Linux swap / Solaris
/dev/sda6           19702      121601   818511718+  fd  Linux raid autodetect

Disk /dev/sdb: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

 Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1         250     2008093+  82  Linux swap / Solaris
/dev/sdb2             251      121601   974751907+  83  Linux

Disk /dev/sdc: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

 Device Boot      Start         End      Blocks   Id  System
/dev/sdc1               1      121601   976760001   83  Linux

Disk /dev/sdd: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

 Device Boot      Start         End      Blocks   Id  System
/dev/sdd1   *           1        2089    16779861   fd  Linux raid autodetect
/dev/sdd2            2090        2220     1052257+  82  Linux swap / Solaris
/dev/sdd3            2221       19457   138456202+  fd  Linux raid autodetect
/dev/sdd4           19458      121601   820471680    5  Extended
/dev/sdd5           19458       19701     1959898+  82  Linux swap / Solaris
/dev/sdd6           19702      121601   818511718+  fd  Linux raid autodetect

That wasn’t entirely hard. Scripting that to occur at the start of the backup process isn’t difficult either. For systems that have RAID, there’s another, equally simple command to extract RAID layouts as well – again, for Linux:

[root@nox ~]# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sda3[0] sdd3[1]
 138456128 blocks [2/2] [UU]

md2 : active raid1 sda6[0] sdd6[1]
 818511616 blocks [2/2] [UU]

md0 : active raid1 sda1[0] sdd1[1]
 16779776 blocks [2/2] [UU]

unused devices: <none>

I don’t want to consume reams of pages discussing what you should gather for each operating system. The average system administrator for any individual platform should, with a cup of coffee (or other preferred beverage) in hand, be able to sit down and in under 10 minutes jot down the sorts of information that would need to be gathered in advance of a disaster to assist in the total system rebuild of a machine they administer.

Once these information gathering steps have been determined, they can be inserted into the backup process as a pre-backup command. (In NetWorker parlance, this would be via a savepnpc “pre” script. Other backup products will equally feature such options.) Once the information is gathered, a copy should be kept on the backup server as well as in an offsite location. (I’ll give you a useful cloud backup function now: it’s called Google Mail. Great for offsiting bootstraps and system configuration details.)
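As a sketch (the directory and file naming are illustrative only), a savepnpc “pre” script along these lines could capture the partition and RAID details shown earlier on each backup run:

```shell
#!/bin/sh
# savepnpc "pre" script sketch: gather disaster recovery details into a
# directory that is itself included in the filesystem backup.
DRDIR=/nsr/dr-info
DATE=`date +%Y%m%d`

mkdir -p "$DRDIR"
fdisk -l         > "$DRDIR/fdisk-$DATE.txt"  2>&1
cat /proc/mdstat > "$DRDIR/mdstat-$DATE.txt" 2>&1
cp /etc/fstab      "$DRDIR/fstab-$DATE.txt"

# Optionally mail a copy offsite as well (address is hypothetical):
# tar cf - "$DRDIR" | gzip | uuencode dr-info.tar.gz | \
#     mail -s "DR info for `hostname`" offsite@example.com

# savepnpc treats a non-zero exit status as a pre-processing failure.
exit 0
```

Each platform's administrator would substitute the equivalent commands for their operating system and volume manager.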

When it comes to disaster recovery, such information can take the guess work or reliance on memory out of the equation, allowing a system or backup administrator in any (potentially sleep-deprived) state, with any level of knowledge about the system in question, to conduct the recovery with a much higher degree of certainty.


* Due to what they offer to do, bare metal recovery (BMR) products tend to be highly specific in which operating system variants, etc., they support. In my experience a significantly higher number of sites don’t use BMR than do.

Posted in Architecture, Backup theory, Linux, NetWorker, Policies, Scripting | Tagged: , , , , | 2 Comments »

Routine filesystem checks on disk backup units

Posted by Preston on 2009-09-28

On Linux, ext filesystems typically have two settings controlling when a complete check is forced at boot. These are:

  • Maximum number of mounts before a check
  • Interval between checks

The default settings, while reasonably suitable for smaller partitions, are very unsuitable for large partitions, such as what you find in disk backup units. In fact, if you don’t pay particular attention to these settings, you may find after a routine reboot that your backup server (or storage node) can take hours to become available. For instance, it’s not unheard of to see even sub-20TB DBU environments (as say, 10 x 2TB filesystems) take several hours to complete mandatory checks on filesystems after what should have just been a routine reboot.

There are two approaches that you can take to this:

  • If you want to leave the checks enabled, it’s reasonably imperative to ensure that at most only one disk backup unit filesystem will be checked at one time after a reboot; this will at least reduce the size of any check-on-reboot. Thus, ensure you:
    • Configure each filesystem so that it will have a different number of maximum mounts before check than any other filesystem, and,
    • Configure the interval (days) between checks for each filesystem to be a significantly different number.
  • If you don’t want periodic filesystem checks to ever interfere with the reboot process, you need to:
    • Ensure that following a non-graceful restart of the server the DBU filesystems are unmounted and checked before any new backup or recovery activities are done, and,
    • Ensure that there are processes – planned maintenance windows if you will – for manual running of the filesystem checks that are being skipped.

Neither option is particularly “attractive”. In the first case, you can still, if you cherish uptime or don’t need to reboot your backup server often, get into a situation where multiple filesystems need to be checked on reboot if they’ve all exceeded their days-between-checks parameter. In the second instance, you’re having to insert human-driven processes into what should normally be a routine operating system function. In particular with the manual option, there must be a process in place for shutting NetWorker down and running the checks, even in the middle of the night, if an OS crash occurs.

Actually, the above list is a little limited – there’s a couple of other options that you can consider as well – though they’re a little more left of field:

  • Build into the change control process the timings for complete filesystem checks in case they happen, or
  • Build into the change control process or reboot procedure for the backup server/storage nodes the requirement to temporarily disable filesystem checks (using say, tune2fs) so that you know the reboot to be done won’t be costly in terms of time.
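For the ext3 settings discussed above, tune2fs does the work – for example (device names are hypothetical, and the exact values are just a way of keeping the triggers out of step with each other):

```shell
# Stagger the check triggers across disk backup unit filesystems so that
# no two filesystems fall due for a check at the same reboot.
tune2fs -c 37 -i 180d /dev/sdb1
tune2fs -c 41 -i 190d /dev/sdc1

# Or disable the periodic checks entirely - only appropriate if there is
# a manual checking process in place, as discussed above.
tune2fs -c 0 -i 0 /dev/sdb1

# Confirm the current settings for a filesystem.
tune2fs -l /dev/sdb1 | grep -i 'mount count\|check'
```

Using prime-ish, well separated values for the maximum mount count makes simultaneous checks unlikely even after long uptimes.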

Personally, I’m looking forward to btrfs – in reality, a modern filesystem such as that should solve most, if not all, of the problems discussed above.

Posted in Linux, NetWorker | Tagged: , , , , | Comments Off on Routine filesystem checks on disk backup units

Using the NetWorker Software Distribution Functionality

Posted by Preston on 2009-08-05

There’s an old saying, “Be careful what you wish for; you may get it.”

Many would say that’s an apt description of the NetWorker software distribution functionality. When it first appeared, it was inadequately documented and error-prone. Fast forward to now, and it’s less error-prone but still reasonably inadequately documented.

I would suggest that unless you’re currently running NetWorker 7.4.4 or NetWorker 7.5.1 (as of time of writing), you will probably find the software distribution functionality too problematic to work with. Any 7.4.x version higher than 7.4.4, and any version of NetWorker higher than 7.5.1 should continue to show improvements. Versions lower than these in either the 7.4.x tree or 7.5 tree should generally be avoided.

A few of the known problems with the software distribution functionality include:

Works very poorly in a heterogeneous environment, where heterogeneous means operating systems outside of what can be updated.
  • Works very poorly when large numbers of clients are included in an operation at once.
  • Gives inadequate progress updates during GUI operations.
  • Can fail in an undesirable state on clients, leaving NetWorker either uninstalled or shutdown.

That being said, you might think it somewhat of an oxymoron for the title of this post to be “Using …”. However, I do think it’s now at the point where it can be used – just with more caution than you might otherwise expect.

(NB: For simplicity here on in, I’m going to call the “NetWorker Software Distribution Functionality” the “NetWorker repository”, since that’s the term for the storage area for the distribution system, and less cumbersome to type or read!)

Here’s some key tips on using the repository successfully:

  1. Do not use NMC. Forget about it. Maybe in another v7.x release, or a v8 release, NMC might catch up.
  2. Do not target multiple operating systems in any activity.
  3. Do not run software updates in parallel.

You might wonder then, what’s the point? If you can’t use the GUI, and you can’t operate against the entire environment at once, or indeed against more than one client at once, where is it better than manual operations?

If you’re prepared to spend the time to get the repository working, it’s better than entirely manual operations because of the following factors:

  1. Centralised storage of NetWorker software.
  2. Push updates are still very worthwhile, particularly in environments where you may not always have easy access to the clients of the datazone.
  3. Activities can be scripted with minimum effort, allowing for centralised yet granular operations.
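Given those caveats – one client at a time, never in parallel – a minimal wrapper that drives nsrpush serially might look like this (the client names and log location are hypothetical):

```shell
#!/bin/sh
# Serially inventory, then report on, a list of like-platform clients.
# nsrpush is deliberately run against one client at a time.
CLIENTS="nox nimrod asgard"
LOG=/nsr/logs/nsrpush-wrapper.log

for client in $CLIENTS
do
    echo "=== Inventorying $client ===" >> "$LOG"
    if nsrpush -i "$client" >> "$LOG" 2>&1
    then
        # Inventory succeeded - record the installed version details too.
        nsrpush -s "$client" >> "$LOG" 2>&1
    else
        echo "Inventory of $client failed; skipping" >> "$LOG"
    fi
done
```

The same loop structure works for update operations; it simply trades speed for the reliability of single-client runs.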

Despite the limitations described previously, there are still some key activities that you can do with the repository. These are:

  1. Inventory the existing software install state of clients.
  2. Query the current state of repository software.
  3. Add software to the repository.
  4. Remove software from the repository.
  5. Update client software.

Here’s some examples of how to go about doing these operations, from the command line.

Inventorying clients

When inventorying clients, ensure that you don’t include in the client inventory list unsupported operating systems. For the average environment, this particularly applies to Mac OS X clients, NetWare clients and OpenVMS clients. The inventory operation has a long history of failure when these clients are thrown into the mix.

To inventory clients, you use one of two commands:

# nsrpush -i clients

or

# nsrpush -i -all

However, as I’ve pointed out, doing operations against all clients isn’t the best idea, so the second option there is really only presented for completeness.

Here’s an example of inventorying 3 clients, ‘nox’, ‘nimrod’ and ‘asgard’:

[root@nox ~]# nsrpush -i asgard nox nimrod
Starting Inventory Operation on selected clients
Creating client type job for client asgard.
Copying inventory scripts to client asgard.
Successfully copied inventory scripts to client asgard.
Starting inventory of client asgard.
Successfully ran inventory scripts on client asgard.
Cleaning up client asgard.
Successfully cleaned up client asgard.
Creating client type job for client nox.
Copying inventory scripts to client nox.
Successfully copied inventory scripts to client nox.
Starting inventory of client nox.
Successfully ran inventory scripts on client nox.
Cleaning up client nox.
Successfully cleaned up client nox.
Creating client type job for client nimrod.
Copying inventory scripts to client nimrod.
Successfully copied inventory scripts to client nimrod.
Starting inventory of client nimrod.
Successfully ran inventory scripts on client nimrod.
Cleaning up client nimrod.
Successfully cleaned up client nimrod.

Inventory status: succeeded

Perhaps one of the sillier parts of inventorying is that while it updates the database of client versions, it actually doesn’t report those versions to you in the same output – you have to run another command for that.

To show the versions, you want to invoke one of:

# nsrpush -s -all

or

# nsrpush -s clients

Now, in this case, the “show” option only runs against the local database, so it’s reasonably safe to use the ‘-all’ option. To reduce the amount of output though, I’ll show the output for just those three clients inventoried before:

[root@nox ~]# nsrpush -s asgard nimrod nox

 nox  linux_x86_64
     NetWorker 7.5.1
                Client                         
                Man Pages                      
                Server                         
                Storage Node                   

 nimrod  linux_x86
     NetWorker 7.5.1
                Management Console             
                Man Pages                      
                Client                         

 asgard  linux_x86
     NetWorker 7.4.2
                Man Pages                      
                Client

As you can see in the above, the output is reasonably comprehensive, covering for each host, all three of:

  • Version
  • Platform
  • Packages

Note that you can’t show before you inventory, so don’t start by trying to show the state of all systems – it just won’t work.

Querying software in the repository

To get a list of the software currently held in the repository, you’d run the command:

# nsrpush -l

For instance, here’s the (abridged) output from one of my lab servers:

[root@nox ~]# nsrpush -l

Products in the repository
================================

   NetWorker 7.5.1
     linux_x86                      
         Korean Language Pack           
         Chinese Language Pack          
         Japanese Language Pack         
         French Language Pack           
         Server                         
         License Manager                
         Storage Node                   
         Man Pages                      
         Client                         
         Management Console             
     linux_x86_64                   
         Chinese Language Pack          
         Japanese Language Pack         
         French Language Pack           
         Server                         
         License Manager                
         Storage Node                   
         Man Pages                      
         Client                         
         Management Console             
         Korean Language Pack
   NetWorker 7.5.0
     win_x64                        
         Management Console             
         English Language Pack          
         License Manager                
         Server                         
         Storage Node                   
         Client                         
     win_x86                        
         Management Console             
         English Language Pack          
         Language Packs                 
         License Manager                
         Server                         
         Storage Node                   
         Client

Note that both Windows and Unix versions of NetWorker are stored in the same repository. (Due to restrictions with copy/paste in WordPress not all spacing has been preserved in the above.)

Adding software to the repository

Adding software to the repository is reasonably straightforward. There are two different techniques for this – one for adding software for the same platform as the backup server, and one for adding software for a different platform to the backup server.

By “same” I mean:

  • Adding Unix/Linux software to a Unix/Linux backup server
  • Adding Windows software to a Windows backup server

By “different” I mean:

  • Adding Windows software to a Unix backup server
  • Adding Unix software to a Windows backup server

Adding software from the same platform

Let’s say I’ve got a Unix NetWorker server, and I want to add software from a supported Unix platform to the repository. Here’s the steps:

  1. Download the package from EMC PowerLink.
  2. Uncompress and untar the downloaded package into a directory.
  3. Run nsrpush against that directory with the appropriate arguments.

So, let’s assume that I want to add NetWorker 7.4.4 for Linux 32-bit to a software repository. I’d start with getting the software into a known directory.

[root@nox tmp]# mkdir /tmp/repo
[root@nox tmp]# cd repo
[root@nox repo]# gunzip -c /d/03/share-a/Downloads/NetWorker/v7_4/
NetWorker\ 7.4\ SP4/nw74sp4_linux_x86.tar.gz | tar xvpf -
LGTO_METAFILE.linuxx86
jre-1_5_0_15-linux-i586.bin
lgtoclnt-7.4.4-1.i686.rpm
lgtofr-7.4.4-1.i686.rpm
lgtoja-7.4.4-1.i686.rpm
lgtoko-7.4.4-1.i686.rpm
lgtolicm-7.4.4-1.i686.rpm
lgtoman-7.4.4-1.i686.rpm
lgtonmc-3.4.4-1.i686.rpm
lgtonode-7.4.4-1.i686.rpm
lgtoserv-7.4.4-1.i686.rpm
lgtozh-7.4.4-1.i686.rpm
sd_products.res

Now, the command we want to use in this case is:

# nsrpush -a -p product -v version -P platform -m path -U

(NB: The “-U” is for “Unix”; obviously if you’re running a like-for-like installation on Windows, using “-U” here isn’t going to be a good idea – instead, use “-W”.)

Now, to work out what we need to put into the product, version and platform strings, we want to check the LGTO_METAFILE.linuxx86 file that was included in the distribution. If you look at the start of the file, you’ll find a section detailing this, and for the file specified above on my lab server, this section looks like the following:

# Client Push metafile to be included in linux_x86 NetWorker distributions

OPERATING_SYSTEM=Linux

OS_PLATFORM=linux_x86

PRODUCT_NAME=NetWorker

PRODUCT_VERSION=7.4.4

DESCRIPTION=Networker 7.4.4

So, that means our nsrpush command will look like this:

# nsrpush -a -p NetWorker -v 7.4.4 -P linux_x86 -m /tmp/repo -U
Success adding product from:
/tmp/repo

Add to repository status: succeeded

That’s all there is to it! That software is now in the repository and ready to be rolled out to a client.

Adding software from the other platform

If you need to add Windows software to a Unix server, or Unix software to a Windows server, you’ll need a helper client. This is a client of the same platform as the software you are adding, with NetWorker already running, and a copy of the software extracted on the client. The procedure works along the lines of the following:

  1. Unpack the alternate platform software onto a ‘like’ client.
  2. Unpack the alternate platform software onto the backup server.
  3. Run nsrpush to add the software.

So in this case, I’m going to use a Windows client called ‘medusa’ as a helper/proxy to allow me to add NetWorker 7.5.1 64-bit for Windows to a Unix server’s repository. On my Windows client, I’ve got a directory called “C:\temp\751”:

Adding alternate platform software - folder where software can be found on client

Now, I also need the software unpacked on the backup server, so I flush out the contents of my /tmp/repo directory, and unzip the Windows 64-bit NetWorker 7.5.1 archive. This gives me a path to the LGTO_METAFILE.winx64 file of:

/tmp/repo/win_x64/LGTO_METAFILE.winx64

Now I need to add the software to the repository. The command will be:

# nsrpush -a -p product -v version -P platform -m path -c client -C path -W

Where:

  • The product, version and platform are extracted from the LGTO_METAFILE as usual.
  • -m path refers to the local copy of the software on the backup server.
  • -c client refers to the helper/proxy client (in this case, ‘medusa’).
  • -C path refers to the path on the helper/proxy client where the software has been extracted.
  • -W tells NetWorker we are adding Windows software.

The relevant product/version/platform details of the LGTO_METAFILE.winx64 looks like the following:

#########################
# Product Definition
#########################

OPERATING_SYSTEM=Windows
OS_PLATFORM=win_x64
PRODUCT_NAME=NetWorker
PRODUCT_PLATFORM_NAME=NetWorker
PRODUCT_VERSION=7.5.1
DESCRIPTION=Networker 7.5.1

So, our command will therefore be:

[root@nox repo]# nsrpush -a -p NetWorker -P win_x64 -v 7.5.1 -m /tmp/repo
-c medusa -C 'C:\temp\751' -W
Hostname and mount point recorded.
Success adding product from:
/tmp/repo/win_x64

Add to repository status: succeeded

We can subsequently confirm successful addition to the repository with our trusty nsrpush -l command:

[root@nox repo]# nsrpush -l

Products in the repository
================================

   NetWorker 7.5.1
            win_x64                        
                 Client                         
                 Management Console             
                 Chinese Language Pack          
                 Korean Language Pack           
                 Japanese Language Pack         
                 French Language Pack           
                 English Language Pack          
                 Language Packs                 
                 License Manager                
                 Server                         
                 Storage Node

There you have it – software for an alternate platform added to the repository, with very little work.
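If you want to check the repository from a script rather than by eye, the indentation-based layout of the `nsrpush -l` output is straightforward to parse. This is a hypothetical Python sketch, assuming the indent levels shown in the listing above (products at the shallowest indent, then platforms, then packages):

```python
def parse_repo_listing(text):
    """Parse 'nsrpush -l' output into {product: {platform: [packages]}}."""
    repo = {}
    product = platform = None
    for raw in text.splitlines():
        stripped = raw.strip()
        # Skip blank lines and the listing header/underline.
        if not stripped or stripped.startswith(("Products in", "=")):
            continue
        indent = len(raw) - len(raw.lstrip())
        if indent < 8:            # product line, e.g. 'NetWorker 7.5.1'
            product = stripped
            repo[product] = {}
        elif indent < 16:         # platform line, e.g. 'win_x64'
            platform = stripped
            repo[product][platform] = []
        else:                     # package line, e.g. 'Storage Node'
            repo[product][platform].append(stripped)
    return repo

listing = """\
Products in the repository
================================

   NetWorker 7.5.1
            win_x64
                 Client
                 Server
                 Storage Node
"""

repo = parse_repo_listing(listing)
print(repo["NetWorker 7.5.1"]["win_x64"])
# -> ['Client', 'Server', 'Storage Node']
```

The indent thresholds are taken from the sample output on this server; if a different NetWorker version indents the listing differently, they would need adjusting.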

Updating client software

Finally, there wouldn’t be much use for the repository without being able to push out updates.

The upgrade command is reasonably straightforward:

# nsrpush -u -p product -v version clients

Where ‘clients’ is a space separated list of one or more clients. (Note that while you can in theory use the ‘-all’ option for all clients, I’d really, really strongly recommend against it.)

Now, our client ‘asgard’ from before was still running NetWorker 7.4.2 for Linux, and I’d like to upgrade it to 7.4.4. The command and upgrade process is as follows:

[root@nox tmp]# nsrpush -u -p NetWorker -v 7.4.4 asgard
Starting upgrade operation on selected clients.
Creating client type job for client asgard.
Copying upgrade scripts to client asgard.
Successfully copied upgrade scripts to client asgard.
Checking tmp space on client asgard.
Successfully issued a tmp space check command to client asgard.
Copying upgrade packages to client asgard.
Successfully copied upgrade packages to client asgard.
Starting upgrade of client asgard.
Successfully issued an upgrade command to client asgard.
Waiting for upgrade status for client asgard.
Still waiting for the upgrade to complete on client asgard.
Waiting for upgrade status for client asgard.
Successfully retrieved and processed the package upgrade status data for client asgard
Starting cleanup phase on client asgard.
Cleaning up client asgard.
Successfully issued the cleanup command to client asgard.

Upgrade status: succeeded

Pushing out software to alternate platforms (i.e., Windows from Unix or vice versa) is exactly the same command as above.

Cautions for upgrades

When you run upgrades, here are the cautions I'd suggest:

  1. Always capture the output and confirm you get an “Upgrade status: succeeded” message for each client that you upgrade.
  2. Don’t run the command against more than say, 5 clients at once.
  3. Don’t run the command for Windows and Unix clients at the same time.
  4. If the upgrade fails, confirm from the output whether NetWorker was able to restart on the client before attempting another push. If it doesn’t look like it did, you may have to install manually.
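Those cautions can be baked into a small wrapper script. The sketch below is hypothetical (the batching, the success check against the “Upgrade status: succeeded” line, and the injectable `run` function are all my own scaffolding, not NetWorker functionality):

```python
import subprocess

BATCH_SIZE = 5  # caution 2: don't push to more than ~5 clients at once

def upgrade_succeeded(output):
    """Caution 1: look for the explicit success line in captured output."""
    return "Upgrade status: succeeded" in output

def upgrade_in_batches(product, version, clients, run=None):
    """Run 'nsrpush -u' over clients in small batches, capturing output.

    'run' executes a command list and returns its text output; it is
    injectable so the batching logic can be exercised without a server.
    Note: the success check here is per batch, not per individual client,
    so always review the captured output as well.
    """
    if run is None:
        def run(cmd):
            return subprocess.run(cmd, capture_output=True, text=True).stdout
    results = {}
    for i in range(0, len(clients), BATCH_SIZE):
        batch = clients[i:i + BATCH_SIZE]
        out = run(["nsrpush", "-u", "-p", product, "-v", version] + batch)
        for client in batch:
            results[client] = upgrade_succeeded(out)
    return results
```

Keeping Windows and Unix clients in separate invocations of a wrapper like this also covers caution 3 without any extra code.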

Many errors that can come out of nsrpush are currently undocumented. Thus, if you run an nsrpush operation and get an error that you can’t find in PowerLink from a casual search, you probably need to log a case with EMC. (For instance, in tests today I got an error “client X has products installed that prevent upgrade”. There’s no publicly visible documentation on this error, and despite the seeming logic of it, after checking the client, I don’t understand why the error came about.)

[Edit – 2009-08-19 – While reviewing the 7.4.5 release notes I stumbled across the reason for the “client X has products installed that prevent upgrade” error; while I had been aware that you couldn’t use the repository to update hosts with the NetWorker server software installed, I wasn’t aware that you also can’t use it for hosts with NMC or Legato License Manager installed. The client I experienced an issue on was running LLM.]

Why would you use the repository?

It’s worth revisiting this – obviously I’ve couched all of the above in quite a few caveats and thrown in some curly comments (“many errors … are currently undocumented”). With this in mind, you may be wondering now that you’ve read to the bottom of the post why you would want to risk using the repository and running into problems.

To me the most obvious reason is that the software repository is going to continue to improve. For instance, between 7.4.4 and 7.5.1 many known issues were sorted out, making 7.5.1 much more stable to use. It’s already useful now, if you’re careful with it, and keeping abreast of it will allow you to continue to save time as each successive version comes out. It is already in a position to save you time and to help you keep clients (and modules) up to date – that in itself, I believe, is reason enough to use it.

[Edit 2009-08-21 Erroneous information left from a draft removed. Thanks Rolf for pointing this out.]

Finally, even as the GUI continues to improve, I have been and always will be a fan of command line functionality because it allows for scripting. So even when there are limitations with what you can do in NMC, using nsrpush on the command line isn’t rocket science once you get used to the syntax.

Much as there are parts of the repository that I find a bit curmudgeonly at the moment, I’m genuinely happy with where it has (finally) got to, and I’m equally genuinely looking forward to increased functionality and stability as newer versions of NetWorker come out. Over time it’s going to evolve from something which is a “partial time saver” to something that’s a real time saver.

Posted in Linux, NetWorker, Windows | Tagged: , , | 3 Comments »

Last month’s top story

Posted by Preston on 2009-08-01

I thought that from now on I might try to post a brief comment at the start of each month on the most popular story of the previous month.

There is one caveat – as I alluded to in the previous “Top 5” post, I have some posts that get hit an awful lot of times. So, the absolute most-referenced posts, being “Fixing NSR peer information”, “Parallelism in NetWorker” and “Changing saveset browse/retention time”, are effectively disqualified from consideration for “Last month’s top story”.

For this inaugural entry, we have “Carry a jukebox with you (if you’re using Linux)”. Outside of the above 3 articles, this one was viewed the most – and, for what it’s worth, generated a lot of follow-through clicks to Mark Harvey’s linuxvtl web page. While not a production VTL, Mark’s Linux VTL software has already given me a great deal of efficiency over this last month in my lab environment. I have versions of NetWorker on lab servers all with the VTL configured, making testing of a wide variety of options considerably easier than having physical tape libraries connected and powered on. I hope others are finding it similarly useful. One of the comments to the article was someone asking about a more complete set of instructions for getting the VTL up and running – I aim to have this done by the end of this weekend.

Posted in Aside, Linux | Tagged: , , | Comments Off on Last month’s top story

Using yum to install NetWorker on Linux

Posted by Preston on 2009-05-26

Hi!

The content of this post should now be viewed on the NetWorker Information Hub, here.

Posted in Linux, NetWorker | Tagged: , | 2 Comments »

Client operating systems should not be changed

Posted by Preston on 2009-05-04

While researching an issue today I was reminded of a problem I had a few years ago that still remains quite valid.

Client operating systems must not be changed. To be more specific, client platforms must not be changed – at least while keeping the same client name.

The reason for this is that index data is incompatible between client platforms. That is, if you’ve been backing up the machine ‘cerberus’ which is a Linux machine, then the machine is rebuilt as a Windows machine, it must be renamed.

Failing to rename a client in these circumstances can result in index corruption. This may manifest in one of three different ways:

  1. Inability to backup at all.
  2. Backups may consistently fail at specific points.
  3. Backups may succeed but recoveries may fail.

Now, there may be some workarounds, and I’ll profess to having tried some of them out in lab environments, but let me be blunt: this causes so many problems (and is so strongly discouraged by EMC) that changing a client operating system (platform) should never be done without the client being renamed as well.

This may be done by either:

  • Creating a new client name entirely for the rebuilt machine

or

  • Using the standard procedure to rename the machine in NetWorker before the rebuild, so that when the new operating system is configured in NetWorker with the ‘previous’ name, there is no namespace clash or client ID conflict.

Trust me, if you don’t do one of the above when a client operating system is changed, at some point you will come a cropper.

[2009-05-10]

I thought I’d come back and add output from a backup attempt to show what happens when you do this. Here’s the scenario – I created a client called ‘dopey’*, installed CentOS 5, ran a full + incremental backup, then reinstalled ‘dopey’ as a Windows 2003 machine. Here’s (some of) the output from that backup, using NetWorker 7.5.1 as the server and client in all 3 instances:

[root@nox ~]# savegrp -l full -c dopey idata
40473:savegrp: command ' save -s nox.anywebdb.com -g idata -LL -m dopey -l full -q -W 78 -N "VSS ASR DISK:\\" "VSS ASR DISK:\\"' for client dopey exited with return code 5.
7336:savegrp: Log file /nsr/tmp/sg/idata/sso.dopey.pSMS7s is empty.
68703:savegrp: savegrp:idata * dopey:VSS ASR DISK:\ Cannot determine status of backup process.  Use mminfo to determine job status.

7341:savegrp: dopey:VSS ASR DISK:\ unexpectedly exited.
(snip)
40473:savegrp: command ' save -s nox.anywebdb.com -g idata -LL -m dopey -l full -q -W 78 -N "C:\\" "C:\\"' for client dopey exited with return code 5.
7336:savegrp: Log file /nsr/tmp/sg/idata/sso.dopey.BlOmrS is empty.
68703:savegrp: savegrp:idata * dopey:C:\ Cannot determine status of backup process.  Use mminfo to determine job status.

7341:savegrp: dopey:C:\ unexpectedly exited.

As you can see, we’re just not getting a successful backup at all, since NetWorker is unable to merge the index entries from the two incompatible platforms.
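If you want to pick these failures out of a savegrp log programmatically, the “unexpectedly exited” lines are the ones to match on. A small hypothetical Python sketch, assuming only the message format shown above:

```python
import re

# Matches lines like '7341:savegrp: dopey:C:\ unexpectedly exited.'
# The client name is assumed not to contain a colon, so the first ':'
# after 'savegrp: ' splits the client from the saveset.
FAILED = re.compile(r"savegrp: ([^:\s]+):(.+?) unexpectedly exited")

def failed_savesets(savegrp_output):
    """Return (client, saveset) pairs that savegrp reported as failed."""
    return FAILED.findall(savegrp_output)

log = (
    "7341:savegrp: dopey:VSS ASR DISK:\\ unexpectedly exited.\n"
    "7341:savegrp: dopey:C:\\ unexpectedly exited.\n"
)
print(failed_savesets(log))
# -> [('dopey', 'VSS ASR DISK:\\'), ('dopey', 'C:\\')]
```

Note that this only catches one of the failure modes savegrp reports; the “Cannot determine status of backup process” lines still warrant a follow-up with mminfo, as the log itself suggests.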


* Currently running through and naming test machines based on the Seven Dwarves from Snow White, that’s all.

Posted in Linux, NetWorker, Windows | Tagged: , | Comments Off on Client operating systems should not be changed

Distribution plug – SME Server

Posted by Preston on 2009-02-07

Sometime in 2003 I installed my last (and possibly last ever) Linux desktop machine. Having been a fan of Linux for a long time, I had started to become increasingly disillusioned in its failings as a desktop OS, particularly when it came to things that should have been simple, such as plugging in an iPod, watching a movie, or synching a Palm Pilot. After seeing time and time again the lack of issues my partner had with Mac OS X, I finally took the plunge, bought an eMac, and have since sworn off any other OS for my desktop.

That being said, I’ve been using a Linux distribution called SME Server (previously known as e-smith) for over a decade now. This is a server distribution, but not the average server-style distribution you may be thinking of. It’s not designed for use with a commercial database, nor as a workgroup (let alone enterprise) backup server, and certainly not as a high performance computing server.

It’s a workgroup server with one explicit function in mind: a single server that covers all the ‘basics’ in a small environment. It handles all of the following:

  • Internet connectivity – dialup, DSL or ethernet, with full proxying
  • Internet security – SPAM filtering, strong port lockdown, etc.
  • Users and groups
  • Fileserving with anti-virus options
  • IMAP and POP email
  • Port redirection/pass through
  • Web site hosting
  • DNS

Plus probably more features that, to this day, I’ve not needed to discover. Almost all functions are readily controllable via its simple web interface, meaning that basic administration doesn’t even require system administration skills. (Being free, it’s also considerably cheaper than Windows or even Mac OS X.)

It’s damn simple – boot from the CD/ISO, agree to really install, and it will blow away the contents of the hard drive and create a machine in its image. If you’ve got two hard drives, it automatically configures mirroring; if you’ve only got one, it still configures mirroring so that it can be enabled simply by adding another hard drive later.

It can even run in VMware if you’re so inclined and you can dedicate a NIC to it for connectivity to a DSL modem (if necessary). With a bit of kludging, you can even install NetWorker on it if you need to (the same sort of kludging you need to do to install NetWorker on ESX).

If you’re in a situation where you’re needing to install some basic server (e.g., for a charity, for home, as part of a small consulting job), I’d thoroughly recommend that you give SME Server a good close look.

Posted in General thoughts, Linux | Tagged: , , | 3 Comments »