NetWorker Blog

Commentary from a long-term NetWorker consultant and Backup Theorist

  • This blog has moved!

    This blog has now moved to nsrd.info/blog. Please jump across to the new site for the latest articles (and all old archived articles).

  • Enterprise Systems Backup and Recovery

    If you find this blog interesting, and either have an interest in or work in data protection/backup and recovery environments, you should check out my book, Enterprise Systems Backup and Recovery: A Corporate Insurance Policy. Designed for system administrators and managers alike, it focuses on features, policies, procedures and the human element to ensuring that your company has a suitable and working backup system rather than just a bunch of copies made by unrelated software, hardware and processes.

Posts Tagged ‘Linux’

NetWorker and linuxvtl, Redux

Posted by Preston on 2009-11-14

Some time ago, I posted a blog entry titled Carry a Jukebox with you, if you’re using Linux, which referred to using linuxvtl with NetWorker. The linuxvtl project is run by my friend Mark Harvey, who has been working with enterprise backup products as long as me.

At the time I blogged, the key problem with the LinuxVTL implementation was that NetWorker didn’t recognise the alternate device IDs generated by the code – it relied on WWNNs, which were the same for each device.

I was over the moon when I received an email from Mark a short while ago saying he’s now got multiple devices working in a way that is compatible with NetWorker. This is a huge step forward for Linux VTL.

So, what’s changed?

While I’ve not had confirmation from Mark, I’m working on the basis that you do need the latest source code (mhvtl-2009-11-10.tgz as of the time of writing).

The next step, to quote Mark, is that we need to step away from StorageTek and define the library as SpectraLogic:

p.s. The “fix” is to define the robot as a Spectralogic NOT an L700.
The STK L700 does not follow the SMC standards too well. It looks like
NetWorker uses the ‘L700′ version and not the standards.
The Spectralogic follows the SMC standards (or at least their
interruption is the same as mine :) )

The final part is to update the configuration files to include details that allow the VTL code to generate unique WWNNs for NetWorker’s use.

Starting out with just 2 devices, here’s what my inquire output now looks like:

[root@tara ~]# inquire -l

-l flag found: searching all LUNs, which may take over 10 minutes per adapter
	for some fibre channel adapters.  Please be patient.

scsidev@0.0.0:SPECTRA PYTHON    5500|Autochanger (Jukebox), /dev/sg2
			        S/N:	XYZZY
			        ATNN=SPECTRA PYTHON          XYZZY
			        WWNN=11223344ABCDEF00
scsidev@0.1.0:QUANTUM SDLT600   5500|Tape, /dev/nst0
			        S/N:	ZF7584364
			        ATNN=QUANTUM SDLT600         ZF7584364
			        WWNN=11223344ABCDEF01
scsidev@0.2.0:QUANTUM SDLT600   5500|Tape, /dev/nst1
			        S/N:	ZF7584366
			        ATNN=QUANTUM SDLT600         ZF7584366
			        WWNN=11223344ABCDEF02

As you can see – each device has a different WWNN now, which is instrumental for NetWorker. (Note, I have adjusted the spacing slightly to make sure it fits in.)
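If you want to confirm this on your own setup without eyeballing the listing, a quick check along the following lines should do. This is only a sketch: the function name duplicate_wwnns is my own invention, and it assumes the WWNN= lines look like the ones in the output above.

```shell
#!/bin/sh
# Print any WWNN value that appears more than once in inquire-style output
# read from standard input. No output means every WWNN is unique.
# (duplicate_wwnns is a hypothetical helper name, not a NetWorker command.)
duplicate_wwnns() {
    # Pull out just the hex value that follows "WWNN=", then report repeats.
    sed -n 's/.*WWNN=\([0-9A-Fa-f]*\).*/\1/p' | sort | uniq -d
}

# Typical use on a live system:
#   inquire -l 2>/dev/null | duplicate_wwnns
```

An empty result is what you want to see before running jbconfig; any value printed is a WWNN shared by two or more devices.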

Finally, here’s what my /etc/mhvtl/device.conf and /etc/mhvtl/library_contents files now look like:

[root@tara mhvtl]# cat device.conf 

VERSION: 2

# VPD page format:
# <page #> <Length> <x> <x+1>... <x+n>

# NOTE: The order of records is IMPORTANT...
# The 'Unit serial number:' should be last (except for VPD data)
# i.e.
# Order is : Vendor ID, Product ID, Product Rev and serial number finally
# Zero, one or more VPD entries.
#
# Each 'record' is sperated by one (or more) blank lines.
# Each 'record' starts at column 1

Library: 0 CHANNEL: 0 TARGET: 0 LUN: 0
 Vendor identification: SPECTRA
 Product identification: PYTHON
 Product revision level: 5500
 Unit serial number: XYZZY
 NAA: 11:22:33:44:ab:cd:ef:00

Drive: 1 CHANNEL: 0 TARGET: 1 LUN: 0
 Vendor identification: QUANTUM
 Product identification: SDLT600
 Product revision level: 5500
 Max density: 0x46
 NAA: 11:22:33:44:ab:cd:ef:01
 Unit serial number: ZF7584364
 VPD: b0 04 00 02 01 00

Drive: 2 CHANNEL: 0 TARGET: 2 LUN: 0
 Vendor identification: QUANTUM
 Product identification: SDLT600
 Product revision level: 5500
 Max density: 0x46
 NAA: 11:22:33:44:ab:cd:ef:02
 Unit serial number: ZF7584366
 VPD: b0 04 00 02 01 00

[root@tara mhvtl]# cat library_contents
# Define how many tape drives you want in the vtl..
# The ‘XYZZY_…’ is the serial number assigned to
# this tape device.
Drive 1: ZF7584364
Drive 2: ZF7584366
# Place holder for the robotic arm. Not really used.
Picker 1:
# Media Access Port
# (mailslots, Cartridge Access Port, <insert your favourate name here>)
# Again, define how many MAPs this vtl will contain.
MAP 1:
MAP 2:
MAP 3:
MAP 4:
# And the ‘big’ on, define your media and in which slot contains media.
# When the rc script is started, all media listed here will be created
# using the default media capacity.
Slot 1: 800843S3
Slot 2: 800844S3
Slot 3: 800845S3
Slot 4: 800846S3
Slot 5: 800847S3
Slot 6: 800848S3
Slot 7: 800849S3
Slot 8: 800850S3
Slot 9: 800851S3
Slot 10: 800852S3
Slot 11: 800853S3
Slot 12: 800854S3
Slot 13: 800855S3
Slot 14: 800856S3
Slot 15: 800857S3
Slot 16: 800858S3
Slot 17: 800859S3
Slot 18: 800860S3
Slot 19: 800861S3
Slot 20: 800862S3
Slot 21: BIG990S3
Slot 22: BIG991S3
Slot 23: BIG992S3
Slot 24: BIG993S3
Slot 25: BIG994S3
Slot 26: BIG995S3
Slot 27: BIG996S3
Slot 28: BIG997S3
Slot 29: BIG998S3
Slot 30: BIG999S3
Slot 31: CLN001L1
Slot 32: CLN002L1

Note the NAA entries in the device.conf file – these are key!
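Since the Drive records in device.conf follow a regular pattern, adding more drives can be scripted. The sketch below prints records in the same layout as my file; the function name drive_record is made up, and tying the last NAA byte to the drive number is simply my assumption of a convenient way to keep each NAA unique, not something mhvtl requires.

```shell
#!/bin/sh
# Emit one Drive record in the device.conf layout used in this post.
# $1 = drive number (also used as the SCSI target and the last NAA byte)
# $2 = unit serial number
# The NAA prefix is the one from the post; deriving the last byte from the
# drive number is an illustrative assumption, not an mhvtl rule.
drive_record() {
    printf 'Drive: %d CHANNEL: 0 TARGET: %d LUN: 0\n' "$1" "$1"
    printf ' Vendor identification: QUANTUM\n'
    printf ' Product identification: SDLT600\n'
    printf ' Product revision level: 5500\n'
    printf ' Max density: 0x46\n'
    printf ' NAA: 11:22:33:44:ab:cd:ef:%02x\n' "$1"
    printf ' Unit serial number: %s\n' "$2"
    printf ' VPD: b0 04 00 02 01 00\n\n'
}

# Reproduces the two Drive records shown in the post.
drive_record 1 ZF7584364
drive_record 2 ZF7584366
```

Remember that each serial number used here also needs a matching “Drive N:” line in library_contents.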

With these changes done, jbconfig worked without missing a beat, and suddenly I had a 2 drive VTL running.

Great going, Mark!

While I’ve not yet tested it, I suspect this fix will also ensure that the VTL can be configured on multiple storage nodes, which will be a fantastic improvement for library support work as well.

[Edit, 2009-11-18]

I’m pleased to say that the changes that have been made allow for the VTL to be created on more than one storage node. This presents excellent opportunities for debugging, testing and training:

LinuxVTL on server and storage node

Posted in Linux, NetWorker | Tagged: , , | 5 Comments »

NetWorker on Linux – Ditching ext3 for xfs

Posted by Preston on 2009-11-05

Recently when I made an exasperated posting about lengthy ext3 check times and looking forward to btrfs, Siobhán Ellis pointed out that there was already a filesystem available for Linux that met a lot of my needs – particularly in the backup space, where I’m after:

  • Being able to create large filesystems that don’t take exorbitantly long to check
  • Being able to avoid checks on abrupt system resets
  • Speeding up the removal of files when staging completes or large backups abort

That filesystem of course is XFS.

I’ve recently spent some time shuffling data around and presenting XFS filesystems to my Linux lab servers in place of ext3, and I’ll fully admit that I’m horribly embarrassed I hadn’t thought to try this out earlier. If anything, I’m stuck looking for the right superlative to describe the changes.

Case in point – I was (and indeed still am) doing some testing where I need to generate >2.5TB of backup data from a Windows 32-bit client for a single saveset. As you can imagine, not only does this take a while to generate, but it also takes a while to clear from disk. I had got about 400 GB into the saveset the first time I was testing and realised I’d made a mistake with the setup so I needed to stop and start again. On an ext3 filesystem, it took more than 10 minutes after cancelling the backup before the saveset had been fully deleted. It may have taken longer – I gave up waiting at that point, went to another terminal to do something else and lost track of how long it actually took.

It was around that point that I recalled having XFS recommended to me for testing purposes, so I downloaded the extra packages required to use XFS within CentOS and reformatted the ~3TB filesystem as XFS.

The next test that I ran aborted due to a (!!!) comms error 1.8TB through the backup. Guess how long it took to clear the space? No, seriously, guess – because I couldn’t log onto the test server fast enough to actually see the space clearing. The backup aborted, and the space was suddenly back again. That’s a 1.8TB file deleted in seconds.

That’s the way a filesystem should work.

I’ve since run some nasty mid-operation power-cycle tests (in VMs), and the XFS filesystems come back up practically instantaneously – no extended check sessions that make you want to cry in frustration.

If you’re backing up to disk on Linux, you’d be mad to use anything other than XFS as your filesystem. Quite frankly, I’m kicking myself that I didn’t do this years ago.

Posted in Linux, NetWorker | Tagged: , , , , , | 8 Comments »

Routine filesystem checks on disk backup units

Posted by Preston on 2009-09-28

On Linux, filesystems typically have two settings that control when a complete check is run at boot. These are:

  • Maximum number of mounts before a check
  • Interval between checks

The default settings, while reasonably suitable for smaller partitions, are very unsuitable for large partitions, such as those you find in disk backup units. In fact, if you don’t pay particular attention to these settings, you may find after a routine reboot that your backup server (or storage node) can take hours to become available. For instance, it’s not unheard of to see even sub-20TB DBU environments (as, say, 10 x 2TB filesystems) take several hours to complete mandatory checks on filesystems after what should have just been a routine reboot.

There are two approaches that you can take to this:

  • If you want to leave the checks enabled, it’s reasonably imperative to ensure that at most only one disk backup unit filesystem will be checked at one time after a reboot; this will at least reduce the size of any check-on-reboot. Thus, ensure you:
    • Configure each filesystem so that it will have a different number of maximum mounts before check than any other filesystem, and,
    • Configure the interval (days) between checks for each filesystem to be a significantly different number.
  • If you don’t want periodic filesystem checks to ever interfere with the reboot process, you need to:
    • Ensure that following a non-graceful restart of the server the DBU filesystems are unmounted and checked before any new backup or recovery activities are done, and,
    • Ensure that there are processes – planned maintenance windows if you will – for manual running of the filesystem checks that are being skipped.
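For the first approach, the staggering can be sketched with tune2fs, whose -c option sets the maximum mount count and -i the interval between checks. The script below only prints the commands rather than running them; the function name stagger_fsck, the base values, the step sizes and the device names are all illustrative assumptions.

```shell
#!/bin/sh
# Print (not run) one tune2fs command per device, each with a different
# maximum mount count and check interval so the checks are unlikely to
# coincide. Base values and step sizes are illustrative assumptions.
stagger_fsck() (
    count=20      # max mounts before check for the first filesystem
    days=90       # days between checks for the first filesystem
    for dev in "$@"; do
        echo "tune2fs -c $count -i ${days}d $dev"
        count=$((count + 7))
        days=$((days + 30))
    done
)

# Hypothetical device names; review the output, then run it as root.
stagger_fsck /dev/sdb1 /dev/sdc1 /dev/sdd1
```

Printing first and running later is deliberate: it gives you a chance to sanity-check the values against your reboot schedule before changing anything.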

Neither option is particularly “attractive”. In the first case, if you cherish uptime or don’t need to reboot your backup server often, you can still end up with multiple filesystems needing a check on reboot because they’ve all exceeded their days-between-checks parameter. In the second, you’re inserting human-driven processes into what should normally be a routine operating system function. With the manual option in particular, there must be a process in place for shutting down NetWorker and checking the filesystems whenever an OS crash occurs, even in the middle of the night.

Actually, the above list is a little limited – there are a couple of other options that you can consider as well, though they’re a little more left of field:

  • Build into the change control process the timings for complete filesystem checks in case they happen, or
  • Build into the change control process or reboot procedure for the backup server/storage nodes the requirement to temporarily disable filesystem checks (using say, tune2fs) so that you know the reboot to be done won’t be costly in terms of time.
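If you go the temporary-disable route, you’ll want to record the current settings first so they can be put back afterwards. Here’s a rough sketch that pulls them out of tune2fs -l output; the function name current_fsck_settings and the device name in the comments are hypothetical.

```shell
#!/bin/sh
# Read `tune2fs -l` output on stdin and print
# "<maximum mount count> <check interval in days>" so the values can be
# restored after the maintenance reboot.
# (current_fsck_settings is a hypothetical helper name.)
current_fsck_settings() {
    awk -F: '
        /^Maximum mount count:/ { gsub(/[ \t]/, "", $2); mounts = $2 }
        /^Check interval:/      { split($2, f, " "); days = int(f[1] / 86400) }
        END { print mounts, days }
    '
}

# Sketch of use around a planned reboot (hypothetical device name):
#   saved=$(tune2fs -l /dev/sdb1 | current_fsck_settings)
#   tune2fs -c 0 -i 0 /dev/sdb1                          # disable both checks
#   ... reboot ...
#   set -- $saved; tune2fs -c "$1" -i "${2}d" /dev/sdb1  # restore
```

tune2fs -l reports the interval in seconds, while -i takes days by default, hence the division by 86400.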

Personally, I’m looking forward to btrfs – in reality, a modern filesystem such as that should solve most, if not all, of the problems discussed above.

Posted in Linux, NetWorker | Tagged: , , , , | Comments Off

Last month’s top story

Posted by Preston on 2009-08-01

I thought that I might from now on try to do a brief comment at the start of each month on what the most popular story of the previous month was.

There is one caveat – as I alluded to in the previous “Top 5” post, I have some posts that get hit an awful lot of times. So, the absolute most-referenced posts, being “Fixing NSR peer information”, “Parallelism in NetWorker” and “Changing saveset browse/retention time”, are effectively disqualified from consideration for a “Last month’s top story”.

For this inaugural entry, we have “Carry a jukebox with you (if you’re using Linux)”. Outside of the above 3 articles, this one was viewed the most – and, for what it’s worth, generated a lot of follow-through clicks to Mark Harvey’s linuxvtl web page.

While not a production VTL, Mark’s Linux VTL software has already made me far more efficient in my lab environment over this last month. I have versions of NetWorker on lab servers all with the VTL configured, making testing of a wide variety of options considerably easier than having physical tape libraries connected and powered on. I hope others are finding it similarly useful. One of the comments to the article was someone asking about a more complete set of instructions for getting the VTL up and running – I aim to have this done by the end of this weekend.

Posted in Aside, Linux | Tagged: , , | Comments Off

Aside – On second thoughts, escalate that please

Posted by Preston on 2009-07-14

I use Parallels quite a lot within my Mac environment, and recently tried to get Solaris/AMD 64-bit installed. Even on a Mac Pro system Solaris stubbornly refuses to install in 64-bit mode, picking the 32-bit kernel every time.

So after exhausting a lot of search options, I submitted a case to Parallels support – titled:

“Solaris installer does not recognise 64-bit CPU”

Overnight, I got the first email back from Parallels support, with this response:

Escalating this ticket to our next level of Support since the issue is regarding Linux.

I half-typed an email response to correct the engineer, but then I thought better of it. If I need to explain that Solaris isn’t Linux to a support engineer, then on second thoughts, I’d prefer to have my case escalated to an engineer who (hopefully) already knows this.

[2009-07-15 Edit]

The second-level support engineer I got was much more savvy about the differences between operating systems and was able to answer my question. Solaris 64-bit Parallels support is being actively worked on, so hopefully I’ll see release notes for an update to the current version “soon” (my words, not theirs) mentioning added support for Solaris 64-bit guests.

Posted in Aside | Tagged: , , | Comments Off

Determining your NetWorker binary build details

Posted by Preston on 2009-05-10

Occasionally, depending on the issue you are having, EMC support or EMC engineering may request that you provide your NetWorker binary build details. This isn’t necessarily the same as the version information, since patches will obviously have different build details.

Usually they just say something along the lines of “can you run what filename and return the output?” Well, what isn’t always a usable command, depending on the Unix environment you’re on, and I’m even seeing some sites where it’s not installed (e.g., Solaris platforms where the /usr/ccs area doesn’t exist).

So, it’s handy to know how to retrieve this information without the benefit of what. It’s actually easy. For Unix, all you need to do is:

# strings /path/to/file | grep '@('

For example, if I wanted to know the build details for /usr/sbin/save on my laptop, I’d run:

[Sun May 10 07:12:30]
preston@archon ~ 
$ strings /usr/sbin/save | grep '@('
@(#) Product:      NetWorker
@(#) Release:      7.5.1.Build.269
@(#) Build number: 269
@(#) Build date:   Fri Mar 20 23:05:02 PDT 2009
@(#) Build arch.:  darwin
@(#) Build info:   DBG=0,OPT=-O2 -fno-strict-aliasing

This is all the information that support/engineering are going to be after when they’re wanting the build number of a binary, so knowing how to use strings and grep to retrieve it gives you a solution that will work on every Unix platform.
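If support asks for the details of several binaries at once, the same strings and grep approach can be wrapped in a small loop. This is only a sketch; the function name build_details is my own, and the paths in the comment are assumptions that depend on where NetWorker is installed.

```shell
#!/bin/sh
# Print the "@(#)" identification lines for each file given, prefixed with
# the file name, so several binaries can be compared in one pass.
# (build_details is a hypothetical helper name.)
build_details() {
    for f in "$@"; do
        strings "$f" | grep '@(#)' | awk -v f="$f" '{ print f ": " $0 }'
    done
}

# Paths are assumptions; adjust to wherever NetWorker is installed:
#   build_details /usr/sbin/save /usr/sbin/nsrd /usr/sbin/nsrexecd
```

Prefixing each line with the file name makes it obvious when one binary has been patched to a different build than the rest.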

On Windows, you can readily find the build information by right-clicking the binary, choosing Properties, and then going to the “Version” tab. You’ll get something like the following:

NetWorker build details on Windows

You can see in the above screenshot that the first three information sections are “Build Date”, “Build Info” and “Build Number” – clicking on each of those will give you the information you need to provide.

Posted in NetWorker, Support | Tagged: , , , , | Comments Off

Basics – no_striped_recover

Posted by Preston on 2009-04-03

With the introduction of the advanced file type (adv_file) device in NetWorker, changes were made to support striped recoveries. In a striped recovery, if all the savesets required for the recovery are online, NetWorker commences parallel reads, speeding up the process considerably. This applies to both file- and tape-based devices. Both in theory and in practice, it usually works great, but there is at least one key exception I’m aware of.

For many releases of NetWorker, striped recovery can fail on Linux if more media needs to be mounted than there are devices to read from. For instance, if you have a recovery that needs to read data from 4 tapes, but you only have 3 tape drives available, in many instances of NetWorker on Linux you’ll get the situation where NetWorker will mount 2 or 3 of the tapes, but then appear to just “hang” the recovery before it starts.

Thankfully, there’s actually a relatively easy solution.

Within the /nsr/debug directory, you can create the file:

no_striped_recover

At that point, NetWorker will revert to the traditional recovery style – reading in sequence from each volume, starting at the oldest saveset required and coming forward to the newest saveset required, pulling the requisite chunks of data from each saveset.

If you’re wondering, the content of the file is irrelevant; thus, you can simply:

# touch /nsr/debug/no_striped_recover

If the recovery is actually running, you’ll need to cancel it and run it again – note that you do not have to restart the NetWorker server though.

Posted in Basics, NetWorker | Tagged: , , , | Comments Off

Distribution plug – SME Server

Posted by Preston on 2009-02-07

Sometime in 2003 I installed my last (and possibly last ever) Linux desktop machine. Having been a fan of Linux for a long time, I had become increasingly disillusioned by its failings as a desktop OS, particularly when it came to things that should have been simple, such as plugging in an iPod, watching a movie, or syncing a Palm Pilot. After seeing time and time again the lack of issues my partner had with Mac OS X, I finally took the plunge, bought an eMac, and have since sworn off any other OS for my desktop.

That being said, I’ve been using a Linux distribution called SME Server (previously known as e-smith) for over a decade now. This is a server distribution, but not the average server-style distribution you may be thinking of. It’s not designed for use with a commercial database, nor even as a workgroup backup server, let alone an enterprise one, and certainly not as a high performance computing server.

It’s a workgroup server with one explicit function in mind: a single server that covers all the ‘basics’ in a small environment. It handles all of the following:

  • Internet connectivity – dialup, DSL or ethernet, with full proxying
  • Internet security – SPAM filtering, strong port lockdown, etc.
  • Users and groups
  • Fileserving with anti-virus options
  • IMAP and POP email
  • Port redirection/pass through
  • Web site hosting
  • DNS

Plus probably more that to this day I’ve not needed to discover. Almost all functions are readily controllable via its simple web interface, meaning that basic administration doesn’t even require system administration skills. (Being free, it’s also considerably cheaper than Windows or even Mac OS X.)

It’s damn simple – boot from the CD/ISO, agree to really install, and it will blow away the contents of the hard drive and create a machine in its image. If you’ve got two hard drives, it automatically configures mirroring; if you’ve only got one, it still configures mirroring so that if you add another hard drive later it can enable mirroring.

It can even run in VMware if you’re so inclined and you can dedicate a NIC to it for connectivity to a DSL modem (if necessary). With a bit of kludging, you can even install NetWorker on it if you need to (the same sort of kludging you need to do to install NetWorker on ESX).

If you’re in a situation where you’re needing to install some basic server (e.g., for a charity, for home, as part of a small consulting job), I’d thoroughly recommend that you give SME Server a good close look.

Posted in General thoughts, Linux | Tagged: , , | 3 Comments »

 