NetWorker Blog

Commentary from a long term NetWorker consultant and Backup Theorist

  • This blog has moved!

    This blog has now moved to nsrd.info/blog. Please jump across to the new site for the latest articles (and all old archived articles).
  • Enterprise Systems Backup and Recovery

    If you find this blog interesting, and either have an interest in or work in data protection/backup and recovery environments, you should check out my book, Enterprise Systems Backup and Recovery: A Corporate Insurance Policy. Designed for system administrators and managers alike, it focuses on the features, policies, procedures and human element needed to ensure that your company has a suitable, working backup system rather than just a bunch of copies made by unrelated software, hardware and processes.

Archive for the ‘Policies’ Category

Expecting a more complete backup

Posted by Preston on 2009-03-30

As I point out in my book, there’s a lot of stuff that ends up in datacentres completely unprotected. That includes such components as:

  • Network switch configurations
  • Storage switch configurations
  • PABXs
  • etc.

By “unprotected”, I mean not regularly backed up and monitored within the centralised enterprise backup framework.

However, the same applies to databases that don’t have application modules. Over the last few years there’s been a lot of outcry over the lack of support for MySQL – personally I’d prefer more attention be given to PostgreSQL, but either way, these two open source databases frequently get neglected in centralised backup solutions due to the lack of module support.

When it comes to backing up what I’d traditionally refer to as black box devices – switches, PABXs, etc. – you’re never going to get an application module. That doesn’t mean, though, that backups for these devices need to stay in the dark ages, where every now and then someone logs on to a device, retrieves the configuration, saves it to a fileserver somewhere and hopes that it doesn’t get deleted before the next backup – or that the backups of it are retained long enough.

To centralise and automate backups for ‘non-traditional’ components, you need to start exploring scripting languages. Sure, some devices now support ssh and scp, but even so, you’ll reach a point where a certain level of interactivity is required to back up a non-traditional device or database, and at that point you need to script.

One of my favourite languages for this purpose is a lesser known one (particularly outside of Unix circles) called expect. If you don’t want to follow that link yet, but need the elevator pitch for expect, it’s a language that allows you to script an interactive session with a program that would normally reject scripts and require you to manually type the commands. That is, it’s an automation tool.

As I advocate in my book, by using utilities like expect, you can design a solution such as the following:

  • Traditional clients receive standard backups
  • For non-traditional clients:
    • Define a special client (e.g., another instance of the backup server’s client resource) that makes use of savepnpc for its backups;
    • The savepnpc component will, as a pre-script, log on to the non-traditional devices and retrieve their configuration dumps, backups, etc.;
    • That retrieved data will then be saved as files on the backup server, preferably both in a human-readable format (where applicable) and in the appropriate format for re-loading the configuration;
    • Once the savepnpc activities are complete, the local filesystems will be backed up normally using NetWorker, allowing the centralised and automated backup of non-traditional clients. (A sketch of such a pre-script follows this list.)
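
To make the pre-script side of that concrete, here’s a minimal sketch of what an expect-driven configuration grab for a black box device might look like. The device name, account, password and the configuration dump command are all hypothetical – every switch or PABX has its own login sequence and its own way of dumping configuration – so treat it purely as a starting template:

#!/usr/bin/expect -f
# Hypothetical sketch only - the device name, account, password and the
# configuration dump command will all differ for your switches/PABXs.
# Log the entire session (including the configuration dump) to a file.
log_file -noappend /nsr/backups/switch01-config.txt
spawn ssh admin@switch01
expect "assword:"
send "notmyrealpassword\r"
# Wait for the device prompt, ask for a configuration dump, then log out
expect "#"
send "show running-config\r"
expect "#"
send "exit\r"
expect eof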

Similarly, the same can be achieved for non-supported databases such as MySQL or PostgreSQL:

  • Database server is configured with savepnpc for its backup command;
  • The pre-script generates a snapshot of, or exports a dump from, the database;
  • The exported or snapshot region is backed up as part of the traditional filesystem backup.

In essence, what I’m saying is there’s very little, if any, reason why you can’t automate and centralise your non-traditional backups in the same way that you use an enterprise class backup product to automate and centralise your traditional backups.

For example, let’s consider backing up a PostgreSQL database via expect, and integrating that backup with regular filesystem backups for the database server. In this case, it’s only a small database, and PostgreSQL supports hot exports*, so we’ll do the backup via the PostgreSQL pg_dump command.

The pg_dump command leverages the security of the database(s) being dumped; thus, if you have a standard PostgreSQL configuration that allows anyone on the local host to connect to any database, you don’t need any special scripting. But assuming you’re in an enterprise environment and you have password protected access to the database, you will need to supply a password to the pg_dump command for the database to be backed up. The pg_dump command however is one of those pesky commands that refuses to accept a password as a command line argument**, so we need to look at using expect to supply a password for us.

So you’d start with an expect script file, which for a simple database called “tododb”, might resemble the following:

#!/usr/bin/expect -f
# Spawn pg_dump for the tododb database; -W forces a password prompt
spawn /usr/local/pgsql/bin/pg_dump -U backup_user -W -f /nsr/backups/tododb.dump tododb
# Wait for the password prompt, then supply the password
expect "Password: "
send "a9d8ea8d12b4b47db8bd833b8fade7b2\r"
# Give the (small) dump time to complete before exiting
sleep 120

(For the purposes of what we’re doing, we’ll call this file pgbackup.e and assume it’s in the /home/postgresql directory.)

So what does this do? First, it spawns (or executes) the pg_dump command. Then it waits for the “Password: ” prompt from the pg_dump command, and when it receives that, it sends the password. (No, that’s not the real password I use!)

Because it’s only a small database, it then waits 2 minutes for the dump to complete before exiting.

You’d either add to, or wrap around, this script additional commands to, say, confirm that the database dump completed successfully, or ensure that multiple copies are kept on-line, etc.; for brevity I’ll largely leave those options as exercises for the reader, though a minimal sketch follows the environment script below. Since NetWorker doesn’t provide any environment to the precmd or pstcmd in a savepnpc action, you will need to ensure that, at bare minimum, you set up an execution environment that configures the PATH and the PGDATA path correctly. This might resemble the following:

#!/bin/bash

export PATH=/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/pgsql/bin
export PGDATA=/usr/local/pgsql/data
/home/postgresql/pgbackup.e

For the purposes of this exercise, we’ll assume this is saved as /home/postgresql/pgbackup.sh.
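
If you did want to build a little verification into that wrapper rather than leaving it entirely as an exercise, a minimal sketch – using the same hypothetical paths as above, keeping the previous dump as an extra on-line copy and simply checking that a non-empty dump file was produced – might resemble:

#!/bin/bash

export PATH=/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/pgsql/bin
export PGDATA=/usr/local/pgsql/data

DUMP=/nsr/backups/tododb.dump

# Keep the previous dump as an extra on-line copy
[ -f "$DUMP" ] && mv "$DUMP" "$DUMP.previous"

/home/postgresql/pgbackup.e

# Return non-zero if the dump is missing or empty, so the failure is
# visible in the savepnpc/backup logs
if [ ! -s "$DUMP" ]; then
    echo "$(date): pg_dump of tododb produced no usable dump file" >&2
    exit 1
fi

exit 0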

Next, the client needs to be configured to use savepnpc. Assuming you wanted to do this for the client cerberus in the group Daily Databases, you would need to create in /nsr/res on cerberus a file named “Daily Databases.res”, with the following content:

type: savepnpc;
precmd: "/home/postgresql/pgbackup.sh";
pstcmd: "/bin/sleep 5";
timeout: "12:00:00";
abort precmd with group: No;

Note that the pstcmd content isn’t really important in this scenario, as we only need to do something before the backup runs; I don’t like to delete this entry though because in the past I’ve encountered issues with savepnpc backups that were missing either a precmd entry or a pstcmd entry (admittedly that was some time ago).

With this file in place, all you would need to do is set the backup command for the client cerberus in the Daily Databases group to savepnpc; NetWorker will handle the rest.
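
That change can be made through the NetWorker administration interface or, as a quick sketch from the command line (assuming your backup server is named backupserver), via nsradmin, which will prompt you to confirm the update:

nsradmin -s backupserver
. type: NSR client; name: cerberus
update backup command: savepnpc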

For more in-depth coverage of choosing what to backup, extending your backups beyond traditional filesystems and databases, and making risk vs cost decisions on backup strategies, check out my book.


* I.e., can generate an export from an in-use database that can be recovered from later. Not all databases support this.

** Something the programmers of it should be applauded for.

Posted in General thoughts, NetWorker, Policies | 7 Comments »

Does your backup administrator have a say in change control?

Posted by Preston on 2009-03-05

…and if not, why?

A common mistake made in many companies is the failure to include the backup administrator (or, if there is a team, the team leader for data protection) in the change control approval process.

Typically the sorts of roles involved in change control include:

  • CIO or other nominated “final say” manager.
  • Tech writing the change request.
  • Tech’s manager approving the change request.
  • Network team.

Obviously there are exceptions, and many companies will have variations – for instance, in most consulting companies, a sales manager will also get a say in change control, since interruptions to sales processes at the wrong time can break a deal.

Too infrequently included in change control is the backup administrator, or the team responsible for backup administration. The common sense approach to data protection would seem to suggest this is lunacy. After all, if a change fails, surely one potential remedy will be to recover from backup?

The error is three-fold:

  • Implicit assumption that any issue can be recovered from;
  • Implicit assumption that the backup system is always available;
  • Implicit assumption that what you need backed up is backed up.

Out of all of those assumptions, perhaps only the last is forgivable. As I point out in my book, and many have pointed out before me, it’s always better to backup a little too much than not quite enough. Thus, in a reasonable environment that has been properly configured, systems should be protected.

This three-fold error can actually be summarised more succinctly, though – assuming that having a backup system is a blank cheque for data recovery.

Common issues I’ve seen caused by failures to include backup administrators in change control include:

  • Having major changes timed to occur at the same time as scheduled down-time in the backup environment;
  • Kicking off full backups of large systems prior to changes without notification to the backup administrators, swamping media availability;
  • Scheduling changes to occur just prior to the next backup, maximising the potential data loss within the backup cycle;
  • Not running fresh, full backups of version-critical database content after upgrades, and thus suffering significant outages later when a cross-version recovery is required;
  • Not checking version compatibility for applications or operating systems, resulting in “upgrades” that can’t be backed up;
  • Wasting backup administrators’ time searching for the reasons behind backup failures that were actually caused by change outages running during the backups.

To be blunt, any of the above scenarios that occur without pre-change signoff are inexcusable and represent a communications flaw within an organisation.

Any change that has the potential to impact on, or be impacted by, the backup system should be subject to approval by – or at the very least, notification of – the backup administrators. The logical consequence of this rule is that any change that has anything to do with IT systems will, logically, impact on or be impacted by the backup system.

Note that by impact on, I don’t mean just cause a deleterious effect to the backup system, but also more simply, require resources from the backup system (e.g., for the purposes of recovery, or even additional resources for more backups).

All of this falls under establishing policies surrounding the backup system – and I’m not talking about what backs up when, but rather the implications that companies must face as a result of having backup systems in place. Helping organisations understand those policies is a major focus of my book.

Posted in General thoughts, NetWorker, Policies | 1 Comment »

Is your backup server fast enough?

Posted by Preston on 2009-02-26

Is your backup server a modern, state of the art machine with high speed disk, significant IO throughput capabilities and ample RAM so as to not be a bottleneck in your environment?

If not, why?

Given the nature of what it does – support systems via backup and recovery – your backup server is, by extension, “part of” your most critical production server(s). I’m not saying that your backup server should be more powerful than any of your production servers, but what I do want to say is that your backup server shouldn’t be a restricting agent in relation to the performance requirements of those production servers.

Let me give you an example – the NetWorker index region. Using Unix for convenience, we’re talking about /nsr/index. This region should either be on equally high speed drives as your fastest production system drives, or on something that is still suitably fast.

For instance, in much smaller companies, I’ve often seen the production servers given SCSI drives or SCSI JBODs, while the backup server is just a machine with a couple of mirrored SATA drives.

In larger companies, you’ll have the backup server connected to the SAN with the rest of the production systems, but while the production systems get access to 15,000 RPM SCSI drives, the backup server will instead get 7,200 RPM SATA drives (or worse, previously, 5,400 RPM ATA drives).

This is a flawed design process for one very important reason – for every file you back up, you need to generate and maintain index data. That is, NetWorker server disk IO occurs in conjunction with backups*.

More importantly, when it comes time to do a recovery, and indices must be accessed, do you want to pull index records for say, 20,000,000 files from slow disk drives or fast disk drives?

(Now, as we move towards flash drives for critical performance systems, I’m not going to suggest that if you’re using flash storage for key systems you should also use it for backup systems. There is always a price point at which you have to start scaling back what you want vs what you need. However, in those instances I’d suggest that if you can afford flash drives for critical production systems, you can afford 15,000 RPM SCSI drives for the backup server’s /nsr/index region.)

Where cost for higher speed drives becomes an issue, another option is to scale back the speed of the individual drives but use more spindles, even if the actual space used on each drive is less than the capacity of the drive**.

In that case, for instance, you might have 15,000 RPM drives for your primary production servers, while the backup server’s /nsr/index region resides successfully on 7,200 RPM SATA drives, so long as they’re arrayed (no pun intended) in such a way that there are sufficient spindles to make reading back data fast. Equally, in such a situation, hardware RAID (or software RAID on systems with enough CPUs and cores to equal or exceed hardware RAID performance) will allow for faster processing of data for writing (e.g., RAID-5 or RAID-3).

In the end, your backup server should be like a butler (or a personal assistant, if you prefer the term) – always there, always ready and able to assist with whatever it is you want done, but never, ever an impediment.


* I see this as a similar design flaw to say, using 7,200 RPM drives as a copy-on-write snapshot area for 15,000 RPM drives.
** Ah, back in the ‘old’ days, where a database might be spread across 40 x 2GB drives, using only 100 MB from each drive!

Posted in NetWorker, Policies | 2 Comments »

Tape libraries and the hidden speeds

Posted by Preston on 2009-02-25

There was recently a discussion on the NetWorker mailing list that was sparked by someone asking what sort of LTO-4 library would be recommended by the community.

This led to some very interesting and useful feedback for the person posing the question. A lot of people had feedback about different libraries they’d used – both good and bad – and questions to ask, such as slot count and CAP/mail slot size. I felt it was also important to weigh in on media movement speed, as I think this is often disregarded when evaluating libraries – even though it can often be a factor in backup throughput, usability and perceived performance.

Rather than summarise myself, here’s what I had to say on the topic then:

Too often people worry about the speed and capacity of the media, and forget about the incidental factors, such as robotic movement times and even load/seek time on the media. These can play an important factor in backup and  – more importantly, really – recovery schedules. When it comes to measuring backup performance, the sequence of “returning to slot, picking next tape, placing in drive” can actually start to make a significant impact on what I refer to as your overnight “backup bandwidth”. If it takes say, 70 seconds for one library to do it and your drives write at 160MB/s, then that’s a 10GB interruption to your backups. If another library can do the same thing in 30 seconds, that’s just a 4.7GB interruption to your backups. (I’m deliberately excluding load/unload times of the media, because in a realistic comparison it would be the same drives in both libraries…)  Repeat that say, 30 times a night, and suddenly you’re deciding whether you can afford to lose 300GB in backup time a night or 141GB in backup time a night. For bigger sites, these numbers can actually become very important.
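
To put rough numbers around this for your own environment, the arithmetic in that quote (which uses rounded per-exchange figures) boils down to a few lines of shell:

#!/bin/bash
# Back-of-the-envelope "backup bandwidth" lost to tape exchanges,
# using the figures from the example above.
DRIVE_MBS=160        # drive write speed in MB/s
SWAPS_PER_NIGHT=30   # tape exchanges per night

for SWAP_SECONDS in 70 30; do
    awk -v s="$SWAP_SECONDS" -v r="$DRIVE_MBS" -v n="$SWAPS_PER_NIGHT" 'BEGIN {
        per_swap = (s * r) / 1024
        printf "%2ds per exchange: %.1f GB per exchange, %.0f GB per night\n",
               s, per_swap, per_swap * n
    }'
done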

If you are considering a new tape library, be sure to ask about not only the “easy” numbers, such as how much it costs, how much maintenance costs, and how fast the drives read/write, but also the more challenging numbers – how much time it takes to move media around.

Posted in General thoughts, NetWorker, Policies | Comments Off on Tape libraries and the hidden speeds

Audit is not a 4-letter word

Posted by Preston on 2009-02-20

As a system administrator, I loathed being audited. Not because I feared that it would expose holes in the security or policies of my systems, but rather because, for the most part, auditing was usually conducted by incompetent staff at big name auditing/taxation companies. Now, I have no doubt that when it comes to their original auditing domains, namely taxation and accounting, such companies do usually offer excellent services.

For the most part, though, I’ve found that for anything beyond absolutely basic system administration reviews, such companies offer poor feedback that’s often erroneous to the point of being farcical. (For example, having a password field of ‘x’ in /etc/passwd pointed out as being “insecure” because the auditors failed to note the use of shadow password files…)

So, having undoubtedly just annoyed quite a few people, I’ll go on to explain why auditing shouldn’t be a terrible experience if you’re in the storage and data protection domain. More importantly, I’ll explain how auditing can be changed from an unpleasant experience where it’s necessary to explain to management they wasted their money, to one where you, and your company, get value out of it.

The best auditing is conducted by experts in the field. Not the field of auditing, but the field of what you want audited. So, in order to get a decent and useful audit of your storage and data protection systems, you need to follow these rules:

  1. It should be done by someone outside your company.
  2. It should be done by someone who won’t be assigned any work as a result of the audit.
  3. It should be done by someone with credentials (e.g., registered partners of the vendors whose products you’re using, or comparable companies).

This isn’t to say that whoever does the audit should never get any further work from your company, but rather that if they recommend you buy X, Y and Z to resolve the issues they’ve highlighted, they’re doing so out of honesty, because they won’t be the ones selling them to you.

Moving on, there’s a few more rules you should also follow in order to get a successful audit:

  1. You must assign a champion within your company who has sufficient authority to ensure that the staff conducting the audit get access and feedback they require.
  2. You must provide direction to the auditing company – that is, outline what you need investigated and the structure of the results you want. However, this can be dangerous if mishandled, so most importantly follow the next rule…
  3. You must provide freedom for the auditing company to expand beyond your direction to encompass and point out other issues that you may not have anticipated in your directional statement.

Finally, the audit process should start with a brainstorming/whiteboarding session, and the results should be presented in a similar session.

There’s more to auditing than the above, but if you step away from the ‘regular’ auditing companies that can offer little assistance in storage and data protection, you will actually get a quality result.

Posted in NetWorker, Policies | Comments Off on Audit is not a 4-letter word

Things not to virtualise: backup servers and storage nodes

Posted by Preston on 2009-02-13

Introduction

When it comes to servers, I love virtualisation. No, not to the point where I’d want to marry virtualisation, but it is something I’m particularly keen on. I even use it at home – I’ve gone from three servers (one for databases, one as a fileserver, and one as an internet gateway) down to one, thanks to VMware Server.

Done rightly, I think the average datacentre should be able to achieve somewhere in the order of 75% to 90% virtualisation. I’m not talking high performance computing environments – just your standard server farms. Indeed, having recently seen a demo for VMware’s Site Recovery Manager (SRM), and having participated in many site failover tests, I’ve become a bigger fan of the time and efficiency savings available through virtualisation.

That being said, I think backup servers fall into that special category of “servers that shouldn’t be virtualised”. In fact, I’d go so far as to say that even if every other machine in your server environment is virtual, your backup server still shouldn’t be a virtual machine.

There are two key reasons why I think having a virtualised backup server is a Really Bad Idea, and I’ll outline them below:

Dependency

In the event of a site disaster, your backup server should be – at the very least – equal first in the rebuild order. That is, you may start the process of getting equipment ready for restoration of data, but the backup server needs to be up and running in order to achieve data recovery.

If the backup server is configured as a guest within a virtual machine server, it’s hardly going to be the first machine to be configured, is it? The virtual machine server will need to be built and configured first, and only then the backup server.

In this scenario, there is a dependency that results in the build of the backup server becoming a bottleneck to recovery.

I realise that we try to avoid scenarios where the entire datacentre needs to be rebuilt, but this still has to remain a factor in mind – what do you want to be spending time on when you need to recover everything?

Performance

Most enterprise class virtualisation systems offer the ability to set performance criteria on a per machine basis – that is, in addition to the basics you’d expect such as “this machine gets 1 CPU and 2GB of RAM”, you can also configure options such as limiting the number of MHz/GHz available to each presented CPU, or guaranteeing performance criteria.

Regardless though, when you’re a guest in a virtual environment, you’re still sharing resources. That might be memory, CPU, backplane performance, SAN paths, etc., but it’s still sharing.

That means at some point, you’re sharing performance. The backup server, which is trying to write data out to the backup medium (be that tape or disk), is potentially competing for – or at least sharing – backplane throughput with the very machines it is backing up.

This may not always make a tangible impact. However, debugging such an impact when it does occur becomes much more challenging. (For instance, in my book, I cover off some of the performance implications of having a lot of machines access storage from a single SAN, and how the performance of any one machine during backup is no longer affected just by that machine. The same non-trivial performance implications come into play when the backup server is virtual.)

In Summary

One way or the other, there’s a good reason why you shouldn’t virtualise your backup environment. It may be that for a small environment the performance impact isn’t an issue and it seems logical to virtualise. However, if you are in a small environment, your failover to another site is likely to be a very manual process, in which case you’ll be far more likely to hit the dependency issue when it comes time for a full site recovery.

Equally, if you’re a large company that has a full failover site, then while the dependency issue may not be as much of a problem (due to say, replication, snapshots, etc.), there’s a very high chance that backup and recovery operations are very time critical, in which case the performance implications of having a backup server share resources with other machines will likely make a virtual backup server an unpalatable solution.

A final request

As someone who has done a lot of support, I’d make one special request if you do decide to virtualise your backup server*.

Please, please make sure that any time you log a support call with your service provider you let them know you’re running a virtual backup server. Please.


* Much as I’d like everyone to do as I suggest, I (a) recognise this would be a tad boring and (b) am unlikely at any point soon or in the future to become a world dictator, and thus wouldn’t be able to issue such an edict anyway, not to mention (c) can occasionally be fallible.

Posted in General thoughts, NetWorker, Policies | 6 Comments »

Offsite your bootstrap reports

Posted by Preston on 2009-02-09

I can’t stress enough the importance of getting your bootstrap reports offsite. If you don’t have a bootstrap report available and you have to rebuild your NetWorker server, you may potentially have to scan through a lot of media to find your most recent backup.

It’s all well and good having your bootstrap report emailed to your work email address, but what happens if whatever takes out your backup server takes out your mail server as well?

There’s two things you should therefore do with your bootstrap reports:

  • Print them out and send them offsite with your media – offsite media storage companies will usually store other records for you as well, so there’s a good chance yours will, even if it’s for a small extra fee. If you send your bootstraps offsite with your media, then in the event of a disaster recovery, a physical printout of your bootstrap report should also come back when you recall your media.
  • Email them to an external, secure email address – I recommend using a free, secure mail service, such as say, Google Mail. Doing so keeps electronic copies of the bootstrap available for easy access in the event of a disaster where internet access is still achievable even if local mail isn’t possible. Of course, the password for this account should be (a) kept secure and (b) changed every time someone who knows it leaves the company.

(Hint: if for some reason you need to generate a bootstrap report outside of the regular email, always remember you can at any time run: mminfo -B).

Obviously this should be done in conjunction with local storage methods – local email, and local printouts.
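
For the second of the offsite suggestions above – emailing to an external address – a minimal sketch of a script that could be run regularly (e.g., from cron) on the backup server might look like the following; the recipient address is just a placeholder, and it assumes mailx on the backup server can relay mail off-site:

#!/bin/bash
# Sketch only: mail the current bootstrap details to an off-site mailbox.
RECIPIENT="bootstrap-reports@example.com"
REPORT=/tmp/bootstrap-$(date +%Y%m%d).txt

# mminfo -B reports the media and save sets required for a bootstrap recovery
mminfo -B > "$REPORT" 2>&1

mailx -s "Bootstrap report for $(hostname) - $(date +%Y-%m-%d)" "$RECIPIENT" < "$REPORT"
rm -f "$REPORT"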

Posted in NetWorker, Policies, Scripting | Comments Off on Offsite your bootstrap reports

Recommended reading

Posted by Preston on 2009-02-04

I thought I’d just interrupt the regular flow of NetWorker commentary to recommend that you check out my book, Enterprise Systems Backup and Recovery: A Corporate Insurance Policy. (If you follow that link across to Amazon, you’ll be able to peruse some of the content of the book through Amazon’s nifty preview feature.)

The book distils much of my experience with enterprise data protection and is focused on helping companies better understand the place of backup within their environment, as well as the procedures and policies that need to be in place to turn a backup solution into a backup system. It covers a breadth of topics including, but not limited to:

  • Enterprise backup concepts overview
  • Roles of staff within the backup environment – not just IT staff!
  • Where backup fits within an environment
  • Total backup solutions – determining what should be backed up, and how
  • Recovery and disaster recovery guidelines and preparedness
  • How to test
  • Performance tuning guidelines and recommendations

If you’re after a bit more information on the book, you can also check out the accompanying site, Enterprise Systems Backup.

Regardless of which backup solution you use, you’ll find valuable information in the book.

Posted in General thoughts, Policies | 2 Comments »

How important is it to clone?

Posted by Preston on 2009-01-29

This isn’t a topic that’s restricted just to NetWorker. It really does apply to any backup product that you’re using, regardless of the terminology involved. (E.g., for NetBackup, we’re talking duplication).

When talking to a broad audience I don’t like to make broad generalisations, but in the case of cloning, I will, and it’s this:

If your production systems’ backups aren’t being cloned, your backup system isn’t working.

Yes, that’s a very broad generalisation, and I tend to hear a lot of reasons why backups can’t be cloned/duplicated – time factors, cost factors, even assertions that it isn’t necessary. There may even be instances where this actually is correct – but thus far, I’ve not been convinced by anyone who isn’t cloning their production systems backups that they don’t need to.

I always think of backups as insurance – it’s literally what they are. In fact, my book is titled on that premise. So, on that basis, if you’re not cloning, it’s like taking out an insurance policy from a company that in turn doesn’t have an underwriter – i.e., they can’t guarantee being able to deliver on the insurance if you need to make a claim.

Would you really take out insurance with a company that can’t provide a guarantee they can honour a legitimate claim?

So, let’s dissect the common arguments as to why cloning typically isn’t done:

Money

This is the most difficult one, and to me it indicates that the business, overall, doesn’t appreciate the role of backup. It means that the IT department is solely responsible for sourcing funding from its own budget to facilitate backup.

It means the company doesn’t get backup.

Backup is not an IT function. It’s a corporate governance function, or an operating function. It’s a function that belongs to every department. Returning to insurance, therefore, it’s something that must be funded by every department, or rather, the company as a whole. The finance department, for instance, doesn’t solely provide, out of its own departmental budget, the funding for insurance for a company. Funding for such critical, company wide expenditure comes from the entire company operating budget.

So, if you don’t have the money to clone, you have the hardest challenge – you need to convince the business that it, not IT, is responsible for backup budget, and cloning is part of that budget.

Time/Backup Window

If you’re not cloning because of the time it takes to do so, or the potential increase to the backup window (or that the backup window is already too long), then you’ve got a problem.

Typically such a problem has one of two solutions:

  • Revisit the environment – are there architectural changes that can be made to improve the processes? Are there procedural changes that can be made to improve the processes? Are backup windows arbitrary rather than meaningful? Consider the environment at hand – it may be that the solution is there, waiting to be implemented.
  • Money – sometimes the only way to make the time available is to spend money on the environment. If you’re worried about being able to spend money on the environment, revisit the previous comment on money.

Backup to another site

This is probably the most insidious reason that might be invoked for not needing to clone. It goes something like this:

We backup our production datacentre to storage/media in the business continuance/disaster recovery site. Therefore we don’t need to clone.

This argument disturbs me. It’s false for two very, very important reasons:

  • If your storage/media fails in the business continuance/disaster recovery site, you’ve lost your historical backups anyway. E.g., think Sarbanes-Oxley.
  • If your production site fails, you only have one copy of your data left – on the backups. Not good.

In summary

There are true business imperatives why you should be cloning. At least for production systems, your backups should never represent a single point of failure to your environment, and need to be developed and maintained on the premise that they represent insurance. As such, not having a backup of your backup may be one of the worst business decisions that you could make.

Non-group cloning

If you’re looking to manage cloning outside of NetWorker groups but don’t want to write scripts, I’d suggest you check out IDATA Tools, a suite of utilities I helped to design and continue to write; included in the suite is a utility called sslocate, which is expressly targeted at assisting with manual cloning operations.

Posted in Policies | 10 Comments »

NetWorker 7.5 and Mac OS X

Posted by Preston on 2009-01-25

Released in December 2008, NetWorker 7.5 represents an incremental increase in NetWorker functionality mainly aimed at the following three things:

  • Virtualisation support – better VCB integration, better awareness of virtual infrastructure, visualisation of virtual infrastructure, etc.;
  • Better integration with third party authentication/authority systems – e.g., LDAP;
  • Support for IPv6.

It’s the support for IPv6 that poses a particular challenge for Mac OS X clients. A bug currently exists with the NetWorker client for OS X that causes the startup and shutdown of the NetWorker daemons to take somewhere in the order of 5 minutes for the average machine. Given that IPv6 is enabled by default on Mac OS X, this means that a very high proportion of Mac OS X clients that are upgraded will experience this problem.

This doesn’t pose a problem in normal operations; the startup and shutdown normally occur in the background during machine boot/reboot/shutdown, and thus don’t impact the time taken for a machine to start or shut down. Where it does pose an inconvenience is for an administrator working on the command line who needs to debug or test issues on a Mac OS X client. Indeed, the shutdown takes long enough that at least 50% of the time it times out and the nsrexecd processes need to be manually killed.

There are three options for administrators to choose from:

  • Delay deployment of NetWorker 7.5 on Mac OS X until the release of 7.5 SP1, where this issue is going to be fixed. (A 7.4.x client will communicate successfully with a 7.5 server.)
  • Accept the shutdown/startup delay for the time being and document it so that backup and system administrators are aware of the implications for the time being.
  • Disable IPv6 on affected clients until the release of NetWorker 7.5 SP1. This can be done using the command ip6 -x on the affected Mac OS X machines.

This isn’t a big inconvenience, but it’s one to be aware of.

Posted in NetWorker, Policies | Comments Off on NetWorker 7.5 and Mac OS X