Or, “In 5 years’ time, will we reflect on VTLs as an example of a bad direction in data protection?”
Many people are convinced that VTLs are the bee’s knees – they offer backup to disk while still working within the bounds of a tape library (or libraries), are frequently considered “easier to conceptualise”, and are generally regarded by many as a good thing.
Therefore I want to preface what I’m about to discuss with the following:
- The company I work for sells VTLs
- I have actively proposed and recommended VTLs in particular scenarios
- I will continue to actively propose and recommend VTLs in particular scenarios
Thus, I am not “anti-VTL” as such – I see them as representing valid usage in today’s enterprise backup market, though I don’t see them as a be-all and end-all replacement for traditional backup to disk. I don’t see them going away within the next few years either.
What I do see them as is a solution to symptoms, not problems. Indeed, I see VTLs as fundamentally inelegant. This is not to say that backup to disk is itself elegant; rather, the former is an inelegant triage option, and the latter is an inelegant solution.
Or to put it another way, I believe the world would be a better place without VTLs if and only if disk backup worked as it should. This holds regardless of which backup product you’re working with.
What is missing in disk backup
To elucidate my point that VTL is a solution for symptoms, not problems, I first need to elaborate on why VTLs are sometimes currently required – and to do that, I need to explain what’s wrong with disk backup.
- Since 99% of solutions still require tape (nothing says “off-site, off-line copy” better than tape), there is a temptation to try to keep an entire solution as “all tape”.
- It’s too easy to put all your eggs in one basket – far too often, sites that deploy disk backup do so on their production array (even if it is a dedicated set of LUNs comprising disks that aren’t used for production data); this introduces array-level performance issues as a secondary concern, but more importantly, creates significant potential for cascading failures as a result of insufficient redundancy.
- Filesystem performance:
- Depending on the operating system and file system used, fragmentation over time can cause performance issues.
- Depending on the operating system and file system used, checking a multi-terabyte filesystem for consistency after crashes may be operationally unfeasible.
- Unintelligent management:
- Media management has grown out of working mainly with tape; for instance, it took until version 6.5 for NetBackup to support failing over a backup from one disk device to another (when the first one fills). NetWorker still isn’t there yet.
- Disk is everywhere; indeed, spare disk is everywhere. Disk backup fails to take advantage of the distributed processing and storage available within even a moderately sized organisation. That is, for the average organisation, DAS isn’t going away any time soon – so why not make intelligent use of it?
- Access to the contents of the disk backup filesystem remains available both to other applications and to other users. Perhaps more frustratingly, such access is still required, yet it equally creates problems that should not exist.
- Disk systems remain fundamentally more flexible than they are being used for in backup. It’s like having a Ferrari, but only ever driving around in 1st gear – or saying you’re good at ten pin bowling, even though you never play without bumper guards on the lanes. (I’d suggest that deduplication backup systems are the first good example of making intelligent and original use of backup to disk.)
- Clients are typically unable to retrieve the data stored on disk backup without the presence of the backup server. While there are authorisation/security issues that must be considered, it’s wrong that disk backup requires the backup server to be active before the data can be readily retrieved. Furthermore, this creates operational demands on the backup server that should not exist.
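To make the media-management gap concrete, here is a minimal, purely illustrative sketch (in Python, with hypothetical names – not any vendor’s actual API) of the disk-device failover behaviour mentioned above: when the first device fills, the backup simply moves on to the next device with room.

```python
class DiskDevice:
    """A hypothetical disk backup device with a fixed capacity."""

    def __init__(self, name, capacity_gb, used_gb=0):
        self.name = name
        self.capacity_gb = capacity_gb
        self.used_gb = used_gb

    def free_gb(self):
        return self.capacity_gb - self.used_gb


def select_device(devices, backup_size_gb):
    """Fail over past full devices: return the first device with enough
    free space for the backup, or raise if none can hold it."""
    for device in devices:
        if device.free_gb() >= backup_size_gb:
            return device
    raise RuntimeError("no disk device has sufficient free capacity")


devices = [
    DiskDevice("dd01", capacity_gb=100, used_gb=95),  # nearly full
    DiskDevice("dd02", capacity_gb=100, used_gb=10),
]
print(select_device(devices, backup_size_gb=20).name)  # dd01 is skipped; dd02
```

Trivial as it looks, this "skip the full device" step is exactly the kind of disk-aware behaviour that tape-derived media management took years to acquire.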
What many of these issues come down to is the following:
- Use of traditional OS filesystems introduces fundamental limitations to disk backup.
- Coming at disk backup from the long-term perspective of tape adds what I’d call “programmer baggage”.
- Psychologically, for some people it is easier to accept “you need virtual tape and tape within your environment” than it is to accept “you need disk backup and tape within your environment”. Or rather, a quality and potentially expensive array that presents itself as tape seems a better investment than a quality and potentially expensive array that should only be used by the backup system. Crazy, but true. (Even more so, reluctance to purchase highly redundant disk backup – with RAID, hot spares, etc. – occurs often, yet purchasing VTLs as “black boxes” at a stated capacity, employing the same if not greater levels of RAID and hot spares, is seen as “OK” by the same people who would quibble about RAID and hot spares for disk backup.)
I would argue that the issues above are not with the theoretical architecture and usage of disk backup, but with the actual implemented architecture and treatment of disk backup.
How to move disk backup forward
It’s clear that disk backup as an implementation, regardless of backup system or platform, has issues that hopefully over time will be addressed. Note however that I say hopefully, not probably.
So what needs to be done with disk backup?
- Culturally, those who would shy away from purchasing arrays (particularly those with redundancy) for the purpose of disk backup, but would happily sign a cheque for a VTL with a given/stated capacity, need to, ahem, get over it. Just as a cultural shift was needed 10 years ago to accept that backup is necessary, a cultural shift is now needed to understand that it’s “six of one, half a dozen of the other”.
- Backup vendors need to:
- Ditch antiquated models of media management that are inherited from dealing with tape when dealing with disk. I’d argue that the deduplication products are the first true sign that this can be done.
- Side-step the inherent limitations of filesystems and either implement their own, or come up with suitable raw-disk options that include appropriate accessibility tools (or liaise with operating system vendors to get filesystem variants designed exclusively for mass data storage needs).
- Rearchitect their products to support massively distributed disk backup media. I liken this almost to the inverse of “the cloud”. Products like Mozy, for instance (good, stable options for home users), back up to the cloud – the internet. The future of enterprise backup, though, is not in the cloud, but in the earth. Let’s call earth-based storage a paradigm where storage transcends individual operating system and filesystem boundaries and makes use of capacity no matter where it sits within the logical bounds of an organisation.
- Administrators and managers need to stop treating disk backup as “regular storage” that can be pinched and borrowed from. Did your backups fail because someone dropped a 1TB copy of a database onto the backup-to-disk area just because it was a nice big area? Guess what, that’s not the fault of the disk backup system.
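As a toy illustration of the “earth” idea above (all names hypothetical – this is a sketch of the concept, not of any product): backup chunks could be greedily placed on whichever hosts currently have the most spare DAS capacity, spreading a single backup across the organisation rather than confining it to one filesystem.

```python
def place_chunks(hosts, chunk_size_gb, num_chunks):
    """Greedily assign each backup chunk to the host with the most free
    space, spreading the backup across the organisation's spare disk.
    `hosts` maps host name -> free capacity in GB."""
    free = dict(hosts)  # copy so the caller's view isn't mutated
    placement = []
    for _ in range(num_chunks):
        host = max(free, key=free.get)  # host with the most room left
        if free[host] < chunk_size_gb:
            raise RuntimeError("insufficient spare capacity across hosts")
        free[host] -= chunk_size_gb
        placement.append(host)
    return placement


# Spare DAS scattered around a (fictional) organisation:
hosts = {"fileserver": 500, "web01": 300, "dev-box": 200}
print(place_chunks(hosts, chunk_size_gb=100, num_chunks=6))
```

Real implementations would of course need redundancy, chunk indexing, and host availability tracking; the point is only that the capacity already exists in the earth of the organisation, waiting to be used.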
What that means for VTL
Ultimately what this means for VTL (in my opinion) is that VTL is a solution to the problems inherent in the current state of implemented architecture for disk backup, not an alternative or better solution to the theoretical architecture of disk backup.
If disk backup were enhanced to reach the level of intelligent management and control of which it is fundamentally capable (being disk), it would erase the need for VTLs.
Back to the original question
Back then to our original question – will we, in 5 years’ time, reflect on VTLs as an example of a bad direction in data protection?
Yes, and no. Yes, we will, because by that stage there will be a better understanding that VTLs are about triage. No, we won’t, because I don’t see disk backup architecturally reaching, within 5 years, the point where it achieves everything needed to erase the need for VTLs.
Check back in 10 years though.