NetWorker Blog

Commentary from a long term NetWorker consultant and Backup Theorist

  • This blog has moved!

    This blog has now moved to nsrd.info/blog. Please jump across to the new site for the latest articles (and all old archived articles).
  • Enterprise Systems Backup and Recovery

    If you find this blog interesting, and either have an interest in or work in data protection/backup and recovery environments, you should check out my book, Enterprise Systems Backup and Recovery: A Corporate Insurance Policy. Designed for system administrators and managers alike, it focuses on features, policies, procedures and the human element to ensuring that your company has a suitable and working backup system rather than just a bunch of copies made by unrelated software, hardware and processes.

Basics – Parallelism in NetWorker

Posted by Preston on 2009-02-17

Hi!

The text for this article has moved, and is now at the permanent NetWorker Information Hub Site. Please read it here.


19 Responses to “Basics – Parallelism in NetWorker”

  1. VJ said

    Hi

    Thanks for your input, it was really helpful.

    I am now in a different situation: I need to upgrade my NetWorker 7.3.2 server, running on RHEL, to either 7.5 or 7.4.4, as 7.3.2 is now end of support. 7.3.2 has been very stable for me. Please guide me on this: which version should I use for the easier migration and licence transfer, with stability comparable to 7.3.2?

    Thanks a lot

    cheers
    VJ

    • Preston said

      Without knowing your site I can’t be 100% certain, but as a general rule, I’m not recommending 7.5 at the moment. I want to see a full service pack come out for it. On the other hand, 7.4.4 has proved to be a very stable version, and I’ve got a lot of customers either on it, and quite happy with it, or about to go to it.

  2. VJ said

    Thanks Preston. Your suggestions are valuable. My site has one RHEL 4 32-bit server with a Quantum library, managing 350 clients (Solaris 8 and 10, RHEL and Windows servers).

    Again, how difficult is it to migrate from 7.3.2 to 7.4.4 while preserving all the configuration, given the huge /nsr/mm and /nsr/index directories?

    Thanks a million.

  3. Preston said

    It’s actually quite simple. The one recommendation I always make with Linux versions of NetWorker is to use rpm --erase to actually erase the NetWorker packages and then install the new versions. There is a long running bug in NetWorker’s installation scripts in the RPM version of its packages that causes the /etc/init.d/networker script to be removed when you do an RPM upgrade. This is inconvenient, so it’s best to do an erase followed by an install, rather than the upgrade.

    As with all upgrades, you should read the upgrade notes in case there’s anything specific to your site, but generally speaking for 7.3.x -> 7.4.x at most you should just have to do a full backup of your backup server, uninstall the old packages and install the new packages. (The media database and indices are compatible across the two versions.)

  4. VJ said

    Thanks.
    I have planned the following steps:
    1. Back up the entire /nsr filesystem, which includes the /nsr/mm and /nsr/index directories, along with the /etc/init.d/networker start script
    2. rpm erase of the 7.3.2 version
    3. rpm installation of 7.4.4
    4. Copy back /nsr/mm and /nsr/index so that I can bring back the entire configuration of clients/schedules/groups

    Are these steps OK?

    My only worry is that /nsr/index is 350 GB, and there is no space left to copy this directory for backup.

  5. Preston said

    Erasing NetWorker’s packages doesn’t erase the configuration files (/nsr/res), the media database (/nsr/mm) or the indices (/nsr/index). So Step (4) isn’t required.

    EMC strongly recommend against just “copying” these files (even though I’ve done it when I’ve found it necessary) in case something goes wrong. You should start by doing a full bootstrap backup (savegrp -O groupName), where “groupName” is the name of a group that has every client in it – this just backs up the media database, indices and res database. That way you’ve got a full backup to recover from _if_ (and only if) you have problems. Alternatively, just doing a full backup of the backup server before the upgrade will back up the media database, res database and the index for the backup server. Indices for other clients can be recovered as required if necessary, but again, these aren’t deleted by the uninstall process.
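The sequence described in this exchange (index and bootstrap backup, erase the old packages, install the new ones) can be sketched as a small planning helper. This is only an illustration under stated assumptions: the function name `upgrade_plan`, the stop and start steps via /etc/init.d/networker, and whatever package and file names you pass in are not taken from EMC documentation. The helper only builds the command list so it can be reviewed; it runs nothing.

```python
import shlex


def upgrade_plan(group_name, old_packages, new_rpm_files):
    """Return the ordered shell commands for an erase-then-install upgrade.

    Hypothetical helper: nothing is executed, the list is meant to be
    reviewed and run by hand on the backup server.
    """
    if not old_packages or not new_rpm_files:
        raise ValueError("need at least one package to erase and one to install")

    quoted_old = " ".join(shlex.quote(p) for p in old_packages)
    quoted_new = " ".join(shlex.quote(f) for f in new_rpm_files)

    return [
        # Index and bootstrap backup for a group containing every client.
        f"savegrp -O {shlex.quote(group_name)}",
        # Assumption: the daemons are stopped via the init script first.
        "/etc/init.d/networker stop",
        # Erase rather than upgrade, so the init script is not lost.
        f"rpm --erase {quoted_old}",
        f"rpm --install {quoted_new}",
        # Assumption: an explicit start is needed after the install.
        "/etc/init.d/networker start",
    ]
```

For example, `upgrade_plan("AllClients", installed, downloaded)` takes a group name, a list of installed package names and a list of new RPM files (all placeholders to be filled in for your own site) and returns five commands in the order above. /nsr/res, /nsr/mm and /nsr/index do not appear anywhere in the plan, since erasing the packages leaves them in place.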

  6. Semion said

    Thanks, man, for this input. For the past 2 weeks I had a 50% delay in backups, and freaking Legato support could not tell me that I needed to increase server parallelism. Now that I’ve fixed it, it works.

    • Preston said

      You’re welcome! I think like any company, EMC’s customers can have both good and not so good support experiences. In the case of parallelism, it’s sometimes (unfortunately) one of those things that gets “assumed” to be OK, so people move on. However, as with name resolution, if you don’t check it first you may spend some time chasing more complex problems that aren’t actually there.

  7. […] for number 1, a topic I’m completely unsurprised to see at the top, we have Basics – Parallelism in NetWorker. Not because it’s difficult, but because there’s no absolute rules, parallelism is a […]

  8. rajat said

    Hi, I am running Legato NetWorker 7.4 with one machine as a media server and another as a dedicated storage node.
    All the drives are visible on the storage node.
    After setting the parallelism on the server and the storage node (one of the clients) to 24 and target sessions to 6, it is not taking 24 savesets, whereas the same has been tested with other clients with the same configuration, and they are taking 24 savesets. This client is only taking 10 savesets.
    Please let me know what could be the problem behind this.

    • Preston said

      How many unique savesets are there on the client where you’re not getting that number of simultaneous savesets? I.e., is the client actually physically configured in such a way that allows it to generate that many savesets? (I’m afraid I’m not quite understanding whether your problem is that an individual client isn’t generating 24 savesets, or that a particular tape drive is not writing 24 savesets.)

      • rajat said

        hi

        Actually it’s only a single server, where clients have been created day-wise. For the same server, clients are created as “DB-pool monday” for the Monday backup of the client database, “DB-pool tuesday” for the Tuesday backup, and so on.
        On Friday, i.e. “DB pool friday”, it successfully took 24 savesets. The client server is the same for the Monday to Sunday pools.

        If it has taken 24 savesets with the same settings, how can it be restricted to a fixed parallelism (10) for the Monday client on the same server, which is a dedicated storage node?

      • Preston said

        Unfortunately I’m still not able to understand your configuration from your description. You say at the start it’s a “single server”, but then you refer to it being a dedicated storage node as well. (I’m also concerned by my take on your pool description. Having different pools for each day of the week sounds arbitrarily complex.)

        Are these filesystem backups or database backups? There’s a variety of reasons why a machine may backup on different days with different levels of parallelism. For instance, database servers, where different streams are allocated based on the amount of changes to be backed up (e.g., when doing incremental backups), may generate more or fewer savesets.

        For filesystem backups, if the backups are incrementals and there’s very little change happening you could also find that some backups complete fast enough that they don’t appear for very long in monitoring and therefore aren’t readily observable.

        (Hint: When describing NetWorker problems, it’s best to stick to NetWorker nomenclature. That is – there’s only one server, the backup server, and all other machines are either clients, storage nodes, or dedicated storage nodes. Mixing the terms makes the description ambiguous.)

  9. Bob Payne said

    Just read your stuff on //-ism. Something not right on our site. A group starts with 10 clients, all with client //-ism of 2. Device target sessions=20. Pool Max //-ism of 20. savegrp //-ism=0.
    Networker asks for all 10 drives in the library to be loaded. That’s no good because a different Storage Node wants some tape drives later. But anyway I can’t work out why 10 drives are required. I think it should want only 2. We get round it by selecting only 4 drives (LTO4) for the pool. Puzzlement.

    • Preston said

      That certainly seems odd behaviour at the outset – though FWIW having 20 target sessions to each drive will result in a “massive” level of multiplexing being done to tape, which will be significantly detrimental to complete filesystem recovery performance, etc.

      Everything (bar one) in your list of parallelism settings seems relatively normal. The one that strikes me as odd is the Pool Max parallelism – maybe that’s interacting with the device target sessions in an odd way. That’s used to set an upper limit on the number of parallel sessions writing to media in the pool, and perhaps the way it interacts with the other settings (given the high target sessions per drive) is to try to enforce that setting first, thus spreading the sessions out across multiple drives. I’d suggest eliminating that setting first – tuning it down to zero, to see what happens. To be honest, I’ve not yet found a day-to-day situation that requires use of the pool max parallelism setting, so it may be that you’re overcomplicating it, given I’ve accomplished similar configurations (albeit with slightly smaller device target sessions) without needing to use it.

      Also check the server parallelism setting, which you haven’t mentioned. It should be a multiple of desired device target sessions (at minimum). So if you want to get 20 streams going to 2 drives you’d want it at 40, etc. I’m presuming based on your description though that you’ve got it set to 20?
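The rule of thumb in the reply above (server parallelism as a multiple of the device target sessions) comes down to simple arithmetic, sketched here. The function names are made up for illustration, and this models only the sizing rule as stated in the comment. It is not NetWorker's actual scheduling logic, and it ignores pool, group and client parallelism settings.

```python
def server_parallelism_for(target_sessions, drives):
    """Server parallelism needed to run `drives` drives at full target sessions."""
    if target_sessions < 1 or drives < 1:
        raise ValueError("target_sessions and drives must be positive")
    return target_sessions * drives


def drives_at_full_target(server_parallelism, target_sessions):
    """How many drives a given server parallelism can keep at full target sessions."""
    if server_parallelism < 0 or target_sessions < 1:
        raise ValueError("invalid parallelism settings")
    return server_parallelism // target_sessions


# The example from the comment: 20 streams to each of 2 drives.
print(server_parallelism_for(20, 2))   # 40
# Conversely, a server parallelism of 20 fills exactly one such drive.
print(drives_at_full_target(20, 20))   # 1
```

Read the other way round, the second function shows why a server parallelism lower than target sessions multiplied by the number of drives leaves some drives short of their target sessions under this simplified model.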

      • Bob Payne said

        Thanks for the suggestion about pool max //le-ism. Reduced it to zero, but unfortunately made no difference. And you’re right about its usefulness.
        BTW, I assume you meant server //le-ism 120 which is what it is set to.

      • Preston said

        Actually I did assume server parallelism of 20 … i.e., I thought your intent was to ensure that one drive would run at 20 target sessions, not that all 10 would.

        In theory, everything you’ve described then means that we should see the ramp-up of target sessions you want. Out of curiosity, have you considered stopping NetWorker and clearing out the /nsr/tmp directory to see whether that makes a difference? (Obviously when you do this, you’ll lose the state of any aborted or failed groups that you want to restart, so be sure to do it at a suitable time.) It may be that there’s something lingering in that state directory that’s causing problems for you. Also, what version of NetWorker are you using?

  10. Bob Payne said

    Networker server version 7.4.4.7 on Solaris. We saw a media db corruption on 7.4.4 that 7.4.4.6 didn’t fix. Was fixed on 7.4.4.7, but that’s another story.
    /nsr/tmp is removed on every Networker restart.

    • Preston said

      In theory everything you’re describing should work, and I can’t recall/find any bugs that would be causing this off-hand. I wonder whether it’s a level-of-parallelism issue – I’d suggest dropping to 16 or below where you’re currently using 20, and adjusting the server parallelism appropriately based on the current metric you’re using, to see whether that makes a difference. If not, it might be worthwhile logging a case with EMC.
