Over the last few days, we have noticed that the duplication jobs that write data from disk to tape have been failing.
The following information is shown in the job details window:
02/24/2015 06:00:00 - Info nbjm (pid=1075) starting backup job (jobid=637264) for client backups-3102.hyd.deshaw.com, policy __DSSU_POLICY_3102-DSTU1, schedule 3102-DSTU1
02/24/2015 06:00:00 - Info nbjm (pid=1075) requesting NO_STORAGE_UNIT resources from RB for backup job (jobid=637264, request id:{447CA662-BBBC-11E4-9F25-001999EDC755})
02/24/2015 06:00:00 - requesting resource backups-3100.hyd.deshaw.com.NBU_CLIENT.MAXJOBS.backups-3102.hyd.deshaw.com
02/24/2015 06:00:00 - requesting resource backups-3100.hyd.deshaw.com.NBU_POLICY.MAXJOBS.__DSSU_POLICY_3102-DSTU1
02/24/2015 06:00:00 - granted resource backups-3100.hyd.deshaw.com.NBU_CLIENT.MAXJOBS.backups-3102.hyd.deshaw.com
02/24/2015 06:00:00 - granted resource backups-3100.hyd.deshaw.com.NBU_POLICY.MAXJOBS.__DSSU_POLICY_3102-DSTU1
02/24/2015 06:00:02 - estimated 0 kbytes needed
02/24/2015 06:00:02 - begin Parent Job
02/24/2015 06:00:02 - begin Disk Staging: Start Notify Script
02/24/2015 06:00:27 - Info RUNCMD (pid=23196) started
02/24/2015 06:00:27 - Info RUNCMD (pid=23196) exiting with status: 0
Operation Status: 0
02/24/2015 06:00:27 - end Disk Staging: Start Notify Script; elapsed time 0:00:25
02/24/2015 06:00:27 - begin Disk Staging: Execute Script
02/24/2015 06:00:37 - started process bpbrm (pid=24518)
02/24/2015 06:12:31 - end writing
Operation Status: 191
02/24/2015 06:12:31 - end Disk Staging: Execute Script; elapsed time 0:12:04
02/24/2015 06:12:31 - begin Disk Staging: Stop On Error
Operation Status: 0
02/24/2015 06:12:31 - end Disk Staging: Stop On Error; elapsed time 0:00:00
02/24/2015 06:12:31 - begin Disk Staging: End Notify Script
Operation Status: 40
02/24/2015 06:22:31 - end Disk Staging: End Notify Script; elapsed time 0:10:00
Operation Status: 191
02/24/2015 06:22:31 - end Parent Job; elapsed time 0:22:29
no images were successfully processed (191)
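To make the per-stage results easier to read, here is a minimal Python sketch that pairs each "end ..." line in the job details with the "Operation Status" line printed just before it. The function name `stage_statuses` and the regular expressions are my own; they are based only on the layout of the output pasted above and are not a NetBackup interface.

```python
import re

# Matches lines such as:
#   02/24/2015 06:12:31 - end Disk Staging: Execute Script; elapsed time 0:12:04
END_RE = re.compile(r"- end (?P<stage>.+?); elapsed time (?P<elapsed>\d+:\d{2}:\d{2})$")
# Matches lines such as: Operation Status: 191
STATUS_RE = re.compile(r"^Operation Status: (?P<status>\d+)$")


def stage_statuses(lines):
    """Return (stage, status, elapsed) for every stage that has an 'end' line.

    Assumes the status line appears immediately before the matching
    'end' line, as it does in the job details shown above.
    """
    results = []
    last_status = None
    for raw in lines:
        line = raw.strip()
        m = STATUS_RE.match(line)
        if m:
            last_status = int(m.group("status"))
            continue
        m = END_RE.search(line)
        if m:
            results.append((m.group("stage"), last_status, m.group("elapsed")))
            last_status = None
    return results


# Lines copied from the job details above (shortened to the relevant ones).
SAMPLE = """\
02/24/2015 06:00:27 - Info RUNCMD (pid=23196) exiting with status: 0
Operation Status: 0
02/24/2015 06:00:27 - end Disk Staging: Start Notify Script; elapsed time 0:00:25
02/24/2015 06:12:31 - end writing
Operation Status: 191
02/24/2015 06:12:31 - end Disk Staging: Execute Script; elapsed time 0:12:04
Operation Status: 0
02/24/2015 06:12:31 - end Disk Staging: Stop On Error; elapsed time 0:00:00
Operation Status: 40
02/24/2015 06:22:31 - end Disk Staging: End Notify Script; elapsed time 0:10:00
Operation Status: 191
02/24/2015 06:22:31 - end Parent Job; elapsed time 0:22:29
"""

if __name__ == "__main__":
    for stage, status, elapsed in stage_statuses(SAMPLE.splitlines()):
        print(f"{stage:40s} status={status:<4} elapsed={elapsed}")
```

Read this way, the first failure is in the "Execute Script" stage (status 191), and the "End Notify Script" stage then ends with status 40 after exactly ten minutes.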
I confirmed that free tapes are available and that the drives are accessible from the host, since catalog backups (which write data directly to tape) are completing successfully.
The error log on the master server shows:
1427108879 1 2 16 backups-3102.hyd.deshaw.com 0 0 0 *NULL* bpduplicate Error occurred during initialization, check master configuration file
1427108879 1 2 16 backups-3102.hyd.deshaw.com 0 0 0 *NULL* bpduplicate Error occurred during initialization, check master configuration file
1427108879 1 2 16 backups-3102.hyd.deshaw.com 0 0 0 *NULL* bpduplicate Error occurred during initialization, check master configuration file
1427108879 1 2 16 backups-3102.hyd.deshaw.com 0 0 0 *NULL* bpduplicate Error occurred during initialization, check master configuration file
1427108879 1 2 16 backups-3102.hyd.deshaw.com 0 0 0 *NULL* bpduplicate Error occurred during initialization, check master configuration file
1427108879 1 2 16 backups-3102.hyd.deshaw.com 0 0 0 *NULL* bpduplicate Error occurred during initialization, check master configuration file
1427108879 1 2 16 backups-3102.hyd.deshaw.com 0 0 0 *NULL* bpduplicate Error occurred during initialization, check master configuration file
1427108879 1 2 16 backups-3102.hyd.deshaw.com 0 0 0 *NULL* bpduplicate Error occurred during initialization, check master configuration file
I used to see the messages below in the admin logs of the master server. I fixed the configuration by updating the attribute value and restarted both the master and media servers, after which these messages went away.
15:19:14.453 [4529] <2> readStruct: Missing required data field on line 31.
15:19:14.453 [4529] <2> ParseConfigExA: Badly formed configuration entry on line 31: FORCE_RESTORE_MEDIA_SERVER = backups-3102.hyd.deshaw.com
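For anyone who wants to check their own configuration for the same kind of entry, here is a minimal Python sketch. It assumes that a `FORCE_RESTORE_MEDIA_SERVER` entry is expected to name two hosts (a "from" host and a "to" host), which would explain why the single-host entry on line 31 was reported as badly formed. The function name `check_force_restore_entries`, the `SERVER` line, and the `example.com` host names in the sample are hypothetical and for illustration only.

```python
def check_force_restore_entries(conf_text):
    """Return (line_number, line) for FORCE_RESTORE_MEDIA_SERVER entries
    that do not list exactly two hosts.

    Assumption: the entry has the form
        FORCE_RESTORE_MEDIA_SERVER = from_host to_host
    """
    problems = []
    for lineno, raw in enumerate(conf_text.splitlines(), start=1):
        key, sep, value = raw.partition("=")
        if not sep or key.strip() != "FORCE_RESTORE_MEDIA_SERVER":
            continue  # not the entry we are checking
        hosts = value.split()
        if len(hosts) != 2:
            problems.append((lineno, raw.strip()))
    return problems


# Illustrative configuration text. The second line mirrors the entry from
# the admin log above; the other lines are made up for the example.
SAMPLE_CONF = """\
SERVER = backups-3100.hyd.deshaw.com
FORCE_RESTORE_MEDIA_SERVER = backups-3102.hyd.deshaw.com
FORCE_RESTORE_MEDIA_SERVER = old-media.example.com new-media.example.com
"""

if __name__ == "__main__":
    for lineno, line in check_force_restore_entries(SAMPLE_CONF):
        print(f"line {lineno}: expected two hosts: {line}")
```

To run it against a real file, read the file contents into a string and pass that to the function; the path to the configuration file depends on the installation.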
I restarted the duplication job, but it failed with the same error:
<truncate>
1427107801 1 4 4 backups-3100.hyd.deshaw.com 641630 641630 0 backups-3102.hyd.deshaw.com nbpem job with jobid=641628 restarted as jobid=641630
1427107801 1 1 4 backups-3100.hyd.deshaw.com 0 0 0 *NULL* nbjm backup job submission request (jobid=641630) for client backups-3102.hyd.deshaw.com, policy __DSSU_POLICY_3102-DSTU1, schedule 3102-DSTU1
1427107801 1 4 4 unknown 641630 641630 0 backups-3102.hyd.deshaw.com nbjm started backup job for client backups-3102.hyd.deshaw.com, policy __DSSU_POLICY_3102-DSTU1, schedule 3102-DSTU1 on storage unit
1427108878 1 2 16 backups-3102.hyd.deshaw.com 0 0 0 *NULL* bpduplicate Error occurred during initialization, check master configuration file
1427108879 1 2 16 backups-3102.hyd.deshaw.com 0 0 0 *NULL* bpduplicate Error occurred during initialization, check master configuration file
1427108879 1 2 16 backups-3102.hyd.deshaw.com 0 0 0 *NULL* bpduplicate Error occurred during initialization, check master configuration file
1427108879 1 2 16 backups-3102.hyd.deshaw.com 0 0 0 *NULL* bpduplicate Error occurred during initialization, check master configuration file
1427108948 1 2 4 backups-3100.hyd.deshaw.com 0 0 0 *NULL* nbpem cleaning media DB(s)
1427109784 1 68 4 backups-3100.hyd.deshaw.com 641630 641630 0 backups-3102.hyd.deshaw.com nbpem CLIENT backups-3102.hyd.deshaw.com POLICY __DSSU_POLICY_3102-DSTU1 SCHED 3102-DSTU1 EXIT STATUS 191 (no images were successfully processed)
1427109784 1 4 16 backups-3100.hyd.deshaw.com 641630 641630 0 backups-3102.hyd.deshaw.com nbpem backup of client backups-3102.hyd.deshaw.com exited with status 191 (no images were successfully processed)
1427109784 1 2 4 backups-3100.hyd.deshaw.com 0 0 0 *NULL* nbpem running session_notify
1427110148 1 2 4 backups-3100.hyd.deshaw.com 0 0 0 *NULL* nbpem cleaning media DB(s)
<truncate>
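Since the first column of these error log lines is a Unix epoch timestamp, a small Python sketch like the one below makes them easier to line up with the job details. The field layout (time, version, type, severity, server, job ID, job group ID, unused, client, who, text) is my reading of the lines above, and treating severity 16 as "error" is an assumption; `parse_error_line` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def parse_error_line(line):
    """Split one error log line into named fields.

    Assumed layout (inferred from the sample lines):
    time version type severity server jobid jobgroupid unused client who text
    Returns None for lines that do not fit, such as '<truncate>'.
    """
    parts = line.split(None, 10)
    if len(parts) < 11 or not parts[0].isdigit():
        return None
    return {
        "time": datetime.fromtimestamp(int(parts[0]), tz=timezone.utc),
        "severity": int(parts[3]),
        "server": parts[4],
        "job_id": int(parts[5]),
        "client": parts[8],
        "who": parts[9],
        "text": parts[10],
    }


# Two lines copied from the error log above.
SAMPLE_LINES = [
    "1427108878 1 2 16 backups-3102.hyd.deshaw.com 0 0 0 *NULL* bpduplicate "
    "Error occurred during initialization, check master configuration file",
    "1427108948 1 2 4 backups-3100.hyd.deshaw.com 0 0 0 *NULL* nbpem cleaning media DB(s)",
]

if __name__ == "__main__":
    for line in SAMPLE_LINES:
        rec = parse_error_line(line)
        # Assumption: severity 16 marks an error entry.
        if rec and rec["severity"] == 16:
            print(rec["time"].isoformat(), rec["server"], rec["who"], rec["text"])
```

With this, the first bpduplicate error (1427108878) decodes to 2015-03-23 11:07:58 UTC, and the exit status 191 entry (1427109784) to 2015-03-23 11:23:04 UTC.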
I checked the admin logs and bptm logs but couldn’t find any other error messages.
Any thoughts on this? Let me know if there are specific log files I should be checking for these duplication failures.
It had been working fine for many months, and we have only been seeing this issue for the last few days.
OS - Sun Solaris v5.10
Watermark levels - high is set to 98% and low is set to 80% (this is the default setting in our environment and it works fine)
-Ram