Friday, October 25, 2013

sysibmadm.dbmcfg gives error sql0444n reason code 4

whilst developing a script to collect database manager and database configuration values from all our servers, we ran into an issue on 2 of our servers that were recently migrated to DB2 9.7 fp8.

when we accessed sysibmadm.dbmcfg we received error:
SQL0444N  Routine "*_GET_CFG" (specific name "DBM_GET_CFG") is implemented
with code in library or path ".../sqllib/bin/routine/db2dbroutext", function
"*m_get_cfg" which cannot be accessed.  Reason code: "4".  SQLSTATE=42724

when we accessed sysibmadm.dbcfg, the administrative view worked fine.

so we compared the routines behind the two views:

db2 "select routinename,FENCED from syscat.routines where ROUTINENAME like '%GET_CFG'"

ROUTINENAME                                                                                                                      FENCED
-------------------------------------------------------------------------------------------------------------------------------- ------
DBM_GET_CFG                                                                                                                      Y
DB_GET_CFG                                                                                                                       N

  2 record(s) selected.

we noticed that the DBMCFG routine is fenced and the DBCFG one is not. a fenced routine runs under the fenced user, not under the instance owner.

after some time we investigated the path to the library ( as suggested in the doc for reason code 4 )
and noticed that the real path ( not the symlink in sqllib ) was accessible to the instance owner, but not to the fenced user.

i.e <db2 instance owner home folder>/sqllib/bin/routine/db2dbroutext = OK
    <db2_install_path>/bin =NOK

we fixed this by granting rights for the group and for all other users on the path containing the db2 binaries:
chmod 775 db2_install_path

after this the fenced user could access the library path containing the routine definition,
and we could use sysibmadm.dbmcfg 
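
to find out which directory blocks the fenced user, a small shell sketch like the one below may help. it walks up from the resolved library path and reports every directory that lacks the execute (traverse) bit for "other" users. the function name check_traverse is made up for this example, and it assumes the fenced user is neither the owner nor in the group of those directories, so only the "other" bits apply.

```shell
# report directories on the way to a path that "other" users cannot traverse
# assumption: the fenced user only gets the "other" permission bits
check_traverse() {
    p=$1
    while [ "$p" != "/" ] && [ "$p" != "." ]; do
        if [ -d "$p" ]; then
            # first 10 characters of ls -ld are the mode string, e.g. drwxr-x---
            mode=$(ls -ld "$p" | cut -c1-10)
            case "$mode" in
                *x|*t) ;;   # last character x or t: others may traverse
                *) echo "not traversable by others: $p ($mode)" ;;
            esac
        fi
        p=$(dirname "$p")
    done
}
```

call it with the real ( non-symlink ) path of the db2 binaries, i.e. the bin folder under db2_install_path, and every line it prints is a candidate for the chmod.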



Thursday, October 24, 2013

fcm_parallelism + sql6031n

after monitoring the DPF database of our data warehouse I noticed that a lot of wait time was spent in fcm communication, so I decided to try out the new fcm_parallelism parameter, which exists since DB2 9.7 fp6.

FCM parallelism support added

to test this out I performed the following actions:
db2 get dbm cfg | grep -i fcm_parallelism
 Inter-node comm. parallelism          (FCM_PARALLELISM) = 1

db2 attach to <instance>
db2 update dbm cfg using FCM_PARALLELISM 2
DB20000I  The UPDATE DATABASE MANAGER CONFIGURATION command completed
successfully.
SQL1362W  One or more of the parameters submitted for immediate modification
were not changed dynamically. Client changes will not be effective until the
next time the application is started or the TERMINATE command has been issued.
Server changes will not be effective until the next DB2START command.

as stated above, you have to restart the instance to activate this parameter,
so I forced all connections and stopped db2:
db2 force applications all
> db2stop
24/10/2013 14:08:43     0   0   SQL1064N  DB2STOP processing was successful.
24/10/2013 14:08:45     2   0   SQL1064N  DB2STOP processing was successful.
24/10/2013 14:08:45     3   0   SQL1064N  DB2STOP processing was successful.
24/10/2013 14:08:45     4   0   SQL1064N  DB2STOP processing was successful.
24/10/2013 14:08:46     6   0   SQL1064N  DB2STOP processing was successful.
24/10/2013 14:08:46     7   0   SQL1064N  DB2STOP processing was successful.
24/10/2013 14:08:47     5   0   SQL1064N  DB2STOP processing was successful.
24/10/2013 14:08:47     8   0   SQL1064N  DB2STOP processing was successful.
24/10/2013 14:08:49     1   0   SQL1064N  DB2STOP processing was successful.
SQL1064N  DB2STOP processing was successful.


when I tried to restart with fcm_parallelism set to 2 I received the following error,

> db2start
SQL6031N  Error in the db2nodes.cfg file at line number "6".  Reason code "12".

> db2 "? SQL6031N"

SQL6031N  Error in the db2nodes.cfg file at line number "<line>". Reason
      code "<reason-code>".

Explanation:

The statement cannot be processed because of a problem with the
db2nodes.cfg file, as indicated by the following reason codes:

12

         The port value at line "<line>" of the db2nodes.cfg file in the
         sqllib directory is not in the valid port range defined for
         your DB2 instance id in the services file (/etc/services on
         UNIX-based systems).

User response:

12

         Ensure that you only use port values that are specified in the
         services file (/etc/services file on UNIX-based systems) for
         your instance.

so I sent a request to our sysadmins to add extra ports to the /etc/services file,
hopefully this will allow us to exploit the fcm parallelism...

update: previously we had 9 ports defined in /etc/services,
the AIX team increased this to 18,
and with that db2 starts with fcm_parallelism = 2
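
to estimate the port range to ask for, a sketch like the following can count the logical partitions per host in db2nodes.cfg. it assumes the number of ports needed per host is the number of logical partitions on that host multiplied by fcm_parallelism. that assumption matches the 9 to 18 change above, but check the DB2 documentation for your release. the function name required_fcm_ports is made up for this example.

```shell
# print "<hostname> <ports needed>" for every host in a db2nodes.cfg file
# db2nodes.cfg columns: partition number, hostname, logical port [, netname]
# assumption: ports needed = logical partitions on the host * fcm_parallelism
required_fcm_ports() {
    nodes_file=$1
    parallelism=$2
    awk -v p="$parallelism" '
        { count[$2]++ }
        END { for (h in count) print h, count[h] * p }
    ' "$nodes_file"
}
```

for example, required_fcm_ports ~/sqllib/db2nodes.cfg 2 run as the instance owner gives the number to compare with the port range of your instance in /etc/services.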




Wednesday, October 16, 2013

DB2ATS + Transaction log full + backup fails

the backup team contacted me because a backup had failed with a transaction log full error.

when I investigated this transaction log issue, the connection holding the log turned out to belong to a job launched by the DB2ATS ( db2 automatic task scheduler )

I tried to follow the action suggested in the db2diag.log and force the connection mentioned there,
but for some reason you are not allowed to force ATS connections ( I feel this is not good, if I decide to kill a connection it's my responsibility :)).

I tried deactivating the automated task scheduler using db2set

db2set DB2_ATS_ENABLE=NO

however this did not solve my problem,

after some time I decided to see if db2pd could help me,
and remembered some searches we did in the past for dead latch situations when a database was hanging.
the database was actually hanging and I was out of options.

so I launched db2pd -latches
and I saw a huge list of latches:

all of them of this type:
SQLO_LT_sqlerFmpRow__ipcLatch

I did a search on google and luckily a support link came up:
IC76825: THREADED DB2FMP PROCESS LOOPS IN ITS SIGNAL HANDLER WHEN IT RECEIVES NESTED SIGNALS

in there they said to check for a db2fmp process consuming CPU,

so I did a ps -ef | grep db2fmp | grep <instance owner>

and then did a db2fmpterm <pid of the db2fmp of my instance>

after this the database got into a more normal state, all of the latches had disappeared,
and backup worked as it should ...
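
to spot the db2fmp process that is burning CPU, the ps listing can be filtered and sorted instead of eyeballed. below is a minimal sketch: filter_db2fmp is a made-up helper name, and it assumes the input comes from ps -eo user,pid,pcpu,args so that the columns are user, pid, cpu percentage and command.

```shell
# read "user pid pcpu args" lines on stdin, keep the db2fmp processes of one
# user and print "<pcpu> <pid>" with the highest cpu consumer first
filter_db2fmp() {
    awk -v u="$1" '$1 == u && $4 ~ /db2fmp/ { print $3, $2 }' | sort -rn
}
```

usage would be something like ps -eo user,pid,pcpu,args | filter_db2fmp followed by the instance owner name. the pid on the first line is the first candidate for db2fmpterm.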



Tuesday, October 15, 2013

the networker config file

  1. for backups
    1. all you need from the db2 command line is the backup server and the backup client ( if you schedule your backup from the networker server or want to use nsrdb2sv / nsrdasv, you need more parameters )
      1. NSR_SERVER=backupservername
      2. NSR_CLIENT=name of the client as defined in networker
      3. it's better to have a different log and data volume, to avoid tape mount conflicts that block the backup from completing
    2. if you forget the '@' sign in front of the config file you will get error sql2062n reason code 3
  2. for log archiving
    1. don't forget the @ in front of the config file when configuring logarchopt1
    2. if you change the config file, the new values are not picked up immediately, it's better to restart the db2vend processes ( i.e. kill them )
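
the points above can be turned into a quick sanity check before starting a backup or setting logarchopt1. the sketch below is only an illustration: check_nsr_opt is a made-up helper, and it only verifies the leading @, that the file is readable, and that NSR_SERVER and NSR_CLIENT are set.

```shell
# check an options value such as @/path/to/config before handing it to db2
check_nsr_opt() {
    opt=$1
    case "$opt" in
        @*) file=${opt#@} ;;   # strip the leading @ to get the file name
        *)  echo "missing @ in front of config file: $opt"; return 1 ;;
    esac
    [ -r "$file" ] || { echo "cannot read $file"; return 1; }
    for key in NSR_SERVER NSR_CLIENT; do
        grep -q "^[[:space:]]*$key=" "$file" || { echo "$key not set in $file"; return 1; }
    done
    echo "ok: $file"
}
```

it does not contact the networker server, so a passing check only rules out the typo kind of mistakes.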

Friday, October 11, 2013

nsrdb2rlog does not work with error message related to file privileges



nsrdb2rlog -a dbname -d . -C 1 -E 13867 -S 13856 -s networker_server
Unable to create temporary configuration file please check permissions


solution:

use truss to see which system call causes the error:
truss nsrdb2rlog -a dbname -d . -C 1 -E 13867 -S 13856 -s networker_server


near the end of the output you can see that a certain file is not being created due to a privilege issue,
then you can correct the privileges on the folder to make it writable for the user you use for the nsrdb2rlog command
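
truss output gets long, so it can help to keep only the calls that failed on permissions. a minimal sketch, assuming truss marks failed calls as Err# followed by the error number and name ( e.g. Err#13 EACCES ); the helper name permission_errors is made up for this example.

```shell
# keep only the system calls that failed with a permission-type error
# assumption: failed calls look like "... Err#13 EACCES" in the truss output
permission_errors() {
    grep -E 'Err#[0-9]+ (EACCES|EPERM|EROFS)'
}
```

truss writes to stderr, so redirect it first: put 2>&1 | permission_errors behind the truss command above.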



DB2 restore seems to hang using backup saved on networker

check that the nsrladb is set up correctly ( no reference to a different server than the server you are trying to restore on )

solution:

  1. stop networker client
  2. rename /nsr/res/nsrladb
  3. restart networker client
now the restore command should start processing.



Tuesday, September 24, 2013

nsrdb2rlog => Initialized session but failed to get object : due to file system full

when using nsrdb2rlog to retrieve logfiles from another host, make sure the directory you write to is not full. all you get is a generic error which does not point to the full file system:
Initialized session but failed to get object

the command looks like this, with the logfile names stripped of the extension .LOG and the prefix S000..:
nsrdb2rlog -a <database name> -C <logchain> -d <destination> -E <last logfile> -N <node/member/partition number> -S <start logfile> -s <networker server>

in my case the full directory was the home folder of the instance owner of the instance I restored into...
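
two small helpers can save some time here, shown as a sketch with made-up names: lognum turns a logfile name into the number nsrdb2rlog expects for -S and -E, and free_kb shows the free space of the destination so a full file system is noticed before the retrieve.

```shell
# S0013856.LOG -> 13856 (strip the .LOG extension, the S prefix and leading zeros)
lognum() {
    n=$(basename "$1" .LOG)
    n=${n#S}
    n=$(echo "$n" | sed 's/^0*//')
    echo "${n:-0}"   # an all-zero name becomes 0 instead of an empty string
}

# free space in KB of the file system that holds the given directory
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}
```

for example lognum S0013856.LOG prints 13856, and free_kb . prints the free KB of the current directory.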