db2start SQL1652N File I/O error occurred
(2012-09-01 23:32:37)
Tags: sql1652n file i/o error occurred db2start
Category: DB2
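Saturday morning.
The application team asked for help starting a DB2 database. It was only a database start, so I took the lazy route and phoned the on-duty colleague to do it. He reported that it would not come up and was throwing an I/O error, so I had no choice but to VPN in and take a look.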
DIA8402C A disk error has occurred.
DIA8402C A disk error has occurred.
DIA8402C A disk error has occurred.
DIA8402C A disk error has occurred.
DIA8402C A disk error has occurred.
     UID     PID    PPID   C    STIME    TTY  TIME CMD
    root   84826  122644   0 00:11:09  pts/1  0:00 -ksh
$ db2start
SQL1652N File I/O error occurred.
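My first guess was that a volume had not been mounted or that a file system was full.
I asked the application team what had changed. They said a few volumes had been taken offline two days earlier and the machine had been rebooted, but nothing touched the file systems used by the database.
df -k looked normal: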
-------------------
$ pwd
/home/db2inst1
$ df -k |grep home
/dev/hd1          9502720    392048   96%    82963     4% /home
/dev/db2instlv    4718592   3077448   35%      537     1% /home/db2inst1
$
Next, the DB2 log:
-------------------
db2diag.log
2012-09-01-11.57.17.673406-300 E3226241A401       LEVEL: Warning (OS)
PID     : 36832                TID  : 1           PROC : db2start
INSTANCE: db2inst1             NODE : 000
FUNCTION: DB2 UDB, oper system services, sqloopenp, probe:80
CALLED  : OS, -, unspecified_system_function
OSERR   : EIO (5) "I/O error"
DATA #1 : File name, 56 bytes
/home/db2inst1/sqllib/log/db2start.20120901115717.errlog

2012-09-01-11.57.17.674608-300 I3226643A361       LEVEL: Severe
PID     : 36832                TID  : 1           PROC : db2start
INSTANCE: db2inst1             NODE : 000
FUNCTION: DB2 UDB, base sys utilities, sqleIssueStartStop, probe:60
RETCODE : ZRC=0x860F0003=-2045837309=SQLO_DERR "disk error occurred (DOS)"

2012-09-01-11.57.17.677343-300 I3227005A307       LEVEL: Severe
PID     : 36832                TID  : 1           PROC : db2start
INSTANCE: db2inst1             NODE : 000
FUNCTION: DB2 UDB, base sys utilities, sqleIssueStartStop, probe:61
MESSAGE : /home/db2inst1/sqllib/log/db2start.20120901115717.errlog

2012-09-01-11.57.17.690167-300 E3227313A460       LEVEL: Warning (OS)
PID     : 36832                TID  : 1           PROC : db2start
INSTANCE: db2inst1             NODE : 000
FUNCTION: DB2 UDB, oper system services, sqlowrite, probe:60
MESSAGE : ZRC=0x860F0003=-2045837309=SQLO_DERR "disk error occurred (DOS)"
CALLED  : OS, -, unspecified_system_function
OSERR   : EIO (5) "There is an input or output error."

2012-09-01-11.57.17.690975-300 I3227774A592       LEVEL: Error
PID     : 36832                TID  : 1           PROC : db2start
INSTANCE: db2inst1             NODE : 000
FUNCTION: DB2 UDB, oper system services, sqlowrite, probe:200
MESSAGE : ZRC=0x860F0003=-2045837309=SQLO_DERR "disk error occurred (DOS)"
DATA #1 : File handle, PD_TYPE_SQO_FILE_HDL, 8 bytes
0x0FFFFFFFFFFFCCB0 : 0000 0003 0000 0000                      ........
DATA #2 : unsigned integer, 8 bytes
98
DATA #3 : signed integer, 8 bytes
-1
DATA #4 : signed integer, 4 bytes
5

2012-09-01-11.57.17.692943-300 E3228367A343       LEVEL: Warning (OS)
PID     : 36832                TID  : 1           PROC : db2start
INSTANCE: db2inst1             NODE : 000
FUNCTION: DB2 UDB, oper system services, sqlonewsize2, probe:103
CALLED  : OS, -, unspecified_system_function
OSERR   : EIO (5) "There is an input or output error."

2012-09-01-11.57.17.693382-300 I3228711A359       LEVEL: Severe
PID     : 36832                TID  : 1           PROC : db2start
INSTANCE: db2inst1             NODE : 000
FUNCTION: DB2 UDB, bsu security, sqlex_write_log_record, probe:69
RETCODE : ZRC=0x860F0003=-2045837309=SQLO_DERR "disk error occurred (DOS)"

2012-09-01-11.57.17.693781-300 I3229071A358       LEVEL: Error
PID     : 36832                TID  : 1           PROC : db2start
INSTANCE: db2inst1             NODE : 000
FUNCTION: DB2 UDB, bsu security, sqlex_write_log_record, probe:70
RETCODE : ZRC=0x860F0003=-2045837309=SQLO_DERR "disk error occurred (DOS)"

2012-09-01-11.57.17.694186-300 I3229430A303       LEVEL: Error
PID     : 36832                TID  : 1           PROC : db2start
INSTANCE: db2inst1             NODE : 000
MESSAGE : Audit error. sqlcode is:
DATA #1 : Hexdump, 4 bytes
0x0FFFFFFFFFFFD08C : FFFF FBEE                                ....
------------------
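A pile of errors, but not much help in pinpointing the cause. A grumble about DB2's logging while I am here: it writes the most, it is the messiest, it has no focus, and it still leaves out the key information. It could learn a lot from Oracle.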
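As a side note, the numeric codes in those records can at least be cross-checked by hand. Here is a minimal sketch, assuming a POSIX shell with 64-bit arithmetic and assuming the hexdump bytes are in big-endian order as on AIX. The helper name to_signed32 is my own invention, not a DB2 tool.

```shell
#!/bin/sh
# to_signed32 is a hypothetical helper (not part of DB2): it reinterprets a
# 32-bit hex value as a signed integer, the form db2diag.log prints for ZRC.
to_signed32() {
    v=$(( $1 ))                       # shell arithmetic accepts 0x... constants
    if [ "$v" -ge 2147483648 ]; then  # high bit set means the value is negative
        v=$(( v - 4294967296 ))
    fi
    echo "$v"
}

to_signed32 0x860F0003   # ZRC from the sqleIssueStartStop record
to_signed32 0xFFFFFBEE   # bytes from the "Audit error. sqlcode is:" hexdump
```

The first call prints -2045837309, which matches the value DB2 itself shows next to the ZRC. The second prints -1042, so the audit record appears to carry an sqlcode of -1042. That reading is mine; the log does not spell it out.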
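A Baidu search turned up the following suggested fixes:
Run db2iupdt to update the instance. If that does not help:
1. Run truss db2start and see whether it reveals the problem.
2. Capture a trace with db2trc: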
db2trc on -t -f db2trc.dmp
db2start
db2trc off
db2trc flw -t db2trc.dmp db2trc.flw
db2trc fmt db2trc.dmp db2trc.fmt
Then analyze the flw and fmt files.
This did not feel related to updating the instance, so I started with truss:
$ truss db2start
truss: 0915-015 Cannot create subject process.
wait4all: i: 0, status: 589833, pid: 40428, created: 0
$ id
uid=1002(db2inst1) gid=802(db2iadm1) groups=801(db2fadm1)
$
Right, the db2inst1 user is not allowed to run truss here, so I did it as root instead.
First, get the PID of the db2inst1 shell session:
$ ps -f
db2inst1   53830   84826   0 00:11:14  pts/1  0:00 -ksh
#as root
s85a/# id
uid=0(root) gid=0(system) groups=2(bin),3(sys),7(security),8(cron),10(audit),11(lp)
s85a/# truss -deaf -o /tmp/truss.out -p 53830
#as db2inst1
db2start
...
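Now look at the truss result.
Going through truss.out, I found a number of errors, on Lockfile among others, but they were reported against file handles rather than paths, so I could not pin them down.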
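One way to get an overview of a large truss.out is to count the failed calls per errno. This is a rough sketch, assuming failures are marked in the usual truss style as "Err#<number> <NAME>". The helper name summarize_truss_errors and the sample lines are made up for illustration and are not taken from the real trace.

```shell
#!/bin/sh
# summarize_truss_errors is a hypothetical helper: it counts failed system
# calls per errno in a truss output file, assuming "Err#<n> <NAME>" markers.
summarize_truss_errors() {
    grep 'Err#' "$1" \
        | sed 's/.*Err#\([0-9][0-9]*\) \([A-Z][A-Z0-9]*\).*/\2 (\1)/' \
        | sort | uniq -c | sort -rn
}

# Synthetic sample lines, invented for this example (not from /tmp/truss.out).
sample=$(mktemp)
cat > "$sample" <<'EOF'
53830: open("/made/up/path/a", O_RDONLY) Err#2 ENOENT
53830: kwrite(3, "x", 1) Err#5 EIO
53830: kwrite(3, "y", 1) Err#5 EIO
53830: close(3) = 0
EOF
summarize_truss_errors "$sample"
rm -f "$sample"
```

In a summary like this, a cluster of EIO entries would point at storage rather than at DB2 configuration, even when the individual calls only name a file handle.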
Next, try db2trc:
#as db2inst1
$ db2trc on -t -f db2trc.dmp
Could not create the trace file "db2trc.dmp".
$
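This was odd. Why could the file not be created?
The file did exist, but its size was 0.
Could it be that nothing could be written at all? A quick test confirmed it: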
$ > aaa
There is an input or output error.
ksh[2]: aaa: 0403-005 Cannot create the specified file.
$
$ mkdir abc
mkdir: 0653-358 Cannot create abc.
abc: There is an input or output error.
$
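The manual test above can be wrapped into a small probe to run before db2start or db2trc. df -k only reports capacity, which is why it looked normal while every write failed. This is a minimal sketch; the helper name can_write_dir and the probe file name are assumptions of mine.

```shell
#!/bin/sh
# can_write_dir is a hypothetical helper: it returns 0 only if a file can be
# created and written in the given directory, then removes the probe file.
can_write_dir() {
    probe="$1/.write_probe.$$"
    # The subshell keeps the shell's own redirection error message quiet.
    if ( echo test > "$probe" ) 2>/dev/null; then
        rm -f "$probe"
        return 0
    fi
    return 1
}

# Example: probe the directory given as the first argument, else $HOME.
if can_write_dir "${1:-$HOME}"; then
    echo "writable"
else
    echo "NOT writable: check the file system before blaming DB2"
fi
```

Run as db2inst1 against /home/db2inst1, a check like this would presumably have failed straight away and saved the detour through truss and db2trc.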
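The file system was probably damaged. I phoned the on-duty host administrator to check and also told the application maintainers. Fortunately this was only a development environment.
Before long the host administrator confirmed that the volume really did have a problem, possibly bad blocks or something similar had been detected. It was repaired and remounted, and was now fine.
I checked that writes worked again and started the database. It reported that the database had failed to start normally but that the core processes had started successfully...
I did not write down the error text. After one more restart the database came up, and a quick test showed that reads and writes were back to normal.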
db2 restart db zmccdev ;
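Summary:
1. Learned how to trace a process with truss. In the past I ran truss or tusc directly against the PID of the command itself, which does not work when the command finishes too quickly to catch. Instead, get the shell's PID with ps -f, run truss -deaf -o /tmp/truss.out -p pid in another session, then go back to the first session and run the command.
2. Learned how to use the db2trc command.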