上一篇测试了通过pg_resetxlog来清理WAL,我们还可以通过它来恢复pg_control。
pg_control在$PGDATA/global下,很小,很不起眼。但一旦这个文件被损坏,PG就启不来。
cd $PGDATA/global
mv pg_control ./..
看日志,可以看到以下信息:
PANIC: could not open control file
"global/pg_control": No such file or directory
LOG: checkpointer process (PID 5989) was
terminated by signal 6: Aborted
LOG: terminating any other active server
processes
WARNING: terminating connection because of crash
of another server process
DETAIL: The postmaster has commanded this server
process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted
shared memory.
HINT: In a moment you should be able to reconnect
to the database and repeat your command.
LOG: archiver process (PID 5993) exited with exit
code 1
LOG: all server processes terminated;
reinitializing
pg_controldata
pg_controldata: could not open file
"/database/pgdata/global/pg_control" for reading: No such file or
directory
这时我们只能去恢复pg_control了。
命令不难,难的是如何设置参数值!
-x:
A safe
value for the next transaction ID (-x) can be determined by looking
for the numerically
largest file name in the directory pg_clog under the data
directory, adding one, and then multiplying by 1048576. Note that
the file names are in hexadecimal. It is usually easiest to specify
the option value in hexadecimal too. For example, if 0011 is the
largest entry in pg_clog, -x 0x1200000 will work (five trailing
zeroes provide the proper multiplier).
cd $PGDATA/pg_clog
ls -l
-rw------- 1 postgres postgres 8.0K Mar 7
14:58 0000
根据上述规则,-x = 0000+1跟5个0=0x000100000 (2进制换算成16进制,*1048576 =
加5个0)
#################################################################################################
-m:
A safe
value for the next multitransaction ID (-m) can be determined by
looking for the numerically largest file name in the directory
pg_multixact/offsets under the data directory, adding one,and then
multiplying by 65536. As above, the file names are in hexadecimal,
so the easiest way to do this is to specify the option value in
hexadecimal and add four zeroes
cd $PGDATA/pg_multixact/offsets
ls -l
-rw------- 1 postgres postgres 8.0K Mar 7
14:58 0000
所以:-m = 0000+1 跟4个0 = 0x00010000 (2进制换算成16进制,*65536 = 加4个0)
#################################################################################################
-O:
A safe
value for the next multitransaction offset (-O) can be determined
by looking for the numerically largest file name in the directory
pg_multixact/members under the data directory, adding one, and then
multiplying by 65536. As above, the file names are in hexadecimal,
so the easiest way to do this is to specify the option value in
hexadecimal and add four zeroes
cd $PGDATA/pg_multixact/members
ls -l
-rw------- 1 postgres postgres 8.0K Feb 28 09:04 0000
所以:-O = 0000+1 跟4个0 = 0x00010000 (2进制换算成16进制,*65536 = 加4个0)
#################################################################################################
-l:
TheWAL
starting address (-l) should be larger than anyWAL segment file
name currently existing
in the directory pg_xlog under the data directory. These names are
also in hexadecimal and have
three parts. The first part is the “timeline ID” and should usually
be kept the same. Do not choose a value larger than 255 (0xFF) for
the third part; instead increment the second part and reset the
third part to 0. For example, if 00000001000000320000004A is the
largest entry in pg_xlog, -l 0x1,0x32,0x4B will work; but if the
largest entry is 000000010000003A000000FF, choose -l 0x1,0x3B,0x0
or more.
cd $PGDATA/pg_xlog
ls -l
total 129M
-rw------- 1 postgres postgres 56 Feb 28 10:40
00000002.history
-rw------- 1 postgres postgres 64M Mar 7 15:03
000000030000000D0000002D
-rw------- 1 postgres postgres 64M Mar 7 15:21
000000030000000D0000002E
drwx------ 2 postgres postgres 32K Mar 7 15:21
archive_status
这里的算法,根据“larger than anyWAL segment file name currently
existing
in the directory pg_xlog under the data directory”,
所以:
TLI (timeline ID)=0x00000003(同第一段)
FILE = 0x0000000D (同第二段)
SEG = 0x0000002E + 1 = 0x0000002F
-l = 0x3,0xD,0x2F (这里前置的0都去处了)
#################################################################################################
-e和-o不清楚怎么设置,官方文档也没有介绍,幸亏提到is not critical,所以就不管了。
接下来就开始恢复了。
1,cd $PGDATA/global
touch pg_control
2,pg_resetxlog -x 0x100000 -m 0x10000 -O 0x10000 -l 0x3,0xD,0x2F
-f $PGDATA
pg_resetxlog: pg_control
exists but is broken or unknown version; ignoring it
Transaction log reset
3,看下control内容
pg_controldata
pg_control version
number:
922
Catalog version
number:
201204301
Database system
identifier:
5852524061837064188
Database cluster
state:
shut down
pg_control last
modified:
Thu 07 Mar 2013 05:02:24 PM CST
Latest checkpoint
location:
D/BC000020
Prior checkpoint
location:
0/0
Latest checkpoint's REDO
location:
D/BC000020
Latest checkpoint's
TimeLineID:
3
Latest checkpoint's
full_page_writes: off
Latest checkpoint's
NextXID:
0/1048576
Latest checkpoint's
NextOID:
10000
Latest checkpoint's
NextMultiXactId: 65536
Latest checkpoint's
NextMultiOffset:
65536
Latest checkpoint's
oldestXID:
2296015872
Latest checkpoint's
oldestXID's DB: 0
Latest checkpoint's
oldestActiveXID: 0
Time of latest
checkpoint:
Thu 07 Mar 2013 05:02:24 PM CST
Minimum recovery ending
location:
0/0
Backup start
location:
0/0
Backup end
location:
0/0
End-of-backup record
required:
no
Current wal_level
setting:
minimal
Current max_connections
setting:
100
Current max_prepared_xacts
setting: 0
Current max_locks_per_xact
setting: 64
Maximum data
alignment:
8
Database block
size:
8192
Blocks per segment of large
relation: 131072