从Gaussian output 输出文件提取坐标并创建gjf输入文件和xyz文件_ChemiAndy

以前写过这个脚本，最近看了看，原来的脚本有些问题，所以改了改。发现对shell, awk, sed的语法有些生疏了。所以总结了一下，写到下面。

思路：Gaussian计算中，每个构型都会输出一个如下标准去向坐标信息：

Standard orientation:

---------------------------------------------------------------------

Center Atomic Atomic Coordinates (Angstroms)

Number Number Type X Y Z

---------------------------------------------------------------------

1 16 0 -0.568290 0.000015 0.000019

2 8 0 -1.446681 -0.000139 1.264413

3 8 0 -1.448231 -0.000138 -1.263295

4 8 0 0.347143 -1.263543 -0.000530

5 8 0 0.346767 1.263798 -0.000529

6 8 0 2.807224 -0.000071 -0.000060

7 1 0 2.121824 0.714647 -0.000154

8 1 0 2.121048 -0.714150 -0.000155

---------------------------------------------------------------------

Rotational constants (GHZ): 4.8965931 1.8909475 1.8838013

如果是优化计算或者扫描势能面，会有很多个Standard orientation块。

如下脚本是提取最后一个坐标信息，并提取原子数目，总电荷，自旋多重度等信息，然后生成xyz文件和gjf文件。其中gjf文件的计算指令部分仅简单设置为“MP2/aug-cc-pvdz”，可根据需要修改

代码：

#!/bin/bash
#
#
目的：从高斯输出文件中提取坐标信息生成xyz文件和输入文件
# 用法： gout2gjf
xxxx.out 后面为高斯输出文件名作为参数
#
# ChemiAndy,
2014.03.11
#

  
# get the prefix of the
gaussian output file

NAME=`echo
$1|sed
"s/\..*$//"`

  
# Find CHARGE and
MULTIPlicity, get the first value in case more than one result
found

CHARGE=`awk
'/Charge =/{print $3}'
$1 | awk
'NR==1'`

MULTIP=`awk
'/Charge =/{print $6}'
$1 | awk
'NR==1'`

  
# Find how many atoms
by reading the first string of the last line

NATOM=`grep "NAtoms="
$1 | tail -1 |
awk '{print
$2}' `

  
echo
"$NATOM in inputfile $1 "
echo
"Charge:  
$CHARGE"
echo
"Multiplicity:    
$MULTIP"

  
# Find the block of
the coordination, delete extra lines, and
# Change the element
index into their NAMEs, to be improved
sed
'/Standard orientation:/,/Rotational
constants/!d' $1 |
grep -v -E
'I|C|N|D|R|\-\-\-'
| tail -$NATOM |
awk '{gsub(17,"Cl",$2);
gsub(16,"S",$2); gsub("1","H",$2); gsub(6,"C",$2); gsub(7,"N",$2);
gsub(8,"O",$2);  printf "%5s %10.6f %10.6f
%10.6f\n", $2, $4, $5, $6}'
> $NAME.tmp

  
# Create xyz
file
cat
$NAME.tmp | sed "1i
${NATOM}\n${NAME}" >
$NAME.xyz
echo
"$NAME.xyz created."

  
cat
$NAME.tmp | sed -e "1i
%chk=${NAME}.chk\n%mem=1024MB\n#MP2\/aug-cc-pvdz\n\n${NAME}\n\n${CHARGE}
${MULTIP}" -e '$a
\\n' >
$NAME.gjf
echo
"$NAME.gjf created."

  
# Clean tmp
file
rm
$NAME.tmp
echo
"$NAME.tmp deleted."
echo
"End!"

提取文件前缀名称，是把文件名中第一个点及其之后的内容替换为空. 正则模式：\..*$ 其中第一个点\.代表文件名中的点，其中的第二个点表示任意字符，*表示任意个数 $表示直到行尾；几个技巧：

提取两匹配行之间的内容有两种方式，比如提取含关键词AAA和关键词BBB之间的所有行

(1) sed -n '/AAA/,/BBB/p' /AAA/,/BBB/是匹配AAA和BBB之间的行, p打印行，-n取消默认输出

(2) sed '/AAA/,/BBB/!d' d删除所有匹配行, !d删除匹配行之外的所有行
在文件前插入内容xxx的行： sed '1i xxx', \n表换行
在sed命令中引用shell变量：使用双引号，并以${xxx}形式表示变量

从Gaussian output 输出文件提取坐标并创建gjf输入文件和xyz文件

杂谈