Moses: Build a Baseline System (phrase-based translation system)

(2013-04-09 23:09:50)
Tags: mt, moses, it

Category: Machine Translation

1. Corpus preparation (preprocessing)

(1)     Tokenization (word segmentation)

Executed: tokenizer.perl

Command: ./tokenizer.perl (-l [en|de|...]) (-threads 4) < textfile > tokenizedfile

Example:

~/mosesdecoder/scripts/tokenizer/tokenizer.perl -l en < ~/corpus/training/news-commentary-v7.fr-en.en > ~/corpus/news-commentary-v7.fr-en.tok.en
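To make concrete what the tokenization step produces, here is a minimal Python sketch that separates punctuation from words with a regular expression. It is only an illustration: tokenizer.perl applies language-specific rules (abbreviations, clitics, escaping of special characters) that this does not reproduce, and the name `naive_tokenize` is hypothetical.

```python
import re


def naive_tokenize(line):
    """Split a line into word and punctuation tokens (rough approximation only)."""
    # \w+ matches a run of letters, digits or underscores;
    # [^\w\s] matches a single punctuation mark.
    return re.findall(r"\w+|[^\w\s]", line)


if __name__ == "__main__":
    # Tokens are written space-separated, one sentence per line,
    # which is the layout the later steps expect.
    print(" ".join(naive_tokenize("Hello, world! It costs 5 dollars.")))
```

For real corpora, use tokenizer.perl as shown above; the sketch only shows the shape of the output (space-separated tokens).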

(2)     Truecasing (handling capitalization, etc.) P82

Executed: train-truecaser.perl

Command: train-truecaser.perl --model MODEL --corpus CASED

Example:

~/mosesdecoder/scripts/recaser/train-truecaser.perl --model ~/corpus/truecase-model.en --corpus ~/corpus/news-commentary-v7.fr-en.tok.en

executed: truecase.perl

Command: truecase.perl --model MODEL < IN > OUT

Example:

~/mosesdecoder/scripts/recaser/truecase.perl --model ~/corpus/truecase-model.en \
< ~/corpus/news-commentary-v7.fr-en.tok.en > ~/corpus/news-commentary-v7.fr-en.true.en
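The idea behind the two truecasing scripts can be sketched in Python: first learn the most frequent surface form of each word from the tokenized corpus, then use it to restore the casing of the sentence-initial word. This is a simplified illustration under my own assumptions, not the logic of train-truecaser.perl or truecase.perl; the function names and the in-memory model format are hypothetical.

```python
from collections import Counter, defaultdict


def train_truecaser(sentences):
    """Learn the most frequent surface form for each lowercased word."""
    counts = defaultdict(Counter)
    for sent in sentences:
        # Skip the first token: its capitalization comes from its position
        # in the sentence and says little about the word itself.
        for tok in sent.split()[1:]:
            counts[tok.lower()][tok] += 1
    return {low: forms.most_common(1)[0][0] for low, forms in counts.items()}


def truecase(sentence, model):
    """Replace the sentence-initial token with its most frequent form, if known."""
    toks = sentence.split()
    if toks:
        toks[0] = model.get(toks[0].lower(), toks[0])
    return " ".join(toks)


if __name__ == "__main__":
    model = train_truecaser(["We saw the house in Paris .", "I like the house ."])
    print(truecase("The house is big .", model))
    print(truecase("Paris is big .", model))
```

Words that the model has never seen are left unchanged, so proper nouns at the start of a sentence keep their capital letter.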

(3)     Cleaning (removing sentences that are too long or too short) P164

Executed: clean-corpus-n.perl

Command: clean-corpus-n.perl corpus l1 l2 clean-corpus min max [lines retained file]

Example:

~/mosesdecoder/scripts/training/clean-corpus-n.perl ~/corpus/news-commentary-v7.fr-en.true fr en ~/corpus/news-commentary-v7.fr-en.clean 1 80
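What the min and max arguments do (1 and 80 in the example) can be sketched in Python as follows. This is an illustration only: it assumes a sentence pair is dropped when either side falls outside the token-length range, and it does not model any other checks that clean-corpus-n.perl may apply (for example a length-ratio limit). The function name is hypothetical.

```python
def clean_corpus(src_lines, tgt_lines, min_len=1, max_len=80):
    """Keep only sentence pairs whose two sides both have min_len..max_len tokens."""
    kept = []
    for src, tgt in zip(src_lines, tgt_lines):
        n_src, n_tgt = len(src.split()), len(tgt.split())
        if min_len <= n_src <= max_len and min_len <= n_tgt <= max_len:
            kept.append((src.strip(), tgt.strip()))
    return kept


if __name__ == "__main__":
    src = ["a b", "", "x"]
    tgt = ["c", "d", " ".join(["w"] * 81)]
    # Pair 2 has an empty source side, pair 3 has an 81-token target side.
    for pair in clean_corpus(src, tgt):
        print(pair)
```

Filtering both sides together matters: the two output files must stay line-aligned, so a pair is kept or dropped as a unit.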

2.  Language model training P180

SRILM

Executed: ngram-count

Command: ngram-count -order n -text corpus -lm corpus.lm

Example:

~/srilm/lm/bin/i686/ngram-count -order 5 -unk -interpolate -wbdiscount \
-text ~/alignment-corpus/corpus.lowercased.en -lm corpus.lm
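The first thing an n-gram language model toolkit does is collect n-gram counts up to the chosen order; smoothing (Witten-Bell discounting with -wbdiscount) is then applied on top of those counts. The Python sketch below covers only the counting part, as an illustration of what -order controls. It is not SRILM's implementation, the function name is hypothetical, and the use of `<s>` and `</s>` as sentence boundary markers is an assumption of the sketch.

```python
from collections import Counter


def count_ngrams(sentences, order):
    """Count all n-grams of length 1..order, with sentence boundary markers."""
    counts = Counter()
    for sent in sentences:
        toks = ["<s>"] + sent.split() + ["</s>"]
        for n in range(1, order + 1):
            for i in range(len(toks) - n + 1):
                counts[tuple(toks[i:i + n])] += 1
    return counts


if __name__ == "__main__":
    counts = count_ngrams(["a b a b"], order=2)
    for gram, c in sorted(counts.items()):
        print(" ".join(gram), c)
```

With -order 5, as in the example above, the same loop would run up to 5-grams, which is why higher orders need much more training text to produce reliable counts.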

3. Training the translation system P161 P263

Executed: train-model.perl

Command: train-model.perl --root-dir . --f de --e en --corpus corpus/euro >& LOG

Example:

/home/robert/MTTOOLS/mosesdecoder-RELEASE-1.0/scripts/training/train-model.perl -root-dir train -corpus /home/robert/MTTOOLS/workspace/ZangHan/train/2013-xbmu+xmu-politics.clean -f ti -e ch -alignment grow-diag-final-and -reordering msd-bidirectional-fe -lm 0:5:/home/robert/MTTOOLS/workspace/ZangHan/LM/FBIS-ch.lm -external-bin-dir /home/robert/MTTOOLS/mosesdecoder-RELEASE-1.0/tools > training.out

4. Tuning (parameter tuning) P192

Executed: mert-moses.pl

Command: mert-moses.pl input-file reference-file decoder-executable decoder-config --mertdir mert-dir

Example:

/home/robert/MTTOOLS/mosesdecoder-RELEASE-1.0/scripts/training/mert-moses.pl /home/robert/MTTOOLS/workspace/ZangHan/dev/ti-src /home/robert/MTTOOLS/workspace/ZangHan/dev/ref /home/robert/MTTOOLS/mosesdecoder-RELEASE-1.0/bin/moses /home/robert/MTTOOLS/workspace/ZangHan/working/train/model/moses.ini --mertdir /home/robert/MTTOOLS/mosesdecoder-RELEASE-1.0/bin/ > mert.out

5. Testing (decoding)

(1)     Filter (filter the phrase table according to the test set) P78

Executed: filter-model-given-input.pl

Command: filter-model-given-input.pl filter-dir config input-file -Binarizer binarizer

Example:

/home/robert/MTTOOLS/mosesdecoder-RELEASE-1.0/scripts/training/filter-model-given-input.pl filtered-test mert-work/moses.ini /home/robert/MTTOOLS/workspace/ZangHan/dev/ti-src -Binarizer /home/robert/MTTOOLS/mosesdecoder-RELEASE-1.0/bin/processPhraseTable
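The point of filtering is that the decoder only ever needs phrase pairs whose source side actually occurs in the input to be translated, so everything else can be dropped before loading. A minimal Python sketch of that idea follows. It assumes the plain-text phrase table layout `source ||| target ||| scores` and a maximum phrase length of 7; the function names are hypothetical, and the real script also rewrites moses.ini and handles reordering tables and binarization, none of which is modeled here.

```python
def input_ngrams(sentences, max_len=7):
    """Collect every token n-gram (up to max_len) that occurs in the input."""
    grams = set()
    for sent in sentences:
        toks = sent.split()
        for i in range(len(toks)):
            for j in range(i + 1, min(i + max_len, len(toks)) + 1):
                grams.add(" ".join(toks[i:j]))
    return grams


def filter_phrase_table(table_lines, sentences, max_len=7):
    """Keep phrase-table lines whose source phrase occurs in the input."""
    grams = input_ngrams(sentences, max_len)
    kept = []
    for line in table_lines:
        # The source phrase is the first '|||'-separated field.
        source = " ".join(line.split("|||")[0].split())
        if source in grams:
            kept.append(line)
    return kept


if __name__ == "__main__":
    table = [
        "das haus ||| the house ||| 0.5",
        "die katze ||| the cat ||| 0.4",
        "haus ||| house ||| 0.9",
    ]
    for line in filter_phrase_table(table, ["das haus ist klein"]):
        print(line)
```

Because the filtered table depends on the input file, it has to be rebuilt whenever a different test set is translated.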

(2)     Decoder (decoding) P262

Executed: moses

Command: moses -f moses.ini < inputfile > outputfile 2> log.out

Example:

/home/robert/MTTOOLS/mosesdecoder-RELEASE-1.0/bin/moses -f filtered-test/moses.ini < /home/robert/MTTOOLS/workspace/ZangHan/dev/ti-src > /home/robert/MTTOOLS/workspace/ZangHan/working/test.translated.ch 2> /home/robert/MTTOOLS/workspace/ZangHan/working/test.out

(3)     BLEU (BLEU evaluation) P79

Executed: multi-bleu.perl

Command: multi-bleu.perl [-lc] reference < mt-output

Example:

/home/robert/MTTOOLS/mosesdecoder-RELEASE-1.0/scripts/generic/multi-bleu.perl -lc /home/robert/MTTOOLS/workspace/ZangHan/dev/ref < /home/robert/MTTOOLS/workspace/ZangHan/working/test.translated.ch
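For readers who want to see what the score measures, the following Python sketch computes corpus-level BLEU against a single reference: clipped n-gram precision for n = 1..4, combined by a geometric mean and multiplied by a brevity penalty. It is an illustration under stated simplifications (one reference per sentence, whitespace tokens, no lowercasing as -lc would do, score 0 when any n-gram order has no match) and is not guaranteed to reproduce the numbers printed by multi-bleu.perl. The function names are hypothetical.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of one tokenized sentence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU (0..100) with one reference per hypothesis."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngram_counts(h, n), ngram_counts(r, n)
            totals[n - 1] += sum(h_counts.values())
            # Clipping: an n-gram is credited at most as often as it
            # appears in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: punish output that is shorter than the reference.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)


if __name__ == "__main__":
    print(bleu(["the cat sat on the mat"], ["the cat sat on the mat today"]))
```

In the usage example the hypothesis matches the reference exactly except for being one word shorter, so all four precisions are 1 and only the brevity penalty lowers the score.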

 

 

For more detailed parameter descriptions, see manual-moses.pdf, http://www.statmt.org/moses/
