Moses: Build Baseline System (phrase-based translation system)
(2013-04-09 23:09:50)
1. Corpus preparation
(1) Tokenization
Executed: tokenizer.perl
Command: ./tokenizer.perl (-l [en|de|...]) (-threads 4) < textfile > tokenizedfile
Example:
~/mosesdecoder/scripts/tokenizer/tokenizer.perl -l en <
~/corpus/training/news-commentary-v7.fr-en.en
>
(2) Truecasing
Executed: train-truecaser.perl
Command: train-truecaser.perl --model MODEL --corpus CASED
Example:
~/mosesdecoder/scripts/recaser/train-truecaser.perl --model ~/corpus/truecase-model.en --corpus ~/corpus/news-commentary-v7.fr-en.tok.en
Executed: truecase.perl
Command: truecase.perl --model MODEL < IN > OUT
Example:
~/mosesdecoder/scripts/recaser/truecase.perl --model ~/corpus/truecase-model.en \
< ~/corpus/news-commentary-v7.fr-en.tok.en > ~/corpus/news-commentary-v7.fr-en.true.en
(3) Cleaning
Executed: clean-corpus-n.perl
Command: clean-corpus-n.perl corpus l1 l2 clean-corpus min max [lines retained file]
Example:
~/mosesdecoder/scripts/training/clean-corpus-n.perl ~/corpus/news-commentary-v7.fr-en.true fr en ~/corpus/news-commentary-v7.fr-en.clean 1 80
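To make the effect of the min and max arguments (1 and 80 above) concrete, the length filter can be approximated in plain shell. This is only a rough sketch, not the logic of clean-corpus-n.perl itself, which does more than this (for example, it also limits the length ratio between the two sides). The function name clean_pairs and its argument order are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of Moses): keep only sentence pairs whose
# token counts on BOTH sides lie between MIN and MAX.
# Tokens are whitespace-separated; lines must not contain tab characters.
# Usage: clean_pairs SRC TGT OUT_SRC OUT_TGT MIN MAX
clean_pairs() {
    # paste joins line i of SRC and line i of TGT with a tab
    paste "$1" "$2" | awk -F '\t' -v os="$3" -v ot="$4" -v min="$5" -v max="$6" '
    {
        ns = split($1, a, " ")   # number of tokens on the source side
        nt = split($2, b, " ")   # number of tokens on the target side
        if (ns >= min && ns <= max && nt >= min && nt <= max) {
            print $1 > os
            print $2 > ot
        }
    }'
}
```

With min set to 1, a pair is dropped as soon as one side is empty, which is the case that most often causes trouble later in word alignment.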
2. Language model training
SRILM
Executed: ngram-count
Command: ngram-count -order n -text corpus -lm corpus.lm
Example:
~/srilm/lm/bin/i686/ngram-count -order 5 -unk -interpolate -wbdiscount \
-text ~/alignment-corpus/corpus.lowercased.en -lm corpus.lm
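The -order option sets the maximum n-gram length that ngram-count collects. As a rough illustration of what is being counted (not of SRILM's discounting or its ARPA output format), the sketch below counts the n-grams of one fixed order with awk. The helper name count_ngrams is hypothetical; sentence boundary markers are written as <s> and </s>.

```shell
#!/bin/sh
# Hypothetical helper (not part of SRILM): print "count<TAB>n-gram" for every
# n-gram of order N in FILE (one sentence per line, space-separated tokens).
# Usage: count_ngrams N FILE
count_ngrams() {
    awk -v n="$1" '
    {
        # wrap each sentence in boundary markers before counting
        len = split("<s> " $0 " </s>", w, " ")
        for (i = 1; i + n - 1 <= len; i++) {
            gram = w[i]
            for (j = 1; j < n; j++) gram = gram " " w[i + j]
            count[gram]++
        }
    }
    END { for (g in count) printf "%d\t%s\n", count[g], g }
    ' "$2" | LC_ALL=C sort -k1,1nr -k2
}
```

Running count_ngrams 2 on a small tokenized file lists its bigrams, most frequent first; a real language model is then estimated from such counts with smoothing.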
3. Training the translation system
Executed: train-model.perl
Command: train-model.perl --root-dir . --f de --e en --corpus corpus/euro >& LOG
Example:
/home/robert/MTTOOLS/mosesdecoder-RELEASE-1.0/scripts/training/train-model.perl -root-dir train -corpus /home/robert/MTTOOLS/workspace/ZangHan/train/2013-xbmu+xmu-politics.clean -f ti -e ch -alignment grow-diag-final-and -reordering msd-bidirectional-fe -lm 0:5:/home/robert/MTTOOLS/workspace/ZangHan/LM/FBIS-ch.lm -external-bin-dir /home/robert/MTTOOLS/mosesdecoder-RELEASE-1.0/tools > training.out
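The -lm value in the example packs three fields into one colon-separated string: the factor, the n-gram order, and the path to the language model file. A small sketch of pulling these apart, for instance to check that the order matches the one used when the LM was built. The function name parse_lm_spec is hypothetical, and the sketch assumes there is no optional fourth field after the file name.

```shell
#!/bin/sh
# Hypothetical helper (not part of Moses): split a train-model.perl -lm value
# of the form factor:order:filename and print the parts one per line.
parse_lm_spec() {
    spec=$1
    factor=${spec%%:*}        # text before the first colon
    rest=${spec#*:}           # drop "factor:"
    order=${rest%%:*}         # text between the first and second colon
    file=${rest#*:}           # everything after the second colon
    printf 'factor=%s\norder=%s\nfile=%s\n' "$factor" "$order" "$file"
}

parse_lm_spec "0:5:/home/robert/MTTOOLS/workspace/ZangHan/LM/FBIS-ch.lm"
```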
4. Tuning
Executed: mert-moses.pl
Command: mert-moses.pl input-text references decoder-executable decoder.ini [--mertdir DIR]
Example:
/home/robert/MTTOOLS/mosesdecoder-RELEASE-1.0/scripts/training/mert-moses.pl /home/robert/MTTOOLS/workspace/ZangHan/dev/ti-src /home/robert/MTTOOLS/workspace/ZangHan/dev/ref /home/robert/MTTOOLS/mosesdecoder-RELEASE-1.0/bin/moses /home/robert/MTTOOLS/workspace/ZangHan/working/train/model/moses.ini --mertdir /home/robert/MTTOOLS/mosesdecoder-RELEASE-1.0/bin/ > mert.out
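The first two arguments are the source side of the tuning set and its reference translation, and the two have to be parallel line by line. A quick sanity check before starting a long tuning run might look like the sketch below. The name check_parallel is made up, a single reference file is assumed, and only line counts are compared, not content.

```shell
#!/bin/sh
# Hypothetical helper (not part of Moses): succeed only if both files are
# readable and have the same number of lines.
# Usage: check_parallel SOURCE_FILE REFERENCE_FILE
check_parallel() {
    if [ ! -r "$1" ] || [ ! -r "$2" ]; then
        echo "ERROR: cannot read $1 or $2" >&2
        return 2
    fi
    # $(( ... )) strips the padding some wc implementations print
    src_lines=$(( $(wc -l < "$1") ))
    ref_lines=$(( $(wc -l < "$2") ))
    if [ "$src_lines" -eq "$ref_lines" ]; then
        echo "OK: $src_lines lines in each file"
    else
        echo "MISMATCH: $1 has $src_lines lines, $2 has $ref_lines" >&2
        return 1
    fi
}
```

For the example above this would be called as check_parallel followed by the dev/ti-src and dev/ref paths.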
5. Testing
(1) Filtering the model
Executed: filter-model-given-input.pl
Command: filter-model-given-input.pl filter-dir config input-file -Binarizer binarizer
Example:
/home/robert/MTTOOLS/mosesdecoder-RELEASE-1.0/scripts/training/filter-model-given-input.pl filtered-test mert-work/moses.ini /home/robert/MTTOOLS/workspace/ZangHan/dev/ti-src -Binarizer /home/robert/MTTOOLS/mosesdecoder-RELEASE-1.0/bin/processPhraseTable
(2) Decoding
Executed: moses
Command: moses -f moses.ini < inputfile > outputfile 2> log.out
Example:
/home/robert/MTTOOLS/mosesdecoder-RELEASE-1.0/bin/moses -f filtered-test/moses.ini < /home/robert/MTTOOLS/workspace/ZangHan/dev/ti-src > /home/robert/MTTOOLS/workspace/ZangHan/working/test.translated.ch 2> /home/robert/MTTOOLS/workspace/ZangHan/working/test.out
(3) BLEU scoring
Executed: multi-bleu.perl
Command: multi-bleu.perl [-lc] reference < mt-output
Example:
/home/robert/MTTOOLS/mosesdecoder-RELEASE-1.0/scripts/generic/multi-bleu.perl -lc /home/robert/MTTOOLS/workspace/ZangHan/dev/ref < /home/robert/MTTOOLS/workspace/ZangHan/working/test.translated.ch
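BLEU combines clipped n-gram precisions (n = 1 to 4) with a brevity penalty, and -lc lowercases both sides before comparing. As an illustration of the first ingredient only, the sketch below computes clipped unigram precision for two line-aligned files. It is not a substitute for multi-bleu.perl; the name unigram_precision is hypothetical, a single reference is assumed, and the lowercasing via tr only affects characters covered by the current locale.

```shell
#!/bin/sh
# Hypothetical helper (not part of Moses): clipped unigram precision of HYP
# against REF, one sentence per line, lowercased in the spirit of -lc.
# Usage: unigram_precision REF HYP
unigram_precision() {
    paste "$1" "$2" | tr '[:upper:]' '[:lower:]' | awk -F '\t' '
    {
        split("", refcnt)                      # reset per-sentence counts
        nr = split($1, r, " ")
        for (i = 1; i <= nr; i++) refcnt[r[i]]++
        nh = split($2, h, " ")
        for (i = 1; i <= nh; i++) {
            total++
            # a hypothesis token matches only while the reference still
            # has an unused copy of it (this is the clipping step)
            if (refcnt[h[i]] > 0) { matched++; refcnt[h[i]]-- }
        }
    }
    END {
        if (total > 0) printf "%.4f\n", matched / total
        else print "0.0000"
    }'
}
```

Clipping is what stops a hypothesis from scoring well by repeating one correct word many times.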
For a more detailed description of the parameters, see manual-moses.pdf.