A batch script: converting PDF files in several different directories to TXT files
(2013-07-13 14:02:23)
Tags: file, script, directory, multiple, ie
Category: Small problems, small tricks
(1) Folder layout:
root ------- src      # holds the PDF files to be converted
The script handles file names that contain spaces.
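Spaces are a problem because step 3 of the script reads tempdir.txt with "for /f", which by default splits each line at spaces and tabs and keeps only the first token. The Python function below is a rough model of that tokenization, written for illustration only: it is not cmd.exe itself, and the name first_token and the sample path are hypothetical. It shows why ending each line with ";" and reading it back with "delims=;" keeps the whole path.

```python
# Rough model (an assumption, not cmd.exe) of how `for /f` with the
# default tokens=1 picks the first token of a line.


def first_token(line: str, delims: str = " \t") -> str:
    """Return the first token of `line`, splitting on any char in `delims`.

    Like `for /f`, leading delimiters are skipped and the token ends at the
    next delimiter. The default delimiter set is space and tab.
    """
    token = []
    started = False
    for ch in line.rstrip("\r\n"):
        if ch in delims:
            if started:
                break      # end of the first token
            continue       # skip leading delimiters
        started = True
        token.append(ch)
    return "".join(token)


line = r"C:\work\PDF\annual report.pdf;"
print(first_token(line))              # -> C:\work\PDF\annual
print(first_token(line, delims=";"))  # -> C:\work\PDF\annual report.pdf
```

With the default delimiters the name is cut at the space; with ";" as the only delimiter the full path survives, which is exactly what the script relies on.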
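(2) Usage:
PS: the script relies on pdf2txt.py, the PDF-to-text Python tool that ships with PDFMiner:
C:\PDFMiner\tools\pdf2txt.py
PDFMiner, a free tool, must be installed beforehand.
The script: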
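For each PDF, the script runs pdf2txt.py with "-o" pointing at a file in the destination folder that keeps the PDF's base name. A minimal Python sketch of that command line, assuming Python is on PATH as "python" and using a hypothetical helper name, pdf2txt_command. Building the arguments as a list means paths with spaces need no manual quoting.

```python
from __future__ import annotations

from pathlib import PureWindowsPath


def pdf2txt_command(pdf_file: str, tool_dir: str, dst_dir: str) -> list[str]:
    """Build the argument list that converts one PDF with pdf2txt.py.

    The output file goes to `dst_dir` and keeps the PDF's base name,
    with the extension changed to .txt.
    """
    pdf = PureWindowsPath(pdf_file)
    script = PureWindowsPath(tool_dir) / "pdf2txt.py"
    out = PureWindowsPath(dst_dir) / (pdf.stem + ".txt")
    # One list element per argument, so spaces inside paths stay intact.
    return ["python", str(script), "-o", str(out), str(pdf)]


cmd = pdf2txt_command(r"C:\docs\sub dir\my file.pdf",
                      r"C:\PDFMiner\tools", r"C:\docs\Result")
print(cmd)
```

On a Windows machine with PDFMiner installed, the list can be passed directly to subprocess.run(cmd, check=True).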
@echo off
rem ****************************************************************************
rem * Step 1: set the PDF path              --- pdf.path
rem *         set the destination path      --- dst.path  (where text files go)
rem *         set the tool path             --- tool.path (folder of pdf2txt.py)
rem *         set the file extension        --- extend    (files to search for)
rem *         delete the old "tempdir.txt"  (list of all found *.extend files)
rem *
rem * Step 2: search for all files with extension %extend% in %pdf.path%
rem *         and its sub-folders using the "for" command with option "/r",
rem *         then save each file name, with its full path, to "tempdir.txt".
rem *
rem * Step 3: call the Python program to convert the PDF files to text files
rem *         and save them in the folder %dst.path% under the same file name.
rem *
rem * Created by Juking at 2013-7-13
rem ****************************************************************************
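rem Step 1: set the global variables.
set pdf.path=.\PDF
set dst.path=.\Result
set tool.path=C:\PDFMiner\tools
set extend=pdf
if exist tempdir.txt del tempdir.txt
if not exist "%dst.path%" mkdir "%dst.path%"
rem Step 2: search for the files and save their names in tempdir.txt.
rem Each name ends with ";" so that step 3 can read it with delims=;
rem and file names containing spaces are not cut short.
for /r "%pdf.path%" %%a in (*.%extend%) do (
    echo %%a;>>tempdir.txt
)
rem Step 3: convert each PDF to text and save it to %dst.path%
rem under the same base name with a .txt extension.
for /f "delims=;" %%i in (.\tempdir.txt) do (
    python "%tool.path%\pdf2txt.py" -o "%dst.path%\%%~ni.txt" "%%i"
)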
PDFMiner must be installed before running the script.
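For comparison, here is a minimal Python sketch of steps 2 and 3. It is a port written for illustration, not part of the original script: it assumes Python 3.7+ and that pdf2txt.py is run as "python <tool_dir>\pdf2txt.py", and the helper names find_files, write_list, read_list and convert_all are hypothetical. It keeps the tempdir.txt round trip with ";" only to mirror the batch version.

```python
from __future__ import annotations

import os
import subprocess
from pathlib import Path


def find_files(root: str, extension: str = "pdf") -> list[Path]:
    """Step 2: walk `root` and its sub-folders (like `for /r`) and return the
    absolute paths of all files ending in `.extension`, ignoring case."""
    suffix = "." + extension.lower()
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(suffix):
                found.append(Path(dirpath, name).resolve())
    return sorted(found)


def write_list(paths: list[Path], list_file: str) -> None:
    """Save one full path per line, followed by ';' as the batch script does."""
    with open(list_file, "w", encoding="utf-8") as f:
        for p in paths:
            f.write(f"{p};\n")


def read_list(list_file: str) -> list[Path]:
    """Read the paths back; splitting on ';' keeps spaces inside names."""
    with open(list_file, encoding="utf-8") as f:
        return [Path(line.split(";")[0]) for line in f if line.strip()]


def convert_all(list_file: str, tool_dir: str, dst_dir: str,
                dry_run: bool = True) -> list[list[str]]:
    """Step 3: build one pdf2txt.py command per listed PDF, writing
    <dst_dir>/<same base name>.txt; run them only when dry_run is False."""
    if not dry_run:
        os.makedirs(dst_dir, exist_ok=True)
    commands = []
    for pdf in read_list(list_file):
        out = Path(dst_dir) / (pdf.stem + ".txt")
        cmd = ["python", str(Path(tool_dir) / "pdf2txt.py"),
               "-o", str(out), str(pdf)]
        commands.append(cmd)
        if not dry_run:
            subprocess.run(cmd, check=True)
    return commands
```

A typical run would be write_list(find_files(r".\PDF"), "tempdir.txt") followed by convert_all("tempdir.txt", r"C:\PDFMiner\tools", r".\Result", dry_run=False). The default dry_run=True only returns the commands, which is handy for checking the output names first.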