tf.train.Saver in TensorFlow fails to restore data
(2017-09-19 19:48:21)
Tags: tensorflow, tf.train.Saver, restore, access denied
Category: Deep Learning
I created a tf.train.Saver object in TensorFlow and tried to load a model saved under the path celebA_64_96_96 via the restore function. The loading code was:
with tf.Session() as sess:
    saver.restore(sess, "D:/celebA_64_96_96")
However, running this code fails with the following error ("拒绝访问" is the Chinese Windows message for "Access is denied"):

2017-09-19 19:41:52.328625: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\util\tensor_slice_reader.cc:95] Could not open D:\celebA_64_96_96: Unknown: NewRandomAccessFile failed to Create/Open: D:\celebA_64_96_96-2 : 拒绝访问。
2017-09-19 19:41:52.338641: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\framework\op_kernel.cc:1158] Data loss: Unable to open table file D:\celebA_64_96_96: Unknown: NewRandomAccessFile failed to Create/Open: D:\celebA_64_96_96-2 : 拒绝访问。
2017-09-19 19:41:52.346355: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\framework\op_kernel.cc:1158] Data loss: Unable to open table file D:\celebA_64_96_96: Unknown: NewRandomAccessFile failed to Create/Open: D:\celebA_64_96_96 : 拒绝访问。
Traceback (most recent call last):
  File "C:\Python35\lib\site-packages\tensorflow\python\client\session.py", line 1139, in _do_call
    return fn(*args)
  File "C:\Python35\lib\site-packages\tensorflow\python\client\session.py", line 1121, in _run_fn
    status, run_metadata)
  File "C:\Python35\lib\contextlib.py", line 66, in __exit__
    next(self.gen)
  File "C:\Python35\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.DataLossError: Unable to open table file D:\celebA_64_96_96-2: Unknown: NewRandomAccessFile failed to Create/Open: D:\celebA_64_96_96-2 : 拒绝访问。 ; Input/output error
	 [[Node: save_1/RestoreV2_47 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_arg_save_1/Const_0_0, save_1/RestoreV2_47/tensor_names, save_1/RestoreV2_47/shape_and_slices)]]
	 [[Node: save_1/RestoreV2_20/_3 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_102_save_1/RestoreV2_20", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "test_share.py", line 27, in <module>
    saver.restore(sess, "D:\\celebA_64_96_96")
  File "C:\Python35\lib\site-packages\tensorflow\python\training\saver.py", line 1548, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "C:\Python35\lib\site-packages\tensorflow\python\client\session.py", line 789, in run
    run_metadata_ptr)
  File "C:\Python35\lib\site-packages\tensorflow\python\client\session.py", line 997, in _run
    feed_dict_string, options, run_metadata)
  File "C:\Python35\lib\site-packages\tensorflow\python\client\session.py", line 1132, in _do_run
    target_list, options, run_metadata)
  File "C:\Python35\lib\site-packages\tensorflow\python\client\session.py", line 1152, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.DataLossError: Unable to open table file D:\celebA_64_96_96: Unknown: NewRandomAccessFile failed to Create/Open: D:\celebA_64_96_96 : 拒绝访问。 ; Input/output error
	 [[Node: save_1/RestoreV2_47 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_arg_save_1/Const_0_0, save_1/RestoreV2_47/tensor_names, save_1/RestoreV2_47/shape_and_slices)]]
	 [[Node: save_1/RestoreV2_20/_3 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_102_save_1/RestoreV2_20", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]

Caused by op 'save_1/RestoreV2_47', defined at:
  File "test_share.py", line 26, in <module>
    saver = tf.train.Saver()
  File "C:\Python35\lib\site-packages\tensorflow\python\training\saver.py", line 1139, in __init__
    self.build()
  File "C:\Python35\lib\site-packages\tensorflow\python\training\saver.py", line 1170, in build
    restore_sequentially=self._restore_sequentially)
  File "C:\Python35\lib\site-packages\tensorflow\python\training\saver.py", line 691, in build
    restore_sequentially, reshape)
  File "C:\Python35\lib\site-packages\tensorflow\python\training\saver.py", line 407, in _AddRestoreOps
    tensors = self.restore_op(filename_tensor, saveable, preferred_shard)
  File "C:\Python35\lib\site-packages\tensorflow\python\training\saver.py", line 247, in restore_op
    [spec.tensor.dtype])[0])
  File "C:\Python35\lib\site-packages\tensorflow\python\ops\gen_io_ops.py", line 640, in restore_v2
    dtypes=dtypes, name=name)
  File "C:\Python35\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 767, in apply_op
    op_def=op_def)
  File "C:\Python35\lib\site-packages\tensorflow\python\framework\ops.py", line 2506, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "C:\Python35\lib\site-packages\tensorflow\python\framework\ops.py", line 1269, in __init__
    self._traceback = _extract_stack()

DataLossError (see above for traceback): Unable to open table file D:\celebA_64_96_96: Unknown: NewRandomAccessFile failed to Create/Open: D:\celebA_64_96_96 : 拒绝访问。 ; Input/output error
	 [[Node: save_1/RestoreV2_47 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_arg_save_1/Const_0_0, save_1/RestoreV2_47/tensor_names, save_1/RestoreV2_47/shape_and_slices)]]
	 [[Node: save_1/RestoreV2_20/_3 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_102_save_1/RestoreV2_20", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
At first I assumed the file permissions were insufficient, but even after granting generous permissions the same error persisted. After some searching, I found the solution:
The tf.train.Saver API documents the second argument of restore(), save_path, as follows:

The save_path argument is typically a value previously returned from a save() call, or a call to latest_checkpoint().

The key point here is that save_path is the return value of a save() call, so the next step is to check what save() returns. The official API describes its return value as follows:
A string: path at which the variables were saved. If the saver is sharded, this string ends with: '-?????-of-nnnnn' where 'nnnnn' is the number of shards created. If the saver is empty, returns None.
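To make the relationship concrete, here is a minimal save-side sketch; the stand-in variable is hypothetical, while the prefix and step number are taken from my checkpoint:

import tensorflow as tf

w = tf.Variable(tf.zeros([1]), name="w")  # stand-in variable for illustration
saver = tf.train.Saver()

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # save() appends the global step and returns the full checkpoint prefix,
    # e.g. "D:/celebA_64_96_96/DCGAN.model-9495" -- exactly the string that
    # restore() later expects as its save_path argument.
    save_path = saver.save(sess, "D:/celebA_64_96_96/DCGAN.model",
                           global_step=9495)
    print(save_path)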
So save_path goes all the way down to the model name: the save_path passed to restore() must be the directory plus the model name. You can find that model-name string by opening the checkpoint file inside the save directory; its model_checkpoint_path field holds the model name, which in my case is DCGAN.model-9495. The failing code above is therefore corrected as follows:
with tf.Session() as sess:
    saver.restore(sess, "D:/celebA_64_96_96/DCGAN.model-9495")
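For reference, the checkpoint file mentioned above is a small plain-text proto in the save directory; for this model it would look roughly like the following (contents are illustrative):

model_checkpoint_path: "DCGAN.model-9495"
all_model_checkpoint_paths: "DCGAN.model-9495"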
What if we want a more convenient way to restore the data? TensorFlow provides an API that looks up the checkpoint under a given directory. Assuming the checkpoint directory is my_model_path, it is used like this:
saver = tf.train.Saver()
ckpt = tf.train.get_checkpoint_state(my_model_path)
if ckpt and ckpt.model_checkpoint_path:
    saver.restore(sess, ckpt.model_checkpoint_path)
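An alternative that the restore() documentation quoted earlier also mentions is tf.train.latest_checkpoint, which resolves the newest checkpoint prefix in a directory directly; a sketch under the same my_model_path assumption, inside an open session:

# Returns the full prefix of the most recent checkpoint,
# or None if no checkpoint is found in the directory.
ckpt_path = tf.train.latest_checkpoint(my_model_path)
if ckpt_path is not None:
    saver.restore(sess, ckpt_path)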
The official API definition of tf.train.get_checkpoint_state is as follows:

get_checkpoint_state(
    checkpoint_dir,
    latest_filename=None
)

It is defined in tensorflow/python/training/saver.py and returns the CheckpointState proto from the "checkpoint" file: if the checkpoint file contains a valid CheckpointState proto, that proto is returned.

Args:
checkpoint_dir: the directory of checkpoints.
latest_filename: (optional) the name of the checkpoint file; defaults to 'checkpoint'.

Returns:
A CheckpointState if the state was available, None otherwise.

Raises:
ValueError: if the checkpoint read does not have model_checkpoint_path set.
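As a quick check, you can print what get_checkpoint_state actually returns; a minimal sketch (the printed values in the comments are illustrative for my directory):

ckpt = tf.train.get_checkpoint_state("D:/celebA_64_96_96")
if ckpt is not None:
    # model_checkpoint_path is the most recent checkpoint prefix;
    # all_model_checkpoint_paths lists every checkpoint still on disk.
    print(ckpt.model_checkpoint_path)       # e.g. D:/celebA_64_96_96/DCGAN.model-9495
    print(ckpt.all_model_checkpoint_paths)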