tf.train.Saver in TensorFlow fails to restore data
(2017-09-19 19:48:21)
Tags: tensorflow, tf.train.Saver, restore, access denied
Category: Deep Learning
I created a tf.train.Saver object in TensorFlow and tried to load a model saved under the path celebA_64_96_96 via the restore function. The loading code was:
with tf.Session() as sess:
    saver.restore(sess, "D:/celebA_64_96_96")
However, running this code fails with the following error ("拒绝访问" is the Chinese Windows message for "Access is denied"):

2017-09-19 19:41:52.328625: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\util\tensor_slice_reader.cc:95] Could not open D:\celebA_64_96_96: Unknown: NewRandomAccessFile failed to Create/Open: D:\celebA_64_96_96-2 : 拒绝访问。
2017-09-19 19:41:52.338641: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\framework\op_kernel.cc:1158] Data loss: Unable to open table file D:\celebA_64_96_96: Unknown: NewRandomAccessFile failed to Create/Open: D:\celebA_64_96_96-2 : 拒绝访问。
2017-09-19 19:41:52.346355: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\framework\op_kernel.cc:1158] Data loss: Unable to open table file D:\celebA_64_96_96: Unknown: NewRandomAccessFile failed to Create/Open: D:\celebA_64_96_96 : 拒绝访问。
Traceback (most recent call last):
  File "C:\Python35\lib\site-packages\tensorflow\python\client\session.py", line 1139, in _do_call
    return fn(*args)
  File "C:\Python35\lib\site-packages\tensorflow\python\client\session.py", line 1121, in _run_fn
    status, run_metadata)
  File "C:\Python35\lib\contextlib.py", line 66, in __exit__
    next(self.gen)
  File "C:\Python35\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.DataLossError: Unable to open table file D:\celebA_64_96_96-2: Unknown: NewRandomAccessFile failed to Create/Open: D:\celebA_64_96_96-2 : 拒绝访问。 ; Input/output error
	 [[Node: save_1/RestoreV2_47 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_arg_save_1/Const_0_0, save_1/RestoreV2_47/tensor_names, save_1/RestoreV2_47/shape_and_slices)]]
	 [[Node: save_1/RestoreV2_20/_3 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_102_save_1/RestoreV2_20", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "test_share.py", line 27, in <module>
    saver.restore(sess, "D:\\celebA_64_96_96")
  File "C:\Python35\lib\site-packages\tensorflow\python\training\saver.py", line 1548, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "C:\Python35\lib\site-packages\tensorflow\python\client\session.py", line 789, in run
    run_metadata_ptr)
  File "C:\Python35\lib\site-packages\tensorflow\python\client\session.py", line 997, in _run
    feed_dict_string, options, run_metadata)
  File "C:\Python35\lib\site-packages\tensorflow\python\client\session.py", line 1132, in _do_run
    target_list, options, run_metadata)
  File "C:\Python35\lib\site-packages\tensorflow\python\client\session.py", line 1152, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.DataLossError: Unable to open table file D:\celebA_64_96_96: Unknown: NewRandomAccessFile failed to Create/Open: D:\celebA_64_96_96 : 拒绝访问。 ; Input/output error
	 [[Node: save_1/RestoreV2_47 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_arg_save_1/Const_0_0, save_1/RestoreV2_47/tensor_names, save_1/RestoreV2_47/shape_and_slices)]]
	 [[Node: save_1/RestoreV2_20/_3 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_102_save_1/RestoreV2_20", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]

Caused by op 'save_1/RestoreV2_47', defined at:
  File "test_share.py", line 26, in <module>
    saver = tf.train.Saver()
  File "C:\Python35\lib\site-packages\tensorflow\python\training\saver.py", line 1139, in __init__
    self.build()
  File "C:\Python35\lib\site-packages\tensorflow\python\training\saver.py", line 1170, in build
    restore_sequentially=self._restore_sequentially)
  File "C:\Python35\lib\site-packages\tensorflow\python\training\saver.py", line 691, in build
    restore_sequentially, reshape)
  File "C:\Python35\lib\site-packages\tensorflow\python\training\saver.py", line 407, in _AddRestoreOps
    tensors = self.restore_op(filename_tensor, saveable, preferred_shard)
  File "C:\Python35\lib\site-packages\tensorflow\python\training\saver.py", line 247, in restore_op
    [spec.tensor.dtype])[0])
  File "C:\Python35\lib\site-packages\tensorflow\python\ops\gen_io_ops.py", line 640, in restore_v2
    dtypes=dtypes, name=name)
  File "C:\Python35\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 767, in apply_op
    op_def=op_def)
  File "C:\Python35\lib\site-packages\tensorflow\python\framework\ops.py", line 2506, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "C:\Python35\lib\site-packages\tensorflow\python\framework\ops.py", line 1269, in __init__
    self._traceback = _extract_stack()

DataLossError (see above for traceback): Unable to open table file D:\celebA_64_96_96: Unknown: NewRandomAccessFile failed to Create/Open: D:\celebA_64_96_96 : 拒绝访问。 ; Input/output error
	 [[Node: save_1/RestoreV2_47 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_arg_save_1/Const_0_0, save_1/RestoreV2_47/tensor_names, save_1/RestoreV2_47/shape_and_slices)]]
	 [[Node: save_1/RestoreV2_20/_3 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_102_save_1/RestoreV2_20", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
At first I assumed the file permissions were insufficient, but even after granting generous permissions the same error persisted. After some searching, I found the solution:
The tf.train.Saver API documents the second argument of restore(), save_path, as follows:

The save_path argument is typically a value previously returned from a save() call, or a call to latest_checkpoint().

The key point here is that save_path is the return value of a save() call, so the next step is to check what save() returns. The official API describes its return value as follows:
A string: path at which the variables were saved. If the saver is sharded, this string ends with: '-?????-of-nnnnn' where 'nnnnn' is the number of shards created. If the saver is empty, returns None.
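To make the relationship concrete, here is a minimal save-side sketch; the stand-in variable is hypothetical, while the prefix and step number are taken from my checkpoint:

import tensorflow as tf

w = tf.Variable(tf.zeros([1]), name="w")  # stand-in variable for illustration
saver = tf.train.Saver()

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # save() appends the global step and returns the full checkpoint prefix,
    # e.g. "D:/celebA_64_96_96/DCGAN.model-9495" -- exactly the string that
    # restore() later expects as its save_path argument.
    save_path = saver.save(sess, "D:/celebA_64_96_96/DCGAN.model",
                           global_step=9495)
    print(save_path)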
So save_path goes all the way down to the model name: the save_path passed to restore() must be the directory plus the model name. You can find that model-name string by opening the checkpoint file inside the save directory; its model_checkpoint_path field holds the model name, which in my case is DCGAN.model-9495. The failing code above is therefore corrected as follows:
with tf.Session() as sess:
    saver.restore(sess, "D:/celebA_64_96_96/DCGAN.model-9495")
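For reference, the checkpoint file mentioned above is a small plain-text proto in the save directory; for this model it would look roughly like the following (contents are illustrative):

model_checkpoint_path: "DCGAN.model-9495"
all_model_checkpoint_paths: "DCGAN.model-9495"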
What if we want a more convenient way to restore the data? TensorFlow provides an API that looks up the checkpoint under a given directory. Assuming the checkpoint directory is my_model_path, it is used like this:
saver = tf.train.Saver()
ckpt = tf.train.get_checkpoint_state(my_model_path)
if ckpt and ckpt.model_checkpoint_path:
    saver.restore(sess, ckpt.model_checkpoint_path)
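An alternative that the restore() documentation quoted earlier also mentions is tf.train.latest_checkpoint, which resolves the newest checkpoint prefix in a directory directly; a sketch under the same my_model_path assumption, inside an open session:

# Returns the full prefix of the most recent checkpoint,
# or None if no checkpoint is found in the directory.
ckpt_path = tf.train.latest_checkpoint(my_model_path)
if ckpt_path is not None:
    saver.restore(sess, ckpt_path)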
The official API definition of tf.train.get_checkpoint_state is as follows:

get_checkpoint_state(
    checkpoint_dir,
    latest_filename=None
)

It is defined in tensorflow/python/training/saver.py and returns the CheckpointState proto from the "checkpoint" file: if the checkpoint file contains a valid CheckpointState proto, that proto is returned.

Args:
checkpoint_dir: the directory of checkpoints.
latest_filename: (optional) the name of the checkpoint file; defaults to 'checkpoint'.

Returns:
A CheckpointState if the state was available, None otherwise.

Raises:
ValueError: if the checkpoint read does not have model_checkpoint_path set.
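As a quick check, you can print what get_checkpoint_state actually returns; a minimal sketch (the printed values in the comments are illustrative for my directory):

ckpt = tf.train.get_checkpoint_state("D:/celebA_64_96_96")
if ckpt is not None:
    # model_checkpoint_path is the most recent checkpoint prefix;
    # all_model_checkpoint_paths lists every checkpoint still on disk.
    print(ckpt.model_checkpoint_path)       # e.g. D:/celebA_64_96_96/DCGAN.model-9495
    print(ckpt.all_model_checkpoint_paths)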