【Exception description】 This job had been running fine for n days, but starting on January 4th it began failing in various ways. This is one of the failures:
2015-01-15 09:43:12,250 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
2015-01-15 09:43:12,252 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
2015-01-15 09:43:12,412 INFO [main] org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2015-01-15 09:43:12,471 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2015-01-15 09:43:12,471 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system started
2015-01-15 09:43:12,482 INFO [main] org.apache.hadoop.mapred.YarnChild: Executing with tokens:
2015-01-15 09:43:12,482 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: mapreduce.job, Service: job_1416472102935_157304, Ident: (org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier@540984b)
2015-01-15 09:43:12,580 INFO [main] org.apache.hadoop.mapred.YarnChild: Sleeping for 0ms before retrying again. Got null now.
2015-01-15 09:43:12,932 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
2015-01-15 09:43:12,933 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
2015-01-15 09:43:12,997 INFO [main] org.apache.hadoop.mapred.YarnChild: mapreduce.cluster.local.dir for child: /data1/usercache/hdfs/appcache/application_1416472102935_157304,/data2/usercache/hdfs/appcache/application_1416472102935_157304,/data3/usercache/hdfs/appcache/application_1416472102935_157304,/data4/usercache/hdfs/appcache/application_1416472102935_157304,/data5/usercache/hdfs/appcache/application_1416472102935_157304,/data6/usercache/hdfs/appcache/application_1416472102935_157304,/data7/usercache/hdfs/appcache/application_1416472102935_157304,/data8/usercache/hdfs/appcache/application_1416472102935_157304,/data9/usercache/hdfs/appcache/application_1416472102935_157304,/data10/usercache/hdfs/appcache/application_1416472102935_157304,/data11/usercache/hdfs/appcache/application_1416472102935_157304
2015-01-15 09:43:13,090 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
2015-01-15 09:43:13,091 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
2015-01-15 09:43:13,161 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
2015-01-15 09:43:13,161 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
2015-01-15 09:43:13,162 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
2015-01-15 09:43:13,162 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
2015-01-15 09:43:13,163 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.local.dir is deprecated. Instead, use mapreduce.cluster.local.dir
2015-01-15 09:43:13,163 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: job.local.dir is deprecated. Instead, use mapreduce.job.local.dir
2015-01-15 09:43:13,164 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
2015-01-15 09:43:13,286 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
2015-01-15 09:43:13,698 INFO [main] org.apache.hadoop.mapred.Task: Using ResourceCalculatorProcessTree : [ ]
2015-01-15 09:43:13,965 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: hdfs://hadoop-tfchinaso/user/changzhijun/workspace/ods/access/stage/000000_0:33565387+11398775
2015-01-15 09:43:13,993 ERROR [main] org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hdfs (auth:SIMPLE) cause:java.io.IOException: Cannot seek after EOF
2015-01-15 09:43:13,994 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.io.IOException: Cannot seek after EOF
at org.apache.hadoop.hdfs.DFSInputStream.seek(DFSInputStream.java:1230)
at org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:50)
at org.apache.hadoop.mapred.LineRecordReader.(LineRecordReader.java:123)
at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.(MapTask.java:167)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:408)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
2015-01-15 09:43:14,004 INFO [main] org.apache.hadoop.mapred.Task: Runnning cleanup for the task
2015-01-15 09:43:14,010 WARN [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: Could not delete hdfs://hadoop-tfchinaso/user/changzhijun/workspace/ods/access/ods/_temporary/1/_temporary/attempt_1416472102935_157304_m_000002_0
2015-01-15 09:43:14,115 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping MapTask metrics system...
2015-01-15 09:43:14,116 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system stopped.
2015-01-15 09:43:14,116 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system shutdown complete.
【Solution】
Problem analysis: the log makes it fairly clear that reading the input file failed. Corruption of the file itself is unlikely, because a retried read goes to a different HDFS replica; the more plausible cause is a transient network problem that broke the read mid-stream.
Immediate workaround: simply rerun the job a few times and it goes through!
Long-term fix: address the stability of the cluster itself.
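The "rerun it a few times" workaround can be sketched as a small retry helper. This is a hypothetical illustration, not part of Hadoop: `withRetry` and the simulated flaky read below are made up for this example, and a real job-level retry would resubmit the task rather than wrap a single call.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

public class RetryRead {
    // Re-run the read up to maxAttempts times. A transient failure
    // (a bad replica or a network blip) often succeeds on retry,
    // which mirrors the "just rerun the job" workaround above.
    static <T> T withRetry(Callable<T> op, int maxAttempts) throws Exception {
        IOException last = null;
        for (int i = 1; i <= maxAttempts; i++) {
            try {
                return op.call();
            } catch (IOException e) {
                last = e; // remember the failure and try again
            }
        }
        throw last; // all attempts failed; give up
    }

    public static void main(String[] args) throws Exception {
        // Simulated flaky source: fails twice, then succeeds.
        final int[] calls = {0};
        String result = withRetry(() -> {
            if (++calls[0] < 3) throw new IOException("Cannot seek after EOF");
            return "ok";
        }, 5);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Retrying only masks the symptom, of course; if the failures are frequent, the underlying cluster instability still needs to be fixed.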
[Screenshots of the successful and failed runs were attached here; the image links are no longer resolvable.]