Table of Contents
Symptom
Suspected cause
Reproduction
Log analysis
Further searching
Result

appTokens throws FileNotFoundException when restarting the Namenode in Hadoop

Jun 07, 2016 pm 04:37 PM


Symptom

The error is as follows:

Application application_1405852606905_0014 failed 3 times due to AM Container for appattempt_1405852606905_0014_000003 exited with exitCode: -1000 due to: RemoteTrace: java.io.FileNotFoundException: File does not exist: hdfs://mycluster:8020/user/kpi/.staging/job_1405852606905_0014/appTokens at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:809)

We also noticed that the error only occurred after a nodemanager restart.
A first search of Google and the issue tracker for "apptokens FileNotFoundException" turned up nothing relevant.

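Suspected cause

There are two ways the file could be missing: 1. the client never uploaded it successfully; 2. the upload succeeded, but something deleted the file afterwards.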

Reproduction

Since nothing turned up online, we tried to reproduce the problem in a test environment by running a sleep job:

cd /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce;
hadoop jar hadoop-mapreduce-client-*-tests.jar sleep -Dmapred.job.queue.name=sleep -m5 -r5 -mt 60000 -rt 30000 -recordt 1000

After restarting the nodemanager, the error appears.

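Log analysis

However, the AM log was nowhere to be found. Where had it gone? All of our Hadoop environments have log aggregation enabled (yarn.log-aggregation-enable), and the logs of the failed job had simply been deleted (possibly a bug). After turning log aggregation off, we found the AM log among the container logs.
The ResourceManager log, the NameNode log, and the HDFS audit log (hdfs-audit.log) are also worth checking.
The AM log suggests that the first attempt succeeded, and the HDFS audit log shows the staging directory being deleted: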

cmd=delete  src=/user/kpi/.staging/job_1405852606905_0013

This confirms that the directory was deleted, which caused the job to fail afterwards. But who deleted it?
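The audit-log check above can be automated. The following is a minimal sketch, assuming the audit entries contain whitespace-separated `cmd=` and `src=` fields as in the line quoted above. The class name `StagingDeleteFinder` and the sample lines are hypothetical, made up for illustration; they are not taken from a real hdfs-audit.log.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: lists .staging paths removed by delete operations. */
final class StagingDeleteFinder {
    // Matches "cmd=delete" followed by whitespace and the src field.
    private static final Pattern DELETE_SRC =
            Pattern.compile("\\bcmd=delete\\s+src=(\\S+)");

    static List<String> findStagingDeletes(List<String> auditLines) {
        List<String> deleted = new ArrayList<>();
        for (String line : auditLines) {
            Matcher m = DELETE_SRC.matcher(line);
            // Keep only deletes whose source path lies under a .staging directory.
            if (m.find() && m.group(1).contains("/.staging/")) {
                deleted.add(m.group(1));
            }
        }
        return deleted;
    }

    public static void main(String[] args) {
        // Made-up sample lines, shortened to the fields used here.
        List<String> sample = Arrays.asList(
            "cmd=create src=/user/kpi/.staging/job_1405852606905_0013/appTokens dst=null",
            "cmd=delete src=/user/kpi/.staging/job_1405852606905_0013 dst=null",
            "cmd=delete src=/tmp/scratch dst=null");
        for (String path : findStagingDeletes(sample)) {
            System.out.println(path);
        }
    }
}
```

Comparing the timestamp of a matching delete with the time of the failed attempt helps tell the two suspected causes apart: a delete before the failure points to removal after a successful upload rather than a failed upload.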

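Further searching

There is a lot of code, so the first step is to narrow down where the .staging directory is touched and work out who deletes it. We searched the issue tracker for "staging delete" to look for related code. Reading the source at the same time, we found the method org.apache.hadoop.mapreduce.v2.app.MRAppMaster.cleanupStagingDir(); comparing it with the logs confirms that this is the method that deleted the staging directory.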

    public synchronized void stop() {
...
        // Checks whether this is the AM's last attempt; clean up only if it is
        if(isLastAMRetry) {
          cleanupStagingDir();
        } 
...
  }

This logic looks reasonable enough. Next, find out where isLastAMRetry comes from:

  public void shutDownJob() {
...
      //We are finishing cleanly so this is the last retry
      isLastAMRetry = true;
      // Stop all services
      // This will also send the final report to the ResourceManager
      LOG.info("Calling stop for all the services");
      MRAppMaster.this.stop();
...
  }

Calling shutDownJob sets isLastAMRetry to true, and shutDownJob is called when a JobFinishEvent is received.
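The control flow traced so far can be modelled in a few lines. This is a simplified sketch, not the real MRAppMaster: the class `AmShutdownModel`, the `stagingDirExists` field and the `nextAttemptCanLocalize` method are hypothetical stand-ins, and only the `isLastAMRetry` flag and the `stop()` / `shutDownJob()` names come from the excerpts above.

```java
/** Toy model of the shutdown path; not the actual Hadoop implementation. */
final class AmShutdownModel {
    private boolean isLastAMRetry = false;   // mirrors the flag in the excerpts
    private boolean stagingDirExists = true; // stands in for .staging/job_* on HDFS

    /** Models shutDownJob(): reached whenever a JobFinishEvent arrives. */
    void shutDownJob() {
        isLastAMRetry = true;
        stop();
    }

    /** Models stop(): staging is removed only if this is believed to be the last retry. */
    void stop() {
        if (isLastAMRetry) {
            stagingDirExists = false; // stands in for cleanupStagingDir()
        }
    }

    /** A later attempt needs appTokens from the staging directory (assumption of this model). */
    boolean nextAttemptCanLocalize() {
        return stagingDirExists;
    }

    public static void main(String[] args) {
        AmShutdownModel stoppedOnly = new AmShutdownModel();
        stoppedOnly.stop(); // stopped without a JobFinishEvent
        System.out.println("stop only: " + stoppedOnly.nextAttemptCanLocalize());

        AmShutdownModel finished = new AmShutdownModel();
        finished.shutDownJob(); // JobFinishEvent received
        System.out.println("shutDownJob: " + finished.nextAttemptCanLocalize());
    }
}
```

In this model, any path that reaches shutDownJob removes the staging directory, so if a job "finishes" because of an error rather than real completion, the next attempt has nothing left to read.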
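With this extra information, we took the lazy route and searched the issue tracker again to see whether someone had already fixed it.
This time we found the issue: https://issues.apache.org/jira/browse/MAPREDUCE-5086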

Reading the patch, we realized that we had previously overlooked an error reported by the RM:

org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: Application doesn't exist in cache appattempt_1405852606905_0014_000001

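Result

Restarting the nodemanager caused the RM's appattempt cache to be cleared, JobImpl returned an InternalError, and the AM concluded that after an error there was no point in retrying, so it set isLastAMRetry=true directly.
The fix adds a state indicating that this is an "RM restart" (note that this is not a nodemanager restart, although the two are related), so that retries can continue. For the details, read the patch at https://issues.apache.org/jira/browse/MAPREDUCE-5086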
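The idea behind the fix can be illustrated with a small decision function. This is only a sketch of the approach described above, not the patch itself: the class `RetryDecision`, the enum `FinalState`, its `REBOOT` value and the `attempt` / `maxAttempts` parameters are hypothetical names chosen for illustration.

```java
/** Toy illustration of "add a state so an RM restart can still be retried". */
final class RetryDecision {
    /** Hypothetical terminal states; REBOOT stands for the added "RM restarted" state. */
    enum FinalState { SUCCEEDED, FAILED, ERROR, REBOOT }

    /** Behaviour described in the article: every finish is treated as the last retry. */
    static boolean isLastRetryBefore(FinalState state) {
        return true;
    }

    /** With the extra state, an RM restart leaves room for further attempts. */
    static boolean isLastRetryAfter(FinalState state, int attempt, int maxAttempts) {
        if (state == FinalState.REBOOT) {
            // Only the final allowed attempt counts as the last retry.
            return attempt >= maxAttempts;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isLastRetryBefore(FinalState.REBOOT));
        System.out.println(isLastRetryAfter(FinalState.REBOOT, 1, 3));
    }
}
```

Because the staging directory is cleaned up only on the last retry, keeping the flag false for the restart case is what preserves appTokens for the next attempt.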

Finally, because the patch targets a different version from the one we run, we still have to re-apply its approach to our own version.
