[原创] Hadoop 2.x的DistributedCache无法工作的问题
转载请注明出处: http://www.codelast.com/ 现象:和 这个 帖子描述的一样,简单说来就是,在Hadoop 2.x上,用新的DistributedCache的API,在mapper中会获取不到这个cache文件。 下面就详细地描述一下新旧API的用法区别以及解决办法。 『1』 旧API 将HDFS文
转载请注明出处:http://www.codelast.com/
现象:和这个帖子描述的一样,简单说来就是,在Hadoop 2.x上,用新的DistributedCache的API,在mapper中会获取不到这个cache文件。
下面就详细地描述一下新旧API的用法区别以及解决办法。
『1』旧API
将HDFS文件添加到distributed cache中:
Configuration conf = job.getConfiguration(); DistributedCache.addCacheFile(new URI(inputFileOnHDFS), conf); // add file to distributed cache
其中,inputFileOnHDFS是一个HDFS文件的路径,也就是你要用作distribute cache的文件的路径,例如 /user/codelast/123.txt
在mapper的setup()方法中:
Configuration conf = context.getConfiguration(); Path[] localCacheFiles = DistributedCache.getLocalCacheFiles(conf); readCacheFile(localCacheFiles[0]);
其中,readCacheFile()是我们自己的读取cache文件的方法,可能是这样做的(仅举个例子):
private static void readCacheFile(Path cacheFilePath) throws IOException { BufferedReader reader = new BufferedReader(new FileReader(cacheFilePath.toUri().getPath())); String line; while ((line = reader.readLine()) != null) { //TODO: your code here } reader.close(); }
文章来源:http://www.codelast.com/
『2』新API
上面的代码中,addCacheFile() 方法和 getLocalCacheFiles() 都已经被Hadoop 2.x标记为 @Deprecated 了。
因此,有一套新的API来实现同样的功能,这个链接里有示例,我在这里再详细地写一下。
将HDFS文件添加到distributed cache中:
job.addCacheFile(new Path(inputFileOnHDFS).toUri());
在mapper的setup()方法中:
Configuration conf = context.getConfiguration(); URI[] localCacheFiles = context.getCacheFiles(); readCacheFile(localCacheFiles[0]);
其中,readCacheFile()是我们自己的读取cache文件的方法,可能是这样做的(仅举个例子):
private static void readCacheFile(URI cacheFileURI) throws IOException { BufferedReader reader = new BufferedReader(new FileReader(cacheFileURI.getPath())); String line; while ((line = reader.readLine()) != null) { //TODO: your code here } reader.close(); }
但是就像文章开头的那个链接里所描述的问题一样,你可能会发现 context.getCacheFiles() 总是返回null,也就是你无法读到cache文件。
这个问题有可能是这个bug造成的,你可以对比一下你的Hadoop版本。
文章来源:http://www.codelast.com/
『3』解决办法
(1)打patch
(2)升级Hadoop版本
(3)使用旧的DistributedCache API,经测试OK
文章来源:http://www.codelast.com/
原文地址:[原创] Hadoop 2.x的DistributedCache无法工作的问题, 感谢原作者分享。

Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

AI Hentai Generator
Generate AI Hentai for free.

Hot Article

Hot Tools

Notepad++7.3.1
Easy-to-use and free code editor

SublimeText3 Chinese version
Chinese version, very easy to use

Zend Studio 13.0.1
Powerful PHP integrated development environment

Dreamweaver CS6
Visual web development tools

SublimeText3 Mac version
God-level code editing software (SublimeText3)

Hot Topics

Java Errors: Hadoop Errors, How to Handle and Avoid When using Hadoop to process big data, you often encounter some Java exception errors, which may affect the execution of tasks and cause data processing to fail. This article will introduce some common Hadoop errors and provide ways to deal with and avoid them. Java.lang.OutOfMemoryErrorOutOfMemoryError is an error caused by insufficient memory of the Java virtual machine. When Hadoop is

With the advent of the big data era, data processing and storage have become more and more important, and how to efficiently manage and analyze large amounts of data has become a challenge for enterprises. Hadoop and HBase, two projects of the Apache Foundation, provide a solution for big data storage and analysis. This article will introduce how to use Hadoop and HBase in Beego for big data storage and query. 1. Introduction to Hadoop and HBase Hadoop is an open source distributed storage and computing system that can

As the amount of data continues to increase, traditional data processing methods can no longer handle the challenges brought by the big data era. Hadoop is an open source distributed computing framework that solves the performance bottleneck problem caused by single-node servers in big data processing through distributed storage and processing of large amounts of data. PHP is a scripting language that is widely used in web development and has the advantages of rapid development and easy maintenance. This article will introduce how to use PHP and Hadoop for big data processing. What is HadoopHadoop is

When it comes to gaming laptops from ASUS, the first thing that comes to mind is Republic of Gamers. But in addition to the high-end gaming laptops in the Republic of Gamers, ASUS also has mainstream gaming laptops in the Flying Fortress series. Let’s take a look at them together. How about ASUS Flying Fortress 7, which is the most popular among gamers. ASUS Flying Fortress 7 ASUS' Flying Fortress series notebooks are positioned to play large-scale games easily. They focus on sturdiness and durability. They have always been a popular choice among gamers and students. Basic configuration First, let’s take a look at some of the core configurations of this Flying Fortress 7. From the configuration table, the most noteworthy is AMD Ryzen73750H+NVIDIA GeForceGTX1660Ti

Java big data technology stack: Understand the application of Java in the field of big data, such as Hadoop, Spark, Kafka, etc. As the amount of data continues to increase, big data technology has become a hot topic in today's Internet era. In the field of big data, we often hear the names of Hadoop, Spark, Kafka and other technologies. These technologies play a vital role, and Java, as a widely used programming language, also plays a huge role in the field of big data. This article will focus on the application of Java in large

Is it considered original after modifying md5? In the Internet era, creating original content has become an important value and resource. However, what follows is questioning of originality and infringement. In order to prevent piracy and plagiarism, many people try to use different methods to protect their original works. One of the common methods is to use the MD5 algorithm to modify the work to achieve the effect of "algorithm protection". MD5 (MessageDigestAlgorithm5) is a commonly used message digest algorithm, which can

1: Install JDK1. Execute the following command to download the JDK1.8 installation package. wget--no-check-certificatehttps://repo.huaweicloud.com/java/jdk/8u151-b12/jdk-8u151-linux-x64.tar.gz2. Execute the following command to decompress the downloaded JDK1.8 installation package. tar-zxvfjdk-8u151-linux-x64.tar.gz3. Move and rename the JDK package. mvjdk1.8.0_151//usr/java84. Configure Java environment variables. echo'

As the amount of data continues to increase, large-scale data processing has become a problem that enterprises must face and solve. Traditional relational databases can no longer meet this demand. For the storage and analysis of large-scale data, distributed computing platforms such as Hadoop, Spark, and Flink have become the best choices. In the selection process of data processing tools, PHP is becoming more and more popular among developers as a language that is easy to develop and maintain. In this article, we will explore how to leverage PHP for large-scale data processing and how
