Configuring Hadoop Rack Awareness


Jun 07, 2016, 04:31 PM


周海汉, 2013.7.24

http://abloz.com

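Suppose the network is wired in three layers: a top-level switch d1 connects to several switches rk1, rk2, rk3, rk4, …, and each of those switches serves one rack.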

d1(rk1(hs11,hs12,…),rk2(hs21,hs22,…), rk3(hs31,hs32,…),rk4(hs41,hs42,…),…)
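The hierarchy above can be written down as nested data and flattened into a host-to-rack lookup. The following is a minimal sketch, not part of the original article: `TOPOLOGY` and `flatten_topology` are hypothetical names, and only the hosts named explicitly above are listed because the source elides the rest. The rack path deliberately stops at the rack level and does not include the host name.

```python
# Hypothetical description of the d1 -> rack -> host hierarchy from the text.
# Only the hosts named explicitly in the article are listed here.
TOPOLOGY = {
    "d1": {
        "rk1": ["hs11", "hs12"],
        "rk2": ["hs21", "hs22"],
        "rk3": ["hs31", "hs32"],
        "rk4": ["hs41", "hs42"],
    }
}


def flatten_topology(topology):
    """Return a dict mapping each host to its rack path, e.g. 'hs11' -> '/rk1'."""
    rack_map = {}
    for racks in topology.values():          # one entry per top-level switch
        for rack, hosts in racks.items():    # one entry per rack switch
            for host in hosts:
                rack_map[host] = "/" + rack
    return rack_map


if __name__ == "__main__":
    for host, rack in sorted(flatten_topology(TOPOLOGY).items()):
        print(host, rack)
```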

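The mapping from host to rack device can be produced by a program or script. For example, write a Python script named topology.py (listed further below).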

Then configure it in core-site.xml:

<property>
  <name>topology.script.file.name</name>
  <value>/home/hadoop/hadoop-1.1.2/conf/topology.py</value>
  <description>The script name that should be invoked to resolve DNS names to
  NetworkTopology names. Example: the script would take host.foo.bar as an
  argument, and return /rack1 as the output.
  </description>
</property>

The Python rack script:

[hadoop@hs11 conf]$ cat topology.py
#!/usr/bin/env python

'''
This script is used by Hadoop to determine network/rack topology. It
should be specified in core-site.xml via the topology.script.file.name
property:
topology.script.file.name
/home/hadoop/hadoop-1.1.2/conf/topology.py

To generate the dict entries:
for i in range(xx):
    #print("\"hs%d\":\"/rk%d/hs%d\"," % (i, (i-1)//10, i))
    print("\"hs%d\":\"/rk%d\"," % (i, (i-1)//10))

Andy 2013.7.23
'''

import sys

# Rack returned for any host or IP that is not listed in RACK_MAP.
DEFAULT_RACK = '/rk0'

# Both hostnames and IP addresses are listed, since Hadoop may pass either.
RACK_MAP = {
    "hs11":"/rk1",
    "hs12":"/rk1",
    "hs13":"/rk1",
    "hs14":"/rk1",
    "hs15":"/rk1",
    "hs16":"/rk1",
    "hs17":"/rk1",
    "hs18":"/rk1",
    "hs19":"/rk1",
    "hs20":"/rk1",
    "hs21":"/rk2",
    "hs22":"/rk2",
    "hs23":"/rk2",
    "hs24":"/rk2",
    "hs25":"/rk2",
    "hs26":"/rk2",
    "hs27":"/rk2",
    "hs28":"/rk2",
    "hs29":"/rk2",
    "hs30":"/rk2",
    "hs31":"/rk3",
    "hs32":"/rk3",
    "hs33":"/rk3",
    "hs34":"/rk3",
    "hs35":"/rk3",
    "hs36":"/rk3",
    "hs37":"/rk3",
    "hs38":"/rk3",
    "hs39":"/rk3",
    "hs40":"/rk3",
    "hs41":"/rk4",
    "hs42":"/rk4",
    "hs43":"/rk4",
    "hs44":"/rk4",
    "hs45":"/rk4",
    "hs46":"/rk4",

    # …

    "10.10.20.11":"/rk1",
    "10.10.20.12":"/rk1",
    "10.10.20.13":"/rk1",
    "10.10.20.14":"/rk1",
    "10.10.20.15":"/rk1",
    "10.10.20.16":"/rk1",
    "10.10.20.17":"/rk1",
    "10.10.20.18":"/rk1",
    "10.10.20.19":"/rk1",
    "10.10.20.20":"/rk1",
    "10.10.20.21":"/rk2",
    "10.10.20.22":"/rk2",
    "10.10.20.23":"/rk2",
    "10.10.20.24":"/rk2",
    "10.10.20.25":"/rk2",
    "10.10.20.26":"/rk2",
    "10.10.20.27":"/rk2",
    "10.10.20.28":"/rk2",
    "10.10.20.29":"/rk2",
    "10.10.20.30":"/rk2",
    "10.10.20.31":"/rk3",
    "10.10.20.32":"/rk3",
    "10.10.20.33":"/rk3",
    "10.10.20.34":"/rk3",
    "10.10.20.35":"/rk3",
    "10.10.20.36":"/rk3",
    "10.10.20.37":"/rk3",
    "10.10.20.38":"/rk3",
    "10.10.20.39":"/rk3",
    "10.10.20.40":"/rk3",
    "10.10.20.41":"/rk4",
    "10.10.20.42":"/rk4",
    "10.10.20.43":"/rk4",
    "10.10.20.44":"/rk4",
    "10.10.20.45":"/rk4",
    "10.10.20.46":"/rk4",

    # …
}

if len(sys.argv) == 1:
    print(DEFAULT_RACK)
else:
    # Several hosts/IPs may be passed at once; print one rack per argument.
    print(" ".join([RACK_MAP.get(i, DEFAULT_RACK) for i in sys.argv[1:]]))
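The docstring above hints that the dictionary entries were generated with a loop. A minimal Python 3 sketch of that idea follows. `build_rack_map` is a hypothetical helper, and the rule that host hsN has IP 10.10.20.N and sits in rack (N-1)//10 is an assumption read off the entries shown, not something the author states.

```python
def build_rack_map(first, last, ip_prefix="10.10.20."):
    """Build host->rack and ip->rack entries for hs<first>..hs<last> inclusive.

    Assumption (inferred from the listing): host hsN has IP <ip_prefix>N and
    belongs to rack /rk<(N-1)//10>, so hs11..hs20 land in /rk1.
    """
    rack_map = {}
    for i in range(first, last + 1):
        rack = "/rk%d" % ((i - 1) // 10)   # integer division, ten hosts per rack
        rack_map["hs%d" % i] = rack
        rack_map["%s%d" % (ip_prefix, i)] = rack
    return rack_map


if __name__ == "__main__":
    # Print entries in the same "key":"value", form used in topology.py.
    for key, rack in build_rack_map(11, 14).items():
        print('"%s":"%s",' % (key, rack))
```

Generating the entries keeps the hostname and IP keys in step with each other, which is easy to get wrong when the list is maintained by hand.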

Originally, my script returned values of this form:

"hs11":"/rk1/hs11",

As a result, running a MapReduce job failed with the following error:

Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there’s no reduce operator
Starting Job = job_201307241502_0003, Tracking URL = http://hs11:50030/jobdetails.jsp?jobid=job_201307241502_0003
Kill Command = /home/hadoop/hadoop-1.1.2/libexec/../bin/hadoop job  -kill job_201307241502_0003
Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0
2013-07-24 18:38:11,854 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_201307241502_0003 with errors
Error during job, obtaining debugging information…
Job Tracking URL: http://hs11:50030/jobdetails.jsp?jobid=job_201307241502_0003
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
MapReduce Jobs Launched:
Job 0:  HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec

The job details page at http://hs11:50030/jobdetails.jsp?jobid=job_201307241502_0002 shows:

Job initialization failed:
java.lang.NullPointerException
    at org.apache.hadoop.mapred.JobTracker.resolveAndAddToTopology(JobTracker.java:2751)
    at org.apache.hadoop.mapred.JobInProgress.createCache(JobInProgress.java:578)
    at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:750)
    at org.apache.hadoop.mapred.JobTracker.initJob(JobTracker.java:3775)
    at org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:90)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)

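It turns out that when rack awareness is configured, the script does not need to return the device name or hostname; the system appends it automatically. After switching to the topology.py shown above, which returns only the rack (for example /rk1), the job ran correctly.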
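The failure described above came from rack paths that already contained the host name. A small check along these lines could catch that before the script is deployed. This is a sketch under the assumption that an entry is wrong whenever the last component of its rack path equals the host key; `find_host_in_rack_path` is a hypothetical helper, not part of Hadoop.

```python
def find_host_in_rack_path(rack_map):
    """Return the keys whose rack path ends with the key itself.

    Example of a bad entry: "hs11" -> "/rk1/hs11". The article reports that
    the system appends the node name itself, so the script should return
    only "/rk1".
    """
    bad = []
    for host, rack in rack_map.items():
        # Compare the final path component with the host key.
        if rack.rstrip("/").split("/")[-1] == host:
            bad.append(host)
    return sorted(bad)


if __name__ == "__main__":
    sample = {"hs11": "/rk1/hs11", "hs12": "/rk1"}
    print(find_host_in_rack_path(sample))
```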
