
HDFS File Commands

Jun 07, 2016 04:41 PM


HDFS file commands are modeled on the Linux file commands, so anyone familiar with Linux will pick them up quickly. Note that Hadoop DFS has no notion of pwd (a current working directory), so every path must be given in full. (This article is based on Hadoop 2.5, CDH 5.2.1.)
First, list the available commands, their usage and help text, and select a namenode other than the one set in the configuration file:

hdfs dfs -usage
hdfs dfs -usage ls
hdfs dfs -help

-fs <local|namenode:port>      specify a namenode
hdfs dfs -fs hdfs://test1:9000 -ls /

——————————————————————————–
-df [-h] [path …] :
Shows the capacity, free and used space of the filesystem. If the filesystem has
multiple partitions, and no path to a particular partition is specified, then
the status of the root partitions will be shown.

$ hdfs dfs -df
Filesystem                 Size   Used     Available  Use%
hdfs://test1:9000  413544071168  98304  345612906496    0%
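The -h flag from the usage line above prints the same figures in human-readable units. A minimal sketch on the same cluster; the converted values below are illustrative, not captured output:

hdfs dfs -df -h
Filesystem           Size  Used  Available  Use%
hdfs://test1:9000  385.1 G  96 K    321.9 G    0%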

——————————————————————————–
-mkdir [-p] path … :
Create a directory in the specified location.

-p Do not fail if the directory already exists

-rmdir dir … :
Removes the directory entry specified by each directory argument, provided it is
empty.

hdfs dfs -mkdir /tmp
hdfs dfs -mkdir /tmp/txt
hdfs dfs -rmdir /tmp/txt
hdfs dfs -mkdir -p /tmp/txt/hello

——————————————————————————–
-copyFromLocal [-f] [-p] localsrc … dst :
Identical to the -put command.

-copyToLocal [-p] [-ignoreCrc] [-crc] src … localdst :
Identical to the -get command.

-moveFromLocal localsrc … dst :
Same as -put, except that the source is deleted after it's copied.

-put [-f] [-p] localsrc … dst :
Copy files from the local file system into fs. Copying fails if the file already
exists, unless the -f flag is given. Passing -p preserves access and
modification times, ownership and the mode. Passing -f overwrites the
destination if it already exists.

-get [-p] [-ignoreCrc] [-crc] src … localdst :
Copy files that match the file pattern src to the local name. src is kept.
When copying multiple files, the destination must be a directory. Passing -p
preserves access and modification times, ownership and the mode.

-getmerge [-nl] src localdst :
Get all the files in the directories that match the source file pattern and
merge and sort them to only one file on local fs. src is kept.

-nl Add a newline character at the end of each file.

-cat [-ignoreCrc] src … :
Fetch all files that match the file pattern src and display their content on
stdout.

# Wildcards: ? * {} []
echo "Hello, Hadoop" > hadoop.txt
echo "Hello, HDFS" > hdfs.txt
dd if=/dev/zero of=/tmp/test.zero bs=1M count=1024
    1024+0 records in
    1024+0 records out
    1073741824 bytes (1.1 GB) copied, 0.93978 s, 1.1 GB/s
hdfs dfs -put *.txt /tmp
hdfs dfs -moveFromLocal /tmp/test.zero /tmp

hdfs dfs -cat /tmp/*.txt
    Hello, Hadoop
    Hello, HDFS
hdfs dfs -cat /tmp/h?fs.txt
    Hello, HDFS
hdfs dfs -cat /tmp/h{a,d}*.txt
    Hello, Hadoop
    Hello, HDFS
hdfs dfs -cat /tmp/h[a-d]*.txt
    Hello, Hadoop
    Hello, HDFS
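The download direction is symmetric. A minimal sketch of -get and -getmerge against the files created above; the local destination names are assumptions:

hdfs dfs -get /tmp/hadoop.txt /tmp/hadoop.local.txt
# quote the glob so the local shell does not expand it
hdfs dfs -getmerge '/tmp/h*.txt' /tmp/merged.txt
cat /tmp/merged.txt
    Hello, Hadoop
    Hello, HDFS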

——————————————————————————–
-ls [-d] [-h] [-R] [path …] :
List the contents that match the specified file pattern. If path is not
specified, the contents of /user/currentUser will be listed. Directory entries
are of the form:
permissions - userId groupId sizeOfDirectory(in bytes)
modificationDate(yyyy-MM-dd HH:mm) directoryName

and file entries are of the form:
permissions numberOfReplicas userId groupId sizeOfFile(in bytes)
modificationDate(yyyy-MM-dd HH:mm) fileName

-d Directories are listed as plain files.
-h Formats the sizes of files in a human-readable fashion rather than a number
of bytes.
-R Recursively list the contents of directories.

hdfs dfs -ls /tmp
hdfs dfs -ls -d /tmp
hdfs dfs -ls -h /tmp
    Found 4 items
    -rw-r--r--   3 hdfs supergroup         14 2014-12-18 10:00 /tmp/hadoop.txt
    -rw-r--r--   3 hdfs supergroup         12 2014-12-18 10:00 /tmp/hdfs.txt
    -rw-r--r--   3 hdfs supergroup        1 G 2014-12-18 10:19 /tmp/test.zero
    drwxr-xr-x   - hdfs supergroup          0 2014-12-18 10:07 /tmp/txt
hdfs dfs -ls -R -h /tmp
    -rw-r--r--   3 hdfs supergroup         14 2014-12-18 10:00 /tmp/hadoop.txt
    -rw-r--r--   3 hdfs supergroup         12 2014-12-18 10:00 /tmp/hdfs.txt
    -rw-r--r--   3 hdfs supergroup        1 G 2014-12-18 10:19 /tmp/test.zero
    drwxr-xr-x   - hdfs supergroup          0 2014-12-18 10:07 /tmp/txt
    drwxr-xr-x   - hdfs supergroup          0 2014-12-18 10:07 /tmp/txt/hello
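With -d, the directory itself is listed as a plain entry instead of its contents. A sketch; the permissions and timestamp shown are illustrative:

hdfs dfs -ls -d /tmp
    drwxrwxrwx   - hdfs supergroup          0 2014-12-18 10:19 /tmp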

——————————————————————————–
-checksum src … :
Dump checksum information for files that match the file pattern src to stdout.
Note that this requires a round-trip to a datanode storing each block of the
file, and thus is not efficient to run on a large number of files. The checksum
of a file depends on its content, block size and the checksum algorithm and
parameters used for creating the file.

hdfs dfs -checksum /tmp/test.zero
    /tmp/test.zero    MD5-of-262144MD5-of-512CRC32C   000002000000000000040000f960570129a4ef3a7e179073adceae97

——————————————————————————–
-appendToFile localsrc … dst :
Appends the contents of all the given local files to the given dst file. The dst
file will be created if it does not exist. If localSrc is -, then the input is
read from stdin.

hdfs dfs -appendToFile *.txt hello.txt
hdfs dfs -cat hello.txt
    Hello, Hadoop
    Hello, HDFS

——————————————————————————–
-tail [-f] file :
Show the last 1KB of the file.

hdfs dfs -tail -f hello.txt
# waits for output; stop with Ctrl + C

# in another terminal
hdfs dfs -appendToFile - hello.txt
# then type something

——————————————————————————–
-cp [-f] [-p | -p[topax]] src … dst :
Copy files that match the file pattern src to a destination. When copying
multiple files, the destination must be a directory. Passing -p preserves status
[topax] (timestamps, ownership, permission, ACL, XAttr). If -p is specified
with no arg, then it preserves timestamps, ownership and permission. If -pa is
specified, then it preserves permission as well, because an ACL is a super-set of
permission. Passing -f overwrites the destination if it already exists. raw
namespace extended attributes are preserved if (1) they are supported (HDFS
only) and (2) all of the source and target pathnames are in the /.reserved/raw
hierarchy. raw namespace xattr preservation is determined solely by the presence
(or absence) of the /.reserved/raw prefix and not by the -p option.
-mv src … dst :
Move files that match the specified file pattern src to a destination dst.
When moving multiple files, the destination must be a directory.
-rm [-f] [-r|-R] [-skipTrash] src … :
Delete all files that match the specified file pattern. Equivalent to the Unix
command “rm src”

-skipTrash option bypasses trash, if enabled, and immediately deletes src
-f If the file does not exist, do not display a diagnostic message or
modify the exit status to reflect an error.
-[rR] Recursively deletes directories
-stat [format] path … :
Print statistics about the file/directory at path in the specified format.
Format accepts filesize in blocks (%b), group name of owner(%g), filename (%n),
block size (%o), replication (%r), user name of owner(%u), modification date
(%y, %Y)

hdfs dfs -stat /tmp/hadoop.txt
    2014-12-18 02:00:08
hdfs dfs -cp -p -f /tmp/hello.txt /tmp/hello.txt.bak
hdfs dfs -stat /tmp/hello.txt.bak
hdfs dfs -rm /tmp/not_exists
    rm: `/tmp/not_exists': No such file or directory
echo $?
    1
hdfs dfs -rm -f /tmp/123321123123123
echo $?
    0
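-mv and a formatted -stat follow the same pattern. A sketch; the format specifiers come from the option list above, and the block size and output values are illustrative:

hdfs dfs -mv /tmp/hello.txt.bak /tmp/txt/hello.txt.bak
# %n = name, %o = block size, %r = replication, %y = modification date
hdfs dfs -stat "%n %o %r %y" /tmp/hadoop.txt
    hadoop.txt 134217728 3 2014-12-18 02:00:08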

——————————————————————————–
-count [-q] path … :
Count the number of directories, files and bytes under the paths
that match the specified file pattern. The output columns are:
DIR_COUNT FILE_COUNT CONTENT_SIZE FILE_NAME or
QUOTA REMAINING_QUOTA SPACE_QUOTA REMAINING_SPACE_QUOTA
DIR_COUNT FILE_COUNT CONTENT_SIZE FILE_NAME

-du [-s] [-h] path … :
Show the amount of space, in bytes, used by the files that match the specified
file pattern. The following flags are optional:

-s Rather than showing the size of each individual file that matches the
pattern, shows the total (summary) size.
-h Formats the sizes of files in a human-readable fashion rather than a number
of bytes.

Note that, even without the -s option, this only shows size summaries one level
deep into a directory.

The output is in the form
size name(full path)

hdfs dfs -count /tmp
           3            3         1073741850 /tmp
hdfs dfs -du /tmp
    14          /tmp/hadoop.txt
    12          /tmp/hdfs.txt
    1073741824  /tmp/test.zero
    0           /tmp/txt
hdfs dfs -du -s /tmp
    1073741850  /tmp
hdfs dfs -du -s -h /tmp
    1.0 G  /tmp
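Passing -q adds the quota columns listed above. A sketch; since no quota has been set on /tmp in this walkthrough, the quota fields read none/inf:

hdfs dfs -count -q /tmp
        none             inf            none             inf            3            3         1073741850 /tmp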

——————————————————————————–
-chgrp [-R] GROUP PATH… :
This is equivalent to -chown … :GROUP …

-chmod [-R] MODE[,MODE]… | OCTALMODE PATH… :
Changes permissions of a file. This works similar to the shell’s chmod command
with a few exceptions.

-R modifies the files recursively. This is the only option currently
supported.
MODE Mode is the same as mode used for the shell’s command. The only
letters recognized are ‘rwxXt’, e.g. +t,a+r,g-w,+rwx,o=r.
OCTALMODE Mode specified in 3 or 4 digits. If 4 digits, the first may be 1 or
0 to turn the sticky bit on or off, respectively. Unlike the
shell command, it is not possible to specify only part of the
mode, e.g. 754 is same as u=rwx,g=rx,o=r.

If none of ‘augo’ is specified, ‘a’ is assumed and unlike the shell command, no
umask is applied.

-chown [-R] [OWNER][:[GROUP]] PATH… :
Changes owner and group of a file. This is similar to the shell’s chown command
with a few exceptions.

-R modifies the files recursively. This is the only option currently
supported.

If only the owner or group is specified, then only the owner or group is
modified. The owner and group names may only consist of digits, alphabet, and
any of [-_./@a-zA-Z0-9]. The names are case sensitive.

WARNING: Avoid using ‘.’ to separate user name and group though Linux allows it.
If user names have dots in them and you are using local file system, you might
see surprising results since the shell command ‘chown’ is used for local files.

-touchz path … :
Creates a file of zero length at path with current time as the timestamp of
that path. An error is returned if the file exists with non-zero length

hdfs dfs -mkdir -p /user/spark/tmp
hdfs dfs -chown -R spark:hadoop /user/spark
hdfs dfs -chmod -R 775 /user/spark/tmp
hdfs dfs -ls -d /user/spark/tmp
    drwxrwxr-x   - spark hadoop          0 2014-12-18 14:51 /user/spark/tmp
hdfs dfs -chmod +t /user/spark/tmp

# as user spark
hdfs dfs -touchz /user/spark/tmp/own_by_spark

# as user hadoop
useradd -g hadoop hadoop
su - hadoop
id
    uid=502(hadoop) gid=492(hadoop) groups=492(hadoop)
hdfs dfs -rm /user/spark/tmp/own_by_spark
    rm: Permission denied by sticky bit setting: user=hadoop, inode=own_by_spark

# A superuser (dfs.permissions.superusergroup = hdfs) can bypass the sticky bit setting.
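-chgrp itself was not exercised above; a one-line sketch using the same group as the example:

hdfs dfs -chgrp -R hadoop /user/spark/tmp
# equivalent to: hdfs dfs -chown -R :hadoop /user/spark/tmp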

——————————————————————————–
-test -[defsz] path :
Answer various questions about path, with result via exit status.
-d return 0 if path is a directory.
-e return 0 if path exists.
-f return 0 if path is a file.
-s return 0 if file path is greater than zero bytes in size.
-z return 0 if file path is zero bytes in size, else return 1.

hdfs dfs -test -d /tmp
echo $?
    0
hdfs dfs -test -f /tmp/txt
echo $?
    1
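The other flags work the same way through the exit status. A sketch against the files created earlier:

hdfs dfs -test -e /tmp/hadoop.txt && echo exists
    exists
hdfs dfs -test -z /tmp/hadoop.txt || echo "not empty"
    not empty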

——————————————————————————–
-setrep [-R] [-w] rep path … :
Set the replication level of a file. If path is a directory then the command
recursively changes the replication factor of all files under the directory tree
rooted at path.
-w It requests that the command waits for the replication to complete. This
can potentially take a very long time.

hdfs fsck /tmp/test.zero -blocks -locations
    Average block replication:  3.0
hdfs dfs -setrep -w 4 /tmp/test.zero
    Replication 4 set: /tmp/test.zero
    Waiting for /tmp/test.zero .... done
hdfs fsck /tmp/test.zero -blocks
    Average block replication:  4.0
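To drop back to the original factor, the same command works in reverse; without -w it returns immediately and the namenode deletes the excess replicas asynchronously (sketch):

hdfs dfs -setrep 3 /tmp/test.zero
    Replication 3 set: /tmp/test.zero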