Common HDFS Shell Operations
Typing hdfs dfs on the command line by itself prints every option that can follow dfs.
Note: in the usage below, [] marks an optional item and <> marks a required item.
[hadoop@hadoop81 hadoop]$ hdfs dfs
Usage: hadoop fs [generic options]
[-appendToFile <localsrc> ... <dst>]
[-cat [-ignoreCrc] <src> ...]
[-checksum <src> ...]
[-chgrp [-R] GROUP PATH...]
[-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
[-chown [-R] [OWNER][:[GROUP]] PATH...]
[-copyFromLocal [-f] [-p] [-l] [-d] [-t <thread count>] <localsrc> ... <dst>]
[-copyToLocal [-f] [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
[-count [-q] [-h] [-v] [-t [<storage type>]] [-u] [-x] [-e] <path> ...]
[-cp [-f] [-p | -p[topax]] [-d] <src> ... <dst>]
[-createSnapshot <snapshotDir> [<snapshotName>]]
[-deleteSnapshot <snapshotDir> <snapshotName>]
[-df [-h] [<path> ...]]
[-du [-s] [-h] [-v] [-x] <path> ...]
[-expunge]
[-find <path> ... <expression> ...]
[-get [-f] [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
[-getfacl [-R] <path>]
[-getfattr [-R] {-n name | -d} [-e en] <path>]
[-getmerge [-nl] [-skip-empty-file] <src> <localdst>]
[-head <file>]
[-help [cmd ...]]
[-ls [-C] [-d] [-h] [-q] [-R] [-t] [-S] [-r] [-u] [-e] [<path> ...]]
[-mkdir [-p] <path> ...]
[-moveFromLocal <localsrc> ... <dst>]
[-moveToLocal <src> <localdst>]
[-mv <src> ... <dst>]
[-put [-f] [-p] [-l] [-d] <localsrc> ... <dst>]
[-renameSnapshot <snapshotDir> <oldName> <newName>]
[-rm [-f] [-r|-R] [-skipTrash] [-safely] <src> ...]
[-rmdir [--ignore-fail-on-non-empty] <dir> ...]
[-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
[-setfattr {-n name [-v value] | -x name} <path>]
[-setrep [-R] [-w] <rep> <path> ...]
[-stat [format] <path> ...]
[-tail [-f] <file>]
[-test -[defsz] <path>]
[-text [-ignoreCrc] <src> ...]
[-touchz <path> ...]
[-truncate [-w] <length> <path> ...]
[-usage [cmd ...]]
Generic options supported are:
-conf <configuration file> specify an application configuration file
-D <property=value> define a value for a given property
-fs <file:///|hdfs://namenode:port> specify default filesystem URL to use, overrides 'fs.defaultFS' property from configurations.
-jt <local|resourcemanager:port> specify a ResourceManager
-files <file1,...> specify a comma-separated list of files to be copied to the map reduce cluster
-libjars <jar1,...> specify a comma-separated list of jar files to be included in the classpath
-archives <archive1,...> specify a comma-separated list of archives to be unarchived on the compute machines
The general command line syntax is:
command [genericOptions] [commandOptions]
[hadoop@hadoop81 hadoop]$
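The generic options listed at the end apply to any subcommand. As a quick, hedged illustration (the target path below is only an example), -fs overrides the default filesystem for a single invocation and -D sets an arbitrary configuration property:
# list the root of an explicitly named cluster instead of the configured default
hdfs dfs -fs hdfs://cluster1 -ls /
# upload a file with a per-command replication factor of 2 (the target path is only an example)
hdfs dfs -D dfs.replication=2 -put README.txt /tmp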
- -ls: list the contents of a path
Let's start with the ls command.
List the contents of the HDFS root directory:
[hadoop@hadoop81 ~]$ hdfs dfs -ls hdfs://cluster1/
Found 5 items
drwxr-xr-x - hadoop supergroup 0 2023-04-04 09:38 hdfs://cluster1/hbase
drwxr-xr-x - hadoop supergroup 0 2022-03-14 17:23 hdfs://cluster1/hdptest
drwxr-xr-x - hadoop supergroup 0 2022-03-26 22:23 hdfs://cluster1/system
drwxrwx--- - hadoop supergroup 0 2022-03-14 11:05 hdfs://cluster1/tmp
drwxr-xr-x - hadoop supergroup 0 2022-03-14 11:05 hdfs://cluster1/user
In practice the hdfs:// URL prefix can be omitted, because hdfs locates the core-site.xml configuration file via HADOOP_HOME at run time and picks up the fs.defaultFS property from it.
So the shorter form works just as well:
[hadoop@hadoop81 hadoop]$ hdfs dfs -ls /
Found 5 items
drwxr-xr-x - hadoop supergroup 0 2023-04-04 09:38 /hbase
drwxr-xr-x - hadoop supergroup 0 2022-03-14 17:23 /hdptest
drwxr-xr-x - hadoop supergroup 0 2022-03-26 22:23 /system
drwxrwx--- - hadoop supergroup 0 2022-03-14 11:05 /tmp
drwxr-xr-x - hadoop supergroup 0 2022-03-14 11:05 /user
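If you want to confirm which default filesystem the client will actually use, hdfs getconf can print the property directly; a minimal check, assuming the configuration under HADOOP_HOME/etc/hadoop is the active one:
# print the effective value of fs.defaultFS
hdfs getconf -confKey fs.defaultFS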
- -put: upload a file from the local filesystem
Next, let's upload a file to HDFS. We'll use the README.txt that ships with Hadoop and put it straight into the HDFS root directory.
[hadoop@hadoop81 hadoop]$ ls
bin etc include lib libexec LICENSE.txt logs NOTICE.txt README.txt sbin share
[hadoop@hadoop81 hadoop]$ hdfs dfs -put README.txt /
A successful upload prints nothing at all; remember, no output is the best outcome here.
Verify the file we just uploaded:
[hadoop@hadoop81 hadoop]$ hdfs dfs -ls /
Found 6 items
-rw-r--r-- 3 hadoop supergroup 1366 2023-09-13 16:08 /README.txt
drwxr-xr-x - hadoop supergroup 0 2023-04-04 09:38 /hbase
drwxr-xr-x - hadoop supergroup 0 2022-03-14 17:23 /hdptest
drwxr-xr-x - hadoop supergroup 0 2022-03-26 22:23 /system
drwxrwx--- - hadoop supergroup 0 2022-03-14 11:05 /tmp
drwxr-xr-x - hadoop supergroup 0 2022-03-14 11:05 /user
Notice that the information printed by ls in HDFS looks very much like what ll prints on Linux.
Seeing the file here confirms that the upload succeeded.
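A couple of related variants, sketched here from the usage listing above rather than run against this cluster: -f makes -put overwrite an existing destination, and -copyFromLocal behaves like -put but only accepts local sources:
# overwrite /README.txt in HDFS if it already exists
hdfs dfs -put -f README.txt /
# equivalent upload using copyFromLocal
hdfs dfs -copyFromLocal -f README.txt /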
- -cat: view the contents of an HDFS file
With the file uploaded, we may want to inspect its contents in HDFS. That's easy with cat:
[hadoop@hadoop81 hadoop]$ hdfs dfs -cat /README.txt
For the latest information about Hadoop, please visit our website at:
http://hadoop.apache.org/core/
and our wiki, at:
http://wiki.apache.org/hadoop/
This distribution includes cryptographic software. The country in
which you currently reside may have restrictions on the import,
possession, use, and/or re-export to another country, of
encryption software. BEFORE using any encryption software, please
check your country's laws, regulations and policies concerning the
import, possession, or use, and re-export of encryption software, to
see if this is permitted. See <http://www.wassenaar.org/> for more
information.
The U.S. Government Department of Commerce, Bureau of Industry and
Security (BIS), has classified this software as Export Commodity
Control Number (ECCN) 5D002.C.1, which includes information security
software using or performing cryptographic functions with asymmetric
algorithms. The form and manner of this Apache Software Foundation
distribution makes it eligible for export under the License Exception
ENC Technology Software Unrestricted (TSU) exception (see the BIS
Export Administration Regulations, Section 740.13) for both object
code and source code.
The following provides more details on the included cryptographic
software:
Hadoop Core uses the SSL libraries from the Jetty project written
by mortbay.org.
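For large files, dumping everything with cat is wasteful. The usage listing also shows -head and -tail, which print roughly the first or last kilobyte of a file, and -cat can always be piped into local tools; a small sketch:
# show only the beginning of the file
hdfs dfs -head /README.txt
# show only the end of the file
hdfs dfs -tail /README.txt
# or pipe the full stream through a local filter
hdfs dfs -cat /README.txt | head -n 5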
- -get: download a file to the local filesystem
How do we download a file from HDFS to the local Linux filesystem? Use get:
[hadoop@hadoop81 hadoop]$ hdfs dfs -get /README.txt
get: `README.txt': File exists
Note: this fails with a "File exists" error. The command asks to download README.txt from HDFS into the current directory, but a file with that name is already there, so either switch to a different directory or save it under a new name:
[hadoop@hadoop81 hadoop]$ hdfs dfs -get /README.txt README.txt.bak
[hadoop@hadoop81 hadoop]$ ls
bin etc include lib libexec LICENSE.txt logs NOTICE.txt README.txt README.txt.bak sbin share
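As with -put, the usage listing shows a -f option for -get that overwrites an existing local file, and -getmerge concatenates several HDFS files into a single local file; sketched here with example paths:
# overwrite the local README.txt instead of failing
hdfs dfs -get -f /README.txt README.txt
# merge every file under an HDFS directory into one local file (paths are only examples)
hdfs dfs -getmerge /test merged.txt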
- -mkdir [-p]: create a directory
Later on we'll keep a lot of files in HDFS, so we need directories to organize them.
Let's create one; in HDFS this is done with the mkdir command.
[hadoop@hadoop81 hadoop]$ hdfs dfs -mkdir /test
[hadoop@hadoop81 hadoop]$ hdfs dfs -ls /
Found 7 items
-rw-r--r-- 3 hadoop supergroup 1366 2023-09-13 16:08 /README.txt
drwxr-xr-x - hadoop supergroup 0 2023-04-04 09:38 /hbase
drwxr-xr-x - hadoop supergroup 0 2022-03-14 17:23 /hdptest
drwxr-xr-x - hadoop supergroup 0 2022-03-26 22:23 /system
drwxr-xr-x - hadoop supergroup 0 2023-09-13 16:15 /test
drwxrwx--- - hadoop supergroup 0 2022-03-14 11:05 /tmp
drwxr-xr-x - hadoop supergroup 0 2022-03-14 11:05 /user
To create several levels of directories in one go, add the -p option:
[hadoop@hadoop81 hadoop]$ hdfs dfs -mkdir -p /abc/def/ghi
[hadoop@hadoop81 hadoop]$ hdfs dfs -ls /
Found 8 items
-rw-r--r-- 3 hadoop supergroup 1366 2023-09-13 16:08 /README.txt
drwxr-xr-x - hadoop supergroup 0 2023-09-13 16:16 /abc
drwxr-xr-x - hadoop supergroup 0 2023-04-04 09:38 /hbase
drwxr-xr-x - hadoop supergroup 0 2022-03-14 17:23 /hdptest
drwxr-xr-x - hadoop supergroup 0 2022-03-26 22:23 /system
drwxr-xr-x - hadoop supergroup 0 2023-09-13 16:15 /test
drwxrwx--- - hadoop supergroup 0 2022-03-14 11:05 /tmp
drwxr-xr-x - hadoop supergroup 0 2022-03-14 11:05 /user
To list the contents of every directory recursively, add the -R option to ls:
[hadoop@hadoop81 hadoop]$ hdfs dfs -ls -R /
-rw-r--r-- 3 hadoop supergroup 1366 2023-09-13 16:08 /README.txt
drwxr-xr-x - hadoop supergroup 0 2023-09-13 16:16 /abc
drwxr-xr-x - hadoop supergroup 0 2023-09-13 16:16 /abc/def
drwxr-xr-x - hadoop supergroup 0 2023-09-13 16:16 /abc/def/ghi
drwxr-xr-x - hadoop supergroup 0 2023-04-04 09:38 /hbase
drwxr-xr-x - hadoop supergroup 0 2022-04-01 22:03 /hbase/.hbck
drwxr-xr-x - hadoop supergroup 0 2023-04-04 09:57 /hbase/.tmp
drwxr-xr-x - hadoop supergroup 0 2023-04-04 09:57 /hbase/.tmp/data
drwxr-xr-x - hadoop supergroup 0 2023-04-04 10:59 /hbase/.tmp/data/default
-rw-r--r-- 3 hadoop supergroup 0 2023-04-17 10:31 /hbase/.tmp/hbase-hbck.lock
drwxr-xr-x - hadoop supergroup 0 2023-04-04 17:19 /hbase/MasterProcWALs
......
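When a recursive listing gets long, the -find command from the usage listing can search for paths by name instead; a small example, with the pattern chosen purely for illustration:
# recursively find every entry whose name ends in .txt
hdfs dfs -find / -name "*.txt"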
- -rm [-r]: delete a file or directory
To delete a file or a directory in HDFS, use rm.
Delete a file:
[hadoop@hadoop81 hadoop]$ hdfs dfs -rm /README.txt
2023-09-13 16:20:19,669 INFO fs.TrashPolicyDefault: Moved: 'hdfs://cluster1/README.txt' to trash at: hdfs://cluster1/user/hadoop/.Trash/Current/README.txt
Delete a directory; note that removing a directory requires the -r option:
[hadoop@hadoop81 hadoop]$ hdfs dfs -rm /abc
rm: `/abc': Is a directory
[hadoop@hadoop81 hadoop]$ hdfs dfs -ls -R /abc
drwxr-xr-x - hadoop supergroup 0 2023-09-13 16:16 /abc/def
drwxr-xr-x - hadoop supergroup 0 2023-09-13 16:16 /abc/def/ghi
[hadoop@hadoop81 hadoop]$ hdfs dfs -rm -r /abc
2023-09-13 16:23:19,476 INFO fs.TrashPolicyDefault: Moved: 'hdfs://cluster1/abc' to trash at: hdfs://cluster1/user/hadoop/.Trash/Current/abc
The whole tree under /abc, including /abc/def/ghi, is removed recursively.
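The INFO lines above show that rm moved the data into the trash under /user/hadoop/.Trash rather than deleting it outright. Two related commands from the usage listing, shown only as a sketch: -skipTrash deletes immediately, and -expunge cleans out the current user's trash (the path below is only an example):
# delete a directory permanently, bypassing the trash
hdfs dfs -rm -r -skipTrash /abc
# remove expired checkpoints from the current user's trash
hdfs dfs -expunge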
Hands-on example
Requirement: count the number of entries in HDFS and report the size of each one.
1: Count the number of entries under the root directory
[hadoop@hadoop81 hadoop]$ hdfs dfs -ls / |grep /| wc -l
7
2: Report the size of each entry under the root directory, printing the path together with its size
[hadoop@hadoop81 bin]$ hdfs dfs -ls / |grep / | awk '{print $8,$5}'
/hbase 0
/hdptest 0
/mysql_secure_installation 9170549
/mysql_ssl_rsa_setup 7782694
/mysqlshow 9178434
/mysqlslap 9285880
/system 0
/test 0
/tmp 0
/user 0
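HDFS also ships commands that cover much of this without grep and awk; a sketch using -count and -du from the usage listing, where -h prints human-readable sizes:
# directory count, file count and total bytes under /
hdfs dfs -count -h /
# size of each entry directly under /
hdfs dfs -du -h /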