Common HDFS Shell Operations

Date: 2023-09-13 18:04:59

Running hdfs dfs on its own at the command line lists every subcommand and option that can follow dfs.

For detailed usage, refer to the official documentation.

Note: in the usage text below, [] marks an optional item and <> marks a required one.

[hadoop@hadoop81 hadoop]$ hdfs dfs
Usage: hadoop fs [generic options]
	[-appendToFile <localsrc> ... <dst>]
	[-cat [-ignoreCrc] <src> ...]
	[-checksum <src> ...]
	[-chgrp [-R] GROUP PATH...]
	[-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
	[-chown [-R] [OWNER][:[GROUP]] PATH...]
	[-copyFromLocal [-f] [-p] [-l] [-d] [-t <thread count>] <localsrc> ... <dst>]
	[-copyToLocal [-f] [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
	[-count [-q] [-h] [-v] [-t [<storage type>]] [-u] [-x] [-e] <path> ...]
	[-cp [-f] [-p | -p[topax]] [-d] <src> ... <dst>]
	[-createSnapshot <snapshotDir> [<snapshotName>]]
	[-deleteSnapshot <snapshotDir> <snapshotName>]
	[-df [-h] [<path> ...]]
	[-du [-s] [-h] [-v] [-x] <path> ...]
	[-expunge]
	[-find <path> ... <expression> ...]
	[-get [-f] [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
	[-getfacl [-R] <path>]
	[-getfattr [-R] {-n name | -d} [-e en] <path>]
	[-getmerge [-nl] [-skip-empty-file] <src> <localdst>]
	[-head <file>]
	[-help [cmd ...]]
	[-ls [-C] [-d] [-h] [-q] [-R] [-t] [-S] [-r] [-u] [-e] [<path> ...]]
	[-mkdir [-p] <path> ...]
	[-moveFromLocal <localsrc> ... <dst>]
	[-moveToLocal <src> <localdst>]
	[-mv <src> ... <dst>]
	[-put [-f] [-p] [-l] [-d] <localsrc> ... <dst>]
	[-renameSnapshot <snapshotDir> <oldName> <newName>]
	[-rm [-f] [-r|-R] [-skipTrash] [-safely] <src> ...]
	[-rmdir [--ignore-fail-on-non-empty] <dir> ...]
	[-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
	[-setfattr {-n name [-v value] | -x name} <path>]
	[-setrep [-R] [-w] <rep> <path> ...]
	[-stat [format] <path> ...]
	[-tail [-f] <file>]
	[-test -[defsz] <path>]
	[-text [-ignoreCrc] <src> ...]
	[-touchz <path> ...]
	[-truncate [-w] <length> <path> ...]
	[-usage [cmd ...]]

Generic options supported are:
-conf <configuration file>        specify an application configuration file
-D <property=value>               define a value for a given property
-fs <file:///|hdfs://namenode:port> specify default filesystem URL to use, overrides 'fs.defaultFS' property from configurations.
-jt <local|resourcemanager:port>  specify a ResourceManager
-files <file1,...>                specify a comma-separated list of files to be copied to the map reduce cluster
-libjars <jar1,...>               specify a comma-separated list of jar files to be included in the classpath
-archives <archive1,...>          specify a comma-separated list of archives to be unarchived on the compute machines

The general command line syntax is:
command [genericOptions] [commandOptions]

[hadoop@hadoop81 hadoop]$
  • -ls: list information about a given path

Let's start with the ls command.

List the contents of the HDFS root directory:

[hadoop@hadoop81 ~]$ hdfs dfs -ls hdfs://cluster1/
Found 5 items
drwxr-xr-x   - hadoop supergroup          0 2023-04-04 09:38 hdfs://cluster1/hbase
drwxr-xr-x   - hadoop supergroup          0 2022-03-14 17:23 hdfs://cluster1/hdptest
drwxr-xr-x   - hadoop supergroup          0 2022-03-26 22:23 hdfs://cluster1/system
drwxrwx---   - hadoop supergroup          0 2022-03-14 11:05 hdfs://cluster1/tmp
drwxr-xr-x   - hadoop supergroup          0 2022-03-14 11:05 hdfs://cluster1/user

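In practice the hdfs://cluster1 URL prefix can be omitted: when hdfs runs, it locates its configuration through HADOOP_HOME and takes the default file system from the fs.defaultFS property in core-site.xml.

So this shorter form works too: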

[hadoop@hadoop81 hadoop]$ hdfs dfs -ls /
Found 5 items
drwxr-xr-x   - hadoop supergroup          0 2023-04-04 09:38 /hbase
drwxr-xr-x   - hadoop supergroup          0 2022-03-14 17:23 /hdptest
drwxr-xr-x   - hadoop supergroup          0 2022-03-26 22:23 /system
drwxrwx---   - hadoop supergroup          0 2022-03-14 11:05 /tmp
drwxr-xr-x   - hadoop supergroup          0 2022-03-14 11:05 /user
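To check which URL the short form resolves against on a given client, hdfs getconf -confKey fs.defaultFS prints the effective value. The helper below is a minimal sketch built on that command; the function name qualify_hdfs_path is hypothetical, and it assumes the path argument is absolute (starts with /).

```shell
#!/usr/bin/env bash
# Hypothetical helper: expand an absolute HDFS path into the fully
# qualified URI it resolves to under the client's fs.defaultFS setting.
qualify_hdfs_path() {
  local path="$1"
  local default_fs
  # hdfs getconf reads the same client configuration (core-site.xml)
  # that hdfs dfs uses, so this is the prefix "/" is resolved against.
  default_fs="$(hdfs getconf -confKey fs.defaultFS)" || return 1
  # Drop a trailing slash so the result does not contain "//".
  printf '%s%s\n' "${default_fs%/}" "$path"
}

# Usage (on the cluster above this would presumably print hdfs://cluster1/hbase):
#   qualify_hdfs_path /hbase
```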
  • -put: upload a file from the local file system

Next, upload a file to HDFS. We'll use README.txt from the Hadoop installation directory and put it directly in the HDFS root directory:

[hadoop@hadoop81 hadoop]$ ls
bin  etc  include  lib  libexec  LICENSE.txt  logs  NOTICE.txt  README.txt  sbin  share
[hadoop@hadoop81 hadoop]$ hdfs dfs -put README.txt  /

A successful upload prints nothing. Here, no output is the best outcome.

Confirm that the file was uploaded:

[hadoop@hadoop81 hadoop]$ hdfs dfs -ls  /
Found 6 items
-rw-r--r--   3 hadoop supergroup       1366 2023-09-13 16:08 /README.txt
drwxr-xr-x   - hadoop supergroup          0 2023-04-04 09:38 /hbase
drwxr-xr-x   - hadoop supergroup          0 2022-03-14 17:23 /hdptest
drwxr-xr-x   - hadoop supergroup          0 2022-03-26 22:23 /system
drwxrwx---   - hadoop supergroup          0 2022-03-14 11:05 /tmp
drwxr-xr-x   - hadoop supergroup          0 2022-03-14 11:05 /user

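The output of hdfs dfs -ls is laid out much like ll on Linux. One difference: for a file, the second column is its replication factor (3 here), while directories show - in that column.

Seeing the file in this listing confirms that the upload succeeded.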
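Because a successful -put is silent, a script may want to confirm the result explicitly. Below is a minimal sketch using -test -e (listed in the usage text above), which exits with status 0 when the path exists. The function name upload_and_verify is hypothetical, and the target path is derived on the assumption that the second argument is an existing HDFS directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: upload a local file into an HDFS directory and
# confirm it exists afterwards.
upload_and_verify() {
  local src="$1" dst_dir="$2"
  # With an existing directory as destination, -put keeps the file name.
  local target="${dst_dir%/}/$(basename "$src")"
  hdfs dfs -put "$src" "$dst_dir" || return 1
  # -test -e exits with 0 when the path exists.
  if hdfs dfs -test -e "$target"; then
    echo "uploaded: $target"
  else
    echo "missing after upload: $target" >&2
    return 1
  fi
}

# Usage: upload_and_verify README.txt /
```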

  • -cat: view the contents of an HDFS file

Once the file is uploaded, we may want to view its contents in HDFS. That's easy with cat:

[hadoop@hadoop81 hadoop]$ hdfs dfs -cat /README.txt
For the latest information about Hadoop, please visit our website at:

   http://hadoop.apache.org/core/

and our wiki, at:

   http://wiki.apache.org/hadoop/

This distribution includes cryptographic software.  The country in 
which you currently reside may have restrictions on the import, 
possession, use, and/or re-export to another country, of 
encryption software.  BEFORE using any encryption software, please 
check your country's laws, regulations and policies concerning the
import, possession, or use, and re-export of encryption software, to 
see if this is permitted.  See <http://www.wassenaar.org/> for more
information.

The U.S. Government Department of Commerce, Bureau of Industry and
Security (BIS), has classified this software as Export Commodity 
Control Number (ECCN) 5D002.C.1, which includes information security
software using or performing cryptographic functions with asymmetric
algorithms.  The form and manner of this Apache Software Foundation
distribution makes it eligible for export under the License Exception
ENC Technology Software Unrestricted (TSU) exception (see the BIS 
Export Administration Regulations, Section 740.13) for both object 
code and source code.

The following provides more details on the included cryptographic
software:
  Hadoop Core uses the SSL libraries from the Jetty project written 
by mortbay.org.
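For large files, printing everything with -cat is rarely what you want. Below is a minimal sketch that pipes -cat through the local head command to show only the first lines. The function name preview_hdfs_file and its default of 10 lines are illustrative choices. The -head subcommand in the usage list above is a built-in alternative, but it shows roughly the first kilobyte of the file rather than a fixed number of lines.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show the first N lines (default 10) of an HDFS file.
preview_hdfs_file() {
  local path="$1" lines="${2:-10}"
  # head exits after N lines; the hdfs client may then report a
  # broken-pipe write error on stderr, which is harmless here.
  hdfs dfs -cat "$path" | head -n "$lines"
}

# Usage: preview_hdfs_file /README.txt 3
```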
  • -get: download a file to the local file system

To download a file from HDFS to the local Linux file system, use get:

[hadoop@hadoop81 hadoop]$ hdfs dfs -get /README.txt
get: `README.txt': File exists

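Note: this command fails with "File exists". It asks to download README.txt from HDFS into the current directory, but the current directory already contains a file with that name. Either switch to another directory, give the downloaded file a different name, or pass -f to overwrite the existing local file: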

[hadoop@hadoop81 hadoop]$ hdfs dfs -get /README.txt README.txt.bak
[hadoop@hadoop81 hadoop]$ ls
bin  etc  include  lib  libexec  LICENSE.txt  logs  NOTICE.txt  README.txt  README.txt.bak  sbin  share
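In a script, the "File exists" error can be avoided by choosing the local name before calling -get. Below is a minimal sketch; the function name safe_get and the .bak suffix (mirroring the manual fix above) are illustrative. It only checks one level, so if README.txt.bak also exists, -get will still refuse.

```shell
#!/usr/bin/env bash
# Hypothetical helper: download an HDFS file without clobbering a local
# file, and print the local name actually used.
safe_get() {
  local src="$1"
  local dest="${2:-$(basename "$src")}"
  # Same situation as the "File exists" error above: pick another name.
  if [ -e "$dest" ]; then
    dest="${dest}.bak"
  fi
  hdfs dfs -get "$src" "$dest" && echo "$dest"
}

# Usage: safe_get /README.txt   # prints README.txt.bak if README.txt exists locally
```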
  • -mkdir [-p]: create a directory

Later on we'll maintain many files in HDFS, so we need directories to organize them.

Let's create a directory with the mkdir command:

[hadoop@hadoop81 hadoop]$ hdfs dfs -mkdir /test
[hadoop@hadoop81 hadoop]$ hdfs dfs -ls /
Found 7 items
-rw-r--r--   3 hadoop supergroup       1366 2023-09-13 16:08 /README.txt
drwxr-xr-x   - hadoop supergroup          0 2023-04-04 09:38 /hbase
drwxr-xr-x   - hadoop supergroup          0 2022-03-14 17:23 /hdptest
drwxr-xr-x   - hadoop supergroup          0 2022-03-26 22:23 /system
drwxr-xr-x   - hadoop supergroup          0 2023-09-13 16:15 /test
drwxrwx---   - hadoop supergroup          0 2022-03-14 11:05 /tmp
drwxr-xr-x   - hadoop supergroup          0 2022-03-14 11:05 /user

To create nested directories recursively, add the -p option:

[hadoop@hadoop81 hadoop]$ hdfs dfs -mkdir -p /abc/def/ghi
[hadoop@hadoop81 hadoop]$ hdfs dfs -ls /
Found 8 items
-rw-r--r--   3 hadoop supergroup       1366 2023-09-13 16:08 /README.txt
drwxr-xr-x   - hadoop supergroup          0 2023-09-13 16:16 /abc
drwxr-xr-x   - hadoop supergroup          0 2023-04-04 09:38 /hbase
drwxr-xr-x   - hadoop supergroup          0 2022-03-14 17:23 /hdptest
drwxr-xr-x   - hadoop supergroup          0 2022-03-26 22:23 /system
drwxr-xr-x   - hadoop supergroup          0 2023-09-13 16:15 /test
drwxrwx---   - hadoop supergroup          0 2022-03-14 11:05 /tmp
drwxr-xr-x   - hadoop supergroup          0 2022-03-14 11:05 /user

To list every directory recursively, add -R to ls:

[hadoop@hadoop81 hadoop]$ hdfs dfs -ls -R /
-rw-r--r--   3 hadoop supergroup       1366 2023-09-13 16:08 /README.txt
drwxr-xr-x   - hadoop supergroup          0 2023-09-13 16:16 /abc
drwxr-xr-x   - hadoop supergroup          0 2023-09-13 16:16 /abc/def
drwxr-xr-x   - hadoop supergroup          0 2023-09-13 16:16 /abc/def/ghi
drwxr-xr-x   - hadoop supergroup          0 2023-04-04 09:38 /hbase
drwxr-xr-x   - hadoop supergroup          0 2022-04-01 22:03 /hbase/.hbck
drwxr-xr-x   - hadoop supergroup          0 2023-04-04 09:57 /hbase/.tmp
drwxr-xr-x   - hadoop supergroup          0 2023-04-04 09:57 /hbase/.tmp/data
drwxr-xr-x   - hadoop supergroup          0 2023-04-04 10:59 /hbase/.tmp/data/default
-rw-r--r--   3 hadoop supergroup          0 2023-04-17 10:31 /hbase/.tmp/hbase-hbck.lock
drwxr-xr-x   - hadoop supergroup          0 2023-04-04 17:19 /hbase/MasterProcWALs
......
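The recursive listing mixes files and directories. Below is a minimal sketch that keeps only directories by checking whether the permission column starts with d. The function name list_hdfs_dirs is hypothetical, and taking the last field as the path assumes the names contain no spaces.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every directory under an HDFS path (default /).
list_hdfs_dirs() {
  # Column 1 is the permission string; directories start with "d".
  # $NF is the last field, i.e. the path (assumes no spaces in names).
  hdfs dfs -ls -R "${1:-/}" | awk '$1 ~ /^d/ {print $NF}'
}

# Usage: list_hdfs_dirs /abc
```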
  • -rm [-r]: delete a file or directory

To delete a file or directory in HDFS, use rm.

Delete a file:

[hadoop@hadoop81 hadoop]$ hdfs dfs -rm /README.txt
2023-09-13 16:20:19,669 INFO fs.TrashPolicyDefault: Moved: 'hdfs://cluster1/README.txt' to trash at: hdfs://cluster1/user/hadoop/.Trash/Current/README.txt

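As the INFO line shows, trash is enabled on this cluster, so the file was moved to the current user's .Trash directory rather than deleted immediately (adding -skipTrash bypasses the trash).

To delete a directory, note that the -r option is required: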

[hadoop@hadoop81 hadoop]$ hdfs dfs -rm /abc
rm: `/abc': Is a directory
[hadoop@hadoop81 hadoop]$ hdfs dfs -ls -R  /abc
drwxr-xr-x   - hadoop supergroup          0 2023-09-13 16:16 /abc/def
drwxr-xr-x   - hadoop supergroup          0 2023-09-13 16:16 /abc/def/ghi
[hadoop@hadoop81 hadoop]$ hdfs dfs -rm -r /abc
2023-09-13 16:23:19,476 INFO fs.TrashPolicyDefault: Moved: 'hdfs://cluster1/abc' to trash at: hdfs://cluster1/user/hadoop/.Trash/Current/abc

/abc, together with /abc/def and /abc/def/ghi, was removed recursively (moved to the trash).
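The "Is a directory" error above comes from calling -rm without -r on a directory. Below is a minimal sketch of a wrapper that checks the path type with -test -d first; the function name hdfs_remove is hypothetical. Note that -rm -r also removes plain files, so the check mainly documents intent, and with trash enabled both branches move the target to .Trash rather than deleting it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove an HDFS path, adding -r only for directories.
hdfs_remove() {
  local path="$1"
  # -test -d exits with 0 when the path is a directory.
  if hdfs dfs -test -d "$path"; then
    hdfs dfs -rm -r "$path"
  else
    hdfs dfs -rm "$path"
  fi
}

# Usage: hdfs_remove /abc
```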

Hands-on Example

Task: count the files in HDFS and report the size of each file.

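1: Count the entries in the root directory. grep / keeps only the lines that contain a path, which drops the "Found N items" header; directories are counted along with files.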

[hadoop@hadoop81 hadoop]$ hdfs dfs -ls / |grep /| wc -l
7
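The pipeline above counts every entry, directories included. To count only regular files, one option is to match lines whose permission string starts with -. Below is a minimal sketch; count_hdfs_files is a hypothetical name. For a recursive total, hdfs dfs -count prints the directory count, file count and content size for a path.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count regular files (not directories) directly
# under an HDFS directory (default /).
count_hdfs_files() {
  # Regular files have a permission string starting with "-"; the
  # "Found N items" header and directory lines ("d...") are skipped.
  hdfs dfs -ls "${1:-/}" | awk '$1 ~ /^-/ {n++} END {print n+0}'
}

# Usage: count_hdfs_files /
```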

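2: Print the size of every entry in the root directory, showing each name alongside its size. In the -ls output, column 5 is the size in bytes and column 8 is the path, hence awk '{print $8,$5}'.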

[hadoop@hadoop81 bin]$ hdfs dfs -ls / |grep / |  awk '{print $8,$5}'
/hbase 0
/hdptest 0
/mysql_secure_installation 9170549
/mysql_ssl_rsa_setup 7782694
/mysqlshow 9178434
/mysqlslap 9285880
/system 0
/test 0
/tmp 0
/user 0
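Using the same columns, awk can also accumulate a total while printing each entry. Below is a minimal sketch; the function name hdfs_sizes_with_total and the TOTAL line are illustrative. Sizes are in bytes and cover only the entries listed directly, since directories show 0 in this column. For the space used under a directory, hdfs dfs -du -s -h (also in the usage list) is the built-in option.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "path size" for each entry under an HDFS
# directory (default /), followed by a TOTAL line with the summed bytes.
hdfs_sizes_with_total() {
  # Only lines whose permission string starts with "-" or "d" are entries;
  # this skips the "Found N items" header without a separate grep.
  hdfs dfs -ls "${1:-/}" \
    | awk '$1 ~ /^[-d]/ {print $8, $5; total += $5} END {print "TOTAL", total+0}'
}

# Usage: hdfs_sizes_with_total /
```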

From: https://blog.51cto.com/u_11585528/7462840
