Complete Lab 2 of the Large Database course, "Familiarization with Common HDFS Operations". The lab content is as follows.
Lab 2
Familiarization with Common HDFS Operations
1. Lab Objectives
(1) Understand the role HDFS plays in the Hadoop architecture;
(2) Become proficient with the Shell commands commonly used to operate HDFS;
(3) Become familiar with the Java APIs commonly used to operate HDFS.
2. Lab Platform
(1) Operating system: Linux (Ubuntu 16.04 or Ubuntu 18.04 recommended);
(2) Hadoop version: 3.1.3;
(3) JDK version: 1.8;
(4) Java IDE: Eclipse.
3. Lab Procedure
(I) Write programs that implement the following functions, and complete the same tasks with the Shell commands provided by Hadoop:
(1) Upload an arbitrary text file to HDFS; if the specified file already exists in HDFS, let the user choose whether to append to the end of the existing file or to overwrite it (see the Java sketch after this list);
(2) Download a specified file from HDFS; if a local file with the same name already exists, automatically rename the downloaded file;
(3) Output the contents of a specified file in HDFS to the terminal;
(4) Display the read/write permissions, size, creation time, path, and other information of a specified file in HDFS;
(5) Given a directory in HDFS, output the read/write permissions, size, creation time, path, and other information of every file under it; if a file is itself a directory, recursively output the information of all files under that directory;
(6) Given the path of a file in HDFS, create and delete that file; if the directory containing the file does not exist, create the directory automatically;
(7) Given the path of a directory in HDFS, create and delete that directory; when creating it, automatically create any missing parent directories, and when deleting it, let the user decide whether the directory should still be deleted if it is not empty;
(8) Append content to a specified file in HDFS, letting the user choose whether the content goes at the beginning or at the end of the existing file;
(9) Delete a specified file from HDFS;
(10) Move a file in HDFS from a source path to a destination path.
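As a reference for task (1), here is a minimal Java sketch, assuming a pseudo-distributed HDFS at hdfs://localhost:9000; the class name UploadFile, the prompt wording, and the paths /home/hadoop/sample.txt and /user/sample.txt are illustrative placeholders rather than part of the assignment.

import java.io.FileInputStream;
import java.util.Scanner;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

/**
 * Task (1): upload a local text file to HDFS; if the target already exists,
 * ask the user whether to append to it or overwrite it.
 * Paths and the NameNode URI below are placeholders.
 */
public class UploadFile {
    public static void main(String[] args) throws Exception {
        String localPath = "/home/hadoop/sample.txt";   // assumed local file
        String hdfsPath  = "/user/sample.txt";          // assumed HDFS target

        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://localhost:9000");   // assumed NameNode URI
        FileSystem fs = FileSystem.get(conf);
        Path dst = new Path(hdfsPath);

        if (!fs.exists(dst)) {
            // Target does not exist yet: plain upload.
            fs.copyFromLocalFile(new Path(localPath), dst);
            System.out.println("File uploaded.");
        } else {
            System.out.println("File exists: append (a) or overwrite (o)?");
            String choice = new Scanner(System.in).nextLine().trim();
            if ("o".equals(choice)) {
                // Overwrite: delSrc = false, overwrite = true.
                fs.copyFromLocalFile(false, true, new Path(localPath), dst);
                System.out.println("File overwritten.");
            } else {
                // Append the local file's bytes to the end of the HDFS file.
                FSDataOutputStream out = fs.append(dst);
                FileInputStream in = new FileInputStream(localPath);
                IOUtils.copyBytes(in, out, 4096, true);   // true = close both streams
                System.out.println("Content appended.");
            }
        }
        fs.close();
    }
}

On a single-DataNode pseudo-distributed setup, fs.append() may additionally require dfs.client.block.write.replace-datanode-on-failure.policy to be set to NEVER in the Configuration.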
(II) Write a class "MyFSDataInputStream" that extends "org.apache.hadoop.fs.FSDataInputStream" and meets the following requirement: implement a method "readLine()" that reads a specified HDFS file line by line, returning null when the end of the file is reached and otherwise returning one line of text.
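A possible sketch of this class is shown below. Because java.io.DataInputStream.readLine() is declared final, a subclass of FSDataInputStream cannot override that exact method; the sketch therefore exposes the line-by-line logic under the assumed name readTextLine(), and the NameNode URI and file path are placeholders.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/**
 * Task (II): extends FSDataInputStream and reads an HDFS file line by line.
 * readTextLine() (an assumed name, since readLine() is final in DataInputStream)
 * returns one line of text, or null at the end of the file.
 */
public class MyFSDataInputStream extends FSDataInputStream {
    private final BufferedReader reader;

    public MyFSDataInputStream(InputStream in) {
        super(in);
        this.reader = new BufferedReader(new InputStreamReader(this));
    }

    /** Returns the next line of text, or null once the end of the file is reached. */
    public String readTextLine() throws IOException {
        return reader.readLine();
    }

    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://localhost:9000");   // assumed NameNode URI
        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/user/sample.txt");             // placeholder path
        try (MyFSDataInputStream in = new MyFSDataInputStream(fs.open(file))) {
            String line;
            while ((line = in.readTextLine()) != null) {
                System.out.println(line);
            }
        }
        fs.close();
    }
}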
(III) Consult the Java documentation or other references and use "java.net.URL" together with "org.apache.hadoop.fs.FsUrlStreamHandlerFactory" to write a program that prints the text of a specified HDFS file to the terminal.
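A minimal sketch for part (III), assuming the file sits at hdfs://localhost:9000/user/sample.txt; since URL.setURLStreamHandlerFactory may be called at most once per JVM, it is invoked from a static initializer.

import java.io.InputStream;
import java.net.URL;

import org.apache.hadoop.fs.FsUrlStreamHandlerFactory;
import org.apache.hadoop.io.IOUtils;

/**
 * Task (III): print an HDFS file to the terminal through java.net.URL.
 * The hdfs:// URL below is a placeholder.
 */
public class ReadFileByURL {
    static {
        // Teach java.net.URL how to resolve hdfs:// URLs.
        URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
    }

    public static void main(String[] args) throws Exception {
        InputStream in = null;
        try {
            in = new URL("hdfs://localhost:9000/user/sample.txt").openStream();
            IOUtils.copyBytes(in, System.out, 4096, false);   // false = keep System.out open
        } finally {
            IOUtils.closeStream(in);
        }
    }
}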
4. Lab Report
Title: Familiarization with Common HDFS Operations
Name: 李健龙
Date: 2024/12/5
Environment: Ubuntu 18.04.6 LTS, Hadoop 3.1.3
Experiment content and completion status:

3073 SecondaryNameNode
2726 NameNode
2873 DataNode
hadoop@hadoop:~/hadoop$ hadoop fs -ls /
Found 2 items
drwxr-xr-x   - hadoop supergroup          0 2024-07-19 22:06 /tmp
drwxr-xr-x   - hadoop supergroup          0 2024-11-04 17:08 /user
hadoop@hadoop:~/hadoop$ hadoop fs -put /path/to/localfile /path/in/hdfs
put: `/path/in/hdfs': No such file or directory: `hdfs://localhost:9000/path/in/hdfs'
hadoop@hadoop:~/hadoop$ hadoop fs -put /path/to/localfile /user
put: `/path/to/localfile': No such file or directory
hadoop@hadoop:~/hadoop$ cat /home/hadoop/sample.txt
cat: /home/hadoop/sample.txt: 没有那个文件或目录
hadoop@hadoop:~/hadoop$ cat /home/hadoop/sample.txt
hadoop@hadoop:~/hadoop$ echo "Hadoop HDFS 文件系统" > /home/hadoop/sample.txt
hadoop@hadoop:~/hadoop$ echo "====================" >> /home/hadoop/sample.txt
hadoop@hadoop:~/hadoop$ echo "" >> /home/hadoop/sample.txt
hadoop@hadoop:~/hadoop$ echo "1. HDFS是一个分布式文件系统,用于存储大规模数据。" >> /home/hadoop/sample.txt
hadoop@hadoop:~/hadoop$ echo "2. 它具有高容错性,并能提供高吞吐量的数据访问。" >> /home/hadoop/sample.txt
hadoop@hadoop:~/hadoop$ echo "3. HDFS的核心组件包括NameNode、DataNode和SecondaryNameNode。" >> /home/hadoop/sample.txt
hadoop@hadoop:~/hadoop$ echo "4. 文件在HDFS中以块的形式存储,块大小默认是128MB。" >> /home/hadoop/sample.txt
hadoop@hadoop:~/hadoop$ echo "5. HDFS支持数据冗余,确保数据的可靠性。" >> /home/hadoop/sample.txt
hadoop@hadoop:~/hadoop$ # 先检查文件是否存在于HDFS中
hadoop@hadoop:~/hadoop$ hadoop fs -test -e /user/sample.txt
hadoop@hadoop:~/hadoop$ if [ $? -eq 0 ]; then
>   echo "文件已存在,是否覆盖 (y/n)?"
>   read choice
>   if [ "$choice" == "y" ]; then
>     hadoop fs -put -f /home/hadoop/sample.txt /user
>     echo "文件已覆盖"
>   else
>     echo "选择追加内容"
>     cat /home/hadoop/sample.txt | hadoop fs -appendToFile - /user/sample.txt
>     echo "内容已追加到文件末尾"
>   fi
> else
>   hadoop fs -put /home/hadoop/sample.txt /user
>   echo "文件已上传"
> fi
2024-11-11 16:15:56,750 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
文件已上传
hadoop@hadoop:~/hadoop$ hadoop fs -ls /user
Found 2 items
drwxr-xr-x   - hadoop supergroup          0 2024-11-04 17:10 /user/hadoop
-rw-r--r--   1 hadoop supergroup        385 2024-11-11 16:15 /user/sample.txt
hadoop@hadoop:~/hadoop$ hadoop fs -get /user/sample.txt /home/hadoop/sample_download.txt
2024-11-11 16:17:04,040 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
hadoop@hadoop:~/hadoop$ hadoop fs -cat /user/sample.txt
2024-11-11 16:17:17,969 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
Hadoop HDFS 文件系统
====================
1. HDFS是一个分布式文件系统,用于存储大规模数据。
2. 它具有高容错性,并能提供高吞吐量的数据访问。
3. HDFS的核心组件包括NameNode、DataNode和SecondaryNameNode。
4. 文件在HDFS中以块的形式存储,块大小默认是128MB。
5. HDFS支持数据冗余,确保数据的可靠性。
hadoop@hadoop:~/hadoop$ hadoop fs -ls -l /user/sample.txt
-ls: Illegal option -l
Usage: hadoop fs [generic options]
    [-appendToFile <localsrc> ... <dst>]
    [-cat [-ignoreCrc] <src> ...]
    [-checksum <src> ...]
    [-chgrp [-R] GROUP PATH...]
    [-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
    [-chown [-R] [OWNER][:[GROUP]] PATH...]
    [-copyFromLocal [-f] [-p] [-l] [-d] [-t <thread count>] <localsrc> ... <dst>]
    [-copyToLocal [-f] [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
    [-count [-q] [-h] [-v] [-t [<storage type>]] [-u] [-x] [-e] <path> ...]
    [-cp [-f] [-p | -p[topax]] [-d] <src> ... <dst>]
    [-createSnapshot <snapshotDir> [<snapshotName>]]
    [-deleteSnapshot <snapshotDir> <snapshotName>]
    [-df [-h] [<path> ...]]
    [-du [-s] [-h] [-v] [-x] <path> ...]
    [-expunge]
    [-find <path> ... <expression> ...]
    [-get [-f] [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
    [-getfacl [-R] <path>]
    [-getfattr [-R] {-n name | -d} [-e en] <path>]
    [-getmerge [-nl] [-skip-empty-file] <src> <localdst>]
    [-head <file>]
    [-help [cmd ...]]
    [-ls [-C] [-d] [-h] [-q] [-R] [-t] [-S] [-r] [-u] [-e] [<path> ...]]
    [-mkdir [-p] <path> ...]
    [-moveFromLocal <localsrc> ... <dst>]
    [-moveToLocal <src> <localdst>]
    [-mv <src> ... <dst>]
    [-put [-f] [-p] [-l] [-d] <localsrc> ... <dst>]
    [-renameSnapshot <snapshotDir> <oldName> <newName>]
    [-rm [-f] [-r|-R] [-skipTrash] [-safely] <src> ...]
    [-rmdir [--ignore-fail-on-non-empty] <dir> ...]
    [-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
    [-setfattr {-n name [-v value] | -x name} <path>]
    [-setrep [-R] [-w] <rep> <path> ...]
    [-stat [format] <path> ...]
    [-tail [-f] [-s <sleep interval>] <file>]
    [-test -[defsz] <path>]
    [-text [-ignoreCrc] <src> ...]
    [-touch [-a] [-m] [-t TIMESTAMP ] [-c] <path> ...]
    [-touchz <path> ...]
    [-truncate [-w] <length> <path> ...]
    [-usage [cmd ...]]

Generic options supported are:
-conf <configuration file>          specify an application configuration file
-D <property=value>                 define a value for a given property
-fs <file:///|hdfs://namenode:port> specify default filesystem URL to use, overrides 'fs.defaultFS' property from configurations.
-jt <local|resourcemanager:port>    specify a ResourceManager
-files <file1,...>                  specify a comma-separated list of files to be copied to the map reduce cluster
-libjars <jar1,...>                 specify a comma-separated list of jar files to be included in the classpath
-archives <archive1,...>            specify a comma-separated list of archives to be unarchived on the compute machines

The general command line syntax is:
command [genericOptions] [commandOptions]

Usage: hadoop fs [generic options] -ls [-C] [-d] [-h] [-q] [-R] [-t] [-S] [-r] [-u] [-e] [<path> ...]
hadoop@hadoop:~/hadoop$ hadoop fs -ls -R /user
drwxr-xr-x   - hadoop supergroup          0 2024-11-04 17:10 /user/hadoop
drwxr-xr-x   - hadoop supergroup          0 2024-11-04 17:10 /user/hadoop/test
-rw-r--r--   1 hadoop supergroup       4009 2024-11-04 17:10 /user/hadoop/test/.bashrc
-rw-r--r--   1 hadoop supergroup        385 2024-11-11 16:15 /user/sample.txt
hadoop@hadoop:~/hadoop$ hadoop fs -mkdir -p /user/newdir
hadoop@hadoop:~/hadoop$ hadoop fs -put /home/hadoop/sample.txt /user/newdir/sample.txt
2024-11-11 16:17:58,136 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
hadoop@hadoop:~/hadoop$ hadoop fs -rm /user/newdir/sample.txt
Deleted /user/newdir/sample.txt
hadoop@hadoop:~/hadoop$ hadoop fs -mkdir -p /user/newdir
hadoop@hadoop:~/hadoop$ hadoop fs -rm -r /user/newdir
Deleted /user/newdir
hadoop@hadoop:~/hadoop$ # 追加到文件末尾
hadoop@hadoop:~/hadoop$ hadoop fs -appendToFile /home/hadoop/sample.txt /user/sample.txt
2024-11-11 16:18:33,211 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
hadoop@hadoop:~/hadoop$ # 如果用户指定追加到开头,HDFS本身不直接支持此操作,你需要先下载文件,修改内容后重新上传
hadoop@hadoop:~/hadoop$ hadoop fs -get /user/sample.txt /home/hadoop/sample.txt
get: `/home/hadoop/sample.txt': File exists
hadoop@hadoop:~/hadoop$ echo "新内容" | cat - /home/hadoop/sample.txt > /home/hadoop/temp.txt
hadoop@hadoop:~/hadoop$ mv /home/hadoop/temp.txt /home/hadoop/sample.txt
hadoop@hadoop:~/hadoop$ hadoop fs -put -f /home/hadoop/sample.txt /user/sample.txt
2024-11-11 16:18:41,903 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
hadoop@hadoop:~/hadoop$ hadoop fs -rm /user/sample.txt
Deleted /user/sample.txt
hadoop@hadoop:~/hadoop$ hadoop fs -mv /user/sample.txt /user/backup/sample.txt
mv: `/user/backup/sample.txt': No such file or directory: `hdfs://localhost:9000/user/backup/sample.txt'
hadoop@hadoop:~/hadoop$ hadoop fs -cat /etc/hadoop/core-site.xml
cat: `/etc/hadoop/core-site.xml': No such file or directory
hadoop@hadoop:~/hadoop$ hdfs getconf -confKey fs.defaultFS
hdfs://localhost:9000/
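As the transcript notes, HDFS only supports appending, so adding content at the head of a file has to be emulated by rewriting the file. A hedged Java sketch of task (8) along those lines follows; the class name AppendOrPrepend, the paths, and the NameNode URI are placeholders, not part of the captured session.

import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/**
 * Task (8): append user-supplied content to the end or the beginning of an
 * HDFS file. HDFS only supports appending, so "prepend" is emulated by
 * rewriting the file with the new content first and the old bytes after it.
 * Paths and the NameNode URI are placeholders.
 */
public class AppendOrPrepend {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://localhost:9000");   // assumed NameNode URI
        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/user/sample.txt");            // placeholder path
        String content = "new content\n";                    // content to add
        boolean toHead = true;                                // true = beginning, false = end

        if (!toHead) {
            // Appending to the end is supported natively.
            try (FSDataOutputStream out = fs.append(file)) {
                out.write(content.getBytes(StandardCharsets.UTF_8));
            }
        } else {
            // Prepend: read the existing bytes, then rewrite the file.
            byte[] old = new byte[(int) fs.getFileStatus(file).getLen()];
            try (FSDataInputStream in = fs.open(file)) {
                in.readFully(old);
            }
            try (FSDataOutputStream out = fs.create(file, true)) {   // overwrite = true
                out.write(content.getBytes(StandardCharsets.UTF_8));
                out.write(old);
            }
        }
        fs.close();
    }
}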
Problems encountered:

Solutions (list the problems encountered and how they were solved; list any problems that remain unsolved):