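- Symptom:
- The Databus real-time sync job fails
- Error:
- Conclusion:
- The HDFS directory has exceeded the maximum number of items a single directory may hold; the default limit (dfs.namenode.fs-limits.max-directory-items) is 1048576
- Directory statistics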
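# Count the entries in the directory (the output of -ls starts with a "Found N items" header line, so the result is the entry count plus 1)
HADOOP_CLIENT_OPTS="-Xmx4096m" hdfs dfs -ls -h /databus_online_class/class/class_stock_relation | wc -l
# Show the last 10 entries of the listing, i.e. the newest files
HADOOP_CLIENT_OPTS="-Xmx4096m" hdfs dfs -ls -h /databus_online_class/class/class_stock_relation | tail -10
# Search the Ranger audit logs for accesses to the directory
HADOOP_CLIENT_OPTS="-Xmx4096m" hdfs dfs -text /ranger/audit/hdfs/202305*/* | grep '/databus_online_class/class/class_stock_relation'
# Delete files in the directory, bypassing the trash (the glob is quoted so that HDFS expands it, not the local shell)
HADOOP_CLIENT_OPTS="-Xmx4096m" hdfs dfs -rm -skipTrash '/databus_online_class/class/class_stock_relation/2020*'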
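Listing a directory with around a million entries is slow and needs a large client heap. As a lighter alternative, `hdfs dfs -count` prints DIR_COUNT, FILE_COUNT, CONTENT_SIZE and PATHNAME for each path without listing every file. The sketch below is only an illustration and not part of the original procedure: `report_usage` is a hypothetical helper, and it assumes the directories are flat (files only), so that FILE_COUNT equals the number of direct children counted against the limit.

```shell
#!/bin/bash
# Default of dfs.namenode.fs-limits.max-directory-items; override LIMIT if the cluster differs.
LIMIT=${LIMIT:-1048576}

# Hypothetical helper: reads `hdfs dfs -count` lines
# (DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME) on stdin and prints
# "<path> <file count> <percent of limit>" for each line.
# Assumption: flat directories, so FILE_COUNT is the number of direct children.
report_usage() {
  awk -v limit="$LIMIT" '{ printf "%s %d %.1f%%\n", $4, $2, 100 * $2 / limit }'
}

# Example (requires an HDFS client):
#   hdfs dfs -count '/databus_online_class/class/*' | report_usage
```

The limit actually in effect on a cluster can be read with `hdfs getconf -confKey dfs.namenode.fs-limits.max-directory-items`.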
- /databus_online_class/class/class_stock_relation
- count: 926133
- tail: 2022121504-20221215041000-9.gz
- last visit: no access in the past week (audit logs older than one week are missing, so earlier access cannot be confirmed)
- /databus_online_class/class/flow_compensate_operate
- count: 1048577 (full)
- tail: 2023020311-20230205194000-0.gz.tmp
- last visit: write requests present, but failing; no read requests
- /databus_online_class/class/learning_progress
- count: 1036229
- tail: 2023051214-20230516011000-0.gz.tmp
- last visit: write requests present and succeeding; no read requests
- /databus_online_class/class/online_class
- count: 1048577 (full)
- tail: 2022121506-20221215070000-0.gz
- last visit: write requests present, but failing; no read requests
- /databus_online_class/class/online_class_extend
- count: 970881
- tail: 2022121506-20221215070000-0.gz
- last visit: no access in the past week (audit logs older than one week are missing, so earlier access cannot be confirmed)
- /databus_online_class/class/online_class_student
- count: 983171
- tail: 2022121506-20221215070000-0.gz
- last visit: no access in the past week (audit logs older than one week are missing, so earlier access cannot be confirmed)
- /databus_online_class/class/order_compensate_operate
- count: 7128
- tail: 2021080720-20210807204000-7.gz
- last visit: no access in the past week (audit logs older than one week are missing, so earlier access cannot be confirmed)
- /databus_online_class/class/require_class
- count: 614347
- tail: 2022120908-20221209084000-0.gz
- last visit: no access in the past week (audit logs older than one week are missing, so earlier access cannot be confirmed)
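The file names in the statistics above (for example 2022121504-20221215041000-9.gz) appear to begin with a timestamp whose first four characters are the year. Assuming that naming holds for every file, a per-year breakdown shows how many entries each year contributes. `count_by_year` below is a hypothetical helper, shown as a rough sketch only; it reads ordinary `hdfs dfs -ls` output from stdin.

```shell
#!/bin/bash
# Hypothetical helper: reads `hdfs dfs -ls` output on stdin and prints
# "<year> <file count>" per line, sorted by year.
# Assumption: every file name starts with a four-digit year.
count_by_year() {
  awk '$1 != "Found" && NF > 0 {
         n = split($NF, parts, "/")        # the last field is the full path
         count[substr(parts[n], 1, 4)]++   # first 4 characters of the file name
       }
       END { for (y in count) print y, count[y] }' | sort
}

# Example (requires an HDFS client):
#   HADOOP_CLIENT_OPTS="-Xmx4096m" hdfs dfs -ls /databus_online_class/class/require_class | count_by_year
```

The path is taken from the last field rather than a fixed column, so the sketch does not depend on how the size column is formatted.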
- Solution
- Archive the old files with hadoop archive
#!/bin/bash
# For each directory and year: copy that year's files to a staging directory with distcp,
# then pack the staged files into one HAR file per year.
year_arr=(2019 2020 2021 2022)
dir_arr=(flow_compensate_operate online_class_extend online_class_student order_compensate_operate require_class)
source_dir=/databus_online_class/class
tmp_dir=/tmp/backup
for dir in "${dir_arr[@]}"; do
  for year in "${year_arr[@]}"; do
    # Create the staging directory for this directory/year pair.
    echo "hdfs dfs -mkdir -p $tmp_dir/$dir/$year"
    hdfs dfs -mkdir -p "$tmp_dir/$dir/$year"
    # Copy every file whose name starts with the year; the glob is quoted so that HDFS expands it, not the local shell.
    echo "HADOOP_CLIENT_OPTS=\"-Xmx20480m\" hadoop distcp -m 400 $source_dir/$dir/$year* $tmp_dir/$dir/$year/"
    HADOOP_CLIENT_OPTS="-Xmx20480m" hadoop distcp -m 400 "$source_dir/$dir/$year*" "$tmp_dir/$dir/$year/"
    # Pack the staged files into <year>_history.har under $tmp_dir/$dir.
    echo "HADOOP_CLIENT_OPTS=\"-Xmx8192m\" hadoop archive -archiveName ${year}_history.har -p $tmp_dir/$dir/$year $tmp_dir/$dir"
    HADOOP_CLIENT_OPTS="-Xmx8192m" hadoop archive -archiveName "${year}_history.har" -p "$tmp_dir/$dir/$year" "$tmp_dir/$dir"
    echo -----------
    sleep 60s
  done
done
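The script above copies and archives the files but does not delete the originals, so the item count of the source directories stays the same until they are removed. The original notes do not show that step. The sketch below is therefore an assumption about a plausible follow-up, not the author's procedure. `print_followup` is a hypothetical helper that only prints the commands, so they can be reviewed before anything is deleted.

```shell
#!/bin/bash
source_dir=/databus_online_class/class
tmp_dir=/tmp/backup

# Hypothetical helper: print (do not run) the follow-up commands for one directory/year pair.
print_followup() {
  local dir=$1 year=$2
  local har="har://$tmp_dir/$dir/${year}_history.har"
  # 1. List the archive through the har:// filesystem to confirm it is readable.
  echo "hdfs dfs -ls $har | head"
  # 2. Once the archive is verified, drop the staging copy made by distcp.
  echo "hdfs dfs -rm -r -skipTrash $tmp_dir/$dir/$year"
  # 3. Remove the archived originals, bypassing the trash; this is what frees directory items.
  echo "hdfs dfs -rm -skipTrash '$source_dir/$dir/$year*'"
}

print_followup require_class 2019
```

Printing instead of executing is a deliberate choice here: with -skipTrash a deletion cannot be undone, so each command is best run by hand after the archive has been checked.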
From: https://blog.51cto.com/u_11701690/6600600