Table of Contents
- 1. Extract the example shipped with Oozie
- 2. Edit the files
- 3. Copy the jar to run into the map-reduce lib directory
- 4. Upload the configured app directory to HDFS
- 5. Run the job
1. Extract the example shipped with Oozie
The oozie-apps directory can be created manually if it does not exist yet.
[hadoop@hadoop201 oozie-4.0.0-cdh5.3.6]$ tar -zxvf oozie-examples.tar.gz -C ./
[hadoop@hadoop201 oozie-4.0.0-cdh5.3.6]$ cp -r examples/apps/map-reduce/ oozie-apps/
2. Edit the files
job.properties
nameNode=hdfs://hadoop201:8020
jobTracker=hadoop202:8021
queueName=default
examplesRoot=oozie-apps
#hdfs://hadoop201:8020/user/hadoop/oozie-apps/map-reduce/workflow.xml
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/map-reduce/workflow.xml
outputDir=map-reduce
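Oozie resolves the `${...}` references in job.properties at submission time. A minimal shell sketch of how the application path expands, using the values from the file above (`user.name` is assumed to resolve to `hadoop`):

```shell
# Values taken from job.properties above; user.name is assumed to be hadoop.
nameNode=hdfs://hadoop201:8020
examplesRoot=oozie-apps
user_name=hadoop

# Expansion of oozie.wf.application.path:
app_path="${nameNode}/user/${user_name}/${examplesRoot}/map-reduce/workflow.xml"
echo "$app_path"
# → hdfs://hadoop201:8020/user/hadoop/oozie-apps/map-reduce/workflow.xml
```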
Find a jar containing a runnable MapReduce job (here, the official Hadoop wordcount example is used):
1. Create an input file
[hadoop@hadoop201 map-reduce]$ vim word.txt
2. Add a few words to it
3. Upload it to HDFS
[hadoop@hadoop201 map-reduce]$ /opt/modules/cdh/hadoop-2.5.0-cdh5.3.6/bin/hdfs dfs -put /opt/modules/cdh/oozie-4.0.0-cdh5.3.6/oozie-apps/map-reduce/word.txt /
4. Run the official wordcount example
[hadoop@hadoop201 hadoop-2.5.0-cdh5.3.6]$ pwd
/opt/modules/cdh/hadoop-2.5.0-cdh5.3.6
[hadoop@hadoop201 hadoop-2.5.0-cdh5.3.6]$ bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0-cdh5.3.6.jar wordcount /word.txt /output
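To see what the wordcount job computes without touching the cluster, here is a local sketch using standard Unix tools (the sample file contents are made up):

```shell
# Sample input standing in for word.txt (contents are made up):
printf 'hello oozie\nhello hadoop\n' > /tmp/word.txt

# Same computation as the wordcount MR job: split lines into words,
# then count occurrences per word.
tr -s ' ' '\n' < /tmp/word.txt | sort | uniq -c | sort -rn
```

The mapper step corresponds to `tr` (emit one word per line), the shuffle to `sort`, and the reducer to `uniq -c` (sum counts per key).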
5. Get the fully-qualified mapper and reducer class names from the JobHistory server
workflow.xml
<workflow-app xmlns="uri:oozie:workflow:0.2" name="map-reduce-wf">
<start to="mr-node"/>
<action name="mr-node">
<map-reduce>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<prepare>
<delete path="${nameNode}/output1/"/>
</prepare>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
<!-- Use the new MapReduce API when scheduling the MR job -->
<property>
<name>mapred.mapper.new-api</name>
<value>true</value>
</property>
<property>
<name>mapred.reducer.new-api</name>
<value>true</value>
</property>
<!-- Output key class of the job -->
<property>
<name>mapreduce.job.output.key.class</name>
<value>org.apache.hadoop.io.Text</value>
</property>
<!-- Output value class of the job -->
<property>
<name>mapreduce.job.output.value.class</name>
<value>org.apache.hadoop.io.IntWritable</value>
</property>
<!-- HDFS path of the input file to process -->
<property>
<name>mapred.input.dir</name>
<value>/word.txt</value>
</property>
<!-- HDFS path for the job output -->
<property>
<name>mapred.output.dir</name>
<value>/output1/</value>
</property>
<!-- Mapper class (the fully-qualified name obtained in step 5 above) -->
<property>
<name>mapreduce.job.map.class</name>
<value>org.apache.hadoop.examples.WordCount$TokenizerMapper</value>
</property>
<!-- Reducer class (the fully-qualified name obtained in step 5 above) -->
<property>
<name>mapreduce.job.reduce.class</name>
<value>org.apache.hadoop.examples.WordCount$IntSumReducer</value>
</property>
<property>
<name>mapred.map.tasks</name>
<value>1</value>
</property>
</configuration>
</map-reduce>
<ok to="end"/>
<error to="fail"/>
</action>
<kill name="fail">
<message>Map/Reduce failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>
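A workflow.xml with many `<property>` blocks is easy to break while copy-pasting (e.g. an unbalanced tag), and Oozie will only report the error at submission time. A quick well-formedness check before uploading catches this early; here is a sketch using Python's standard library on a minimal inline sample (run the same one-liner against the real `oozie-apps/map-reduce/workflow.xml`):

```shell
# Minimal sample workflow; substitute the real file path in the check below.
cat > /tmp/wf-sample.xml <<'EOF'
<workflow-app xmlns="uri:oozie:workflow:0.2" name="map-reduce-wf">
  <start to="end"/>
  <end name="end"/>
</workflow-app>
EOF

# Parsing fails with a nonzero exit status if the XML is not well-formed.
python3 -c "import xml.dom.minidom; xml.dom.minidom.parse('/tmp/wf-sample.xml'); print('well-formed')"
```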
3. Copy the jar to run into the map-reduce lib directory
[hadoop@hadoop201 lib]$ cp /opt/modules/cdh/hadoop-2.5.0-cdh5.3.6/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0-cdh5.3.6.jar /opt/modules/cdh/oozie-4.0.0-cdh5.3.6/oozie-apps/map-reduce/lib/
4. Upload the configured app directory to HDFS
[hadoop@hadoop201 oozie-4.0.0-cdh5.3.6]$ /opt/modules/cdh/hadoop-2.5.0-cdh5.3.6/bin/hdfs dfs -put oozie-apps/map-reduce/ /user/hadoop/oozie-apps/
5. Run the job
[hadoop@hadoop201 oozie-4.0.0-cdh5.3.6]$ bin/oozie job -oozie http://hadoop201:11000/oozie -config oozie-apps/map-reduce/job.properties -run
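On a successful submission, the Oozie CLI prints the new workflow job id on a line starting with `job:`; that id is what follow-up commands such as `oozie job -info <id>` take. A sketch of extracting it (the id below is made up, not real output):

```shell
# Example of the line printed by a successful -run; the id is made up.
line='job: 0000000-250101000000000-oozie-hado-W'

# Strip the "job: " prefix to keep only the id for later commands.
job_id=${line#job: }
echo "$job_id"
# → 0000000-250101000000000-oozie-hado-W
```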