1.Why use LZO
See http://blog.cloudera.com/blog/2009/11/hadoop-at-twitter-part-1-splittable-lzo-compression/
There are also plenty of Chinese-language articles on the topic; a quick search will turn them up.
2.Installation (verified only on Linux CentOS 5.7)
First build the LZO library that hadoop-lzo is compiled against:
curl -O http://www.oberhumer.com/opensource/lzo/download/lzo-2.06.tar.gz
tar zxvf lzo-2.06.tar.gz
cd lzo-2.06
./configure --enable-shared
make
make install
#64-bit
cp /usr/local/lib/liblzo2* /usr/lib64/
#32-bit
cp /usr/local/lib/liblzo2* /usr/lib/
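The two copy commands above differ only in the target directory. The following is a minimal sketch that picks the directory from the machine architecture. The helper name `lzo_libdir` is hypothetical, and the sketch assumes that only `x86_64` uses `/usr/lib64` and every other architecture uses `/usr/lib`, as on CentOS 5.

```shell
# Hypothetical helper: print the system library directory for an architecture.
# Assumption: x86_64 -> /usr/lib64, anything else -> /usr/lib (CentOS 5 layout).
lzo_libdir() {
  local arch="${1:-$(uname -m)}"   # default to the current machine
  case "$arch" in
    x86_64) echo /usr/lib64 ;;
    *)      echo /usr/lib ;;
  esac
}
```

Usage (as root): `cp /usr/local/lib/liblzo2* "$(lzo_libdir)"/ && ldconfig`. Running `ldconfig` afterwards refreshes the dynamic linker cache so the copied libraries are picked up.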
If that causes problems, install the RPM packages instead (x86_64 builds):
wget http://apt.sw.be/redhat/el5/en/x86_64/rpmforge/RPMS/lzo-devel-2.06-1.el5.rf.x86_64.rpm
wget http://apt.sw.be/redhat/el5/en/x86_64/rpmforge/RPMS/lzo-2.06-1.el5.rf.x86_64.rpm
rpm -ivh lzo-2.06-1.el5.rf.x86_64.rpm
rpm -ivh lzo-devel-2.06-1.el5.rf.x86_64.rpm
3.Install hadoop-lzo
#Source: https://github.com/twitter/hadoop-lzo/
Some sites point to https://github.com/kevinweil/hadoop-lzo instead; that repository is an older version.
wget https://github.com/twitter/hadoop-lzo/archive/master.zip
unzip master.zip
cd hadoop-lzo-master
#Update pom.xml in hadoop-lzo so that hadoop.current.version matches your Hadoop version (2.2.0 here)
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<hadoop.current.version>2.2.0</hadoop.current.version>
<hadoop.old.version>1.0.4</hadoop.old.version>
</properties>
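If you would rather not edit pom.xml by hand, a sed one-liner can set the property in place. This is a minimal sketch assuming GNU sed (the default on CentOS). The helper `set_pom_property` is hypothetical, and it assumes the value contains no `|` or `&` characters.

```shell
# Hypothetical helper: replace <NAME>old</NAME> with <NAME>VALUE</NAME> in a pom file.
# Assumes GNU sed (-i edits in place) and a VALUE without '|' or '&'.
set_pom_property() {
  local pom="$1" name="$2" value="$3"
  sed -i "s|<$name>[^<]*</$name>|<$name>$value</$name>|" "$pom"
}
```

Usage, from the hadoop-lzo directory: `set_pom_property pom.xml hadoop.current.version 2.2.0`.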
export CFLAGS=-m64
export CXXFLAGS=-m64
mvn clean package -Dmaven.test.skip=true
cd target/native/Linux-amd64-64
tar -cBf - -C lib . | tar -xBvf - -C ./
cp ./libgplcompression* /opt/modules/hadoop/lib/native/
cd ../../..
cp target/hadoop-lzo-0.4.20-SNAPSHOT.jar /opt/modules/hadoop/share/hadoop/common/
(This step is important: when the jar was copied to hadoop/lib instead, it could not be found on my setup.)
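The copy steps above hard-code the native directory name (Linux-amd64-64) and the jar version, and they depend on the current directory. Below is a minimal sketch of a helper that avoids all three. The helper `install_hadoop_lzo` is hypothetical and assumes the Hadoop 2.x layout used above (`lib/native` and `share/hadoop/common`). `cp -a` plays the role of the original tar pipe: it preserves the `libgplcompression.so` symlinks.

```shell
# Hypothetical helper: copy hadoop-lzo build output into a Hadoop 2.x install.
# $1 = hadoop-lzo source dir (after `mvn package`), $2 = Hadoop install dir.
install_hadoop_lzo() {
  local src="$1" hadoop_home="$2" native jar found=0
  # Native dir name is <os>-<arch>-<bits>, e.g. Linux-amd64-64; take the first match.
  native=$(ls -d "$src"/target/native/Linux-*/lib 2>/dev/null | head -n 1)
  [ -n "$native" ] || { echo "no native libs under $src/target/native" >&2; return 1; }
  mkdir -p "$hadoop_home/lib/native" "$hadoop_home/share/hadoop/common"
  # -a keeps symlinks such as libgplcompression.so -> libgplcompression.so.0.0.0
  cp -a "$native"/libgplcompression* "$hadoop_home/lib/native/" || return 1
  # Copy the main jar only; skip -sources/-javadoc jars if the build made them.
  for jar in "$src"/target/hadoop-lzo-*.jar; do
    [ -e "$jar" ] || continue
    case "$jar" in *-sources.jar|*-javadoc.jar) continue ;; esac
    cp "$jar" "$hadoop_home/share/hadoop/common/" && found=1
  done
  [ "$found" = 1 ] || { echo "no hadoop-lzo jar under $src/target" >&2; return 1; }
}
```

Usage: `install_hadoop_lzo ~/hadoop-lzo-master /opt/modules/hadoop` (the source path is an example).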
4.Update the configuration
core-site.xml
<configuration>
<property>
<name>io.compression.codecs</name>
<value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec,org.apache.hadoop.io.compress.BZip2Codec</value>
</property>
<property>
<name>io.compression.codec.lzo.class</name>
<value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
</configuration>
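A typo in the long codec list is easy to miss. The following is a minimal sketch that checks a comma-separated codec list for both LZO codecs (LzopCodec is the one that handles `.lzo` files written by lzop). The helper `has_lzo_codecs` is hypothetical. The usage below assumes your Hadoop version supports `hdfs getconf -confKey`; if it does not, pass the value copied from core-site.xml instead.

```shell
# Hypothetical helper: succeed only if the io.compression.codecs value
# contains both com.hadoop.compression.lzo.LzoCodec and ...LzopCodec.
has_lzo_codecs() {
  local list
  list=$(printf '%s' "$1" | tr -d ' \t\n')   # ignore whitespace/newlines in the XML value
  case ",$list," in *,com.hadoop.compression.lzo.LzoCodec,*) ;; *) return 1 ;; esac
  case ",$list," in *,com.hadoop.compression.lzo.LzopCodec,*) ;; *) return 1 ;; esac
}
```

Usage: `has_lzo_codecs "$(hdfs getconf -confKey io.compression.codecs)" && echo "LZO codecs configured"`.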
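mapred-site.xml (optional: LZO-compress intermediate map output; in Hadoop 2 these keys are deprecated aliases of mapreduce.map.output.compress and mapreduce.map.output.compress.codec)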
<property>
<name>mapred.compress.map.output</name>
<value>true</value>
</property>
<property>
<name>mapred.map.output.compression.codec</name>
<value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
5.Processing LZO files: build an index (an .lzo file can only be split across map tasks once it has been indexed)
hadoop jar /path/to/hadoop-lzo.jar com.hadoop.compression.lzo.LzoIndexer /lzo_logs
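Files added to a directory after indexing stay unsplittable until they are indexed too. Below is a minimal sketch that lists such files from `hadoop fs -ls -R` output. It assumes the indexer writes `<file>.lzo.index` next to each `<file>.lzo` and that paths contain no spaces (the path is taken from the last column). The helper `lzo_unindexed` is hypothetical.

```shell
# Hypothetical helper: read `hadoop fs -ls -R` output on stdin and print every
# .lzo path that has no matching .lzo.index entry (order is not guaranteed).
lzo_unindexed() {
  awk '{ p = $NF }
       p ~ /\.lzo$/        { lzo[p] = 1 }
       p ~ /\.lzo\.index$/ { idx[substr(p, 1, length(p) - 6)] = 1 }  # strip ".index"
       END { for (f in lzo) if (!(f in idx)) print f }'
}
```

Usage: `hadoop fs -ls -R /lzo_logs | lzo_unindexed`. Each path it prints can then be passed to the LzoIndexer command above in place of the directory.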
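6.Changes to MapReduce jobs
For ordinary jar jobs written against the new API, just use LzoTextInputFormat (com.hadoop.mapreduce.LzoTextInputFormat) in place of TextInputFormat.
For Hadoop Streaming jobs, add the parameter -inputformat com.hadoop.mapred.DeprecatedLzoTextInputFormat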
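Below is a minimal sketch of a Streaming wrapper that always adds the LZO input format. The wrapper `lzo_streaming` and the variables `STREAMING_JAR` and `HADOOP` are hypothetical. `HADOOP` defaults to `hadoop` and can be set to `echo` for a dry run. On Hadoop 2.2.0 the streaming jar is usually under `$HADOOP_HOME/share/hadoop/tools/lib/`, but that path is an assumption, so check your install. `-inputformat` is appended last so that generic options such as `-D` can still come first, as Streaming requires.

```shell
# Hypothetical wrapper: run a Hadoop Streaming job over indexed LZO input.
# STREAMING_JAR must point at the hadoop-streaming jar; HADOOP defaults to `hadoop`.
lzo_streaming() {
  if [ -z "${STREAMING_JAR:-}" ]; then
    echo "set STREAMING_JAR to the hadoop-streaming jar" >&2
    return 1
  fi
  # Caller's args first (generic -D options, then -input/-output/-mapper...),
  # then the LZO-aware input format.
  "${HADOOP:-hadoop}" jar "$STREAMING_JAR" "$@" \
    -inputformat com.hadoop.mapred.DeprecatedLzoTextInputFormat
}
```

Usage: `STREAMING_JAR=/path/to/hadoop-streaming.jar lzo_streaming -input /lzo_logs -output /lzo_out -mapper cat -reducer cat`.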
7.Everything else: see the README in the source repository:
https://github.com/twitter/hadoop-lzo/
8.Local support for LZO files
#Install lzop-1.03.tar.gz (the lzop command-line tool)
tar zxvf lzop-1.03.tar.gz
cd lzop-1.03
./configure
make
make install