
Importing XLS Data into HBase



This is based on one of the HBase examples, but the stock example has a few problems and needs some changes.
To me, HBase is essentially a BigTable, and a BigTable really does look a lot like an XLS spreadsheet. Enough chatter; the code is what matters, so here it is.


Java code


import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
import org.apache.log4j.Logger;

/**
 * Sample Uploader MapReduce
 * <p>
 * This is EXAMPLE code.  You will need to change it to work for your context.
 * <p>
 * Uses {@link TableReducer} to put the data into HBase. Change the InputFormat
 * to suit your data.  In this example, we are importing a CSV file.
 * <p>
 * <pre>row,family,qualifier,value</pre>
 * <p>
 * The table and columnfamily we're to insert into must preexist.
 * <p>
 * There is no reducer in this example as it is not necessary and adds
 * significant overhead.  If you need to do any massaging of data before
 * inserting into HBase, you can do this in the map as well.
 * <p>Do the following to start the MR job:
 * <pre>
 * ./bin/hadoop org.apache.hadoop.hbase.mapreduce.SampleUploader /tmp/input.csv TABLE_NAME
 * </pre>
 * <p>
 * This code was written against HBase 0.21 trunk.
 */
public class SampleUploader {

  public static Logger loger = Logger.getLogger(SampleUploader.class);

  private static final String NAME = "SampleUploader";

  static class Uploader
  extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {

    private long checkpoint = 100;
    private long count = 0;

    @Override
    public void map(LongWritable key, Text line, Context context)
    throws IOException {

      // Input is a CSV file
      // Each map() is a single line, where the key is the line number
      // Each line is comma-delimited; row,family,qualifier,value

      // Split CSV line
      String [] values = line.toString().split(",");
      if(values.length != 4) {
        return;
      }

      // Extract each value
      byte [] row = Bytes.toBytes(values[0]);
      byte [] family = Bytes.toBytes(values[1]);
      byte [] qualifier = Bytes.toBytes(values[2]);
      byte [] value = Bytes.toBytes(values[3]);
      loger.info(values[0]+":"+values[1]+":"+values[2]+":"+values[3]);

      // Create Put
      Put put = new Put(row);
      put.add(family, qualifier, value);

      // Uncomment below to disable WAL. This will improve performance but means
      // you will experience data loss in the case of a RegionServer crash.
      // put.setWriteToWAL(false);

      try {
        context.write(new ImmutableBytesWritable(row), put);
      } catch (InterruptedException e) {
        e.printStackTrace();
        loger.error("write到hbase 异常:",e);
      }

      // Set status every checkpoint lines
      if(++count % checkpoint == 0) {
        context.setStatus("Emitting Put " + count);
      }
    }
  }

  /**
   * Job configuration.
   */
  public static Job configureJob(Configuration conf, String [] args)
  throws IOException {
    Path inputPath = new Path(args[0]);
    String tableName = args[1];
    Job job = new Job(conf, NAME + "_" + tableName);
    job.setJarByClass(Uploader.class);
    FileInputFormat.setInputPaths(job, inputPath);
    job.setInputFormatClass(TextInputFormat.class);
    job.setMapperClass(Uploader.class);
    // No reducers.  Just write straight to table.  Call initTableReducerJob
    // because it sets up the TableOutputFormat.
    loger.error("TableName:"+tableName);
    TableMapReduceUtil.initTableReducerJob(tableName, null, job);
    job.setNumReduceTasks(0);
    return job;
  }

  /**
   * Main entry point.
   *
   * @param args  The command line parameters.
   * @throws Exception When running the job fails.
   */
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if(otherArgs.length != 2) {
      System.err.println("Wrong number of arguments: " + otherArgs.length);
      System.err.println("Usage: " + NAME + " <input> <tablename>");
      System.exit(-1);
    }
    Job job = configureJob(conf, otherArgs);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

I won't go over the MapReduce input/output types here; if they are new to you, have a look at the Hadoop column.
[Launching this job differs a little from the previous IndexBuilder example; see that post for the details. What they have in common: both are map-only jobs.]
The spreadsheet content is as follows:


CSV data

key3,family1,column1,xls1
key3,family1,column2,xls11
key4,family1,column1,xls2
key4,family1,column2,xls12

This is CSV format; an XLS file can be exported to CSV from the spreadsheet program (you can google the details), or converted in code as sketched below.
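If you would rather do the conversion programmatically, the following is a minimal sketch, assuming Apache POI is on the classpath; the file names input.xls and input.csv are placeholders, and the class name XlsToCsv is made up for this example.

Java code

import java.io.File;
import java.io.FileWriter;

import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.DataFormatter;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.usermodel.WorkbookFactory;

public class XlsToCsv {
  public static void main(String[] args) throws Exception {
    // WorkbookFactory detects the format, so this handles .xls as well as .xlsx
    Workbook wb = WorkbookFactory.create(new File("input.xls"));
    DataFormatter fmt = new DataFormatter();   // renders each cell as the text a user would see
    Sheet sheet = wb.getSheetAt(0);
    FileWriter out = new FileWriter("input.csv");
    for (Row row : sheet) {
      StringBuilder line = new StringBuilder();
      for (Cell cell : row) {                  // note: the iterator skips cells that were never set
        if (line.length() > 0) {
          line.append(',');
        }
        line.append(fmt.formatCellValue(cell));
      }
      out.write(line.append('\n').toString());
    }
    out.close();
    wb.close();
  }
}

Keep in mind that the mapper above splits on ',' naively, so cell values containing commas would need proper CSV escaping before being fed to the job.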
Run the job with the following command:


Shell command

bin/hadoop jar SampleUploader.jar SampleUploader /tmp/input.csv 'table1'

The 'table1' here is the table created in the previous IndexBuilder post; I'm simply reusing that table [lazy].
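If you are not following on from that post, remember that the table and column family must already exist (see the javadoc note in the code above); in the hbase shell that would be something like create 'table1', 'family1'.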
Note that the input file must be uploaded to HDFS first, otherwise the job will complain that it cannot find the file, because MapReduce reads its input from the HDFS file system.
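Uploading from the command line looks something like bin/hadoop fs -put input.csv /tmp/input.csv, which copies the local file to the HDFS path used in the run command above.

Once the job finishes, a quick read-back is an easy sanity check. The sketch below uses the same old-style client API as the uploader; the table name 'table1', the row key 'key3' and the column family1:column1 come from the sample data above, and the class name VerifyUpload is made up for this example.

Java code

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class VerifyUpload {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "table1");   // the table from the IndexBuilder post
    Get get = new Get(Bytes.toBytes("key3"));    // one of the row keys in input.csv
    Result result = table.get(get);
    byte[] value = result.getValue(Bytes.toBytes("family1"), Bytes.toBytes("column1"));
    System.out.println("family1:column1 = " + Bytes.toString(value));  // should print xls1
    table.close();
  }
}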


From: https://blog.51cto.com/u_16255870/7548764
