
Gora – Big Data Persistence

Date: 2023-08-04 17:01:51


The gora-demo project is hosted on GitHub.

 

wget http://mirrors.cnnic.cn/apache/gora/0.3/apache-gora-0.3-src.zip

unzip apache-gora-0.3-src.zip

cd apache-gora-0.3

mvn clean package

1. Create the project

mvn archetype:create -DgroupId=org.apdplat.demo.gora -DartifactId=gora-demo

2. Add dependencies

vi gora-demo/pom.xml

Add the following inside the <dependencies> element:

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-core</artifactId>
    <version>1.2.1</version>
</dependency>
<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase</artifactId>
    <version>0.94.12</version>
</dependency>
<dependency>
    <groupId>org.apache.gora</groupId>
    <artifactId>gora-core</artifactId>
    <version>0.3</version>
    <exclusions>
        <exclusion>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-core</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.apache.cxf</groupId>
            <artifactId>cxf-rt-frontend-jaxrs</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>org.apache.gora</groupId>
    <artifactId>gora-hbase</artifactId>
    <version>0.3</version>
    <exclusions>
        <exclusion>
            <groupId>org.apache.hbase</groupId>
            <artifactId>hbase</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-test</artifactId>
        </exclusion>
    </exclusions>
</dependency>

3. Data modeling

mkdir -p gora-demo/src/main/avro

vi gora-demo/src/main/avro/person.json

Contents:

{
    "type": "record",
    "name": "Person",
    "namespace": "org.apdplat.demo.gora.generated",
    "fields": [
        {"name": "idcard", "type": "string"},
        {"name": "name", "type": "string"},
        {"name": "age", "type": "string"}
    ]
}

4. Generate the Java model class

bin/gora goracompiler gora-demo/src/main/avro/person.json gora-demo/src/main/java/
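The compiler writes Person.java into the org.apdplat.demo.gora.generated package. The generated class is Avro-specific and considerably larger, but its accessor surface looks roughly like the following simplified sketch (hypothetical: plain String fields instead of Avro's Utf8, and none of the Gora persistence plumbing):

```java
// Hypothetical, simplified sketch of the accessor surface that
// goracompiler generates from person.json. The real generated class
// extends Gora's persistence base class and stores the fields as
// Avro Utf8 values rather than plain String.
class Person {
    private String idcard;
    private String name;
    private String age;

    public String getIdcard() { return idcard; }
    public void setIdcard(String idcard) { this.idcard = idcard; }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    public String getAge() { return age; }
    public void setAge(String age) { this.age = age; }
}
```

The code in step 8 only relies on this getter/setter surface, plus the Utf8 wrapper type.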

5. Model mapping

mkdir -p gora-demo/src/main/resources/

vi gora-demo/src/main/resources/gora-hbase-mapping.xml

Contents:

<gora-orm>
    <table name="Person">
        <family name="basic"/>
        <family name="detail"/>
    </table>
    <class table="Person" name="org.apdplat.demo.gora.generated.Person" keyClass="java.lang.String">
        <field name="idcard" family="basic" qualifier="idcard"/>
        <field name="name" family="basic" qualifier="name"/>
        <field name="age" family="detail" qualifier="age"/>
    </class>
</gora-orm>

6. Gora configuration

vi gora-demo/src/main/resources/gora.properties

Contents:

gora.datastore.default=org.apache.gora.hbase.store.HBaseStore
gora.datastore.autocreateschema=true

7. HBase configuration

vi gora-demo/src/main/resources/hbase-site.xml

Contents:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
 
<configuration>
 <property>
   <name>hbase.zookeeper.property.clientPort</name>
   <value>2181</value>
 </property>
 <property>
   <name>hbase.zookeeper.quorum</name>
   <value>host001</value>
 </property>
</configuration>

8. Write PersonManager.java and PersonAnalytics.java

vi gora-demo/src/main/java/org/apdplat/demo/gora/PersonManager.java

Contents:

package org.apdplat.demo.gora;

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.text.ParseException;
import org.apache.avro.util.Utf8;
import org.apache.gora.query.Query;
import org.apache.gora.query.Result;
import org.apache.gora.store.DataStore;
import org.apache.gora.store.DataStoreFactory;
import org.apache.hadoop.conf.Configuration;
import org.apdplat.demo.gora.generated.Person;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class PersonManager {
    private static final Logger log = LoggerFactory.getLogger(PersonManager.class);
    private DataStore<String, Person> dataStore;

    public PersonManager() {
        try {
            init();
        } catch (IOException ex) {
            throw new RuntimeException(ex);
        }
    }

    private void init() throws IOException {
        Configuration conf = new Configuration();
        dataStore = DataStoreFactory.getDataStore(String.class, Person.class, conf);
    }

    private void parse(String input) throws Exception {
        log.info("Parsing file: " + input);
        BufferedReader reader = new BufferedReader(new FileReader(input));
        long lineCount = 0;
        try {
            // Read line by line; the original do/while threw a
            // NullPointerException when the file was empty.
            String line;
            while ((line = reader.readLine()) != null) {
                Person person = parseLine(line);
                if (person != null) {
                    // Persist the record
                    storePerson(person.getIdcard().toString(), person);
                }
                lineCount++;
            }
        } finally {
            reader.close();
        }
        log.info("Parsing finished. Total persons: " + lineCount);
    }

    private Person parseLine(String line) throws ParseException {
        String[] attrs = line.split(" ");
        String idcard = attrs[0];
        String name = attrs[1];
        String age = attrs[2];

        Person person = new Person();
        person.setIdcard(new Utf8(idcard));
        person.setName(new Utf8(name));
        person.setAge(new Utf8(age));

        return person;
    }

    private void storePerson(String key, Person person) throws Exception {
        log.info("Storing person: " + person.getIdcard() + "\t" + person.getName() + "\t" + person.getAge());
        dataStore.put(key, person);
    }

    private void get(String key) throws Exception {
        Person person = dataStore.get(key);
        printPerson(person);
    }

    private void query(String key) throws Exception {
        Query<String, Person> query = dataStore.newQuery();
        query.setKey(key);

        Result<String, Person> result = query.execute();

        printResult(result);
    }

    private void query(String startKey, String endKey) throws Exception {
        Query<String, Person> query = dataStore.newQuery();
        query.setStartKey(startKey);
        query.setEndKey(endKey);

        Result<String, Person> result = query.execute();

        printResult(result);
    }

    private void delete(String key) throws Exception {
        dataStore.delete(key);
        dataStore.flush();
        log.info("Person with idcard " + key + " deleted");
    }

    private void deleteByQuery(String startKey, String endKey) throws Exception {
        Query<String, Person> query = dataStore.newQuery();
        query.setStartKey(startKey);
        query.setEndKey(endKey);

        dataStore.deleteByQuery(query);
        log.info("Persons with idcards from " + startKey + " to " + endKey + " deleted");
    }

    private void printResult(Result<String, Person> result) throws Exception {
        while (result.next()) {
            String resultKey = result.getKey();
            Person resultPerson = result.get();

            System.out.println(resultKey + ":");
            printPerson(resultPerson);
        }

        System.out.println("Count: " + result.getOffset());
    }

    private void printPerson(Person person) {
        if (person == null) {
            System.out.println("No result");
        } else {
            System.out.println(person.getIdcard() + "\t" + person.getName() + "\t" + person.getAge());
        }
    }

    private void close() throws Exception {
        if (dataStore != null)
            dataStore.close();
    }

    private static final String USAGE = "PersonManager -parse <input_person_file>\n" +
                                        "              -get <idcard>\n" +
                                        "              -query <idcard>\n" +
                                        "              -query <startIdcard> <endIdcard>\n" +
                                        "              -delete <idcard>\n" +
                                        "              -deleteByQuery <startIdcard> <endIdcard>\n";

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println(USAGE);
            System.exit(1);
        }

        PersonManager manager = new PersonManager();

        if ("-parse".equals(args[0])) {
            manager.parse(args[1]);
        } else if ("-get".equals(args[0])) {
            manager.get(args[1]);
        } else if ("-query".equals(args[0])) {
            if (args.length == 2)
                manager.query(args[1]);
            else
                manager.query(args[1], args[2]);
        } else if ("-delete".equals(args[0])) {
            manager.delete(args[1]);
        } else if ("-deleteByQuery".equalsIgnoreCase(args[0])) {
            manager.deleteByQuery(args[1], args[2]);
        } else {
            System.err.println(USAGE);
            System.exit(1);
        }

        manager.close();
    }
}

vi gora-demo/src/main/java/org/apdplat/demo/gora/PersonAnalytics.java

Contents:

package org.apdplat.demo.gora;

import java.io.IOException;

import org.apache.avro.util.Utf8;
import org.apache.gora.mapreduce.GoraMapper;
import org.apache.gora.store.DataStore;
import org.apache.gora.store.DataStoreFactory;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.apdplat.demo.gora.generated.Person;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class PersonAnalytics extends Configured implements Tool {
    private static final Logger log = LoggerFactory
            .getLogger(PersonAnalytics.class);

    public static class PersonAnalyticsMapper extends
            GoraMapper<String, Person, Text, LongWritable> {
        private LongWritable one = new LongWritable(1L);

        @Override
        protected void map(String key, Person person, Context context)
                throws IOException, InterruptedException {
            // Emit (age, 1) for every person read from the store
            Utf8 age = person.getAge();
            context.write(new Text(age.toString()), one);
        }
    }

    public static class PersonAnalyticsReducer extends
            Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> values,
                Context context) throws IOException, InterruptedException {
            // Sum the counts for each age
            long sum = 0L;
            for (LongWritable value : values) {
                sum += value.get();
            }
            context.write(key, new LongWritable(sum));
        }
    }

    public Job createJob(DataStore<String, Person> inStore, int numReducer)
            throws IOException {
        Job job = new Job(getConf());
        job.setJobName("Person Analytics");
        log.info("Creating Hadoop Job: " + job.getJobName());
        job.setNumReduceTasks(numReducer);
        job.setJarByClass(getClass());
        GoraMapper.initMapperJob(job, inStore, Text.class, LongWritable.class,
                PersonAnalyticsMapper.class, true);
        job.setReducerClass(PersonAnalyticsReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        TextOutputFormat
                .setOutputPath(job, new Path("person-analytics-output"));
        return job;
    }

    @Override
    public int run(String[] args) throws Exception {
        DataStore<String, Person> inStore;
        Configuration conf = new Configuration();
        if (args.length == 1) {
            String dataStoreClass = args[0];
            inStore = DataStoreFactory.getDataStore(dataStoreClass,
                    String.class, Person.class, conf);
        } else {
            inStore = DataStoreFactory.getDataStore(String.class, Person.class,
                    conf);
        }
        Job job = createJob(inStore, 2);
        boolean success = job.waitForCompletion(true);
        inStore.close();
        log.info("PersonAnalytics completed with "
                + (success ? "success" : "failure"));
        return success ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        int ret = ToolRunner.run(new PersonAnalytics(), args);
        System.exit(ret);
    }
}
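The mapper emits (age, 1) for each Person and the reducer sums the ones, so the whole job is a group-by-age count. As a rough illustration only (plain Java collections, no Hadoop; class and sample names are hypothetical), the same aggregation can be sketched locally:

```java
import java.util.Map;
import java.util.TreeMap;

class AgeCount {
    // Group-by-age count, mirroring PersonAnalyticsMapper/Reducer locally.
    static Map<String, Long> countByAge(String[] lines) {
        // TreeMap keeps keys sorted, like the MapReduce shuffle sorts Text keys
        Map<String, Long> counts = new TreeMap<>();
        for (String line : lines) {
            // Each line: "<idcard> <name> <age>", the format parsed by PersonManager
            String age = line.split(" ")[2];
            counts.merge(age, 1L, Long::sum);  // the reducer's summing step
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] sample = {
            "533001198510125839 杨尚川 25",
            "533001198510125840 杨尚华 22",
            "533001198510125841 刘德华 55"
        };
        System.out.println(countByAge(sample)); // {22=1, 25=1, 55=1}
    }
}
```

The real job writes the same kind of (age, count) pairs to person-analytics-output/part-r-00000, which is inspected in step 10.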

9. Prepare the data

vi gora-demo/src/main/resources/persons.txt

Contents:

533001198510125839 杨尚川 25
533001198510125840 杨尚华 22
533001198510125841 刘德华 55
533001198510125842 刘亦菲 25
533001198510125843 蔡卓妍 25
533001198510125844 林志玲 22
533001198510125845 李连杰 55

10. Build and run the project with Maven on the Linux command line

cd gora-demo

mvn clean compile

mvn exec:java -Dexec.mainClass=org.apdplat.demo.gora.PersonManager

mvn exec:java -Dexec.mainClass="org.apdplat.demo.gora.PersonManager" -Dexec.args="-parse src/main/resources/persons.txt"

mvn exec:java -Dexec.mainClass=org.apdplat.demo.gora.PersonAnalytics

cat person-analytics-output/part-r-00000

mvn exec:java -Dexec.mainClass="org.apdplat.demo.gora.PersonManager" -Dexec.args="-get 533001198510125842"

mvn exec:java -Dexec.mainClass="org.apdplat.demo.gora.PersonManager" -Dexec.args="-query 533001198510125844"

mvn exec:java -Dexec.mainClass="org.apdplat.demo.gora.PersonManager" -Dexec.args="-query 533001198510125842 533001198510125845"

mvn exec:java -Dexec.mainClass="org.apdplat.demo.gora.PersonManager" -Dexec.args="-delete 533001198510125840"

mvn exec:java -Dexec.mainClass="org.apdplat.demo.gora.PersonManager" -Dexec.args="-deleteByQuery 533001198510125841 533001198510125842"

mvn exec:java -Dexec.mainClass="org.apdplat.demo.gora.PersonManager" -Dexec.args="-deleteByQuery 533001198510125845 533001198510125846"

mvn exec:java -Dexec.mainClass="org.apdplat.demo.gora.PersonManager" -Dexec.args="-query 533001198510125838 533001198510125848"

11. Build and run the project in Eclipse on Windows

mvn clean package

rm -r target

vi .classpath

Delete all lines containing path="M2_REPO".

Delete the line <classpathentry kind="src" path="target/maven-shared-archive-resources" excluding="**/*.java"/>.

Copy gora-demo to Windows with WinSCP.

Download the patched hadoop-core-1.2.1.jar from http://yangshangchuan.iteye.com/blog/1839784 and use it to replace gora-demo\lib\hadoop-core-1.2.1.jar.

Import gora-demo into Eclipse.

Add all the jars under lib to the build path.

 

12. Package the project and submit it to Hadoop

cd gora-demo

mvn clean package

mkdir job

cp -r lib job/lib

cp -r target/classes/* job

hadoop fs -put persons.txt persons.txt

jar -cvf gora-demo.job *

hadoop jar gora-demo.job org.apdplat.demo.gora.PersonAnalytics 

 



From: https://blog.51cto.com/u_2650279/6964885

  • 深入浅出,五次课程,带您进入数据分析的世界
    导读:程序员4大出路:业务专家,全栈开发,技术专家,技术管理。最近,了解了下“数据分析”,觉得也可以作为参考。有兴趣的朋友们,可以看下本篇文章和对应的视频。 近些年,对于分布在各个行业的企业来说,「数据」已经逐渐开始扮演越来越重要的角色,成为企业长远发展不可忽视的力量。在数据分析大......