
Creating and Running a Spark Offline Project


I. Install Maven

1. Unpack the Maven archive and put the extracted directory somewhere whose path contains no Chinese characters.

2. Create a repository folder named repository (in theory any location works; putting it next to the Maven folder makes it easier to manage).

 

3. Edit the settings file: go to the apache-maven-3.5.4\conf directory.

4. Two places in settings.xml need changes: localRepository and the mirrors.

5. Comment out the stock localRepository line, copy it below the comment, and set its value to the path of the repository folder created in step 2.
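A minimal sketch, assuming the repository from step 2 lives at D:\maven\repository (substitute your own path):

<localRepository>D:\maven\repository</localRepository>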

Find the <mirrors> tag and add the following mirrors inside it: one for Aliyun, plus the stock template entry for any other repository. (A mirrorOf of *,!cloudera routes every repository except the one whose id is cloudera through the mirror, so Cloudera artifacts are still fetched from their own repo.)

<mirror>
    <id>nexus-aliyun</id>
    <mirrorOf>*,!cloudera</mirrorOf>
    <name>Nexus aliyun</name>
    <url>http://maven.aliyun.com/nexus/content/groups/public</url>
</mirror>
<mirror>
    <id>mirrorId</id>
    <mirrorOf>*,!cloudera</mirrorOf>
    <name>Human Readable Name for this Mirror.</name>
    <url>http://my.repository.com/repo/path</url>
</mirror>

6. Configure Maven in IDEA: open File > Settings > Build, Execution, Deployment > Build Tools > Maven, and point "Maven home directory", "User settings file", and "Local repository" at the Maven folder, settings.xml, and repository folder set up above.

 

 

 

II. Install the Scala plugin (in IDEA: File > Settings > Plugins, search for Scala, install it, and restart the IDE)

 

 

III. Install the Scala SDK

https://cloud.tencent.com/developer/article/1979321

IV. Create the project

1. Create a plain Maven project.

2. Add Scala framework support to the project (right-click the project root, choose Add Framework Support..., then tick Scala).

 

 

3. Edit the pom.xml file.

Here is the pom.xml.

Adapt the build section at the bottom to your own situation (otherwise, after running Package in Maven, the classes output may end up containing duplicate files).

If Maven is new to you, read up on the Maven build lifecycle first. Think of it like your age: reaching a phase means every earlier phase has already passed, so running package, for example, also carries out everything that comes before it. See: Maven 构建生命周期 | 菜鸟教程 (runoob.com)
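For example, each command below also runs every lifecycle phase before the one named:

mvn compile   # validate, then compile
mvn package   # validate, compile, test, then package
mvn install   # everything up to and including install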

<properties>
        <maven.compiler.source>8</maven.compiler.source>
        <maven.compiler.target>8</maven.compiler.target>
        <flink.version>1.14.0</flink.version>
        <scala.version>2.12</scala.version>
        <hive.version>3.1.2</hive.version>
        <mysqlconnect.version>5.1.47</mysqlconnect.version>
        <hdfs.version>3.1.3</hdfs.version>
        <spark.version>3.0.3</spark.version>
        <hbase.version>2.2.3</hbase.version>
        <kafka.version>2.4.1</kafka.version>
        <lang3.version>3.9</lang3.version>
        <flink-connector-redis.version>1.1.5</flink-connector-redis.version>
    </properties>

    <dependencies>
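        <!-- scala.version is 2.12, so "${scala.version}.12" below resolves to the full Scala version 2.12.12 -->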
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-reflect</artifactId>
            <version>${scala.version}.12</version>
        </dependency>
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-compiler</artifactId>
            <version>${scala.version}.12</version>
        </dependency>
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>${scala.version}.12</version>
        </dependency>

        <!--kafka-->
        <dependency>
            <groupId>org.apache.kafka</groupId>
            <artifactId>kafka_${scala.version}</artifactId>
            <version>${kafka.version}</version>
        </dependency>

        <!-- Flink: real-time/stream processing -->
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-runtime-web_${scala.version}</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-clients_${scala.version}</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-scala_${scala.version}</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-connector-kafka_${scala.version}</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-table-planner_${scala.version}</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-json</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-table-api-scala-bridge_${scala.version}</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-connector-redis_2.11</artifactId>
            <exclusions>
                <exclusion>
                    <groupId>org.apache.flink</groupId>
                    <artifactId>flink-shaded-hadoop2</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.apache.commons</groupId>
                    <artifactId>commons-lang3</artifactId>
                </exclusion>
            </exclusions>
            <version>${flink-connector-redis.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.commons</groupId>
            <artifactId>commons-lang3</artifactId>
            <version>${lang3.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-connector-hive_${scala.version}</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-connector-hbase-2.2_${scala.version}</artifactId>
            <version>${flink.version}</version>
        </dependency>


        <!-- MySQL connector -->
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>${mysqlconnect.version}</version>
        </dependency>


        <!-- Spark: offline/batch processing -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_${scala.version}</artifactId>
            <exclusions>
                <exclusion>
                    <groupId>org.apache.hive</groupId>
                    <artifactId>hive-exec</artifactId>
                </exclusion>
            </exclusions>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_${scala.version}</artifactId>
            <exclusions>
                <exclusion>
                    <groupId>org.apache.hive</groupId>
                    <artifactId>hive-exec</artifactId>
                </exclusion>
            </exclusions>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-hive_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>



        <!-- Hadoop -->
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>${hdfs.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-auth</artifactId>
            <version>${hdfs.version}</version>
        </dependency>



        <!-- HBase -->
        <dependency>
            <groupId>org.apache.hbase</groupId>
            <artifactId>hbase-mapreduce</artifactId>
            <version>${hbase.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hbase</groupId>
            <artifactId>hbase-client</artifactId>
            <version>${hbase.version}</version>
        </dependency>



    </dependencies>


    <build>
        <resources>
            <resource>
                <directory>src/main/scala</directory>
            </resource>
            <resource>
                <directory>src/main/java</directory>
            </resource>

            <resource>
                <directory>src/main/resources</directory>
            </resource>
        </resources>
        <plugins>
            <plugin>
                <groupId>net.alchim31.maven</groupId>
                <artifactId>scala-maven-plugin</artifactId>
                <version>3.2.2</version>
                <configuration>
                    <recompileMode>incremental</recompileMode>
                </configuration>
                <executions>
                    <execution>
                        <goals>
                            <goal>compile</goal>
                            <goal>testCompile</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.1</version>

            </plugin>


        </plugins>
    </build>


</project>

  

Copy this part of the pom into your project's pom.xml in IDEA, then let Maven download the dependencies.

Copy everything from <properties> through </build>.

 

 

4. Enable Hive support

Copy the hive110/conf/hive-site.xml file into the resources folder (a sketch follows the link below).

https://zhuanlan.zhihu.com/p/343132450
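If you cannot grab the file from a cluster, a minimal hive-site.xml sketch (the metastore host and port are assumptions; use your cluster's values):

<configuration>
    <property>
        <!-- address of the Hive metastore thrift service; host/port here are placeholders -->
        <name>hive.metastore.uris</name>
        <value>thrift://192.168.1.100:9083</value>
    </property>
</configuration>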

5. Import the database (a hedged example follows).
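Assuming a dump file named shtd_store.sql (the file name is an assumption) and the MySQL instance used in the code below (127.0.0.1:3307, user root, password root):

mysql -h127.0.0.1 -P3307 -uroot -proot -e "CREATE DATABASE IF NOT EXISTS shtd_store"
mysql -h127.0.0.1 -P3307 -uroot -proot shtd_store < shtd_store.sql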

6. Create the Scala files

D.scala:

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

object D {

  def main(args: Array[String]): Unit = {
    // create the SparkSession object
    val conf = new SparkConf()
      .setAppName("task1_modelB_task1_job1")
      .setMaster("local[*]")

    val ss = SparkSession.builder()
      .config(conf)
      //.enableHiveSupport()
      .getOrCreate()

    // read the ORDERS table from MySQL over JDBC and register it as a temp view
    val base_orders = ss.read.format("jdbc")
      .option("driver", "com.mysql.jdbc.Driver")
      .option("url", "jdbc:mysql://127.0.0.1:3307/shtd_store?user=root&password=root&useSSL=false&useUnicode=true&characterEncoding=utf8")
      .option("dbtable", "ORDERS")
      .load()
    base_orders.createOrReplaceTempView("mysql_orders")

    // show() returns Unit, so there is nothing useful to assign; just call it
    ss.sql("select * from mysql_orders limit 10").show()
  }
}

This one can be run directly (it does not touch Hive).

E.scala:

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

object E {

  def main(args: Array[String]): Unit = {
    // create the SparkSession object, this time with Hive support enabled
    val conf = new SparkConf()
      .setAppName("task1_modelB_task1_job1")
      .setMaster("local[*]")

    val ss = SparkSession.builder()
      .config(conf)
      .enableHiveSupport()
      .getOrCreate()

    // read the ORDERS table from MySQL over JDBC and register it as a temp view
    val base_orders = ss.read.format("jdbc")
      .option("driver", "com.mysql.jdbc.Driver")
      .option("url", "jdbc:mysql://127.0.0.1:3307/shtd_store?user=root&password=root&useSSL=false&useUnicode=true&characterEncoding=utf8")
      .option("dbtable", "ORDERS")
      .load()
    base_orders.createOrReplaceTempView("mysql_orders")

    // insert 10 records into Hive's ods.orders table, stored in the 20221019 partition
    ss.sql(
      """
        |insert into table ods.orders partition(dealdate = 20221019)
        |select *
        |from mysql_orders limit 10
        |""".stripMargin)
  }
}
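E assumes the ods.orders table already exists in Hive. A minimal sketch of what its DDL might look like (the data columns below are hypothetical; they must match your ORDERS schema for the select * insert to work):

create database if not exists ods;
create table if not exists ods.orders (
    o_orderkey   bigint,   -- hypothetical column
    o_custkey    bigint,   -- hypothetical column
    o_totalprice double    -- hypothetical column
) partitioned by (dealdate string);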

7. Package the project as a jar

Using Maven:

 

Step 1: compile. Step 2: package. (The equivalent terminal commands are sketched below.)
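From a terminal in the project root, the same two steps (clean first to avoid stale build output):

mvn clean compile
mvn clean package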

8. Copy the finished jar to the Linux machine, start Spark, and run the jar from the command line.

Go to the bin directory under the Spark installation and open a command line there.

 

spark-submit --class <fully.qualified.main.class> --master local <path-to-jar>

(If the command is not found, try prefixing it with ./, i.e. ./spark-submit.)
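A concrete, hedged example (the package, class, and jar names are assumptions; substitute your own):

./spark-submit --class com.example.E --master local[*] /opt/jars/spark-offline-1.0-SNAPSHOT.jar

Note that setMaster("local[*]") hard-coded in the program takes precedence over the --master flag, so remove the setMaster call before submitting to a real cluster.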

 

On the different spark-submit submission modes, see: https://blog.csdn.net/huonan_123/article/details/84282843
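For reference, the common --master values look like this (host names and jar path are assumptions):

spark-submit --class com.example.E --master local[*] app.jar                    # local
spark-submit --class com.example.E --master spark://master:7077 app.jar         # standalone cluster
spark-submit --class com.example.E --master yarn --deploy-mode cluster app.jar  # YARN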
