1. A note up front: mind your JDK version. JDK 1.8 is the safest choice; JDK 17 causes many errors unless you add the extra dependencies and VM options described below.
2. Add the Maven dependencies to pom.xml. On JDK 1.8, the Spark dependencies alone are enough:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.13</artifactId>
    <version>3.4.1</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.13</artifactId>
    <version>3.4.1</version>
</dependency>
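Spark brings its own logging stack (log4j 2 plus an slf4j binding), which can collide with the logback setup a typical Spring Boot starter provides. If you see duplicate-binding warnings or logging errors at startup, one common workaround is to exclude Spark's binding. This is a sketch, not from the original post; the artifact name below is what Spark 3.4 is believed to bundle, so verify against mvn dependency:tree before copying:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.13</artifactId>
    <version>3.4.1</version>
    <exclusions>
        <!-- Assumed binding artifact for Spark 3.4; confirm with mvn dependency:tree -->
        <exclusion>
            <groupId>org.apache.logging.log4j</groupId>
            <artifactId>log4j-slf4j2-impl</artifactId>
        </exclusion>
    </exclusions>
</dependency>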
3. On JDK 17 you additionally need the following dependencies (skip them on JDK 1.8):
<dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>slf4j-nop</artifactId>
    <version>1.6.0</version>
</dependency>
<dependency>
    <groupId>javax.servlet</groupId>
    <artifactId>javax.servlet-api</artifactId>
    <version>4.0.1</version>
</dependency>
4. Straight to the code:
@Operation(summary = "spark处理") @PostMapping("sparkHander") public Resp sparkTest() throws IOException { //System.setProperty("hadoop.home.dir", "C:\\hadoop-common-2.2.0"); SparkConf sparkConf = new SparkConf(); sparkConf.set("spark.ui.enabled", "false"); sparkConf.set("spark.some.config.option", "some-value"); SparkSession spark = SparkSession .builder() .master("local") .appName("SparkSQLTest4") .config(sparkConf) .getOrCreate();
//本地后缀名.json的文件 Dataset<Row> df = spark.read().json("C:\\test.json"); df.printSchema(); df.show();
//表名可以随便写如:"test" df.createOrReplaceTempView("test");
//如果上面写test这里你就可以写查询条件select * from test Dataset<Row> dataset = spark.sql("select * from test"); dataset.show();
//得到JSON的字符串 List<String> strings = dataset.toJSON().collectAsList(); spark.stop(); return Resp.of(strings); }
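The Resp wrapper returned above isn't shown in the post; it's presumably the project's standard response envelope. A minimal sketch of what it might look like (the class name and the of factory come from the snippet; the fields are assumptions):

import java.util.List;

// Hypothetical response envelope; only Resp.of(...) appears in the original code.
public class Resp {
    private final int code;          // assumed convention: 0 = success
    private final List<String> data; // the collected JSON strings

    private Resp(int code, List<String> data) {
        this.code = code;
        this.data = data;
    }

    public static Resp of(List<String> data) {
        return new Resp(0, data);
    }

    public int getCode() { return code; }
    public List<String> getData() { return data; }
}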
5. The JSON file looks like this:
[ { "port": "1", "name": "测试服务器1", "showPassword": false, "status": "0" }, { "port": "2", "name": "测试服务器2", "showPassword": false, "status": "0" },{ "port": "3", "name": "测试服务器3", "showPassword": false, "status": "0" } ]
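One caveat the post doesn't mention: spark.read().json() defaults to the JSON Lines format (one JSON value per line). A file whose entire array sits on a single line parses fine, but if test.json is pretty-printed across multiple lines as shown above, enable Spark's standard multiLine reader option, e.g.:

// In step 4, read a pretty-printed (multi-line) JSON file like this instead:
Dataset<Row> df = spark.read()
        .option("multiLine", true)
        .json("C:\\test.json");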
6. (Not needed on JDK 1.8.) On JDK 17 you must also configure VM options:
--add-opens java.base/java.io=ALL-UNNAMED
--add-opens java.base/java.nio=ALL-UNNAMED
--add-exports java.base/sun.nio.ch=ALL-UNNAMED
--add-opens java.base/java.lang=ALL-UNNAMED
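In IntelliJ IDEA these flags go into the run configuration's "VM options" field. When launching with mvn spring-boot:run, one way to pass them is the spring-boot-maven-plugin's jvmArguments parameter; a sketch (plugin version and layout depend on your build):

<plugin>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-maven-plugin</artifactId>
    <configuration>
        <!-- Same flags as above, applied when running via mvn spring-boot:run -->
        <jvmArguments>
            --add-opens java.base/java.io=ALL-UNNAMED
            --add-opens java.base/java.nio=ALL-UNNAMED
            --add-exports java.base/sun.nio.ch=ALL-UNNAMED
            --add-opens java.base/java.lang=ALL-UNNAMED
        </jvmArguments>
    </configuration>
</plugin>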