DataFrame Action Operators, Part 1

Posted: 2022-08-30 12:33:55

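import java.util

import org.apache.spark.SparkConf
import org.apache.spark.sql.{DataFrame, Row, SparkSession}

// Run locally, using all available cores
val conf = new SparkConf().setAppName("action").setMaster("local[*]")
val session = SparkSession.builder().config(conf).getOrCreate()

// 40 sample rows; every name is longer than 20 characters, so show() truncates it
val seq: Seq[(String, Int)] = Seq(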
  ("zs123456789123456789123", 20),
  ("zs123456789123456789123", 21),
  ("zs123456789123456789123", 22),
  ("zs123456789123456789123", 23),
  ("zs123456789123456789123", 24),
  ("zs123456789123456789123", 20),
  ("zs123456789123456789123", 20),
  ("zs123456789123456789123", 21),
  ("zs123456789123456789123", 22),
  ("zs123456789123456789123", 23),
  ("zs123456789123456789123", 24),
  ("zs123456789123456789123", 20),
  ("zs123456789123456789123", 20),
  ("zs123456789123456789123", 20),
  ("zs123456789123456789123", 20),
  ("zs123456789123456789123", 20),
  ("zs123456789123456789123", 20),
  ("zs123456789123456789123", 20),
  ("zs123456789123456789123", 20),
  ("zs123456789123456789123", 20),
  ("zs123456789123456789123", 20),
  ("zs123456789123456789123", 20),
  ("zs123456789123456789123", 20),
  ("zs123456789123456789123", 20),
  ("zs123456789123456789123", 20),
  ("zs123456789123456789123", 20),
  ("zs123456789123456789123", 29),
  ("zs123456789123456789123", 30),
  ("zs123456789123456789123", 20),
  ("zs123456789123456789123", 20),
  ("zs123456789123456789123", 20),
  ("zs123456789123456789123", 20),
  ("zs123456789123456789123", 20),
  ("zs123456789123456789123", 20),
  ("zs123456789123456789123", 20),
  ("zs123456789123456789123", 20),
  ("zs123456789123456789123", 20),
  ("zs123456789123456789123", 20),
  ("zs123456789123456789123", 29),
  ("zs123456789123456789123", 30)
)
// Needed for the toDF conversion on local collections
import session.implicits._
val frame: DataFrame = seq.toDF("namea", "ageb")

1. printSchema

def printSchemaOpt(frame: DataFrame): Unit = {
  println("-----------printSchema start-----------")
  // Prints the column names, types and nullability as a tree
  frame.printSchema()
  println("-----------printSchema end-----------")
}
Result:
-----------printSchema start-----------
root
 |-- namea: string (nullable = true)
 |-- ageb: integer (nullable = false)

-----------printSchema end-----------

2. show

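show(): displays the first 20 rows (not all of the data); strings longer than 20 characters are truncated, because truncate defaults to true
show(n): displays the first n rows; strings longer than 20 characters are truncated (truncate defaults to true)
show(true): displays the first 20 rows and truncates strings longer than 20 characters
show(false): displays the first 20 rows without the 20-character limit
show(n, true): displays the first n rows and truncates strings longer than 20 characters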

def showOpt(frame: DataFrame): Unit = {
  println("-----------show1 start-----------")
  // Default: first 20 rows, long strings truncated
  frame.show()
  println("-----------show1 end-----------")
  println("-----------show2 start-----------")
  // First 3 rows, long strings truncated
  frame.show(3)
  println("-----------show2 end-----------")
  println("-----------show3 start-----------")
  // First 30 rows, long strings truncated
  frame.show(30, true)
  println("-----------show3 end-----------")
}
Result:
-----------show1 start-----------
+--------------------+----+
|               namea|ageb|
+--------------------+----+
|zs123456789123456...|  20|
|zs123456789123456...|  21|
|zs123456789123456...|  22|
|zs123456789123456...|  23|
|zs123456789123456...|  24|
|zs123456789123456...|  20|
|zs123456789123456...|  20|
|zs123456789123456...|  21|
|zs123456789123456...|  22|
|zs123456789123456...|  23|
|zs123456789123456...|  24|
|zs123456789123456...|  20|
|zs123456789123456...|  20|
|zs123456789123456...|  20|
|zs123456789123456...|  20|
|zs123456789123456...|  20|
|zs123456789123456...|  20|
|zs123456789123456...|  20|
|zs123456789123456...|  20|
|zs123456789123456...|  20|
+--------------------+----+
only showing top 20 rows

-----------show1 end-----------
-----------show2 start-----------
+--------------------+----+
|               namea|ageb|
+--------------------+----+
|zs123456789123456...|  20|
|zs123456789123456...|  21|
|zs123456789123456...|  22|
+--------------------+----+
only showing top 3 rows

-----------show2 end-----------
-----------show3 start-----------
+--------------------+----+
|               namea|ageb|
+--------------------+----+
|zs123456789123456...|  20|
|zs123456789123456...|  21|
|zs123456789123456...|  22|
|zs123456789123456...|  23|
|zs123456789123456...|  24|
|zs123456789123456...|  20|
|zs123456789123456...|  20|
|zs123456789123456...|  21|
|zs123456789123456...|  22|
|zs123456789123456...|  23|
|zs123456789123456...|  24|
|zs123456789123456...|  20|
|zs123456789123456...|  20|
|zs123456789123456...|  20|
|zs123456789123456...|  20|
|zs123456789123456...|  20|
|zs123456789123456...|  20|
|zs123456789123456...|  20|
|zs123456789123456...|  20|
|zs123456789123456...|  20|
|zs123456789123456...|  20|
|zs123456789123456...|  20|
|zs123456789123456...|  20|
|zs123456789123456...|  20|
|zs123456789123456...|  20|
|zs123456789123456...|  20|
|zs123456789123456...|  29|
|zs123456789123456...|  30|
|zs123456789123456...|  20|
|zs123456789123456...|  20|
+--------------------+----+
only showing top 30 rows

-----------show3 end-----------
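
The `zs123456789123456...` values in the tables above follow from the truncation rule: with truncation on, a cell longer than 20 characters is cut to its first 17 characters followed by `...`. The block below is a minimal plain-Scala sketch of that rule, not Spark's own code; `TruncateSketch` and `truncateCell` are hypothetical names introduced only for illustration.

```scala
object TruncateSketch {
  // Hypothetical helper: models how show() shortens a cell when truncation is on.
  // Assumption: `limit` is the width cap (20 for show() with truncate = true).
  def truncateCell(cell: String, limit: Int = 20): String =
    if (limit <= 0 || cell.length <= limit) cell       // fits: keep unchanged
    else if (limit < 4) cell.substring(0, limit)       // too narrow for "..."
    else cell.substring(0, limit - 3) + "..."          // keep limit-3 chars, append "..."

  def main(args: Array[String]): Unit = {
    println(truncateCell("zs123456789123456789123")) // zs123456789123456...
    println(truncateCell("zs"))                      // zs
  }
}
```

To see the full names in Spark itself, call `frame.show(false)`, or `frame.show(30, false)` to combine that with a row count.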

3. first/head/take/takeAsList

def getDataOpt(frame: DataFrame): Unit = {
  println("-----------first start-----------")
  // first() returns the first row; read the second column (index 1) as an Int
  val row: Row = frame.first()
  println(row.getAs[Int](1))
  println("-----------first end-----------")
  println("-----------head start-----------")
  // head(n) returns the first n rows as an Array
  val array: Array[Row] = frame.head(3)
  println(array.mkString("="))
  println("-----------head end-----------")
  println("-----------take start-----------")
  // take(n) behaves like head(n)
  val arr: Array[Row] = frame.take(3)
  println(arr.mkString("="))
  println("-----------take end-----------")
  println("-----------takeAsList start-----------")
  // takeAsList(n) returns the first n rows as a java.util.List
  val list: util.List[Row] = frame.takeAsList(3)
  println(list)
  println("-----------takeAsList end-----------")
}
Result:
-----------first start-----------
20
-----------first end-----------
-----------head start-----------
[zs123456789123456789123,20]=[zs123456789123456789123,21]=[zs123456789123456789123,22]
-----------head end-----------
-----------take start-----------
[zs123456789123456789123,20]=[zs123456789123456789123,21]=[zs123456789123456789123,22]
-----------take end-----------
-----------takeAsList start-----------
[[zs123456789123456789123,20], [zs123456789123456789123,21], [zs123456789123456789123,22]]
-----------takeAsList end-----------
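
Two small variations on the calls above may be useful: reading a field by column name rather than by position, and fetching the first row in a way that also works on an empty DataFrame (where `first()` and `head()` throw an exception). The following is a minimal sketch, assuming the same `namea`/`ageb` columns; `FirstRowSketch`, `firstAge` and the two sample rows are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object FirstRowSketch {
  // Hypothetical helper: age of the first row, or None if the DataFrame is empty.
  def firstAge(frame: DataFrame): Option[Int] =
    frame.take(1).headOption        // take(1) yields an empty array instead of throwing
      .map(_.getAs[Int]("ageb"))    // look the field up by column name

  def main(args: Array[String]): Unit = {
    val session = SparkSession.builder().appName("first-row").master("local[*]").getOrCreate()
    import session.implicits._

    // Illustrative sample data, not taken from the article
    val frame = Seq(("zs", 20), ("ls", 21)).toDF("namea", "ageb")
    println(firstAge(frame))          // Some(20)
    println(firstAge(frame.limit(0))) // None
    session.stop()
  }
}
```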

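4. collect/collectAsList: use with caution. These return every row of the DataFrame by pulling the data from all partitions onto a single node (the driver), which can easily cause an out-of-memory error.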

def collectOpt(frame: DataFrame): Unit = {
  println("-----------collect start-----------")
  // collect() returns all rows as an Array
  val array: Array[Row] = frame.collect()
  println(array.mkString("="))
  println("-----------collect end-----------")
  println("-----------collectAsList start-----------")
  // collectAsList() returns all rows as a java.util.List
  val array1: util.List[Row] = frame.collectAsList()
  println(array1)
  println("-----------collectAsList end-----------")
}
Result:
-----------collect start-----------
[zs123456789123456789123,20]=[zs123456789123456789123,21]=[zs123456789123456789123,22]=[zs123456789123456789123,23]=[zs123456789123456789123,24]=[zs123456789123456789123,20]=[zs123456789123456789123,20]=[zs123456789123456789123,21]=[zs123456789123456789123,22]=[zs123456789123456789123,23]=[zs123456789123456789123,24]=[zs123456789123456789123,20]=[zs123456789123456789123,20]=[zs123456789123456789123,20]=[zs123456789123456789123,20]=[zs123456789123456789123,20]=[zs123456789123456789123,20]=[zs123456789123456789123,20]=[zs123456789123456789123,20]=[zs123456789123456789123,20]=[zs123456789123456789123,20]=[zs123456789123456789123,20]=[zs123456789123456789123,20]=[zs123456789123456789123,20]=[zs123456789123456789123,20]=[zs123456789123456789123,20]=[zs123456789123456789123,29]=[zs123456789123456789123,30]=[zs123456789123456789123,20]=[zs123456789123456789123,20]=[zs123456789123456789123,20]=[zs123456789123456789123,20]=[zs123456789123456789123,20]=[zs123456789123456789123,20]=[zs123456789123456789123,20]=[zs123456789123456789123,20]=[zs123456789123456789123,20]=[zs123456789123456789123,20]=[zs123456789123456789123,29]=[zs123456789123456789123,30]
-----------collect end-----------
-----------collectAsList start-----------
[[zs123456789123456789123,20], [zs123456789123456789123,21], [zs123456789123456789123,22], [zs123456789123456789123,23], [zs123456789123456789123,24], [zs123456789123456789123,20], [zs123456789123456789123,20], [zs123456789123456789123,21], [zs123456789123456789123,22], [zs123456789123456789123,23], [zs123456789123456789123,24], [zs123456789123456789123,20], [zs123456789123456789123,20], [zs123456789123456789123,20], [zs123456789123456789123,20], [zs123456789123456789123,20], [zs123456789123456789123,20], [zs123456789123456789123,20], [zs123456789123456789123,20], [zs123456789123456789123,20], [zs123456789123456789123,20], [zs123456789123456789123,20], [zs123456789123456789123,20], [zs123456789123456789123,20], [zs123456789123456789123,20], [zs123456789123456789123,20], [zs123456789123456789123,29], [zs123456789123456789123,30], [zs123456789123456789123,20], [zs123456789123456789123,20], [zs123456789123456789123,20], [zs123456789123456789123,20], [zs123456789123456789123,20], [zs123456789123456789123,20], [zs123456789123456789123,20], [zs123456789123456789123,20], [zs123456789123456789123,20], [zs123456789123456789123,20], [zs123456789123456789123,29], [zs123456789123456789123,30]]
-----------collectAsList end-----------
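
Since `collect` brings every row to one node, a bounded alternative is often enough when only a sample is needed. The following is a minimal sketch, assuming you either want a few rows or want to walk over the rows one at a time: `take(n)` caps how many rows are returned, and `toLocalIterator()` fetches rows partition by partition instead of all at once. `CollectAlternatives`, `preview`, `sumAges` and the sample rows are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}

object CollectAlternatives {
  // Returns at most n rows, so memory use depends on n rather than on the data size.
  def preview(frame: DataFrame, n: Int): Array[Row] = frame.take(n)

  // Walks over all rows without building one big array of them.
  def sumAges(frame: DataFrame): Long = {
    val it = frame.toLocalIterator() // java.util.Iterator[Row]
    var total = 0L
    while (it.hasNext) total += it.next().getAs[Int]("ageb")
    total
  }

  def main(args: Array[String]): Unit = {
    val session = SparkSession.builder().appName("collect-alt").master("local[*]").getOrCreate()
    import session.implicits._

    // Illustrative sample data, not taken from the article
    val frame = Seq(("zs", 20), ("zs", 21), ("zs", 22)).toDF("namea", "ageb")
    println(preview(frame, 2).mkString("=")) // [zs,20]=[zs,21]
    println(sumAges(frame))                  // 63
    session.stop()
  }
}
```

In a real job a total like this is better computed by Spark itself with an aggregation; the loop is here only to show the iteration pattern.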

From: https://www.cnblogs.com/jsqup/p/16638826.html