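Today I studied the two programming styles of SparkSQL: the DSL style and the SQL style. SQL-style statements can only be used after the DataFrame has been registered as a table (a temporary view).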
Below is part of the code used during the study.
# coding:utf8
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StringType, IntegerType
import pandas as pd

if __name__ == '__main__':
    spark = SparkSession.builder. \
        appName("test"). \
        master("local[*]"). \
        getOrCreate()
    sc = spark.sparkContext

    df = spark.read.format("csv"). \
        schema("id INT, subject STRING, score INT"). \
        load("../data/input/stu_score.txt")

    # Get Column objects
    id_column = df['id']
    subject_column = df['subject']

    # DSL style: select accepts a list, column names, or Column objects
    df.select(["id", "subject"]).show()
    df.select("id", "subject").show()
    df.select(id_column, subject_column).show()

    # filter API: a SQL expression string or a Column condition
    df.filter("score < 99").show()
    df.filter(df['score'] < 99).show()  # the comparison goes outside the brackets

    # where API (alias of filter)
    df.where("score < 99").show()
    df.where(df['score'] < 99).show()

    # Group By API
    df.groupby("subject").count().show()

    # SQL style: the DataFrame must be registered as a table first
    df.createTempView("score")  # register a temporary view
    df.createOrReplaceTempView("score_2")  # register or replace a temporary view
    df.createGlobalTempView("score_3")  # register a global temporary view, usable across SparkSessions

    # spark.sql returns a DataFrame, so call show() to print the result
    spark.sql("SELECT subject, COUNT(*) AS cnt FROM score GROUP BY subject").show()