Converting a data container to an RDD object
Use the parallelize method of the SparkContext object to convert a Python data container (list, tuple, set, string or dict) into an RDD object.
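What ends up in each RDD is whatever iterating over the container produces. This explains the less obvious results in the Spark example that follows: a string yields its characters, and a dict yields only its keys. The sketch below models this in plain Python, so it runs without Spark. `rdd_elements` is a hypothetical helper, not a PySpark function, and the model assumes parallelize behaves like `list(container)` for these in-memory inputs.

```python
def rdd_elements(container):
    """Hypothetical helper: the elements an RDD built from `container` would hold."""
    # Modeled as plain iteration over the container (an assumption, not Spark code)
    return list(container)


print(rdd_elements([1, 2, 3, 4, 5]))    # [1, 2, 3, 4, 5]
print(rdd_elements((1, 2, 3, 4, 5)))    # [1, 2, 3, 4, 5]: a tuple behaves like a list
print(rdd_elements("abcdefg"))          # ['a', 'b', 'c', 'd', 'e', 'f', 'g']
print(rdd_elements({"key1": "value1", "key2": "value2"}))          # ['key1', 'key2']
print(rdd_elements({"key1": "value1", "key2": "value2"}.items()))  # [('key1', 'value1'), ('key2', 'value2')]
```

If the dict values are needed as well, passing `list(data5.items())` to `sc.parallelize` gives an RDD of (key, value) tuples instead of keys only.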
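from pyspark import SparkConf, SparkContext
# Run Spark locally, with one worker thread per logical CPU core
conf = SparkConf().setMaster("local[*]").setAppName("test_spark_app")
sc = SparkContext(conf=conf)
data1 = [1, 2, 3, 4, 5]                        # list
data2 = (1, 2, 3, 4, 5)                        # tuple
data3 = {1, 2, 3, 4, 5}                        # set
data4 = "abcdefg"                              # str
data5 = {"key1": "value1", "key2": "value2"}   # dict
rdd1 = sc.parallelize(data1)
rdd2 = sc.parallelize(data2)
rdd3 = sc.parallelize(data3)
rdd4 = sc.parallelize(data4)
rdd5 = sc.parallelize(data5)
print(rdd1.collect())  # [1, 2, 3, 4, 5]
print(rdd2.collect())  # [1, 2, 3, 4, 5]
print(rdd3.collect())  # the set's elements; sets are unordered, so the order is not guaranteed
print(rdd4.collect())  # ['a', 'b', 'c', 'd', 'e', 'f', 'g']: a string becomes its characters
print(rdd5.collect())  # ['key1', 'key2']: only the dict's keys are kept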
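sc.stop()  # stop the SparkContext and release its resources
Reading a text file into an RDD object
Use the textFile method of SparkContext to read a text file and get an RDD object. Each line of the file becomes one element of the RDD.
from pyspark import SparkConf, SparkContext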
conf = SparkConf().setMaster("local[*]").setAppName("test_spark_app")
sc = SparkContext(conf=conf)
rdd = sc.textFile(r"D:\WordCount\input\data.txt")  # raw string keeps the backslashes in the Windows path literal
print(rdd.collect())
sc.stop()
From: https://www.cnblogs.com/wjzohou/p/17969760
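For textFile, each line of the file becomes one string element with its line terminator removed. As a rough pure-Python model for a single small UTF-8 file (no Spark required), the sketch below reads a file the same way line by line. `text_file_elements` is a hypothetical helper, not part of PySpark, and the model ignores features such as reading whole directories or splitting large files across partitions.

```python
import os
import tempfile


def text_file_elements(path):
    """Hypothetical helper: one string per line of `path`, terminators removed."""
    # Text mode uses universal newlines, so "\r\n" and "\r" are both read as "\n"
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


# Usage: write a small sample file, read it back, then clean up
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write("hello spark\nhello python\n\npyspark\n")
    sample_path = tmp.name

lines = text_file_elements(sample_path)
print(lines)  # ['hello spark', 'hello python', '', 'pyspark']
os.remove(sample_path)
```

An empty line in the file shows up as an empty string, while the final newline does not add an extra empty element.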