While using PySpark, I ran into the following problem:
Could not serialize object: IndexError: tuple index out of range
The code is as follows:
import logging

from pyspark.sql import SparkSession, Row

logging.basicConfig(level=logging.ERROR)

ss = SparkSession.builder.appName("rdd").master("local[2]").getOrCreate()

# user_df = ss.createDataFrame([(1, 'Tom', 22), (2, 'Lucy', 18), (3, 'Nick', 21)], ['id', 'name', 'age'])
# user_df.show()

# Build rows from a dynamically created Row class (a namedtuple subclass).
Person = Row("id", "name", "age", "weight")
user_row_df = ss.createDataFrame([Person(1, "tom", 21, 75.5), Person(2, "lucy", 18, 50.0)])
user_row_df.show()  # show() prints the table itself and returns None
Running this snippet fails with the serialization error quoted above:
Could not serialize object: IndexError: tuple index out of range
Cause of the error:
The Python interpreter in the environment is too new for this PySpark release; switching to an older Python version (3.7 or 3.8 is recommended) resolves the problem.
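Before downgrading, it is worth confirming which interpreter and PySpark versions are actually in play. A minimal check, using only standard attributes:

import sys
import pyspark

# Compare the running interpreter against the installed PySpark release.
print(sys.version)          # the Python version actually running the script
print(pyspark.__version__)  # the installed PySpark version

Newer PySpark releases track newer Python versions, so upgrading PySpark can be an alternative to downgrading the interpreter.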
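Another thing worth trying is the variant shown commented out in the snippet above: passing plain tuples plus an explicit column list instead of instances of a dynamically created Row class. This is only a hedge, not a guaranteed fix, since a true Python/PySpark version mismatch in the pickler can only be fixed by aligning the versions:

from pyspark.sql import SparkSession

ss = SparkSession.builder.appName("rdd").master("local[2]").getOrCreate()

# Plain tuples with an explicit schema avoid pickling a Row subclass.
user_df = ss.createDataFrame(
    [(1, "tom", 21, 75.5), (2, "lucy", 18, 50.0)],
    ["id", "name", "age", "weight"],
)
user_df.show()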