首页 > 数据库 >invalidate the cache in Spark by running 'REFRESH TABLE tableName' command in SQL or by re

invalidate the cache in Spark by running 'REFRESH TABLE tableName' command in SQL or by re

时间:2023-07-18 20:56:16浏览次数:36  
标签:invalidate recreating Iterator scala involved anon apache org spark

	... 1 more
Caused by: java.io.FileNotFoundException: File does not exist: hdfs://ns1/user/hive/warehouse/dw.db/dw_uniswapv3_position_detail/pk_day=1689552000000/part-00000-bbe52b3b-4963-4c76-9ba9-e315305baed7.c000
It is possible the underlying files have been updated. You can explicitly invalidate the cache in Spark by running 'REFRESH TABLE tableName' command in SQL or by recreating the Dataset/DataFrame involved.
	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:129)
	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:179)
	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:103)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
	at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:462)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
	at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:187)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
	at org.apache.spark.scheduler.Task.run(Task.scala:121)
	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$11.apply(Executor.scala:407)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1408)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:413)
	... 3 more

在写入之前,由于meta发生了修改,所以需要刷新表,这里需要注意的是在SQL插入和dataFrame插入下,刷新表的语句不同。

#sparksql模式
spark.sql("REFRESH TABLE db.tablename")

# dataframe模式
spark.Catalog.refreshTable("db.tablename")

标签:invalidate,recreating,Iterator,scala,involved,anon,apache,org,spark
From: https://www.cnblogs.com/30go/p/17564104.html

相关文章