Study notes based on Amit Nandi's Spark for Python Developers
1.1 word count example
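A minimal sketch of what a PySpark word count can look like, not necessarily the book's exact listing. The helper names `tokenize` and `word_count` are my own, and `sc` is assumed to be an existing SparkContext (for example the one the pyspark shell provides).

```python
from operator import add


def tokenize(line):
    """Split one line of text into lowercase words."""
    return line.lower().split()


def word_count(sc, path):
    """Count word occurrences in a text file.

    `sc` is assumed to be an already running SparkContext;
    `path` is any text file Spark can read.
    """
    return (
        sc.textFile(path)                  # RDD of lines
        .flatMap(tokenize)                 # RDD of words
        .map(lambda word: (word, 1))       # RDD of (word, 1) pairs
        .reduceByKey(add)                  # sum the 1s for each word
        .collect()                         # list of (word, count) on the driver
    )
```

In the pyspark shell, `word_count(sc, "some_file.txt")` would return a list of (word, count) tuples; the file name here is only a placeholder.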
Chapter 5 Streaming Live Data with Spark
Goal: "investigate various implementations using live sources of data such as TCP sockets to the Twitter firehose and put in place a low latency,
high throughput, and scalable data pipeline combining Spark, Kafka and Flume."
fault tolerance
The main Spark Streaming fault tolerance mechanisms are checkpointing, automatic driver restart, and automatic failover. Spark enables recovery
from driver failure using checkpointing, which preserves the application state. Furthermore, recovering from a failure requires recomputing lost results, and DStream
transformations have exactly-once semantics, so the recomputed results match the originals.
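To make the checkpointing idea concrete, here is a hedged sketch of a driver that can be restarted from a checkpoint. It assumes a socket text source on localhost:9999 and a checkpoint directory of my own choosing; `CHECKPOINT_DIR`, `update_count` and `create_context` are illustrative names, not taken from the book. The pyspark imports sit inside the functions so the pure helper can be tried without Spark installed.

```python
# Hypothetical checkpoint location; on a cluster this would normally be
# a fault-tolerant store such as HDFS.
CHECKPOINT_DIR = "/tmp/spark_streaming_checkpoint"


def update_count(new_values, running_count):
    """State update function: add this batch's counts to the running total.

    `running_count` is None the first time a key is seen.
    """
    return sum(new_values) + (running_count or 0)


def create_context(host="localhost", port=9999, batch_seconds=10):
    """Build a fresh StreamingContext; only called when no checkpoint exists."""
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext(appName="CheckpointedWordCount")
    ssc = StreamingContext(sc, batch_seconds)
    # Stateful operations such as updateStateByKey need a checkpoint directory
    ssc.checkpoint(CHECKPOINT_DIR)

    lines = ssc.socketTextStream(host, port)
    counts = (
        lines.flatMap(lambda line: line.split())
        .map(lambda word: (word, 1))
        .updateStateByKey(update_count)    # running count per word
    )
    counts.pprint()
    return ssc


def run():
    """Recover the context from the checkpoint if present, else create it."""
    from pyspark.streaming import StreamingContext

    ssc = StreamingContext.getOrCreate(CHECKPOINT_DIR, create_context)
    ssc.start()
    ssc.awaitTermination()
```

Calling `run()` after a driver crash should pick up the saved state from `CHECKPOINT_DIR` instead of starting the counts from zero, because `getOrCreate` only calls `create_context` when no checkpoint data is found.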
Processing live data with TCP sockets
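To experiment with a socket source locally, something has to listen on a port and write newline-terminated text. The sketch below is my own illustration, not the book's code: `serve_lines` is a tiny one-shot TCP server from the standard library, and `print_socket_stream` is a hypothetical PySpark consumer that reads from it with `socketTextStream`.

```python
import socket
import threading


def serve_lines(lines, host="127.0.0.1", port=0):
    """Start a one-shot TCP server that sends `lines` to the first client.

    port=0 asks the OS for a free port. Returns the (host, port) bound.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen(1)
    address = server.getsockname()

    def handle():
        conn, _ = server.accept()
        with conn:
            for line in lines:
                # The socket text source treats each newline as a record boundary
                conn.sendall((line + "\n").encode("utf-8"))
        server.close()

    threading.Thread(target=handle, daemon=True).start()
    return address


def print_socket_stream(host, port, batch_seconds=5):
    """Build a StreamingContext that prints the lines received in each batch."""
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    # At least two local threads: one for the receiver, one for processing
    sc = SparkContext("local[2]", "SocketStreamDemo")
    ssc = StreamingContext(sc, batch_seconds)
    ssc.socketTextStream(host, port).pprint()
    return ssc
```

A possible session: `host, port = serve_lines(["hello spark", "streaming"])`, then `ssc = print_socket_stream(host, port)` followed by `ssc.start()`. Because the server is one-shot, it closes after sending its lines; a long-running source would keep the connection open.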
From: https://www.cnblogs.com/chadyoungs/p/15654715.html