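- Dependencies
- Initializing StreamingContext
- Discretized Streams (DStreams)
- Input DStreams and Receivers
- Transformations on DStreams
- Output Operations on DStreams
- DataFrame and SQL Operations
- MLlib Operations
- Caching / Persistence
- Checkpointing
- Accumulators and Broadcast Variables
- Deploying Applications
- Monitoring Applications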
- Upgrading from Spark SQL 1.6 to 2.0
- Upgrading from Spark SQL 1.5 to 1.6
- Upgrading from Spark SQL 1.4 to 1.5
- Upgrading from Spark SQL 1.3 to 1.4
- Upgrading from Spark SQL 1.0-1.2 to 1.3
- Compatibility with Apache Hive
- ML Pipelines
- Extracting, transforming and selecting features
- Classification and regression
- Clustering
- Collaborative Filtering
- ML Tuning: model selection and hyperparameter tuning
- Advanced topics
- Data Types - RDD-based API
- Basic Statistics - RDD-based API
- Classification and Regression - RDD-based API
- Collaborative Filtering - RDD-based API
- Clustering - RDD-based API
- Dimensionality Reduction - RDD-based API
- Feature Extraction and Transformation - RDD-based API
- Frequent Pattern Mining - RDD-based API
- Evaluation metrics - RDD-based API
- PMML model export - RDD-based API
- Optimization - RDD-based API
http://cwiki.apachecn.org/pages/viewpage.action?pageId=2883613
From: https://blog.51cto.com/u_6468453/6964650