A Small PySpark Example

Date: 2023-08-12 17:49:10
Tags: word, rdd, py, example, Harris, president, pyspark


#
#   py_pyspark_demo.py
#   py_learn
#
#   Created by Z. Steve on 2023/8/12 15:33.
#


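# Count how many times each word occurs in a file

# 1. Import the library
from pyspark import SparkConf, SparkContext

# 2. Create the SparkConf and SparkContext objects
#    local[*] runs Spark locally, using all available CPU cores
conf = SparkConf().setMaster("local[*]").setAppName("spark_demo")
sc = SparkContext(conf=conf)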

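# 3. Read the text file; each RDD element is one line of the file
rdd = sc.textFile("/Users/stevexhz/PycharmProjects/py_learn/content.txt")
# Split each line on whitespace and flatten the pieces into one RDD of words;
# split() with no argument also avoids the empty strings that blank lines would yield
word_list_rdd = rdd.flatMap(lambda x: x.split())

# Pair every word with an initial count of 1
word_group_rdd = word_list_rdd.map(lambda word: (word, 1))
# Sum the counts for each distinct word
result_rdd = word_group_rdd.reduceByKey(lambda a, b: a + b)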

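# 4. Output: collect() brings all (word, count) pairs back to the driver
print(result_rdd.collect())

# Release the SparkContext once the job is done
sc.stop()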


'''
content.txt

hello
welcome
to
our
country
significant
vulnerable
hurl
hello
welcome
to
our
country
significant
vulnerable
hurl
today is a great day.
and everybody should be here.
Vice President Kamala Harris is scheduled to visit Seattle on Tuesday to attend a political fundraiser and deliver a speech on the Biden administration’s actions to address climate change.

It’s expected that Harris will address provisions of the Inflation Reduction Act, signed into law by President Joe Biden after the vice president cast her tie-breaking vote last August in a divided Senate.

The legislation allocates nearly $375 billion over the next decade for climate-changing measures, including tax credits for clean energy manufacturing and production, and for consumer investments in electric vehicles and wind and solar power.

The act also aimed to lower prescription drug costs, provide more funding for the Internal Revenue Service, and impose a new corporate minimum tax while – say supporters – paying down the federal deficit over time. During the Senate vote, only Democrats favored the bill; Republicans were equally opposed.

During multiple appearances across the nation this month, when Congress is in recess, Harris and the president have been touting the benefits of the legislation and the Bipartisan Infrastructure Law.

Second Gentleman Doug Emhoff is expected to join the vice president on Tuesday’s visit. The Seattle Times reported that Harris will also headline a high-priced political fundraising luncheon co-hosted by Microsoft president Brad Smith and his wife, Kathy Surace-Smith, along with other Microsoft executives and community, business, and civic leaders.


'''
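
To make the three stages of the script easier to follow, here is a rough pure-Python model of what flatMap, map, and reduceByKey compute on a small in-memory list. This is only an illustrative sketch: the helpers `flat_map` and `reduce_by_key` are hypothetical names written for this example, not part of PySpark, and the sample `lines` are a shortened stand-in for content.txt. Real RDDs are evaluated lazily and in parallel across partitions, which this model ignores.

```python
def flat_map(items, fn):
    """Apply fn to every item and flatten the resulting lists into one list."""
    return [piece for item in items for piece in fn(item)]


def reduce_by_key(pairs, fn):
    """Merge the values of all pairs that share a key, using fn."""
    merged = {}
    for key, value in pairs:
        merged[key] = fn(merged[key], value) if key in merged else value
    return merged


# Shortened stand-in for content.txt, including one blank line
lines = ["hello", "welcome", "hello", "", "today is a great day."]

# Stage 1: one line -> many words (a blank line contributes no words)
words = flat_map(lines, lambda line: line.split())
# Stage 2: word -> (word, 1)
pairs = [(word, 1) for word in words]
# Stage 3: add up the 1s for each distinct word
counts = reduce_by_key(pairs, lambda a, b: a + b)

print(counts)
```

Note that whitespace splitting leaves punctuation attached, so "day." is counted as a different word from "day". If that matters, the tokens would need to be normalized (for example lower-cased and stripped of punctuation) before the counting stage.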

From: https://www.cnblogs.com/zxhoo/p/17625143.html
