
2024.12.3

Posted: 2024-12-03 23:10:46

// Compute each person's average score
JavaPairRDD<String, Double> averages = scores.join(counts).mapValues(new Function<Tuple2<Integer, Integer>, Double>() {
    @Override
    public Double call(Tuple2<Integer, Integer> tuple) {
        return (double) tuple._1() / tuple._2();
    }
});

In this corrected code, we use mapValues to transform the values of scores.join(counts). mapValues takes a Function that is applied only to the value part of each pair, i.e. the Tuple2<Integer, Integer> holding (total score, score count). Dividing the first element by the second yields the average score. Note that from Java, the elements of a Scala Tuple2 are read through the accessor methods _1() and _2().
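The arithmetic of this mapValues step can be checked without a Spark cluster. The sketch below is a plain-Java illustration (not the Spark API): a HashMap stands in for the joined RDD, and the names and scores are made-up sample data.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.HashMap;
import java.util.Map;

public class MapValuesSketch {
    // Mirrors the mapValues step: (total, count) -> total / count per key
    static Map<String, Double> averages(Map<String, Map.Entry<Integer, Integer>> joined) {
        Map<String, Double> out = new HashMap<>();
        joined.forEach((name, t) -> out.put(name, (double) t.getKey() / t.getValue()));
        return out;
    }

    public static void main(String[] args) {
        // Simulated result of scores.join(counts): name -> (total, count); sample data
        Map<String, Map.Entry<Integer, Integer>> joined = new HashMap<>();
        joined.put("Alice", new SimpleEntry<>(170, 2));
        joined.put("Bob", new SimpleEntry<>(240, 3));
        averages(joined).forEach((name, avg) -> System.out.printf("(%s, %.2f)%n", name, avg));
    }
}
```

The cast to double matters: without it, `170 / 2` would be integer division and any fractional part of the average would be silently dropped.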

The complete program looks like this:

package com.example;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

import scala.Tuple2;

import java.util.List;

public class AverageScores {
    public static void main(String[] args) {
        // Configure Spark
        SparkConf conf = new SparkConf()
                .setAppName("AverageScores")
                .setMaster("local[*]"); // adjust for your environment
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Read the input file from HDFS
        JavaRDD<String> lines = sc.textFile("hdfs://node1:8020/test/1.txt");

        // Parse each line into a (name, score) pair
        JavaPairRDD<String, Integer> scores = lines.filter(line -> line.contains(" "))
                .mapToPair(line -> {
                    String[] parts = line.split(" "); // split once instead of twice
                    return new Tuple2<>(parts[0], Integer.parseInt(parts[1]));
                });

        // Compute each person's total score
        JavaPairRDD<String, Integer> totalScores = scores.reduceByKey((a, b) -> a + b);

        // Count the number of scores per person
        JavaPairRDD<String, Integer> scoreCounts = lines.filter(line -> line.contains(" "))
                .mapToPair(line -> new Tuple2<>(line.split(" ")[0], 1))
                .reduceByKey((a, b) -> a + b);

        // Compute each person's average score (total / count)
        JavaPairRDD<String, Double> averages = totalScores.join(scoreCounts).mapValues(new Function<Tuple2<Integer, Integer>, Double>() {
            @Override
            public Double call(Tuple2<Integer, Integer> tuple) {
                return (double) tuple._1() / tuple._2();
            }
        });

        // Collect and print the results
        List<Tuple2<String, Double>> output = averages.collect();
        for (Tuple2<String, Double> tuple : output) {
            System.out.printf("(%s, %.2f)%n", tuple._1(), tuple._2());
        }

        // Close the Spark context
        sc.close();
    }
}
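To sanity-check the pipeline's logic end to end without HDFS or a Spark runtime, the same filter → mapToPair → reduceByKey → join → mapValues sequence can be sketched over local collections. This is an illustrative stand-in, not the Spark API, and the input lines are hypothetical sample data in the same "name score" format the job expects.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AverageScoresSketch {
    // Mirrors the Spark job's stages on an in-memory list of lines
    static Map<String, Double> averages(List<String> lines) {
        Map<String, Integer> totals = new LinkedHashMap<>();
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            if (!line.contains(" ")) continue;        // same filter as the Spark job
            String[] parts = line.split(" ");
            totals.merge(parts[0], Integer.parseInt(parts[1]), Integer::sum); // reduceByKey (sum)
            counts.merge(parts[0], 1, Integer::sum);  // reduceByKey (count)
        }
        Map<String, Double> averages = new LinkedHashMap<>();
        totals.forEach((name, total) ->
                averages.put(name, (double) total / counts.get(name))); // join + mapValues
        return averages;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("Tom 90", "Tom 80", "Jerry 70"); // sample data
        averages(lines).forEach((n, a) -> System.out.printf("(%s, %.2f)%n", n, a));
    }
}
```

Because both per-key aggregations iterate the same filtered lines, every name that appears in totals also appears in counts, which is why the Spark version's inner join never drops a key.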

From: https://www.cnblogs.com/258-333/p/18585240
