• 2024-05-27 PySpark distributed project execution workflow
    1. PySpark is the API that Spark provides for Python developers.
    2. A PySpark-based distributed project consists of three main parts, as shown in Figure 1. When developing our own distributed program we only need to handle two of them: (1) writing the project's PySpark code, and (2) packaging the environment that code needs at runtime. The countNum.py below is a simple distributed program.
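    A minimal sketch of what such a counting job might look like, assuming a standard SparkSession entry point and reusing the HDFS path from the commands below (the path and app name are assumptions, not taken from the original listing):

        # countNum.py -- minimal sketch of a PySpark counting job (hypothetical reconstruction)
        from pyspark.sql import SparkSession

        if __name__ == "__main__":
            spark = SparkSession.builder.appName("countNum").getOrCreate()
            sc = spark.sparkContext

            # count the lines of a text file stored on HDFS (path is an assumption)
            lines = sc.textFile("hdfs:///home/hdp-ait/wangwei22/a.txt")
            print("line count:", lines.count())

            spark.stop()

    Such a script is typically launched with spark-submit, which can also ship a packed Python environment to the executors (for example via --archives), covering the environment-packaging part mentioned above.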
• 2024-05-27 Commonly used HDFS commands
    hdfs dfs -mkdir /home/hdp-ait/wangwei22
    hdfs dfs -ls /home/hdp-ait/wangwei22
    hdfs dfs -du -h /home/hdp-ait/wangwei22
    hdfs dfs -touchz /home/hdp-ait/wangwei22/a.txt
    hdfs dfs -rm /home/hdp-ait/wangwei22/edges.txt
    hdfs dfs -rm /home/hdp-ait/wangwei22/vertexs.txt