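Before reading this post, you should be familiar with graph data, Scala, and Spark.
GraphFrames is a graph processing library built on top of DataFrames. It takes advantage of the scalability and performance of DataFrames, and it provides a uniform graph processing API for Scala, Java, and Python.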
GitHub: https://github.com/graphframes/graphframes
Official documentation: https://graphframes.github.io/graphframes/docs/_site/user-guide.html#graphframe-to-graphx
1. Comparison with GraphX
Feature | GraphFrames | GraphX |
---|---|---|
Data model | DataFrames | RDDs |
Languages | Scala/Java/Python | Scala |
Use cases | Data queries, graph computation | Graph computation |
Vertex ID | Any type | Long |
Vertex/edge attributes | DataFrame columns | Any type (VD, ED) |
Return types | GraphFrame, DataFrame | Graph[VD, ED], RDD[(Long, VD)] |
2. Using GraphFrames in Scala
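Add the graphframes dependency to the Maven pom.xml. This artifact is distributed through Spark Packages, so the build may also need that repository configured.
<dependency>
  <groupId>graphframes</groupId>
  <artifactId>graphframes</artifactId>
  <version>0.8.1-spark2.4-s_2.11</version>
</dependency>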
3. Working through the official example
Basic retrieval
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession
import org.graphframes.GraphFrame

object GraphFramesExample {
  def main(args: Array[String]): Unit = {
    val sparkConfig = new SparkConf()
      .setAppName("GraphFrames")
      .setMaster("local[2]")
      .set("spark.sql.shuffle.partitions", "1") // number of shuffle partitions
    val spark: SparkSession = SparkSession.builder().config(sparkConfig).getOrCreate()

    // Vertex DataFrame
    val v = spark.createDataFrame(List(
      ("a", "Alice", 34),
      ("b", "Bob", 36),
      ("c", "Charlie", 30),
      ("d", "David", 29),
      ("e", "Esther", 32),
      ("f", "Fanny", 36),
      ("g", "Gabby", 60)
    )).toDF("id", "name", "age")

    // Edge DataFrame
    val e = spark.createDataFrame(List(
      ("a", "b", "friend"),
      ("b", "c", "follow"),
      ("c", "b", "follow"),
      ("f", "c", "follow"),
      ("e", "f", "follow"),
      ("e", "d", "friend"),
      ("d", "a", "friend"),
      ("a", "e", "friend")
    )).toDF("src", "dst", "relationship")

    // Create a GraphFrame
    val g = GraphFrame(v, e)

    // g.find("(a)-[e]->(b); (b)-[e2]->(a)").show()
    g.find("(a)-[e]->(b); (b)-[e2]->(c); (c)-[e3]->(a)")
      .where("a.age > 29")
      .show()

    // All vertices in the graph
    g.vertices.show()
    // All edges in the graph
    g.edges.show()
    // In-degree of each vertex
    g.inDegrees.show()
    // Out-degree of each vertex
    g.outDegrees.show()
    // Total degree (in-degree plus out-degree) of each vertex
    g.degrees.show()
    // All triplets (source vertex, edge, destination vertex) in the graph
    g.triplets.show()
  }
}
Output
// All vertices in the graph
g.vertices.show()
+---+-------+---+
| id|   name|age|
+---+-------+---+
|  a|  Alice| 34|
|  b|    Bob| 36|
|  c|Charlie| 30|
|  d|  David| 29|
|  e| Esther| 32|
|  f|  Fanny| 36|
|  g|  Gabby| 60|
+---+-------+---+

// All edges in the graph
g.edges.show()
+---+---+------------+
|src|dst|relationship|
+---+---+------------+
|  a|  b|      friend|
|  b|  c|      follow|
|  c|  b|      follow|
|  f|  c|      follow|
|  e|  f|      follow|
|  e|  d|      friend|
|  d|  a|      friend|
|  a|  e|      friend|
+---+---+------------+

// In-degree of each vertex
g.inDegrees.show()
+---+--------+
| id|inDegree|
+---+--------+
|  b|       2|
|  c|       2|
|  f|       1|
|  d|       1|
|  a|       1|
|  e|       1|
+---+--------+

// Out-degree of each vertex
g.outDegrees.show()
+---+---------+
| id|outDegree|
+---+---------+
|  a|        2|
|  b|        1|
|  c|        1|
|  f|        1|
|  e|        2|
|  d|        1|
+---+---------+

// Total degree (in-degree plus out-degree) of each vertex
g.degrees.show()
+---+------+
| id|degree|
+---+------+
|  a|     3|
|  b|     3|
|  c|     3|
|  f|     2|
|  e|     3|
|  d|     2|
+---+------+

// All triplets (source vertex, edge, destination vertex) in the graph
g.triplets.show()
+----------------+--------------+----------------+
|             src|          edge|             dst|
+----------------+--------------+----------------+
|  [a, Alice, 34]|[a, b, friend]|    [b, Bob, 36]|
|    [b, Bob, 36]|[b, c, follow]|[c, Charlie, 30]|
|[c, Charlie, 30]|[c, b, follow]|    [b, Bob, 36]|
|  [f, Fanny, 36]|[f, c, follow]|[c, Charlie, 30]|
| [e, Esther, 32]|[e, f, follow]|  [f, Fanny, 36]|
| [e, Esther, 32]|[e, d, friend]|  [d, David, 29]|
|  [d, David, 29]|[d, a, friend]|  [a, Alice, 34]|
|  [a, Alice, 34]|[a, e, friend]| [e, Esther, 32]|
+----------------+--------------+----------------+
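To see how the three degree tables relate to each other, the same counts can be reproduced from the edge list with plain Scala collections. This is only an illustrative sketch, not how GraphFrames computes degrees internally; the object name `DegreeSketch` and its members are made up for this example.

```scala
object DegreeSketch {
  // (src, dst) pairs copied from the edge DataFrame in the example
  val edges: Seq[(String, String)] = Seq(
    ("a", "b"), ("b", "c"), ("c", "b"), ("f", "c"),
    ("e", "f"), ("e", "d"), ("d", "a"), ("a", "e")
  )

  // Out-degree: number of edges that start at each vertex
  val outDegrees: Map[String, Int] =
    edges.groupBy(_._1).map { case (id, es) => id -> es.size }

  // In-degree: number of edges that end at each vertex
  val inDegrees: Map[String, Int] =
    edges.groupBy(_._2).map { case (id, es) => id -> es.size }

  // Total degree: in-degree plus out-degree
  val degrees: Map[String, Int] =
    (inDegrees.keySet ++ outDegrees.keySet).map { id =>
      id -> (inDegrees.getOrElse(id, 0) + outDegrees.getOrElse(id, 0))
    }.toMap

  def main(args: Array[String]): Unit =
    degrees.toSeq.sortBy(_._1).foreach { case (id, d) => println(s"$id: $d") }
}
```

Vertex g has no edges, so it does not appear in any of the three maps, just as it is absent from the `show()` output of the degree tables.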
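Motif finding (pattern search)
GraphFrame motif finding uses a simple domain-specific language (DSL) to express structural queries, with () denoting a vertex and [] denoting an edge.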
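For example, graph.find("(a)-[e]->(b); (b)-[e2]->(a)") searches for pairs of vertices a, b that are connected by edges in both directions. It returns a DataFrame of all such structures in the graph, with one column for each named element (vertex or edge) in the motif.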
Find the pattern where vertex a points to vertex b and vertex b also points back to vertex a:
g.find("(a)-[e]->(b); (b)-[e2]->(a)").show()
+----------------+--------------+----------------+--------------+
|               a|             e|               b|            e2|
+----------------+--------------+----------------+--------------+
|    [b, Bob, 36]|[b, c, follow]|[c, Charlie, 30]|[c, b, follow]|
|[c, Charlie, 30]|[c, b, follow]|    [b, Bob, 36]|[b, c, follow]|
+----------------+--------------+----------------+--------------+
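The motif DSL also accepts anonymous elements and negated terms, both described in the GraphFrames user guide. As a hedged sketch on the same edges, the query below looks for edges a → b that have no edge back from b to a. The object name `OneWayEdgesSketch` and the helper `oneWayPairs` are made up for illustration, and the DataFrames keep only the columns this query needs.

```scala
import org.apache.spark.sql.SparkSession
import org.graphframes.GraphFrame

object OneWayEdgesSketch {
  // Returns (a.id, b.id) for every edge a -> b that has no edge b -> a
  def oneWayPairs(spark: SparkSession): Set[(String, String)] = {
    val v = spark.createDataFrame(List(
      ("a", "Alice"), ("b", "Bob"), ("c", "Charlie"), ("d", "David"),
      ("e", "Esther"), ("f", "Fanny"), ("g", "Gabby")
    )).toDF("id", "name")
    val e = spark.createDataFrame(List(
      ("a", "b"), ("b", "c"), ("c", "b"), ("f", "c"),
      ("e", "f"), ("e", "d"), ("d", "a"), ("a", "e")
    )).toDF("src", "dst")
    val g = GraphFrame(v, e)

    // [] is an anonymous edge: it must exist but gets no result column.
    // A leading ! negates a term: that edge must NOT exist.
    val oneWay = g.find("(a)-[]->(b); !(b)-[]->(a)")

    oneWay.select("a.id", "b.id").collect()
      .map(r => (r.getString(0), r.getString(1))).toSet
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("OneWayEdgesSketch").master("local[2]").getOrCreate()
    oneWayPairs(spark).toSeq.sorted.foreach(println)
    spark.stop()
  }
}
```

Because the edges are anonymous, the result contains only the vertex columns a and b.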
Find the pattern where a points to b, b points to c, and c points to a, that is, a directed cycle through three vertices:
g.find("(a)-[e]->(b); (b)-[e2]->(c); (c)-[e3]->(a)").show()
+---------------+--------------+---------------+--------------+---------------+--------------+
|              a|             e|              b|            e2|              c|            e3|
+---------------+--------------+---------------+--------------+---------------+--------------+
|[e, Esther, 32]|[e, d, friend]| [d, David, 29]|[d, a, friend]| [a, Alice, 34]|[a, e, friend]|
| [d, David, 29]|[d, a, friend]| [a, Alice, 34]|[a, e, friend]|[e, Esther, 32]|[e, d, friend]|
| [a, Alice, 34]|[a, e, friend]|[e, Esther, 32]|[e, d, friend]| [d, David, 29]|[d, a, friend]|
+---------------+--------------+---------------+--------------+---------------+--------------+
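The three rows are the same triangle a → e → d → a, reported once for each vertex that can play the role of `a` in the motif. A rough plain-Scala sketch of the matching logic, assuming a simple nested loop over the edge list (GraphFrames itself works on DataFrames; `CycleSketch` and `triangles` are hypothetical names):

```scala
object CycleSketch {
  // (src, dst) pairs copied from the edge DataFrame in the example
  val edges: Seq[(String, String)] = Seq(
    ("a", "b"), ("b", "c"), ("c", "b"), ("f", "c"),
    ("e", "f"), ("e", "d"), ("d", "a"), ("a", "e")
  )

  // All (a, b, c) such that the edges a -> b, b -> c and c -> a exist
  val triangles: Seq[(String, String, String)] =
    for {
      (a, b)   <- edges
      (b2, c)  <- edges if b2 == b
      (c2, a2) <- edges if c2 == c && a2 == a
    } yield (a, b, c)

  def main(args: Array[String]): Unit =
    triangles.foreach { case (a, b, c) => println(s"$a -> $b -> $c -> $a") }
}
```

Each rotation of the cycle is a separate match, so deduplicating cycles would need an extra step such as comparing the sorted vertex ids.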
Building on the query above, add a filter condition. Here where is equivalent to filter:
g.find("(a)-[e]->(b); (b)-[e2]->(c); (c)-[e3]->(a)")
.where("a.age > 29")
.show()
+---------------+--------------+---------------+--------------+--------------+--------------+
|              a|             e|              b|            e2|             c|            e3|
+---------------+--------------+---------------+--------------+--------------+--------------+
|[e, Esther, 32]|[e, d, friend]| [d, David, 29]|[d, a, friend]|[a, Alice, 34]|[a, e, friend]|
| [a, Alice, 34]|[a, e, friend]|[e, Esther, 32]|[e, d, friend]|[d, David, 29]|[d, a, friend]|
+---------------+--------------+---------------+--------------+--------------+--------------+
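Since the result of `find` is an ordinary DataFrame, the same condition can also be written as a Column expression instead of a SQL string. The sketch below compares the two forms on just the three vertices of the triangle; `MotifFilterSketch` and `filteredIds` are illustrative names, not part of GraphFrames.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import org.graphframes.GraphFrame

object MotifFilterSketch {
  // Returns the ids matched as `a` by the string form and by the Column form
  def filteredIds(spark: SparkSession): (Set[String], Set[String]) = {
    // Only the three vertices of the triangle are needed here
    val v = spark.createDataFrame(List(
      ("a", "Alice", 34), ("d", "David", 29), ("e", "Esther", 32)
    )).toDF("id", "name", "age")
    val e = spark.createDataFrame(List(
      ("a", "e", "friend"), ("e", "d", "friend"), ("d", "a", "friend")
    )).toDF("src", "dst", "relationship")
    val cycles = GraphFrame(v, e).find("(a)-[e]->(b); (b)-[e2]->(c); (c)-[e3]->(a)")

    // Collect the id of the vertex bound to `a` in each matched row
    def ids(df: DataFrame): Set[String] =
      df.select("a.id").collect().map(_.getString(0)).toSet

    // SQL expression string, as in the example
    val bySql = cycles.where("a.age > 29")
    // The same predicate written as a Column expression
    val byColumn = cycles.filter(col("a.age") > 29)

    (ids(bySql), ids(byColumn))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("MotifFilterSketch").master("local[2]").getOrCreate()
    val (sqlIds, columnIds) = filteredIds(spark)
    println(s"where: $sqlIds, filter: $columnIds")
    spark.stop()
  }
}
```

The Column form can be easier to compose when a condition is built programmatically, for example by combining several predicates with `&&`.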
From: https://www.cnblogs.com/MuXinu/p/17839253.html