Concurrent Node Similarity Computation in ONgDB

Published: 2022-11-28 16:08:20


Computing similarity between nodes

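All of the tests below were run on ONgDB.
In an ONgDB cluster, READ_REPLICA nodes can only run procedures declared as Mode.READ (every node in the cluster supports Mode.READ procedures). Procedures that write while they run must be executed on a node that accepts writes.
The concurrent procedures do not support writing. Their results have to be written back through neo4j-stream, or by a separate plugin; they can also be saved in a different storage system.
The level of concurrency that can be supported depends on the number of CPU cores on the server.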

I. Jaccard similarity - algo.similarity.jaccard - Mode.WRITE

Jaccard similarity is well suited to distributed, parallel computation over large data sets.

1. Create test data

CREATE (a:Person {name:'Alice'})
CREATE (b:Person {name:'Bob'})
CREATE (c:Person {name:'Charlie'})
CREATE (d:Person {name:'Dana'})
CREATE (i1:Item {name:'p1'})
CREATE (i2:Item {name:'p2'})
CREATE (i3:Item {name:'p3'})
CREATE (a)-[:LIKES]->(i1),
(a)-[:LIKES]->(i2),
(a)-[:LIKES]->(i3),
(b)-[:LIKES]->(i1),
(b)-[:LIKES]->(i2),
(c)-[:LIKES]->(i3)

2. Run the similarity computation

This procedure does not run concurrently, but it does support writing results.

MATCH (p:Person)-[:LIKES]->(i:Item)
WITH {item:id(p), categories: collect(distinct id(i))} as userData
WITH collect(userData) as data
CALL algo.similarity.jaccard(data, {write:true,showComputations:true,similarityCutoff:0.1}) yield p25, p50, p75, p90, p95, p99, p999, p100, nodes, similarityPairs, computations RETURN *

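3. Similarity results

The procedure creates SIMILAR relationships and sets their score property, so the similarity score between Person nodes can be read directly from the graph.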

4. algo.similarity.jaccard.stream - Mode.READ

This procedure runs concurrently and does not support writing.

MATCH (p:Person)-[:LIKES]->(i:Item)
WITH {item:id(p), categories: collect(distinct id(i))} as userData
WITH collect(userData) as data
call algo.similarity.jaccard.stream(data,{topK:4,concurrency:4,similarityCutoff:-0.1}) yield item1, item2, count1, count2, intersection, similarity RETURN * ORDER BY item1,item2
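
To make the scores easier to check by hand, here is a minimal sketch in plain Python that computes Jaccard similarity on the same test data. It is not the algo.similarity.jaccard implementation; the names `likes`, `jaccard` and `pairwise` are illustrative, and returning 0.0 for two empty sets is an assumption made for this sketch.

```python
from itertools import combinations

# Sample data mirroring the LIKES relationships created in step 1.
# Dana likes nothing, so the MATCH pattern in the queries never returns her.
likes = {
    "Alice": {"p1", "p2", "p3"},
    "Bob": {"p1", "p2"},
    "Charlie": {"p3"},
}

def jaccard(a, b):
    """|A intersect B| / |A union B|; 0.0 when both sets are empty (assumption)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def pairwise(data, metric):
    """Apply metric to every unordered pair of keys, in sorted key order."""
    return {(x, y): metric(data[x], data[y])
            for x, y in combinations(sorted(data), 2)}

scores = pairwise(likes, jaccard)
for pair, score in scores.items():
    print(pair, round(score, 4))
```

On this data Alice/Bob share two of three items (2/3), Alice/Charlie share one of three (1/3), and Bob/Charlie share none (0). A similarityCutoff of 0.1, as in the write query, would therefore be expected to drop the Bob/Charlie pair, while the -0.1 used in the stream query keeps it.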

II. Cosine similarity - algo.similarity.cosine - Mode.WRITE

1. Create test data

CREATE (a:Person {name:'Alice'})
CREATE (b:Person {name:'Bob'})
CREATE (c:Person {name:'Charlie'})
CREATE (d:Person {name:'Dana'})
CREATE (i1:Item {name:'p1'})
CREATE (i2:Item {name:'p2'})
CREATE (i3:Item {name:'p3'})
CREATE (a)-[:LIKES {stars:1}]->(i1),
(a)-[:LIKES {stars:2}]->(i2),
(a)-[:LIKES {stars:5}]->(i3),
(b)-[:LIKES {stars:1}]->(i1),
(b)-[:LIKES {stars:3}]->(i2),
(c)-[:LIKES {stars:4}]->(i3)

2. Run the similarity computation

MATCH (i:Item) WITH i ORDER BY id(i) MATCH (p:Person) OPTIONAL MATCH (p)-[r:LIKES]->(i)
WITH {item:id(p), weights: collect(coalesce(r.stars,0))} as userData
WITH collect(userData) as data
CALL algo.similarity.cosine(data, {write:true,showComputations:true,similarityCutoff:0.1}) yield p25, p50, p75, p90, p95, p99, p999, p100, nodes, similarityPairs, computations RETURN *

3. algo.similarity.cosine.stream - Mode.READ

CALL algo.similarity.cosine.stream([{item:id, weights:[weights]}], {similarityCutoff:-1,degreeCutoff:0})
YIELD item1, item2, count1, count2, intersection, similarity - computes cosine distance
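
As a hand check of the cosine scores, the sketch below recomputes them in plain Python. It assumes the weight vectors are built the way the write query builds them: one entry per Item in id order, with 0 for a missing LIKES relationship. The names `ratings` and `cosine` are illustrative, and returning 0.0 for an all-zero vector is an assumption of this sketch, not documented behaviour of the procedure.

```python
import math
from itertools import combinations

# stars per item (p1, p2, p3); 0 stands in for "no LIKES relationship".
ratings = {
    "Alice": [1, 2, 5],
    "Bob": [1, 3, 0],
    "Charlie": [0, 0, 4],
    "Dana": [0, 0, 0],
}

def cosine(x, y):
    """dot(x, y) / (|x| * |y|); 0.0 if either vector has zero length (assumption)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0

scores = {(p, q): cosine(ratings[p], ratings[q])
          for p, q in combinations(sorted(ratings), 2)}
for pair, score in scores.items():
    print(pair, round(score, 4))
```

Alice and Charlie come out most similar (about 0.913) because both give their highest rating to p3, while Alice and Bob score about 0.404.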

III. Pearson similarity - algo.similarity.pearson - Mode.WRITE

1. Create test data

CREATE (a:Person {name:'Alice'})
CREATE (b:Person {name:'Bob'})
CREATE (c:Person {name:'Charlie'})
CREATE (d:Person {name:'Dana'})
CREATE (i1:Item {name:'p1'})
CREATE (i2:Item {name:'p2'})
CREATE (i3:Item {name:'p3'})
CREATE (i4:Item {name:'p4'})
CREATE (a)-[:LIKES {stars:1}]->(i1),
(a)-[:LIKES {stars:2}]->(i2),
(a)-[:LIKES {stars:3}]->(i3),
(a)-[:LIKES {stars:4}]->(i4),
(b)-[:LIKES {stars:2}]->(i1),
(b)-[:LIKES {stars:3}]->(i2),
(b)-[:LIKES {stars:4}]->(i3),
(b)-[:LIKES {stars:5}]->(i4),
(c)-[:LIKES {stars:3}]->(i1),
(c)-[:LIKES {stars:4}]->(i2),
(c)-[:LIKES {stars:4}]->(i3),
(c)-[:LIKES {stars:5}]->(i4),
(d)-[:LIKES {stars:3}]->(i2),
(d)-[:LIKES {stars:2}]->(i3),
(d)-[:LIKES {stars:5}]->(i4)

2. Run the similarity computation

MATCH (i:Item) WITH i ORDER BY id(i) MATCH (p:Person) OPTIONAL MATCH (p)-[r:LIKES]->(i)
WITH {item:id(p), weights: collect(coalesce(r.stars,0))} as userData
WITH collect(userData) as data
CALL algo.similarity.pearson(data,{similarityCutoff:-0.1}) yield p25, p50, p75, p90, p95, p99, p999, p100, nodes, similarityPairs, computations RETURN *

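3. algo.similarity.pearson.stream - Mode.READ

The query below reads a $missingValue parameter, which the caller has to supply; it is the weight used for Items a Person has not rated.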

MATCH (i:Item) WITH i ORDER BY i MATCH (p:Person) OPTIONAL MATCH (p)-[r:LIKES]->(i)
WITH p, i, r ORDER BY id(p), id(i) WITH {item:id(p), weights: collect(coalesce(r.stars,$missingValue))} as userData
WITH collect(userData) as data
call algo.similarity.pearson.stream(data,{similarityCutoff:-0.1}) yield item1, item2, count1, count2, intersection, similarity RETURN item1, item2, count1, count2, intersection, similarity ORDER BY item1,item2
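
The Pearson scores for the test data can be checked by hand with the plain-Python sketch below. It assumes a missing rating is replaced by 0 (the value used in the write query; with the stream query it would be whatever is passed as $missingValue). The names `ratings` and `pearson` are illustrative, and returning 0.0 for a constant vector is an assumption of this sketch.

```python
import math
from itertools import combinations

# stars per item (p1, p2, p3, p4); Dana has no rating for p1, filled with 0.
ratings = {
    "Alice": [1, 2, 3, 4],
    "Bob": [2, 3, 4, 5],
    "Charlie": [3, 4, 4, 5],
    "Dana": [0, 3, 2, 5],
}

def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    denom = math.sqrt(sum(v * v for v in dx)) * math.sqrt(sum(v * v for v in dy))
    # A constant vector has zero variance; return 0.0 in that case (assumption).
    return sum(a * b for a, b in zip(dx, dy)) / denom if denom else 0.0

scores = {(p, q): pearson(ratings[p], ratings[q])
          for p, q in combinations(sorted(ratings), 2)}
for pair, score in scores.items():
    print(pair, round(score, 4))
```

Alice and Bob correlate perfectly, since Bob's ratings are Alice's shifted up by one star. This is the main difference from cosine similarity: Pearson subtracts each person's mean before comparing.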

IV. Euclidean distance - algo.similarity.euclidean - Mode.WRITE

1. Create test data

CREATE (a:Person {name:'Alice'})
CREATE (b:Person {name:'Bob'})
CREATE (c:Person {name:'Charlie'})
CREATE (d:Person {name:'Dana'})
CREATE (i1:Item {name:'p1'})
CREATE (i2:Item {name:'p2'})
CREATE (i3:Item {name:'p3'})
CREATE (a)-[:LIKES {stars:1}]->(i1),
(a)-[:LIKES {stars:2}]->(i2),
(a)-[:LIKES {stars:5}]->(i3),
(b)-[:LIKES {stars:1}]->(i1),
(b)-[:LIKES {stars:3}]->(i2),
(c)-[:LIKES {stars:4}]->(i3)

2. Run the similarity computation

MATCH (i:Item) WITH i ORDER BY id(i) MATCH (p:Person) OPTIONAL MATCH (p)-[r:LIKES]->(i)
WITH {item:id(p), weights: collect(coalesce(r.stars,0))} AS userData
WITH collect(userData) AS data
CALL algo.similarity.euclidean(data, {similarityCutoff:-0.1}) YIELD p25, p50, p75, p90, p95, p99, p999, p100, nodes, similarityPairs, computations RETURN *

3. algo.similarity.euclidean.stream - Mode.READ

As with the Pearson stream query, the $missingValue parameter has to be supplied by the caller.

MATCH (i:Item) WITH i ORDER BY id(i) MATCH (p:Person) OPTIONAL MATCH (p)-[r:LIKES]->(i)
WITH {item:id(p), weights: collect(coalesce(r.stars,$missingValue))} AS userData
WITH collect(userData) AS data
CALL algo.similarity.euclidean.stream(data,{similarityCutoff:-0.1}) YIELD item1, item2, count1, count2, intersection, similarity RETURN item1, item2, count1, count2, intersection, similarity ORDER BY item1,item2
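
For a hand check of the Euclidean results, the following plain-Python sketch computes the distances on the same vectors, again assuming 0 for a missing rating as in the write query. The names `ratings` and `euclidean` are illustrative and not part of the algo library.

```python
import math
from itertools import combinations

# stars per item (p1, p2, p3); 0 stands in for "no LIKES relationship".
ratings = {
    "Alice": [1, 2, 5],
    "Bob": [1, 3, 0],
    "Charlie": [0, 0, 4],
    "Dana": [0, 0, 0],
}

def euclidean(x, y):
    """Square root of the summed squared differences between x and y."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

distances = {(p, q): euclidean(ratings[p], ratings[q])
             for p, q in combinations(sorted(ratings), 2)}
for pair, dist in distances.items():
    print(pair, round(dist, 4))
```

Unlike the other measures in this article, this is a distance: a smaller value means the two Person nodes are more alike, and 0 means identical vectors.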

V. Overlap similarity - algo.similarity.overlap - Mode.WRITE

1. Create test data

CREATE (a:Person {name:'Alice'})
CREATE (b:Person {name:'Bob'})
CREATE (c:Person {name:'Charlie'})
CREATE (d:Person {name:'Dana'})
CREATE (i1:Item {name:'p1'})
CREATE (i2:Item {name:'p2'})
CREATE (i3:Item {name:'p3'})
CREATE (a)-[:LIKES]->(i1),
(a)-[:LIKES]->(i2),
(a)-[:LIKES]->(i3),
(b)-[:LIKES]->(i1),
(b)-[:LIKES]->(i2),
(c)-[:LIKES]->(i3)

2. Run the similarity computation

MATCH (p:Person)-[:LIKES]->(i:Item)
WITH {item:id(p), categories: collect(distinct id(i))} as userData
WITH collect(userData) as data
CALL algo.similarity.overlap(data, {similarityCutoff:-0.1}) yield p25, p50, p75, p90, p95, p99, p999, p100, nodes, similarityPairs, computations RETURN p25, p50, p75, p90, p95, p99, p999, p100, nodes, similarityPairs, computations

3. algo.similarity.overlap.stream - Mode.READ

MATCH (p:Person)-[:LIKES]->(i:Item)
WITH {item:id(p), categories: collect(distinct id(i))} as userData
WITH collect(userData) as data
call algo.similarity.overlap.stream(data,{similarityCutoff:-0.1}) yield item1, item2, count1, count2, intersection, similarity RETURN item1, item2, count1, count2, intersection, similarity ORDER BY item1,item2
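
The overlap coefficient divides the size of the intersection by the size of the smaller set. The plain-Python sketch below applies that definition to the test data as a hand check. The names `likes` and `overlap` are illustrative, and returning 0.0 when one set is empty is an assumption of this sketch.

```python
from itertools import combinations

# Sample data mirroring the LIKES relationships created in step 1.
# Dana likes nothing, so the MATCH pattern in the queries never returns her.
likes = {
    "Alice": {"p1", "p2", "p3"},
    "Bob": {"p1", "p2"},
    "Charlie": {"p3"},
}

def overlap(a, b):
    """|A intersect B| / min(|A|, |B|); 0.0 if either set is empty (assumption)."""
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0

scores = {(p, q): overlap(likes[p], likes[q])
          for p, q in combinations(sorted(likes), 2)}
for pair, score in scores.items():
    print(pair, round(score, 4))
```

Both Bob's and Charlie's sets are subsets of Alice's, so each scores 1.0 against her, whereas Jaccard on the same data gives 2/3 and 1/3. Bob and Charlie share nothing and score 0.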

Notes

Location of the similarity algorithm source code in the neo4j-graph-algorithms package:
algo/src/main/java/org/neo4j/graphalgo/similarity


From: https://blog.51cto.com/u_13618048/5891663
