
PageRank parallel solutions

Posted: 2024-10-11 20:24:08

Assignment 4 (70 Points)
Due Friday, October 11 @ 11:59 PM. Available Oct 4 at 12:00 AM - Dec 24 at 11:59 PM. Submit a file upload.

In this assignment, we will improve the parallel solutions of PageRank from Assignment 3. You will
implement different task decomposition and mapping strategies, and observe their effects. You can use the same serial versions of the pull-based and push-based PageRank from Assignment 3 to check the correctness of your results. You should use your parallel solutions "page_rank_pull_parallel.cpp" and "page_rank_push_parallel_atomic.cpp" as the starting point of this assignment. You do not need to download a separate tarball for this assignment, since everything you need was provided in Assignment 3.
You should have completed the Slurm Tutorial (https://canvas.sfu.ca/courses/84236/pages/slurm-tutorial), which walks you through how to use our servers for your code testing. If you still have trouble with Slurm, please post questions on the Canvas discussion board. It is strongly recommended to use Slurm for this assignment instead of CSIL machines, since the graphs you need for checking your programs are on the /scratch/ drive on the cluster's compute nodes.

General Instructions

1. You are provided with the serial versions and some other tools in the same tarball from Assignment 3.

3. You will be asked to print the time spent by different threads on specific code regions. The time spent by any code region can be computed as follows:
timer t1; 
t1.start(); 
/* ---- Code region whose time is to be measured --- */ 
double time_taken = t1.stop(); 
4. If you need to time a sub-section inside a loop, you can do that as follows: 
double time_taken = 0.0;
timer t1;
while(true) {
    /* ---- Code region whose time should not be measured ---- */
    t1.start();
    /* ---- Code region whose time is to be measured ---- */
    time_taken += t1.stop();
    /* ---- Code region whose time should not be measured ---- */
}
std::cout << "Time spent on required code region : " << time_taken << "\n";
5. The programs operate on graph datasets. Sample input graphs are available at /scratch/input_graphs/ on the compute nodes (note they are present on the compute nodes only, and hence you can access them via Slurm only).

$ ls /scratch/input_graphs/*.cs*
lj.csc  lj.csr  rmat.csc  rmat.csr  roadNet-CA.csc  roadNet-CA.csr  test_25M_50M.csc  test_25M_50M.csr  web-Google.csc  web-Google.csr

Please read Assignment 3's general instructions if you want to generate more test graphs.
1. Add a "strategy" Command Line Parameter 
For your parallel programs "page_rank_pull_parallel.cpp" and "page_rank_push_parallel_atomic.cpp" from Assignment 3, you need to add a command-line parameter "strategy" which defines the program's task decomposition and mapping strategy. If the parameter is set to 1, i.e., --strategy 1, your parallel program should run exactly as specified in Assignment 3. "strategy" values correspond to the strategies described in the following sections.

Strategy 1: Vertex-based decomposition and static mapping, the same strategy as your Assignment 3 programs. Vertices are statically distributed among threads such that each thread performs all computations on n/T vertices.
2. Edge-based Decomposition for PageRank 
Strategy 2: To achieve better load balance among threads, vertices are distributed among threads such that each thread performs computations on m/T edges. Pull-based PageRank distributes vertices based on their in-degrees, while push-based PageRank distributes vertices based on their out-degrees. Both of your parallel programs need to support this strategy by setting the strategy parameter to 2 in the command line (--strategy 2). For example, the pull-based algorithm's pseudo-code is:
Create T threads
Distribute work between threads so that each thread gets to compute on approximately m/T edges
for(i=0; i<max_iterations; i++) {
    for each thread in parallel {
        for each vertex 'v' allocated to the thread {
            for vertex 'u' in inNeighbor(v)
                next_page_rank[v] += (current_page_rank[u]/outdegree[u])
        }
    }
    for each thread in parallel {
        for each vertex 'v' allocated to the thread {
            compute the new_pagerank using the accumulated values in next_page_rank[v]
            current_page_rank[v] = new_pagerank
            reset next_page_rank[v] to 0
        }
    }
}
As an example of how edge-based task decomposition works, suppose a graph has the following vertex distribution <vertex, in-degree, out-degree>:
v1, 8, 1 
v2, 5, 6 
v3, 8, 3 
v4, 2, 8 
v5, 3, 5 
v6, 1, 4 
v7, 5, 6 
v8, 4, 3

The graph has 36 edges. To distribute the work evenly among 4 threads, each thread needs to compute approximately 36/4 = 9 edges. Each thread gets assigned vertices until its total assigned edges is >= (thread_id+1) * m/T. Examining the in-degrees, the pull-based algorithm has the following vertex distribution:

Thread 0: v1, v2 (13 edges >= 1x9); Thread 1: v3 (8 edges, so 13+8 >= 2x9); Thread 2: v4, v5, v6 (6 edges, so 13+8+6 >= 3x9); Thread 3: v7, v8 (9 edges)

For the push-based algorithm, we can use the following vertex distribution based on out-degrees:

Thread 0: v1, v2, v3 (10 edges >= 1x9); Thread 1: v4 (8 edges, so 10+8 >= 2x9); Thread 2: v5, v6 (9 edges, so 10+8+9 >= 3x9); Thread 3: v7, v8 (9 edges)

Note that the pull-based algorithm in the above example should have a similar runtime to the vertex-based decomposition, since the longest thread needs to compute 13 edges in both cases. However, the push-based algorithm has a shorter runtime, since its longest thread computes 10 edges, while the vertex-based decomposition requires 11 edge computations for the longest thread.
3. Vertex-based Decomposition for PageRank with Dynamic Mapping
Strategy 3: To achieve even more load balance among threads, vertices are mapped to threads dynamically based on the time at which different threads finish their tasks. Instead of allocating approximately equal numbers of vertices (or edges) to threads, we can dynamically allocate work to each thread whenever it is free. In this strategy, each thread dynamically gets the next vertex to be computed until all the vertices are processed. Both of your parallel programs need to support this strategy by setting the strategy parameter to 3 in the command line (--strategy 3). Below is the pseudo-code showing dynamic task mapping for the push-based PageRank algorithm with vertex-based decomposition:
Create T threads
for each thread in parallel {
    for(i=0; i<max_iterations; i++) {
        while(true) {
            u = getNextVertexToBeProcessed();
            if(u == -1) break;
            edges_processed += outDegree(u) // used in output validation
            for vertex v in outNeighbor(u)
                next_page_rank[v] += (current_page_rank[u]/outdegree[u])
        }
        barrier1
        while(true) {
            v = getNextVertexToBeProcessed();
            if(v == -1) break;
            vertices_processed += 1 // used in output validation
            compute the new_pagerank using the accumulated values in next_page_rank[v]
            current_page_rank[v] = new_pagerank
            reset next_page_rank[v] to 0
        }
        barrier2
    }
}
You need to implement the above dynamic mapping strategy in your solutions for both "page_rank_pull_parallel.cpp" and "page_rank_push_parallel_atomic.cpp".

4. Vertex-based Decomposition for PageRank with Coarse-Grained Dynamic Mapping
Strategy 4: To reduce the time spent by each thread on getNextVertexToBeProcessed(), we will vary the task granularity so that each thread receives multiple vertices to be processed each time it calls getNextVertexToBeProcessed(). Both of your parallel programs need to support this strategy by setting the strategy parameter to 4 in the command line (--strategy 4). You need to update the dynamic load distribution logic as follows:

Each thread processes k vertices and then calls getNextVertexToBeProcessed(). Here, k determines the granularity of the work done by each thread before requesting new work.
For example:
If k = 1, the thread calls getNextVertexToBeProcessed() after processing each vertex (exactly the same way as strategy 3).
If k = 1000, the thread calls getNextVertexToBeProcessed() after processing 1000 vertices.
The getNextVertexToBeProcessed() function should return 0, k, 2k, ... depending on the granularity k. k should be provided at run time using a command-line parameter, e.g.: --granularity 100
Below is the pseudo-code showing the logic of our parallel solution: 
k = 1000 // granularity
Create T threads
for each thread in parallel {
    for(i=0; i<max_iterations; i++) {
        while(true) {
            u = getNextVertexToBeProcessed()
            if(u == -1) break;
            for (j = 0; j < k; j++) {
                edges_processed += outDegree(u) // used in output validation
                for vertex v in outNeighbor(u)
                    next_page_rank[v] += (current_page_rank[u]/outdegree[u])
                u++
                if(u >= n) break; // n is the total number of vertices in the graph
            }
        }
        barrier1
        while(true) {
            v = getNextVertexToBeProcessed()
            if(v == -1) break;
            for (j = 0; j < k; j++) {
                vertices_processed += 1 // used in output validation
                compute the new_pagerank using the accumulated values in next_page_rank[v]
                current_page_rank[v] = new_pagerank
                reset next_page_rank[v] to 0
                v++
                if(v >= n) break; // n is the total number of vertices in the graph
            }
        }
        barrier2
    }
}
This strategy should be used with the command-line parameter --granularity to specify the granularity. You need to add support for --granularity to both of your parallel programs "page_rank_pull_parallel.cpp" and "page_rank_push_parallel_atomic.cpp".

Input and Output Formats: 
1. Your program files should be named page_rank_pull_parallel.cpp and page_rank_push_parallel_atomic.cpp and should support the following command-line parameters:
--nThreads : The number of threads.
--nIterations : The number of iterations (similar to the serial code). Default is 10.
--inputFile : The absolute path to the input graph file (similar to the serial code).
--strategy : A value of either 1, 2, 3, or 4 corresponding to the strategies defined above. Any other value should lead to program termination. The default strategy is 1.
--granularity : A positive integer to be used only in strategy 4. This parameter is ignored for other strategies. You need to check that granularity is always a positive integer. Zero, negative, and non-integer values lead to program termination. The default granularity is 1.
2. Your parallel solution must output the following information:
Total number of threads used.
Number of iterations.
Strategy used.
Granularity used.
For each thread:
    Thread id (your threads should be numbered between [0, T)).
    Number of vertices processed (across all iterations) - this is the number of vertices processed only in the second for loop across all iterations. Refer to the pseudo-code of PageRank above.
    Number of edges processed (across all iterations) - this is the number of edges processed only in the first for loop across all iterations. Refer to the pseudo-code of PageRank above.
    Cumulative time spent waiting at barrier1 (in seconds).
    Cumulative time spent waiting at barrier2 (in seconds).
    Cumulative time spent waiting at getNextVertexToBeProcessed() (in seconds).
    Time taken by the thread (in seconds).
The sum of the page ranks of all vertices.
The total time taken for the entire execution, including the time used in allocating vertices or edges to threads.
3. A sample console output is below. Please note that you should use exactly this format for all 4 parts. If a value is not generated (e.g., getNextVertex time in strategies 1 and 2), print 0.0 instead.
Using DOUBLE 
Number of Threads : 4 
Strategy : 4 
Granularity : 1 
Iterations : 20 
Reading graph 
Created graph 
thread_id, num_vertices, num_edges, barrier1_time, barrier2_time, getNextVertex_time, total_time 
0, 4576646, 26069763, 0.001790, 0.001063, 4.067461, 10.366047 
1, 4625648, 23284082, 0.001947, 0.001063, 4.107634, 10.366017 
2, 4617670, 26393229, 0.001654, 0.001049, 4.029203, 10.365975 
3, 4508596, 26353706, 0.001668, 0.001059, 4.101155, 10.365950 
Sum of page ranks : 618961.562500 
Time taken (in seconds) : 10.366616 
Testing

We are not providing a testing script for this assignment. However, you should strictly adhere to the format above, including all the spaces and commas. You can test your program using command lines similar to the test cases of Assignment 3, adding the strategy parameter. We will not disclose the test cases used for grading before the assignment due date.

Assignment Report

In addition to your parallel code, you need to submit a report (in PDF format) that answers the following
two questions: 
Q1. Run each of your two parallel programs with strategy 4 above using 8 threads on the test_25M_50M data set with granularities of 10, 100, 1000, and 2000. Each of your parallel programs should run 3 times, each including 20 iterations, on the slow cluster using the default floating-point data type. [Total number of runs is 2 (parallel versions) x 4 (different granularities) x 3 (number of runs for each version/granularity combination) = 24 runs]

Plot a graph with average execution time on the y-axis and granularity on the x-axis, where each granularity has 2 bars, one for the average runtime of each of your parallel versions. Your graph should look something like this (obviously with different values):
Q2. Based on the data from the graph in Q1, which of the four granularities is best for the pull algorithm? And which is best for the push_atomic algorithm?
Submission Guidelines 
Make sure that your solutions folder has the following files and sub-folders. Let's say your solutions folder is called my_assignment4_solutions. It should contain:
core/ -- The folder containing all core files. It is already available in the assignment package. Do not modify it or remove any files.
Makefile -- Makefile for the project. This file should not be changed.
page_rank_pull_parallel.cpp
page_rank_push_parallel_atomic.cpp
report.pdf -- A PDF file that includes answers to the questions in the previous section.
To create the submission file, follow the steps below: 
1. Enter in your solutions folder, and remove all the object/temporary files. 
$ cd my_assignment4_solutions/ 
$ make clean 
$ rm -rf input_graphs (or mv input_graphs ../ to avoid large tarballs)
2. Create the tar.gz file.
$ tar cvzf assignment4.tar.gz *
This creates a compressed tarball that contains the contents of the folder. Submit it via Canvas by the deadline.

From: https://www.cnblogs.com/comp9313/p/18459196
