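Supercomputing is fairly remote from everyday life, even for a student majoring in computer science. Our lab happened to get an account on Huawei's supercomputing platform, so I spent some time exploring it. I have to admit that this is not something most people can figure out easily.
Basic concepts:
A job can launch multiple replicas, each of which is started by mpirun -N. Each replica is also called a task, and each task can in turn request multiple CPU cores and multiple GPUs.
Compute code: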
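To make the job/task hierarchy concrete, here is a minimal sketch of the resource arithmetic. The class and field names (`JobRequest`, `tasks`, `cpus_per_task`) are hypothetical and not part of any scheduler API; the values 8 and 120 are only example numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobRequest:
    """Hypothetical model of a job: several tasks, each with its own CPU request."""
    tasks: int           # number of replicas (tasks) in the job
    cpus_per_task: int   # CPU cores requested for each task

    def total_cpus(self) -> int:
        # Resources are requested per task, so the job total is the product.
        return self.tasks * self.cpus_per_task


job = JobRequest(tasks=8, cpus_per_task=120)
print(job.total_cpus())  # 960
```

The point of the sketch is only that a per-task request is multiplied by the number of tasks, so a job's footprint grows quickly.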
import mpi4py.MPI as MPI
import sys
import socket
from multiprocessing import Process, Queue

import numpy as np


def func1(queue, num):
    # CPU-bound busy work: accumulate random vectors, then report num plus their sum.
    x = np.random.rand(100)
    for _ in range(1000000):
        x += np.random.rand(100)
    num += np.sum(x)
    queue.put(num)


def run_queue():
    ps = 120  # number of worker processes started by each task
    queue = Queue(maxsize=200)
    process = [Process(target=func1, args=(queue, num)) for num in range(ps)]
    for p in process:
        p.start()
    # Drain the queue before joining: joining a process that still has
    # unflushed queue items can deadlock.
    results = [queue.get() for _ in process]
    for p in process:
        p.join()
    return results


comm = MPI.COMM_WORLD
comm_rank = comm.Get_rank()
comm_size = comm.Get_size()
node_name = MPI.Get_processor_name()
# node_name = socket.gethostname()

# point to point communication: each rank sends to its right neighbour in a ring.
# Small messages are usually sent eagerly, so send() returns before the matching
# recv() is posted; the MPI standard does not guarantee this.
data_send = [comm_rank] * 1
comm.send(data_send, dest=(comm_rank + 1) % comm_size)

res = run_queue()

data_recv = comm.recv(source=(comm_rank - 1) % comm_size)
# print("my rank is %d, and I received:" % comm_rank, data_recv, file=sys.stdout, flush=True)
# print(data_recv)
with open("/home/share/xxxxxxxxxx/home/xxxxxxxx/xxxxxxx/results/{}.txt".format(comm_rank), "w") as f:
    f.write(
        "my rank is %d/%d, and node_name: %s I received:" % (comm_rank, comm_size, node_name)
        + str(data_recv) + str(res) + "\n"
    )
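The point-to-point step in the script forms a ring: every rank sends to `(rank + 1) % size` and receives from `(rank - 1) % size`. The following is a minimal pure-Python sketch of that indexing, with no MPI involved; `ring_neighbors` and `simulate_ring` are hypothetical helper names used only for illustration.

```python
def ring_neighbors(rank: int, size: int) -> tuple[int, int]:
    """Return (source, dest) for one rank in a ring of `size` ranks."""
    dest = (rank + 1) % size     # the rank this one sends to
    source = (rank - 1) % size   # the rank this one receives from
    return source, dest


def simulate_ring(size: int) -> dict[int, list[int]]:
    """Simulate one round in which every rank sends [rank] to its right neighbour."""
    inbox = {}
    for rank in range(size):
        _, dest = ring_neighbors(rank, size)
        inbox[dest] = [rank]
    return inbox


received = simulate_ring(8)
print(received[0])  # [7]
print(received[1])  # [0]
```

With 8 tasks, rank 0 therefore expects to find `[7]` in its result file and rank 1 expects `[0]`, which is a quick way to check that all tasks really ran.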
Launch commands on the supercomputer (-R requests resources for each task):
One job running 8 tasks, each task requesting 120 CPUs:
/opt/batch/cli/bin/dsub -n task_test -A xxxxxxxxxxxx --priority 9999 --job_retry 10 --job_type hmpi -R "cpu=120;mem=128" -N 8 -eo error.txt -oo output.txt /home/share/xxxxxxxxxx/home/xxxxxxx/xxxxxxx/run_python.sh
Run time: 6 minutes 43 seconds
One job running 8 tasks, each task requesting 1 CPU:
/opt/batch/cli/bin/dsub -n task_test -A xxxxxxxxxxxx --priority 9999 --job_retry 10 --job_type hmpi -R "cpu=1;mem=128" -N 8 -eo error.txt -oo output.txt /home/share/xxxxxxxxxx/home/xxxxxxx/xxxxxxx/run_python.sh
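Because the script always starts 120 worker processes per task, the number of cores a task actually receives matters. Below is a rough sketch of how the worker count could be capped at the CPUs visible to the process. It assumes, without confirmation from the platform's documentation, that the scheduler restricts each task's CPU affinity to the cores it requested; `available_cpus` and `worker_count` are hypothetical helper names.

```python
import os


def available_cpus() -> int:
    """CPUs this process may run on, falling back to the machine total."""
    if hasattr(os, "sched_getaffinity"):   # only available on some Unix systems
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


def worker_count(requested: int) -> int:
    """Cap the requested number of workers at the CPUs actually available."""
    return max(1, min(requested, available_cpus()))


print(worker_count(120))
```

Printing this value from inside each task is also a cheap way to see what a given -R request really grants.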
From: https://www.cnblogs.com/devilmaycry812839668/p/17525506.html