lambda structure
[capture list] (parameter list) -> return type { function body }
Both (parameter list) and -> return type are optional.
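As a minimal sketch of the syntax above, in plain host C++, the following shows one lambda written in full form and one that leaves out the optional parts. The names demo_full_form, demo_short_form, add, offset, and get_offset are made up for illustration.

```cpp
// Full form: capture list, parameter list, and trailing return type.
inline int demo_full_form() {
    auto add = [](int a, int b) -> int { return a + b; };
    return add(2, 3);
}

// Short form: the return type is deduced and the empty parameter list is omitted.
inline int demo_short_form() {
    int offset = 10;
    auto get_offset = [offset] { return offset; };  // captures offset by value
    return get_offset();
}
```

Calling demo_full_form() evaluates add(2, 3), and demo_short_form() returns the captured copy of offset.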
Value and Reference capture
One point deserves attention: when a lambda is shared between the CPU and the GPU, we need to be aware of where each variable is stored.
For example, suppose we allocate memory that the device can access, but both the allocation and the lambda are written in host code. Now we want the lambda to capture a variable, as shown in the following code.
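#include <vector>

template <typename Func>
__global__ void myFunc(Func func) {
    // std::cout is not available in device code, so just call the functor
    // with this thread's global index.
    func(blockIdx.x * blockDim.x + threadIdx.x);
}

int main() {
    const int gridSize = 4, blockSize = 256;  // example launch configuration
    const int n = gridSize * blockSize;       // one element per thread
    std::vector<int, CudaAllocator<int>> arr(n);
    // Incomplete on purpose: arr is not captured, so this does not compile yet.
    myFunc<<<gridSize, blockSize>>>([] __device__ (int i) -> void {
        arr[i] = 0;
    });
    cudaDeviceSynchronize();
    return 0;
}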
The CudaAllocator is a custom class which implements the allocate and deallocate member functions.
Please note that the above code is incomplete: with an empty capture list [], arr cannot be used in the lambda body. So what should go inside []?
In fact, using [&] or [=] directly is wrong in both cases. With [&], the lambda holds a reference to arr, so device code would try to access an object that lives in host memory. With [=], the lambda copies the entire vector, which costs both time and memory.
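The difference between the two capture modes can be checked on the host alone. The sketch below is an illustration under assumptions: it uses a plain std::vector with the default allocator instead of CudaAllocator, and the function names are hypothetical. It compares the address of the storage seen inside the lambda with the address of the original storage.

```cpp
#include <vector>

// Returns true if a by-value capture gives the closure its own storage,
// i.e. the whole vector was copied when the lambda was created.
inline bool value_capture_copies_storage() {
    std::vector<int> arr(1024, 7);
    auto by_value = [=] { return arr.data(); };  // closure owns a full copy of arr
    return by_value() != arr.data();
}

// Returns true if a by-reference capture refers to the original host object.
inline bool reference_capture_aliases_original() {
    std::vector<int> arr(1024, 7);
    auto by_ref = [&] { return arr.data(); };    // closure stores a reference to arr
    return by_ref() == arr.data();
}
```

Both functions return true: [=] duplicates all 1024 elements, while [&] only stores a reference to an object on the host stack, which is exactly what device code cannot follow.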
So what is the right way to implement it? We can call arr.data() to get the raw pointer to the device memory. Note that the pointer variable itself still lives in host memory, so we then use [=] to copy the pointer's value into the lambda, where device code can use it. In summary, the first step avoids copying the whole container, and the second step solves the problem of passing data across the host/device boundary.
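Putting the two steps together, a complete version might look like the sketch below. Several things here are assumptions rather than the original author's code: the CudaAllocator shown is a hypothetical stand-in built on cudaMallocManaged (unified memory, so std::vector can initialize the elements on the host), the sizes are example values, and the bounds check inside the lambda is added for safety. It is expected to need nvcc with the --extended-lambda flag (--expt-extended-lambda on older toolkits).

```cuda
#include <cstddef>
#include <cstdio>
#include <new>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-in for the article's CudaAllocator (assumption: it
// allocates unified memory that both host and device can access).
template <class T>
struct CudaAllocator {
    using value_type = T;

    CudaAllocator() = default;
    template <class U>
    CudaAllocator(const CudaAllocator<U>&) noexcept {}

    T* allocate(std::size_t count) {
        T* ptr = nullptr;
        if (cudaMallocManaged(&ptr, count * sizeof(T)) != cudaSuccess) {
            throw std::bad_alloc();
        }
        return ptr;
    }

    void deallocate(T* ptr, std::size_t) noexcept { cudaFree(ptr); }
};

template <class T, class U>
bool operator==(const CudaAllocator<T>&, const CudaAllocator<U>&) { return true; }
template <class T, class U>
bool operator!=(const CudaAllocator<T>&, const CudaAllocator<U>&) { return false; }

template <typename Func>
__global__ void myFunc(Func func) {
    // Each thread calls the functor with its global index.
    func(static_cast<int>(blockIdx.x * blockDim.x + threadIdx.x));
}

int main() {
    const int n = 1000;        // example size
    const int blockSize = 256; // example launch configuration
    const int gridSize = (n + blockSize - 1) / blockSize;

    std::vector<int, CudaAllocator<int>> arr(n, -1);

    // Step 1: take the raw pointer. The pointer variable lives on the host,
    // but the address it holds is usable from device code.
    int* data = arr.data();

    // Step 2: [=] copies only the pointer value (and n) into the closure,
    // not the vector's elements.
    myFunc<<<gridSize, blockSize>>>([=] __device__ (int i) {
        if (i < n) data[i] = i;
    });
    cudaDeviceSynchronize();  // wait for the kernel before reading on the host

    std::printf("%d %d\n", arr[0], arr[n - 1]);
    return 0;
}
```

The closure passed to the kernel contains only a pointer and an integer, so launching it is cheap regardless of how large arr is.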