c++ - Crash in Thrust sorting example
I am trying the first example from the official website, https://developer.nvidia.com/thrust, with the vector size changed to 32 << 23. The code looks like this:
    #include <thrust/host_vector.h>
    #include <thrust/device_vector.h>
    #include <thrust/generate.h>
    #include <thrust/sort.h>
    #include <thrust/copy.h>
    #include <algorithm>
    #include <cstdlib>
    #include <iostream>
    #include <time.h>

    using namespace std;

    int main(void){
      // generate random numbers serially
      thrust::host_vector<int> h_vec(32 << 23);
      std::generate(h_vec.begin(), h_vec.end(), rand);
      std::cout << "1." << time(NULL) << endl;

      // transfer data to the device
      thrust::device_vector<int> d_vec = h_vec;
      cout << "2." << time(NULL) << endl;

      // sort data on the device (846M keys per second on GeForce GTX 480)
      thrust::sort(d_vec.begin(), d_vec.end());

      // transfer data back to the host
      thrust::copy(d_vec.begin(), d_vec.end(), h_vec.begin());
      std::cout << "3." << time(NULL) << endl;

      return 0;
    }
But the program crashed when it reached the thrust::sort line. I tried std::vector and std::sort instead, and that worked well.
Is this a bug in Thrust? I am using Thrust 1.7 + CUDA 6.5 + Visual Studio 2013 Update 2.
I am using a GeForce GT 740M with 2048 MB of total memory.
I used Process Explorer to monitor the process and saw that it allocated 1.0 GB of memory. I have 2 GB of GPU memory and 16 GB of main CPU memory.
The error message is "A problem caused the program to stop working correctly. Windows will close the program and notify you if a solution is available. [Debug] [Close program]". After clicking [Debug], I can see the call stack. The issue is at this line:
thrust::device_vector<int> d_vec = h_vec;
The last frame in the call stack from the CUDA source is this:
testcuda.exe!thrust::system::cuda::detail::malloc<thrust::system::cuda::detail::tag>(thrust::system::cuda::detail::execution_policy<thrust::system::cuda::detail::tag> & __formal, unsigned __int64 n) line 48 c++
It seems to be a memory allocation issue. I have 2 GB of GPU memory and 16 GB of main CPU memory. Why?
To Robert:
The original example works well, and so do 32 << 21 and 32 << 22. Is there a virtual memory management system for GPU memory? Does contiguous here mean physically contiguous or virtually contiguous? Is an exception raised in this scenario, and can I catch it?
My test code is here: https://github.com/henrywoo/wufuheng/blob/master/testcuda.cu
In my test, there is no exception, just a runtime error.
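On whether the failure can be caught, here is a minimal sketch. It assumes that Thrust reports a failed device allocation by throwing an exception derived from std::bad_alloc, and other failures by throwing something derived from std::exception; neither is confirmed in this thread. The helper name run_guarded is hypothetical, and the block is plain C++ so it compiles without CUDA. An exception that escapes main() terminates the process, which on Windows typically shows up as the "stopped working" dialog rather than a readable message.

```cpp
#include <exception>
#include <new>
#include <string>

// Hypothetical helper: runs `work` and reports how it ended instead of
// letting an exception escape and terminate the program.
// Returns an empty string on success, otherwise a short description.
template <typename Work>
std::string run_guarded(Work work) {
    try {
        work();
        return "";
    } catch (const std::bad_alloc& e) {
        // Assumption: a failed device allocation surfaces as std::bad_alloc
        // or a class derived from it.
        return std::string("allocation failed: ") + e.what();
    } catch (const std::exception& e) {
        // Assumption: other runtime failures derive from std::exception.
        return std::string("runtime error: ") + e.what();
    }
}
```

With Thrust, the callable would wrap the failing statements, for example run_guarded([&] { thrust::device_vector<int> d_vec = h_vec; thrust::sort(d_vec.begin(), d_vec.end()); }), and a non-empty result can be printed before exiting cleanly.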
sizeof(int) * (32 << 23) = 4 * 2^28 bytes, i.e. you are allocating 1 GB of GPU RAM. Most likely, your card cannot handle that many elements. This might be because:
- there isn't enough GPU RAM in general
- there isn't enough contiguous free GPU RAM (this is needed because the vector has to fit in one contiguous piece of memory)
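The arithmetic above can be checked with a small sketch. The function names bytes_needed and fits are hypothetical, and the reserved parameter stands for memory already taken by the display or other allocations, which is an assumption for illustration and not a measured value. If the sort also needs scratch space comparable to the input (again an assumption, not something stated in the post), the requirement roughly doubles.

```cpp
#include <cstdint>

// Bytes needed to store `count` elements of `elem_size` bytes each.
constexpr std::uint64_t bytes_needed(std::uint64_t count, std::uint64_t elem_size) {
    return count * elem_size;
}

// True if `required` bytes fit into `available` bytes once `reserved` bytes
// (hypothetical: display, driver, other allocations) are set aside.
constexpr bool fits(std::uint64_t required, std::uint64_t available, std::uint64_t reserved) {
    return reserved <= available && required <= available - reserved;
}
```

For example, bytes_needed(32u << 23, 4) is 2^30 bytes. On a card with 2^31 bytes in total, one such buffer fits only if less than 2^30 bytes are already in use, and two such buffers do not fit as soon as anything else occupies the card.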