Unenhanced performance of MATLAB GPU computing
With the intention of comparing the speed of GPU vs. CPU computing, I ran the example code available here (a Mandelbrot set on the GPU) from MATLAB Central. Below are the results I obtained:
Case 1 (without GPU): 6.2 secs
Case 2 (using parallel.gpu.GPUArray): 6.518 secs (1.39 secs in the example)
Case 3 (using element-wise operation): 1.259 secs (0.14 secs in the example)
As can be seen, there is no improvement in case 2 and only a modest improvement of roughly 5 times in case 3 (6.2 secs vs. 1.259 secs). The example did not state the details of the GPU used, so may I know if this is due to the "incompetency" of my graphics card, or am I missing something important?
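To make the comparison concrete, here is a minimal Python sketch that turns the timings above into speedup factors and into the gap relative to the linked example. The numbers are copied from the list above; the helper name `speedup` and the dictionary keys are illustrative only and do not come from the MATLAB example.

```python
# Timings in seconds, copied from the results listed above.
cpu_time = 6.2                                        # case 1, CPU only
measured = {"gpuArray": 6.518, "element-wise": 1.259}  # cases 2 and 3 on this machine
example = {"gpuArray": 1.39, "element-wise": 0.14}     # figures quoted from the linked example


def speedup(baseline_seconds, candidate_seconds):
    """Return how many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds


for name, seconds in measured.items():
    vs_cpu = speedup(cpu_time, seconds)          # below 1.0 means slower than the CPU
    vs_example = seconds / example[name]         # how much slower than the example's GPU
    print(f"{name}: {vs_cpu:.2f}x vs. CPU, {vs_example:.1f}x slower than the example")
```

A factor below 1 for case 2 means the GPU run was slightly slower than the CPU run, while case 3 works out to a little under 5 times faster.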
The graphics card is also responsible for driving the display (HP Z Display Z23i 23-inch IPS LED backlit monitor).
CPU: Intel i7-4790, 3.6 GHz (8 logical cores)
GPU:
    Name: 'NVS 510'
    Index: 1
    ComputeCapability: '3.0'
    SupportsDouble: 1
    DriverVersion: 6
    ToolkitVersion: 5
    MaxThreadsPerBlock: 1024
    MaxShmemPerBlock: 49152
    MaxThreadBlockSize: [1024 1024 64]
    MaxGridSize: [2.1475e+09 65535 65535]
    SIMDWidth: 32
    TotalMemory: 2.1475e+09
    FreeMemory: 1.6934e+09
    MultiprocessorCount: 1
    ClockRateKHz: 797000
    ComputeMode: 'Default'
    GPUOverlapsTransfers: 1
    KernelExecutionTimeout: 1
    CanMapHostMemory: 1
    DeviceSupported: 1
    DeviceSelected: 1
Thank you!
Edit:
The GPU used in the example here is a Tesla C2050. (Credits: @Sam Roberts)
The times in the link are for a different GPU than yours. They don't specify what kind of graphics card they're using, but my guess is that they're using a higher-end card.
From Googling the NVS 510, its specs look similar to those of the card I have in my machine. However, your card is geared towards business use, while mine is geared towards gaming. I have a GTX 660, which is one of the higher-end GPUs available on the market.
These are the attributes of my graphics card:
CUDADevice with properties:
    Name: 'GeForce GTX 660'
    Index: 1
    ComputeCapability: '3.0'
    SupportsDouble: 1
    DriverVersion: 6.5000
    ToolkitVersion: 5.5000
    MaxThreadsPerBlock: 1024
    MaxShmemPerBlock: 49152
    MaxThreadBlockSize: [1024 1024 64]
    MaxGridSize: [2.1475e+09 65535 65535]
    SIMDWidth: 32
    TotalMemory: 2.1475e+09
    FreeMemory: 1.5357e+09
    MultiprocessorCount: 5
    ClockRateKHz: 1084500
    ComputeMode: 'Default'
    GPUOverlapsTransfers: 1
    KernelExecutionTimeout: 1
    CanMapHostMemory: 1
    DeviceSupported: 1
    DeviceSelected: 1
The differences between my card and yours are that I have 5 multiprocessors where you have 1, and my clock rate is about 300 MHz faster than yours. For a side-by-side comparison, check out the specifications of the two cards:
- NVS 510: http://www.nvidia.ca/object/nvs-510-graphics-card.html#pdpcontent=2
- GTX 660: http://www.geforce.com/hardware/desktop-gpus/geforce-gtx-660/specifications
Upon further inspection, my card also has a higher memory bandwidth, and it has 960 GPU cores in comparison to your 192.
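As a rough illustration of how these differences add up, the Python sketch below multiplies core count by clock rate for each card. This product is only a crude proxy for raw arithmetic throughput (it ignores memory bandwidth, transfer overhead and how well a given kernel uses the hardware), and the `Gpu` class and `core_clock_product` method are hypothetical names introduced just for this example. The core counts, clock rates and multiprocessor counts are the ones quoted in this thread.

```python
from dataclasses import dataclass


@dataclass
class Gpu:
    """Hypothetical container for the figures quoted in this thread."""
    name: str
    cuda_cores: int
    clock_khz: int
    multiprocessors: int

    def core_clock_product(self) -> int:
        # Crude throughput proxy: number of cores times clock rate (in kHz).
        return self.cuda_cores * self.clock_khz


nvs = Gpu("NVS 510", cuda_cores=192, clock_khz=797000, multiprocessors=1)
gtx = Gpu("GeForce GTX 660", cuda_cores=960, clock_khz=1084500, multiprocessors=5)

# Ratio of the two proxies: how much more raw capacity the GTX 660 has on paper.
ratio = gtx.core_clock_product() / nvs.core_clock_product()
clock_gap_mhz = (gtx.clock_khz - nvs.clock_khz) / 1000

print(f"Clock difference: {clock_gap_mhz} MHz")
print(f"Cores per multiprocessor: {gtx.cuda_cores // gtx.multiprocessors}")
print(f"Core-clock ratio ({gtx.name} / {nvs.name}): {ratio:.1f}x")
```

By this estimate the GTX 660 has a bit under 7 times the raw capacity of the NVS 510, and the exact clock difference is 287.5 MHz. Measured run times can differ from this ratio in either direction, since memory bandwidth and kernel structure also matter.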
I decided to run these tests myself to compare the performance timings. My CPU is an Intel i7-4770 at 3.6 GHz, and I have 16 GB of RAM on my machine.
The times for running the examples are the following:
- Case #1 - without GPU: 6.46 seconds
- Case #2 - naive GPU: 0.82 seconds - 7.9x faster
- Case #3 - through CUDA: 0.09 seconds - 71.7x faster
With this, my guess is that your graphics card is of lower performance than the one used in the tests that MathWorks performed. Maybe try updating your graphics drivers and see if that helps. However, my guess is that my performance is better due to the higher multiprocessor count, faster clock, larger number of cores and higher memory bandwidth.