Unenhanced performance of MATLAB GPU computing
With the intention of comparing the speed of GPU vs. CPU computing, I ran the example code available here (a Mandelbrot set on the GPU) on MATLAB Central. These are the results I obtained:
Case 1 (without GPU): 6.2 secs
Case 2 (using parallel.gpu.GPUArray): 6.518 secs (1.39 secs in the example)
Case 3 (using element-wise operation): 1.259 secs (0.14 secs in the example)
As can be seen, there is no improvement in case 2 and only a modest improvement of roughly 5 times in case 3 (6.2 secs vs. 1.259 secs). The example did not state the details of the GPU used. May I know whether this is due to the "incompetency" of my graphics card, or am I missing something important?
The graphics card is also responsible for driving my display (HP Z Display Z23i 23-inch IPS LED backlit monitor).
CPU: Intel i7-4790, 3.6 GHz (8 logical cores)
GPU:
    Name: 'NVS 510'
    Index: 1
    ComputeCapability: '3.0'
    SupportsDouble: 1
    DriverVersion: 6
    ToolkitVersion: 5
    MaxThreadsPerBlock: 1024
    MaxShmemPerBlock: 49152
    MaxThreadBlockSize: [1024 1024 64]
    MaxGridSize: [2.1475e+09 65535 65535]
    SIMDWidth: 32
    TotalMemory: 2.1475e+09
    FreeMemory: 1.6934e+09
    MultiprocessorCount: 1
    ClockRateKHz: 797000
    ComputeMode: 'Default'
    GPUOverlapsTransfers: 1
    KernelExecutionTimeout: 1
    CanMapHostMemory: 1
    DeviceSupported: 1
    DeviceSelected: 1
Thank you!
Edit: the GPU used in the example here is a Tesla C2050. (Credits: @Sam Roberts)
The times on the link are for a different GPU than yours. They don't specify what kind of graphics card they're using, but I would guess they're using a higher-end card.
By googling the NVS 510, I found that its specs look similar to those of the card I have in my machine. However, your card is geared towards business use, while mine is geared towards gaming. I have a GTX 660, which is one of the higher-end GPUs available on the market.
These are the attributes of my graphics card:
CUDADevice with properties:
    Name: 'GeForce GTX 660'
    Index: 1
    ComputeCapability: '3.0'
    SupportsDouble: 1
    DriverVersion: 6.5000
    ToolkitVersion: 5.5000
    MaxThreadsPerBlock: 1024
    MaxShmemPerBlock: 49152
    MaxThreadBlockSize: [1024 1024 64]
    MaxGridSize: [2.1475e+09 65535 65535]
    SIMDWidth: 32
    TotalMemory: 2.1475e+09
    FreeMemory: 1.5357e+09
    MultiprocessorCount: 5
    ClockRateKHz: 1084500
    ComputeMode: 'Default'
    GPUOverlapsTransfers: 1
    KernelExecutionTimeout: 1
    CanMapHostMemory: 1
    DeviceSupported: 1
    DeviceSelected: 1
The differences between my card and yours are that I have 5 multiprocessors where you have 1, and my clock rate is about 300 MHz faster than yours (1084.5 MHz vs. 797 MHz). For a side-by-side comparison, check out the specifications of my card and of yours:
NVS 510: http://www.nvidia.ca/object/nvs-510-graphics-card.html#pdpcontent=2
GTX 660: http://www.geforce.com/hardware/desktop-gpus/geforce-gtx-660/specifications
Upon further inspection, my card has higher memory bandwidth than yours. I also have 960 GPU cores in comparison to your 192.
I decided to run these tests myself to compare the performance timings. My CPU is an Intel i7-4770 at 3.6 GHz, and I have 16 GB of RAM on my machine.
The times for running the examples are the following:
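The core counts above can be recovered from the device properties alone. Both cards report compute capability 3.0 (Kepler), where each multiprocessor has 192 CUDA cores. The sketch below is in Python only so that the arithmetic is easy to check; the helper names `total_cores` and `relative_throughput` are made up for illustration. Multiplying cores by clock rate gives a crude estimate of relative compute throughput. It ignores memory bandwidth and every other factor, so treat the ratio as a rough indication, not a prediction of the measured timings.

```python
# CUDA cores per multiprocessor for compute capability 3.0 (Kepler).
CORES_PER_SM_CC30 = 192


def total_cores(multiprocessor_count, cores_per_sm=CORES_PER_SM_CC30):
    """Total CUDA cores = multiprocessors * cores per multiprocessor."""
    return multiprocessor_count * cores_per_sm


def relative_throughput(card_a, card_b):
    """Crude ratio of (cores * clock) for card_a over card_b.

    Illustrative only: ignores memory bandwidth and all other factors.
    """
    a = total_cores(card_a["MultiprocessorCount"]) * card_a["ClockRateKHz"]
    b = total_cores(card_b["MultiprocessorCount"]) * card_b["ClockRateKHz"]
    return a / b


# Values copied from the two gpuDevice listings in this thread.
nvs_510 = {"MultiprocessorCount": 1, "ClockRateKHz": 797000}
gtx_660 = {"MultiprocessorCount": 5, "ClockRateKHz": 1084500}

print(total_cores(nvs_510["MultiprocessorCount"]))  # 192
print(total_cores(gtx_660["MultiprocessorCount"]))  # 960
print((gtx_660["ClockRateKHz"] - nvs_510["ClockRateKHz"]) / 1000)  # 287.5 (MHz)
print(round(relative_throughput(gtx_660, nvs_510), 1))  # 6.8
```

On this estimate the GTX 660 has roughly 6.8 times the raw compute of the NVS 510, before memory bandwidth is taken into account.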
- Case #1 - Without GPU: 6.46 seconds
- Case #2 - Naive GPU: 0.82 seconds - 7.9x faster
- Case #3 - Through CUDA: 0.09 seconds - 71.7x faster
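The speedup factors are simply the CPU time divided by the GPU time. As a quick check, here is a minimal sketch in Python (chosen only for the arithmetic; the `speedup` helper is a made-up name) applied to the timings from the question and to the ones listed above. Because it starts from the rounded times, the last digit can differ slightly from the factors quoted in the list.

```python
def speedup(baseline_seconds, accelerated_seconds):
    """How many times faster the accelerated run is than the baseline."""
    return baseline_seconds / accelerated_seconds


# Timings from the question (NVS 510); labels follow the question's cases.
nvs_510_cpu = 6.2
nvs_510_gpu = {"gpuArray": 6.518, "element-wise": 1.259}

# Timings from the list above (GTX 660).
gtx_660_cpu = 6.46
gtx_660_gpu = {"naive GPU": 0.82, "CUDA": 0.09}

for label, seconds in nvs_510_gpu.items():
    print(f"NVS 510, {label}: {speedup(nvs_510_cpu, seconds):.2f}x")
# NVS 510, gpuArray: 0.95x
# NVS 510, element-wise: 4.92x

for label, seconds in gtx_660_gpu.items():
    print(f"GTX 660, {label}: {speedup(gtx_660_cpu, seconds):.2f}x")
# GTX 660, naive GPU: 7.88x
# GTX 660, CUDA: 71.78x
```

A factor below 1 means the GPU version was slower than the CPU version, which is what the question reports for the gpuArray case on the NVS 510.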
With this, I would guess that your graphics card is less capable than the one MathWorks used in its tests. Maybe try updating your graphics drivers and see if that helps. However, I would guess that my performance is better because of the higher multiprocessor count, the faster clock, the greater number of cores and the higher memory bandwidth.