Unenhanced performance of MATLAB GPU computing


With the intention of comparing the speed of GPU vs. CPU computing, I ran the example code available on MATLAB Central (a Mandelbrot set on the GPU). I obtained the results below:

  • Case 1 (without GPU): 6.2 seconds

  • Case 2 (using parallel.gpu.GPUArray): 6.518 seconds (1.39 seconds in the example)

  • Case 3 (using an element-wise operation): 1.259 seconds (0.14 seconds in the example)

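As can be seen, there is no improvement in case 2 (it is actually slightly slower than the CPU), and only a modest improvement of roughly 5 times in case 3. The example did not state details of the GPU used, so may I know if this is due to the "incompetency" of my graphics card, or am I missing something important?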
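As a quick check on those numbers, the speed-up of each GPU case is simply the CPU time divided by the GPU time. A minimal Python sketch using the timings reported above (the labels and the `speedup` helper are illustrative only, not part of the MATLAB example):

```python
# Timings in seconds, copied from the results listed above.
cpu_time = 6.2
gpu_times = {
    "case 2 (GPUArray)": 6.518,
    "case 3 (element-wise)": 1.259,
}


def speedup(baseline: float, t: float) -> float:
    """Return how many times faster t is than the baseline (< 1 means slower)."""
    return baseline / t


for label, t in gpu_times.items():
    print(f"{label}: {speedup(cpu_time, t):.2f}x")
```

This prints 0.95x for case 2 (slightly slower than the CPU) and 4.92x for case 3.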

The graphics card is responsible for driving the display (HP Z Display Z23i 23-inch IPS LED-backlit monitor).

CPU: Intel i7-4790, 3.6 GHz (4 cores, 8 threads)

gpu:

                      Name: 'NVS 510'
                     Index: 1
         ComputeCapability: '3.0'
            SupportsDouble: 1
             DriverVersion: 6
            ToolkitVersion: 5
        MaxThreadsPerBlock: 1024
          MaxShmemPerBlock: 49152
        MaxThreadBlockSize: [1024 1024 64]
               MaxGridSize: [2.1475e+09 65535 65535]
                 SIMDWidth: 32
               TotalMemory: 2.1475e+09
                FreeMemory: 1.6934e+09
       MultiprocessorCount: 1
              ClockRateKHz: 797000
               ComputeMode: 'Default'
      GPUOverlapsTransfers: 1
    KernelExecutionTimeout: 1
          CanMapHostMemory: 1
           DeviceSupported: 1
            DeviceSelected: 1

Thank you!

Edit

The GPU used in the example is a Tesla C2050 (credits to @Sam Roberts).

The times in the link are for a different GPU than yours. They don't specify what kind of graphics card they're using, but I would guess it is a higher-end card.

By googling the NVS 510, I found that its specs are similar to the card I have in my machine. However, the NVS 510 is geared towards business use, while mine is geared towards gaming. I have a GTX 660, which is one of the higher-end GPUs available on the market.

These are the attributes of my graphics card:

    CUDADevice with properties:

                      Name: 'GeForce GTX 660'
                     Index: 1
         ComputeCapability: '3.0'
            SupportsDouble: 1
             DriverVersion: 6.5000
            ToolkitVersion: 5.5000
        MaxThreadsPerBlock: 1024
          MaxShmemPerBlock: 49152
        MaxThreadBlockSize: [1024 1024 64]
               MaxGridSize: [2.1475e+09 65535 65535]
                 SIMDWidth: 32
               TotalMemory: 2.1475e+09
                FreeMemory: 1.5357e+09
       MultiprocessorCount: 5
              ClockRateKHz: 1084500
               ComputeMode: 'Default'
      GPUOverlapsTransfers: 1
    KernelExecutionTimeout: 1
          CanMapHostMemory: 1
           DeviceSupported: 1
            DeviceSelected: 1

The differences between my card and yours are that mine has 5 multiprocessors compared to your 1, and its clock rate is nearly 300 MHz faster (1084.5 MHz vs. 797 MHz).

Looking at a side-by-side comparison of the two cards, my card also has higher memory bandwidth, and it has 960 CUDA cores compared to your 192.
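The core counts follow from the `MultiprocessorCount` property: on Kepler GPUs (compute capability 3.x, which both cards report) each multiprocessor has 192 CUDA cores. A rough theoretical single-precision peak can then be estimated as cores × clock × 2, assuming one fused multiply-add per core per cycle. The Python sketch below is only a back-of-the-envelope estimate under those assumptions; it ignores memory bandwidth and real-world efficiency, and the helper names are hypothetical.

```python
# Assumption: Kepler GPUs (compute capability 3.x) have 192 CUDA cores per
# multiprocessor. Values are copied from the gpuDevice output above.
CORES_PER_MULTIPROCESSOR_KEPLER = 192

devices = {
    "NVS 510": {"multiprocessors": 1, "clock_khz": 797_000},
    "GeForce GTX 660": {"multiprocessors": 5, "clock_khz": 1_084_500},
}


def cuda_cores(multiprocessors: int) -> int:
    """Total CUDA cores, assuming 192 cores per Kepler multiprocessor."""
    return multiprocessors * CORES_PER_MULTIPROCESSOR_KEPLER


def peak_sp_gflops(multiprocessors: int, clock_khz: int) -> float:
    """Rough theoretical single-precision peak: cores * clock (GHz) * 2 FLOPs per FMA."""
    clock_ghz = clock_khz / 1e6
    return cuda_cores(multiprocessors) * clock_ghz * 2


for name, d in devices.items():
    cores = cuda_cores(d["multiprocessors"])
    gflops = peak_sp_gflops(d["multiprocessors"], d["clock_khz"])
    print(f"{name}: {cores} cores, ~{gflops:.0f} GFLOPS peak (single precision)")
```

With the clock rates reported by `gpuDevice`, this comes to about 306 GFLOPS for the NVS 510 and about 2082 GFLOPS for the GTX 660, roughly a 6.8x gap in raw arithmetic throughput.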

I decided to run these tests to compare the performance timings. My CPU is an Intel i7-4770 at 3.6 GHz, and I have 16 GB of RAM in my machine.

The times from running the examples are as follows:

  • Case #1 - without GPU: 6.46 seconds
  • Case #2 - naive GPU: 0.82 seconds (7.9x faster)
  • Case #3 - through CUDA: 0.09 seconds (71.7x faster)
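To see which part of the setup accounts for the gap, one can compare the same case across the two machines. A short Python sketch using the timings from both sets of results (only cases 1 and 2 are compared, since the two case-3 labels, "element-wise operation" and "through CUDA", may not refer to the same code path; the names are illustrative):

```python
# Timings in seconds, copied from the two sets of results above.
# Case 3 is left out: its label differs between the two runs, so the
# measurements may come from different implementations.
machine_nvs510 = {"case 1 (CPU only)": 6.2, "case 2 (naive GPU)": 6.518}
machine_gtx660 = {"case 1 (CPU only)": 6.46, "case 2 (naive GPU)": 0.82}


def ratio(t_a: float, t_b: float) -> float:
    """Return how many times longer timing a is than timing b (< 1 means a is faster)."""
    return t_a / t_b


for case, t_nvs in machine_nvs510.items():
    r = ratio(t_nvs, machine_gtx660[case])
    print(f"{case}: NVS 510 machine / GTX 660 machine = {r:.2f}")
```

The CPU-only times differ by only about 4%, so the CPUs are unlikely to be the deciding factor, while the naive GPU case takes about 7.95 times longer on the NVS 510 machine. That suggests the GPU hardware, rather than the rest of the machine, explains most of the difference.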

Given this, I would guess your graphics card is of lower quality than the one used in the tests MathWorks performed. You may want to try updating your graphics drivers to see if that helps. However, I would guess my performance is better because of the higher multiprocessor count, faster clock, larger number of cores, and higher memory bandwidth.

