OpenGL imageStore slower when storing certain values
I am writing to 3D textures defined as follows:

    layout (binding = 0, rgba8) coherent uniform image3D volumegeom[miplevels];
    layout (binding = 4, rgba8) coherent uniform image3D volumenormal[miplevels];

and I am writing the following values:

    float df = dot(normalize(in.worldnormal), -gspotlight.direction);
    if(df < 0) df = 0; else df = 1;
    fragmentcolor = texture2D(gsampler, in.texcoord0.xy);
    imageStore(volumegeom[in.geominstance], ivec3(coords1), fragmentcolor);
    imageStore(volumenormal[in.geominstance], ivec3(coords1), vec4(normalize(in.worldnormal), 1.0));
    fragmentcolor = vec4(fragmentcolor.xyz*calcshadowfactor(in.lightspacepos)*df,1.0);

As you can see, I write the first fragmentcolor, and I purposely put the other fragmentcolor computation after the stores even though I don't use it. This configuration runs at 21 fps. If I do this instead:

    float df = dot(normalize(in.worldnormal), -gspotlight.direction);
    if(df < 0) df = 0; else df = 1;
    fragmentcolor = texture2D(gsampler, in.texcoord0.xy);
    fragmentcolor = vec4(fragmentcolor.xyz*calcshadowfactor(in.lightspacepos)*df,1.0);
    imageStore(volumegeom[in.geominstance], ivec3(coords1), fragmentcolor);
    imageStore(volumenormal[in.geominstance], ivec3(coords1), vec4(normalize(in.worldnormal), 1.0));

In the second version, fragmentcolor is fully computed first and then stored in volumegeom, and the entire thing runs at 13 fps. This suggests that imageStore runs slower depending on the values I am writing. Is this because of a compiler optimization? The second fragmentcolor is 0 for surfaces that are in shadow or facing away from the light.
I'm not sure how complicated the calcshadowfactor method is, but in the first version the image stores can overlap with the calculation, while in the second the calculation has to finish before the stores can be issued. This, plus the increased register pressure, can be enough to make version #2 slower than version #1.
If this is the complete source code, then in case #1 the call to calcshadowfactor may be optimized out entirely, since its result is never used (that is, if fragmentcolor is never read again and is not a shader output).
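One way to check the dead-code hypothesis is to make the shaded color a real output, so that the compiler has to keep the shadow calculation in both orderings. The following is only a minimal sketch: the single (non-array) image, the input names, and the body of calcShadowFactor are placeholders assumed for illustration, not the code from the question.

```glsl
#version 420 core

// Assumed stand-ins for the question's resources; a single image is used for brevity.
layout (binding = 0, rgba8) coherent uniform image3D volumeGeom;
uniform sampler2D gSampler;

// Hypothetical inputs from the previous stage.
in vec2 texCoord;
in vec3 voxelCoord;
in vec4 lightSpacePos;

// Writing the result to an output keeps the shadow calculation live,
// so it cannot be removed as dead code in either ordering.
out vec4 fragmentColor;

// Placeholder only: the real shadow lookup is not shown in the question.
float calcShadowFactor(vec4 pos) {
    return ((pos.z / pos.w) < 0.5) ? 1.0 : 0.5;
}

void main() {
    vec4 albedo = texture(gSampler, texCoord);

    // Ordering #1: the store is issued before the expensive calculation.
    imageStore(volumeGeom, ivec3(voxelCoord), albedo);

    // The result is consumed by the output below instead of being discarded.
    fragmentColor = vec4(albedo.rgb * calcShadowFactor(lightSpacePos), 1.0);
}
```

If the two orderings end up much closer in frame rate once the result is consumed like this, dead-code elimination was probably a large part of the original difference. If a clear gap remains, the overlap and register-pressure explanation is the more likely one.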