OpenGL ES 2.0, shader language, matrices and optimisation question

Ok, so lately I have been writing a simple rendering engine for iOS using OpenGL ES. As of 2.0 the fixed-function pipeline (glBegin-style immediate mode, the built-in matrix stack and so on) has been removed in favour of the programmable shader approach, so doing anything moderately exciting means adding the model, view and projection matrices back in yourself.

Now, one of the most common operations a graphics engine performs is matrix multiplication, and doing it many times per frame gets computationally expensive quite quickly:

Code:
matrix[0]  = m1[0]*m2[0]  +  m1[1]*m2[4]  + m1[2]*m2[8]   + m1[3]*m2[12];
matrix[1]  = m1[0]*m2[1]  +  m1[1]*m2[5]  + m1[2]*m2[9]   + m1[3]*m2[13];
matrix[2]  = m1[0]*m2[2]  +  m1[1]*m2[6]  + m1[2]*m2[10]  + m1[3]*m2[14];
…

matrix[15] = m1[12]*m2[3] +  m1[13]*m2[7] + m1[14]*m2[11] + m1[15]*m2[15];
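
For what it's worth, here is the whole thing wrapped up as a function. It uses the same row-major layout as the unrolled version above; mat4_multiply is just my own name for it, not from any library:

Code:
/* Row-major 4x4 multiply: matrix = m1 * m2.
   matrix[row*4 + col] is the dot product of row `row` of m1
   with column `col` of m2, exactly as in the unrolled version. */
void mat4_multiply(float matrix[16], const float m1[16], const float m2[16])
{
    for (int row = 0; row < 4; ++row) {
        for (int col = 0; col < 4; ++col) {
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k) {
                sum += m1[row*4 + k] * m2[k*4 + col];
            }
            matrix[row*4 + col] = sum;
        }
    }
}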

Now, for the sake of optimisation on the CPU side, I could set up a matrix multiplication daemon with threading to split the load, roughly along these lines:
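
(This is only a sketch of the idea using GCD's dispatch_apply rather than an actual long-running daemon, and mat4_multiply_threaded is just my own name for it. For a single 4x4 the dispatch overhead would swamp the 64 multiply-adds, so it really is just the shape of the thing.)

Code:
#include <dispatch/dispatch.h>

/* Split the four output rows of matrix = m1 * m2 across a concurrent
   queue. Each row is an independent set of dot products, so the rows
   can be computed in parallel without any locking. */
void mat4_multiply_threaded(float matrix[16], const float m1[16], const float m2[16])
{
    dispatch_queue_t queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
    dispatch_apply(4, queue, ^(size_t row) {
        for (int col = 0; col < 4; ++col) {
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k) {
                sum += m1[row*4 + k] * m2[k*4 + col];
            }
            matrix[row*4 + col] = sum;
        }
    });
}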

However, the OpenGL ES 2.0 shading language has built-in matrix types and operations.

Code:
attribute vec4 v_position;
uniform mat4 m_model;
uniform mat4 m_view;
uniform mat4 m_projection;

void main()
{
    gl_Position = m_projection * m_view * m_model * v_position;
}

Now my question is: is the matrix multiplication as expressed in the shader language (and presumably executed on the GPU) optimised? Does the multiplication happen serially or in parallel? And is it better to send a single model-view-projection matrix, precomputed on the CPU, to the vertex shader, or is what I am doing here OK? By precomputing I mean something like this:
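
(Again just a sketch: this uses GLKit's GLKMatrix4Multiply on the CPU side, and u_mvp and the function name are simply what I would call them, not anything standard. The vertex shader then only does gl_Position = u_mvp * v_position;.)

Code:
#include <GLKit/GLKMath.h>
#include <OpenGLES/ES2/gl.h>

/* Collapse projection * view * model into one matrix on the CPU and
   upload it as a single uniform, instead of three separate mat4s. */
void upload_mvp(GLuint program, GLKMatrix4 projection, GLKMatrix4 view, GLKMatrix4 model)
{
    GLKMatrix4 mvp = GLKMatrix4Multiply(projection, GLKMatrix4Multiply(view, model));
    GLint location = glGetUniformLocation(program, "u_mvp");
    glUniformMatrix4fv(location, 1, GL_FALSE, mvp.m);
}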
 
Nah that's great, thanks :) Can you point me towards any better algorithms? The Strassen algorithm is only marginally better at O(n^2.807), and Coppersmith-Winograd is O(n^2.376), but I'm having a tough time tracking down sources on it that aren't unreadably mathematical :p

But aye, that's the thing: 4x4 matrix multiplication is so common in a 3D graphics engine that I can't see why it wouldn't be done in parallel, calculating each component on a different unit. It's one of those 'would benefit massively from parallelisation' things, executing 16 calculations in one step. Even on the CPU side you can get part of the way there with the NEON unit:
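
(A rough sketch with NEON intrinsics, same row-major layout as the plain C version earlier; the function name is mine and it isn't lifted from any library. Each output row is built with four 4-wide multiply-accumulates instead of 16 scalar ones.)

Code:
#include <arm_neon.h>

/* matrix = m1 * m2, row-major. The rows of m2 sit in NEON registers and
   each row of the result is accumulated with vector multiply-adds. */
void mat4_multiply_neon(float matrix[16], const float m1[16], const float m2[16])
{
    float32x4_t r0 = vld1q_f32(m2 + 0);   /* row 0 of m2 */
    float32x4_t r1 = vld1q_f32(m2 + 4);   /* row 1 of m2 */
    float32x4_t r2 = vld1q_f32(m2 + 8);   /* row 2 of m2 */
    float32x4_t r3 = vld1q_f32(m2 + 12);  /* row 3 of m2 */

    for (int i = 0; i < 4; ++i) {
        float32x4_t row = vmulq_n_f32(r0, m1[4*i + 0]);
        row = vmlaq_n_f32(row, r1, m1[4*i + 1]);
        row = vmlaq_n_f32(row, r2, m1[4*i + 2]);
        row = vmlaq_n_f32(row, r3, m1[4*i + 3]);
        vst1q_f32(matrix + 4*i, row);
    }
}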
 