Soldato
- Joined
- 20 Jun 2010
- Posts
- 3,251
Ok, so lately I have been writing a simple rendering engine for iOS using OpenGL ES. As of ES 2.0, the fixed-function pipeline (including the built-in matrix stack) has been removed in favour of the programmable shader approach, so doing anything moderately exciting requires adding the model, view and projection matrices back in yourself.
Now, one of the most common operations a graphics engine performs is matrix multiplication, which, when performed many times per frame, is quite computationally expensive:
Code:
matrix[0] = m1[0]*m2[0] + m1[1]*m2[4] + m1[2]*m2[8] + m1[3]*m2[12];
matrix[1] = m1[0]*m2[1] + m1[1]*m2[5] + m1[2]*m2[9] + m1[3]*m2[13];
matrix[2] = m1[0]*m2[2] + m1[1]*m2[6] + m1[2]*m2[10] + m1[3]*m2[14];
…
matrix[15] = m1[12]*m2[3] + m1[13]*m2[7] + m1[14]*m2[11] + m1[15]*m2[15];
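Written out as a loop, the same multiplication might look like the sketch below. It assumes the row-major indexing that the unrolled lines above suggest (element (row, col) at index row * 4 + col); the function name mat4_multiply is just an illustrative choice, not something from my engine.

```c
#include <stddef.h>

/* Multiply two 4x4 matrices stored as flat arrays of 16 floats.
 * Assumes the same indexing as the unrolled version:
 * element (row, col) lives at index row * 4 + col.
 * `out` must not alias `a` or `b`. */
void mat4_multiply(float out[16], const float a[16], const float b[16])
{
    for (size_t row = 0; row < 4; ++row) {
        for (size_t col = 0; col < 4; ++col) {
            float sum = 0.0f;
            /* Dot product of row `row` of a with column `col` of b. */
            for (size_t k = 0; k < 4; ++k) {
                sum += a[row * 4 + k] * b[k * 4 + col];
            }
            out[row * 4 + col] = sum;
        }
    }
}
```

That is 64 multiplies and 48 adds per 4x4 product, which is where the per-frame cost comes from.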
Now for the sake of optimisation on the CPU side, I could set up a threaded matrix multiplication daemon to split the load.
However, the OpenGL ES 2.0 shading language provides built-in matrix types and operations:
Code:
uniform mat4 m_model;
uniform mat4 m_view;
uniform mat4 m_projection;
attribute vec4 v_position; // assumed: per-vertex position attribute
void main() {
    gl_Position = m_projection * m_view * m_model * v_position;
}
Now my question is: is matrix multiplication as expressed in the shader language (and presumably executed on the GPU) optimised? Does it happen serially or in parallel? Is it better to precompute the model-view-projection matrix on the CPU and send that single matrix to the vertex shader, or is what I am doing here ok?
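For comparison, the precomputed alternative could look roughly like this on the CPU side. This is only a sketch: it assumes column-major storage (the layout OpenGL ES expects for uniform matrices), and the names mat4_mul_colmajor, build_mvp and mvp_location are hypothetical. The upload call is left in a comment because it needs a live GL context.

```c
/* Column-major 4x4 multiply: out = a * b, with element (row, col)
 * stored at index col * 4 + row. `out` must not alias `a` or `b`. */
void mat4_mul_colmajor(float out[16], const float a[16], const float b[16])
{
    for (int col = 0; col < 4; ++col) {
        for (int row = 0; row < 4; ++row) {
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k) {
                sum += a[k * 4 + row] * b[col * 4 + k];
            }
            out[col * 4 + row] = sum;
        }
    }
}

/* Build mvp = projection * view * model once on the CPU. */
void build_mvp(float mvp[16], const float projection[16],
               const float view[16], const float model[16])
{
    float view_model[16];
    mat4_mul_colmajor(view_model, view, model);
    mat4_mul_colmajor(mvp, projection, view_model);
    /* Upload (requires a GL context, so shown as a comment only):
     *   glUniformMatrix4fv(mvp_location, 1, GL_FALSE, mvp);
     * mvp_location is a hypothetical location for a single
     * `uniform mat4` in the shader, from glGetUniformLocation. */
}
```

With this approach the two matrix-matrix products are done once per draw call on the CPU, and the vertex shader is left with a single matrix-vector multiply per vertex.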