Fast square root function (C++)

Una

Una

Associate
Joined
26 Nov 2004
Posts
2,471
Location
Reading / Lake District
It depends how accurate you want it to be?

Newton-Rapheson approximation.

This is a hell of a fast way to do it - just take 1/the result for the sqrt.

Code:
float InvSqrt(float x)
{
   float xhalf = 0.5f*x;
   int i = *(int*)&x; // get bits for floating value
   i = 0x5f375a86- (i>>1); // gives initial guess y0
   x = *(float*)&i; // convert bits back to float
   x = x*(1.5f-xhalf*x*x); // Newton step, repeating increases accuracy
   return x;
}

Don't expect me to explain how the magic constant works though, there is a whole paper going into it :p

You could possibly make it quicker if you drop to assembly language and make better use of SSE2 registers... but I think thats an overkill for a particle engine...
 
Last edited:
Soldato
Joined
18 Oct 2002
Posts
5,159
Location
Riding my bike
You could think about using a look up table to give you a first value for the Newton-Rapheson as it would give better accuarcy with fewer iterations...

Or if the calculations are all multiplys and sqrts then use logs and add/half them to acheive the same result.
 
Associate
Joined
17 Oct 2005
Posts
311
Una said:
It depends how accurate you want it to be?

Newton-Rapheson approximation.

This is a hell of a fast way to do it - just take 1/the result for the sqrt.
Don't do that - divides are slow! Instead multiply by the original number: x * (1/sqrt(x)) = sqrt(x). For most graphics work the reciprocal sqrt is what you want anyhow.

You could possibly make it quicker if you drop to assembly language and make better use of SSE2 registers...
I think even SSE had support for doing fast approximate invsqrt(x); not sure what the best approach is these days though.
 
Associate
Joined
17 Oct 2005
Posts
311
yer_averagejoe said:
tried the newton-raphson method and the reciprocal method and decided to go with the newton raphson as it worked better.

when i tried using x*(invsqrt(x)) instead of dividing by 1, i got some very strange results.
Do you mind explaining the equations you're using that leave you needing a square root? (Rather than the inverse square root).
 
Soldato
OP
Joined
23 May 2005
Posts
2,964
Location
Auckland, New Zealand
Well I've written the algorithm right now in a step by step manner (I will change this of course, it just makes it easier to read while coding) so it does infact use the inverse square root but the way i have it right now is it uses the square root then is divided etc.. So can get around the problem of dividing by 1 when it comes to compressing the code.

It just takes the viewing point, the point of a partilce and the particles normal vector, creates 2 vectors from these and works out the angle between them (on 2 axis):

so the acute angle in radians = invCos{(a.b)/(|a|x|b|)}

where |x| is sqrt(xi^2 + xj^2 + xk^2)

so therefore the cos(anlge) is given by (a.b)x(invsqrt(ai^2 + aj^2 + ak^2))x(invsqrt(bi^2 + bj^2 + bk^2))

lol i think i typed all that ok.. those of you who get the jist will know it all anyway.. obviously to get the anlges on 2 axis' some values can be negated and not included in the calculations (because it effectively becomes 2 2D calculations and the same normal can be used)

Now... fast collision detection algorithms anyone? hehe.
 
Associate
Joined
17 Oct 2005
Posts
311
yer_averagejoe said:
Well I've written the algorithm right now in a step by step manner (I will change this of course, it just makes it easier to read while coding) so it does infact use the inverse square root but the way i have it right now is it uses the square root then is divided etc.. So can get around the problem of dividing by 1 when it comes to compressing the code.
That would be good. Because you're wasting a lot of cycles doing two unneeded divides.

so therefore the cos(anlge) is given by (a.b)x(invsqrt(ai^2 + aj^2 + ak^2))x(invsqrt(bi^2 + bj^2 + bk^2))
Note that invsqrt(ai^2 + aj^2 + ak^2))x(invsqrt(bi^2 + bj^2 + bk^2)) = invsqrt[(ai^2 + aj^2 + ak^2)(bi^2 + bj^2 + bk^2)]

So overall, that will get you from 2 invsqrts + 4 divides per particle to just 1 invsqrt per particle.
 
Back
Top Bottom