Fast square root function (C++)

It depends how accurate you want it to be?

Newton-Rapheson approximation.

This is a hell of a fast way to do it - just take 1/the result for the sqrt.

Code:
float InvSqrt(float x)
{
   float xhalf = 0.5f*x;
   int i = *(int*)&x; // get bits for floating value
   i = 0x5f375a86- (i>>1); // gives initial guess y0
   x = *(float*)&i; // convert bits back to float
   x = x*(1.5f-xhalf*x*x); // Newton step, repeating increases accuracy
   return x;
}

Don't expect me to explain how the magic constant works though, there is a whole paper going into it :p

You could possibly make it quicker if you drop to assembly language and make better use of SSE2 registers... but I think thats an overkill for a particle engine...
 
Last edited:
It doesn't actually seem to be any faster :S. carrying out around 12000 square roots in my code per frame and theres no difference in framerate between that and the library function.

I'll use it anyway because its sexy :)
 
What you need to do if profile your code and see where it slows down, then optimize those parts. Your algorithms you use are the main factor not minimal code tweaking...
 
You could think about using a look up table to give you a first value for the Newton-Rapheson as it would give better accuarcy with fewer iterations...

Or if the calculations are all multiplys and sqrts then use logs and add/half them to acheive the same result.
 
Una said:
It depends how accurate you want it to be?

Newton-Rapheson approximation.

This is a hell of a fast way to do it - just take 1/the result for the sqrt.
Don't do that - divides are slow! Instead multiply by the original number: x * (1/sqrt(x)) = sqrt(x). For most graphics work the reciprocal sqrt is what you want anyhow.

You could possibly make it quicker if you drop to assembly language and make better use of SSE2 registers...
I think even SSE had support for doing fast approximate invsqrt(x); not sure what the best approach is these days though.
 
DaveF said:
Don't do that - divides are slow! Instead multiply by the original number: x * (1/sqrt(x)) = sqrt(x).

Doh yeah must remember not to post when drunk :p
 
tried the newton-raphson method and the reciprocal method and decided to go with the newton raphson as it worked better.

when i tried using x*(invsqrt(x)) instead of dividing by 1, i got some very strange results.

oh and i haven't a clue about assembly :P
 
yer_averagejoe said:
tried the newton-raphson method and the reciprocal method and decided to go with the newton raphson as it worked better.

when i tried using x*(invsqrt(x)) instead of dividing by 1, i got some very strange results.
Do you mind explaining the equations you're using that leave you needing a square root? (Rather than the inverse square root).
 
Well I've written the algorithm right now in a step by step manner (I will change this of course, it just makes it easier to read while coding) so it does infact use the inverse square root but the way i have it right now is it uses the square root then is divided etc.. So can get around the problem of dividing by 1 when it comes to compressing the code.

It just takes the viewing point, the point of a partilce and the particles normal vector, creates 2 vectors from these and works out the angle between them (on 2 axis):

so the acute angle in radians = invCos{(a.b)/(|a|x|b|)}

where |x| is sqrt(xi^2 + xj^2 + xk^2)

so therefore the cos(anlge) is given by (a.b)x(invsqrt(ai^2 + aj^2 + ak^2))x(invsqrt(bi^2 + bj^2 + bk^2))

lol i think i typed all that ok.. those of you who get the jist will know it all anyway.. obviously to get the anlges on 2 axis' some values can be negated and not included in the calculations (because it effectively becomes 2 2D calculations and the same normal can be used)

Now... fast collision detection algorithms anyone? hehe.
 
yer_averagejoe said:
Well I've written the algorithm right now in a step by step manner (I will change this of course, it just makes it easier to read while coding) so it does infact use the inverse square root but the way i have it right now is it uses the square root then is divided etc.. So can get around the problem of dividing by 1 when it comes to compressing the code.
That would be good. Because you're wasting a lot of cycles doing two unneeded divides.

so therefore the cos(anlge) is given by (a.b)x(invsqrt(ai^2 + aj^2 + ak^2))x(invsqrt(bi^2 + bj^2 + bk^2))
Note that invsqrt(ai^2 + aj^2 + ak^2))x(invsqrt(bi^2 + bj^2 + bk^2)) = invsqrt[(ai^2 + aj^2 + ak^2)(bi^2 + bj^2 + bk^2)]

So overall, that will get you from 2 invsqrts + 4 divides per particle to just 1 invsqrt per particle.
 
Back
Top Bottom