Fast square root function (C++)

yer_averagejoe · 13 Feb 2007 at 21:55

Hey, I've built a particle engine in opengl but really need a fast square root function. the one in math.h is rubbish.

Cheers.

Joe

Una · 13 Feb 2007 at 22:07

It depends how accurate you want it to be?

Newton-Rapheson approximation.

This is a hell of a fast way to do it - just take 1/the result for the sqrt.

Code:

float InvSqrt(float x)
{
   float xhalf = 0.5f*x;
   int i = *(int*)&x; // get bits for floating value
   i = 0x5f375a86- (i>>1); // gives initial guess y0
   x = *(float*)&i; // convert bits back to float
   x = x*(1.5f-xhalf*x*x); // Newton step, repeating increases accuracy
   return x;
}

Don't expect me to explain how the magic constant works though, there is a whole paper going into it

You could possibly make it quicker if you drop to assembly language and make better use of SSE2 registers... but I think thats an overkill for a particle engine...

yer_averagejoe · 13 Feb 2007 at 22:16

haha cheers. I'll take a look at it. It doesnt need to be too acurate. maybe just a few d.ps

JonC · 13 Feb 2007 at 22:19

Indeed, but it wasn't Carmack...

M0KUJ1N · 13 Feb 2007 at 22:34

indeed, I remember reading about this on Slashdot a while ago. Found the original /. thread:

http://games.slashdot.org/article.pl?sid=06/12/01/184205

The link to the original article is also still live (which isnt always guaranteed with slashdotted articles!)

yer_averagejoe · 13 Feb 2007 at 23:04

It doesn't actually seem to be any faster :S. carrying out around 12000 square roots in my code per frame and theres no difference in framerate between that and the library function.

I'll use it anyway because its sexy

Una · 13 Feb 2007 at 23:09

What you need to do if profile your code and see where it slows down, then optimize those parts. Your algorithms you use are the main factor not minimal code tweaking...

Hodders · 14 Feb 2007 at 11:25

You could think about using a look up table to give you a first value for the Newton-Rapheson as it would give better accuarcy with fewer iterations...

Or if the calculations are all multiplys and sqrts then use logs and add/half them to acheive the same result.

DaveF · 14 Feb 2007 at 14:35

Una said:
It depends how accurate you want it to be?

Newton-Rapheson approximation.

This is a hell of a fast way to do it - just take 1/the result for the sqrt.

Don't do that - divides are slow! Instead multiply by the original number: x * (1/sqrt(x)) = sqrt(x). For most graphics work the reciprocal sqrt is what you want anyhow.

You could possibly make it quicker if you drop to assembly language and make better use of SSE2 registers...

I think even SSE had support for doing fast approximate invsqrt(x); not sure what the best approach is these days though.

Una · 14 Feb 2007 at 14:41

DaveF said:
Don't do that - divides are slow! Instead multiply by the original number: x * (1/sqrt(x)) = sqrt(x).

Doh yeah must remember not to post when drunk

yer_averagejoe · 14 Feb 2007 at 22:38

tried the newton-raphson method and the reciprocal method and decided to go with the newton raphson as it worked better.

when i tried using x*(invsqrt(x)) instead of dividing by 1, i got some very strange results.

oh and i haven't a clue about assembly

DaveF · 15 Feb 2007 at 08:56

yer_averagejoe said:
tried the newton-raphson method and the reciprocal method and decided to go with the newton raphson as it worked better.

when i tried using x*(invsqrt(x)) instead of dividing by 1, i got some very strange results.

Do you mind explaining the equations you're using that leave you needing a square root? (Rather than the inverse square root).

yer_averagejoe · 15 Feb 2007 at 15:05

Well I've written the algorithm right now in a step by step manner (I will change this of course, it just makes it easier to read while coding) so it does infact use the inverse square root but the way i have it right now is it uses the square root then is divided etc.. So can get around the problem of dividing by 1 when it comes to compressing the code.

It just takes the viewing point, the point of a partilce and the particles normal vector, creates 2 vectors from these and works out the angle between them (on 2 axis):

so the acute angle in radians = invCos{(a.b)/(|a|x|b|)}

where |x| is sqrt(xi^2 + xj^2 + xk^2)

so therefore the cos(anlge) is given by (a.b)x(invsqrt(ai^2 + aj^2 + ak^2))x(invsqrt(bi^2 + bj^2 + bk^2))

lol i think i typed all that ok.. those of you who get the jist will know it all anyway.. obviously to get the anlges on 2 axis' some values can be negated and not included in the calculations (because it effectively becomes 2 2D calculations and the same normal can be used)

Now... fast collision detection algorithms anyone? hehe.

DaveF · 15 Feb 2007 at 15:52

yer_averagejoe said:
Well I've written the algorithm right now in a step by step manner (I will change this of course, it just makes it easier to read while coding) so it does infact use the inverse square root but the way i have it right now is it uses the square root then is divided etc.. So can get around the problem of dividing by 1 when it comes to compressing the code.

That would be good. Because you're wasting a lot of cycles doing two unneeded divides.

so therefore the cos(anlge) is given by (a.b)x(invsqrt(ai^2 + aj^2 + ak^2))x(invsqrt(bi^2 + bj^2 + bk^2))

Note that invsqrt(ai^2 + aj^2 + ak^2))x(invsqrt(bi^2 + bj^2 + bk^2)) = invsqrt[(ai^2 + aj^2 + ak^2)(bi^2 + bj^2 + bk^2)]

So overall, that will get you from 2 invsqrts + 4 divides per particle to just 1 invsqrt per particle.

yer_averagejoe · 15 Feb 2007 at 16:25

cheers dave.