Realworld Unit Tests

D.P. · 25 Apr 2013 at 00:26

We are trying to put a test framework in place before we do some major refactoring so was looking for help on designing Unit tests for real-world functions.

All the examples are obvious things, with simple predictable behaviors,
not much useful detail out there on what a real world test could be.
Several issue I have are:

I have lots of stochastic functions so I don' expect the same result each time. E.g. imagine a method that returns a normally distributed random number. The best thing I can imagine is collecting a large sample size and applying statistics but this just seems a PITA when I could just examine the code and know it is right! For other random functions I could fix the random seed to ensure returning the same results but that wont test the function, e.g. if I have a function that returns a uniformly distributed number in the range 0-100 then fixing the seed and testing N random number wont guarantee that the function wont return 101, and the random seed may provide different results on different platforms with different compilers, which is something we want to be testing against.
Anything that relies on user input, e.g. GPS data form a phone, or movement of cursor. You can try to simulate user data but accurate simulation is incredibly difficult (large parts of my PhD were dedicated to accurate simulation). One can use real collected data, which is the current approach but this has limited coverage. Furthermore, you may not have ground truth to observed data, so again if you were using GPS data and you had a script to do a behavior, you cant actually unit test that without knowing what the correct outcome is.
Lots of small simple things that are just annoying me. I.e. I have lots of methods that use a large data file. This data file is constantly changing so i would have to make a separate static copy for unit testing which is a shame.
So much unit testing just seems redundant. If a method that performs a mathematical function on a set of inputs, the only way to really test is just to copy and paste the code that does the computation to find the expected result. For common functions you could try to find other software that does the computation but for an arbitrary function you are simply left with choice of coding it in a different language and hoping you get the same result.
For any complex function with complex inputs how do you really know that there is a bug in the function as you have nothing to test it against. Unit Tests seems too focused on simple methods e.g.:
double add(double x, double y) {return x - y;}
which are relatively easy to spot during code review (the - instead of +) but don't focus on the actual functionality and liveness which are much harder to ascertain from merely reviewing code.
Testing is constrained by your ability to come up with test suitable test data. An example using circular coordinate systems like bearings. Imagine you wanted a function to find the minimum separation angle between 2 bearings and someone mistakenly only took the absolute delta of the angles:
double delta(double x, double y) { return fabs(x-y);}
If all the unit testes don't wrap around the zero point then no error will be detected, the person writing the unit test code has to have the foresight to try something like delta(355,10); and find the result is 345 and not 15.
For any complex function you are not guarantee to be able to know what values are critical for valid testing.

I guess my point is that I see no real value in unit tests because the kinds of errors they tend to capture are typically obvious and easily spotted. The errors I really want to test against unit testing is the wrong concept and traditional test procedures are the best method. So why do people waste their time writing a unit test which takes longer and gives no grantees than simply properly reviewing the code?

DanF · 25 Apr 2013 at 09:46

Sounds like most of these are more intergration tests than unit tests. Have a look at this guys site, he has a book as well, we found it really handy when we approached testing.

http://artofunittesting.com/

Dj_Jestar · 25 Apr 2013 at 12:04

The short answer to why we write unit tests is: we do it to assert that our product will do what we want it to.

You're looking at this from the wrong way. They aren't Tests as such, they are specifications of what you expect your software to do. We write the test before we write the product and let the test tell us what we need from the product.

Your random number thing. Why do you need to write your own? What makes it different from the usual rand() or other derivative? This is the kind of thing your test should be telling you.

The GPS function. X goes in, what do you expect to come out? Everything can be broken down to simple deterministic input/output.

ZombieFan · 25 Apr 2013 at 13:34

What Dj_Jestar said really. If you are having problems defining what a method should output from a given set of inputs then it's too complex and should be broken down.

D.P. said:
So much unit testing just seems redundant. If a method that performs a mathematical function on a set of inputs, the only way to really test is just to copy and paste the code that does the computation to find the expected result. For common functions you could try to find other software that does the computation but for an arbitrary function you are simply left with choice of coding it in a different language and hoping you get the same result.

You shouldn't be cutting and pasting code from the method to calculate the expected result of a unit test, otherwise it is pointless. We hard code previously calculated results based on the inputs we give to the method. So in a simple Add(a, b) example, the inputs could be hard coded to 100 and 200, and the output would be hard coded as 300. We would also produce unit tests to make sure the correct exceptions are thrown for invalid inputs.

Unit tests come into their own when you start to refactor code in a large system, since you can (usually!) rely on them to flag up any serious breakages. If a new bug comes along which isn't covered, then write a new test for it so it's covered in the future.

D.P. · 25 Apr 2013 at 16:51

DanF said:
Sounds like most of these are more intergration tests than unit tests. Have a look at this guys site, he has a book as well, we found it really handy when we approached testing.

http://artofunittesting.com/

Thanks for the link, looks interesting.

Yeah, I think I just don't understand the concept of a Unit Test because they seem to really only operate on trivial functions which you can easily verify anyway and don't seem to be good at catching the issues we actually care about.

D.P. · 25 Apr 2013 at 17:04

Dj_Jestar said:
The short answer to why we write unit tests is: we do it to assert that our product will do what we want it to.

You're looking at this from the wrong way. They aren't Tests as such, they are specifications of what you expect your software to do. We write the test before we write the product and let the test tell us what we need from the product.

Your random number thing. Why do you need to write your own? What makes it different from the usual rand() or other derivative? This is the kind of thing your test should be telling you.

The GPS function. X goes in, what do you expect to come out? Everything can be broken down to simple deterministic input/output.

I exactly want to assert that the software does what I expect it to, but to em unit tests don't do that and I have given examples why.

I am not writing a random number thing, we are writing bespoke code that computes Monte Carlo simulations, stochastic metaheuristic optimization, PSO, markov localisation, particle filters, etc., etc. Stochastic methods that to be correct will be not give the exact solution over multiple trials, this is not deterministic for a truly random seed.

For the GPS we have no ground truth for the correct results. A large set of input data goes in, and set of output variables are returned, we cannot verify if that output is correct. We could simulate the input ans that way we know what the correct output should be, but the simulation is extremely complex, time consuming to develop and is not guaranteed to exhibit the same behaviors as a real device so its value in terms of testing is limited. The alternative which is our current approach is to simply test the product in the real world and make sure the entire system does what it is supposed to do - this is fine but it is not unit testing!

D.P. · 25 Apr 2013 at 17:16

ZombieFan said:
What Dj_Jestar said really. If you are having problems defining what a method should output from a given set of inputs then it's too complex and should be broken down.

You shouldn't be cutting and pasting code from the method to calculate the expected result of a unit test, otherwise it is pointless. We hard code previously calculated results based on the inputs we give to the method. So in a simple Add(a, b) example, the inputs could be hard coded to 100 and 200, and the output would be hard coded as 300. We would also produce unit tests to make sure the correct exceptions are thrown for invalid inputs.

Unit tests come into their own when you start to refactor code in a large system, since you can (usually!) rely on them to flag up any serious breakages. If a new bug comes along which isn't covered, then write a new test for it so it's covered in the future.

The methods, as in lines of C++ code are never that long, we try to make sure they fit on 1 screen. The overall behavior is complex by definition because we don't develop toy problems but complex optimization and machine learning solutions to real world problems. The problem is a high level function will be reliant on dozens of other sub-function and classes for it all to work.
Thus, given the large input data to a high level function where you cannot calculate by hand the output then how do you formulate a Unit test?
Do you simply run the procedure, grab the output and then use that as a reference? That isn't really a test of the code in my eyes because if their was an error in your code then there is an error in the output and you will forever be testing to make sure your code has this error.
In this case it is much more valuable that you do a thorough code review of the relevant code, probing for errors, combined with real world testing.

yeah, you can write Unit tests for some of the underlying basis functions that perform simple operations on the input but testing the entire system if you don't have a simple deterministic outcome then the whole methodology just falls down IMO.

You have just given the same kind of silly examples that is the only stuff I see on the internet, yeah if you method is add(x,y) then you can do something like ASSERT(add(100,200) == 300).
But what if you method did something like produce a hash-key, you cannot calculate by hand what the result is:
string hash(int val)
{
// do some magic to create a uniformly distributed hash value that minimizes collisions.
return hashKey;
}

ASSERT(hash(42) == "??????????")

I don't know what the output of that method is. Maybe I know it will be a sequence if 16 characters so I can check string length etc but how do I know that is a correctly working hash function?

I can run the program and find the hash is:
"#MFV56NVMV*^V893LKN*(&"

So I cold write my unit test as:
ASSERT(hash(42) == "#MFV56NVMV*^V893LKN*(&")

But what is there as an error in the hash code? The hard coded Unit test is useless at showing me this.

And no, this isn't an example that I need to unit test before someone asks why I am writing my won hashing function (I am not), just an example to show situations where you don't easily know what the answer is in order to tun the unit test.

ZombieFan · 25 Apr 2013 at 17:17

D.P. said:
I am not writing a random number thing, we are writing bespoke code that computes Monte Carlo simulations, stochastic metaheuristic optimization, PSO, markov localisation, particle filters, etc., etc. Stochastic methods that to be correct will be not give the exact solution over multiple trials, this is not deterministic for a truly random seed.

If the input to such a calcuation are fixed (including fixed values for random numbers) then would you get a fixed result?

If so, that's what you need to test against. Pick sets of input values which represent border conditions for the method and test against them. Pick values for which you already know the expected results. Also pick invalid inputs and make sure your code behaves as expected.

If the random number generation is happening within the module you are testing, then take a look into Dependancy Injection.

ZombieFan · 25 Apr 2013 at 17:23

D.P. said:
The methods, as in lines of C++ code are never that long, we try to make sure they fit on 1 screen. The overall behavior is complex by definition because we don't develop toy problems but complex optimization and machine learning solutions to real world problems. The problem is a high level function will be reliant on dozens of other sub-function and classes for it all to work.
Thus, given the large input data to a high level function where you cannot calculate by hand the output then how do you formulate a Unit test?
Do you simply run the procedure, grab the output and then use that as a reference? That isn't really a test of the code in my eyes because if their was an error in your code then there is an error in the output and you will forever be testing to make sure your code has this error.
In this case it is much more valuable that you do a thorough code review of the relevant code, probing for errors, combined with real world testing.

yeah, you can write Unit tests for some of the underlying basis functions that perform simple operations on the input but testing the entire system if you don't have a simple deterministic outcome then the whole methodology just falls down IMO.

Start by writing unit tests for the sub-functions and classes. The tests will prove that these classes work as expected, and then you can move up to writing more general tests for the next level of classes in the module.

Unit testing is all about proving that a class does what you expect it to do. It forces you to think about what your class should be outputting under every situation.

D.P. · 25 Apr 2013 at 17:39

ZombieFan said:
If the input to such a calcuation are fixed (including fixed values for random numbers) then would you get a fixed result?

If so, that's what you need to test against. Pick sets of input values which represent border conditions for the method and test against them. Pick values for which you already know the expected results. Also pick invalid inputs and make sure your code behaves as expected.

If the random number generation is happening within the module you are testing, then take a look into Dependancy Injection.

No, if you have a fixed set of inputs you can get different outputs, they are stochastic methods.

A simple example is if you had a function that returned a random number within a certain Normal distribution with a specified mean and standard deviation. If you fix the seed then you get a constant output but you are then limited by the seed. So really you want to test this function with random seeds (using system time is a classic way).

Then you can pull out completely different random numbers each time. Mathematically you cannot prove that the function is actually doing what it is supposed to be doing - this is randomness after all. The only solution I know of is to generation large sample sets and check that the sample destruction statistically matches the expected distribution, using external software like R for example.

The reason I don't want the random seed fixed is imagine you set the seed to 24, then do a load of tests that all succeed. You then change the random seed to 25 and something breaks because the random number returned happened to be wrong (out of range or something).

Haircut · 26 Apr 2013 at 00:34

I can understand that you have some non-deterministic stuff in your code base, but what are you then doing with that stuff?

Surely you must use that random number for something? You need to be testing that when you get a particular value as an input you get a particular value (or values) as the output.
That's where dependency injection would come into the picture to allow you to mock your truly random pieces of the system.
Though even with the random stuff you can check edge cases etc.

I think part of the problem is that you're doing the sort of development that 99% of developers don't do though.
Most development isn't writing algorithms, it's writing components that interact with other components in a particular way and unit testing helps an awful lot for that sort of stuff.

D.P. said:
Thus, given the large input data to a high level function where you cannot calculate by hand the output then how do you formulate a Unit test?

One thing I will say is that when you mention things like the above and then talk about real world testing, what exactly are you testing for when you do this real world testing?
Surely if you can test it in the real world then you can write an automated unit test for it?

D.P. · 26 Apr 2013 at 02:45

When I say realworld I am meaning something that isn't a baby example found in textbooks\online like the add(x,y) I suggested, but a more complex function like the hash key generation example where you cannot easily know the correct result. Thus you need to somehow compute the right result from some oracle and use that to assert correctness. But then how do you kn the computed answer you will use as the assertion is correct?

I also do have measurement data from a physical device, user data. But the data is not labbeled because finding the ground truth is exceedingly challenging.

To give an example imagine you designed an iphone app that uses GPS to measure your speed. You were interested in the true speed but the GPS only samples a noisy estimate and could be wrong but you invented a clever filter that reduces noise and improves estimation closer to a true measurement (something like an Extended Kalman Filter). You can record real GPS data to plug into the algorithm but how do you know what the right result is?. But you could test your app by driving in a car and checking the speedometer of the car matches your app. To get real ground truth would require using sophisticated external surveying equipment. Thus a unit test on the critical code is impossible, all you can do is test the complete solution to see if it behaves as expected. QA testing.

D.P. · 26 Apr 2013 at 03:08

ftp://ftp.taygeta.com/pub/c/boxmuller.c

Here is a a typical example of a function that I want to write a unit test for (I actually use code very similar to this).

How would you write a unit test? To me the only way to verify the function is correct is to produce thousands of large samples (e.g. Containing thousands of results) and export that intosoemthing like R or Matlab and do some statistics. How does that fit into the whole Unit testing paradigm?

At least to this problem there is a solution because other software offers the ability to test for you.

If the function was instead some arbitrary bespoke transformation then testing it using external tests becomes more challenging.

ZombieFan · 26 Apr 2013 at 14:12

I've just discovered something called the Wald-Wolfowitz test for testing the randomness of an output. Would this be any use? - http://msdn.microsoft.com/en-us/magazine/cc163551.aspx

D.P. · 26 Apr 2013 at 16:44

ZombieFan said:
I've just discovered something called the Wald-Wolfowitz test for testing the randomness of an output. Would this be any use? - http://msdn.microsoft.com/en-us/magazine/cc163551.aspx

AFAIK the Wald-Wolfowitz tests for uniform distribution. I know that I can test that a procedure generating a random number using statistical techniques, e.g. I can test for normality with test like Kolmogorov-Smirnov, Shapiro–Wilks, etc. But that involves exporting the data and processing in a statistical package like R/Matlab, something which i want to avoid because I want automated test suite (I could link this together with some bash scripts but then things start to get complicated).

Anyway, I did a lot of reading last night and it seems that is the only real way of testing stochastic functions, which is kind of intuitive.

So, is there a C++ test library that supports statistical testing of outcomes for Unit tests?

D.P. · 26 Apr 2013 at 17:02

Also, my issue with Unit Tests is not just with randomness but for reasonably complex functions where I cannot calculate easilyl by hand the correct result.

E.g. Take the Haversine function as a simple trigonometric function that is common in navigation to know the distance between 2 Positions in Spherical Coordinate systems (Lat-lon) :
http://en.wikipedia.org/wiki/Haversine_formula

To test this I can pick random positions but how do I assert the correct answer because unlike the add(100,200)==300 example I cannot intrinsically know the correct answer.
So I have several choices:

Run the code and get the result returned from my hand coded function, assume this is the right result and do Unit tests against this. This test may help me discover accidental changes to the code in the future but it doesn't assert the correctness of the function, if I made a mistake then i am also testing for the existence of that bug.
Since this is a relatively simple calculation I can do it on a bit of paper with a calculator. But I could make a mistake by hand although iteration provide me the right result.
Plug numbers into someone eases code, there are online calculators. Assume there code is correct (I could actually test in multiple places)

well in this example option 3 is feasible, I can use someone elses code to verify I get the right result and then use the Unit tests to guarantee no changes.

But this is a very simple and very common function with small input space. What if this function instead of taking only 4 inputs took 4000? What if instead of half a dozen trig functions there were dozens of math ops? Then I cannot do it by hand and plugging this in elsewhere is very slow.

Worse still, what if this is a bespoke function that transforms the data in an arbitrary manner such that no where else on the internet can one find code that will calculate the same result?
As far as I can tell the only solution to the latter is a thorough code examination by peers. A Unit test is not really possible except to act as a regression test.

Dj_Jestar · 26 Apr 2013 at 17:11

Quick brain dump:

How would you know it is correct without the unit test? Answer that, then use the same inputs/outputs to assert in a unit test.

You appear to be looking into this way too much - just give it a go. Break stuff down as much as you can. Everything is deterministic, even rand() is.

Methods that "fit in one page" are way too big.

Remember that the main point of a unit test is to use it to drive the design of your software. You write the unit test before you even begin to write/think/anything about the unit under test.

Need something that produces a position on a globe?

That's the first test. Something that produces a position.

Need that position to be latitudinally opposite the start point?

That's the next test. Change product to produce latitudinally opposite position.

Likewise for longitudinally opposite.

etc.

The point here is to let the tests drive your design. Let the tests tell you what your software needs (e.g. dependencies and parameters) and let the test tell you what it should produce.

There's two "roles" you will undertake when TDDing - there's the "I'm writing the test" hat, then there is the "I'm doing what is necessary to make the test pass" hat. You can't wear both at the same time. The former is basically documenting software requirements in executable code. The latter is the more traditional writing of code to satisfy requirement. However, the latter also requires you to "forget" everything except for what the test is telling you to do.

This is a bad analogy because I don't wish to promote a "someone else tells you what to do" thing, but think of it as a brick layer vs foreman - but all in your head. The foreman tells the brick layer to build a wall from A to B and it must be 8ft high. Brick layer does it, but at this point he doesn't know what purpose the wall is for. Is it a house? Is it just a wall? Is it for a garage? At this point, it just doesn't matter. He has to build from A to B at 8ft high. Later on he may be given further instruction to build another wall, or to extend this wall, whatever.

That's the kind of mental attitude taken when TDDing. Do only enough to make the test pass and no more, then clean up once the test is passing (keeping it passing throughout,) then move onto the next test.

D.P. · 26 Apr 2013 at 19:53

That is all well and good in some software engineering firm but we have research orientated develop in a dynamic work environment that has ever changing goals with very tight deadlines -that is the nature of Start-ups. We don't know what will work until we try something, play with it, research it, analyze and try some more - we cannot test what we don't know will work because no one has done it before. Once we have a prototype that has value then testing can become important at re-architecture time.

This is a very dynamic environment with very tight deadlines, e.g. the CEO spoke to VCs yesterday and we have 2 weeks to put a demo together that we originally speced to take about 3 months to develop correctly.

Dj_Jestar · 26 Apr 2013 at 23:26

It's not about certifying that some thing is done. Try to appreciate that despite the name this isn't *testing* your software it is*designing*.

When you write software you expect it to do something. Whether you write a physical test for it or not. You have an idea in your head and you expect a particular outcome.

TDD is about documenting that outcome before you write the implementation and then using the test to tell you what you need to accomplish the task and when you have accomplished it.

I have done plenty of work for start ups and TDD can fit just as well as anywhere else. Plenty of research about to prove TDD saves time too.

Literally the act of writing the test code will help you better understand what dependencies, resources, parameters, services etc will be needed. More importantly perhaps, it will also help prevent you from adding things that aren't necessary that you may think should be and thus reduce bloat that would otherwise slip through.

Lightnix · 27 Apr 2013 at 00:06

D.P. said:
ftp://ftp.taygeta.com/pub/c/boxmuller.c

Here is a a typical example of a function that I want to write a unit test for (I actually use code very similar to this).

How would you write a unit test? To me the only way to verify the function is correct is to produce thousands of large samples (e.g. Containing thousands of results) and export that intosoemthing like R or Matlab and do some statistics. How does that fit into the whole Unit testing paradigm?

At least to this problem there is a solution because other software offers the ability to test for you.

If the function was instead some arbitrary bespoke transformation then testing it using external tests becomes more challenging.

As a scrubby CS student who is replying to this mainly as some kind of terrible alternative to revision, I would probably remove the ranf(); calls and pass those values in as arguments. Then you could test against whatever list of data you want, with a known list of outputs, and not have to do any crazy statistical anythings against the output of that function.