Real-world Unit Tests

Caporegime
Joined
18 Oct 2002
Posts
32,623
We are trying to put a test framework in place before we do some major refactoring, so I was looking for help on designing unit tests for real-world functions.


All the examples I can find are obvious things with simple, predictable behaviors;
there is not much useful detail out there on what a real-world test could be.
Several issues I have are:


  • I have lots of stochastic functions, so I don't expect the same result each time. E.g. imagine a method that returns a normally distributed random number. The best thing I can imagine is collecting a large sample and applying statistics, but this just seems a PITA when I could just examine the code and know it is right! For other random functions I could fix the random seed to ensure the same results are returned, but that won't test the function: e.g. if I have a function that returns a uniformly distributed number in the range 0-100, then fixing the seed and testing N random numbers won't guarantee that the function won't return 101, and the random seed may produce different results on different platforms with different compilers, which is something we want to be testing against.
  • Anything that relies on user input, e.g. GPS data from a phone, or movement of a cursor. You can try to simulate user data, but accurate simulation is incredibly difficult (large parts of my PhD were dedicated to accurate simulation). One can use real collected data, which is the current approach, but this has limited coverage. Furthermore, you may not have ground truth for observed data, so again if you were using GPS data and you had a script to perform a behavior, you can't actually unit test it without knowing what the correct outcome is.
  • Lots of small simple things that are just annoying me. E.g. I have lots of methods that use a large data file. This data file is constantly changing, so I would have to make a separate static copy for unit testing, which is a shame.
  • So much unit testing just seems redundant. If a method performs a mathematical function on a set of inputs, the only way to really test it is to copy and paste the code that does the computation to find the expected result. For common functions you could try to find other software that does the computation, but for an arbitrary function you are simply left with the choice of coding it in a different language and hoping you get the same result.
  • For any complex function with complex inputs how do you really know that there is a bug in the function as you have nothing to test it against. Unit Tests seems too focused on simple methods e.g.:
    double add(double x, double y) {return x - y;}
    which are relatively easy to spot during code review (the - instead of +), but don't focus on the actual functionality and liveness, which are much harder to ascertain from merely reviewing code.
  • Testing is constrained by your ability to come up with suitable test data. An example using circular coordinate systems like bearings: imagine you wanted a function to find the minimum separation angle between two bearings, and someone mistakenly only took the absolute delta of the angles:
    double delta(double x, double y) { return fabs(x-y);}
    If none of the unit tests wrap around the zero point then no error will be detected; the person writing the unit test has to have the foresight to try something like delta(355, 10) and find the result is 345 and not 15.
    For any complex function you are not guaranteed to be able to know which values are critical for valid testing.


I guess my point is that I see no real value in unit tests, because the kinds of errors they tend to capture are typically obvious and easily spotted. For the errors I really want to test against, unit testing is the wrong concept and traditional test procedures are the best method. So why do people waste their time writing a unit test, which takes longer and gives no guarantees, rather than simply properly reviewing the code?
 
Sounds like most of these are more integration tests than unit tests. Have a look at this guy's site; he has a book as well, and we found it really handy when we approached testing.

http://artofunittesting.com/
Thanks for the link, looks interesting.

Yeah, I think I just don't understand the concept of a unit test, because they seem to only operate on trivial functions which you can easily verify anyway, and don't seem to be good at catching the issues we actually care about.
 
The short answer to why we write unit tests is: we do it to assert that our product will do what we want it to.

You're looking at this the wrong way. They aren't tests as such; they are specifications of what you expect your software to do. We write the test before we write the product, and let the test tell us what we need from the product.

Your random number thing. Why do you need to write your own? What makes it different from the usual rand() or other derivative? This is the kind of thing your test should be telling you.

The GPS function. X goes in, what do you expect to come out? Everything can be broken down to simple deterministic input/output.

I exactly want to assert that the software does what I expect it to, but to me unit tests don't do that, and I have given examples why.


I am not writing a random number thing; we are writing bespoke code that computes Monte Carlo simulations, stochastic metaheuristic optimization, PSO, Markov localisation, particle filters, etc. Stochastic methods that, to be correct, will not give the exact same solution over multiple trials; this is not deterministic for a truly random seed.


For the GPS we have no ground truth for the correct results. A large set of input data goes in, and a set of output variables is returned; we cannot verify whether that output is correct. We could simulate the input, and that way we would know what the correct output should be, but the simulation is extremely complex, time consuming to develop, and not guaranteed to exhibit the same behaviors as a real device, so its value in terms of testing is limited. The alternative, which is our current approach, is to simply test the product in the real world and make sure the entire system does what it is supposed to do - this is fine, but it is not unit testing!
 
What Dj_Jestar said really. If you are having problems defining what a method should output from a given set of inputs then it's too complex and should be broken down.



You shouldn't be cutting and pasting code from the method to calculate the expected result of a unit test, otherwise it is pointless. We hard code previously calculated results based on the inputs we give to the method. So in a simple Add(a, b) example, the inputs could be hard coded to 100 and 200, and the output would be hard coded as 300. We would also produce unit tests to make sure the correct exceptions are thrown for invalid inputs.

Unit tests come into their own when you start to refactor code in a large system, since you can (usually!) rely on them to flag up any serious breakages. If a new bug comes along which isn't covered, then write a new test for it so it's covered in the future.



The methods, as in lines of C++ code, are never that long; we try to make sure they fit on one screen. The overall behavior is complex by definition, because we don't develop toy problems but complex optimization and machine learning solutions to real-world problems. The problem is that a high-level function will be reliant on dozens of other sub-functions and classes for it all to work.
Thus, given the large input data to a high-level function, where you cannot calculate the output by hand, how do you formulate a unit test?
Do you simply run the procedure, grab the output and then use that as a reference? That isn't really a test of the code in my eyes, because if there was an error in your code then there is an error in the output, and you will forever be testing to make sure your code has this error.
In this case it is much more valuable to do a thorough code review of the relevant code, probing for errors, combined with real-world testing.

Yeah, you can write unit tests for some of the underlying basis functions that perform simple operations on the input, but for testing the entire system, if you don't have a simple deterministic outcome then the whole methodology just falls down IMO.


You have just given the same kind of silly examples that are the only stuff I see on the internet. Yeah, if your method is add(x, y) then you can do something like ASSERT(add(100,200) == 300).
But what if your method did something like produce a hash key? You cannot calculate by hand what the result is:
string hash(int val)
{
    // do some magic to create a uniformly distributed hash value that minimizes collisions.
    return hashKey;
}


ASSERT(hash(42) == "??????????")


I don't know what the output of that method is. Maybe I know it will be a sequence of 16 characters, so I can check string length etc., but how do I know that it is a correctly working hash function?

I can run the program and find the hash is:
"#MFV56NVMV*^V893LKN*(&"

So I could write my unit test as:
ASSERT(hash(42) == "#MFV56NVMV*^V893LKN*(&")

But what if there is an error in the hash code? The hard-coded unit test is useless at showing me this.

And no, this isn't an example that I need to unit test, before someone asks why I am writing my own hashing function (I am not); it is just an example to show situations where you don't easily know what the answer is in order to run the unit test.
 
If the inputs to such a calculation are fixed (including fixed values for random numbers), then would you get a fixed result?

If so, that's what you need to test against. Pick sets of input values which represent border conditions for the method and test against them. Pick values for which you already know the expected results. Also pick invalid inputs and make sure your code behaves as expected.

If the random number generation is happening within the module you are testing, then take a look into Dependency Injection.

No, if you have a fixed set of inputs you can still get different outputs; they are stochastic methods.

A simple example is a function that returns a random number drawn from a Normal distribution with a specified mean and standard deviation. If you fix the seed then you get a constant output, but you are then limited by that seed. So really you want to test this function with random seeds (using the system time is a classic way).


Then you can pull out completely different random numbers each time. Mathematically you cannot prove that the function is actually doing what it is supposed to be doing - this is randomness after all. The only solution I know of is to generate large sample sets and check that the sample distribution statistically matches the expected distribution, using external software like R for example.


The reason I don't want the random seed fixed is: imagine you set the seed to 24, then do a load of tests that all succeed. You then change the random seed to 25 and something breaks, because the random number returned happened to be wrong (out of range or something).
 
When I say real-world I mean something that isn't a baby example found in textbooks/online like the add(x, y) I suggested, but a more complex function like the hash key generation example, where you cannot easily know the correct result. Thus you need to somehow compute the right result from some oracle and use that to assert correctness. But then how do you know the computed answer you will use in the assertion is correct?



I also do have measurement data from a physical device - user data. But the data is not labelled, because finding the ground truth is exceedingly challenging.

To give an example, imagine you designed an iPhone app that uses GPS to measure your speed. You are interested in the true speed, but the GPS only samples a noisy estimate and could be wrong; however, you invented a clever filter that reduces noise and brings the estimate closer to the true measurement (something like an Extended Kalman Filter). You can record real GPS data to plug into the algorithm, but how do you know what the right result is? You could test your app by driving in a car and checking that the car's speedometer matches your app, but to get real ground truth would require sophisticated external surveying equipment. Thus a unit test on the critical code is impossible; all you can do is test the complete solution to see if it behaves as expected. QA testing.
 
ftp://ftp.taygeta.com/pub/c/boxmuller.c

Here is a typical example of a function that I want to write a unit test for (I actually use code very similar to this).

How would you write a unit test for it? To me the only way to verify the function is correct is to produce thousands of large samples (e.g. each containing thousands of results), export them into something like R or Matlab, and do some statistics. How does that fit into the whole unit testing paradigm?

At least for this problem there is a solution, because other software offers the ability to do the testing for you.

If the function was instead some arbitrary bespoke transformation then testing it using external tests becomes more challenging.
 
I've just discovered something called the Wald-Wolfowitz test for testing the randomness of an output. Would this be any use? - http://msdn.microsoft.com/en-us/magazine/cc163551.aspx

AFAIK the Wald-Wolfowitz runs test checks the randomness of a sequence's ordering. I know that I can test a procedure generating random numbers using statistical techniques, e.g. I can test for normality with tests like Kolmogorov-Smirnov, Shapiro-Wilk, etc. But that involves exporting the data and processing it in a statistical package like R/Matlab, something which I want to avoid because I want an automated test suite (I could link this together with some bash scripts, but then things start to get complicated).


Anyway, I did a lot of reading last night and it seems that is the only real way of testing stochastic functions, which is kind of intuitive.

So, is there a C++ test library that supports statistical testing of outcomes for Unit tests?
 
Also, my issue with unit tests is not just with randomness, but with any reasonably complex function where I cannot easily calculate the correct result by hand.

E.g. take the Haversine formula, a simple trigonometric function that is common in navigation for finding the distance between two positions in a spherical coordinate system (lat-lon):
http://en.wikipedia.org/wiki/Haversine_formula

To test this I can pick random positions, but how do I assert the correct answer? Unlike the add(100,200) == 300 example, I cannot intrinsically know the correct answer.
So I have several choices:

  • Run the code and get the result returned from my hand-coded function, assume this is the right result, and do unit tests against this. This test may help me discover accidental changes to the code in the future, but it doesn't assert the correctness of the function; if I made a mistake then I am also testing for the existence of that bug.
  • Since this is a relatively simple calculation I can do it on a bit of paper with a calculator. But I could make a mistake by hand, although iterating should eventually provide the right result.
  • Plug numbers into someone else's code; there are online calculators. Assume their code is correct (I could actually test in multiple places).


Well, in this example option 3 is feasible: I can use someone else's code to verify I get the right result, and then use the unit tests to guarantee no changes.


But this is a very simple and very common function with a small input space. What if this function, instead of taking only 4 inputs, took 4000? What if instead of half a dozen trig functions there were dozens of math ops? Then I cannot do it by hand, and plugging it in elsewhere is very slow.

Worse still, what if this is a bespoke function that transforms the data in an arbitrary manner, such that nowhere else on the internet can one find code that will calculate the same result?
As far as I can tell, the only solution to the latter is a thorough code examination by peers. A unit test is not really possible except to act as a regression test.
 
That is all well and good in some software engineering firm, but we have research-orientated development in a dynamic work environment with ever-changing goals and very tight deadlines - that is the nature of start-ups. We don't know what will work until we try something, play with it, research it, analyze it and try some more - we cannot test what we don't know will work, because no one has done it before. Once we have a prototype that has value, then testing can become important at re-architecture time.

This is a very dynamic environment with very tight deadlines; e.g. the CEO spoke to VCs yesterday and we have 2 weeks to put together a demo that we originally spec'd to take about 3 months to develop correctly.
 
As a scrubby CS student who is replying to this mainly as some kind of terrible alternative to revision, I would probably remove the ranf(); calls and pass those values in as arguments. Then you could test against whatever list of data you want, with a known list of outputs, and not have to do any crazy statistical anythings against the output of that function.

I appreciate the input, but I don't think this approach would actually serve as a valid test. I know I can create unit tests for any sub-function and replace the call to ranf() with known values to test those sub-functions.

However, what I really want to test is whether the function as a whole returns correct results, i.e. a normally distributed random number. The only way to test for this is to use statistical tests over large sample sizes.


You can remove the randomness, replace calls to rand/ranf etc., or use a fixed random seed so the same values are returned. However, you then fail to test the scope of the returned values.

E.g. it may so happen that with the fixed seed, and within the extent of your testing, everything by chance works as expected. With a different seed (or after many additional calls to rand()) the function gives an erroneous result.
 
Forgot about this thread :)

You don't have to export the data to use these methods. There are libraries available which you can use in your unit test project to check from within the code. For example, here is a C# library which performs a Kolmogorov-Smirnov test against a sample of data - http://www.extremeoptimization.com/...s.OneSampleKolmogorovSmirnovTest_Members.aspx

Do some digging and you will probably find a similar library in your language of choice.

Yeah, that is my plan: to find a suitable C++ library that doesn't have too many other dependencies (I like to keep things lean and clean, without a large dependency tree).

Exporting the data would just allow it to be tested very easily in something like R.
 
I'd say that's an argument against TDD, not unit testing. As long as you have some simple tests, the infrastructure is there and you can add more later (which means "will never happen" in many companies).

That said, unit tests aren't really about helping you, they're about helping the guy who comes after you. Have you ever hit a new project you don't completely understand that has unit tests? I have, and I appreciated them.

We always write small example tests of functionality that show the code works in at least a couple of intended examples, showing new developers how the functions work etc., along with some comments and descriptions, but not a thorough set of unit tests. I am not sure unit tests really work well as a form of documentation for new developers, because seeing thousands of lines of repeated code with inputs often designed to break the function is not very intuitive.


The bottom line that we care about is whether the intended behavior of the complete system is correct, not whether some low-level functionality is correct, which is implied by the higher-level goal. Second is regression testing, so changes can be comfortably made in the future with confidence, which puts more weight on unit tests.

Third is performance testing, which seems to be completely ignored by the unit test philosophy. I made some unit tests for some code earlier in the week and everything behaved exactly as expected, giving all the correct results. However, one thing I did note was that the functions were taking a little longer than expected to compute. I spent some time looking at the code, profiling individual components, and all looked fine. It turns out the sign was flipped in a priority queue, so the heuristic search was acting as a worst-first search instead of best-first. The upside was that the correct results were returned, but only after extensively searching the graph through all the worst possible results before coming to the correct answer.
Unit testing completely failed to find that bug, as it was performance-related.
 