Communication vs. documentation

Lightnix · 29 Apr 2014 at 21:19

Recent development methodologies such as Agile emphasise the importance of communication over documentation. The idea is that so long as enough people know something, it is unlikely to be lost - and that over-documenting something that is likely to change is often just a waste of time and resources. Notably, people tend to avoid reading large documents if possible.

In larger companies with smaller software development teams, you might find that the development team is working on many different products over the course of many years. Products may be revisited and updated over that time, and relying purely on communication could cause ideas, rationales for implementation details or, worse still, design decisions to be lost over relatively short periods unless the communication is repeated frequently. As we know, people are forgetful.

So ideally you should have both communication and documentation. This introduces redundancy in the flow of information within your organisation. This is a good thing in that it means that communication can be checked against documentation and vice versa, but it could be a bad thing in that time is wasted in expressing the same information in different forms, and potentially confusion could be created if these things are not consistent.

Some solutions to this include things like JavaDoc, where documentation is generated from comments in the source code. This has the advantage of completing documentation as far as the implementation goes, but potentially lacks the communication aspect and likely misses out key information about design decisions, test results and so on entirely.
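As a rough illustration of the same idea outside Java, Python keeps documentation comments (docstrings) attached to the code, and the standard `inspect` and `pydoc` modules can pull them back out. This is only a sketch: the `rectangle_area` function is invented purely for the example.

```python
import inspect
import pydoc


def rectangle_area(width: float, height: float) -> float:
    """Return the area of a rectangle.

    Hypothetical example function: the docstring is the only
    documentation, and it lives right next to the implementation.
    """
    return width * height


# inspect.getdoc returns the docstring with its indentation cleaned up.
summary = inspect.getdoc(rectangle_area).splitlines()[0]
print(summary)

# pydoc.render_doc builds the same text page that help() would display.
page = pydoc.render_doc(rectangle_area, renderer=pydoc.plaintext)
print(page)
```

The generated page can only repeat what the docstring says, which is the limitation described above: design decisions and test results are not captured unless somebody writes them down.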

One thing that has been very popular in the open source community (particularly for older projects) is mailing lists. This is quite nice in that in a way, communications become documents. The problem is, they don't tend to be very good documents - they can be incomplete and navigating them can be something of a chore.

So the burning question of this thread is: How do we make sure information isn't lost about a project while minimising the amount of time and effort put towards that goal?

For me, I'd quite like to see something of an enhanced mailing system where communications are analysed through some magic black box. This magic black box would also accept searches, and spit out nicely presented documents constructed from emails containing relevant information. I wouldn't be surprised if somebody had already thought of that and implemented it - so I would be interested to see if anybody has seen something like it.
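To make the search half of that idea concrete, here is a very small sketch of the simplest possible version: a keyword index over messages, plus a function that stitches the hits into one document. The messages and all the names (`MESSAGES`, `build_index`, `search`, `render_document`) are hypothetical, and a real system would need far more than exact word matching.

```python
from collections import defaultdict

# Hypothetical messages standing in for a project mailing list archive.
MESSAGES = [
    {"id": 1, "subject": "Cache design",
     "body": "We chose an LRU cache because memory is tight."},
    {"id": 2, "subject": "Release plan",
     "body": "Beta ships after the cache work is merged."},
    {"id": 3, "subject": "Lunch", "body": "Pizza on Friday."},
]


def build_index(messages):
    """Map each lower-cased word to the ids of the messages containing it."""
    index = defaultdict(set)
    for message in messages:
        text = f"{message['subject']} {message['body']}"
        for word in text.lower().split():
            index[word.strip(".,!?")].add(message["id"])
    return index


def search(index, messages, query):
    """Return the messages that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return []
    ids = set.intersection(*(index.get(word, set()) for word in words))
    return [message for message in messages if message["id"] in ids]


def render_document(title, hits):
    """Stitch the matching messages into one plain-text document."""
    lines = [title, "=" * len(title)]
    for message in hits:
        lines.append(f"[{message['id']}] {message['subject']}: {message['body']}")
    return "\n".join(lines)


index = build_index(MESSAGES)
print(render_document("Notes on: cache", search(index, MESSAGES, "cache")))
```

The hard part of the "magic black box" is everything this leaves out: working out which messages are still current, and turning a pile of replies into something that reads like a document.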

jsmoke · 29 Apr 2014 at 23:10

http://floodyberry.com/carmack/plan.html

D.P. · 30 Apr 2014 at 01:56

In the end nothing beats actual written documentation. Forget the agile spiel: unless you work in a company of deadbeats, it is highly likely that people will chop and change jobs, move departments, get promoted, run away to Thailand for a year, etc.
Even when people stay put, projects can change frequently, meaning ideas and understandings are shelved. Personally I have a hard time remembering design decisions/architectures/algorithms/code from 1-2 years ago, let alone expecting other people to have remembered any communications relating to them.

We just document the code: we both add useful comments and write summary documents. Auto-documentation stuff is fairly useless most of the time. It is often important to add explanations of how the code works so it can be modified/repaired in the future.

Obviously you want code that is as easy as possible to understand, but sometimes code just gets complex, or performance issues mean complexity is inherent. A few words to say what important methods do and the purpose of the main steps helps and doesn't take much time.

You also have to factor in that time spent documenting saves time in the future when debugging or extending. I have never heard of anyone complaining about useful, detailed documentation, but everyone complains when there is no documentation!

Rroff · 30 Apr 2014 at 02:16

You need the right mixture of both: simple, effective documentation that gives a good enough overview, and communication so that the people who need to know, know the details and can pass them on where relevant. As long as there is enough of a framework in the documentation, someone smart enough can figure out the rest (should it come to it and people move on).

D.P. · 30 Apr 2014 at 02:40

Addendum: documentation doesn't replace communication; it is in addition to it.

We always have a short meeting each morning, always email updates, have weekly longer meetings and always support discussion.

Haircut · 30 Apr 2014 at 10:12

D.P. said:
In the end nothing beats actual written documentation. Forget the agile spiel: unless you work in a company of deadbeats, it is highly likely that people will chop and change jobs, move departments, get promoted, run away to Thailand for a year, etc.

I work on a team of 11 contractors, it's highly likely that in a couple of years it will be pretty much a whole different set of people working here.

I'm very much of the mindset that if you need to document what code is doing you probably need to have a look at how you're writing the code.
Code can and should be documented to describe why you're doing it the way you are though.

D.P. · 30 Apr 2014 at 19:00

Haircut said:
I work on a team of 11 contractors, it's highly likely that in a couple of years it will be pretty much a whole different set of people working here.

I'm very much of the mindset that if you need to document what code is doing you probably need to have a look at how you're writing the code.
Code can and should be documented to describe why you're doing it the way you are though.

Complex code is complex. There are no two ways about it. If you design a novel algorithm X to solve unique problem Y then you need to explain how that algorithm X works, not have people trawl through thousands of lines and piece together clues from the code. Some clear documentation will save a lot of time in the future.

Do you know what this code does just by looking at it?

++++++++[>++++[>++>+++>+++>+<<<<-]>+>+>->>+[<]<-]>>.>---.+++++++..+++.>>.<-.<.+++.------.--------.>>+.>++.

It is perfectly valid code written in a special programming language that is fully Turing-complete. A little documentation would go a long way towards explaining what it does and how, in case it goes wrong in the future.
I would much rather work with this code:

Code:

    +++++ +++              //Set Cell #0 to 8
    [
        >++++                //Add 4 to Cell #1; this will always set Cell #1 to 4
        [                          //  as the cell will be cleared by the loop
            >++                //  Add 4*2 to Cell #2
            >+++              //  Add 4*3 to Cell #3
            >+++              //   Add 4*3 to Cell #4
            >+                   //   Add 4 to Cell #5
            <<<<-             //  Decrement the loop counter in Cell #1
        ]                           //  Loop till Cell #1 is zero

        >+                        //  Add 1 to Cell #2
        >+                        //  Add 1 to Cell #3
        >-                         //  Subtract 1 from Cell #4
        >>+                      //  Add 1 to Cell #6
        [<]                        //  Move back to the first zero cell you find; this will
                                     //  be Cell #1 which was cleared by the previous loop
        <-                         //  Decrement the loop Counter in Cell #0
    ]                               // Loop till Cell #0 is zero

     

    //The result of this is:
   // Cell No :   0   1   2   3   4   5   6
   // Contents:   0   0  72 104  88  32   8
   // Pointer :   ^


    >>.                             // Cell #2 has value 72 which is 'H'
    >---.                           // Subtract 3 from Cell #3 to get 101 which is 'e'
    +++++ ++..+++.       // Likewise for 'llo' from Cell #3
    >>.                             // Cell #5 is 32 for the space

    <-.                            // Subtract 1 from Cell #4 for 87 to give a 'W'
    <.                              // Cell #3 was set to 'o' from the end of 'Hello'
    +++.----- -.----- ---.   // Cell #3 for 'rl' and 'd'
    >>+.                         // Add 1 to Cell #5 gives us an exclamation point
    >++.                         // And finally a newline from Cell #6

Now it is much easier for a new programmer who has never seen the code to know what it does and how it does it, and to modify, extend, repair, optimise or replace it.
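For anyone who wants to check what the one-line version actually prints, a minimal interpreter is enough. The sketch below is in Python and makes some assumptions of its own: 8-bit cells that wrap around, a 30,000-cell tape, and no support for the `,` input command, which this program does not use. The names `run_brainfuck` and `HELLO` are just for the example.

```python
def run_brainfuck(program: str, tape_length: int = 30000) -> str:
    """Run a Brainfuck program that takes no input and return its output."""
    # Match every '[' with its ']' up front so jumps are simple lookups.
    jumps = {}
    open_brackets = []
    for position, command in enumerate(program):
        if command == "[":
            open_brackets.append(position)
        elif command == "]":
            start = open_brackets.pop()
            jumps[start] = position
            jumps[position] = start

    tape = [0] * tape_length
    pointer = 0
    position = 0
    output = []
    while position < len(program):
        command = program[position]
        if command == ">":
            pointer += 1
        elif command == "<":
            pointer -= 1
        elif command == "+":
            tape[pointer] = (tape[pointer] + 1) % 256
        elif command == "-":
            tape[pointer] = (tape[pointer] - 1) % 256
        elif command == ".":
            output.append(chr(tape[pointer]))
        elif command == "[" and tape[pointer] == 0:
            position = jumps[position]  # cell is zero: skip the loop body
        elif command == "]" and tape[pointer] != 0:
            position = jumps[position]  # cell is non-zero: repeat the loop
        # Any other character is ignored, as the language requires.
        position += 1
    return "".join(output)


# The undocumented one-liner from earlier in the post, split over two lines.
HELLO = (
    "++++++++[>++++[>++>+++>+++>+<<<<-]>+>+>->>+[<]<-]"
    ">>.>---.+++++++..+++.>>.<-.<.+++.------.--------.>>+.>++."
)

print(run_brainfuck(HELLO), end="")  # prints "Hello World!" and a newline
```

Running it confirms the cell values listed in the commented version: 72, 101, 108 and so on are the character codes for "Hello World!".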

Rroff · 30 Apr 2014 at 19:05

Never seen the point of languages like that - maybe there is a need somewhere for an ultra small compiler but usually only because of people being unnecessarily awkward.

D.P. · 30 Apr 2014 at 19:11

Rroff said:
Never seen the point of languages like that - maybe there is a need somewhere for an ultra small compiler but usually only because of people being unnecessarily awkward.

That code is just for fun but shows the example perfectly well.
I've worked on projects with a fair amount of assembly, especially for embedded systems. One of the projects at the company I work at makes fairly extensive use of Fortran with embedded assembly in the back end.

Here is another example:

Code:

section .data                           
str:     db 'Hello world!', 0Ah        
str_len: equ $ - str                
                                           
 
section .text                         
global _start                        
                                          
_start:                                
	mov	eax, 4                 
	mov	ebx, 1                 
                                          
	mov	ecx, str               
	mov	edx, str_len         
	int	80h                     
	mov	eax, 1                 
	mov	ebx, 0                 
	int	80h

A little bit of documentation goes a long way:

Code:

section .data                           ; section for initialized data
str:     db 'Hello world!', 0Ah         ; message string with new-line char at the end (10 decimal)
str_len: equ $ - str                    ; calcs length of string (bytes) by subtracting str's start address
                                            ; from the current address ($ symbol)
 
section .text                           ; this is the code section
global _start                           ; _start is the entry point and needs global scope to be 'seen' by the 
                                            ; linker -    equivalent to main() in C/C++
_start:                                 ; procedure start
	mov	eax, 4                   ; specify the sys_write function code (from OS vector table)
	mov	ebx, 1                   ; specify file descriptor stdout -in linux, everything's treated as a file, 
                                             ; even hardware devices
	mov	ecx, str                 ; move start _address_ of string message to ecx register
	mov	edx, str_len             ; move length of message (in bytes)
	int	80h                      ; tell kernel to perform the system call we just set up - 
                                             ; in linux services are requested through the kernel
	mov	eax, 1                   ; specify sys_exit function code (from OS vector table)
	mov	ebx, 0                   ; specify return code for OS (0 = everything's fine)
	int	80h                      ; tell kernel to perform system call
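For comparison, the same write-to-stdout request can be made from a higher-level language. This Python sketch uses `os.write`, which passes the bytes to the operating system's write call on a file descriptor. The names `MESSAGE` and `write_message` are made up for the example.

```python
import os

MESSAGE = b"Hello world!\n"  # the same bytes as the str label in the assembly


def write_message(fd: int = 1) -> int:
    """Write MESSAGE to a file descriptor (1 is stdout) and return the byte count."""
    # This is the same service the assembly requests with eax = 4:
    # a descriptor, a buffer and a length, handed to the kernel.
    return os.write(fd, MESSAGE)

# The second system call in the assembly (eax = 1, ebx = 0) corresponds to
# ending the process with status 0, for example with sys.exit(0).
```

Calling `write_message()` prints the greeting and returns 13, the number of bytes written, which is the value the assembly computes as `str_len`.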

tidusjar · 30 Apr 2014 at 19:47

So where I work we use an Agile methodology (SCRUM). Now we are all SCRUM certified blah blah blah. But what people don't seem to understand is that Agile does not mean NO documentation, which is what they think it means!

I completely disagree with this. We also have a lot of contractors, and our developers move around different projects a lot, so the knowledge is spread out.

Because people are aiming for as little documentation as possible (next to none at all), the support teams are really unhappy when projects are handed over to them, because the only way to explain how a new feature works is to show them. As they are support, it is sometimes difficult to take them out for a few hours to explain, from both a user and a technical perspective, how the new functionality works. There are no documents around new services, and there are no workflows to show how the new integration piece works.

This really bugs me. So I make sure everything is well documented, showing where areas could potentially go wrong and how to fix that. Code is well documented; if I write my own factory pattern, for example, I expect myself to comment all of the code!

It's laziness at the end of the day if you don't.

ZombieFan · 30 Apr 2014 at 22:55

Haircut said:
I'm very much of the mindset that if you need to document what code is doing you probably need to have a look at how you're writing the code.
Code can and should be documented to describe why you're doing it the way you are though.

I share the same mindset that code should be self-documenting (without pointless comments). But ours is also backed up by the original user stories produced by the business analysts, which are reflected in unit tests.

As D.P. has pointed out, this doesn't work well for all languages. But I believe if you are working in a language which gives you the flexibility to name methods/functions and variables/properties as you see fit, then you should be naming them in a way which minimises the need for comments.

Dj_Jestar · 1 May 2014 at 09:51

tidusjar said:
So where I work we use an Agile methodology (SCRUM).

*sigh*.. It's "Scrum" not "SCRUM." It's not an acronym.

Also you are right. The Agile Manifesto states that we should favour "Working software over comprehensive documentation." Followed by "That is, while there is value in the items on the right, we value the items on the left more."

As for the general topic:

The definition of "Working software" is software that accurately functions to its desired/designed task and nothing more. If code is really doing only what it is supposed to be doing then documentation is redundant. Any documentation I've written, seen, read, loved, hated, etc. is documenting the process (aka the "business logic" etc.) and very little of it the code (save for quirks in 3rd party APIs, straight up bug compensation or the like.)

Seriously, code IS documentation. Why document twice?!

jsmoke · 1 May 2014 at 11:05

Function/Class names are a form of documentation, and so is good notation/formatting. I just downloaded a game engine, for example, and I have never seen such well presented code, including the folder structure and the names of the folders the source files are put into. The function names tell you exactly what's going on, and what a difference it makes, instead of trawling through and debugging the heck out of everything to try and work out what is what.

Key comments are added when necessary and a brief summary at the top of the source.

Then there are various tools such as Doxygen + Graphviz and plenty more UML type visual source code analysers out there.

It really depends on the scale of the project, the demands of your clients and reusability needed at a later date of the code.

Maybe I have gone a bit off track here though and you're talking more about the collation of ideas during development, which you are ..lol. You're talking about creating documentation from say git source control...

D.P. · 1 May 2014 at 14:15

Dj_Jestar said:
*sigh*.. It's "Scrum" not "SCRUM." It's not an acronym.

Also you are right. The Agile Manifesto states that we should favour "Working software over comprehensive documentation." Followed by "That is, while there is value in the items on the right, we value the items on the left more."

As for the general topic:

The definition of "Working software" is software that accurately functions to its desired/designed task and nothing more. If code is really doing only what it is supposed to be doing then documentation is redundant. Any documentation I've written, seen, read, loved, hated, etc. is documenting the process (aka the "business logic" etc.) and very little of it the code (save for quirks in 3rd party APIs, straight up bug compensation or the like.)

Seriously, code IS documentation. Why document twice?!

Can you tell me what this code is and where the bug is?

All variable names use standard terminology, matching the mathematical variable names from the known physical equations.
Without documentation this would be a nightmare to debug; with the appropriate documentation it is dead easy to debug, maintain and modify.

Code:

float calcWaveSpectrum(float kx, float ky, bool omnispectrum = false)
{
    float U10 = WIND;
    float Omega = OMEGA;


    float k = sqrt(kx * kx + ky * ky);
    float c = omega(k) / k;


    float kp = 9.81 * sqr(Omega / U10); // after Eq 3
    float cp = omega(kp) / kp;


    float z0 = 3.7e-5 * sqr(U10) / 9.81 * pow(U10 / cp, 0.9f); // Eq 66
    float u_star = 0.41 * U10 / log(10.0 / z0); // Eq 60

    float Lpm = exp(- 5.0 / 4.0 * sqr(kp / k)); // after Eq 3
    float gamma = Omega < 1.0 ? 1.7 : 1.7 + 6.0 * log(Omega); // after Eq 3 

    float sigma = 0.08 * (1.0 + 4.0 / pow(Omega, 3.0f)); // after Eq 3
    float Gamma = exp(-1.0 / (2.0 * sqr(sigma)) * sqr(sqrt(k / kp) - 1.0));
    float Jp = pow(gamma, Gamma); // Eq 3
    float Fp = Lpm * Jp * exp(- Omega / sqrt(10.0) * (sqrt(k / kp) - 1.0)); // Eq 32
    float alphap = 0.006 * sqrt(Omega); // Eq 34
    float Bl = 0.5 * alphap * cp / c * Fp; // Eq 31

    float alpham = 0.01 * (u_star < cm ? 1.0 + log(u_star / cm) : 1.0 + 3.0 * log(u_star / cm)); // Eq 44
    float Fm = exp(-0.25 * sqr(k / km - 1.0)); // Eq 41
    float Bh = 0.25 * alpham * cm / c * Fm * Lpm; // Eq 40 

    if (omnispectrum) {
        return A * (Bl + Bh) / (k * sqr(k)); // Eq 30
    }

    float a0 = log(2.0) / 4.0; float ap = 4.0; float am = 0.13 * u_star / cm; // Eq 59
    float Delta = tanh(a0 + ap * pow(c / cp, 2.5f) + am * pow(cm / c, 2.5f)); // Eq 57

    float phi = atan2(ky, kx);

    if (kx < 0.0) {
        return 0.0;
    } else {
        Bl *= 2.0;
        Bh *= 2.0;
    }

    return A * (Bl + Bh) * (1.0 + Delta * cos(2.0 * phi)) / (2.0 * M_PI * sqr(sqr(k))); // Eq 67
}

Why code 5 times when you can simply add some light documentation that saves so much time in the future rather than forcing insane reverse engineering efforts?

I'm not saying code shouldn't be simple, clean, clear and precise, with sensible naming conventions - but that alone does not ensure code that is quickly and effortlessly understood by an outsider. Only documentation can ensure that. Adding documentation to the above function would allow anyone who has never seen the code or been on the project to understand exactly what the function does, what each line is doing, how it all works and how it can be corrected, and to find the bug in very little time.
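As a small illustration of the kind of light documentation meant here, this is how the `kp` line from the function might look with a docstring, sketched in Python. Only the formula and the "after Eq 3" reference come from the code above. The name `peak_wavenumber` and the meanings given to `U10` and `Omega` are assumptions read off the variable names and the `WIND` and `OMEGA` constants.

```python
G = 9.81  # gravitational acceleration in m/s^2 (the literal 9.81 in the original)


def peak_wavenumber(U10: float, Omega: float) -> float:
    """Wavenumber kp, marked 'after Eq 3' in the original comments.

        kp = g * (Omega / U10)**2

    Assumed meanings (not stated in the original code):
      U10   -- wind speed, taken here to be in m/s
      Omega -- dimensionless sea-state parameter
    With those units kp comes out in 1/m.
    """
    return G * (Omega / U10) ** 2


print(peak_wavenumber(10.0, 1.0))  # g * (1 / 10)**2
```

The mathematical names stay exactly as in the equations, so the code can still be checked line by line against the source, while the docstring carries the units and meanings that the names alone cannot.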

Dj_Jestar · 1 May 2014 at 15:53

No, because it is terrible, terrible code with crap variable names and utter lack of separation.

Convert your code to be self-documenting. Voila, need for documentation drastically reduced.

I'm willing to bet that the documentation starts with "k is the blah blah, ky is the blah blah"

PUT THAT IN YOUR CODE DIRECTLY. Why waste the effort to keep two separate things in sync?! Everybody knows documentation can't be trusted until it is established that it is the latest and relevant to the code. You can remove this uncertainty entirely. Easily. So why don't you do it? To save a few poxy keystrokes?
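A minimal sketch of what that could look like for the first calculation in the function, written in Python. It assumes that `kx` and `ky` are the two components of a wave vector and that `k` is its length, which the original code does not actually state. The longer names are invented for the example.

```python
import math


def wavenumber_magnitude(wavenumber_x: float, wavenumber_y: float) -> float:
    """Length of the 2D wave vector; replaces `k = sqrt(kx * kx + ky * ky)`."""
    return math.hypot(wavenumber_x, wavenumber_y)


print(wavenumber_magnitude(3.0, 4.0))  # 5.0
```

The "k is the blah blah" sentence now lives in the function name, so there is no separate document to fall out of sync with it.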

D.P. · 1 May 2014 at 16:55

No, the variable names are accurate and easy to understand with the supporting documentation & comments. Changing them would only make the code harder to read and convoluted. The names refer to physical constants and standard mathematical values and variables.

But I think we need to understand what we both mean by documentation.
My definition of documentation mainly refers to adding comments to the code to explain what the variables are and what the steps are doing. But it is also useful to add external documentation, e.g. in this case it is much easier to read the mathematical equations in a LaTeX document.

My understanding is when people say the code is self documenting they mean comments are not required because the variable/method names are so descriptive and the code design so simple that comments add nothing. That is simply not true a lot of the time.

Going back to my earlier examples higher up, if the code looked like this:

Code:

def printHelloWorld():
    print("Hello World")

Then a comment would be superfluous and the code is self-documenting. My point is that, depending on the language and the code complexity, documentation is invaluable.

Dj_Jestar · 1 May 2014 at 17:15

A little extraction would make your example much more readable. If they are well defined names, that's only one part of it.

Simply put, my only disagreement with you is over that code: it is not even remotely readable, nor well factored.

Rroff · 1 May 2014 at 17:24

I really hate stuff like from that example:

"float gamma = Omega < 1.0 ? 1.7 : 1.7 + 6.0 * log(Omega); // after Eq 3"

I always write stuff like that out long hand with a proper if statement - way too easy to screw up without meaning to otherwise and makes it a lot harder for anyone else working with the code to understand it at a glance.

Aside from certain situations, I also always declare my variables before assigning values to them, especially with pointers; it makes it much easier for me to go back and pick up on what's what within a function. Once compiled it doesn't make any difference anyway in most cases (aside from some dodgy memory alignment stuff).

EDIT: In fact that's pretty much what I'd do if I was debugging that function - start with pulling out every variable used and declaring them with a comment as to what I thought their purpose was within the function and go from there.
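To show the two styles side by side, here is the gamma line written both ways in Python, whose conditional expression plays the role of the C-style ternary. This is only a sketch of the readability trade-off and the function names are made up. `math.log` is the natural logarithm, the same as `log` in the original.

```python
import math


def gamma_conditional(Omega: float) -> float:
    """One-line form, mirroring the ternary in the original."""
    return 1.7 if Omega < 1.0 else 1.7 + 6.0 * math.log(Omega)


def gamma_longhand(Omega: float) -> float:
    """The same rule written out with a full if statement."""
    if Omega < 1.0:
        gamma = 1.7
    else:
        gamma = 1.7 + 6.0 * math.log(Omega)
    return gamma


# Both forms agree on either side of the Omega = 1.0 threshold.
for value in (0.5, 1.0, 2.0):
    assert gamma_conditional(value) == gamma_longhand(value)
```

Both versions compute exactly the same thing; the choice is about which one the next reader will parse correctly at a glance.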

D.P. · 2 May 2014 at 02:43

Rroff said:
I really hate stuff like from that example:

"float gamma = Omega < 1.0 ? 1.7 : 1.7 + 6.0 * log(Omega); // after Eq 3"

I always write stuff like that out long hand with a proper if statement - way too easy to screw up without meaning to otherwise and makes it a lot harder for anyone else working with the code to understand it at a glance.

Aside from certain situations, I also always declare my variables before assigning values to them, especially with pointers; it makes it much easier for me to go back and pick up on what's what within a function. Once compiled it doesn't make any difference anyway in most cases (aside from some dodgy memory alignment stuff).

EDIT: In fact that's pretty much what I'd do if I was debugging that function - start with pulling out every variable used and declaring them with a comment as to what I thought their purpose was within the function and go from there.

The thing is, the ternary operator can sometimes be faster than an if/else block: the x86 instruction set includes a conditional move instruction (CMOV), and some compilers are more likely to emit it for a ternary expression than for an if block, although a modern optimising compiler will often generate the same code for both. Plus it can make for concise, cleaner code by moving something simple onto one line, which is useful for inline header methods. It is not widely used, but if you and the team use it frequently then it is very quick to parse. It is just like using x++ rather than x = x + 1: it is not as immediately verbose, but if you are used to it, it is dead easy. ++ can also be faster with a non-optimising compiler, because there is a specific increment instruction rather than an addition and the compiler might not convert the + 1 into an increment (it usually will, but without looking at the assembly you will never be sure).
Now, the ternary operator can easily be abused into something hideous, and it can make testing harder. This example case is borderline to me; quite possibly I would have assigned 1.7 to gamma at declaration and then had a single if statement with the addition (then again, in this example log(Omega) is a constant, so a macro define would be better....)

Declaring variables first can also be a bad idea for performance reasons, particularly for types with non-trivial constructors. Best to do assignment together with declaration and keep variables as local as possible to the scope where they are used (and that means freely defining variables inside loops, contrary to a lot of old beliefs). Compilers will have a much easier time with a more local reference, and you also don't want to be fighting over register space. Defining variables at the top of functions is very much a C-style hangover, because it used to be required. It doesn't necessarily make code any cleaner, because you can't remember a large list of variable names out of context, but a few variables declared within the local context can be remembered.

FYI, it is not actually my code but something I was trying to implement and I wished the documentation was better.

Rroff · 2 May 2014 at 03:28

D.P. said:
Declaring variables first can also be a bad idea for performance reasons, particularly for types with non-trivial constructors. Best to do assignment together with declaration and keep variables as local as possible to the scope where they are used (and that means freely defining variables inside loops, contrary to a lot of old beliefs). Compilers will have a much easier time with a more local reference, and you also don't want to be fighting over register space. Defining variables at the top of functions is very much a C-style hangover, because it used to be required. It doesn't necessarily make code any cleaner, because you can't remember a large list of variable names out of context, but a few variables declared within the local context can be remembered.

FYI, it is not actually my code but something I was trying to implement and I wished the documentation was better.

Hmmm didn't realise it was as big a difference - spent a lot of time coding in C (pretty much everything I know about C and by extension C++ was learnt from the Quake 2 source code) and/or languages where it makes no difference performance wise but just pulled up something I was working on in C++ and added a few 000s to a loop so as to be able to time it and it was a 15% difference in performance :S - personally I find it a lot more readable however when referring back to something I coded a long time ago :S gonna have to keep that in mind.

(This is why I generally avoid C/C++)

PS I assumed it wasn't your code and just an example.