Writing Binary

What's the easiest way to write binary data to a file? I want to write actual binary, as in a single bit, rather than writing a 0 or 1 that actually takes up 8 bits, which is what I've done in previous programs that involve reading/writing to files. I need it to be in C, C++ or preferably Java.

Thanks for any advice.
 
It is not usually possible to write individual bits to a file.

Instead, what you need to do is compose the individual bits you wish to write into a single byte and then write out a byte at a time. Use C's bitwise operators to do the composition, and fwrite or fputc to write the byte, remembering to open the file in binary mode with fopen ("wb" as the second parameter).
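Something like this, roughly (just a sketch; the BitWriter name and helper functions are made up for illustration): collect bits in a buffer, write a full byte with fputc, and pad the last byte with zeroes when you're done.
Code:
#include <stdio.h>

/* Illustrative only: accumulate bits in a one-byte buffer and flush
   a whole byte to the file once 8 bits have been collected. */
typedef struct {
    FILE *fp;
    unsigned char buffer;  /* bits collected so far           */
    int count;             /* how many bits are in the buffer */
} BitWriter;

void write_bit(BitWriter *bw, int bit)
{
    bw->buffer = (bw->buffer << 1) | (bit & 1);  /* push the new bit in at the bottom */
    if (++bw->count == 8) {                      /* buffer full: write one byte       */
        fputc(bw->buffer, bw->fp);
        bw->buffer = 0;
        bw->count = 0;
    }
}

void flush_bits(BitWriter *bw)
{
    /* Pad any leftover bits with zeroes so the final partial byte gets written. */
    while (bw->count != 0)
        write_bit(bw, 0);
}

int main(void)
{
    BitWriter bw = { fopen("out.bin", "wb"), 0, 0 };  /* note "wb" for binary mode */
    if (!bw.fp) return 1;
    write_bit(&bw, 1);  /* write the bits 1, 0, 1 ... */
    write_bit(&bw, 0);
    write_bit(&bw, 1);
    flush_bits(&bw);
    fclose(bw.fp);
    return 0;
}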
 
That's the problem really. I'm doing this for image compression, so I need to write as little data as possible to keep the file size small. It's kind of difficult to compress though if you can't write single bits. So for example, say I've got 255 symbols, and symbol 25 appears a thousand times but symbol 67 only appears once; it would make sense to assign the value 0 to symbol 25 and 11111111 to symbol 67. If I have to write the whole byte, I'll be writing 00000000 and 11111111 instead, which doesn't save any space.

I've managed to do run-length coding which works, but most other methods involve using binary output so now I'm kinda stuck. :(
 
As Mr Jack says, I/O is very rarely done on a bit-by-bit basis and you should generally treat bytes as atomic in an I/O context; any bit-level work should be done on an algorithmic level. If you really need to write individual bits, then use bitwise operations to build a byte buffer representing the same data, then write that byte to file.

Also, that method of compression won't work because if each symbol is represented by a number between 0 and 254 (don't you mean 256 symbols?), then each one will need to be represented by a sequence of 8 bits. You can't just swap 00000000 for 0, because when you're decompressing, how will you know that the next 7 bits aren't part of the current symbol's bit sequence?

For example, say you have 4 symbols:
  • A = 00
  • B = 01
  • C = 10
  • D = 11

You write ADAD to file, treating A as 0 rather than 00, and get the following:
Code:
011011

The problem with this is that it's exactly the same output as you'd get if you wrote BCD to file; how do you know which one it is?
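To make that concrete, here's a throwaway snippet (mine, purely illustrative) that packs both streams into an integer; they come out identical:
Code:
#include <stdio.h>

int main(void)
{
    /* "ADAD" with A truncated to 0 and D = 11: 0,11,0,11 -> 011011 */
    unsigned adad = (0u << 5) | (3u << 3) | (0u << 2) | 3u;
    /* "BCD" with the full 2-bit codes: 01,10,11 -> 011011 */
    unsigned bcd  = (1u << 4) | (2u << 2) | 3u;
    printf("%u %u\n", adad, bcd);  /* both print 27: the bit streams are identical */
    return 0;
}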
 
Thanks Inquisitor. Yes, I did mean 256 symbols, I just quickly made something up! The actual coding method I'm trying to use at the minute is Huffman encoding. This site explains it in a similar way to my lecture notes: http://www.si.umich.edu/Classes/540/Readings/Encoding - Huffman Coding.htm

See from their example they've got the following codes:
A - 100
E - 0
I - 1011
O - 11
U - 1010

E has the shortest code because it's the most frequent, and I/U have the longest. So if you're saying you have to use a fixed number of bits per symbol (4 in this case) to be able to distinguish between the symbols when reading them back in, how can Huffman coding work? (Sorry if I'm being a complete donut and this is really simple :o)
 
Ah, I was under the impression you were just truncating the leading zero-bits from the symbols' binary representations rather than using a formal encoding mechanism. Huffman encoding works by ensuring that no bit sequence can have more than a single corresponding symbol sequence, so there's no possibility of ambiguity. Carry on :p
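For what it's worth, here's a rough sketch of why a prefix-free table decodes unambiguously, using the five codes from that page (the input string and the string-matching loop are just for illustration; a real decoder would normally walk the Huffman tree bit by bit). A symbol is emitted as soon as the bits read so far form a complete code, and because no code is a prefix of another, those bits can never be the start of a longer code.
Code:
#include <stdio.h>
#include <string.h>

int main(void)
{
    /* The five codes from the linked example page. */
    const char *codes[]   = { "100", "0", "1011", "11", "1010" };
    const char  symbols[] = { 'A', 'E', 'I', 'O', 'U' };

    const char *bits = "10001011111010";  /* "AEIOU" encoded with the codes above */
    char current[8] = "";
    size_t len = 0;

    for (const char *p = bits; *p; ++p) {
        current[len++] = *p;              /* read one more bit */
        current[len]   = '\0';
        for (int i = 0; i < 5; ++i) {
            if (strcmp(current, codes[i]) == 0) {  /* bits so far are a complete code */
                putchar(symbols[i]);               /* emit the symbol and start over  */
                len = 0;
                current[0] = '\0';
                break;
            }
        }
    }
    putchar('\n');  /* prints AEIOU */
    return 0;
}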
 