Binary vs ASCII Files

July 17th, 2005 admin Posted in Tidbits

How are they generated?

How a file was written determines how it must be read. If a file is written using fprintf, the data is stored as a sequence of ASCII characters, which produces an ASCII (text) file. If the file was written using fwrite, the data is stored as the raw bytes of whatever object fwrite was given. For example, if fwrite is writing an int (typically a 32-bit quantity), those 32 bits are copied to the file as is. This produces a binary file.

Example

If you write 12345678 to a file using fprintf, the file will actually contain the ASCII codes for the characters '1', '2', '3' and so on, a total of 8 bytes. If you write 12345678 to a file using fwrite and the pointer to the source is an int*, then the 4-byte (32-bit) binary representation of 12345678 is written directly to the file, occupying 4 bytes.
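To make the size difference concrete, here is a minimal sketch in C. The helper names (write_as_text, write_as_binary, file_size) and the file names in the usage note are made up for illustration, and the binary size assumes whatever sizeof(int) is on your platform (4 on most current machines).

```c
#include <assert.h>
#include <stdio.h>

/* Write value as decimal ASCII digits (what fprintf does). Returns 0 on success. */
int write_as_text(const char *path, int value) {
    assert(path != NULL);
    FILE *f = fopen(path, "w");
    if (f == NULL) return -1;
    int ok = fprintf(f, "%d", value) > 0;
    return (fclose(f) == 0 && ok) ? 0 : -1;
}

/* Write the raw bytes of the int (what fwrite does). Returns 0 on success. */
int write_as_binary(const char *path, int value) {
    assert(path != NULL);
    FILE *f = fopen(path, "wb");
    if (f == NULL) return -1;
    int ok = fwrite(&value, sizeof value, 1, f) == 1;
    return (fclose(f) == 0 && ok) ? 0 : -1;
}

/* Size of a file in bytes, or -1 if it cannot be opened or measured. */
long file_size(const char *path) {
    FILE *f = fopen(path, "rb");
    if (f == NULL) return -1;
    long size = (fseek(f, 0, SEEK_END) == 0) ? ftell(f) : -1;
    fclose(f);
    return size;
}
```

Calling write_as_text("value.txt", 12345678) should leave an 8-byte file holding the characters "12345678", while write_as_binary("value.bin", 12345678) should leave a file of sizeof(int) bytes. On a little-endian machine with a 4-byte int, those bytes are 4E 61 BC 00, since 12345678 is 0x00BC614E.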

How does it matter?

ASCII, being a byte-by-byte storage format, is not affected by big-endian/little-endian issues. Endianness governs the order of bytes within a multi-byte value, not the order of bits within a byte, so a problem arises only when a datum stored as a single entity spans more than one byte. That is exactly what happens when a 32-bit quantity is stored as is (in binary). Therefore, when reading an int from a binary file using fread, one must swap the bytes if the machine reading the file uses a different endianness than the machine that wrote it. I faced this problem recently: a trace file feeding one of our performance models was stored in big-endian binary format. When the model ran on a little-endian machine, I had to read one instruction (32 bits) at a time and flip its bytes before I could use the trace.
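The byte flip can be sketched roughly as follows. This is not the code from the performance model, only an illustration; swap32, host_is_little_endian and read_be32 are names invented here, and the sketch assumes the file holds a plain sequence of 32-bit big-endian words.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Reverse the byte order of a 32-bit value. */
uint32_t swap32(uint32_t v) {
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}

/* Returns 1 if this machine stores the least significant byte first. */
int host_is_little_endian(void) {
    uint32_t probe = 1;
    return *(const unsigned char *)&probe == 1;
}

/* Read one 32-bit word that was stored big-endian in the file.
   Returns 1 on success, 0 on end of file or read error. */
int read_be32(FILE *f, uint32_t *out) {
    uint32_t raw;
    assert(f != NULL && out != NULL);
    if (fread(&raw, sizeof raw, 1, f) != 1) return 0;
    /* Swap only when the host byte order differs from the file's. */
    *out = host_is_little_endian() ? swap32(raw) : raw;
    return 1;
}
```

Calling read_be32 in a loop until it returns 0 yields the words in host order on either kind of machine. Checking the host's byte order at run time, instead of swapping unconditionally, keeps the same reader correct if the model is later run on a big-endian machine.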
