C File I/O & Binary Streams: Complete Guide to fopen, fread, fwrite and Serialization

TopicTrick Team

Files as Byte Streams: The Buffered I/O Pipeline

In C, a file is accessed through a FILE* — a handle to a buffered stream. When you call fwrite, your data does not go directly to disk. It goes through a three-layer pipeline:

Your program (fwrite) → C library buffer (user space) → OS page cache (kernel) → disk

Why buffering? Writing to disk one byte at a time would be catastrophically slow, because each write would require a system call. Buffering accumulates writes in RAM, then hands them to the OS in large, efficient chunks (typically 4 to 8 KB, often chosen to match the filesystem's block size).

The implications:

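  • Data written with fwrite stays inside the process until the internal buffer is flushed (by fflush, fclose, or the buffer filling up).
  • If the program crashes before flushing, unflushed data is permanently lost.
  • For critical data (transactions, user data), call fflush(file) after writing, and follow it with fsync(fileno(file)) when the data must also survive an OS crash or power loss. fsync alone is not enough, because it cannot see data still held in the C library buffer.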

Opening and Closing Streams: fopen Modes

fopen(filename, mode) returns a FILE* on success or NULL on failure:

| Mode | Meaning |
|------|---------|
| "r" | Read text; file must exist |
| "w" | Write text; creates/truncates |
| "a" | Append text; creates if missing |
| "r+" | Read+write text; file must exist |
| "w+" | Read+write; creates/truncates |
| "rb" | Read binary |
| "wb" | Write binary |
| "ab" | Append binary |
| "rb+" | Read+write binary |
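A minimal sketch of how the modes in the table behave in practice. The helper names open_or_report and write_then_append are hypothetical, chosen only for this illustration:

```c
#include <stdio.h>

/* Hypothetical helper: open a file and report a failure via perror. */
FILE *open_or_report(const char *path, const char *mode)
{
    FILE *fp = fopen(path, mode);
    if (fp == NULL) {
        perror(path);   /* prints the path followed by the system error text */
    }
    return fp;
}

/* Hypothetical helper: create (or truncate) a file with "w", then add a
   second line with "a". Returns 0 on success, -1 on failure. */
int write_then_append(const char *path)
{
    FILE *fp = open_or_report(path, "w");   /* creates or truncates */
    if (fp == NULL) return -1;
    if (fputs("first line\n", fp) == EOF) { fclose(fp); return -1; }
    if (fclose(fp) != 0) return -1;         /* fclose flushes, so it can fail */

    fp = open_or_report(path, "a");         /* every write lands at the end */
    if (fp == NULL) return -1;
    if (fputs("second line\n", fp) == EOF) { fclose(fp); return -1; }
    return fclose(fp) == 0 ? 0 : -1;
}
```

Calling write_then_append twice on the same path still leaves exactly two lines, because the "w" open truncates the file each time.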

Text File I/O: fprintf, fscanf, fgets, fputs

Text mode provides formatted I/O:

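As an illustration of formatted text I/O, the sketch below writes "name score" lines with fprintf and reads them back with fgets. It parses each line with sscanf rather than calling fscanf on the stream, which is one common style and not the only valid one. The function names and the sample data are made up for the example:

```c
#include <stdio.h>

/* Write two "name score" lines with fprintf. Returns 0 on success. */
int save_scores(const char *path)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL) return -1;
    fprintf(fp, "%s %d\n", "alice", 90);
    fprintf(fp, "%s %d\n", "bob", 72);
    return fclose(fp) == 0 ? 0 : -1;
}

/* Read the file line by line with fgets, parse each line with sscanf,
   and return the sum of the scores (-1 if the file cannot be opened). */
int sum_scores(const char *path)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL) return -1;

    char line[128];
    char name[64];
    int score;
    int total = 0;
    while (fgets(line, sizeof line, fp) != NULL) {
        /* %63s limits the write into name[64], leaving room for '\0' */
        if (sscanf(line, "%63s %d", name, &score) == 2) {
            total += score;
        }
    }
    fclose(fp);
    return total;
}
```

Reading whole lines first keeps a malformed line from derailing the parse of the lines after it, which is harder to guarantee when fscanf consumes the stream directly.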

[!WARNING] fgets vs gets: Never use gets(). It was removed from the standard in C11 because it takes no buffer size, so any input line longer than the buffer overflows it. Always use fgets(buffer, sizeof(buffer), fp), with buffer declared as an array so that sizeof gives its full size.


Binary File I/O: fread and fwrite

Binary mode writes raw bytes — no character translation, no formatting:

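A minimal sketch of a binary round trip for an array of int, with hypothetical helper names write_ints and read_ints. The file it produces is only meant to be read back on the same kind of machine:

```c
#include <stddef.h>
#include <stdio.h>

/* Write `count` ints as raw bytes. Returns 0 on success, -1 on failure. */
int write_ints(const char *path, const int *values, size_t count)
{
    FILE *fp = fopen(path, "wb");
    if (fp == NULL) return -1;
    size_t written = fwrite(values, sizeof values[0], count, fp);
    int close_failed = (fclose(fp) != 0);
    return (written == count && !close_failed) ? 0 : -1;
}

/* Read up to `max_count` ints. Returns how many were actually read. */
size_t read_ints(const char *path, int *out, size_t max_count)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL) return 0;
    size_t got = fread(out, sizeof out[0], max_count, fp);
    fclose(fp);
    return got;   /* may be less than max_count at end-of-file */
}
```

Both functions compare or return element counts, not byte counts, which matches how fread and fwrite report their results.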

fwrite(ptr, size, count, stream) semantics:

  • ptr: Pointer to the data to write.
  • size: Size of each element in bytes.
  • count: Number of elements to write.
  • Returns: Number of elements successfully written (check against count for errors).

Random Access: fseek, ftell and rewind

By default, file I/O is sequential. fseek moves the stream position indicator to any byte offset:

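The sketch below shows two typical uses of fseek and ftell: measuring a file and jumping straight to the n-th fixed-size record. The helper names are hypothetical. It also assumes a platform where seeking to SEEK_END on a binary stream works (true on POSIX and Windows, though ISO C does not strictly require it):

```c
#include <stdio.h>

/* Size of an open binary stream in bytes, or -1 on error.
   Restores the original position before returning. */
long stream_size(FILE *fp)
{
    long saved = ftell(fp);
    if (saved < 0) return -1;
    if (fseek(fp, 0L, SEEK_END) != 0) return -1;
    long size = ftell(fp);
    if (fseek(fp, saved, SEEK_SET) != 0) return -1;
    return size;
}

/* Read the int at zero-based position `index` from a file of raw ints.
   Returns 0 on success, -1 if the seek or the read fails. */
int read_int_at(FILE *fp, long index, int *out)
{
    if (fseek(fp, index * (long)sizeof(int), SEEK_SET) != 0) return -1;
    return fread(out, sizeof *out, 1, fp) == 1 ? 0 : -1;
}
```

Seeking past the end of the file usually succeeds; it is the following fread that reports the problem by returning fewer elements than requested.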

fseek whence values:

  • SEEK_SET: Offset from beginning of file.
  • SEEK_CUR: Offset from current position.
  • SEEK_END: Offset from end of file (use a negative offset to land before the end).

Flushing and Sync: fflush and fsync

fflush(file) forces the C library buffer to flush to the OS page cache. But the OS may still hold the data in RAM before writing to disk. For true durability:

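A minimal sketch of a durable append, assuming a POSIX system (fsync and fileno are not part of ISO C). The function name append_durably is hypothetical:

```c
#define _POSIX_C_SOURCE 200809L   /* expose fileno and fsync */
#include <stdio.h>
#include <unistd.h>               /* fsync (POSIX) */

/* Append one line and push it all the way to the storage device.
   Returns 0 on success, -1 if any step fails. */
int append_durably(const char *path, const char *line)
{
    FILE *fp = fopen(path, "a");
    if (fp == NULL) return -1;

    int ok = (fputs(line, fp) != EOF)
          && (fflush(fp) == 0)            /* C library buffer -> OS page cache */
          && (fsync(fileno(fp)) == 0);    /* OS page cache -> storage device  */

    if (fclose(fp) != 0) ok = 0;
    return ok ? 0 : -1;
}
```

The order matters: fflush has to run first, because fsync only sees data that has already reached the kernel.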

Use fsync for: transaction logs, database write-ahead logs, configuration files, anything that must survive a system crash.


Binary Serialization: Saving and Loading Structs

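Writing a struct directly with fwrite is the simplest and usually the fastest way to serialize in C. The trade-off is that the file layout depends on the compiler's padding, the field sizes, and the machine's byte order, and the technique only works for structs that contain no pointers: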

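A minimal sketch of saving and loading one struct. PlayerRecord and both function names are hypothetical. The record uses fixed-width integer types and an inline char array so that its size does not vary with the platform's int, but the file is still tied to the byte order of the machine that wrote it:

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical record: fixed-size fields only, no pointers. */
typedef struct {
    uint32_t id;
    int32_t  score;
    char     name[24];
} PlayerRecord;

/* Write one record as raw bytes. Returns 0 on success, -1 on failure. */
int save_record(const char *path, const PlayerRecord *rec)
{
    FILE *fp = fopen(path, "wb");
    if (fp == NULL) return -1;
    size_t n = fwrite(rec, sizeof *rec, 1, fp);
    if (fclose(fp) != 0) return -1;
    return n == 1 ? 0 : -1;
}

/* Read one record back. Returns 0 on success, -1 on failure
   (including a file that is too short to hold a full record). */
int load_record(const char *path, PlayerRecord *out)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL) return -1;
    size_t n = fread(out, sizeof *out, 1, fp);
    fclose(fp);
    return n == 1 ? 0 : -1;
}
```

A struct holding a char * would save only the pointer value, not the string it points to, which is why the name is stored inline here.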

Endianness in Binary Files: Cross-Platform Portability

When writing multi-byte integers to binary files, endianness becomes critical for cross-platform compatibility. A file written on a little-endian x86-64 machine and read on a big-endian SPARC will have byte-reversed integers:

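One way to sidestep the problem is to pick a byte order for the file format and encode it explicitly with shifts, so the result is the same on every host. The sketch below arbitrarily chooses little-endian; the helper names are hypothetical:

```c
#include <stdint.h>
#include <stdio.h>

/* Store `value` as 4 bytes, least significant byte first. */
void encode_u32_le(uint32_t value, unsigned char out[4])
{
    out[0] = (unsigned char)(value & 0xFFu);
    out[1] = (unsigned char)((value >> 8) & 0xFFu);
    out[2] = (unsigned char)((value >> 16) & 0xFFu);
    out[3] = (unsigned char)((value >> 24) & 0xFFu);
}

/* Rebuild the value from 4 little-endian bytes. */
uint32_t decode_u32_le(const unsigned char in[4])
{
    return (uint32_t)in[0]
         | ((uint32_t)in[1] << 8)
         | ((uint32_t)in[2] << 16)
         | ((uint32_t)in[3] << 24);
}

/* Write one value in the file's byte order. Returns 0 or -1. */
int write_u32_le(FILE *fp, uint32_t value)
{
    unsigned char bytes[4];
    encode_u32_le(value, bytes);
    return fwrite(bytes, 1, sizeof bytes, fp) == sizeof bytes ? 0 : -1;
}

/* Read one value in the file's byte order. Returns 0 or -1. */
int read_u32_le(FILE *fp, uint32_t *out)
{
    unsigned char bytes[4];
    if (fread(bytes, 1, sizeof bytes, fp) != sizeof bytes) return -1;
    *out = decode_u32_le(bytes);
    return 0;
}
```

Because the shifts operate on values rather than on memory layout, the same code is correct on both little-endian and big-endian hosts, with no byte-swap conditionals.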

For modern, professional serialization, consider:

  • Protocol Buffers / FlatBuffers: Cross-platform, version-tolerant, handles endianness.
  • MessagePack: Binary JSON alternative with explicit type encoding.
  • Custom tagged binary format: Define your own TLV (Type-Length-Value) encoding.

Memory-Mapped Files with mmap

For large files or random access patterns, mmap maps a file directly into the process's virtual address space — you access it like an array, and the kernel handles the actual disk I/O:

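A minimal sketch of read-only mapping, assuming a POSIX system. The function count_newlines_mmap is a hypothetical example that scans the mapped bytes like an ordinary array:

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>      /* open */
#include <sys/mman.h>   /* mmap, munmap */
#include <sys/stat.h>   /* fstat */
#include <unistd.h>     /* close */

/* Count '\n' bytes in a file by mapping it read-only.
   Returns the count, or -1 on error. */
long count_newlines_mmap(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;

    struct stat st;
    if (fstat(fd, &st) != 0) { close(fd); return -1; }
    if (st.st_size == 0)     { close(fd); return 0; }  /* mmap rejects length 0 */

    const char *data = mmap(NULL, (size_t)st.st_size, PROT_READ,
                            MAP_PRIVATE, fd, 0);
    close(fd);   /* the mapping stays valid after the descriptor is closed */
    if (data == MAP_FAILED) return -1;

    long count = 0;
    for (off_t i = 0; i < st.st_size; i++) {
        if (data[i] == '\n') count++;   /* pages are loaded on first touch */
    }

    munmap((void *)data, (size_t)st.st_size);
    return count;
}
```

No read call and no intermediate buffer appear anywhere in the function; the kernel pages the file in as the loop touches it.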

mmap provides zero-copy access, demand paging (only the pages you actually touch are read), and the ability to use pointer arithmetic across the entire file as if it were an in-memory array. It is used extensively in database engines (SQLite WAL), linkers (reading large object files), and log processing.


Error Handling in File I/O

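A minimal sketch of defensive reading: report why fopen failed via errno, then, once fread returns 0, use ferror to tell a real error apart from a normal end-of-file. The function name count_bytes is hypothetical:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Count the bytes in a file. Returns the count, or -1 on any error. */
long count_bytes(const char *path)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL) {
        fprintf(stderr, "cannot open %s: %s\n", path, strerror(errno));
        return -1;
    }

    long total = 0;
    unsigned char buf[4096];
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, fp)) > 0) {
        total += (long)n;
    }

    /* fread returned 0: decide whether that was an error or end-of-file. */
    if (ferror(fp)) {
        fprintf(stderr, "read error on %s\n", path);
        fclose(fp);
        return -1;
    }

    /* No error, so the loop stopped because end-of-file was reached. */
    fclose(fp);
    return total;
}
```

The loop is driven by fread's return value, not by feof, because feof only becomes true after a read has already hit the end.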

Key error-checking functions:

  • ferror(f): Non-zero if a read/write error occurred.
  • feof(f): Non-zero if end-of-file was reached.
  • clearerr(f): Clears both error and EOF flags.

Frequently Asked Questions

Why does my file lose data when my program crashes? Buffered I/O: data written with fwrite/fprintf sits in the C library's buffer until it's full or explicitly flushed. If the program crashes before flushing, that unwritten data is lost. Solutions: call fflush(f) after critical writes, open files in unbuffered mode (setvbuf(f, NULL, _IONBF, 0)), or use fsync() for disk durability.

What is the difference between fread and read? fread is part of the C standard library: it uses buffered I/O and works on FILE* handles. read is a POSIX system call: it bypasses the C library buffer and operates directly on file descriptors (integers). fread is more portable and is typically faster for many small sequential reads, because it batches them into fewer system calls; read gives more control for non-blocking I/O and polling.

Can I use fseek/ftell with large files > 2 GB? The standard ftell returns long, which is 32-bit on some platforms (max 2 GB). For large files, use fseeko/ftello on POSIX systems, which take and return off_t, or _fseeki64/_ftelli64 on Windows. On 32-bit POSIX systems, define _FILE_OFFSET_BITS=64 before including any system header to make off_t 64-bit; fseek and ftell themselves keep using long.
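A minimal sketch of the POSIX variant, assuming fseeko and ftello are available. The function name stream_size_large is hypothetical, and the _FILE_OFFSET_BITS define only has an effect when it appears before the first system header:

```c
#define _FILE_OFFSET_BITS 64      /* 64-bit off_t on 32-bit POSIX systems */
#define _POSIX_C_SOURCE 200809L   /* expose fseeko and ftello */
#include <stdio.h>
#include <sys/types.h>            /* off_t */

/* Size of an open stream as an off_t, or -1 on error.
   Restores the original position before returning. */
off_t stream_size_large(FILE *fp)
{
    off_t saved = ftello(fp);
    if (saved < 0) return -1;
    if (fseeko(fp, 0, SEEK_END) != 0) return -1;
    off_t size = ftello(fp);
    if (fseeko(fp, saved, SEEK_SET) != 0) return -1;
    return size;
}
```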

When is mmap better than fread? mmap wins for: random access patterns (no seek overhead), large files where you don't need every byte (only touched pages load), and zero-copy processing (data is accessed directly without a read buffer). fread wins for: sequential streaming of medium-sized files, portable code (mmap is POSIX-only), and simplicity.


Key Takeaway

File I/O in C is state that persists. Mastering the buffered stream model (understanding when data is flushed, how to seek randomly through binary files, and how to serialize structs directly to disk) gives you the tools to build databases, configuration managers, cache engines, and logging systems.

The combination of fread/fwrite for sequential binary I/O and mmap for random access to large files covers virtually every file I/O pattern you'll encounter in real-world systems programming.


Part of the C Mastery Course — 30 modules from C basics to production-grade systems engineering.