C File I/O & Binary Streams: Complete Guide to fopen, fread, fwrite and Serialization

Table of Contents
- Files as Byte Streams: The Buffered I/O Pipeline
- Opening and Closing Streams: fopen Modes
- Text File I/O: fprintf, fscanf, fgets, fputs
- Binary File I/O: fread and fwrite
- Random Access: fseek, ftell and rewind
- Flushing and Sync: fflush and fsync
- Binary Serialization: Saving and Loading Structs
- Endianness in Binary Files: Cross-Platform Portability
- Memory-Mapped Files with mmap
- Error Handling in File I/O
- Frequently Asked Questions
- Key Takeaway
Files as Byte Streams: The Buffered I/O Pipeline
In C, a file is accessed through a FILE*, a handle to a buffered stream. When you call fwrite, your data does not go directly to disk. It passes through a three-layer pipeline: the C library's user-space stream buffer, then the operating system's page cache, and finally the storage device.
Why buffering? Writing to disk one byte at a time would be catastrophically slow, because each write would require a system call. Buffering accumulates writes in RAM, then hands them to the OS in large, efficient chunks (typically 4-8 KB, often matching the filesystem block size).
The implications:
- Data written with fwrite is not on disk until the internal buffer is flushed (by fflush, by fclose, or when the buffer fills).
- If the program crashes before flushing, unflushed data is permanently lost.
- For critical data (transactions, user data), call fflush(file) after writing so the data reaches the OS, and follow it with fsync(fileno(file)) when it must also survive a system crash or power loss.
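The effect of the stdio buffer can be observed directly. The sketch below is a minimal illustration, assuming a POSIX system (it uses stat) and a typical C library where a small write to a fully buffered stream stays in the buffer until flushed; the function names buffering_demo and size_seen_by_os are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/stat.h>   /* POSIX: stat() */

/* Returns the file size as the OS currently sees it, or -1 on error. */
static long size_seen_by_os(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 ? (long)st.st_size : -1L;
}

/* Writes 5 bytes and reports the OS-visible size before and after fflush.
 * Returns 0 on success, -1 on failure. */
int buffering_demo(const char *path, long *before, long *after)
{
    FILE *f = fopen(path, "wb");
    if (f == NULL)
        return -1;

    /* Ask for full buffering explicitly; setvbuf must be called
     * before any other operation on the stream. */
    if (setvbuf(f, NULL, _IOFBF, BUFSIZ) != 0) { fclose(f); return -1; }

    if (fwrite("hello", 1, 5, f) != 5) { fclose(f); return -1; }
    *before = size_seen_by_os(path);   /* bytes still sit in the stdio buffer */

    if (fflush(f) != 0) { fclose(f); return -1; }
    *after = size_seen_by_os(path);    /* bytes have been handed to the OS */

    return fclose(f) == 0 ? 0 : -1;
}
```

On common implementations the size reported before fflush is 0 and the size after it is 5, which is exactly the window in which a crash would lose the data.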
Opening and Closing Streams: fopen Modes
fopen(filename, mode) returns a FILE* on success or NULL on failure:
| Mode | Meaning |
|---|---|
| "r" | Read text; file must exist |
| "w" | Write text; creates/truncates |
| "a" | Append text; creates if missing |
| "r+" | Read+write text; file must exist |
| "w+" | Read+write; creates/truncates |
| "rb" | Read binary |
| "wb" | Write binary |
| "ab" | Append binary |
| "rb+" | Read+write binary |
Text File I/O: fprintf, fscanf, fgets, fputs
Text mode provides formatted I/O:
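As a minimal sketch of formatted text I/O, the following writes two lines with fprintf and reads them back with fgets, parsing each line with sscanf. The function names write_scores and sum_scores and the "name score" line format are assumptions made for illustration.

```c
#include <stdio.h>

/* Writes two "name score" lines. Returns 0 on success, -1 on failure. */
int write_scores(const char *path)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;

    /* fprintf returns a negative value on failure. */
    int ok = fprintf(f, "%s %d\n", "alice", 90) > 0
          && fprintf(f, "%s %d\n", "bob", 75) > 0;

    if (fclose(f) != 0)   /* fclose flushes, so it can fail too */
        ok = 0;
    return ok ? 0 : -1;
}

/* Reads the file line by line and sums the scores.
 * Returns the number of lines parsed, or -1 on error. */
int sum_scores(const char *path, int *total)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    char line[128];
    char name[32];
    int score;
    int count = 0;
    *total = 0;

    /* fgets never writes more than sizeof line bytes, terminator included. */
    while (fgets(line, sizeof line, f) != NULL) {
        /* %31s bounds the name so it cannot overflow name[32]. */
        if (sscanf(line, "%31s %d", name, &score) == 2) {
            *total += score;
            count++;
        }
    }

    int failed = ferror(f);
    fclose(f);
    return failed ? -1 : count;
}
```

Reading whole lines with fgets and parsing them afterwards keeps a malformed line from leaving the stream in an unknown position, which can happen when fscanf is applied to the file directly.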
[!WARNING]
fgets vs gets: Never use gets(). It was removed from the standard in C11. It has no buffer size limit, so any input line longer than the buffer overflows it. Always use fgets(buffer, sizeof(buffer), fp), where buffer is an array rather than a pointer.
Binary File I/O: fread and fwrite
Binary mode writes raw bytes — no character translation, no formatting:
fwrite(ptr, size, count, stream) semantics:
- ptr: Pointer to the data to write.
- size: Size of each element in bytes.
- count: Number of elements to write.
- Returns: Number of elements successfully written (check against count for errors).
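The parameters described above can be seen in a short round trip. This is a sketch only: save_ints and load_ints are hypothetical helper names, and the file produced is readable only on a machine with the same int size and byte order.

```c
#include <stdio.h>
#include <stddef.h>

/* Writes count ints as raw bytes. Returns 0 on success, -1 on failure. */
int save_ints(const char *path, const int *values, size_t count)
{
    FILE *f = fopen(path, "wb");
    if (f == NULL)
        return -1;

    /* fwrite returns the number of complete elements written. */
    size_t written = fwrite(values, sizeof values[0], count, f);
    int ok = (written == count);

    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Reads up to max ints. Returns how many were actually read,
 * which is smaller than max when the file is shorter. */
size_t load_ints(const char *path, int *out, size_t max)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return 0;

    size_t n = fread(out, sizeof out[0], max, f);
    fclose(f);
    return n;
}
```

Passing the element size and the element count separately, rather than a total byte count, is what lets the return value be compared directly against count.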
Random Access: fseek, ftell and rewind
By default, file I/O is sequential. fseek moves the stream position indicator to any byte offset, ftell reports the current offset, and rewind returns the stream to the beginning of the file.
fseek whence values:
- SEEK_SET: Offset from beginning of file.
- SEEK_CUR: Offset from current position.
- SEEK_END: Offset from end of file (use with negative offsets to seek from end).
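Two common uses of these calls are measuring a file and jumping to a fixed-size record. The sketch below assumes a file of raw int records opened in binary mode; file_size and read_int_at are hypothetical names. Note that ISO C does not require binary streams to support SEEK_END meaningfully, although mainstream platforms do.

```c
#include <stdio.h>

/* Returns the file size in bytes and restores the original position.
 * Returns -1 on error. */
long file_size(FILE *f)
{
    long saved = ftell(f);
    if (saved < 0)
        return -1;
    if (fseek(f, 0L, SEEK_END) != 0)
        return -1;

    long size = ftell(f);

    if (fseek(f, saved, SEEK_SET) != 0)
        return -1;
    return size;
}

/* Reads the int record at the given zero-based index.
 * Returns 0 on success, -1 on failure. */
int read_int_at(FILE *f, long index, int *out)
{
    /* Record i starts at byte offset i * sizeof(int). */
    if (fseek(f, index * (long)sizeof *out, SEEK_SET) != 0)
        return -1;
    return fread(out, sizeof *out, 1, f) == 1 ? 0 : -1;
}
```

Fixed-size records are what make this kind of random access cheap: the offset of any record is a multiplication, with no need to scan the file.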
Flushing and Sync: fflush and fsync
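fflush(file) forces the C library buffer to flush to the OS page cache. But the OS may still hold the data in RAM before writing to disk. For true durability, follow fflush with fsync(fileno(file)), which blocks until the OS has pushed the file's data to the storage device.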
Use fsync for: transaction logs, database write-ahead logs, configuration files, anything that must survive a system crash.
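A minimal sketch of the two-step flush, assuming a POSIX system (fileno and fsync are POSIX, not ISO C); write_durably is a hypothetical helper name.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>   /* POSIX: fsync() */

/* Writes len bytes and does not report success until the OS confirms
 * they reached the storage device. Returns 0 on success, -1 on failure. */
int write_durably(const char *path, const void *data, size_t len)
{
    FILE *f = fopen(path, "wb");
    if (f == NULL)
        return -1;

    int ok = fwrite(data, 1, len, f) == len
          && fflush(f) == 0            /* stdio buffer -> OS page cache */
          && fsync(fileno(f)) == 0;    /* OS page cache -> storage device */

    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

The order matters: calling fsync before fflush would sync a file that does not yet contain the bytes still held in the stdio buffer.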
Binary Serialization: Saving and Loading Structs
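Writing a struct's bytes directly to a binary file is the simplest and usually the fastest way to serialize in C. The resulting file depends on the struct's padding, the sizes of its types, and the machine's byte order, and pointer members cannot be saved this way.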
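One way this can look in practice is a count header followed by an array of records. The sketch below is illustrative: PlayerRecord, save_players, and load_players are hypothetical names, and the struct uses fixed-width members and an inline char array so that it has a stable size and contains no pointers.

```c
#include <stdio.h>
#include <stdint.h>

/* Hypothetical record: fixed-width fields, no pointers, 32 bytes total. */
typedef struct {
    uint32_t id;
    int32_t  score;
    char     name[24];
} PlayerRecord;

/* File layout: a uint32_t record count, then the records themselves.
 * Returns 0 on success, -1 on failure. */
int save_players(const char *path, const PlayerRecord *players, uint32_t count)
{
    FILE *f = fopen(path, "wb");
    if (f == NULL)
        return -1;

    int ok = fwrite(&count, sizeof count, 1, f) == 1
          && fwrite(players, sizeof *players, count, f) == count;

    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Loads the records into out, which can hold max entries.
 * Returns the number loaded, or -1 on error or if the file holds more than max. */
long load_players(const char *path, PlayerRecord *out, uint32_t max)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    uint32_t count = 0;
    long result = -1;

    /* Validate the stored count before trusting it as a read size. */
    if (fread(&count, sizeof count, 1, f) == 1
        && count <= max
        && fread(out, sizeof *out, count, f) == count)
        result = (long)count;

    fclose(f);
    return result;
}
```

Checking the stored count against the destination capacity before reading is the important defensive step: a corrupted or hostile file should not be able to dictate how many bytes are copied into memory.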
Endianness in Binary Files: Cross-Platform Portability
When writing multi-byte integers to binary files, endianness becomes critical for cross-platform compatibility. A file written on a little-endian x86-64 machine and read on a big-endian SPARC will have byte-reversed integers:
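A common way to sidestep the problem is to pick one byte order for the file format and build the bytes explicitly with shifts, so the code behaves the same on any host. The sketch below chooses little-endian as the on-disk order; that choice and the names write_u32_le and read_u32_le are assumptions for illustration.

```c
#include <stdio.h>
#include <stdint.h>

/* Writes value as 4 bytes, least significant byte first,
 * regardless of the host's byte order. Returns 0 on success. */
int write_u32_le(FILE *f, uint32_t value)
{
    unsigned char bytes[4] = {
        (unsigned char)(value & 0xFF),
        (unsigned char)((value >> 8) & 0xFF),
        (unsigned char)((value >> 16) & 0xFF),
        (unsigned char)((value >> 24) & 0xFF)
    };
    return fwrite(bytes, 1, sizeof bytes, f) == sizeof bytes ? 0 : -1;
}

/* Reads 4 little-endian bytes and reassembles them into a host integer.
 * Returns 0 on success, -1 on a short read or error. */
int read_u32_le(FILE *f, uint32_t *out)
{
    unsigned char bytes[4];
    if (fread(bytes, 1, sizeof bytes, f) != sizeof bytes)
        return -1;

    *out = (uint32_t)bytes[0]
         | ((uint32_t)bytes[1] << 8)
         | ((uint32_t)bytes[2] << 16)
         | ((uint32_t)bytes[3] << 24);
    return 0;
}
```

Because the shifts operate on values rather than on memory layout, no runtime endianness detection or byte swapping is needed.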
For modern, professional serialization, consider:
- Protocol Buffers / FlatBuffers: Cross-platform, version-tolerant, handles endianness.
- MessagePack: Binary JSON alternative with explicit type encoding.
- Custom tagged binary format: Define your own TLV (Type-Length-Value) encoding.
Memory-Mapped Files with mmap
For large files or random access patterns, mmap maps a file directly into the process's virtual address space — you access it like an array, and the kernel handles the actual disk I/O:
mmap provides: zero-copy access, demand paging (only read pages you actually touch), and the ability to use pointer arithmetic across the entire file as if it were an in-memory array. Used extensively in database engines (SQLite WAL), compilers (linking large object files), and log processing.
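As a rough sketch of the idea on a POSIX system, the function below maps a file read-only and scans it like an ordinary array. The name count_newlines_mmap is hypothetical, and the example maps the whole file in one call, which suits files that fit comfortably in the address space.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <fcntl.h>      /* open */
#include <sys/mman.h>   /* mmap, munmap */
#include <sys/stat.h>   /* fstat */
#include <unistd.h>     /* close */

/* Counts '\n' bytes in a file by mapping it read-only.
 * Returns the count, or -1 on error. */
long count_newlines_mmap(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) != 0) { close(fd); return -1; }

    /* mmap fails for a zero-length mapping, so handle empty files here. */
    if (st.st_size == 0) { close(fd); return 0; }

    size_t len = (size_t)st.st_size;
    const char *data = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);   /* the mapping remains valid after the descriptor is closed */
    if (data == MAP_FAILED)
        return -1;

    long count = 0;
    for (size_t i = 0; i < len; i++)   /* pages are faulted in as they are touched */
        if (data[i] == '\n')
            count++;

    munmap((void *)data, len);
    return count;
}
```

One caveat with this approach is error reporting: an I/O failure while touching a mapped page arrives as a signal (typically SIGBUS) rather than as a return value, so code that must handle disk errors gracefully often prefers fread.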
Error Handling in File I/O
Key error-checking functions:
- ferror(f): Non-zero if a read/write error occurred.
- feof(f): Non-zero if end-of-file was reached.
- clearerr(f): Clears both error and EOF flags.
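A typical pattern is to loop on the return value of fread and only afterwards ask ferror whether the loop ended because of a failure or a normal end-of-file. The sketch below follows that pattern; count_bytes is a hypothetical name, and the use of errno after fopen relies on POSIX behavior, since ISO C does not require fopen to set it.

```c
#include <stdio.h>
#include <errno.h>
#include <string.h>

/* Reads a file to the end and returns its length in bytes,
 * or -1 if it cannot be opened or a read error occurs. */
long count_bytes(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL) {
        /* POSIX sets errno here; ISO C alone does not promise it. */
        fprintf(stderr, "fopen(%s): %s\n", path, strerror(errno));
        return -1;
    }

    unsigned char buf[4096];
    long total = 0;
    size_t n;

    /* Loop on what fread returned, not on feof: the EOF flag is set
     * only after a read has already come up short. */
    while ((n = fread(buf, 1, sizeof buf, f)) > 0)
        total += (long)n;

    /* fread returned 0: find out whether that was an error or EOF. */
    if (ferror(f)) {
        fprintf(stderr, "read error on %s\n", path);
        fclose(f);
        return -1;
    }

    fclose(f);
    return total;
}
```

Writing the loop as while (!feof(f)) is a frequent bug, because the flag lags one read behind and the final iteration then processes stale data.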
Frequently Asked Questions
Why does my file lose data when my program crashes?
Buffered I/O: data written with fwrite/fprintf sits in the C library's buffer until the buffer fills or is explicitly flushed. If the program crashes before flushing, that unwritten data is lost. Solutions: call fflush(f) after critical writes, switch the stream to unbuffered mode with setvbuf(f, NULL, _IONBF, 0) right after opening it, or add fsync() when the data must also survive an OS crash or power loss.
What is the difference between fread and read?
fread is part of the C standard library: it uses buffered I/O and works on FILE* handles. read is a POSIX system call: it bypasses the C library buffer and operates directly on file descriptors (integers). fread is more portable and is usually faster for many small sequential reads, because the buffer turns them into a few large system calls; read gives more control for non-blocking I/O and polling.
Can I use fseek/ftell with large files > 2 GB?
The standard ftell returns long, which is 32 bits on some platforms (32-bit Unix systems and Windows), capping offsets at 2 GB. For large files, use fseeko/ftello on POSIX systems, which take and return off_t, or _fseeki64/_ftelli64 on Windows. On 32-bit POSIX systems, define _FILE_OFFSET_BITS=64 before including any system header so that off_t is 64 bits wide; fseek and ftell themselves keep using long.
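For POSIX systems, a large-file-safe size query might look like the sketch below. The name file_size_large is hypothetical; the feature-test macros must appear before the first #include to take effect.

```c
#define _POSIX_C_SOURCE 200809L   /* exposes fseeko/ftello */
#define _FILE_OFFSET_BITS 64      /* makes off_t 64-bit on 32-bit systems */
#include <stdio.h>
#include <sys/types.h>            /* off_t */

/* Returns the file size as an off_t and restores the original position.
 * Returns -1 on error. */
off_t file_size_large(FILE *f)
{
    off_t saved = ftello(f);
    if (saved < 0)
        return -1;
    if (fseeko(f, 0, SEEK_END) != 0)
        return -1;

    off_t size = ftello(f);

    if (fseeko(f, saved, SEEK_SET) != 0)
        return -1;
    return size;
}
```

When printing an off_t, cast it to long long and use %lld, since the width of off_t varies between platforms.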
When is mmap better than fread?
mmap wins for: random access patterns (no seek overhead), large files where you don't need every byte (only touched pages load), and zero-copy processing (data is accessed directly without a read buffer). fread wins for: sequential streaming of medium-sized files, portable code (mmap is POSIX-only), and simplicity.
Key Takeaway
File I/O in C is state that persists. Mastering the buffered stream model (understanding when data is flushed, how to seek randomly through binary files, and how to serialize structs directly to disk) gives you the tools to build databases, configuration managers, cache engines, and logging systems.
The combination of fread/fwrite for sequential binary I/O and mmap for random access to large files covers virtually every file I/O pattern you'll encounter in real-world systems programming.
Part of the C Mastery Course — 30 modules from C basics to production-grade systems engineering.
