How do I sort lines from stdin in a C CLI program?

Read all lines into a dynamically grown array of char* (using realloc to expand), then call qsort() with a strcmp-based comparator. Handle memory allocation failures at each step. For very large inputs that exceed memory, use external merge sort: sort chunks that fit in memory, write sorted chunks to temp files, then merge-sort the temp files. Always free all allocated memory and close all file handles before exit.
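The in-memory approach described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the names read_sorted_lines, free_lines, and compare_lines are made up for this example, and the fixed 1024-byte read buffer is an assumption, so longer lines are split into pieces.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* qsort passes pointers to the array elements, so each argument is
   really a `char **` that has to be dereferenced once. */
static int compare_lines(const void *a, const void *b) {
    const char *sa = *(const char *const *)a;
    const char *sb = *(const char *const *)b;
    return strcmp(sa, sb);
}

/* Free every stored line, then the array itself. */
static void free_lines(char **lines, size_t count) {
    for (size_t i = 0; i < count; i++) free(lines[i]);
    free(lines);
}

/* Read all lines from fp into a heap-allocated array, sort it, return it.
   The line count is stored in *out_count. Returns NULL on allocation
   failure, and also for empty input (then *out_count is 0).
   Assumption: lines longer than the buffer are split into pieces. */
char **read_sorted_lines(FILE *fp, size_t *out_count) {
    char **lines = NULL;
    size_t count = 0, capacity = 0;
    char buf[1024];

    *out_count = 0;
    while (fgets(buf, sizeof buf, fp)) {
        buf[strcspn(buf, "\n")] = '\0';          /* strip the newline */

        if (count == capacity) {                 /* grow by doubling */
            size_t new_cap = capacity ? capacity * 2 : 16;
            char **tmp = realloc(lines, new_cap * sizeof *tmp);
            if (!tmp) { free_lines(lines, count); return NULL; }
            lines = tmp;
            capacity = new_cap;
        }

        size_t len = strlen(buf) + 1;            /* include the terminator */
        char *copy = malloc(len);
        if (!copy) { free_lines(lines, count); return NULL; }
        memcpy(copy, buf, len);
        lines[count++] = copy;
    }

    if (count > 0) qsort(lines, count, sizeof *lines, compare_lines);
    *out_count = count;
    return lines;
}
```

A caller would pass stdin, print each returned line with puts(), and then release everything with free_lines(). Assigning the result of realloc to a temporary first keeps the old array reachable if the allocation fails.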

How do I handle command-line arguments in C?

Use getopt() (POSIX) for short options (-v, -o file) and getopt_long() (a GNU extension, also available on the BSDs) for long options (--verbose, --output=file). getopt() returns the option character and sets optarg for options that take an argument. It returns '?' for an unknown option; for a missing argument it also returns '?', or ':' if the option string starts with a colon. For very simple programs, parsing argc/argv by hand is enough. Validate every argument before using it: malformed command-line input is a common source of bugs.
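A short sketch of the getopt() loop described above. The Options struct, the parse_options function, and the -v / -o FILE option set are hypothetical and chosen only for illustration; the getopt(), optarg, optind, and optopt names are the POSIX ones.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>   /* getopt, optarg, optind, optopt */

/* Hypothetical option set: -v (flag) and -o FILE (takes an argument). */
typedef struct {
    bool verbose;
    const char *output;   /* NULL if -o was not given */
    int first_operand;    /* index in argv of the first non-option argument */
} Options;

/* Returns 0 on success, -1 on an unknown option or a missing argument. */
int parse_options(int argc, char *argv[], Options *opts) {
    opts->verbose = false;
    opts->output = NULL;
    opts->first_operand = argc;

    optind = 1;   /* restart scanning; matters if called more than once */
    int c;
    /* The leading ':' makes getopt return ':' for a missing argument
       (instead of '?') and suppresses its built-in error messages. */
    while ((c = getopt(argc, argv, ":vo:")) != -1) {
        switch (c) {
        case 'v':
            opts->verbose = true;
            break;
        case 'o':
            opts->output = optarg;
            break;
        case ':':
            fprintf(stderr, "option -%c requires an argument\n", optopt);
            return -1;
        default:  /* '?' */
            fprintf(stderr, "unknown option -%c\n", optopt);
            return -1;
        }
    }
    opts->first_operand = optind;
    return 0;
}
```

After a successful call, argv[first_operand] through argv[argc - 1] are the remaining operands (for example, input file names).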

What is the correct way to read input line by line in C?

Use fgets(buf, sizeof(buf), fp): it reads at most sizeof(buf)-1 characters and always null-terminates on success. It returns NULL at EOF or on error, so check the return value. The newline is kept in the buffer if the line fits; strip it with buf[strcspn(buf, "\n")] = 0. On POSIX systems, getline() allocates and grows the buffer for you and handles any line length, which is simpler but means you must free() the buffer. Never use gets(): it has no bounds check and was removed from the language in C11.
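One detail the fgets approach leaves open is what to do when a line does not fit in the buffer. The sketch below shows one possible policy: keep the part that fits, discard the rest, and tell the caller. The function name read_line_checked and the discard policy are assumptions made for this example.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Read one line into buf and strip the trailing newline.
   Returns false at EOF or on a read error.
   *truncated is set when the line did not fit in buf; the remainder of
   the over-long line is discarded so the next call starts on a new line. */
bool read_line_checked(FILE *fp, char *buf, size_t size, bool *truncated) {
    *truncated = false;
    if (!fgets(buf, (int)size, fp)) return false;

    size_t len = strcspn(buf, "\n");
    if (buf[len] == '\n') {        /* whole line fit, newline included */
        buf[len] = '\0';
        return true;
    }

    /* No newline in buf: either the input ended without one, or the
       buffer filled up. Consume the rest of the line to find out. */
    int c;
    size_t extra = 0;
    while ((c = fgetc(fp)) != EOF && c != '\n') extra++;
    *truncated = (extra > 0);
    return true;
}
```

Whether to discard, reject, or reassemble an over-long line depends on the program; for numeric input, rejecting the line is usually the safest choice.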

Project: Building a Fast CLI Data Processor in C (Phase 1 Capstone)

Phase 1 Capstone. You've learned types, variables, control flow, functions, and arrays. Now build something real: a command-line tool that reads numeric data from stdin or a file, computes descriptive statistics, sorts the data, and produces a formatted report. It exercises every concept from Phase 1 and is useful in its own right.

Project Scope and Goals
Architecture: The Pipeline Pattern
Step 1: Safe Input Handling with fgets
Step 2: Parsing and Validating Numbers
Step 3: Statistical Computations
Step 4: Sorting with qsort
Step 5: Formatted Report Output
Step 6: File Input Mode
Complete Program: Full Integration
Extension Challenges
Phase 1 Reflection

Project Scope and Goals

Your CLI data processor will:

Read up to 1,000 integers or floating-point numbers from stdin or a file.
Compute: count, min, max, sum, mean (average), median, population variance, standard deviation.
Sort the dataset and display it.
Print a formatted ASCII report table.
Handle invalid input gracefully (skip non-numeric lines, report errors).
Process a CSV file in a single pass using fgets.

Architecture: The Pipeline Pattern

Step 1: Safe Input Handling with fgets

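Avoid scanf("%f", &x) and gets() for user input. gets() cannot limit how much it writes and was removed in C11. scanf leaves unconsumed characters in the stream when a conversion fails, which makes recovering from bad input awkward. Read whole lines with fgets and parse them separately: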

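#include <stdio.h>
#include <string.h>
#include <sys/types.h>  // ssize_t (POSIX, not standard C)

#define MAX_LINE 256

// Read one line from fp and strip the trailing newline.
// Returns: length of the stripped line, or -1 on EOF/error.
// Lines longer than bufsize-1 characters are returned in pieces.
ssize_t read_line(FILE *fp, char *buf, size_t bufsize) {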
    if (!fgets(buf, (int)bufsize, fp)) return -1;
    
    size_t len = strlen(buf);
    
    // Strip trailing newline and/or carriage return
    while (len > 0 && (buf[len-1] == '\n' || buf[len-1] == '\r')) {
        buf[--len] = '\0';
    }
    
    return (ssize_t)len;
}

Step 2: Parsing and Validating Numbers

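atof() returns 0.0 for input it cannot parse, so an invalid string is indistinguishable from a genuine zero. Use strtod() instead; it reports where parsing stopped, which lets you detect invalid input: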

#include <stdbool.h>
#include <stdlib.h>
#include <errno.h>

// Parse a string as a double - returns true and sets *out on success
bool parse_double(const char *str, double *out) {
    if (!str || *str == '\0') return false;
    
    char *endptr;
    errno = 0;
    double val = strtod(str, &endptr);
    
    if (endptr == str) return false;   // No conversion (blank or non-numeric input)
    
    // Must have consumed all characters (or just trailing whitespace)
    while (*endptr == ' ' || *endptr == '\t') endptr++;
    
    if (*endptr != '\0') return false; // Trailing non-numeric chars
    if (errno == ERANGE) return false; // Overflow/underflow
    
    *out = val;
    return true;
}
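The project scope also mentions integers. If whole numbers need to be kept exact rather than stored as doubles, the same validation pattern works with strtol(). The helper below, parse_long, is hypothetical and not part of the project code; it is a sketch that assumes base-10 input only.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Parse a base-10 integer; returns true and sets *out on success.
   parse_long is a hypothetical companion to parse_double. */
bool parse_long(const char *str, long *out) {
    if (!str || *str == '\0') return false;

    char *endptr;
    errno = 0;
    long val = strtol(str, &endptr, 10);

    if (endptr == str) return false;     /* no digits were converted */
    if (errno == ERANGE) return false;   /* value does not fit in a long */

    /* Allow trailing spaces or tabs, reject anything else ("12abc", "3.5"). */
    while (*endptr == ' ' || *endptr == '\t') endptr++;
    if (*endptr != '\0') return false;

    *out = val;
    return true;
}
```

As with strtod(), errno has to be cleared before the call because strtol() only sets it on failure and never resets it.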

Step 3: Statistical Computations

#include <math.h>

typedef struct {
    size_t count;
    double min;
    double max;
    double sum;
    double mean;
    double median;    // Requires sorted data
    double variance;
    double std_dev;
} Statistics;

Statistics compute_stats(const double *data, size_t count) {
    Statistics s = { .count = count };
    if (count == 0) return s;
    
    // Single pass: min, max, sum
    s.min = s.max = data[0];
    s.sum = 0.0;
    for (size_t i = 0; i < count; i++) {
        if (data[i] < s.min) s.min = data[i];
        if (data[i] > s.max) s.max = data[i];
        s.sum += data[i];
    }
    s.mean = s.sum / (double)count;
    
    // Variance: E[(X - mean)^2], computed in a second pass (two-pass method)
    double m2 = 0.0;
    for (size_t i = 0; i < count; i++) {
        double diff = data[i] - s.mean;
        m2 += diff * diff;
    }
    s.variance = m2 / (double)count;        // Population variance
    s.std_dev  = sqrt(s.variance);
    
    // Median (requires sorted data - must sort first)
    if (count % 2 == 1) {
        s.median = data[count / 2];         // Middle element
    } else {
        s.median = (data[count/2 - 1] + data[count/2]) / 2.0; // Average of middle two
    }
    
    return s;
}
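The listing above makes two passes over the stored array to get the variance. For comparison, Welford's online algorithm updates the mean and the sum of squared deviations one value at a time, so it also works when the data is not kept in memory. The sketch below is an alternative, not part of the project code; RunningStats and its two functions are hypothetical names.

```c
#include <stddef.h>

/* Running state for Welford's online algorithm. Start from all zeros. */
typedef struct {
    size_t count;
    double mean;
    double m2;     /* running sum of squared deviations from the mean */
} RunningStats;

/* Fold one new value into the running statistics. */
void running_stats_add(RunningStats *rs, double x) {
    rs->count++;
    double delta = x - rs->mean;              /* deviation from the old mean */
    rs->mean += delta / (double)rs->count;
    rs->m2 += delta * (x - rs->mean);         /* uses the updated mean */
}

/* Population variance, matching the m2 / count convention used above. */
double running_stats_variance(const RunningStats *rs) {
    return rs->count ? rs->m2 / (double)rs->count : 0.0;
}
```

Initialize with `RunningStats rs = {0};`, call running_stats_add() for each value as it is parsed, and read the variance at the end. For the dataset 2, 4, 4, 4, 5, 5, 7, 9 this gives a mean of 5 and a population variance of 4.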

Step 4: Sorting with qsort

#include <stdlib.h>

// Comparator for qsort (ascending order)
int compare_doubles(const void *a, const void *b) {
    double da = *(const double*)a;
    double db = *(const double*)b;
    return (da > db) - (da < db); // Branchless: returns -1, 0, or +1
}

void sort_data(double *data, size_t count) {
    qsort(data, count, sizeof(double), compare_doubles);
}

Step 5: Formatted Report Output

#include <stdio.h>

void print_report(const Statistics *s, const double *sorted_data) {
    printf("\n");
    // Inner width is 40 columns: 18 (label) + 20 (value) + 2 (padding)
    printf("╔════════════════════════════════════════╗\n");
    printf("║     %-35s║\n", "Data Analysis Report");
    printf("╠════════════════════════════════════════╣\n");
    printf("║  Count:          %20zu  ║\n", s->count);
    printf("║  Minimum:        %20.4f  ║\n", s->min);
    printf("║  Maximum:        %20.4f  ║\n", s->max);
    printf("║  Sum:            %20.4f  ║\n", s->sum);
    printf("║  Mean:           %20.4f  ║\n", s->mean);
    printf("║  Median:         %20.4f  ║\n", s->median);
    printf("║  Variance:       %20.4f  ║\n", s->variance);
    printf("║  Std Deviation:  %20.4f  ║\n", s->std_dev);
    printf("╚════════════════════════════════════════╝\n");
    
    if (s->count <= 20 && sorted_data) {
        printf("\nSorted Data: ");
        for (size_t i = 0; i < s->count; i++) {
            printf("%.2f", sorted_data[i]);
            if (i < s->count - 1) printf(", ");
        }
        printf("\n");
    }
}

Step 6: File Input Mode

// Read all numbers from a file (CSV or one-per-line)
size_t load_from_file(const char *filename, double *data, size_t max_count) {
    FILE *fp = fopen(filename, "r");
    if (!fp) { perror(filename); return 0; }
    
    size_t count = 0;
    char line[MAX_LINE];
    int  line_num = 0;
    
    while (count < max_count && read_line(fp, line, sizeof(line)) >= 0) {
        line_num++;
        if (line[0] == '\0' || line[0] == '#') continue; // Skip empty/comment
        
        // Handle comma-separated values on one line
        char *token = strtok(line, ",; \t");
        while (token && count < max_count) {
            double val;
            if (parse_double(token, &val)) {
                data[count++] = val;
            } else {
                fprintf(stderr, "Warning: line %d: skipping non-numeric '%s'\n",
                        line_num, token);
            }
            token = strtok(NULL, ",; \t");
        }
    }
    
    fclose(fp);
    printf("Loaded %zu values from '%s'\n", count, filename);
    return count;
}

Complete Program: Full Integration

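#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdbool.h>
#include <errno.h>
#include <math.h>
#include <sys/types.h>  // ssize_t (POSIX)

#define MAX_VALUES 1000

// The definitions from Steps 1-6 (MAX_LINE, read_line, parse_double,
// Statistics, compute_stats, sort_data, print_report, load_from_file)
// go here, above main.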

int main(int argc, char *argv[]) {
    double data[MAX_VALUES];
    size_t count = 0;
    
    if (argc > 1) {
        // File mode: read from file passed as argument
        count = load_from_file(argv[1], data, MAX_VALUES);
    } else {
        // Interactive mode: read from stdin
        printf("Enter numbers (one per line), Ctrl-D to finish:\n");
        char line[MAX_LINE];
        while (count < MAX_VALUES && read_line(stdin, line, sizeof(line)) >= 0) {
            if (line[0] == '\0') continue;
            double val;
            if (parse_double(line, &val)) {
                data[count++] = val;
            } else {
                fprintf(stderr, "Invalid: '%s' - skipping\n", line);
            }
        }
    }
    
    if (count == 0) {
        fprintf(stderr, "No valid data read. Nothing to analyze.\n");
        return 1;
    }
    
    // Sort data (required for median)
    sort_data(data, count);
    
    // Compute statistics
    Statistics stats = compute_stats(data, count);
    
    // Display report
    print_report(&stats, data);
    
    return 0;
}

Compile and test:

gcc -O2 -Wall processor.c -o processor -lm

# Piped stdin mode
printf '10\n20\n15\n5\n25\n30\n' | ./processor

# File mode
printf '1.5, 2.3, 0.7\n4.1, 3.9, 5.2\n' > data.csv
./processor data.csv

Extension Challenges

Histogram: Print an ASCII bar chart of value distribution by dividing the range into 10 bins.
Percentiles: Compute P25, P75, P90, P95, P99 using the sorted array for performance benchmarking analysis.
Moving average: Read a time series and output a sliding window average using a circular buffer.
Multiple files: Accept multiple filenames, process each independently, then combine statistics.
Output formats: Support --csv, --json, --markdown flags for different output formats.
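As a starting point for the percentiles challenge, here is one possible sketch. It assumes the array is already sorted in ascending order and uses linear interpolation between the two nearest ranks; other percentile definitions (such as nearest-rank) exist and give slightly different values. The function name percentile is hypothetical.

```c
#include <stddef.h>

/* Percentile p (0..100) of an ascending-sorted array, using linear
   interpolation between the two nearest ranks.
   Returns 0.0 for an empty array; p is clamped to [0, 100]. */
double percentile(const double *sorted, size_t count, double p) {
    if (count == 0) return 0.0;
    if (p < 0.0) p = 0.0;
    if (p > 100.0) p = 100.0;

    /* Fractional position in the array: 0 maps to the first element,
       100 maps to the last. */
    double rank = (p / 100.0) * (double)(count - 1);
    size_t lo = (size_t)rank;                    /* floor, since rank >= 0 */
    size_t hi = (lo + 1 < count) ? lo + 1 : lo;  /* stay inside the array */
    double frac = rank - (double)lo;

    return sorted[lo] + frac * (sorted[hi] - sorted[lo]);
}
```

With this definition, percentile(data, count, 50.0) agrees with the median computed in Step 3 for both odd and even counts.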

Phase 1 Reflection

You've successfully moved from "code" to "machine logic." Every tool you used in this project - fgets for safe input, strtod for parsing, qsort for sorting, printf for formatted output - follows the same pattern: explicit bounds, explicit types, explicit error checking.

This is the C discipline. In Phase 2, we'll leave the safety of the stack and explore the heap - where professional-scale applications are built with malloc, free, and pointer-based data structures.

Frequently Asked Questions

Q: What sorting algorithms are most practical to implement from scratch in C? Quicksort is the workhorse: average O(n log n), in-place, and fast in practice thanks to good cache locality. A median-of-three pivot avoids the O(n²) behavior that a naive first-element pivot shows on already-sorted input, although adversarial inputs can still trigger the worst case. Merge sort is preferred when stability is required (equal elements keep their original order). For small arrays (roughly 16 elements or fewer), insertion sort typically outperforms both. The C standard does not specify which algorithm qsort() uses, and implementations differ, but its compar function pointer parameter is worth studying to understand how C achieves generic sorting.
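The combination described in this answer (quicksort with a median-of-three pivot, handing small ranges to insertion sort) can be sketched as follows. This is an illustrative version for arrays of double, not a tuned implementation; the function names and the cutoff of 16 are assumptions, and NaN values are not handled.

```c
#include <stddef.h>

#define SMALL_CUTOFF 16   /* assumed threshold; tune by measurement */

static void swap_d(double *a, double *b) { double t = *a; *a = *b; *b = t; }

/* Insertion sort on the inclusive range a[lo..hi]. */
static void insertion_sort(double *a, ptrdiff_t lo, ptrdiff_t hi) {
    for (ptrdiff_t i = lo + 1; i <= hi; i++) {
        double key = a[i];
        ptrdiff_t j = i - 1;
        while (j >= lo && a[j] > key) { a[j + 1] = a[j]; j--; }
        a[j + 1] = key;
    }
}

static void quicksort_range(double *a, ptrdiff_t lo, ptrdiff_t hi) {
    while (hi - lo + 1 > SMALL_CUTOFF) {
        /* Median-of-three: order a[lo], a[mid], a[hi] so the median
           of the three ends up at mid and serves as the pivot. */
        ptrdiff_t mid = lo + (hi - lo) / 2;
        if (a[mid] < a[lo])  swap_d(&a[mid], &a[lo]);
        if (a[hi]  < a[lo])  swap_d(&a[hi],  &a[lo]);
        if (a[hi]  < a[mid]) swap_d(&a[hi],  &a[mid]);
        double pivot = a[mid];

        /* Partition: afterwards a[lo..j] <= pivot <= a[i..hi]. */
        ptrdiff_t i = lo, j = hi;
        while (i <= j) {
            while (a[i] < pivot) i++;
            while (a[j] > pivot) j--;
            if (i <= j) { swap_d(&a[i], &a[j]); i++; j--; }
        }

        /* Recurse into the smaller part and loop on the larger one,
           which keeps the recursion depth at O(log n). */
        if (j - lo < hi - i) {
            quicksort_range(a, lo, j);
            lo = i;
        } else {
            quicksort_range(a, i, hi);
            hi = j;
        }
    }
    insertion_sort(a, lo, hi);   /* finish the small remaining range */
}

/* Sort n doubles in ascending order. */
void sort_doubles(double *a, size_t n) {
    if (n > 1) quicksort_range(a, 0, (ptrdiff_t)n - 1);
}
```

Like any plain quicksort, this version is not stable, so equal elements may change their relative order.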

Q: How do you read and parse structured data from a file for sorting in C? Use fopen() with mode "r", then fgets() or fscanf() to read line-by-line. For CSV-like data, strtok() or manual pointer arithmetic splits fields by delimiter. Store records in a dynamically allocated array: malloc(capacity * sizeof(Record)), doubling capacity with realloc() when full. Always check return values - fopen returns NULL on failure, malloc/realloc return NULL on allocation failure. Close the file with fclose() when done.
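The steps in this answer can be sketched as below. The Record layout (a "name,score" line), the 32-byte name limit, and the function names parse_record and load_records are all hypothetical, chosen only to make the example concrete; malformed lines are skipped silently in this version.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record layout: one "name,score" pair per line. */
typedef struct {
    char   name[32];
    double score;
} Record;

/* Parse one "name,score" line into *rec. Returns 1 on success, 0 otherwise. */
static int parse_record(const char *line, Record *rec) {
    const char *comma = strchr(line, ',');
    if (!comma) return 0;

    size_t name_len = (size_t)(comma - line);
    if (name_len == 0 || name_len >= sizeof rec->name) return 0;
    memcpy(rec->name, line, name_len);
    rec->name[name_len] = '\0';

    char *end;
    rec->score = strtod(comma + 1, &end);
    if (end == comma + 1) return 0;               /* no number after the comma */
    while (*end == ' ' || *end == '\r' || *end == '\n') end++;
    return *end == '\0';                          /* reject trailing junk */
}

/* Load all valid records from fp into a growing array.
   The caller frees the result. Returns NULL on allocation failure
   or when no records were read; *out_count holds the record count. */
Record *load_records(FILE *fp, size_t *out_count) {
    Record *recs = NULL;
    size_t count = 0, capacity = 0;
    char line[256];

    *out_count = 0;
    while (fgets(line, sizeof line, fp)) {
        Record rec;
        if (!parse_record(line, &rec)) continue;  /* skip malformed lines */

        if (count == capacity) {                  /* double the capacity */
            size_t new_cap = capacity ? capacity * 2 : 8;
            Record *tmp = realloc(recs, new_cap * sizeof *tmp);
            if (!tmp) { free(recs); return NULL; }
            recs = tmp;
            capacity = new_cap;
        }
        recs[count++] = rec;
    }
    *out_count = count;
    return recs;
}
```

The caller opens the file with fopen(), checks for NULL, passes the FILE pointer in, and calls fclose() and free() when finished.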

Q: How do you pass a custom comparator to qsort() in C? Define a function with signature int cmp(const void *a, const void *b) that returns a negative value if a < b, zero if they are equal, and a positive value if a > b. Cast the void pointers to your actual type inside: const Record *ra = (const Record *)a. Pass the function pointer as the fourth argument: qsort(array, count, sizeof(Record), cmp). For descending order, negate the result or swap the operands. Avoid returning a - b for integer keys, since the subtraction can overflow. For multi-key sorting, chain comparisons: compare the primary key first and fall back to the secondary key only if the primary keys are equal.
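A small sketch of a multi-key comparator following this answer. The Record struct and its fields are hypothetical; the ordering chosen here (score descending, then name ascending) is just one example of chaining keys.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical record type for illustration. */
typedef struct {
    char   name[32];
    double score;
} Record;

/* Primary key: score, descending. Secondary key: name, ascending. */
int compare_records(const void *a, const void *b) {
    const Record *ra = (const Record *)a;
    const Record *rb = (const Record *)b;

    if (ra->score != rb->score) {
        /* Operands reversed relative to ascending order, so the
           higher score sorts first. */
        return (ra->score < rb->score) - (ra->score > rb->score);
    }
    return strcmp(ra->name, rb->name);   /* tie-break on name */
}
```

It is passed to qsort as the fourth argument: qsort(records, count, sizeof(Record), compare_records).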

Part of the C Mastery Course - 30 modules from C basics to expert systems engineering.
