
C++ Strings & Text: string_view, std::format, std::print, Unicode & Custom Formatters (C++23)

TopicTrick Team




std::string Internals: SSO and Heap Allocation

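std::string uses Small String Optimization (SSO): short strings (up to 15 characters in libstdc++ and MSVC, 22 in libc++; the limit is implementation-defined) are stored in a buffer inside the std::string object itself rather than in a separate heap block, so they need no heap allocation: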

cpp
#include <iostream>
#include <string>
#include <vector>

int main() {
    // SSO: no heap allocation for short strings
    std::string short_str = "Hello";  // Stored inline: no heap allocation
    std::string long_str  = "Hello, World! This string is too long for SSO."; // Longer than the SSO buffer: heap allocation

    // Checking string capacity:
    std::string s;
    std::cout << "Empty capacity: " << s.capacity() << '\n'; // 15 on libstdc++/MSVC, 22 on libc++

    s.reserve(1000);
    std::cout << "After reserve: " << s.capacity() << '\n';  // At least 1000

    // Avoid reallocations in loops: reserve() first
    std::vector<std::string> lines = {"first line", "second line", "third line"}; // sample data
    std::string result;
    result.reserve(lines.size() * 80); // Estimate the final size (~80 chars per line)
    for (const auto& line : lines) {
        result += line;
        result += '\n';
    }
    std::cout << result;
}
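In practice, appending to a string whose capacity already covers the final size does not move its buffer; mainstream implementations only reallocate once size() would exceed capacity(). The sketch below makes that visible by watching data(). join_lines and JoinResult are hypothetical names, and the per-line estimate plays the same role as the 80 above.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result type: the joined text plus whether the buffer ever moved.
struct JoinResult {
    std::string text;
    bool reallocated;
};

// Joins lines with '\n' after reserving lines.size() * per_line_estimate characters,
// recording whether data() changed (i.e. the buffer was reallocated) while appending.
JoinResult join_lines(const std::vector<std::string>& lines, std::size_t per_line_estimate) {
    JoinResult out{{}, false};
    out.text.reserve(lines.size() * per_line_estimate);
    const char* buffer = out.text.data();
    for (const auto& line : lines) {
        out.text += line;
        out.text += '\n';
        if (out.text.data() != buffer) {  // buffer address changed: a reallocation happened
            out.reallocated = true;
            buffer = out.text.data();
        }
    }
    return out;
}
```

With three short lines and an estimate of 80 characters each, no reallocation is reported; a deliberately low estimate of 1 character per line forces the string to grow while appending.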

std::string_view: Zero-Copy String Observation

std::string_view is a non-owning, read-only reference to a character sequence. It holds a pointer + length — no allocation, no copy:

cpp
#include <iostream>
#include <string>
#include <string_view>

// OLD: passing a literal or const char* constructs a temporary std::string (may allocate)
void process_old(const std::string& name) { std::cout << name << '\n'; }

// MODERN: accepts std::string, string_view, const char*, and literals without copying characters
void process(std::string_view name) {
    std::cout << name << '\n';  // Just reads the chars: no allocation
    auto first_space = name.find(' ');
    if (first_space != std::string_view::npos) {
        auto first_name = name.substr(0, first_space); // Returns a string_view, not a std::string
        std::cout << "First name: " << first_name << '\n';
    }
}

int main() {
    // Called with ANY string type, no character copies:
    process("Alice Smith");                   // const char*: only the length is computed
    process(std::string("Alice"));            // std::string: converts to a view of its buffer
    std::string_view sv{"Bob Jones"};
    process(sv);                              // string_view: passed by value (pointer + length)

    // Key operations (substr returns views, no copies):
    std::string_view url = "https://topictrick.com/blog/cpp-mastery";
    auto protocol  = url.substr(0, 5);               // "https"
    auto after_sep = url.substr(url.rfind('/') + 1); // "cpp-mastery"
    bool has_https = url.starts_with("https");       // C++20
    bool has_blog  = url.contains("blog");           // C++23
    std::cout << protocol << ' ' << after_sep << ' ' << has_https << has_blog << '\n';
}
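Because find() returns positions and substr() returns views rather than new strings, a tokenizer can be written without copying any characters. A minimal sketch; split_view is a hypothetical helper, and the views it returns are only valid while the characters of the input stay alive.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper: splits `text` on `delim` into views that point into `text`.
// No characters are copied; only the vector of (pointer, length) pairs is allocated.
std::vector<std::string_view> split_view(std::string_view text, char delim) {
    std::vector<std::string_view> parts;
    std::size_t start = 0;
    while (true) {
        std::size_t pos = text.find(delim, start);
        if (pos == std::string_view::npos) {
            parts.push_back(text.substr(start));  // last field (may be empty)
            break;
        }
        parts.push_back(text.substr(start, pos - start));
        start = pos + 1;
    }
    return parts;
}
```

Splitting the URL above on '/' yields "https:", an empty field, "topictrick.com", "blog", and "cpp-mastery", each pointing into the original literal.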

string_view Pitfalls: Lifetime and Null Termination

cpp
// DANGER: string_view outliving its source
std::string_view get_view() {
    std::string local = "hello";
    return local;  // DANGLING! local destroyed at function exit
}

// DANGER: string_view from temporary
std::string_view sv = std::string("temp"); // sv is dangling immediately!

// SAFE: string_view from a stable source
const std::string global_name = "Alice";
std::string_view safe_view = global_name; // OK: global_name outlives safe_view

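// string_view is NOT guaranteed to be null-terminated!
std::string_view sv2 = "hello world";
auto first_word = sv2.substr(0, 5);       // string_view "hello"
// printf("%s", first_word.data());        // WRONG: %s ignores size() and stops at the first '\0',
//                                         // so this prints "hello world"; for a view into a buffer
//                                         // with no '\0' at all, it reads out of bounds (UB)
// Use: std::string(first_word).c_str()   // Copy into a std::string when a null terminator is needed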
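Besides copying into a std::string for its c_str(), printf-family functions can take a length-limited %.*s conversion, which needs no terminator. The sketch below uses snprintf so the results can be inspected and contrasts that with the unbounded %s call; both function names are hypothetical.

```cpp
#include <cstdio>
#include <string>
#include <string_view>

// WRONG pattern: %s ignores word.size() and reads until the first '\0'.
// It only stays in bounds here because the views used point into a null-terminated literal.
std::string format_unbounded(std::string_view word) {
    char out[64];
    std::snprintf(out, sizeof(out), "[%s]", word.data());
    return out;
}

// Correct: the precision argument of %.*s limits printf to word.size() characters,
// so no terminator is required. Output longer than the buffer is truncated by snprintf.
std::string format_bounded(std::string_view word) {
    char out[64];
    std::snprintf(out, sizeof(out), "[%.*s]", static_cast<int>(word.size()), word.data());
    return out;
}
```

For first_word = "hello" taken from "hello world", format_unbounded produces "[hello world]" while format_bounded produces "[hello]".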

std::format: Modern Type-Safe Formatting (C++20)

std::format provides Python/Rust-style format strings with compile-time format string validation:

cpp
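#include <format>    // C++20
#include <iterator>  // std::back_inserter
#include <string>

int main() {
    // Basic usage:
    std::string s = std::format("Hello, {}!", "World");  // "Hello, World!"
    std::string n = std::format("Value: {}", 42);         // "Value: 42"
    std::string f = std::format("Pi: {:.4f}", 3.14159);  // "Pi: 3.1416"

    // Positional arguments (critical for i18n/l10n):
    std::string greeting = std::format("{0} has {1} points! {0} wins!", "Alice", 95);
    // "Alice has 95 points! Alice wins!"

    // Type-safe: a constant format string is checked against the argument types at compile time
    // std::format("{} {}", 1);      // Compile error: the string needs two arguments
    // std::format("{:d}", "text");  // Compile error: 'd' is not a valid spec for strings
    // (Ranges such as std::vector<int> are formattable since C++23, e.g. "[1, 2, 3]")

    // std::format_to: write through an output iterator (no temporary std::string)
    std::string buffer;
    buffer.reserve(256);  // Output fits in the reserved capacity, so appending should not reallocate
    std::format_to(std::back_inserter(buffer), "API response: {} {}", 200, "OK");
    // buffer == "API response: 200 OK"
}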

Format Specifiers: Alignment, Precision & Fill

cpp
#include <format>
#include <iostream>
#include <string>
#include <utility>
#include <vector>

int main() {
    // Alignment and fill:
    // {:>N}   right-align in N characters
    // {:<N}   left-align in N characters (the default for strings)
    // {:^N}   center in N characters
    // {:X>N}  fill with 'X', right-aligned

    auto col1 = std::format("{:>10}", "hello");    // "     hello"
    auto col2 = std::format("{:<10}", "hello");    // "hello     "
    auto col3 = std::format("{:^10}", "hello");    // "  hello   "
    auto col4 = std::format("{:*^10}", "hello");   // "**hello***"

    // Numbers:
    auto dec = std::format("{:d}",   255);     // "255" (decimal)
    auto hex = std::format("{:X}",   255);     // "FF" (hex uppercase)
    auto oct = std::format("{:o}",   255);     // "377" (octal)
    auto bin = std::format("{:b}",   255);     // "11111111" (binary)
    auto sci = std::format("{:e}",   1e9);     // "1.000000e+09"
    auto fix = std::format("{:.2f}", 3.14159); // "3.14"
    auto pct = std::format("{:.1f}%", 0.875 * 100); // "87.5%" (no '%' presentation type, unlike Python)

    // Width from a variable (runtime width); '>' is needed because strings default to left-align:
    int width = 12;
    auto dynamic = std::format("{:>{}}", "hello", width); // "       hello"

    // Print a pretty table:
    std::vector<std::pair<std::string, int>> leaderboard = {{"Alice", 95}, {"Bob", 82}, {"Carol", 78}};
    for (const auto& [name, score] : leaderboard) {
        std::cout << std::format("{:<15} {:>6}\n", name, score);
    }
    // Alice               95
    // Bob                 82
    // Carol               78
}

std::print and std::println: Direct Output (C++23)

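std::print (C++23) combines formatting and output in one call. It uses the same compile-time-checked format strings as std::format, so it avoids printf's argument-type mismatches, and it is typically faster than equivalent std::cout << chains: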

cpp
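#include <cstdio>   // stderr
#include <fstream>
#include <ostream>  // C++23: std::print/std::println overloads for std::ostream
#include <print>    // C++23
#include <string>

int main() {
    // std::print: format + write to stdout (no trailing newline)
    std::print("Hello, {}!", "World");        // Hello, World!

    // std::println: format + write + newline
    std::println("Server started on port {}", 8080);

    // Print to stderr (any FILE*):
    std::string message = "config file not found";  // sample value
    std::println(stderr, "Error: {}", message);

    // Print to any output stream:
    std::ofstream log("server.log");
    std::string timestamp = "2024-05-01T12:00:00Z";  // sample values
    int status_code = 200;
    std::string path = "/index.html";
    std::println(log, "[{}] {} {}", timestamp, status_code, path);
}

// Performance: std::print typically formats the whole message and hands it to the
// C stream in one write, avoiding the per-operator overhead of std::cout << chains.
// Speedups vary by standard library and workload, so measure before relying on a number.

// Thread safety: a single std::print call emits its message as one unit, so in
// mainstream implementations concurrent print calls on the same stream do not
// interleave mid-message (unlike cout, where separate << operations can interleave)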

Custom std::formatter Specialization

Make your own types work with std::format and std::print:

cpp
#include <cstdint>
#include <format>
#include <print>   // C++23: std::println
#include <string>

struct RGB { std::uint8_t r, g, b; };

// Specialize std::formatter<RGB>
template<>
struct std::formatter<RGB> {
    // Parse the format spec: returning ctx.begin() consumes nothing,
    // so only the empty spec "{}" is accepted
    constexpr auto parse(std::format_parse_context& ctx) {
        return ctx.begin();
    }

    // Format the value as a hex color code
    auto format(const RGB& c, std::format_context& ctx) const {
        return std::format_to(ctx.out(), "#{:02X}{:02X}{:02X}", c.r, c.g, c.b);
    }
};

// More sophisticated formatter with spec handling:
struct Vector2 { float x, y; };

template<>
struct std::formatter<Vector2> {
    bool show_label = true;

    constexpr auto parse(std::format_parse_context& ctx) {
        auto it = ctx.begin();
        if (it != ctx.end() && *it == 'n') {
            show_label = false; // {:n} = no label
            ++it;
        }
        return it; // Anything else left before '}' is reported as a format error
    }

    auto format(const Vector2& v, std::format_context& ctx) const {
        if (show_label) return std::format_to(ctx.out(), "Vec2({:.2f}, {:.2f})", v.x, v.y);
        else            return std::format_to(ctx.out(), "({:.2f}, {:.2f})", v.x, v.y);
    }
};

int main() {
    // Now RGB works everywhere:
    RGB red{255, 0, 0};
    std::println("Background: {}", red);            // "Background: #FF0000"
    std::string s = std::format("Color: {}", red);  // "Color: #FF0000"

    Vector2 pos{1.5f, 2.7f};
    std::println("{}", pos);    // "Vec2(1.50, 2.70)"
    std::println("{:n}", pos);  // "(1.50, 2.70)"
}

std::from_chars and std::to_chars: Zero-Allocation Conversion

std::from_chars and std::to_chars (C++17) are the lowest-overhead string/number conversions in the standard library: no allocation, no locale, no exceptions:

cpp
#include <charconv>
#include <string_view>
#include <system_error>  // std::errc

int main() {
    // to_chars: integer to characters
    char buf[20];
    auto [ptr, ec] = std::to_chars(buf, buf + sizeof(buf), 42);
    if (ec == std::errc{}) {
        std::string_view result{buf, ptr}; // "42", no allocation (iterator-pair constructor, C++20)
    }

    // to_chars: float with fixed precision
    char fbuf[32];
    auto [fptr, fec] = std::to_chars(fbuf, fbuf + sizeof(fbuf), 3.14159,
                                     std::chars_format::fixed, 2);
    // std::string_view{fbuf, fptr} == "3.14"

    // from_chars: characters to integer (no exceptions, no locale interpretation)
    std::string_view s = "12345";
    int value = 0;
    auto [p, e] = std::from_chars(s.data(), s.data() + s.size(), value);
    if (e == std::errc{}) {
        // value == 12345; p points one past the last character consumed
    } else if (e == std::errc::invalid_argument) {
        // No number at the start of the input
    } else if (e == std::errc::result_out_of_range) {
        // Overflow: value is left unchanged
    }
}

// Why use this instead of std::stoi?
// std::stoi:   takes const std::string& (a string_view must be copied into one),
//              throws on failure, and parses via the locale-aware strtol
// from_chars:  no exceptions, locale-independent, zero allocation
// Performance: from_chars is usually considerably faster than stoi/atoi for bulk
//              parsing; the exact factor depends on library and input, so measure
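One pitfall: from_chars succeeds on a prefix, so "12abc" parses as 12 with the returned pointer left at 'a'. A minimal strict wrapper, assuming you want to reject anything that is not entirely an integer; parse_int is a hypothetical name, and std::optional is just one reasonable way to report failure.

```cpp
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical helper: parses the WHOLE view as a base-10 int.
// from_chars alone accepts "12abc" (stopping at 'a'), so we also require p == end.
std::optional<int> parse_int(std::string_view s) {
    int value = 0;
    const char* end = s.data() + s.size();
    auto [p, ec] = std::from_chars(s.data(), end, value);
    if (ec != std::errc{} || p != end) {
        return std::nullopt;  // no digits, overflow, or trailing characters
    }
    return value;
}
```

Note that from_chars also rejects a leading '+' and leading whitespace, unlike stoi, so inputs like "+5" or " 7" fail here too.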

Frequently Asked Questions

When should I use string_view vs const string&? Use std::string_view when the function only reads the string and the caller might pass a literal, a const char*, or part of another string. Use const std::string& when you need a null-terminated buffer (for example, passing .c_str() to a C API) or when the function hands the argument on to code that requires a std::string. In new code, prefer string_view for read-only parameters, as long as the function does not keep the view beyond the lifetime of the string it refers to.

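Is std::format slower than printf at runtime? With a constant format string, std::format checks the string at compile time, and its runtime performance is generally comparable to printf. A format string known only at runtime must go through std::vformat with std::make_format_args; it is parsed at runtime, and errors are reported by throwing std::format_error. For console output, std::print is typically faster than equivalent std::cout formatted output because it formats the complete message in one pass instead of a chain of operator<< calls. For raw number formatting, std::to_chars followed by a direct write is usually the fastest standard option.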

How do I handle UTF-8 strings in C++? The standard library does not understand Unicode grapheme clusters. The practical approach: store UTF-8 text in std::string as raw bytes (size() counts bytes, not characters), use a library such as ICU or utfcpp for code-point and grapheme-aware operations, and use std::format / std::print for output. When the ordinary literal encoding is UTF-8, std::print writes to a Unicode-capable terminal, including the Windows console, through the native Unicode API.
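To see why size() is not a character count, here is a minimal sketch that counts code points by skipping UTF-8 continuation bytes. It assumes well-formed input and is not a substitute for ICU or utfcpp: it counts code points, not grapheme clusters. count_code_points is a hypothetical name.

```cpp
#include <cstddef>
#include <string_view>

// Counts code points in well-formed UTF-8 by skipping continuation bytes (10xxxxxx).
// Does NOT count grapheme clusters: "e" + combining acute accent counts as 2.
std::size_t count_code_points(std::string_view utf8) {
    std::size_t count = 0;
    for (char c : utf8) {
        if ((static_cast<unsigned char>(c) & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For "café" written as the bytes "caf\xC3\xA9", size() is 5 but count_code_points returns 4.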


Key Takeaway

Modern C++ text handling is organized by ownership: std::string owns its characters, std::string_view only observes them. For output in C++23, prefer std::print/std::println: format strings are checked at compile time, output is typically faster than std::cout chains, and printf's format/argument mismatches disappear. For numeric conversion, from_chars/to_chars are the zero-allocation workhorses. Custom std::formatter specializations make your types first-class citizens of the whole formatting ecosystem.



Part of the C++ Mastery Course — 30 modules from modern C++ basics to expert systems engineering.