C++ Strings & Text: string_view, std::format, std::print, Unicode & Custom Formatters (C++23)

Table of Contents
- std::string Internals: SSO and Heap Allocation
- std::string_view: Zero-Copy String Observation
- string_view Pitfalls: Lifetime and Null Termination
- std::format: Modern Type-Safe Formatting (C++20)
- Format Specifiers: Alignment, Precision & Fill
- std::print and std::println: Direct Output (C++23)
- Custom std::formatter Specialization
- std::from_chars and std::to_chars: Zero-Allocation Conversion
- String Manipulation: Split, Trim, Replace
- Unicode and UTF-8 in C++
- Frequently Asked Questions
- Key Takeaway
std::string Internals: SSO and Heap Allocation
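Mainstream implementations of std::string use Small String Optimization (SSO): short strings (up to 15 chars in libstdc++ and MSVC, 22 in libc++ on 64-bit targets; the threshold is implementation-defined) are stored in a buffer inside the string object itself, wherever that object lives, with no heap allocation: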
#include <string>
#include <iostream>
// SSO: no heap allocation for short strings
std::string short_str = "Hello"; // Stored inline — 0 malloc calls
std::string long_str = "Hello, World! This string is too long for SSO."; // Heap allocation
// Checking string capacity:
std::string s;
std::cout << "Empty capacity: " << s.capacity() << '\n'; // Typically 15 (SSO buffer)
s.reserve(1000);
std::cout << "After reserve: " << s.capacity() << '\n'; // At least 1000
// Avoid reallocations in loops: reserve() first.
// (`lines` is assumed to be a container of std::string, e.g. std::vector<std::string>.)
std::string result;
result.reserve(lines.size() * 80); // Estimate final size (~80 chars per line)
for (const auto& line : lines) {
    result += line;
    result += '\n';
}
std::string_view: Zero-Copy String Observation
std::string_view is a non-owning, read-only reference to a character sequence. It holds a pointer + length — no allocation, no copy:
#include <iostream>
#include <string>
#include <string_view>
// OLD: a literal or const char* argument must first be converted to a temporary std::string (possible allocation + copy)
void process_old(const std::string& name) { /* reads name only */ }
// MODERN: accepts std::string, string_view, const char*, or a literal with no copy
void process(std::string_view name) {
    std::cout << name << '\n'; // Just reads the chars; no allocation
    auto first_space = name.find(' ');
    if (first_space != std::string_view::npos) {
        auto first_name = name.substr(0, first_space); // Returns a string_view, not a std::string
        // ... use first_name ...
    }
}
// Called with ANY string type — no conversion overhead:
process("Alice Smith"); // const char* — zero copy
process(std::string("Alice")); // std::string — zero copy (view)
std::string_view sv{"Bob Jones"};
process(sv); // string_view — zero copy
// Key operations (all return string_view — zero copy):
std::string_view url = "https://topictrick.com/blog/cpp-mastery";
auto protocol = url.substr(0, 5); // "https"
auto after_sep = url.substr(url.rfind('/') + 1); // "cpp-mastery"
bool has_https = url.starts_with("https"); // C++20
bool has_blog = url.contains("blog"); // C++23
string_view Pitfalls: Lifetime and Null Termination
// DANGER: string_view outliving its source
std::string_view get_view() {
    std::string local = "hello";
    return local; // DANGLING! local destroyed at function exit
}
// DANGER: string_view from temporary
std::string_view sv = std::string("temp"); // sv is dangling immediately!
// SAFE: string_view from a stable source
const std::string global_name = "Alice";
std::string_view safe_view = global_name; // OK: global_name outlives safe_view
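// string_view is NOT guaranteed to be null-terminated!
std::string_view sv2 = "hello world";
auto first_word = sv2.substr(0, 5); // string_view "hello"
// printf("%s", first_word.data()); // WRONG: prints "hello world" here (data() ignores the view's length);
//                                  // undefined behavior if the underlying buffer has no null terminator
// Use: printf("%.*s", static_cast<int>(first_word.size()), first_word.data());
// Or:  std::string(first_word).c_str() // Copy into a std::string when a null terminator is needed
std::format: Modern Type-Safe Formatting (C++20)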
std::format provides Python/Rust-style format strings with compile-time format string validation:
#include <format> // C++20
#include <string>
// Basic usage:
std::string s = std::format("Hello, {}!", "World"); // "Hello, World!"
std::string n = std::format("Value: {}", 42); // "Value: 42"
std::string f = std::format("Pi: {:.4f}", 3.14159); // "Pi: 3.1416"
// Positional arguments (critical for i18n/l10n):
std::string greeting = std::format("{0} has {1} points! {0} wins!", "Alice", 95);
// "Alice has 95 points! Alice wins!"
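// Type-safe: the format string is checked against the argument types at compile time
// struct Opaque {};
// std::format("{}", Opaque{}); // Compile error: no std::formatter<Opaque>
// (A std::vector<int> is also rejected in C++20, but is formattable in C++23 via range formatting.)
// std::format_to: write to an output iterator (no intermediate std::string; needs <iterator> for back_inserter)
std::string buffer;
buffer.reserve(256);
std::format_to(std::back_inserter(buffer), "API response: {} {}", 200, "OK");
Format Specifiers: Alignment, Precision & Fill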
// Alignment and fill:
// {:>N} right-align in N characters
// {:<N} left-align in N characters
// {:^N} center in N characters
// {:X>N} fill with 'X', right-aligned
auto col1 = std::format("{:>10}", "hello"); // "     hello"
auto col2 = std::format("{:<10}", "hello"); // "hello     "
auto col3 = std::format("{:^10}", "hello"); // "  hello   "
auto col4 = std::format("{:*^10}", "hello"); // "**hello***"
// Numbers:
auto dec = std::format("{:d}", 255); // "255" (decimal)
auto hex = std::format("{:X}", 255); // "FF" (hex uppercase)
auto oct = std::format("{:o}", 255); // "377" (octal)
auto bin = std::format("{:b}", 255); // "11111111" (binary)
auto sci = std::format("{:e}", 1e9); // "1.000000e+09"
auto fix = std::format("{:.2f}", 3.14159); // "3.14"
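auto pct = std::format("{:.1f}%", 0.875 * 100); // "87.5%" (std::format has no '%' type; scale manually)
// Width from a variable (runtime width); strings are left-aligned by default, so request '>' explicitly:
int width = 12;
auto dynamic = std::format("{:>{}}", "hello", width); // "       hello"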
// Print a pretty table (`leaderboard` is assumed to be a range of name/score pairs):
for (const auto& [name, score] : leaderboard) {
    std::cout << std::format("{:<15} {:>6}\n", name, score);
}
// Alice               95
// Bob                 82
// Carol               78
std::print and std::println: Direct Output (C++23)
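std::print (C++23) formats and writes in one call. It is type-safe like std::format, rules out printf-style mismatches between format and arguments, and is typically faster than formatted output through std::cout: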
#include <print> // C++23
// std::print: format + write to stdout (no trailing newline)
std::print("Hello, {}!", "World"); // Hello, World!
// std::println: format + write + newline
std::println("Server started on port {}", 8080);
// Print to stderr:
std::println(stderr, "Error: {}", message);
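// Print to a std::ostream (this overload is declared in <ostream>; include <fstream> for std::ofstream):
std::ofstream log("server.log");
std::println(log, "[{}] {} {}", timestamp, status_code, path);
// Performance: std::print formats the whole message first and writes it in one operation,
// avoiding the per-operator overhead of chained << calls on std::cout.
// Speedups of roughly 3-5x over cout are reported for formatted output in tight loops,
// but the ratio depends on the workload and implementation, so measure on your own toolchain.
// Interleaving: because each call writes its fully formatted text in one operation, output from
// concurrent threads is in practice far less likely to interleave mid-message than chained << calls on cout.
Custom std::formatter Specialization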
Make your own types work with std::format and std::print:
#include <cstdint>
#include <format>
struct RGB { std::uint8_t r, g, b; };
// Specialize std::formatter<RGB>
template<>
struct std::formatter<RGB> {
    // Parse the format spec (empty means default)
    constexpr auto parse(std::format_parse_context& ctx) {
        return ctx.begin(); // Accept empty format spec only
    }
    // Format the value
    auto format(const RGB& c, std::format_context& ctx) const {
        return std::format_to(ctx.out(), "#{:02X}{:02X}{:02X}", c.r, c.g, c.b);
    }
};
// Now RGB works everywhere:
RGB red{255, 0, 0};
std::println("Background: {}", red); // "Background: #FF0000"
std::string s = std::format("Color: {}", red); // "Color: #FF0000"
// More sophisticated formatter with spec handling:
struct Vector2 { float x, y; };
template<>
struct std::formatter<Vector2> {
    bool show_label = true;
    constexpr auto parse(std::format_parse_context& ctx) {
        auto it = ctx.begin();
        if (it != ctx.end() && *it == 'n') {
            show_label = false; // {:n} = no label
            ++it;
        }
        return it;
    }
    auto format(const Vector2& v, std::format_context& ctx) const {
        if (show_label) return std::format_to(ctx.out(), "Vec2({:.2f}, {:.2f})", v.x, v.y);
        else return std::format_to(ctx.out(), "({:.2f}, {:.2f})", v.x, v.y);
    }
};
Vector2 pos{1.5f, 2.7f};
std::println("{}", pos); // "Vec2(1.50, 2.70)"
std::println("{:n}", pos); // "(1.50, 2.70)"
std::from_chars and std::to_chars: Zero-Allocation Conversion
std::from_chars and std::to_chars are the standard library's lowest-overhead conversions between numbers and text. They never allocate, ignore the locale, and report errors through a return value instead of exceptions:
#include <charconv>
// to_chars: integer to string
char buf[20];
auto [ptr, ec] = std::to_chars(buf, buf + sizeof(buf), 42);
if (ec == std::errc{}) {
    std::string_view result{buf, ptr}; // "42", no allocation (iterator-pair constructor, C++20)
}
// to_chars: float with precision
char fbuf[32];
auto [fptr, fec] = std::to_chars(fbuf, fbuf + sizeof(fbuf), 3.14159,
std::chars_format::fixed, 2);
// result: "3.14"
// from_chars: string to integer (no exceptions, no locale interpretation)
std::string_view s = "12345";
int value = 0; // left unmodified if parsing fails
auto [p, e] = std::from_chars(s.data(), s.data() + s.size(), value);
if (e == std::errc{}) {
    // value == 12345, parsed successfully (p points one past the last digit consumed)
} else if (e == std::errc::invalid_argument) {
    // No digits at the start of the input
} else if (e == std::errc::result_out_of_range) {
    // Overflow
}
// Why use this instead of std::stoi?
// std::stoi: throws exceptions, honors the C locale, needs a std::string argument
// from_chars: no exceptions, locale-independent, zero allocation
// from_chars is usually several times faster than stoi for bulk parsing; exact ratios depend on the implementation and input
Frequently Asked Questions
When should I use string_view vs const string&?
Use std::string_view when the function only reads the string and the caller might pass a literal, a const char*, or part of a larger string. Use const std::string& when the function needs a null-terminated string (for example, to call a C API through .c_str()) or forwards the argument to code that requires a std::string. In new code, default to string_view for read-only parameters, keeping in mind that it guarantees neither null termination nor the lifetime of the underlying characters.
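As a rough illustration of that split, the sketch below pairs a read-only helper taking std::string_view with a wrapper that needs a null-terminated string. The names count_words and c_length are hypothetical, and std::strlen merely stands in for any C API that reads until a terminator.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <string_view>

// Read-only access: string_view accepts literals, std::string, and substrings without copying.
// (count_words is a hypothetical helper for illustration.)
std::size_t count_words(std::string_view text) {
    std::size_t words = 0;
    bool in_word = false;
    for (char ch : text) {
        if (ch == ' ') {
            in_word = false;
        } else if (!in_word) {
            in_word = true;
            ++words;
        }
    }
    return words;
}

// Needs a null terminator: take const std::string& so c_str() is available.
// std::strlen stands in for any C API that reads until '\0'.
std::size_t c_length(const std::string& text) {
    return std::strlen(text.c_str());
}
```

Calling count_words with a literal needs no temporary std::string, whereas calling c_length with a literal constructs one first.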
Is std::format slower than printf at runtime?
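For constant format strings, std::format validates the format string at compile time, and its runtime performance is generally comparable to printf, sometimes better. Format strings that are only known at run time go through std::vformat, which parses and validates them on every call and reports errors by throwing std::format_error. std::print is usually faster than chained << output to std::cout because it formats the whole message first and writes it in one operation. For raw number formatting, std::to_chars into a stack buffer followed by a single write is typically the lowest-overhead option.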
How do I handle UTF-8 strings in C++?
C++ has no native notion of Unicode code points or grapheme clusters. The practical approach: treat std::string as a container of raw UTF-8 bytes (size() and indexing count bytes, not characters), use a library such as ICU or utfcpp for code-point and grapheme-aware operations, and rely on std::format / std::print to pass UTF-8 text through unchanged, which displays correctly on terminals configured for UTF-8.
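To make the byte-versus-character distinction concrete, the sketch below counts code points in a UTF-8 string by skipping continuation bytes (those of the form 10xxxxxx). It assumes well-formed UTF-8 and counts code points, not grapheme clusters, so it is no substitute for ICU or utfcpp; utf8_code_points is a hypothetical name.

```cpp
#include <cstddef>
#include <string_view>

// Counts Unicode code points in well-formed UTF-8 by counting every byte
// that is NOT a continuation byte (continuation bytes look like 10xxxxxx).
// Hypothetical helper: no validation, and no grapheme-cluster awareness.
std::size_t utf8_code_points(std::string_view bytes) {
    std::size_t count = 0;
    for (unsigned char byte : bytes) {
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For example, the string "h\xC3\xA9llo" (the word "héllo" spelled out as UTF-8 bytes) has a size() of 6 but contains 5 code points.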
Key Takeaway
Modern C++ text handling is organized around ownership: std::string owns, std::string_view observes. For output, the C++23 default is std::print/std::println: type-safe, fast, and written as a single operation per call. For numeric conversion, from_chars/to_chars are the low-overhead workhorses. Custom std::formatter specializations make your types first-class citizens in the entire formatting ecosystem.
Part of the C++ Mastery Course — 30 modules from modern C++ basics to expert systems engineering.
