WriteStream Concept Design
Overview
This document describes the design of the WriteStream concept: the
fundamental partial-write primitive in the concept hierarchy. It explains
why write_some is the correct building block, how algorithms expressed
directly in terms of write_some can outperform composed complete-write
algorithms like write_now, and when each approach is appropriate.
Definition
template<typename T>
concept WriteStream =
    requires(T& stream, const_buffer_archetype buffers)
    {
        { stream.write_some(buffers) } -> IoAwaitable;
        requires awaitable_decomposes_to<
            decltype(stream.write_some(buffers)),
            std::error_code, std::size_t>;
    };
A WriteStream provides a single operation:
write_some(buffers) — Partial Write
Writes one or more bytes from the buffer sequence. Returns
(ec, n) of types (std::error_code, std::size_t), where ec is the result
code and n is the number of bytes written.
Semantics
- On success: !ec, n >= 1 and n <= buffer_size(buffers).
- On error: ec, n == 0.
- If buffer_empty(buffers): completes immediately, !ec, n == 0.
The caller must not assume that all bytes are consumed. write_some
may write fewer bytes than offered. This is the defining property of a
partial-write primitive.
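The contract above can be sketched with a synchronous stand-in. This is a minimal illustration, not the library's API: mock_stream, max_per_call, fail_with, and received are hypothetical names, a std::string_view stands in for the buffer sequence, and the awaitable layer is omitted so the three cases are easy to see.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <system_error>
#include <utility>

// Hypothetical synchronous stand-in for a WriteStream. The real concept
// returns an awaitable; this sketch returns the (ec, n) pair directly.
struct mock_stream
{
    std::size_t max_per_call;   // most bytes accepted by one call (>= 1)
    std::error_code fail_with;  // if set, every non-empty write fails
    std::string received;       // bytes accepted so far

    std::pair<std::error_code, std::size_t>
    write_some(std::string_view buffers)
    {
        // Empty input: completes immediately with !ec and n == 0
        if(buffers.empty())
            return {std::error_code{}, std::size_t{0}};
        // Error: ec is set and n == 0
        if(fail_with)
            return {fail_with, std::size_t{0}};
        // Success: 1 <= n <= buffers.size(), possibly fewer than offered
        std::size_t n = std::min(buffers.size(), max_per_call);
        received.append(buffers.substr(0, n));
        return {std::error_code{}, n};
    }
};
```

Offering "hello world" to a mock_stream with max_per_call set to 4 accepts only "hell"; the caller is responsible for presenting the rest again.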
Concept Hierarchy
WriteStream is the base of the write-side hierarchy:
WriteStream { write_some }
    |
    v
WriteSink   { write_some, write, write_eof(buffers), write_eof() }
Every WriteSink is a WriteStream. Algorithms constrained on
WriteStream accept both raw streams and sinks. The WriteSink
concept adds complete-write and EOF signaling on top of the partial-write
primitive. See the WriteSink design document for details.
Composed Algorithms
Two composed algorithms build complete-write behavior on top of
write_some:
write (free function)
auto write(WriteStream auto& stream,
           ConstBufferSequence auto const& buffers)
    -> io_task<std::size_t>;
Loops write_some until the entire buffer sequence is consumed. It always
suspends, because it returns a task, and it does not cache its coroutine
frame.
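The loop itself is short. The following is a minimal synchronous sketch of the idea, not the library's implementation: capped_stream and write_all are hypothetical names, a std::string_view stands in for the buffer sequence, and coroutines are left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <system_error>
#include <utility>

// Hypothetical stream that accepts at most `cap` bytes per call.
struct capped_stream
{
    std::size_t cap;
    std::string received;
    int calls = 0;

    std::pair<std::error_code, std::size_t>
    write_some(std::string_view b)
    {
        ++calls;
        std::size_t n = std::min(b.size(), cap);
        received.append(b.substr(0, n));
        return {std::error_code{}, n};
    }
};

// Complete-write loop: repeat write_some until everything is consumed
// or an error occurs. Returns the error (if any) and the total written.
template<class Stream>
std::pair<std::error_code, std::size_t>
write_all(Stream& stream, std::string_view data)
{
    std::size_t total = 0;
    while(!data.empty())
    {
        auto [ec, n] = stream.write_some(data);
        if(ec)
            return {ec, total};
        data.remove_prefix(n);  // advance past the bytes actually written
        total += n;
    }
    return {std::error_code{}, total};
}
```

Writing 8 bytes through a stream capped at 3 bytes per call takes three iterations (3, 3, then 2 bytes), and the last one carries a short remainder.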
write_now (class template)
template<WriteStream Stream>
class write_now
{
public:
    explicit write_now(Stream& s) noexcept;
    IoAwaitable auto operator()(ConstBufferSequence auto buffers);
};
Loops write_some until the entire buffer sequence is consumed, with
two advantages over the free function:
- Eager completion: if every write_some returns synchronously (its await_ready returns true), the entire operation completes in await_ready with zero coroutine suspensions.
- Frame caching: the internal coroutine frame is allocated once and reused across calls.
Buffer Top-Up: Why write_some Can Outperform write_now
The critical design insight behind write_some as a primitive is that
the caller retains control after each partial write. This enables a
pattern called buffer top-up: after a partial write consumes some data,
the caller refills the buffer before the next write, keeping the buffer
as full as possible. This maximizes the payload of each system call.
A composed algorithm like write_now cannot do this. It receives a fixed
buffer sequence and drains it to completion. When the kernel accepts only
part of the data, write_now must send the remainder in a second call, even
though the remainder may be small. The caller has no opportunity to
read more data from the source between iterations.
Diagram: Relaying 100KB from a ReadSource through a TCP Socket
Consider relaying 100KB from a ReadSource to a TCP socket. The kernel’s
send buffer accepts at most 40KB per call. Compare two approaches:
Approach A: write_some with Top-Up (3 syscalls)
        buffer contents              syscall        kernel accepts
Step 1: [======== 64KB ========]     write_some --> 40KB, read 36KB (source done)
Step 2: [======= 60KB =======]       write_some --> 40KB
Step 3: [== 20KB ==]                 write_some --> 20KB
done. 100KB in 3 syscalls, no small mid-stream remainders.
Approach B: write_now Without Top-Up (4 syscalls)
        buffer contents              syscall        kernel accepts
Step 1: [======== 64KB ========]     write_some --> 40KB (write_now, read 64KB)
Step 2: [=== 24KB ===]               write_some --> 24KB (write_now, small payload)
Step 3: [====== 36KB ======]         write_some --> 20KB (write_now, read 36KB)
Step 4: [== 16KB ==]                 write_some --> 16KB (write_now, small payload)
done. 100KB in 4 syscalls, two calls undersized.
Every time write_now partially drains a buffer, the remainder is a
small payload that wastes a syscall. With top-up, the caller refills
the ring buffer between calls, keeping each syscall near capacity.
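The two strategies can be compared with a small counting model. This is a sketch under a simplifying assumption that the document does not make: every call accepts exactly min(offered, accept_cap). The function names are hypothetical, sizes are in KB, and both capacities must be greater than zero.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Strategy A: refill the buffer from the source after every partial write.
std::size_t calls_with_topup(std::size_t total, std::size_t buf_cap,
                             std::size_t accept_cap)
{
    std::size_t source = total, buffered = 0, calls = 0;
    while(source + buffered > 0)
    {
        // Top-up: move as much as fits from the source into the buffer
        std::size_t fill = std::min(buf_cap - buffered, source);
        buffered += fill;
        source -= fill;
        // One write_some: the kernel takes up to accept_cap
        buffered -= std::min(buffered, accept_cap);
        ++calls;
    }
    return calls;
}

// Strategy B: read a chunk, then drain it completely before reading again.
std::size_t calls_with_drain(std::size_t total, std::size_t buf_cap,
                             std::size_t accept_cap)
{
    std::size_t source = total, calls = 0;
    while(source > 0)
    {
        std::size_t chunk = std::min(buf_cap, source);
        source -= chunk;
        while(chunk > 0)
        {
            // Each remainder, however small, costs one more call
            chunk -= std::min(chunk, accept_cap);
            ++calls;
        }
    }
    return calls;
}
```

With a 64KB buffer and a fixed 40KB cap, a 1000KB transfer takes 25 calls with top-up and 31 when each chunk is drained first. For exactly 100KB the fixed-cap model gives 3 calls either way; the fourth call in Approach B comes from step 3 accepting only 20KB of the 36KB offered.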
Code: write_some with Buffer Top-Up
This example reads from a ReadSource and writes to a WriteStream
using a circular_dynamic_buffer. After each partial write frees space
in the ring buffer, the caller reads more data from the source to refill
it before calling write_some again.
template<ReadSource Source, WriteStream Stream>
task<> relay_with_topup(Source& src, Stream& dest)
{
    char storage[65536];
    circular_dynamic_buffer cb(storage, sizeof(storage));
    for(;;)
    {
        // Fill: read from source into free space
        auto mb = cb.prepare(cb.capacity());
        auto [rec, nr] = co_await src.read(mb);
        cb.commit(nr);
        if(rec && rec != cond::eof && nr == 0)
            co_return;
        // Drain: write_some from the ring buffer
        while(cb.size() > 0)
        {
            auto [wec, nw] = co_await dest.write_some(
                cb.data());
            if(wec)
                co_return;
            // consume only what was written
            cb.consume(nw);
            // Top-up: refill freed space before next
            // write_some, so the next call presents
            // the largest possible payload. Skip the
            // refill once the source has reported EOF
            // or an error.
            if(cb.capacity() > 0 && !rec)
            {
                auto mb2 = cb.prepare(cb.capacity());
                auto [rec2, nr2] = co_await src.read(mb2);
                cb.commit(nr2);
                rec = rec2;
            }
            // write_some now sees a full (or nearly full)
            // ring buffer, maximizing the syscall payload
        }
        // Buffer is drained; stop if the source is
        // finished (EOF or error)
        if(rec)
            co_return;
    }
}
After write_some accepts 40KB of a 64KB buffer, consume(40KB) frees
40KB. The caller immediately reads more data from the source into that
freed space. As long as the source still has at least 40KB to give, the
next write_some again presents a full 64KB payload.
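The consume-then-refill cycle can be checked byte for byte with a toy ring buffer. This is a minimal sketch: ring and relay are hypothetical names and do not reflect the interface of circular_dynamic_buffer, the sink is modeled as a string that accepts at most `accept` bytes per call, and the capacity must be greater than zero.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical minimal ring buffer, for illustration only.
class ring
{
    std::vector<char> store_;
    std::size_t head_ = 0;  // index of the oldest readable byte
    std::size_t size_ = 0;  // number of readable bytes

public:
    explicit ring(std::size_t cap) : store_(cap) {}

    std::size_t size() const { return size_; }
    std::size_t free_space() const { return store_.size() - size_; }

    // Append as much of `in` as fits; returns the number of bytes taken.
    std::size_t push(std::string_view in)
    {
        std::size_t n = std::min(in.size(), free_space());
        for(std::size_t i = 0; i < n; ++i)
            store_[(head_ + size_ + i) % store_.size()] = in[i];
        size_ += n;
        return n;
    }

    // Copy out up to `max` readable bytes without consuming them.
    std::string peek(std::size_t max) const
    {
        std::string out;
        std::size_t n = std::min(max, size_);
        for(std::size_t i = 0; i < n; ++i)
            out.push_back(store_[(head_ + i) % store_.size()]);
        return out;
    }

    // Drop the first n readable bytes, freeing their space.
    void consume(std::size_t n)
    {
        n = std::min(n, size_);
        head_ = (head_ + n) % store_.size();
        size_ -= n;
    }
};

// Relay `src` through a ring of `cap` bytes to a sink that accepts at
// most `accept` bytes per call; returns the number of write calls.
std::size_t relay(std::string_view src, std::size_t cap,
                  std::size_t accept, std::string& out)
{
    ring rb(cap);
    std::size_t calls = 0;
    src.remove_prefix(rb.push(src));         // initial fill
    while(rb.size() > 0)
    {
        std::string sent = rb.peek(accept);  // one partial write
        out += sent;
        rb.consume(sent.size());             // consume only what was written
        ++calls;
        src.remove_prefix(rb.push(src));     // top-up the freed space
    }
    return calls;
}
```

Relaying 100 bytes through a 64-byte ring with a 40-byte cap takes three calls and delivers the bytes in their original order, including the portion that wraps around the end of the storage.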
Code: write_now Without Top-Up
This example reads from a ReadSource and writes to a WriteStream
using write_now. Each chunk is drained to completion before the caller
can read more from the source.
template<ReadSource Source, WriteStream Stream>
task<> relay_with_write_now(Source& src, Stream& dest)
{
    char buf[65536];
    write_now wn(dest);
    for(;;)
    {
        // Read a chunk from the source
        auto [rec, nr] = co_await src.read(
            mutable_buffer(buf, sizeof(buf)));
        // Nothing to send and the source is finished
        // (EOF or error)
        if(rec && nr == 0)
            co_return;
        // write_now drains the chunk to completion.
        // If the kernel accepts 40KB of 64KB, write_now
        // internally calls write_some(24KB) for the
        // remainder -- a small write that wastes a
        // syscall. The caller cannot top up between
        // write_now's internal iterations.
        auto [wec, nw] = co_await wn(
            const_buffer(buf, nr));
        if(wec)
            co_return;
        // Final chunk was sent; stop on EOF or error
        if(rec)
            co_return;
    }
}
After the kernel accepts 40KB of a 64KB chunk, write_now must send
the remaining 24KB in a second write_some. The caller cannot intervene
to refill the buffer because write_now owns the loop. That 24KB write
wastes an opportunity to send a full 64KB payload.
When to Use Each Approach
| Approach | Best For | Trade-off |
|---|---|---|
| write_some with buffer top-up | High-throughput relays, producer-consumer loops where the caller has more data available and can top up after partial writes. | Caller manages the loop and buffer refill. |
| write_now | Writing discrete complete payloads (a single HTTP header, a serialized message) where there is no additional data to top up with, or where the write is expected to complete in one call. | Cannot top up between iterations. Small remainders waste syscall payloads. |
| WriteSink::write | Sink-oriented code where the concrete type implements complete-write natively (buffered writer, file, compressor) and the caller does not manage the loop. | Requires a WriteSink rather than a plain WriteStream. |
Rule of Thumb
- If the caller reads from a source and relays to a raw byte stream (TCP socket), use write_some with a circular_dynamic_buffer for buffer top-up.
- If the caller has a discrete, bounded payload and wants zero-fuss complete-write semantics, use write_now.
- If the destination is a WriteSink, use write directly.
Conforming Types
Examples of types that satisfy WriteStream:
- TCP sockets: write_some maps to a single send() or WSASend() call. Partial writes are normal under load.
- TLS streams: write_some encrypts and sends one TLS record.
- Buffered write streams: write_some appends to an internal buffer and returns immediately when space is available, or drains to the underlying stream when full.
- QUIC streams: write_some sends one or more QUIC frames.
- Test mock streams: write_some records data and returns configurable results for testing.
All of these types also naturally extend to WriteSink by adding
write, write_eof(buffers), and write_eof().
Relationship to ReadStream
The read-side counterpart is ReadStream, which requires read_some.
The same partial-transfer / composed-algorithm decomposition applies:
| Write Side | Read Side |
|---|---|
| WriteStream | ReadStream |
| write_some | read_some |
| WriteSink | ReadSource |
The asymmetry is that the read side does not have a read_now with
eager completion. Reads depend on data arriving from the network, so
the synchronous fast path pays off less reliably than it does for
writes into a buffered stream.
Summary
WriteStream provides write_some as the single partial-write
primitive. This is deliberately minimal:
- Algorithms that need complete-write semantics use write_now (for WriteStream) or write (for WriteSink).
- Algorithms that need maximum throughput use write_some directly with buffer top-up, achieving fewer syscalls than composed algorithms by keeping the buffer full between iterations.
- The concept is the base of the hierarchy. WriteSink refines it by adding write, write_eof(buffers), and write_eof().
The choice between write_some, write_now, and WriteSink::write
is a throughput-versus-convenience trade-off. write_some gives the
caller maximum control. write_now gives the caller maximum simplicity.
WriteSink::write gives the concrete type maximum implementation
freedom.