Run API Design
Overview
This document explains the naming conventions and call syntax of the
two launcher functions: run_async (fire-and-forget from non-coroutine
code) and run (awaitable within a coroutine). Both accept any type
satisfying IoLaunchableTask — not just task<T> — and use a
deliberate two-phase invocation pattern — f(context)(task) — that
exists for a mechanical reason rooted in coroutine frame allocation
timing.
Usage
run_async — Fire-and-Forget Launch
run_async launches any IoLaunchableTask from non-coroutine code:
main(), callback handlers, event loops. task<T> is the most common
conforming type, but any user-defined type satisfying the concept works.
The function does not return a value to the caller. Handlers receive the
task’s result or exception after completion.
// Executor only (uses default recycling allocator)
run_async(ex)(my_task());
// With a stop token for cooperative cancellation
std::stop_source source;
run_async(ex, source.get_token())(cancellable_task());
// With a custom memory resource
run_async(ex, my_pool)(my_task());
// With a result handler
run_async(ex, [](int result) {
std::cout << "Got: " << result << "\n";
})(compute_value());
// With separate success and error handlers
run_async(ex,
[](int result) { std::cout << "Got: " << result << "\n"; },
[](std::exception_ptr ep) { /* handle error */ }
)(compute_value());
// Full: executor, stop_token, allocator, success handler, error handler
run_async(ex, st, alloc, h1, h2)(my_task());
run — Awaitable Launch Within a Coroutine
run is the coroutine-side counterpart. It binds any
IoLaunchableTask to a (possibly different) executor and returns
the result to the caller via co_await. It also supports overloads
that customize the stop token or allocator while inheriting the caller’s
executor.
// Switch to a different executor for CPU-bound work
task<void> parent()
{
int result = co_await run(worker_ex)(compute_on_worker());
// Completion returns to parent's executor
}
// Customize stop token, inherit caller's executor
task<void> with_timeout()
{
std::stop_source source;
co_await run(source.get_token())(subtask());
}
// Customize allocator, inherit caller's executor
task<void> with_custom_alloc()
{
co_await run(my_alloc)(subtask());
}
// Switch executor AND customize allocator
task<void> full_control()
{
co_await run(worker_ex, my_alloc)(cpu_bound_task());
}
Alternatives Considered
Several alternative naming and syntax proposals were evaluated and discarded. The following table shows each rejected form alongside the chosen form.
Builder Pattern: on / with / spawn / call
| Rejected | Chosen |
|---|---|
| capy::on(ex).spawn(my_task()) | run_async(ex)(my_task()) |
| co_await capy::on(ex).call(compute()) | co_await run(ex)(compute()) |
| co_await capy::with(st).call(subtask()) | co_await run(st)(subtask()) |
| co_await capy::with(alloc).call(subtask()) | co_await run(alloc)(subtask()) |
| capy::on(ex).block(my_task()) | capy::test::run_blocking (test utility) |
What this looks like in practice:
// Rejected: builder pattern
capy::on(ex).spawn(my_task());
co_await capy::on(worker_ex).call(compute());
co_await capy::with(my_alloc).call(subtask());
// Chosen: two-phase invocation
run_async(ex)(my_task());
co_await run(worker_ex)(compute());
co_await run(my_alloc)(subtask());
The builder pattern reads well as English, but it creates problems in C++ practice. See Why Not a Builder Pattern below for the full analysis.
Single-Call with Named Method
// Rejected: single-call
run_async(ex, my_task());
This fails the allocator timing constraint entirely. The task
argument my_task() is evaluated before run_async can set
the thread-local allocator. The coroutine frame is allocated with
the wrong (or no) allocator. This is not a style preference — it
is a correctness bug.
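To make the ordering concrete, here is the rejected form annotated with where the frame allocation would land (an illustrative sketch, not library code):

// Rejected single-call form, annotated
run_async(ex, my_task());
//            ^~~~~~~~~ evaluated as an ordinary argument, so the coroutine
//            frame is allocated here, before run_async's body runs and can
//            install the thread-local allocator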
Named Method on Wrapper
// Rejected: named method instead of operator()
run_async(ex).spawn(my_task());
co_await run(ex).call(compute());
This preserves the two-phase timing guarantee and avoids the
namespace collision problems of on/with. The objection is minor:
.spawn() and .call() add vocabulary without adding clarity. The
wrapper already has exactly one purpose — accepting a task. A named
method implies the wrapper has a richer interface than it does.
operator() is the conventional C++ way to express "this object
does exactly one thing." That said, this alternative has legs and
could be revisited if the ()() syntax proves too confusing in
practice.
The Names
Why run
The run prefix was chosen for several reasons:
- Greppability. Searching for run_async( or run( in a codebase produces unambiguous results. Short, common English words like on or with collide with local variable names, parameter names, and other libraries. A using namespace capy; combined with a local variable named on produces silent shadowing bugs.

- Verb clarity. run tells you what happens: something executes. run_async tells you it executes without waiting. run inside a coroutine tells you control transfers and returns. Prepositions like on and with say nothing about the action — they are sentence fragments waiting for a verb.

- Discoverability. The run_* family groups together in documentation, autocompletion, and alphabetical listings. Users searching for "how do I launch a task" find run_async and run as a coherent pair.

- Consistency. The naming follows the established pattern from io_context::run(), std::jthread, and other C++ APIs where run means "begin executing work."

- No false promises. A builder-pattern syntax like on(ex).spawn(t) implies composability — on(ex).with(alloc).call(t) — that the API does not deliver. The f(x)(t) pattern is honest about being exactly two steps, no more. It does not invite users to chain methods that do not exist.
Why Not a Builder Pattern
An alternative proposal suggested replacing the two-call syntax with a builder-style API:
// Rejected builder pattern
capy::on(ex).spawn(my_task());
co_await capy::on(ex).call(compute());
co_await capy::with(st).call(subtask());
co_await capy::with(alloc).call(subtask());
capy::on(ex).block(my_task());
While the English readability of on(ex).spawn(t) is genuinely
appealing, the approach has practical problems in a Boost library:
- Namespace pollution. on and with are among the most common English words in programming. In a Boost library used alongside dozens of other namespaces, these names invite collisions. Consider what happens with using namespace capy;:

      int on = 42;                  // local variable
      on(ex).spawn(my_task());      // ambiguous: variable or function?

      void handle(auto with) {      // parameter name
          with(alloc).call(sub());  // won't compile
      }

  The names run and run_async do not have this problem. No one names their variables run_async.

- Semantic ambiguity. with(st) versus with(alloc) — with what, exactly? The current API uses run(st) and run(alloc), where overload resolution disambiguates naturally because the verb run provides context. A bare preposition provides none.

      // What does "with" mean here? Stop token or allocator?
      co_await capy::with(x).call(subtask());

      // "run" provides a verb -- the argument type disambiguates
      co_await run(x)(subtask());

- Builder illusion. Dot-chaining suggests composability that does not exist. Users will naturally try:

      // These look reasonable but don't work
      capy::on(ex).with(alloc).call(my_task());
      capy::on(ex).with(st).with(alloc).spawn(my_task(), h1, h2);

  The current syntax makes the interface boundary explicit: the first call captures all context, the second call accepts the task. There is no dot-chain to extend.

- Erases the test boundary. run_blocking lives in capy::test deliberately — it is a test utility, not a production API. The proposed on(ex).block(t) places it alongside .spawn() and .call() as if it were a first-class production method. That is a promotion this API has not earned.

- Hidden critical ordering. The two-phase invocation exists for a mechanical reason (allocator timing, described below). With on(ex).spawn(t), the critical sequencing guarantee is buried behind what looks like a casual method call. The ()() syntax is pedagogically valuable — it signals that something important happens in two distinct steps.

- Overload count does not shrink. run_async has 18 overloads for good reason (executor x stop_token x allocator x handlers). The builder pattern still needs all those combinations — they just move from free function overloads to constructor or method overloads. The complexity does not vanish; it relocates.
The Two-Phase Invocation
The Problem: Allocator Timing
Coroutine frame allocation happens before the coroutine body executes. When the compiler encounters a coroutine call, it:
- Calls operator new to allocate the frame
- Constructs the promise object
- Begins execution of the coroutine body
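Conceptually, a call to a coroutine such as my_task() expands to something like the following (a simplified sketch, not actual compiler output):

// Simplified sketch of the compiler-generated call sequence
//
//   void* frame = operator new(frame_size);              // 1. allocate the frame
//   auto* p = ::new (frame) promise_type(/*...*/);       // 2. construct the promise
//   /* run the body up to its first suspension point */  // 3. execute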
Any mechanism that injects the allocator after the call — receiver
queries, await_transform, explicit method calls — arrives too late.
The frame is already allocated.
This is the fundamental tension identified in D4003 §3.3:

The allocator must be present at invocation. Coroutine frame allocation has a fundamental timing constraint: operator new executes before the coroutine body. When a coroutine is called, the compiler allocates the frame first, then begins execution. Any mechanism that injects context later — receiver connection, await_transform, explicit method calls — arrives too late.
The Solution: C++17 Postfix Evaluation Order
C++17 guarantees that in a postfix-expression call, the postfix-expression is sequenced before the argument expressions:
The postfix-expression is sequenced before each expression in the expression-list and any default argument. — [expr.call]
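The guarantee is easy to observe in isolation. The following self-contained example has nothing to do with the capy API; it only demonstrates the C++17 sequencing rule quoted above:

#include <cstdio>

auto make_callee()
{
    std::puts("1: postfix-expression evaluated first");
    return [](int) { std::puts("3: call executes last"); };
}

int make_argument()
{
    std::puts("2: argument evaluated second");
    return 0;
}

int main()
{
    // C++17 guarantees the output 1, 2, 3 in that order
    make_callee()(make_argument());
}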
In the expression run_async(ex)(my_task()):
- run_async(ex) evaluates first. This returns a wrapper object (run_async_wrapper) whose constructor sets current_frame_allocator() — a thread-local pointer to the memory resource.
- my_task() evaluates second. The coroutine’s operator new reads the thread-local pointer and allocates the frame from it.
- operator() on the wrapper takes ownership of the task and dispatches it to the executor.
// Step 1: wrapper constructor sets TLS allocator
//       v~~~~~~~~~v
run_async(ex, alloc) (my_task());
//                   ^~~~~~~~~~^
// Step 2: task frame allocated using TLS allocator
This sequencing is not an implementation detail — it is the only correct way to inject an allocator into a coroutine’s frame allocation when the allocator is not known at compile time.
How It Works in the Code
The run_async_wrapper constructor sets the thread-local allocator:
run_async_wrapper(Ex ex, std::stop_token st, Handlers h, Alloc a)
: tr_(detail::make_trampoline<Ex, Handlers, Alloc>(
std::move(ex), std::move(h), std::move(a)))
, st_(std::move(st))
{
// Set TLS before task argument is evaluated
current_frame_allocator() = tr_.h_.promise().get_resource();
}
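The thread-local hook itself can be as simple as a function returning a reference to a thread_local pointer. A plausible sketch follows; the library's actual definition may differ:

#include <memory_resource>

// Sketch: a per-thread slot that the wrapper writes and operator new reads
inline std::pmr::memory_resource*& current_frame_allocator() noexcept
{
    thread_local std::pmr::memory_resource* mr = nullptr;
    return mr;
}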
The task’s operator new reads it:
static void* operator new(std::size_t size)
{
auto* mr = current_frame_allocator();
if(!mr)
mr = std::pmr::get_default_resource();
return mr->allocate(size, alignof(std::max_align_t));
}
// Correct: wrapper is a temporary, used immediately
run_async(ex)(my_task());
// Compile error: cannot call operator() on an lvalue
auto w = run_async(ex);
w(my_task()); // Error: requires rvalue
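One way to produce that compile error is an rvalue-qualified operator(). The following is a sketch under that assumption, not the wrapper's actual declaration:

// Illustrative only: the && qualifier restricts operator() to rvalue
// wrappers, so the two-phase form must be written as a single expression.
template<class Task>
struct wrapper_sketch
{
    void operator()(Task&&) && { /* take ownership and dispatch */ }
};

int main()
{
    wrapper_sketch<int>{}(42);  // OK: the wrapper is an rvalue

    wrapper_sketch<int> w;
    // w(42);                   // error: operator() requires an rvalue wrapper
}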
The run Variant
The run function uses the same two-phase pattern inside coroutines.
An additional subtlety arises: the wrapper is a temporary that dies
before co_await suspends the caller. The wrapper’s
frame_memory_resource would be destroyed before the child task
executes.
The solution is to store a copy of the allocator in the awaitable
returned by operator(). Since standard allocator copies are
equivalent — memory allocated with one copy can be deallocated with
another — this preserves correctness while keeping the allocator
alive for the task’s duration.
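A minimal sketch of that idea, with illustrative names and types that are assumptions rather than the library's own:

// Illustrative only: the awaitable returned by operator() stores its own
// copy of the allocator, so it does not depend on the temporary wrapper
// staying alive until the child task finishes.
template<class Alloc>
struct run_awaitable_sketch
{
    Alloc alloc_;  // copy taken from the wrapper; equivalent to the original

    // await_ready / await_suspend / await_resume omitted; the point is only
    // that alloc_ lives as long as the awaitable does.
};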
Comparison with std::execution
In std::execution (P2300), context flows backward from receiver
to sender via queries after connect():
task<int> async_work();                 // declaration only
auto sndr = async_work();               // Frame allocated NOW
auto op = connect(sndr, receiver);      // Allocator available NOW -- too late
start(op);
In the IoAwaitable model, context flows forward from launcher to task:
1. Set TLS allocator  -->  2. Call task()  -->  3. operator new (uses TLS)  -->  4. await_suspend
The allocator is ready before the frame is created. No query machinery can retroactively fix an allocation that already happened.
Summary
| Function | Purpose |
|---|---|
| run_async | Fire-and-forget launch from non-coroutine code |
| run | Awaitable launch within a coroutine |
The run name is greppable, unambiguous, and won’t collide with
local variables in a namespace-heavy Boost codebase. The f(ctx)(task)
syntax exists because coroutine frame allocation requires the
allocator to be set before the task expression is evaluated, and
C++17 postfix sequencing guarantees exactly that ordering. The syntax
is intentionally explicit about its two steps — it tells the reader
that something important happens between them.