std::pmr in Embedded C++ - Predictable Memory Without the Heap
A practical embedded-first guide to std::pmr: what it solves, where it fits, how to size buffers safely, and when to prefer a custom allocator or plain static storage instead.
- std::pmr in Embedded C++ - Predictable Memory Without the Heap
- What is std::pmr, introduced in C++17?
- std::pmr is about Control & Predictability
- Quick embedded background (for non-embedded readers)
- Why embedded engineers care
- What PMR actually changes
- Where PMR fits best
- What PMR does not solve
- The embedded mental model
- A practical buffer sizing strategy
- How to think about performance
- The failure mode that matters most
- A stricter option: remove the heap from the path entirely
- PMR versus other embedded options
- A simple decision tree
- Why I used a custom benchmark harness instead of Google Benchmark
- The benchmark plots and what they indicate
- When I would recommend std::pmr
- When I would not
- Bottom line
- References and code
What is std::pmr, introduced in C++17?
std::pmr (Polymorphic Memory Resources) is a major feature introduced in C++17 that makes custom memory allocation much more convenient and flexible, especially when working with STL containers.
Core Idea
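Traditional C++ allocators are template parameters (e.g., std::vector<T, MyAllocator<T>>), so the allocator becomes part of the container's type. Two vectors that differ only in their allocator are different, incompatible types, and every function that accepts them must either be a template or commit to one allocator.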
std::pmr solves this by introducing type-erased, polymorphic memory resources. The allocator behavior is determined at runtime via a virtual interface instead of at compile time.
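A minimal sketch of what that type erasure buys you, assuming a C++17 toolchain that ships `<memory_resource>`. The function names `sum_all` and `sum_from_two_sources` are made up for illustration. Both vectors have exactly the same type even though their storage comes from different places, so one non-template function accepts both.

```cpp
#include <cstddef>
#include <memory_resource>
#include <numeric>
#include <vector>

// Accepts any std::pmr::vector<int>, whatever resource backs it.
long sum_all(const std::pmr::vector<int>& values) {
    return std::accumulate(values.begin(), values.end(), 0L);
}

// Builds two vectors of the same type from two different memory sources.
long sum_from_two_sources() {
    alignas(std::max_align_t) std::byte buffer[256];
    std::pmr::monotonic_buffer_resource arena{buffer, sizeof(buffer),
                                              std::pmr::null_memory_resource()};

    // Storage lives inside `buffer`.
    std::pmr::vector<int> on_arena({1, 2, 3}, &arena);
    // Storage comes from ::operator new.
    std::pmr::vector<int> on_heap({4, 5, 6}, std::pmr::new_delete_resource());

    return sum_all(on_arena) + sum_all(on_heap);
}
```

With a classic allocator template, `sum_all` would have to be a template itself or be written twice.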
Main Components
| Component | Description |
|---|---|
| std::pmr::memory_resource | Abstract base class (the core of the system). Exposes allocate(), deallocate(), and is_equal(), which forward to the virtual do_allocate(), do_deallocate(), and do_is_equal() that derived resources override. |
| std::pmr::polymorphic_allocator<T> | A standard allocator that forwards requests to a memory_resource*. This is what containers actually use. |
| Predefined resources | Ready-to-use memory resources provided by the standard library. |
Important Predefined Memory Resources
- std::pmr::new_delete_resource() — Default resource (uses ::operator new/delete).
- std::pmr::null_memory_resource() — Throws std::bad_alloc on any allocation (useful for testing).
- std::pmr::monotonic_buffer_resource — Very fast bump allocator. Allocations just move a pointer forward. Great for temporary arenas.
- std::pmr::synchronized_pool_resource — Thread-safe pool allocator (good for many small allocations).
- std::pmr::unsynchronized_pool_resource — Faster version when thread safety isn’t needed.
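- No tracking or debugging resource ships with the standard library (there is no std::pmr::tracked_memory_resource). To observe memory usage, derive from std::pmr::memory_resource and forward to an upstream resource while recording each request.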
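The standard library does not provide a resource that records usage, but writing one is short. The sketch below is one possible shape, not a standard facility: `TrackingResource` and `peak_bytes_for_push_backs` are hypothetical names, and the statistics it keeps are only an example.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory_resource>
#include <vector>

// Hypothetical helper (not part of the standard library): forwards every
// request to an upstream resource and records simple statistics.
class TrackingResource : public std::pmr::memory_resource {
public:
    explicit TrackingResource(std::pmr::memory_resource* upstream)
        : upstream_{upstream} {}

    std::size_t allocation_count() const { return allocations_; }
    std::size_t bytes_in_use() const { return in_use_; }
    std::size_t peak_bytes() const { return peak_; }

private:
    void* do_allocate(std::size_t bytes, std::size_t alignment) override {
        // If the upstream throws, the statistics stay untouched.
        void* p = upstream_->allocate(bytes, alignment);
        ++allocations_;
        in_use_ += bytes;
        peak_ = std::max(peak_, in_use_);
        return p;
    }
    void do_deallocate(void* p, std::size_t bytes, std::size_t alignment) override {
        upstream_->deallocate(p, bytes, alignment);
        in_use_ -= bytes;
    }
    bool do_is_equal(const std::pmr::memory_resource& other) const noexcept override {
        return this == &other;
    }

    std::pmr::memory_resource* upstream_;
    std::size_t allocations_ = 0;
    std::size_t in_use_ = 0;
    std::size_t peak_ = 0;
};

// Example measurement: peak bytes requested while a vector grows by push_back.
std::size_t peak_bytes_for_push_backs(int count) {
    TrackingResource tracker{std::pmr::new_delete_resource()};
    {
        std::pmr::vector<int> v{&tracker};
        for (int i = 0; i < count; ++i) v.push_back(i);
    }
    return tracker.peak_bytes();
}
```

The peak is usually larger than the final payload because the old and new blocks coexist while a vector reallocates.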
pmr-enabled Containers
C++17 provides std::pmr aliases for most allocator-aware standard containers:
std::pmr::vector<int> v(&resource);
std::pmr::string s(&resource);
std::pmr::map<int, double> m(&resource);
std::pmr::unordered_set<std::pmr::string> set(&resource);
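Each is an alias for the regular container with std::pmr::polymorphic_allocator as its allocator, so std::pmr::vector<int> is std::vector<int, std::pmr::polymorphic_allocator<int>>.
std::pmr is about Control & Predictability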
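One detail that is easy to miss: a PMR container passes its resource down to allocator-aware elements. The sketch below assumes nothing beyond the standard library, and the function name is made up. It checks that a `std::pmr::string` stored in a `std::pmr::vector` ends up on the vector's arena.

```cpp
#include <cstddef>
#include <memory_resource>
#include <string>
#include <vector>

// Returns true when the inner string uses the same resource as the outer vector.
bool inner_string_uses_outer_resource() {
    alignas(std::max_align_t) std::byte buffer[1024];
    std::pmr::monotonic_buffer_resource arena{buffer, sizeof(buffer),
                                              std::pmr::null_memory_resource()};

    std::pmr::vector<std::pmr::string> names{&arena};
    // Long enough to defeat the small-string optimization, so the string allocates.
    names.emplace_back("a string that is too long for the small-string buffer");

    return names.front().get_allocator().resource() == &arena;
}
```

The same is not true for `std::pmr::vector<std::string>`: plain `std::string` elements keep using the global heap, which is a common way for heap traffic to sneak back into a "PMR" path.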
If you build embedded software long enough, you eventually hit the same trade-off:
you want the convenience of std::vector, std::string, and std::map, but you
do not want unpredictable heap behavior in the middle of a timing-critical path.
That is the space std::pmr fills.
PMR does not replace the standard library. It gives you a way to keep standard containers while taking ownership of where their memory comes from. That matters when the question is not “can this code run?” but “can this code run within a bounded budget, every time?”
flowchart LR
A[Embedded workload] --> B{Need dynamic containers?}
B -->|No| C[Static storage]
B -->|Yes| D{Need predictable memory?}
D -->|Yes| E[std::pmr + bounded resource]
D -->|No| F[std::allocator or custom allocator]
The practical message is simple:
- Use static storage when the shape of the problem is fixed.
- Use PMR when the shape is dynamic, but the memory budget is still bounded.
- Use a custom allocator when the allocator type is known at compile time and you want the lowest overhead.
Quick embedded background (for non-embedded readers)
If you come from backend or desktop development, the core embedded difference is that timing and memory behavior are product requirements, not implementation details.
In many embedded systems:
- RAM is small and fixed for the life of the process.
- There is no virtual memory safety net.
- A long-tail latency spike can violate a control-loop deadline.
- Memory fragmentation can cause failures long after startup.
That is why teams often distinguish:
- Mean performance: useful for throughput.
- Tail latency (p95/p99) and jitter: critical for deadline reliability.
This post focuses on that second category: reducing unpredictable behavior when containers allocate memory in timing-sensitive code.
Why embedded engineers care
On desktop systems, allocation latency is usually background noise. In embedded systems, it is often part of the product requirement.
Typical constraints look like this:
- No heap at all in safety-critical paths.
- Strict latency budgets for control loops, protocol handlers, and sensor fusion.
- Small RAM footprints where fragmentation matters.
- A preference for failure you can detect early rather than failure that appears later under load.
PMR is useful because it lets you make memory behavior explicit. Instead of sprinkling allocation policy across types and APIs, you pass a resource into the place that needs it.
What PMR actually changes
With ordinary containers, the allocator is usually baked into the type.
With PMR containers, the resource is supplied at runtime.
That means the code that owns the workload can decide whether memory comes from a stack buffer, an arena, a pool, or a custom resource tuned for the device.
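// Fixed backing storage for this workload; nothing here touches the heap.
char buffer[8192];
std::pmr::monotonic_buffer_resource arena{
    buffer,
    sizeof(buffer),
    std::pmr::null_memory_resource()  // upstream: throw instead of falling back
};
std::pmr::vector<SensorReading> readings{&arena};
readings.reserve(64);  // one allocation up front instead of repeated growth
for (const auto& sample : samples) {
    readings.push_back(sample);
}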
Two details matter here:
- reserve() is still important when you already know the likely size.
- std::pmr::null_memory_resource() is a good guardrail when you want overflow to fail loudly instead of quietly falling back to the heap.
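Both details can be seen in one small experiment. This is a sketch under stated assumptions: the function name `fill_64_ints` and the buffer size are invented, and exceptions are enabled (with `-fno-exceptions` the failure would terminate the program instead). The arena has room for little more than 64 ints, so growing without `reserve()` runs out of space because a monotonic resource never reclaims the blocks a vector leaves behind.

```cpp
#include <cstddef>
#include <memory_resource>
#include <new>
#include <vector>

// Tries to store 64 ints in an arena sized for little more than 64 ints.
// Returns true if every push_back fit inside the arena.
bool fill_64_ints(bool call_reserve) {
    constexpr std::size_t kCount = 64;
    alignas(std::max_align_t) std::byte buffer[kCount * sizeof(int) + 32];
    std::pmr::monotonic_buffer_resource arena{buffer, sizeof(buffer),
                                              std::pmr::null_memory_resource()};
    std::pmr::vector<int> values{&arena};
    try {
        if (call_reserve) values.reserve(kCount);  // one allocation of the final size
        for (std::size_t i = 0; i < kCount; ++i) {
            values.push_back(static_cast<int>(i));
        }
    } catch (const std::bad_alloc&) {
        return false;  // arena exhausted and the null upstream refused to help
    }
    return true;
}
```

With `call_reserve` set the data fits. Without it, the intermediate blocks from geometric growth use up the arena before the vector reaches 64 elements, and the null upstream turns that into a visible failure.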
Where PMR fits best
PMR is strongest when the workload is local, bounded, and repeatable.
1. Request-scoped processing
Parsing a packet, formatting a response, or building a temporary AST are all good examples.
void handle_packet(const Packet& packet) {
    char scratch[4096];
    std::pmr::monotonic_buffer_resource arena{
        scratch,
        sizeof(scratch),
        std::pmr::null_memory_resource()
    };
    std::pmr::string payload{&arena};
    std::pmr::vector<Field> fields{&arena};
    parse(packet, payload, fields);
    process(payload, fields);
}
The lifetime matches the scope. You get predictable cleanup and no per-object free logic.
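When the same task runs in a loop, the arena does not have to be rebuilt each time: `monotonic_buffer_resource::release()` rewinds it to the start of its initial buffer. The sketch below uses invented names and sizes (`process_batches`, `kItems`, a 1 KiB scratch buffer), and the summing loop is only a stand-in for real work.

```cpp
#include <cstddef>
#include <memory_resource>
#include <vector>

// Processes `batches` rounds of work using one scratch buffer.
long process_batches(int batches) {
    constexpr int kItems = 200;
    alignas(std::max_align_t) std::byte scratch[1024];
    std::pmr::monotonic_buffer_resource arena{scratch, sizeof(scratch),
                                              std::pmr::null_memory_resource()};
    long total = 0;
    for (int b = 0; b < batches; ++b) {
        {
            std::pmr::vector<int> items{&arena};
            items.reserve(kItems);  // 800 bytes when int is 4 bytes
            for (int i = 0; i < kItems; ++i) items.push_back(i);
            for (int v : items) total += v;
        }  // the container must be destroyed before its memory is recycled
        arena.release();  // rewind to the start of `scratch`
    }
    return total;
}
```

Without the `release()` call the second batch would not fit, and the null upstream would report that immediately.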
2. Bounded buffers in real-time code
When a task has a known upper bound, PMR lets you express that bound directly in code.
That is much more useful than “hope the heap behaves today.”
3. Systems that need runtime allocator choice
If the same code path sometimes uses a DMA buffer, sometimes shared memory, and sometimes an arena in RAM, PMR gives you one interface for all of it.
That is especially handy in embedded frameworks where memory policy belongs to the platform layer, not each individual container type.
What PMR does not solve
PMR is not a universal upgrade.
It is not the right answer if:
- You can fully size the data structure at compile time.
- You need the absolute lowest overhead and the allocator is known ahead of time.
- You need frequent individual frees across a long-lived structure.
- You want a general-purpose container replacement that improves every workload.
If the allocator is known at compile time, a custom allocator template is often a better fit than PMR.
If the data structure is fixed-size, a static buffer or a boost::container::static_vector-style approach can be simpler and faster.
If the workload is long-lived and fragmented, a pool resource or a purpose-built allocator may be better than a monotonic arena.
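The difference between a monotonic arena and a pool shows up as soon as something is freed. The sketch below uses hypothetical function names, and the pool sits on `new_delete_resource()` only to keep the example short. It allocates, frees, and allocates again, then reports whether the freed block came back.

```cpp
#include <cstddef>
#include <memory_resource>

// Allocates, frees, and allocates again. Returns true if the second request
// got a different address, i.e. the freed block was not handed out again.
bool second_block_is_fresh(std::pmr::memory_resource& mem) {
    constexpr std::size_t kAlign = alignof(std::max_align_t);
    void* first = mem.allocate(64, kAlign);
    mem.deallocate(first, 64, kAlign);
    void* second = mem.allocate(64, kAlign);
    const bool fresh = (second != first);
    mem.deallocate(second, 64, kAlign);
    return fresh;
}

// deallocate() is a no-op for monotonic resources, so space is never reused.
bool monotonic_never_reuses() {
    alignas(std::max_align_t) std::byte buffer[512];
    std::pmr::monotonic_buffer_resource arena{buffer, sizeof(buffer),
                                              std::pmr::null_memory_resource()};
    return second_block_is_fresh(arena);
}

// A pool resource keeps free lists per size class and can recycle blocks.
// Reuse of the exact same address is typical but not guaranteed by the standard.
bool pool_recycles() {
    std::pmr::unsynchronized_pool_resource pool{std::pmr::new_delete_resource()};
    return !second_block_is_fresh(pool);
}
```

For a long-lived structure with frequent erases, the monotonic behavior means the arena only ever shrinks, which is why a pool or a purpose-built allocator is the better match there.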
The embedded mental model
The biggest benefit of PMR is not raw speed. It is that memory policy becomes a first-class design decision.
That gives you three things embedded teams care about:
- Predictable failure mode when memory is exhausted.
- Cleaner separation between algorithm and memory source.
- Easier tuning per subsystem without rewriting container-heavy code.
flowchart TB
A[Subsystem] --> B[Container logic]
A --> C[Memory policy]
C --> D[Stack buffer]
C --> E[Arena]
C --> F[Pool]
C --> G[DMA / shared memory]
style A fill:#74c0fc,stroke:#1971c2
style C fill:#ffe066,stroke:#f08c00
That separation is the real value proposition. PMR makes it possible to say:
- this path may use memory, but only from here,
- this path must never hit the general heap,
- this path can fail early if the budget is wrong.
A practical buffer sizing strategy
The most common mistake with PMR is not PMR itself. It is undersizing the buffer.
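For monotonic resources, the buffer needs room for the working set plus growth. A monotonic resource never reuses memory until it is released or destroyed, so each reallocation of a growing container consumes fresh space while the old block stays dead. If the arena is too small, the next request goes to the upstream resource, which is std::pmr::get_default_resource() (normally the global heap) unless you pass something else, such as null_memory_resource(), to block that fallback.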
That means the safe workflow is:
- Measure the peak memory use of the workload.
- Add headroom for growth and allocator metadata.
- Use null_memory_resource() in tests to catch bad sizing early.
- Only allow fallback if the product really wants it.
Rule of thumb:
- Monotonic buffers: peak workload usage + margin.
- Pool resources: expected concurrent allocations and size classes.
- Safety-critical code: fail fast on exhaustion, then handle it explicitly.
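The "peak plus margin" rule can be written down as compile-time arithmetic so the budget is visible in code review. Everything named here is an assumption for illustration: the 3000-byte peak, the 25% margin, and the identifiers `arena_bytes`, `kTelemetryArenaBytes`, and `g_telemetry_storage`.

```cpp
#include <cstddef>
#include <memory_resource>

// Rounds n up to a multiple of align (align must be non-zero).
constexpr std::size_t round_up(std::size_t n, std::size_t align) {
    return (n + align - 1) / align * align;
}

// Arena size = measured peak + percentage margin, rounded up to max alignment.
constexpr std::size_t arena_bytes(std::size_t measured_peak,
                                  std::size_t margin_percent) {
    return round_up(measured_peak + measured_peak * margin_percent / 100,
                    alignof(std::max_align_t));
}

// Hypothetical numbers: 3000 bytes measured on target, 25% headroom.
constexpr std::size_t kTelemetryArenaBytes = arena_bytes(3000, 25);
static_assert(kTelemetryArenaBytes >= 3750, "margin must not shrink the budget");

// Static storage, so the cost shows up in the linker map rather than at runtime.
alignas(std::max_align_t) static std::byte g_telemetry_storage[kTelemetryArenaBytes];
```

A task would then build its `monotonic_buffer_resource` on top of `g_telemetry_storage` with a null upstream, so a wrong measurement surfaces as an allocation failure rather than as silent heap use.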
How to think about performance
If you benchmark PMR incorrectly, you can make it look bad even when the design is sound.
The common traps are:
- benchmark code that accidentally falls back to the heap,
- omitted reserve() in a workload that already knows the size,
- debug builds or unoptimized binaries,
- comparing the wrong thing, such as “PMR” versus “std” instead of allocator policy versus allocator policy.
The useful conclusion is not “PMR is always faster” or “PMR is always slower.” It is this:
- PMR makes memory behavior explicit.
- Explicit memory behavior is valuable in embedded systems.
- Whether that is worth the overhead depends on the workload.
The failure mode that matters most
In embedded software, the worst outcome is often not a crash. It is a silent fallback to a slower or less predictable path that still appears to work.
With PMR, that can happen if a monotonic resource runs out of space and is allowed to fall back to an upstream allocator. Your code may still succeed, but now you are timing the wrong thing.
That is why the fail-fast pattern matters:
char buffer[8192];
std::pmr::monotonic_buffer_resource arena{
    buffer,
    sizeof(buffer),
    std::pmr::null_memory_resource()
};
If the buffer is too small, the failure happens where you can see it, instead of quietly changing the timing model.
A stricter option: remove the heap from the path entirely
Some embedded teams go further and treat heap usage as a design error in critical paths.
That can mean:
- banning new/delete in selected modules,
- reviewing allocator usage at the API boundary,
- and forcing all dynamic storage through an approved resource or static policy.
PMR fits well into that style because it makes memory sourcing explicit. You can route requests to a bounded arena, a pool, or a device-specific resource instead of letting each container decide implicitly.
The important point is not that PMR eliminates the heap forever. It is that PMR lets you decide where heap-like behavior is allowed.
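One way to enforce that decision for PMR containers is to make the default resource refuse every request, so a container created without an explicit resource fails instead of reaching the heap. The sketch below uses `std::pmr::set_default_resource`, which is standard. The function name is invented, and the sketch assumes exceptions are enabled.

```cpp
#include <memory_resource>
#include <new>
#include <vector>

// Returns true if a PMR container created without an explicit resource
// fails to allocate once the default resource is the null resource.
bool default_resource_is_blocked() {
    std::pmr::memory_resource* previous =
        std::pmr::set_default_resource(std::pmr::null_memory_resource());
    bool blocked = false;
    try {
        std::pmr::vector<int> forgot_the_resource;  // picks up the default resource
        forgot_the_resource.push_back(1);           // the null resource throws here
    } catch (const std::bad_alloc&) {
        blocked = true;
    }
    std::pmr::set_default_resource(previous);  // restore for the rest of the program
    return blocked;
}
```

This only covers PMR allocators. Plain `new`, `malloc`, and non-PMR containers are unaffected, so it complements code review and link-time checks rather than replacing them.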
PMR versus other embedded options
PMR is only one answer. In many codebases the real choice is between three approaches:
| Approach | Strength | Trade-off | Best fit |
|---|---|---|---|
| std::vector / std::string | Familiar, fast, minimal ceremony | Memory policy is implicit | General-purpose code |
| std::pmr containers | Runtime-selected bounded resources | Slight extra abstraction and tuning | Embedded subsystems with known memory budgets |
| Fixed-capacity containers / ETL-style containers | Fully bounded, no heap dependency | Less flexibility, fixed maximum size | Hard real-time and very constrained systems |
If you already know the maximum size at compile time, a fixed-capacity container can be the cleanest answer. If you need some flexibility but still want to control where memory comes from, PMR is the middle ground.
A simple decision tree
flowchart TD
A[Need container-like storage?] --> B{Is the size fixed at compile time?}
B -->|Yes| C[Use static or fixed-capacity containers]
B -->|No| D{Need runtime control over memory source?}
D -->|No| E[Use std containers or a custom allocator]
D -->|Yes| F{Need bounded / fail-fast allocation?}
F -->|Yes| G[Use std::pmr with null upstream or bounded arena]
F -->|No| H[Use PMR with an appropriate upstream resource]
This is the simplest way to think about the design space:
- fixed size: static wins,
- known allocator type: custom allocator wins,
- runtime-selected bounded memory: PMR wins.
Why I used a custom benchmark harness instead of Google Benchmark
While Google Benchmark is excellent for measuring average throughput, embedded and real-time systems often prioritize tail latency (p95/p99), jitter, and predictable failure modes, which are not its primary focus. For this analysis, I needed a harness that:
- Explicitly tracks tail behavior (p95/p99 latency, variance across runs).
- Simulates memory exhaustion (e.g., using null_memory_resource() to test bounded buffers).
- Generates plots for visualizing trade-offs between speed, consistency, and memory policy.
This approach aligns with the needs of embedded engineers, where worst-case behavior matters more than average performance.
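To make the tail metrics concrete, here is a minimal nearest-rank percentile over recorded per-iteration latencies. It is an illustrative sketch and not the code from the companion harness. The function name, the integer `percent` parameter, and the choice of the nearest-rank definition are all assumptions.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Nearest-rank percentile of latency samples (e.g. nanoseconds per iteration).
// `percent` is an integer in [1, 100]; `samples` must not be empty.
std::uint64_t percentile(std::vector<std::uint64_t> samples, unsigned percent) {
    std::sort(samples.begin(), samples.end());
    // rank = ceil(N * percent / 100), computed in integers to avoid rounding surprises.
    const std::size_t rank = (samples.size() * percent + 99) / 100;
    return samples[rank - 1];
}
```

Reporting `percentile(samples, 50)`, `percentile(samples, 95)`, and `percentile(samples, 99)` side by side shows the gap between typical and worst-case behavior that a mean hides.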
The benchmark plots and what they indicate
What this indicates: PMR is workload-dependent. In some paths it improves both speed and jitter, in others it mainly improves predictability, and in some it can lose on both metrics.
Vector workload
What this indicates: with correctly sized PMR resources, monotonic PMR can be very competitive and often faster for this bounded push-heavy pattern.
What this indicates: percentile spread shows tail behavior. The tighter PMR spread here reflects fewer outlier spikes when memory is pre-bounded.
String workload
What this indicates: string-heavy code can benefit from PMR by reducing allocation churn in repeated temporary construction.
What this indicates: PMR often improves p95 and p99 consistency for concat paths, which matters more than mean in deadline-driven systems.
What this indicates: for short-string dominated paths, PMR and std can be close in mean time, so jitter and memory-policy needs become the deciding factor.
What this indicates: even when averages are similar, percentile tails reveal which allocator path is more stable under repeated short-lived string churn.
Map workload
What this indicates: PMR can help ordered-map int-key inserts, but gains depend on node allocation pattern and resource choice.
What this indicates: map tail latency can diverge from average behavior, so p95/p99 should drive allocator decisions for registry-like structures.
What this indicates: string-key maps are mixed; allocator policy and string storage behavior dominate results, so measure this path specifically in your firmware.
What this indicates: string-heavy node allocations can amplify outliers, and percentile plots show whether PMR is helping or hurting worst-case latency.
Mixed realistic workload
What this indicates: in realistic container mixes, PMR can reduce both runtime and jitter when working sets are bounded and reused per task/scope.
What this indicates: this view highlights deadline risk directly. Smaller p95 to p99 gaps mean fewer long-tail surprises in data-collection loops.
What this indicates: this is the profile to watch for embedded messaging paths. PMR can lower long-tail latency by avoiding general heap behavior during bursts.
What this indicates: queue workloads are often bursty; the percentile chart confirms whether PMR is tightening tail latency where it matters operationally.
When I would recommend std::pmr
Use PMR when you need one or more of the following:
- A bounded, non-fragmenting allocation model for a subsystem.
- Runtime selection of memory source.
- Cleaner APIs for request-scoped or task-scoped work.
- Standard containers without taking a dependency on the global heap.
- Easier testing of memory exhaustion and fallback paths.
When I would not
Do not reach for PMR just because it sounds modern.
If your use case is fixed-size, a static container or custom structure is often more direct.
If the allocator is part of the type and never changes, a custom allocator is usually cheaper.
If you need a high-throughput general-purpose container, the default STL path may still be the best choice.
Bottom line
PMR is not a performance gimmick.
It is a way to bring memory policy under control in code that still wants the expressiveness of standard containers.
For embedded engineers, that matters because the hard part is often not writing the algorithm. It is making sure the algorithm runs within a fixed memory and latency envelope every single time.
If you already know the exact allocator at compile time, use a custom allocator. If the data is fixed-size, use static storage. If you need runtime control over bounded memory, PMR is the middle ground worth considering.
References and code
The benchmark code and plots that informed this post are in the companion repo: pmr-benchmark