8 releases (4 breaking)

Uses old Rust 2015

0.5.0 Aug 12, 2017
0.4.2 Aug 5, 2017
0.3.0 Aug 4, 2017
0.2.3 Aug 2, 2017
0.1.2 May 26, 2017

#408 in Memory management

23 downloads per month

MIT license

105KB
1.5K SLoC

conc — An efficient concurrent reclamation system

conc builds upon hazard pointers to create a extremely performant system for concurrently handling memory. It is a general, convenient, memory light — and sometimes faster — alternative to epoch-based reclamation (see the trade-off section).

Overview

  • High-level API
    • Atomic<T> for an lockless readable and writable container.
    • sync for basic datastructures implemented through conc.
      • Treiber<T> for concurrent stacks.
      • Stm<T> for a simple implementation of STM.
  • Low-level API
    • add_garbage() for queuing destruction of garbage.
    • Guard<T> for blocking destruction.
  • Runtime control
    • gc() for collecting garbage to reduce memory.
    • settings for reconfiguring the system on-the-go.

Why?

aturon's blog post explains the issues of concurrent memory handling very well, although it take basis in epoch-based reclamation, which this crate is an alternative for.

The gist essentially is that you need to delete objects in most concurrent data structure (otherwise there would be memory leaks), however cannot safely do so, as there is no way to know if another thread is accessing the object in question. This (and other reclamation systems) provides a solution to this problem.

Usage

While the low-level API is available, it is generally sufficient to use the conc::Atomic<T> abstraction. This acts much like familiar Rust APIs. It allows the programmer to concurrently access a value through references, as well as update it, and more. Refer to the respective docs for more information.

If you are interested in implementing your own structures with conc, you must learn how to use Guard<T> and add_garbage. In short,

  • conc::add_garbage() adds a destructor with a pointer, which will be run eventually, when no one is reading the data anymore. In other words, it acts as a concurrent counterpart to Drop::drop().
  • Guard<T> "protects" a pointer from being destroyed. That is, it delays destruction (which is planned by conc::add_garbage()) until the guard is gone.

See their respective API docs for details on usage and behavior.

Debugging

Enable feature debug-tools and set environment variable CONC_DEBUG_MODE. For example, CONC_DEBUG_MODE=1 cargo test --features debug-tools. To get stacktraces after each message, set environment variable CONC_DEBUG_STACKTRACE.

Examples

See the sync source code.

Tradeoffs - Why not crossbeam/epochs?

Epochs (EBR) are generally faster than this crate, however the major advantage this has over epochs is that this doesn't suffer from memory blow-up in intense cases.

The issue with most other and faster solutions is that, if there is a non-trivial amount of threads (say 16) constantly reading/storing some pointer, it will never get to a state, where it can be reclaimed.

In other words, given sufficient amount of threads and frequency, the gaps between the reclamation might be very very long, causing very high memory usage, and potentially OOM crashes.

These issues are not hypothetical. It happened to me while testing the caching system of TFS. Essentially, the to-be-destroyed garbage accumulated several gigabytes, without ever being open to a collection cycle.

It reminds of the MongoDB debate. It might very well be the fastest solution¹, but if it can't even ensure consistency in these cases, what is the point?

That being said, there are cases where the other libraries are perfectly fine (e.g. if you have a bounded number of thread and a medium-long interval between accesses) and also faster.

¹If you want a super fast memory reclamation system, you should try NOP™, and not calling destructors.

Other advantages

  • The API of conc is - by design - more lightweight interface-wise, as it doesn't require the call side to pin epochs or similar. This is particularly nice when you design more complicated structures.
  • conc objects (Guard<T>) is not bound to a lifetime or similar, meaning that it is future-rs compatible among other.
  • In conc, threads can export garbage while there are active objects, meaning that memory won't accumulate on non-stopping usage.
  • conc is runtime configurable through the settings module.
  • Better debugging tools.
  • More-well tested and well-documented.
  • More fine-grained control over the runtime.
  • Access to low-level API.

Disadvantages

  • In many cases, it is slower.
  • Fewer pre-implemented data structures (for now).

Design & internals

It based on hazard pointers, although there are several differences. The idea is essentially that the system keeps track of some number of "hazards". As long as a hazard protects some object, the object cannot be deleted.

Once in a while, a thread performs a garbage collection by scanning the hazards and finding the objects not currently protected by any hazard. These objects are then deleted.

To improve performance, we use a layered approach: Both garbage (objects to be deleted eventually) and hazards are cached thread locally. This reduces the amount of atomic operations and cache misses.

For more details, see this blog post.

Garbage collection

Garbage collection of the concurrently managed object is done automatically between every n frees where n is chosen from some probability distribution.

Note that a garbage collection cycle might not clear all objects. For example, some objects could be protected by hazards. Others might not have been exported from the thread-local cache yet.

Performance

It is worth noting that atomic reads through this library usually requires three atomic CPU instruction, this means that if you are traversing a list or something like that, this library might not be for you.

Settings

You can reconfigure the system on-the-go through the settings module.

There are also presets. For example, if you experience high memory usage, you can do:

conc::settings::set_local(conc::settings::Settings::low_memory());

Dependencies

~0.5–1.2MB
~20K SLoC