#coroutine #async #events #server #stackful-coroutine

coroutines

High performance coroutine library with native experience

4 releases

Uses old Rust 2015

0.2.0 Feb 22, 2018
0.1.2 Feb 20, 2018
0.1.1 Feb 19, 2018
0.1.0 Feb 19, 2018

#1032 in Concurrency

LGPL-3.0

55KB
1.5K SLoC

C 867 SLoC // 0.0% comments Rust 299 SLoC C++ 147 SLoC GNU Style Assembly 40 SLoC Shell 8 SLoC

rust-coroutines

'Native' stackful coroutine library for Rust.

Replace std::thread::spawn with coroutines::spawn and that's all. Everything will just work!

Crates.io Build Status

Documentation

Features

  • Most libstd functions that was blocking get patched and become asynchronous automatically
  • Efficient asynchronous I/O and task scheduling
  • Voluntary preemption and work stealing

How does it work?

rust-coroutines includes a separate runtime library which provides coroutine scheduling and patches the runtime environment, hooking the entries to blocking system calls in libc.

Build and use

The runtime library includes a lot of platform-specific code and writing it in Rust won't provide much benefit. Therefore, I decide to write it in C. Before using the library, you need to build (and optionally 'install') core_impl/unblock_hook first (A modern C compiler is required):

cd core_impl
make unblock_hook
sudo cp libunblock_hook.so /usr/lib/ # Optional (see below)

If you can't or don't want to install libunblock_hook as a system-wide library, you could specify RUSTFLAGS="-L/path/to/core_impl/" for cargo commands and specify LD_LIBRARY_PATH=/path/to/core_impl/ when running the built binaries.

After setting up the runtime library, you will be able to use the coroutines crate in your application.

For example:

sleep.rs

extern crate coroutines;

fn sleep_thread() {
    println!("sleep_thread enter");
    std::thread::sleep(std::time::Duration::from_millis(500));
    println!("sleep_thread exit");
}

fn main() {
    for _ in 0..5 {
        coroutines::spawn(sleep_thread);
    }

    std::thread::sleep(std::time::Duration::from_secs(1));
    println!("main exit");
}

tcp_listener.rs

extern crate coroutines;

use std::net::{TcpListener, TcpStream};
use std::io::{Read, Write};

fn start() {
    let listener = TcpListener::bind("127.0.0.1:9781").unwrap();
    for stream in listener.incoming() {
        if let Ok(stream) = stream {
            coroutines::spawn(move || stream.write("Hello world\n".as_bytes()));
        } else {
            println!("{:?}", stream.unwrap_err());
        }
    }
}

fn main() {
    coroutines::spawn(start).join().unwrap();
}

More examples can be found in examples/.

It should be noted that all data structures related to blocking libstd API calls should be constructed in a coroutine to inform the runtime library to apply necessary patches. Typically, this can be done by spawning a coroutine from main and do everything there. Not doing so may lead to unexpected blocking behaviors (e.g. TcpListener::accept blocks instead of yielding out if the TcpListener object is constructed outside a coroutine and does not have the nonblocking flag bit set. )

Performance

The performance of rust-coroutines is similar to most other (stackful/stackless) coroutine libraries in Rust, including may and futures (with tokio).

With a modified version of corona benchmarks:

test async                        ... bench:  31,028,198 ns/iter (+/- 2,738,679)
test async_cpupool                ... bench:  29,813,426 ns/iter (+/- 14,394,817)
test async_cpupool_cpus           ... bench:  31,497,284 ns/iter (+/- 7,074,686)
test async_cpupool_many           ... bench:  30,785,178 ns/iter (+/- 7,982,811)
test async_cpus                   ... bench:  25,660,063 ns/iter (+/- 204,811,182)
test async_many                   ... bench:  24,347,067 ns/iter (+/- 1,980,457)
test corona                       ... bench:  38,565,408 ns/iter (+/- 1,717,367)
test corona_blocking_wrapper      ... bench:  38,856,394 ns/iter (+/- 2,242,614)
test corona_blocking_wrapper_cpus ... bench:  29,147,673 ns/iter (+/- 89,727,410)
test corona_blocking_wrapper_many ... bench:  28,384,512 ns/iter (+/- 2,480,628)
test corona_cpus                  ... bench:  28,862,550 ns/iter (+/- 2,197,395)
test corona_many                  ... bench:  28,509,142 ns/iter (+/- 2,767,814)
test coroutines                   ... bench:  26,673,276 ns/iter (+/- 3,232,604)
test coroutines_cpus              ... bench:  27,194,849 ns/iter (+/- 2,787,879)
test coroutines_many              ... bench:  26,744,986 ns/iter (+/- 3,161,595)
test futures                      ... bench:  30,695,434 ns/iter (+/- 2,447,777)
test futures_cpupool              ... bench:  29,626,141 ns/iter (+/- 3,090,787)
test futures_cpupool_cpus         ... bench:  30,573,408 ns/iter (+/- 3,549,979)
test futures_cpupool_many         ... bench:  30,276,154 ns/iter (+/- 4,121,736)
test futures_cpus                 ... bench:  24,814,705 ns/iter (+/- 2,107,849)
test futures_many                 ... bench:  24,750,719 ns/iter (+/- 1,875,699)
test may                          ... bench:  27,553,510 ns/iter (+/- 3,164,095)
test may_cpus                     ... bench:  28,315,233 ns/iter (+/- 2,892,440)
test may_many                     ... bench:  27,812,071 ns/iter (+/- 2,814,072)
test threads                      ... bench:  42,440,270 ns/iter (+/- 4,181,660)
test threads_cpus                 ... bench:  40,202,798 ns/iter (+/- 6,837,590)
test threads_many                 ... bench:  40,259,324 ns/iter (+/- 864,916,488)

Work stealing

Migrating coroutines to another OS thread is safe in most cases, but not if the user code makes use of something related to the current thread - or more generally, not Send. Thread Local Storage, for example. Therefore, migration of coroutines ("work stealing") is disabled by default and requires manual configuration to turn on.

Known issues

  • Synchronization primitives are not patched yet and will block the current execution unit while waiting for something to be available (e.g. locks & conditional variables). The runtime will spin up a new executor if it detects that there is no executor available for a period of time, but the workaround takes more time and system resources than a set of properly implemented asynchronous replacements.
  • Thread local storage (TLS) is not yet patched and might produce unexpected results while used in coroutines (not violating Rust's safety rules with safe user code & work stealing disabled, though).
  • Only x86-64 Linux is supported at the moment. Porting to other Unix-like systems and i386 & ARM architectures is planned.

Dependencies

~160KB