15 releases (stable)

new 2.2.3 Nov 6, 2024
2.1.1 Sep 7, 2024
1.0.3 Aug 21, 2024
1.0.1 Jul 15, 2024
0.5.0 Jun 3, 2024

#364 in Memory management

443 downloads per month
Used in 2 crates

GPL-2.0-only

16MB
3K SLoC

Framework to implement sched_ext schedulers running in user-space

scx_rustland_core is a Rust framework designed to facilitate the implementation of user-space schedulers based on the Linux kernel sched_ext feature.

sched_ext allows custom schedulers to be dynamically loaded and executed in the kernel, using BPF to implement the scheduling policies.

This crate provides an abstraction layer for sched_ext, enabling developers to write schedulers in Rust without dealing with low-level kernel or BPF details.

Features

  • Generic BPF Abstraction: Interact with BPF components using a high-level Rust API.
  • Task Scheduling: Enqueue and dispatch tasks using provided methods.
  • CPU Selection: Select idle CPUs for task execution with a preference for reusing previous CPUs.
  • Time Slice: Assign a specific time slice on a per-task basis.
  • Performance Reporting: Access internal scheduling statistics.
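
As an illustration of per-task time slices, the sketch below shrinks a base slice as more tasks are waiting and converts it from microseconds to nanoseconds. It is only one possible policy: `slice_ns_for`, `NSEC_PER_USEC`, and the `min_ns` floor are hypothetical names introduced here and are not part of scx_rustland_core.

```rust
/// Nanoseconds per microsecond (helper constant for this sketch only).
pub const NSEC_PER_USEC: u64 = 1_000;

/// Hypothetical policy: divide a base slice (given in microseconds) among the
/// waiting tasks, never going below `min_ns`. Returns nanoseconds.
pub fn slice_ns_for(base_us: u64, nr_waiting: u64, min_ns: u64) -> u64 {
    // Convert to nanoseconds first so the division keeps its precision.
    let base_ns = base_us.saturating_mul(NSEC_PER_USEC);
    // +1 accounts for the task being dispatched and avoids dividing by zero.
    let share = base_ns / nr_waiting.saturating_add(1);
    share.max(min_ns)
}
```

With a 5000 µs base slice and no other task waiting, this yields 5 ms; under load the slice shrinks until it reaches the floor.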

API

BpfScheduler

The BpfScheduler struct is the core interface for interacting with the BPF component.

  • Initialization:

    • BpfScheduler::init registers and initializes the BPF component.
  • Task Management:

    • dequeue_task(): Retrieve tasks that need to be scheduled.
    • dispatch_task(task: &DispatchedTask): Dispatch tasks to specific CPUs.
    • select_cpu(pid: i32, prev_cpu: i32, flags: u64): Select an idle CPU for a task.
  • Completion Notification:

    • notify_complete(nr_pending: u64) reports the number of pending tasks to the BPF component.
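
The call order implied by these methods (drain with dequeue_task(), hand tasks back with dispatch_task(), then report the backlog with notify_complete()) can be tried out without a kernel by putting a small trait in front of it. The following is a minimal sketch under stated assumptions: `Backend`, `MockBackend`, and `drain_fifo` are hypothetical names that do not exist in scx_rustland_core, and a task is reduced to a bare pid.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for the three BpfScheduler calls a policy loop uses.
/// Not part of scx_rustland_core; tasks are reduced to a bare pid.
pub trait Backend {
    fn dequeue_task(&mut self) -> Option<i32>;
    fn dispatch_task(&mut self, pid: i32);
    fn notify_complete(&mut self, nr_pending: u64);
}

/// In-memory backend that records what the policy did.
#[derive(Default)]
pub struct MockBackend {
    pub runnable: VecDeque<i32>,
    pub dispatched: Vec<i32>,
    pub last_pending: Option<u64>,
}

impl Backend for MockBackend {
    fn dequeue_task(&mut self) -> Option<i32> {
        self.runnable.pop_front()
    }

    fn dispatch_task(&mut self, pid: i32) {
        self.dispatched.push(pid);
    }

    fn notify_complete(&mut self, nr_pending: u64) {
        self.last_pending = Some(nr_pending);
    }
}

/// FIFO policy: collect every runnable task, dispatch them in arrival order,
/// then report how many tasks are still queued in user space.
pub fn drain_fifo<B: Backend>(backend: &mut B) {
    let mut queue = VecDeque::new();
    while let Some(pid) = backend.dequeue_task() {
        queue.push_back(pid);
    }
    while let Some(pid) = queue.pop_front() {
        backend.dispatch_task(pid);
    }
    backend.notify_complete(queue.len() as u64);
}
```

Keeping the policy generic over such a trait lets the ordering logic be unit-tested on any machine, while the real scheduler would call the BpfScheduler methods directly.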

Getting Started

  • Installation:
    • Add scx_rustland_core to your Cargo.toml dependencies.
[dependencies]
scx_rustland_core = "2.2"
  • Implementation:

    • Create your scheduler by implementing the provided API.
  • Execution:

    • Compile and run your scheduler. Ensure that your kernel supports sched_ext and is configured to load your BPF programs.

Example

The following is a simple example of a fully working FIFO scheduler, implemented using the scx_rustland_core framework:

// Copyright (c) Andrea Righi <andrea.righi@linux.dev>

// This software may be used and distributed according to the terms of the
// GNU General Public License version 2.
mod bpf_skel;
pub use bpf_skel::*;
pub mod bpf_intf;

mod bpf;
use bpf::*;

use scx_utils::UserExitInfo;

use libbpf_rs::OpenObject;

use std::mem::MaybeUninit;
use std::collections::VecDeque;

use anyhow::Result;

const SLICE_US: u64 = 5000;

struct Scheduler<'a> {
    bpf: BpfScheduler<'a>,
    task_queue: VecDeque<QueuedTask>,
}

impl<'a> Scheduler<'a> {
    fn init(open_object: &'a mut MaybeUninit<OpenObject>) -> Result<Self> {
        let bpf = BpfScheduler::init(
            open_object,
            0,     // exit_dump_len (buffer size of exit info, 0 = default)
            false, // partial (false = include all tasks)
            false, // debug (false = debug mode off)
        )?;
        Ok(Self { bpf, task_queue: VecDeque::new() })
    }

    fn consume_all_tasks(&mut self) {
        // Consume all tasks that are ready to run.
        //
        // Each task contains the following details:
        //
        // pub struct QueuedTask {
        //     pub pid: i32,              // pid that uniquely identifies a task
        //     pub cpu: i32,              // CPU where the task is running
        //     pub sum_exec_runtime: u64, // Total CPU time
        //     pub weight: u64,           // Task static priority
        //     pub nvcsw: u64,            // Total amount of voluntary context switches
        //     pub slice: u64,            // Remaining time slice budget
        //     pub vtime: u64,            // Current task vruntime / deadline (set by the scheduler)
        // }
        //
        // Although the FIFO scheduler doesn't use these fields, they can provide valuable data for
        // implementing more sophisticated scheduling policies.
        while let Ok(Some(task)) = self.bpf.dequeue_task() {
            self.task_queue.push_back(task);
        }
    }

    fn dispatch_next_task(&mut self) {
        if let Some(task) = self.task_queue.pop_front() {
            // Create a new task to be dispatched, derived from the received enqueued task.
            //
            // pub struct DispatchedTask {
            //     pub pid: i32,      // pid that uniquely identifies a task
            //     pub cpu: i32,      // target CPU selected by the scheduler
            //     pub flags: u64,    // special dispatch flags
            //     pub slice_ns: u64, // time slice assigned to the task (0 = default)
            // }
            //
            // The dispatched task's fields are pre-populated from the QueuedTask and can be
            // modified before dispatching it via self.bpf.dispatch_task().
            let mut dispatched_task = DispatchedTask::new(&task);

            // Decide where the task needs to run (target CPU).
            //
            // A call to select_cpu() will return the most suitable idle CPU for the task,
            // considering its previously used CPU.
            let cpu = self.bpf.select_cpu(task.pid, task.cpu, 0);
            if cpu >= 0 {
                dispatched_task.cpu = cpu;
            } else {
                dispatched_task.flags |= RL_CPU_ANY;
            }

            // Decide for how long the task needs to run (time slice); if not specified,
            // SCX_SLICE_DFL will be used by default.
            //
            // slice_ns is expressed in nanoseconds, so convert SLICE_US from microseconds.
            dispatched_task.slice_ns = SLICE_US * 1000;

            // Dispatch the task on the target CPU.
            self.bpf.dispatch_task(&dispatched_task).unwrap();

            // Notify the BPF component of the number of pending tasks and immediately give
            // the dispatched task a chance to run.
            self.bpf.notify_complete(self.task_queue.len() as u64);
        }
    }

    fn dispatch_tasks(&mut self) {
        loop {
            // Consume all tasks before dispatching any.
            self.consume_all_tasks();

            // Dispatch one task from the queue.
            self.dispatch_next_task();

            // If no task is ready to run (or in case of error), stop dispatching tasks and notify
            // the BPF component that all tasks have been scheduled / dispatched, with no remaining
            // pending tasks.
            if self.task_queue.is_empty() {
                self.bpf.notify_complete(0);
                break;
            }
        }
    }

    fn run(&mut self) -> Result<UserExitInfo> {
        while !self.bpf.exited() {
            self.dispatch_tasks();
        }
        self.bpf.shutdown_and_report()
    }
}

fn main() -> Result<()> {
    // Initialize and load the FIFO scheduler.
    let mut open_object = MaybeUninit::uninit();
    loop {
        let mut sched = Scheduler::init(&mut open_object)?;
        if !sched.run()?.should_restart() {
            break;
        }
    }

    Ok(())
}
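
The comments in the example note that the QueuedTask fields can drive more sophisticated policies than FIFO. As one possible direction, the sketch below orders tasks by a weighted virtual runtime instead of arrival order. It is an illustration under assumptions, not the crate's method: `Task` is a local stand-in carrying a subset of the documented fields, `weighted_vtime` and `VtimeQueue` are hypothetical, and a default weight of 100 is assumed.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Local stand-in with a subset of the QueuedTask fields listed in the
/// example's comments; this is not the crate's own type.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Task {
    pub pid: i32,
    pub sum_exec_runtime: u64, // total CPU time used so far (unit assumed: ns)
    pub weight: u64,           // static priority (assumed default: 100)
}

/// Hypothetical policy: charge CPU time scaled inversely by the task weight,
/// so a task with twice the weight accumulates virtual time half as fast.
pub fn weighted_vtime(task: &Task) -> u64 {
    // max(1) guards against a zero weight.
    task.sum_exec_runtime.saturating_mul(100) / task.weight.max(1)
}

/// Min-heap keyed on (vtime, pid): the task with the smallest weighted
/// runtime is popped first; pid breaks ties deterministically.
#[derive(Default)]
pub struct VtimeQueue {
    heap: BinaryHeap<Reverse<(u64, i32)>>,
}

impl VtimeQueue {
    pub fn push(&mut self, task: &Task) {
        self.heap.push(Reverse((weighted_vtime(task), task.pid)));
    }

    /// Returns the pid of the next task to dispatch, if any.
    pub fn pop(&mut self) -> Option<i32> {
        self.heap.pop().map(|Reverse((_, pid))| pid)
    }

    /// Number of tasks still waiting (the value a scheduler would report
    /// as pending).
    pub fn len(&self) -> usize {
        self.heap.len()
    }
}
```

In the FIFO example, a queue like this would take the place of the VecDeque: consume_all_tasks() would push into it and dispatch_next_task() would pop from it.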

License

This software is licensed under the GNU General Public License version 2. See the LICENSE file for details.

Contributing

Contributions are welcome! Please submit issues or pull requests via GitHub.

Dependencies

~23–34MB
~601K SLoC