13 breaking releases

0.14.0 Oct 21, 2024
0.12.0 Mar 25, 2024
0.10.0 Nov 7, 2023
0.9.0 Jul 4, 2023
0.1.0 Dec 22, 2021

#31 in Operating systems

15,793 downloads per month
Used in 24 crates (21 directly)

Apache-2.0 AND BSD-3-Clause

320KB
6K SLoC

virtio-queue

The virtio-queue crate provides a virtio device implementation for a virtio queue, a virtio descriptor and a chain of such descriptors. The specification defines two formats of virtio queues: split virtqueues and packed virtqueues. The virtio-queue crate supports only the split virtqueue format. The virtio-queue API is meant to be consumed by virtio device implementations (such as the block device or vsock device). The main abstraction is the Queue. The crate also defines a state object for the queue, i.e. QueueState.

Usage

Let’s take a concrete example of how a device would work with a queue, using the MMIO bus.

First, it is important to mention that the mandatory parts of the virtio interface are the following:

  • the device status field → provides an indication of the completed steps of the device initialization routine,
  • the feature bits → the features the driver/device understand(s),
  • notifications,
  • one or more virtqueues → the mechanism for data transport between the driver and device.

Each virtqueue consists of three parts:

  • Descriptor Table,
  • Available Ring,
  • Used Ring.
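
For a queue with N entries, the split virtqueue layout in the virtio specification fixes the size of each part: 16 bytes per descriptor, 2 bytes per available ring entry and 8 bytes per used ring entry, plus 6 bytes of bookkeeping fields (flags, idx and the event field) in each ring. The sketch below only illustrates that arithmetic; the function names are hypothetical and are not part of the virtio-queue API.

```rust
/// Size in bytes of the descriptor table for a queue with `queue_size` entries.
/// (Hypothetical helper; each descriptor is 16 bytes in the split layout.)
pub fn desc_table_bytes(queue_size: u16) -> u64 {
    16 * u64::from(queue_size)
}

/// Size in bytes of the available ring:
/// flags (2) + idx (2) + ring[queue_size] (2 bytes each) + used_event (2).
pub fn avail_ring_bytes(queue_size: u16) -> u64 {
    6 + 2 * u64::from(queue_size)
}

/// Size in bytes of the used ring:
/// flags (2) + idx (2) + ring[queue_size] (8 bytes each) + avail_event (2).
pub fn used_ring_bytes(queue_size: u16) -> u64 {
    6 + 8 * u64::from(queue_size)
}
```

A VMM could use numbers like these to check that each part fits inside guest memory before the queue is used.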

Before booting the virtual machine (VM), the VMM performs the following setup:

  1. initialize an array of Queues using the Queue constructor.
  2. register the device on the MMIO bus, so that the driver can later send read/write requests to the MMIO space; some of those requests also set up the queues’ state.
  3. perform other pre-boot configuration, such as registering an fd for the interrupt assigned to the device; the device later uses this fd to inform the driver that it has information to communicate.

After the boot of the VM, the driver starts sending read/write requests to configure things like:

  • the supported features;
  • queue parameters. The following setters are used for the queue set up:
    • set_size → for setting the size of the queue.
    • set_ready → for marking the queue as ready for processing.
    • set_desc_table_address, set_avail_ring_address, set_used_ring_address → configure the guest address of the constituent parts of the queue.
    • set_event_idx → called as part of feature negotiation in the virtio-device crate; it enables or disables the VIRTIO_F_EVENT_IDX feature.
  • the device activation. As part of this activation, the device can also create a queue handler that is later used to process the queue.
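
To make the configuration step more concrete, here is a minimal sketch of how a VMM might translate driver writes to the MMIO space into calls on queue setters. It is not the virtio-queue API: QueueConfig is a hypothetical stand-in for a Queue, handle_mmio_write is an invented helper, and the register offsets are the per-queue registers of the virtio-mmio (version 2) register layout as I understand it. Queue selection, feature negotiation and validation are left out.

```rust
/// Hypothetical stand-in for the queue parameters a driver configures.
#[derive(Debug, Default)]
pub struct QueueConfig {
    pub size: u16,
    pub ready: bool,
    pub desc_table: u64,
    pub avail_ring: u64,
    pub used_ring: u64,
}

impl QueueConfig {
    pub fn set_size(&mut self, size: u16) {
        self.size = size;
    }
    pub fn set_ready(&mut self, ready: bool) {
        self.ready = ready;
    }
}

/// Replace the low 32 bits of a 64-bit guest address.
fn set_low(addr: &mut u64, value: u32) {
    *addr = (*addr & !0xffff_ffff_u64) | u64::from(value);
}

/// Replace the high 32 bits of a 64-bit guest address.
fn set_high(addr: &mut u64, value: u32) {
    *addr = (*addr & 0xffff_ffff_u64) | (u64::from(value) << 32);
}

/// Apply one 32-bit MMIO write from the driver to the queue configuration.
pub fn handle_mmio_write(q: &mut QueueConfig, offset: u64, value: u32) {
    match offset {
        0x038 => q.set_size(value as u16),           // QueueNum
        0x044 => q.set_ready(value == 1),            // QueueReady
        0x080 => set_low(&mut q.desc_table, value),  // QueueDescLow
        0x084 => set_high(&mut q.desc_table, value), // QueueDescHigh
        0x090 => set_low(&mut q.avail_ring, value),  // QueueDriverLow
        0x094 => set_high(&mut q.avail_ring, value), // QueueDriverHigh
        0x0a0 => set_low(&mut q.used_ring, value),   // QueueDeviceLow
        0x0a4 => set_high(&mut q.used_ring, value),  // QueueDeviceHigh
        _ => {} // other registers are out of scope for this sketch
    }
}
```

Each 64-bit guest address arrives as two 32-bit writes, which is why the address setters are split into low and high halves here.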

Once the queues are ready, the device can be used.

The steady state operation of a virtio device follows a model where the driver produces descriptor chains which are consumed by the device, and both parties need to be notified when new elements have been placed on the associated ring to avoid busy polling. The precise notification mechanism is left up to the VMM that incorporates the devices and queues (it usually involves things like MMIO VM exits and interrupt injection into the guest). The queue implementation is agnostic to the notification mechanism in use, and it exposes methods and functionality (such as iterators) that are called from the outside in response to a notification event.

Data transmission using virtqueues

The basic principle of how the queues are used by the device and the driver is the following, as also shown in the diagram below:

  1. when the guest driver has a new request (buffer), it allocates free descriptor(s) for the buffer in the descriptor table, chaining as necessary.
  2. the driver adds a new entry with the head index of the descriptor chain describing the request, in the available ring entries.
  3. the driver increments the idx by the number of new entries; the diagram shows the simple use case of only one new entry.
  4. the driver sends an available buffer notification to the device if such notifications are not suppressed.
  5. the device will at some point consume that request, by first reading the idx field from the available ring. This can be done directly with Queue::avail_idx, but consumers of the crate are advised not to call it, because the iterator over all available descriptor chain heads already calls it behind the scenes.
  6. the device gets the index of the descriptor chain(s) corresponding to the read idx value.
  7. the device reads the corresponding descriptor(s) from the descriptor table.
  8. the device adds a new entry in the used ring by using Queue::add_used; the entry is defined in the spec as virtq_used_elem, and in virtio-queue as VirtqUsedElem. This structure holds both the index of the descriptor chain and the number of bytes that were written to memory as part of serving the request.
  9. the device increments the idx from the used ring; this is done as part of the Queue::add_used that was mentioned above.
  10. the device sends a used buffer notification to the driver if such notifications are not suppressed.

queue
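
The flow above can be sketched with a toy, in-process model of a split virtqueue. Everything in this sketch (ToyVirtqueue, Desc, UsedElem and the method names) is hypothetical and only mirrors the steps of the list; the real crate reads and writes these structures in guest memory. The notification steps (4 and 10) are left out.

```rust
pub const VIRTQ_DESC_F_NEXT: u16 = 0x1;

#[derive(Debug, Clone, Copy, Default)]
pub struct Desc {
    pub addr: u64,
    pub len: u32,
    pub flags: u16,
    pub next: u16,
}

#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct UsedElem {
    pub id: u32,  // head index of the descriptor chain
    pub len: u32, // bytes written by the device
}

/// Toy model: the three queue parts live in plain vectors, not guest memory.
pub struct ToyVirtqueue {
    pub desc: Vec<Desc>,
    pub avail_ring: Vec<u16>,
    pub avail_idx: u16,
    pub used_ring: Vec<UsedElem>,
    pub used_idx: u16,
    next_avail: u16, // device-private: next available entry to consume
}

impl ToyVirtqueue {
    pub fn new(size: u16) -> Self {
        assert!(size.is_power_of_two(), "split queue sizes are powers of two");
        let n = usize::from(size);
        Self {
            desc: vec![Desc::default(); n],
            avail_ring: vec![0; n],
            avail_idx: 0,
            used_ring: vec![UsedElem::default(); n],
            used_idx: 0,
            next_avail: 0,
        }
    }

    fn slot(&self, idx: u16) -> usize {
        usize::from(idx) % self.desc.len()
    }

    /// Steps 2-3: the driver publishes a chain head and bumps the available idx.
    pub fn driver_publish(&mut self, head: u16) {
        let slot = self.slot(self.avail_idx);
        self.avail_ring[slot] = head;
        self.avail_idx = self.avail_idx.wrapping_add(1);
    }

    /// Steps 5-6: the device compares its position with idx and fetches a head.
    pub fn device_pop(&mut self) -> Option<u16> {
        if self.next_avail == self.avail_idx {
            return None;
        }
        let head = self.avail_ring[self.slot(self.next_avail)];
        self.next_avail = self.next_avail.wrapping_add(1);
        Some(head)
    }

    /// Step 7: the device walks the chain, bounded by the queue size.
    pub fn chain(&self, head: u16) -> Vec<Desc> {
        let mut out = Vec::new();
        let mut index = head;
        for _ in 0..self.desc.len() {
            let d = match self.desc.get(usize::from(index)) {
                Some(d) => *d,
                None => break,
            };
            out.push(d);
            if (d.flags & VIRTQ_DESC_F_NEXT) == 0 {
                break;
            }
            index = d.next;
        }
        out
    }

    /// Steps 8-9: the device records the used element and bumps the used idx.
    pub fn device_add_used(&mut self, head: u16, len: u32) {
        let slot = self.slot(self.used_idx);
        self.used_ring[slot] = UsedElem { id: u32::from(head), len };
        self.used_idx = self.used_idx.wrapping_add(1);
    }
}
```

Both idx counters are free-running 16-bit values that wrap around; only the ring slot is taken modulo the queue size, which is why the sketch uses wrapping_add.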

A descriptor stores four fields. The first two, addr and len, point to the data in memory to which the descriptor refers, as shown in the diagram below. The flags field indicates, for example, whether the buffer is device-readable or device-writable, or whether another descriptor is chained after this one (VIRTQ_DESC_F_NEXT flag set). The next field stores the index of the next descriptor if VIRTQ_DESC_F_NEXT is set.

descriptor
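
As an illustration of the four fields, the sketch below decodes one 16-byte descriptor from its little-endian in-memory form (addr: 8 bytes, len: 4, flags: 2, next: 2). RawDescriptor and its methods are hypothetical names for this example and are not the crate's Descriptor type; the flag values are the ones defined in the virtio specification.

```rust
use std::convert::TryInto;

pub const VIRTQ_DESC_F_NEXT: u16 = 0x1;
pub const VIRTQ_DESC_F_WRITE: u16 = 0x2;

/// Hypothetical plain-data view of one split-queue descriptor.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RawDescriptor {
    pub addr: u64,  // guest address of the buffer
    pub len: u32,   // buffer length in bytes
    pub flags: u16, // VIRTQ_DESC_F_* bits
    pub next: u16,  // index of the next descriptor, valid if F_NEXT is set
}

impl RawDescriptor {
    /// Decode a descriptor from its 16-byte little-endian representation.
    pub fn from_le_bytes(b: &[u8; 16]) -> Self {
        Self {
            addr: u64::from_le_bytes(b[0..8].try_into().unwrap()),
            len: u32::from_le_bytes(b[8..12].try_into().unwrap()),
            flags: u16::from_le_bytes(b[12..14].try_into().unwrap()),
            next: u16::from_le_bytes(b[14..16].try_into().unwrap()),
        }
    }

    /// True if another descriptor is chained after this one.
    pub fn has_next(&self) -> bool {
        (self.flags & VIRTQ_DESC_F_NEXT) != 0
    }

    /// True if the buffer is device-writable (write-only for the device).
    pub fn is_write_only(&self) -> bool {
        (self.flags & VIRTQ_DESC_F_WRITE) != 0
    }
}
```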

Requirements for device implementation

  • Abstractions from virtio-queue such as DescriptorChain can be used to parse descriptors provided by the driver, which represent input or output memory areas for device I/O. A descriptor is essentially an (address, length) pair, which is subsequently used by the device model operation. We do not check the validity of the descriptors, and instead expect any validations to happen when the device implementation is attempting to access the corresponding areas. Early checks can add non-negligible additional costs, and exclusively relying upon them may lead to time-of-check-to-time-of-use race conditions.
  • The device should validate before reading/writing to a buffer that it is device-readable/device-writable.
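
A minimal sketch of the second requirement, assuming the device has the descriptor's flags field at hand. The AccessError type and both function names are invented for this example; only the VIRTQ_DESC_F_WRITE value comes from the virtio specification.

```rust
pub const VIRTQ_DESC_F_WRITE: u16 = 0x2;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum AccessError {
    NotDeviceReadable,
    NotDeviceWritable,
}

/// Call before the device writes to the buffer described by `flags`.
pub fn check_device_writable(flags: u16) -> Result<(), AccessError> {
    if (flags & VIRTQ_DESC_F_WRITE) != 0 {
        Ok(())
    } else {
        Err(AccessError::NotDeviceWritable)
    }
}

/// Call before the device reads from the buffer described by `flags`.
pub fn check_device_readable(flags: u16) -> Result<(), AccessError> {
    if (flags & VIRTQ_DESC_F_WRITE) == 0 {
        Ok(())
    } else {
        Err(AccessError::NotDeviceReadable)
    }
}
```

Running the check at the point of access, rather than when the chain is first parsed, follows the reasoning in the first requirement about avoiding time-of-check-to-time-of-use races.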

Design

QueueT is a trait that allows different implementations for a Queue object for single-threaded context and multi-threaded context. The implementations provided in virtio-queue are:

  1. Queue → it is used for the single-threaded context.
  2. QueueSync → it is used for the multi-threaded context, and is simply a wrapper over an Arc<Mutex<Queue>>.
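
The split between a plain object and an Arc<Mutex<...>> wrapper behind one trait can be sketched as follows. QueueLike, PlainQueue and SharedQueue are hypothetical names standing in for QueueT, Queue and QueueSync; the real trait has a much larger surface.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical, heavily reduced stand-in for a queue trait.
pub trait QueueLike {
    fn set_size(&mut self, size: u16);
    fn size(&self) -> u16;
}

/// Single-threaded flavour: plain fields, no locking.
#[derive(Debug, Default)]
pub struct PlainQueue {
    size: u16,
}

impl QueueLike for PlainQueue {
    fn set_size(&mut self, size: u16) {
        self.size = size;
    }
    fn size(&self) -> u16 {
        self.size
    }
}

/// Multi-threaded flavour: a cloneable handle around a shared, locked queue.
#[derive(Debug, Clone, Default)]
pub struct SharedQueue {
    inner: Arc<Mutex<PlainQueue>>,
}

impl QueueLike for SharedQueue {
    fn set_size(&mut self, size: u16) {
        self.inner.lock().unwrap().set_size(size);
    }
    fn size(&self) -> u16 {
        self.inner.lock().unwrap().size()
    }
}

/// Device code can be written once against the trait.
pub fn configure<Q: QueueLike>(queue: &mut Q, size: u16) -> u16 {
    queue.set_size(size);
    queue.size()
}
```

With this shape, a device implementation that is generic over the trait works unchanged in both contexts, and clones of the shared handle observe the same state.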

Besides the above abstractions, the virtio-queue crate also provides the following ones:

  • Descriptor → mostly offers accessors for the members of the Descriptor.
  • DescriptorChain → provides accessors for the DescriptorChain’s members and an Iterator implementation for iterating over the DescriptorChain. There is also an abstraction for iterators over just the device-readable or just the device-writable descriptors (DescriptorChainRwIter).
  • AvailIter → a consuming iterator over all available descriptor chain heads in the queue.

Save/Restore Queue

The Queue allows saving the state through the state function, which returns a QueueState. Queue objects can be created from a previously saved state by using Queue::try_from(QueueState). The VMM should check for errors when restoring a Queue from a previously saved state.
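
The save/restore pattern, with a fallible conversion on the restore path, can be sketched with the standard TryFrom trait. SavedQueue, LiveQueue and RestoreError are hypothetical types with invented fields; the checks shown (non-zero, power-of-two size that does not exceed the maximum) are only examples of the kind of validation a restore might perform, not the crate's actual rules.

```rust
use std::convert::TryFrom;

/// Hypothetical serializable snapshot of a queue.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SavedQueue {
    pub max_size: u16,
    pub size: u16,
    pub ready: bool,
    pub next_avail: u16,
    pub next_used: u16,
}

/// Hypothetical live queue object.
#[derive(Debug)]
pub struct LiveQueue {
    state: SavedQueue,
}

#[derive(Debug, PartialEq, Eq)]
pub enum RestoreError {
    InvalidSize,
}

impl LiveQueue {
    /// Save: hand out a copy of the current state.
    pub fn state(&self) -> SavedQueue {
        self.state.clone()
    }
}

impl TryFrom<SavedQueue> for LiveQueue {
    type Error = RestoreError;

    /// Restore: validate the snapshot before building a queue from it.
    fn try_from(saved: SavedQueue) -> Result<Self, Self::Error> {
        if saved.size == 0 || saved.size > saved.max_size || !saved.size.is_power_of_two() {
            return Err(RestoreError::InvalidSize);
        }
        Ok(LiveQueue { state: saved })
    }
}
```

The caller is expected to handle the Err case instead of unwrapping, since a snapshot may come from an untrusted or corrupted source.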

Notification suppression

Notification suppression support makes up a large part of the virtio-queue crate. As already mentioned, the driver can send an available buffer notification to the device when there are new entries in the available ring, and the device can send a used buffer notification to the driver when there are new entries in the used ring. Sending a notification every time this happens is not always efficient; for example, while the driver is processing the used ring, it does not need to receive another used buffer notification. The mechanism for suppressing notifications is detailed in the virtio specification.

The Queue abstraction proposes the following sequence of steps for processing new available ring entries:

  1. the device first disables the notifications to make the driver aware it is processing the available ring and does not want interruptions, by using Queue::disable_notification. Notifications are disabled by the device either if VIRTIO_F_EVENT_IDX is not negotiated, and VIRTQ_USED_F_NO_NOTIFY is set in the flags field of the used ring, or if VIRTIO_F_EVENT_IDX is negotiated, and avail_event value is not updated, i.e. it remains set to the latest idx value of the available ring that was already notified by the driver.
  2. the device processes the new entries by using the AvailIter iterator.
  3. the device can enable the notifications now, by using Queue::enable_notification. Notifications are enabled by the device either if VIRTIO_F_EVENT_IDX is not negotiated, and 0 is set in the flags field of the used ring, or if VIRTIO_F_EVENT_IDX is negotiated, and avail_event value is set to the smallest idx value of the available ring that was not already notified by the driver. This way the device makes sure that it won’t miss any notification.

The above steps should be done in a loop to also handle the less likely case where the driver added new entries just before we re-enabled notifications.
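
The loop can be sketched with a toy queue that simulates the race described above. NotifyQueue and its fields are hypothetical; in particular, having enable_notification return true when entries are pending after re-enabling is a convention assumed for this sketch, and the late_arrivals field exists only to simulate a driver adding entries at the last moment.

```rust
/// Toy queue used only to illustrate the processing loop.
pub struct NotifyQueue {
    pending: Vec<u16>,       // available chain heads not yet consumed
    late_arrivals: Vec<u16>, // heads the "driver" adds just before re-enabling
    pub notifications_enabled: bool,
}

impl NotifyQueue {
    pub fn new(pending: Vec<u16>, late_arrivals: Vec<u16>) -> Self {
        Self { pending, late_arrivals, notifications_enabled: true }
    }

    pub fn disable_notification(&mut self) {
        self.notifications_enabled = false;
    }

    /// Re-enable notifications; returns true if entries are waiting anyway.
    pub fn enable_notification(&mut self) -> bool {
        // Simulate the race: entries added while notifications were disabled.
        self.pending.append(&mut self.late_arrivals);
        self.notifications_enabled = true;
        !self.pending.is_empty()
    }

    pub fn pop(&mut self) -> Option<u16> {
        if self.pending.is_empty() {
            None
        } else {
            Some(self.pending.remove(0))
        }
    }
}

/// Disable, drain, re-enable, and repeat while entries slipped in.
pub fn process_queue(queue: &mut NotifyQueue) -> Vec<u16> {
    let mut handled = Vec::new();
    loop {
        queue.disable_notification();
        while let Some(head) = queue.pop() {
            handled.push(head); // a real device would serve the request here
        }
        if !queue.enable_notification() {
            break; // nothing arrived in the meantime; wait for a notification
        }
    }
    handled
}
```

Without the outer loop, the entry that arrived between the last pop and the re-enable would sit in the ring until the driver happened to send another notification.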

For notifications towards the driver, the Queue provides the needs_notification method, which should be called each time the device adds a new entry to the used ring. Depending on the used_event value and on the last used idx value for which a notification was sent (signalled_used), needs_notification returns true to let the device know it should send a notification to the guest.
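
When the event index feature is negotiated, the decision boils down to a wrapping comparison that the virtio specification gives as vring_need_event. The sketch below is a Rust rendering of that formula; the function name and parameter names are chosen for this example, and how the crate combines it with its internal signalled_used bookkeeping is not shown.

```rust
/// Returns true if moving the used idx from `old_idx` to `new_idx` crossed
/// the index the driver asked to be notified about (`event_idx`).
/// All arithmetic wraps modulo 2^16, as the ring indices do.
pub fn need_event(event_idx: u16, new_idx: u16, old_idx: u16) -> bool {
    new_idx.wrapping_sub(event_idx).wrapping_sub(1) < new_idx.wrapping_sub(old_idx)
}
```

For example, if the driver set used_event to 5 and the device moves the used idx from 5 to 6, the entry at index 5 has just been filled, so a notification is due; moving from 4 to 5 does not reach that entry yet.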

Assumptions

We assume the users of the Queue implementation won’t attempt to use the queue before checking that the ready bit is set. This can be verified by calling Queue::is_valid which, besides this, also checks that the three queue parts are valid memory regions.

We assume consumers will use AvailIter::go_to_previous_position only in single-threaded contexts.

We assume the users will consume the entries from the available ring in the way recommended in the documentation, i.e. the device disables notifications, processes the available ring entries, and then re-enables notifications.

License

This project is licensed under either of

Dependencies

~1–1.6MB
~30K SLoC