13 releases

0.1.12 Dec 12, 2023
0.1.11 Nov 15, 2023

#1258 in Network programming

26 downloads per month
Used in 3 crates (2 directly)

ISC license

78KB
1.5K SLoC

an obfuscated UDP transport for sosistab2

sosistab2-obfsudp is an obfuscated, unreliable, connection-oriented protocol built on top of UDP, intended to be used as a backend "pipe" for sosistab2.

It allows servers to listen for pipes, and clients to connect to servers and establish pipes, with a TCP-like API, but the pipes themselves carry unreliable datagrams up to 64 KB in size.

The obfuscation is intended to achieve:

  • Resistance to passive analysis: given a packet trace, it's difficult to tell which packets are obfsudp packets.
  • Resistance to active probing: given a host:port, as well as observation of all packets to and from it, it's difficult to devise a test that will confirm whether or not it is hosting an obfsudp server.
  • Resistance to packet manipulation and injection: both of the above hold true even if attackers can arbitrarily corrupt and inject packets of their own.

Specification

Encryption format

All UDP packets sent by obfsudp are encrypted using ChaCha20-Poly1305 in a way that looks uniformly random and hides the original packet length.

An encrypted packet encrypted to the key k looks like this:

  • Encrypted with ChaCha20-Poly1305 with key k and nonce n:
    • 1 byte: padding length padlen
    • padlen bytes: arbitrary padding bytes
    • variable length: message
  • 12 bytes: random nonce n, separately chosen for every packet

We notate encrypting a message m with key k and a random nonce as seal(k, m), and decryption as open(k, m)

Initial handshake

Initial handshake frames are symmetrically encrypted by keys derived from the server's X25519 public key. The intention here is that only people who know the server's public key server_pk (i.e. not ISP snoopers) can derive these keys:

  • hs_c2s: the client-to-server handshake key, derived as hs_c2s = derive_key("sosistab-2-c2s", server_pk), where derive_key is the key derivation function defined by BLAKE3.
  • hs_s2c: the server-to-server handshake key, derived as hs_s2c = derive_key("sosistab-2-s2c", server_pk).

Handshake frames are stdcode-encoded and encrypted representations of this Rust enum:

pub enum HandshakeFrame {
    ClientHello {
        long_pk: x25519_dalek::PublicKey,
        eph_pk: x25519_dalek::PublicKey,
        version: u64,
        timestamp: u64,
    },
    ServerHello {
        long_pk: x25519_dalek::PublicKey,
        eph_pk: x25519_dalek::PublicKey,
        resume_token: Bytes,
        client_commitment: [u8; 32],
    },
    Finalize {
        resume_token: Bytes,
        metadata: String,
    },
}

There are three phases to the handshake that establishes an obfsudp connection: client hello, server hello, and finalize.

Client hello

The client obtains two X25519 keypairs:

  • long_sk, long_pk: a "long-term" keypair, used to uniquely identify the client. May be randomly generated if the client does not need to authenticate through this handshake.
  • eph_sk, eph_pk: an ephemeral keypair. Must be freshly randomly generated.

then sends seal(hs_c2s, client_hello) to the server, where client_hello is filled with the keypairs, version = 4 and timestamp being the current Unix timestamp in seconds.

Server hello

The server responds with a ServerHello containing:

  • long_pk: its long term, well-known X25519 public key
  • eph_pk: its ephemeral X25519 public key
  • resume_token: a "cookie" that allows the server to reconstruct all the information it needs to initialize the session. Currently, that is the following data, encrypted in such a way that only the server itself and decrypt it:
    • sess_key: symmetric key derived from triple-ECDH between the client keys and the server keys
    • timestamp: current timestamp as Unix seconds
    • version: set to 4
  • client_commitment: blake3(client_hello)

The session key is derived as sess_key = blake3_keyed(key = blake3(metadata), shared_secret), where metadata is the metadata field in the ClientHello and shared_secret is the shared-secret derived from the triple-ECDH key exchange.

At this point, the server does not save information into any sort of table or similar. It throws away all the values calculated and continues processing more incoming packets.

The server must take care to reject replayed ClientHellos. This is done by:

  • Remembering the last 60 seconds of ClientHellos, and rejecting any duplicates
  • Rejecting any ClientHello with timestamp earlier than or later than 30 seconds from the present

Finalize

The client receives the ServerHello and verifies that:

  • long_pk is the expected value
  • client_commitment matches its own, computed as blake3(client_hello)

It then calculates the session key sess_key using triple-ECDH, and sends the Finalize message containing

  • resume_token: the resume_token from the ServerHello
  • metadata: arbitrary metadata related to this connection. This is typically used to indicate which higher-level sosistab2 Multiplex this pipe belongs to.

At this point, both client and server have agreed on the same sess_key. Both now derive:

  • up_key = blake3_keyed(key = "upload--------------------------", sess_key): the upload key
  • dn_key = blake3_keyed(key = "download------------------------", sess_key): the download key

"Steady state" frames

Once the session is established, both sides have the other's host:port, as well as an upload symmetric key and download symmetric key. They now send frames of the format (u64, SessionFrame), stdcode-serialized and encrypted with the right key (e.g. an upload message m will be sent as seal(up_key, m)):

pub enum SessionFrame {
    Data {
        seqno: u64,
        body: Bytes,
    },
    Parity {
        data_frame_first: u64,
        data_count: u8,
        parity_count: u8,
        parity_index: u8,
        pad_size: u16,
        body: Bytes,
    },
    Acks {
        acks: Vec<(u64, u32)>,
        naks: Vec<u64>,
    },
}

The u64 attached to the SessionFrame increases by 1 for every sent packet, and the receiver must use this to implement replay attack prevention.

We discuss the three kinds of frames separately:

Data frames

Whenever a datagram needs to be sent, it's fragmented and encoded into one or more Data frames. This is because we must support datagrams of up to 64 KiB, which is much larger than the typical MTU on Internet links.

Every datagram is split into fragments of the format:

  • 1 byte: which fragment is this
  • 1 byte: total number of fragments
  • up to 1340 bytes: the content of the fragment

For example, a datagram of 12345 bytes will be split into 10 fragments, with the first fragment containing the header 00 19 and the first 1340 bytes of the datagram, the second fragment containing the header 09 19 and the next 1340 bytes, and so on. These fragments will then individually be encoded as bodies of Data frames.

Parity frames

Parity frames are use for forward error correction (FEC), based on 8-bit Reed-Solomon. The important thing to note here is that the sender of data does not pick which Reed-Solomon encoding to use beforehand (say, expand 5 data frames to 5 data + 3 parity). Instead, it only needs to pick the encoding once it decides to send parity frames, and at that point dynamically change the encoding.

This is reflected in the format of the parity frames. Each parity frame essentially "claims" certain data frames as part of the Reed-Solomon encoded group:

  • data_frame_first: The sequence number of the first data frame included in the parity calculation.
  • data_count: The number of data frames included in the calculation.
  • parity_count: The total number of parity frames for the group.
  • parity_index: This parity frame's index. (e.g. 0 means this is the first parity frame)
  • pad_size: The uniform size that all of the frames must be padded to, for Reed-Solomon decoding (note that RS requires every data and parity packet to be the same size)
  • body: The parity data itself.

Ack frames

Because obfsudp is an unreliable datagram transport, acknowledgements are not used for retransmission or congestion control. Instead, they are simply used to help the other end measure connection quality. The format of an ack frame is as follows:

  • acks: a vector of acknowledged seqnos that were received by this sender, associated with the number of milliseconds between when that seqno was received and when the ack is sent. Including the latter number allows the packet receivers to freely delay and batch acks without affecting the sender's ping measurements. = naks: a vector of seqnos that the sender believes are lost forever, calculated by heuristics such as the number of subsequent seqnos confirmed to arrive.

Dependencies

~25–65MB
~1M SLoC