3 releases (breaking)
Version | Released |
---|---|
0.999.99 | Jan 18, 2024 |
0.99.9 | Jan 13, 2024 |
0.9.9 | Jan 12, 2024 |
Erdnuss Comms
This is the core engine of the Erdnuss RS-485 comms protocol.
Versioning
I plan to get this to v1.0 ASAP. Likely this will be gated on me getting docs finished, and maybe updating a few dep versions.
Until then:
- breaking changes will increment the minor version with another 9, e.g. 0.9.x -> 0.99.x -> 0.999.x
- non-breaking changes will increment the trivial version with another 9, e.g. 0.x.9 -> 0.x.99. This probably won't reset on breaking changes, because that's funnier.
License
MPLv2.0
lib.rs:
Erdnuss Comms
This is the "netstack" of the erdnuss project. It is intended for use on a half-duplex RS-485 bus. Right now it's only really expected to work on bare metal devices at a fixed network speed of 7.812MHz.
At the moment, only 32 devices on a single bus are supported. This also happens to be the upper limit supported by low cost hardware transceivers.
At the moment, all communications on the bus are either Controller-to-one-Target, or one-Target-to-Controller. There is no provision yet for Target-to-Target messaging.
Entities
There are two roles in this netstack:
- The Controller (or CON) role, which is responsible for commanding and managing the time slices given to all other devices on the bus
- The Target (or TGT) role, which only responds to commands from the Controller.
This nomenclature matches that used by I3C and other similar bus protocols with a single device "in charge" of driving communications.
At the moment, these roles are expected to be permanently assigned at compile time. There must always be exactly one Controller on any bus.
Message Framing
Messages on the bus are framed by a Line Break, which signals an "end of frame" condition to every listening node. This is intended to allow relatively low active CPU usage, with most nodes in a loop of:
1. Start DMA receive, until a Line Break interrupt occurs
2. Go off and do whatever
3. Once a Line Break occurs, check whether the received message was valid, and addressed to this node. If yes: process it and potentially respond. If no: ignore the frame and return to step 1.
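The check made when the Line Break fires could be sketched roughly as below. The frame layout (first byte carries the logical address in its low 5 bits) and the names `FrameAction` and `classify_frame` are assumptions made for illustration. They are not the crate's actual wire format or API, and the sketch does not distinguish command frames from response frames.

```rust
/// What a node does with a received frame once the line break fires.
#[derive(Debug, PartialEq)]
pub enum FrameAction<'a> {
    /// The frame was addressed to this node; the payload may be empty.
    Process(&'a [u8]),
    /// The frame was empty or addressed to another node: go back to listening.
    Ignore,
}

/// Hypothetical layout: byte 0 holds the logical address in its low 5 bits.
pub fn classify_frame(own_addr: u8, frame: &[u8]) -> FrameAction<'_> {
    match frame.split_first() {
        Some((&addr_byte, payload)) if addr_byte & 0x1F == own_addr => {
            FrameAction::Process(payload)
        }
        _ => FrameAction::Ignore,
    }
}
```

In a real node this would run from the Line Break interrupt (or a task woken by it), over the buffer the RX DMA has just filled.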
A line break was chosen because it is well supported by the RP2040 hardware UART implementation.
A line break was chosen over 9-bit messages (where the msbit is used as an address/data flag), because the RP2040 doesn't support 9-bit serial, and has no address-match interrupt, which devices like STM32 often have.
A line break was chosen over a "line idle" interrupt, because while the RP2040 DOES have a line idle interrupt, it does not work when using DMA, as the idle interrupt only fires when the line is quiet AND data is present in the receive FIFO, which will not occur when an RX DMA is actively draining bytes from the receive FIFO.
Time Division
As RS-485 is a half-duplex, shared-medium bus, it is necessary to coordinate between all senders to avoid message collisions.
This is achieved by having the Controller be "in charge" of a bus. The communication between the Controller and Target is polling-based, and generally looks like this:
1. The line is idle, and the Controller decides to send to a specific Target
2. The Controller sends a command-address byte with the ID of that Target
3. If the Controller has a pending message to that Target, it then sends that payload (zero or one data frames)
4. Once the Controller is done sending, it signals "end of frame" with a line break, and begins listening for 1ms, or until a line break occurs, whichever comes first.
5. The addressed Target notices it has been addressed, and all other non-addressed Targets go back to listening.
6. The addressed Target sends a response-address byte with its own ID
7. If the Target has a pending message for the CON, it then sends that payload (zero or one data frames)
8. Once the Target is done sending, it signals "end of frame" with a line break, and begins listening again
9. The Controller hears the line break, processes the received message if any, and goes back to step 1 for the next Target
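One exchange from the list above might be modelled like this. The bit layout (bit 7 set for a command-address byte, clear for a response-address byte, low 5 bits for the logical address) and the function names are hypothetical, chosen only to make the sketch concrete. It also uses `Vec` for brevity, which a bare metal build would likely replace with fixed buffers.

```rust
/// Hypothetical address-byte layout: bit 7 set = command (CON -> TGT),
/// bit 7 clear = response (TGT -> CON), low 5 bits = logical address.
const CMD_FLAG: u8 = 0x80;
const ADDR_MASK: u8 = 0x1F;

/// Controller side: command-address byte, then zero or one payload.
pub fn build_command(target: u8, payload: Option<&[u8]>) -> Vec<u8> {
    let mut frame = vec![CMD_FLAG | (target & ADDR_MASK)];
    frame.extend_from_slice(payload.unwrap_or(&[]));
    frame
}

/// Target side: only the addressed Target replies, with its
/// response-address byte and zero or one payload. Everyone else
/// returns `None` and goes back to listening.
pub fn target_reply(own_addr: u8, command: &[u8], pending: Option<&[u8]>) -> Option<Vec<u8>> {
    let first = *command.first()?;
    if first & CMD_FLAG == 0 || first & ADDR_MASK != own_addr {
        return None;
    }
    let mut frame = vec![own_addr & ADDR_MASK];
    frame.extend_from_slice(pending.unwrap_or(&[]));
    Some(frame)
}
```

The line break that ends each frame and the Controller's 1ms listen window are left out here, since they belong to the UART and timer layer rather than to the frame contents.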
Automatic logical addressing
All devices are expected to have a universally unique 64-bit hardware address, analogous to a MAC address on ethernet/wifi devices. For RP2040 based nodes, this is typically achieved by using the unique serial number of the QSPI flash chip.
In order to reduce overhead on the bus for addressing, devices are dynamically assigned a 5-bit address (0..32).
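A small wrapper type is one way to keep logical addresses inside the 5-bit range. `LogicalAddr` below is a hypothetical name for illustration, not a type taken from the crate.

```rust
/// A 5-bit logical bus address, always in the range 0..32.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct LogicalAddr(u8);

impl LogicalAddr {
    /// Number of distinct logical addresses on one bus.
    pub const COUNT: u8 = 32;

    /// Returns `None` for values that do not fit in 5 bits.
    pub fn new(raw: u8) -> Option<Self> {
        if raw < Self::COUNT {
            Some(Self(raw))
        } else {
            None
        }
    }

    /// The raw address value.
    pub fn get(self) -> u8 {
        self.0
    }
}
```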
When a Target first boots, it does not have a logical address. The Controller will periodically offer unused addresses, and any Targets without an address will randomly decide whether to claim the address.
As there may be multiple Targets that attempt to claim the address at the same time, the act of being assigned an address takes multiple steps:
1. The Controller offers an address, and includes 8 bytes of random data
2. A Target with no address randomly decides whether to attempt to claim the address. This random chance is aimed at reducing collisions where two or more nodes attempt to claim the same address.
3. If a Target decides to go ahead, it takes the 8 random bytes, and XORs them with its own 8-byte unique hardware ID, and sends a "claim" message back
4. If the Controller hears this claim, it takes the received 8 bytes, and XORs them with the original 8 random bytes. If there was not a collision, it should be left with the MAC address of the new Target. The Controller marks this address and unique ID as "pending"
5. At a later time, the Controller sends a message to the logical address, containing the MAC address it thinks it heard in step 4, and waits for an acknowledgement.
6. If the Target hears the logical address it claimed, AND the unique ID matches its own unique ID, then it sends an acknowledgement, and considers itself as having "joined" the bus, exclusively owning that logical address
7. If the Controller hears the ACK, it marks that address as fully assigned. If it does not hear an ACK, it moves the address from "pending" back to "free".
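The XOR masking in the claim and recovery steps can be sketched as follows. The function names are made up for this example. Only the XOR itself comes from the description above.

```rust
/// Target side: mask the 8-byte hardware ID with the offer's random bytes.
pub fn make_claim(offer_random: [u8; 8], hardware_id: [u8; 8]) -> [u8; 8] {
    xor8(offer_random, hardware_id)
}

/// Controller side: XOR the claim with the same random bytes to
/// recover the claimant's hardware ID.
pub fn recover_id(offer_random: [u8; 8], claim: [u8; 8]) -> [u8; 8] {
    xor8(offer_random, claim)
}

/// Byte-wise XOR of two 8-byte arrays.
fn xor8(a: [u8; 8], b: [u8; 8]) -> [u8; 8] {
    core::array::from_fn(|i| a[i] ^ b[i])
}
```

Because XOR is its own inverse, applying the same random bytes twice returns the original ID. If the Controller recovers an ID that belongs to no Target, the later confirmation goes unacknowledged and the address returns to "free".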
At the moment, the random chance in step 2 is a 1/8 chance, though this may change in the future.
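To get a feel for the 1/8 figure: if each of `n` unaddressed Targets decides independently with probability `p`, the chance that exactly one of them answers an offer is `n * p * (1 - p)^(n - 1)`. The independence assumption and the helper name are mine, not something the crate states.

```rust
/// Probability that exactly one of `n` unaddressed Targets answers an
/// offer, assuming each answers independently with probability `p`.
pub fn clean_claim_probability(n: u32, p: f64) -> f64 {
    if n == 0 {
        return 0.0;
    }
    f64::from(n) * p * (1.0 - p).powi(n as i32 - 1)
}
```

With `p = 1/8`, a lone Target gets a clean claim on 12.5% of offers, while eight Targets booting together see a clean claim on roughly 39% of offers.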
Controller "steps"
So far, we've described the process of a single CON/TGT communication. This must be carried out for all TGTs on the bus. In general, the CON performs an endless polling loop, consisting of three phases:
- For each known TGT with an assigned logical address:
  - Address the TGT, additionally sending it 0 or 1 data frames
  - Wait for the TGT to respond.
    - If it DOES respond, it will respond with 0 or 1 data frames, and we clear the "failure counter".
    - If it DOES NOT respond, we increment a "failure counter".
- For each address in the "pending" phase:
  - We send the confirmation message (step 5 above)
  - Wait for the TGT to respond or a timeout to occur (step 7 above)
- If we have any remaining un-assigned logical addresses:
  - Send an "offer message" (step 1 above)
  - Wait for the TGT to respond or a timeout to occur (step 4 above)
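The three phases could be planned from an address table along these lines. `Slot`, `Exchange` and `plan_step` are hypothetical names. The sketch also assumes a single offer per step, for the lowest free address, which the text above does not specify.

```rust
/// State of one of the 32 logical addresses (hypothetical model).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Slot {
    Free,
    Pending { id: u64 },
    Active { id: u64, failures: u8 },
}

/// One Controller/Target exchange to perform during a step.
#[derive(Debug, PartialEq)]
pub enum Exchange {
    Poll(u8),
    Confirm(u8),
    Offer(u8),
}

/// Lists the exchanges for one step, in phase order.
pub fn plan_step(table: &[Slot; 32]) -> Vec<Exchange> {
    let mut plan = Vec::new();
    // Phase 1: poll every Target that holds an assigned address.
    for (addr, slot) in table.iter().enumerate() {
        if matches!(slot, Slot::Active { .. }) {
            plan.push(Exchange::Poll(addr as u8));
        }
    }
    // Phase 2: confirm every address that is still pending.
    for (addr, slot) in table.iter().enumerate() {
        if matches!(slot, Slot::Pending { .. }) {
            plan.push(Exchange::Confirm(addr as u8));
        }
    }
    // Phase 3 (assumption): offer only the lowest free address, if any.
    if let Some(addr) = table.iter().position(|s| *s == Slot::Free) {
        plan.push(Exchange::Offer(addr as u8));
    }
    plan
}
```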
At the moment, it is up to the application to decide how often to perform a "step". This could be continuously, every N milliseconds, or on some other metric.
More steps/sec means:
- More CPU time spent checking and responding to messages
- Less latency for messages waiting to be transferred from CON to TGT or TGT to CON
- Higher data throughput on the bus
Fewer steps/sec will mean the inverse. In the future, there might be a better way to adaptively poll in a more intelligent manner.
Culling of inactive devices
As all Targets are expected to quickly respond to all queries from the Controller, the Controller uses a "three strikes you're out" rule to avoid wasting bus time on timeouts from unresponsive Targets. If a Target fails to respond three times in a row, it is dropped, and the address is marked as free.
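The three-strikes rule amounts to a small state update after every poll. The type and function names here are illustrative only. The limit of three consecutive failures is the one stated above.

```rust
/// Consecutive missed responses after which a Target is dropped.
pub const MAX_FAILURES: u8 = 3;

/// Controller-side view of one logical address (hypothetical model).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum TargetState {
    Active { failures: u8 },
    Free,
}

/// Updates the state after one poll of an assigned address.
pub fn record_poll(state: TargetState, responded: bool) -> TargetState {
    match state {
        // Any response clears the failure counter.
        TargetState::Active { .. } if responded => TargetState::Active { failures: 0 },
        // Third miss in a row: drop the Target and free the address.
        TargetState::Active { failures } if failures.saturating_add(1) >= MAX_FAILURES => {
            TargetState::Free
        }
        // Otherwise count the miss and keep polling.
        TargetState::Active { failures } => TargetState::Active { failures: failures + 1 },
        TargetState::Free => TargetState::Free,
    }
}
```

A dropped Target would then have to go through the address offer and claim process again to rejoin the bus.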