Globe-scale Agentic Alignment
Orign makes it simple to train and deploy robust AI agents that can learn from human feedback. It further provides mechanisms for agents to learn interactively and autonomously.
Built on the Nebulous runtime, Orign components can run on any cloud and easily connect across clouds and regions.
Ships as a single binary, performant and lightweight via Rust 🦀
It takes a team to align models; we connect them globally 🌎
Warning
Orign is in alpha; things may break.
Installation
Python
pip install orign
CLI
curl -fsSL -H "Cache-Control: no-cache" https://storage.googleapis.com/orign/releases/install.sh | bash
Usage
Start an orign server
orign serve --docker
Or, optionally, run it on Kubernetes with our Helm chart.
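For example, a hypothetical install from a local checkout of the repo (the chart path and release name below are placeholders, not confirmed by this README):
# Hypothetical chart location; point this at wherever the Orign chart actually lives.
helm install orign ./charts/orign --namespace orign --create-namespace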
Replay Buffer
Create a replay buffer, which stores agent experience and launches training jobs.
In this example, once the buffer has 50 examples, it will randomly sample 100 examples and launch a TRL training job on RunPod with one A100 GPU.
from orign import ReplayBuffer, ContainerRequest

buffer = ReplayBuffer(
    name="sql-adapter",
    train_every=50,
    sample_n=100,
    sample_strategy="Random",
    train_job=ContainerRequest(
        image="huggingface/trl-latest-gpu:latest",
        command="trl sft --model_name_or_path $MODEL --dataset_name $DATASET_PATH ...",
        platform="runpod",
        env={
            "MODEL": "Qwen/Qwen2.5-7B-Instruct",
        },
        accelerators=["1:A100"],
    ),
)
Orign sets the following env vars in your container when it launches, based on the buffer config:
DATASET_URI
DATASET_PATH
NUM_EPOCHS
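As a rough sketch, a custom training entrypoint could read these variables like so (the hand-off to the TRL CLI mirrors the earlier example and is illustrative, not an Orign requirement):

import os
import subprocess

# Orign injects these based on the replay buffer config.
dataset_path = os.environ["DATASET_PATH"]
num_epochs = int(os.environ.get("NUM_EPOCHS", "1"))
model = os.environ.get("MODEL", "Qwen/Qwen2.5-7B-Instruct")

# Illustrative hand-off to a TRL SFT run using the sampled dataset.
subprocess.run(
    [
        "trl", "sft",
        "--model_name_or_path", model,
        "--dataset_name", dataset_path,
        "--num_train_epochs", str(num_epochs),
    ],
    check=True,
)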
For simplicity, Orign also supplies high-level, framework-specific training containers.
from orign import TRL

training_job = TRL(
    model="Qwen/Qwen2.5-7B-Instruct",
    platform="runpod",
    accelerators=["1:H200_SXM"],
)
buffer = ReplayBuffer(
    ...
    train_job=training_job,
)
Send data to the replay buffer
buffer.send(data)
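The exact schema of data depends on your training job; as an illustration, assuming chat-format records like the message lists used elsewhere in this README:

data = [
    {"role": "user", "content": "Write a SQL query to count all users."},
    {"role": "assistant", "content": "SELECT COUNT(*) FROM users;"},
]
buffer.send(data)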
See a list of all replay buffers
orign get buffers
Online LLM
Create an online LLM that is capable of both training and inference.
In this example, the actor uses a vLLM server running on EC2 with two H100 GPUs, together with the buffer we created earlier.
from orign import OnlineLLM, Container

actor = OnlineLLM(
    name="sql-actor",
    buffer=buffer,
    server=Container(
        image="vllm/vllm-openai:latest",
        command="python3 -m vllm.entrypoints.openai.api_server --model $MODEL ...",
        platform="ec2",
        env={
            "MODEL": "Qwen/Qwen2.5-7B-Instruct",
        },
        accelerators=["2:H100_SXM"],
    ),
)
For simplicity, Orign also supplies high-level, framework-specific serving containers.
from orign import VLLM

server = VLLM(
    model="Qwen/Qwen2.5-7B-Instruct",
    platform="ec2",
    accelerators=["2:H100_SXM"],
)
actor = OnlineLLM(
    ...
    server=server,
)
Use the LLM to generate responses.
messages = [
    {"role": "user", "content": "Write a SQL query to find all users who joined after January 1, 2023."},
]

response = actor.chat(messages)
print(response)
Send the LLM training examples
messages = [
    {"role": "user", "content": "Write a SQL query to find all users who joined after January 1, 2023."},
    {"role": "assistant", "content": "sql\nSELECT * FROM users WHERE join_date > '2023-01-01';\n"},
]

actor.learn(messages)
Replay buffers automatically launch training jobs when they hit the train_every threshold. However, you can also launch them manually.
actor.train()
Orign also supplies high-level objects for common online LLMs.
from orign import Gemma3

actor = Gemma3(
    model="google/gemma3-3b-instruct",
    platform="ec2",
    accelerators=["1:A100_SXM"],
    lora=True,
)
It's also easy to create your own online LLM wrapper.
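For instance, a thin helper can bundle the pieces shown above behind a single constructor; the helper name, defaults, and buffer settings below are illustrative, not part of Orign's API:

from orign import OnlineLLM, ReplayBuffer, TRL, VLLM

def sql_llm(name: str, model: str = "Qwen/Qwen2.5-7B-Instruct") -> OnlineLLM:
    # Bundle a replay buffer, a TRL training job, and a vLLM server behind one call.
    buffer = ReplayBuffer(
        name=f"{name}-buffer",
        train_every=50,
        sample_n=100,
        sample_strategy="Random",
        train_job=TRL(model=model, platform="runpod", accelerators=["1:A100"]),
    )
    server = VLLM(model=model, platform="ec2", accelerators=["1:H100_SXM"])
    return OnlineLLM(name=name, buffer=buffer, server=server)

actor = sql_llm("sql-actor-v2")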
Human
Connect to a human who can provide feedback to the agent.
In this example, we collect feedback from humans in a Slack channel.
from orign import Human

human = Human(
    name="sql-adapter-annotator",
    medium="slack",
    channel="#agent-training",
)
Use the human to provide feedback to the agent.
messages = [
    {"role": "user", "content": "Write a SQL query to find all users who joined after January 1, 2023."},
    {"role": "assistant", "content": "sql\nSELECT * FROM users WHERE join_date > '2023-01-01';\n"},
]

human.feedback(messages)
Register a callback to run a container when the human provides feedback.
from orign import container

@container(image="python:3.10")
def on_feedback(feedback):
    print(feedback)

human.on_feedback(on_feedback)
Verifiers and Autonomous Learning
As a more complex example, use the feedback to train both the agent and a verifier, enabling autonomous learning.
First, let's create a verifier using an online LLM.
from orign import Qwen2_5

verifier = Qwen2_5(
    name="sql-adapter-verifier",
    model="Qwen/Qwen2.5-7B-Instruct",
    platform="ec2",
    accelerators=["1:H100_SXM"],
)
Now, let's create a container that will launch when the human provides feedback.
@container(image="agentsea/orign-py:latest")
def on_feedback(feedback):
    from orign import ReplayBuffer

    # Get the buffers we previously created for our actor and verifier.
    actor_buffer = ReplayBuffer.get("sql-adapter-actor")
    verifier_buffer = ReplayBuffer.get("sql-adapter-verifier")

    # Teach the verifier to judge whether the assistant's response is correct.
    verifier_messages = [
        {"role": "user", "content": f"Given the conversation {feedback.messages}, please judge whether the assistant's response is correct."},
        {"role": "assistant", "content": feedback.correct},
    ]
    verifier_buffer.send(verifier_messages)

    # If the assistant's response is correct, train the actor.
    if feedback.correct:
        actor_buffer.send(feedback.messages)

# Register the callback
human.on_feedback(on_feedback)
Using the previous example, once the verifier is trained, we can use it to train the actor autonomously.
while True:
    # Implement this function however makes sense for you.
    task = next_task()

    response = actor.chat(task)

    # Implement this function to format the chat history for the verifier.
    verifier_messages = get_verifier_messages(task, response)

    feedback = verifier.chat(verifier_messages)
    if feedback.correct:
        actor.learn(feedback.messages)
Roadmap
- MCP support
- Metrics
- More human backends
Contributing
Please open an issue or submit a PR.
Inspiration
- OpenRLHF
- AlignAnything
- TRL
- Nebulous