Version 0.1.0 (unstable release), Feb 18, 2025
ragrep 🔍
[!WARNING] WIP big time. This codebase is full of broken glass, sharp edges, and dragons.
A semantic code search tool that uses embeddings to find similar code snippets across your codebase. Use it to search for things like:
- "how do we handle http request errors?"
- "where is our data validation?"
Features
- Semantic code search using embeddings
- Fully local after the initial model download: no API keys or external services
- Supports multiple programming languages through tree-sitter
- Fast SQLite-based storage for embeddings and code chunks
- Intelligent code chunking based on AST
Installation
# Not actually live yet - still need to publish it
cargo install ragrep
Building from Source
Prerequisites
- Rust toolchain (1.75.0 or later recommended)
- SQLite 3.x
git clone https://github.com/yourusername/ragrep.git
cd ragrep
cargo build --release
The binary will be available at target/release/ragrep
Usage
[!IMPORTANT] The first time you run ragrep, it downloads a model and caches it in its global data directory. This might take a minute and uses about 1.5GB of disk space.
Indexing Your Codebase
Before searching, you need to index your codebase:
# Index the current directory
ragrep index
# Index a specific directory
ragrep index --path /path/to/your/code
Searching Code
# Search for code similar to your query
ragrep "handle http request error"
The search results show relevant code snippets along with their file locations, formatted like ripgrep output.
Debug Mode
To see similarity scores in the output:
RUST_LOG=debug ragrep "your query"
Supported Languages
- Rust
- Python
- JavaScript
- TypeScript
More languages can be added by including their respective tree-sitter parsers.
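To make the idea of "supported files" concrete, here is a minimal sketch of how a file extension could be mapped to one of the languages above. The `Language` enum, the `detect_language` function, and the extension table are all hypothetical and are not taken from ragrep's source; a real implementation would also hand each language to its tree-sitter parser.

```rust
use std::path::Path;

/// The languages listed in this README. The enum itself is a hypothetical
/// illustration, not a type from ragrep.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Language {
    Rust,
    Python,
    JavaScript,
    TypeScript,
}

/// Guess the language from a file extension.
/// `None` means the file would be skipped during indexing.
/// The extension table below is an assumption, not ragrep's actual mapping.
fn detect_language(path: &Path) -> Option<Language> {
    match path.extension()?.to_str()? {
        "rs" => Some(Language::Rust),
        "py" => Some(Language::Python),
        "js" | "jsx" => Some(Language::JavaScript),
        "ts" | "tsx" => Some(Language::TypeScript),
        _ => None,
    }
}
```

Under this sketch, supporting another language would mean adding an enum variant and a match arm, alongside the corresponding tree-sitter parser.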
How It Works
- Indexing:
  - Scans your codebase for supported files
  - Uses tree-sitter to parse code into meaningful chunks
  - Generates embeddings for each code chunk
  - Stores chunks and embeddings in a SQLite database
- Searching:
  - Converts your search query into an embedding
  - Finds code chunks with similar embeddings using vector similarity
  - Ranks and displays the most relevant results
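The search step above can be sketched with plain vectors. This example assumes cosine similarity as the similarity measure and a brute-force scan over all chunks; the README does not say which metric or search strategy ragrep actually uses. The `Chunk` struct and the `top_k` function are hypothetical names introduced for illustration.

```rust
/// Cosine similarity between two equal-length vectors.
/// Returns 0.0 when either vector has zero magnitude.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "embeddings must have the same dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// A stored code chunk with its embedding (hypothetical layout).
struct Chunk {
    file: String,
    line: usize,
    embedding: Vec<f32>,
}

/// Score every chunk against the query embedding and return the `k`
/// highest-scoring chunks, best first.
fn top_k<'a>(query: &[f32], chunks: &'a [Chunk], k: usize) -> Vec<(f32, &'a Chunk)> {
    let mut scored: Vec<(f32, &Chunk)> = chunks
        .iter()
        .map(|c| (cosine_similarity(query, &c.embedding), c))
        .collect();
    // Sort by descending similarity; total_cmp gives a total order on f32.
    scored.sort_by(|a, b| b.0.total_cmp(&a.0));
    scored.truncate(k);
    scored
}
```

Each returned pair holds the similarity score and a reference to the chunk, so a caller can print `file:line` next to the score, much like the debug output described earlier.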
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
This project is licensed under the MIT License - see the LICENSE file for details.