3 releases

| Version | Release date |
|---|---|
| 0.1.2 | Mar 9, 2024 |
| 0.1.1 | Mar 9, 2024 |
| 0.1.0 | Mar 9, 2024 |
DMSLite
DMSLite is a secure and lightweight command-line tool for document management. It provides efficient document indexing, searching, and AI-based categorization, and is designed to stay fast even with large document volumes. All operations run entirely locally on your machine for maximum privacy and security.
Usage
If Cargo's bin folder (by default `~/.cargo/bin`) is in your `$PATH`, you can run `dmslite` from anywhere to:
- Consume documents: add documents to the consume folder (`CONSUME_PATH`) and process them with the command `c`.
- Search documents: search by content, title, or creation date using fuzzy word-similarity matching, with the command `s` followed by the search phrase.
- Open documents: open a document found by a previous search in its default application, directly from the CLI, with the command `o` followed by the document's id.
- Delete documents: delete a document found by a previous search with the command `d` followed by its id.
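The four commands map naturally onto a small parser. The Rust sketch below is purely illustrative: the `Command` enum and `parse_command` function are hypothetical names, and it assumes ids are integers because the tables use `SERIAL` keys. It does not reproduce DMSLite's actual input handling.

```rust
/// Hypothetical model of the four commands described above (not DMSLite's own code).
#[derive(Debug, PartialEq)]
enum Command {
    Consume,        // `c`
    Search(String), // `s <phrase>`
    Open(i32),      // `o <id>`, id as returned by a search (assumed SERIAL/i32)
    Delete(i32),    // `d <id>`
}

/// Splits the input into a command letter and its argument, then validates the argument.
fn parse_command(input: &str) -> Result<Command, String> {
    let input = input.trim();
    let (cmd, rest) = match input.split_once(char::is_whitespace) {
        Some((c, r)) => (c, r.trim()),
        None => (input, ""),
    };
    match cmd {
        "c" => Ok(Command::Consume),
        // A search needs a non-empty phrase; everything after `s` is the phrase.
        "s" if !rest.is_empty() => Ok(Command::Search(rest.to_string())),
        "o" | "d" => {
            let id: i32 = rest
                .parse()
                .map_err(|_| format!("expected a numeric id after `{cmd}`, got {rest:?}"))?;
            Ok(if cmd == "o" { Command::Open(id) } else { Command::Delete(id) })
        }
        _ => Err(format!("unknown or incomplete command: {input:?}")),
    }
}
```

For example, `parse_command("s invoice 2023")` yields `Command::Search("invoice 2023")`, while `parse_command("o abc")` is rejected because the id is not numeric.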
Installation and Setup
```
cargo install dmslite
```
Prerequisites
- PostgreSQL database
- Tesseract OCR with the language data for your documents' language
- Ollama set up with a local model
- pdftoppm (installed with `sudo apt install poppler-utils`)
- xdg-open, to open documents directly from the terminal
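Before setting up, you can check which of these command-line tools are reachable. This Rust sketch searches each directory in `$PATH` for the binaries named above. Including `psql` (the PostgreSQL client used in the database setup) is an assumption, and the check only confirms that a file with that name exists, not that it is executable or the right version. The function names are hypothetical.

```rust
use std::path::PathBuf;

/// Binaries from the prerequisites list; `psql` is an assumed addition.
const REQUIRED_TOOLS: [&str; 5] = ["tesseract", "ollama", "pdftoppm", "xdg-open", "psql"];

/// Returns the full path of `name` in the first directory of a `$PATH`-style list that contains it.
fn find_in_path(path_var: &str, name: &str) -> Option<PathBuf> {
    std::env::split_paths(path_var)
        .map(|dir| dir.join(name))
        .find(|candidate| candidate.is_file())
}

/// Lists the required tools that cannot be found in the current `$PATH`.
fn missing_tools() -> Vec<&'static str> {
    let path_var = std::env::var("PATH").unwrap_or_default();
    REQUIRED_TOOLS
        .iter()
        .copied()
        .filter(|tool| find_in_path(&path_var, tool).is_none())
        .collect()
}
```

An empty result from `missing_tools()` means every listed binary was found somewhere on your `$PATH`.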
PostgreSQL Database Setup
- As a PostgreSQL superuser, create the `dmslite` user, a database owned by it, and the `dmslite` schema inside that database (connect to the new database with `\c` first, otherwise the schema ends up in the `postgres` database):
```
psql -U postgres
CREATE USER dmslite WITH PASSWORD 'dmslite';
CREATE DATABASE dmslite OWNER dmslite;
\c dmslite
CREATE SCHEMA dmslite AUTHORIZATION dmslite;
```
- Write your password into the string `PSQL_PASSWD` in `src/settings.rs`.
- As the `dmslite` user (e.g. `psql -U dmslite -d dmslite`), create the `main_table` and `document_content` tables and the search indices. The tables must exist before indices can be created on them:
```
-- create tables
CREATE TABLE dmslite.main_table (
    id SERIAL PRIMARY KEY,
    upload_date DATE,
    filepath VARCHAR(255),
    title TEXT
);

CREATE TABLE dmslite.document_content (
    id SERIAL PRIMARY KEY,
    content TEXT,
    summary TEXT,
    buzzwords TEXT,
    -- add more columns as needed
    FOREIGN KEY (id) REFERENCES dmslite.main_table(id) ON DELETE CASCADE
);

-- enable trigram matching and create the search indices
CREATE EXTENSION pg_trgm;
CREATE INDEX idx_content_trgm ON dmslite.document_content USING gin (content gin_trgm_ops);
CREATE INDEX idx_summary_trgm ON dmslite.document_content USING gin (summary gin_trgm_ops);
CREATE INDEX idx_buzzwords_trgm ON dmslite.document_content USING gin (buzzwords gin_trgm_ops);
```
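The GIN indices with `gin_trgm_ops` let PostgreSQL answer pg_trgm similarity queries (such as the `%` operator) without scanning every row. To illustrate the measure behind the fuzzy search, here is a simplified Rust sketch of pg_trgm-style trigram similarity: each word is lowercased and padded with two leading spaces and one trailing space, split into three-character trigrams, and the score is the number of shared trigrams divided by the number of distinct trigrams in both strings. It ignores pg_trgm's locale-specific details, is not DMSLite's own code, and the function names are hypothetical.

```rust
use std::collections::HashSet;

/// Collects the distinct trigrams of all alphanumeric words in `text`.
fn trigrams(text: &str) -> HashSet<String> {
    let mut set: HashSet<String> = HashSet::new();
    let lower = text.to_lowercase();
    // Non-alphanumeric characters act as word separators.
    for word in lower.split(|c: char| !c.is_alphanumeric()).filter(|w| !w.is_empty()) {
        // Pad like pg_trgm: two spaces before the word, one after.
        let padded: Vec<char> = format!("  {word} ").chars().collect();
        for window in padded.windows(3) {
            set.insert(window.iter().collect::<String>());
        }
    }
    set
}

/// Shared trigrams divided by distinct trigrams in both strings (0.0 to 1.0).
fn similarity(a: &str, b: &str) -> f64 {
    let ta = trigrams(a);
    let tb = trigrams(b);
    let union = ta.union(&tb).count();
    if union == 0 {
        return 0.0;
    }
    let shared = ta.intersection(&tb).count();
    shared as f64 / union as f64
}
```

For example, "cat" and "cats" share 3 of 6 distinct trigrams, so `similarity("cat", "cats")` is 0.5, above pg_trgm's default `pg_trgm.similarity_threshold` of 0.3 that the `%` operator uses.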
Ollama Custom Models Setup
Build the custom Ollama models. On memory-constrained machines the gemma:2b model is recommended (this is the default); otherwise choose llama2. Set the chosen model in each Modelfile's `FROM <model>` instruction, then run:
```
ollama create doc_buzzword_generator -f doc_buzzword_generator
ollama create doc_summarizer -f doc_summarizer
ollama create doc_title_generator -f doc_title_generator
```
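If you switch base models often, the `FROM` edit can be scripted. This Rust sketch replaces the first `FROM` instruction in a Modelfile's text with the chosen model (Ollama treats Modelfile instructions case-insensitively) and leaves all other lines untouched. The function name `set_base_model` is hypothetical and not part of DMSLite or Ollama.

```rust
/// True if `line` is a Modelfile `FROM` instruction (case-insensitive, followed by whitespace).
fn is_from_line(line: &str) -> bool {
    let trimmed = line.trim_start();
    trimmed.get(..4).map_or(false, |kw| kw.eq_ignore_ascii_case("from"))
        && trimmed.get(4..).map_or(false, |rest| rest.starts_with(char::is_whitespace))
}

/// Returns `modelfile` with its first `FROM` line pointed at `model`, e.g. "gemma:2b" or "llama2".
fn set_base_model(modelfile: &str, model: &str) -> String {
    let mut replaced = false;
    let mut out = modelfile
        .lines()
        .map(|line| {
            if !replaced && is_from_line(line) {
                replaced = true;
                format!("FROM {model}")
            } else {
                line.to_string()
            }
        })
        .collect::<Vec<_>>()
        .join("\n");
    // `lines()` drops the final newline, so restore it if the input had one.
    if modelfile.ends_with('\n') {
        out.push('\n');
    }
    out
}
```

After updating the three Modelfiles, rerun the `ollama create` commands above so the custom models are rebuilt on the new base model.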
Settings
- Create a folder for documents to be consumed.
- Create a folder for the indexed document storage.
- Write the absolute paths of these two folders into the strings `CONSUME_PATH` and `STORAGE_PATH` in the file `src/settings.rs`. They must be absolute paths starting with `/home/<user>/...`.
- Set the string `TESSERACT_LANG` to your Tesseract language code (e.g. "eng" or "deu").
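As a rough sketch, assuming the settings are plain string constants, `src/settings.rs` might look like the block below. Only the constant names come from this README; the values (including the user `alice`) are placeholders, and `check_settings` is a hypothetical helper that enforces the rules above. If the values are compile-time constants like these, changes only take effect after the crate is rebuilt.

```rust
use std::path::Path;

// Hypothetical layout of src/settings.rs; replace the placeholder values with your own.
pub const CONSUME_PATH: &str = "/home/alice/dmslite/consume";
pub const STORAGE_PATH: &str = "/home/alice/dmslite/storage";
pub const TESSERACT_LANG: &str = "eng";
pub const PSQL_PASSWD: &str = "dmslite";

/// Both folders must be absolute paths under /home, and the Tesseract language must be set.
fn check_settings(consume: &str, storage: &str, lang: &str) -> Result<(), String> {
    for (name, value) in [("CONSUME_PATH", consume), ("STORAGE_PATH", storage)] {
        let path = Path::new(value);
        // `starts_with` compares whole components, so "/homework" does not match "/home".
        if !path.is_absolute() || !path.starts_with("/home") {
            return Err(format!("{name} must be an absolute path under /home, got {value:?}"));
        }
    }
    if lang.trim().is_empty() {
        return Err("TESSERACT_LANG must be set, e.g. \"eng\" or \"deu\"".to_string());
    }
    Ok(())
}
```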
Uninstall/Delete
Postgres
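Connect as the `dmslite` user (e.g. `psql -U dmslite -d dmslite`) and run:
```
DROP TABLE dmslite.document_content;
DROP TABLE dmslite.main_table;
DROP INDEX IF EXISTS idx_content_trgm;
DROP INDEX IF EXISTS idx_summary_trgm;
DROP INDEX IF EXISTS idx_buzzwords_trgm;
DROP FUNCTION IF EXISTS fuzzy_search_document_content;
```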
Ollama
```
ollama rm doc_buzzword_generator
ollama rm doc_summarizer
ollama rm doc_title_generator
```