3 releases
Version | Released |
---|---|
0.0.2 | Nov 10, 2023 |
0.0.1 | Nov 9, 2023 |
0.0.0 | Nov 7, 2023 |
Polars AI 📊
Polars AI is a command-line interface (CLI) with a companion crate/library that lets you query your Polars DataFrames conversationally. It uses OpenAI's GPT-3.5 Turbo to generate Rust code for data exploration and manipulation tasks.
Polars AI allows you to:
- Chat with your Polars DataFrames using plain text queries.
- Perform data analysis tasks, such as filtering and aggregating, through AI-generated Rust code.
- Visualize data using charts and plots (coming soon).
Table of Contents 📚
- Installation 🚀
- Getting Started 🏁
- Usage 🧑‍💻
- Examples 💡
- Contributing 🤝
- License 📜
Installation 🚀
Install from source
To build Polars AI from source, follow these steps:
- Install Rust (if not already installed) by following the instructions at Rust Install.
- Fork the repository on GitHub:
  - Click the "Fork" button on the top right of the GitHub repository page.
- Clone your fork of the Polars AI repository to your local machine:
  $ git clone https://github.com/yourusername/polars-ai.git
- Build the project using Rust's package manager, Cargo:
  $ cd polars-ai
  $ cargo build --release
- Set the OpenAI API key:
  $ export OPENAI_API_KEY=sk-
- Run the CLI:
  $ ./target/release/polars-ai help
Install using Cargo
You can also install Polars AI with Cargo, the Rust package manager:
- Install the crate:
  $ cargo install polars-ai
- Set the OpenAI API key:
  $ export OPENAI_API_KEY=sk-
- Run the CLI:
  $ polars-ai help
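Both installation paths depend on the OPENAI_API_KEY environment variable being set before the CLI runs. The snippet below is a minimal sketch of how a Rust program might check for that variable up front. The helper name `validate_api_key` and the `sk-` prefix check are assumptions made for illustration; this is not necessarily how Polars AI reads or validates the key.

```rust
use std::env;

/// Hypothetical helper (not part of Polars AI): checks that a key value is
/// present and starts with "sk-". The prefix check is an assumption based on
/// the placeholder shown in the installation steps.
fn validate_api_key(value: Option<&str>) -> Result<&str, String> {
    match value.map(str::trim) {
        None | Some("") => Err("OPENAI_API_KEY is not set".to_string()),
        Some(key) if !key.starts_with("sk-") => {
            Err("OPENAI_API_KEY does not start with \"sk-\"".to_string())
        }
        Some(key) => Ok(key),
    }
}

fn main() {
    // Read the variable from the environment; `ok()` turns a missing or
    // non-Unicode value into `None`.
    let raw = env::var("OPENAI_API_KEY").ok();
    match validate_api_key(raw.as_deref()) {
        Ok(_) => println!("OPENAI_API_KEY looks set"),
        Err(msg) => eprintln!("{msg}"),
    }
}
```

Failing early with a clear message is usually friendlier than letting the first API request fail with an authentication error.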
Getting Started 🏁
Before you begin, make sure you have the data you want to analyze. Polars AI works with Polars DataFrames; the CLI examples in this README load one from a CSV file with input -f.
Usage 🧑‍💻
Chatting with Your DataFrames
With Polars AI, you can chat with your DataFrames using plain text queries. Simply enter your question or query when prompted by the CLI. For example:
$ export OPENAI_API_KEY=sk-
$ polars-ai input -f examples/datasets/flights.csv show
📊 DataFrame:
shape: (18, 7)
┌────────────┬───────────┬─────────┬─────────────────┬───────────────┬──────────┬──────────┐
│ DayofMonth ┆ DayOfWeek ┆ Carrier ┆ OriginAirportID ┆ DestAirportID ┆ DepDelay ┆ ArrDelay │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str ┆ i64 ┆ i64 ┆ i64 ┆ i64 │
╞════════════╪═══════════╪═════════╪═════════════════╪═══════════════╪══════════╪══════════╡
│ 19 ┆ 5 ┆ DL ┆ 11433 ┆ 13303 ┆ -3 ┆ 1 │
│ 19 ┆ 5 ┆ DL ┆ 14869 ┆ 12478 ┆ 0 ┆ -8 │
│ 19 ┆ 5 ┆ DL ┆ 14057 ┆ 14869 ┆ -4 ┆ -15 │
│ 19 ┆ 5 ┆ DL ┆ 15016 ┆ 11433 ┆ 28 ┆ 24 │
│ … ┆ … ┆ … ┆ … ┆ … ┆ … ┆ … │
│ 19 ┆ 5 ┆ DL ┆ 10397 ┆ 12451 ┆ 71 ┆ null │
│ 19 ┆ 5 ┆ DL ┆ 12451 ┆ 10397 ┆ 75 ┆ null │
│ 19 ┆ 5 ┆ DL ┆ 12953 ┆ 10397 ┆ -1 ┆ null │
│ 19 ┆ 5 ┆ DL ┆ 11433 ┆ 12953 ┆ -3 ┆ null │
└────────────┴───────────┴─────────┴─────────────────┴───────────────┴──────────┴──────────┘
$ polars-ai input -f examples/datasets/flights.csv ask -q 'What is the average of the first column?'
🤖 AI Response:
use polars::prelude::*;

fn analyze_data(dfs: Vec<DataFrame>) -> Result<DataFrame> {
    let df = &dfs[0];
    let avg_first_column = df
        .select(&[col("DayofMonth")])
        .expect("Column 'DayofMonth' must exist")
        .mean()
        .unwrap()
        .select(&[col("mean")])
        .unwrap();
    let top_carriers = df
        .groupby(&[col("Carrier")])
        .expect("Column 'Carrier' must exist")
        .mean()
        .unwrap()
        .sort(&[col("mean")], false)
        .expect("Column 'mean' must exist")
        .head(Some(5))
        .select(&[col("Carrier")])
        .unwrap();
    let result_df = df
        .join(&top_carriers, &[col("Carrier")], &[col("Carrier")], JoinType::Inner)
        .expect("Column 'Carrier' must exist")
        .sort(&[col("DayofMonth")], false)
        .expect("Column 'DayofMonth' must exist")
        .head(Some(5));
    let final_result = result_df
        .select(&[col("Carrier"), col("DayofMonth")])
        .unwrap();
    Ok(final_result)
}

let result = analyze_data(dfs);
println!("{}", result);
Polars AI answers the query with Rust code, which you can then review and run.
Data Analysis Workflow
The generated Rust code follows a structured data analysis workflow:
- Prepare: Preprocess and clean the data if required.
- Process: Manipulate the data for analysis (e.g., grouping, filtering, aggregating).
- Analyze: Conduct the analysis.
- Output: Return results in various formats.
You can modify the generated code to customize your analysis.
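The four stages above can be sketched in plain Rust. This is an illustrative example only: it uses the standard library rather than Polars, and the `Flight` type, the stage functions, and the `sample` helper are hypothetical names, not code that Polars AI generates. The sample values are the Carrier and ArrDelay cells of the first four and last two rows shown in the flights output earlier in this README.

```rust
use std::collections::BTreeMap;

// Hypothetical row type; fields mirror the Carrier and ArrDelay columns.
#[derive(Debug, Clone)]
struct Flight {
    carrier: String,
    arr_delay: Option<i64>, // None models a `null` cell
}

// Prepare: drop rows whose arrival delay is missing.
fn prepare(rows: &[Flight]) -> Vec<Flight> {
    rows.iter().filter(|f| f.arr_delay.is_some()).cloned().collect()
}

// Process: group arrival delays by carrier.
fn process(rows: &[Flight]) -> BTreeMap<String, Vec<i64>> {
    let mut groups: BTreeMap<String, Vec<i64>> = BTreeMap::new();
    for f in rows {
        if let Some(delay) = f.arr_delay {
            groups.entry(f.carrier.clone()).or_default().push(delay);
        }
    }
    groups
}

// Analyze: compute the mean arrival delay per carrier.
fn analyze(groups: &BTreeMap<String, Vec<i64>>) -> BTreeMap<String, f64> {
    groups
        .iter()
        .filter(|(_, delays)| !delays.is_empty())
        .map(|(carrier, delays)| {
            let mean = delays.iter().sum::<i64>() as f64 / delays.len() as f64;
            (carrier.clone(), mean)
        })
        .collect()
}

// Output: render one "carrier: mean" line per carrier.
fn output(means: &BTreeMap<String, f64>) -> String {
    means
        .iter()
        .map(|(carrier, mean)| format!("{carrier}: {mean:.2}"))
        .collect::<Vec<_>>()
        .join("\n")
}

// Sample rows taken from the flights table shown earlier.
fn sample() -> Vec<Flight> {
    [Some(1), Some(-8), Some(-15), Some(24), None, None]
        .into_iter()
        .map(|arr_delay| Flight { carrier: "DL".to_string(), arr_delay })
        .collect()
}

fn main() {
    let rows = sample();
    let report = output(&analyze(&process(&prepare(&rows))));
    println!("{report}");
}
```

Keeping each stage in its own function makes it easier to swap one step, for example a different aggregation in the analyze step, without touching the others.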
Examples 💡
See the examples folder for ways to use Polars AI to analyze your data. Polars AI will generate Rust code to perform exploratory data analysis (EDA) on the data.
Contributing 🤝
We welcome contributions to Polars AI! If you'd like to contribute to this project, please follow these steps:
- Fork the repository on GitHub:
  - Click the "Fork" button on the top right of the GitHub repository page.
- Create a new branch for your feature or bug fix:
  - Use the following Git command to create a new branch:
    $ git checkout -b feature-or-bugfix-branch
- Make your changes and commit them:
  - Edit the files in your local repository and use the following Git commands to commit your changes:
    $ git add .
    $ git commit -m "Your commit message here"
- Create a pull request with a clear description of your changes:
  - Push your branch to your forked repository on GitHub and then create a pull request from there.
    $ git push origin feature-or-bugfix-branch
  - Visit your forked repository on GitHub, and you'll see an option to create a pull request for the branch you just pushed.
License 📜
This project is licensed under the MIT License - see the LICENSE file for details.