#nlp #polars #dataframe #data-analysis

bin+lib polars-ai

A CLI and a library for interacting with Polars DataFrames using natural language queries and AI

3 releases

0.0.2 Nov 10, 2023
0.0.1 Nov 9, 2023
0.0.0 Nov 7, 2023

#651 in Machine learning

MIT license

35KB
351 lines

Polars AI 📊


Polars AI is a command-line interface (CLI) and a Rust crate/library for conversing with your Polars DataFrames. It uses OpenAI's GPT-3.5 Turbo to support data exploration and manipulation tasks.

Polars AI allows you to:

  1. Chat with your Polars DataFrames using plain text queries.
  2. Perform data analysis tasks, such as filtering and aggregating, through AI-generated Rust code.
  3. Visualize data using charts and plots (coming soon).

Installation 🚀

Install from source

To use Polars AI, you'll need to follow these installation steps:

  1. Install Rust (if not already installed) by following the official Rust installation instructions.

  2. Fork the repository on GitHub:

    • Click the "Fork" button on the top right of the GitHub repository page.
  3. Clone the Polars AI repository to your local machine:

    $ git clone https://github.com/yourusername/polars-ai.git
    
  4. Build the project using Rust's package manager, Cargo:

    $ cd polars-ai
    $ cargo build --release
    
  5. Set the OpenAI API key:

    $ export OPENAI_API_KEY=sk-
    
  6. Run the CLI:

    $ ./target/release/polars-ai help
    

Install using Cargo

You can also install Polars AI with Cargo, the Rust package manager:

  1. Install the crate with Cargo:

    $ cargo install polars-ai
    
  2. Set the OpenAI API key:

    $ export OPENAI_API_KEY=sk-
    
  3. Run the CLI:

    $ polars-ai help
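
Both installation paths rely on the `OPENAI_API_KEY` environment variable. As a minimal sketch of how a Rust program might read and sanity-check that variable before calling the API, consider the following. The helper `validate_api_key` is hypothetical and is not part of the Polars AI crate, and the `sk-` prefix check is an assumption based on the placeholder shown in the steps above.

```rust
use std::env;

/// Hypothetical helper (not part of Polars AI): checks that a candidate
/// API key is present, non-empty, and starts with the `sk-` prefix.
/// The prefix check is an assumption taken from the placeholder above.
fn validate_api_key(raw: Option<String>) -> Result<String, String> {
    match raw {
        None => Err("OPENAI_API_KEY is not set".to_string()),
        Some(key) => {
            // Surrounding whitespace is a common copy-and-paste accident.
            let key = key.trim().to_string();
            if key.is_empty() {
                Err("OPENAI_API_KEY is empty".to_string())
            } else if !key.starts_with("sk-") {
                Err("OPENAI_API_KEY does not start with `sk-`".to_string())
            } else {
                Ok(key)
            }
        }
    }
}

fn main() {
    // `env::var` returns Err when the variable is unset or not valid Unicode;
    // `.ok()` turns that into None so the helper can report it.
    match validate_api_key(env::var("OPENAI_API_KEY").ok()) {
        Ok(_) => println!("API key found"),
        Err(msg) => eprintln!("{msg}"),
    }
}
```

Taking an `Option<String>` instead of reading the environment inside the helper keeps the check easy to test without touching the process environment.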
    

Getting Started 🏁

Before you begin, make sure the data you want to analyze is available as a Polars DataFrame. The CLI example in the Usage section loads it from a CSV file with `input -f`.

Usage 🧑‍💻

Chatting with Your DataFrames

With Polars AI, you can chat with your DataFrames using plain text queries. Enter your question as plain text, for example with the `ask -q` option shown in the session below:

$ export OPENAI_API_KEY=sk-

$ polars-ai input -f examples/datasets/flights.csv show
📊 DataFrame:
shape: (18, 7)
┌────────────┬───────────┬─────────┬─────────────────┬───────────────┬──────────┬──────────┐
│ DayofMonth ┆ DayOfWeek ┆ Carrier ┆ OriginAirportID ┆ DestAirportID ┆ DepDelay ┆ ArrDelay │
│ ---        ┆ ---       ┆ ---     ┆ ---             ┆ ---           ┆ ---      ┆ ---      │
│ i64        ┆ i64       ┆ str     ┆ i64             ┆ i64           ┆ i64      ┆ i64      │
╞════════════╪═══════════╪═════════╪═════════════════╪═══════════════╪══════════╪══════════╡
│ 19         ┆ 5         ┆ DL      ┆ 11433           ┆ 13303         ┆ -3       ┆ 1        │
│ 19         ┆ 5         ┆ DL      ┆ 14869           ┆ 12478         ┆ 0        ┆ -8       │
│ 19         ┆ 5         ┆ DL      ┆ 14057           ┆ 14869         ┆ -4       ┆ -15      │
│ 19         ┆ 5         ┆ DL      ┆ 15016           ┆ 11433         ┆ 28       ┆ 24       │
│ …          ┆ …         ┆ …       ┆ …               ┆ …             ┆ …        ┆ …        │
│ 19         ┆ 5         ┆ DL      ┆ 10397           ┆ 12451         ┆ 71       ┆ null     │
│ 19         ┆ 5         ┆ DL      ┆ 12451           ┆ 10397         ┆ 75       ┆ null     │
│ 19         ┆ 5         ┆ DL      ┆ 12953           ┆ 10397         ┆ -1       ┆ null     │
│ 19         ┆ 5         ┆ DL      ┆ 11433           ┆ 12953         ┆ -3       ┆ null     │
└────────────┴───────────┴─────────┴─────────────────┴───────────────┴──────────┴──────────┘

$ polars-ai input -f examples/datasets/flights.csv ask -q 'What is the average of the first column?'

🤖 AI Response:

use polars::prelude::*;

fn analyze_data(dfs: Vec<DataFrame>) -> Result<DataFrame> {
    let df = &dfs[0];

    let avg_first_column = df
        .select(&[col("DayofMonth")])
        .expect("Column 'DayofMonth' must exist")
        .mean()
        .unwrap()
        .select(&[col("mean")])
        .unwrap();

    let top_carriers = df
        .groupby(&[col("Carrier")])
        .expect("Column 'Carrier' must exist")
        .mean()
        .unwrap()
        .sort(&[col("mean")], false)
        .expect("Column 'mean' must exist")
        .head(Some(5))
        .select(&[col("Carrier")])
        .unwrap();

    let result_df = df
        .join(&top_carriers, &[col("Carrier")], &[col("Carrier")], JoinType::Inner)
        .expect("Column 'Carrier' must exist")
        .sort(&[col("DayofMonth")], false)
        .expect("Column 'DayofMonth' must exist")
        .head(Some(5));

    let final_result = result_df
        .select(&[col("Carrier"), col("DayofMonth")])
        .unwrap();

    Ok(final_result)
}

let result = analyze_data(dfs);
println!("{}", result);

You can now run the generated Rust code to answer the query above.
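
For comparison, here is a dependency-free sketch of what the query "average of the first column" asks for. It is not what Polars AI generates and does not use the Polars API. The function `mean_of_first_column` is a hypothetical name, and the naive comma split assumes a CSV with a header row and no quoted fields.

```rust
/// Hypothetical helper: returns the mean of the first column of CSV text
/// that has a header row, skipping cells that are empty or not numeric.
/// Assumes no quoted fields, so a plain split on ',' is sufficient.
fn mean_of_first_column(csv: &str) -> Option<f64> {
    let values: Vec<f64> = csv
        .lines()
        .skip(1) // skip the header row
        .filter_map(|line| line.split(',').next()) // first cell of each row
        .filter_map(|cell| cell.trim().parse::<f64>().ok())
        .collect();

    if values.is_empty() {
        None // avoid dividing by zero when nothing could be parsed
    } else {
        Some(values.iter().sum::<f64>() / values.len() as f64)
    }
}

fn main() {
    // A few rows in the shape of the flights sample shown earlier.
    let csv = "DayofMonth,DayOfWeek,Carrier\n19,5,DL\n19,5,DL\n19,5,DL\n";
    match mean_of_first_column(csv) {
        Some(mean) => println!("mean of first column: {mean}"),
        None => println!("no numeric values in the first column"),
    }
}
```

A small reference implementation like this can help when checking whether AI-generated code answers the question that was asked.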

Data Analysis Workflow

The generated Rust code follows a structured data analysis workflow:

  1. Prepare: Preprocess and clean the data if required.
  2. Process: Manipulate the data for analysis (e.g., grouping, filtering, aggregating).
  3. Analyze: Conduct the analysis.
  4. Output: Return results in various formats.

You can modify the generated code to customize your analysis.
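
The four workflow steps can be sketched in plain Rust as one small function per step. This is an illustration under stated assumptions, not code produced by Polars AI: the `Flight` struct and the functions `prepare`, `process`, `analyze`, and `output` are hypothetical names, and the standard library stands in for Polars so the sketch stays self-contained.

```rust
use std::collections::BTreeMap;

/// Hypothetical record type; `arr_delay` is optional because the
/// flights sample contains null arrival delays.
struct Flight {
    carrier: String,
    arr_delay: Option<i64>,
}

/// Prepare: drop rows whose arrival delay is missing.
fn prepare(rows: &[Flight]) -> Vec<(String, i64)> {
    rows.iter()
        .filter_map(|f| f.arr_delay.map(|d| (f.carrier.clone(), d)))
        .collect()
}

/// Process: group the remaining arrival delays by carrier.
fn process(rows: Vec<(String, i64)>) -> BTreeMap<String, Vec<i64>> {
    let mut groups: BTreeMap<String, Vec<i64>> = BTreeMap::new();
    for (carrier, delay) in rows {
        groups.entry(carrier).or_default().push(delay);
    }
    groups
}

/// Analyze: compute the mean delay per carrier.
/// Every group holds at least one value, so the division is safe.
fn analyze(groups: &BTreeMap<String, Vec<i64>>) -> BTreeMap<String, f64> {
    groups
        .iter()
        .map(|(c, d)| (c.clone(), d.iter().sum::<i64>() as f64 / d.len() as f64))
        .collect()
}

/// Output: render one line per carrier, sorted by carrier name.
fn output(means: &BTreeMap<String, f64>) -> Vec<String> {
    means.iter().map(|(c, m)| format!("{c}: {m:.2}")).collect()
}

fn main() {
    // Arrival delays in the shape of the flights sample, including a null.
    let rows = vec![
        Flight { carrier: "DL".to_string(), arr_delay: Some(1) },
        Flight { carrier: "DL".to_string(), arr_delay: Some(-8) },
        Flight { carrier: "DL".to_string(), arr_delay: None },
    ];
    for line in output(&analyze(&process(prepare(&rows)))) {
        println!("{line}");
    }
}
```

Keeping each step in its own function makes it easier to change one stage, for example the grouping key, without touching the others.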

Examples 💡

Refer to the examples folder to see how to use Polars AI to analyze your data. Polars AI will generate Rust code to perform exploratory data analysis (EDA) on the data.

Contributing 🤝

We welcome contributions to Polars AI! If you'd like to contribute to this project, please follow these steps:

  1. Fork the repository on GitHub:

    • Click the "Fork" button on the top right of the GitHub repository page.
  2. Create a new branch for your feature or bug fix:

    • Use the following Git command to create a new branch:

      $ git checkout -b feature-or-bugfix-branch
      
  3. Make your changes and commit them:

    • Edit the files in your local repository and use the following Git commands to commit your changes:

      $ git add .
      $ git commit -m "Your commit message here"
      
  4. Create a pull request with a clear description of your changes:

    • Push your branch to your forked repository on GitHub and then create a pull request from there.

      $ git push origin feature-or-bugfix-branch
      
    • Visit your forked repository on GitHub, and you'll see an option to create a pull request for the branch you just pushed.

License 📜

This project is licensed under the MIT License - see the LICENSE file for details.

Dependencies

~20–33MB
~556K SLoC