1 unstable release
0.1.0 | Jul 4, 2023 |
---|
#852 in Machine learning
19KB
259 lines
TreeRustler
TreeRustler is a simple implementation of a Decision Tree Classifier using the Rust programming language. This project serves as a beginner's exploration of Rust while building a decision tree classifier. The main modules in this project are the data
module and the tree
module.
Data Module
The data
module provides the Data
struct, which represents the expected data type for the tree implementation. The Data
struct holds the training features required for building the decision tree classifier. This module handles only the necessary data operations for the Decision Tree (i.e. no loading, preprocessing, or splitting yet!).
Tree Module
The tree
module contains the DecisionTreeClassifier
struct. This struct represents the decision tree classifier and provides the following parameters:
max_depth
: Specifies the maximum depth of the decision tree. It controls how deep the tree can grow during training. Setting a smaller value can help prevent overfitting, but too small may result in underfitting.min_samples_split
: Specifies the minimum number of samples required to split an internal node during training. It controls when to stop splitting nodes further. Setting a higher value can prevent overfitting, but too high may result in underfitting.
The DecisionTreeClassifier
struct has the following methods:
fit(x: &Data, y: &Vec<u8>)
: Fits the decision tree classifier to the provided training data. This method trains the classifier using the feature from theData
struct and labels from a different vector.predict_proba(x: &Data) -> Vec<f64>
: Predicts the class probabilities for the provided data using the trained decision tree. It returns a vector of probabilities for each class.
Usage
To use the TreeRustler project, follow these steps:
- Clone the repository:
git clone https://github.com/EduardoPach/treerustler.git
- Navigate to the project directory:
cd treerustler
- Make sure you have Rust installed. If not, install Rust from https://www.rust-lang.org/.
- Load your data and convert your features data to the
Data
struct and your labels to aVec<u8>
. - Create an instance of
DecisionTreeClassifier
from thetree
module, specifying the desiredmax_depth
andmin_samples_split
values. - Call the
fit
method on the classifier instance, passing your training data. - Use the
predict_proba
method to predict class probabilities for new data points.
use treerustler::data::Data;
use treerustler::tree::DecisionTreeClassifier;
fn main() {
// Load your features data and labels
let x: data::Data = data::Data::from_string(&"1 3; 2 3; 3 1; 3 1; 2 3");
let y: Vec<u8> = vec![0, 0, 1, 1, 2];
// Create an instance of DecisionTreeClassifier
let mut classifier = DecisionTreeClassifier::new(max_depth, min_samples_split);
// Fit the classifier to the training data
classifier.fit(&x, &y);
// Predict class probabilities for new data points
let probabilities = classifier.predict_proba(&x);
// Process the predicted probabilities as needed
}
Dependencies
~310KB