#rules #data-science #machine-learning

oner_induction

An implementation of the 1R rule induction algorithm

2 releases

0.2.1 Apr 2, 2020
0.2.0 Mar 20, 2020

#960 in Machine learning

MPL-2.0+

14KB
139 lines

Rust

A 1R implementation in Rust

Re-implementing the 1R algorithm described in Holte (1993).

1R learns a rule (IF...THEN...ELSE) based on a single attribute (feature) of the dataset. This gives a baseline performance against which other algorithms can be compared.

This crate is complemented by https://crates.io/crates/oner_quantize, which implements the 1R quantization algorithm for numeric attributes.

Documentation and examples

License

Copyright 2020 Richard Dallaway

This Source Code Form is subject to the terms of the Mozilla Public License, v. 2.0. If a copy of the MPL was not distributed with this file, You can obtain one at https://mozilla.org/MPL/2.0/.


lib.rs:

The 1R (Holte, 1993) rule learning algorithm.

1R is a baseline rule learning algorithm.

The algorithm generates a rule for each attribute in a dataset, and then picks the "one rule" that has the best accuracy.

Each rule (hypothesis) is a set of cases, one per value of the attribute: the prediction (the THEN part) is the most frequent class among the examples with that attribute value.
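The two steps described above (build one rule per attribute, then keep the most accurate) can be sketched with the standard library alone. This is a minimal illustration, not the crate's implementation: `rule_for_column` and `best_rule` are hypothetical names, columns are plain slices rather than ndarray views, and the tie-breaking choices are assumptions.

```rust
use std::collections::HashMap;

/// Hypothetical helper (not part of oner_induction): for one attribute column,
/// map each attribute value to its most frequent class, and return the fraction
/// of training examples that the resulting rule classifies correctly.
/// An empty column yields an accuracy of NaN.
fn rule_for_column<'a>(
    column: &[&'a str],
    classes: &[&'a str],
) -> (HashMap<&'a str, &'a str>, f64) {
    // Count how often each class occurs with each attribute value.
    let mut counts: HashMap<&str, HashMap<&str, usize>> = HashMap::new();
    for (value, class) in column.iter().zip(classes) {
        *counts.entry(*value).or_default().entry(*class).or_default() += 1;
    }

    let mut cases = HashMap::new();
    let mut correct = 0;
    for (value, class_counts) in &counts {
        // Most frequent class for this value. Assumption: ties go to the
        // alphabetically first class so the result is deterministic.
        let (class, n) = class_counts
            .iter()
            .max_by(|a, b| a.1.cmp(b.1).then_with(|| b.0.cmp(a.0)))
            .expect("each value has at least one class");
        cases.insert(*value, *class);
        correct += *n;
    }
    (cases, correct as f64 / column.len() as f64)
}

/// Build a rule for every column and keep the most accurate one, returning
/// its column index, its cases, and its accuracy on the training data.
/// Assumption: when accuracies tie, the later column wins.
fn best_rule<'a>(
    columns: &[Vec<&'a str>],
    classes: &[&'a str],
) -> Option<(usize, HashMap<&'a str, &'a str>, f64)> {
    columns
        .iter()
        .enumerate()
        .map(|(i, col)| {
            let (cases, acc) = rule_for_column(col, classes);
            (i, cases, acc)
        })
        .max_by(|a, b| a.2.partial_cmp(&b.2).unwrap_or(std::cmp::Ordering::Equal))
}
```

With a weather column of sunny, sunny, cloudy, sunny, a season column of summer, summer, winter, winter, and classes hot, hot, cold, cold, the weather rule scores 0.75 and the season rule scores 1.0, so `best_rule` returns column index 1.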

This is a baseline learner for use in comparison against more sophisticated algorithms. A related idea is "0R" (zero rule), which ignores the attributes and always predicts the most frequent class in the dataset.
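For comparison, the 0R baseline can be sketched in a few lines. The function name `zero_rule` is hypothetical and is not something this crate is known to export; breaking ties alphabetically is also an assumption made only to keep the result deterministic.

```rust
use std::collections::HashMap;

/// Hypothetical 0R baseline: return the most frequent class together with
/// the fraction of examples that belong to it. Returns `None` when there
/// are no examples.
fn zero_rule<'a>(classes: &[&'a str]) -> Option<(&'a str, f64)> {
    // Count occurrences of each class.
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for class in classes {
        *counts.entry(*class).or_default() += 1;
    }
    counts
        .into_iter()
        // Highest count wins; ties go to the alphabetically first class.
        .max_by(|a, b| a.1.cmp(&b.1).then_with(|| b.0.cmp(a.0)))
        .map(|(class, n)| (class, n as f64 / classes.len() as f64))
}
```

Any 1R rule worth keeping should beat the share returned here, since 0R achieves that accuracy on the training data without looking at a single attribute.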

Examples

This crate uses ndarray to represent attributes and classes.

use ndarray::prelude::*;
use oner_induction::{Rule, Case, Accuracy, discover};

let examples = array![
    ["sunny", "summer"],
    ["sunny", "summer"],
    ["cloudy", "winter"],
    ["sunny", "winter"]
];

let classes = array![
    "hot",
    "hot",
    "cold",
    "cold"
];

// Discover the best rule, and the column it applies to:
let rule: Option<(usize, Rule<&str, &str>)> =
    discover(&examples.view(), &classes.view());

// Expected accuracy is 100%
let accuracy = Accuracy(1.0);

// The "rule" is a set of cases (conditions, or "IF...THENs"):
let cases = vec![
    Case { attribute_value: "summer", predicted_class: "hot" },
    Case { attribute_value: "winter", predicted_class: "cold" }
];

// Column 1 is the Season (winter or summer)
assert_eq!(rule, Some( (1, Rule { cases, accuracy }) ));
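
Once a rule has been discovered, applying it to a new example is a lookup over its cases. The sketch below is self-contained and does not use the crate: the `Case` struct is a local stand-in that only mirrors the two fields shown in the example above, and `predict` is a hypothetical helper, not a documented function of oner_induction.

```rust
/// Local stand-in mirroring the fields of `Case` used in the example;
/// this is not the crate's own type.
#[derive(Debug, Clone, PartialEq)]
struct Case<A, C> {
    attribute_value: A,
    predicted_class: C,
}

/// Hypothetical helper: find the case whose attribute value matches `value`
/// and return its predicted class. Returns `None` for attribute values the
/// rule has never seen, leaving the fallback (the ELSE part) to the caller.
fn predict<'r, A: PartialEq, C>(cases: &'r [Case<A, C>], value: &A) -> Option<&'r C> {
    cases
        .iter()
        .find(|case| &case.attribute_value == value)
        .map(|case| &case.predicted_class)
}
```

A caller might fall back to the most frequent class overall when `predict` returns `None`; that choice is an assumption about usage, not something the crate documentation above specifies.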

References

Terminology

I'm following the terminology from Holte (1993):

  • Attribute (a.k.a. feature)
  • Value (the value of an attribute or class)
  • Class (classification, prediction)
  • Example (instance)

In generic parameters, A is for attribute and C is for class.

Limitations

This crate assumes numeric data has already been converted to categorical data.

See https://docs.rs/oner_quantize for an implementation of the 1R quantization algorithm.
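
To illustrate what "converted to categorical data" means, here is a deliberately simple sketch that bins a number using fixed, hand-chosen cut points. This is not the 1R quantization algorithm and not the oner_quantize API; `to_category`, the cut points, and the labels are all hypothetical.

```rust
/// Hypothetical helper: label a numeric value by the interval it falls into.
/// `cut_points` must be sorted ascending, and `labels` needs exactly one more
/// entry than `cut_points`. A value equal to a cut point goes to the upper
/// interval; NaN ends up with the first label.
fn to_category<'a>(value: f64, cut_points: &[f64], labels: &[&'a str]) -> &'a str {
    assert_eq!(
        labels.len(),
        cut_points.len() + 1,
        "need one more label than cut points"
    );
    // Count how many cut points the value has reached or passed.
    let idx = cut_points.iter().take_while(|&&cut| value >= cut).count();
    labels[idx]
}
```

Mapping every value of a numeric column through a function like this yields the string attribute values that the rule learner expects. Learning the cut points from the data, which is what the quantization crate addresses, is outside the scope of this sketch.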

Dependencies

~2MB
~36K SLoC