3 stable releases

1.0.2	Jun 15, 2022
1.0.0	Jun 7, 2022

#4 in #correlation

MIT license

470KB
10K SLoC

Series2Graph++

Series2Graph++ (S2G++) is a time series anomaly detection algorithm based on the Series2Graph (S2G) and the DADS algorithms. S2G++ can handle multivariate time series whereas S2G and DADS can cope with only univariate time series. Moreover, S2G++ takes ideas from DADS to run distributedly in a computer cluster. S2G++ is written in Rust and leverages the actix and actix-telepathy libraries.

Quick Start

Requirements

Rust 1.58
openblas
(Docker)

To have openblas available to the Rust build process, do the following on Debian (Linux):

sudo apt install build-essential gfortran libopenblas-base libopenblas-dev gcc

Installation

From source

git pull https://gitlab.hpi.de/akita/s2gpp
cd s2gpp
cargo build

Docker

The base image akita/rust-base must be available to your machine.

git pull https://gitlab.hpi.de/akita/s2gpp
cd s2gpp
docker build s2gpp .

Usage (bin)

Parameters

Pattern:

s2gpp --local-host <IP:Port> --pattern-length <Int> --latent <Int> --query-length <Int> --rate <Int> --threads <Int> --cluster-nodes <Int> --score-output-path <Path> [main --data-path <Path> | sub --mainhost <IP:Port>]

S2G++ expects one of two sub-commands with its specific parameters:

main (The head computer in a cluster)
- data-path (The path to the input time series)
sub (The other computers in a cluster; only necessary in a distributed setting)
- mainhost (The ip-address to the main computer in a cluster)

Before these sub-commands are used, general parameters must be defined:

local-host (The ip-address with port to bind the listener on.)
pattern-length (Size of the sliding window, independent of anomaly length, but should in the best case be larger.)
latent (Size of latent embedding space. This space is the input for the PCA calculation afterwards.)
query-length (Size of the sliding windows used to find anomalies (query subsequences). query-length must be >= pattern-length!)
rate (Number of angles used to extract pattern nodes. A higher value will lead to high precision, but at the cost of increased computation time.)
threads (Number of helper threads started besides the main thread. (min=1))
cluster-nodes (Size of the computer cluster.)
score-output-path (Path the score are written to.)
column-start-idx (How many columns to skip)
column-end-idx (Until which column to use (exclusive). Can also take negative numbers to count from the end.)
self-correction (Whether S2G++ will correct the direction of the time embedding if too few transactions are available)

Input Format

The input format of the time series is expected to be a CSV with header. Each column represents a channel of the timeseries. Sometimes, time series files include also the labels and an index. You can skip columns with the column-start-idx / column-end-idx range pattern. It behave like Python ranges.

Usage (lib)

Cargo.toml

[dependencies]
s2gpp = "1.0.2"

your Rust app

fn some_fn(timeseries: Array2<f32>) -> Result<Array1<f32>, ()> {
  let params = s2gpp::Parameters::default();
  let anomaly_score = s2gpp::s2gpp(params, Some(timeseries))?.unwrap();
  Ok(anomaly_score)
}

Python

We have wrapped the Rust code in a Python package, that can be used without installing Rust.

Installation

PyPI

pip install s2gpp

Build with Docker

make build-docker
pip install wheels/s2gpp-*.whl

Build from Source

make install

Usage

Single Machine

from s2gpp import Series2GraphPP
import pandas as pd

ts = pd.read_csv("data/ts_0.csv").values

model = Series2GraphPP(pattern_length=100)
anomaly_scores = model.fit_predict(ts)

Distributed

from s2gpp import DistributedSeries2GraphPP
from pathlib import Path

# run on one machine
def main_node():
    dataset_path = Path("data/ts_0.csv")
  
    model = DistributedSeries2GraphPP.main(local_host="127.0.0.1:1992", n_cluster_nodes=2, pattern_length=100)
    model.fit_predict(dataset_path)

# run on other machine
def sub_node():
    model = DistributedSeries2GraphPP.sub(local_host="127.0.0.1:1993", mainhost="127.0.0.1:1992", n_cluster_nodes=2, pattern_length=100)
    model.fit_predict()

Cite

Please cite this work, when using it!

@software{Wenig_Series2Graph_2022,
  author = {Wenig, Phillip},
  month = {6},
  title = {{Series2Graph++}},
  version = {1.0.2},
  year = {2022}
}

References

[1] P. Boniol and T. Palpanas, Series2Graph: Graph-based Subsequence Anomaly Detection in Time Series, PVLDB (2020) link

[2] Schneider, J., Wenig, P. & Papenbrock, T. Distributed detection of sequential anomalies in univariate time series. The VLDB Journal 30, 579–602 (2021). link

Dependencies

~84–115MB
~1.5M SLoC