3 releases
Uses old Rust 2015
0.0.2 | Sep 26, 2017 |
---|---|
0.0.2-alpha | Sep 13, 2017 |
0.0.1 | Sep 13, 2017 |
#24 in #spdx
175 downloads per month
710KB
549 lines
FOSSlim
FOSSlim stands for Free Open Source Software LIcense Matcher and it matches the text of the OSS license with SPDX id, but user can easily change & update training data with additional EULAs and license text;
It is designed to be modular and to provide many low-level high-speed utilities which libraries written in high-level languages like Ruby & Javascript could benefit; Which means you could take advantage of various models implemented here, but they alone are not enough to provide a response with high-confidence. This task is left for the RubyGem & NPM packages, which are cleaning up a raw-text and combining results from multiple models to increase the confidence of the match result;
It is still under active development, but it will be released as
Rust library ( milestone.1, milestone.3 )RoR gem with example API ( milestone.2 )- LicenseMatcher gem- sample RoR application using the GEM - Fosslim.com
... TBD = release time unknown: priority depends on interests from community 4. NodeJS library with example AWS lambda function, TBD 5. Rust Microservice, TBD 6. commandline tool to scan files, TBD
Models
- NaiveTF - uses simple WordBag model and ranks results by Jaccard similarity
- FingerNgram - splits text into overlapping Ngrams and hashes selected NGrams for fingerprint;
... in near future
- TF/IDF models with Cosine similarity
- Okapi25 model
- Winnowing model
- Simple probabilistic ML models ~ Naive Bayes, HMM, ...?
Usage
use fosslim::index;
use fosslim::document::Document;
use fosslim::naive_tf; // Simple wordbag model with Jaccard similarity
...
let idx_file_path = "data/index.msgpack"; // it is pre-built index from SPDX data, includes ~300 licenses
let mit_txt = r#"
Permission is hereby granted, free of charge, to any person obtaining a copy of this software \
and associated documentation files (the "Software"), to deal in the Software without restriction,\
including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense,\
and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so,\
subject to the following conditions:\
"#;
let doc1 = Document::new(0, "mit".to_string(), mit_txt.to_string());
// matching document with SPDX label
if let Ok(idx) = index::load(idx_file_path) {
let mdl = naive_tf::from_index(&idx);
mdl::match_document(&doc1);
}
...
check tests
folder for more usage examples;
And yes, you can build your own index with index::build_from_path()
function; you just have to use same file structure
the JSON files in the data/licenses
folder;
Current alternatives
here are some of alternatives you could use already now:
- SPDX lookup - https://github.com/bbqsrc/spdx-lookup-python
- LibrariesIO license normalizer - https://github.com/librariesio/spdx
- Google's license classifier - https://github.com/google/licenseclassifier
- Fossology - https://github.com/fossology/fossology
- LicenseFinder - https://github.com/pivotal/LicenseFinder
Dependencies
~1.1–2.1MB
~43K SLoC