4 releases

0.1.3 Mar 13, 2024
0.1.2 Dec 8, 2023
0.1.1 Jul 24, 2023
0.1.0 Jun 1, 2023

Used in cloudproof

Custom license

Data Anonymization

Data anonymization is the process of transforming data in such a way that it can no longer be used to identify individuals without the use of additional information. This is often done to protect the privacy of individuals whose data is being collected or processed.

Anonymization techniques include removing identifying information such as names and addresses, replacing identifying information with pseudonyms, and aggregating data so that individual data points cannot be distinguished. Anonymization reduces the risk of re-identification but is not foolproof: it must be combined with other security measures to fully protect personal data.

Features

Cosmian anonymization provides multiple methods:

  • Hashing: transforms data into a fixed-length representation that is difficult to reverse and provides a high level of anonymity. Use anonymization::Hasher to apply the various hash functions.

  • Noise Addition: adds random noise to data in order to preserve privacy. Use anonymization::NoiseGenerator to apply various noise distributions to float, integer, and date values.

  • Word Masking: hides sensitive words in a text. Use anonymization::WordMasker to mask a list of words.

  • Word Tokenization: removes sensitive words from text by replacing them with tokens. Use anonymization::WordTokenizer to replace a list of words.

  • Word Pattern Masking: replaces a sensitive pattern in text with specific characters or strings. Use anonymization::WordPatternMasker to replace every match of a specified regex pattern with a replacement string.

  • Number Aggregation: rounds numbers to a desired power of ten. This method is used to reduce the granularity of data and prevent re-identification of individuals. Use anonymization::NumberAggregator to round float and int values.

  • Date Aggregation: rounds dates to the specified time unit. This preserves the general time frame of the original data while removing specific details that could identify individuals. Use anonymization::DateAggregator to round dates.

  • Number Scaling: scales numerical data by a specified factor. This can be useful for anonymizing data while preserving its relative proportions. Use anonymization::NumberScaler to scale float and int values.
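
To make two of the methods above concrete, here is a minimal sketch of number aggregation and word masking written with the Rust standard library only. The function names, the "XXXX" mask string, and the rounding rule (half away from zero) are assumptions made for illustration. They are not the crate's API, and anonymization::NumberAggregator and anonymization::WordMasker may behave differently.

```rust
// Hypothetical helpers for illustration only; NOT the crate's API.

/// Rounds `value` to the nearest multiple of 10^power
/// (power = 2 rounds to the nearest hundred).
/// Assumption: ties round away from zero, as `f64::round` does.
fn aggregate_number(value: f64, power: i32) -> f64 {
    let step = 10f64.powi(power);
    (value / step).round() * step
}

/// Replaces each space-separated word found in `blocklist` with "XXXX".
/// Matching is ASCII case-insensitive and ignores punctuation
/// attached to either end of the word.
fn mask_words(text: &str, blocklist: &[&str]) -> String {
    text.split(' ')
        .map(|word| {
            // Strip leading/trailing punctuation before comparing.
            let core = word.trim_matches(|c: char| !c.is_alphanumeric());
            let sensitive = !core.is_empty()
                && blocklist.iter().any(|b| b.eq_ignore_ascii_case(core));
            if sensitive {
                // Keep the surrounding punctuation, hide the word itself.
                word.replace(core, "XXXX")
            } else {
                word.to_string()
            }
        })
        .collect::<Vec<_>>()
        .join(" ")
}
```

For example, aggregate_number(1234.0, 2) reduces the value to the nearest hundred, and mask_words("contact Bob at Acme.", &["bob", "acme"]) hides both names while keeping the trailing period.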

Date Format

WARNING: The anonymization functions expect date input as RFC 3339 strings, which differ slightly from ISO 8601 format: an RFC 3339 timestamp must always carry a time zone offset (Z or ±hh:mm).

ISO format                   RFC 3339
2023-04-07T12:34:56          2023-04-07T12:34:56Z
2023-04-27T16:23:00+00:00    2023-04-27T16:23:00+00:00
2023-04-27T16:23:00+05:00    2023-04-27T16:23:00+05:00
2023-04-27T16:23:00-05:00    2023-04-27T16:23:00-05:00
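
The only difference visible in the examples above is the missing offset in the first ISO example. A small pre-processing step can add it before calling the anonymization functions. The sketch below is a hypothetical helper, not part of the crate. It assumes that a timestamp without an offset is meant as UTC, which matches the first example but may not hold for your data.

```rust
// Hypothetical helper for illustration only; NOT the crate's API.

/// Appends "Z" to a date-time string that has no time zone offset.
/// Assumption: a missing offset means UTC.
/// Returns None when the input has no 'T' separator (e.g. a date-only string),
/// since such input cannot be turned into RFC 3339 by adding an offset alone.
fn ensure_rfc3339_offset(iso: &str) -> Option<String> {
    // Only inspect the time part: the date part always contains '-'.
    let (_, time) = iso.split_once('T')?;
    let has_offset = time.ends_with('Z') || time.contains('+') || time.contains('-');
    if has_offset {
        Some(iso.to_string())
    } else {
        Some(format!("{}Z", iso))
    }
}
```

This check is purely textual: it does not validate the date or time fields, and it does not handle the lowercase "t" and "z" forms that RFC 3339 also allows.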
