1 unstable release: 0.1.0 (Jul 22, 2022)
Join Doe
Join Doe is a tool for replicating database contents between environments while deidentifying sensitive data.
It dumps the source data to an S3 bucket, deidentifies it, and uploads it to the destination.
Current status
Currently the project only works with Redshift.
How to use
Join Doe executes its jobs from a YAML config file.
Example:
source:
  connection_uri: $DATABASE_URL
  tables:
    - name: providers
      transform:
        - column: identifier
          transformer: reverse
        - column: first_name
          transformer: first-name
        - column: last_name
          transformer: last-name
    - name: orders
      transform:
        - column: identifier
          transformer: reverse
store:
  bucket: nw-data-transfer
  aws_access_key_id: $AWS_ACCESS_KEY_ID
  aws_secret_access_key: $AWS_SECRET_ACCESS_KEY
destination:
  connection_uri: $TARGET_DATABASE_URL
This config processes two tables from the source database: providers and orders. It modifies the listed columns with the given transformers, stores the result in an S3 bucket, and then uploads it to the destination database.
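To make the shape of the table entries concrete, the following Python sketch mirrors them as plain dictionaries and lists which transformer applies to each column. It is only an illustration: Join Doe reads the YAML file itself, and the `TABLES` literal and the `plan_transforms` helper are hypothetical names that are not part of the tool.

```python
# Hypothetical in-memory mirror of the table entries from the example config.
TABLES = [
    {
        "name": "providers",
        "transform": [
            {"column": "identifier", "transformer": "reverse"},
            {"column": "first_name", "transformer": "first-name"},
            {"column": "last_name", "transformer": "last-name"},
        ],
    },
    {
        "name": "orders",
        "transform": [
            {"column": "identifier", "transformer": "reverse"},
        ],
    },
]


def plan_transforms(tables):
    """Return {table name: {column: transformer name}} for each table entry."""
    return {
        table["name"]: {
            step["column"]: step["transformer"]
            for step in table.get("transform", [])
        }
        for table in tables
    }


plan = plan_transforms(TABLES)
for table_name, columns in plan.items():
    for column, transformer in columns.items():
        print(f"{table_name}.{column} -> {transformer}")
```

Columns that are not listed under a table's transform entries do not appear in the plan at all, which matches the idea that only the named columns are modified.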
The supported transformers are:
reverse: reverses the contents of the field
first-name: replaces the contents of the field with a random first name
last-name: replaces the contents of the field with a random last name
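The effect of the three transformers on a single row can be sketched in Python as follows. This is a minimal illustration of the behaviour described above, not Join Doe's implementation: the name pools, the function names, and the `deidentify` helper are all assumptions made for the example.

```python
import random

# Hypothetical name pools; the names Join Doe actually draws from are not documented here.
FIRST_NAMES = ["Alice", "Bruno", "Chen", "Dalia"]
LAST_NAMES = ["Garcia", "Ivanov", "Okafor", "Tanaka"]


def reverse(value):
    """Reverse the characters of the field."""
    return value[::-1]


def first_name(_value):
    """Discard the original value and return a random first name."""
    return random.choice(FIRST_NAMES)


def last_name(_value):
    """Discard the original value and return a random last name."""
    return random.choice(LAST_NAMES)


# Maps the transformer names used in the config to the sketch functions above.
TRANSFORMERS = {
    "reverse": reverse,
    "first-name": first_name,
    "last-name": last_name,
}


def deidentify(row, transforms):
    """Apply {column: transformer name} to a row dict and return a new dict."""
    out = dict(row)
    for column, name in transforms.items():
        out[column] = TRANSFORMERS[name](out[column])
    return out


row = {"identifier": "AB-123", "first_name": "Jane", "last_name": "Roe", "city": "Lyon"}
clean = deidentify(
    row,
    {"identifier": "reverse", "first_name": "first-name", "last_name": "last-name"},
)
print(clean["identifier"])  # 321-BA
```

Note that plain reversal is trivially undone by reversing again, so in this sketch only the name replacements actually discard the original value.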