1 unstable release

0.1.0 Feb 28, 2024

Used in dashtool

Apache-2.0

Singer Target for Iceberg Tables

This is a Singer target that loads data from Singer streams into Iceberg tables. This repository provides multiple Singer targets, one for each Iceberg catalog.

Features

  • Creates Iceberg tables automatically if they don't exist
  • Incrementally loads versions into Iceberg tables
  • Converts Singer stream schemas into Iceberg table schemas
  • Validates records against the Singer stream schema
  • Generates metadata about syncs for data governance
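
To make the schema conversion feature more concrete, here is a minimal sketch of how a Singer stream schema (JSON Schema) could be turned into Iceberg-style field descriptions. The type mapping and the helper names iceberg_type and convert_schema are assumptions made for illustration; the target's actual conversion rules may differ.

```python
# Illustrative mapping from JSON Schema types (as used in Singer SCHEMA
# messages) to Iceberg primitive type names. This mapping is an assumption
# made for the example, not necessarily the one the target uses.
JSON_TO_ICEBERG = {
    "boolean": "boolean",
    "integer": "long",
    "number": "double",
    "string": "string",
}


def iceberg_type(prop):
    """Return (iceberg_type_name, required) for one JSON Schema property."""
    types = prop.get("type", "string")
    if isinstance(types, str):
        types = [types]
    # Singer taps commonly mark nullable columns as ["null", "<type>"].
    required = "null" not in types
    non_null = [t for t in types if t != "null"]
    base = non_null[0] if non_null else "string"
    if base == "string" and prop.get("format") == "date-time":
        return "timestamptz", required
    # Unknown types fall back to string in this sketch.
    return JSON_TO_ICEBERG.get(base, "string"), required


def convert_schema(singer_schema):
    """Build a list of Iceberg-style field descriptions from a Singer schema."""
    fields = []
    properties = singer_schema["properties"]
    for field_id, (name, prop) in enumerate(properties.items(), start=1):
        type_name, required = iceberg_type(prop)
        fields.append(
            {"id": field_id, "name": name, "type": type_name, "required": required}
        )
    return fields
```

Each resulting entry carries an id, name, type and required flag, which are the attributes an Iceberg schema field needs.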

Usage

The target ingests Singer messages into the Iceberg tables and stores the state in the singer-bookmark property.

Sync mode

To run:

target-iceberg-sql --config config.json
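
For trying the command without a real tap, the following sketch writes a minimal sequence of Singer messages (SCHEMA, RECORD, STATE) as JSON lines. The field values, the bookmark layout inside the STATE message and the function names are made up for illustration; only the three message types come from the Singer specification.

```python
import json
import sys


def singer_messages():
    """Yield a minimal SCHEMA, RECORD, STATE sequence for one stream."""
    stream = "inventory-orders"  # matches a key under "streams" in the config
    yield {
        "type": "SCHEMA",
        "stream": stream,
        "schema": {
            "properties": {
                "id": {"type": "integer"},
                "quantity": {"type": "integer"},
            }
        },
        "key_properties": ["id"],
    }
    yield {"type": "RECORD", "stream": stream, "record": {"id": 1, "quantity": 3}}
    # The layout of "value" is tap-specific; this one is invented for the example.
    yield {"type": "STATE", "value": {"bookmarks": {stream: {"id": 1}}}}


def write_messages(out=sys.stdout):
    """Write one JSON document per line, as Singer taps do."""
    for message in singer_messages():
        out.write(json.dumps(message) + "\n")


if __name__ == "__main__":
    write_messages()
```

Singer targets conventionally read messages from standard input, so if this is saved as a hypothetical emit_orders.py, its output could presumably be piped into the target with python emit_orders.py | target-iceberg-sql --config config.json.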

Configuration

Example:

{
    "streams": {
      "inventory-orders": { 
        "identifier": "bronze.inventory.orders",
        "replicationMethod": "LOG_BASED"
      },
      "inventory-customers": { 
        "identifier": "bronze.inventory.customers",
        "replicationMethod": "LOG_BASED"
      },
      "inventory-products": { 
        "identifier": "bronze.inventory.products",
        "replicationMethod": "LOG_BASED"
      }
    },
    "bucket": "s3://example-postgres",
    "catalogName": "bronze",
    "catalogUrl": "postgres://postgres:postgres@postgres:5432",
    "awsRegion": "us-east-1",
    "awsAccessKeyId": "AKIAIOSFODNN7EXAMPLE",
    "awsSecretAccessKey": "$AWS_SECRET_ACCESS_KEY",
    "awsEndpoint": "http://localstack:4566",
    "awsAllowHttp": "true"
}

The configuration consists of three parts:

  • General parameters
  • Catalog parameters
  • Object store parameters

General parameters

The general parameters apply to each catalog and object store.

Parameter   Description
streams     A map of streams to replicate. Each stream is a map with the fields: identifier, replicationMethod (optional)
bucket      (optional) Object store bucket where the Iceberg tables should be stored

Catalog parameters

Only one set of catalog parameters should be used in the configuration. Choose which catalog you want to use.

SQL catalog

Parameter     Description
catalogName   The name of the catalog
catalogUrl    The connection URL of the catalog

Object store parameters

Only one set of object store parameters should be used in the configuration. Choose which object store you want to use.

AWS S3

Parameter            Description
awsRegion            The region of the bucket
awsAccessKeyId       The access key ID
awsSecretAccessKey   The secret access key
awsEndpoint          (optional) The endpoint of the object store
awsAllowHttp         (optional) Allow HTTP connections to the object store
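
Before starting a sync, it can help to check a configuration file against the parameter tables above. The sketch below does this for the SQL catalog with AWS S3, treating every parameter not marked optional as required. The helper names check_config and load_config are hypothetical and not part of the target.

```python
import json

# Required keys as read from the parameter tables for the SQL catalog with
# AWS S3. Treating unmarked parameters as required is an assumption.
REQUIRED_KEYS = [
    "streams",
    "catalogName",
    "catalogUrl",
    "awsRegion",
    "awsAccessKeyId",
    "awsSecretAccessKey",
]


def check_config(config):
    """Return a list of problems found; an empty list means none were found."""
    problems = [
        f"missing required key: {key}" for key in REQUIRED_KEYS if key not in config
    ]
    # Every stream needs an identifier; replicationMethod is optional.
    for name, stream in config.get("streams", {}).items():
        if "identifier" not in stream:
            problems.append(f"stream {name}: missing identifier")
    return problems


def load_config(path):
    """Read a JSON configuration file such as config.json."""
    with open(path) as handle:
        return json.load(handle)
```

For example, check_config(load_config("config.json")) returns an empty list for the example configuration shown earlier.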

Docker containers

Contributing

Feel free to open issues for any feedback or ideas! PRs are welcome.
