Connectors Reference¶

Connectors let you import data from external sources into Tilde repositories. During import, objects are copied from the source into Tilde's storage, so reads are served locally without depending on the original source.

Overview¶

Connectors are managed at the organization level and can be attached to one or more repositories. Once a connector is attached, you can run import jobs that copy objects from the source into the repository.

Lifecycle¶

1. Create a connector in your organization with source credentials
2. Attach the connector to a repository
3. Import data from the source into the repository
4. Read imported objects; Tilde serves them from its own storage, without contacting the source

How Imports Work¶

When you import data:

1. Tilde lists objects from the source (e.g., an S3 prefix)
2. Each object is copied into Tilde's local block storage (up to 10 objects are transferred concurrently)
3. A commit is created in the target repository with all imported entries
4. After import, reads are served directly from Tilde's storage
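The steps above can be sketched with in-memory dictionaries standing in for the source bucket and for Tilde's storage. This is only an illustration of the flow, not Tilde's implementation: `run_import` and the commit record shape are hypothetical, and the only detail taken from this page is the limit of 10 concurrent transfers.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT_TRANSFERS = 10  # concurrency limit stated in the steps above


def run_import(source: dict[str, bytes], prefix: str, destination: str) -> dict:
    """Copy every source object under `prefix` and return one commit record."""
    # Step 1: list the objects under the source prefix.
    keys = sorted(key for key in source if key.startswith(prefix))

    # Step 2: copy each object, at most MAX_CONCURRENT_TRANSFERS at a time.
    def copy(key: str) -> tuple[str, bytes]:
        return destination + key[len(prefix):], source[key]

    with ThreadPoolExecutor(max_workers=MAX_CONCURRENT_TRANSFERS) as pool:
        copied = dict(pool.map(copy, keys))

    # Step 3: a single commit references all imported entries.
    return {"message": "import", "entries": copied}


source = {"datasets/a.csv": b"1,2", "datasets/b.csv": b"3,4", "logs/x.txt": b"-"}
commit = run_import(source, prefix="datasets/", destination="imported/datasets/")

# Step 4: later changes to the source do not affect what was imported.
source["datasets/a.csv"] = b"changed"
print(commit["entries"]["imported/datasets/a.csv"])
```

Because the bytes are copied in step 2, the final read still returns the original content even though the source object was overwritten afterwards.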

Reproducibility¶

Because objects are copied during import, the imported data is always a point-in-time snapshot of the source. Source metadata (connector ID, source path, ETag, and optionally version ID) is recorded on each entry for provenance tracking.

Supported Connectors¶

S3¶

Connect to any S3-compatible object store (AWS S3, MinIO, RustFS, etc.).

Configuration¶

Field	Type	Required	Description
`access_key_id`	string	Yes	AWS access key ID
`secret_access_key`	string	Yes	AWS secret access key
`region`	string	No	AWS region (default: `us-east-1`)
`endpoint`	string	No	Custom S3 endpoint URL (for S3-compatible services)

Python / curl

import tilde

org = tilde.organizations.get("my-team")
connector = org.connectors.create(
    name="production-s3",
    type="s3",
    source_uri="s3://my-bucket/datasets/",
    config={
        "access_key_id": "AKIAIOSFODNN7EXAMPLE",
        "secret_access_key": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
        "region": "us-west-2",
    },
)

curl -H "Authorization: Bearer YOUR_API_TOKEN" \
  -X POST https://tilde.run/api/v1/organizations/my-team/connectors \
  -H "Content-Type: application/json" \
  -d '{
    "name": "production-s3",
    "type": "s3",
    "source_uri": "s3://my-bucket/datasets/",
    "config": {
      "access_key_id": "AKIAIOSFODNN7EXAMPLE",
      "secret_access_key": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
      "region": "us-west-2"
    }
  }'
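Before sending a request like the ones above, it can help to check the `config` object on the client side against the configuration table. The helper below is a hypothetical sketch (`normalize_s3_config` is not part of the Tilde SDK), and whether the server applies the same checks is an assumption.

```python
REQUIRED_FIELDS = ("access_key_id", "secret_access_key")
OPTIONAL_DEFAULTS = {"region": "us-east-1"}  # default taken from the table above
ALLOWED_FIELDS = set(REQUIRED_FIELDS) | {"region", "endpoint"}


def normalize_s3_config(config: dict) -> dict:
    """Validate an S3 connector config and fill in the documented default region."""
    missing = [field for field in REQUIRED_FIELDS if not config.get(field)]
    if missing:
        raise ValueError(f"missing required field(s): {', '.join(missing)}")
    unknown = sorted(set(config) - ALLOWED_FIELDS)
    if unknown:
        raise ValueError(f"unknown field(s): {', '.join(unknown)}")
    # Values supplied by the caller override the defaults.
    return {**OPTIONAL_DEFAULTS, **config}


config = normalize_s3_config(
    {"access_key_id": "AKIAIOSFODNN7EXAMPLE", "secret_access_key": "example-secret"}
)
print(config["region"])
```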

S3-Compatible Services¶

For S3-compatible services like MinIO or RustFS, provide a custom endpoint:

Python / curl

connector = org.connectors.create(
    name="my-minio",
    type="s3",
    source_uri="s3://my-bucket/",
    config={
        "access_key_id": "AKIAIOSFODNN7EXAMPLE",
        "secret_access_key": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
        "region": "us-east-1",
        "endpoint": "https://minio.example.com:9000",
    },
)

curl -H "Authorization: Bearer YOUR_API_TOKEN" \
  -X POST https://tilde.run/api/v1/organizations/my-team/connectors \
  -H "Content-Type: application/json" \
  -d '{
    "name": "my-minio",
    "type": "s3",
    "source_uri": "s3://my-bucket/",
    "config": {
      "access_key_id": "AKIAIOSFODNN7EXAMPLE",
      "secret_access_key": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
      "region": "us-east-1",
      "endpoint": "https://minio.example.com:9000"
    }
  }'

URI Format¶

S3 source paths use the s3://bucket/prefix/ format:

s3://my-bucket/datasets/2025/
s3://my-bucket/raw-data/
s3://data-lake/production/

Versioning Support¶

When `use_versioning: true` is set on an import job, the S3 connector lists objects with `ListObjectVersions` instead of `ListObjectsV2` and copies the latest version of each object. The source version ID is recorded in the entry's source metadata for provenance.

S3 Versioning Requirement

use_versioning only records version IDs when versioning is enabled on the source S3 bucket. If versioning is not enabled, the import still proceeds, but the imported entries have no version ID.
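One way to check the requirement before starting an import is to inspect the bucket's versioning state. The sketch below interprets a response shaped like the one returned by the S3 `GetBucketVersioning` operation, where the `Status` key is absent if versioning was never configured. The helper name `versioning_enabled` is hypothetical.

```python
def versioning_enabled(response: dict) -> bool:
    """Return True only when the bucket reports versioning as enabled.

    `Status` is "Enabled" or "Suspended"; it is missing entirely when
    versioning has never been configured on the bucket.
    """
    return response.get("Status") == "Enabled"


print(versioning_enabled({"Status": "Enabled"}))
print(versioning_enabled({"Status": "Suspended"}))
print(versioning_enabled({}))
```

With boto3 installed, the response could come from `boto3.client("s3").get_bucket_versioning(Bucket="my-bucket")`, run with credentials that are allowed to read the bucket's versioning configuration.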

GCS¶

Connect to Google Cloud Storage.

Configuration¶

Field	Type	Required	Description
`credentials_json`	string	Yes	Service account JSON key (stringified)
`project_id`	string	No	GCP project ID

Python / curl

from pathlib import Path

connector = org.connectors.create(
    name="production-gcs",
    type="gcs",
    source_uri="gs://my-bucket/datasets/",
    config={
        # credentials_json is the contents of the key file, passed as a string
        "credentials_json": Path("service-account-key.json").read_text(),
        "project_id": "my-project",
    },
)

curl -H "Authorization: Bearer YOUR_API_TOKEN" \
  -X POST https://tilde.run/api/v1/organizations/my-team/connectors \
  -H "Content-Type: application/json" \
  -d '{
    "name": "production-gcs",
    "type": "gcs",
    "source_uri": "gs://my-bucket/datasets/",
    "config": {
      "credentials_json": "{\"type\":\"service_account\",\"project_id\":\"my-project\",...}",
      "project_id": "my-project"
    }
  }'
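Because `credentials_json` is a string rather than a nested object, the service account key has to be serialized on its own before the request body is serialized. A minimal sketch of building that body, using a shortened, hypothetical stand-in for a real key file:

```python
import json

# Hypothetical, shortened stand-in for the contents of a service account key file.
service_account_key = {"type": "service_account", "project_id": "my-project"}

body = json.dumps({
    "name": "production-gcs",
    "type": "gcs",
    "source_uri": "gs://my-bucket/datasets/",
    "config": {
        # Serialize the key first so it travels as a JSON string, not an object.
        "credentials_json": json.dumps(service_account_key),
        "project_id": service_account_key["project_id"],
    },
})

decoded = json.loads(body)
print(type(decoded["config"]["credentials_json"]).__name__)
```

The inner quotes are escaped automatically by the second `json.dumps` call, which produces the same kind of escaped payload shown in the curl example.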

URI Format¶

GCS source paths use the gs://bucket/prefix/ format:

gs://my-bucket/datasets/2025/
gs://data-warehouse/exports/
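The S3 and GCS source paths share the same scheme://bucket/prefix/ shape, so a single helper can split either one. The function below is an illustrative sketch with a hypothetical name, not part of the Tilde SDK.

```python
def split_source_uri(uri: str) -> tuple[str, str, str]:
    """Split 's3://bucket/prefix/' or 'gs://bucket/prefix/' into its parts."""
    scheme, separator, rest = uri.partition("://")
    if not separator or scheme not in ("s3", "gs"):
        raise ValueError(f"unsupported source URI: {uri!r}")
    # Everything up to the first slash is the bucket; the remainder is the prefix.
    bucket, _, prefix = rest.partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name in {uri!r}")
    return scheme, bucket, prefix


print(split_source_uri("s3://my-bucket/datasets/2025/"))
print(split_source_uri("gs://data-warehouse/exports/"))
```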

Managing Connectors¶

Attaching to Repositories¶

A connector must be attached to a repository before it can be used for imports.

Python / curl

import tilde

repo = tilde.repository("my-team/my-data")

# Attach (connector_id is the ID of an existing connector in the organization)
repo.connectors.attach(connector_id)

# List
for c in repo.connectors.list():
    print(c.name, c.type)

# Detach
repo.connectors.detach(connector_id)

# Attach a connector to a repository
curl -H "Authorization: Bearer YOUR_API_TOKEN" \
  -X POST https://tilde.run/api/v1/organizations/my-team/repositories/my-data/connectors \
  -H "Content-Type: application/json" \
  -d '{"connector_id": "connector-uuid"}'

# List connectors attached to a repository
curl -H "Authorization: Bearer YOUR_API_TOKEN" \
  https://tilde.run/api/v1/organizations/my-team/repositories/my-data/connectors

# Detach a connector
curl -H "Authorization: Bearer YOUR_API_TOKEN" \
  -X DELETE https://tilde.run/api/v1/organizations/my-team/repositories/my-data/connectors/connector-uuid

Deleting a Connector¶

Deleting a connector is a soft delete. Since imported objects were copied into Tilde's storage, they remain fully readable. The connector is only needed during import, not for subsequent reads.

Running Imports¶

Start an Import Job¶

Python / curl

import tilde

repo = tilde.repository("my-team/my-data")
job = repo.imports.create_from_connector(
    connector_id=connector_id,
    destination_path="imported/datasets/",
    source_prefix="datasets/",
    commit_message="Import Q1 datasets",
)

curl -H "Authorization: Bearer YOUR_API_TOKEN" \
  -X POST "https://tilde.run/api/v1/organizations/my-team/repositories/my-data/import" \
  -H "Content-Type: application/json" \
  -d '{
    "connector_id": "connector-uuid",
    "destination_path": "imported/datasets/",
    "source_prefix": "datasets/",
    "commit_message": "Import Q1 datasets"
  }'

Poll for Completion¶

Import jobs run asynchronously. Poll the job's status endpoint until its status is completed or failed.

Python / curl

import time

while True:
    job.refresh()
    print(f"Status: {job.status}, Objects: {job.objects_imported}")

    if job.status in ("completed", "failed"):
        break
    time.sleep(2)

if job.status == "completed":
    print(f"Import done! Commit: {job.commit_id}")
elif job.status == "failed":
    print(f"Import failed: {job.error}")

curl -H "Authorization: Bearer YOUR_API_TOKEN" \
  https://tilde.run/api/v1/organizations/my-team/repositories/my-data/import/job-uuid
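For long imports, a polling loop usually benefits from an overall timeout and a growing delay between requests. The helper below is a generic sketch: `wait_for_job` and its parameters are hypothetical, and only the terminal statuses `completed` and `failed` come from this page. The status source is injected as a callable so the example runs without the SDK.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_job(
    fetch_status: Callable[[], str],
    timeout: float = 600.0,
    interval: float = 2.0,
    max_interval: float = 30.0,
) -> str:
    """Poll until the job reaches a terminal status or the timeout would be exceeded."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() + interval > deadline:
            raise TimeoutError(f"job still {status!r} after {timeout} seconds")
        time.sleep(interval)
        # Back off exponentially, capped at max_interval.
        interval = min(interval * 2, max_interval)


# Simulated job that finishes on the third poll.
statuses = iter(["pending", "running", "completed"])
final_status = wait_for_job(lambda: next(statuses), interval=0.01)
print(final_status)
```

With the SDK objects from the example above, `fetch_status` could be a small function that calls `job.refresh()` and returns `job.status`.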

Cross-Repository Imports¶

You can also import data directly from another Tilde repository without needing a connector. Provide the source organization and repository instead of a connector ID.

Python / curl

import tilde

repo = tilde.repository("my-team/my-data")
job = repo.imports.create_from_repository(
    repo_path="other-team/source-data",
    destination_path="external/",
    source_prefix="datasets/train/",
    commit_message="Import training data from source-data",
)

curl -H "Authorization: Bearer YOUR_API_TOKEN" \
  -X POST "https://tilde.run/api/v1/organizations/my-team/repositories/my-data/import" \
  -H "Content-Type: application/json" \
  -d '{
    "source_organization": "other-team",
    "source_repository": "source-data",
    "destination_path": "external/",
    "source_prefix": "datasets/train/",
    "commit_message": "Import training data from source-data"
  }'

Access Requirements

You must have read access to the source repository to import from it. The request body must contain exactly one source: either connector_id, or source_organization together with source_repository.
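The rule about the request body can be checked on the client before the request is sent. The function below is a hypothetical sketch of that check, with an assumed name and assumed error messages. It does not describe how the server reports a violation.

```python
def import_source_kind(body: dict) -> str:
    """Return 'connector' or 'repository', or raise if the source is ambiguous."""
    has_connector = "connector_id" in body
    has_org = "source_organization" in body
    has_repo = "source_repository" in body

    # The two cross-repository fields only make sense as a pair.
    if has_org != has_repo:
        raise ValueError("source_organization and source_repository must be given together")
    # Reject both sources at once, and reject no source at all.
    if has_connector == has_org:
        raise ValueError(
            "provide exactly one of connector_id or "
            "source_organization + source_repository"
        )
    return "connector" if has_connector else "repository"


print(import_source_kind({"connector_id": "connector-uuid", "destination_path": "imported/"}))
print(import_source_kind({"source_organization": "other-team", "source_repository": "source-data"}))
```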

Import Job Fields¶

Field	Type	Description
`id`	UUID	Job identifier
`repository_id`	UUID	Target repository
`connector_id`	UUID	Source connector (for connector imports)
`source_organization`	string	Source organization name (for cross-repo imports)
`source_repository`	string	Source repository name (for cross-repo imports)
`source_prefix`	string	Source prefix filter
`destination_path`	string	Destination prefix in the repository
`commit_message`	string	Commit message for the import
`status`	string	`pending`, `running`, `completed`, or `failed`
`objects_imported`	integer	Number of objects imported so far
`commit_id`	string	Commit ID (populated on completion)
`error`	string	Error message (populated on failure)
`created_by`	UUID	User who started the import
`created_at`	timestamp	Job creation time
`updated_at`	timestamp	Last status update time

Reading Imported Objects¶

Imported objects are read through the same GET /object endpoint as any other object. Since data is copied during import, reads are served directly from Tilde's storage — no connector access is needed at read time.

Source Metadata¶

Imported entries include source metadata in their entry record for provenance tracking:

{
  "source_metadata": {
    "connector_id": "...",
    "connector_type": "s3",
    "source_path": "s3://my-bucket/datasets/file.csv",
    "version_id": "abc123",
    "source_etag": "\"def456\"",
    "import_time": "2025-01-15T10:30:00Z",
    "import_job_id": "..."
  }
}
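Since an import is a point-in-time snapshot, the recorded `source_etag` can be compared with the source object's current ETag to see whether the source has been rewritten since the import. A minimal sketch, assuming the entry record has the shape shown above; `source_changed` is a hypothetical helper, and obtaining the current ETag from the source is left to the caller.

```python
def source_changed(entry: dict, current_etag: str) -> bool:
    """Compare the ETag recorded at import time with the source's current ETag."""
    recorded = entry["source_metadata"]["source_etag"]
    # ETags are often wrapped in double quotes; compare them without the quotes.
    return recorded.strip('"') != current_etag.strip('"')


entry = {
    "source_metadata": {
        "connector_type": "s3",
        "source_path": "s3://my-bucket/datasets/file.csv",
        "source_etag": "\"def456\"",
    }
}

print(source_changed(entry, "def456"))
print(source_changed(entry, "\"0a1b2c\""))
```

A differing ETag only indicates that the source object was written again. It does not by itself show how the content differs.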

Security¶

Connector configurations (credentials) are AES-encrypted at rest in the database. The encryption key is configured in the server's auth.encryption.keys config. Connector configs are never returned in API responses — only the connector's id, name, type, and disabled status are exposed.
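The response behavior described above amounts to an allow-list: only named fields are copied into the response, so credentials cannot leak through a newly added config key. The sketch below illustrates that idea only. The stored record shape and the `public_view` helper are assumptions, not Tilde's implementation.

```python
PUBLIC_FIELDS = ("id", "name", "type", "disabled")  # fields listed above as exposed


def public_view(connector: dict) -> dict:
    """Return only the fields that are safe to expose in an API response."""
    return {field: connector[field] for field in PUBLIC_FIELDS if field in connector}


# Hypothetical stored record, including the config that must never be returned.
stored = {
    "id": "connector-uuid",
    "name": "production-s3",
    "type": "s3",
    "disabled": False,
    "config": {"access_key_id": "AKIAIOSFODNN7EXAMPLE", "secret_access_key": "example-secret"},
}

print(public_view(stored))
```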