Connectors Reference

Connectors let you import data from external sources into Tilde repositories. During import, objects are copied from the source into Tilde's storage, so reads are served locally without depending on the original source.

Overview

Connectors are managed at the organization level and can be attached to one or more repositories. Once attached, you can run import jobs that copy objects from the source into the repository.

Lifecycle

  1. Create a connector in your organization with source credentials
  2. Attach the connector to a repository
  3. Import data from the source into the repository
  4. Read imported objects — reads are served from Tilde's storage without contacting the source

How Imports Work

When you import data:

  • Tilde lists objects from the source (e.g., an S3 prefix)
  • Each object is copied into Tilde's local block storage (up to 10 objects are transferred concurrently)
  • A commit is created in the target repository with all imported entries
  • After import, reads are served directly from Tilde's storage
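The copy step above can be sketched as a bounded-concurrency loop. This is an illustrative sketch only, assuming asyncio-style workers and placeholder helper names; it is not Tilde's actual import pipeline:

```python
import asyncio

MAX_CONCURRENT = 10  # matches the documented limit of 10 concurrent transfers

async def copy_object(key: str, sem: asyncio.Semaphore, imported: list) -> None:
    # Placeholder for the real source -> block storage copy
    async with sem:
        await asyncio.sleep(0)
        imported.append(key)

async def import_objects(keys: list) -> list:
    # At most MAX_CONCURRENT copies are in flight at any time
    sem = asyncio.Semaphore(MAX_CONCURRENT)
    imported = []
    await asyncio.gather(*(copy_object(k, sem, imported) for k in keys))
    return imported  # the import job then commits these entries

result = asyncio.run(import_objects([f"datasets/file-{i}.csv" for i in range(25)]))
```

A real transfer would stream object bytes and record per-object source metadata; the semaphore is what bounds concurrency at 10.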

Reproducibility

Because objects are copied during import, the imported data is always a point-in-time snapshot of the source. Source metadata (connector ID, source path, ETag, and optionally version ID) is recorded on each entry for provenance tracking.

Supported Connectors

S3

Connect to any S3-compatible object store (AWS S3, MinIO, RustFS, etc.).

Configuration

  • access_key_id (string, required): AWS access key ID
  • secret_access_key (string, required): AWS secret access key
  • region (string, optional): AWS region (default: us-east-1)
  • endpoint (string, optional): custom S3 endpoint URL, for S3-compatible services

Python:

import tilde

org = tilde.organizations.get("my-team")
connector = org.connectors.create(
    name="production-s3",
    type="s3",
    source_uri="s3://my-bucket/datasets/",
    config={
        "access_key_id": "AKIAIOSFODNN7EXAMPLE",
        "secret_access_key": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
        "region": "us-west-2",
    },
)

cURL:

curl -H "Authorization: Bearer YOUR_API_TOKEN" \
  -X POST https://tilde.run/api/v1/organizations/my-team/connectors \
  -H "Content-Type: application/json" \
  -d '{
    "name": "production-s3",
    "type": "s3",
    "source_uri": "s3://my-bucket/datasets/",
    "config": {
      "access_key_id": "AKIAIOSFODNN7EXAMPLE",
      "secret_access_key": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
      "region": "us-west-2"
    }
  }'

S3-Compatible Services

For S3-compatible services like MinIO or RustFS, provide a custom endpoint:

connector = org.connectors.create(
    name="my-minio",
    type="s3",
    source_uri="s3://my-bucket/",
    config={
        "access_key_id": "AKIAIOSFODNN7EXAMPLE",
        "secret_access_key": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
        "region": "us-east-1",
        "endpoint": "https://minio.example.com:9000",
    },
)

cURL:

curl -H "Authorization: Bearer YOUR_API_TOKEN" \
  -X POST https://tilde.run/api/v1/organizations/my-team/connectors \
  -H "Content-Type: application/json" \
  -d '{
    "name": "my-minio",
    "type": "s3",
    "source_uri": "s3://my-bucket/",
    "config": {
      "access_key_id": "AKIAIOSFODNN7EXAMPLE",
      "secret_access_key": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
      "region": "us-east-1",
      "endpoint": "https://minio.example.com:9000"
    }
  }'

URI Format

S3 source paths use the s3://bucket/prefix/ format:

s3://my-bucket/datasets/2025/
s3://my-bucket/raw-data/
s3://data-lake/production/
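A source URI of this shape splits into a bucket and a key prefix. The parser below is a hypothetical helper, not part of the Tilde SDK; it also accepts the gs:// form used by the GCS connector:

```python
def split_source_uri(uri: str) -> tuple:
    # Split e.g. "s3://bucket/prefix/" into ("bucket", "prefix/")
    scheme, _, rest = uri.partition("://")
    if scheme not in ("s3", "gs") or not rest:
        raise ValueError(f"unsupported source URI: {uri!r}")
    bucket, _, prefix = rest.partition("/")
    return bucket, prefix

bucket, prefix = split_source_uri("s3://my-bucket/datasets/2025/")
```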

Versioning Support

When use_versioning: true is set on an import job, the S3 connector uses ListObjectVersions instead of ListObjectsV2 and copies the latest version of each object. The source version ID is recorded in the entry's source metadata for provenance.

S3 Versioning Requirement

The source S3 bucket must have versioning enabled for use_versioning to work. If versioning is not enabled on the bucket, the import will proceed but objects will not have version IDs.
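The version selection can be illustrated with plain data: given ListObjectVersions-style records (field names follow the S3 API), keep the entry flagged IsLatest for each key. A sketch, not Tilde's implementation:

```python
def latest_versions(versions):
    # Keep only the record flagged IsLatest for each key
    latest = {}
    for v in versions:
        if v.get("IsLatest"):
            latest[v["Key"]] = v
    return latest

response_versions = [
    {"Key": "a.csv", "VersionId": "v1", "IsLatest": False},
    {"Key": "a.csv", "VersionId": "v2", "IsLatest": True},
    # On a bucket that never had versioning enabled, S3 reports the literal version ID "null"
    {"Key": "b.csv", "VersionId": "null", "IsLatest": True},
]
picked = latest_versions(response_versions)
```

The "null" version ID is how an unversioned bucket shows up in the listing, which is why such imports proceed without usable version IDs.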


GCS

Connect to Google Cloud Storage.

Configuration

  • credentials_json (string, required): service account JSON key, passed as a string
  • project_id (string, optional): GCP project ID

Python:

connector = org.connectors.create(
    name="production-gcs",
    type="gcs",
    source_uri="gs://my-bucket/datasets/",
    config={
        "credentials_json": open("service-account-key.json").read(),
        "project_id": "my-project",
    },
)

cURL:

curl -H "Authorization: Bearer YOUR_API_TOKEN" \
  -X POST https://tilde.run/api/v1/organizations/my-team/connectors \
  -H "Content-Type: application/json" \
  -d '{
    "name": "production-gcs",
    "type": "gcs",
    "source_uri": "gs://my-bucket/datasets/",
    "config": {
      "credentials_json": "{\"type\":\"service_account\",\"project_id\":\"my-project\",...}",
      "project_id": "my-project"
    }
  }'

URI Format

GCS source paths use the gs://bucket/prefix/ format:

gs://my-bucket/datasets/2025/
gs://data-warehouse/exports/

Managing Connectors

Attaching to Repositories

A connector must be attached to a repository before it can be used for imports.

import tilde

repo = tilde.repository("my-team/my-data")

# Attach
repo.connectors.attach(connector_id)

# List
for c in repo.connectors.list():
    print(c.name, c.type)

# Detach
repo.connectors.detach(connector_id)

cURL:

# Attach a connector to a repository
curl -H "Authorization: Bearer YOUR_API_TOKEN" \
  -X POST https://tilde.run/api/v1/organizations/my-team/repositories/my-data/connectors \
  -H "Content-Type: application/json" \
  -d '{"connector_id": "connector-uuid"}'

# List connectors attached to a repository
curl -H "Authorization: Bearer YOUR_API_TOKEN" \
  https://tilde.run/api/v1/organizations/my-team/repositories/my-data/connectors

# Detach a connector
curl -H "Authorization: Bearer YOUR_API_TOKEN" \
  -X DELETE https://tilde.run/api/v1/organizations/my-team/repositories/my-data/connectors/connector-uuid

Deleting a Connector

Deleting a connector is a soft delete. Since imported objects were copied into Tilde's storage, they remain fully readable. The connector is only needed during import, not for subsequent reads.


Running Imports

Start an Import Job

import tilde

repo = tilde.repository("my-team/my-data")
job = repo.imports.create_from_connector(
    connector_id=connector_id,
    destination_path="imported/datasets/",
    source_prefix="datasets/",
    commit_message="Import Q1 datasets",
)

cURL:

curl -H "Authorization: Bearer YOUR_API_TOKEN" \
  -X POST "https://tilde.run/api/v1/organizations/my-team/repositories/my-data/import" \
  -H "Content-Type: application/json" \
  -d '{
    "connector_id": "connector-uuid",
    "destination_path": "imported/datasets/",
    "source_prefix": "datasets/",
    "commit_message": "Import Q1 datasets"
  }'

Poll for Completion

Import jobs run asynchronously. Poll the status endpoint until the job completes.

import time

while True:
    job.refresh()
    print(f"Status: {job.status}, Objects: {job.objects_imported}")

    if job.status in ("completed", "failed"):
        break
    time.sleep(2)

if job.status == "completed":
    print(f"Import done! Commit: {job.commit_id}")
elif job.status == "failed":
    print(f"Import failed: {job.error}")

cURL:

curl -H "Authorization: Bearer YOUR_API_TOKEN" \
  https://tilde.run/api/v1/organizations/my-team/repositories/my-data/import/job-uuid

Cross-Repository Imports

You can also import data directly from another Tilde repository without needing a connector. Provide the source organization and repository instead of a connector ID.

import tilde

repo = tilde.repository("my-team/my-data")
job = repo.imports.create_from_repository(
    repo_path="other-team/source-data",
    destination_path="external/",
    source_prefix="datasets/train/",
    commit_message="Import training data from source-data",
)

cURL:

curl -H "Authorization: Bearer YOUR_API_TOKEN" \
  -X POST "https://tilde.run/api/v1/organizations/my-team/repositories/my-data/import" \
  -H "Content-Type: application/json" \
  -d '{
    "source_organization": "other-team",
    "source_repository": "source-data",
    "destination_path": "external/",
    "source_prefix": "datasets/train/",
    "commit_message": "Import training data from source-data"
  }'

Access Requirements

You must have read access to the source repository to import from it. The request body must contain exactly one of connector_id or (source_organization + source_repository).
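The exactly-one rule can be checked client-side before sending the request. A sketch only — validate_import_source is an illustrative helper, not part of any SDK:

```python
def validate_import_source(body: dict) -> str:
    # Exactly one of connector_id or (source_organization + source_repository)
    has_connector = "connector_id" in body
    has_repo = "source_organization" in body and "source_repository" in body
    if has_connector == has_repo:  # both present, or neither
        raise ValueError(
            "provide exactly one of connector_id or "
            "source_organization + source_repository"
        )
    return "connector" if has_connector else "repository"

mode = validate_import_source({"connector_id": "connector-uuid"})
```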

Import Job Fields

  • id (UUID): job identifier
  • repository_id (UUID): target repository
  • connector_id (UUID): source connector (for connector imports)
  • source_organization (string): source organization name (for cross-repo imports)
  • source_repository (string): source repository name (for cross-repo imports)
  • source_prefix (string): source prefix filter
  • destination_path (string): destination prefix in the repository
  • commit_message (string): commit message for the import
  • status (string): pending, running, completed, or failed
  • objects_imported (integer): number of objects imported so far
  • commit_id (string): commit ID (populated on completion)
  • error (string): error message (populated on failure)
  • created_by (UUID): user who started the import
  • created_at (timestamp): job creation time
  • updated_at (timestamp): last status update time
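Working from these fields, a client can treat completed and failed as terminal states and render progress from objects_imported. The helpers below are illustrative only:

```python
TERMINAL_STATUSES = {"completed", "failed"}

def is_terminal(status: str) -> bool:
    # A job's status stops changing once it completes or fails
    return status in TERMINAL_STATUSES

def summarize(job: dict) -> str:
    # One-line summary built from the documented job fields
    if job["status"] == "completed":
        return f"imported {job['objects_imported']} objects in commit {job['commit_id']}"
    if job["status"] == "failed":
        return f"failed: {job['error']}"
    return f"{job['status']}: {job['objects_imported']} objects so far"

line = summarize({"status": "running", "objects_imported": 12})
```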

Reading Imported Objects

Imported objects are read through the same GET /object endpoint as any other object. Since data is copied during import, reads are served directly from Tilde's storage — no connector access is needed at read time.

Source Metadata

Imported entries include source metadata in their entry record for provenance tracking:

{
  "source_metadata": {
    "connector_id": "...",
    "connector_type": "s3",
    "source_path": "s3://my-bucket/datasets/file.csv",
    "version_id": "abc123",
    "source_etag": "\"def456\"",
    "import_time": "2025-01-15T10:30:00Z",
    "import_job_id": "..."
  }
}
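One thing to note when consuming this record: S3 ETags include surrounding quotes, so strip them before comparing. A sketch using the fields from the example above:

```python
# Entry record shaped like the example above
entry = {
    "source_metadata": {
        "connector_type": "s3",
        "source_path": "s3://my-bucket/datasets/file.csv",
        "source_etag": "\"def456\"",
        "version_id": "abc123",
    }
}

meta = entry["source_metadata"]
etag = meta["source_etag"].strip('"')  # S3 ETags arrive quoted: '"def456"' -> 'def456'
```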

Security

Connector configurations (credentials) are AES-encrypted at rest in the database. The encryption key is configured in the server's auth.encryption.keys config. Connector configs are never returned in API responses — only the connector's id, name, type, and disabled status are exposed.
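The redaction behaviour can be pictured as a projection onto the safe fields. A sketch of what an API response keeps (field names from this section; the actual server-side code is not shown here):

```python
SAFE_FIELDS = ("id", "name", "type", "disabled")

def public_view(connector: dict) -> dict:
    # Only these fields are exposed; config (the credentials) never leaves the server
    return {k: connector[k] for k in SAFE_FIELDS if k in connector}

record = {
    "id": "connector-uuid",
    "name": "production-s3",
    "type": "s3",
    "disabled": False,
    "config": {"access_key_id": "AKIA...", "secret_access_key": "..."},
}
view = public_view(record)
```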