
Connectors Reference

Connectors let you import data from external sources into Tilde repositories. During import, objects are copied from the source into Tilde's storage, so reads are served locally without depending on the original source.

Overview

Connectors are managed at the organization level and can be attached to one or more repositories. Once a connector is attached, you can run import jobs that copy objects from the source into the repository.

Lifecycle

  1. Create a connector in your organization with source credentials
  2. Attach the connector to a repository
  3. Import data from the source into the repository
  4. Read imported objects; Tilde serves reads from its own storage, without going through the connector

How Imports Work

When you import data:

  • Tilde lists objects from the source (e.g., an S3 prefix)
  • Each object is copied into Tilde's local block storage (up to 10 objects are transferred concurrently)
  • A commit is created in the target repository with all imported entries
  • After import, reads are served directly from Tilde's storage
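
The steps above can be sketched as a small in-memory model. This is only an illustration of the list, copy, and collect flow with a concurrency cap of 10; the dictionaries standing in for the source and for Tilde's storage, and the names `copy_object` and `import_objects`, are hypothetical and not part of Tilde.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT_TRANSFERS = 10  # concurrency limit stated in the list above


def copy_object(key: str, source: dict, storage: dict) -> str:
    """Hypothetical stand-in for one transfer: copy a single object's bytes."""
    storage[key] = source[key]
    return key


def import_objects(source: dict, prefix: str, storage: dict) -> list:
    """List keys under the prefix, copy them concurrently, return the copied keys."""
    keys = sorted(k for k in source if k.startswith(prefix))
    with ThreadPoolExecutor(max_workers=MAX_CONCURRENT_TRANSFERS) as pool:
        # map() keeps the input order and re-raises the first worker error
        imported = list(pool.map(lambda k: copy_object(k, source, storage), keys))
    # A real import would now record all of these entries in a single commit.
    return imported
```

Calling `import_objects(source, "datasets/", storage)` copies only the keys under `datasets/` and leaves everything else in the source untouched.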

Reproducibility

Because objects are copied during import, the imported data is always a point-in-time snapshot of the source. Source metadata (connector ID, source path, ETag, and optionally version ID) is recorded on each entry for provenance tracking.

Supported Connectors

S3

Connect to any S3-compatible object store (AWS S3, MinIO, RustFS, etc.).

Configuration

Field              Type    Required  Description
access_key_id      string  Yes       AWS access key ID
secret_access_key  string  Yes       AWS secret access key
region             string  No        AWS region (default: us-east-1)
endpoint           string  No        Custom S3 endpoint URL (for S3-compatible services)

import tilde

org = tilde.organizations.get("my-team")
connector = org.connectors.create(
    name="production-s3",
    type="s3",
    source_uri="s3://my-bucket/datasets/",
    config={
        "access_key_id": "AKIAIOSFODNN7EXAMPLE",
        "secret_access_key": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
        "region": "us-west-2",
    },
)

curl -H "Authorization: Bearer YOUR_API_TOKEN" \
  -X POST https://tilde.run/api/v1/organizations/my-team/connectors \
  -H "Content-Type: application/json" \
  -d '{
    "name": "production-s3",
    "type": "s3",
    "source_uri": "s3://my-bucket/datasets/",
    "config": {
      "access_key_id": "AKIAIOSFODNN7EXAMPLE",
      "secret_access_key": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
      "region": "us-west-2"
    }
  }'

S3-Compatible Services

For S3-compatible services like MinIO or RustFS, provide a custom endpoint:

connector = org.connectors.create(
    name="my-minio",
    type="s3",
    source_uri="s3://my-bucket/",
    config={
        "access_key_id": "AKIAIOSFODNN7EXAMPLE",
        "secret_access_key": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
        "region": "us-east-1",
        "endpoint": "https://minio.example.com:9000",
    },
)

curl -H "Authorization: Bearer YOUR_API_TOKEN" \
  -X POST https://tilde.run/api/v1/organizations/my-team/connectors \
  -H "Content-Type: application/json" \
  -d '{
    "name": "my-minio",
    "type": "s3",
    "source_uri": "s3://my-bucket/",
    "config": {
      "access_key_id": "AKIAIOSFODNN7EXAMPLE",
      "secret_access_key": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
      "region": "us-east-1",
      "endpoint": "https://minio.example.com:9000"
    }
  }'

URI Format

S3 source paths use the s3://bucket/prefix/ format:

s3://my-bucket/datasets/2025/
s3://my-bucket/raw-data/
s3://data-lake/production/
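
If you need to split such a URI into its bucket and prefix on the client side, a minimal sketch using only the standard library could look like this. The function name `parse_s3_uri` is hypothetical and not part of the Tilde SDK.

```python
from urllib.parse import urlparse


def parse_s3_uri(uri: str) -> tuple:
    """Split s3://bucket/prefix/ into (bucket, prefix)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    # The path starts with "/", which is not part of the object key prefix.
    return parsed.netloc, parsed.path.lstrip("/")
```

A bucket-only URI such as `s3://my-bucket/` yields an empty prefix, which corresponds to the whole bucket.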

Versioning Support

When use_versioning: true is set on an import job, the S3 connector uses ListObjectVersions instead of ListObjectsV2 and copies the latest version of each object. The source version ID is recorded in the entry's source metadata for provenance.

S3 Versioning Requirement

use_versioning only records version IDs when versioning is enabled on the source S3 bucket. If versioning is not enabled, the import still proceeds, but the imported entries carry no version IDs.
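
To illustrate what "the latest version of each object" means, the sketch below picks the latest version IDs out of a response shaped like S3's ListObjectVersions output (a `Versions` list whose items carry `Key`, `VersionId`, and `IsLatest`). It is not Tilde's implementation, and the function name `latest_versions` is hypothetical. It assumes that S3 reports the literal string "null" as the version ID for objects stored without versioning, and maps that to `None`.

```python
def latest_versions(response: dict) -> dict:
    """Map each key to the version ID flagged IsLatest, or None if unversioned."""
    latest = {}
    for version in response.get("Versions", []):
        if not version.get("IsLatest"):
            continue  # older version of the same key
        version_id = version["VersionId"]
        # Assumption: unversioned objects are reported with the string "null".
        latest[version["Key"]] = None if version_id == "null" else version_id
    return latest
```

Keys whose latest entry is a delete marker do not appear in `Versions` with `IsLatest` set, so this sketch leaves them out of the result.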


GCS

Connect to Google Cloud Storage.

Configuration

Field             Type    Required  Description
credentials_json  string  Yes       Service account JSON key (stringified)
project_id        string  No        GCP project ID

connector = org.connectors.create(
    name="production-gcs",
    type="gcs",
    source_uri="gs://my-bucket/datasets/",
    config={
        "credentials_json": open("service-account-key.json").read(),
        "project_id": "my-project",
    },
)

curl -H "Authorization: Bearer YOUR_API_TOKEN" \
  -X POST https://tilde.run/api/v1/organizations/my-team/connectors \
  -H "Content-Type: application/json" \
  -d '{
    "name": "production-gcs",
    "type": "gcs",
    "source_uri": "gs://my-bucket/datasets/",
    "config": {
      "credentials_json": "{\"type\":\"service_account\",\"project_id\":\"my-project\",...}",
      "project_id": "my-project"
    }
  }'
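
Because credentials_json is a string that itself contains JSON, the key ends up encoded twice in the request body, which is easy to get wrong by hand. The sketch below builds the body programmatically; the helper name `build_gcs_connector_body` is hypothetical, and the field names are the ones from the table and curl example above.

```python
import json
from typing import Optional


def build_gcs_connector_body(
    name: str, source_uri: str, key: dict, project_id: Optional[str] = None
) -> str:
    """Return the JSON request body for creating a GCS connector."""
    # First encoding: the service account key becomes a JSON string.
    config = {"credentials_json": json.dumps(key)}
    if project_id:
        config["project_id"] = project_id
    # Second encoding: the whole body, which escapes the inner quotes.
    return json.dumps(
        {"name": name, "type": "gcs", "source_uri": source_uri, "config": config}
    )
```

The resulting string can be sent as the POST body in place of the hand-escaped payload in the curl example.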

URI Format

GCS source paths use the gs://bucket/prefix/ format:

gs://my-bucket/datasets/2025/
gs://data-warehouse/exports/

Google Drive

Connect to a Google account's My Drive or to a Shared Drive, optionally scoped to a sub-folder. Authentication uses Google OAuth, so creating the connector is a browser flow rather than a raw API call.

Setup Flow

Google Drive connectors must be created through the Tilde web console:

  1. Open your organization's Connectors page in the console.
  2. Click Add Connector and pick Google Drive.
  3. Click Connect Google Drive to start the OAuth flow. Google will ask you to sign in and grant read-only access to Drive.
  4. After consent, you are returned to a setup page that lets you pick:
    • The drive (My Drive or any Shared Drive the account can see).
    • An optional folder inside that drive. Leave empty to import the entire drive.
  5. Click Create to save the connector.

Tilde stores a long-lived OAuth refresh token on the connector and uses it to list and download files during each import. No password or access key is involved, and you can revoke access at any time from your Google Account permissions page.

OAuth scope

The connector requests the drive.readonly scope. Tilde can read file metadata and content, but cannot create, modify, or delete anything in your Drive.

Source URI

Google Drive source paths use the googledrive:// format:

googledrive://my-drive/                  # all of My Drive
googledrive://my-drive/<folder_id>/      # a sub-folder of My Drive
googledrive://<shared_drive_id>/         # an entire Shared Drive
googledrive://<shared_drive_id>/<folder_id>/  # a sub-folder of a Shared Drive

You do not need to construct this URI by hand: the setup page captures it for you. The connector also stores a human-readable label (e.g. MyDrive/Sales/Q3) so the UI never shows opaque IDs.
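
If you ever need to inspect one of these URIs, for example in a script that audits connectors, the four forms above can be taken apart as follows. This is a sketch based only on the formats listed here; `parse_drive_uri` is a hypothetical helper, not a Tilde SDK function.

```python
from urllib.parse import urlparse


def parse_drive_uri(uri: str) -> dict:
    """Split a googledrive:// URI into a shared drive ID and a folder ID."""
    parsed = urlparse(uri)
    if parsed.scheme != "googledrive" or not parsed.netloc:
        raise ValueError(f"not a Google Drive URI: {uri!r}")
    drive = parsed.netloc  # case is preserved, which matters for Drive IDs
    folder_id = parsed.path.strip("/") or None
    return {
        # "my-drive" denotes the account's My Drive rather than a Shared Drive
        "shared_drive_id": None if drive == "my-drive" else drive,
        "folder_id": folder_id,
    }
```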

Workspace File Exports

Native Google Workspace documents are exported during import: Docs, Sheets, and Slides to Office formats, and Drawings to PNG.

Drive type       Imported as  PDF fallback
Google Docs      .docx        yes
Google Sheets    .xlsx        yes
Google Slides    .pptx        yes
Google Drawings  .png         no

A Google Doc named Report is imported as Report.docx. If a binary file named Report.docx already exists in the same folder, the import fails fast with a clear collision error so you can rename one of them in Drive before retrying.

If the primary Office export exceeds Google's per-format size limit, Tilde automatically retries the export as .pdf for Docs, Sheets, and Slides. Drawings have no fallback because they are already an image format. Files that fail both exports are skipped with a clear error.
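
The naming and fallback rules can be summarized in a few lines of code. This is a sketch of the behavior described above, not Tilde's implementation: the name `imported_name` and the `primary_too_large` flag are hypothetical, and returning `None` for a Drawing whose export fails is an assumption drawn from the statement that files without a successful export are skipped.

```python
from typing import Optional

# Drive type -> (primary export extension, whether a PDF fallback exists),
# copied from the table above.
EXPORT_FORMATS = {
    "Google Docs": (".docx", True),
    "Google Sheets": (".xlsx", True),
    "Google Slides": (".pptx", True),
    "Google Drawings": (".png", False),
}


def imported_name(
    name: str, drive_type: str, primary_too_large: bool = False
) -> Optional[str]:
    """Return the imported file name, or None when the file would be skipped."""
    extension, has_pdf_fallback = EXPORT_FORMATS[drive_type]
    if not primary_too_large:
        return name + extension
    # The primary export exceeded the size limit: retry as PDF where possible.
    return name + ".pdf" if has_pdf_fallback else None
```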

What Gets Skipped

The lister silently skips items that have no importable file content:

  • Items in the trash (trashed = true).
  • Shortcuts (vnd.google-apps.shortcut).
  • Workspace types with no useful export: Forms, Sites, Maps, Jamboards, etc.

File and folder names are normalized to NFC, / and NUL bytes are replaced with _, and trailing dots or whitespace are trimmed.
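
The normalization rules can be approximated with the standard library. The sketch below applies them in the order they are listed, which is an assumption, and `sanitize_name` is a hypothetical name rather than a Tilde function.

```python
import unicodedata


def sanitize_name(name: str) -> str:
    """Apply the three normalization rules to a file or folder name."""
    # 1. Normalize to Unicode NFC (composed form).
    name = unicodedata.normalize("NFC", name)
    # 2. Replace "/" and NUL bytes with "_".
    name = name.replace("/", "_").replace("\x00", "_")
    # 3. Trim trailing dots and whitespace, in any combination.
    while name and (name[-1] == "." or name[-1].isspace()):
        name = name[:-1]
    return name
```

For example, a folder named `Q1/Q2. ` would be imported as `Q1_Q2`.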


Managing Connectors

Attaching to Repositories

A connector must be attached to a repository before it can be used for imports.

import tilde

repo = tilde.repository("my-team/my-data")

# Attach (connector_id is the ID of an existing connector in your organization)
repo.connectors.attach(connector_id)

# List
for c in repo.connectors.list():
    print(c.name, c.type)

# Detach
repo.connectors.detach(connector_id)

# Attach a connector to a repository
curl -H "Authorization: Bearer YOUR_API_TOKEN" \
  -X POST https://tilde.run/api/v1/organizations/my-team/repositories/my-data/connectors \
  -H "Content-Type: application/json" \
  -d '{"connector_id": "connector-uuid"}'

# List connectors attached to a repository
curl -H "Authorization: Bearer YOUR_API_TOKEN" \
  https://tilde.run/api/v1/organizations/my-team/repositories/my-data/connectors

# Detach a connector
curl -H "Authorization: Bearer YOUR_API_TOKEN" \
  -X DELETE https://tilde.run/api/v1/organizations/my-team/repositories/my-data/connectors/connector-uuid

Deleting a Connector

Deleting a connector is a soft delete. Since imported objects were copied into Tilde's storage, they remain fully readable. The connector is only needed during import, not for subsequent reads.


Running Imports

Start an Import Job

import tilde

repo = tilde.repository("my-team/my-data")
job = repo.imports.create_from_connector(
    connector_id=connector_id,
    destination_path="imported/datasets/",
    source_prefix="datasets/",
    commit_message="Import Q1 datasets",
)

curl -H "Authorization: Bearer YOUR_API_TOKEN" \
  -X POST "https://tilde.run/api/v1/organizations/my-team/repositories/my-data/import" \
  -H "Content-Type: application/json" \
  -d '{
    "connector_id": "connector-uuid",
    "destination_path": "imported/datasets/",
    "source_prefix": "datasets/",
    "commit_message": "Import Q1 datasets"
  }'

Poll for Completion

Import jobs run asynchronously. Poll the status endpoint until the job completes.

import time

while True:
    job.refresh()
    print(f"Status: {job.status}, Objects: {job.objects_imported}")

    if job.status in ("completed", "failed"):
        break
    time.sleep(2)

if job.status == "completed":
    print(f"Import done! Commit: {job.commit_id}")
elif job.status == "failed":
    print(f"Import failed: {job.error}")

curl -H "Authorization: Bearer YOUR_API_TOKEN" \
  https://tilde.run/api/v1/organizations/my-team/repositories/my-data/import/job-uuid
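
For long-running imports you may prefer a polling helper with a timeout and a growing delay instead of a fixed two-second sleep. The sketch below relies only on the `refresh()` method and `status` attribute used in the Python example above; the function name `wait_for_import` and its parameters are hypothetical.

```python
import time


def wait_for_import(job, timeout=600.0, initial_delay=1.0, max_delay=30.0,
                    sleep=time.sleep):
    """Poll until the job is completed or failed, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    delay = initial_delay
    while True:
        job.refresh()
        if job.status in ("completed", "failed"):
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"import still {job.status} after {timeout} seconds")
        sleep(delay)
        # Double the delay after each poll, capped at max_delay.
        delay = min(delay * 2, max_delay)
```

The `sleep` parameter exists so the helper can be exercised in tests without actually waiting.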

Cross-Repository Imports

You can also import data directly from another Tilde repository without needing a connector. Provide the source organization and repository instead of a connector ID.

import tilde

repo = tilde.repository("my-team/my-data")
job = repo.imports.create_from_repository(
    repo_path="other-team/source-data",
    destination_path="external/",
    source_prefix="datasets/train/",
    commit_message="Import training data from source-data",
)

curl -H "Authorization: Bearer YOUR_API_TOKEN" \
  -X POST "https://tilde.run/api/v1/organizations/my-team/repositories/my-data/import" \
  -H "Content-Type: application/json" \
  -d '{
    "source_organization": "other-team",
    "source_repository": "source-data",
    "destination_path": "external/",
    "source_prefix": "datasets/train/",
    "commit_message": "Import training data from source-data"
  }'

Access Requirements

You must have read access to the source repository to import from it. The request body must contain exactly one of connector_id or (source_organization + source_repository).
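
A client can check the "exactly one of" rule before sending the request. The following is a minimal sketch of such a check using the field names above; `validate_import_request` is a hypothetical helper, and the server's own validation and error messages may differ.

```python
def validate_import_request(body: dict) -> str:
    """Return "connector" or "repository", or raise ValueError if ambiguous."""
    has_connector = bool(body.get("connector_id"))
    has_org = bool(body.get("source_organization"))
    has_repo = bool(body.get("source_repository"))
    if has_org != has_repo:
        raise ValueError(
            "source_organization and source_repository must be set together"
        )
    if has_connector == has_org:
        # Either both kinds of source were given, or neither was.
        raise ValueError(
            "provide exactly one of connector_id or "
            "source_organization + source_repository"
        )
    return "connector" if has_connector else "repository"
```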

Import Job Fields

Field                Type       Description
id                   UUID       Job identifier
repository_id        UUID       Target repository
connector_id         UUID       Source connector (for connector imports)
source_organization  string     Source organization name (for cross-repo imports)
source_repository    string     Source repository name (for cross-repo imports)
source_prefix        string     Source prefix filter
destination_path     string     Destination prefix in the repository
commit_message       string     Commit message for the import
status               string     pending, running, completed, or failed
objects_imported     integer    Number of objects imported so far
commit_id            string     Commit ID (populated on completion)
error                string     Error message (populated on failure)
created_by           UUID       User who started the import
created_at           timestamp  Job creation time
updated_at           timestamp  Last status update time

Reading Imported Objects

Imported objects are read through the same GET /object endpoint as any other object. Since data is copied during import, reads are served directly from Tilde's storage — no connector access is needed at read time.

Source Metadata

Imported entries include source metadata in their entry record for provenance tracking:

{
  "source_metadata": {
    "connector_id": "...",
    "connector_type": "s3",
    "source_path": "s3://my-bucket/datasets/file.csv",
    "version_id": "abc123",
    "source_etag": "\"def456\"",
    "import_time": "2025-01-15T10:30:00Z",
    "import_job_id": "..."
  }
}
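
When consuming this record in scripts, it can help to parse it into a typed structure. The sketch below assumes the entry is available as a dictionary shaped like the JSON above and that version_id may be absent when versioning is not used; the `SourceMetadata` class and `parse_source_metadata` function are hypothetical, not part of the Tilde SDK.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class SourceMetadata:
    connector_id: str
    connector_type: str
    source_path: str
    source_etag: str
    import_time: datetime
    import_job_id: str
    version_id: Optional[str] = None


def parse_source_metadata(entry: dict) -> SourceMetadata:
    """Build a SourceMetadata from an entry record shaped like the JSON above."""
    meta = entry["source_metadata"]
    return SourceMetadata(
        connector_id=meta["connector_id"],
        connector_type=meta["connector_type"],
        source_path=meta["source_path"],
        # ETags arrive wrapped in literal double quotes; strip them.
        source_etag=meta["source_etag"].strip('"'),
        # Rewrite the "Z" suffix so older Python versions can parse it.
        import_time=datetime.fromisoformat(meta["import_time"].replace("Z", "+00:00")),
        import_job_id=meta["import_job_id"],
        version_id=meta.get("version_id"),
    )
```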

Security

Connector configurations (credentials) are AES-encrypted at rest in the database. The encryption key is configured in the server's auth.encryption.keys config. Connector configs are never returned in API responses — only the connector's id, name, type, and disabled status are exposed.