Connectors Reference¶
Connectors let you import data from external sources into Tilde repositories. During import, objects are copied from the source into Tilde's storage, so reads are served locally without depending on the original source.
Overview¶
Connectors are managed at the organization level and can be attached to one or more repositories. Once attached, you can run import jobs that stream object metadata from the source into the repository.
Lifecycle¶
- Create a connector in your organization with source credentials
- Attach the connector to a repository
- Import data from the source into the repository
- Read imported objects, which are served directly from Tilde's storage without going back to the source
How Imports Work¶
When you import data:
- Tilde lists objects from the source (e.g., an S3 prefix)
- Each object is copied into Tilde's local block storage (up to 10 objects are transferred concurrently)
- A commit is created in the target repository with all imported entries
- After import, reads are served directly from Tilde's storage
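Conceptually, the import behaves like a bounded-concurrency copy loop followed by a single commit. The sketch below is purely illustrative (it is not Tilde's implementation, and the `source`/`storage` interfaces are hypothetical); it only mirrors the flow and the concurrency limit described above:

```python
import asyncio

MAX_CONCURRENCY = 10  # mirrors the transfer limit described above

async def import_prefix(source, storage, keys):
    """Illustrative sketch: copy each listed object into local storage with at
    most MAX_CONCURRENCY transfers in flight, then commit all entries at once."""
    sem = asyncio.Semaphore(MAX_CONCURRENCY)

    async def copy_one(key):
        async with sem:
            data = await source.get_object(key)          # read from the source (e.g. an S3 prefix)
            return await storage.put_object(key, data)   # write into local block storage

    entries = await asyncio.gather(*(copy_one(k) for k in keys))
    return await storage.commit(entries)                 # one commit containing all imported entries
```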
Reproducibility¶
Because objects are copied during import, the imported data is always a point-in-time snapshot of the source. Source metadata (connector ID, source path, ETag, and optionally version ID) is recorded on each entry for provenance tracking.
Supported Connectors¶
S3¶
Connect to any S3-compatible object store (AWS S3, MinIO, RustFS, etc.).
Configuration¶
| Field | Type | Required | Description |
|---|---|---|---|
| `access_key_id` | string | Yes | AWS access key ID |
| `secret_access_key` | string | Yes | AWS secret access key |
| `region` | string | No | AWS region (default: `us-east-1`) |
| `endpoint` | string | No | Custom S3 endpoint URL (for S3-compatible services) |
Create the connector with the Python SDK:

```python
import tilde

org = tilde.organizations.get("my-team")
connector = org.connectors.create(
    name="production-s3",
    type="s3",
    source_uri="s3://my-bucket/datasets/",
    config={
        "access_key_id": "AKIAIOSFODNN7EXAMPLE",
        "secret_access_key": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
        "region": "us-west-2",
    },
)
```
Or via the REST API:

```bash
curl -H "Authorization: Bearer YOUR_API_TOKEN" \
  -X POST https://tilde.run/api/v1/organizations/my-team/connectors \
  -H "Content-Type: application/json" \
  -d '{
    "name": "production-s3",
    "type": "s3",
    "source_uri": "s3://my-bucket/datasets/",
    "config": {
      "access_key_id": "AKIAIOSFODNN7EXAMPLE",
      "secret_access_key": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
      "region": "us-west-2"
    }
  }'
```
S3-Compatible Services¶
For S3-compatible services like MinIO or RustFS, provide a custom endpoint:
```bash
curl -H "Authorization: Bearer YOUR_API_TOKEN" \
  -X POST https://tilde.run/api/v1/organizations/my-team/connectors \
  -H "Content-Type: application/json" \
  -d '{
    "name": "my-minio",
    "type": "s3",
    "source_uri": "s3://my-bucket/",
    "config": {
      "access_key_id": "AKIAIOSFODNN7EXAMPLE",
      "secret_access_key": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
      "region": "us-east-1",
      "endpoint": "https://minio.example.com:9000"
    }
  }'
```
URI Format¶
S3 source paths use the `s3://bucket/prefix/` format, for example `s3://my-bucket/datasets/`.
Versioning Support¶
When `use_versioning: true` is set on an import job, the S3 connector uses `ListObjectVersions` instead of `ListObjectsV2` and copies the latest version of each object. The source version ID is recorded in the entry's source metadata for provenance.

**S3 Versioning Requirement:** The source S3 bucket must have versioning enabled for `use_versioning` to work. If versioning is not enabled on the bucket, the import will proceed but objects will not have version IDs.
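As a concrete illustration, a versioned import request might look like the sketch below, using Python's `requests` against the import endpoint documented under Running Imports. The placement of `use_versioning` as a top-level field in the request body is an assumption based on the description above.

```python
import requests

# Sketch: start an import that copies the latest version of each object and
# records source version IDs. The top-level "use_versioning" field is an
# assumption; the other fields mirror the documented import request.
resp = requests.post(
    "https://tilde.run/api/v1/organizations/my-team/repositories/my-data/import",
    headers={"Authorization": "Bearer YOUR_API_TOKEN"},
    json={
        "connector_id": "connector-uuid",
        "destination_path": "imported/datasets/",
        "source_prefix": "datasets/",
        "commit_message": "Import versioned datasets",
        "use_versioning": True,
    },
)
resp.raise_for_status()
print(resp.json())
```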
GCS¶
Connect to Google Cloud Storage.
Configuration¶
| Field | Type | Required | Description |
|---|---|---|---|
| `credentials_json` | string | Yes | Service account JSON key (stringified) |
| `project_id` | string | No | GCP project ID |
```bash
curl -H "Authorization: Bearer YOUR_API_TOKEN" \
  -X POST https://tilde.run/api/v1/organizations/my-team/connectors \
  -H "Content-Type: application/json" \
  -d '{
    "name": "production-gcs",
    "type": "gcs",
    "source_uri": "gs://my-bucket/datasets/",
    "config": {
      "credentials_json": "{\"type\":\"service_account\",\"project_id\":\"my-project\",...}",
      "project_id": "my-project"
    }
  }'
```
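The Python SDK call mirrors the S3 example earlier on this page; only the `type`, `source_uri`, and `config` values differ. A sketch (GCS usage through the SDK is not shown above, so this assumes the same `connectors.create` signature):

```python
import tilde

org = tilde.organizations.get("my-team")

# The credentials_json field expects the service account key as a string,
# so read the key file and pass its contents directly.
with open("service-account.json") as f:
    credentials = f.read()

connector = org.connectors.create(
    name="production-gcs",
    type="gcs",
    source_uri="gs://my-bucket/datasets/",
    config={
        "credentials_json": credentials,
        "project_id": "my-project",
    },
)
```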
URI Format¶
GCS source paths use the `gs://bucket/prefix/` format, for example `gs://my-bucket/datasets/`.
Managing Connectors¶
Attaching to Repositories¶
A connector must be attached to a repository before it can be used for imports.
```bash
# Attach a connector to a repository
curl -H "Authorization: Bearer YOUR_API_TOKEN" \
  -X POST https://tilde.run/api/v1/organizations/my-team/repositories/my-data/connectors \
  -H "Content-Type: application/json" \
  -d '{"connector_id": "connector-uuid"}'

# List connectors attached to a repository
curl -H "Authorization: Bearer YOUR_API_TOKEN" \
  https://tilde.run/api/v1/organizations/my-team/repositories/my-data/connectors

# Detach a connector
curl -H "Authorization: Bearer YOUR_API_TOKEN" \
  -X DELETE https://tilde.run/api/v1/organizations/my-team/repositories/my-data/connectors/connector-uuid
```
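The same attach, list, and detach calls from Python, using `requests` against the endpoints shown above:

```python
import requests

BASE = "https://tilde.run/api/v1/organizations/my-team/repositories/my-data"
HEADERS = {"Authorization": "Bearer YOUR_API_TOKEN"}

# Attach a connector to the repository
requests.post(f"{BASE}/connectors", headers=HEADERS,
              json={"connector_id": "connector-uuid"}).raise_for_status()

# List connectors attached to the repository
print(requests.get(f"{BASE}/connectors", headers=HEADERS).json())

# Detach the connector
requests.delete(f"{BASE}/connectors/connector-uuid", headers=HEADERS).raise_for_status()
```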
Deleting a Connector¶
Deleting a connector is a soft delete. Since imported objects were copied into Tilde's storage, they remain fully readable. The connector is only needed during import, not for subsequent reads.
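The delete route itself is not shown in this section. Assuming it follows the same organization-level pattern as the create and attach routes above, the call might look like this sketch:

```python
import requests

# Assumption: connectors are deleted at the organization level, mirroring the
# create endpoint (POST .../organizations/{org}/connectors). Verify the exact
# route in the API reference before relying on it.
requests.delete(
    "https://tilde.run/api/v1/organizations/my-team/connectors/connector-uuid",
    headers={"Authorization": "Bearer YOUR_API_TOKEN"},
).raise_for_status()
```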
Running Imports¶
Start an Import Job¶
```bash
curl -H "Authorization: Bearer YOUR_API_TOKEN" \
  -X POST "https://tilde.run/api/v1/organizations/my-team/repositories/my-data/import" \
  -H "Content-Type: application/json" \
  -d '{
    "connector_id": "connector-uuid",
    "destination_path": "imported/datasets/",
    "source_prefix": "datasets/",
    "commit_message": "Import Q1 datasets"
  }'
```
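If you are driving imports from the Python SDK instead, the call might look roughly like the sketch below. The `repositories.get` and `imports.create` method names are assumptions (only `organizations.get` and `connectors.create` appear elsewhere on this page); the fields mirror the REST request above. The returned job object is what the polling loop in the next section uses.

```python
import tilde

org = tilde.organizations.get("my-team")
repo = org.repositories.get("my-data")  # assumed accessor, by analogy with organizations.get

# Assumed SDK method; the fields mirror the documented REST request body.
job = repo.imports.create(
    connector_id="connector-uuid",
    destination_path="imported/datasets/",
    source_prefix="datasets/",
    commit_message="Import Q1 datasets",
)
```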
Poll for Completion¶
Import jobs run asynchronously. Poll the status endpoint until the job completes.
```python
import time

# `job` is the import job handle returned when the import was started
# (see the SDK sketch above).
while True:
    job.refresh()
    print(f"Status: {job.status}, Objects: {job.objects_imported}")
    if job.status in ("completed", "failed"):
        break
    time.sleep(2)

if job.status == "completed":
    print(f"Import done! Commit: {job.commit_id}")
elif job.status == "failed":
    print(f"Import failed: {job.error}")
```
Cross-Repository Imports¶
You can also import data directly from another Tilde repository without needing a connector. Provide the source organization and repository instead of a connector ID.
```bash
curl -H "Authorization: Bearer YOUR_API_TOKEN" \
  -X POST "https://tilde.run/api/v1/organizations/my-team/repositories/my-data/import" \
  -H "Content-Type: application/json" \
  -d '{
    "source_organization": "other-team",
    "source_repository": "source-data",
    "destination_path": "external/",
    "source_prefix": "datasets/train/",
    "commit_message": "Import training data from source-data"
  }'
```
**Access Requirements:** You must have read access to the source repository to import from it. The request body must contain exactly one of `connector_id` or (`source_organization` + `source_repository`).
Import Job Fields¶
| Field | Type | Description |
|---|---|---|
| `id` | UUID | Job identifier |
| `repository_id` | UUID | Target repository |
| `connector_id` | UUID | Source connector (for connector imports) |
| `source_organization` | string | Source organization name (for cross-repo imports) |
| `source_repository` | string | Source repository name (for cross-repo imports) |
| `source_prefix` | string | Source prefix filter |
| `destination_path` | string | Destination prefix in the repository |
| `commit_message` | string | Commit message for the import |
| `status` | string | `pending`, `running`, `completed`, or `failed` |
| `objects_imported` | integer | Number of objects imported so far |
| `commit_id` | string | Commit ID (populated on completion) |
| `error` | string | Error message (populated on failure) |
| `created_by` | UUID | User who started the import |
| `created_at` | timestamp | Job creation time |
| `updated_at` | timestamp | Last status update time |
Reading Imported Objects¶
Imported objects are read through the same GET /object endpoint as any other object. Since data is copied during import, reads are served directly from Tilde's storage — no connector access is needed at read time.
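For example, fetching an imported file with Python's `requests` might look like the sketch below. The repository-scoped route and the `path` query parameter are assumptions; see the object API reference for the exact request shape.

```python
import requests

# Sketch only: the /object route shape and the "path" query parameter are
# assumptions, not confirmed by this page.
resp = requests.get(
    "https://tilde.run/api/v1/organizations/my-team/repositories/my-data/object",
    headers={"Authorization": "Bearer YOUR_API_TOKEN"},
    params={"path": "imported/datasets/file.csv"},
)
resp.raise_for_status()
data = resp.content  # served from Tilde's storage; no connector access at read time
```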
Source Metadata¶
Imported entries include source metadata in their entry record for provenance tracking:
```json
{
  "source_metadata": {
    "connector_id": "...",
    "connector_type": "s3",
    "source_path": "s3://my-bucket/datasets/file.csv",
    "version_id": "abc123",
    "source_etag": "\"def456\"",
    "import_time": "2025-01-15T10:30:00Z",
    "import_job_id": "..."
  }
}
```
Security¶
Connector configurations (credentials) are AES-encrypted at rest in the database. The encryption key is configured in the server's auth.encryption.keys config. Connector configs are never returned in API responses — only the connector's id, name, type, and disabled status are exposed.