Connectors Reference¶
Connectors let you import data from external sources into Tilde repositories. During import, objects are copied from the source into Tilde's storage, so reads are served locally without depending on the original source.
Overview¶
Connectors are managed at the organization level and can be attached to one or more repositories. Once a connector is attached, you can run import jobs that copy objects from the source into the repository.
Lifecycle¶
- Create a connector in your organization with source credentials
- Attach the connector to a repository
- Import data from the source into the repository
- Read imported objects. Reads are served from Tilde's storage, not from the source
How Imports Work¶
When you import data:
- Tilde lists objects from the source (e.g., an S3 prefix)
- Each object is copied into Tilde's local block storage (up to 10 objects are transferred concurrently)
- A commit is created in the target repository with all imported entries
- After import, reads are served directly from Tilde's storage
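The steps above can be sketched in a few lines of Python. This is only an illustration of the list, copy, commit sequence with a concurrency limit of 10, not Tilde's implementation: the in-memory dictionaries stand in for the source and for Tilde's storage, and run_import and the commit record shape are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT_TRANSFERS = 10  # concurrency limit stated in the list above


def run_import(source: dict, prefix: str, storage: dict) -> dict:
    """Copy every object under `prefix` into `storage` and return a commit record.

    `source` and `storage` are plain dicts (key -> bytes) used as stand-ins.
    """
    # Step 1: list objects from the source.
    keys = sorted(k for k in source if k.startswith(prefix))

    def copy_one(key: str) -> str:
        # Step 2: copy the object's bytes into local storage.
        storage[key] = source[key]
        return key

    # At most 10 copies are in flight at any time.
    with ThreadPoolExecutor(max_workers=MAX_CONCURRENT_TRANSFERS) as pool:
        copied = list(pool.map(copy_one, keys))

    # Step 3: a single commit that references all imported entries.
    return {"message": "import", "entries": copied}
```

Because the bytes are copied in step 2, later changes to the source do not affect what the commit points to.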
Reproducibility¶
Because objects are copied during import, the imported data is always a point-in-time snapshot of the source. Source metadata (connector ID, source path, ETag, and optionally version ID) is recorded on each entry for provenance tracking.
Supported Connectors¶
S3¶
Connect to any S3-compatible object store (AWS S3, MinIO, RustFS, etc.).
Configuration¶
| Field | Type | Required | Description |
|---|---|---|---|
| access_key_id | string | Yes | AWS access key ID |
| secret_access_key | string | Yes | AWS secret access key |
| region | string | No | AWS region (default: us-east-1) |
| endpoint | string | No | Custom S3 endpoint URL (for S3-compatible services) |
import tilde

org = tilde.organizations.get("my-team")
connector = org.connectors.create(
    name="production-s3",
    type="s3",
    source_uri="s3://my-bucket/datasets/",
    config={
        "access_key_id": "AKIAIOSFODNN7EXAMPLE",
        "secret_access_key": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
        "region": "us-west-2",
    },
)
curl -H "Authorization: Bearer YOUR_API_TOKEN" \
-X POST https://tilde.run/api/v1/organizations/my-team/connectors \
-H "Content-Type: application/json" \
-d '{
"name": "production-s3",
"type": "s3",
"source_uri": "s3://my-bucket/datasets/",
"config": {
"access_key_id": "AKIAIOSFODNN7EXAMPLE",
"secret_access_key": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
"region": "us-west-2"
}
}'
S3-Compatible Services¶
For S3-compatible services like MinIO or RustFS, provide a custom endpoint:
curl -H "Authorization: Bearer YOUR_API_TOKEN" \
-X POST https://tilde.run/api/v1/organizations/my-team/connectors \
-H "Content-Type: application/json" \
-d '{
"name": "my-minio",
"type": "s3",
"source_uri": "s3://my-bucket/",
"config": {
"access_key_id": "AKIAIOSFODNN7EXAMPLE",
"secret_access_key": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
"region": "us-east-1",
"endpoint": "https://minio.example.com:9000"
}
}'
URI Format¶
S3 source paths use the s3://bucket/prefix/ format, for example s3://my-bucket/datasets/.
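To make the format concrete, here is a minimal sketch that splits an S3 source URI into its bucket and prefix with the standard library. The helper name parse_s3_uri is hypothetical and is not part of the Tilde SDK.

```python
from urllib.parse import urlparse


def parse_s3_uri(uri: str) -> tuple:
    """Split s3://bucket/prefix/ into (bucket, prefix)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    # The parsed path keeps its leading slash; S3 keys do not start with one.
    return parsed.netloc, parsed.path.lstrip("/")


print(parse_s3_uri("s3://my-bucket/datasets/"))
```

An empty prefix, as in s3://my-bucket/, refers to the whole bucket.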
Versioning Support¶
When use_versioning: true is set on an import job, the S3 connector uses ListObjectVersions instead of ListObjectsV2 and copies the latest version of each object. The source version ID is recorded in the entry's source metadata for provenance.
S3 Versioning Requirement
use_versioning only records version IDs if versioning is enabled on the source S3 bucket. If versioning is not enabled, the import still proceeds, but the imported objects will not have version IDs.
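As an illustration of what picking the latest version of each object involves, the sketch below filters a ListObjectVersions response, in the dictionary shape boto3 returns, down to one version ID per key. It is not Tilde's code, and the function name is hypothetical. S3 typically reports the literal version ID "null" for objects written while versioning was off; the sketch maps that to None.

```python
from typing import Dict, Optional


def latest_versions(response: dict) -> Dict[str, Optional[str]]:
    """Map each key to the version ID of its latest version.

    `response` has the shape of an S3 ListObjectVersions result.
    Keys whose latest entry is a delete marker do not appear in
    "Versions" with IsLatest set, so they are left out.
    """
    latest = {}
    for version in response.get("Versions", []):
        if not version.get("IsLatest"):
            continue
        version_id = version.get("VersionId")
        # "null" means the object has no real version ID.
        latest[version["Key"]] = None if version_id == "null" else version_id
    return latest
```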
GCS¶
Connect to Google Cloud Storage.
Configuration¶
| Field | Type | Required | Description |
|---|---|---|---|
| credentials_json | string | Yes | Service account JSON key (stringified) |
| project_id | string | No | GCP project ID |
curl -H "Authorization: Bearer YOUR_API_TOKEN" \
-X POST https://tilde.run/api/v1/organizations/my-team/connectors \
-H "Content-Type: application/json" \
-d '{
"name": "production-gcs",
"type": "gcs",
"source_uri": "gs://my-bucket/datasets/",
"config": {
"credentials_json": "{\"type\":\"service_account\",\"project_id\":\"my-project\",...}",
"project_id": "my-project"
}
}'
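Because credentials_json is a string and not a nested object, the service account key has to be serialized before it goes into the request body. The sketch below builds the same payload as the curl example. The helper name and the placeholder key are assumptions for illustration; in practice the key would be loaded from the file downloaded from Google Cloud.

```python
import json


def gcs_connector_payload(name: str, source_uri: str, key: dict) -> dict:
    """Build a connector-creation body with the key stringified."""
    return {
        "name": name,
        "type": "gcs",
        "source_uri": source_uri,
        "config": {
            # json.dumps turns the key object into the expected string.
            "credentials_json": json.dumps(key),
            "project_id": key["project_id"],
        },
    }


# Placeholder key; a real key contains more fields.
key = {"type": "service_account", "project_id": "my-project"}
payload = gcs_connector_payload("production-gcs", "gs://my-bucket/datasets/", key)
print(json.dumps(payload, indent=2))
```

Serializing the whole payload with json.dumps afterwards takes care of escaping the inner quotes.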
URI Format¶
GCS source paths use the gs://bucket/prefix/ format, for example gs://my-bucket/datasets/.
Google Drive¶
Connect to a Google account's My Drive or to a Shared Drive, optionally scoped to a sub-folder. Authentication uses Google OAuth, so creating the connector is a browser flow rather than a raw API call.
Setup Flow¶
Google Drive connectors must be created through the Tilde web console:
- Open your organization's Connectors page in the console.
- Click Add Connector and pick Google Drive.
- Click Connect Google Drive to start the OAuth flow. Google will ask you to sign in and grant read-only access to Drive.
- After consent, you are returned to a setup page that lets you pick:
- The drive (My Drive or any Shared Drive the account can see).
- An optional folder inside that drive. Leave empty to import the entire drive.
- Click Create to save the connector.
Tilde stores a long-lived OAuth refresh token on the connector and uses it to list and download files during each import. No password or access key is involved, and you can revoke access at any time from your Google Account permissions page.
OAuth scope
The connector requests the drive.readonly scope. Tilde can read file metadata and content, but cannot create, modify, or delete anything in your Drive.
Source URI¶
Google Drive source paths use the googledrive:// format:
googledrive://my-drive/ # all of My Drive
googledrive://my-drive/<folder_id>/ # a sub-folder of My Drive
googledrive://<shared_drive_id>/ # an entire Shared Drive
googledrive://<shared_drive_id>/<folder_id>/ # a sub-folder of a Shared Drive
You do not need to construct this URI by hand: the setup page captures it for you. The connector also stores a human-readable label (e.g. MyDrive/Sales/Q3) so the UI never shows opaque IDs.
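For readers who want to inspect a stored source URI, the four forms above can be taken apart as in the following sketch. parse_drive_uri and DriveSource are hypothetical names, and the IDs in the usage lines are placeholders.

```python
from typing import NamedTuple, Optional


class DriveSource(NamedTuple):
    drive: str                 # "my-drive" or a Shared Drive ID
    folder_id: Optional[str]   # None means the entire drive


def parse_drive_uri(uri: str) -> DriveSource:
    """Split googledrive://<drive>/[<folder_id>/] into its parts."""
    scheme = "googledrive://"
    if not uri.startswith(scheme):
        raise ValueError(f"not a Google Drive URI: {uri!r}")
    parts = [p for p in uri[len(scheme):].split("/") if p]
    if not 1 <= len(parts) <= 2:
        raise ValueError(f"expected a drive and an optional folder: {uri!r}")
    return DriveSource(parts[0], parts[1] if len(parts) == 2 else None)


print(parse_drive_uri("googledrive://my-drive/"))
print(parse_drive_uri("googledrive://SHARED_DRIVE_ID/FOLDER_ID/"))
```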
Workspace File Exports¶
Native Google Workspace documents are exported to Office formats during import:
| Drive type | Imported as | PDF fallback |
|---|---|---|
| Google Docs | .docx | yes |
| Google Sheets | .xlsx | yes |
| Google Slides | .pptx | yes |
| Google Drawings | .png | no |
A Google Doc named Report is imported as Report.docx. If a binary file named Report.docx already exists in the same folder, the import fails fast with a clear collision error so you can rename one of them in Drive before retrying.
If the primary Office export exceeds Google's per-format size limit, Tilde automatically retries the export as .pdf for Docs, Sheets, and Slides. Drawings have no fallback because they are already an image format. Files that fail both exports are skipped with a clear error.
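The export rules can be summarized as a lookup table plus a collision check. The sketch below is an illustration only: the function names are hypothetical, and naming the fallback file Report.pdf is an assumption, since the text above only says the export is retried as .pdf.

```python
# Drive type -> (primary extension, whether a PDF fallback exists)
EXPORTS = {
    "Google Docs": (".docx", True),
    "Google Sheets": (".xlsx", True),
    "Google Slides": (".pptx", True),
    "Google Drawings": (".png", False),
}


def export_candidates(name: str, drive_type: str) -> list:
    """Return the file names to try, in order, for a Workspace document."""
    extension, has_pdf_fallback = EXPORTS[drive_type]
    candidates = [name + extension]
    if has_pdf_fallback:
        candidates.append(name + ".pdf")
    return candidates


def check_collision(imported_name: str, existing_names: set) -> None:
    """Fail fast if a binary file already uses the exported name."""
    if imported_name in existing_names:
        raise FileExistsError(f"name collision in folder: {imported_name}")


print(export_candidates("Report", "Google Docs"))
print(export_candidates("Diagram", "Google Drawings"))
```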
What Gets Skipped¶
The lister silently skips items that are not real Drive files:
- Items in the trash (trashed = true).
- Shortcuts (vnd.google-apps.shortcut).
- Workspace types with no useful export: Forms, Sites, Maps, Jamboards, etc.
File and folder names are normalized to NFC, / and NUL bytes are replaced with _, and trailing dots or whitespace are trimmed.
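A minimal sketch of these normalization rules, using only the standard library, might look as follows. The function name and the order in which the rules are applied are assumptions.

```python
import unicodedata


def normalize_name(name: str) -> str:
    """Apply NFC, replace '/' and NUL with '_', trim trailing dots and whitespace."""
    name = unicodedata.normalize("NFC", name)
    name = name.replace("/", "_").replace("\x00", "_")
    # Strip any mix of trailing dots and whitespace characters.
    while name and (name[-1] == "." or name[-1].isspace()):
        name = name[:-1]
    return name


# "e" followed by a combining acute accent becomes the single character "é".
print(normalize_name("Cafe\u0301/Q3 . "))
```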
Managing Connectors¶
Attaching to Repositories¶
A connector must be attached to a repository before it can be used for imports.
# Attach a connector to a repository
curl -H "Authorization: Bearer YOUR_API_TOKEN" \
-X POST https://tilde.run/api/v1/organizations/my-team/repositories/my-data/connectors \
-H "Content-Type: application/json" \
-d '{"connector_id": "connector-uuid"}'
# List connectors attached to a repository
curl -H "Authorization: Bearer YOUR_API_TOKEN" \
https://tilde.run/api/v1/organizations/my-team/repositories/my-data/connectors
# Detach a connector
curl -H "Authorization: Bearer YOUR_API_TOKEN" \
-X DELETE https://tilde.run/api/v1/organizations/my-team/repositories/my-data/connectors/connector-uuid
Deleting a Connector¶
Deleting a connector is a soft delete. Since imported objects were copied into Tilde's storage, they remain fully readable. The connector is only needed during import, not for subsequent reads.
Running Imports¶
Start an Import Job¶
curl -H "Authorization: Bearer YOUR_API_TOKEN" \
-X POST "https://tilde.run/api/v1/organizations/my-team/repositories/my-data/import" \
-H "Content-Type: application/json" \
-d '{
"connector_id": "connector-uuid",
"destination_path": "imported/datasets/",
"source_prefix": "datasets/",
"commit_message": "Import Q1 datasets"
}'
Poll for Completion¶
Import jobs run asynchronously. Poll the status endpoint until the job completes.
import time

# `job` is assumed to be the import job object obtained when the import was started.
while True:
    job.refresh()
    print(f"Status: {job.status}, Objects: {job.objects_imported}")
    if job.status in ("completed", "failed"):
        break
    time.sleep(2)

if job.status == "completed":
    print(f"Import done! Commit: {job.commit_id}")
elif job.status == "failed":
    print(f"Import failed: {job.error}")
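The loop above polls forever if a job never reaches a terminal state. One possible variation adds a deadline. This sketch is independent of the SDK: wait_for_job is a hypothetical helper, and fetch_status is any callable you supply that returns the job's current status string.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_job(
    fetch_status: Callable[[], str],
    timeout: float = 600.0,
    interval: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the job is completed or failed, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout} seconds")
        sleep(interval)


# Usage with a stand-in status source instead of a real job.
statuses = iter(["pending", "running", "completed"])
print(wait_for_job(lambda: next(statuses), sleep=lambda seconds: None))
```

Passing sleep as a parameter keeps the helper easy to exercise in tests without real delays.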
Cross-Repository Imports¶
You can also import data directly from another Tilde repository without needing a connector. Provide the source organization and repository instead of a connector ID.
curl -H "Authorization: Bearer YOUR_API_TOKEN" \
-X POST "https://tilde.run/api/v1/organizations/my-team/repositories/my-data/import" \
-H "Content-Type: application/json" \
-d '{
"source_organization": "other-team",
"source_repository": "source-data",
"destination_path": "external/",
"source_prefix": "datasets/train/",
"commit_message": "Import training data from source-data"
}'
Access Requirements
You must have read access to the source repository to import from it. The request body must contain exactly one of connector_id or (source_organization + source_repository).
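If you build import requests programmatically, the exactly-one rule can be checked on the client before sending. The sketch below is a hypothetical helper, not server code, and the error messages are illustrative.

```python
def import_source_kind(body: dict) -> str:
    """Return "connector" or "repository", or raise if the body is ambiguous."""
    has_connector = "connector_id" in body
    has_org = "source_organization" in body
    has_repo = "source_repository" in body

    # The two cross-repository fields only make sense as a pair.
    if has_org != has_repo:
        raise ValueError(
            "source_organization and source_repository must be given together"
        )
    # Reject both sources at once, and reject no source at all.
    if has_connector == has_org:
        raise ValueError(
            "provide exactly one of connector_id or "
            "source_organization + source_repository"
        )
    return "connector" if has_connector else "repository"


print(import_source_kind({"connector_id": "connector-uuid"}))
print(import_source_kind(
    {"source_organization": "other-team", "source_repository": "source-data"}
))
```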
Import Job Fields¶
| Field | Type | Description |
|---|---|---|
| id | UUID | Job identifier |
| repository_id | UUID | Target repository |
| connector_id | UUID | Source connector (for connector imports) |
| source_organization | string | Source organization name (for cross-repo imports) |
| source_repository | string | Source repository name (for cross-repo imports) |
| source_prefix | string | Source prefix filter |
| destination_path | string | Destination prefix in the repository |
| commit_message | string | Commit message for the import |
| status | string | pending, running, completed, or failed |
| objects_imported | integer | Number of objects imported so far |
| commit_id | string | Commit ID (populated on completion) |
| error | string | Error message (populated on failure) |
| created_by | UUID | User who started the import |
| created_at | timestamp | Job creation time |
| updated_at | timestamp | Last status update time |
Reading Imported Objects¶
Imported objects are read through the same GET /object endpoint as any other object. Since data is copied during import, reads are served directly from Tilde's storage — no connector access is needed at read time.
Source Metadata¶
Imported entries include source metadata in their entry record for provenance tracking:
{
"source_metadata": {
"connector_id": "...",
"connector_type": "s3",
"source_path": "s3://my-bucket/datasets/file.csv",
"version_id": "abc123",
"source_etag": "\"def456\"",
"import_time": "2025-01-15T10:30:00Z",
"import_job_id": "..."
}
}
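When consuming this record in code, two details are worth handling: the ETag value carries its own surrounding quotes, and version_id may be absent when versioning was not used. The sketch below parses the record shown above into a small dataclass. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class SourceMetadata:
    connector_type: str
    source_path: str
    etag: str
    version_id: Optional[str]
    import_time: datetime


def parse_source_metadata(entry: dict) -> SourceMetadata:
    """Extract provenance fields from an entry record."""
    meta = entry["source_metadata"]
    return SourceMetadata(
        connector_type=meta["connector_type"],
        source_path=meta["source_path"],
        # Remove the literal double quotes around the ETag.
        etag=meta["source_etag"].strip('"'),
        # Optional: only present when a version ID was recorded.
        version_id=meta.get("version_id"),
        # Replace the trailing "Z" so older Python versions can parse it.
        import_time=datetime.fromisoformat(
            meta["import_time"].replace("Z", "+00:00")
        ),
    )
```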
Security¶
Connector configurations (credentials) are AES-encrypted at rest in the database. The encryption key is configured in the server's auth.encryption.keys config. Connector configs are never returned in API responses — only the connector's id, name, type, and disabled status are exposed.