Syncing & Scheduling
API connectors (SaaS tools, databases, MCP servers) need to sync their data into Dinobase. File connectors (parquet, CSV) skip syncing entirely — DuckDB reads them at query time.
One-time sync
    # Sync all connectors
    dinobase sync

    # Sync one connector
    dinobase sync stripe

Output shows progress per connector:

    stripe: synced 4 tables (12,450 rows)
    hubspot: synced 3 tables (8,320 rows)

    Done. 7 tables, 20,770 rows total.

Scheduled sync (daemon mode)
Run Dinobase as a daemon that syncs on configured intervals:

    # Default: check every minute, sync connectors every 1 hour
    dinobase sync --schedule

    # Custom interval
    dinobase sync --schedule --interval 30m

    # Higher concurrency
    dinobase sync --schedule --max-workers 20

The scheduler:
- Checks every 60 seconds which connectors are due for a sync
- Syncs due connectors concurrently (up to --max-workers at a time)
- Respects per-connector intervals set during dinobase add
- Catches errors per-connector without crashing
- Logs everything to stderr
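The "which connectors are due" check can be pictured as a small pure function. This is a minimal sketch, not Dinobase's actual scheduler code: the function name `due_connectors` and its arguments are hypothetical, and it assumes a connector that has never synced counts as due.

```python
from datetime import datetime, timedelta

def due_connectors(last_synced, intervals, default_interval, now):
    """Return the names of connectors whose sync interval has elapsed.

    last_synced:      dict of name -> datetime of last sync, or None if never synced
    intervals:        dict of name -> timedelta (per-connector intervals)
    default_interval: timedelta used when a connector has no interval of its own
    """
    due = []
    for name, last in last_synced.items():
        interval = intervals.get(name, default_interval)
        # Assumption: a connector with no recorded sync is always due.
        if last is None or now - last >= interval:
            due.append(name)
    return due
```

A daemon loop would call a function like this once per check period and hand the returned names to a worker pool.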
Per-connector intervals
Each connector can have its own sync interval:

    dinobase add stripe --api-key ... --sync-interval 15m
    dinobase add hubspot --api-key ... --sync-interval 1h
    dinobase add postgres --connection-string ... --sync-interval 6h

The scheduler uses these intervals, falling back to the global --interval default.
Supported formats: 30s, 5m, 1h, 6h, 1d.
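To make the interval format concrete, here is a minimal sketch of how such strings could be converted to seconds. It is an illustration only: `parse_interval` is a hypothetical helper, not a Dinobase function, and it assumes the format is a whole number followed by one of the unit letters shown above.

```python
import re

# Unit suffixes taken from the supported formats listed above.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}

def parse_interval(text: str) -> int:
    """Convert an interval such as '30m' into a number of seconds."""
    match = re.fullmatch(r"(\d+)([smhd])", text.strip())
    if match is None:
        raise ValueError(f"unsupported interval: {text!r}")
    value, unit = match.groups()
    return int(value) * _UNIT_SECONDS[unit]
```

Under these assumptions, `parse_interval("6h")` evaluates to 21600, and an unrecognized suffix raises `ValueError`.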
Background sync with MCP server
Run sync alongside the MCP server:

    dinobase serve --sync --sync-interval 30m

This starts the MCP server and a background sync thread, so agents query data that is re-synced on that interval.
Concurrent syncing
Connectors sync in parallel using a thread pool. Each connector gets its own dlt pipeline and database connection to avoid conflicts.

    # Up to 20 connectors syncing at once
    dinobase sync --max-workers 20

The default is 10 concurrent workers. Increase it if you have many connectors; decrease it if you hit API rate limits.
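The thread-pool pattern described here, with errors contained per connector, can be sketched with Python's standard `concurrent.futures` module. This is an illustration under assumptions, not Dinobase's code: `sync_all` and the `sync_one` callback are hypothetical names, and the real implementation may differ.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def sync_all(names, sync_one, max_workers=10):
    """Run sync_one(name) for every connector in parallel.

    Returns a dict of name -> ("ok", result) or ("error", message), so a
    failing connector is recorded without stopping the others.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the connector it belongs to.
        futures = {pool.submit(sync_one, name): name for name in names}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = ("ok", future.result())
            except Exception as exc:
                results[name] = ("error", str(exc))
    return results
```

Lowering `max_workers` limits how many upstream APIs are called at once, which is the lever to reach for when rate limits are the bottleneck.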
What happens during sync
- dlt pipeline runs: fetches data from the upstream API, handles pagination and rate limiting
- Data writes to parquet: stored in ~/.dinobase/ as parquet files
- Metadata extraction: column descriptions fetched from the upstream API (Stripe OpenAPI, HubSpot Properties API, Postgres catalog)
- Annotations stored: metadata saved to the _dinobase.columns table
- Sync logged: start time, end time, status, table/row counts recorded in _dinobase.sync_log
Monitoring syncs
    # See last sync times and row counts
    dinobase status --pretty

Sync history is stored in the _dinobase.sync_log table. You can query it directly:

    dinobase query "
      SELECT source_name, status, tables_synced, rows_synced,
             started_at, finished_at, error_message
      FROM _dinobase.sync_log
      ORDER BY started_at DESC
      LIMIT 10
    " --pretty

Freshness thresholds
Each connector has a freshness threshold: the maximum age before data is considered stale. The list_connectors MCP tool and dinobase status show freshness for each connector.
Defaults:
| Connector category | Default threshold |
|---|---|
| SaaS APIs (Stripe, HubSpot, etc.) | 1h |
| Databases (Postgres, MySQL, etc.) | 6h |
| File connectors (parquet, CSV) | never stale |
Override per connector:
    dinobase add stripe --api-key ... --freshness 30m

Or edit config.yaml directly:

    sources:
      stripe:
        type: stripe
        credentials: { api_key: sk_... }
        freshness_threshold: 30m

(The config key remains sources: for backwards compatibility.)
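The staleness rule implied by the defaults table and the per-connector override can be sketched as follows. This is a hedged illustration, not Dinobase's implementation: the category keys, `DEFAULT_THRESHOLDS`, and `is_stale` are hypothetical names, and treating a never-synced connector as stale is an assumption.

```python
from datetime import datetime, timedelta

# Defaults from the table above; None means "never stale".
DEFAULT_THRESHOLDS = {
    "saas": timedelta(hours=1),
    "database": timedelta(hours=6),
    "file": None,
}

def is_stale(category, last_synced, now, override=None):
    """Return True when the data is older than its freshness threshold."""
    threshold = override if override is not None else DEFAULT_THRESHOLDS[category]
    if threshold is None:
        return False  # file connectors are read at query time
    if last_synced is None:
        return True  # assumption: never-synced data counts as stale
    return now - last_synced > threshold
```

With a 30m override, a SaaS connector synced 45 minutes ago is stale under this sketch, while the 1h default would still consider it fresh.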
Refreshing stale connectors
Use dinobase refresh to re-sync stale connectors:

    dinobase refresh stripe            # refresh one connector
    dinobase refresh --stale           # refresh all stale connectors
    dinobase refresh --stale --pretty

The refresh MCP tool lets agents trigger re-syncs:

    Agent: refresh("stripe")
    → Re-syncs stripe, returns new freshness info + row counts

Live fetch for single records
When data is stale and the agent queries a single record by primary key, Dinobase automatically calls the upstream API instead of returning stale parquet data. This is fully transparent: the agent just writes SQL.

    -- If intercom data is stale, this triggers GET /contacts/12345 on the Intercom API
    SELECT * FROM intercom.contacts WHERE id = '12345'

The response includes "_freshness": "live" so the agent knows it got real-time data:

    {
      "columns": ["id", "name", "email"],
      "rows": [{"id": "12345", "name": "Alice", "email": "alice@acme.com"}],
      "_freshness": "live",
      "_source": "intercom API"
    }

When live fetch triggers:
- Connector data is stale (exceeds freshness threshold)
- Query is a simple SELECT ... FROM schema.table WHERE id = 'value'
- The connector has a YAML config in sources/configs/
When it does NOT trigger:
- Data is fresh (normal parquet query)
- Query has JOINs, multiple conditions, or aggregations
- Connector has no YAML config (e.g., custom dlt sources)
- API call fails (graceful fallback to parquet)
This covers 55 connectors with YAML configs including Intercom, Chargebee, Linear, Amplitude, and more.
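The trigger conditions above can be approximated with a small eligibility check. This is a rough sketch under stated assumptions, not how Dinobase parses SQL: `live_fetch_target` is a hypothetical helper, the regular expression only recognizes a plain column list or `*`, and it assumes the primary key column is named `id` as in the example query.

```python
import re

# Accepts only: SELECT <* or plain columns> FROM schema.table WHERE id = 'value'
_SIMPLE_LOOKUP = re.compile(
    r"^\s*SELECT\s+(?:\*|\w+(?:\s*,\s*\w+)*)\s+FROM\s+(\w+)\.(\w+)"
    r"\s+WHERE\s+id\s*=\s*'([^']*)'\s*;?\s*$",
    re.IGNORECASE,
)

def live_fetch_target(sql, is_stale, has_yaml_config):
    """Return (schema, table, record_id) if a live fetch applies, else None."""
    # Fresh data and connectors without a YAML config use the parquet path.
    if not is_stale or not has_yaml_config:
        return None
    match = _SIMPLE_LOOKUP.match(sql)
    if match is None:
        return None  # JOINs, extra conditions, and aggregations do not match
    return match.groups()
```

A caller would still wrap the upstream request in error handling so that a failed API call falls back to the parquet query, as described in the list above.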
File connectors skip sync
File connectors (parquet, CSV) create DuckDB views that read files at query time. They never appear in dinobase sync output:

    dinobase add parquet --path ./data/ --name analytics   # instant, no sync
    dinobase sync                                          # skips analytics, only syncs API connectors