Configuration

Dinobase stores everything in ~/.dinobase/ by default:

```
~/.dinobase/
  config.yaml       # Source configuration
  dinobase.duckdb   # DuckDB database (metadata + synced data)
```

Override with DINOBASE_DIR:

```sh
export DINOBASE_DIR=/path/to/custom/dir
```
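The lookup order above can be sketched in Python. This is an illustrative helper, not part of the Dinobase CLI or API:

```python
import os

def dinobase_dir() -> str:
    # Resolve the Dinobase data directory: DINOBASE_DIR wins if set,
    # otherwise fall back to the default ~/.dinobase.
    return os.environ.get("DINOBASE_DIR") or os.path.expanduser("~/.dinobase")

os.environ["DINOBASE_DIR"] = "/path/to/custom/dir"
print(dinobase_dir())  # /path/to/custom/dir
```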

config.yaml is the main configuration file. It is created by `dinobase init` and updated by `dinobase add`.

```yaml
# Optional: store data in cloud storage instead of locally
storage:
  url: "s3://my-bucket/dinobase/"

sources:
  stripe:
    type: stripe
    credentials:
      api_key: sk_live_...
    sync_interval: 1h

  hubspot:
    type: hubspot
    credentials:
      api_key: pat-na1-...
    sync_interval: 30m
    freshness_threshold: 1h

  analytics:
    type: parquet
    credentials:
      path: ./data/events/
      format: parquet
```

Storage (optional):

| Field | Required | Description |
| --- | --- | --- |
| `storage.url` | No | Cloud storage URL (e.g., `s3://bucket/dinobase/`, `gs://bucket/dinobase/`, `az://container/dinobase/`). When set, data is stored in cloud storage instead of locally. Can also be set via the `DINOBASE_STORAGE_URL` environment variable. |
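The URL schemes map to providers as follows; a minimal sketch, where the mapping is taken from this page's examples and `storage_provider` is an illustrative name, not a Dinobase API:

```python
from urllib.parse import urlparse

# Scheme-to-provider mapping based on the URL examples on this page.
PROVIDERS = {
    "s3": "Amazon S3 (or an S3-compatible service)",
    "gs": "Google Cloud Storage",
    "az": "Azure Blob Storage",
}

def storage_provider(url: str) -> str:
    scheme = urlparse(url).scheme
    if scheme not in PROVIDERS:
        raise ValueError(f"unsupported storage scheme: {scheme!r}")
    return PROVIDERS[scheme]

print(storage_provider("s3://my-bucket/dinobase/"))
```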

Per source:

| Field | Required | Description |
| --- | --- | --- |
| `type` | Yes | Source type (e.g., `stripe`, `postgres`, `parquet`) |
| `credentials` | Yes | Source-specific credentials |
| `sync_interval` | No | How often to sync (e.g., `30m`, `1h`) |
| `freshness_threshold` | No | Max age before data is considered stale (e.g., `1h`, `30m`). Defaults: `1h` for SaaS APIs, `6h` for databases. File sources are never stale. |
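The interval strings (`30m`, `1h`) can be parsed with a short helper. This is a sketch of the format shown above, not the CLI's actual parser, and the supported unit suffixes are an assumption:

```python
from datetime import datetime, timedelta

# Suffix set is an assumption based on the examples on this page
# ("30m", "1h"); the CLI may accept other units.
UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}

def parse_interval(value: str) -> timedelta:
    # "30m" -> timedelta(minutes=30); "1h" -> timedelta(hours=1)
    return timedelta(**{UNITS[value[-1]]: int(value[:-1])})

def is_stale(last_sync: datetime, threshold: str, now: datetime) -> bool:
    # Mirrors freshness_threshold: data older than the threshold is stale.
    return now - last_sync > parse_interval(threshold)

print(parse_interval("30m"))  # 0:30:00
```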

Credential keys vary by source type:

| Source type | Credential keys |
| --- | --- |
| SaaS APIs | `api_key`, `token`, etc. |
| Databases | `connection_string` |
| File sources | `path`, `format` |

You can edit config.yaml directly to:

  • Change credentials
  • Update sync intervals
  • Remove sources
  • Add sources manually

Changes take effect on the next dinobase sync or dinobase serve.

Data is stored in dinobase.duckdb alongside the config file. No additional setup needed.

When storage.url is configured, Dinobase uses an in-memory DuckDB that reads/writes parquet files to cloud storage. Metadata is persisted as parquet files in a _meta/ prefix. See the Cloud Storage Backend guide for setup instructions.
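As a rough sketch of that layout, assuming only what this page states (metadata lives under a `_meta/` prefix beneath `storage.url`; the file names below that prefix are not documented here):

```python
def meta_prefix(storage_url: str) -> str:
    # Metadata parquet files are persisted under a _meta/ prefix
    # beneath storage.url; this helper stops at the prefix itself
    # because the individual file names are not documented.
    return storage_url.rstrip("/") + "/_meta/"

print(meta_prefix("s3://my-bucket/dinobase/"))  # s3://my-bucket/dinobase/_meta/
```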

Supported providers: Amazon S3, Google Cloud Storage, Azure Blob Storage, and S3-compatible services (MinIO, Cloudflare R2).

In local mode, dinobase.duckdb contains two kinds of objects:

Source schemas: each source gets its own schema (e.g., stripe, hubspot). Tables within contain synced data or views over files.

Metadata tables: internal tables in the _dinobase schema that track sync state and column annotations:

_dinobase.sync_log

| Column | Type | Description |
| --- | --- | --- |
| `id` | INTEGER | Auto-increment primary key |
| `source_name` | VARCHAR | Source name |
| `source_type` | VARCHAR | Source type |
| `started_at` | TIMESTAMP | Sync start time |
| `finished_at` | TIMESTAMP | Sync end time |
| `status` | VARCHAR | `running`, `success`, or `error` |
| `tables_synced` | INTEGER | Tables loaded |
| `rows_synced` | BIGINT | Rows loaded |
| `error_message` | VARCHAR | Error details (if failed) |
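To illustrate how the schema above might be consumed, here is a minimal sketch over rows shaped like `_dinobase.sync_log`; the sample rows and helper are invented for illustration:

```python
from datetime import datetime

# Rows shaped like _dinobase.sync_log (subset of columns); the values
# are made up for illustration.
sync_log = [
    {"source_name": "stripe", "status": "success",
     "started_at": datetime(2024, 1, 1, 12, 0),
     "finished_at": datetime(2024, 1, 1, 12, 2),
     "rows_synced": 1500},
    {"source_name": "stripe", "status": "error",
     "started_at": datetime(2024, 1, 1, 13, 0),
     "finished_at": datetime(2024, 1, 1, 13, 0, 5),
     "rows_synced": 0},
]

def last_successful_sync(rows, source):
    # Most recent row for the source with status == "success", if any.
    ok = [r for r in rows
          if r["source_name"] == source and r["status"] == "success"]
    return max(ok, key=lambda r: r["started_at"]) if ok else None

row = last_successful_sync(sync_log, "stripe")
print(row["finished_at"], row["rows_synced"])
```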

_dinobase.tables

| Column | Type | Description |
| --- | --- | --- |
| `source_name` | VARCHAR | Source name |
| `schema_name` | VARCHAR | Schema name |
| `table_name` | VARCHAR | Table name |
| `row_count` | BIGINT | Row count at last sync |
| `last_sync` | TIMESTAMP | Last sync time |

_dinobase.columns

| Column | Type | Description |
| --- | --- | --- |
| `source_name` | VARCHAR | Source name |
| `schema_name` | VARCHAR | Schema name |
| `table_name` | VARCHAR | Table name |
| `column_name` | VARCHAR | Column name |
| `column_type` | VARCHAR | DuckDB data type |
| `is_nullable` | BOOLEAN | Whether nullable |
| `description` | VARCHAR | Human-readable description |
| `note` | VARCHAR | Additional notes (format, enums, etc.) |
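A sketch of turning `_dinobase.columns` rows into a human-readable data dictionary; the sample rows are invented for illustration:

```python
# Rows shaped like _dinobase.columns (subset of columns); the sample
# values are made up for illustration.
columns = [
    {"table_name": "charges", "column_name": "amount",
     "column_type": "BIGINT", "description": "Charge amount in cents"},
    {"table_name": "charges", "column_name": "currency",
     "column_type": "VARCHAR", "description": "ISO 4217 currency code"},
]

def data_dictionary(rows):
    # Group annotated columns by table into readable one-line entries.
    dictionary = {}
    for r in rows:
        dictionary.setdefault(r["table_name"], []).append(
            f"{r['column_name']} ({r['column_type']}): {r['description']}")
    return dictionary

for line in data_dictionary(columns)["charges"]:
    print(line)
```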

You can query these tables directly:

```sh
dinobase query "SELECT * FROM _dinobase.sync_log ORDER BY started_at DESC LIMIT 5" --pretty
dinobase query "SELECT * FROM _dinobase.columns WHERE description IS NOT NULL" --pretty
```