Querying Data
Dinobase uses DuckDB as its query engine. All queries use DuckDB SQL syntax, which is PostgreSQL-compatible with extra analytical functions.
Basics
Section titled “Basics”Tables are referenced as schema.table:
SELECT * FROM stripe.customers LIMIT 10From the CLI
Section titled “From the CLI”# JSON output (default, agent-friendly)dinobase query "SELECT * FROM stripe.customers LIMIT 5"
# Human-readable tabledinobase query "SELECT * FROM stripe.customers LIMIT 5" --pretty
# Limit result sizedinobase query "SELECT * FROM hubspot.contacts" --max-rows 1000From MCP
Section titled “From MCP”Agents use the query tool:
{ "tool": "query", "arguments": { "sql": "SELECT * FROM stripe.customers LIMIT 5", "max_rows": 200 }}Cross-source joins
Section titled “Cross-source joins”The core power of Dinobase. Join tables from different sources on shared columns:
Two-source join
Section titled “Two-source join”SELECT s.email, s.name, h.companyFROM stripe.customers sJOIN hubspot.contacts h ON s.email = h.emailThree-source join
Section titled “Three-source join”SELECT s.email, h.company, d.amount, d.dealstageFROM stripe.customers sJOIN hubspot.contacts h ON s.email = h.emailJOIN hubspot.deals d ON h.id = d.contact_idWHERE d.dealstage = 'closedwon'ORDER BY d.amount DESCFinding join keys
Section titled “Finding join keys”Use dinobase describe to find columns that work as join keys. Dinobase annotates likely join columns:
dinobase describe stripe.customers --pretty# Look for: email (marked as "Potential join key across sources")Common join patterns:
| Column | Sources | Notes |
|---|---|---|
email | Most sources | Best cross-source join key |
*_id | Within a source | Foreign keys (e.g., contact_id) |
domain | CRM sources | Company matching |
Aggregations
Section titled “Aggregations”-- Count customers per companySELECT h.company, COUNT(DISTINCT s.email) as customersFROM stripe.customers sJOIN hubspot.contacts h ON s.email = h.emailGROUP BY h.companyORDER BY customers DESC-- Revenue by deal stageSELECT d.dealstage, SUM(d.amount) as total, COUNT(*) as dealsFROM hubspot.deals dGROUP BY d.dealstageORDER BY total DESCFinding unmatched records
Section titled “Finding unmatched records”Use LEFT JOIN to find data in one source but not another:
-- Stripe customers without a HubSpot contactSELECT s.email, s.nameFROM stripe.customers sLEFT JOIN hubspot.contacts h ON s.email = h.emailWHERE h.email IS NULLWorking with timestamps
Section titled “Working with timestamps”Stripe stores timestamps as Unix integers. Use DuckDB’s to_timestamp():
SELECT email, to_timestamp(created) as created_dateFROM stripe.customersWHERE to_timestamp(created) > '2024-01-01'ORDER BY created DESCHubSpot uses ISO 8601 strings, which DuckDB handles natively:
SELECT email, createdateFROM hubspot.contactsWHERE createdate > '2024-01-01'DuckDB-specific features
Section titled “DuckDB-specific features”DuckDB has powerful analytical functions beyond standard SQL:
Window functions
Section titled “Window functions”SELECT email, amount, SUM(amount) OVER (ORDER BY created) as running_totalFROM stripe.chargesList aggregation
Section titled “List aggregation”SELECT company, LIST(email) as all_emails, COUNT(*) as contact_countFROM hubspot.contactsGROUP BY companyRegex matching
Section titled “Regex matching”SELECT email FROM stripe.customersWHERE regexp_matches(email, '.*@gmail\.com$')Reading parquet directly
Section titled “Reading parquet directly”DuckDB can also query parquet files inline:
SELECT * FROM read_parquet('path/to/file.parquet') LIMIT 10Result truncation
Section titled “Result truncation”By default, queries return up to 200 rows. If there are more, the result includes a truncated flag and total_rows count:
{ "rows": [...], "row_count": 200, "total_rows": 1547, "truncated": true, "message": "Showing 200 of 1547 rows. Add LIMIT or a WHERE clause to narrow results."}Use --max-rows (CLI) or the max_rows parameter (MCP) to adjust, up to 10,000.
Mutations
Section titled “Mutations”UPDATE and INSERT statements sent through query are intercepted and routed to the mutation engine. Instead of executing immediately, they return a preview with a mutation_id. See the Mutations guide for details.
dinobase query "UPDATE stripe.customers SET name = 'Acme' WHERE id = 'cus_123'"# Returns preview, not executiondinobase confirm mut_abc123def456# Now it executesError handling
Section titled “Error handling”SQL errors are returned in the result rather than thrown:
{ "error": "Catalog Error: Table with name 'nonexistent' does not exist!"}Check for the error key before processing results.