Cloud Storage Sources

Cloud storage sources use dlt’s filesystem connector to copy files into local storage and keep that copy up to date with incremental syncs. File sources, by contrast, read the remote files directly at query time and keep no local copy.

dinobase add s3 --bucket-url s3://my-bucket/data/ --access-key AKIA... --secret-key ...

Option        Env var                Description
--bucket-url  S3_BUCKET_URL          S3 URL (s3://bucket/prefix/)
--access-key  AWS_ACCESS_KEY_ID      AWS access key ID
--secret-key  AWS_SECRET_ACCESS_KEY  AWS secret key

dinobase add gcs --bucket-url gs://my-bucket/data/ --credentials-file ./sa.json

Option              Env var                         Description
--bucket-url        GCS_BUCKET_URL                  GCS URL (gs://bucket/prefix/)
--credentials-file  GOOGLE_APPLICATION_CREDENTIALS  Service account JSON path

dinobase add azure --container-url az://mycontainer/ --account-name myaccount --account-key ...

Option           Env var                     Description
--container-url  AZURE_STORAGE_URL           Azure URL (az://container/)
--account-name   AZURE_STORAGE_ACCOUNT_NAME  Storage account name
--account-key    AZURE_STORAGE_ACCOUNT_KEY   Storage account key

dinobase add sftp --url sftp://host/path/ --username user --password ...

Option      Env var        Description
--url       SFTP_URL       SFTP URL (sftp://host/path/)
--username  SFTP_USERNAME  Username
--password  SFTP_PASSWORD  Password
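
Each option above is paired with an environment variable. The sketch below uses S3 as the example and assumes that dinobase falls back to the variable when the matching flag is omitted; that behavior is inferred from the "Env var" column rather than stated on this page, so verify it before relying on it. The credential values are the same placeholders used in the command above.

```shell
# Assumption: dinobase reads these variables when the matching flags are
# omitted (inferred from the "Env var" column, not confirmed by the docs).
export S3_BUCKET_URL="s3://my-bucket/data/"
export AWS_ACCESS_KEY_ID="AKIA..."     # placeholder, as in the command above
export AWS_SECRET_ACCESS_KEY="..."     # placeholder

# Only call the CLI when it is installed, so the snippet is safe to paste.
if command -v dinobase >/dev/null 2>&1; then
  dinobase add s3
fi
```

Setting credentials through the environment keeps secrets out of the command line and shell history, which is the usual reason to prefer it over flags.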

Cloud storage sources require syncing:

dinobase sync # sync all
dinobase sync s3 # sync just S3

dlt handles incremental loading: after the first sync, subsequent syncs download only new or changed files.
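
Because later syncs fetch only new or changed files, one way to keep the local copy current is to run the sync command on a schedule. The sketch below is a hypothetical wrapper and not part of dinobase: the `run_sync` name and the log format are illustrative assumptions, and it presumes that `dinobase sync` exits with a non-zero status on failure, which this page does not confirm.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a dinobase feature): run the given command and
# print a timestamped result line, so scheduled runs leave a trace.
run_sync() {
  local started status
  started="$(date -u +%Y-%m-%dT%H:%M:%SZ)"
  if "$@"; then
    echo "$started ok: $*"
  else
    status=$?   # exit status of the command that just failed
    echo "$started failed (exit $status): $*" >&2
    return "$status"
  fi
}
```

A scheduled job (cron, a systemd timer, or similar) could then call `run_sync dinobase sync s3` and append the output to a log file.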

Use cloud storage sources when:

  • Files are added incrementally (new files appear over time)
  • You want data cached locally for faster queries
  • You need sync scheduling

Use file sources (dinobase add parquet --path s3://...) when:

  • You want zero-copy reads (data stays in the cloud)
  • Files don’t change often
  • You want instant setup with no sync step