
Supported File Formats

All file formats that DuckViz can ingest and analyze.

DuckViz ingests structured and semi-structured files. Every parser runs in the browser (DuckDB-WASM, Papaparse, SheetJS, fast-xml-parser, or the Rust→WASM log parser) — file bytes never leave the tab.

Tabular data

| Format | Extensions | Engine | Notes |
| --- | --- | --- | --- |
| CSV | .csv | DuckDB + Papaparse | Auto-detects delimiter (comma, tab, semicolon, pipe). Handles quoted fields, BOM, newlines in values. |
| TSV | .tsv | DuckDB + Papaparse | Tab-separated. |
| Excel | .xlsx, .xls | SheetJS | Reads the first sheet by default. Column types inferred from cell formats. |
| JSON | .json | Built-in | Array of objects, or single root object. Nested objects flattened to dot-paths. |
| JSONL / NDJSON | .jsonl, .ndjson | Built-in | One JSON object per line — efficient for large files. |
| XML | .xml | fast-xml-parser | Windows Event exports + generic XML. Single-quote attributes supported. |
| Parquet | .parquet | DuckDB (native) | Zero-copy ingest. Cheapest format per MB. Supported end-to-end since CLI 0.2.0. |

Log files

Detection is local-first: the 47-format catalog in @duckviz/parser@0.2.0 is matched before the LLM is ever called. Once a format is chosen, the Rust→WASM parser does the actual parsing.

| Family | Extensions | Examples |
| --- | --- | --- |
| Syslog | .log, .syslog | RFC 3164, RFC 5424, systemd journal |
| Apache / Nginx | .log, .access, .clf | Common Log Format, Combined, Nginx default |
| Windows Event | .evtx (XML export), .xml | Sysmon, Security, PowerShell, DNS Server |
| JSON logs | .json, .jsonl, .ndjson | Cloudflare, AWS CloudTrail, structured app logs |
| Delimited custom | .log, .tsv, .txt | Tab / pipe / custom delimiter |
| Regex custom | .log, .txt | Anything you save at /settings/log-formats |

When the catalog misses, a 10-line sample goes to /api/detect-log-format (the only log-content case where data leaves the browser). The LLM's answer is auto-saved as a custom format on your account, so next time a similar file drops, detection is local-only. See Custom Log Formats and Log Analysis.

Format detection flow

Tabular

Extension → parser:

  • .csv / .tsv → Papaparse (streaming)
  • .xlsx / .xls → SheetJS (XLSX → CSV in a worker)
  • .json / .jsonl / .ndjson → built-in JSON parser
  • .xml → fast-xml-parser
  • .parquet → DuckDB native
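The mapping above amounts to a dispatch table keyed on the lowercased extension. A minimal sketch (the table and `pickParser` are illustrative, not the real DuckViz code):

```typescript
type Parser = "papaparse" | "sheetjs" | "json" | "fast-xml-parser" | "duckdb";

// Illustrative extension → parser table mirroring the list above.
const PARSER_BY_EXT: Record<string, Parser> = {
  csv: "papaparse", tsv: "papaparse",
  xlsx: "sheetjs", xls: "sheetjs",
  json: "json", jsonl: "json", ndjson: "json",
  xml: "fast-xml-parser",
  parquet: "duckdb",
};

function pickParser(filename: string): Parser | undefined {
  const ext = filename.toLowerCase().split(".").pop() ?? "";
  return PARSER_BY_EXT[ext]; // undefined → not a tabular format
}
```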

Logs

  1. Catalog lookup (local, explorer 0.15.0+) — regex + column sniffing against the 47-format catalog
  2. LLM fallback — 10-line sample → /api/detect-log-format, which returns { logType, formatName, regex, columns }
  3. Auto-save — LLM hits are stored on your account for future local-only matching
  4. WASM parse — Rust-compiled parser walks the file in a Web Worker
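The four steps can be sketched as one function. Everything here is illustrative — the names are not the `@duckviz/parser` API, and the LLM call is shown synchronously for brevity (the real fallback is a network request):

```typescript
interface LogFormat { name: string; regex: RegExp; columns: string[]; }

// Hypothetical sketch of the detection flow described above.
function detectFormat(
  sample: string[],
  catalog: LogFormat[],
  llmFallback: (lines: string[]) => LogFormat,
  saveCustom: (f: LogFormat) => void,
): LogFormat {
  // 1. Catalog lookup — purely local regex matching against known formats.
  for (const fmt of catalog) {
    if (sample.every((line) => fmt.regex.test(line))) return fmt;
  }
  // 2. LLM fallback — the only step where log content leaves the browser.
  const fmt = llmFallback(sample.slice(0, 10));
  // 3. Auto-save, so the next similar file matches locally.
  saveCustom(fmt);
  // 4. The chosen format then drives the Rust→WASM parser in a Web Worker.
  return fmt;
}
```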

The _raw column on log tables

Every log table has a _raw (or raw) VARCHAR column holding the original log line for each row, alongside the parsed columns. It's preserved so you can drop into SQL and inspect the source line whenever a parsed column looks wrong:

SELECT _raw FROM t_app_log WHERE level = 'ERROR' LIMIT 5;

The _raw column is hidden from the AI widget pipeline so it never ends up driving a chart axis. The match is intentionally narrow — ^_?raw$ (case-insensitive) — so columns like raw_count or raw_bytes from your data still surface to the AI normally.
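The filtering rule described above can be sketched as a one-line predicate (the `visibleToAI` helper name is illustrative; only the `^_?raw$` pattern comes from the docs):

```typescript
// The narrow, case-insensitive match from the docs: only "_raw" or "raw".
const RAW_COLUMN = /^_?raw$/i;

function visibleToAI(columns: string[]): string[] {
  // raw_count, raw_bytes, etc. do NOT match and stay visible to the AI.
  return columns.filter((c) => !RAW_COLUMN.test(c));
}
```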

Size limits

The browser's JavaScript heap is the limiting factor. See the ingest size matrix for realistic per-format caps. Quick summary:

| System RAM | Practical file size |
| --- | --- |
| 4 GB | ~50 MB |
| 8 GB | ~100–200 MB |
| 16 GB | ~500 MB |
| 32 GB+ | ~1 GB+ |

DuckViz monitors memory and warns before the tab freezes. The largeFileWarnMB prop triggers a pre-ingest confirmation for files above a host-chosen threshold.
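A minimal sketch of the pre-ingest check driven by largeFileWarnMB — the helper name and default are assumptions, not the component's actual implementation:

```typescript
// Illustrative: files over the host-chosen threshold trigger a
// confirmation dialog before ingest begins.
function shouldWarn(file: { name: string; size: number }, largeFileWarnMB = 100): boolean {
  return file.size > largeFileWarnMB * 1024 * 1024;
}
```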

Extension allowlist

The CLI and drag-drop UI filter files by extension:

.csv  .tsv
.json  .jsonl  .ndjson
.xlsx  .xls
.xml
.log  .syslog  .clf  .evtx
.txt                   (parser catalog often matches)
.parquet               (zero-copy)

Anything else is silently skipped during folder walks.
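The folder-walk filter amounts to a set-membership check on the lowercased extension. A sketch under the assumption that the allowlist is exactly the list above (`keep` is an illustrative name):

```typescript
// Illustrative extension allowlist matching the list above.
const ALLOWED = new Set([
  "csv", "tsv", "json", "jsonl", "ndjson",
  "xlsx", "xls", "xml",
  "log", "syslog", "clf", "evtx",
  "txt", "parquet",
]);

function keep(filename: string): boolean {
  const ext = filename.toLowerCase().split(".").pop() ?? "";
  return ALLOWED.has(ext);
}
```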

Unsupported

  • Binary analytics formats (Avro, ORC) — Parquet is the only binary columnar format supported
  • Images, audio, video
  • PDF documents
  • Compressed archives (.zip, .gz, .tar) — extract first
  • SQL dumps (.sql)