Supported File Formats
All file formats that DuckViz can ingest and analyze.
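DuckViz ingests structured and semi-structured files. Every parser runs in the browser (DuckDB-WASM, Papaparse, SheetJS, fast-xml-parser, or the Rust→WASM log parser), so file contents stay in the tab. The one exception is the 10-line sample sent for LLM detection when the local log-format catalog misses (see Log files).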
Tabular data
| Format | Extensions | Engine | Notes |
|---|---|---|---|
| CSV | .csv | DuckDB + Papaparse | Auto-detects delimiter (comma, tab, semicolon, pipe). Handles quoted fields, BOM, newlines in values. |
| TSV | .tsv | DuckDB + Papaparse | Tab-separated. |
| Excel | .xlsx, .xls | SheetJS | Reads the first sheet by default. Column types inferred from cell formats. |
| JSON | .json | Built-in | Array of objects, or single root object. Nested objects flattened to dot-paths. |
| JSONL / NDJSON | .jsonl, .ndjson | Built-in | One JSON object per line — efficient for large files. |
| XML | .xml | fast-xml-parser | Windows Event exports + generic XML. Single-quote attributes supported. |
| Parquet | .parquet | DuckDB (native) | Zero-copy ingest. Cheapest format per MB. Supported end-to-end since CLI 0.2.0. |
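The JSON row above says nested objects are flattened to dot-paths. A minimal sketch of that idea follows. The function name `flattenToDotPaths` is hypothetical, and treating arrays as leaf values is an assumption; the built-in parser may handle arrays differently.

```typescript
// Hypothetical helper illustrating dot-path flattening; not DuckViz's actual code.
type JsonValue = string | number | boolean | null | JsonValue[] | { [key: string]: JsonValue };
type FlatRow = Record<string, JsonValue>;

function flattenToDotPaths(input: { [key: string]: JsonValue }, prefix = ""): FlatRow {
  const out: FlatRow = {};
  for (const [key, value] of Object.entries(input)) {
    const path = prefix ? `${prefix}.${key}` : key;
    if (value !== null && typeof value === "object" && !Array.isArray(value)) {
      // Nested object: recurse and merge its flattened keys.
      Object.assign(out, flattenToDotPaths(value, path));
    } else {
      // Assumption: scalars, nulls, and arrays are kept as leaf values.
      out[path] = value;
    }
  }
  return out;
}

const row = flattenToDotPaths({
  user: { id: 7, geo: { country: "DE" } },
  tags: ["a", "b"],
  ok: true,
});
// row has the keys "user.id", "user.geo.country", "tags", and "ok"
```

Each dot-path key then becomes a column name, so `user.geo.country` can be queried like any other column.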
Log files
Detection is local-first: the 47-format catalog in @duckviz/parser@0.2.0 is matched before the LLM is ever called. Once a format is chosen, the Rust→WASM parser does the actual parsing.
| Family | Extensions | Examples |
|---|---|---|
| Syslog | .log, .syslog | RFC 3164, RFC 5424, systemd journal |
| Apache / Nginx | .log, .access, .clf | Common Log Format, Combined, Nginx default |
| Windows Event | .evtx (XML export), .xml | Sysmon, Security, PowerShell, DNS Server |
| JSON logs | .json, .jsonl, .ndjson | Cloudflare, AWS CloudTrail, structured app logs |
| Delimited custom | .log, .tsv, .txt | Tab / pipe / custom delimiter |
| Regex custom | .log, .txt | Anything you save at /settings/log-formats |
When the catalog misses, a 10-line sample goes to /api/detect-log-format (the only log-content case where data leaves the browser). The LLM's answer is auto-saved as a custom format on your account, so next time a similar file drops, detection is local-only. See Custom Log Formats and Log Analysis.
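The local-first behavior described above can be sketched as follows. Only the 10-line sample size and the endpoint come from the text; `detectLogFormat`, `LogFormat`, `LlmFallback`, and the rule that a format matches when its regex matches every sampled line are all assumptions made for illustration. The fallback is injected so the sketch stays independent of the real request shape.

```typescript
// Hypothetical types and function names; a sketch, not the @duckviz/parser API.
interface LogFormat {
  formatName: string;
  regex: string;
}

// In the browser this would call /api/detect-log-format; the request body is not specified here.
type LlmFallback = (sampleLines: string[]) => Promise<LogFormat>;

async function detectLogFormat(
  lines: string[],
  catalog: LogFormat[],
  savedFormats: LogFormat[],
  llmFallback: LlmFallback,
): Promise<{ format: LogFormat; source: "local" | "llm" }> {
  const sample = lines.slice(0, 10);

  // Local pass first: saved custom formats, then the built-in catalog.
  for (const candidate of [...savedFormats, ...catalog]) {
    const pattern = new RegExp(candidate.regex);
    // Assumed matching rule: every sampled line must match.
    if (sample.length > 0 && sample.every((line) => pattern.test(line))) {
      return { format: candidate, source: "local" };
    }
  }

  // Only on a local miss does the sample leave the browser.
  const format = await llmFallback(sample);
  savedFormats.push(format); // stand-in for auto-saving on the account
  return { format, source: "llm" };
}
```

With this shape, a second file in the same format resolves from `savedFormats` and the fallback is not called again.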
Format detection flow
Tabular
Extension → parser:
- .csv / .tsv → Papaparse (streaming)
- .xlsx / .xls → SheetJS (XLSX → CSV in a worker)
- .json / .jsonl / .ndjson → built-in JSON parser
- .xml → fast-xml-parser
- .parquet → DuckDB native
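The extension-to-parser mapping above amounts to a lookup table. A minimal sketch, assuming case-insensitive matching on the last extension; the names `PARSER_BY_EXTENSION` and `pickTabularParser` and the parser labels are hypothetical:

```typescript
// Hypothetical labels for the parsers named in the mapping above.
type TabularParser =
  | "papaparse"
  | "sheetjs"
  | "builtin-json"
  | "fast-xml-parser"
  | "duckdb-native";

const PARSER_BY_EXTENSION: Record<string, TabularParser> = {
  ".csv": "papaparse",
  ".tsv": "papaparse",
  ".xlsx": "sheetjs",
  ".xls": "sheetjs",
  ".json": "builtin-json",
  ".jsonl": "builtin-json",
  ".ndjson": "builtin-json",
  ".xml": "fast-xml-parser",
  ".parquet": "duckdb-native",
};

function pickTabularParser(fileName: string): TabularParser | undefined {
  const dot = fileName.lastIndexOf(".");
  if (dot < 0) return undefined; // no extension at all
  // Assumption: extensions are compared case-insensitively.
  return PARSER_BY_EXTENSION[fileName.slice(dot).toLowerCase()];
}
```

A file whose extension is not in the table returns `undefined`, which a caller could treat as a candidate for the log detection path instead.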
Logs
- Catalog lookup (local, explorer 0.15.0+): regex + column sniffing against the 47-format catalog
- LLM fallback: 10-line sample → /api/detect-log-format → { logType, formatName, regex, columns }
- Auto-save: LLM hits are stored on your account for future local-only matching
- WASM parse: Rust-compiled parser walks the file in a Web Worker
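To make the detection result concrete, here is a plain TypeScript illustration of how a `{ logType, formatName, regex, columns }` object could turn lines into rows. It is not the Rust→WASM parser. It assumes `columns` is an array of names mapped positionally to the regex capture groups, which the text does not confirm, and `parseLines` is a hypothetical name.

```typescript
// Field names come from the detection response; their types are assumptions.
interface DetectedFormat {
  logType: string;
  formatName: string;
  regex: string;
  columns: string[];
}

type LogRow = Record<string, string | null>;

function parseLines(lines: string[], format: DetectedFormat): LogRow[] {
  const pattern = new RegExp(format.regex);
  return lines.map((line) => {
    // Keep the original line next to the parsed columns.
    const row: LogRow = { _raw: line };
    const match = pattern.exec(line);
    format.columns.forEach((column, index) => {
      // Assumption: the i-th column maps to the i-th capture group.
      row[column] = match?.[index + 1] ?? null;
    });
    return row;
  });
}

const format: DetectedFormat = {
  logType: "app",
  formatName: "level-message",
  regex: "^(\\d{4}-\\d{2}-\\d{2}) (\\w+) (.*)$",
  columns: ["date", "level", "message"],
};
const rows = parseLines(["2024-05-01 ERROR disk full", "garbage"], format);
```

Lines that do not match still yield a row: the parsed columns are null and the source text stays in `_raw`.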
The _raw column on log tables
Every log table has a _raw (or raw) VARCHAR column holding the original log line for each row, alongside the parsed columns. It's preserved so you can drop into SQL and inspect the source line whenever a parsed column looks wrong:
SELECT _raw FROM t_app_log WHERE level = 'ERROR' LIMIT 5;

The _raw column is hidden from the AI widget pipeline so it never ends up driving a chart axis. The match is intentionally narrow, ^_?raw$ (case-insensitive), so columns like raw_count or raw_bytes from your data still surface to the AI normally.
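The narrow match can be checked with a few column names. The pattern is the one quoted above; the function name `columnsForAi` is hypothetical and only illustrates the filtering rule.

```typescript
// Pattern from the text: optional leading underscore, then "raw", case-insensitive.
const RAW_COLUMN = /^_?raw$/i;

// Hypothetical helper: drop raw-line columns before handing the schema to the AI.
function columnsForAi(columns: string[]): string[] {
  return columns.filter((name) => !RAW_COLUMN.test(name));
}

const visible = columnsForAi(["_raw", "RAW", "raw_count", "raw_bytes", "level"]);
// visible is ["raw_count", "raw_bytes", "level"]
```

Because the pattern is anchored at both ends, only whole-name matches are hidden.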
Size limits
Browser JS heap is the limiter. See the ingest size matrix for realistic per-format caps. Quick summary:
| System RAM | Practical file size |
|---|---|
| 4 GB | ~50 MB |
| 8 GB | ~100–200 MB |
| 16 GB | ~500 MB |
| 32 GB+ | ~1 GB+ |
DuckViz monitors memory and warns before the tab freezes. The largeFileWarnMB prop triggers a pre-ingest confirmation for files above a host-chosen threshold.
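A sketch of the pre-ingest check follows, assuming the threshold is compared in mebibytes (1024 × 1024 bytes) and that only files strictly above it trigger the confirmation. The function name is hypothetical; `largeFileWarnMB` is the prop named above.

```typescript
// Hypothetical helper; the unit conversion and strict comparison are assumptions.
function needsLargeFileConfirmation(fileSizeBytes: number, largeFileWarnMB: number): boolean {
  const sizeMB = fileSizeBytes / (1024 * 1024);
  return sizeMB > largeFileWarnMB;
}

// With a host-chosen threshold of 100, a 150 MB file asks for confirmation, a 100 MB file does not.
const askFor150 = needsLargeFileConfirmation(150 * 1024 * 1024, 100);
const askFor100 = needsLargeFileConfirmation(100 * 1024 * 1024, 100);
```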
Extension allowlist
The CLI and drag-drop UI filter files by extension:
.csv .tsv
.json .jsonl .ndjson
.xlsx .xls
.xml
.log .syslog .clf .evtx
.txt (parser catalog often matches)
.parquet (zero-copy)

Anything else is silently skipped during folder walks.
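The allowlist behaves like a set-membership filter over file paths. A minimal sketch, assuming forward-slash paths, case-insensitive extensions, and that dotfiles without a base name are skipped; `ALLOWED_EXTENSIONS` and `filterIngestable` are hypothetical names:

```typescript
// Extensions copied from the allowlist above.
const ALLOWED_EXTENSIONS = new Set([
  ".csv", ".tsv",
  ".json", ".jsonl", ".ndjson",
  ".xlsx", ".xls",
  ".xml",
  ".log", ".syslog", ".clf", ".evtx",
  ".txt",
  ".parquet",
]);

// Hypothetical helper: keep only paths whose extension is on the allowlist.
function filterIngestable(paths: string[]): string[] {
  return paths.filter((path) => {
    const name = path.slice(path.lastIndexOf("/") + 1);
    const dot = name.lastIndexOf(".");
    // dot > 0 skips names with no extension and bare dotfiles such as ".csv".
    return dot > 0 && ALLOWED_EXTENSIONS.has(name.slice(dot).toLowerCase());
  });
}

const kept = filterIngestable([
  "logs/app.log",
  "data/archive.zip",
  "README",
  "data/Events.PARQUET",
]);
// kept is ["logs/app.log", "data/Events.PARQUET"]
```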
Unsupported
- Binary analytics formats (Avro, ORC) — Parquet is the only binary columnar format supported
- Images, audio, video
- PDF documents
- Compressed archives (.zip, .gz, .tar): extract first
- SQL dumps (.sql)