
Supported File Formats

All file formats that DuckViz can ingest and analyze.

DuckViz ingests structured and semi-structured files. Every parser runs in the browser (DuckDB-WASM, Papaparse, SheetJS, fast-xml-parser, or the Rust→WASM log parser) — file bytes never leave the tab.

Tabular data

| Format | Extensions | Engine | Notes |
| --- | --- | --- | --- |
| CSV | .csv | DuckDB + Papaparse | Auto-detects delimiter (comma, tab, semicolon, pipe). Handles quoted fields, BOM, newlines in values. |
| TSV | .tsv | DuckDB + Papaparse | Tab-separated. |
| Excel | .xlsx, .xls | SheetJS | Reads the first sheet by default. Column types inferred from cell formats. |
| JSON | .json | Built-in | Array of objects, or single root object. Nested objects flattened to dot-paths. |
| JSONL / NDJSON | .jsonl, .ndjson | Built-in | One JSON object per line — efficient for large files. |
| XML | .xml | fast-xml-parser | Windows Event exports + generic XML. Single-quote attributes supported. |
| Parquet | .parquet | DuckDB (native) | Zero-copy ingest. Cheapest format per MB. Supported end-to-end since CLI 0.2.0. |

Log files

Detection is local-first: the 47-format catalog in @duckviz/parser@0.2.0 is matched before the LLM is ever called. Once a format is chosen, the Rust→WASM parser does the actual parsing.

| Family | Extensions | Examples |
| --- | --- | --- |
| Syslog | .log, .syslog | RFC 3164, RFC 5424, systemd journal |
| Apache / Nginx | .log, .access, .clf | Common Log Format, Combined, Nginx default |
| Windows Event | .evtx (XML export), .xml | Sysmon, Security, PowerShell, DNS Server |
| JSON logs | .json, .jsonl, .ndjson | Cloudflare, AWS CloudTrail, structured app logs |
| Delimited custom | .log, .tsv, .txt | Tab / pipe / custom delimiter |
| Regex custom | .log, .txt | Anything you save at /settings/log-formats |

When the catalog misses, a 10-line sample goes to /api/detect-log-format (the only log-content case where data leaves the browser). The LLM's answer is auto-saved as a custom format on your account, so next time a similar file drops, detection is local-only. See Custom Log Formats and Log Analysis.

Format detection flow

Tabular

Extension → parser:

  • .csv / .tsv → Papaparse (streaming)
  • .xlsx / .xls → SheetJS (XLSX → CSV in a worker)
  • .json / .jsonl / .ndjson → built-in JSON parser
  • .xml → fast-xml-parser
  • .parquet → DuckDB native
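The mapping above amounts to a dispatch table keyed on the lowercased extension. A minimal sketch (the table and `pickParser` are illustrative, not the real DuckViz code):

```typescript
type Parser = "papaparse" | "sheetjs" | "json" | "fast-xml-parser" | "duckdb";

// Illustrative extension → parser table mirroring the list above.
const PARSER_BY_EXT: Record<string, Parser> = {
  csv: "papaparse", tsv: "papaparse",
  xlsx: "sheetjs", xls: "sheetjs",
  json: "json", jsonl: "json", ndjson: "json",
  xml: "fast-xml-parser",
  parquet: "duckdb",
};

function pickParser(filename: string): Parser | undefined {
  const ext = filename.toLowerCase().split(".").pop() ?? "";
  return PARSER_BY_EXT[ext]; // undefined → not a tabular format
}
```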

Logs

  1. Catalog lookup (local, explorer 0.15.0+) — regex + column sniffing against the 47-format catalog
  2. LLM fallback — 10-line sample → /api/detect-log-format, which returns { logType, formatName, regex, columns }
  3. Auto-save — LLM hits are stored on your account for future local-only matching
  4. WASM parse — Rust-compiled parser walks the file in a Web Worker
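The four steps can be sketched as one function. Everything here is illustrative — the names are not the `@duckviz/parser` API, and the LLM call is shown synchronously for brevity (the real fallback is a network request):

```typescript
interface LogFormat { name: string; regex: RegExp; columns: string[]; }

// Hypothetical sketch of the detection flow described above.
function detectFormat(
  sample: string[],
  catalog: LogFormat[],
  llmFallback: (lines: string[]) => LogFormat,
  saveCustom: (f: LogFormat) => void,
): LogFormat {
  // 1. Catalog lookup — purely local regex matching against known formats.
  for (const fmt of catalog) {
    if (sample.every((line) => fmt.regex.test(line))) return fmt;
  }
  // 2. LLM fallback — the only step where log content leaves the browser.
  const fmt = llmFallback(sample.slice(0, 10));
  // 3. Auto-save, so the next similar file matches locally.
  saveCustom(fmt);
  // 4. The chosen format then drives the Rust→WASM parser in a Web Worker.
  return fmt;
}
```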

The _raw column on log tables

Every log table has a _raw (or raw) VARCHAR column holding the original log line for each row, alongside the parsed columns. It's preserved so you can drop into SQL and inspect the source line whenever a parsed column looks wrong:

SELECT _raw FROM t_app_log WHERE level = 'ERROR' LIMIT 5;

The _raw column is hidden from the AI widget pipeline so it never ends up driving a chart axis. The match is intentionally narrow — ^_?raw$ (case-insensitive) — so columns like raw_count or raw_bytes from your data still surface to the AI normally.
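The filtering rule described above can be sketched as a one-line predicate (the `visibleToAI` helper name is illustrative; only the `^_?raw$` pattern comes from the docs):

```typescript
// The narrow, case-insensitive match from the docs: only "_raw" or "raw".
const RAW_COLUMN = /^_?raw$/i;

function visibleToAI(columns: string[]): string[] {
  // raw_count, raw_bytes, etc. do NOT match and stay visible to the AI.
  return columns.filter((c) => !RAW_COLUMN.test(c));
}
```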

Size limits

The browser's JavaScript heap is the limiting factor. See the ingest size matrix for realistic per-format caps. Quick summary:

| System RAM | Practical file size |
| --- | --- |
| 4 GB | ~50 MB |
| 8 GB | ~100–200 MB |
| 16 GB | ~500 MB |
| 32 GB+ | ~1 GB+ |

DuckViz monitors memory and warns before the tab freezes. The largeFileWarnMB prop triggers a pre-ingest confirmation for files above a host-chosen threshold.
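A minimal sketch of the pre-ingest check driven by largeFileWarnMB — the helper name and default are assumptions, not the component's actual implementation:

```typescript
// Illustrative: files over the host-chosen threshold trigger a
// confirmation dialog before ingest begins.
function shouldWarn(file: { name: string; size: number }, largeFileWarnMB = 100): boolean {
  return file.size > largeFileWarnMB * 1024 * 1024;
}
```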

Extension allowlist

The CLI and drag-drop UI filter files by extension:

.csv  .tsv
.json  .jsonl  .ndjson
.xlsx  .xls
.xml
.log  .syslog  .clf  .evtx
.txt                   (parser catalog often matches)
.parquet               (zero-copy)

Anything else is silently skipped during folder walks.
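The folder-walk filter amounts to a set-membership check on the lowercased extension. A sketch under the assumption that the allowlist is exactly the list above (`keep` is an illustrative name):

```typescript
// Illustrative extension allowlist matching the list above.
const ALLOWED = new Set([
  "csv", "tsv", "json", "jsonl", "ndjson",
  "xlsx", "xls", "xml",
  "log", "syslog", "clf", "evtx",
  "txt", "parquet",
]);

function keep(filename: string): boolean {
  const ext = filename.toLowerCase().split(".").pop() ?? "";
  return ALLOWED.has(ext);
}
```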

Unsupported

  • Binary analytics formats (Avro, ORC) — Parquet is the only binary columnar format supported
  • Images, audio, video
  • PDF documents
  • Compressed archives (.zip, .gz, .tar) — extract first
  • SQL dumps (.sql)