How to reshape JSON into columnar format for Parquet and Arrow
- Step 1: Start with a flat JSON array. Flatten nested JSON first using the JSON Flattener so that all fields are at the top level. Nested objects left in place can produce an unexpected schema when the data is loaded into Arrow directly.
- Step 2: Transpose to columnar format. Paste the flat array and transpose it. The output holds one array per field, for example { "id": [1,2,3], "name": ["a","b","c"] }.
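- Step 3: Load into Apache Arrow (JavaScript). const table = arrow.tableFromArrays({ id: Int32Array.from(transposed.id), name: transposed.name }), where arrow is the imported apache-arrow module and transposed is the output of Step 2. Pass numeric columns as typed arrays (Int32Array, Float64Array) so that Arrow assigns the matching numeric type instead of inferring one from plain JavaScript numbers.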
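- Step 4: Load into DuckDB. DuckDB's JSON reader (read_json_auto) reads row-oriented JSON directly. A columnar object is read as a single row whose columns are lists, which you can expand back into rows with unnest. For Parquet conversion from row-oriented JSON: duckdb -c "COPY (SELECT * FROM read_json_auto('data.json')) TO 'data.parquet' (FORMAT PARQUET)".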
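Steps 1 to 3 can be sketched in plain JavaScript. This is a minimal illustration, not the tool's actual implementation: flattenRecord and transpose are hypothetical helper names, the dot-separated key style ("geo.city") is an assumption about how flattened keys look, and the sample rows are invented for the example.

```javascript
// Hypothetical helper: flatten one nested record into dot-separated keys.
// Only plain objects are descended into; arrays and primitives are kept as-is.
function flattenRecord(record, prefix = "", out = {}) {
  for (const [key, value] of Object.entries(record)) {
    const path = prefix ? `${prefix}.${key}` : key;
    if (value !== null && typeof value === "object" && !Array.isArray(value)) {
      flattenRecord(value, path, out);
    } else {
      out[path] = value;
    }
  }
  return out;
}

// Hypothetical helper: transpose flat records into { column: [values...] }.
// Fields missing from a row are filled with null so all columns share one length.
function transpose(rows) {
  const columns = {};
  rows.forEach((row, i) => {
    for (const key of Object.keys(row)) {
      if (!Object.hasOwn(columns, key)) {
        columns[key] = new Array(rows.length).fill(null);
      }
      columns[key][i] = row[key];
    }
  });
  return columns;
}

// Invented sample data: the second row has no "geo" object.
const rows = [
  { id: 1, name: "a", geo: { city: "Oslo" } },
  { id: 2, name: "b" },
  { id: 3, name: "c", geo: { city: "Lima" } },
];

const transposed = transpose(rows.map((r) => flattenRecord(r)));

// A numeric column with no nulls can be converted to a typed array
// before it is handed to Arrow.
const idColumn = Int32Array.from(transposed.id);

console.log(transposed);
console.log(idColumn);
```

Filling missing fields with null keeps every column the same length, which a table constructor generally expects. Only convert a column to a typed array when it has no nulls, because Int32Array.from turns null into 0.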
Frequently asked questions
Is there a performance advantage to transposing JSON before loading into Parquet?
The main advantage is schema clarity: with one array per column, each column's values can be inspected and typed before the Parquet writer assigns types. For large datasets (100k+ rows), transposing in memory before writing may also reduce Parquet writer overhead, although the gain depends on the writer. For most use cases, passing row-oriented JSON directly to DuckDB's read_json_auto is more practical.
How do I handle heterogeneous types in a column, for example a field that is sometimes a string and sometimes null?
A column that contains only null values is inferred as the Arrow Null type. If a field has mixed string and null values, Arrow infers a nullable Utf8 type. If a field genuinely mixes string and integer values, cast every non-null value to a string in the column array before loading it into Arrow.
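The string cast described above could look like the following sketch. normalizeColumn is a hypothetical helper name, not part of Arrow or of this tool, and the sample values are invented. It leaves homogeneous columns untouched and keeps nulls as nulls so the column stays nullable.

```javascript
// Hypothetical helper: cast a column to strings only when it mixes types.
function normalizeColumn(values) {
  const isMissing = (v) => v === null || v === undefined;
  // Collect the distinct JavaScript types of the non-null values.
  const types = new Set(
    values.filter((v) => !isMissing(v)).map((v) => typeof v)
  );
  // Zero or one type: the column is already homogeneous, return it unchanged.
  if (types.size <= 1) return values;
  // Mixed types: stringify every non-null value, keep nulls as null.
  return values.map((v) => (isMissing(v) ? null : String(v)));
}

const mixed = normalizeColumn(["A1", 42, null, "B7"]);
const clean = normalizeColumn(["x", null, "y"]);

console.log(mixed); // [ 'A1', '42', null, 'B7' ]
console.log(clean); // [ 'x', null, 'y' ]
```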
Is the analytics dataset transmitted to JAD Apps?
No. Transposition runs entirely in your browser. Analytical datasets and business records are never transmitted to JAD Apps servers.
Privacy first
Conversion runs locally in your browser. No file is uploaded — only metadata counters are saved for signed-in dashboard stats.