-
Storage structure
CSV files are row-oriented, while Parquet files are column-oriented. In a CSV file, each line is a record, and each record is made up of fields separated by commas. In a Parquet file, the values of each column are stored together, so a reader can fetch one column without scanning the others.
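To make the difference concrete, here is a minimal sketch in plain Python. It does not read or write real Parquet files; the sample records and the helper names `to_row_layout` and `to_column_layout` are made up for illustration, and the column layout is only a simplified picture of how a columnar format groups values.

```python
# Hypothetical sample data: three records with the same three fields.
records = [
    {"id": 1, "city": "Oslo", "temp_c": 4.5},
    {"id": 2, "city": "Lima", "temp_c": 19.0},
    {"id": 3, "city": "Oslo", "temp_c": 5.1},
]


def to_row_layout(rows):
    """Row-oriented: all fields of one record sit together, as in a CSV line."""
    return [",".join(str(value) for value in row.values()) for row in rows]


def to_column_layout(rows):
    """Column-oriented: all values of one field sit together."""
    return {name: [row[name] for row in rows] for name in rows[0]}


row_layout = to_row_layout(records)
column_layout = to_column_layout(records)

# Reading a single field from the column layout touches only that field's values.
temps = column_layout["temp_c"]
```

In the row layout, getting every temperature means walking through every record; in the column layout it is a single lookup.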
-
Compression
Because the values in a column share a type and often repeat, Parquet’s columnar structure usually allows for better compression than CSV, which results in smaller file sizes.
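One way to see why grouping a column's values helps is to apply two simple encodings to a single column. The sketch below is a toy illustration, not Parquet's actual on-disk encoding, which is more elaborate. The function names and the `country` sample column are assumptions made for the example.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def dictionary_encode(values):
    """Replace each value with a small integer index into a list of distinct values."""
    dictionary = []
    index_of = {}
    codes = []
    for value in values:
        if value not in index_of:
            index_of[value] = len(dictionary)
            dictionary.append(value)
        codes.append(index_of[value])
    return dictionary, codes


# Hypothetical column with many repeated values.
country = ["NO", "NO", "NO", "PE", "PE", "NO"]

runs = run_length_encode(country)               # three runs instead of six values
dictionary, codes = dictionary_encode(country)  # two distinct strings plus small integers
```

Neither trick works as well on a CSV line, where a country code sits between an unrelated id and an unrelated number.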
-
Performance
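Parquet is better suited for analytical workloads that read a few columns across many rows. Row-oriented formats such as CSV are closer to the record-at-a-time access pattern of OLTP workloads, although a CSV file offers none of the indexing or transactions an OLTP system relies on. Parquet is efficient for write-once, read-many analytics, and supports data skipping: a reader can use statistics stored in the file to skip blocks of data that cannot match a query.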
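Data skipping can be sketched with per-block minimum and maximum values. The example below is a simplified model under stated assumptions, not Parquet's reader: `RowGroup`, `make_row_groups`, and `scan_greater_than` are hypothetical names, and the blocks hold a single integer column.

```python
from dataclasses import dataclass


@dataclass
class RowGroup:
    """Hypothetical stand-in for a block of rows with min/max statistics."""
    values: list
    min_value: int
    max_value: int


def make_row_groups(values, size):
    """Split a column into fixed-size blocks and record each block's min and max."""
    groups = []
    for start in range(0, len(values), size):
        chunk = values[start:start + size]
        groups.append(RowGroup(chunk, min(chunk), max(chunk)))
    return groups


def scan_greater_than(groups, threshold):
    """Return matching values and the number of blocks that were actually read."""
    matches = []
    groups_read = 0
    for group in groups:
        if group.max_value <= threshold:
            continue  # no value in this block can match, so skip it entirely
        groups_read += 1
        matches.extend(v for v in group.values if v > threshold)
    return matches, groups_read


order_totals = [5, 8, 12, 40, 42, 47, 90, 95, 99]
groups = make_row_groups(order_totals, size=3)
matches, groups_read = scan_greater_than(groups, threshold=50)
```

Here only one of the three blocks is read. A plain CSV file stores no such statistics, so answering the same question means parsing every line.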
-
Ease of use
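CSV is simple and widely supported: spreadsheet tools such as Excel and Google Sheets can open and export it. Because CSV carries no schema or type information, consistent delimiters, quoting, and encoding are important for reading the data back reliably.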
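A short example with Python's standard `csv` module shows why formatting matters. The sample rows are made up; the point is that fields containing commas or quotes must be quoted correctly, and that every value comes back as a string.

```python
import csv
import io

# Hypothetical rows: one field contains a comma, another contains quotes.
rows = [
    ["id", "name", "note"],
    [1, "Ada", "likes tea, not coffee"],
    [2, "Bob", 'said "hello"'],
]

# Write with the csv module so commas and quotes inside fields are escaped.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
text = buffer.getvalue()

# A proper CSV parser recovers the original three fields per record.
parsed = list(csv.reader(io.StringIO(text)))

# Naively splitting on commas breaks the field that contains a comma.
naive = text.splitlines()[1].split(",")
```

After parsing, `parsed[1][0]` is the string `"1"`, not the integer `1`: CSV does not record types, so the reader has to restore them.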
-
Origin
Parquet was developed in 2013 by Twitter and Cloudera.