
Apache Parquet is an open-source, column-oriented data file format initially designed for the Apache Hadoop ecosystem. It’s widely used as the underlying file format in modern cloud-based data lake architectures. Cloud storage systems such as Amazon S3, Azure Data Lake Storage, and Google Cloud Storage commonly store data in Parquet format due to its efficient columnar representation and retrieval capabilities. Major companies using Parquet include Netflix, Uber, Airbnb, and LinkedIn.
The flaw involves deserialization of untrusted data and lets an attacker gain full control over a vulnerable system using a specially crafted Parquet file.
In other words, to exploit this bug, an attacker has to trick the victim into importing a malicious Parquet file. If successful, the attacker can steal and modify data, disrupt services, or deploy malicious payloads (e.g., ransomware).
The vulnerability, tracked as CVE-2025-30065, was discovered by an Amazon specialist and fixed in Apache Parquet 1.15.1.
“Schema parsing in the parquet-avro module of Apache Parquet 1.15.0 and previous versions allows bad actors to execute arbitrary code. Users are recommended to upgrade to version 1.15.1, which fixes the issue,” – Openwall.
According to Endor Labs, CVE-2025-30065 can impact data pipelines and analytics systems that import Parquet files, particularly when those files come from external or untrusted sources.
Endor Labs experts believe that the vulnerability was introduced in version 1.8.0, although older versions could be affected as well.
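The version information above can be turned into a quick check. The following is a minimal sketch, not an official tool: it assumes you already know the Parquet library version string in use (for example, from a dependency manifest), and the helper names `parse_version` and `is_affected` are hypothetical. Because older releases may also be affected, it conservatively treats every version below 1.15.1 as vulnerable.

```python
import re

# First release that contains the fix, according to the advisory.
FIXED_VERSION = (1, 15, 1)


def parse_version(version: str) -> tuple[int, int, int]:
    """Parse the leading 'major.minor[.patch]' part of a version string.

    Qualifiers such as '-SNAPSHOT' are ignored; a missing patch number
    is treated as 0. (Hypothetical helper for illustration.)
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def is_affected(version: str) -> bool:
    """Return True if the version is older than the fixed release.

    Versions before 1.8.0 are also reported as affected, since the
    advisory does not rule them out.
    """
    return parse_version(version) < FIXED_VERSION
```

For instance, `is_affected("1.15.0")` returns `True`, while `is_affected("1.15.1")` returns `False`.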
If, for some reason, an immediate upgrade to Apache Parquet 1.15.1 isn’t possible, it’s recommended to avoid untrusted Parquet files altogether or, where that isn’t feasible, to validate them thoroughly before processing.
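One way to approximate "avoid untrusted files" in a pipeline is to accept only files whose checksum is already known. The sketch below is an illustration under stated assumptions rather than a recommended fix: it assumes you maintain a set of SHA-256 digests for files from trusted producers, and the function names `sha256_of` and `is_trusted` are hypothetical. It does not inspect file contents and is not a substitute for upgrading.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read 1 MiB at a time so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_trusted(path: Path, trusted_digests: set[str]) -> bool:
    """Return True only if the file's digest is in the allowlist.

    `trusted_digests` is an assumed, externally maintained set of
    lowercase SHA-256 hex digests for files from known producers.
    """
    return sha256_of(path) in trusted_digests
```

A pipeline could call `is_trusted` before handing a file to any Parquet reader and reject everything that is not on the allowlist.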