Parquet

Version 1.14.1

Release Notes - Parquet - Version 1.14.1

Bug

  • PARQUET-2468 - ParquetMetadata.toPrettyJSON throws exception on file read when LOG.isDebugEnabled()
  • PARQUET-2498 - Hadoop vector IO API doesn't handle empty list of ranges

Version 1.14.0

Release Notes - Parquet - Version 1.14.0

Bug

  • PARQUET-2260 - Bloom filter bytes size shouldn't be larger than maxBytes size in the configuration
  • PARQUET-2266 - Fix support for files without ColumnIndexes
  • PARQUET-2276 - ParquetReader reads do not work with Hadoop version 2.8.5
  • PARQUET-2300 - Update jackson-core 2.13.4 to a version without CVE PRISMA-2023-0067
  • PARQUET-2325 - Fix parquet-cli's dictionary subcommand to work with FIXED_LEN_BYTE_ARRAY
  • PARQUET-2329 - Fix wrong help messages of parquet-cli subcommands
  • PARQUET-2330 - Fix convert-csv to show the correct position of the invalid record
  • PARQUET-2332 - Fix unexpectedly disabled tests to be executed
  • PARQUET-2336 - Add caching key to CodecFactory
  • PARQUET-2342 - Parquet writer produced a corrupted file due to page value count overflow
  • PARQUET-2343 - Fixes NPE when rewriting file with multiple rowgroups
  • PARQUET-2348 - Recompression/Re-encrypt should rewrite bloomfilter
  • PARQUET-2354 - Apparent race condition in CharsetValidator
  • PARQUET-2363 - ParquetRewriter should encrypt the V2 page header
  • PARQUET-2365 - Fixes NPE when rewriting column without column index
  • PARQUET-2408 - Fix license header in .gitattributes
  • PARQUET-2420 - ThriftParquetWriter converts thrift byte to int32 without adding logical type
  • PARQUET-2429 - Direct buffer churn in NonBlockedDecompressor
  • PARQUET-2438 - Fixes minMaxSize for BinaryColumnIndexBuilder
  • PARQUET-2442 - Remove Parquet Site from parquet-mr
  • PARQUET-2448 - parquet-avro does not support nested logical-type for avro <= 1.8
  • PARQUET-2449 - Writing using LocalOutputFile creates a large buffer
  • PARQUET-2450 - ParquetAvroReader throws exception projecting a single field of a repeated record type
  • PARQUET-2456 - avro schema conversion may fail with name conflict when using fixed types
  • PARQUET-2457 - Missing maven-scala-plugin version
  • PARQUET-2458 - Java compiler should use release instead of source/target
  • PARQUET-2465 - Fall back to Hadoop Configuration

New Feature

Improvement

Test

  • PARQUET-2361 - Reduce failure rate of unit test testParquetFileWithBloomFilterWithFpp

Task

Version 1.13.1

Release Notes - Parquet - Version 1.13.1

Improvement

Version 1.13.0

Release Notes - Parquet - Version 1.13.0

New Feature

  • PARQUET-1020 - Add support for Dynamic Messages in parquet-protobuf

Task

  • PARQUET-2230 - Add a new rewrite command powered by ParquetRewriter
  • PARQUET-2228 - ParquetRewriter supports more than one input file
  • PARQUET-2229 - ParquetRewriter supports masking and encrypting the same column
  • PARQUET-2227 - Refactor different file rewriters to use single implementation
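
The ParquetRewriter tasks above consolidate column pruning, masking, transcompression, and encryption into a single rewrite path. Below is a minimal sketch of driving the rewriter directly; the Builder signature and the prune/transform/processBlocks method names are assumptions based on the 1.13.0 API and may differ in detail, and the file and column names are illustrative.

```java
import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.hadoop.metadata.CompressionCodecName;
import org.apache.parquet.hadoop.rewrite.ParquetRewriter;
import org.apache.parquet.hadoop.rewrite.RewriteOptions;

public class RewriteExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // Rewrite one input file into one output file (assumed Builder signature).
    RewriteOptions options = new RewriteOptions.Builder(
            conf, new Path("input.parquet"), new Path("output.parquet"))
        .prune(Arrays.asList("debug_payload"))   // drop an illustrative column (assumed method name)
        .transform(CompressionCodecName.ZSTD)    // recompress the remaining columns (assumed method name)
        .build();

    ParquetRewriter rewriter = new ParquetRewriter(options);
    rewriter.processBlocks();  // copy/translate all row groups (assumed method name)
    rewriter.close();
  }
}
```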

Improvement

Bug

  • PARQUET-2202 - Redundant String allocation on the hot path in CapacityByteArrayOutputStream.setByte
  • PARQUET-2164 - CapacityByteArrayOutputStream overflow while writing causes negative row group sizes to be written
  • PARQUET-2103 - Fix crypto exception in print toPrettyJSON
  • PARQUET-2251 - Avoid generating Bloomfilter when all pages of a column are encoded by dictionary
  • PARQUET-2243 - Support zstd-jni in DirectCodecFactory
  • PARQUET-2247 - Fail-fast if CapacityByteArrayOutputStream write overflow
  • PARQUET-2241 - Fix ByteStreamSplitValuesReader with nulls
  • PARQUET-2244 - Fix notIn for columns with null values
  • PARQUET-2173 - Fix parquet build against hadoop 3.3.3+
  • PARQUET-2219 - ParquetFileReader skips empty row group
  • PARQUET-2198 - Updating jackson data bind version to fix CVEs
  • PARQUET-2177 - Fix parquet-cli not to fail showing descriptions
  • PARQUET-1711 - Support recursive proto schemas by limiting recursion depth
  • PARQUET-2142 - parquet-cli without hadoop throws java.lang.NoSuchMethodError on any parquet file access command
  • PARQUET-2160 - Close decompression stream to free off-heap memory in time
  • PARQUET-2185 - ParquetReader constructed using builder fails to read encrypted files
  • PARQUET-2167 - CLI show footer command fails if Parquet file contains date fields
  • PARQUET-2134 - Incorrect type checking in HadoopStreams.wrap
  • PARQUET-2161 - Fix row index generation in combination with range filtering
  • PARQUET-2154 - ParquetFileReader should close its input stream when filterRowGroups throw Exception in constructor

Test

Version 1.12.3

Release Notes - Parquet - Version 1.12.3

New Feature

  • PARQUET-2117 - Add rowPosition API in parquet record readers
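
For context on the rowPosition API (PARQUET-2117): it exposes the file-wide index of the record just read, which downstream engines use for row-level indexing. A minimal sketch, assuming the getCurrentRowIndex() accessor on ParquetReader that this change introduces; the file name is illustrative.

```java
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetReader;
import org.apache.parquet.hadoop.ParquetReader;
import org.apache.parquet.hadoop.util.HadoopInputFile;

public class RowPositionExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    try (ParquetReader<GenericRecord> reader = AvroParquetReader
        .<GenericRecord>builder(HadoopInputFile.fromPath(new Path("data.parquet"), conf))
        .build()) {
      GenericRecord record;
      while ((record = reader.read()) != null) {
        // getCurrentRowIndex() is assumed to return the file-wide row position
        // of the record just returned by read().
        long rowIndex = reader.getCurrentRowIndex();
        System.out.println(rowIndex + ": " + record);
      }
    }
  }
}
```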

Task

  • PARQUET-2081 - Encryption translation tool - Parquet-hadoop

Improvement

Bug

Version 1.12.2

Release Notes - Parquet - Version 1.12.2

Bug

Version 1.12.1

Release Notes - Parquet - Version 1.12.1

Bug

  • PARQUET-1633 - Fix integer overflow
  • PARQUET-2022 - ZstdDecompressorStream should close zstdInputStream
  • PARQUET-2027 - Fix calculating directory offset for merge
  • PARQUET-2052 - Integer overflow when writing huge binary using dictionary encoding
  • PARQUET-2054 - fix TCP leaking when calling ParquetFileWriter.appendFile
  • PARQUET-2072 - Do Not Determine Both Min/Max for Binary Stats
  • PARQUET-2073 - Fix estimate remaining row count in ColumnWriteStoreBase.
  • PARQUET-2078 - Failed to read parquet file after writing with the same parquet version

Improvement

Version 1.12.0

Release Notes - Parquet - Version 1.12.0

Sub-task

Bug

  • PARQUET-1438 - [C++] corrupted files produced on 32-bit architecture (i686)
  • PARQUET-1493 - maven protobuf plugin not work properly
  • PARQUET-1455 - [parquet-protobuf] Handle "unknown" enum values for parquet-protobuf
  • PARQUET-1554 - Compilation error when upgrading Scrooge version
  • PARQUET-1599 - Fix to-avro to respect the overwrite option
  • PARQUET-1684 - [parquet-protobuf] default protobuf field values are stored as nulls
  • PARQUET-1699 - Could not resolve org.apache.yetus:audience-annotations:0.11.0
  • PARQUET-1741 - APIs backward compatibility issues cause master branch build failure
  • PARQUET-1765 - Invalid filteredRowCount in InternalParquetRecordReader
  • PARQUET-1794 - Random data generation may cause flaky tests
  • PARQUET-1803 - Could not find FilleInputSplit in ParquetInputSplit
  • PARQUET-1808 - SimpleGroup.toString() uses String += and so has poor performance
  • PARQUET-1818 - Fix collision of encryption and bloom filters in format-structure Util
  • PARQUET-1850 - toParquetMetadata method in ParquetMetadataConverter does not set dictionary page offset bit
  • PARQUET-1851 - ParquetMetadataConveter throws NPE in an Iceberg unit test
  • PARQUET-1868 - Parquet reader options toggle for bloom filter toggles dictionary filtering
  • PARQUET-1879 - Apache Arrow cannot read a Parquet File written with Parquet-Avro 1.11.0 with a Map field
  • PARQUET-1893 - H2SeekableInputStream readFully() doesn't respect start and len
  • PARQUET-1894 - Please fix the related Shaded Jackson Databind CVEs
  • PARQUET-1896 - [Maven] parquet-tools build is broken
  • PARQUET-1910 - Parquet-cli is broken after TransCompressionCommand was added
  • PARQUET-1917 - [parquet-proto] default values are stored in oneOf fields that aren't set
  • PARQUET-1920 - Fix issue with reading parquet files with too large column chunks
  • PARQUET-1923 - parquet-tools 1.11.0: TestSimpleRecordConverter fails with ExceptionInInitializerError on openjdk 15
  • PARQUET-1928 - Interpret Parquet INT96 type as FIXED[12] AVRO Schema
  • PARQUET-1944 - Unable to download transitive dependency hadoop-lzo
  • PARQUET-1947 - DeprecatedParquetInputFormat in CombineFileInputFormat would produce wrong data
  • PARQUET-1949 - Mark PARQUET-1872 as not supporting bloom filters yet
  • PARQUET-1954 - TCP connection leak in parquet dump
  • PARQUET-1963 - DeprecatedParquetInputFormat in CombineFileInputFormat throw NPE when the first sub-split is empty
  • PARQUET-1966 - Fix build with JDK11 for JDK8
  • PARQUET-1970 - Make minor releases source compatible
  • PARQUET-1971 - Flaky test in github action
  • PARQUET-1975 - Test failure on ARM64 CPU architecture
  • PARQUET-1977 - Invalid data_page_offset
  • PARQUET-1979 - Optional bloom_filter_offset is filled if no bloom filter is present
  • PARQUET-1984 - Some tests fail on windows
  • PARQUET-1992 - Cannot build from tarball because of git submodules
  • PARQUET-1999 - NPE might occur if OutputFile is implemented by the client

New Feature

Improvement

Test

  • PARQUET-1832 - Travis fails with too long output
  • PARQUET-1980 - Build and test Apache Parquet on ARM64 CPU architecture

Wish

  • PARQUET-1717 - parquet-thrift converts Thrift i16 to parquet INT32 instead of INT_16

Task

Version 1.11.0

Release Notes - Parquet - Version 1.11.0

Bug

  • PARQUET-138 - Parquet should allow a merge between required and optional schemas
  • PARQUET-952 - Avro union with single type fails with 'is not a group'
  • PARQUET-1128 - [Java] Upgrade the Apache Arrow version to 0.8.0 for SchemaConverter
  • PARQUET-1281 - Jackson dependency
  • PARQUET-1285 - [Java] SchemaConverter should not convert from TimeUnit.SECOND AND TimeUnit.NANOSECOND of Arrow
  • PARQUET-1293 - Build failure when using Java 8 lambda expressions
  • PARQUET-1296 - Travis kills build after 10 minutes, because "no output was received"
  • PARQUET-1297 - [Java] SchemaConverter should not convert from Timestamp(TimeUnit.SECOND) and Timestamp(TimeUnit.NANOSECOND) of Arrow
  • PARQUET-1303 - Avro reflect @Stringable field write error if field not instanceof CharSequence
  • PARQUET-1304 - Release 1.10 contains breaking changes for Hive
  • PARQUET-1305 - Backward incompatible change introduced in 1.8
  • PARQUET-1309 - Parquet Java uses incorrect stats and dictionary filter properties
  • PARQUET-1311 - Update README.md
  • PARQUET-1317 - ParquetMetadataConverter throw NPE
  • PARQUET-1341 - Null count is suppressed when columns have no min or max and use unsigned sort order
  • PARQUET-1344 - Type builders don't honor new logical types
  • PARQUET-1368 - ParquetFileReader should close its input stream for the failure in constructor
  • PARQUET-1371 - Time/Timestamp UTC normalization parameter doesn't work
  • PARQUET-1407 - Data loss on duplicate values with AvroParquetWriter/Reader
  • PARQUET-1417 - BINARY_AS_SIGNED_INTEGER_COMPARATOR fails with IOBE for the same arrays with the different length
  • PARQUET-1421 - InternalParquetRecordWriter logs debug messages at the INFO level
  • PARQUET-1440 - Parquet-tools: Decimal values stored in an int32 or int64 in the parquet file aren't displayed with their proper scale
  • PARQUET-1441 - SchemaParseException: Can't redefine: list in AvroIndexedRecordConverter
  • PARQUET-1456 - Use page index, ParquetFileReader throw ArrayIndexOutOfBoundsException
  • PARQUET-1460 - Fix javadoc errors and include javadoc checking in Travis checks
  • PARQUET-1461 - Third party code does not compile after parquet-mr minor version update
  • PARQUET-1470 - Inputstream leakage in ParquetFileWriter.appendFile
  • PARQUET-1472 - Dictionary filter fails on FIXED_LEN_BYTE_ARRAY
  • PARQUET-1475 - DirectCodecFactory's ParquetCompressionCodecException drops a passed in cause in one constructor
  • PARQUET-1478 - Can't read spec compliant, 3-level lists via parquet-proto
  • PARQUET-1480 - INT96 to avro not yet implemented error should mention deprecation
  • PARQUET-1485 - Snappy Decompressor/Compressor may cause direct memory leak
  • PARQUET-1488 - UserDefinedPredicate throw NPE
  • PARQUET-1496 - [Java] Update Scala for JDK 11 compatibility
  • PARQUET-1497 - [Java] javax annotations dependency missing for Java 11
  • PARQUET-1498 - [Java] Add instructions to install thrift via homebrew
  • PARQUET-1510 - Dictionary filter skips null values when evaluating not-equals.
  • PARQUET-1514 - ParquetFileWriter Records Compressed Bytes instead of Uncompressed Bytes
  • PARQUET-1527 - [parquet-tools] cat command throw java.lang.ClassCastException
  • PARQUET-1529 - Shade fastutil in all modules where used
  • PARQUET-1531 - Page row count limit causes empty pages to be written from MessageColumnIO
  • PARQUET-1533 - TestSnappy() throws OOM exception with Parquet-1485 change
  • PARQUET-1534 - [parquet-cli] Argument error: Illegal character in opaque part at index 2 on Windows
  • PARQUET-1544 - Possible over-shading of modules
  • PARQUET-1550 - CleanUtil does not work in Java 11
  • PARQUET-1555 - Bump snappy-java to 1.1.7.3
  • PARQUET-1596 - PARQUET-1375 broke parquet-cli's to-avro command
  • PARQUET-1600 - Fix shebang in parquet-benchmarks/run.sh
  • PARQUET-1615 - getRecordWriter shouldn't hardcode CREAT mode when new ParquetFileWriter
  • PARQUET-1637 - Builds are failing because default jdk changed to openjdk11 on Travis
  • PARQUET-1644 - Clean up some benchmark code and docs.
  • PARQUET-1691 - Build fails due to missing hadoop-lzo

New Feature

Improvement

Test

  • PARQUET-1536 - [parquet-cli] Add simple tests for each command

Wish

Task

Version 1.10.1

Release Notes - Parquet - Version 1.10.1

Bug

  • PARQUET-1510 - Dictionary filter skips null values when evaluating not-equals.
  • PARQUET-1309 - Parquet Java uses incorrect stats and dictionary filter properties

Version 1.10.0

Release Notes - Parquet - Version 1.10.0

Bug

  • PARQUET-196 - parquet-tools command to get rowcount & size
  • PARQUET-357 - Parquet-thrift generates wrong schema for Thrift binary fields
  • PARQUET-765 - Upgrade Avro to 1.8.1
  • PARQUET-783 - H2SeekableInputStream does not close its underlying FSDataInputStream, leading to connection leaks
  • PARQUET-786 - parquet-tools README incorrectly has 'java jar' instead of 'java -jar'
  • PARQUET-791 - Predicate pushing down on missing columns should work on UserDefinedPredicate too
  • PARQUET-1005 - Fix DumpCommand parsing to allow column projection
  • PARQUET-1028 - [JAVA] When reading old Spark-generated files with INT96, stats are reported as valid when they aren't
  • PARQUET-1065 - Deprecate type-defined sort ordering for INT96 type
  • PARQUET-1077 - [MR] Switch to long key ids in KEYs file
  • PARQUET-1141 - IDs are dropped in metadata conversion
  • PARQUET-1152 - Parquet-thrift doesn't compile with Thrift 0.9.3
  • PARQUET-1153 - Parquet-thrift doesn't compile with Thrift 0.10.0
  • PARQUET-1156 - dev/merge_parquet_pr.py problems
  • PARQUET-1185 - TestBinary#testBinary unit test fails after PARQUET-1141
  • PARQUET-1191 - Type.hashCode() takes originalType into account but Type.equals() does not
  • PARQUET-1208 - Occasional endless loop in unit test
  • PARQUET-1217 - Incorrect handling of missing values in Statistics
  • PARQUET-1246 - Ignore float/double statistics in case of NaN
  • PARQUET-1258 - Update scm developer connection to github

New Feature

  • PARQUET-1025 - Support new min-max statistics in parquet-mr

Improvement

  • PARQUET-220 - Unnecessary warning in ParquetRecordReader.initialize
  • PARQUET-321 - Set the HDFS padding default to 8MB
  • PARQUET-386 - Printing out the statistics of metadata in parquet-tools
  • PARQUET-423 - Make writing Avro to Parquet less noisy
  • PARQUET-755 - create parquet-arrow module with schema converter
  • PARQUET-777 - Add new Parquet CLI tools
  • PARQUET-787 - Add a size limit for heap allocations when reading
  • PARQUET-801 - Allow UserDefinedPredicates in DictionaryFilter
  • PARQUET-852 - Slowly ramp up sizes of byte[] in ByteBasedBitPackingEncoder
  • PARQUET-884 - Add support for Decimal datatype to Parquet-Pig record reader
  • PARQUET-969 - Decimal datatype support for parquet-tools output
  • PARQUET-990 - More detailed error messages in footer parsing
  • PARQUET-1024 - allow for case insensitive parquet-xxx prefix in PR title
  • PARQUET-1026 - allow unsigned binary stats when min == max
  • PARQUET-1115 - Warn users when misusing parquet-tools merge
  • PARQUET-1135 - upgrade thrift and protobuf dependencies
  • PARQUET-1142 - Avoid leaking Hadoop API to downstream libraries
  • PARQUET-1149 - Upgrade Avro dependency to 1.8.2
  • PARQUET-1170 - Logical-type-based toString for proper representation in tools/logs
  • PARQUET-1183 - AvroParquetWriter needs OutputFile based Builder (a usage sketch follows after this list)
  • PARQUET-1197 - Log rat failures
  • PARQUET-1198 - Bump java source and target to java8
  • PARQUET-1215 - Add accessor for footer after a file is closed
  • PARQUET-1263 - ParquetReader's builder should use Configuration from the InputFile
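
As referenced from PARQUET-1183 above, the OutputFile-based builder lets AvroParquetWriter write through the org.apache.parquet.io.OutputFile abstraction rather than a raw Hadoop Path. A minimal sketch; the schema, field names, and output file name are illustrative.

```java
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetWriter;
import org.apache.parquet.hadoop.util.HadoopOutputFile;

public class OutputFileWriterExample {
  public static void main(String[] args) throws Exception {
    Schema schema = SchemaBuilder.record("Event").fields()
        .requiredLong("id")
        .requiredString("name")
        .endRecord();

    Configuration conf = new Configuration();
    // HadoopOutputFile adapts a Hadoop Path to the OutputFile interface.
    HadoopOutputFile out = HadoopOutputFile.fromPath(new Path("events.parquet"), conf);

    try (ParquetWriter<GenericRecord> writer = AvroParquetWriter
        .<GenericRecord>builder(out)
        .withSchema(schema)
        .build()) {
      GenericRecord record = new GenericData.Record(schema);
      record.put("id", 1L);
      record.put("name", "example");
      writer.write(record);
    }
  }
}
```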

Task

Version 1.9.0

Bug

  • PARQUET-182 - FilteredRecordReader skips rows it shouldn't for schema with optional columns
  • PARQUET-212 - Implement nested type read rules in parquet-thrift
  • PARQUET-241 - ParquetInputFormat.getFooters() should return in the same order as what listStatus() returns
  • PARQUET-305 - Logger instantiated for package org.apache.parquet may be GC-ed
  • PARQUET-335 - Avro object model should not require MAP_KEY_VALUE
  • PARQUET-340 - totalMemoryPool is truncated to 32 bits
  • PARQUET-346 - ThriftSchemaConverter throws for unknown struct or union type
  • PARQUET-349 - VersionParser does not handle versions like "parquet-mr 1.6.0rc4"
  • PARQUET-352 - Add tags to "created by" metadata in the file footer
  • PARQUET-353 - Compressors not getting recycled while writing parquet files, causing memory leak
  • PARQUET-360 - parquet-cat json dump is broken for maps
  • PARQUET-363 - Cannot construct empty MessageType for ReadContext.requestedSchema
  • PARQUET-367 - "parquet-cat -j" doesn't show all records
  • PARQUET-372 - Parquet stats can have awkwardly large values
  • PARQUET-373 - MemoryManager tests are flaky
  • PARQUET-379 - PrimitiveType.union erases original type
  • PARQUET-380 - Cascading and scrooge builds fail when using thrift 0.9.0
  • PARQUET-385 - PrimitiveType.union accepts fixed_len_byte_array fields with different lengths when strict mode is on
  • PARQUET-387 - TwoLevelListWriter does not handle null values in array
  • PARQUET-389 - Filter predicates should work with missing columns
  • PARQUET-395 - System.out is used as logger in org.apache.parquet.Log
  • PARQUET-396 - The builder for AvroParquetReader loses the record type
  • PARQUET-400 - Error reading some files after PARQUET-77 bytebuffer read path
  • PARQUET-409 - InternalParquetRecordWriter doesn't use min/max row counts
  • PARQUET-410 - Fix subprocess hang in merge_parquet_pr.py
  • PARQUET-413 - Test failures for Java 8
  • PARQUET-415 - ByteBufferBackedBinary serialization is broken
  • PARQUET-422 - Fix a potential bug in MessageTypeParser where we ignore and overwrite the initial value of a method parameter
  • PARQUET-425 - Fix the bug when predicate contains columns not specified in projection, to prevent filtering out data improperly
  • PARQUET-426 - Throw Exception when predicate contains columns not specified in projection, to prevent filtering out data improperly
  • PARQUET-430 - Change to use Locale parameterized version of String.toUpperCase()/toLowerCase
  • PARQUET-431 - Make ParquetOutputFormat.memoryManager volatile
  • PARQUET-495 - Fix mismatches in Types class comments
  • PARQUET-509 - Incorrect number of args passed to string.format calls
  • PARQUET-511 - Integer overflow on counting values in column
  • PARQUET-528 - Fix flush() for RecordConsumer and implementations
  • PARQUET-529 - Avoid evoking job.toString() in ParquetLoader
  • PARQUET-540 - Cascading3 module doesn't build when using thrift 0.9.0
  • PARQUET-544 - ParquetWriter.close() throws NullPointerException on second call, improper implementation of Closeable contract
  • PARQUET-560 - Incorrect synchronization in SnappyCompressor
  • PARQUET-569 - ParquetMetadataConverter offset filter is broken
  • PARQUET-571 - Fix potential leak in ParquetFileReader.close()
  • PARQUET-580 - Potentially unnecessary creation of large int[] in IntList for columns that aren't used
  • PARQUET-581 - Min/max row count for page size check are conflated in some places
  • PARQUET-584 - show proper command usage when there's no arguments
  • PARQUET-612 - Add compression to FileEncodingIT tests
  • PARQUET-623 - DeltaByteArrayReader has incorrect skip behaviour
  • PARQUET-642 - Improve performance of ByteBuffer based read / write paths
  • PARQUET-645 - DictionaryFilter incorrectly handles null
  • PARQUET-651 - Parquet-avro fails to decode array of record with a single field name "element" correctly
  • PARQUET-660 - Writing Protobuf messages with extensions results in an error or data corruption.
  • PARQUET-663 - Link are Broken in README.md
  • PARQUET-674 - Add an abstraction to get the length of a stream
  • PARQUET-685 - Deprecated ParquetInputSplit constructor passes parameters in the wrong order.
  • PARQUET-726 - TestMemoryManager consistently fails
  • PARQUET-743 - DictionaryFilters can re-use StreamBytesInput when compressed

Improvement

  • PARQUET-77 - Improvements in ByteBuffer read path
  • PARQUET-99 - Large rows cause unnecessary OOM exceptions
  • PARQUET-146 - make Parquet compile with java 7 instead of java 6
  • PARQUET-318 - Remove unnecessary objectmapper from ParquetMetadata
  • PARQUET-327 - Show statistics in the dump output
  • PARQUET-341 - Improve write performance with wide schema sparse data
  • PARQUET-343 - Caching nulls on group node to improve write performance on wide schema sparse data
  • PARQUET-358 - Add support for temporal logical types to AVRO/Parquet conversion
  • PARQUET-361 - Add prerelease logic to semantic versions
  • PARQUET-384 - Add Dictionary Based Filtering to Filter2 API
  • PARQUET-386 - Printing out the statistics of metadata in parquet-tools
  • PARQUET-397 - Pig Predicate Pushdown using Filter2 API
  • PARQUET-421 - Fix mismatch of javadoc names and method parameters in module encoding, column, and hadoop
  • PARQUET-427 - Push predicates into the whole read path
  • PARQUET-432 - Complete a todo for method ColumnDescriptor.compareTo()
  • PARQUET-460 - Parquet files concat tool
  • PARQUET-480 - Update for Cascading 3.0
  • PARQUET-484 - Warn when Decimal is stored as INT64 while could be stored as INT32
  • PARQUET-543 - Remove BoundedInt encodings
  • PARQUET-585 - Slowly ramp up sizes of int[]s in IntList to keep sizes small when data sets are small
  • PARQUET-654 - Make record-level filtering optional
  • PARQUET-668 - Provide option to disable auto crop feature in DumpCommand output
  • PARQUET-727 - Ensure correct version of thrift is used
  • PARQUET-740 - Introduce editorconfig

New Feature

  • PARQUET-225 - INT64 support for Delta Encoding
  • PARQUET-382 - Add a way to append encoded blocks in ParquetFileWriter
  • PARQUET-429 - Enables predicates collecting their referred columns
  • PARQUET-548 - Add Java metadata for PageEncodingStats
  • PARQUET-669 - Allow reading file footers from input streams when writing metadata files

Task

Test

  • PARQUET-355 - Create Integration tests to validate statistics
  • PARQUET-378 - Add thoroughly parquet test encodings

Version 1.8.1

Bug

  • PARQUET-331 - Merge script doesn't surface stderr from failed sub processes
  • PARQUET-336 - ArrayIndexOutOfBounds in checkDeltaByteArrayProblem
  • PARQUET-337 - binary fields inside map/set/list are not handled in parquet-scrooge
  • PARQUET-338 - Readme references wrong format of pull request title

Improvement

  • PARQUET-279 - Check empty struct in the CompatibilityChecker util

Task

Version 1.8.0

Bug

  • PARQUET-151 - Null Pointer exception in parquet.hadoop.ParquetFileWriter.mergeFooters
  • PARQUET-152 - Encoding issue with fixed length byte arrays
  • PARQUET-164 - Warn when parquet memory manager kicks in
  • PARQUET-199 - Add a callback when the MemoryManager adjusts row group size
  • PARQUET-201 - Column with OriginalType INT_8 failed at filtering
  • PARQUET-227 - Parquet thrift can write unions that have 0 or more than 1 set value
  • PARQUET-246 - ArrayIndexOutOfBoundsException with Parquet write version v2
  • PARQUET-251 - Binary column statistics error when reuse byte[] among rows
  • PARQUET-252 - parquet scrooge support should support nested container type
  • PARQUET-254 - Wrong exception message for unsupported INT96 type
  • PARQUET-269 - Restore scrooge-maven-plugin to 3.17.0 or greater
  • PARQUET-284 - Should use ConcurrentHashMap instead of HashMap in ParquetMetadataConverter
  • PARQUET-285 - Implement nested types write rules in parquet-avro
  • PARQUET-287 - Projecting unions in thrift causes TExceptions in deserialization
  • PARQUET-296 - Set master branch version back to 1.8.0-SNAPSHOT
  • PARQUET-297 - created_by in file meta data doesn't contain parquet library version
  • PARQUET-314 - Fix broken equals implementation(s)
  • PARQUET-316 - Run.sh is broken in parquet-benchmarks
  • PARQUET-317 - writeMetaDataFile crashes when a relative root Path is used
  • PARQUET-320 - Restore semver checks
  • PARQUET-324 - row count incorrect if data file has more than 2^31 rows
  • PARQUET-325 - Do not target row group sizes if padding is set to 0
  • PARQUET-329 - ThriftReadSupport#THRIFT_COLUMN_FILTER_KEY was removed (incompatible change)

Improvement

  • PARQUET-175 - Allow setting of a custom protobuf class when reading parquet file using parquet-protobuf.
  • PARQUET-223 - Add Map and List builders
  • PARQUET-245 - Travis CI runs tests even if build fails
  • PARQUET-248 - Simplify ParquetWriters's constructors
  • PARQUET-253 - AvroSchemaConverter has confusing Javadoc
  • PARQUET-259 - Support Travis CI in parquet-cpp
  • PARQUET-264 - Update README docs for graduation
  • PARQUET-266 - Add support for lists of primitives to Pig schema converter
  • PARQUET-272 - Updates docs description to match data model
  • PARQUET-274 - Updates URLs to link against the apache user instead of Parquet on github
  • PARQUET-276 - Updates CONTRIBUTING file with new repo info
  • PARQUET-286 - Avro object model should use Utf8
  • PARQUET-288 - Add dictionary support to Avro converters
  • PARQUET-289 - Allow object models to extend the ParquetReader builders
  • PARQUET-290 - Add Avro data model to the reader builder
  • PARQUET-306 - Improve alignment between row groups and HDFS blocks
  • PARQUET-308 - Add accessor to ParquetWriter to get current data size
  • PARQUET-309 - Remove unnecessary compile dependency on parquet-generator
  • PARQUET-321 - Set the HDFS padding default to 8MB
  • PARQUET-327 - Show statistics in the dump output

New Feature

  • PARQUET-229 - Make an alternate, stricter thrift column projection API
  • PARQUET-243 - Add avro-reflect support

Task

Version 1.7.0

Version 1.6.0

Bug

  • PARQUET-3 - tool to merge pull requests based on Spark
  • PARQUET-4 - Use LRU caching for footers in ParquetInputFormat.
  • PARQUET-8 - [parquet-scrooge] mvn eclipse:eclipse fails on parquet-scrooge
  • PARQUET-9 - InternalParquetRecordReader will not read multiple blocks when filtering
  • PARQUET-18 - Cannot read dictionary-encoded pages with all null values
  • PARQUET-19 - NPE when an empty file is included in a Hive query that uses CombineHiveInputFormat
  • PARQUET-21 - Fix reference to 'github-apache' in dev docs
  • PARQUET-56 - Added an accessor for the Long column type in example Group
  • PARQUET-62 - DictionaryValuesWriter dictionaries are corrupted by user changes.
  • PARQUET-63 - Fixed-length columns cannot be dictionary encoded.
  • PARQUET-66 - InternalParquetRecordWriter int overflow causes unnecessary memory check warning
  • PARQUET-69 - Add committer doc and REVIEWERS files
  • PARQUET-70 - PARQUET #36: Pig Schema Storage to UDFContext
  • PARQUET-75 - String decode using 'new String' is slow
  • PARQUET-80 - upgrade semver plugin version to 0.9.27
  • PARQUET-82 - ColumnChunkPageWriteStore assumes pages are smaller than Integer.MAX_VALUE
  • PARQUET-88 - Fix pre-version enforcement.
  • PARQUET-94 - ParquetScroogeScheme constructor ignores klass argument
  • PARQUET-96 - parquet.example.data.Group is missing some methods
  • PARQUET-97 - ProtoParquetReader builder factory method not static
  • PARQUET-101 - Exception when reading data with parquet.task.side.metadata=false
  • PARQUET-104 - Parquet writes empty Rowgroup at the end of the file
  • PARQUET-106 - Relax InputSplit Protections
  • PARQUET-107 - Add option to disable summary metadata aggregation after MR jobs
  • PARQUET-114 - Sample NanoTime class serializes and deserializes Timestamp incorrectly
  • PARQUET-122 - make parquet.task.side.metadata=true by default
  • PARQUET-124 - parquet.hadoop.ParquetOutputCommitter.commitJob() throws parquet.io.ParquetEncodingException
  • PARQUET-132 - AvroParquetInputFormat should use a parameterized type
  • PARQUET-135 - Input location is not getting set for the getStatistics in ParquetLoader when using two different loaders within a Pig script.
  • PARQUET-136 - NPE thrown in StatisticsFilter when all values in a string/binary column trunk are null
  • PARQUET-142 - parquet-tools doesn't filter _SUCCESS file
  • PARQUET-145 - InternalParquetRecordReader.close() should not throw an exception if initialization has failed
  • PARQUET-150 - Merge script requires ':' in PR names
  • PARQUET-157 - Divide by zero in logging code
  • PARQUET-159 - parquet-hadoop tests fail to compile
  • PARQUET-162 - ParquetThrift should throw when unrecognized columns are passed to the column projection API
  • PARQUET-168 - Wrong command line option description in parquet-tools
  • PARQUET-173 - StatisticsFilter doesn't handle And properly
  • PARQUET-174 - Fix Java6 compatibility
  • PARQUET-176 - Parquet fails to parse schema contains '\r'
  • PARQUET-180 - Parquet-thrift compile issue with 0.9.2.
  • PARQUET-184 - Add release scripts and documentation
  • PARQUET-186 - Poor performance in SnappyCodec because of string concat in tight loop
  • PARQUET-187 - parquet-scrooge doesn't compile under 2.11
  • PARQUET-188 - Parquet writes columns out of order (compared to the schema)
  • PARQUET-189 - Support building parquet with thrift 0.9.0
  • PARQUET-196 - parquet-tools command to get rowcount & size
  • PARQUET-197 - parquet-cascading and the mapred API does not create metadata file
  • PARQUET-202 - Typo in the connection info in the pom prevents publishing an RC
  • PARQUET-207 - ParquetInputSplit end calculation bug
  • PARQUET-208 - revert PARQUET-197
  • PARQUET-214 - Avro: Regression caused by schema handling
  • PARQUET-215 - Parquet Thrift should discard records with unrecognized union members
  • PARQUET-216 - Decrease the default page size to 64k
  • PARQUET-217 - Memory Manager's min allocation heuristic is not valid for schemas with many columns
  • PARQUET-232 - minor compilation issue
  • PARQUET-234 - Restore ParquetInputSplit methods from 1.5.0
  • PARQUET-235 - Fix compatibility of parquet.metadata with 1.5.0
  • PARQUET-236 - Check parquet-scrooge compatibility
  • PARQUET-237 - Check ParquetWriter constructor compatibility with 1.5.0
  • PARQUET-239 - Make AvroParquetReader#builder() static
  • PARQUET-242 - AvroReadSupport.setAvroDataSupplier is broken

Improvement

  • PARQUET-2 - Adding Type Persuasion for Primitive Types
  • PARQUET-25 - Pushdown predicates only work with hardcoded arguments
  • PARQUET-52 - Improve the encoding fall back mechanism for Parquet 2.0
  • PARQUET-57 - Make dev commit script easier to use
  • PARQUET-61 - Avoid fixing protocol events when there is no required field missing
  • PARQUET-74 - Use thread local decoder cache in Binary toStringUsingUTF8()
  • PARQUET-79 - Add thrift streaming API to read metadata
  • PARQUET-84 - Add an option to read the rowgroup metadata on the task side.
  • PARQUET-87 - Better and unified API for projection pushdown on cascading scheme
  • PARQUET-89 - All Parquet CI tests should be run against hadoop-2
  • PARQUET-92 - Parallel Footer Read Control
  • PARQUET-105 - Refactor and Document Parquet Tools
  • PARQUET-108 - Parquet Memory Management in Java
  • PARQUET-115 - Pass a filter object to user defined predicate in filter2 api
  • PARQUET-116 - Pass a filter object to user defined predicate in filter2 api
  • PARQUET-117 - implement the new page format for Parquet 2.0
  • PARQUET-119 - add data_encodings to ColumnMetaData to enable dictionary based predicate push down
  • PARQUET-121 - Allow Parquet to build with Java 8
  • PARQUET-128 - Optimize the parquet RecordReader implementation when: A. filterpredicate is pushed down, B. filterpredicate is pushed down on a flat schema
  • PARQUET-133 - Upgrade snappy-java to 1.1.1.6
  • PARQUET-134 - Enhance ParquetWriter with file creation flag
  • PARQUET-140 - Allow clients to control the GenericData object that is used to read Avro records
  • PARQUET-141 - improve parquet scrooge integration
  • PARQUET-160 - Simplify CapacityByteArrayOutputStream
  • PARQUET-165 - A benchmark module for Parquet would be nice
  • PARQUET-177 - MemoryManager ensure minimum Column Chunk size
  • PARQUET-181 - Scrooge Write Support
  • PARQUET-191 - Avro schema conversion incorrectly converts maps with nullable values.
  • PARQUET-192 - Avro maps drop null values
  • PARQUET-193 - Avro: Implement read compatibility rules for nested types
  • PARQUET-203 - Consolidate PathFilter for hidden files
  • PARQUET-204 - Directory support for parquet-schema
  • PARQUET-210 - JSON output for parquet-cat

New Feature

  • PARQUET-22 - Parquet #13: Backport of HIVE-6938
  • PARQUET-49 - Create a new filter API that supports filtering groups of records based on their statistics
  • PARQUET-64 - Add new logical types to parquet-column
  • PARQUET-123 - Add dictionary support to AvroIndexedRecordReader
  • PARQUET-198 - parquet-cascading Add Parquet Avro Scheme

Task

  • PARQUET-50 - Remove items from semver blacklist
  • PARQUET-139 - Avoid reading file footers in parquet-avro InputFormat
  • PARQUET-190 - Fix an inconsistent Javadoc comment of ReadSupport.prepareForRead
  • PARQUET-230 - Add build instructions to the README

Version 1.5.0

  • ISSUE 399: Fixed resetting stats after writePage bug, unit testing of readFooter
  • ISSUE 397: Fixed issue with column pruning when using requested schema
  • ISSUE 389: Added padding for requested columns not found in file schema
  • ISSUE 392: Value stats fixes
  • ISSUE 338: Added statistics to Parquet pages and rowGroups
  • ISSUE 351: Fix bug #350, fixed length argument out of order.
  • ISSUE 378: configure semver to enforce semantic versioning
  • ISSUE 355: Add support for DECIMAL type annotation.
  • ISSUE 336: protobuf dependency version changed from 2.4.1 to 2.5.0
  • ISSUE 337: issue #324, move ParquetStringInspector to org.apache.hadoop.hive.serde...

Version 1.4.3

  • ISSUE 381: fix metadata concurrency problem

Version 1.4.2

  • ISSUE 359: Expose values in SimpleRecord
  • ISSUE 335: issue #290, hive map conversion to parquet schema
  • ISSUE 365: generate splits by min max size, and align to HDFS block when possible
  • ISSUE 353: Fix bug: optional enum field causing ScroogeSchemaConverter to fail
  • ISSUE 362: Fix output bug during parquet-dump command
  • ISSUE 366: do not call schema converter to generate projected schema when projection is not set
  • ISSUE 367: make ParquetFileWriter throw IOException in invalid state case
  • ISSUE 352: Parquet thrift storer
  • ISSUE 349: fix header bug

Version 1.4.1

  • ISSUE 344: select * from parquet hive table containing map columns runs into exception. Issue #341.
  • ISSUE 347: set reading length in ThriftBytesWriteSupport to avoid potential OOM cau...
  • ISSUE 346: stop using strings and b64 for compressed input splits
  • ISSUE 345: set cascading version to 2.5.3
  • ISSUE 342: compress kv pairs in ParquetInputSplits

Version 1.4.0

  • ISSUE 333: Compress schemas in split
  • ISSUE 329: fix filesystem resolution
  • ISSUE 320: Spelling fix
  • ISSUE 319: oauth based authentication; fix grep change
  • ISSUE 310: Merge parquet tools
  • ISSUE 314: Fix avro schema conv for arrays of optional type for #312.
  • ISSUE 311: Avro null default values bug
  • ISSUE 316: Update poms to use thrift.executable property.
  • ISSUE 285: [CASCADING] Provide the sink implementation for ParquetTupleScheme
  • ISSUE 264: Native Protocol Buffer support
  • ISSUE 293: Int96 support
  • ISSUE 313: Add hadoop Configuration to Avro and Thrift writers (#295).
  • ISSUE 262: Scrooge schema converter and projection pushdown in Scrooge
  • ISSUE 297: Ports HIVE-5783 to the parquet-hive module
  • ISSUE 303: Avro read schema aliases
  • ISSUE 299: Fill in default values for new fields in the Avro read schema
  • ISSUE 298: Bugfix: reordered thrift fields causing nulls to be written
  • ISSUE 289: first use current thread's classloader to load a class, if current threa...
  • ISSUE 292: Added ParquetWriter() that takes an instance of Hadoop's Configuration.
  • ISSUE 282: Avro default read schema
  • ISSUE 280: style: junit.framework to org.junit
  • ISSUE 270: Make ParquetInputSplit extend FileSplit

Version 1.3.2

  • ISSUE 271: fix bug: last enum index throws DecodingSchemaMismatchException
  • ISSUE 268: fixes #265: add semver validation checks to non-bundle builds
  • ISSUE 269: Bumps parquet-jackson parent version
  • ISSUE 260: Shade jackson only once for all parquet modules

Version 1.3.1

  • ISSUE 267: handler only handle ignored field, exception during will be thrown as Sk...
  • ISSUE 266: upgrade parquet-mr to elephant-bird 4.4

Version 1.3.0

  • ISSUE 258: Optimize scan
  • ISSUE 259: add delta length byte arrays and delta byte arrays encodings
  • ISSUE 249: make summary files read in parallel; improve memory footprint of metadata; avoid unnecessary seek
  • ISSUE 257: Create parquet-hadoop-bundle which will eventually replace parquet-hive-bundle
  • ISSUE 253: Delta Binary Packing for Int
  • ISSUE 254: Add writer version flag to parquet and make initial changes for supported parquet 2.0 encodings
  • ISSUE 256: Resolves issue #251 by doing additional checks if Hive returns "Unknown" as a version
  • ISSUE 252: refactor error handler for BufferedProtocolReadToWrite to be non-static

Version 1.2.11

  • ISSUE 250: pretty_print_json_for_compatibility_checker
  • ISSUE 243: add parquet cascading integration documentation
  • ISSUE 248: More Hadoop 2 compatibility fixes

Version 1.2.10

  • ISSUE 247: fix bug: when field index is greater than zero
  • ISSUE 244: Feature/error handler
  • ISSUE 187: Plumb OriginalType
  • ISSUE 245: integrate parquet format 2.0

Version 1.2.9

  • ISSUE 242: upgrade elephant-bird version to 4.3
  • ISSUE 240: fix loader cache
  • ISSUE 233: use latest stable release of cascading: 2.5.1
  • ISSUE 241: Update reference to 0.10 in Hive012Binding javadoc
  • ISSUE 239: Fix hive map and array inspectors with null containers
  • ISSUE 234: optimize chunk scan; fix compressed size
  • ISSUE 237: Handle codec not found
  • ISSUE 238: fix pom version caused by bad merge
  • ISSUE 235: Do not write pig metadata when pig is not available
  • ISSUE 227: Breaks parquet-hive up into several submodules, creating infrastructure ...
  • ISSUE 229: add changelog tool
  • ISSUE 236: Make cascading a provided dependency

Version 1.2.8

  • ISSUE 228: enable globing files for parquetTupleScheme, refactor unit tests and rem...
  • ISSUE 224: Changing read and write methods in ParquetInputSplit so that they can de...


Version 1.2.7

  • ISSUE 223: refactor encoded values changes and test that resetDictionary works
  • ISSUE 222: fix bug: set raw data size to 0 after reset

Version 1.2.6

  • ISSUE 221: make pig, hadoop and log4j jars provided
  • ISSUE 220: parquet-hive should ship an uber jar
  • ISSUE 213: group parquet-format version in one property
  • ISSUE 215: Fix Binary.equals().
  • ISSUE 210: ParquetWriter ignores enable dictionary and validating flags.
  • ISSUE 202: Fix requested schema when recreating splits in hive
  • ISSUE 208: Improve dic fall back
  • ISSUE 207: Fix offset
  • ISSUE 206: Create a "Powered by" page

Version 1.2.5

  • ISSUE 204: ParquetLoader.inputFormatCache as WeakHashMap
  • ISSUE 203: add null check for EnumWriteProtocol
  • ISSUE 205: use cascading 2.2.0
  • ISSUE 199: simplify TupleWriteSupport constructor
  • ISSUE 164: Dictionary changes
  • ISSUE 196: Fixes to the Hive SerDe
  • ISSUE 197: RLE decoder reading past the end of the stream
  • ISSUE 188: Added ability to define arbitrary predicate functions
  • ISSUE 194: refactor serde to remove some unnecessary boxing and include dictionary awareness
  • ISSUE 190: NPE in DictionaryValuesWriter.

Version 1.2.4

  • ISSUE 191: Add compatibility checker for ThriftStruct to check for backward compatibility of two thrift structs

Version 1.2.3

  • ISSUE 186: add parquet-pig-bundle
  • ISSUE 184: Update ParquetReader to take Configuration as a constructor argument.
  • ISSUE 183: Disable the time read counter check in DeprecatedInputFormatTest.
  • ISSUE 182: Fix a maven warning about a missing version number.
  • ISSUE 181: FIXED_LEN_BYTE_ARRAY support
  • ISSUE 180: Support writing Avro records with maps with Utf8 keys
  • ISSUE 179: Added Or/Not logical filters for column predicates
  • ISSUE 172: Add sink support for parquet.cascading.ParquetTBaseScheme
  • ISSUE 169: Support avro records with empty maps and arrays
  • ISSUE 162: Avro schema with empty arrays and maps

Version 1.2.2

  • ISSUE 175: fix problem with projection pushdown in parquetloader
  • ISSUE 174: improve readability by renaming variables
  • ISSUE 173: make numbers in log messages easy to read in InternalParquetRecordWriter
  • ISSUE 171: add unit test for parquet-scrooge
  • ISSUE 165: distinguish recoverable exception in BufferedProtocolReadToWrite
  • ISSUE 166: support projection when required fields in thrift class are not projected

Version 1.2.1

  • ISSUE 167: fix OOM error due to bad estimation

Version 1.2.0

  • ISSUE 154: improve thrift error message
  • ISSUE 161: support schema evolution
  • ISSUE 160: Resource leak in parquet.hadoop.ParquetFileReader.readFooter(Configurati...
  • ISSUE 163: remove debugging code from hot path
  • ISSUE 155: Manual pushdown for thrift read support
  • ISSUE 159: Counter for mapred
  • ISSUE 156: Fix site
  • ISSUE 153: Fix projection required field

Version 1.1.1

  • ISSUE 150: add thrift validation on read

Version 1.1.0

  • ISSUE 149: changing default block size to 128mb
  • ISSUE 146: Fix and add unit tests for Hive nested types
  • ISSUE 145: add getStatistics method to parquetloader
  • ISSUE 144: Map key fields should allow other types than strings
  • ISSUE 143: Fix empty encoding col metadata
  • ISSUE 142: Fix total size row group
  • ISSUE 141: add parquet counters for benchmark
  • ISSUE 140: Implemented partial schema for GroupReadSupport
  • ISSUE 138: fix bug of wrong column metadata size
  • ISSUE 137: ParquetMetadataConverter bug
  • ISSUE 133: Update plugin versions for maven aether migration - fixes #125
  • ISSUE 130: Schema validation should not validate the root element's name
  • ISSUE 127: Adding dictionary encoding for non string types.. #99
  • ISSUE 125: Unable to build
  • ISSUE 124: Fix Short and Byte types in Hive SerDe.
  • ISSUE 123: Fix Snappy compressor in parquet-hadoop.
  • ISSUE 120: Fix RLE bug with partial literal groups at end of stream.
  • ISSUE 118: Refactor column reader
  • ISSUE 115: Map key fields should allow other types than strings
  • ISSUE 103: Map key fields should allow other types than strings
  • ISSUE 99: Dictionary encoding for non string types (float double int long boolean)
  • ISSUE 47: Add tests for parquet-scrooge and parquet-cascading

Version 1.0.1

  • ISSUE 126: Unit tests for parquet cascading
  • ISSUE 121: fix wrong RecordConverter for ParquetTBaseScheme
  • ISSUE 119: fix compatibility with thrift remove unused dependency

Version 1.0.0