Releases · modin-project/modin

16 Jun 13:45

vnlitvinov

0.15.1

efdc97c

Modin 0.15.1

This release pins Ray < 1.13.0 to avoid a deserialization race condition.

Key Features and Updates

Stability and Bugfixes
- FIX-#4566: Pin Ray < 1.13.0 to avoid deserialization race condition. (#4567)
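The pin excludes every Ray release from 1.13.0 onward. The hypothetical helper `satisfies_ray_pin` below is a minimal sketch of what such an upper bound means, not how Modin itself enforces it. It compares version strings as integer tuples, assumes plain `X.Y[.Z]` versions, and does not handle pre-release suffixes such as `rc1`.

```python
def parse_release(version: str) -> tuple:
    """Convert 'X.Y' or 'X.Y.Z' into a tuple of ints, e.g. '1.12.1' -> (1, 12, 1).

    Assumption: every dot-separated segment is purely numeric.
    """
    parts = [int(segment) for segment in version.split(".")]
    # Pad short versions so that '1.13' compares equal to '1.13.0'.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


# Upper bound from the 0.15.1 release: Ray must be strictly below 1.13.0.
RAY_UPPER_BOUND = (1, 13, 0)


def satisfies_ray_pin(version: str) -> bool:
    """Hypothetical check: True if `version` is allowed by the 'ray < 1.13.0' pin."""
    return parse_release(version) < RAY_UPPER_BOUND


for candidate in ["1.9.2", "1.12.1", "1.13.0", "2.0.0"]:
    print(candidate, satisfies_ray_pin(candidate))
```

Comparing the raw strings would be wrong: lexically `"1.9.2"` sorts after `"1.13.0"`, so an older, working release would be rejected. In practice an installer applies the pin through a requirement specifier such as `ray<1.13.0`.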

Contributors

@mvashishtha

08 Jun 16:38

RehanSD

0.15.0

efdfbac

Modin 0.15.0

This release includes updated support for pandas 1.4.2, new Batch and Logging APIs, and a plethora
of bug fixes and documentation improvements.

Key Features and Updates

Stability and Bugfixes
- FIX-#4376: Upgrade pandas to 1.4.2 (#4377)
- FIX-#3615: Relax some deps in development env (#4365)
- FIX-#4370: Fix broken docstring links (#4375)
- FIX-#4392: Align Modin XGBoost with xgb>=1.6 (#4393)
- FIX-#4385: Get rid of use-deprecated option in pip (#4386)
- FIX-#3527: Fix parquet partitioning issue causing negative row length partitions (#4368)
- FIX-#4330: Override the memory limit to start ray 1.11.0 on Macs (#4335)
- FIX-#4407: Align insert function with pandas in case of numpy array with several columns (#4408)
- FIX-#4373: Fix invalid file path when trying read_csv_glob with usecols parameter (#4405)
- FIX-#4394: Fix issue with multiindex metadata desync (#4395)
- FIX-#4438: Fix reindex function that doesn't preserve initial index metadata (#4442)
- FIX-#4425: Add parameters to groupby pct_change (#4429)
- FIX-#4457: Fix loc in the case when an item needs reindexing (#4457)
- FIX-#4414: Add missing f prefix on f-strings found at https://codereview.doctor/ (#4415)
- FIX-#4461: Fix S3 CSV data path (#4462)
- FIX-#4467: drop_duplicates no longer removes items based on index values (#4468)
- FIX-#4449: Drain the call queue before waiting on result in benchmark mode (#4472)
- FIX-#4518: Fix Modin Logging to report specific Modin warnings/errors (#4519)
- FIX-#4481: Allow clipping with a Modin Series of bounds (#4486)
- FIX-#4504: Support na_action in applymap (#4505)
- FIX-#4503: Stop the memory logging thread after session exit (#4515)
- FIX-#4531: Fix a makedirs race condition in to_parquet (#4533)
- FIX-#4464: Refactor Ray utils and quick fix groupby.count failing on virtual partitions (#4490)
- FIX-#4436: Fix to_pydatetime dtype for timezone None (#4437)
- FIX-#4541: Fix merge_asof with non-unique right index (#4542)
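FIX-#4531 addresses a race in which several workers create the same output directory during `to_parquet`. A standard-library idiom for this class of bug is `os.makedirs(path, exist_ok=True)`, which returns quietly if the directory already exists, even when another thread or process created it a moment earlier. The sketch below demonstrates that general technique with threads; it is not the Modin patch, and `write_part` is a hypothetical stand-in for a per-partition writer.

```python
import os
import tempfile
import threading


def write_part(out_dir: str, index: int, errors: list) -> None:
    """Hypothetical per-partition writer: ensure the directory exists, then write one file."""
    try:
        # exist_ok=True makes concurrent creation safe: if another thread wins the
        # race and creates out_dir first, this call returns instead of raising.
        os.makedirs(out_dir, exist_ok=True)
        with open(os.path.join(out_dir, f"part-{index}.txt"), "w") as handle:
            handle.write(str(index))
    except OSError as exc:
        errors.append(exc)


with tempfile.TemporaryDirectory() as root:
    target = os.path.join(root, "output_dir")
    errors: list = []
    threads = [
        threading.Thread(target=write_part, args=(target, i, errors)) for i in range(8)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    written = sorted(os.listdir(target))
    print(len(errors), len(written))
```

The fragile alternative is a check-then-create sequence (`if not os.path.exists(path): os.makedirs(path)`), where two writers can both see the directory as missing and the slower one then fails with `FileExistsError`.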
Performance enhancements
- FEAT-#4320: Add connectorx as an alternative engine for read_sql (#4346)
- PERF-#4493: Use partition size caches more in Modin dataframe (#4495)
Benchmarking enhancements
- FEAT-#4371: Add logging to Modin (#4372)
- FEAT-#4501: Add RSS Memory Profiling to Modin Logging (#4502)
- FEAT-#4524: Split Modin API and Memory log files (#4526)
Refactor Codebase
- REFACTOR-#4284: use variable length unpacking when getting results from deploy function (#4285)
- REFACTOR-#3642: Move PyArrow storage format usage from main feature to experimental ones (#4374)
- REFACTOR-#4003: Delete the deprecated cloud mortgage example (#4406)
- REFACTOR-#4513: Fix spelling mistakes in docs and docstrings (#4514)
- REFACTOR-#4510: Align experimental and regular IO modules initializations (#4511)
Developer API enhancements
- FEAT-#4359: Add dataframe method to the protocol dataframe (#4360)
Update testing suite
- TEST-#4363: Use Ray from pypi in CI (#4364)
- FIX-#4422: get rid of case sensitivity for warns_that_defaulting_to_pandas (#4423)
- TEST-#4426: Stop passing is_default kwarg to Modin and pandas (#4428)
- FIX-#4439: Fix flake8 CI fail (#4440)
- FIX-#4409: Fix eval_insert utility that doesn't actually check results of insert function (#4410)
- TEST-#4482: Fix getitem and loc with series of bools (#4483).
Documentation improvements
- DOCS-#4296: Fix docs warnings (#4297)
- DOCS-#4388: Turn off fail_on_warning option for docs build (#4389)
- DOCS-#4469: Say that commit messages can start with PERF (#4470).
- DOCS-#4466: Recommend GitHub issues over bug_reports@modin.org (#4474).
- DOCS-#4487: Recommend GitHub issues over feature_requests@modin.org (#4489).
Dependencies
- FIX-#4327: Update min pin for xgboost version (#4328)
- FIX-#4383: Remove pathlib from deps (#4384)
- FIX-#4390: Add redis to Modin dependencies (#4396)
- FIX-#3689: Add black and flake8 into development environment files (#4480)
- TEST-#4516: Add numpydoc to developer requirements (#4517)
New Features
- FEAT-#4412: Add Batch Pipeline API to Modin (#4452)

Contributors

@YarShev
@Garra1980
@prutskov
@alexander3774
@amyskov
@wangxiaoying
@jeffreykennethli
@mvashishtha
@anmyachev
@dchigarev
@devin-petersohn
@jrsacher
@orcahmlee
@naren-ponder
@RehanSD

04 May 15:39

devin-petersohn

0.14.1

d7eb019

Modin 0.14.1

This release contains a few key bugfixes and a pandas version update.

Key Features and Updates

- FIX-#4376: Upgrade pandas to 1.4.2 (#4377)
- FIX-#4390: Add redis to Modin dependencies (#4396)
- FIX-#3527: Fix parquet partitioning issue causing negative row length partitions (#4368)
- FIX-#4330: Override the memory limit to start ray 1.11.0 on Macs (#4335)
- FIX-#4394: Fix issue with multiindex metadata desync (#4395)
- FIX-#4373: fix usage of 'read_csv_glob' with 'usecols' parameter (#4405)
- FIX-#4425: Add parameters to groupby pct_change (#4429)

Contributors

@Garra1980, @devin-petersohn, @dchigarev, @jeffreykennethli, @mvashishtha, @YarShev, @anmyachev

29 Mar 16:54

YarShev

0.14.0

c5f623f

Modin 0.14.0

This release contains significant upgrades to the Developer API and to Modin's documentation,
along with codebase refactoring, performance enhancements, and multiple bugfixes.

Key Features and Updates

Stability and Bugfixes
- FIX-#4058: Allow pickling empty dataframes and series (#4095)
- FIX-#4136: Fix exercise_3.ipynb example notebook (#4137)
- FIX-#4105: Fix names of pandas options to avoid OptionError (#4109)
- FIX-#3417: Fix read_csv with skiprows and header parameters (#3419)
- FIX-#4142: Fix OmniSci enabling (#4146)
- FIX-#4162: Use skipif instead of skip for compatibility with pytest 7.0 (#4163)
- FIX-#4158: Do not print OmniSci logs to stdout by default (#4159)
- FIX-#4177: Support read_feather from pathlike objects (#4177)
- FIX-#4234: Upgrade pandas to 1.4.1 (#4235)
- FIX-#3368: support unsigned integers in OmniSci backend (#4256)
- FIX-#4057: Allow reading an empty parquet file (#4075)
- FIX-#3884: Fix read_excel() dropping empty rows (#4161)
- FIX-#4257: Fix Categorical() for scalar categories (#4258)
- FIX-#4300: Fix Modin Categorical column dtype categories (#4276)
- FIX-#4208: Fix lazy metadata update for PandasDataFrame.from_labels (#4209)
- FIX-#3981, FIX-#3801, FIX-#4149: Stop broadcasting scalars to set items (#4160)
- FIX-#4185: Fix rolling across column partitions (#4262)
- FIX-#4303: Fix the syntax error in reading from postgres (#4304)
- FIX-#4308: Add proper error handling in df.set_index (#4309)
- FIX-#4056: Allow an empty parse_date list in read_csv_glob (#4074)
- FIX-#4312: Fix constructing categorical frame with duplicate column names (#4313).
- FIX-#4314: Allow passing a series of dtypes to astype (#4318)
- FIX-#4310: Handle lists of lists of ints in read_csv_glob (#4319)
Performance enhancements
- FIX-#4138, FIX-#4009: remove redundant sorting in the internal '.mask()' flow (#4140)
- FIX-#4183: Stop shallow copies from creating global shared state. (#4184)
Benchmarking enhancements
- FIX-#4221: add wait method for PandasOnRayDataframeColumnPartition class (#4231)
Refactor Codebase
- REFACTOR-#3990: remove code duplication in PandasDataframePartition hierarchy (#3991)
- REFACTOR-#4229: remove unused dask_client global variable in modin\pandas\__init__.py (#4230)
- REFACTOR-#3997: remove code duplication for broadcast_apply method (#3996)
- REFACTOR-#3994: remove code duplication for get_indices function (#3995)
- REFACTOR-#4331: remove code duplication for to_pandas, to_numpy functions in QueryCompiler hierarchy (#4332)
- REFACTOR-#4213: Refactor modin/examples/tutorial/ directory (#4214)
- REFACTOR-#4206: add assert check into __init__ method of PandasOnDaskDataframePartition class (#4207)
- REFACTOR-#3900: add flake8-no-implicit-concat plugin and refactor flake8 error codes (#3901)
- REFACTOR-#4093: Refactor base to be smaller (#4220)
- REFACTOR-#4047: Rename cluster directory to cloud in examples (#4212)
- REFACTOR-#3853: interacting with Dask interface through DaskWrapper class (#3854)
- REFACTOR-#4322: Move is_reduce_fn outside of groupby_agg (#4323)
Pandas API implementations and improvements
- FEAT-#3603: add experimental read_custom_text function that can read custom line-by-line text files (#3441)
- FEAT-#979: Enable reading from SQL server (#4279)
Developer API enhancements
- FEAT-#4245: Define base interface for dataframe exchange protocol (#4246)
- FEAT-#4244: Implement dataframe exchange protocol for OmnisciOnNative execution (#4269)
- FEAT-#4144: Implement dataframe exchange protocol for pandas storage format (#4150)
- FEAT-#4342: Support `from_dataframe` for pandas storage format (#4343)
Update testing suite
- TEST-#3628: Report coverage data for test-internals CI job (#4198)
- TEST-#3938: Test tutorial notebooks in CI (#4145)
- TEST-#4153: Fix condition of running lint-commit and set of CI triggers (#4156)
- TEST-#4201: Add read_parquet, explode, tail, and various arithmetic functions to asv_bench (#4203)
Documentation improvements
- DOCS-#4077: Add release notes template to docs folder (#4078)
- DOCS-#4082: Add pdf/epub/htmlzip formats for doc builds (#4083)
- DOCS-#4168: Fix rendering the examples on troubleshooting page (#4169)
- DOCS-#4151: Add info in troubleshooting page related to Dask engine usage (#4152)
- DOCS-#4172: Refresh Intel Distribution of Modin paragraph (#4175)
- DOCS-#4173: Mention strict channel priority in conda install section (#4178)
- DOCS-#4176: Update OmniSci usage section (#4192)
- DOCS-#4027: Add GIF images and chart to Modin README demonstrating speedups (https://github.com/modin-project/m...

Contributors

paulovn, dorisjlee, and 12 other contributors

18 Mar 08:08

vnlitvinov

0.13.3

bac4031

Modin 0.13.3

This release contains a few key bugfixes and a pandas version update.

Key Features and Updates

Stability and Bugfixes
- Stop shallow dataframe copies from creating global shared state (#4184)
- Make PandasOnRayDataframeColumnPartition conformant to partition interface (#4231)
- Fix lazy metadata update for PandasDataFrame.from_labels (#4209)
- Fix Categorical() for scalar categories (#4258)
- Fix some cases when assigning a scalar to a subset of dataframe or series. (#4160)
- Align read_excel() behaviour on empty rows with pandas 1.3+ (#4161)
- Allow reading an empty parquet file. (#4075)
- Pin Dask<2022.2.0 as a temporary fix. (#4218)
- Add proper error handling in df.set_index. (#4309)
Documentation improvements
- Clarify OmniSci activation in its usage section. (#4192)
Dependencies
- Upgrade pandas to 1.4.1 (#4235)

Contributors

@mvashishtha @anmyachev @prutskov @devin-petersohn @naren-ponder @YarShev @Garra1980

10 Feb 18:43

vnlitvinov

0.13.2

ea6951c

Modin 0.13.2

This release contains documentation polishing and small user experience
improvements.

Key Features and Updates

- Mention strict channel priority in conda install section (#4178)
- Refresh Intel Distribution of Modin paragraph (#4175)
- Add info in troubleshooting page related to Dask engine usage (#4152)
- Do not print OmniSci logs to stdout by default (#4159)
- Fix rendering the examples on troubleshooting page (#4169)
- Use skipif instead of skip for compatibility with pytest 7.0 (#4163)

Contributors

@RehanSD, @YarShev, @dchigarev, @prutskov, @Garra1980

04 Feb 18:07

RehanSD

0.13.1

f2aa03f

Modin 0.13.1

This release contains a few key bugfixes and updates to the documentation.

Key Features and Updates

Stability and Bugfixes
- FIX-#4058: Allow pickling empty dataframes and series (#4095)
- FIX-#4105: Fix names of pandas options to avoid OptionError (#4109)
- FIX-#4142: Fix OmniSci enabling (#4146)
Documentation improvements
- DOCS-#4082: Add pdf/epub/htmlzip formats for doc builds (#4083)
- DOCS-#4079: Fix link to PandasDataframe in docs (#4108)

Contributors

@prutskov, @paulovn, @YarShev, @RehanSD, @devin-petersohn,
@mvashishtha

27 Jan 01:08

RehanSD

0.13.0

8743203

Modin 0.13.0

This release contains significant upgrades to Modin's documentation,
support for pandas 1.4, new algebra and partitioning layer APIs, and some bugfixes.

Key Features and Updates

Stability and bugfixes
- Support for subscripting Resampler (1a1edfd)
- Fix groupby with column name for by (a04d7b7)
- Workaround for groupby with sort=False with categorical keys (c67a7c5)
- Align default value of REDIS_PASSWORD with Ray's DEFAULT_REDIS_PASSWORD (f79cb85)
- Fix groupby dictionary aggregation when by and columns to aggregate overlap (d42c070)
- Fix read_csv when callables are provided for the skiprows parameter (7c84758)
- Ensure address is not passed to ray.init when running Ray in local mode (02a23d4)
- Ensure that groupby.indices returns positional indices (e9c06f2)
- Fix setting of categorical values (0e36e22)
- Ensure df.__getitem__ respects step attribute of slice (7e85c5d)
- Ensure data argument is delivered to the Dataframe in experimental cloud mode (2f7da1f)
- Fix assigning to a Series with a single item (0d9d14e)
- Fix the default to pandas in pd.DataFrame.sparse.from_spmatrix (ab2855b)
- Fix apply result type inference (ac17ca1)
- Exclude "scripts" from setup package (6224aba)
- Fix assigning a Categorical to a column (cb4e727)
- Ensure df.to_csv propagates metadata (e.g. index) (154697b)
- Update pyarrow requirement in environment files (b55b08d)
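In pandas, `GroupBy.indices` maps each group key to the integer positions of its rows rather than to their index labels (pandas returns NumPy arrays for these). The difference only shows up when the index is not the default `0..n-1` range. The stdlib-only sketch below uses hypothetical helpers and plain lists to contrast the two mappings; it is not Modin's implementation.

```python
from collections import defaultdict


def positional_group_indices(keys):
    """Map each group key to the 0-based positions of the rows that carry it."""
    groups = defaultdict(list)
    for position, key in enumerate(keys):
        groups[key].append(position)
    return dict(groups)


def label_group_indices(labels, keys):
    """For contrast: the same grouping expressed with row labels instead of positions."""
    groups = defaultdict(list)
    for label, key in zip(labels, keys):
        groups[key].append(label)
    return dict(groups)


row_labels = [10, 20, 30, 40]  # a non-default index
group_keys = ["a", "b", "a", "b"]

print(positional_group_indices(group_keys))          # {'a': [0, 2], 'b': [1, 3]}
print(label_group_indices(row_labels, group_keys))   # {'a': [10, 30], 'b': [20, 40]}
```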
Performance enhancements
- Optimize __getitem__ flow for .loc/.iloc (0947ee8)
- Delay instantiation of lazy dtypes on transpose (cd8db0c)
Benchmarking enhancements
- Update benchmarks for groupby that are more representative (0582aa2)
Refactor Codebase
- Update CODEOWNERS to reflect repository after refactor (cde6390)
- Remove duplicate import of FactoryDispatcher in Modin experimental pandas IO (2cfabaf)
- Update Modin to incorporate dataframe algebra (58bbcc3)
Pandas API implementations and improvements
- Add support for storage_options argument to read_csv_glob (7c33afe)
- Add support for dropna argument for groupby.indices and groupby.groups (144a613)
- Ensure relabeling Modin Frame does not lose partition shape (3c740db)
- Update Series.values to default to to_numpy() (67228ef)
- Add support for modin.pandas.show_versions and python -m modin --versions (efe717f)
- Upgrade pandas support to 1.4 (39fbc57)
OmniSci enhancements
- Update benchmarks for groupby that are more representative (9396f23)
- Update documentation on Native + OmniSci (edc1608)
- Add support for getArrowTable() (6882ec2)
- Fix segfault during init when only OmniSci is present (8c8a6a3)
- Optimize append with default arguments (67013f9)
- Fix OmniSci engine enabling for IO functions (9d1a334)
XGBoost enhancements
Developer API enhancements
- Add parameter for minimum partition size (1be66d1)
- Improve documentation for read_csv_glob and ensure warning raised if wildcard not in filepath_or_buffer (be10ba9)
- Expand virtual partitioning utility (8d1004f)
Update testing suite
Documentation improvements
- Improve documentation on pandas on Ray execution (b76dc57)
- Reformat documentation to match pandas documentation theme (cc96f5d)
- Improve documentation on pandas on Python execution (d590de0)
- Improve System view in architecture documentation (6d51921)
- Improve documentation on using pandas on Dask (003f338)
- Improve documentation on pandas on Dask execution (61bf043)
- Add documentation on using pandas on Python (195b668)
- Improve Modin Out of Core documentation (cf426c4)
- Improve documentation on OmniSci on native execution (689faee)
- Improve documentation on IO (ffa67c7)
- Add documentation on factories and parsers (6ca66db)
- Improve documentation for experimental pandas on Ray execution (20abddd)
- Improve documentation for modin.core.dataframe.base and modin.core.dataframe.pandas (cf1e541)
- Update troubleshooting documentation and add FAQs (cc95ae2)
- Improve README introduction and installation sections (a632d1f)
- Update copyright year (7da1dc8)
- Update a link to pandas.read_json (0315823)
- Improve documentation for Modin vs. Dask (34732cb)
- Fix links to the contributing page (81a06d6)
- Remove broken links from supported apis (c04502d)
- Change docs copyright statement to 'Modin Developers' (ed2a7a4)
- Rename Developer page to Development in docs (406af7c)
- Improve "Getting Started" section (4a62bba)
- Update Modin tutorials (76707bf)
- Add back quickstart notebook (4dd97ab)
- Fix links in README and update README and FAQs (5d84042)
- Update Modin module layout in architecture docs (7fcafa7)
- Update documentation with new algebra operators and ModinDataframe (4b70725)
- Add usage guide to documentation (4511566)
- Build docs with Python 3.8 (01c1876)
Dependencies
- Update PyArrow to 6.0 and OmniSci to 5.10.1 (018515f)

Contributors

@anmyachev, @prutskov, @Rubtsowa, @vnlitvinov, @dchigarev, @YarShev, @amyskov,
@mvashishtha, @dorisjlee, @devin-petersohn, @jeffreykennethli, @RehanSD,
@novichkovg, @Lozovskii-Aleksandr, @naren-ponder, @ahallermed, @fexolm,
@adityagp, @susmitpy, @ienkovich

19 Dec 03:35

devin-petersohn

0.12.1

34962ec

Modin 0.12.1

This release contains an update to the pandas version and a few bugfixes.

Key Features and Updates
------------------------
* Update supported pandas version to 1.3.5 (b79989a)
* Improvements to groupby
  * Fix `groupby` for the case when `by` is `None` (40d45c8)
  * Fix handling of dictionary aggregation (29f927b)
  * Return positional indices for Groupby property (c66324d)
* Fix slicing dataframes with `step` property (5651844)
* Fix assignment of data to category column (23dd3f8)
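The slicing fix concerns honoring a slice's `step`. Python's built-in `slice.indices(length)` resolves a slice, including omitted or negative bounds, into concrete `(start, stop, step)` values for a sequence of a given length. The hypothetical helper below is a sketch of translating a slice into row positions without dropping the step; it is not Modin's code.

```python
def positions_for_slice(slc: slice, length: int) -> list:
    """Return the row positions selected by `slc` on a sequence of `length` rows."""
    # slice.indices normalizes None/negative bounds and keeps the step.
    start, stop, step = slc.indices(length)
    return list(range(start, stop, step))


print(positions_for_slice(slice(1, None, 3), 8))      # [1, 4, 7]
print(positions_for_slice(slice(None, None, -2), 5))  # [4, 2, 0]
```

Keeping the step this way makes positional selection agree with plain list slicing, e.g. `list(range(8))[1::3]`; ignoring it would select every row from position 1 onward.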

Contributors this release
-------------------------

@Rubtsowa, @prutskov, @dchigarev, @amyskov, @vnlitvinov, @mvashishtha,
@YarShev, @devin-petersohn

24 Nov 01:48

RehanSD

0.12.0

054e7fb

Modin 0.12.0

This release contains a refactor of the codebase that significantly improves
the maintainability of the code, and a plethora of bugfixes.

This release also introduces a Slack community for Modin users to interact
with Modin developers. Please join us at our [Slack](https://modin.org/slack.html)
to continue the conversation!

Key Features and Updates
------------------------
* Stability and bugfixes
  * Support allowing callables and scalars together in .loc/.iloc (25ea7fd)
  * Ensure .loc with slice and scalar column returns Series (9492878)
  * Fix Modin OmniSci Docker example (b853c51)
  * Ensure Modin OmniSci + Modin Ray Docker containers install packages from conda-forge (032afd6)
  * Determine return type (Series or DataFrame) from one element Series (17ad1f0)
  * Update cloud examples (648b6a0)
  * Fix Modin OmniSci memory leak during `read_csv` (8581ba1)
  * Use `floor` for casting `float` to `int` for OmniSci 5.8.0 (c67a936)
  * Fix .loc on empty DataFrame (2260431)
  * Ensure Modin on Ray does not duplicate writes to disk on `to_csv` when workers die (6178a57)
  * Add support for `storage_options` argument in `read_*` functions except `read_excel` (77a00cc)
  * Ensure Modin Ray correctly raises exceptions when `to_parquet` or `to_csv` fail (8d67cd3)
  * Ensure Modin Ray does not hang when workers crash on `to_csv` (73bf061)
  * Remove platform specific code from `setup.py` to ensure distributions are pure Python (b186e40)
* Refactor Codebase
  * Update import of public index classes to import from `pandas.core.indexes.api` module (488357a)
  * Replace `try...finally` with pytest fixtures (c349a94)
  * Restructure project files (b37bcf8)
  * Use `fsspec` to open files (b8a9c07)
  * Add LGTM Service to CI (b193fef)
  * Remove extraneous `*NUM_THREADS` environment variables from CI (b925625)
  * Update documentation + code + comment language to reflect new project structure (7a81588)
  * Update language to reflect new project structure and add implementation to BaseDataframeAxisPartition (7ab2d90)
  * De-dupe `read_fwf` and `read_csv` code (2f824f8)
  * Reformat entire codebase with `black` and `flake8` (75f698c)
* Pandas API implementations and improvements
  * Add support for `{true|false}_values` for `read_csv` for Modin OmniSci (9cd93f2)
  * Implement `explode` for Series and DataFrame (ddd4afe)
  * Support reading gzipped fwf (a80cb3b)
  * Add support for `to_parquet` Modin Ray (643596d)
  * Add support for creating an `sqlalchemy` connection with arbitrary arguments (ece98a6, 4a42e04)
  * Add support for `set_index` with different input types (cab37f2)
* XGBoost enhancements
  * Support new DMatrix parameters (4d7f6d4)
* Developer API enhancements
  * Throw custom errors when optional dependencies are missing (53bb047)
  * Improve Modin OmniSci quickstart (167957b)
* Update testing suite
* Documentation improvements
* Dependencies
  * Add fsspec (dependency for IO) to dependencies (44e3f10)
  * Make `botocore` import optional (adc15c6)
  * Pin minimum `s3fs` dependency to fix `aiobotocore` issue (8acad95)
  * Update PyArrow to 5.0 and OmniSci to 5.8 (4121358)
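The Developer API item above mentions throwing custom errors when optional dependencies are missing. One common way to do this is to probe for the module with `importlib.util.find_spec` before importing it. The sketch below uses a hypothetical `MissingOptionalDependency` exception and `require_optional` helper; the names and message format are illustrative assumptions, not Modin's actual error classes.

```python
import importlib
import importlib.util


class MissingOptionalDependency(ImportError):
    """Hypothetical error raised when an optional package is not installed."""


def require_optional(module_name: str, feature: str):
    """Import and return top-level module `module_name`, or raise a descriptive error."""
    if importlib.util.find_spec(module_name) is None:
        raise MissingOptionalDependency(
            f"{feature} requires the optional package '{module_name}'; "
            f"install it with: pip install {module_name}"
        )
    return importlib.import_module(module_name)


json_module = require_optional("json", "JSON export")  # stdlib module, always present

message = ""
try:
    require_optional("surely_not_an_installed_package_xyz", "an experimental reader")
except MissingOptionalDependency as err:
    message = str(err)
print(message)
```

Subclassing `ImportError` keeps existing `except ImportError` handlers working while giving users an actionable message.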

Contributors
------------
@ienkovich, @vnlitvinov, @mvashishtha, @devin-petersohn, @dchigarev, @prutskov, @amyskov,
@gshimansky, @anmyachev, @YarShev, @Garra1980, @Rubtsowa, @jeffreykennethli, @RehanSD,
@dorisjlee, @naren-ponder
