Releases: GridTools/gridtools
GridTools version 2.3.5
GridTools version 2.3.4
changes since v2.3.2 (v2.3.3 removed because it introduced a breaking change in the nanobind adapter)
Performance improvements
- Introduce GT_PROMISE for __builtin_assume by @havogt, @iomaganaris in #1785, #1788
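These notes do not show how GT_PROMISE is defined. As a rough sketch of the general idea, a promise macro can forward a condition to the compiler's assume intrinsic so the optimizer may rely on it. The macro name MY_PROMISE, the compiler dispatch and the function sum4 below are illustrative assumptions, not the GridTools definition.

```cpp
#include <cstddef>

// Hypothetical macro, not the GridTools definition of GT_PROMISE.
#if defined(__clang__)
#define MY_PROMISE(cond) __builtin_assume(cond)
#elif defined(__GNUC__)
// GCC has no __builtin_assume; marking the false branch unreachable has a similar effect.
#define MY_PROMISE(cond) ((cond) ? static_cast<void>(0) : __builtin_unreachable())
#else
// Unknown compiler: the promise becomes a no-op.
#define MY_PROMISE(cond) static_cast<void>(0)
#endif

// Sums n values. The caller promises that n is a positive multiple of 4,
// which may let the optimizer drop remainder handling when vectorizing.
inline float sum4(float const *data, std::size_t n) {
    MY_PROMISE(n > 0 && n % 4 == 0);
    float acc = 0;
    for (std::size_t i = 0; i < n; ++i)
        acc += data[i];
    return acc;
}
```

Breaking such a promise at run time is undefined behavior, so the condition has to be guaranteed by the caller.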
Bug fixes
- Bug: storage/gpu.h functions within CUDA_ARCH by @havogt in #1778
- Update nanobind to v2 by @havogt in #1777, #1790
- Update minimum required boost to 1.73 by @havogt in #1772
Tests
- Add Missing fn_unstructured_nabla_fused_tuple_of_fields to Regression & Performance Tests by @fthaler in #1783
- Improved Data Layout for Neighbor Tables by @fthaler in #1782
CI / Deployment
- test NVHPC 23.9 by @havogt in #1769
- build: Update deployment action with trusted publisher by @havogt in #1770
- build: Add download wheel step in deployment action by @havogt in #1771
- CI: test CUDA 12.4 by @havogt in #1773
- GitHub actions: Update compiler versions in configure test by @havogt in #1760
- Move to CUDA 11.2 for daint nvcc gcc by @havogt in #1786
GridTools version 2.3.2
GridTools version 2.3.1
GridTools version 2.3.0
Support for NVHPC (#1747)
GridTools now supports NVHPC starting from release 23.3!
Parallel fn::backend::naive (#1746)
Naive (just parallel for, no blocking and other optimizations) OpenMP parallelization of the naive backend.
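As a rough illustration of what "just parallel for" means, the sketch below puts a single OpenMP pragma on the outer loop of a point-wise operation, with no blocking or tiling. The function name scale_naive and the data layout are hypothetical and are not taken from fn::backend::naive.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical example: doubles every element of an ni x nj field stored row by row.
// When compiled without OpenMP support, the pragma is ignored and the loop runs serially.
inline void scale_naive(std::vector<double> &field, std::size_t ni, std::size_t nj) {
#pragma omp parallel for
    for (long i = 0; i < static_cast<long>(ni); ++i)
        for (std::size_t j = 0; j < nj; ++j)
            field[static_cast<std::size_t>(i) * nj + j] *= 2.0;
}
```

Each outer iteration writes to a distinct row, so the iterations are independent and need no synchronization.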
SID util to transform a dimension to a tuple_like element type (#1750)
Translates a SID with dimension D and element type T to a SID with D removed and a tuple<T>-like element type with tuple_size N, for sid::dimension_to_tuple_like<D, N>(sid).
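The transformation can be pictured with plain C++ containers. The sketch below is only an analogy and does not use the SID concept or the GridTools API: field2d and element_without_d are hypothetical names, and the element is returned as a copy for simplicity.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a field with two dimensions: points and D.
struct field2d {
    std::vector<double> data;
    std::size_t stride_point; // distance between consecutive points
    std::size_t stride_d;     // distance between consecutive entries along D
};

// Reads the N entries along D at one point and returns them as one tuple-like
// element, so the caller only iterates over the remaining dimension.
template <std::size_t N>
std::array<double, N> element_without_d(field2d const &f, std::size_t point) {
    std::array<double, N> res{};
    for (std::size_t k = 0; k < N; ++k)
        res[k] = f.data[point * f.stride_point + k * f.stride_d];
    return res;
}
```

Here std::array plays the role of the tuple-like element type, since std::tuple_size and std::get are defined for it.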
Bug fixes and smaller features
Build fixes
- Support for Clang 16 (#1751)
and other changes already included in v2.2.3
fn: SID neighbor table wrapper (#1730)
Adds a simple class that wraps a SID and implements the neighbour table concept. (Picked for convenience into 2.2.2.)
Support for Python packaging (#1720)
Starting with this release we will publish GridTools C++ on pypi.org to make it easier to consume GridTools C++ from GT4Py.
Bug fixes
- Fix CUDA 12.0 compilation (#1741)
- Improvements to Python packaging (#1742, #1743, #1744)
- Fix get_keys of empty hymap (#1728)
- fn: CUDA early exit on empty grid - an empty domain skips execution instead of erroring (#1729)
- fn: prefer qualified names over ADL for fn builtins (they are not customization points for the user) (#1731, #1732)
- Enable workarounds for CUDA 11.8 (#1734)
- Enable workarounds for Clang 15 (#1735)
- Update pybind11 version to fix wrong C++ standard (#1723)
- Fix perfect forwarding in sid::composite::make_values (#1722)
- Workaround for NVCC bug in gcl (present in 11.6, 11.7 and most likely in 11.8) (#1726)
Performance fixes
- Alternative skip value check in fn, which improves CUDA performance (#1721)
Build fixes
- Fix perftests CMake target when no tests are added (#1724)
Cleanup
- Replace boost::variant by std::variant (#1718)
CI
Contributions
This release contains contributions from
@DropD, @egparedes, @fthaler, @havogt, @petiaccja
GridTools version 2.0.1
Bug fixes
- Fix: storage_gpu for HIPCC-AMDGPU (#1540)
- Performance fix for C++17 (#1618)
- Enable several CUDA workarounds for recent compilers (#1681 and others)
- Some declarations to definitions
- Workaround gtest incompatibilities in recent compilers
CI
- Compile (and run) all tests on GitHub actions (Jenkins doesn't run on v2.0.x anymore)
GridTools version 2.2.3
GridTools version 2.2.2
fn: SID neighbor table wrapper (#1730)
Adds a simple class that wraps a SID and implements the neighbour table concept. (Picked for convenience into 2.2.2.)
Support for Python packaging (#1720)
Starting with this release we will publish GridTools C++ on pypi.org to make it easier to consume GridTools C++ from GT4Py.
Bug fixes
- Fix get_keys of empty hymap (#1728)
- fn: CUDA early exit on empty grid - an empty domain skips execution instead of erroring (#1729)
- fn: prefer qualified names over ADL for fn builtins (they are not customization points for the user) (#1731, #1732)
- Enable workarounds for CUDA 11.8 (#1734)
- Enable workarounds for Clang 15 (#1735)
Build fixes
- Fix perftests CMake target when no tests are added (#1724)
GridTools version 2.2.1
Bug fixes
- Update pybind11 version to fix wrong C++ standard (#1723)
- Fix perfect forwarding in sid::composite::make_values (#1722)
- Workaround for NVCC bug in gcl (present in 11.6, 11.7 and most likely in 11.8) (#1726)
Performance fixes
- Alternative skip value check in fn, which improves CUDA performance (#1721)
Cleanup
- Replace boost::variant by std::variant (#1718)
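For code that followed the same migration, a minimal sketch of the std::variant style is shown below. The names value_t and describe are hypothetical and do not come from the GridTools sources.

```cpp
#include <string>
#include <type_traits>
#include <variant>

// Hypothetical variant type holding either an int or a string.
using value_t = std::variant<int, std::string>;

// std::visit takes the role of boost::apply_visitor; a generic lambda serves as the visitor.
inline std::string describe(value_t const &v) {
    return std::visit(
        [](auto const &x) -> std::string {
            if constexpr (std::is_same_v<std::decay_t<decltype(x)>, int>)
                return "int:" + std::to_string(x);
            else
                return "string:" + x;
        },
        v);
}
```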
GridTools version 2.2.0
C++ standard upgraded to C++17
Starting with this version of GridTools, we require the C++17 standard (#1680) and improved the code base using C++17 features (#1693, #1716, #1697):
- Get rid of tuple_util::make
- GT_CONSTEXPR and GT_CONSTEXPR_TARGET go away
- wstd stuff goes away
- The is_trivially_copy_constructible check is consistently used instead of is_trivially_copyable where data is passed across the host/device boundary, because it is exactly what is needed.
- The make_[smth] pattern is replaced by template argument deduction in several places; the old pattern is deprecated
- composite is rewritten using C++17
- overload is rewritten using C++17
- std::[smth]_v<...> is used instead of std::[smth]<...>::value
- static_assert(<cond>) is used instead of static_assert(<cond>, "")
- CTAD for simple_ptr_holder (#1701, #1708)
If you were using functionality from the internal library common, you might have to update your code (all of common is considered internal API, see Release process). The most common change is using CTAD instead of makers where possible. Where that is not possible due to compiler bugs, the maker pattern was updated to be independent of tuple_util::make. E.g. replace
- tuple_util::make<tuple>(...) by tuple(...)
- tuple_util::make<array>(...) by array(...)
- sid::composite::make<...>(...) by sid::composite::keys<...>::make_values(...)
- tuple_util::make<hymap::keys<...>::values>(...) by hymap::keys<...>::make_values(...)
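The maker-to-CTAD change can be illustrated with standard-library types. The sketch below uses only std::tuple and std::array, not the GridTools tuple and array, so it shows the C++17 pattern rather than the GridTools API. The function names are hypothetical.

```cpp
#include <array>
#include <tuple>
#include <type_traits>

// Pre-C++17 maker pattern: a helper function deduces the template arguments.
inline auto with_maker() { return std::make_tuple(1, 2.5); }

// C++17 CTAD: the class template deduces its own arguments from the constructor call.
inline auto with_ctad() { return std::tuple(1, 2.5); }

// Both spellings produce the same type (message-free static_assert and _v traits are C++17 too).
static_assert(std::is_same_v<decltype(with_maker()), decltype(with_ctad())>);

// CTAD also works for std::array: element type and size are deduced.
inline auto three_ints() { return std::array{1, 2, 3}; }
static_assert(std::is_same_v<decltype(three_ints()), std::array<int, 3>>);
```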
New library fn: functional model backend
The fn library provides functionality for the Declarative GT4Py to implement a backend for the functional model. It supports (naive, no-blocking) CPU and (efficient) GPU (CUDA) execution for structured (Cartesian) and unstructured grids. See examples in tests/regression/fn/.
The library provides a high-level, human-readable frontend, but is mainly meant as a target for code generators.
- Introduce functional model backend (#1648, #1666, #1679)
- Implements fn::extents (#1683)
- Column Stage (#1685)
- New Backend Backends (#1695)
- Fn Frontend (#1698)
- Performance References for FN Backends (#1711)
- Add fn::tuple_get and fn::make_tuple (#1713)
- Allow setting CUDA stream (#1712)
Minor new features
Minor improvements
- Extensions to meta and hymap (#1663)
- Soften sid value type requirements from std::trivially_copyable to std::trivially_copy_constructible (#1663)
- is_tuple_like (#1676) and is_hymap (#1677)
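To see why the softened requirement admits more value types, consider the sketch below. It uses the standard traits std::is_trivially_copyable and std::is_trivially_copy_constructible; the struct counter is a hypothetical value type, not one from GridTools.

```cpp
#include <type_traits>

// Hypothetical value type: the user-provided copy assignment makes it
// not trivially copyable, but the implicit copy constructor stays trivial.
struct counter {
    int value;
    counter &operator=(counter const &other) {
        value = other.value;
        return *this;
    }
};

// Rejected by the stricter requirement ...
static_assert(!std::is_trivially_copyable_v<counter>);
// ... but accepted by the softened one, which only needs a trivial copy construction.
static_assert(std::is_trivially_copy_constructible_v<counter>);
```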
Bug fixes
- Propagate CXX_STANDARD to all tests (#1664)
- Compilation fixes for nvcc 11.5 and clang 12 with std=c++20 (#1665)
- Workaround for nvcc bug https://godbolt.org/z/orrev1xnM (#1681)
- c_bindings example: fix typo and split cpu and gpu fortran sources (#1684)
- Fix unused param warnings (#1706)
- Fix Compilation with CUDA 11.6 (#1710)
- Support for Clang 14 (#1707)
Testing
- Add C++20 with Cray Clang on Piz Daint to Jenkins CI (#1675)
- Perftest Updates (#1690)
- CI dom: Downgrade to gcc 10.3 for CUDA toolkit support (#1699)
Contributions
This release contains contributions from
@anstaf, @fthaler, @havogt.