Releases: GridTools/gridtools
GridTools version 2.1.0
New features
- Dump backend: outputs a json representation of the stencil specification (#1456)
- Reduction library with naive, CPU and GPU backends (#1590, #1594, #1619)
- SID: Python cuda array interface support (#1596)
Extended features
- Support for compile time length in data stores (#1545)
- Several SID improvements (#1548)
- Structured bindings support for gridtools tuple-like (#1556)
- Improvements for Hugepage Allocation (#1562)
- Add protection against misuse of device namespace (#1581)
- fortran_array_view: allow to disable openacc (#1603)
- Introduce sid::unknown_kind (#1605)
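To illustrate what structured bindings support for a tuple-like type (#1556) means at the language level, here is a minimal C++17 sketch. The class `demo::point3` and the function `sum_components` are hypothetical and not part of GridTools. The sketch only shows the standard tuple-like protocol (`std::tuple_size`, `std::tuple_element` and a `get<I>()` member) that lets `auto [a, b, c] = x;` compile for a user-defined type.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <type_traits>

namespace demo {
    // Hypothetical tuple-like class with private members (so it is not an
    // aggregate and needs the tuple protocol to be destructured).
    class point3 {
        int m_i;
        int m_j;
        double m_k;

      public:
        constexpr point3(int i, int j, double k) : m_i(i), m_j(j), m_k(k) {}

        // Member get<I>() is found by the structured binding machinery.
        template <std::size_t I>
        constexpr auto get() const {
            static_assert(I < 3, "index out of range");
            if constexpr (I == 0)
                return m_i;
            else if constexpr (I == 1)
                return m_j;
            else
                return m_k;
        }
    };
} // namespace demo

// Specializations that declare point3 as a tuple-like type of size 3.
namespace std {
    template <>
    struct tuple_size<demo::point3> : integral_constant<size_t, 3> {};
    template <>
    struct tuple_element<0, demo::point3> { using type = int; };
    template <>
    struct tuple_element<1, demo::point3> { using type = int; };
    template <>
    struct tuple_element<2, demo::point3> { using type = double; };
} // namespace std

// Destructures a point3 with a structured binding and sums the parts.
inline double sum_components(demo::point3 const &p) {
    auto [i, j, k] = p;
    assert(i == p.get<0>() && j == p.get<1>());
    return i + j + k;
}
```

With these specializations in place, `sum_components(demo::point3(1, 2, 0.5))` unpacks the three components without any explicit `get` calls at the call site.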
Non-functional changes
- Hold the sids within sid::composite as tuple (#1564)
- Various cleanups and c++17 related changes (#1579)
- C++17 versions of meta::fold (#1549)
- Sid as a proper C++20 concept (#1580, #1582)
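As a rough illustration of how a type-level fold can be expressed with C++17 fold expressions, the sketch below folds a binary metafunction over a type list. All names (`list`, `acc`, `item`, `foldl`, `count_floating`) are hypothetical and chosen for this example. This is not the GridTools `meta::fold` implementation, only one well-known way to write the idea.

```cpp
#include <type_traits>

// Hypothetical type list.
template <class...>
struct list {};

// Accumulator: carries the binary metafunction F and the current state.
template <template <class, class> class F, class State>
struct acc {
    using type = State;
};

// Wrapper so that arbitrary (even incomplete) types can appear as operands.
template <class T>
struct item {};

// Declared only: it is used inside decltype, never called at run time.
template <template <class, class> class F, class State, class T>
acc<F, typename F<State, T>::type> operator+(acc<F, State>, item<T>);

template <template <class, class> class F, class Init, class List>
struct foldl;

// Left fold: ((Init + T0) + T1) + ... expressed as a binary fold expression.
template <template <class, class> class F, class Init, class... Ts>
struct foldl<F, Init, list<Ts...>> {
    using type = typename decltype((acc<F, Init>{} + ... + item<Ts>{}))::type;
};

// Example metafunction: counts the floating-point types seen so far.
template <class Count, class T>
struct count_floating {
    using type = std::integral_constant<int,
        Count::value + (std::is_floating_point<T>::value ? 1 : 0)>;
};

using zero = std::integral_constant<int, 0>;
using n_floating = foldl<count_floating, zero, list<int, float, double, char>>::type;
```

The fold expression replaces a recursive template instantiation chain with a single pack expansion, which is the usual motivation for a C++17 variant of such a utility.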
Performance
- More Inlining in cpu_kfirst Backend (#1634)
- Support for Compile-Time Unit Stride Dimension for Python SID Adapter (#1635, #1651)
Bug fixes
- K-cache fixes (#1530)
- CMake: Fix storage_gpu for HIPCC-AMDGPU (#1540)
- Remove a hugepage_alloc warning about a problem that only affects testing code (#1560)
- Improve HIP + OpenMP Compilation (#1578)
- Fix empty composite and add composite::make helper (#1583)
- Fix as_const to work with any SID and be compatible with std::as_const (#1601, #1611)
- SID composite: add static_assert against incorrect kinds (#1604)
- Workaround a CUDA problem: tuple_util::concat remove constexpr var (#1606)
- Improve Compliance with Parallel Model: Limit fusion of k-parallel execution with k-offsets (#1612)
- GCC 9.x: Optimize multishift (#1630)
- Python SID adapter: fix integer format check (#1632)
- GCC 11.x: Compilation fixes (#1641, #1646)
- Fixes for CUDA 11.4 (#1644)
Testing
- Update to GTest v1.11 and minor changes to adapt for changed gtest interface (#1655)
Documentation
- Clarifications to the execution model (#1541)
Contributions
This release contains contributions from
@anstaf, @fthaler, @havogt, @lukasm91.
GridTools version 1.1.4
GridTools version 2.0.0
GridTools v2.0.0
GridTools v2.0.0 comes with an improved API for stencil composition and storage construction.
These changes and a few others (see below) are breaking changes.
Changes since v1.1.0
New API: Stencil Composition
The make_computation API for composing stencils is replaced by a new stencil specification API, e.g.
auto horizontal_diffusion_spec = [](auto coeff, auto in, auto out) {
    GT_DECLARE_TMP(double, lap, flx, fly);
    return st::execute_parallel()
        .ij_cached(lap, flx, fly)
        .stage(lap_function(), lap, in)
        .stage(flx_function(), flx, in, lap)
        .stage(fly_function(), fly, in, lap)
        .stage(out_function(), out, in, flx, fly, coeff);
};
st::run(horizontal_diffusion_spec, stencil_backend_t(), grid, coeff, in, out);
instead of
auto horizontal_diffusion = gt::make_computation<backend_t>(grid,
    p_coeff{} = coeff,
    gt::make_multistage(gt::enumtype::execute<gt::enumtype::parallel, 20>{},
        define_caches(gt::cache<gt::IJ, gt::cache_io_policy::local>(p_lap{}, p_flx{}, p_fly{})),
        gt::make_stage<lap_function>(p_lap{}, p_in{}),
        gt::make_independent(gt::make_stage<flx_function>(p_flx{}, p_in{}, p_lap{}),
            gt::make_stage<fly_function>(p_fly{}, p_in{}, p_lap{})),
        gt::make_stage<out_function>(p_out{}, p_in{}, p_flx{}, p_fly{}, p_coeff{})));
horizontal_diffusion.run(p_in{} = in, p_out{} = out);
See the documentation and examples for details about the new API.
Related PRs: #1388
New API: Storage Builder
Datastores are now created using a builder API, e.g.
auto storage_builder = gt::storage::builder<storage_traits_t>.dimensions(d1, d2, d3).halos(halo, halo, 0);
auto in = storage_builder.type<double const>().value(42).build();
auto coeff = storage_builder.type<double const>().value(42).build();
auto out = storage_builder.type<double>().build();
The type returned by the builder is a shared_ptr of a data_store (previously the shared_ptr was inside the data_store).
Other storage related changes:
- Memory alignment is applied in bytes (instead of in elements).
- Host/device buffers are automatically synchronized on creation of views or on access of the underlying pointer (the sync method is removed).
See the documentation and examples for details about the new API.
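The change from element-based to byte-based alignment can make a large difference in padding. The following sketch compares the two interpretations for a single padded dimension. The helper names (`round_up`, `padded_length_bytes`, `padded_length_elements`) are hypothetical, and the padding rule itself is an assumption made for illustration, not the exact rule used inside the GridTools storage.

```cpp
#include <cassert>
#include <cstddef>

// Rounds n up to the next multiple of `multiple` (hypothetical helper).
inline std::size_t round_up(std::size_t n, std::size_t multiple) {
    assert(multiple != 0);
    return (n + multiple - 1) / multiple * multiple;
}

// Padded length in elements when the alignment is specified in bytes.
// Assumes the alignment is a multiple of the element size.
template <class T>
std::size_t padded_length_bytes(std::size_t length, std::size_t alignment_bytes) {
    assert(alignment_bytes % sizeof(T) == 0);
    return round_up(length * sizeof(T), alignment_bytes) / sizeof(T);
}

// Padded length in elements when the alignment is specified in elements.
inline std::size_t padded_length_elements(std::size_t length, std::size_t alignment_elements) {
    return round_up(length, alignment_elements);
}
```

For a row of 10 doubles (8 bytes each on common platforms), an alignment value of 64 pads the row to 16 elements when read as bytes, but to 64 elements when read as element counts. Code that passed an element count as alignment before v2.0.0 would therefore need to be revisited.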
API break: New Backend names
Our backend names (cuda, mc, x86) were a source of confusion, as users had a certain (but wrong) idea of, e.g., when to use x86.
The new names are (#1490):
- gpu instead of cuda, as the same backend works for HIP.
- cpu_kfirst instead of x86: the innermost dimension is k, suitable for vertical stencils and architectures that emphasize caches over vector instructions.
- cpu_ifirst instead of mc: the innermost dimension is i, suitable for modern CPUs where vector instructions are key for performance.
Additionally, we introduced a new backend gpu_horizontal (#1445), which works only for pure horizontal (parallel) stencils.
Performance of gpu_horizontal is improved over gpu for most stencils; however, we recommend benchmarking both backends.
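What "innermost dimension" means for cpu_kfirst and cpu_ifirst can be sketched with plain stride arithmetic. The `layout` struct and the two factory functions below are hypothetical. Only the unit-stride dimension (k or i) follows the description in these notes. The ordering of the remaining two dimensions is an assumption made for the example and may differ from the actual backend layouts.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical 3D layout: sizes and strides, both in elements.
struct layout {
    std::size_t ni, nj, nk; // extents
    std::size_t si, sj, sk; // strides

    // Linear offset of element (i, j, k).
    std::size_t offset(std::size_t i, std::size_t j, std::size_t k) const {
        assert(i < ni && j < nj && k < nk);
        return i * si + j * sj + k * sk;
    }
};

// k has unit stride: neighbours in k are adjacent in memory.
inline layout k_innermost(std::size_t ni, std::size_t nj, std::size_t nk) {
    return {ni, nj, nk, nj * nk, nk, 1};
}

// i has unit stride: neighbours in i are adjacent in memory.
inline layout i_innermost(std::size_t ni, std::size_t nj, std::size_t nk) {
    return {ni, nj, nk, 1, ni, ni * nj};
}
```

With a 4 x 5 x 6 field, stepping by one in k moves 1 element in the k-innermost layout but 20 elements in the i-innermost layout. This is why vertical (k-direction) sweeps favour the former and vectorization along i favours the latter.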
Other API breaking changes
- Backend declarations (traits) are removed from common/defs.hpp and are now provided in component-specific headers for stencil, timer, gcl and storage (#1388).
- We improved the code structure by introducing finer-grained namespaces (#1388)
- The storage repository was removed (#1456)
New functionality
- New sid::rename_dimensions (#1533)
- New regression test illustrating c-arrays as SIDs (#1525)
- A Python SID adapter including regression test for calling computations from Python (#1523)
- Introduced the threadpool concept (#1484, #1498, #1504) and added an HPX threadpool (#1437)
- Added an example for calling CUDA GridTools computations from Fortran with OpenACC (#1454)
Improved functionality
- GCL is now header-only (so all of GridTools is now header-only)
- The CMake build scripts are rewritten, see the documentation and examples for how to use GridTools CMake targets (#1421, #1441, #1442, #1450, #1509)
Bug Fixes / Cleanup
- Fixes to SID concept helpers (#1524, #1527, #1531)
- Fixes for CUDA 11 (#1529), thanks @lukasm91
- Fixes for HIP compilation (#1488)
- Better error diagnostics at the frontend (#1495)
- Performance tests are now included in a single binary (#1453)
- Layout transformations are refactored (#1388)
- and many other small fixes
Infrastructure/Development
- Environments are renamed to describe more precisely what they are (#1507)
- Added testing on the new MeteoSwiss machine Tsa to Jenkins (#1452)
- Moved tests from Travis to GitHub actions (#1446), added tests for different CMake setups (#1443).
- Added a Gitpod configuration (#1423)
- Added testing with Clang-based Cray compiler on Daint (#1382)
Contributions
This release contains contributions from
@anstaf, @fthaler, @havogt, @jdahm, @lukasm91, @mbianco, @tehrengruber, @wdeconinck.
GridTools version 2.0.0rc2
see final release
GridTools version 2.0.0rc1
see final release
GridTools version 1.1.3
GridTools version 1.0.4
Fixes
- CMake: support for superbuilds (nesting gridtools with add_subdirectory/FetchContent) #1383
GridTools version 1.1.2
GridTools version 1.1.1
Fixes
- Make computation API thread compatible by making the allocator thread_local (#1380).
- CMake: fix to make GridTools work as nested project in a "superbuild" setup.
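The effect of making an allocator thread_local (#1380) can be shown with a small sketch. The `scratch_allocator` class and the functions `get_allocator` and `allocations_per_thread` are hypothetical stand-ins, not GridTools code. The point is only that a function-local `thread_local` object gives every thread its own instance, so concurrent callers do not share mutable allocator state.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical scratch allocator with mutable, non-synchronized state.
class scratch_allocator {
    std::vector<char> m_buffer;
    std::size_t m_allocations = 0;

  public:
    char *allocate(std::size_t n) {
        assert(n > 0);
        m_buffer.resize(n);
        ++m_allocations;
        return m_buffer.data();
    }
    std::size_t allocations() const { return m_allocations; }
};

// One allocator instance per thread, created on first use in that thread.
inline scratch_allocator &get_allocator() {
    thread_local scratch_allocator instance;
    return instance;
}

// Starts n_threads threads that each allocate once and records how many
// allocations each thread's own allocator has seen.
inline std::vector<std::size_t> allocations_per_thread(std::size_t n_threads) {
    std::vector<std::size_t> counts(n_threads);
    std::vector<std::thread> threads;
    for (std::size_t i = 0; i < n_threads; ++i)
        threads.emplace_back([&counts, i] {
            get_allocator().allocate(64);
            counts[i] = get_allocator().allocations();
        });
    for (auto &t : threads)
        t.join();
    return counts;
}
```

Every thread reports exactly one allocation, because none of them sees the allocator of another thread. With a single shared (static) instance, the counter and buffer would be modified concurrently without synchronization.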
GridTools version 1.1.0
GridTools
In GridTools v1.1.0 we set the default C++ standard to C++14 and drop compatibility with C++11. This requires at least CUDA 9.0.
Changes since v1.0.0
Full introduction of the SID concept
The backend is completely restructured based on the SID (stencil iterable data) concept. There should be no user-facing changes as long as user code was only using the documented public API (*). The changes separate the backend implementation from the core library to allow non-intrusive extension of the library with new backends. Additionally, maintainability of the GridTools infrastructure is significantly improved.
Performance should be improved in general, but might be worse for specific computations. A common pattern for performance improvement/degradation is not observed.
(*) There is one change which might trigger different behavior (though the old behavior was not documented): temporary fields are now implicitly 3-dimensional. Prior to this version, the user could have abused a 2D temporary field for accumulating values between k-levels.
New
- New example illustrating the type-erasure pattern for computations. #1318
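For readers unfamiliar with the type-erasure pattern mentioned above, here is a minimal, self-contained sketch of the idea using `std::function`. It does not reproduce the GridTools example from #1318. The backend tags, the `scale` functions and `make_scale_computation` are hypothetical stand-ins for a backend-templated computation.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

using field = std::vector<double>;

// Hypothetical backend tags selecting an implementation at compile time.
struct serial_backend {};
struct reversed_backend {};

// Forward loop implementation.
inline void scale(serial_backend, double factor, field const &in, field &out) {
    assert(in.size() == out.size());
    for (std::size_t i = 0; i < in.size(); ++i)
        out[i] = factor * in[i];
}

// Same result, different iteration order.
inline void scale(reversed_backend, double factor, field const &in, field &out) {
    assert(in.size() == out.size());
    for (std::size_t i = in.size(); i-- > 0;)
        out[i] = factor * in[i];
}

// Type-erased handle: the backend type no longer appears in the signature.
using computation = std::function<void(field const &, field &)>;

// The template is instantiated here; callers only see `computation`.
template <class Backend>
computation make_scale_computation(Backend backend, double factor) {
    return [backend, factor](field const &in, field &out) {
        scale(backend, factor, in, out);
    };
}
```

Code that stores or calls a `computation` does not need to be a template and does not need to include the backend-specific headers. This typically reduces compile times and allows the backend to be chosen at run time, at the cost of one indirect call per run.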
Deprecation (support will be removed in GridTools v2.0.0)
- Using the gridtools::c_bindings is deprecated. Switch to the standalone https://github.com/GridTools/cpp_bindgen.
- global_accessor is deprecated; use in_accessor (without extents) instead.
- make_global_parameter with backend as template parameter is deprecated. The backend is not needed anymore.
Fixes / Cleanup
- Fix performance for CUDA 9.2 / 10.0 #1281 #1327 #1339
- Use c++14 features. #1307
- Use multiple threads in storage initialization. #1300
- Remove dependency on boost::mpl and boost::fusion
- Fixes required to compile gridtools with HIP-Clang. Full support for AMD GPUs via HIP-Clang will come in a future release. #1363
- Fix a bug in communication #1355.
- The global_parameter doesn't require pre-allocated storage (as it is now passed via constant memory in case of CUDA); therefore, global_parameter is a lightweight wrapper around the value type, which can be created without overhead, e.g. when passing it to computation.run().