Releases: GridTools/gridtools

GridTools version 2.1.0

25 Oct 13:09
a8039cb

New features

  • Dump backend: outputs a JSON representation of the stencil specification (#1456)
  • Reduction library with naive, CPU and GPU backends (#1590, #1594, #1619)
  • SID: support for the Python CUDA array interface (#1596)

Extended features

  • Support for compile time length in data stores (#1545)
  • Several SID improvements (#1548)
  • Structured bindings support for GridTools tuple-like types (#1556)
  • Improvements for Hugepage Allocation (#1562)
  • Add protection against misuse of device namespace (#1581)
  • fortran_array_view: allow disabling OpenACC (#1603)
  • Introduce sid::unknown_kind (#1605)
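
As an illustration of what structured bindings support for a tuple-like type involves in standard C++17, here is a minimal sketch. The type demo::index_weight and the function weighted are hypothetical and not part of GridTools; the sketch only shows the general protocol (std::tuple_size, std::tuple_element and a get<I>() accessor), not how GridTools implements it.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <utility>

namespace demo {
    // Hypothetical tuple-like class with private members (not a GridTools type).
    class index_weight {
        int m_index;
        double m_weight;

      public:
        constexpr index_weight(int index, double weight) : m_index(index), m_weight(weight) {}

        // Element access used by structured bindings: get<0>() and get<1>().
        template <std::size_t I>
        constexpr auto get() const {
            if constexpr (I == 0)
                return m_index;
            else
                return m_weight;
        }
    };
} // namespace demo

// Opt the class into the tuple protocol of the standard library.
namespace std {
    template <>
    struct tuple_size<demo::index_weight> : integral_constant<size_t, 2> {};
    template <>
    struct tuple_element<0, demo::index_weight> {
        using type = int;
    };
    template <>
    struct tuple_element<1, demo::index_weight> {
        using type = double;
    };
} // namespace std

// Decomposes the object with a structured binding and combines the parts.
inline double weighted(demo::index_weight const &p) {
    auto [index, weight] = p;
    return index * weight;
}
```

With these specializations in place, `auto [index, weight] = p;` compiles even though the members are private, which is the kind of usage the item above enables for tuple-like types.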

Non-functional changes

  • Hold the sids within sid::composite as tuple (#1564)
  • Various cleanups and C++17-related changes (#1579)
  • C++17 versions of meta::fold (#1549)
  • Sid as a proper C++20 concept (#1580, #1582)

Performance

  • More Inlining in cpu_kfirst Backend (#1634)
  • Support for Compile-Time Unit Stride Dimension for Python SID Adapter (#1635, #1651)

Bug fixes

  • K-cache fixes (#1530)
  • CMake: Fix storage_gpu for HIPCC-AMDGPU (#1540)
  • Remove a warning in hugepage_alloc about a problem that only affects testing code (#1560)
  • Improve HIP + OpenMP Compilation (#1578)
  • Fix empty composite and add composite::make helper (#1583)
  • Fix as_const to work with any SID and be compatible with std::as_const (#1601, #1611)
  • SID composite: add static_assert against incorrect kinds (#1604)
  • Work around a CUDA problem: remove a constexpr variable in tuple_util::concat (#1606)
  • Improve Compliance with Parallel Model: Limit fusion of k-parallel execution with k-offsets (#1612)
  • GCC 9.x: Optimize multishift (#1630)
  • Python SID adapter: fix integer format check (#1632)
  • GCC 11.x: Compilation fixes (#1641, #1646)
  • Fixes for CUDA 11.4 (#1644)

Testing

  • Update to GTest v1.11 and minor changes to adapt for changed gtest interface (#1655)

Documentation

  • Clarifications to the execution model (#1541)

Contributions

This release contains contributions from
@anstaf, @fthaler, @havogt, @lukasm91.

GridTools version 1.1.4

04 Oct 08:42

Bug fixes

  • Speed up compile time (#1608)
  • Support for GPU backend with custom block sizes in boundary conditions (#1438)
  • Fix sid shift origin (#1517)

Compatibility with new compilers

GridTools version 2.0.0

31 Jul 09:28
8101c64

GridTools v2.0.0

GridTools v2.0.0 comes with an improved API for stencil composition and storage construction.
These changes and a few others (see below) are breaking changes.

Changes since v1.1.0

New API: Stencil Composition

The make_computation API for composing stencils is replaced by a new stencil specification API, e.g.

auto horizontal_diffusion_spec = [](auto coeff, auto in, auto out) {
    GT_DECLARE_TMP(double, lap, flx, fly);
    return st::execute_parallel()
        .ij_cached(lap, flx, fly)
        .stage(lap_function(), lap, in)
        .stage(flx_function(), flx, in, lap)
        .stage(fly_function(), fly, in, lap)
        .stage(out_function(), out, in, flx, fly, coeff);
};

st::run(horizontal_diffusion_spec, stencil_backend_t(), grid, coeff, in, out);

instead of

auto horizontal_diffusion = gt::make_computation<backend_t>(grid,
    p_coeff{} = coeff,
    gt::make_multistage(gt::enumtype::execute<gt::enumtype::parallel, 20>{},
        define_caches(gt::cache<gt::IJ, gt::cache_io_policy::local>(p_lap{}, p_flx{}, p_fly{})),
        gt::make_stage<lap_function>(p_lap{}, p_in{}),
        gt::make_independent(gt::make_stage<flx_function>(p_flx{}, p_in{}, p_lap{}),
            gt::make_stage<fly_function>(p_fly{}, p_in{}, p_lap{})),
        gt::make_stage<out_function>(p_out{}, p_in{}, p_flx{}, p_fly{}, p_coeff{})));

horizontal_diffusion.run(p_in{} = in, p_out{} = out);

See the documentation and examples for details about the new API.

Related PRs: #1388

New API: Storage Builder

Datastores are now created using a builder API, e.g.

auto storage_builder = gt::storage::builder<storage_traits_t>.dimensions(d1, d2, d3).halos(halo, halo, 0);

auto in = storage_builder.type<double const>().value(42).build();
auto coeff = storage_builder.type<double const>().value(42).build();
auto out = storage_builder.type<double>().build();

The type returned by the builder is a shared_ptr to a data_store (previously the shared_ptr was inside the data_store).

Other storage related changes:

  • Memory alignment is applied in bytes (instead of in elements).
  • Host/device buffers are automatically synchronized on creation of views or on access of the underlying pointer (the sync method is removed).
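
To make the difference between byte-based and element-based alignment concrete, here is a small arithmetic sketch. The helper names are hypothetical and the code does not reproduce how GridTools pads storages internally; it only shows how the same alignment number leads to different padded lengths under the two interpretations. It assumes the byte alignment is a multiple of the element size.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: smallest element count >= n_elements whose size in
// bytes is a multiple of alignment_bytes.
// Assumes alignment_bytes is a multiple of element_size.
constexpr std::size_t pad_to_byte_alignment(
    std::size_t n_elements, std::size_t element_size, std::size_t alignment_bytes) {
    std::size_t bytes = n_elements * element_size;
    std::size_t padded_bytes = (bytes + alignment_bytes - 1) / alignment_bytes * alignment_bytes;
    return padded_bytes / element_size;
}

// Hypothetical helper: smallest element count >= n_elements that is a
// multiple of alignment_elements (the element-based interpretation).
constexpr std::size_t pad_to_element_alignment(std::size_t n_elements, std::size_t alignment_elements) {
    return (n_elements + alignment_elements - 1) / alignment_elements * alignment_elements;
}
```

For example, with an alignment value of 64 and 10 elements of 8 bytes each, the byte-based rule pads to 16 elements (128 bytes), while the element-based rule pads to 64 elements (512 bytes).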

See the documentation and examples for details about the new API.

Related PRs: #1388, #1534

API break: New Backend names

Our backend names (cuda, mc, x86) were a source of confusion, as users had a certain (but wrong) idea of, e.g., when to use x86.

The new names are (#1490):

  • gpu instead of cuda as the same backend works for HIP.
  • cpu_kfirst instead of x86, the innermost dimension is k, suitable for vertical stencils and architectures that emphasize caches over vector instructions.
  • cpu_ifirst instead of mc, the innermost dimension is i, suitable for modern CPUs where vector instructions are key for performance.

Additionally, we introduced a new backend gpu_horizontal (#1445), which works only for purely horizontal (parallel) stencils.
Performance of gpu_horizontal is improved over gpu for most stencils; however, we recommend benchmarking both backends.
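
The names cpu_kfirst and cpu_ifirst refer to which dimension is innermost, i.e. has unit stride in memory. The following sketch shows what that means for a plain, unpadded 3D array. The function strides_for and the ordering of the two outer dimensions are assumptions made for illustration; the actual layouts used by the backends may differ (for example through padding or blocking).

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper: computes strides (in elements) for a dense 3D array.
// extents are given in (i, j, k) order; order lists the dimension indices
// from innermost (unit stride) to outermost.
inline std::array<std::size_t, 3> strides_for(
    std::array<std::size_t, 3> const &extents, std::array<std::size_t, 3> const &order) {
    std::array<std::size_t, 3> strides{};
    std::size_t stride = 1;
    for (std::size_t dim : order) {
        strides[dim] = stride;   // the current dimension advances by `stride` elements
        stride *= extents[dim];  // the next (outer) dimension skips over this one
    }
    return strides;
}
```

For extents (4, 5, 6), a k-innermost order {2, 1, 0} gives strides (30, 6, 1), so neighbouring k-levels are adjacent in memory. An i-innermost order {0, 1, 2} gives strides (1, 4, 20), so neighbouring i-points are adjacent, which is what vectorization over i relies on.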

Other API breaking changes

  • Backend declarations (traits) are removed from common/defs.hpp and are now provided in component-specific headers for stencil, timer, gcl and storage (#1388).
  • We improved the code structure by introducing finer-grained namespaces (#1388).
  • The storage repository was removed (#1456).

New functionality

  • New sid::rename_dimensions (#1533)
  • New regression test illustrating c-arrays as SIDs (#1525)
  • A Python SID adapter including regression test for calling computations from Python (#1523)
  • Introduced the threadpool concept (#1484, #1498, #1504) and added an HPX threadpool (#1437)
  • Added an example for calling CUDA GridTools computations from Fortran with OpenACC (#1454)

Improved functionality

  • GCL is now header-only (so all of GridTools is now header-only)
  • The CMake build scripts are rewritten, see the documentation and examples for how to use GridTools CMake targets (#1421, #1441, #1442, #1450, #1509)

Bug Fixes / Cleanup

  • Fixes to SID concept helpers (#1524, #1527, #1531)
  • Fixes for CUDA 11 (#1529), thanks @lukasm91
  • Fixes for HIP compilation (#1488)
  • Better error diagnostics at the frontend (#1495)
  • Performance tests are now included in a single binary (#1453)
  • Layout transformations are refactored (#1388)
  • and many other small fixes

Infrastructure/Development

  • Environments are renamed to describe more precisely what they are (#1507)
  • Added testing on the new MeteoSwiss machine Tsa to Jenkins (#1452)
  • Moved tests from Travis to GitHub Actions (#1446), added tests for different CMake setups (#1443)
  • Added a Gitpod configuration (#1423)
  • Added testing with Clang-based Cray compiler on Daint (#1382)

Contributions

This release contains contributions from
@anstaf, @fthaler, @havogt, @jdahm, @lukasm91, @mbianco, @tehrengruber, @wdeconinck.

GridTools version 2.0.0rc2

29 Jul 07:22
50c5e50
Pre-release

see final release

GridTools version 2.0.0rc1

15 Jun 12:09
53910ee
Pre-release

see final release

GridTools version 1.1.3

20 Jan 14:52
d33fa6f

Performance fixes

  • Revert a #pragma unroll to be optimal for the COSMO dycore on V100 (#1400)

Other

  • CMake: Add a missing policy workaround_mpi.cmake (#1398)

GridTools version 1.0.4

12 Dec 08:12
f45026d

Fixes

  • CMake: support for superbuilds (nesting gridtools with add_subdirectory/FetchContent) #1383

GridTools version 1.1.2

12 Dec 08:16
6858804

Support for new targets

  • Support for clang-CUDA and HIP (#1361)

Fixes

  • Support custom block size in storage traits (#1392)
  • Add GT_FUNCTION to storage_info
  • CMake: export compilation type (#1387)

Infrastructure

GridTools version 1.1.1

06 Dec 11:48
7cdf89a

Fixes

  • Make computation API thread compatible by making the allocator thread_local (#1380).
  • CMake: fix to make GridTools work as nested project in a "superbuild" setup.

GridTools version 1.1.0

07 Oct 12:02
12ee091

GridTools

In GridTools v1.1.0 we set the default C++ standard to C++14 and drop compatibility with C++11. This requires at least CUDA 9.0.

Changes since v1.0.0

Full introduction of the SID concept

The backend is completely restructured based on the SID (stencil iterable data) concept. There should be no user-facing changes as long as user code was only using the documented public API (*). The changes separate the backend implementation from the core library to allow non-intrusive extension of the library with new backends. Additionally, maintainability of the GridTools infrastructure is significantly improved.
Performance should be improved in general, but might be worse for specific computations. We did not observe a common pattern in the performance improvements or degradations.

(*) There is one change which might trigger different behavior (though the old behavior was not documented): temporary fields are now implicitly three-dimensional. Prior to this version the user could have abused a 2D temporary field for accumulating values between k-levels.

New

  • New example illustrating the type-erasure pattern for computations. #1318

Deprecation (support will be removed in GridTools v2.0.0)

  • Using the gridtools::c_bindings is deprecated. Switch to the standalone https://github.com/GridTools/cpp_bindgen.
  • global_accessor is deprecated, use in_accessor (without extents) instead.
  • make_global_parameter with backend as template parameter is deprecated. The backend is not needed anymore.

Fixes / Cleanup

  • Fix performance for CUDA 9.2 / 10.0 #1281 #1327 #1339
  • Use C++14 features. #1307
  • Use multiple threads in storage initialization. #1300
  • Remove dependency on boost::mpl and boost::fusion
  • Fixes required to compile GridTools with HIP-Clang. Full support for AMD GPUs via HIP-Clang will come in a future release. #1363
  • Fix a bug in communication #1355.
  • The global_parameter no longer requires pre-allocated storage (with CUDA it is now passed via constant memory). It is therefore a lightweight wrapper around the value type that can be created without overhead, e.g. when passing it to computation.run().

Infrastructure/Development

  • The bash build script is replaced by a Python-driven build process, see the wiki for how to get the environment. #1273 #1298 #1341
  • Improved Jenkins performance plots. #1301 #1338
  • Googletest is now pulled in with CMake's FetchContent instead of having it as part of the repository. #1310