Releases · gorgonia/cu

Incorrect variable passed in cuLaunchAndSync by @tkunicki in #58
Fixed install command by @MarvinJWendt in #59
Add CUDA 11.8 and alternative means of setting up cgo on Windows by @dalva24 in #60
Cuda 12 by @neurlang in #69

New Contributors

@tkunicki made their first contribution in #58
@MarvinJWendt made their first contribution in #59
@dalva24 made their first contribution in #60
@neurlang made their first contribution in #69

Full Changelog: v0.9.4...v0.9.5

02 Aug 04:06

chewxy

v0.9.4

a41082c

CUDA11 supported

* CUDA11 initial work. First, we generate the new enums

* Added generateEnums, which generates the Go version of the CUresult type

* Updated tests such that they no longer fail.
Added a Signal() method to BatchedContext, to force the BatchedContext to DoWork

* Updated benchmarking of batched vs non-batched contexts. It appears that, for now, batching no longer confers a benefit

* Attempt #4 at getting CUDA11. Previous attempts were working based off a faulty copy of `cuda.h`

- Updated Device to support UUID
- Updated README
- Updated genlib to do more things more carefully

* More work on CUDA11
- Added more mappings into mappings.go to generate stuff
- Changed the definition of Context, by adding one additional method to clear L2Cache
- Added stubs for LaunchCooperativeKernel
- Added Graph types.

TODO next: add all the basic Graph data structure and then autogenerate all the things!

* Fixed mappings to also include @egonelbre's change in 2e25e65507
Fixed a bug where Fix() wasn't called, leading to weird generations

* Added some graph stuff, fixed some mappings stuff for genAPI. It seems that the graph functions will have to be manually written for now

* Updated graph.go from ages ago

* Updated more of CUDA11 Graph API into the library.
Slowly getting there.

* Added the body of CopyParams

* Added AddMemsetNode method for Graph.

* Fixed a bunch of things

* Switched to modernc.org/cc instead of using the older github.com/cznic/cc

* cuDNN updated their website. So parse.py also has to change.
As a result moredecls.go also changed

* Sorted the data in mappings.go. This will allow for better diffing

* Updated the generatethis pipeline

* Initial mappings generation.

* Mapped the old commented out mappings to new commented out mappings (see mappings.ods)

* Generated enums.

* Updated enums and enum strings

* Added more generated data structures

* Added methods

* Generated stubs. 7 TODOs

* Added more incompletes report

* Manually fixed the TODO of SpatialTransformer

* Manually fixed generated_rnndata.go

* Manually fixed generated_seqdata.go

* Manually fixed generated_backend.go

* Manually fixed generated_tensortransform.go

* Fixed the missing getters

* fixed all the .C()s of the generated types

* Generated a new API

* Fixed random C int issues. Now to handle the rest

* Updated INCOMPLETES_REPORTS

* fixed variable collision in _BackendAttributeTypeNames

* gencudnn enum generation syntax fixes added

* Updated INCOMPLETES

* variable renaming added as per the review

* AlgorithmDescriptor syntax fixes added

* AlgorithmPerformance syntax fixes added

* Activation cudnnActivationDescriptor_t return method name change added

* syntax fixes added on FusedOpVariantParams

* FusedOpConsts syntax fixes added

* C type retrieve function added for cudnnStatus

* tensor file syntax fixes added
tensor file unreachable code removed

* method receiver renaming added

* optensor syntax fixes added

* generated_api syntax fixes added

* code review changes added

* go modules updated
algorithmdescriptor Algorithm type changes added

* review changes added
GetRNNLinLayerBiasParams & GetRNNLinLayerMatrixParams methods moved to manually written API.go file

* Fixed a bug in parse.py: when parsing the documentation for CUDA11, the function names include `()`

* Removed deprecated functions from being generated

* More deprecated stuff no longer generated

* Fixed up algorithmdescriptor.go

* fixed some auto generated issues

* Manually fixed the fused ops generation

* Fixed even more autogenerated errors

* Fixed up more of the auto generated issues

* Renamed API to todo, because eh, I'll figure it out later

Co-authored-by: Aruna Prabhashwara <wg.aruna.p@gmail.com>
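
The notes above mention generating a Go version of the CUresult type and sorting the data in mappings.go for better diffing. A minimal sketch of that idea follows; it is not the repository's actual generator. The names `enumSource` and `goName`, the `cuResult` type name, and the naming scheme are all hypothetical. Only the three `CUDA_*` values are taken from `cuda.h`.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// goName is a hypothetical naming rule: drop the CUDA_ prefix and
// turn SNAKE_CASE into CamelCase (CUDA_ERROR_INVALID_VALUE -> ErrorInvalidValue).
func goName(cName string) string {
	parts := strings.Split(strings.TrimPrefix(cName, "CUDA_"), "_")
	for i, p := range parts {
		if p == "" {
			continue
		}
		parts[i] = p[:1] + strings.ToLower(p[1:])
	}
	return strings.Join(parts, "")
}

// enumSource renders Go source for a typed constant block.
// The keys are sorted first because Go map iteration order is random;
// sorted output is identical across runs, so regenerating yields clean diffs.
func enumSource(typeName string, values map[string]int) string {
	names := make([]string, 0, len(values))
	for name := range values {
		names = append(names, name)
	}
	sort.Strings(names)

	var b strings.Builder
	fmt.Fprintf(&b, "type %s int\n\nconst (\n", typeName)
	for _, name := range names {
		fmt.Fprintf(&b, "\t%s %s = %d\n", goName(name), typeName, values[name])
	}
	b.WriteString(")\n")
	return b.String()
}

func main() {
	// A small illustrative subset of the CUresult enum.
	fmt.Print(enumSource("cuResult", map[string]int{
		"CUDA_SUCCESS":             0,
		"CUDA_ERROR_INVALID_VALUE": 1,
		"CUDA_ERROR_OUT_OF_MEMORY": 2,
	}))
}
```

Sorting by C name is one possible choice; a real generator might prefer sorting by numeric value so the constants follow the header's order.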

01 Jun 19:28

chewxy

v0.9.3

5b83640

CUDA 10.2 supported

v0.9.3

Added some more documentation, and support for cuda 10.2

12 Feb 01:08

chewxy

v0.9.2

a587ef5

New CUDA versions supported

fixed the convolution.c import

use cuda 10.1

05 Sep 00:32

chewxy

v0.9.1

4f793ce

v0.9.1

v0.9.0 never got out of beta, and Go modules didn't like it. This release fixes that.

12 Aug 07:21

chewxy

v0.9.0-beta

a49599f

Beta release of v0.9.0

Pre-release

Features:

CUDA 9 support
CuDNN 7 support
JIT support (thanks @egonelbre)
nvRTC support (thanks @egonelbre)
Full CUBLAS support
Move towards a unified generation method
Various API changes
Various fixes (@egonelbre)
Bug fixes (thanks to @egonelbre):
- CString not freed

17 Dec 20:42

chewxy

v0.8.0

32f1658

v0.8.0

Merge remote-tracking branch 'origin/master'
