ILGPU Release Versions

Version v0.7.1

  • Added extension method to load the effective address for Cuda and CPU-based array views.
  • Added support for data blocks (value containers) to ease interop with value tuples.
  • Added additional primitive data blocks to simplify operations on tuples consisting of primitive values.
  • Added new ExchangeBuffer class to simplify memory transfers between CPU and GPU memory.
  • Fixed invalid sub-group extension name in CLAccelerator.
  • Fixed invalid association of supported and unsupported CL accelerators.
  • Removed obsolete dispose functionality from AcceleratorId classes.
  • Fixed OpenCL code generator for float values that are assigned integer values.
  • Fixed invalid creation of kernel interop types in OpenCL backend.
  • Made ABI thread safe to support concurrent queries of size/alignment information.

Version v0.7

  • Added support for .NET Standard 2.1.
  • Added support for OpenCL-compatible GPUs (beta).
  • Added parallel code generation in backends to improve code-generation speed.
  • Added minimum CUDA driver version detection.
  • Enabled adaptive shared-memory allocation in CPUAccelerator.
  • Added new Utility.Select method that can be used to create highly efficient select instructions instead of if branches.
  • Added support to access Grid and Group indices via properties.
  • Added support for generic Warp intrinsics that will be automatically generated by the compiler.
  • Redesigned intrinsic math functions and moved XMath functions to the ILGPU.Algorithms library. Use the new IntrinsicMath class for math functions that are supported on all platforms.
  • Reworked intrinsic functions to allow custom implementations of intrinsics for different backends.
  • Ported project to VS2019 including all static program-analysis checks.
  • Applied general code cleanup to be compliant with the new analysis checks.
  • Redesigned AcceleratorId functionality.
  • Updated CudaMemoryBuffer to support MemSetToZero using alternate streams.
  • Fixed retrieving version number of ILGPU assembly.
  • Fixed non-deterministic generation of Phi mappings.
  • Fixed invalid loading of small basic types onto the evaluation stack.
  • Added utility property to Accelerator to resolve a launch extent with the maximum number of groups.
  • Fixed invalid shared-memory allocation within non-kernel functions in PTXBackend.

Version v0.6

  • Added support for new GeForce RTX cards.
  • Added initial support for arrays in kernels.
  • Added additional 3D indexing functionality to ArrayView types.
  • Added automatic binding of accelerators in advanced multi-GPU scenarios.
  • Tested debugging and profiling capabilities on NVIDIA GPUs.
  • Released test framework to verify generated kernel code.
  • Improved performance of predicates in PTX backend.
  • Removed strict array-length restriction from allocation nodes.
  • Enhanced generation of get/set field operations.
  • Optimized generation of conditional branches.
  • Fixed invalid generation of predicate barriers in PTX backend.
  • Fixed invalid register allocation of string types in PTX backend.
  • Removed explicit tracking of predecessors in phi nodes.
  • Fixed invalid debug assertion in SequencePoint.
  • Fixed invalid alignment of shared-memory allocations in PTX backend.
  • Fixed invalid shared memory configuration of Cuda kernels.

Version v0.5.1

  • Polished error messages and utility methods.
  • Fixed invalid DebuggerDisplay attributes on array views.
  • Added support for loading addresses of static fields.
  • Added support to disable kernel caches and automatic disposal of kernels and memory buffers.
  • Extended kernel loaders with additional overloads.
  • Added support to clear internal caches.
  • Fixed invalid extent and bounds checks in MemoryBuffer.CopyTo.
  • Fixed invalid initialization of PTX-specific intrinsic functions.
  • Fixed invalid load/store instructions of bytes in PTX backend.
  • Fixed invalid generation of 'null' values in PTX backend.

Version v0.5.0

  • Extended kernel loaders with additional delegate overloads. (Community Request)
  • Fixed invalid loading of debug symbols from dynamic assemblies.

Version v0.5.0-beta

  • Added support for basic line-based GPU debugging and profiling.
  • Added support for jagged arrays. (Community Request)
  • Redesigned support for sub-warp shuffles. (Community Request)
  • Redesigned implicit stream launchers. (Community Request)
  • Fixed several code generation issues. (see GitHub)
  • Redesigned all required transformations and code generators.
  • Redesigned IR in order to significantly improve compilation time and memory consumption.

Version v0.4.0-beta

  • Removed all native library dependencies (including LLVM).
  • Redesigned huge parts of the compiler.
  • Added conceptually new (experimental) IR.
  • Introduced generic data views in order to generate code for low-level (e.g. PTX) and high-level (e.g. Vulkan) targets.
  • Adapted IL-Frontend to generate IR code instead of LLVM-IR.
  • Added new code-transformation phases to optimize code.
  • Added support for parallel code generation.
  • Adapted support for portable PDBs.

Version v0.3.0

  • Added support for .NET Standard 2.0. (Community Request)
  • Added first support for specializing kernels during compilation.
  • Updated accelerator caching functionality. (Community Request)
  • Improved multithreading support. (Community Request)
  • Added automatic disposal of kernels, memory buffers and accelerator streams. (Community Request)
  • Integrated basic support for portable PDBs.
  • Added native build scripts for Linux operating systems.
  • Added support for Linux operating systems in DLLLoader and PTXBackend.

Version v0.2.1

  • Fixed invalid code generation of non-zero (true) branches.

Version v0.2.0

  • Added convenient kernel loading and caching to accelerator classes.
  • Added properties to query the maximum number of threads of an accelerator.
  • Added Disposed event to Accelerator.
  • Added support for Cuda 9.0.
  • Added new cross-platform Cuda API.
  • Added integer-division operators to GPUMath.
  • Added RadToDeg and DegToRad conversion methods to GPUMath.
  • Added support for .NET Core 2.0.
  • Removed LLVMSharp dependency.
  • Enhanced SSA code generation.
  • Fixed invalid constant generation of padded structures.
  • Updated CompilerServices.Unsafe dependency to version 4.4.0.

Version v0.1.4

  • Fixed invalid code generation of float-based Atomic.Min/Atomic.Max functions.
  • Added support for nullable types in kernels.
  • Fixed invalid return value of atomic add in CPU mode.
  • Fixed invalid resolving of generic virtual methods.

Version v0.1.3

  • Fixed critical thread-divergence issues in CPUAccelerator.
  • Added additional checks to avoid group-barrier functions in implicitly-grouped kernels.
  • Fixed critical issue in kernel-launcher code generation in CPUAccelerator.
  • Fixed invalid loading of double constants in Force32BitFloats mode.
  • Fixed invalid code generation of some math intrinsics (Atan, Atan2, Pow).
  • Fixed wrong view dimension in GetRowView.

Version v0.1.2

  • Added atomics for index types.
  • Added new debug views for generic array views in CPU mode.
  • Added additional operators to index types.
  • Added min/max functions to index types.
  • Added new clamp functions to GPUMath.
  • Added support for IntPtr.ToPointer functions.
  • Enhanced reduction interface.
  • Fixed invalid ArgumentOutOfRangeException check in MemoryBuffer.
  • Fixed critical issue in ArrayView3D<T>.GetSliceView.
  • Fixed critical issue in ArrayView<T>.GetRowView.
  • Replaced internal IL-assembly intrinsics with an official Unsafe package.
  • Added debug and release versions to NuGet package. Reason: exceptions are not allowed in GPU code, but debug assertions are. Release builds do not contain assertions. Hence, debug assemblies are required for proper error messages in GPU kernels during development.
  • Added feature to force all floating-point operations to 32-bit (even math intrinsics): CompileUnitFlags.Force32BitFloats.
  • Fixed invalid math-intrinsic annotation.
  • Fixed invalid error messages in debug assertions on PTX-based devices.
  • Fixed critical issues in array code generation.