ROCm Blog Roll
ROCm Device Libraries Now Available!
We are pleased to announce fully open-source bitcode device libraries for the ROCm platform.
The following libraries are available:
- Open Compute Math Library (OCML)—contains 250+ optimized math functions for single, double and half precision. See the OCML documentation for a list of currently supported functions.
- Open Compute Kernel Library (OCKL)—contains many useful functions, including work-item/workgroup functions, integer optimization operations and wavefront reduce algorithms. It also serves as a foundation for the built-in library of the OpenCL-enabled compiler.
- Compiler built-in library to support OpenCL.
- Compiler built-in library for the HC mode of the Heterogeneous Compute Compiler (HCC).
- OCLC libraries—enable fine-grain control over library modes (such as relaxed math).
- IRIF—interface library to AMD GPU back end.
The libraries are intended for language and run-time implementors. They work with the open-source AMD GPU LLVM back end on GCN architectures and link to user bitcode through the LLVM linker before code generation.
ROCm-Device-Libs is currently under development, so we expect many improvements. As always, contributions and other feedback are welcome.
Quickly Access ROCm GPU Performance Counters via CodeXL
You can get the list of available counters by running the following command:
- ./CodeXLGpuProfiler --list
To save the counter list to a file, use
- ./CodeXLGpuProfiler --list --outputfile counters.txt
Next, specify the set of counters using the following command:
- ./CodeXLGpuProfiler --hsapmc --counterfile
The counter file (for example, "counters.txt") should list one counter name per line.
HCC Adding Richer Array of Low-level Optimizations
- 24-bit integer multiply and multiply-add (__mul24 and __mad24)
- Intrinsics for ds_permute and ds_bpermute (__amdgcn_ds_permute and __amdgcn_ds_bpermute)
- Intrinsics for ds_swizzle (__amdgcn_ds_swizzle)
- Intrinsics for wave shift and rotate by one thread (__amdgcn_wave_*)
- Intrinsic for move dpp (__amdgcn_move_dpp)
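To make the 24-bit arithmetic concrete, here is a minimal host-side C++ sketch of what a signed 24-bit multiply and multiply-add compute. It assumes the HCC builtins follow the usual GCN behavior: operands are taken from the low 24 bits and sign-extended, and the low 32 bits of the result are returned. The names sext24, mul24_ref and mad24_ref are hypothetical reference helpers for illustration, not part of HCC.

```cpp
#include <cstdint>

// Hypothetical helper: keep the low 24 bits of x and sign-extend from bit 23.
static int32_t sext24(int32_t x) {
    uint32_t u = static_cast<uint32_t>(x) & 0x00FFFFFFu;
    if (u & 0x00800000u) {
        u |= 0xFF000000u;  // replicate the 24-bit sign bit into the top byte
    }
    return static_cast<int32_t>(u);
}

// Reference model of a signed 24-bit multiply (assumed semantics of __mul24):
// multiply the sign-extended low 24 bits and keep the low 32 bits.
int32_t mul24_ref(int32_t a, int32_t b) {
    int64_t product = static_cast<int64_t>(sext24(a)) * sext24(b);
    return static_cast<int32_t>(static_cast<uint32_t>(product));
}

// Reference model of a signed 24-bit multiply-add (assumed semantics of
// __mad24): 24-bit multiply as above, then a wrapping 32-bit add of c.
int32_t mad24_ref(int32_t a, int32_t b, int32_t c) {
    int64_t product = static_cast<int64_t>(sext24(a)) * sext24(b);
    uint32_t sum = static_cast<uint32_t>(product) + static_cast<uint32_t>(c);
    return static_cast<int32_t>(sum);
}
```

A model like this is mainly useful for checking kernel results on the host: any bits above bit 23 of the multiplicands are ignored, so the 24-bit variants only match a full 32-bit multiply when both operands fit in 24 bits.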
Example of HCC Functions
- unsigned int hc::__activelaneid_u32 () __HC__
- Get the number of earlier (in flattened work-item order) active work items in the same wavefront.
- uint64_t hc::__activelanemask_v4_b64_b1 (unsigned int input) __HC__
- Return a bit mask showing which active work items in the wavefront have a nonzero input.
- unsigned int hc::__activelanecount_u32_b1 (unsigned int input) __HC__
- Count the number of active work items in the current wavefront that have a nonzero input.
- unsigned int hc::__activelanepermute_b32 (unsigned int src, unsigned int laneId, unsigned int identity, unsigned int useIdentity) __HC__
- Permute the active work items in the wavefront.
- int hc::__any (int predicate) __HC__
- Evaluate the predicate for all active work items in the wavefront and return a nonzero value if and only if any one predicate evaluates to a nonzero value.
- int hc::__all (int predicate) __HC__
- Evaluate the predicate for all active work items in the wavefront and return a nonzero value if and only if every predicate evaluates to a nonzero value.
- uint64_t hc::__ballot (int predicate) __HC__
- Evaluate the predicate for all active work items in the wavefront and return an integer whose Nth bit is set if and only if the predicate evaluates to a nonzero value for the Nth work item of the wavefront and the Nth work item is active.
- unsigned int hc::__shfl_xor (unsigned int var, int laneMask, int width=__HSA_WAVEFRONT_SIZE__) __HC__
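The vote and lane-counting functions listed above can be modeled on the host to see how they relate to one another. The following is a minimal C++ sketch, not HCC code: it assumes a 64-lane wavefront (as on GCN) and represents it with a hypothetical Wavefront struct holding an active-lane mask and one predicate per lane. All the *_ref names are illustrative.

```cpp
#include <array>
#include <cstdint>

constexpr int kWavefrontSize = 64;  // assumption: 64 lanes per wavefront

// Hypothetical host-side model of one wavefront.
struct Wavefront {
    uint64_t active;                       // bit N set => lane N is active
    std::array<int, kWavefrontSize> pred;  // per-lane predicate value
};

// Count the set bits in a 64-bit mask.
static unsigned popcount64(uint64_t m) {
    unsigned n = 0;
    while (m != 0) {
        m &= m - 1;  // clear the lowest set bit
        ++n;
    }
    return n;
}

// Model of __ballot: bit N is set iff lane N is active and its predicate
// is nonzero.
uint64_t ballot_ref(const Wavefront& w) {
    uint64_t mask = 0;
    for (int lane = 0; lane < kWavefrontSize; ++lane) {
        bool is_active = ((w.active >> lane) & 1u) != 0;
        if (is_active && w.pred[lane] != 0) {
            mask |= (uint64_t{1} << lane);
        }
    }
    return mask;
}

// Model of __any: nonzero iff at least one active lane has a nonzero predicate.
int any_ref(const Wavefront& w) { return ballot_ref(w) != 0 ? 1 : 0; }

// Model of __all: nonzero iff every active lane has a nonzero predicate.
// With no active lanes this sketch returns 1 (vacuously true).
int all_ref(const Wavefront& w) { return ballot_ref(w) == w.active ? 1 : 0; }

// Model of __activelanecount_u32_b1: active lanes with a nonzero input.
unsigned activelanecount_ref(const Wavefront& w) {
    return popcount64(ballot_ref(w));
}

// Model of __activelaneid_u32 for a given lane: the number of active lanes
// with a lower lane index.
unsigned activelaneid_ref(const Wavefront& w, int lane) {
    uint64_t below =
        (lane == 0) ? 0 : (w.active & (~uint64_t{0} >> (64 - lane)));
    return popcount64(below);
}
```

In this model, the other vote functions fall out of the ballot mask: "any" asks whether the mask is nonzero, "all" asks whether it equals the active mask, and the lane count is its population count.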
It’s Time to ROC
HPC User Forum in Tucson: Gregory Stoner from AMD presents "It's Time to ROC."
Rolling Out HIP Version 0.86
The team just released HIP version 0.86, which includes multiple improvements to its capabilities and tools. We also added several more HIP ports and examples. If you’re just getting started, HIP (Heterogeneous-Computing Interface for Portability) is a portable C++ run time and kernel language for GPUs. It includes tools to hipify CUDA code into the portable C++ language.
HIP 0.86 Release Notes
Release: 0.86.00 Date: 2016.06.06
- Add clang-hipify, a Clang-based hipify tool that improves parsing of source code and automates creation of the hipLaunchParm variable.
- Implement memory register/unregister commands (hipHostRegister, hipHostUnregister).
- Add cross-linking support between G++ and HCC, in particular for interfaces that use standard C++ library types (e.g., std::vector, std::string). HIPCC now uses libstdc++ by default on the HCC compilation path.
- Include more samples, such as gpu-burn, SHOC, nbody and rtm. See HIP-Examples.
Info on Previous 0.84 Release
Release: 0.84.01 Date: 2016.04.25
- Refactor HIP make and install system.
- Move to CMake. Refer to the installation section in README.md for details.
- Split source into multiple modular .cpp and .h files.
- Create static library and link.
- Set HIP_PATH to install.
- Make hipDevice and hipStream thread-safe.
- Preferred hipStream usage still involves creating new streams for each new thread, but it works even if you don’t.
- Improve automatic platform detection: if an AMD GPU is installed and the driver detects it, default HIP_PLATFORM to hcc.
- HIP_TRACE_API now prints arguments to the HIP function (in addition to the function name).
- Deprecate hipDeviceGetProp (replace with hipGetDeviceProp).
- Deprecate hipMallocHost (replace with hipHostMalloc).
- Deprecate hipFreeHost (replace with hipHostFree).
- The Mixbench benchmark tool for measuring operational intensity now has a HIP target, in addition to CUDA and OpenCL. Let the comparisons begin! See the Mixbench GitHub site for more information.