OpenCL

Erik Schnetter <eschnetter@perimeterinstitute.ca>, Federico Cipolletta

2020-07-01

Abstract

OpenCL is a programming standard for heterogeneous systems, i.e. for programming CPUs, GPUs, and other types of accelerators. OpenCL is implemented as a library, and OpenCL code is compiled at run time by passing OpenCL routines, as strings, to the OpenCL library. This is different from, e.g., CUDA, which is implemented as an extension of a language such as C or C++.

This thorn, OpenCL, provides the configuration bits that ensure that Cactus applications can use OpenCL libraries.

1 Introduction

OpenCL describes itself as:

OpenCL is the first open, royalty-free standard for cross-platform, parallel programming of modern processors found in personal computers, servers and handheld/embedded devices. OpenCL (Open Computing Language) greatly improves speed and responsiveness for a wide spectrum of applications in numerous market categories from gaming and entertainment to scientific and medical software.

More information is available at http://www.khronos.org/opencl/.

2 Availability

There are several OpenCL implementations available at this time. Unfortunately, each has its drawbacks:

AMD

Available at http://developer.amd.com/zones/openclzone/pages/default.aspx. This supports both CPUs and ATI GPUs. Unfortunately, the OpenCL compiler seems to produce low-quality code.

Apple

Included with the operating system, available by default. This supports both CPUs and GPUs. The compiler is based on LLVM. Unfortunately, there seem to be serious bugs – for example, I can't get the \(\cos\) function to provide correct results.

Intel

Available at http://software.intel.com/en-us/articles/opencl-sdk/. This supports only (Intel?) CPUs. The compiler is based on LLVM, and the implementation is also based on Intel’s TBB (Threading Building Blocks).

Nvidia

Available at http://developer.nvidia.com/opencl, included in their CUDA distribution. This supports only GPUs.

pocl

Open source, available at https://launchpad.net/pocl. This OpenCL implementation has not yet had a stable release (the current version is 0.6), and is based on LLVM.

In addition, Wikipedia http://en.wikipedia.org/wiki/OpenCL lists two IBM implementations, for their Power processor and for Intel-compatible CPUs, respectively. The latter may be identical to, or similar to, AMD's implementation.

Since OpenCL can run on CPUs, good OpenCL implementations are available at no cost for virtually all platforms.

It is possible to install several OpenCL implementations (platforms) at the same time, to build against any one of them, and then to choose at run time which devices from which platforms to use. For example, it is possible to build an application using the Intel implementation, and then at run time use the Nvidia platform to access a GPU (assuming that both Intel and Nvidia implementations are installed). On Unix, this is implemented via a system-wide configuration directory /etc/OpenCL/vendors that lists all OpenCL platforms that will be available at run time.
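For illustration, the following stand-alone C sketch (not part of this thorn; error checking omitted for brevity) enumerates the platforms and devices that the installed OpenCL implementations expose at run time:

  #include <stdio.h>
  #include <CL/cl.h>   /* on macOS: #include <OpenCL/opencl.h> */

  int main(void)
  {
    /* Query the number of installed platforms, then fetch their handles */
    cl_uint nplatforms = 0;
    clGetPlatformIDs(0, NULL, &nplatforms);
    cl_platform_id platforms[16];
    if (nplatforms > 16) nplatforms = 16;
    clGetPlatformIDs(nplatforms, platforms, NULL);

    for (cl_uint p = 0; p < nplatforms; ++p) {
      char name[256];
      clGetPlatformInfo(platforms[p], CL_PLATFORM_NAME, sizeof name, name, NULL);
      printf("Platform %u: %s\n", (unsigned)p, name);

      /* List all devices (CPUs, GPUs, accelerators) of this platform */
      cl_uint ndevices = 0;
      clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_ALL, 0, NULL, &ndevices);
      cl_device_id devices[16];
      if (ndevices > 16) ndevices = 16;
      clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_ALL, ndevices, devices, NULL);
      for (cl_uint d = 0; d < ndevices; ++d) {
        clGetDeviceInfo(devices[d], CL_DEVICE_NAME, sizeof name, name, NULL);
        printf("  Device %u: %s\n", (unsigned)d, name);
      }
    }
    return 0;
  }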

3 OpenCL Programming

OpenCL is very similar to C. However, it differs from C in several key aspects.

Because of these differences, it is not possible to write a whole application in OpenCL. Instead, only the expensive parts (so-called compute kernels) are written in OpenCL, and these are launched e.g. from C or C++.
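As an illustration, here is a minimal (hypothetical) compute kernel written in OpenCL C; in a Cactus application this source would be passed, as a string, to the OpenCL library and compiled at run time:

  /* Enable double precision (required by OpenCL 1.x for the double type) */
  #pragma OPENCL EXTENSION cl_khr_fp64 : enable

  /* Each work item scales one array element */
  __kernel void scale(__global double *a, const double alpha)
  {
    const size_t i = get_global_id(0);
    a[i] = alpha * a[i];
  }

The host code decides how many work items to launch and passes the arguments a and alpha when enqueuing the kernel.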

In addition, the hardware architecture of GPUs and other accelerators differs from that of CPUs in one key aspect: such devices typically have their own memory, separate from the host's main memory.

That means that one has to explicitly copy data between the host memory and the device memory before and/or after calling compute kernels.
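The following sketch shows what such explicit copies look like with the standard OpenCL API; the context, command queue, and host buffer are assumed to have been set up already, and error checking is omitted:

  #include <CL/cl.h>

  /* Copy n doubles to the device, run kernels there, and copy the results back */
  void copy_example(cl_context ctx, cl_command_queue queue,
                    double *host_buf, size_t n)
  {
    cl_int err;
    cl_mem dev_buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE,
                                    n * sizeof(double), NULL, &err);

    /* host -> device, blocking */
    clEnqueueWriteBuffer(queue, dev_buf, CL_TRUE, 0,
                         n * sizeof(double), host_buf, 0, NULL, NULL);

    /* ... enqueue compute kernels operating on dev_buf here ... */

    /* device -> host, blocking */
    clEnqueueReadBuffer(queue, dev_buf, CL_TRUE, 0,
                        n * sizeof(double), host_buf, 0, NULL, NULL);

    clReleaseMemObject(dev_buf);
  }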

4 OpenCL Programming in Cactus

Cactus supports OpenCL programming at several levels. At the lowest level, one can use this thorn OpenCL directly. While this works fine, it is somewhat tedious because one has to write a certain amount of boilerplate code to detect and initialise the device, to copy data between host and device, and to build and run compute kernels.

Since OpenCL is implemented as a library, the flesh knows only little about OpenCL. For example, there are no configuration options to specify an OpenCL compiler, since code is compiled at run time via a library call to which the source code is passed as a string. There is, however, one way in which the flesh supports OpenCL: files with a .cl suffix are converted into a string and placed into the executable. These strings have the type char const * in C, and can be accessed at run time under a (globally visible) name OpenCL_source_THORN_FILE, where THORN and FILE are the thorn name and file name, respectively. (This is also explained in the users' guide.)
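For example, assuming a (hypothetical) thorn MyThorn that contains a file kernels.cl, the generated string could be compiled at run time roughly as follows; the thorn and file names are illustrative only, and error checking is omitted:

  #include <CL/cl.h>

  /* String generated by the flesh from MyThorn's kernels.cl */
  extern char const *OpenCL_source_MyThorn_kernels;

  cl_program build_my_kernels(cl_context ctx, cl_device_id device)
  {
    cl_int err;
    cl_program program = clCreateProgramWithSource(
        ctx, 1, &OpenCL_source_MyThorn_kernels, NULL, &err);
    clBuildProgram(program, 1, &device, NULL, NULL, NULL);
    return program;
  }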

5 High-Level OpenCL Programming in Cactus

Cactus also offers a higher-level way of OpenCL programming, implemented in the thorns OpenCLRunTime and Accelerator.

Thorn OpenCLRunTime provides a convenient function for executing OpenCL code. This function expects, as input, a string containing the OpenCL kernel code, and then executes this code. Lower-level tasks such as identifying available compute devices, initialising them, compiling the kernel (once, and then caching it), and handling arguments and parameters are taken care of automatically. Details are described in that thorn's documentation.

Thorn Accelerator simplifies memory management for GPUs and other types of devices. One declares in a thorn's schedule which routines read and write which variables, and Accelerator then keeps track of which variables need to be copied at what time. It keeps track of where (host and/or device) a variable currently has valid values, and copies data only when necessary, taking time level cycling, synchronisation, and I/O into account. Details are described in that thorn's documentation.
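As a rough sketch (the thorn, group, and variable names below are hypothetical, and the exact syntax should be checked against the Accelerator documentation), such declarations in a schedule.ccl might look like this:

  # Declare which variables this routine reads and writes, so that
  # Accelerator can schedule the necessary host/device copies
  schedule MyThorn_UpdateRHS in MoL_CalcRHS
  {
    LANG: C
    READS: MyThorn::phi(Everywhere)
    WRITES: MyThorn::phi_rhs(Interior)
  } "Compute the right-hand side on the device"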

6 Using This Thorn

Refer to the Cactus UserGuide, Sec. B2.2, to learn how this thorn can be used in a compiled configuration and how to link against a specific version of the library that has already been installed separately.

Note on possible ExternalLibraries’ location stripping

Each thorn in Cactus/arrangements/ExternalLibraries automatically builds the library version shipped in its dist folder. In particular, the tarball in the dist folder is only used if THORN_DIR is either set to BUILD or left empty and no precompiled copy of the library is found. If another location is specified via the THORN_DIR variable in the <machine>.cfg file at compilation time, then the Cactus/lib/sbin/strip-incdirs.sh script will automatically strip away (for safety reasons) the following locations:

/include

/usr/include

/usr/local/include

from THORN_INC_DIRS, which defaults to THORN_DIR/include. Therefore, if you need to use an already installed version of an external library, the locations listed above should either be avoided (e.g. specifying /home as the THORN_DIR will work without problems if the required library is installed there) or be checked carefully, in order to avoid unwanted stripping. The same stripping is applied to THORN_LIB_DIRS by lib/sbin/strip-libdirs.sh, with a larger list of directories:

/lib

/usr/lib

/usr/local/lib

/lib64

/usr/lib64

/usr/local/lib64
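As an illustration, an option list (<machine>.cfg) that points this thorn at a separately installed OpenCL library under /home/user/opencl might contain entries like the following; the variable names assume the usual <THORN>_DIR naming convention for this thorn:

  # Use a separately installed OpenCL library instead of building the
  # version shipped with the thorn
  OPENCL_DIR      = /home/user/opencl
  OPENCL_INC_DIRS = /home/user/opencl/include
  OPENCL_LIB_DIRS = /home/user/opencl/lib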

7 Parameters

8 Interfaces

General

Implements:

opencl

9 Schedule

This section lists all the variables which are assigned storage by thorn ExternalLibraries/OpenCL. Storage can either last for the duration of the run (Always means that if this thorn is activated storage will be assigned, Conditional means that if this thorn is activated storage will be assigned for the duration of the run if some condition is met), or can be turned on for the duration of a schedule function.

Storage

NONE

Scheduled Functions

CCTK_WRAGH

  opencl_printinfo
    print opencl system information
    Language: C
    Type: function