GPU Computing at HPC.NRW

Europe/Berlin
Virtual (Zoom)

Description

GPU computing plays a significant role in high-performance computing. This 2-day course covers the basic concepts of GPU architectures, CUDA C/C++ programming, and GPU code profiling and optimization. The topics are:

  • Day 1: Introduction to GPU computing and CUDA
  • Day 2: Advanced CUDA, optimization and profiling

Each day consists of approximately 4 hours of lectures and 3 hours of hands-on exercises. The aim is that by the end of the course you can write CUDA programs for GPUs and tune their performance using NVIDIA's profiling and optimization tools.

Prerequisites: A strong interest in GPU computing (most important) and basic knowledge of C/C++ programming.

The event is organized by the competence network HPC.NRW.

The event will be held online via Zoom video conference. The course is free of charge for members of German universities and publicly funded research institutions in Germany.

Please note:

  • HPC.NRW reserves the right to make the final allocation of seats among registrants.
  • This course is overbooked. Thank you for your interest.

Timetable

  • Tuesday, January 18
    • 9:00 AM - 10:00 AM
      Lecture (1h)

      • What to learn about GPU computing in 2022
      • Quick review of GPU-enabled libraries and frameworks
      • CUDA, OpenCL, OpenACC, C++ standard parallel algorithms (std::execution) and CuPy
      • What you really need to know to make your GPU code efficient
      • What GPU libraries are good at and how not to make things worse
    • 10:00 AM - 11:00 AM
      Lecture (1h)

      • Introduction to GPU computing
      • Typical GPU system topology
      • GPU performance illustrated with typical computational building blocks
      • Differences between CPU and GPU design and execution flows
      • Three pillars of efficient GPU code: sufficient workload, memory coalescing, and minimal branch divergence
    • 11:00 AM - 12:00 PM
      Lecture (1h)

      • CUDA principles and CUDA implementation for C++
      • Analogies between MPI+OpenMP and CUDA programming models
      • The first CUDA program explained
      • The CUDA compute grid, with examples
      • Realistic CUDA application example (wave propagation code)
      • Understanding GPU compute capabilities, deviceQuery
    • 12:00 PM - 2:00 PM
      Hands-on session (2h)

      • Write & deploy a simple CUDA program
      • More control over the CUDA compute grid
      • Write & deploy a meaningful image processing tool in CUDA
    • 2:00 PM - 3:00 PM
      Lunch (1h)
    • 3:00 PM - 4:00 PM
      GPU memory hierarchy (1h)

      • GPU memory types
      • Shared memory
      • GPU cache hierarchy and mode switches
      • Unified virtual address space (UVA) in CUDA 7.5
    • 4:00 PM - 5:00 PM
      Hands-on session (1h)

      • "Fill-in" exercise on reduction with and without shared memory
  • Wednesday, January 19
    • 9:00 AM - 10:00 AM
      Lecture (1h)

      • What kind of advanced GPU experience is really useful
      • An overview of the most recent NVIDIA Volta, Turing and Ampere GPU architectures
    • 10:00 AM - 11:00 AM
      Advanced CUDA (1h)

      • Atomic operations, with program examples
      • Warp shuffle instructions; optimizing reduction with shuffles
      • Unified virtual address space (UVA) in CUDA 7.5
      • PCIe optimizations: streams and asynchronous data transfers
    • 11:00 AM - 12:30 PM
      Hands-on session (1h 30m)

      • Reimplementing the last step of reduction using atomics
      • Leveraging warp shuffles in the reduction kernel
    • 12:30 PM - 2:00 PM
      Advanced CUDA (1h 30m)

      • Orchestrating repeatable dependency-driven complex execution pipelines with CUDA Graphs
      • CUDA 9 cooperative groups
      • Warp-synchronous programming in CUDA 9
      • Dynamic parallelism
      • CUDA C++ compiler pipeline, PTX assembler, SASS, NVVM backend
      • Understanding "-Xptxas -v" reports
    • 2:00 PM - 3:00 PM
      Lunch (1h)
    • 3:00 PM - 4:00 PM
      GPU code optimization (1h)

      • GPU optimizations: compute grid, coalescing, divergence, unrolling, vectorization, maxrregcount, alignment, floating-point constants
      • The concept of GPU occupancy and using it to drive your optimization strategy
      • Overview of NVIDIA Visual Profiler
      • Overview of nvprof (command line profiler)
      • Common practices for identifying performance hazards in GPU applications using NVIDIA Visual Profiler
    • 4:00 PM - 5:00 PM
      Hands-on session (1h)

      • Profile and optimize the bilinear interpolation kernel