Code Optimisation - Theory

Posted on Jun 6, 2022


  • Floating point operations per second
    • Peak FLOPs: Maximum attainable, assuming no cost to memory operations.
    • Peak GFLOPs = Clock speed (GHz) * Cores * FLOPs/cycle
  • Speedup = unoptimised time / optimised time

Optimisation approaches

  • Algorithmic: Implementing a more efficient algorithm eg. using merge sort instead of bubble sort
  • Code refactoring: Making changes to convince the compiler to apply optimisations.
    • Removing interloop dependencies: Making a loop only rely on the current iteration can allow for parallelisation.
      • Pointer aliasing: As long as a pointer alias isn’t hiding an interloop dependency, you can use the restrict keyword in declarations to tell the compiler that each iteration can be independent.
    • Loop peeling: Prerequisite for other optimisations. Take eg. the first iteration outside of the loop.
    • Loop interchange: Make memory accesses be in sequential order by eg. swapping order of rows and columns in iteration. Lets all relevant data be stored in cache simultaneously.
    • Loop blocking: Rearrange loops to improve cache-reuse of a block of memory eg. break a large matrix into smaller ones.
    • Loop fusion: Merge two loops traversing over the same range into one loop.
    • Loop fission: Improve data locality by splitting apart loops.
    • Loop unrolling: Make there be fewer iterations, make each iteration perform the function of several.
    • Loop pipelining: reorder independent operations within a loop to allow for pipelining.

Loop pipelining


All topics ⟶