Anaconda recently released Numba, an open-source just-in-time (JIT) compiler that translates a subset of Python and NumPy code into fast machine code. The compiler translates a Python function into equivalent machine code when the function is called, and the result is said to run anywhere from 2x (simple NumPy operations) to 100x (complex Python loops) faster.


One of the most effective ways to use Numba is to apply one of its decorators to your functions, which tells Numba to compile them. All or part of your code can then run at native machine code speed whenever a Numba-decorated function is called.
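As a minimal sketch of the decorator approach, the snippet below applies Numba's `njit` decorator to a plain Python loop. The function name `sum_of_squares` and the sample data are made up for illustration, and the fallback that runs the function uncompiled when Numba is not installed is an assumption of this sketch, not part of Numba.

```python
import numpy as np

try:
    from numba import njit  # compiles the decorated function in nopython mode
except ImportError:
    # Assumption for this sketch: run as plain Python if Numba is missing.
    def njit(func):
        return func


@njit
def sum_of_squares(values):
    """Sum the squares of a 1-D array with an explicit loop."""
    total = 0.0
    for i in range(values.shape[0]):
        total += values[i] * values[i]
    return total


data = np.arange(1_000, dtype=np.float64)
result = sum_of_squares(data)  # with Numba, the first call triggers compilation
```

With Numba installed, the first call pays a one-off compilation cost, and later calls with the same argument types reuse the compiled code.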

Numba works with the following:

  • OS: Windows (32 and 64 bit), OSX, Linux (32 and 64 bit). Unofficial support on BSD.
  • Architecture: x86, x86_64, ppc64le, armv7l, armv8l (aarch64). Unofficial support on M1/Arm64.
  • GPUs: Nvidia CUDA.
  • CPython
  • NumPy 1.18 – latest

First-time Numba users are advised not to compile Numba from source. Because Numba is often used as a core component of other software, its own dependencies are kept to an absolute minimum. However, the following additional packages can be installed to provide more functionality:

  • SciPy makes it possible to compile numpy.linalg functions.
  • Colorama permits the use of colour highlighting in error messages and backtraces.
  • pyyaml supports Numba configuration through a YAML configuration file.
  • icc_rt allows the use of the Intel SVML (a high-performance short vector math library, x86_64 only). Installation instructions are given in the performance tips section of the Numba documentation.

How does it work?

Numba reads the Python bytecode for a decorated function and combines it with information about the types of the function's input arguments. After analysing and optimising your code, it uses the LLVM compiler library to generate a machine code version of your function that is tailored to your CPU's capabilities. This compiled version is then used every time your function is called.
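The role of argument types can be illustrated with a small sketch. The function `scale` is hypothetical. With Numba installed, the decorated function is a dispatcher object whose `signatures` attribute lists the argument types it has been compiled for so far. The plain-Python fallback is again an assumption of this sketch, which is why the attribute is read with `getattr`.

```python
try:
    from numba import njit
except ImportError:
    # Assumption for this sketch: run as plain Python if Numba is missing.
    def njit(func):
        return func


@njit
def scale(x, factor):
    """Multiply a value by a factor."""
    return x * factor


int_result = scale(3, 4)        # compiled for integer arguments
float_result = scale(1.5, 2.0)  # compiled again for floating-point arguments

# With Numba, one entry per set of argument types seen so far;
# an empty list when running on the plain-Python fallback.
compiled_signatures = getattr(scale, "signatures", [])
print(compiled_signatures)
```

Each new combination of argument types triggers a separate compilation, and a repeat call with types already seen reuses the existing machine code.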

Provided the function can run in nopython mode, or at least some of its loops can be compiled, Numba tailors the compilation to your particular CPU. Depending on the application, the speed-up can range from one to two orders of magnitude.

Numba provides several options for parallelizing your code on CPUs and GPUs with minor code changes:

Simplified Threading: Numba can automatically execute NumPy array expressions on multiple CPU cores, and it makes parallel loops easy to write.
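A parallel loop might look like the sketch below, which uses Numba's `prange` together with `njit(parallel=True)` to spread the outer loop across CPU cores. The function `row_norms` and the test matrix are illustrative only, and the serial fallback used when Numba is not installed is an assumption of this sketch.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:
    # Assumption for this sketch: run serially in plain Python without Numba.
    prange = range

    def njit(*args, **kwargs):
        # Used as @njit(parallel=True): return a pass-through decorator.
        return lambda func: func


@njit(parallel=True)
def row_norms(matrix):
    """Euclidean norm of each row; rows are processed in parallel."""
    n_rows, n_cols = matrix.shape
    out = np.empty(n_rows)
    for i in prange(n_rows):  # iterations are independent of each other
        acc = 0.0
        for j in range(n_cols):
            acc += matrix[i, j] * matrix[i, j]
        out[i] = np.sqrt(acc)
    return out


matrix = np.arange(12, dtype=np.float64).reshape(3, 4)
norms = row_norms(matrix)
```

Only the outer loop uses `prange`. Each iteration writes to its own element of `out`, so the iterations do not interfere with one another.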

SIMD Vectorization: Numba can automatically convert some loops into vector instructions for 2-4x speed increases. Whether your CPU supports SSE, AVX, or AVX-512, Numba adapts to its capabilities.

GPU Acceleration: Numba enables you to create parallel GPU algorithms entirely from Python and supports NVIDIA CUDA.
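As a rough sketch of what a GPU kernel written in Python can look like, the example below uses `numba.cuda` to add one to every element of an array. The names `add_one` and `add_one_kernel`, the block size of 128 threads, and the CPU fallback taken when no CUDA device or no Numba installation is found are all assumptions made for illustration.

```python
import numpy as np

try:
    from numba import cuda
    gpu_available = cuda.is_available()
except ImportError:
    gpu_available = False

if gpu_available:
    @cuda.jit
    def add_one_kernel(arr):
        i = cuda.grid(1)   # absolute index of this thread in a 1-D grid
        if i < arr.size:   # guard threads that fall past the end of the array
            arr[i] += 1.0


def add_one(values):
    """Return values + 1, computed on the GPU when one is available."""
    if not gpu_available:
        return values + 1.0  # CPU fallback (assumption of this sketch)
    device_values = cuda.to_device(values)  # copy the input to GPU memory
    threads_per_block = 128
    blocks = (values.size + threads_per_block - 1) // threads_per_block
    add_one_kernel[blocks, threads_per_block](device_values)
    return device_values.copy_to_host()     # copy the result back


incremented = add_one(np.zeros(5))
```

The kernel is launched with enough blocks to cover the whole array, and the bounds check inside the kernel keeps the surplus threads in the last block from writing out of range.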

The post Introducing Numba, A High-Performance Python Compiler appeared first on Analytics India Magazine.