Polyglot programming
This page features a short comparison and some links on writing fast, (mostly) compiled code for efficient scientific computations which can easily be called from Python. We review several candidate languages, namely C, C++, Fortran, Cython, Numba, Nim, Rust, D, Chapel and Julia.
For simple problems, Numba is very useful to speed up code execution, but for more complicated tasks, other approaches are necessary. Cython seems to be the natural choice, but the translation to C code and subsequent compilation is rather disruptive in a Pythonic workflow. Cython-generated code is sometimes slightly slower than equivalent C or Fortran code, but usually by no more than a factor of two. C, C++ and Fortran can easily be called from Python but are not exactly nice to work with. Even though C++11/14 and Fortran 2003/2008 improved the situation quite a bit, C++ is still very complex and Fortran shows its age of almost 60 years. Moreover, modern, object-oriented Fortran is not easily callable from Python (or from C/C++, for that matter), but needs an additional layer of wrappers.
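To illustrate how little glue is needed to call compiled C code from Python, here is a minimal sketch using ctypes from the standard library. It assumes a POSIX system on which the C math library can be located or is already loaded into the process; the choice of sqrt and the name c_sqrt are purely illustrative.

```python
import ctypes
import ctypes.util
import math

# Try to locate the C math library. find_library may return None;
# on POSIX systems CDLL(None) then exposes the symbols that are
# already loaded into the current process (assumption: POSIX only).
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# Declare the C signature: double sqrt(double).
# Without this, ctypes would assume int arguments and return values.
c_sqrt = libm.sqrt
c_sqrt.argtypes = [ctypes.c_double]
c_sqrt.restype = ctypes.c_double

# Call the C function and compare it with Python's own math.sqrt.
result = c_sqrt(2.0)
print(result, math.sqrt(2.0))
```

The explicit argtypes and restype declarations are the part that tends to become tedious for larger interfaces, which is one reason why wrapper generators and tools like Cython exist.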
Nim is nice in that its syntax is rather close to that of Python, and the Pymod package can be used to auto-generate Python modules that wrap Nim modules. As a compiled language, Nim is very fast, but it has certain problems for practical use. For example, it is still under heavy development, so the language is still changing, although the stable 1.0 version should not be too far off. A show stopper, however, is the lack of support for n-dimensional arrays.
Another relatively new, but already stable language is Rust. It features guaranteed memory and thread safety, type inference and good performance. Rust is a systems programming language intended to be a memory-safe replacement for C/C++. In some respects it is somewhat lower level than Nim, Chapel or D, and might therefore be slightly more complicated to use. Moreover, the Python interface is CFFI-based (C Foreign Function Interface), which is fast and flexible but not necessarily the most pleasant to use. Still, Rust is much simpler than, e.g., C and C++, and certainly preferable if one has to choose between Rust and C/C++.
A language that has been around for somewhat longer but is otherwise similar to Rust is D. It is fast, robust and easy to use. However, just like Nim, D seems to lack support for n-dimensional arrays, which is a show stopper for scientific computing.
A language specifically designed for scientific computing is Chapel. It shares D's pleasant features (fast, robust, easy to use), but offers native support for n-dimensional arrays. Moreover, it is easy to call from Python and has the advantage of featuring not only local parallelism, but also cluster-level parallelism.
Even though Rust, Nim, D and Chapel are statically typed, they feature strong type inference at compile time, thus offering flexibility while retaining good performance. Variables are statically typed, but their types need not be declared explicitly. In addition, these languages support overloading and static dispatch, which is a fast and clean way to write flexible algorithms that can deal with several data types.
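The idea of one algorithm name with several type-specific implementations can be sketched in Python, with an important caveat: Python has no static dispatch, so the closest standard-library analogue, functools.singledispatch, selects the implementation at run time rather than at compile time. The function name norm1 is hypothetical and chosen only for illustration.

```python
from functools import singledispatch


@singledispatch
def norm1(x):
    """Fallback for types without a registered implementation."""
    raise TypeError(f"norm1 is not implemented for {type(x).__name__}")


@norm1.register(int)
@norm1.register(float)
def _(x):
    # Scalar case: the 1-norm is just the absolute value.
    return abs(x)


@norm1.register(list)
def _(x):
    # Sequence case: sum of the absolute values of the entries.
    return sum(abs(v) for v in x)


print(norm1(-2.5))
print(norm1([1.0, -2.0, 3.0]))
```

In a statically typed language with overloading, the same selection would be resolved by the compiler, so no lookup cost remains at run time.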
Currently, the most interesting language for high-performance and scientific computing seems to be Julia, which features a nice and clean syntax and very good performance. As with Cython, the difference in execution time compared to C, C++ and Fortran is rarely more than a factor of two, and usually less. Productivity (measured in lines of code necessary to accomplish a given task), however, is tremendously improved over C, C++, Fortran and even Cython. Julia is still under heavy development, implying that the language is not yet fully stable, but breaking changes are rare.
The Python interface (PyJulia) seems to be neither very stable nor very actively developed, so calling Julia from Python is not always as smooth as one would want it to be. The other way around, calling Python from Julia via PyCall, however, works very well and reliably. Many important Python/SciPy packages like matplotlib and SymPy have been wrapped that way and are accessible from Julia.
Julia seems to provide the best overall package for writing scientific code. It includes native support for n-dimensional arrays, shared-memory concurrency and high-level distributed-memory parallelism. There are bindings for many important libraries like MPI, HDF5, FFTW and PETSc. It therefore seems more effective and more reasonable to write code directly in Julia and call Python if needed, rather than the other way around. This also solves the two-language problem, i.e., using some high-level but low-performance language for writing the main program and some low-level but high-performance language for writing numerical kernels. Julia is both high-level and high-performance, allowing one to write very short but still very fast code, and providing a flexibility that is rarely found in any other language.
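The two-language split mentioned above can be observed within Python itself. The following sketch, assuming the CPython interpreter, computes the same dot product once with an interpreted loop and once by delegating the loop to built-ins that are implemented in C. The function names are hypothetical, and the timings depend on the machine, so none are quoted here.

```python
import operator
import timeit


def dot_interpreted(xs, ys):
    """Numerical kernel written as a plain Python loop."""
    total = 0.0
    for x, y in zip(xs, ys):
        total += x * y
    return total


def dot_delegated(xs, ys):
    """Same kernel, with the loop running inside C-implemented built-ins."""
    return sum(map(operator.mul, xs, ys))


# Integer-valued floats keep every partial sum exactly representable,
# so both variants return an identical result for this input.
xs = [float(i) for i in range(1000)]
ys = [float(i) for i in range(1000)]
assert dot_interpreted(xs, ys) == dot_delegated(xs, ys)

# Time both variants; the absolute numbers vary between machines.
t_interp = timeit.timeit(lambda: dot_interpreted(xs, ys), number=200)
t_deleg = timeit.timeit(lambda: dot_delegated(xs, ys), number=200)
print(f"interpreted: {t_interp:.4f} s, delegated: {t_deleg:.4f} s")
```

As soon as a kernel cannot be expressed through such built-ins, it has to be moved into a second, compiled language, which is exactly the situation a language that is both high-level and fast avoids.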
Resources
Overview of support for important language features
Cython
- Cython Homepage
- Cython: A Guide for Python Programmers by Kurt W. Smith (O'Reilly, 2015)
- PETSc, PETSc4py
- Trilinos, PyTrilinos
- DistArray: multidimensional NumPy-like distributed arrays for Python
- Bohrium
Julia
- Julia Homepage
- PyCall: Calling Python Functions from the Julia Language
- PyJulia: Calling Julia Functions from Python
- Parallel Julia: PETSc, MPI, Distributed Arrays, Elemental, ScaLAPACK and others
- HDF5 interface for the Julia language
- Julia: A Fresh Approach to Numerical Computing by Jeff Bezanson, Alan Edelman, Stefan Karpinski, Viral B. Shah (arXiv, 2014)
- Getting Started with Julia Programming by Ivo Balbaert (Packt Publishing, 2015)
- Mastering Julia by Malcolm Sherrington (Packt Publishing, 2015)
- Learn Julia by Chris von Csefalvay (Manning, 2016)
- Crossing Language Barriers with Julia, SciPy, IPython by Stephen G. Johnson (EuroSciPy 2014)
Chapel
- Chapel Homepage
- A Brief Overview Of Chapel by Bradford L. Chamberlain (2012)
- pyChapel: The Python/Chapel Integration Module
- Chapel for Python Programmers
- The Chapel Programming Language
- Chapel: Productive, Multiresolution Parallel Programming by Bradford L. Chamberlain
D
- The D Programming Language
- Programming in D by Ali Çehreli
- PyD: Seamless Interoperability between the D Programming Language and Python
- PydMagic: Ipython/Jupyter magic for inline D code
- d_hdf5: D bindings and wrappers for HDF5
- dfftw3: D bindings for FFTW3
Nim
- Nim Homepage
- Pymod
- Nim for Python Programmers
- Nim for Scientific Computing
- Nim: An Overview by Andreas Rumpf (OSCON 2015)
Rust
- Rust Homepage
- Programming Rust by Jim Blandy (O'Reilly 2016)
- Rust for Python Programmers
- My Python's a little Rust-y by Dan Callahan (PyCon 2015)
- The Rust Programming Language (Google Tech Talks, 2015)
- The Rust Programming Language by J. M. Archer (2015)
- Introduction to the Rust Programming Language by Alex Crichton (2014)
- ndarray: N-dimensional array with array views, arbitrary slicing, and efficient operations
- hdf5-rs: Thread-safe Rust bindings and high-level wrappers for the HDF5 library
- rsmpi: Message Passing Interface (MPI 3.1) bindings for Rust
C++11/14
- Writing Modern C++ Code: How C++ has Evolved over the Years
- Ten C++11 Features Every C++ Developer Should Use
- A Tour of C++ by Bjarne Stroustrup
- Effective Modern C++ by Scott Meyers (O'Reilly, 2014)
Fortran 2003/2008
- f90wrap: Fortran to Python interface generator with derived type support
- Object-Oriented Programming in Fortran 2003: Part 1, Part 2, Part 3, Part 4
- Scientific Programming In Fortran 2003: A Tutorial Including Object-Oriented Programming by Katherine A. Holcomb (2012)
- Fortran Best Practices