Everything you need to know about Python interpreters

Python itself is, of course, a programming language. But many people mistakenly believe that Python is the very thing that ships with most *nix systems and starts when you type “python” in the console. That is, the interpreter (a specific version of it) gets equated with the language as a whole. Just like those guys who write in Delphi. But what does it really mean?

In fact, Python is not one specific interpreter. In most cases we are dealing with CPython, which serves as the reference implementation and the example for others to follow. This means that CPython is where current and future functionality, as described in the language specification and in PEPs, gets implemented first and in full. This implementation is developed under the scrutiny of the creator of the language, Guido van Rossum, the “Benevolent Dictator For Life” (it’s a real title; check Wikipedia if you do not believe me).
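If you are curious which implementation is actually behind the “python” command on your machine, the standard library can tell you. Here is a minimal sketch; the helper name describe_interpreter is made up for this example, while the platform and sys calls are part of the standard library:

```python
import platform
import sys


def describe_interpreter():
    """Return (implementation name, language version) of the running interpreter."""
    # Typical names: "CPython", "PyPy", "Jython", "IronPython"
    name = platform.python_implementation()
    # Version of the language the interpreter implements, as "major.minor.patch"
    version = platform.python_version()
    return name, version


if __name__ == "__main__":
    name, version = describe_interpreter()
    print("Running on %s, Python %s" % (name, version))
    print("Executable: %s" % sys.executable)
```

Run the same file under different interpreters and only the reported name (and possibly the version) should change.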

None of this means that you cannot create your own implementation according to the published standards, just as nobody stops you from writing your own C++ compiler. In fact, a large number of developers have done just that, and I want to tell you about the results of their work.

To understand Python, you must understand Python

One of the most well-known alternative implementations of Python is PyPy (the successor to Psyco), which is often described as “Python written in Python.” Everyone who hears this has a natural question: how can an implementation of a language written in that same language run faster than the language itself? But we have already agreed that Python is the name of a set of standards rather than of a specific implementation. And PyPy is not written in the Python that CPython runs, but in the so-called RPython, a restricted subset of the language that is less a dialect than a platform or framework allowing developers to write their own interpreters.

RPython provides a bit of low-level magic that lets you add useful features to an interpreter for an arbitrary scripting language (it does not have to be Python). For example, you can get a JIT compiler for acceleration. Of course, you need the right set of ingredients (an inquisitive mind is mandatory; eye of newt is optional). If you want more details, or if you have a question like “And what about LLVM?”, see the FAQ.

The basic idea is that RPython itself is too specialized for writing real-life programs. PyPy is easier and more convenient; moreover, it is compatible with CPython 2.7 and 3.3 in terms of the supported language standards. However, because of its internal structure, it is currently difficult to use PyPy with libraries that rely on compiled C extension modules written for the reference implementation. But not to worry: Django, for example, is already fully supported, and in many cases it runs faster than on CPython.
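Speed claims are easy to check for yourself, because pure-Python code runs unchanged on both interpreters. Below is a small sketch of a CPU-bound benchmark; the function names and the limit of 100000 are arbitrary choices made for this illustration, and the actual timings will depend on your machine and interpreter versions:

```python
import time


def count_primes(limit):
    """Count primes below `limit` by naive trial division (deliberately CPU-bound)."""
    count = 0
    for n in range(2, limit):
        is_prime = True
        d = 2
        while d * d <= n:
            if n % d == 0:
                is_prime = False
                break
            d += 1
        if is_prime:
            count += 1
    return count


def timed(func, *args):
    """Call func(*args) once and return (result, elapsed seconds)."""
    start = time.time()
    result = func(*args)
    return result, time.time() - start


if __name__ == "__main__":
    result, elapsed = timed(count_primes, 100000)
    print("%d primes found in %.2f s" % (result, elapsed))
```

Save it to a file and run it once with CPython and once with PyPy. Tight loops over integers like this one are the kind of code a tracing JIT tends to handle well; code that spends most of its time inside C extensions is a different story.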

Furthermore, PyPy is modular and extensible. For example, sandboxing is supported: you can execute “dangerous” arbitrary code in a virtual environment where the file system is emulated and external calls (for example, creating a socket) are blocked. In addition, the PyPy-STM project has been under active development in recent years. PyPy-STM replaces the built-in multi-threading support, which forces us to live with the GIL and other unloved things, with an implementation based on Software Transactional Memory, that is, a completely different concurrency control mechanism.

Lower level

In addition to RPython + PyPy, there is an easier way to speed up Python code: choose one of the optimizing compilers that extend the language with stricter static typing and other low-level features. As I’ve mentioned above, RPython (which also fits this description) is too specialized for real-life tasks, but there is a more general-purpose project: Cython.

In fact, it adds a new language that sits somewhere between C and Python (the approach is based on Pyrex, a project that was suspended in 2010). Source files in this language usually have the .pyx extension, and the compiler turns them into C code that is built into quite ordinary extension modules, which can be imported and used directly.

The code with declared types would look something like this:

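def primes(int kmax):
    # C-level declarations: these variables are plain C ints, not Python objects
    cdef int n, k, i
    cdef int p[1000]  # fixed-size C array holding the primes found so far
    result = []
    # Clamp kmax so that we never write past the end of p
    if kmax > 1000:
        kmax = 1000
    k = 0
    n = 2
    while k < kmax:
        # Try to divide n by every prime found so far
        i = 0
        while i < k and n % p[i] != 0:
            i = i + 1
        # No divisor found: n is prime
        if i == k:
            p[k] = n
            k = k + 1
            result.append(n)
        n = n + 1
    return result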

In the simplest case, you create a “setup.py” file for this module. For example:

# distutils was removed in Python 3.12; setuptools provides a compatible setup()
from setuptools import setup
from Cython.Build import cythonize

setup(
    ext_modules=cythonize("myscript.pyx"),
)

That is really all there is to it: running “python setup.py build_ext --inplace” compiles the module, after which it can be imported like any other.
By the way, if you believe that all the types you need are already present in NumPy and there is no need to invent a new language, please see this link. The bottom line is that the developers of this project are actively working on using the two together, and this approach gives a significant performance boost for some tasks.
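To judge what the type declarations buy you, it helps to have a plain-Python counterpart of the Cython primes() function to time it against. Here is one possible sketch; the name primes_py and the timeit parameters are my own choices for illustration, not part of the Cython example:

```python
import timeit


def primes_py(kmax):
    """Plain-Python counterpart of the Cython primes(): the first kmax primes (at most 1000)."""
    kmax = min(kmax, 1000)
    found = []
    n = 2
    while len(found) < kmax:
        # n is prime if none of the primes found so far divides it
        if all(n % p != 0 for p in found):
            found.append(n)
        n += 1
    return found


if __name__ == "__main__":
    # Time ten calls; repeat the same measurement with the compiled version to compare
    seconds = timeit.timeit(lambda: primes_py(1000), number=10)
    print("10 calls of primes_py(1000): %.3f s" % seconds)
```

Assuming the Cython module was built under the name myscript, as in the setup.py example, the same timeit call with myscript.primes in place of primes_py gives the number to compare against.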

Snake in the box

As of today, Guido van Rossum, the creator and BDFL of the original Python implementation, is employed by Dropbox, where a team of developers is working on a new high-performance Python interpreter called Pyston. Apart from being the thousandth variation on the name of our beloved language, Pyston is notable for building on LLVM and following the latest trends in JIT compilation.

For now, this implementation is in its infancy, but it is rumored to be already in use for some internal Dropbox projects and to promise a significant step forward in performance compared to CPython. The interesting thing is that it serves as a kind of “testing ground” where the developers can explore various technologies. For example, it lets you use the GRWL (Global Read-Write Lock) plugin as an alternative to the GIL in order to solve some old problems. The bottom line is that the GIL is a single mutex that locks everything and everyone, whereas the new plugin distinguishes read locks from write locks, allowing multiple threads to read data simultaneously without negative consequences. We can also mention OSR (On-Stack Replacement), a piece of inner magic that lets the runtime switch a long-running (“heavy”) function over to optimized code while it is still executing, similar to what JavaScript engines do. So it goes.
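The difference between one big mutex and a read-write lock is easier to see in code. The sketch below is a toy read-write lock built on the standard threading module; it only illustrates the general idea and is not how Pyston implements its GRWL. The class and function names are invented for this example:

```python
import threading


class ReadWriteLock(object):
    """Toy read-write lock: any number of readers OR exactly one writer.

    Illustration only. Writers can starve if readers keep arriving.
    """

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0       # threads currently holding the lock for reading
        self._writing = False   # True while one thread holds it for writing

    def acquire_read(self):
        with self._cond:
            while self._writing:            # readers only wait for a writer
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:          # last reader out wakes waiting writers
                self._cond.notify_all()

    def acquire_write(self):
        with self._cond:
            # A writer needs exclusivity: no other writer and no readers
            while self._writing or self._readers > 0:
                self._cond.wait()
            self._writing = True

    def release_write(self):
        with self._cond:
            self._writing = False
            self._cond.notify_all()         # wake both readers and writers


def run_demo(writers=4, increments=1000):
    """Increment a shared counter from several threads under the write lock."""
    lock = ReadWriteLock()
    state = {"counter": 0}

    def writer():
        for _ in range(increments):
            lock.acquire_write()
            try:
                state["counter"] += 1
            finally:
                lock.release_write()

    threads = [threading.Thread(target=writer) for _ in range(writers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["counter"]
```

With a plain mutex, two threads that only want to read would still have to take turns; here they can both hold the lock at once, and only writers need exclusive access.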

Virtual reality

If you made it through the previous section, you have probably realized that the most effective way to build a new Python interpreter is to create an intermediate layer on top of an existing virtual machine or runtime environment. And indeed, two of the most mature alternative implementations (apart from the aforementioned PyPy) run on the JVM and on .NET/Mono.

Jython

Do not be confused if you come across the name JPython: that is simply what Jython was originally called. Jython is an implementation of Python that gives you full access to Java classes and lets you enjoy all the benefits of the JVM. It looks something like this (an example from the standard documentation):

from javax.tools import ForwardingJavaFileManager, ToolProvider, DiagnosticCollector

names = ["HelloWorld.java"]
# Ask the JDK for its built-in Java compiler
compiler = ToolProvider.getSystemJavaCompiler()
diagnostics = DiagnosticCollector()
# Python's None is passed wherever the Java API accepts null
manager = compiler.getStandardFileManager(diagnostics, None, None)
units = manager.getJavaFileObjectsFromStrings(names)
comp_task = compiler.getTask(None, manager, diagnostics, None, None, units)
success = comp_task.call()
manager.close()

Moreover, Jython offers a radical solution to all the problems you may have with the GIL: it simply gets rid of it! The same set of threading primitives is available, but Jython does not manage operating system threads itself; it relies on the thread implementation of the Java machine. To see examples of this kind of multi-threading (with Python and Java primitives used side by side), please see this link.
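Here is a small sketch of what “the same set of threading primitives” means in practice: ordinary threading code with no Java imports at all. The function names and the default of four workers are arbitrary choices for this illustration. On CPython the GIL lets only one of these threads execute Python code at a time, while on Jython each one is backed by a Java thread and they can, in principle, run in parallel:

```python
import threading


def sum_of_squares(start, stop, results, index):
    """Worker: store the sum of squares of range(start, stop) in results[index]."""
    total = 0
    for i in range(start, stop):
        total += i * i
    results[index] = total


def parallel_sum_of_squares(n, workers=4):
    """Split range(n) into `workers` chunks and sum the squares, one thread per chunk."""
    results = [0] * workers
    step = n // workers
    threads = []
    for w in range(workers):
        start = w * step
        # The last worker also takes whatever is left over after integer division
        stop = n if w == workers - 1 else start + step
        t = threading.Thread(target=sum_of_squares, args=(start, stop, results, w))
        threads.append(t)
        t.start()
    for t in threads:
        t.join()
    return sum(results)


if __name__ == "__main__":
    print(parallel_sum_of_squares(1000000))
```

Each worker writes only to its own slot of the results list, so no locking is needed in this particular example.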

IronPython

IronPython is written in C#. This implementation lets you run Python on .NET and gives you access to the many built-in classes of that platform. The code looks something like this:

from System.Net import WebRequest
from System.Text import Encoding
from System.IO import StreamReader

PARAMETERS="lang=en&field1=1"

request = WebRequest.Create('http://www.example.com')
request.ContentType = "application/x-www-form-urlencoded"
request.Method = "POST"

bytes = Encoding.ASCII.GetBytes(PARAMETERS)
request.ContentLength = bytes.Length
reqStream = request.GetRequestStream()
reqStream.Write(bytes, 0, bytes.Length)
reqStream.Close()

response = request.GetResponse()
result = StreamReader(response.GetResponseStream()).ReadToEnd()
print(result)

The code has a distinctly Windows feel, but it is used quite widely. For more useful examples, please see this online cookbook. It is also worth mentioning that, in addition to .NET, IronPython offers advanced support for Visual Studio, which can be an important factor for fans of this huge IDE from the guys in Redmond.

Conclusion

As you can see, the terrarium is quite large: there are many implementations of one of today’s most popular general-purpose languages, and each of them extends the range of problems the language can solve. Of course, alternative interpreters pick up new standards with a delay, but in compensation, the syntax and architectural problems will already have been ironed out in CPython by then. Besides, isn’t it cool not only to know two different languages, but also to be able to use one of them in the environment of the other? I wish you a successful hunt, snake catcher!

