JIT Compiler: Really fast POVRay FPU

The highly experimental JITC can considerably speed up POVRay renderings involving lots of calls to the function VM, such as tracing isosurfaces or parametric objects, by actually compiling the user functions using GCC. Patch against UNIX POVRay-3.6.1.
The idea...

The function VM (virtual machine) in POVRay gets called whenever user functions need to be evaluated, such as for isosurfaces or the parametric object. For a number of scenes (especially those which consist mostly of an isosurface), a major factor in rendering time is the speed of the POV VM. And since the POV VM interprets assembly code (produced from the user functions at the parsing stage), there is room for speed improvements by actually compiling the code for the real CPU/FPU in the computer. Hence, one day I decided to give the just-in-time compilation approach a chance and implement it.

Actually, just-in-time compilation of the function code is not a really new idea: the PPC/MacOS version of POVRay already comes with a built-in JIT compiler which compiles POV VM code directly into PPC instructions without the help of external programs such as GCC. The disadvantage is that some optimization opportunities get lost, but the advantage is that compilation is faster and all the fuss with external programs, source code, compiler options and shared libraries (see implementation immediately below) is avoided. However, it is far easier to compile the POV VM code into PPC RISC code with a number of general-purpose FP registers than to translate it into i387 FPU code (which has a register stack of size 8).

...and the implementation

The JIT compiler simply compiles the assembly for the POV VM. This is done by translating the POV VM assembler code into C++ source code (one C++ function for every user function). All these functions are then collected (in their string representation) until the POV VM is called to evaluate such a function. At that point, all the functions gathered so far are written into a temporary file which is compiled into a shared object (the UNIX analogue of DLLs on Windows) using the system compiler (GCC works, other compilers will need adjustments).
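The collect-then-compile step just described can be sketched roughly as follows. This is only an illustration, not the patch's actual code: the class JitUnit, the helper compile_command and the signature of the generated functions are hypothetical. The symbol names (POV_JIT_FPU_<n>), the output name (jit-X.so) and the compiler flags are taken from the log output shown further down.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical collector: gathers translated user functions as C++ source
// strings so that they can later be emitted as one translation unit.
class JitUnit {
public:
    // Queue the C++ body of one user function; returns its serial number.
    int add(const std::string& body) {
        bodies_.push_back(body);
        return static_cast<int>(bodies_.size()) - 1;
    }

    // Symbol name for function n (pattern as seen in the patch's log output).
    static std::string symbol(int n) {
        return "POV_JIT_FPU_" + std::to_string(n);
    }

    // Build the full source text: one extern "C" function per user function.
    // The signature (argument array in, double out) is an assumption.
    std::string source() const {
        std::string src = "#include <cmath>\n";
        for (std::size_t i = 0; i < bodies_.size(); ++i) {
            src += "extern \"C\" double " + symbol(static_cast<int>(i)) +
                   "(const double* a) { " + bodies_[i] + " }\n";
        }
        return src;
    }

private:
    std::vector<std::string> bodies_;
};

// Compiler command line for shared object number `serial`.
// Only a subset of the flags from the log is used here.
std::string compile_command(const std::string& src_path, int serial) {
    return "g++ -x c++ -O3 -shared " + src_path +
           " -o ./jit-" + std::to_string(serial) + ".so";
}
```

In such a scheme the string returned by source() would be written to a temporary file and the command from compile_command() handed to the system compiler. The real patch reuses the flags and include directories POVRay itself was built with.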
This shared object is then loaded and allows POVRay to directly call the compiled versions of the user functions. All that works on the fly without the need for the user to do anything special. The generated shared object will be named jit-X.so (in the current directory), where X is a serial number which is increased each time a shared object is created. Care has been taken to put as many functions as possible into a single shared object. If all the functions in a shared object are deleted from the VM, the object is unloaded. You will normally not see the shared object file since it is unlinked (removed) as soon as it has been loaded.

Since the JIT compiler patch involves things like shared object loading, it is highly system specific. The patch provided here works fine for me on my i386 Linux system. It should also work on other Linux/GNU systems (i.e. using the GNU compiler and linker) but will definitely not work on Windows. It may, however, be portable to MacOS X with little effort.

The JITC patch needs the POVRay source code because several include files from POVRay are needed when compiling the user functions. Since it also requires the conf.h file, you should not remove the build directory after having built POVRay. The JIT compiler uses the same flags and directories as were used to build POVRay; this information is statically compiled into POVRay during the build of the patched version. The JIT compiler must be explicitly enabled using an environment variable, see usage below.

I know that the implementation is not very clean in all points (especially note the part changing fnpovfpu.cpp). Actually, this is the first time I have worked with runtime loading of shared objects, and I encountered some problems during implementation which I had not thought of before.

What is it good for?

The JITC patch is primarily useful for scene renderings which are dominated by function VM calls.
For example, if you want to trace things like mathematical isosurfaces, expect speed increases by a factor of 2 to 3. However, if you are tracing an isosurface landscape where most of the time is spent calculating complicated pattern functions, the benefit will be small. See also the examples below.

Performance considerations

The advantage of the JITC approach presented here is that GCC and all its optimization capabilities can be used. (E.g. most of the pointless register moves generated by the POV VM are optimized away, although it turns out that this alone is not responsible for a large performance gain.) The downside is that running GCC takes some time: typically 3 to 4 seconds on my box when functions.inc is included and only a couple of functions are used (summing up to about 100 functions, most of them from the include file). However, calling functions in the dynamically linked library does not introduce noticeable overhead. (I did several measurements, including verification of the produced assembler code, which showed that result.) The only overhead (apart from compiling the code and loading the library) is the function lookup, which has to be performed only once and can therefore be neglected.

Download and Install
Download:
The JITC patch can only be obtained as a patch against UNIX POVRay-3.6.1.
Install:
First, patch your POVRay-3.6.1 using patch(1). Note that the JITC patch needs the POVRay sources and the build directory (with conf.h) to remain at the exact place where they were when POVRay was built, so leave the sources and the build dir on your hard disk. The configure script automatically detects the directories, and these are compiled statically into the patched version of POVRay.

Activate:
The JITC-patched POVRay should behave exactly like the non-patched one. To enable the JIT compiler, set the environment variable POV_USE_JITCOMPILER to yes.

Bugs:
The patch is highly experimental. If you find any bugs, especially functions for which it does not work correctly, please contact me.

Usage (important)
Using POVRay with the JITC patch should not be any different from using
normal POVRay. In order to enable the patch, you need to set
the environment variable POV_USE_JITCOMPILER
to "yes". (Use no env var at all, or the value "no", to disable it.)
This is done, e.g. using bash(1), via:

    export POV_USE_JITCOMPILER=yes

Once the JIT compiler is enabled, it should automatically compile the functions. In case it fails, you should see error messages and POVRay will revert to the slower built-in POV VM. A successful compile should look like this in the terminal:

    Mapping background image
    0:00:00 Rendering line 1 of 120
    JIT compiler: g++ -x c++ -pipe -Wno-multichar -O3 -march=athlon-xp -malign-double -minline-all-stringops -ffast-math -Wno-multichar -funit-at-a-time -fno-rtti -Wno-all -DHAVE_CONFIG_H -nostartfiles -shared -I/path/to/povray-3.6.1-modified/source -I/path/to/povray-3.6.1-modified/source/base -I/path/to/povray-3.6.1-modified/unix -I/path/to/povray-3.6.1-modified-build /tmp/jitcompiler-sjC6aD -o ./jit-0.so
    JIT compiler: dlopen(./jit-0.so)... OK
    JIT compiler: DL_Lookup.........................................................
    ..........................................................OK
    JIT Compiler (114 functions): success
    JIT compiler: VM lookup: POV_JIT_FPU_113 -> 0x40440fe0
    JIT compiler: VM lookup: POV_JIT_FPU_76 -> 0x4043ef60
    JIT compiler: VM lookup: POV_JIT_FPU_111 -> 0x40440e80
    JIT compiler: VM lookup: POV_JIT_FPU_112 -> 0x40440f30
    0:00:04 Rendering line 20 of 120

Especially note the lines starting with "JIT compiler".

Example scenes

Finally, let's look at some examples and benchmarks. All were made using JITC-patched POVRay-3.6 on an idle 1.47GHz AthlonXP running Linux-2.6 with a graphical display. (The unpatched version of POVRay is called "vanilla", and of course both were compiled with the same compiler using the same options etc.)
// Alex Kluchikov, 2003; mail: klkspa[at]ukr.net, aklk[at]mail.ru
function {
  #declare MPI=16*pi/3;
  #macro tx() (sqrt(x*x+z*z)-1.5) #end
  #macro ty() y #end
  #macro ttx() tx()*sin(radialf(x,y,z)*MPI)+ty()*cos(radialf(x,y,z)*MPI) #end
  #macro tty() tx()*cos(radialf(x,y,z)*MPI)-ty()*sin(radialf(x,y,z)*MPI) #end
  pow(pow(ttx()+0.25,2)+pow(tty(),2),1/64)*.33
  +pow(pow(ttx()-0.125,2)+pow(tty()+0.216506350946109661690930793,2),1/64)*.33
  +pow(pow(ttx()-0.125,2)+pow(tty()-0.216506350946109661690930793,2),1/64)*.33
  -.945+sin(radialf(x,y,z)*10*pi)*0.01
}
function {
  y - 0.3
  + (f_noise3d(x/8,0,z/8)-0.5)/2
  - fn_crack_large(x,0,z).grey
  - fn_crack_small(x,0,z).grey/10
}
function { u*sin(v+sqrt(u))*(1-0.001*sqrt(u)) }
function {
  15*sqrt(pow(sin(m*v+0.1*sqrt(u))*sin(u*0.5-0.1*sqrt(v))/3, 2)+
          pow(cos(m*v+0.1*sqrt(u))*m/u*sin(u*0.5-0.1*sqrt(v))/3, 2))
  +sqrt(u)-1/pow(u,3)
}
function { u*cos(v+sqrt(u))*(1-0.001*sqrt(u)) }
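
To make the "one C++ function for every user function" idea more concrete, here is a hand-written guess at what a translation of the first of the three parametric functions above might look like. The patch's real generated code is not shown on this page, so the signature and the body are assumptions. Only the POV_JIT_FPU_<n> naming pattern comes from the log output.

```cpp
#include <cmath>

// Hypothetical C++ counterpart of the user function
//   function { u*sin(v+sqrt(u))*(1-0.001*sqrt(u)) }
// Passing u and v as plain doubles is an assumption made for this sketch.
extern "C" double POV_JIT_FPU_0(double u, double v) {
    const double r = std::sqrt(u);  // shared subexpression, evaluated once
    return u * std::sin(v + r) * (1.0 - 0.001 * r);
}
```

Once the function exists as ordinary C++ like this, GCC can apply its usual optimizations to it, which is the advantage of the approach discussed under performance considerations.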