Here is some code we used for benchmarking. The idea is to compare the speed of floating-point operations across different setups (kernels, compilers, etc.). Please send bug reports, questions, patches and suggestions to info@emqbit.com. Thanks :)