The big things remaining to prepare my presentation are
- Get a solid understanding of HyPer and Mutable
- Finish up benchmarking
- … finish making the slides themselves
So let’s talk about slides, and T1W8 Benchmarking talks about benchmarks
Efficiently Compiling Efficient Query Plans for Modern Hardware
- This paper is about HyPer’s compiler
- They argue the volcano style comes from a time when I/O really dominated runtimes so it wasn’t really worth optimising this part. The benefit of being able to develop faster was better
- A lot of other databases remedy this by using the vector model, but that’s really just the volcano model with larger tuples to use cache locality a bit better. This however, also bring back the issue of adding complexity and removes the big advantage of pipelining
- They tried using C++ only, but found that the C++ compiler would spend too much time compiling and not do particularly good things in its caches
- Instead they opted to use LLVM for most code except for complicated functions like sorting
- This dropped their compile times down significantly because the C++ components would be precompiled then they can use JIT on the LLVM instead
- They talk about how writing the LLVM optimally had quite a big impact, especially around branch predication, avoiding inlining everything into one function,
Mutable
-
Mutable was initially created to enable their research on JIT in SQL to wasm. Their EDBT 2023 paper talks about it
-
As seen above, their benchmarks brag about significantly better compile times and execution times than HyPer
At this point I started recording my thoughts straight onto the slides instead.