For an analytics project I’m working on, I want to store the data in ClickHouse using Ecto.
Unfourtunately the current clickhouse_ecto driver is a bit outdated. It doesn’t support the latest Ecto, but more importantly, it uses the http
interface instead of the tcp
interface.
My project needs to be high-throughput, so making tons of HTTP requests will be a lot slower than using a persistent TCP connection.
Workaround attempts
Since ClickHouse supports Postgres’s TCP protocol (port 9004) and MySQL’s TCP protocol (port 9004). I figured I’d give those a shot.
So I tried using postgrex, but it runs some postgres-specific queries when the connection is made, and those tables don’t exist in ClickHouse.
I ran into a similar thing with myxql. The connection logic checks if certain capabilities are enabled, but those requests error out with ClickHouse.
Writing a driver
Of course I could always write a new driver. Probably a lot of work, but I decided to give a try and see what it takes.
The funny thing is, the hardest part is probably having to make sense of ClickHouse’s C++ code. Writing the Elixir part is a lot easier, because of the built-in DSL for binary encoding & decoding.
Code
After a few hours of hair pulling, I worked out how to make a connection to the DB and parse the acknowledgment packet.
https://gist.github.com/joshnuss/e96ae827a454fb6de9da8282f6a37f9c
Next I’ll have to implement to query packets (which will probably be a lot harder).