Experiments >

Clickhouse driver

Experiment #20224th June, 2021by Joshua Nussbaum

For an analytics project I’m working on, I want to store the data in ClickHouse using Ecto.

Unfourtunately the current clickhouse_ecto driver is a bit outdated. It doesn’t support the latest Ecto, but more importantly, it uses the http interface instead of the tcp interface.

My project needs to be high-throughput, so making tons of HTTP requests will be a lot slower than using a persistent TCP connection.

Workaround attempts

Since ClickHouse supports Postgres’s TCP protocol (port 9004) and MySQL’s TCP protocol (port 9004). I figured I’d give those a shot.

So I tried using postgrex, but it runs some postgres-specific queries when the connection is made, and those tables don’t exist in ClickHouse.

I ran into a similar thing with myxql. The connection logic checks if certain capabilities are enabled, but those requests error out with ClickHouse.

Writing a driver

Of course I could always write a new driver. Probably a lot of work, but I decided to give a try and see what it takes.

The funny thing is, the hardest part is probably having to make sense of ClickHouse’s C++ code. Writing the Elixir part is a lot easier, because of the built-in DSL for binary encoding & decoding.

Code

After a few hours of hair pulling, I worked out how to make a connection to the DB and parse the acknowledgment packet.

https://gist.github.com/joshnuss/e96ae827a454fb6de9da8282f6a37f9c

Next I’ll have to implement to query packets (which will probably be a lot harder).

Note

  • Though it was hard to put this together, it was fun to dive into a bunch of random codebases. I had to look at the Python driver, Go driver, and the C++ code.
view all experiments

Stay tuned in

Learn how to add more experimentation to your workflow