最近在看Clickhouse、Palo、TiDB,发现在架构上:Clickhouse缺少了一个query转发层以支持高并发。
Palo的SQL转发层:
TiDB的SQL转发层:
这两个OLAP工具都具备支持多用户在线查询的能力,有一定负载能力。
相对的Clickhouse非常诡异。。我执行如下SQL(数据集结果较大):
SELECT
contract_type,
city_id,
clue_id
FROM
(
SELECT
contract_type,
city_id,
clue_id
FROM bi_contracts_all
)
ALL INNER JOIN
(
SELECT city_id
FROM bi_c2c_car_source_all
) USING (city_id)
就会报错:Memory limit。修改max_memory_usage的值(user.xml) 第一次报错:
Progress: 4.84 million rows, 42.70 MB (45.34 million rows/s., 399.59 MB/s.) 87%
Received exception from server:
Code: 241. DB::Exception: Received from localhost:9000, ::1. DB::Exception: Memory limit (for query) exceeded: would use 12.15 GiB (attempt to allocate chunk of 4294967296 bytes), maximum: 9.31 GiB.
0 rows in set. Elapsed: 48.023 sec. Processed 4.84 million rows, 42.70 MB (100.89 thousand rows/s., 889.21 KB/s.)
内存由9g改为80g,继续报错:
Progress: 4.89 million rows, 43.81 MB (23.23 thousand rows/s., 208.09 KB/s.) 88%
Received exception from server:
Code: 241. DB::Exception: Received from localhost:9000, ::1. DB::Exception: Memory limit (for query) exceeded: would use 80.15 GiB (attempt to allocate chunk of 17179869184 bytes), maximum: 74.51 GiB.
3668208133 rows in set. Elapsed: 235.536 sec. Processed 4.89 million rows, 43.81 MB (20.77 thousand rows/s., 186.00 KB/s.)
这个问题我实在不知道该如何解释。诚然,一个query是不可以用那么大的,工具不能这么用。但是如果我是10个用户同时查8G数据,Clickhouse会不会Kill掉一个查询,这不就蛋疼了嘛。
不知道姿势是否正确,clickhouse集群只能暴露一个host出去给程序\客户端用吗?特别想知道,该如何去构建这个SQL Query的转发层。