首页 > 系统相关 >ClickHouse中“大列”造成的JOIN的内存超限问题

ClickHouse中“大列”造成的JOIN的内存超限问题

时间:2024-01-21 19:12:52浏览次数:33  
标签:大列 ch JOIN pro DB ubuntu home depot ClickHouse

ClickHouse中“大列”造成的JOIN的内存超限问题

“大列”是指单行数据量非常大的列,通常是100KiB以上。这样的列会导致JOIN(通常LEFT JOIN 和 INNER JOIN)出现内存超限的异常。

常用的JOIN算法

这里讨论的是常用的JOIN算法:partial merge join 与 hash join。Direct join算法不在本文讨论范围。

内存超限的错误一般长这样:

2024.01.16 17:04:30.084819 [ 3654586 ] {52653888-f266-443a-beba-348c66a88e60} <Error> TCPHandler: Code: 241. DB::Exception: Memory limit (for query) exceeded: would use 33.61 GiB (attempt to allocate chunk of 34359738368 bytes), maximum: 27.94 GiB.: While executing JoiningTransform. (MEMORY_LIMIT_EXCEEDED), Stack trace (when copying this message, always include the lines below):

测试数据

为了分析“大列”的JOIN运算出现内存超限问题,这里准备了特定的测试数据。

测试数据为两张表:

table1 - 相当于N表,10W行,两列:id,fk(外键)

table2 - 相当于1表,1W行,两列:id,value(value为超长字符串列,每个cell大小500K左右)

创建测试数据的查询如下:

drop database if exists `big_columns_join`;
create database `big_columns_join`;
use big_columns_join;

create table `big_columns_join`.`big_columns_table` (
    `id` UInt64,
    `fk` UInt64
) ENGINE = MergeTree() order by id
as (select 
        `number`,
        rand(4) % 10000
    from numbers(100000)
    settings max_block_size = 100
);

create table `big_columns_join`.`dim_table1` (
    `id` UInt64,
    `value` String
) ENGINE = MergeTree() order by id
as (select
        `number`,
        repeat(toString(rand(1)), 100000)
    from numbers(10000));

测试查询

测试查询是简单的LEFT JOIN。

select
    *
from
    `big_columns_join`.`big_columns_table`
    left join `big_columns_join`.`dim_table1` on `big_columns_join`.`big_columns_table`.`fk` = `big_columns_join`.`dim_table1`.`id`
format Null;

Hash join算法

运行测试查询之前,设定max_joined_block_size_rows阈值。

set join_algorithm = 'hash';
set max_joined_block_size_rows=10000;  -- 阈值设小,需要内存会变少。
-- 运行测试查询

测试结果是:

JOIN算法为hash join时,无论如何调整max_joined_block_size_rows的阈值大小,均会内存超限。

报错信息如下:

0. Poco::Exception::Exception(String const&, int) @ 0x000000001504f759 in /home/ubuntu/depot/ch-pro/build-release/programs/clickhouse
1. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x000000000b243077 in /home/ubuntu/depot/ch-pro/build-release/programs/clickhouse
2. DB::Exception::Exception<char const*, char const*, String, long&, String, char const*, std::basic_string_view<char, std::char_traits<char>>>(int, FormatStringHelperImpl<std::type_identity<char const*>::type, std::type_identity<char const*>::type, std::type_identity<String>::type, std::type_identity<long&>::type, std::type_identity<String>::type, std::type_identity<char const*>::type, std::type_identity<std::basic_string_view<char, std::char_traits<char>>>::type>, char const*&&, char const*&&, String&&, long&, String&&, char const*&&, std::basic_string_view<char, std::char_traits<char>>&&) @ 0x000000000b255d97 in /home/ubuntu/depot/ch-pro/build-release/programs/clickhouse
3. MemoryTracker::allocImpl(long, bool, MemoryTracker*, double) @ 0x000000000b2551b9 in /home/ubuntu/depot/ch-pro/build-release/programs/clickhouse
4. MemoryTracker::allocImpl(long, bool, MemoryTracker*, double) @ 0x000000000b254d6a in /home/ubuntu/depot/ch-pro/build-release/programs/clickhouse
5. CurrentMemoryTracker::alloc(long) @ 0x000000000b21b30f in /home/ubuntu/depot/ch-pro/build-release/programs/clickhouse
6. Allocator<false, false>::realloc(void*, unsigned long, unsigned long, unsigned long) @ 0x000000000b218368 in /home/ubuntu/depot/ch-pro/build-release/programs/clickhouse
7. void DB::PODArrayBase<1ul, 4096ul, Allocator<false, false>, 63ul, 64ul>::resize<>(unsigned long) @ 0x0000000006343dc6 in /home/ubuntu/depot/ch-pro/build-release/programs/clickhouse
8. DB::ColumnString::insertFrom(DB::IColumn const&, unsigned long) @ 0x0000000011c0ae2c in /home/ubuntu/depot/ch-pro/build-release/programs/clickhouse
9. void DB::(anonymous namespace)::AddedColumns::appendFromBlock<false>(DB::Block const&, unsigned long) @ 0x00000000113d24ca in /home/ubuntu/depot/ch-pro/build-release/programs/clickhouse
10. DB::PODArray<char8_t, 4096ul, Allocator<false, false>, 63ul, 64ul> DB::(anonymous namespace)::joinRightColumns<(DB::JoinKind)1, (DB::JoinStrictness)3, DB::ColumnsHashing::HashMethodOneNumber<PairNoInit<unsigned long, DB::RowRefList>, DB::RowRefList const, unsigned long, false, true, false>, HashMapTable<unsigned long, HashMapCell<unsigned long, DB::RowRefList, DefaultHash<unsigned long>, HashTableNoState, PairNoInit<unsigned long, DB::RowRefList>>, DefaultHash<unsigned long>, HashTableGrowerWithPrecalculation<8ul>, Allocator<true, true>>, true, false>(std::vector<DB::ColumnsHashing::HashMethodOneNumber<PairNoInit<unsigned long, DB::RowRefList>, DB::RowRefList const, unsigned long, false, true, false>, std::allocator<DB::ColumnsHashing::HashMethodOneNumber<PairNoInit<unsigned long, DB::RowRefList>, DB::RowRefList const, unsigned long, false, true, false>>>&&, std::vector<HashMapTable<unsigned long, HashMapCell<unsigned long, DB::RowRefList, DefaultHash<unsigned long>, HashTableNoState, PairNoInit<unsigned long, DB::RowRefList>>, DefaultHash<unsigned long>, HashTableGrowerWithPrecalculation<8ul>, Allocator<true, true>> const*, std::allocator<HashMapTable<unsigned long, HashMapCell<unsigned long, DB::RowRefList, DefaultHash<unsigned long>, HashTableNoState, PairNoInit<unsigned long, DB::RowRefList>>, DefaultHash<unsigned long>, HashTableGrowerWithPrecalculation<8ul>, Allocator<true, true>> const*>> const&, DB::(anonymous namespace)::AddedColumns&, DB::JoinStuff::JoinUsedFlags&) @ 0x00000000113d602a in /home/ubuntu/depot/ch-pro/build-release/programs/clickhouse
11. void DB::HashJoin::joinBlockImpl<(DB::JoinKind)1, (DB::JoinStrictness)3, DB::HashJoin::MapsTemplate<DB::RowRefList>>(DB::Block&, DB::Block const&, std::vector<DB::HashJoin::MapsTemplate<DB::RowRefList> const*, std::allocator<DB::HashJoin::MapsTemplate<DB::RowRefList> const*>> const&, bool) const @ 0x0000000011493867 in /home/ubuntu/depot/ch-pro/build-release/programs/clickhouse
12. DB::HashJoin::joinBlock(DB::Block&, std::shared_ptr<DB::ExtraBlock>&) @ 0x00000000113a40e9 in /home/ubuntu/depot/ch-pro/build-release/programs/clickhouse
13. DB::JoiningTransform::readExecute(DB::Chunk&) @ 0x0000000012a32b90 in /home/ubuntu/depot/ch-pro/build-release/programs/clickhouse
14. DB::JoiningTransform::transform(DB::Chunk&) @ 0x0000000012a31905 in /home/ubuntu/depot/ch-pro/build-release/programs/clickhouse
15. DB::JoiningTransform::work() @ 0x0000000012a313aa in /home/ubuntu/depot/ch-pro/build-release/programs/clickhouse
16. DB::ExecutionThreadContext::executeTask() @ 0x0000000012810d13 in /home/ubuntu/depot/ch-pro/build-release/programs/clickhouse
17. DB::PipelineExecutor::executeStepImpl(unsigned long, std::atomic<bool>*) @ 0x0000000012808ad0 in /home/ubuntu/depot/ch-pro/build-release/programs/clickhouse
18. void std::__function::__policy_invoker<void ()>::__call_impl<std::__function::__default_alloc_func<DB::PipelineExecutor::spawnThreads()::$_0, void ()>>(std::__function::__policy_storage const*) @ 0x000000001280965b in /home/ubuntu/depot/ch-pro/build-release/programs/clickhouse
19. ThreadPoolImpl<ThreadFromGlobalPoolImpl<false>>::worker(std::__list_iterator<ThreadFromGlobalPoolImpl<false>, void*>) @ 0x000000000b317483 in /home/ubuntu/depot/ch-pro/build-release/programs/clickhouse
20. ThreadFromGlobalPoolImpl<false>::ThreadFromGlobalPoolImpl<void ThreadPoolImpl<ThreadFromGlobalPoolImpl<false>>::scheduleImpl<void>(std::function<void ()>, Priority, std::optional<unsigned long>, bool)::'lambda0'()>(void&&)::'lambda'()::operator()() @ 0x000000000b31a5bf in /home/ubuntu/depot/ch-pro/build-release/programs/clickhouse
21. ThreadPoolImpl<std::thread>::worker(std::__list_iterator<std::thread, void*>) @ 0x000000000b314b07 in /home/ubuntu/depot/ch-pro/build-release/programs/clickhouse
22. void* std::__thread_proxy[abi:v15000]<std::tuple<std::unique_ptr<std::__thread_struct, std::default_delete<std::__thread_struct>>, void ThreadPoolImpl<std::thread>::scheduleImpl<void>(std::function<void ()>, Priority, std::optional<unsigned long>, bool)::'lambda0'()>>(void*) @ 0x000000000b318a2e in /home/ubuntu/depot/ch-pro/build-release/programs/clickhouse
23. ? @ 0x00007f6db212b609 in ?
24. ? @ 0x00007f6db2050133 in ?

模板方法joinRightColumns根据join key查找对应的右边表的行,找到行对应的row_number,然后通过调用appendFromBlock(block, row_numer)完成一行的JOIN结果。

appendFromBlock每次调用IColumn::insertFrom插入一行。这里有个问题:ColumnString的内容增长是一行一行地增长,在这个过程中涉及到多个reallocation内存重分配,并造成大量的内存之间的数据复制。

parital merge join算法

运行测试查询之前,设定max_joined_block_size_rows阈值。

set join_algorithm = 'parital_merge';
set max_joined_block_size_rows=10000;  -- 阈值设小,需要内存会变少。
-- 第一次运行测试查询

set max_joined_block_size_rows=1000000;  -- 阈值设大,需要内存可能会更多。
-- 第二次运行测试查询

测试结果是:

JOIN算法为parital merge join时,调小max_joined_block_size_rows的阈值,内存就不会超限。

当max_joined_block_size_rows比较大时(上例中第二次运行测试查询),会发生内存超限问题。

出错信息:

2024.01.16 17:04:30.084819 [ 3654586 ] {52653888-f266-443a-beba-348c66a88e60} <Error> TCPHandler: Code: 241. DB::Exception: Memory limit (for query) exceeded: would use 33.61 GiB (attempt to allocate chunk of 34359738368 bytes), maximum: 27.94 GiB.: While executing JoiningTransform. (MEMORY_LIMIT_EXCEEDED), Stack trace (when copying this message, always include the lines below):

0. /home/ubuntu/depot/ch-pro/contrib/llvm-project/libcxx/include/exception:134: std::exception::capture() @ 0x000000000b609a42 in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
1. /home/ubuntu/depot/ch-pro/contrib/llvm-project/libcxx/include/exception:112: std::exception::exception[abi:v15000]() @ 0x000000000b609a0d in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
2. /home/ubuntu/depot/ch-pro/base/poco/Foundation/src/Exception.cpp:27: Poco::Exception::Exception(String const&, int) @ 0x000000002562ed40 in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
3. /home/ubuntu/depot/ch-pro/src/Common/Exception.cpp:96: DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x00000000144166ee in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
4. /home/ubuntu/depot/ch-pro/src/Common/Exception.h:73: DB::Exception::Exception(String&&, int, bool) @ 0x000000000b5fba8a in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
5. /home/ubuntu/depot/ch-pro/src/Common/Exception.h:101: DB::Exception::Exception<char const*, char const*, String, long&, String, char const*, std::basic_string_view<char, std::char_traits<char>>>(int, FormatStringHelperImpl<std::type_identity<char const*>::type, std::type_identity<char const*>::type, std::type_identity<String>::type, std::type_identity<long&>::type, std::type_identity<String>::type, std::type_identity<char const*>::type, std::type_identity<std::basic_string_view<char, std::char_traits<char>>>::type>, char const*&&, char const*&&, String&&, long&, String&&, char const*&&, std::basic_string_view<char, std::char_traits<char>>&&) @ 0x0000000014435b3d in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
6. /home/ubuntu/depot/ch-pro/src/Common/MemoryTracker.cpp:339: MemoryTracker::allocImpl(long, bool, MemoryTracker*, double) @ 0x00000000144340b0 in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
7. /home/ubuntu/depot/ch-pro/src/Common/MemoryTracker.cpp:397: MemoryTracker::allocImpl(long, bool, MemoryTracker*, double) @ 0x0000000014434402 in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
8. /home/ubuntu/depot/ch-pro/src/Common/CurrentMemoryTracker.cpp:58: CurrentMemoryTracker::allocImpl(long, bool) @ 0x00000000143bb7db in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
9. /home/ubuntu/depot/ch-pro/src/Common/CurrentMemoryTracker.cpp:90: CurrentMemoryTracker::alloc(long) @ 0x00000000143bb9e1 in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
10. /home/ubuntu/depot/ch-pro/src/Common/Allocator.h:161: Allocator<false, false>::realloc(void*, unsigned long, unsigned long, unsigned long) @ 0x00000000143b5eba in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
11. /home/ubuntu/depot/ch-pro/src/Common/PODArray.h:165: void DB::PODArrayBase<1ul, 4096ul, Allocator<false, false>, 63ul, 64ul>::realloc<>(unsigned long) @ 0x000000000b667142 in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
12. /home/ubuntu/depot/ch-pro/src/Common/PODArray.h:0: void DB::PODArrayBase<1ul, 4096ul, Allocator<false, false>, 63ul, 64ul>::resize<>(unsigned long) @ 0x000000000b666ee6 in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
13. /home/ubuntu/depot/ch-pro/src/Columns/ColumnString.h:147: DB::ColumnString::insertFrom(DB::IColumn const&, unsigned long) @ 0x000000001f0eaaff in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
14. /home/ubuntu/depot/ch-pro/src/Columns/IColumn.h:179: DB::IColumn::insertManyFrom(DB::IColumn const&, unsigned long, unsigned long) @ 0x000000000b60192b in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
15. /home/ubuntu/depot/ch-pro/src/Interpreters/MergeJoin.cpp:404: DB::(anonymous namespace)::copyRightRange(DB::Block const&, DB::Block const&, std::vector<COW<DB::IColumn>::mutable_ptr<DB::IColumn>, std::allocator<COW<DB::IColumn>::mutable_ptr<DB::IColumn>>>&, unsigned long, unsigned long) @ 0x000000001e999757 in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
16. /home/ubuntu/depot/ch-pro/src/Interpreters/MergeJoin.cpp:432: bool DB::(anonymous namespace)::joinEquals<true>(DB::Block const&, DB::Block const&, DB::Block const&, std::vector<COW<DB::IColumn>::mutable_ptr<DB::IColumn>, std::allocator<COW<DB::IColumn>::mutable_ptr<DB::IColumn>>>&, std::vector<COW<DB::IColumn>::mutable_ptr<DB::IColumn>, std::allocator<COW<DB::IColumn>::mutable_ptr<DB::IColumn>>>&, DB::MergeJoinEqualRange&, unsigned long) @ 0x000000001e997d86 in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
17. /home/ubuntu/depot/ch-pro/src/Interpreters/MergeJoin.cpp:892: bool DB::MergeJoin::leftJoin<true>(DB::MergeJoinCursor&, DB::Block const&, DB::MergeJoin::RightBlockInfo&, std::vector<COW<DB::IColumn>::mutable_ptr<DB::IColumn>, std::allocator<COW<DB::IColumn>::mutable_ptr<DB::IColumn>>>&, std::vector<COW<DB::IColumn>::mutable_ptr<DB::IColumn>, std::allocator<COW<DB::IColumn>::mutable_ptr<DB::IColumn>>>&, unsigned long&) @ 0x000000001e9b2a7d in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
18. /home/ubuntu/depot/ch-pro/src/Interpreters/MergeJoin.cpp:793: void DB::MergeJoin::joinSortedBlock<false, true>(DB::Block&, std::shared_ptr<DB::ExtraBlock>&) @ 0x000000001e99b738 in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
19. /home/ubuntu/depot/ch-pro/src/Interpreters/MergeJoin.cpp:733: DB::MergeJoin::joinBlock(DB::Block&, std::shared_ptr<DB::ExtraBlock>&) @ 0x000000001e997512 in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
20. /home/ubuntu/depot/ch-pro/src/Processors/Transforms/JoiningTransform.cpp:206: DB::JoiningTransform::readExecute(DB::Chunk&) @ 0x00000000208ca1e3 in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
21. /home/ubuntu/depot/ch-pro/src/Processors/Transforms/JoiningTransform.cpp:191: DB::JoiningTransform::transform(DB::Chunk&) @ 0x00000000208c9f4e in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
22. /home/ubuntu/depot/ch-pro/src/Processors/Transforms/JoiningTransform.cpp:125: DB::JoiningTransform::work() @ 0x00000000208c985d in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
23. /home/ubuntu/depot/ch-pro/src/Processors/Executors/ExecutionThreadContext.cpp:47: DB::executeJob(DB::ExecutingGraph::Node*, DB::ReadProgressCallback*) @ 0x00000000203e5263 in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
24. /home/ubuntu/depot/ch-pro/src/Processors/Executors/ExecutionThreadContext.cpp:95: DB::ExecutionThreadContext::executeTask() @ 0x00000000203e4fa0 in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
25. /home/ubuntu/depot/ch-pro/src/Processors/Executors/PipelineExecutor.cpp:251: DB::PipelineExecutor::executeStepImpl(unsigned long, std::atomic<bool>*) @ 0x00000000203ca041 in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
26. /home/ubuntu/depot/ch-pro/src/Processors/Executors/PipelineExecutor.cpp:217: DB::PipelineExecutor::executeSingleThread(unsigned long) @ 0x00000000203ca357 in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
27. /home/ubuntu/depot/ch-pro/src/Processors/Executors/PipelineExecutor.cpp:341: DB::PipelineExecutor::spawnThreads()::$_0::operator()() const @ 0x00000000203cb098 in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
28. /home/ubuntu/depot/ch-pro/contrib/llvm-project/libcxx/include/__functional/invoke.h:394: decltype(std::declval<DB::PipelineExecutor::spawnThreads()::$_0&>()()) std::__invoke[abi:v15000]<DB::PipelineExecutor::spawnThreads()::$_0&>(DB::PipelineExecutor::spawnThreads()::$_0&) @ 0x00000000203caff5 in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
29. /home/ubuntu/depot/ch-pro/contrib/llvm-project/libcxx/include/__functional/invoke.h:480: void std::__invoke_void_return_wrapper<void, true>::__call<DB::PipelineExecutor::spawnThreads()::$_0&>(DB::PipelineExecutor::spawnThreads()::$_0&) @ 0x00000000203cafd5 in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
30. /home/ubuntu/depot/ch-pro/contrib/llvm-project/libcxx/include/__functional/function.h:235: std::__function::__default_alloc_func<DB::PipelineExecutor::spawnThreads()::$_0, void ()>::operator()[abi:v15000]() @ 0x00000000203cafb5 in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse
31. /home/ubuntu/depot/ch-pro/contrib/llvm-project/libcxx/include/__functional/function.h:716: void std::__function::__policy_invoker<void ()>::__call_impl<std::__function::__default_alloc_func<DB::PipelineExecutor::spawnThreads()::$_0, void ()>>(std::__function::__policy_storage const*) @ 0x00000000203caf80 in /home/ubuntu/depot/ch-pro/build-debug/programs/clickhouse

MergeJoin::leftJoin负责partial merge join的LEFT JOIN运算的实现,最终会调用::copyRightRange方法,复制右边表的内容。在此期间IColumn::insertFrom会被调用,这会涉及到调用realloc重分配内存。在这个点上可能会发生内存超限的异常。

LEFT ANY JOIN

LEFT ANY JOIN走不同代码分支,该代码分支忽略了setting配置max_joined_block_size_rows,因此即使调小了max_joined_block_size_rows,LEFT ANY JOIN依然会产生内存超限异常。

“auto”的内存问题

根据文档,设置join_algorithm='auto' 以启用自动切换JOIN算法的模式,ClickHouse先尝试用hash join算法,然后在内存超过阈值的情况下切换到partial merge join算法。阈值有两个:max_bytes_in_join和max_rows_in_join,指定hash join算法创建的hash table的最大的占用内存字节数和行数。两个阈值只要有一个超过就算是超过阈值。

但是在实践中发现,原先的尝试的hash join所消耗的内存仍然没有释放,导致在切换到partial merge join后在原本不应该超限的情况下仍然产生内存超限的异常。参考以下示例:

SELECT
    t1.id AS id,
    t2.value AS value
FROM big_columns_join.big_columns_table AS t1
LEFT JOIN big_columns_join.dim_table1 AS t2 ON t1.fk = t2.id
SETTINGS join_algorithm = 'auto', max_bytes_in_join = '15G'
FORMAT `Null`;

运行日志如下:

2024.01.19 11:31:40.580304 [ 3654586 ] {506df1da-8f03-480a-b263-a83a5b8277f0} <Debug> JoiningTransform: Before join block: 'id UInt64 UInt64(size = 0), fk UInt64 UInt64(size = 0)'
2024.01.19 11:31:40.580692 [ 3654586 ] {506df1da-8f03-480a-b263-a83a5b8277f0} <Debug> JoiningTransform: After join block: 'id UInt64 UInt64(size = 0), fk UInt64 UInt64(size = 0), value String String(size = 0)'
2024.01.19 11:31:40.680382 [ 3654935 ] {506df1da-8f03-480a-b263-a83a5b8277f0} <Debug> MemoryTracker: Current memory usage (for query): 1.01 GiB.
2024.01.19 11:31:40.753297 [ 3654852 ] {506df1da-8f03-480a-b263-a83a5b8277f0} <Debug> MemoryTracker: Current memory usage (for query): 2.01 GiB.
2024.01.19 11:31:40.833103 [ 3654862 ] {506df1da-8f03-480a-b263-a83a5b8277f0} <Debug> MemoryTracker: Current memory usage (for query): 3.01 GiB.
2024.01.19 11:31:40.905190 [ 3654854 ] {506df1da-8f03-480a-b263-a83a5b8277f0} <Debug> MemoryTracker: Current memory usage (for query): 4.01 GiB.
2024.01.19 11:31:41.047932 [ 3654900 ] {506df1da-8f03-480a-b263-a83a5b8277f0} <Debug> MemoryTracker: Current memory usage (for query): 6.01 GiB.
2024.01.19 11:31:41.117219 [ 3654941 ] {506df1da-8f03-480a-b263-a83a5b8277f0} <Debug> MemoryTracker: Current memory usage (for query): 7.00 GiB.
2024.01.19 11:31:41.198586 [ 3654851 ] {506df1da-8f03-480a-b263-a83a5b8277f0} <Debug> MemoryTracker: Current memory usage (for query): 8.01 GiB.
2024.01.19 11:31:41.272147 [ 3654941 ] {506df1da-8f03-480a-b263-a83a5b8277f0} <Debug> MemoryTracker: Current memory usage (for query): 9.00 GiB.
2024.01.19 11:31:41.342679 [ 3654935 ] {506df1da-8f03-480a-b263-a83a5b8277f0} <Debug> MemoryTracker: Current memory usage (for query): 10.00 GiB.
2024.01.19 11:31:41.416512 [ 3654851 ] {506df1da-8f03-480a-b263-a83a5b8277f0} <Debug> MemoryTracker: Current memory usage (for query): 11.01 GiB.
2024.01.19 11:31:41.486753 [ 3654941 ] {506df1da-8f03-480a-b263-a83a5b8277f0} <Debug> MemoryTracker: Current memory usage (for query): 12.01 GiB.
2024.01.19 11:31:41.560137 [ 3654941 ] {506df1da-8f03-480a-b263-a83a5b8277f0} <Debug> MemoryTracker: Current memory usage (for query): 13.00 GiB.
2024.01.19 11:31:41.630755 [ 3654862 ] {506df1da-8f03-480a-b263-a83a5b8277f0} <Debug> MemoryTracker: Current memory usage (for query): 14.01 GiB.
2024.01.19 11:31:41.633781 [ 3654935 ] {506df1da-8f03-480a-b263-a83a5b8277f0} <Trace> HashJoin: (0x7f8299383918) Join data is being released, 15001390080 bytes and 8940 rows in hash table
2024.01.19 11:31:41.666781 [ 3654935 ] {506df1da-8f03-480a-b263-a83a5b8277f0} <Debug> MergeJoin: Joining keys: left [fk], right [t2.id]
2024.01.19 11:31:41.666943 [ 3654935 ] {506df1da-8f03-480a-b263-a83a5b8277f0} <Trace> HashJoin: (0x7f8299383918) Join data has been already released
2024.01.19 11:31:41.764950 [ 3654852 ] {506df1da-8f03-480a-b263-a83a5b8277f0} <Debug> MemoryTracker: Current memory usage (for query): 15.00 GiB.
2024.01.19 11:31:42.024590 [ 3654941 ] {506df1da-8f03-480a-b263-a83a5b8277f0} <Debug> MemoryTracker: Current memory usage (for query): 16.14 GiB.
2024.01.19 11:31:42.221853 [ 3654941 ] {506df1da-8f03-480a-b263-a83a5b8277f0} <Debug> MemoryTracker: Current memory usage (for query): 17.64 GiB.
2024.01.19 11:31:42.481914 [ 3654941 ] {506df1da-8f03-480a-b263-a83a5b8277f0} <Debug> MemoryTracker: Current memory usage (for query): 19.64 GiB.
2024.01.19 11:31:43.001865 [ 3654941 ] {506df1da-8f03-480a-b263-a83a5b8277f0} <Debug> MemoryTracker: Current memory usage (for query): 23.64 GiB.
2024.01.19 11:31:44.610495 [ 3654586 ] {506df1da-8f03-480a-b263-a83a5b8277f0} <Error> executeQuery: Code: 241. DB::Exception: Memory limit (for query) exceeded: would use 31.64 GiB (attempt to allocate chunk of 17179869184 bytes), maximum: 27.94 GiB.: While executing MergeSortingTransform: While executing FillingRightJoinSide. (MEMORY_LIMIT_EXCEEDED) (version 23.8.3.1.alex-build-debug) (from [::ffff:127.0.0.1]:33076) (in query: select t1.id as id, t2.value as `value` from `big_columns_join`.`big_columns_table` as t1 left join `big_columns_join`.`dim_table1` as t2 on t1.fk = t2.id settings join_algorithm='auto',max_bytes_in_join=15000000000 format Null;), Stack trace (when copying this message, always include the lines below):

从日志可以看到,首先采用了hash join,然后内存使用量一路飙升到15GiB的时候切换到了partial merge join(日志第17行),原先hash join所使用的15GiB内存将被释放(日志第16行)。最后hash join的内存被释放(日志第18行),但是实际内存使用量并没有减少,仍然从15GiB开始继续增长。留给partial merge join的可用内存不多,因此很快产生内存超限MEMORY_LIMIT_EXCEEDED异常。

目前找到的解决办法只能是调小阈值,但是这样导致原本可以使用hash join的情况下却被迫切换到相对慢速的partial merge join。影响整体性能。

标签:大列,ch,JOIN,pro,DB,ubuntu,home,depot,ClickHouse
From: https://www.cnblogs.com/chengxin1985/p/17978176

相关文章

  • ClickHouse入门
    简介:ClickHouse是俄罗斯Yandex公司使用C++语言编写开源的列式存储数据库,主要用于在线分析查询(OLAP),使用SQL查询实时生成分析数据报告。一、安装 二、入门1. 特点A.列式存储:较MySQL行式存储有利于列的聚合、计数等操作,针对每列可选择最优的数据压缩算法来压缩......
  • 论文翻译 | 【深入挖掘Java技术】「底层原理专题」深入分析一下并发编程之父Doug Lea
    前提介绍DougLea在州立大学奥斯威戈分校(DougLea)摘要本文深入探讨了一个Java框架的设计、实现及其性能。该框架遵循并行编程的理念,通过递归方式将问题分解为多个子任务,并利用工作窃取技术进行并行处理。所有子任务完成后,其结果被整合以形成完整的并行程序。在总体设计上,该框架借鉴......
  • 无涯教程-SQL - CARTESIAN JOIN函数
    CARTESIANJOIN或CROSSJOIN从两个或多个联接表中返回记录集的笛卡尔积。CARTESIANJOIN-语法CARTESIANJOIN或CROSSJOIN的基本语法如下-SELECTtable1.column1,table2.column2...FROMtable1,table2[,table3]CARTESIANJOIN-示例请考虑以下两个表。表1-C......
  • 无涯教程-SQL - FULL JOIN函数
    SQLFULLJOIN组合了左右外部联接的输出,联接的表将包含两个表中的所有记录,并为任一侧缺少的匹配项填充NULL。FULLJOIN-语法FULLJOIN的基本语法如下-SELECTtable1.column1,table2.column2...FROMtable1FULLJOINtable2ONtable1.common_field=table2.common_fie......
  • 无涯教程-SQL - RIGHT JOIN函数
    SQLRIGHTJOIN返回右表中的所有行,即使左表中没有匹配项也是如此,这意味着右连接返回右表中的所有值,再加上左表中的匹配值,如果没有匹配则返回NULL。RIGHTJOIN-语法RIGHTJOIN的基本语法如下。SELECTtable1.column1,table2.column2...FROMtable1RIGHTJOINtable2O......
  • SQL Join的一些总结
    SQLJoin的一些总结 1.1.1摘要Join是关系型数据库系统的重要操作之一,SQLServer中包含的常用Join:内联接、外联接和交叉联接等。如果我们想在两个或以上的表获取其中从一个表中的行与另一个表中的行匹配的数据,这时我们应该考虑使用Join,因为Join具体联接表或函数进行查询的特......
  • 【flink番外篇】9、Flink Table API 支持的操作示例(14)- 时态表的join(java版本)
    文章目录Flink系列文章一、maven依赖二、时态表的join1、统计需求对应的SQL2、Withoutconnnector实现代码3、Withconnnector实现代码1)、bean定义2)、序列化定义3)、实现本文通过两个示例介绍了时态表TemporalTableFunction的join操作。本文除了maven依赖外,没有其他依赖。一、m......
  • 【flink番外篇】9、Flink Table API 支持的操作示例(8)- 时态表的join(scala版本)
    文章目录Flink系列文章一、maven依赖二、示例:时态表的join(scala版本)1)、统计需求对应的SQL2)、Withoutconnnector实现代码3)、WithCSVConnector实现代码本文给以scala的语言给出来TableAPI针对时态表的join操作。本文除了maven依赖外,没有其他依赖。本文需要有kafka的运行环境......
  • 【flink番外篇】9、Flink Table API 支持的操作示例(7)- 表的join操作(内联接、外联接以
    文章目录Flink系列文章一、maven依赖二、示例:表的join操作(内联接、外联接以及联接自定义函数等)本文介绍了表的join主要操作,比如内联接、外联接以及联接自定义函数等。本文除了maven依赖外,没有其他依赖。一、maven依赖本文maven依赖参考文章:【flink番外篇】9、FlinkTableAPI支......
  • thinkphp6的join连表查询
    说明join要关联的(完整)表名以及别名,支持三种写法:写法1:['完整表名或者子查询'=>'别名']写法2:'完整表名别名'写法3:'不带数据表前缀的表名'condition关联条件,可以为字符串或数组,为数组时每一个元素都是一个关联条件。type关联类型,可以为:INNER、LEFT、RIGHT、FULL,......