[Bug] tpcds sf1000 core dump #8815

Open
2 of 3 tasks
dataroaring opened this issue Apr 2, 2022 · 4 comments

Comments

@dataroaring
Contributor

dataroaring commented Apr 2, 2022

Search before asking

  • I had searched in the issues and found no similar issues.

Version

dev-1.0.1

What's Wrong?

#0 tcmalloc::SLL_Next (t=0x61000000163) at ./src/linked_list.h:45
#1 tcmalloc::SLL_TryPop (list=0x9b9d440, rv=) at ./src/linked_list.h:69
#2 tcmalloc::ThreadCache::FreeList::TryPop (this=0x9b9d440, rv=) at ./src/thread_cache.h:220
#3 tcmalloc::ThreadCache::Allocate (this=0x9b9d3c0, size=48, cl=, oom_handler=) at ./src/thread_cache.h:379
#4 malloc_fast_path<&tcmalloc::cpp_throw_oom> (size=) at src/tcmalloc.cc:1898
#5 tc_new (size=) at src/tcmalloc.cc:2019
#6 0x0000000001dccf81 in std::__cxx11::basic_string<char, std::char_traits, std::allocator >::_M_construct<char const*> (
this=0x7fddcbd738c0, __beg=0x2a3cc080 "doris::vectorized::IColumn::clone_empty() const", __end=)
at /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/basic_string.tcc:219
#7 std::__cxx11::basic_string<char, std::char_traits, std::allocator >::_M_construct_aux<char const*> (this=0x7fddcbd738c0,
__beg=0x2a3cc080 "doris::vectorized::IColumn::clone_empty() const", __end=)
at /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/basic_string.h:247
#8 std::__cxx11::basic_string<char, std::char_traits, std::allocator >::_M_construct<char const*> (this=0x7fddcbd738c0,
__beg=0x2a3cc080 "doris::vectorized::IColumn::clone_empty() const", __end=)
at /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/basic_string.h:266
#9 std::__cxx11::basic_string<char, std::char_traits, std::allocator >::basic_string<std::allocator > (this=0x7fddcbd738c0,
__s=0x2a3cc080 "doris::vectorized::IColumn::clone_empty() const", __a=...)
at /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/basic_string.h:527
#10 boost::core::demangle[abi:cxx11](char const*) (name=)
at /root/regression/incubator-doris/thirdparty/installed/include/boost/core/demangle.hpp:99
#11 0x0000000001dccb06 in boost::stacktrace::detail::to_string_impl_baseboost::stacktrace::detail::to_string_using_backtrace::operator()[abi:cxx11](void const*) (this=this@entry=0x7fddcbd73950, addr=0x1f76451 <doris::vectorized::IColumn::clone_empty() const+17>)
at /root/regression/incubator-doris/thirdparty/installed/include/boost/stacktrace/detail/frame_unwind.ipp:41
#12 0x0000000001dcc948 in boost::stacktrace::detail::to_string[abi:cxx11](boost::stacktrace::frame const*, unsigned long) (frames=0x1b031b00, size=18)
at /root/regression/incubator-doris/thirdparty/installed/include/boost/stacktrace/detail/frame_unwind.ipp:76
#13 0x0000000001dc95a6 in boost::stacktrace::to_string<std::allocatorboost::stacktrace::frame > (bt=...)
at /root/regression/incubator-doris/thirdparty/installed/include/boost/stacktrace/stacktrace.hpp:402
#14 boost::stacktrace::operator<< <char, std::char_traits, std::allocatorboost::stacktrace::frame > (bt=..., os=...)
at /root/regression/incubator-doris/thirdparty/installed/include/boost/stacktrace/stacktrace.hpp:408
#15 doris::signal::(anonymous namespace)::FailureSignalHandler (signal_number=, signal_info=, ucontext=)
at /root/regression/incubator-doris/be/src/common/signal_handler.h:420
#16 <signal handler called>
#17 tcmalloc::SLL_Next (t=0x61000000163) at ./src/linked_list.h:45
#18 tcmalloc::SLL_TryPop (list=0x9b9d440, rv=) at ./src/linked_list.h:69
#19 tcmalloc::ThreadCache::FreeList::TryPop (this=0x9b9d440, rv=) at ./src/thread_cache.h:220
#20 tcmalloc::ThreadCache::Allocate (this=0x9b9d3c0, size=48, cl=, oom_handler=) at ./src/thread_cache.h:379
#21 malloc_fast_path<&tcmalloc::cpp_throw_oom> (size=) at src/tcmalloc.cc:1898
#22 tc_new (size=, size@entry=40) at src/tcmalloc.cc:2019
#23 0x0000000002537e42 in COWHelper<doris::vectorized::ColumnVectorHelper, doris::vectorized::ColumnVector >::create<>() ()
at /root/regression/incubator-doris/be/src/vec/common/cow.h:413

What You Expected?

The query should complete without a core dump.

How to Reproduce?

WITH
  ss AS (
    SELECT s_store_sk, sum(ss_ext_sales_price) sales, sum(ss_net_profit) profit
    FROM store_sales, date_dim, store
    WHERE (ss_sold_date_sk = d_date_sk)
      AND (d_date BETWEEN CAST('2000-08-23' AS DATE) AND (CAST('2000-08-23' AS DATE) + INTERVAL '30' DAY))
      AND (ss_store_sk = s_store_sk)
    GROUP BY s_store_sk
  ),
  sr AS (
    SELECT s_store_sk, sum(sr_return_amt) returns, sum(sr_net_loss) profit_loss
    FROM store_returns, date_dim, store
    WHERE (sr_returned_date_sk = d_date_sk)
      AND (d_date BETWEEN CAST('2000-08-23' AS DATE) AND (CAST('2000-08-23' AS DATE) + INTERVAL '30' DAY))
      AND (sr_store_sk = s_store_sk)
    GROUP BY s_store_sk
  ),
  cs AS (
    SELECT cs_call_center_sk, sum(cs_ext_sales_price) sales, sum(cs_net_profit) profit
    FROM catalog_sales, date_dim
    WHERE (cs_sold_date_sk = d_date_sk)
      AND (d_date BETWEEN CAST('2000-08-23' AS DATE) AND (CAST('2000-08-23' AS DATE) + INTERVAL '30' DAY))
    GROUP BY cs_call_center_sk
  ),
  cr AS (
    SELECT cr_call_center_sk, sum(cr_return_amount) returns, sum(cr_net_loss) profit_loss
    FROM catalog_returns, date_dim
    WHERE (cr_returned_date_sk = d_date_sk)
      AND (d_date BETWEEN CAST('2000-08-23' AS DATE) AND (CAST('2000-08-23' AS DATE) + INTERVAL '30' DAY))
    GROUP BY cr_call_center_sk
  ),
  ws AS (
    SELECT wp_web_page_sk, sum(ws_ext_sales_price) sales, sum(ws_net_profit) profit
    FROM web_sales, date_dim, web_page
    WHERE (ws_sold_date_sk = d_date_sk)
      AND (d_date BETWEEN CAST('2000-08-23' AS DATE) AND (CAST('2000-08-23' AS DATE) + INTERVAL '30' DAY))
      AND (ws_web_page_sk = wp_web_page_sk)
    GROUP BY wp_web_page_sk
  ),
  wr AS (
    SELECT wp_web_page_sk, sum(wr_return_amt) returns, sum(wr_net_loss) profit_loss
    FROM web_returns, date_dim, web_page
    WHERE (wr_returned_date_sk = d_date_sk)
      AND (d_date BETWEEN CAST('2000-08-23' AS DATE) AND (CAST('2000-08-23' AS DATE) + INTERVAL '30' DAY))
      AND (wr_web_page_sk = wp_web_page_sk)
    GROUP BY wp_web_page_sk
  )
SELECT channel, id, sum(sales) sales, sum(returns) returns, sum(profit) profit
FROM (
  SELECT 'store channel' channel, ss.s_store_sk id, sales, COALESCE(returns, 0) returns, (profit - COALESCE(profit_loss, 0)) profit
  FROM ss LEFT JOIN sr ON (ss.s_store_sk = sr.s_store_sk)
  UNION ALL
  SELECT 'catalog channel' channel, cs_call_center_sk id, sales, returns, (profit - profit_loss) profit
  FROM cs, cr
  UNION ALL
  SELECT 'web channel' channel, ws.wp_web_page_sk id, sales, COALESCE(returns, 0) returns, (profit - COALESCE(profit_loss, 0)) profit
  FROM ws LEFT JOIN wr ON (ws.wp_web_page_sk = wr.wp_web_page_sk)
) x
GROUP BY ROLLUP (channel, id)
ORDER BY channel ASC, id ASC, sales ASC
LIMIT 100

Anything Else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@zhangstar333
Contributor

I want to reproduce this, but I can't get the query to run. Could you give the TPC-DS query number?
ERROR 1105 (HY000): errCode = 2, detailMessage = aggregate function cannot contain aggregate parameters: sum(sum(sales))

@dataroaring
Contributor Author

It should be q77.

@zhangstar333
Contributor

It should be q77.
I tested the newest dev-1.0.1 and the query returns a result.
Are any other steps needed to reproduce the crash?

dataroaring added a commit to dataroaring/incubator-doris that referenced this issue Apr 2, 2022
It is a mistake to cast ColumnNullable* to ColumnVectorHelper*.
UBSAN reports the following message:
runtime error: member call on address 0x00002af70680 which does not point to an object of type 'doris::vectorized::ColumnVectorHelper'

Without UBSAN, the BE core dumps when running TPC-DS q77 on the 1T data set.

Fix apache#8815.
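
The failure mode described in this commit message, treating a nullable column as if it were a plain vector column, can be illustrated with a small sketch. The types below (`IColumnLike`, `VectorColumnLike`, `NullableColumnLike`) and the helper `as_vector_column` are hypothetical stand-ins, not the real Doris classes, and this is not the actual fix from the commit. It only shows one way a checked downcast could reject or unwrap the wrong type instead of invoking undefined behavior.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-ins for the column hierarchy; not the real Doris types.
struct IColumnLike {
    virtual ~IColumnLike() = default;
};

// Plays the role of a plain vector column.
struct VectorColumnLike : IColumnLike {
    std::vector<std::int64_t> data;
    std::size_t size() const { return data.size(); }
};

// Plays the role of a nullable wrapper around another column.
struct NullableColumnLike : IColumnLike {
    std::unique_ptr<IColumnLike> nested;   // the wrapped values
    std::vector<std::uint8_t> null_map;    // 1 marks a NULL row
};

// Returns the vector column behind `col`, unwrapping one nullable layer
// if present. Returns nullptr when no vector column can be found, rather
// than pretending the object has a type it does not have.
const VectorColumnLike* as_vector_column(const IColumnLike* col) {
    if (const auto* nullable = dynamic_cast<const NullableColumnLike*>(col)) {
        col = nullable->nested.get();
    }
    return dynamic_cast<const VectorColumnLike*>(col);
}
```

A `static_cast` from the base pointer straight to `VectorColumnLike*` would compile for a `NullableColumnLike` object too, and any member call through the result would be the kind of error UBSAN reports above. The checked version costs an RTTI lookup, so in a hot path one might prefer a type tag; that trade-off is outside what the thread discusses.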
@dataroaring
Contributor Author

runtime error: member call on address 0x00002af70680 which does not point to an object of type 'doris::vectorized::ColumnVectorHelper'

dataroaring added a commit to dataroaring/incubator-doris that referenced this issue Apr 2, 2022
When I built the Doris BE with UBSAN and vectorization enabled,
the BE core dumped at doris::DecimalV2Value::operator long(). It crashed
because an SSE instruction accessed a non-aligned address.

With UBSAN enabled, the compiler generates different assembly code that
includes SSE instructions.

A sender serializes tuples to a contiguous memory area, while a receiver
just copies it. So we should align each tuple offset to 16 bytes.

For compatibility, we should use a config to control it.

BTW: with tools like UBSAN, ASAN, and TSAN we can find bugs more easily,
e.g. apache#8815. It is
difficult to find the bug without UBSAN.

Anyway, we should use modern tools to be more productive.
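
The "align each tuple offset to 16 bytes" idea from this commit message can be sketched as follows. The names `align_up`, `kTupleAlignment`, and `append_tuple` are hypothetical and chosen for illustration; the real serializer and the config switch mentioned in the message are not shown in this thread, so this is only a minimal sketch of offset padding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Rounds `offset` up to the next multiple of `alignment`.
// `alignment` must be a power of two.
constexpr std::size_t align_up(std::size_t offset, std::size_t alignment) {
    return (offset + alignment - 1) & ~(alignment - 1);
}

// 16 bytes, as proposed in the commit message.
constexpr std::size_t kTupleAlignment = 16;

// Appends `len` bytes of tuple data to `buf`, first padding with zeros so
// that the tuple starts at a 16-byte-aligned offset within the buffer.
// Returns the offset at which the tuple was written.
std::size_t append_tuple(std::vector<std::uint8_t>& buf,
                         const void* tuple, std::size_t len) {
    assert(tuple != nullptr || len == 0);
    const std::size_t start = align_up(buf.size(), kTupleAlignment);
    buf.resize(start + len, 0);  // new bytes, including padding, are zero
    if (len > 0) {
        std::memcpy(buf.data() + start, tuple, len);
    }
    return start;
}
```

Aligned offsets only produce aligned addresses if the buffer's base address is itself 16-byte aligned on the receiving side, which this sketch does not enforce. The padding also changes the wire layout, which is presumably why the message asks for a config to keep old and new nodes compatible.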
dataroaring added a commit to dataroaring/incubator-doris that referenced this issue Jun 13, 2022
dataroaring added a commit to dataroaring/incubator-doris that referenced this issue Jun 16, 2022
dataroaring added a commit to dataroaring/incubator-doris that referenced this issue Jun 17, 2022
morningman pushed a commit that referenced this issue Jun 23, 2022
…san enabled (#8831)

When I built the Doris BE with UBSAN and vectorization enabled,
the BE core dumped at doris::DecimalV2Value::operator long(). It crashed
because an SSE instruction accessed a non-aligned address.

With UBSAN enabled, the compiler generates different assembly code that
includes SSE instructions.

A sender serializes tuples to a contiguous memory area, while a receiver
just copies it. So we should align each tuple offset to 16 bytes.

For compatibility, we should use a config to control it.

BTW: with tools like UBSAN, ASAN, and TSAN we can find bugs more easily,
e.g. #8815. It is difficult to find the bug without UBSAN.

Anyway, we should use modern tools to be more productive.
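
The crash on a non-aligned address can be illustrated with a small sketch. `Wide128`, `is_aligned`, and `load_wide128` are hypothetical names; `Wide128` merely stands in for a 16-byte value type with 16-byte alignment and is not `DecimalV2Value`. The sketch shows how one might test a pointer's alignment and read such a value through `std::memcpy`, which does not require the source to be aligned. It is an alternative to padding the offsets, not the approach the commit takes.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical 16-byte value with 16-byte alignment, standing in for a
// wide decimal representation.
struct alignas(16) Wide128 {
    std::uint64_t lo;
    std::int64_t hi;
};

// True if address `p` is a multiple of `alignment` (a power of two).
inline bool is_aligned(const void* p, std::size_t alignment) {
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}

// Reads a Wide128 from `src` without assuming anything about its
// alignment. Dereferencing reinterpret_cast<const Wide128*>(src) on a
// misaligned address is undefined behavior and may fault when the
// compiler emits an aligned SSE load.
inline Wide128 load_wide128(const void* src) {
    Wide128 value;
    std::memcpy(&value, src, sizeof(value));
    return value;
}
```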