refactor to use wrapper function to wrap all the codes in cast_string_to_function.h #2261

AEsir777 · 2023-10-24T21:51:13Z

fix castToInt128 error message
replace all cast operations with operation wrapper
add test coverage

test/test_files/tinysnb/function/play.test

codecov · 2023-10-25T03:15:37Z

Codecov Report

Attention: 10 lines in your changes are missing coverage. Please review.

Comparison is base (ecf1dd2) 89.60% compared to head (22c50a2) 89.61%.
Report is 9 commits behind head on master.

Additional details and impacted files

@@           Coverage Diff            @@
##           master    #2261    +/-   ##
========================================
  Coverage   89.60%   89.61%            
========================================
  Files        1016     1024     +8     
  Lines       35808    36031   +223     
========================================
+ Hits        32085    32288   +203     
- Misses       3723     3743    +20

Files	Coverage Δ
src/binder/bind/bind_graph_pattern.cpp	`96.22% <100.00%> (+0.02%)`	⬆️
src/function/cast_string_non_nested_functions.cpp	`100.00% <100.00%> (ø)`
src/function/vector_cast_functions.cpp	`82.96% <100.00%> (+1.80%)`	⬆️
src/function/vector_union_functions.cpp	`91.66% <100.00%> (ø)`
src/include/common/type_utils.h	`100.00% <100.00%> (ø)`
src/include/common/types/types.h	`100.00% <ø> (ø)`
.../cast/functions/cast_string_non_nested_functions.h	`97.33% <100.00%> (ø)`
src/include/function/cast/functions/numeric_cast.h	`100.00% <100.00%> (ø)`
src/include/function/cast/vector_cast_functions.h	`95.34% <100.00%> (-0.57%)`	⬇️
.../include/function/string/vector_string_functions.h	`96.00% <100.00%> (ø)`
... and 17 more

... and 59 files with indirect coverage changes


acquamarin

My general comment is that instead of keeping separate stringCast operations, we should have one generic stringCast operation class, with the type-specific information passed in as BindData.

bool BaseCSVReader::TryCastVector(Vector &parse_chunk_col, idx_t size, const LogicalType &sql_type) {
	// try vector-cast from string to sql_type
	Vector dummy_result(sql_type);
	if (options.has_format[LogicalTypeId::DATE] && sql_type == LogicalTypeId::DATE) {
		// use the date format to cast the chunk
		string error_message;
		idx_t line_error;
		return TryCastDateVector(options, parse_chunk_col, dummy_result, size, error_message, line_error);
	} else if (options.has_format[LogicalTypeId::TIMESTAMP] && sql_type == LogicalTypeId::TIMESTAMP) {
		// use the timestamp format to cast the chunk
		string error_message;
		return TryCastTimestampVector(options, parse_chunk_col, dummy_result, size, error_message);
	} else {
		// target type is not varchar: perform a cast
		string error_message;
		return VectorOperations::DefaultTryCast(parse_chunk_col, dummy_result, size, &error_message, true);
	}
}

This is how DuckDB implements casting (the snippet is from its CSV reader, BaseCSVReader::TryCastVector).

src/include/function/vector_functions.h

acquamarin · 2023-10-25T14:38:46Z

src/include/function/cast/functions/cast_string_to_functions.h

+
+    // nested types
+    template<typename T>
+    static void operation(const char* input, uint64_t len, common::ValueVector* vector,


Our ultimate goal is to keep only one string cast operation. CSVReaderConfig should be passed as bindData, and rowToAdd should be replaced with dstValue. You can take a look at how DuckDB achieves this; they have a class called BoundCastInfo. You can change this in later PRs.
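To make the suggestion concrete, here is a minimal sketch of a single generic string-cast operation whose options arrive as bind data instead of being baked into separate per-type operation classes. Every name in it (`CastString`, `CastBindData` and its `listBegin`/`listEnd`/`delimiter` fields) is hypothetical. It is not this project's code and not DuckDB's `BoundCastInfo` API. An `int64_t` list stands in for nested types, and it assumes C++17 (`std::from_chars`, structured bindings).

```cpp
#include <cassert>
#include <charconv>
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <system_error>
#include <vector>

// Hypothetical bind data standing in for CSVReaderConfig: options resolved once
// at bind time and handed to every cast call.
struct CastBindData {
    char listBegin = '[';
    char listEnd = ']';
    char delimiter = ',';
};

// Hypothetical single string-cast entry point. The target type selects the
// specialization; all per-call options arrive through bindData.
struct CastString {
    template<typename T>
    static bool operation(std::string_view input, T& result, const CastBindData& bindData);
};

// Scalar case: bindData is unused but kept so every specialization shares one signature.
template<>
inline bool CastString::operation(std::string_view input, std::int64_t& result,
    const CastBindData& /*bindData*/) {
    const char* end = input.data() + input.size();
    auto [ptr, ec] = std::from_chars(input.data(), end, result);
    // Reject parse errors, out-of-range values, and trailing characters.
    return ec == std::errc() && ptr == end;
}

// Nested case: list syntax comes from bindData, and each element reuses the
// scalar cast with the same bindData.
template<>
inline bool CastString::operation(std::string_view input, std::vector<std::int64_t>& result,
    const CastBindData& bindData) {
    if (input.size() < 2 || input.front() != bindData.listBegin ||
        input.back() != bindData.listEnd) {
        return false;
    }
    std::string_view body = input.substr(1, input.size() - 2);
    result.clear();
    if (body.empty()) {
        return true;  // e.g. "[]" casts to an empty list
    }
    while (true) {
        const std::size_t next = body.find(bindData.delimiter);
        std::int64_t element = 0;
        if (!CastString::operation(body.substr(0, next), element, bindData)) {
            return false;
        }
        result.push_back(element);
        if (next == std::string_view::npos) {
            return true;
        }
        body.remove_prefix(next + 1);
    }
}

// Usage check: the same entry point, configured differently through bindData.
inline void checkGenericStringCast() {
    CastBindData csvDefaults;
    std::int64_t value = 0;
    assert(CastString::operation("42", value, csvDefaults) && value == 42);
    assert(!CastString::operation("4x", value, csvDefaults));

    std::vector<std::int64_t> list;
    assert(CastString::operation("[1,2,3]", list, csvDefaults));
    assert((list == std::vector<std::int64_t>{1, 2, 3}));

    CastBindData custom{'{', '}', ';'};
    assert(CastString::operation("{4;5}", list, custom));
    assert((list == std::vector<std::int64_t>{4, 5}));
}
```

The design point is that a new reader option (for example, a different list delimiter) only changes the bind data, not the set of operation classes.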

src/include/function/cast/functions/cast_string_non_nested_functions.h

src/function/cast_string_to_functions.cpp

AEsir777 · 2023-10-25T20:35:10Z

src/include/function/cast/functions/cast_functions.h

+        throw common::OverflowException{common::stringFormat(
+            "Value {} is not within INT64 range", common::TypeUtils::toString(input).c_str())};
+    };
+}


This line is covered by a test.

AEsir777 · 2023-10-25T20:36:43Z

src/include/function/cast/functions/cast_string_to_functions.h

+template<>
+inline void CastStringToTypes::operation(const char* input, uint64_t len, bool& result) {
+    castStringToBool(input, len, result);
+}


I checked with gdb: this line is covered.

AEsir777 · 2023-10-26T19:17:49Z

src/include/function/cast/functions/cast_string_to_functions.h

+    }
+
+    template<typename T>
+    static void operation(const char* input, uint64_t len, T& result,


I will combine these into one function after implementing the cast from const char* to ku_string_t.


Riolku · 2023-10-26T19:58:41Z

src/include/function/cast/functions/cast_functions.h

+inline void CastToDouble::operation(common::int128_t& input, double_t& result) {
+    if (!common::Int128_t::tryCast(input, result)) { // LCOV_EXCL_START
+        throw common::OverflowException{common::stringFormat(
+            "Value {} is not within DOUBLE range", common::TypeUtils::toString(input).c_str())};


Why is this excluded from coverage?

Converting int128 to double can't overflow, since every int128 value is within DOUBLE range.

It is not quite true that INT128 always fits in a double.
Large integers stored as doubles are not consecutive, because a double has only a 53-bit significand. For example, 2^100 is exactly representable as a double, but 2^100 + 1 is not. I am not sure what the expected behaviour is in this case; maybe check DuckDB's solution. I am sure that C++ just silently loses precision here.
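A small C++ check of the precision point above, assuming IEEE 754 binary64 doubles and the default round-to-nearest mode. The helper names are illustrative only, not part of the codebase.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Returns true if adding 1 to 2^100 in double arithmetic leaves it unchanged.
inline bool plusOneLostAt2Pow100() {
    const double big = std::ldexp(1.0, 100);  // 2^100: a power of two, so exact
    // Adjacent doubles near 2^100 are 2^48 apart (53-bit significand), so the
    // exact sum 2^100 + 1 rounds back to 2^100.
    return big + 1.0 == big;
}

// Returns true if an int64 value survives a round trip through double.
// Precondition: |value| well below 2^63, because converting an out-of-range
// double back to int64 is undefined behaviour.
inline bool roundTripsThroughDouble(std::int64_t value) {
    const double asDouble = static_cast<double>(value);
    return static_cast<std::int64_t>(asDouble) == value;
}

// Usage check encoding the claims in the comment above.
inline void checkPrecisionClaims() {
    assert(plusOneLostAt2Pow100());
    assert(roundTripsThroughDouble(std::int64_t{1} << 53));         // 2^53 is exact
    assert(!roundTripsThroughDouble((std::int64_t{1} << 53) + 1));  // 2^53 + 1 is not
}
```

2^53 + 1 is the smallest positive integer a double cannot hold, which is why the same effect already shows up with 64-bit integers, long before the int128 range.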

Converting to double and losing precision is not surprising. The Postgres documentation notes that floating-point values may be stored inexactly and recommends the numeric type when exactness is required.

Let's add a comment for why this is unreachable.

DuckDB always returns true here (losing precision), and so do we:

Query: CAST(CAST('170141183460469231731687303715884105727' AS HUGEINT) AS DOUBLE)
Result type: double
Result: 1.7014118346046923e+38

The point Ziyi is making is that if you add 1 to that integer, the stored double will most likely be the same, because doubles are not arbitrary precision.
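A minimal sketch of the explanatory comment requested above, assuming GCC or Clang, whose `__int128` extension stands in here for `common::int128_t`. `tryCastInt128ToDouble` is a hypothetical helper, not the project's `CastToDouble::operation`. It records why the overflow branch is unreachable (range) while the value can still round (precision), and checks that HUGEINT's maximum, 2^127 - 1, becomes exactly 2^127, which is the 1.7014118346046923e+38 DuckDB printed.

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>

// Assumption: GCC/Clang's __int128 extension stands in for common::int128_t.
__extension__ typedef __int128 int128;
__extension__ typedef unsigned __int128 uint128;

// Every int128 lies in [-2^127, 2^127 - 1] and 2^127 is about 1.7e38, far below
// DBL_MAX (about 1.8e308), so the conversion can round but never overflow.
static_assert(DBL_MAX > 1.8e38, "int128 magnitudes always fit in a double");

// Hypothetical helper mirroring the shape of CastToDouble::operation.
inline bool tryCastInt128ToDouble(int128 input, double& result) {
    // Rounds to the nearest double; low-order bits may be lost.
    result = static_cast<double>(input);
    // No overflow branch: it would be unreachable (see the static_assert above).
    return true;
}

// Usage check: 2^127 - 1 rounds up to exactly 2^127.
inline void checkInt128MaxToDouble() {
    const int128 maxValue = static_cast<int128>((static_cast<uint128>(1) << 127) - 1);
    double d = 0.0;
    assert(tryCastInt128ToDouble(maxValue, d));
    assert(std::isfinite(d));
    assert(d == std::ldexp(1.0, 127));
}
```

Keeping the range argument next to the conversion (here as a `static_assert`) documents the unreachability in a way the compiler also checks, rather than relying on an LCOV exclusion alone.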

AEsir777 commented Oct 24, 2023

test/test_files/tinysnb/function/play.test (outdated)

AEsir777 force-pushed the cast branch 5 times, most recently from f36dd93 to 0baad2e, October 25, 2023 02:56

AEsir777 requested a review from acquamarin October 25, 2023 02:57

AEsir777 force-pushed the cast branch 2 times, most recently from 5f3a5ef to eb65ca0, October 25, 2023 03:02

AEsir777 force-pushed the cast branch from eb65ca0 to f93c9dc, October 25, 2023 14:55

acquamarin approved these changes Oct 25, 2023

AEsir777 force-pushed the cast branch 3 times, most recently from 0a39452 to f5f6199, October 25, 2023 20:17

AEsir777 commented Oct 25, 2023

AEsir777 force-pushed the cast branch 3 times, most recently from d5ebb51 to 21ea21c, October 26, 2023 19:09

AEsir777 requested a review from acquamarin October 26, 2023 19:14

AEsir777 commented Oct 26, 2023

Commit 22c50a2: refactor to use wrapper function to wrap all the codes in cast_string_to_function.h

AEsir777 force-pushed the cast branch from 21ea21c to 22c50a2, October 26, 2023 19:34

Riolku reviewed Oct 26, 2023


AEsir777 merged commit 40a4a16 into master Oct 26, 2023
12 checks passed

AEsir777 deleted the cast branch October 26, 2023 20:50
