Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SEDONA-434] Improve reliability by resolve the nondeterministic of the order of the Map #1130

Merged
merged 3 commits into from
Nov 22, 2023

Conversation

MyEnthusiastic
Copy link
Contributor

@MyEnthusiastic MyEnthusiastic commented Nov 22, 2023

Did you read the Contributor Guide?

Is this PR related to a JIRA ticket?

What changes were proposed in this PR?

Problem :

when the program reading from the newRdd, it assumes the order of the data to read into the Map

String[] newGeomFields = newGeom.getUserData().toString().split("\t");

However, when it write the data, it did not assume the order according to the Oracle's official document

Map<String, Object> fields = new HashMap<String, Object>();
String[] fieldValues = spatialObject.getUserData().toString().split("\t");

So it will cause the ERROR when the environment has been changed

org.junit.ComparisonFailure: expected:<0[1]> but was:<0[23]>

	at org.junit.Assert.assertEquals(Assert.java:117)
	at org.junit.Assert.assertEquals(Assert.java:146)
	at org.apache.sedona.core.formatMapper.GeoJsonIOTest.testReadWriteGeoJson(GeoJsonIOTest.java:115)

Solution

So we can change the Map to use the LinkedHashMap to guarantee the order to solve the problem

How was this patch tested?

This error can be reproduced with the NondexTool with the following command (you can try that without modifying the pom, but feel free to use the plugin to protect your project and make it safer 😊 ), and this kind problem is widely documented in International Dataset of Flaky Tests (IDoFT)

Did this PR include necessary documentation updates?

  • No, this PR does not affect any functionality public API, only improve the security by making the order deterministic

@Kontinuation
Copy link
Member

Thank you for identifying a bug in the GeoJSON reader. While the proposed fix is appreciated, there is one more important issue we need to fix. The key issue is that the GeoJSON reader should have the capability to process GeoJSONs with properties in random orders within the same dataset.

Additionally, ensuring the consistent ordering of properties in GeoJSON files generated by Sedona is a valuable feature. It appears that this pull request addresses that aspect.

@jiayuasu jiayuasu changed the title Improve reliability by resolve the nondeterministic of the order of the Map [SEDONA-434] Improve reliability by resolve the nondeterministic of the order of the Map Nov 22, 2023
@jiayuasu jiayuasu added this to the sedona-1.5.1 milestone Nov 22, 2023
@jiayuasu
Copy link
Member

@MyEnthusiastic Thanks for the PR. We will address the other problem mentioned by @Kontinuation separately.

@jiayuasu jiayuasu merged commit 6478129 into apache:master Nov 22, 2023
35 of 40 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants