[summary] df may get duplicated rows when using `time_index=raw` #426

asnyv · 2022-10-24T06:54:39Z

With `time_index=raw`, the `resample_smry_dates` method is still called to support functionality like `start_date`, `end_date` and `normalize`, and its output is then used as input when fetching the summary data through ecl. If one or more timesteps are shorter than the resolution of the TIME vector, the TIME vector gets non-unique values. The data is then fetched only from the first of the time steps sharing a TIME value, which gives duplicated rows instead of the data for each subsequent timestep. A user observed this after #412: the first of the non-unique time steps typically has a considerably longer TIMESTEP than the subsequent ones, so the TIMESTEP correction may not be robust, since it can jump further than the unique TIME value.
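To illustrate the mechanism, here is a toy model in plain Python. It does not use ecl or the real summary reader; `report_steps`, `fetch_by_time` and `fetch_by_index` are hypothetical names, and the numbers are made up. It only shows why a lookup keyed on a non-unique time value returns the first matching step twice, while a positional read keeps both steps.

```python
# Hypothetical stand-in for a summary file: one entry per report step,
# with a TIME value stored at limited resolution and one data vector.
report_steps = [
    {"TIME": 10.0, "FOPT": 100.0},
    {"TIME": 10.0, "FOPT": 101.5},  # very short step, TIME is identical
    {"TIME": 20.0, "FOPT": 250.0},
]


def fetch_by_time(steps, times):
    """Look up each requested time by value and return the first match."""
    rows = []
    for t in times:
        first_match = next(s for s in steps if s["TIME"] == t)
        rows.append(dict(first_match))
    return rows


def fetch_by_index(steps):
    """Return every report step in order, without any time-value lookup."""
    return [dict(s) for s in steps]


requested = [s["TIME"] for s in report_steps]
by_time = fetch_by_time(report_steps, requested)
by_index = fetch_by_index(report_steps)

print([r["FOPT"] for r in by_time])   # [100.0, 100.0, 250.0] (duplicated row)
print([r["FOPT"] for r in by_index])  # [100.0, 101.5, 250.0]
```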

For `time_index=None` this is not an issue, as `resample_smry_dates` is not called (though that might mean that arguments like `start_date` do not actually work for `None`?). It seems we have to request data from ecl for `raw` the same way we currently do for `None`, to make sure that we actually get the raw data. If we have to support `start_date`, `end_date` and `normalize`, we likely have to fetch the data from ecl twice when they are defined: once for raw (as currently done for `None`), and once for `start_date`, `end_date` and `normalize`. We would then apply the TIMESTEP correction (#412) where needed before cutting/merging the two dataframes. If so, this should probably be the behavior for both `raw` and `None`.
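A rough sketch of what the cutting/merging step could look like, in plain Python rather than the actual dataframe code. Everything here is an assumption made for illustration: `cut_and_merge`, `raw_rows` and `boundary_rows` are hypothetical names, the rows are made up, and the TIMESTEP correction from #412 is left out. The point is only that raw rows are kept by position, so steps sharing a date survive, and rows from the second fetch are added only where no raw row exists.

```python
import datetime as dt


def cut_and_merge(raw_rows, boundary_rows, start_date=None, end_date=None):
    """Keep raw rows inside [start_date, end_date] and merge in boundary rows.

    Raw rows are filtered by position, never looked up by date, so report
    steps sharing a DATE are all kept. A boundary row is only added when no
    kept raw row already has that DATE.
    """
    kept = [
        r for r in raw_rows
        if (start_date is None or r["DATE"] >= start_date)
        and (end_date is None or r["DATE"] <= end_date)
    ]
    raw_dates = {r["DATE"] for r in kept}
    extra = [b for b in boundary_rows if b["DATE"] not in raw_dates]
    # sorted() is stable, so raw rows with equal DATE keep their order
    return sorted(kept + extra, key=lambda r: r["DATE"])


# First fetch: raw report steps (made-up values, two steps share a date)
raw_rows = [
    {"DATE": dt.date(2003, 1, 1), "FOPT": 0.0},
    {"DATE": dt.date(2003, 2, 10), "FOPT": 100.0},
    {"DATE": dt.date(2003, 2, 10), "FOPT": 101.5},
    {"DATE": dt.date(2003, 3, 1), "FOPT": 250.0},
]
# Second fetch: a row at start_date (made-up value)
boundary_rows = [{"DATE": dt.date(2003, 2, 1), "FOPT": 80.0}]

merged = cut_and_merge(raw_rows, boundary_rows, start_date=dt.date(2003, 2, 1))
print([r["FOPT"] for r in merged])  # [80.0, 100.0, 101.5, 250.0]
```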

asnyv · 2022-10-25T11:37:38Z

@berland @lindjoha any thoughts?

berland · 2022-10-26T11:16:52Z

It is probably a bug that `df(eclfiles, start_date="2003-02-01")` ignores `start_date`. It looks like `time_index` should default to `raw`, and `time_index=None` should then be deprecated 🤷🏻‍♂️

asnyv · 2022-10-26T11:54:51Z

@berland yes and no. The bug originally identified by @alifbe is not that there is a difference between `raw` and `None`, but that the implementation of `raw` fetches data from ecl based on a time vector which may be non-unique.

asnyv added the `bug` (Something isn't working) label Oct 24, 2022
