
[summary] df may get duplicated rows when using time_index=raw #426

Open
asnyv opened this issue Oct 24, 2022 · 3 comments
Labels
bug Something isn't working

Comments

@asnyv
Collaborator

asnyv commented Oct 24, 2022

If time_index=raw is used, the resample_smry_dates method is still called to support options like start_date, end_date and normalize, and its output is then used as input when fetching the summary data through ecl. If one or more timesteps are shorter than the resolution of the TIME vector, the TIME vector contains non-unique values. For each non-unique TIME value, the data is then fetched only from the first of the timesteps sharing that value, so the df gets duplicated rows instead of the data for the later timesteps. This was observed by a user after #412: the first of the non-unique timesteps typically has a considerably longer TIMESTEP than the subsequent ones, so the TIMESTEP correction may not be robust, since it can jump further than the unique TIME value.
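
A minimal, self-contained Python sketch of the failure mode described above. It does not call ecl or ecl2df; the one-day TIME resolution, the helper names (stored_time, fetch_by_time, fetch_by_index) and the values are hypothetical, chosen only to show how a lookup keyed on a non-unique TIME value repeats the first matching step, while fetching by step index does not.

```python
from typing import List

DAY = 86_400  # seconds per day


def stored_time(true_seconds: List[int], resolution_s: int) -> List[int]:
    """Hypothetical TIME vector: true elapsed times truncated to a fixed resolution."""
    return [s // resolution_s * resolution_s for s in true_seconds]


def fetch_by_time(time_vec: List[int], values: List[float], wanted: List[int]) -> List[float]:
    """Look up each requested time; list.index returns the FIRST matching step."""
    return [values[time_vec.index(t)] for t in wanted]


def fetch_by_index(values: List[float]) -> List[float]:
    """Raw access: one row per report step, no lookup by TIME value."""
    return list(values)


# Three report steps fall within the same day, i.e. below the TIME resolution.
true_seconds = [0, 10 * DAY, 10 * DAY + 600, 10 * DAY + 1200, 20 * DAY]
values = [1.0, 2.0, 3.0, 4.0, 5.0]
time_vec = stored_time(true_seconds, DAY)

has_duplicate_times = len(set(time_vec)) < len(time_vec)
by_time = fetch_by_time(time_vec, values, time_vec)
by_index = fetch_by_index(values)
```

Here by_time is [1.0, 2.0, 2.0, 2.0, 5.0] (the value of the first same-day step is repeated), while by_index keeps all five distinct values.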

For time_index=None this is not an issue, as resample_smry_dates is not called (though that might mean that not all arguments, like start_date, actually work for None?). It seems we have to request the data from ecl for raw the same way we currently do for None, to make sure that we actually get the raw data. If we have to support start_date, end_date and normalize, we likely have to fetch the data from ecl twice when they are defined: once for raw (as currently for None), and once for start_date, end_date and normalize. Then apply the possible TIMESTEP correction (#412) before cutting/merging the two dfs. If so, this should probably be the behavior for both raw and None.
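
A rough sketch of the proposed "fetch twice, then cut/merge" step, using plain lists of (date, value) tuples instead of DataFrames to stay self-contained. The function name merge_raw_with_boundaries and the data are hypothetical; normalize is not handled, and any TIMESTEP correction (#412) is assumed to have been applied to the raw rows beforehand. The key point is that raw rows sharing a date are kept, and boundary rows are only added for dates the raw data does not already cover.

```python
from datetime import date
from typing import List, Optional, Tuple

Row = Tuple[date, float]


def merge_raw_with_boundaries(
    raw_rows: List[Row],
    boundary_rows: List[Row],
    start_date: Optional[date] = None,
    end_date: Optional[date] = None,
) -> List[Row]:
    """Hypothetical merge of a raw (by-step) fetch with a fetch at requested dates.

    Any TIMESTEP correction (#412) is assumed to have been applied to raw_rows already.
    """
    # Cut the raw rows to the requested window, keeping rows that share a
    # date: they are distinct report steps, not duplicates.
    kept = [
        row
        for row in raw_rows
        if (start_date is None or row[0] >= start_date)
        and (end_date is None or row[0] <= end_date)
    ]
    raw_dates = {d for d, _ in kept}
    # Only add boundary rows for dates that the raw data does not cover.
    extra = [row for row in boundary_rows if row[0] not in raw_dates]
    # sorted() is stable, so raw rows sharing a date keep their step order.
    return sorted(kept + extra, key=lambda row: row[0])


raw = [
    (date(2003, 1, 1), 1.0),
    (date(2003, 1, 11), 2.0),
    (date(2003, 1, 11), 3.0),  # second step with the same (non-unique) TIME
    (date(2003, 1, 21), 4.0),
]
# Illustrative values at the requested start/end dates.
boundaries = [(date(2003, 1, 5), 1.4), (date(2003, 1, 21), 4.0)]
merged = merge_raw_with_boundaries(raw, boundaries, date(2003, 1, 5), date(2003, 1, 21))
```

With start_date 2003-01-05 and end_date 2003-01-21, merged starts with the boundary row for 2003-01-05, keeps both 2003-01-11 steps, and does not duplicate 2003-01-21.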

@asnyv asnyv added the bug Something isn't working label Oct 24, 2022
@asnyv
Collaborator Author

asnyv commented Oct 25, 2022

@berland @lindjoha any thoughts?

@berland
Collaborator

berland commented Oct 26, 2022

It is probably a bug that df(eclfiles, start_date="2003-02-01") ignores start_date. It looks like time_index should default to raw, and time_index=None should then be deprecated 🤷🏻‍♂️
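
A minimal sketch of what "default to raw, deprecate None" could look like in argument handling. The helper resolve_time_index is hypothetical and not part of the ecl2df API; it makes "raw" the default and warns when None is passed explicitly, while still passing None through unchanged during the deprecation period.

```python
import warnings
from typing import Optional


def resolve_time_index(time_index: Optional[str] = "raw") -> Optional[str]:
    """Hypothetical argument handling: default to "raw", warn on explicit None."""
    if time_index is None:
        # FutureWarning (unlike DeprecationWarning) is shown to end users by default.
        warnings.warn(
            "time_index=None is deprecated, use time_index='raw' instead",
            FutureWarning,
            stacklevel=2,
        )
    return time_index
```

Passing None through, rather than silently mapping it to "raw", avoids changing results for existing callers until the deprecation is completed.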

@asnyv
Collaborator Author

asnyv commented Oct 26, 2022

@berland Yes and no. The bug originally identified by @alifbe is not that there is a difference between raw and None, but that the implementation of raw fetches data from ecl based on a time vector that may contain non-unique values.

Projects
None yet
Development

No branches or pull requests

2 participants