DA Prices seems to have broken. #346

Closed
PeteGilbert98 opened this issue Oct 4, 2024 · 20 comments

Comments

@PeteGilbert98

PeteGilbert98 commented Oct 4, 2024

Hi @fboerman ,

I've noticed that, as of today, the function that fetches the day-ahead prices (da_prices) from ENTSO-E seems to have broken. My assumption is that the format returned by the ENTSO-E API has changed.

image

As you can see, all prices now seem to be the same.

Here's what I think might be causing the issue:

On line 36, I would have expected the _extract_timeseries generator to produce multiple soup objects, one for each of the days I queried. Instead, only one object is returned.

image

This means that on line 88 the keys of the dictionary (created on line 83) are overwritten, because the keys are not unique per day.

image

On line 92, the datetime index appears to be correct, covering the whole timespan of the (buffered) query.

When the (seemingly correct) index is combined with the data, things start to break.

@LuisTellezSirocco

Hi, I'm not here to answer you about the problem, but to tell you that you have leaked your API Key in one of the images...

@PeteGilbert98
Author

> Hi, I'm not here to answer you about the problem, but to tell you that you have leaked your API Key in one of the images...

Thanks, I changed the token.

@borg42

borg42 commented Oct 4, 2024

I have the same problem. It worked fine yesterday, but now I see the same behaviour as described by @PeteGilbert98. I think this means that ENTSO-E must have changed something in the XML, right?

@fboerman
Collaborator

fboerman commented Oct 4, 2024 via email

@binboupan

As of today I am getting entsoe.exceptions.NoMatchingDataError. It worked fine for years, so something has definitely changed.

@sakvaua

sakvaua commented Oct 4, 2024

Same here.
I think this is related to how entsoe-py treats indexes as plain date ranges with a fixed start, end, and frequency, while the actual data returned in the XML may be missing some hours:
index = pd.date_range(start=start, end=end, freq=delta, inclusive='left')
I downloaded a random interval and noticed this:
image
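
The mismatch described above can be reproduced without calling the API. The following is a minimal sketch with made-up data: it assumes a single hourly period in which position 23 is absent, and the dummy prices and variable names are illustrative only, not taken from entsoe-py.

```python
import pandas as pd

# Hypothetical day: 24 hourly slots, but position 23 is absent from the XML.
start = pd.Timestamp("2024-10-03T22:00Z")
end = pd.Timestamp("2024-10-04T22:00Z")
positions = [p for p in range(1, 25) if p != 23]
prices = [float(p) for p in positions]  # dummy values, one per returned point

# Fixed-frequency index: silently assumes that every position is present.
fixed_index = pd.date_range(start=start, end=end, freq="60min", inclusive="left")
print(len(fixed_index), len(prices))  # prints: 24 23

try:
    pd.Series(prices, index=fixed_index)
except ValueError as exc:
    print("cannot align values with the fixed index:", exc)

# Position-based index: every timestamp is derived from its own position,
# so a skipped position leaves a gap instead of shifting later values.
position_index = pd.DatetimeIndex(
    [start + pd.Timedelta(hours=p - 1) for p in positions]
)
by_position = pd.Series(prices, index=position_index)
```

With the position-based index the hour belonging to position 23 is simply missing from `by_position`, which makes the gap visible and leaves it to a later step to decide how to fill it.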

@PeteGilbert98
Author

I have a suggested fix which looks like it works for the prices, but I haven't checked it extensively.

Here is how I have adjusted parse_prices (in parsers.py). I had to add the time_mapping dictionary because the lookup wasn't behaving as expected without it:

`def parse_prices(xml_text):
    """
    Parameters
    ----------
    xml_text : str

    Returns
    -------
    pd.Series
    """
    # map the pandas frequency strings onto the three resolution keys
    time_mapping = {
        '15min': '15min',
        '30min': '30min',
        '60min': '60min',
        '15T': '15min',
        '30T': '30min',
        '1H': '60min',
        '1h': '60min',
        'h': '60min',
        '0.25H': '15min',
        '0.5H': '30min',
    }

    series = {}
    for soup in _extract_timeseries(xml_text):
        soup_series = _parse_timeseries_generic(soup, 'price.amount')
        series[time_mapping[soup_series.index.freqstr]] = soup_series

    return series`

Here is how I changed _parse_timeseries_generic (in series_parsers.py):

`def _parse_timeseries_generic(soup, label='quantity', to_float=True):
    # Create a list to store all time series data
    all_data = []

    # Iterate over each period
    for period in soup.find_all("period"):
        # Extract start time, end time, and resolution for each period
        start_time_str = period.find("start").text
        resolution_str = period.find("resolution").text  # PT6H, PT30M, etc.
        start_time = pd.to_datetime(start_time_str)

        # Convert ISO 8601 duration to a pandas Timedelta
        resolution_timedelta = pd.to_timedelta(resolution_str)

        # Loop over each point and extract position and price
        for point in period.find_all("point"):
            position = int(point.find("position").text)
            value = point.find(label).text
            if to_float:
                value = float(value)

            # Calculate the timestamp for this point based on the position and resolution
            timestamp = start_time + resolution_timedelta * (position - 1)

            # Append the data
            all_data.append([timestamp, value])

    # Create a DataFrame from the combined data
    df_combined = pd.DataFrame(all_data, columns=['Timestamp', label])

    # Reindex the DataFrame to include the complete range
    df_combined.set_index('Timestamp', inplace=True)

    if soup.find('curvetype').text == 'A03':
        # with A03 it's possible that positions are missing; this happens when values are repeated
        # see docs: https://eepublicdownloads.entsoe.eu/clean-documents/EDI/Library/cim_based/Introduction_of_different_Timeseries_possibilities__curvetypes__with_ENTSO-E_electronic_document_v1.4.pdf
        # so let's reindex on a continuous range, which creates gaps where positions are missing,
        # then forward fill (repeat the last valid value) to fill the gaps

        # Create a complete date range for the specified periods using the maximum resolution
        complete_range = pd.date_range(start=df_combined.index.min(),
                                       end=df_combined.index.max(),
                                       freq=resolution_timedelta)

        df_combined = df_combined.reindex(complete_range)

        # Forward fill missing values
        df_combined[label] = df_combined[label].ffill()

    return df_combined[label]`

Again, very untested. But looks promising.
**Potential errors:** if the first or the last value in the queried range is missing, the returned time series will be shorter than expected.

I haven't tested this on any of the other functionality at all!!!
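
One way to handle the edge case mentioned above is to build the complete range from the period's declared time interval rather than from the first and last timestamps that happen to be present. This is only a sketch under assumptions: `fill_missing_positions` and its arguments are hypothetical names (not part of entsoe-py), and the caller is assumed to have already extracted the points, the interval start and end, and the resolution of one period.

```python
import pandas as pd

def fill_missing_positions(points, start, end, resolution):
    """Expand sparse {position: value} points to a full, forward-filled series.

    points     : dict mapping 1-based position to value (hypothetical input shape)
    start, end : pd.Timestamp bounds of the period's time interval
    resolution : pd.Timedelta step between consecutive positions
    """
    # The full index comes from the declared interval, so a missing
    # trailing position still gets a slot in the result.
    full_index = pd.date_range(start=start, end=end, freq=resolution, inclusive="left")
    sparse = pd.Series(
        list(points.values()),
        index=pd.DatetimeIndex([start + (pos - 1) * resolution for pos in points]),
    )
    # Reindexing creates NaN gaps; forward fill repeats the last valid value.
    return sparse.reindex(full_index).ffill()

example = fill_missing_positions(
    {1: 50.0, 2: 48.0, 4: 60.0},  # positions 3 and 5 are absent
    start=pd.Timestamp("2024-10-04T00:00Z"),
    end=pd.Timestamp("2024-10-04T05:00Z"),
    resolution=pd.Timedelta(hours=1),
)
print(example.tolist())  # prints: [50.0, 48.0, 48.0, 60.0, 60.0]
```

Forward filling cannot invent a value for a missing first position, so such a series would start with NaN instead of being silently shortened.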

@fboerman
Collaborator

fboerman commented Oct 4, 2024 via email

@PeteGilbert98
Author

@fboerman will do this eve. thanks

@GeneralCP

GeneralCP commented Oct 4, 2024

For anyone looking for a quick fix: https://github.com/GeneralCP/entsoe2
This also fills in missing positions in the XML output.

As far as I can see, there was a duplicate price today (the 4th) for the Netherlands: the 22nd and 23rd positions have the same price. Apparently the API then only returns position 22 and skips position 23. I'm not sure whether this 'feature' is new or whether this is just the first time we have had the exact same price two hours in a row.

@binboupan

day ahead prices are also broken; all of the prices are the same.

@JaniKallankari

Day-ahead prices seem to be broken. Does this change on the ENTSO-E platform https://transparency.entsoe.eu/news/widget?id=66f5203d792e84032cbb9b71 have something to do with this?

@borg42

borg42 commented Oct 5, 2024

> Day-ahead prices seem to be broken. Does this change on the ENTSO-E platform https://transparency.entsoe.eu/news/widget?id=66f5203d792e84032cbb9b71 have something to do with this?

Yes, this is exactly the problem. The day ahead prices use "variable sized blocks" now:
vsb

while entsoe-py expects every position to be present.
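
To check whether a response actually skips positions, the points of a period can be compared against the positions one would expect. The sketch below is illustrative only: the XML fragment is a hand-written, namespace-free sample (not a real API response), and `missing_positions` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hand-written sample period; position 3 is deliberately left out.
SAMPLE_PERIOD = """<Period>
  <resolution>PT60M</resolution>
  <Point><position>1</position><price.amount>50.0</price.amount></Point>
  <Point><position>2</position><price.amount>48.0</price.amount></Point>
  <Point><position>4</position><price.amount>60.0</price.amount></Point>
</Period>"""

def missing_positions(period_xml, expected_count):
    """Return the 1-based positions that are absent from a single Period element."""
    period = ET.fromstring(period_xml)
    present = {int(point.find("position").text) for point in period.findall("Point")}
    return sorted(set(range(1, expected_count + 1)) - present)

print(missing_positions(SAMPLE_PERIOD, expected_count=4))  # prints: [3]
```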

@Roeland54

Roeland54 commented Oct 5, 2024

There is another change I have noticed. If I request prices for today and tomorrow (Belgium), the prices for today use the 60min resolution as usual, but the prices for tomorrow use the 15min resolution. For other countries the response still uses the 60min resolution. I am confused.

image

So they make breaking changes to a public API on a Friday and only announce it 5 days before it happens...

@JaniKallankari

JaniKallankari commented Oct 6, 2024

I made a parser for the new data type. It has only been tested with day-ahead prices, so be careful. All TimeSeries in the response are combined into one pandas.Series, possibly (though not likely) with unequal time steps. Feel free to use this code if you find it useful.

import re
import xml.etree.ElementTree as ET
import pandas as pd
def parse_timeseries(self, xml_text, value_key='price.amount', to_float=True):    
	resolution_map = {
		'PT60M': pd.Timedelta(60, 'min'),
		'P1Y'  : pd.Timedelta(365,'day'),
		'PT15M': pd.Timedelta(15, 'min'),
		'PT30M': pd.Timedelta(30, 'min'),
		'P1D'  : pd.Timedelta(1,  'day'),
		'P7D'  : pd.Timedelta(7,  'day'),
		'P1M'  : pd.Timedelta(30, 'day'),
	}
	time_stamps = []
	values      = []
	xml_text = re.sub(' xmlns="[^"]+"', '', xml_text, count=1) #Remove namespace
	root        = ET.fromstring(xml_text)
	for time_serie in root.findall('TimeSeries'):
		#curve_type = time_serie.find('curveType').text
		for period in time_serie.findall('Period'):
			start_time = pd.Timestamp(period.find('timeInterval').find('start').text)
			resolution = resolution_map[period.find('resolution').text]
			for point in period.findall('Point'):
				position = float(point.find('position').text)-1
				time_stamps.append(start_time+position*resolution)
				if to_float:
					values.append(float(point.find(value_key).text))
				else:
					values.append(point.find(value_key).text)
	return pd.Series(data=values,index=time_stamps)

@miikasda

miikasda commented Oct 6, 2024

The above fix by @JaniKallankari works for day-ahead prices in Finland. Here is a small guide to apply it as a hotfix to entsoe-py, in case others need it as well:

Edit entsoe/parsers.py in Python's site-packages and add the following:

# Hotfix for dayahead prices, see GitHub issue below for more information
# https://github.com/EnergieID/entsoe-py/issues/346
import xml.etree.ElementTree as ET
import re
def parse_timeseries(xml_text, value_key='price.amount', to_float=True):    
    resolution_map = {
        'PT60M': pd.Timedelta(60, 'min'),
        'P1Y'  : pd.Timedelta(365,'day'),
        'PT15M': pd.Timedelta(15, 'min'),
        'PT30M': pd.Timedelta(30, 'min'),
        'P1D'  : pd.Timedelta(1,  'day'),
        'P7D'  : pd.Timedelta(7,  'day'),
        'P1M'  : pd.Timedelta(30, 'day'),
    }
    time_stamps = []
    values      = []
    xml_text = re.sub(' xmlns="[^"]+"', '', xml_text, count=1) #Remove namespace
    root        = ET.fromstring(xml_text)
    for time_serie in root.findall('TimeSeries'):
        #curve_type = time_serie.find('curveType').text
        for period in time_serie.findall('Period'):
            start_time = pd.Timestamp(period.find('timeInterval').find('start').text)
            resolution = resolution_map[period.find('resolution').text]
            for point in period.findall('Point'):
                position = float(point.find('position').text)-1
                time_stamps.append(start_time+position*resolution)
                if to_float:
                    values.append(float(point.find(value_key).text))
                else:
                    values.append(point.find(value_key).text)
    return pd.Series(data=values,index=time_stamps)

And then change the entsoe/entsoe.py to use this new parser. First import the new parser function in line 14:

from .parsers import parse_timeseries, parse_prices, parse_loads, parse_generation, \
    parse_installed_capacity_per_plant, parse_crossborder_flows, \
    parse_unavailabilities, parse_contracted_reserve, parse_imbalance_prices_zip, \
    parse_imbalance_volumes_zip, parse_netpositions, parse_procured_balancing_capacity, \
    parse_water_hydro,parse_aggregated_bids, parse_activated_balancing_energy_prices

And then change the query_day_ahead_prices function defined in line 1202 to actually use it:

    def query_day_ahead_prices(
            self, country_code: Union[Area, str],
            start: pd.Timestamp,
            end: pd.Timestamp,
            resolution: Literal['60min', '30min', '15min'] = '60min') -> pd.Series:
        """
        Parameters
        ----------
        resolution: either 60min for hourly values,
            30min for half-hourly values or 15min for quarterly values, throws error if type >
        country_code : Area|str
        start : pd.Timestamp
        end : pd.Timestamp

        Returns
        -------
        pd.Series
        """
        if resolution not in ['60min', '30min', '15min']:
            raise InvalidParameterError('Please choose either 60min, 30min or 15min')
        area = lookup_area(country_code)
        # we do here extra days at start and end to fix issue 187
        text = super(EntsoePandasClient, self).query_day_ahead_prices(
            country_code=area,
            #start=start-pd.Timedelta(days=1),
            start = start,
            end = end
            #end=end+pd.Timedelta(days=1)
        )
        #series = parse_prices(text)[resolution]
        series = parse_timeseries(text)
        if len(series) == 0:
            raise NoMatchingDataError
        series = series.tz_convert(area.tz)
        series = series.truncate(before=start, after=end)
        # because of the above fix we need to check again if any valid data exists after trun>
        if len(series) == 0:
            raise NoMatchingDataError
        return series

@fboerman
Collaborator

fboerman commented Oct 6, 2024

@JaniKallankari @miikasda thank you for your suggestions. As a tip, it's usually much more readable if you submit such proposals as a pull request instead of pasting a copy into an issue. I am using your code as a hint for the fix I am currently writing for this issue. Many thanks to all in this thread for their suggestions!

@borg42

borg42 commented Oct 6, 2024

Regarding the suggestion above:

resolution = resolution_map[period.find('resolution').text]
for point in period.findall('Point'):
    position = float(point.find('position').text)-1
    time_stamps.append(start_time+position*resolution)
    if to_float:
        values.append(float(point.find(value_key).text))
    else:
        values.append(point.find(value_key).text)

This only works if the XML contains a single resolution, which is generally not the case. For DE_LU, for example, it usually contains both 60min and 15min data. To work with all regions, the parser would need to return a dict keyed by resolution, with one series per resolution.
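
A rough sketch of the dict-per-resolution idea described above, building on the ElementTree approach from the earlier comments. This is not the fix that went into entsoe-py: `parse_by_resolution` and `RESOLUTIONS` are hypothetical names, and only the three sub-daily resolutions accepted by query_day_ahead_prices are mapped, so any other resolution code raises a KeyError.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

import pandas as pd

# Resolution code from the XML -> (dict key, step size). Assumed mapping.
RESOLUTIONS = {
    "PT15M": ("15min", pd.Timedelta(minutes=15)),
    "PT30M": ("30min", pd.Timedelta(minutes=30)),
    "PT60M": ("60min", pd.Timedelta(minutes=60)),
}

def parse_by_resolution(xml_text, value_key="price.amount"):
    """Return {resolution key: pd.Series}, one series per resolution in the XML."""
    xml_text = re.sub(' xmlns="[^"]+"', "", xml_text, count=1)  # drop the namespace
    root = ET.fromstring(xml_text)
    collected = defaultdict(dict)  # resolution key -> {timestamp: value}
    for period in root.iter("Period"):
        key, step = RESOLUTIONS[period.find("resolution").text]
        start = pd.Timestamp(period.find("timeInterval/start").text)
        for point in period.findall("Point"):
            position = int(point.find("position").text)
            # The timestamp follows from the position, so skipped positions leave gaps.
            collected[key][start + (position - 1) * step] = float(point.find(value_key).text)
    return {
        key: pd.Series(list(data.values()), index=pd.DatetimeIndex(list(data))).sort_index()
        for key, data in collected.items()
    }
```

A caller could then pick the series it needs with something like `parse_by_resolution(text)['60min']`, mirroring the commented-out `parse_prices(text)[resolution]` lookup in the hotfix above. Gaps left by skipped positions would still need to be filled in a separate step.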

fboerman added a commit that referenced this issue Oct 6, 2024
…appen

discussion and inspiration for fixes from #347 and #346
@fboerman
Collaborator

fboerman commented Oct 6, 2024

Many thanks to all for the discussions. I have released a new version, 0.6.9, which fixes this issue. If you still encounter issues on this version, please open a new issue!

@fboerman fboerman closed this as completed Oct 6, 2024
@fboerman
Collaborator

fboerman commented Oct 6, 2024

Oh, and @Roeland54, I believe this is probably a mistake on Elia's side. I have sent them an informal nudge; hopefully they will fix it soon. I can't fix that in the package.
