Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New Data Feed in wcds.py #88

Open
cmora1es opened this issue Feb 27, 2023 · 1 comment
Open

New Data Feed in wcds.py #88

cmora1es opened this issue Feb 27, 2023 · 1 comment
Labels
feeds support a new data feed

Comments

@cmora1es
Copy link

cmora1es commented Feb 27, 2023

Client Code: 2496.14

Data Source

Pull Daily Operation Monthly Reports from US Army Corps of Engineers Sacramento District Website - Water Control Data System:
URL to be used = https://www.spk-wc.usace.army.mil/fcgi-bin/monthly.py?month=feb&year=2023&project=frm
Where "month=", "year=" and "project=" are inputs to a function.

Sample Code:

def download_table(month, year, project):
    url = f"https://www.spk-wc.usace.army.mil/fcgi-bin/monthly.py?month={month}&year={year}&project={project}&textonly=true"
    response = requests.get(url, verify=False)
    lines = response.text.splitlines()
    rows = [line.split() for line in lines if line.strip()]
    return rows

def save_to_csv(data, file, month_num, year):
	for j, sublist in enumerate(data):
		if sublist[0] == 'Totals':
			break

	for i, sublist in enumerate(data[2:]):
		if sublist[0] == 'Date':
			name_list = sublist 
			break

	units=data[i+3]
	column_name = []
	for i, item in enumerate(name_list):
		if i > 0 and item.startswith('#'):
			column_name[-1] = ' '.join([column_name[-1], item])
		else:
			column_name.append(item)
	in_index = column_name.index('(in)')
	units.insert(0, '')
	units.insert(in_index, '(in)')

	if len(column_name)==len(units):
		column_list = [f'{column_name[i]}\n{units[i]}' for i in range(len(column_name))]
	else: 
		column_list = data[8]
    
	df = pd.DataFrame(data[9:j])
	df['datetime']= [f'{year}-{month_num}-{i}' for i in df[0]]
	df.set_index('datetime', inplace=True)
	df.columns = column_list
	
	table = df
	print(table)
	table.to_csv(file, index = True, header=column_list, mode='a')

def download_and_save(month_num, months, years, project, file_prefix):
	for year in years:
		for i in range(len(month_num)):
			data = download_table(months[i], year, project)
			file = f"{year}_{month_num[i]}_{file_prefix}.csv"
			save_to_csv(data, file, month_num[i], year)

def append_to_csv(input_file, output_file):
	with open(input_file, 'r') as in_file, open(output_file, 'a') as out_file:
		reader = csv.reader(in_file)
		writer = csv.writer(out_file)
		header = next(reader)
		if not os.path.isfile(output_file):
			writer.writerow(header)
		for row in reader:
			writer.writerow(row)

months = ['jan', 'feb', 'mar','apr','may','dec']
month_num = ['01','02','03','12']
years = range(2000,2004)
project = 'eng'
file_prefix = 'eng'
download_and_save(month_num, months, years, project, file_prefix)

output_file = 'eng_output_1990-2022.csv'

for year in years:
	for i in range(len(month_num)):
		input_file = f"{year}_{month_num[i]}_{file_prefix}.csv"
		append_to_csv(input_file, output_file)

Use Case

This URL will be an alternative source to pull data starting 1990 as opposed to starting 1995 when collecting from Corps and Section 7 Projects in California - Hourly/Daily Data and Plots

This feature will be added in ".../collect/collect/usace/wcds.py"

Breaking Changes

Unknown

Priority

Low

@cmora1es cmora1es added the feeds support a new data feed label Feb 27, 2023
@cmora1es cmora1es changed the title New Data Feed New Data Feed in wcds.py Feb 27, 2023
@narlesky
Copy link
Member

narlesky commented Mar 9, 2023

@cmora1es here's my pseudo-code notes from our meeting:

def get_monthly_report_data_for_reservoir(reservoir, year, month):
    """
    description (omit the totals)

    Arguments:
        reservoir (str):
        year (int):
        month (??): as a number or as a string?
    Returns:
    """

    # use helper function to lookup the columns as long as it's reliable for the reservoir
    # for the whole period of record

    return {'data': df, # date-indexed dataframe
            'info': {'url': ,
                    'date_published':
                    'date_retrieved': dt.datetime.now(),
                    'raw': the plain text}}


def get_por_monthly_report_data_for_reservoir(reservoir, start, end):

    frames = []
    # list of months between start and end
    for month in list:
        frames.append(get_monthly_report_data_for_reservoir(reservoir, year, month)['data'])
    df = pd.concat(frames, axis=0)

    return {'data': df, 
            'info': {
                    'urls': [],
                    'dates_published': [],
                    'date_retrieved': dt.datetime.now(),
                    }}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feeds support a new data feed
Projects
None yet
Development

No branches or pull requests

2 participants