Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New Outlook approximation #690

Merged
merged 29 commits into from
Jun 15, 2023
Merged

New Outlook approximation #690

merged 29 commits into from
Jun 15, 2023

Conversation

Diego-H-S
Copy link
Contributor

Summary

We have changed the way we retrieve data from Outlook's folders. Instead of getting a short list of folders, it has been added a loop to cover all of them and their sub-folders.

Importance

Required by Client under CEE Reporting project.

Checklist

This PR:

  • adds new tests (if appropriate)
  • updates docstrings for any new functions or function arguments (if appropriate)
  • updates CHANGELOG.md with a summary of the changes

Copy link

@eSlider eSlider left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice code and use of annotations, fixtures, mocking, stuff like that. It would be cool if you could write your thoughts in the comments sometimes. Looks promising good!

Copy link
Contributor Author

@Diego-H-S Diego-H-S left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have modified the code to be more readable and added the limit parameter to where it is needed.

@Diego-H-S Diego-H-S requested a review from eSlider May 24, 2023 07:02
Copy link
Contributor

@Rafalz13 Rafalz13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added few comments.

tests/unit/test_outlook.py Outdated Show resolved Hide resolved
Comment on lines +178 to +180
"(sub)folder": value.name,
"conversation ID": fetched.get("conversationId"),
"conversation index": conversation_index,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to use a snake case for each key to keep the same logic? sub_folder, conversation_id, conversation_index

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could change it, however, the problem is that all in-production flows depending on this table are built with this convention. Therefore, If we change the name here, all depending flows will crash.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a chance to modify production flows and introduce a proper naming convention here?

Copy link
Contributor Author

@Diego-H-S Diego-H-S Jun 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the outlook project is going to change completely, we have to re-build all tables from zero.

conversation_index = " "
if message.conversation_index is not None:
conversation_index = message.conversation_index
row = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check if it is possible to add also subject and read ('isRead') field

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, but don't exactly understand what you want to point out.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are additional (new) fields. The question is whether there is a chance to add new fields/columns to the existing Outlook extract.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for now, the new paradigm does not require any other field, and adding more, I think, could be for a future big upgrade (if needed, because I don't think so) to choose from all avialable fields.

viadot/sources/outlook.py Outdated Show resolved Hide resolved
viadot/sources/outlook.py Show resolved Hide resolved
Copy link
Contributor Author

@Diego-H-S Diego-H-S left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all feedback implemented.

@Diego-H-S Diego-H-S requested a review from Rafalz13 June 1, 2023 07:02
Copy link
Contributor Author

@Diego-H-S Diego-H-S left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

answered questions.

df_with_metadata.set_upstream(df, flow=self)
df_to_file.set_upstream(df_with_metadata, flow=self)
file_to_adls_task.set_upstream(df_to_file, flow=self)
# df = union_dfs_task.bind(dfs, flow=self)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion:
I think that you can remove all of those commented lines which were used as a code in the past.
We will have a cleaner repo :)
If someone would need to look back one can open older revision of this file.
Could you check all of your changed code and simply remove commented lines ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry Marcin, I don't know in which version are you, but I don't have that line in the PR...

self.mailbox_folders = mailbox_folders
else:
raise Exception("Provided mailbox folders are not correct.")
# print(type(self.mailbox_obj))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That commented part of code looks like something which was used for development but you've brake down this code in to methods right ? I would remove it from here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry again, but I think you have a beta version of the code. That line is not there anymore!

final_dict_folders = dict_folders.copy()

# loop to get all subfolders
while True:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general using while loop with True is dangerous.
Maybe you will have some time to make it more robust?
Maybe set up a flag and then change it status ?
Here if someone will change or remove the line 172-173 your loop can get in to infinite loop mode ;)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the logic.

Copy link
Collaborator

@marcinpurtak marcinpurtak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code is ok, but please ping me before merging it, because we are still clarifying folder list.

Copy link
Contributor

@Rafalz13 Rafalz13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Added comments to docstrings.

viadot/sources/outlook.py Outdated Show resolved Hide resolved
viadot/sources/outlook.py Outdated Show resolved Hide resolved
viadot/tasks/outlook.py Outdated Show resolved Hide resolved
viadot/flows/outlook_to_adls.py Outdated Show resolved Hide resolved
@Diego-H-S Diego-H-S requested a review from Rafalz13 June 13, 2023 11:02
@Rafalz13 Rafalz13 merged commit 05cc993 into dyvenia:dev Jun 15, 2023
3 checks passed
@Diego-H-S Diego-H-S deleted the Outlook branch June 7, 2024 08:03
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants