-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New Outlook approximation #690
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice code and use of annotations, fixtures, mocking, stuff like that. It would be cool if you could write your thoughts in the comments sometimes. Looks promising good!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have modified the code to be more readable and added the limit
parameter to where it is needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added few comments.
"(sub)folder": value.name, | ||
"conversation ID": fetched.get("conversationId"), | ||
"conversation index": conversation_index, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to use a snake case for each key to keep the same logic? sub_folder
, conversation_id
, conversation_index
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could change it, however, the problem is that all in-production flows depending on this table are built with this convention. Therefore, If we change the name here, all depending flows will crash.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a chance to modify production flows and introduce a proper naming convention here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the outlook project is going to change completely, we have to re-build all tables from zero.
conversation_index = " " | ||
if message.conversation_index is not None: | ||
conversation_index = message.conversation_index | ||
row = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please check if it is possible to add also subject
and read
('isRead') field
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, but don't exactly understand what you want to point out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are additional (new) fields. The question is whether there is a chance to add new fields/columns to the existing Outlook extract.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for now, the new paradigm does not require any other field, and adding more, I think, could be for a future big upgrade (if needed, because I don't think so) to choose from all avialable fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all feedback implemented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
answered questions.
viadot/flows/outlook_to_adls.py
Outdated
df_with_metadata.set_upstream(df, flow=self) | ||
df_to_file.set_upstream(df_with_metadata, flow=self) | ||
file_to_adls_task.set_upstream(df_to_file, flow=self) | ||
# df = union_dfs_task.bind(dfs, flow=self) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion:
I think that you can remove all of those commented lines which were used as a code in the past.
We will have a cleaner repo :)
If someone would need to look back one can open older revision of this file.
Could you check all of your changed code and simply remove commented lines ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry Marcin, I don't know in which version are you, but I don't have that line in the PR...
viadot/sources/outlook.py
Outdated
self.mailbox_folders = mailbox_folders | ||
else: | ||
raise Exception("Provided mailbox folders are not correct.") | ||
# print(type(self.mailbox_obj)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That commented part of code looks like something which was used for development but you've brake down this code in to methods right ? I would remove it from here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry again, but I think you have a beta version of the code. That line is not there anymore!
viadot/sources/outlook.py
Outdated
final_dict_folders = dict_folders.copy() | ||
|
||
# loop to get all subfolders | ||
while True: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general using while loop with True is dangerous.
Maybe you will have some time to make it more robust?
Maybe set up a flag and then change it status ?
Here if someone will change or remove the line 172-173 your loop can get in to infinite loop mode ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed the logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code is ok, but please ping me before merging it, because we are still clarifying folder list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Added comments to docstrings.
Summary
We have changed the way we retrieve data from Outlook's folders. Instead of getting a short list of folders, it has been added a loop to cover all of them and their sub-folders.
Importance
Required by Client under CEE Reporting project.
Checklist
This PR:
CHANGELOG.md
with a summary of the changes