Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Organize #229

Merged
merged 43 commits into from
Jun 26, 2024
Merged

Organize #229

merged 43 commits into from
Jun 26, 2024

Conversation

shrit
Copy link
Member

@shrit shrit commented Jun 12, 2024

The aim of this PR is to make this repo usable.

The current state of this repo is a bit of a mess, this is the first attempt to organize examples based on the language and based on the ml method that is used.

shrit added 13 commits June 12, 2024 14:38
Signed-off-by: Omar Shrit <omar@avontech.fr>
Signed-off-by: Omar Shrit <omar@avontech.fr>
Signed-off-by: Omar Shrit <omar@avontech.fr>
Signed-off-by: Omar Shrit <omar@avontech.fr>
Signed-off-by: Omar Shrit <omar@avontech.fr>
Signed-off-by: Omar Shrit <omar@avontech.fr>
Signed-off-by: Omar Shrit <omar@avontech.fr>
Signed-off-by: Omar Shrit <omar@avontech.fr>
Signed-off-by: Omar Shrit <omar@avontech.fr>
Signed-off-by: Omar Shrit <omar@avontech.fr>
Signed-off-by: Omar Shrit <omar@avontech.fr>
Signed-off-by: Omar Shrit <omar@avontech.fr>
Signed-off-by: Omar Shrit <omar@avontech.fr>
Copy link

Binder 👈 Launch a binder notebook on branch shrit/examples/organize

Signed-off-by: Omar Shrit <omar@avontech.fr>
Signed-off-by: Omar Shrit <omar@avontech.fr>
Signed-off-by: Omar Shrit <omar@avontech.fr>
Signed-off-by: Omar Shrit <omar@avontech.fr>
Signed-off-by: Omar Shrit <omar@avontech.fr>
Signed-off-by: Omar Shrit <omar@avontech.fr>
Signed-off-by: Omar Shrit <omar@avontech.fr>
Signed-off-by: Omar Shrit <omar@avontech.fr>
Signed-off-by: Omar Shrit <omar@avontech.fr>
Signed-off-by: Omar Shrit <omar@avontech.fr>
Signed-off-by: Omar Shrit <omar@avontech.fr>
shrit added 10 commits June 13, 2024 12:18
Signed-off-by: Omar Shrit <omar@avontech.fr>
Signed-off-by: Omar Shrit <omar@avontech.fr>
Signed-off-by: Omar Shrit <omar@avontech.fr>
Signed-off-by: Omar Shrit <omar@avontech.fr>
Signed-off-by: Omar Shrit <omar@avontech.fr>
Signed-off-by: Omar Shrit <omar@avontech.fr>
Signed-off-by: Omar Shrit <omar@avontech.fr>
Signed-off-by: Omar Shrit <omar@avontech.fr>
Signed-off-by: Omar Shrit <omar@avontech.fr>
Signed-off-by: Omar Shrit <omar@avontech.fr>
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

Signed-off-by: Omar Shrit <omar@avontech.fr>
Signed-off-by: Omar Shrit <omar@avontech.fr>
Signed-off-by: Omar Shrit <omar@avontech.fr>
Signed-off-by: Omar Shrit <omar@avontech.fr>
Signed-off-by: Omar Shrit <omar@avontech.fr>
@rcurtin
Copy link
Member

rcurtin commented Jun 23, 2024

This is really nice! Thank you for doing this. It was definitely easier to just look at the new branch to see the organization. I actually haven't looked at the diff yet, I think there are some code changes here and there that I'll check, but first, a handful of suggestions:

  • c++/ -> cpp/ (I see that naming convention used more often)
  • cli_bindings/ -> cli/ (just to match the rest)
  • Reinforcement learning examples into jupyter/?
  • Another name for jupyter/ might be notebooks/, but that's just a suggestion, I don't think I have an opinion either way.
  • In the README, it would be great to change the description of what the directories are from a paragraph to bullet points (people's eyes will probably be drawn to them more).
  • A couple of the examples, like rain-classification, use multiple techniques... to match the other examples which are organized by technique, does it make sense to make something like a multi-technique/ directory? In fact, does using the directory structure to name the technique make the most sense anyway? I wonder if it might be better to use a README.md in each top-level directory (cpp/, cli/, etc.) that lists the examples and the techniques they use. Then you could list each of the techniques used in the rain-classification notebook or similar.

I think the Linux shell script is wrong here: https://github.com/shrit/examples/blob/organize/.ci/linux-steps.yaml#L99 --- it only changes up one directory (which used to be correct) but now it could be more than one, and has to go back to the root directory of the repository.

Let me know what you think of the suggestions; don't feel obligated to take them all 😄

@shrit
Copy link
Member Author

shrit commented Jun 24, 2024

I agree with these suggestions, initially, I had it as jupyter_notebook but I can bring this back with no problem.
I can put the reinforcement learning in the notebook folder. However. there were some ensmallen examples that I need to figure out a place for them.

* `c++/` -> `cpp/` (I see that naming convention used more often)

* `cli_bindings/` -> `cli/` (just to match the rest)

* Reinforcement learning examples into `jupyter/`?

* Another name for `jupyter/` might be `notebooks/`, but that's just a suggestion, I don't think I have an opinion either way.

Will do this for sure.

* In the README, it would be great to change the description of what the directories are from a paragraph to bullet points (people's eyes will probably be drawn to them more).

I thought about this a lot, for the rain classification and examples that contain multiple techniques I did not find the best solution yet. However, I am 100% convinced that we need to separate them per method since most of our users will be looking for an example regarding a specific method. Also, most of our examples are named after datasets which itself does not make sense. Therefore, I really think that we need to class them per method.
This is going to be more interesting when it comes to embedded because we can show that some methods use fewer resources than others, or show examples of specific methods that can perform faster on low-resource devices.

* A couple of the examples, like `rain-classification`, use multiple techniques... to match the other examples which are organized by technique, does it make sense to make something like a `multi-technique/` directory?  In fact, does using the directory structure to name the technique make the most sense anyway?  I wonder if it might be better to use a `README.md` in each top-level directory (`cpp/`, `cli/`, etc.) that lists the examples and the techniques they use.  Then you could list each of the techniques used in the `rain-classification` notebook or similar.

Thanks for pointing this out, I started fixing it and I need to continue

I think the Linux shell script is wrong here: https://github.com/shrit/examples/blob/organize/.ci/linux-steps.yaml#L99 --- it only changes up one directory (which used to be correct) but now it could be more than one, and has to go back to the root directory of the repository.

Signed-off-by: Omar Shrit <omar@avontech.fr>
Signed-off-by: Omar Shrit <omar@avontech.fr>
Signed-off-by: Omar Shrit <omar@avontech.fr>
Signed-off-by: Omar Shrit <omar@avontech.fr>
Copy link
Member

@rcurtin rcurtin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about this a lot, for the rain classification and examples that contain multiple techniques I did not find the best solution yet. However, I am 100% convinced that we need to separate them per method since most of our users will be looking for an example regarding a specific method. Also, most of our examples are named after datasets which itself does not make sense. Therefore, I really think that we need to class them per method.
This is going to be more interesting when it comes to embedded because we can show that some methods use fewer resources than others, or show examples of specific methods that can perform faster on low-resource devices.

Yeah, I agree with how the users will approach the examples. And it is probably true that many users will just poke into the directory structure to see what they are interested in. The question is just what to do with the multi-method examples. The best I can think of---and maybe you can think of something better---is just a multi-technique/ directory or something like this, and then an associated README that has quick descriptions of the examples and links between them. So e.g. you can put the "rain classification" example in the 'random forest' and 'logistic regression' sections in the README.

The perfect is the enemy of the good, so I agree it would be good to merge this even if we don't work out all the CI issues right now, and then come back for incremental improvements later.

Copy link

@mlpack-bot mlpack-bot bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Second approval provided automatically after 24 hours. 👍

@shrit shrit merged commit 78ddf19 into mlpack:master Jun 26, 2024
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants