Organize #229

shrit · 2024-06-12T14:25:13Z

The aim of this PR is to make this repo usable.

The current state of this repo is a bit of a mess. This is a first attempt to organize the examples by language and by the ML method used.

Signed-off-by: Omar Shrit <omar@avontech.fr>

github-actions · 2024-06-12T14:25:24Z

👈 Launch a binder notebook on branch shrit/examples/organize

review-notebook-app · 2024-06-13T16:42:05Z

Check out this pull request on ReviewNB.

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

rcurtin · 2024-06-23T12:06:20Z

This is really nice! Thank you for doing this. It was definitely easier to just look at the new branch to see the organization. I actually haven't looked at the diff yet, I think there are some code changes here and there that I'll check, but first, a handful of suggestions:

* `c++/` -> `cpp/` (I see that naming convention used more often)
* `cli_bindings/` -> `cli/` (just to match the rest)
* Reinforcement learning examples into `jupyter/`?
* Another name for `jupyter/` might be `notebooks/`, but that's just a suggestion, I don't think I have an opinion either way.
* In the README, it would be great to change the description of what the directories are from a paragraph to bullet points (people's eyes will probably be drawn to them more).
* A couple of the examples, like `rain-classification`, use multiple techniques... to match the other examples which are organized by technique, does it make sense to make something like a `multi-technique/` directory? In fact, does using the directory structure to name the technique make the most sense anyway? I wonder if it might be better to use a `README.md` in each top-level directory (`cpp/`, `cli/`, etc.) that lists the examples and the techniques they use. Then you could list each of the techniques used in the `rain-classification` notebook or similar.

I think the Linux shell script is wrong here: https://github.com/shrit/examples/blob/organize/.ci/linux-steps.yaml#L99 --- it only changes up one directory (which used to be correct) but now it could be more than one, and has to go back to the root directory of the repository.
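One way to avoid depending on nesting depth at all is to never `cd` in the first place. The following is a minimal Python sketch of that idea, not the actual CI script: it assumes each buildable example is marked by a `Makefile`, and the helper names `find_example_dirs` and `build_all` are hypothetical. Each build runs with its own working directory, so the caller stays at the repository root no matter how deep an example sits.

```python
import subprocess
from pathlib import Path


def find_example_dirs(root, marker="Makefile"):
    """Return every directory under root that contains the marker file, at any depth.

    The "Makefile" marker is an assumption made for this sketch.
    """
    root = Path(root).resolve()
    return sorted(p.parent for p in root.rglob(marker) if p.is_file())


def build_all(root, command):
    """Run command inside each example directory without changing our own cwd.

    Passing cwd= to subprocess.run replaces the `cd example; build; cd ..`
    pattern, so nothing relies on how many levels must be climbed afterwards.
    Returns a dict mapping each example path (relative to root) to its exit code.
    """
    root = Path(root).resolve()
    results = {}
    for example_dir in find_example_dirs(root):
        completed = subprocess.run(command, cwd=example_dir)
        results[example_dir.relative_to(root).as_posix()] = completed.returncode
    return results
```

A hypothetical call would be `build_all("cpp", ["make"])`; any non-zero value in the returned dict marks an example that failed to build.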

Let me know what you think of the suggestions; don't feel obligated to take them all 😄

shrit · 2024-06-24T20:11:18Z

> * `c++/` -> `cpp/` (I see that naming convention used more often)
> * `cli_bindings/` -> `cli/` (just to match the rest)
> * Reinforcement learning examples into `jupyter/`?
> * Another name for `jupyter/` might be `notebooks/`, but that's just a suggestion, I don't think I have an opinion either way.

I agree with these suggestions. Initially I had it as `jupyter_notebook`, but I can bring this back with no problem.
I can put the reinforcement learning examples in the notebook folder. However, there are some ensmallen examples that I still need to find a place for.

> * In the README, it would be great to change the description of what the directories are from a paragraph to bullet points (people's eyes will probably be drawn to them more).

Will do this for sure.

> * A couple of the examples, like `rain-classification`, use multiple techniques... to match the other examples which are organized by technique, does it make sense to make something like a `multi-technique/` directory? In fact, does using the directory structure to name the technique make the most sense anyway? I wonder if it might be better to use a `README.md` in each top-level directory (`cpp/`, `cli/`, etc.) that lists the examples and the techniques they use. Then you could list each of the techniques used in the `rain-classification` notebook or similar.

I thought about this a lot. For the rain classification and other examples that contain multiple techniques, I have not found the best solution yet. However, I am 100% convinced that we need to separate them per method, since most of our users will be looking for an example of a specific method. Also, most of our examples are named after datasets, which in itself does not make sense. Therefore, I really think that we need to classify them by method.
This is going to be more interesting when it comes to embedded, because we can show that some methods use fewer resources than others, or show examples of specific methods that perform faster on low-resource devices.

> I think the Linux shell script is wrong here: https://github.com/shrit/examples/blob/organize/.ci/linux-steps.yaml#L99 --- it only changes up one directory (which used to be correct) but now it could be more than one, and has to go back to the root directory of the repository.

Thanks for pointing this out. I started fixing it and I need to continue.

rcurtin

> I thought about this a lot. For the rain classification and other examples that contain multiple techniques, I have not found the best solution yet. However, I am 100% convinced that we need to separate them per method, since most of our users will be looking for an example of a specific method. Also, most of our examples are named after datasets, which in itself does not make sense. Therefore, I really think that we need to classify them by method.
> This is going to be more interesting when it comes to embedded, because we can show that some methods use fewer resources than others, or show examples of specific methods that perform faster on low-resource devices.

Yeah, I agree with how the users will approach the examples. And it is probably true that many users will just poke into the directory structure to see what they are interested in. The question is just what to do with the multi-method examples. The best I can think of---and maybe you can think of something better---is just a multi-technique/ directory or something like this, and then an associated README that has quick descriptions of the examples and links between them. So e.g. you can put the "rain classification" example in the 'random forest' and 'logistic regression' sections in the README.
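The README idea above can be sketched as a small generator. This is only an illustration under stated assumptions: the `EXAMPLES` table, its paths, and the `render_index` helper are hypothetical, and the only detail taken from this thread is that the rain classification example belongs under both "random forest" and "logistic regression". The point is that an example lives in one directory but is listed once per technique it uses.

```python
from collections import defaultdict

# Hypothetical metadata: example name -> (path, techniques it uses).
# Paths and the second entry are placeholders invented for this sketch.
EXAMPLES = {
    "rain-classification": (
        "multi-technique/rain-classification",
        ["random forest", "logistic regression"],
    ),
    "some-single-technique-example": (
        "random_forest/some-single-technique-example",
        ["random forest"],
    ),
}


def render_index(examples):
    """Render a markdown index with one section per technique.

    An example that uses several techniques is linked from every
    matching section, while its files stay in a single directory.
    """
    by_technique = defaultdict(list)
    for name, (path, techniques) in examples.items():
        for technique in techniques:
            by_technique[technique].append((name, path))

    lines = []
    for technique in sorted(by_technique):
        lines.append(f"## {technique}")
        lines.append("")
        for name, path in sorted(by_technique[technique]):
            lines.append(f"* [{name}]({path})")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"
```

Calling `print(render_index(EXAMPLES))` emits a "logistic regression" section followed by a "random forest" section, with the rain classification link appearing in both.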

The perfect is the enemy of the good, so I agree it would be good to merge this even if we don't work out all the CI issues right now, and then come back for incremental improvements later.

mlpack-bot

Second approval provided automatically after 24 hours. 👍

shrit added 13 commits June 12, 2024 14:38

Move the examples according to their nature

af99fbc

Signed-off-by: Omar Shrit <omar@avontech.fr>

Change the neural network regression

0cf213f

Signed-off-by: Omar Shrit <omar@avontech.fr>

Re-organize it inside neural network

1acafda

Signed-off-by: Omar Shrit <omar@avontech.fr>

Move the cifar10 dataset to neural network

a68f4b3

Signed-off-by: Omar Shrit <omar@avontech.fr>

Move the example to its new location

20ec4b6

Signed-off-by: Omar Shrit <omar@avontech.fr>

Remove boost from appveyor

edb4869

Signed-off-by: Omar Shrit <omar@avontech.fr>

Move the kmeans example to its new location

adfd0d0

Signed-off-by: Omar Shrit <omar@avontech.fr>

Change the location of another example

debdd1c

Signed-off-by: Omar Shrit <omar@avontech.fr>

Move two new datasets

a672c9d

Signed-off-by: Omar Shrit <omar@avontech.fr>

Move another part of the examples

4f1d39e

Signed-off-by: Omar Shrit <omar@avontech.fr>

Move another part of the examples

df7d8d3

Signed-off-by: Omar Shrit <omar@avontech.fr>

Move another batch of examples

a6c9202

Signed-off-by: Omar Shrit <omar@avontech.fr>

Move the last part of the ML

2ec70c7

Signed-off-by: Omar Shrit <omar@avontech.fr>

mlpack-bot bot added s: needs review s: unanswered s: unlabeled labels Jun 12, 2024

shrit added 5 commits June 12, 2024 17:35

Adding covertype example

3c79233

Signed-off-by: Omar Shrit <omar@avontech.fr>

Fix the pima indians files name

8a91f76

Signed-off-by: Omar Shrit <omar@avontech.fr>

Adding the cpp, done manually, not tested yet

6264743

Signed-off-by: Omar Shrit <omar@avontech.fr>

Adding another example in C++

c8cf8a5

Signed-off-by: Omar Shrit <omar@avontech.fr>

Fix the name of the files

9861d9f

Signed-off-by: Omar Shrit <omar@avontech.fr>

shrit removed s: unlabeled s: unanswered labels Jun 12, 2024

shrit added 6 commits June 12, 2024 18:17

Fix things inside the linear regression repo

1c2c39b

Signed-off-by: Omar Shrit <omar@avontech.fr>

Fix path in dbscan and decision tree

1af8313

Signed-off-by: Omar Shrit <omar@avontech.fr>

Not finished yet, so many things to do

73ef3cd

Signed-off-by: Omar Shrit <omar@avontech.fr>

not finished yet, but almost

7887517

Signed-off-by: Omar Shrit <omar@avontech.fr>

Fix indentation and add pima dataset

d6ae1a6

Signed-off-by: Omar Shrit <omar@avontech.fr>

Remove tools directory and keep scripts

9b30714

Signed-off-by: Omar Shrit <omar@avontech.fr>

shrit added 10 commits June 13, 2024 12:18

Fix a couple of commands, add dominant color dataset

f2c861b

Signed-off-by: Omar Shrit <omar@avontech.fr>

Fix the files name after they have been put into directories

43db8f4

Signed-off-by: Omar Shrit <omar@avontech.fr>

Adapt README for now

da5b8b0

Signed-off-by: Omar Shrit <omar@avontech.fr>

Change the root capital letter

549a76e

Signed-off-by: Omar Shrit <omar@avontech.fr>

Fix a couple of C++ examples

10c5d74

Signed-off-by: Omar Shrit <omar@avontech.fr>

Refactor and add makefile for this example

7988f6d

Signed-off-by: Omar Shrit <omar@avontech.fr>

Remove useless files from the C++ directory

4502266

Signed-off-by: Omar Shrit <omar@avontech.fr>

Fix path for the cifar_eval

cb8361a

Signed-off-by: Omar Shrit <omar@avontech.fr>

Fix path for the data

c7dc7af

Signed-off-by: Omar Shrit <omar@avontech.fr>

Fix the path for the data and ignore binary

db847f6

Signed-off-by: Omar Shrit <omar@avontech.fr>

shrit added 5 commits June 13, 2024 19:15

Add covertype and fix indentation

e4db3ba

Signed-off-by: Omar Shrit <omar@avontech.fr>

Fix the indentation in this website

c3e4748

Signed-off-by: Omar Shrit <omar@avontech.fr>

The script is fully working, all of the datasets are being downloaded

5d0b83a

Signed-off-by: Omar Shrit <omar@avontech.fr>

Fix the path on the ci for the dataset

f7ef551

Signed-off-by: Omar Shrit <omar@avontech.fr>

Fix the CI

aff0acb

Signed-off-by: Omar Shrit <omar@avontech.fr>

shrit added 4 commits June 25, 2024 10:25

Fix all the paths for these directories

295ac44

Signed-off-by: Omar Shrit <omar@avontech.fr>

Update the README with the recent modifications

7b2e5f4

Signed-off-by: Omar Shrit <omar@avontech.fr>

Fix the return path for the CI

b5a7063

Signed-off-by: Omar Shrit <omar@avontech.fr>

Fix the directory name

59ae675

Signed-off-by: Omar Shrit <omar@avontech.fr>

rcurtin approved these changes Jun 25, 2024

mlpack-bot bot approved these changes Jun 26, 2024

mlpack-bot bot removed the s: needs review label Jun 26, 2024

shrit merged commit 78ddf19 into mlpack:master Jun 26, 2024
4 checks passed

rcurtin mentioned this pull request Jul 1, 2024

Update Jupyter notebook example links in documentation mlpack/mlpack#3751

Merged
