2nd AtmoRep core developers Meeting

ankitpatnala edited this page Aug 19, 2024 · 4 revisions

#HClimRep core developer meeting, 2024-08-12

Participants: Christian, Ilaria, Asma, Nishant, Ankit, Simon, Julius, Sindhu, Martin

Meeting notes

Announcement

There is another project named atmorep analysis, which contains some code that re-interfaces AtmoRep. It enables analysis by converting the zarr output to grib and producing results and plots. Ilaria owns it; anyone who wants to become a collaborator can ask her.

Ilaria and Christian cleaned up the code and pushed it to the develop branch. It will be merged into main today. It contains small fixes that improve performance.

The H100s in Barcelona perform better, possibly due to their filesystem and hardware. AtmoRep is quite complex.

There was a bottleneck in the data loading code, and the team at BSC helped us with it. We are no longer bound by data loading.

Discussions

Is there any documentation explaining the current to-do list? The issues are in the repo. Currently, the list is in the build-up phase. In principle, the to-do list will live in the issues, which are centrally located and easy to track and discuss.

What if we find new bugs? Just add a new issue with the bug label, or use the feature request label to propose new features.

If you want to discuss or track developments, open an issue and use the label scientific.

Is there any time plan for a newer version, or is it simply an open issue? It will be part of the roadmap meeting. Ilaria and Christian are working on finding a smaller model. Once the model is trained, it can be used as a pre-trained model for our work. There has been discussion about the rollout features. It may be nice to have a roadmap or a discussion.

The first roadmap meeting will be sometime in September. Until then, it would be nice to get familiar with the code. We can use the workshop time to discuss it.

Is there any code snippet to measure performance? Ilaria measured single-field training performance by running train.py and recording the run time for one epoch. She shared a wiki page (wiki/performance). The measurement uses standard Python timers integrated into the training code.
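As an illustration of timing an epoch with standard Python timers, here is a minimal sketch. It is not the code from train.py or the wiki page: the `timed` helper and the `train_one_epoch` stand-in are hypothetical names introduced only for this example.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Store the wall-clock duration of the enclosed block in results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


def train_one_epoch(num_batches):
    """Hypothetical stand-in for one training epoch (not AtmoRep code)."""
    total = 0
    for _ in range(num_batches):
        # Dummy workload in place of a forward/backward pass.
        total += sum(i * i for i in range(1000))
    return total


timings = {}
with timed("epoch_0", timings):
    train_one_epoch(num_batches=10)

print(f"epoch_0 took {timings['epoch_0']:.4f} s")
```

`time.perf_counter()` is a monotonic, high-resolution clock, so it is a reasonable default for run-time measurements of this kind. When timing GPU work, the device would need to be synchronized before reading the clock; that step is omitted here.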

Asma had a question about the training mode for temporal interpolation: the training mode is BERT and the evaluation mode is temporal interpolation. At some point, we should check that the modes do exactly what they are meant to do. Asma will carry on, as the code seems to be doing what it is intended to do.

There may be a need to refactor the code for longer time steps. @Asma, if you think you can do it, make a PR.

Have you also done hyperparameter tuning in the code to find optimal hyperparameters? Ilaria and Christian are running some experiments to find optimal configurations and can discuss them in the next developer meeting. The configuration in the develop branch already includes them. The new config for velocity_u uses large tokens, which makes training faster. The tuning was done manually, without automated tools.

If you want to use automated tools, note that automatic tuning optimizes for loss only; it is better to find a sweet spot between performance and loss. This could be done with a unit test. While training a new big model, it will be easy to monitor. It is worth trying if the accuracy improves. Julius recommended W&B sweeps, which use a Bayesian optimizer, although the results are sometimes misleading. Some tools give easy solutions but are not the best option; there are modern ones that are fast and give a better solution. Tuning is challenging for AtmoRep because there are different models for different variables, and their smoothness and other properties all play a role. Results can be misleading if we only look at a few epochs, since a run may converge later. It would be good to keep that in mind when determining the hyperparameters.
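For reference, a W&B sweep with Bayesian optimization is described by a configuration dictionary along the following lines. This is only a sketch: the metric name and the hyperparameter names (`val_loss`, `learning_rate`, `batch_size`) are placeholders, not AtmoRep config keys, and the ranges are arbitrary.

```python
# Sketch of a W&B sweep configuration using the Bayesian search method.
# All parameter names and ranges are illustrative assumptions.
sweep_config = {
    "method": "bayes",  # other supported methods are "grid" and "random"
    "metric": {"name": "val_loss", "goal": "minimize"},
    "parameters": {
        "learning_rate": {
            "distribution": "log_uniform_values",
            "min": 1e-5,
            "max": 1e-3,
        },
        "batch_size": {"values": [32, 64, 128]},
    },
    # Early termination can discard runs that would have converged later,
    # so min_iter should not be set too low.
    "early_terminate": {"type": "hyperband", "min_iter": 10},
}


def check_sweep_config(config):
    """Basic sanity checks before handing the config to W&B."""
    if config["method"] == "bayes" and "metric" not in config:
        raise ValueError("Bayesian search needs a metric to optimize")
    if not config.get("parameters"):
        raise ValueError("at least one parameter must be swept")
    return True


check_sweep_config(sweep_config)

# With the wandb package installed, the sweep would then be registered and run
# roughly as follows (train_fn is a hypothetical training entry point):
#   sweep_id = wandb.sweep(sweep_config, project="my-project")
#   wandb.agent(sweep_id, function=train_fn, count=20)
```

Because the sweep optimizes a single logged metric, balancing loss against run time, as discussed above, would require either logging a combined metric or comparing the runs manually afterwards.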

Are there stale issues? There are some issues involving Kacper. Indirectly yes; they will be fixed automatically once Christian rebases. Simon can ask Kacper separately. There is also a merge request from Kacper. Feel free to ask people and mark issues as stale. There are some longer-standing issues marked as bugs; for some of these issue numbers, it would be nice to investigate whether they are still relevant. One issue can be transformed into a task to look at the attention maps. We can mark, comment on, or archive an issue, or rename it. Christian just did that.