normalize loss, reparametrize network #297

Merged
merged 37 commits into main from int8_quant on Mar 22, 2024
Conversation

@jpata (Owner) commented on Feb 14, 2024

  • normalize the classification loss by the number of non-padded elements (see the loss sketch after this list)
  • normalize the regression loss by the number of true target particles and by the stddev of the regression targets, for more stable loss values
  • reparametrize the attention network in terms of num_heads and head_dim (sketched after the list)
  • produce a new version 1.7.1 of the cms_pf_multi_particle_gun dataset with more statistics
  • remove the unneeded pad_power_of_two option (FlashAttention seems to do this padding internally); not sure why I originally thought it was needed
  • make the regression output type configurable
  • disable charge prediction for now (we haven't really studied its performance so far)
  • enable setting only certain layers as trainable (sketched after the list)
  • set the minimum learning rate to 1e-5 for cosine decay (sketched after the list)
  • add a new standalone notebook for quick studies
  • remove the TF workflows from the pipeline
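A minimal PyTorch sketch of the two loss normalizations described above; the function name, tensor names, and shapes are assumptions for illustration, not the PR's exact code:

```python
import torch
import torch.nn.functional as F

def normalized_losses(cls_logits, cls_target, reg_pred, reg_target, mask, reg_std):
    # cls_logits: (batch, elems, num_classes); reg_pred/reg_target: (batch, elems, num_reg)
    # mask: (batch, elems) bool, True for real (non-padded) input elements
    # reg_std: (num_reg,) stddev of each regression target

    # classification: sum over real elements, divide by how many there are
    ce = F.cross_entropy(cls_logits[mask], cls_target[mask], reduction="sum")
    loss_cls = ce / mask.sum().clamp(min=1)

    # regression: only elements matched to a true target particle contribute;
    # residuals are scaled by the target stddev so the loss magnitude stays stable
    has_particle = (cls_target > 0) & mask  # assumes class 0 means "no particle"
    n_true = has_particle.sum().clamp(min=1)
    sq = ((reg_pred - reg_target) / reg_std) ** 2
    loss_reg = (sq.sum(dim=-1) * has_particle).sum() / n_true
    return loss_cls, loss_reg
```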
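A sketch of the num_heads/head_dim parametrization, using a standard PyTorch multi-head attention layer; the class name is hypothetical and the PR's actual attention module may be structured differently:

```python
import torch.nn as nn

class SelfAttentionBlock(nn.Module):
    # the layer is sized by num_heads and head_dim instead of a single
    # embedding dimension, so the two can be tuned independently
    def __init__(self, num_heads: int = 16, head_dim: int = 32):
        super().__init__()
        embed_dim = num_heads * head_dim  # total width is derived, not specified
        self.mha = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

    def forward(self, x, key_padding_mask=None):
        out, _ = self.mha(x, x, x, key_padding_mask=key_padding_mask)
        return out
```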
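For the trainable-layers change, one plausible implementation is to freeze all parameters and re-enable only those whose names match a configured list of substrings; this helper is an assumption, not necessarily the PR's code:

```python
def set_trainable_layers(model, trainable_substrings):
    # freeze all parameters, then unfreeze those whose name contains
    # any of the requested substrings (e.g. a hypothetical ["output_head"])
    for name, param in model.named_parameters():
        param.requires_grad = any(s in name for s in trainable_substrings)
```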
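The minimum learning rate for cosine decay can be expressed with PyTorch's built-in scheduler via eta_min; the model and optimizer below are placeholders, and the PR's actual schedule code may differ:

```python
import torch
from torch.optim.lr_scheduler import CosineAnnealingLR

model = torch.nn.Linear(8, 8)  # placeholder model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
# eta_min is the floor the cosine schedule decays to, here 1e-5 as in the PR
scheduler = CosineAnnealingLR(optimizer, T_max=1000, eta_min=1e-5)
```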

@jpata changed the title from "WIP for int8 quantization" to "stubs for int8 quantization" on Feb 18, 2024
@jpata changed the title from "stubs for int8 quantization" to "normalize loss" on Mar 4, 2024
@jpata changed the title from "normalize loss" to "normalize loss, reparametrize network" on Mar 22, 2024
@jpata merged commit e1b439a into main on Mar 22, 2024
5 checks passed
@jpata deleted the int8_quant branch on March 22, 2024 at 19:15