Releases: tyronechen/genomenlp

v2.8.5

04 Oct 06:38
  • Fix a bug where passing a config to train did not override the output URL
  • Raise a warning when local and remote sources clash
  • Fix a bug where automated download did not work for cross-validation

v2.8.2

07 Sep 08:30
  • Enable fit_powerlaw to pull models from wandb directly

v2.8.1

30 Aug 23:49
  • Enable pooling of different token files
  • Update documentation for associated scripts

v2.8.0

27 Aug 02:53
  • Can now compare empirical token distributions across runs

v2.7.2

24 Aug 04:41
  • Fix a bug where casing and sequence splitting did not occur correctly during tokenisation

v2.7.1

12 Aug 14:41
  • Fix a bug where tokenise_bio did not load dependencies correctly
  • Add sequence breaking functionality to tokenise_bio -b for splitting long sequences that may cause memory issues (e.g. chr1)
  • Add sequence casing functionality to tokenise_bio -c for converting input data to upper or lower case during tokeniser training
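
The sequence breaking and casing described above can be illustrated with a minimal sketch. This is not the tokenise_bio implementation: the function names `split_sequence` and `apply_case` and their parameters are hypothetical, and the fixed-size chunking shown here is only one plausible way to break a long sequence.

```python
from typing import List, Optional


def split_sequence(seq: str, chunk_size: int) -> List[str]:
    """Break a long sequence into consecutive chunks of at most chunk_size.

    Hypothetical helper: illustrates the idea of splitting long inputs
    (such as a whole chromosome) before tokeniser training.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be a positive integer")
    return [seq[i:i + chunk_size] for i in range(0, len(seq), chunk_size)]


def apply_case(seq: str, case: Optional[str] = None) -> str:
    """Convert a sequence to upper or lower case, or leave it unchanged.

    Hypothetical helper: the accepted values "upper" and "lower" are
    assumptions made for this example.
    """
    if case == "upper":
        return seq.upper()
    if case == "lower":
        return seq.lower()
    return seq


# Soft-masked (lowercase) bases are normalised, then the sequence is chunked.
chunks = split_sequence(apply_case("acgtACGTac", "upper"), 4)
```

Chunking bounds the length of any single input, which is the memory concern the release note mentions for inputs such as chr1.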

v2.6.3

10 Aug 11:24
  • Can now sweep on a subset of data by using the --partition_percent option

v2.5.0

09 Aug 05:18
  • Fix a bug where train was not functioning correctly (class label encodings)

v2.4.4

04 Aug 02:51
  • Fix a bug where csv files were truncated if the input sequence was too long
  • Fix a bug where embeddings were not generated correctly in create_embedding_bio_sp.py

v2.4.3

31 Jul 08:12
  • Add support for reverse complementing some non-standard nucleotides
  • Fix a bug in the k-merisation process where a nonexistent tokeniser file was parsed
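
For context on reverse complementing non-standard nucleotides, here is a minimal sketch using the standard IUPAC ambiguity codes. The release note does not say which non-standard nucleotides are supported, so the full IUPAC set used here is an assumption, and the function name `reverse_complement` is hypothetical rather than taken from the package.

```python
# IUPAC nucleotide codes and their complements, position by position:
# A<->T, C<->G, R<->Y, K<->M, B<->V, D<->H; S, W and N are self-complementary.
_CODES = "ACGTRYSWKMBDHVN"
_COMPLEMENTS = "TGCAYRSWMKVHDBN"

# Build one translation table covering both upper and lower case input.
_TABLE = str.maketrans(
    _CODES + _CODES.lower(),
    _COMPLEMENTS + _COMPLEMENTS.lower(),
)


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of seq, preserving case.

    Characters outside the table (for example gaps) pass through unchanged.
    """
    return seq.translate(_TABLE)[::-1]
```

For example, `reverse_complement("AAGR")` returns `"YCTT"`: the ambiguity code R (A or G) complements to Y (T or C).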