Releases: tyronechen/genomenlp
v2.8.5
v2.8.2
- Enable fit_powerlaw to pull models from wandb directly.
v2.8.1
- Enable pooling of different token files
- Update documentation for associated scripts
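The notes do not describe how token files are pooled. As an illustrative sketch, assuming each token file has already been parsed into a mapping of token string to count, pooling can be done by summing counts across files. The helper `pool_token_counts` is hypothetical and is not part of genomenlp.

```python
from collections import Counter
from typing import Iterable, Mapping


def pool_token_counts(count_tables: Iterable[Mapping[str, int]]) -> Counter:
    """Merge several token -> count tables into one pooled table.

    Hypothetical helper: assumes each token file was already loaded
    into a mapping of token string to occurrence count.
    """
    pooled: Counter = Counter()
    for table in count_tables:
        # Counter.update adds counts for existing keys instead of replacing them
        pooled.update(table)
    return pooled


run_a = {"ATG": 5, "GCC": 2}
run_b = {"ATG": 1, "TTA": 4}
print(pool_token_counts([run_a, run_b]))
```

Summing raw counts (rather than averaging per-file frequencies) weights each file by its size, which is one possible design choice among several.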
v2.8.0
- Can now compare empirical token distributions across runs
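The release does not state which measure is used to compare distributions. A minimal sketch, assuming each run's tokenised output is available as a list of token strings, converts tokens to relative frequencies and compares two runs with the Jensen-Shannon divergence (base 2). The function names are hypothetical and the choice of divergence is an assumption.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def token_distribution(tokens: Iterable[str]) -> Dict[str, float]:
    """Convert a stream of tokens into relative frequencies."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {tok: n / total for tok, n in counts.items()}


def jensen_shannon(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Jensen-Shannon divergence (base 2) between two token distributions.

    Returns 0.0 for identical distributions and at most 1.0.
    """
    vocab = set(p) | set(q)
    # Mixture distribution M = (P + Q) / 2 over the shared vocabulary
    m = {t: 0.5 * (p.get(t, 0.0) + q.get(t, 0.0)) for t in vocab}

    def kl_to_m(a: Dict[str, float]) -> float:
        # Zero-probability terms contribute nothing by convention
        return sum(a[t] * math.log2(a[t] / m[t]) for t in a if a[t] > 0)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)


run1 = token_distribution(["ATG", "ATG", "GCC", "TTA"])
run2 = token_distribution(["ATG", "GCC", "GCC", "GCC"])
print(jensen_shannon(run1, run2))
```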
v2.7.2
- Fix a bug where casing and sequence splitting did not occur correctly during tokenisation
v2.7.1
- Fix a bug where tokenise_bio did not load dependencies correctly
- Add sequence breaking to tokenise_bio (-b) for splitting long sequences that may cause memory issues (e.g. chr1)
- Add sequence casing to tokenise_bio (-c) for converting input data to upper or lower case during tokeniser training
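The -b and -c flags are described only at a high level. The sketch below illustrates the two ideas under stated assumptions: sequences are broken into fixed-size, non-overlapping chunks, and casing takes the hypothetical values "upper", "lower" or None. `break_sequence` and `apply_case` are illustrative names, not tokenise_bio internals.

```python
from typing import Iterator, Optional


def break_sequence(seq: str, chunk_size: int) -> Iterator[str]:
    """Yield consecutive, non-overlapping chunks of at most chunk_size characters.

    Hypothetical helper: keeps memory bounded when a very long sequence
    (e.g. a whole chromosome) would otherwise be processed in one piece.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(seq), chunk_size):
        yield seq[start:start + chunk_size]


def apply_case(seq: str, case: Optional[str]) -> str:
    """Normalise casing: "upper", "lower", or None to leave the input unchanged."""
    if case is None:
        return seq
    if case == "upper":
        return seq.upper()
    if case == "lower":
        return seq.lower()
    raise ValueError(f"unknown case option: {case!r}")


chunks = [apply_case(c, "upper") for c in break_sequence("acgtNNacgtac", 5)]
print(chunks)  # ['ACGTN', 'NACGT', 'AC']
```

Because `break_sequence` is a generator, chunks can be streamed to a tokeniser trainer one at a time instead of materialising the full sequence split in memory.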
v2.6.3
- Can now sweep on a subset of data by using the --partition_percent option
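How --partition_percent selects its subset is not documented here. A minimal sketch, assuming the value is a percentage between 0 and 100 and that a reproducible random sample is wanted, could look like this; `partition_subset` and its `seed` parameter are hypothetical.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def partition_subset(records: Sequence[T], percent: float, seed: int = 42) -> List[T]:
    """Return a reproducible random subset containing roughly `percent`% of records.

    Hypothetical helper illustrating sweeping on a fraction of the data;
    genomenlp's own sampling may differ.
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    if not records:
        return []
    # Keep at least one record so a very small percentage never yields an empty set
    k = max(1, round(len(records) * percent / 100))
    rng = random.Random(seed)  # local RNG leaves the global random state untouched
    return rng.sample(list(records), k)


subset = partition_subset(list(range(1000)), 10, seed=1)
print(len(subset))  # 100
```

Fixing the seed means every sweep trial sees the same subset, so differences between trials reflect hyperparameters rather than data sampling.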
v2.5.0
- Fix a bug where train was not functioning correctly (class label encodings)
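The notes do not describe the encoding bug itself. One common way to keep class label encodings stable between training and evaluation is a deterministic label-to-id mapping built from the sorted set of labels. The sketch below illustrates that general idea with a hypothetical `encode_labels` function; it is not the fix applied to train.

```python
from typing import Dict, Iterable, List, Tuple


def encode_labels(labels: Iterable[str]) -> Tuple[List[int], Dict[str, int]]:
    """Map string class labels to integer ids deterministically.

    Sorting the unique labels means the same label set always yields the
    same mapping, regardless of the order in which labels appear.
    """
    labels = list(labels)
    mapping = {label: idx for idx, label in enumerate(sorted(set(labels)))}
    return [mapping[label] for label in labels], mapping


ids, mapping = encode_labels(["dna", "rna", "dna", "protein"])
print(ids, mapping)
```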
v2.4.4
- Fix a bug where csv files were truncated if the input sequence was too long
- Fix a bug where embeddings were not generated correctly in create_embedding_bio_sp.py
v2.4.3
- Add support for reverse complementing some non-standard nucleotides
- Fix a bug in the k-merisation process where a non-existent tokeniser file was parsed
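The notes do not list which non-standard nucleotides are supported. As a sketch, assuming the standard IUPAC ambiguity codes, reverse complementing can use a translation table in which each code maps to the code for the complementary base set (e.g. R = A/G pairs with Y = C/T). A simple k-merisation helper with an optional stride is included for comparison. Both function names are hypothetical, not genomenlp's API.

```python
from typing import List

# IUPAC codes and their complements: A<->T, C<->G, R<->Y, K<->M, B<->V, D<->H;
# S, W and N are their own complements. Lowercase is handled the same way.
_SRC = "ACGTRYSWKMBDHVN"
_DST = "TGCAYRSWMKVHDBN"
_COMPLEMENT = str.maketrans(_SRC + _SRC.lower(), _DST + _DST.lower())


def reverse_complement(seq: str) -> str:
    """Reverse complement a DNA sequence that may contain IUPAC ambiguity codes.

    Characters outside the table are passed through unchanged.
    """
    return seq.translate(_COMPLEMENT)[::-1]


def kmerise(seq: str, k: int, stride: int = 1) -> List[str]:
    """Split seq into k-mers; stride=1 gives overlapping k-mers, stride=k non-overlapping."""
    if k <= 0 or stride <= 0:
        raise ValueError("k and stride must be positive")
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


print(reverse_complement("ATGCRN"))  # NYGCAT
print(kmerise("ATGCGT", 3))  # ['ATG', 'TGC', 'GCG', 'CGT']
```

Because the complement table is an involution, applying `reverse_complement` twice returns the original sequence, which is a convenient sanity check when adding further codes.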