Merge pull request #203 from QuickSolverDab/patch-1
Update lm_datasets.py for uint32 bin-file
donglixp authored Apr 16, 2024
2 parents a3850b9 + 1e3bf11 commit daf9721
Showing 1 changed file with 4 additions and 0 deletions.
4 changes: 4 additions & 0 deletions minillm/data_utils/lm_datasets.py
@@ -57,6 +57,10 @@ def _process_lm(self, i, samp, model_data, no_model_data, gen_data):
         prompt = None
         if 65535 in input_ids:
             source_len = np.where(input_ids==65535)[0][0]
             prompt = input_ids[:source_len] #for uint16 (others)
             input_ids = np.concatenate([input_ids[:source_len], input_ids[source_len+1:]], axis=0)
+        elif 4294967295 in input_ids: #for uint32 (qwen, gemma, and etc)
+            source_len = np.where(input_ids==4294967295)[0][0]
+            prompt = input_ids[:source_len]
+            input_ids = np.concatenate([input_ids[:source_len], input_ids[source_len+1:]], axis=0)
         input_ids = input_ids[:self.max_length]
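
The hunk above treats the largest value of the storage dtype (65535 for uint16, 4294967295 for uint32) as a marker between prompt and response, removes it, and then truncates. A minimal standalone sketch of that idea follows. The helper name `split_at_sentinel` and the sample token ids are hypothetical and not part of the repository, and deriving the sentinel from the array's dtype is an assumption of this sketch rather than what the patch itself does (the patch checks both constants in turn).

```python
import numpy as np


def split_at_sentinel(input_ids, max_length):
    """Hypothetical helper: split a token array at its dtype-max sentinel.

    Returns (prompt, input_ids): prompt is the part before the sentinel
    (None if no sentinel is present); input_ids has the sentinel removed
    and is truncated to max_length.
    """
    prompt = None
    # Assumption: the sentinel is the maximum value of the unsigned dtype,
    # i.e. 65535 for uint16 and 4294967295 for uint32.
    sentinel = np.iinfo(input_ids.dtype).max
    hits = np.where(input_ids == sentinel)[0]
    if hits.size > 0:
        source_len = int(hits[0])  # index of the first sentinel
        prompt = input_ids[:source_len]
        # Drop the sentinel itself and keep everything around it.
        input_ids = np.concatenate(
            [input_ids[:source_len], input_ids[source_len + 1:]], axis=0
        )
    return prompt, input_ids[:max_length]


# Made-up token ids, one array per storage dtype.
ids16 = np.array([11, 12, 65535, 21, 22, 23], dtype=np.uint16)
ids32 = np.array([11, 12, 4294967295, 21, 22, 23], dtype=np.uint32)
plain = np.array([11, 12, 21], dtype=np.uint32)

p16, x16 = split_at_sentinel(ids16, max_length=4)
p32, x32 = split_at_sentinel(ids32, max_length=4)
p_none, x_none = split_at_sentinel(plain, max_length=4)
```

One reason to key the sentinel on the dtype in a sketch like this: in a uint32 file whose vocabulary exceeds 65535 entries, the value 65535 may be an ordinary token id, so checking for it first could split a sample in the wrong place.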