Describe the bug
When running the TensorFlow training notebook for MovieLens, the cell that creates the dataloaders raises an IndexError: list index out of range.
Steps/Code to reproduce bug
Pull the 0.5.1 merlin-tensorflow-training container and run the MovieLens example notebooks in sequence.
Expected behavior
The notebook should load the dataset and successfully train a model.
Environment details (please complete the following information):
Environment location: Docker
Method of NVTabular install: pre-installed in container
Additional context
Cell content:
train_dataset_tf = KerasSequenceLoader(
    TRAIN_PATHS,  # you could also use a glob pattern
    batch_size=BATCH_SIZE,
    label_names=['rating'],
    cat_names=CATEGORICAL_COLUMNS + CATEGORICAL_MH_COLUMNS,
    cont_names=NUMERIC_COLUMNS,
    engine='parquet',
    shuffle=True,
    buffer_size=0.06,  # how many batches to load at once
    parts_per_chunk=1,
)
valid_dataset_tf = KerasSequenceLoader(
    VALID_PATHS,  # you could also use a glob pattern
    batch_size=BATCH_SIZE,
    label_names=['rating'],
    cat_names=CATEGORICAL_COLUMNS + CATEGORICAL_MH_COLUMNS,
    cont_names=NUMERIC_COLUMNS,
    engine='parquet',
    shuffle=False,
    buffer_size=0.06,
    parts_per_chunk=1,
)
Stack trace:
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-8-9f76412fe015> in <module>
----> 1 train_dataset_tf = KerasSequenceLoader(
2 TRAIN_PATHS, # you could also use a glob pattern
3 batch_size=BATCH_SIZE,
4 label_names=['rating'],
5 cat_names=CATEGORICAL_COLUMNS+CATEGORICAL_MH_COLUMNS,
/nvtabular/nvtabular/loader/tensorflow.py in __init__(self, paths_or_dataset, batch_size, label_names, feature_columns, cat_names, cont_names, engine, shuffle, seed_fn, buffer_size, device, parts_per_chunk, reader_kwargs, global_size, global_rank, drop_last)
223
224 device = device or 0
--> 225 DataLoader.__init__(
226 self,
227 dataset,
/nvtabular/nvtabular/loader/backend.py in __init__(self, dataset, cat_names, cont_names, label_names, batch_size, shuffle, seed_fn, parts_per_chunk, device, global_size, global_rank, drop_last)
202 self._buff = ChunkQueue(self, 1, num_parts=parts_per_chunk, shuffle=shuffle)
203 # run once instead of everytime len called
--> 204 self._buff_len = len(self._buff)
205 self._batch_itr = None
206 self._workers = None
/nvtabular/nvtabular/loader/backend.py in __len__(self)
62
63 def __len__(self):
---> 64 return len(self.itr)
65
66 @property
/nvtabular/nvtabular/io/dataset.py in __len__(self)
913 # will not be correct if rows where added or dropped
914 # after IO (within Ops).
--> 915 return sum(self.partition_lens[i] for i in self.indices)
916 if len(self.indices) < self._ddf.npartitions:
917 return len(self._ddf.partitions[self.indices])
/nvtabular/nvtabular/io/dataset.py in <genexpr>(.0)
913 # will not be correct if rows where added or dropped
914 # after IO (within Ops).
--> 915 return sum(self.partition_lens[i] for i in self.indices)
916 if len(self.indices) < self._ddf.npartitions:
917 return len(self._ddf.partitions[self.indices])
IndexError: list index out of range
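For context, the failure mode shown at the bottom of the traceback can be reproduced in isolation: Dataset.__len__ sums partition_lens[i] over self.indices, and if indices refers to more partitions than partition_lens records, the generator expression raises the same IndexError. The names below are illustrative stand-ins, not NVTabular's actual state:

```python
# Minimal sketch of the failing expression in Dataset.__len__ (illustrative):
# `indices` refers to a partition for which no length was recorded.
partition_lens = [1000, 1000]   # row counts recorded for two partitions
indices = [0, 1, 2]             # dataset believes it has three partitions

try:
    total = sum(partition_lens[i] for i in indices)
except IndexError as e:
    print(e)  # list index out of range
```

This suggests a mismatch between the cached partition lengths and the partition indices of the dataset the loader was handed, though the root cause in NVTabular itself may differ.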