[FEA] Output keyset file #1049

Closed
albert17 opened this issue Aug 19, 2021 · 1 comment · Fixed by #1136
Labels
HugeCTR HugeCTR Integration P1

Comments

@albert17
Contributor

Generate and output a binary file containing the unique keys of all the categorical features in sequential order.

@yingcanw @jershi425 Please, add more details.

@oyilmaz-nvidia for viz.

@albert17 albert17 added the HugeCTR HugeCTR Integration label Aug 19, 2021
@jershi425

We need a binary file that contains the unique keys of all the categorical features, in sequential order.

Some clarifications:

  1. "Unique keys of all the categorical features" means that there must NOT be any duplicates, even across features.
  2. The file size should be exactly the key size (4 bytes for int32 or 8 bytes for int64) multiplied by the number of unique keys. In other words, the file contains no separators.
  3. The file should be named *.keyset.
  4. The number of unique keys equals the sum of the embedding sizes (which can be obtained with the get_embedding_sizes method in NVT).
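
To make the size requirement concrete, here is a minimal sketch of a writer in plain Python. The helper name `write_keyset` and the little-endian byte order are assumptions for illustration only; they are not taken from NVTabular or HugeCTR.

```python
import os
import struct

# struct format codes for the two key types mentioned above.
# Little-endian byte order is an assumption, not something the issue specifies.
_FORMATS = {"int32": "<i", "int64": "<q"}


def write_keyset(keys, path, key_type="int64"):
    """Write keys back to back as raw integers, with no header or separators.

    Hypothetical helper: returns the number of bytes written.
    """
    if len(set(keys)) != len(keys):
        raise ValueError("keys must be unique, even across features")
    fmt = _FORMATS[key_type]
    with open(path, "wb") as f:
        for key in keys:
            f.write(struct.pack(fmt, key))
    # Clarification 2: file size == key size * number of unique keys.
    expected = struct.calcsize(fmt) * len(keys)
    assert os.path.getsize(path) == expected
    return expected
```

With this sketch, three int64 keys produce a 24-byte file and three int32 keys produce a 12-byte file.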

For example:
Suppose we have "feature1": [key1, key2, key3], "feature2": [key1, key2], "feature3": [key1, key2, key3, key4].
The embedding sizes of feature1, feature2, and feature3 are 3, 2, and 4, respectively, so the total number of unique keys is 3 + 2 + 4 = 9.
Therefore, the .keyset file should contain the keys 1 through 9, i.e. 123456789 (in binary format).
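
The example can be worked through in a few lines of Python. This is only a sketch: it assumes each feature's local keys are shifted into a disjoint global range starting at 1 (which matches the 1 to 9 result in the example), and it assumes int64 keys in little-endian order. The function name `global_keys` is hypothetical.

```python
import struct
from itertools import accumulate

# Embedding sizes from the example above.
embedding_sizes = {"feature1": 3, "feature2": 2, "feature3": 4}


def global_keys(sizes, start=1):
    """Give each feature a disjoint, contiguous key range (assumed scheme)."""
    # Offset of a feature = sum of the sizes of all features before it.
    offsets = [0] + list(accumulate(sizes.values()))[:-1]
    return {
        name: list(range(start + off, start + off + size))
        for (name, size), off in zip(sizes.items(), offsets)
    }


per_feature = global_keys(embedding_sizes)
all_keys = [k for keys in per_feature.values() for k in keys]

# Pack as raw int64 values with no separators, then decode to check.
payload = struct.pack(f"<{len(all_keys)}q", *all_keys)
decoded = list(struct.unpack(f"<{len(payload) // 8}q", payload))
```

Here `per_feature` maps feature1 to keys 1 to 3, feature2 to 4 and 5, and feature3 to 6 to 9, so the payload holds 9 keys and is 72 bytes long for int64.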

@viswa-nvidia viswa-nvidia added this to the NVTabular-v21.09 milestone Aug 31, 2021
@benfred benfred added the P1 label Sep 7, 2021
@albert17 albert17 linked a pull request Sep 21, 2021 that will close this issue
4 participants