Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Keyword "assistant" Error with ShareGPT Datasets #752

Closed
5 tasks done
MilesQLi opened this issue Oct 21, 2023 · 5 comments
Closed
5 tasks done

Keyword "assistant" Error with ShareGPT Datasets #752

MilesQLi opened this issue Oct 21, 2023 · 5 comments
Labels
enhancement New feature or request

Comments

@MilesQLi
Copy link
Contributor

⚠️ Please check that this feature request hasn't been suggested before.

  • I searched previous Ideas in Discussions didn't find any similar feature requests.
  • I searched previous Issues didn't find any similar feature requests.

🔖 Feature description

On ShareGPT datasets such as shibing624/sharegpt_gpt4, some "from" values are "assistant" rather than "gpt", so, the program raises the keyword error.

✔️ Solution

Maybe take both assistant and gpt as the machine's responses?

❓ Alternatives

No response

📝 Additional Context

No response

Acknowledgements

  • My issue title is concise, descriptive, and in title casing.
  • I have searched the existing issues to make sure this feature has not been requested yet.
  • I have provided enough information for the maintainers to understand and evaluate this request.
@MilesQLi MilesQLi added the enhancement New feature or request label Oct 21, 2023
@NanoCode012
Copy link
Collaborator

I checked that linked dataset. It seems to be human/gpt.

In case you want to swap keys, please check this.

https://github.com/OpenAccess-AI-Collective/axolotl/blob/21cf09b60840ee03ba3ee4e57c2707d2296f532c/src/axolotl/prompt_strategies/sharegpt.py#L78-L90

It swaps roles using a simple dictionary map.

@MilesQLi
Copy link
Contributor Author

@NanoCode012 I said "some "from" values are "assistant" rather than "gpt"". You checked only some data from that dataset, not all.

@NanoCode012
Copy link
Collaborator

Oh I see! You can use this kind of role_map then.

role_map = {"assistant": "gpt", "human": "human", "gpt": "gpt"} 

@MilesQLi
Copy link
Contributor Author

@NanoCode012 Thanks! I'm not sure I'm doing it the way you mentioned. I updated the role_map, but still get the error. Please see the two screenshots.
image
image

@MilesQLi
Copy link
Contributor Author

MilesQLi commented Oct 22, 2023

I fixed this by updating SimpleShareGPTPromptTokenizingStrategy with the role_map you provided.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants