
Pretraining (with CPUs) #660

Open

bitmarkcc opened this issue Jul 1, 2024 · 5 comments

@bitmarkcc

I'm new to deep learning but have some experience with training boosted decision trees.

Is this just for fine-tuning, or for pretraining as well? When I look inside train_gpt2.c, I see that the first thing it does is load weights from a bin file (gpt2_124M.bin). Where did this bin file come from? Is it an official file released by OpenAI? I would like to be able to start from scratch.

I would first like to see how pretraining works, even if it's just on a small dataset, and it doesn't need to be on GPUs. I would like to start with CPUs, and maybe later add CPU-only nodes that can each work on parts of the training.

@bitmarkcc
Author

bitmarkcc commented Jul 2, 2024

I see that train_gpt2.cu has a gpt2_build_from_random() for training from scratch. I can attempt to copy that into train_gpt2.c, but I'm not sure how easy it will be. Are there any forks doing this?
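For anyone trying the same thing, here is a minimal, self-contained sketch of the kind of CPU-side random init I have in mind, loosely modeled on what gpt2_build_from_random() does; the names and the flat-buffer layout here are illustrative, not the actual llm.c structures:

```c
// Sketch (not the actual llm.c API): fill a flat parameter buffer with
// N(0, 0.02^2) samples on the CPU, as a stand-in for gpt2_build_from_random().
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

// Box-Muller transform: one sample from the standard normal distribution.
static float randn(void) {
    float u1 = ((float)rand() + 1.0f) / ((float)RAND_MAX + 2.0f); // in (0, 1)
    float u2 = ((float)rand() + 1.0f) / ((float)RAND_MAX + 2.0f);
    return sqrtf(-2.0f * logf(u1)) * cosf(2.0f * (float)M_PI * u2);
}

// Fill a flat parameter buffer with N(0, stddev^2) samples.
// GPT-2 uses stddev = 0.02 for weight matrices and embeddings.
static void fill_normal(float* buf, size_t n, float stddev) {
    for (size_t i = 0; i < n; i++) {
        buf[i] = stddev * randn();
    }
}

int main(void) {
    srand(1337);                        // fixed seed so runs are reproducible
    size_t num_params = 124000000;      // roughly GPT-2 124M, for illustration
    float* params = malloc(num_params * sizeof(float));
    if (params == NULL) return 1;
    fill_normal(params, num_params, 0.02f);
    // A real port would then reset layernorm weights to 1.0 and all biases
    // to 0.0, matching the usual GPT-2 initialization.
    printf("first few params: %f %f %f\n", params[0], params[1], params[2]);
    free(params);
    return 0;
}
```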

What I would like to see is code that is platform-independent (no reliance on Nvidia or AMD). People who have those devices (or ASICs) could still use optimized code, but there should be a fallback to the platform-independent code.

Edit: I think this will do mostly what I want, though I need to add a way to pass the model and training parameters on the command line: bitmarkcc@bdff450

bitmarkcc changed the title from "Pretraining" to "Pretraining (with CPUs)" on Jul 2, 2024
@gordicaleksa
Contributor

Hey @bitmarkcc! Did you follow the README?

You should first run the Python code; it'll generate all the necessary bin/state files before you run the C/CUDA code.
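Roughly, the README flow looks like this (double-check the README in your checkout, since the exact dataset-prep script name may differ):

```bash
pip install -r requirements.txt
python dev/data/tinyshakespeare.py   # tokenize a small dataset (use the script the README points to)
python train_gpt2.py                 # loads the HF GPT-2 weights and writes the .bin model/state files
make train_gpt2
OMP_NUM_THREADS=8 ./train_gpt2
```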

If something is not clearly explained in the README, either open up a PR fixing it or reply back here; happy to help.

@bitmarkcc
Author

bitmarkcc commented Jul 6, 2024

Yeah, so according to the README these can be generated with train_gpt2.py, which references the official implementations of GPT-2 from OpenAI and HuggingFace. So these were generated from that Python script? And if you run the C program, does it reproduce the same bin files?

In any case, I am still wondering whether my implementation of pretraining in CPU mode is fine (bitmarkcc@bdff450). I want to make more changes, and I can open a pull request later on.

Edit: I think it now actually randomizes the parameters (2nd commit): bitmarkcc@7581695

@invisiblepancake

Anybody know where to get a nice SHA?

@brisker

brisker commented Jul 31, 2024

@gordicaleksa
I only have an NVIDIA 4050 GPU (no A100 or V100); can the code run on my GPU?
