
Pretraining (with CPUs) #660

Open

bitmarkcc opened this issue Jul 1, 2024 · 5 comments

@bitmarkcc

I'm new to deep learning but have some experience with training boosted decision trees.

Is this just for fine-tuning, or for pretraining as well? When I look inside train_gpt2.c, I see that the first thing it does is load weights from a bin file (gpt2_124M.bin). Where did this bin file come from? Is it an official file released by OpenAI? I would like to be able to start from scratch.

I would first like to see how pretraining works, even if it's just on a small dataset, and it doesn't need to be on GPUs. I would like to start with CPUs, and maybe later add CPU-only nodes that can each work on parts of the training.

@bitmarkcc
Author

bitmarkcc commented Jul 2, 2024

I see that train_gpt2.cu has a gpt2_build_from_random() for training from scratch. I can attempt to copy that into train_gpt2.c, but I'm not sure how easy it will be. Are there any forks doing this?
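For anyone trying the same thing, here is a minimal, self-contained sketch of the kind of CPU-side random init I have in mind, loosely modeled on what gpt2_build_from_random() does; the names and the flat-buffer layout here are illustrative, not the actual llm.c structures:

```c
// Sketch (not the actual llm.c API): fill a flat parameter buffer with
// N(0, 0.02^2) samples on the CPU, as a stand-in for gpt2_build_from_random().
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

// Box-Muller transform: one sample from the standard normal distribution.
static float randn(void) {
    float u1 = ((float)rand() + 1.0f) / ((float)RAND_MAX + 2.0f); // in (0, 1)
    float u2 = ((float)rand() + 1.0f) / ((float)RAND_MAX + 2.0f);
    return sqrtf(-2.0f * logf(u1)) * cosf(2.0f * (float)M_PI * u2);
}

// Fill a flat parameter buffer with N(0, stddev^2) samples.
// GPT-2 uses stddev = 0.02 for weight matrices and embeddings.
static void fill_normal(float* buf, size_t n, float stddev) {
    for (size_t i = 0; i < n; i++) {
        buf[i] = stddev * randn();
    }
}

int main(void) {
    srand(1337);                        // fixed seed so runs are reproducible
    size_t num_params = 124000000;      // roughly GPT-2 124M, for illustration
    float* params = malloc(num_params * sizeof(float));
    if (params == NULL) return 1;
    fill_normal(params, num_params, 0.02f);
    // A real port would then reset layernorm weights to 1.0 and all biases
    // to 0.0, matching the usual GPT-2 initialization.
    printf("first few params: %f %f %f\n", params[0], params[1], params[2]);
    free(params);
    return 0;
}
```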

What I would like to see is code that is platform-independent (no reliance on Nvidia or AMD). People who have those devices (or ASICs) could still use optimized code, but there should be a fallback to the platform-independent code.

Edit: I think this will do mostly what I want, though I need to add a way to pass the model and training parameters on the command line: bitmarkcc@bdff450

bitmarkcc changed the title from "Pretraining" to "Pretraining (with CPUs)" on Jul 2, 2024
@gordicaleksa
Contributor

Hey @bitmarkcc! Did you follow the README?

You should first run the Python code; it'll generate all the necessary bin/state files before you run the C/CUDA code.
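Roughly, the README flow looks like this (double-check the README in your checkout, since the exact dataset-prep script name may differ):

```bash
pip install -r requirements.txt
python dev/data/tinyshakespeare.py   # tokenize a small dataset (use the script the README points to)
python train_gpt2.py                 # loads the HF GPT-2 weights and writes the .bin model/state files
make train_gpt2
OMP_NUM_THREADS=8 ./train_gpt2
```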

If something is not clearly explained in the README, either open up a PR fixing it or reply back here; happy to help.

@bitmarkcc
Author

bitmarkcc commented Jul 6, 2024

Yeah, so according to the README these can be generated with train_gpt2.py, which references the official implementations of GPT-2 from OpenAI and HuggingFace. So these were generated from that Python script? And if you run the C program, does it reproduce the same bin files?

In any case, I am still wondering whether my implementation of pretraining in CPU mode is fine (bitmarkcc@bdff450). I want to make more changes, and I can open a pull request later on.

Edit: I think it now actually randomizes the parameters (2nd commit): bitmarkcc@7581695

@invisiblepancake

Anybody know where to get a nice SHA?

@brisker

brisker commented Jul 31, 2024

@gordicaleksa
I only have an NVIDIA 4050 GPU (no A100 or V100); can the code run on my GPU?
