Update multi-node.qmd #1688

Open: wants to merge 4 commits into `main`
121 changes: 113 additions & 8 deletions docs/multi-node.qmd
---
title: Distributed Finetuning For Multi-Node with Axolotl and FSDP
description: How to use Axolotl on multiple machines
---
# Distributed Finetuning For Multi-Node with Axolotl and FSDP
You will need to create a configuration for accelerate, either by running `accelerate config` and following the instructions, or by using one of the presets below:

`~/.cache/huggingface/accelerate/default_config.yaml`:

```yaml
compute_environment: LOCAL_MACHINE
debug: false
distributed_type: FSDP
# ...
```

Configure your model to use FSDP, for example:

```yaml
fsdp:
- full_shard
# ...
```

## Machine configuration

On each machine you need a copy of Axolotl; we suggest using the same commit on every machine to ensure compatibility.

You will also need to have the same configuration file for your model on each machine.

On the main machine only, make sure the port you set as `main_process_port` is open for TCP and reachable by the other machines.

All you have to do now is launch with accelerate on each machine as you normally would. The processes start once accelerate has been launched on every machine.
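For the FSDP setup, the per-machine steps can be sketched as two shell functions: one checks from a worker that the main machine's port is reachable, the other runs `accelerate launch` with that machine's rank. This is a sketch under assumptions: `MAIN_IP`, `MAIN_PORT`, `NUM_MACHINES`, `NUM_PROCESSES`, `CONFIG` and `DRY_RUN` are hypothetical variable names, `your_config.yml` is a placeholder for your Axolotl config, and the `accelerate launch` flags are expected to override the corresponding values in the accelerate config file. The port check relies on bash's `/dev/tcp` redirection and the coreutils `timeout` command.

```bash
# Sketch only. MAIN_IP, MAIN_PORT, NUM_MACHINES, NUM_PROCESSES, CONFIG and
# DRY_RUN are hypothetical variable names; your_config.yml is a placeholder.

# Succeeds if a TCP connection to HOST:PORT opens within 3 seconds.
# Uses bash's /dev/tcp redirection, so no extra tools are needed.
check_port() {
  local host="$1" port="$2"
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Runs accelerate for the given machine rank; with DRY_RUN=1 it only prints
# the command. NUM_PROCESSES is the total number of processes on all machines.
launch_node() {
  local rank="$1"
  set -- accelerate launch \
    --num_machines "${NUM_MACHINES:-2}" \
    --num_processes "${NUM_PROCESSES:-8}" \
    --machine_rank "$rank" \
    --main_process_ip "${MAIN_IP:?set MAIN_IP to the main machine IP}" \
    --main_process_port "${MAIN_PORT:-5000}" \
    -m axolotl.cli.train "${CONFIG:-your_config.yml}"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}
```

On the main machine you would run `launch_node 0`, and on the second machine `check_port "$MAIN_IP" 5000 && launch_node 1`. Passing the rank on the command line keeps the rest of the launch command identical across machines.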
# Distributed Finetuning For Multi-Node with Axolotl and DeepSpeed On AWS EC2 Instances
This guide explains how to set up a distributed finetuning environment with DeepSpeed using Axolotl, a framework for finetuning large language models. The setup involves configuring SSH for passwordless access, generating SSH key pairs, and configuring Axolotl and Accelerate for multi-node training.

## Prerequisites
- Multiple nodes (servers or machines) with GPUs
- Ubuntu or any other Linux distribution

## Step 1: Configure SSH for Passwordless Access
1. Open the `sshd_config` file on all nodes and the server:

```bash
sudo nano /etc/ssh/sshd_config
```

2. Uncomment the `PubkeyAuthentication` line (or add it if it is missing) and make sure it reads `PubkeyAuthentication yes` to enable public key authentication.
3. Save the changes and restart the SSH service:

```bash
sudo systemctl restart sshd
```
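If you prefer not to edit `sshd_config` by hand on every node, the change can be scripted. The function below is a minimal sketch with a hypothetical name, `enable_pubkey_auth`, and it assumes GNU `sed` as shipped with Ubuntu: it rewrites any commented or uncommented `PubkeyAuthentication` line to `PubkeyAuthentication yes`, or appends that line if none exists.

```bash
# Sketch: enable public key authentication without opening an editor.
# enable_pubkey_auth is a hypothetical helper; assumes GNU sed (sed -i -E).
enable_pubkey_auth() {
  local cfg="${1:-/etc/ssh/sshd_config}"
  if grep -Eq '^[#[:space:]]*PubkeyAuthentication' "$cfg"; then
    # Replace the existing (possibly commented-out) directive in place.
    sed -i -E 's/^[#[:space:]]*PubkeyAuthentication.*/PubkeyAuthentication yes/' "$cfg"
  else
    # No such directive yet: append it at the end of the file.
    echo 'PubkeyAuthentication yes' >> "$cfg"
  fi
}
```

Run it as root (for example from `sudo -i`), then check the file with `sshd -t`, which reports configuration errors and exits non-zero, before restarting the service as shown above.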

## Step 2: Generate Public Key
1. Generate a public key using `ssh-keygen`:

```bash
ssh-keygen -t rsa
```

2. Print the public key so you can copy it:

```bash
cat ~/.ssh/id_rsa.pub
```

3. Add the public key to the `authorized_keys` file on all nodes and the server:

```bash
nano ~/.ssh/authorized_keys
```

4. Repeat steps 1-3 on all other nodes.
5. Exchange public keys between nodes:
   - Paste the public key of Node 1 into the `authorized_keys` file of Node 2, and vice versa.
   - Repeat this process for every pair of nodes.
6. Test passwordless SSH access:

```bash
ssh <ip-of-other-node>
```
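The key exchange in steps 3 and 5 can also be done without pasting by hand. The sketch below is a hypothetical helper, `add_authorized_key`, meant to run on the receiving node after the sender's `.pub` file has been copied over: it appends the key to `authorized_keys` only if that exact line is not already present, and sets conservative permissions (`700` on the directory, `600` on the file) that satisfy sshd's default `StrictModes` check.

```bash
# Sketch: idempotently add a public key to an authorized_keys file.
# add_authorized_key is a hypothetical helper name.
add_authorized_key() {
  local pubkey_file="$1"
  local auth_file="${2:-$HOME/.ssh/authorized_keys}"
  local key dir
  key="$(cat "$pubkey_file")"
  dir="$(dirname "$auth_file")"
  mkdir -p "$dir"
  chmod 700 "$dir"
  touch "$auth_file"
  chmod 600 "$auth_file"
  # -x: match the whole line, -F: treat the key as a fixed string.
  grep -qxF -- "$key" "$auth_file" || printf '%s\n' "$key" >> "$auth_file"
}
```

If password login is still enabled between the nodes, the standard `ssh-copy-id <user>@<ip-of-other-node>` does the same from the sending side. For step 6, `ssh -o BatchMode=yes <ip-of-other-node> true` fails instead of prompting for a password, so a zero exit status means key-based login works.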

## Step 3: Configure Axolotl
1. Configure Axolotl on each node with the same `.yml` files and settings.
2. Create a `deepspeed_hostfile` inside the Axolotl folder:

```bash
nano deepspeed_hostfile
```

3. Add the IP addresses and GPU slots for each node:

```
<ip-node-1> slots=<num_gpu_in_node1>
<ip-node-2> slots=<num_gpu_in_node2>
```
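With more than a couple of nodes, the hostfile can be generated instead of typed. `write_hostfile` below is a hypothetical helper that takes `ip=slots` pairs and writes one `<ip> slots=<n>` line per node, in the same format as above.

```bash
# Sketch: write a DeepSpeed hostfile from "ip=slots" arguments.
# write_hostfile is a hypothetical helper name.
write_hostfile() {
  local out="$1" entry
  shift
  : > "$out"                    # create or truncate the output file
  for entry in "$@"; do
    # "${entry%%=*}" is the text before the first "=", "${entry#*=}" the text after it.
    printf '%s slots=%s\n' "${entry%%=*}" "${entry#*=}" >> "$out"
  done
}
```

For example, `write_hostfile deepspeed_hostfile 10.0.0.1=8 10.0.0.2=8`, where the IP addresses are placeholders. On each node, `nvidia-smi --list-gpus | wc -l` prints the number of visible GPUs, which you can use as that node's slot count.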

## Step 4: Configure Accelerate
Follow these steps on each node to configure Accelerate:

```bash
accelerate config
```

### Node 1 (Server) Configuration
1. Select `This machine` for the compute environment.
2. Select `multi_gpu` for the compute type.
3. Enter the number of machines (e.g., `2` for two nodes).
4. Enter the machine rank `0` for the first node (server).
5. Enter the IP address of the server (Node 1).
6. Enter the main process port (e.g., `5000`).
7. Select `no` for setting up custom environment variables.
8. Select `static` for the rendezvous backend.
9. Select `yes` for running on the same network.
10. Select `no` for using a cluster.
11. Select `yes` for using DeepSpeed.
12. Select `yes` for using a DeepSpeed config file.
13. Enter the path to the DeepSpeed config file (e.g., `deepspeed_configs/zero2.json`).
14. Select `no` for using ZeRO-3.
15. Enter `pdsh` for the DeepSpeed multinode launcher.
16. Enter `deepspeed_hostfile` for the DeepSpeed hostfile.
17. Select `no` for using custom launch utility options.
18. Select `no` for using a TPU.
19. Enter the total number of processes across all nodes (e.g., `8` for 8 GPUs in total).

### Node 2 Configuration
Repeat the above steps for Node 2, but change the machine rank to `1`.
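The answers above end up in accelerate's config file (by default `~/.cache/huggingface/accelerate/default_config.yaml`), so the prompts can also be skipped by writing that file directly. The sketch below is an assumption about what `accelerate config` writes for these answers, based on the keys accelerate uses for DeepSpeed; compare it with a file generated interactively before relying on it. `write_accelerate_config` and the `MAIN_IP`, `MAIN_PORT`, `NUM_MACHINES` and `NUM_PROCESSES` variables are hypothetical; only the rank argument differs between Node 1 (`0`) and Node 2 (`1`).

```bash
# Sketch: write the accelerate config for one node without the interactive prompts.
# The YAML keys are assumed to match what `accelerate config` produces; verify them.
# write_accelerate_config, MAIN_IP, MAIN_PORT, NUM_MACHINES, NUM_PROCESSES are hypothetical.
write_accelerate_config() {
  local rank="$1"
  local out="${2:-$HOME/.cache/huggingface/accelerate/default_config.yaml}"
  mkdir -p "$(dirname "$out")"
  # Unquoted EOF: the ${...} placeholders are expanded when the file is written.
  cat > "$out" <<EOF
compute_environment: LOCAL_MACHINE
debug: false
deepspeed_config:
  deepspeed_config_file: deepspeed_configs/zero2.json
  deepspeed_hostfile: deepspeed_hostfile
  deepspeed_multinode_launcher: pdsh
  zero3_init_flag: false
distributed_type: DEEPSPEED
machine_rank: ${rank}
main_process_ip: ${MAIN_IP:?set MAIN_IP to the server IP}
main_process_port: ${MAIN_PORT:-5000}
main_training_function: main
num_machines: ${NUM_MACHINES:-2}
num_processes: ${NUM_PROCESSES:-8}
rdzv_backend: static
same_network: true
use_cpu: false
EOF
}
```

If you write the file somewhere other than the default location, pass it explicitly with `accelerate launch --config_file <path> ...`.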

## Step 5: Finetuning
On Node 1 (server), run the finetuning process using Accelerate:

```bash
accelerate launch -m axolotl.cli.train examples/llama-2/qlora.yml
```

This will start the finetuning process across all nodes. Each output line is prefixed with the IP address of the node that produced it, so seeing every node's IP in the log confirms that training is running on all of them.
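If passwordless SSH is not working for one of the hosts, the pdsh launcher cannot start processes there, so a quick check on Node 1 before launching can save a failed run. The sketch below uses hypothetical helpers: `hostfile_hosts` prints the host column of the hostfile (skipping blank lines and lines starting with `#`), and `check_hosts` tries a non-interactive SSH login to each host and checks that `pdsh` is installed. `BatchMode=yes` makes `ssh` fail instead of prompting for a password.

```bash
# Sketch: pre-flight check for the DeepSpeed/pdsh launch. Helper names are hypothetical.

# Print the first field of each non-empty, non-comment line of the hostfile.
hostfile_hosts() {
  awk 'NF && $1 !~ /^#/ { print $1 }' "${1:-deepspeed_hostfile}"
}

# Return non-zero if pdsh is missing or any host rejects key-based SSH.
check_hosts() {
  local failed=0 host
  if ! command -v pdsh >/dev/null 2>&1; then
    echo "pdsh is not installed on this node" >&2
    failed=1
  fi
  for host in $(hostfile_hosts "$1"); do
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
      echo "ok: $host"
    else
      echo "FAILED: $host" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

Run `check_hosts deepspeed_hostfile` on Node 1 from the Axolotl folder; a non-zero exit status means at least one host, or pdsh itself, needs attention.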
Comment on lines +147 to +153


To my knowledge this is not the case. You need to run `accelerate launch -m` on every server, otherwise it will sit there and never actually start.

@shahdivax (Author), Jun 11, 2024

When we tested, we only started it on a single node (the server), and it was able to use the resources of the other nodes.
As proof, we could see the IPs of both machines on the left of the output, and the total GPU count included all the GPUs from all the nodes.

Collaborator

@muellerzr My guess is that this is specific to DeepSpeed, since the IP addresses are set in a hostfile. We should probably make clear that in this case it only needs to be run on the first node. Most other setups, like FSDP or plain multi-node DDP, will likely still need `accelerate launch` to be run on each node.

@shahdivax (Author), Jul 5, 2024

@winglian, @muellerzr That might be the case. DeepSpeed was a good option for us because we were doing multi-node finetuning on EC2, which provides public IPs. With a hostfile it was really easy to connect both machines and run the finetuning from the root node only, and this indeed connected all the other instances (using all the resources of all the nodes from a single node).