[NeuralChat] Enabled image2text finetuning and added an example (#1372)
* Enabled image2text finetuning and added an example. Signed-off-by: Ye, Xinyu <xinyu.ye@intel.com>
1 parent 7539c35 · commit ef94aea · 6 changed files with 225 additions and 5 deletions.
78 changes: 78 additions & 0 deletions
...ension_for_transformers/neural_chat/examples/finetuning/image_to_text/README.md
@@ -0,0 +1,78 @@
NeuralChat Fine-tuning
============

This example demonstrates how to finetune a pretrained generative image-to-text model on a customized dataset.
# Prerequisite

## 1. Environment
### Bare Metal
Python 3.9 or a higher version is recommended.
```shell
pip install -r requirements.txt
pip install transformers==4.34.1
# Using ccl as the distributed backend for distributed training on CPU requires the package below.
python -m pip install oneccl_bind_pt==2.2.0 -f https://developer.intel.com/ipex-whl-stable-cpu
```
>**Note**: Using a transformers version no higher than 4.34.1 is suggested.
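The version constraint in the note can be checked by comparing version tuples. The helper below is an illustrative sketch, not part of the example; the ceiling `4.34.1` is taken from the note above.

```python
# Illustrative sketch: check that a transformers version string does not
# exceed the suggested ceiling of 4.34.1.
def is_supported(ver: str, ceiling: tuple = (4, 34, 1)) -> bool:
    """Return True when `ver` (e.g. "4.34.1") is no higher than `ceiling`."""
    parts = tuple(int(p) for p in ver.split(".")[:3])
    return parts <= ceiling

print(is_supported("4.34.1"))  # True
print(is_supported("4.35.0"))  # False
```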
### Docker
Pick either one of the options below to set up the docker environment.
#### Option 1: Build Docker image from scratch
Please refer to this section: [How to build docker images for NeuralChat FineTuning](../../../docker/finetuning/README.md#21-build-docker-image) to build the docker image from scratch.

#### Option 2: Pull existing Docker image
Please follow the section [itrex docker setup](../../../docker/finetuning/README.md#22-docker-pull-from-docker-hub) and use the docker pull command to pull the itrex docker image.

Once you have the docker image ready, please follow the [run docker image](../../../docker/finetuning/README.md#3-create-docker-container) section to launch a docker instance from the image.
## 2. Prepare the Model

#### microsoft/git-base
To acquire the checkpoints and tokenizer, the user can get those files from [microsoft/git-base](https://huggingface.co/microsoft/git-base).
Users can run the commands below to get the checkpoints from the Hugging Face repository once any access request to the files is approved.
```bash
git lfs install
git clone https://huggingface.co/microsoft/git-base
```
## 3. Prepare Dataset

For datasets hosted on the Hugging Face Hub, users can use the `dataset_name` argument to pass in the needed dataset.
For local datasets, users can follow this [guide](https://huggingface.co/docs/datasets/v2.18.0/en/image_dataset#image-captioning) from the datasets library's official documentation to create a metadata file that contains image and text pairs, then use `train_dir` and optionally `validation_dir` to pass in the path to the needed dataset.

### Dataset related arguments
- **dataset_name**: The name of the dataset to use (via the datasets library).
- **dataset_config_name**: The configuration name of the dataset to use (via the datasets library).
- **train_dir**: A folder containing the training data.
- **validation_dir**: A folder containing the validation data.
- **image_column**: The column of the dataset containing an image or a list of images.
- **caption_column**: The column of the dataset containing a caption or a list of captions.
- **validation_split_percentage**: The percentage of the train set used as the validation set in case there is no validation split.
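For the local-dataset path, the metadata file that the guide describes is one JSON object per line, pairing an image file name with its caption. The sketch below shows that layout; the file names and captions are made-up placeholders.

```python
# Sketch of a metadata.jsonl for a local image-captioning dataset
# (placed inside the folder passed via --train_dir). All file names
# and captions here are hypothetical placeholders.
import json

pairs = [
    {"file_name": "0001.png", "text": "a chest x-ray showing clear lungs"},
    {"file_name": "0002.png", "text": "an axial ct slice of the abdomen"},
]

with open("metadata.jsonl", "w") as f:
    for row in pairs:
        f.write(json.dumps(row) + "\n")
```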
# Finetune

Use the command line below to finetune the `microsoft/git-base` model on the `gaodrew/roco-65k-256px` dataset.

```bash
python finetune_clm.py \
    --model_name_or_path "microsoft/git-base" \
    --bf16 True \
    --dataset_name "gaodrew/roco-65k-256px" \
    --per_device_train_batch_size 8 \
    --per_device_eval_batch_size 8 \
    --gradient_accumulation_steps 1 \
    --do_train \
    --learning_rate 1e-4 \
    --num_train_epochs 3 \
    --logging_steps 100 \
    --save_total_limit 2 \
    --overwrite_output_dir \
    --log_level info \
    --save_strategy epoch \
    --output_dir ./git-base_finetuned_model \
    --task image2text \
    --full_finetune \
    --bits 16
```
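As a sanity check on the flags above: the effective global batch size is the per-device batch size times the gradient accumulation steps times the number of workers. The sketch below assumes a single, non-distributed worker.

```python
# Effective global batch size implied by the command above,
# assuming a single training process (world_size = 1).
per_device_train_batch_size = 8
gradient_accumulation_steps = 1
world_size = 1  # assumption: non-distributed run; scales with workers

effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * world_size
)
print(effective_batch_size)  # 8
```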
68 changes: 68 additions & 0 deletions
..._extension_for_transformers/neural_chat/examples/finetuning/image_to_text/finetune_clm.py
@@ -0,0 +1,68 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# Copyright (c) 2023 Intel Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import os
import sys
from transformers import TrainingArguments, HfArgumentParser
from intel_extension_for_transformers.neural_chat.config import (
    ModelArguments,
    DataArguments,
    FinetuningArguments,
    BaseFinetuningConfig,
)
from intel_extension_for_transformers.neural_chat.chatbot import finetune_model
from intel_extension_for_transformers.utils.device_utils import is_hpu_available


def main():
    # See all possible arguments in src/transformers/training_args.py
    # or by passing the --help flag to this script.
    # We now keep distinct sets of args, for a cleaner separation of concerns.
    if not is_hpu_available:
        parser = HfArgumentParser(
            (ModelArguments, DataArguments, TrainingArguments, FinetuningArguments)
        )
    else:
        from optimum.habana import GaudiTrainingArguments

        parser = HfArgumentParser(
            (ModelArguments, DataArguments, GaudiTrainingArguments, FinetuningArguments)
        )

    if len(sys.argv) == 2 and sys.argv[1].endswith(".json"):
        # If we pass only one argument to the script and it's the path to a json file,
        # let's parse it to get our arguments.
        model_args, data_args, training_args, finetune_args = parser.parse_json_file(
            json_file=os.path.abspath(sys.argv[1])
        )
    else:
        (
            model_args,
            data_args,
            training_args,
            finetune_args,
        ) = parser.parse_args_into_dataclasses()

    finetune_cfg = BaseFinetuningConfig(
        model_args=model_args,
        data_args=data_args,
        training_args=training_args,
        finetune_args=finetune_args,
    )
    finetune_model(finetune_cfg)


if __name__ == "__main__":
    main()
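The json-versus-command-line dispatch in `main()` can be illustrated with a stdlib-only stand-in. The `Args` dataclass and `parse` helper below are hypothetical simplifications of what `HfArgumentParser` does, not the actual API.

```python
# Stand-in for the dispatch in main(): a single ".json" argument is read
# as a config file; otherwise arguments are treated as "--key value" flags.
# Args and parse are illustrative simplifications, not part of the example.
import json
from dataclasses import dataclass


@dataclass
class Args:
    model_name_or_path: str = ""
    task: str = "image2text"


def parse(argv):
    if len(argv) == 1 and argv[0].endswith(".json"):
        # Config-file branch: load the dataclass fields from json.
        with open(argv[0]) as f:
            return Args(**json.load(f))
    # Flag branch: consume "--key value" pairs.
    args = Args()
    for key, value in zip(argv[::2], argv[1::2]):
        setattr(args, key.lstrip("-"), value)
    return args
```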
15 changes: 15 additions & 0 deletions
...extension_for_transformers/neural_chat/examples/finetuning/image_to_text/requirements.txt
@@ -0,0 +1,15 @@
datasets
einops
evaluate
fastapi
nltk
peft
pydub
python-multipart
rouge_score
sentencepiece
shortuuid
torch==2.2.0
transformers
uvicorn
yacs