ultralytics 8.1.39 add YOLO-World training (ultralytics#9268)
Signed-off-by: Glenn Jocher <glenn.jocher@ultralytics.com>
Co-authored-by: UltralyticsAssistant <web@ultralytics.com>
Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>
3 people authored and hmurari committed Apr 17, 2024
1 parent d35ed2a commit 645bc3a
Showing 34 changed files with 2,161 additions and 95 deletions.
1 change: 1 addition & 0 deletions docs/en/datasets/detect/index.md
@@ -74,6 +74,7 @@ Here is a list of the supported datasets and a brief description for each:

- [**Argoverse**](argoverse.md): A collection of sensor data collected from autonomous vehicles. It contains 3D tracking annotations for car objects.
- [**COCO**](coco.md): Common Objects in Context (COCO) is a large-scale object detection, segmentation, and captioning dataset with 80 object categories.
- [**LVIS**](lvis.md): LVIS is a large-scale object detection, segmentation, and captioning dataset with 1203 object categories.
- [**COCO8**](coco8.md): A smaller subset of the COCO dataset, COCO8 is more lightweight and faster to train.
- [**GlobalWheat2020**](globalwheat2020.md): A dataset containing images of wheat heads for the Global Wheat Challenge 2020.
- [**Objects365**](objects365.md): A large-scale object detection dataset with 365 object categories and 600k images, aimed at advancing object detection research.
96 changes: 96 additions & 0 deletions docs/en/datasets/detect/lvis.md
@@ -0,0 +1,96 @@
---
comments: true
description: Learn how LVIS, a leading dataset for object detection and segmentation, integrates with Ultralytics. Discover ways to use it for training YOLO models.
keywords: Ultralytics, LVIS dataset, object detection, YOLO, YOLO model training, image segmentation, computer vision, deep learning models
---

# LVIS Dataset

The [LVIS](https://www.lvisdataset.org/dataset) dataset is a large-scale, fine-grained, large-vocabulary annotation dataset developed and released by Facebook AI Research (FAIR). It is primarily used as a research benchmark for object detection and instance segmentation with a large vocabulary of categories, aiming to drive further advancements in the computer vision field.

## Key Features

- LVIS contains 160k images and 2M instance annotations for object detection and instance segmentation tasks.
- The dataset comprises 1203 object categories, including common objects like cars, bicycles, and animals, as well as more specific categories such as umbrellas, handbags, and sports equipment.
- Annotations include object bounding boxes and segmentation masks for each image.
- LVIS provides standardized evaluation metrics like mean Average Precision (mAP) for object detection, and mean Average Recall (mAR) for segmentation tasks, making it suitable for comparing model performance.
- LVIS uses exactly the same images as the [COCO](./coco.md) dataset, but with different splits and different annotations.

## Dataset Structure

The LVIS dataset is split into four subsets:

1. **Train**: This subset contains 100k images for training object detection, segmentation, and captioning models.
2. **Val**: This subset has 20k images used for validation purposes during model training.
3. **Minival**: This subset is identical to the COCO val2017 set, which has 5k images used for validation during model training.
4. **Test**: This subset consists of 20k images used for testing and benchmarking the trained models. Ground truth annotations for this subset are not publicly available, and the results are submitted to the [LVIS evaluation server](https://eval.ai/web/challenges/challenge-page/675/overview) for performance evaluation.


## Applications

The LVIS dataset is widely used for training and evaluating deep learning models in object detection (such as YOLO, Faster R-CNN, and SSD) and instance segmentation (such as Mask R-CNN). The dataset's diverse set of object categories, large number of annotated images, and standardized evaluation metrics make it an essential resource for computer vision researchers and practitioners.

## Dataset YAML

A YAML (Yet Another Markup Language) file is used to define the dataset configuration. It contains information about the dataset's paths, classes, and other relevant information. In the case of the LVIS dataset, the `lvis.yaml` file is maintained at [https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/lvis.yaml](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/lvis.yaml).

!!! Example "ultralytics/cfg/datasets/lvis.yaml"

```yaml
--8<-- "ultralytics/cfg/datasets/lvis.yaml"
```
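
As a rough illustration of how such a configuration is typically consumed, the sketch below resolves split paths and counts classes from an already-parsed config. The dictionary values and the helper names `resolve_split` and `num_classes` are hypothetical stand-ins for illustration, not the contents of the real `lvis.yaml` or part of the Ultralytics API.

```python
from pathlib import Path

# Hypothetical, abbreviated stand-in for a parsed dataset YAML; the real
# lvis.yaml lists all 1203 class names and may use different values.
dataset_cfg = {
    "path": "../datasets/lvis",  # dataset root directory (assumed)
    "train": "train.txt",  # train split, relative to `path` (assumed)
    "val": "val.txt",  # val split, relative to `path` (assumed)
    "names": {0: "class_a", 1: "class_b"},  # placeholder class names
}


def resolve_split(cfg: dict, split: str) -> Path:
    """Join the dataset root with the entry for the requested split."""
    return Path(cfg["path"]) / cfg[split]


def num_classes(cfg: dict) -> int:
    """Return the number of classes declared in the config."""
    return len(cfg["names"])


print(resolve_split(dataset_cfg, "train"))
print(num_classes(dataset_cfg))
```

Keeping split entries relative to a single root means the dataset can be moved by changing only the `path` value.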

## Usage

To train a YOLOv8n model on the LVIS dataset for 100 epochs with an image size of 640, you can use the following code snippets. For a comprehensive list of available arguments, refer to the model [Training](../../modes/train.md) page.

!!! Example "Train Example"

=== "Python"

```python
from ultralytics import YOLO

# Load a model
model = YOLO('yolov8n.pt') # load a pretrained model (recommended for training)

# Train the model
results = model.train(data='lvis.yaml', epochs=100, imgsz=640)
```

=== "CLI"

```bash
# Start training from a pretrained *.pt model
yolo detect train data=lvis.yaml model=yolov8n.pt epochs=100 imgsz=640
```

## Sample Images and Annotations

The LVIS dataset contains a diverse set of images with various object categories and complex scenes. Here are some examples of images from the dataset, along with their corresponding annotations:

![Dataset sample image](https://private-user-images.githubusercontent.com/61612323/316485965-a88c2e62-58d0-4f67-bc69-1418e42175e9.jpg?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTEzNjcyNjYsIm5iZiI6MTcxMTM2Njk2NiwicGF0aCI6Ii82MTYxMjMyMy8zMTY0ODU5NjUtYTg4YzJlNjItNThkMC00ZjY3LWJjNjktMTQxOGU0MjE3NWU5LmpwZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDAzMjUlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwMzI1VDExNDI0NlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWZmMTVlNzE5MTBkOTZmNDQwNzJjNWQzYzM2NmEyMGMxODQ4ZDEyMjYwYmMyY2JjZDU5YzBmMDIyZGEwMGEwZDAmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.7thukPdnJKYuBmTk1ROUyqxxV3Ix5GeNLqyi4wSDYvA)


- **Mosaiced Image**: This image demonstrates a training batch composed of mosaiced dataset images. Mosaicing is a technique used during training that combines multiple images into a single image to increase the variety of objects and scenes within each training batch. This helps improve the model's ability to generalize to different object sizes, aspect ratios, and contexts.

The example showcases the variety and complexity of the images in the LVIS dataset and the benefits of using mosaicing during the training process.
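
The core idea of mosaicing can be sketched in a few lines. This is a deliberately simplified illustration, not the Ultralytics implementation: it assumes four equally sized grayscale images stored as nested lists and tiles them on a fixed 2x2 grid, whereas a real augmentation pipeline would typically also randomize the mosaic center, rescale, and crop. The names `mosaic_2x2` and `shift_box` are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale image as rows of pixel values
Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


def mosaic_2x2(tiles: List[Image]) -> Image:
    """Tile four equally sized images into one 2x2 mosaic."""
    if len(tiles) != 4:
        raise ValueError("expected exactly four images")
    height, width = len(tiles[0]), len(tiles[0][0])
    for tile in tiles:
        if len(tile) != height or any(len(row) != width for row in tile):
            raise ValueError("all images must share the same shape")
    top_left, top_right, bottom_left, bottom_right = tiles
    # Concatenate rows side by side, then stack the two halves vertically.
    top = [left + right for left, right in zip(top_left, top_right)]
    bottom = [left + right for left, right in zip(bottom_left, bottom_right)]
    return top + bottom


def shift_box(box: Box, tile_index: int, height: int, width: int) -> Box:
    """Move a box from tile coordinates into mosaic coordinates."""
    dx = width if tile_index in (1, 3) else 0  # right-hand column
    dy = height if tile_index in (2, 3) else 0  # bottom row
    x1, y1, x2, y2 = box
    return (x1 + dx, y1 + dy, x2 + dx, y2 + dy)
```

Labels have to move with the pixels, which is why the sketch pairs the tiling function with a box-shifting helper.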

## Citations and Acknowledgments

If you use the LVIS dataset in your research or development work, please cite the following paper:

!!! Quote ""

=== "BibTeX"

```bibtex
@inproceedings{gupta2019lvis,
title={{LVIS}: A Dataset for Large Vocabulary Instance Segmentation},
author={Gupta, Agrim and Dollar, Piotr and Girshick, Ross},
booktitle={Proceedings of the {IEEE} Conference on Computer Vision and Pattern Recognition},
year={2019}
}
```

We would like to acknowledge the LVIS Consortium for creating and maintaining this valuable resource for the computer vision community. For more information about the LVIS dataset and its creators, visit the [LVIS dataset website](https://www.lvisdataset.org/dataset).
1 change: 1 addition & 0 deletions docs/en/datasets/index.md
@@ -36,6 +36,7 @@ Bounding box object detection is a computer vision technique that involves detec

- [Argoverse](detect/argoverse.md): A dataset containing 3D tracking and motion forecasting data from urban environments with rich annotations.
- [COCO](detect/coco.md): A large-scale dataset designed for object detection, segmentation, and captioning with over 200K labeled images.
- [LVIS](detect/lvis.md): A large-scale object detection, segmentation, and captioning dataset with 1203 object categories.
- [COCO8](detect/coco8.md): Contains the first 4 images from COCO train and COCO val, suitable for quick tests.
- [Global Wheat 2020](detect/globalwheat2020.md): A dataset of wheat head images collected from around the world for object detection and localization tasks.
- [Objects365](detect/objects365.md): A high-quality, large-scale dataset for object detection with 365 object categories and over 600K annotated images.
2 changes: 1 addition & 1 deletion docs/en/models/fast-sam.md
@@ -147,7 +147,7 @@ FastSAM is also available directly from the [https://github.com/CASIA-IVA-Lab/Fa

4. Install the CLIP model:
```shell
pip install git+https://github.com/openai/CLIP.git
pip install git+https://github.com/ultralytics/CLIP.git
```

### Example Usage
86 changes: 86 additions & 0 deletions docs/en/models/yolo-world.md
@@ -64,6 +64,39 @@ This section details the models available with their specific pre-trained weight

The YOLO-World models are easy to integrate into your Python applications. Ultralytics provides user-friendly Python API and CLI commands to streamline development.

### Train Usage

!!! Tip "Tip"

We strongly recommend using the `yolov8-worldv2` model for custom training, because it supports deterministic training and is also easy to export to other formats, e.g. ONNX/TensorRT.

Training is straightforward with the `train` method, as illustrated below:

!!! Example

=== "Python"
PyTorch pretrained `*.pt` models as well as configuration `*.yaml` files can be passed to the `YOLOWorld()` class to create a model instance in Python:

```python
from ultralytics import YOLOWorld

# Load a pretrained YOLOv8s-worldv2 model
model = YOLOWorld('yolov8s-worldv2.pt')

# Train the model on the COCO8 example dataset for 100 epochs
results = model.train(data='coco8.yaml', epochs=100, imgsz=640)

# Run inference with the trained YOLOv8s-worldv2 model on the 'bus.jpg' image
results = model('path/to/bus.jpg')
```

=== "CLI"

```bash
# Build a YOLOv8s-worldv2 model from its YAML config and train it on the COCO8 example dataset for 100 epochs
yolo train model=yolov8s-worldv2.yaml data=coco8.yaml epochs=100 imgsz=640
```

### Predict Usage

Object detection is straightforward with the `predict` method, as illustrated below:
@@ -196,6 +229,59 @@ You can also save a model after setting custom classes. By doing this you create

This approach provides a powerful means of customizing state-of-the-art object detection models for specific tasks, making advanced AI more accessible and applicable to a broader range of practical applications.

## Reproduce official results from scratch (Experimental)

### Prepare datasets

- Train data

| Dataset | Type | Samples | Boxes | Annotation Files |
|-------------------------------------------------------------------|-----------|---------|-------|--------------------------------------------------------------------------------------------------------------------------------------------|
| [Objects365v1](https://opendatalab.com/OpenDataLab/Objects365_v1) | Detection | 609k | 9621k | [objects365_train.json](https://opendatalab.com/OpenDataLab/Objects365_v1) |
| [GQA](https://nlp.stanford.edu/data/gqa/images.zip) | Grounding | 621k | 3681k | [final_mixed_train_no_coco.json](https://huggingface.co/GLIPModel/GLIP/blob/main/mdetr_annotations/final_mixed_train_no_coco.json) |
| [Flickr30k](https://shannon.cs.illinois.edu/DenotationGraph/) | Grounding | 149k | 641k | [final_flickr_separateGT_train.json](https://huggingface.co/GLIPModel/GLIP/blob/main/mdetr_annotations/final_flickr_separateGT_train.json) |

- Val data

| Dataset | Type | Annotation Files |
|---------------------------------------------------------------------------------------------------------|-----------|--------------------------------------------------------------------------------------------------------|
| [LVIS minival](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/lvis.yaml) | Detection | [minival.txt](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/lvis.yaml) |

### Launch training from scratch

!!! Note

`WorldTrainerFromScratch` is highly customized to allow training YOLO-World models on detection datasets and grounding datasets simultaneously. For more details, see [ultralytics.models.yolo.world.train_world.py](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/models/yolo/world/train_world.py).

!!! Example

=== "Python"

```python
from ultralytics.models.yolo.world.train_world import WorldTrainerFromScratch
from ultralytics import YOLOWorld

data = dict(
    train=dict(
        yolo_data=["Objects365.yaml"],
        grounding_data=[
            dict(
                img_path="../datasets/flickr30k/images",
                json_file="../datasets/flickr30k/final_flickr_separateGT_train.json",
            ),
            dict(
                img_path="../datasets/GQA/images",
                json_file="../datasets/GQA/final_mixed_train_no_coco.json",
            ),
        ],
    ),
    val=dict(yolo_data=["lvis.yaml"]),
)
model = YOLOWorld("yolov8s-worldv2.yaml")
model.train(data=data, batch=128, epochs=100, trainer=WorldTrainerFromScratch)

```
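
Because the `data` argument here is a nested dictionary rather than a single YAML path, it can be worth sanity-checking its shape before launching a long run. The sketch below is a hypothetical pre-flight helper, `check_world_data`, written only against the structure shown in the example above; it is not part of Ultralytics, and the trainer's own validation may differ.

```python
def check_world_data(data: dict) -> list:
    """Return a list of problems found in a mixed detection/grounding data dict."""
    problems = []
    for split in ("train", "val"):
        if split not in data:
            problems.append(f"missing '{split}' split")
            continue
        cfg = data[split]
        # Each split is assumed to need at least one data source.
        if not cfg.get("yolo_data") and not cfg.get("grounding_data"):
            problems.append(f"'{split}' has neither yolo_data nor grounding_data")
        # Each grounding entry is assumed to pair an image folder with a JSON file.
        for i, item in enumerate(cfg.get("grounding_data", [])):
            for key in ("img_path", "json_file"):
                if key not in item:
                    problems.append(f"{split}.grounding_data[{i}] is missing '{key}'")
    return problems


example = dict(
    train=dict(
        yolo_data=["Objects365.yaml"],
        grounding_data=[dict(img_path="../datasets/GQA/images")],  # json_file omitted on purpose
    ),
    val=dict(yolo_data=["lvis.yaml"]),
)
print(check_world_data(example))
```

Returning a list of messages instead of raising on the first issue lets all configuration mistakes be reported in one pass.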

## Citations and Acknowledgements

We extend our gratitude to the [Tencent AILab Computer Vision Center](https://ai.tencent.com/) for their pioneering work in real-time open-vocabulary object detection with YOLO-World:
4 changes: 4 additions & 0 deletions docs/en/reference/data/augment.md
@@ -59,6 +59,10 @@ keywords: Ultralytics, Data Augmentation, BaseTransform, MixUp, RandomHSV, Lette

<br><br>

## ::: ultralytics.data.augment.RandomLoadText

<br><br>

## ::: ultralytics.data.augment.ClassifyLetterBox

<br><br>
4 changes: 4 additions & 0 deletions docs/en/reference/data/build.md
@@ -27,6 +27,10 @@ keywords: Ultralytics, YOLO v3, Data build, DataLoader, InfiniteDataLoader, seed

<br><br>

## ::: ultralytics.data.build.build_grounding

<br><br>

## ::: ultralytics.data.build.build_dataloader

<br><br>
10 changes: 7 additions & 3 deletions docs/en/reference/data/dataset.md
@@ -19,14 +19,18 @@ keywords: Ultralytics, YOLO, YOLODataset, SemanticDataset, data handling, data m

<br><br>

## ::: ultralytics.data.dataset.SemanticDataset
## ::: ultralytics.data.dataset.YOLOMultiModalDataset

<br><br>

## ::: ultralytics.data.dataset.load_dataset_cache_file
## ::: ultralytics.data.dataset.GroundingDataset

<br><br>

## ::: ultralytics.data.dataset.save_dataset_cache_file
## ::: ultralytics.data.dataset.YOLOConcatDataset

<br><br>

## ::: ultralytics.data.dataset.SemanticDataset

<br><br>
8 changes: 8 additions & 0 deletions docs/en/reference/data/utils.md
@@ -66,3 +66,11 @@ keywords: Ultralytics, data utils, YOLO, img2label_paths, exif_size, polygon2mas
## ::: ultralytics.data.utils.autosplit

<br><br>

## ::: ultralytics.data.utils.load_dataset_cache_file

<br><br>

## ::: ultralytics.data.utils.save_dataset_cache_file

<br><br>
15 changes: 15 additions & 0 deletions docs/en/reference/models/yolo/world/train.md
@@ -0,0 +1,15 @@
# Reference for `ultralytics/models/yolo/world/train.py`

!!! Note

This file is available at [https://github.com/ultralytics/ultralytics/blob/main/ultralytics/models/yolo/world/train.py](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/models/yolo/world/train.py). If you spot a problem please help fix it by [contributing](https://docs.ultralytics.com/help/contributing/) a [Pull Request](https://github.com/ultralytics/ultralytics/edit/main/ultralytics/models/yolo/world/train.py) 🛠️. Thank you 🙏!

<br><br>

## ::: ultralytics.models.yolo.world.train.WorldTrainer

<br><br>

## ::: ultralytics.models.yolo.world.train.on_pretrain_routine_end

<br><br>
11 changes: 11 additions & 0 deletions docs/en/reference/models/yolo/world/train_world.md
@@ -0,0 +1,11 @@
# Reference for `ultralytics/models/yolo/world/train_world.py`

!!! Note

This file is available at [https://github.com/ultralytics/ultralytics/blob/main/ultralytics/models/yolo/world/train_world.py](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/models/yolo/world/train_world.py). If you spot a problem please help fix it by [contributing](https://docs.ultralytics.com/help/contributing/) a [Pull Request](https://github.com/ultralytics/ultralytics/edit/main/ultralytics/models/yolo/world/train_world.py) 🛠️. Thank you 🙏!

<br><br>

## ::: ultralytics.models.yolo.world.train_world.WorldTrainerFromScratch

<br><br>
1 change: 1 addition & 0 deletions docs/mkdocs_github_authors.yaml
@@ -18,6 +18,7 @@ chr043416@gmail.com: RizwanMunawar
glenn.jocher@ultralytics.com: glenn-jocher
muhammadrizwanmunawar123@gmail.com: RizwanMunawar
not.committed.yet: null
plashchynski@gmail.com: plashchynski
priytosh.revolution@live.com: priytosh-tripathi
shuizhuyuanluo@126.com: null
xinwang614@gmail.com: GreatV
4 changes: 4 additions & 0 deletions mkdocs.yml
@@ -240,6 +240,7 @@ nav:
- datasets/detect/index.md
- Argoverse: datasets/detect/argoverse.md
- COCO: datasets/detect/coco.md
- LVIS: datasets/detect/lvis.md
- COCO8: datasets/detect/coco8.md
- GlobalWheat2020: datasets/detect/globalwheat2020.md
- Objects365: datasets/detect/objects365.md
@@ -492,6 +493,9 @@ nav:
- predict: reference/models/yolo/segment/predict.md
- train: reference/models/yolo/segment/train.md
- val: reference/models/yolo/segment/val.md
- world:
- train: reference/models/yolo/world/train.md
- train_world: reference/models/yolo/world/train_world.md
- nn:
- autobackend: reference/nn/autobackend.md
- modules:
26 changes: 26 additions & 0 deletions tests/test_python.py
@@ -643,3 +643,29 @@ def test_yolo_world():
model = YOLO("yolov8s-world.pt") # no YOLOv8n-world model yet
model.set_classes(["tree", "window"])
model(ASSETS / "bus.jpg", conf=0.01)

# Training from yaml
model = YOLO("yolov8s-worldv2.yaml") # no YOLOv8n-world model yet
model.train(data="coco8.yaml", epochs=2, imgsz=32, cache="disk", batch=-1, close_mosaic=1, name="yolo-world")

model = YOLO("yolov8s-worldv2.pt") # no YOLOv8n-world model yet
# val
model.val(data="coco8.yaml", imgsz=32, save_txt=True, save_json=True)
# Training from pretrain
model.train(data="coco8.yaml", epochs=2, imgsz=32, cache="disk", batch=-1, close_mosaic=1, name="yolo-world")

# test WorldTrainerFromScratch
from ultralytics.models.yolo.world.train_world import WorldTrainerFromScratch

model = YOLO("yolov8s-worldv2.yaml") # no YOLOv8n-world model yet
data = dict(train=dict(yolo_data=["coco8.yaml"]), val=dict(yolo_data=["coco8.yaml"]))
model.train(
    data=data,
    epochs=2,
    imgsz=32,
    cache="disk",
    batch=-1,
    close_mosaic=1,
    name="yolo-world",
    trainer=WorldTrainerFromScratch,
)
2 changes: 1 addition & 1 deletion ultralytics/__init__.py
@@ -1,6 +1,6 @@
# Ultralytics YOLO 🚀, AGPL-3.0 license

__version__ = "8.1.38"
__version__ = "8.1.39"

from ultralytics.data.explorer.explorer import Explorer
from ultralytics.models import RTDETR, SAM, YOLO, YOLOWorld