Training on two consecutive images in the context of spatio-temporal learning #8920
Comments
👋 Hello @kristinatel, thank you for your interest in YOLOv5 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.
If this is a 🐛 Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we can not help you.
If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available.
For business inquiries or professional support requests please visit https://ultralytics.com or email support@ultralytics.com.

Requirements
Python>=3.7.0 with all requirements.txt installed including PyTorch>=1.7. To get started:
git clone https://github.com/ultralytics/yolov5  # clone
cd yolov5
pip install -r requirements.txt  # install

Environments
YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status
If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), validation (val.py), inference (detect.py) and export (export.py) on macOS, Windows, and Ubuntu every 24 hours and on every commit. |
You don't need to touch EDIT: This idea doesn't work. I changed the dataloader to make it work ... |
I'm trying to do something similar. Working with more than 3 channels is apparently something of a mess. It's not just the dataloader; the 3-channel limit is sprinkled throughout the code. I've gotten as far as allowing training on >3 channels, but I can't currently get inference working. The hubconf.py custom loader doesn't like custom input configurations. |
I pulled down a fresh copy of the repo and moved in my custom dataloader. Adding "ch: 5" to the .yaml file did allow me to train 5 channel images pretty easily. However, later running inference with a trained model is a problem. One change I've had to make is here. Line 618 in e83b422
The code forces my 5 channel input into 3 channels, which errors out because the model is expecting a 5 channel input. |
Aha, I see it. Did you manage to solve the problem by removing this line or changing it somehow? |
In my case, I can change it to this and make it work, but I think there's a bigger problem that I'm trying to pinpoint. I don't think the model output is the correct size, which is affecting the bounding boxes it returns and the confidence scores. This needs to bring in the number of channels into the function, but I don't have time at the moment to suggest the best way to do that. |
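The line referenced above is not reproduced in this thread, so the following is only an illustrative sketch of the idea in the last comment: pass the channel count into the preprocessing step instead of hard-coding 3. The helper name `enforce_channels` and its `ch` parameter are hypothetical and are not part of YOLOv5.

```python
import numpy as np


def enforce_channels(im: np.ndarray, ch: int = 3) -> np.ndarray:
    """Return an (H, W, ch) view of `im`. Hypothetical helper, not a YOLOv5 function."""
    if im.ndim == 2:
        # Single-channel image: add a trailing channel axis.
        im = im[..., None]
    if im.shape[-1] < ch:
        raise ValueError(f"expected at least {ch} channels, got {im.shape[-1]}")
    # Keep the first `ch` channels instead of always keeping 3.
    return im[..., :ch]


five_band = np.zeros((960, 960, 5), dtype=np.uint8)
print(enforce_channels(five_band, ch=5).shape)  # all 5 channels are kept
print(enforce_channels(five_band, ch=3).shape)  # truncated to 3 channels
```

With `ch` taken from the model configuration, a 5-channel array would reach the model unchanged, whereas the default of 3 reproduces the truncation described above.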
Ok, I'm looking at this line. Line 632 in e83b422
For my task, I'm passing the system numpy arrays of [960,960,5] and 12 classes. I printed the shape of y as it comes from the model on line 632. It's returning an array of [1,25500,17], which I think could only be from a [640,640,3] input (but I could be mistaken). I'm getting the size information from #8554 |
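As a rough way to sanity-check an output shape like the one above, the number of prediction rows can be estimated from the input size. This is a sketch under assumptions not stated in the comment: three detection scales with strides 8, 16 and 32, three anchors per scale, and an input size divisible by 32. The function names are made up for illustration.

```python
def num_predictions(height, width, strides=(8, 16, 32), anchors_per_scale=3):
    """Count prediction rows for a head with one grid per stride (assumed layout)."""
    if any(height % s or width % s for s in strides):
        raise ValueError("height and width must be divisible by every stride")
    return sum(anchors_per_scale * (height // s) * (width // s) for s in strides)


def values_per_prediction(num_classes):
    """4 box coordinates + 1 objectness score + one score per class."""
    return 5 + num_classes


print(num_predictions(640, 640))   # 25200
print(num_predictions(960, 960))   # 56700
print(values_per_prediction(12))   # 17
```

Under these assumptions the last dimension of 17 matches 12 classes, a 640x640 input gives 25200 rows, and a 960x960 input gives 56700, so a row count near 25000 is much closer to a 640x640 input than to a 960x960 one.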
Are you sure that is training in the number of bands/channels that you want? if you check |
I wrote a custom data loader that creates [960,960,5] arrays. It replaces |
I have changed my @mullenba: I set a print statement at this line at |
@glenn-jocher I don't know if 3 channel limit is sprinkled through the code or not ... |
@pourmand1376 Does it say it's training, but you get 0 mAP afterwards when you test? Did you check that your model is using all of the channels? Here's another point forcing 3 channels: Line 130 in f0e5a60
|
@mullenba Yes. The model trains to completion; however, it never reaches an acceptable result. I have also had some training runs with an mAP of 0.0001, if that helps. Also, I checked the code that you sent; it is actually ignored later: Line 162 in f0e5a60
I didn't check that the model is using all channels. How should I check that? For the record, this is my model's summary:
python models/yolo.py --cfg models/yolov5s.yaml --batch 10 --device 0 --profile
models/yolo: cfg=models/yolov5s.yaml, batch_size=10, device=0, profile=True, line_profile=False, test=False
YOLOv5 🚀 v6.2-163-gf19d0634 Python-3.9.12 torch-1.12.1+cu102 CUDA:0 (Quadro RTX 8000, 48601MiB)
from n params module arguments
0 -1 1 8128 models.common.Conv [7, 32, 6, 2, 2]
1 -1 1 18560 models.common.Conv [32, 64, 3, 2]
2 -1 1 18816 models.common.C3 [64, 64, 1]
3 -1 1 73984 models.common.Conv [64, 128, 3, 2]
4 -1 2 115712 models.common.C3 [128, 128, 2]
5 -1 1 295424 models.common.Conv [128, 256, 3, 2]
6 -1 3 625152 models.common.C3 [256, 256, 3]
7 -1 1 1180672 models.common.Conv [256, 512, 3, 2]
8 -1 1 1182720 models.common.C3 [512, 512, 1]
9 -1 1 656896 models.common.SPPF [512, 512, 5]
10 -1 1 131584 models.common.Conv [512, 256, 1, 1]
11 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
12 [-1, 6] 1 0 models.common.Concat [1]
13 -1 1 361984 models.common.C3 [512, 256, 1, False]
14 -1 1 33024 models.common.Conv [256, 128, 1, 1]
15 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
16 [-1, 4] 1 0 models.common.Concat [1]
17 -1 1 90880 models.common.C3 [256, 128, 1, False]
18 -1 1 147712 models.common.Conv [128, 128, 3, 2]
19 [-1, 14] 1 0 models.common.Concat [1]
20 -1 1 296448 models.common.C3 [256, 256, 1, False]
21 -1 1 590336 models.common.Conv [256, 256, 3, 2]
22 [-1, 10] 1 0 models.common.Concat [1]
23 -1 1 1182720 models.common.C3 [512, 512, 1, False]
24 [17, 20, 23] 1 229245 Detect [80, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [128, 256, 512]]
YOLOv5s summary: 270 layers, 7239997 parameters, 7239997 gradients, 17.6 GFLOPs

This is also original yolov5s.yaml summary:
models/yolo: cfg=models/yolov5s.yaml, batch_size=10, device=, profile=True, line_profile=False, test=False
YOLOv5 🚀 v6.2-53-gf0e5a60 Python-3.7.13 torch-1.12.1+cu113 CUDA:0 (Tesla T4, 15110MiB)
from n params module arguments
0 -1 1 3520 models.common.Conv [3, 32, 6, 2, 2]
1 -1 1 18560 models.common.Conv [32, 64, 3, 2]
2 -1 1 18816 models.common.C3 [64, 64, 1]
3 -1 1 73984 models.common.Conv [64, 128, 3, 2]
4 -1 2 115712 models.common.C3 [128, 128, 2]
5 -1 1 295424 models.common.Conv [128, 256, 3, 2]
6 -1 3 625152 models.common.C3 [256, 256, 3]
7 -1 1 1180672 models.common.Conv [256, 512, 3, 2]
8 -1 1 1182720 models.common.C3 [512, 512, 1]
9 -1 1 656896 models.common.SPPF [512, 512, 5]
10 -1 1 131584 models.common.Conv [512, 256, 1, 1]
11 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
12 [-1, 6] 1 0 models.common.Concat [1]
13 -1 1 361984 models.common.C3 [512, 256, 1, False]
14 -1 1 33024 models.common.Conv [256, 128, 1, 1]
15 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
16 [-1, 4] 1 0 models.common.Concat [1]
17 -1 1 90880 models.common.C3 [256, 128, 1, False]
18 -1 1 147712 models.common.Conv [128, 128, 3, 2]
19 [-1, 14] 1 0 models.common.Concat [1]
20 -1 1 296448 models.common.C3 [256, 256, 1, False]
21 -1 1 590336 models.common.Conv [256, 256, 3, 2]
22 [-1, 10] 1 0 models.common.Concat [1]
23 -1 1 1182720 models.common.C3 [512, 512, 1, False]
24 [17, 20, 23] 1 229245 Detect [80, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [128, 256, 512]]
YOLOv5s summary: 270 layers, 7235389 parameters, 7235389 gradients, 16.6 GFLOPs

From above, it is clear that model architecture is not problematic. However there is something wrong when training this beast ... |
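The two summaries differ only in layer 0, and the difference can be checked by hand. The sketch below assumes that `models.common.Conv` is a bias-free convolution followed by batch normalization, so each output channel adds one batch-norm weight and one batch-norm bias; the function name is made up for illustration.

```python
def conv_bn_params(c_in, c_out, k):
    """Parameters of a bias-free k x k convolution plus batch norm (weight and bias)."""
    return c_in * c_out * k * k + 2 * c_out


seven_ch = conv_bn_params(7, 32, 6)  # first layer with 7 input channels
three_ch = conv_bn_params(3, 32, 6)  # first layer with 3 input channels

print(seven_ch, three_ch, seven_ch - three_ch)  # 8128 3520 4608
print(7239997 - 7235389)                        # 4608, difference of the two totals
```

Both per-layer counts match the summaries (8128 and 3520), and the gap of 4608 equals the difference between the two model totals, which suggests the 7-channel model really does allocate first-layer weights for all 7 input channels.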
Ok, when you're running the model for validation, how are you initializing it? For example, here's mine.
|
Actually I haven't changed the code here. I am using standard syntax to load custom yaml model (Reference). Something like this: python train.py \
--img-size 512 \
--weights /mnt/new_ssd/projects/Anevrism/Models/pourmand/yolov5/runs/train/exp141/weights/last.pt \
--data /mnt/new_ssd/projects/Anevrism/Data/brain_cta/output_folder/database.yaml \
--hyp data/hyps/hyp.aneurisym.yaml \
--epochs 200 --batch-size $(batch) --device 0 --save-period 5 --workers 2 \
--cfg models/yolov5s.yaml \ |
You're taking a different approach to running the model than I am, so I'm not quite sure what the issue could be. Have you checked to see if it's getting the classifications correct? There could be a situation (like I'm currently dealing with) where it's detecting the classes in the image, but the bounding boxes aren't very good. |
That's right. Mine is the opposite. My model detects bboxes very well, but it doesn't detect classes. |
@pourmand1376 I made 6 channel tiff files out of my image pairs, added
So as @Camilochiang said, OpenCV does complain. Is there a workaround for this? @mullenba Could you tell us how you modified your dataloader and made the training work? I only wish to train with 6-channel image pairs but run inference regularly on 3-channel images. |
Tiff method doesn't work. I changed the dataloader to return a [channel_count, ...] array. You can basically do that by renaming the original method and wrapping it:
class DataLoader():
    def getitem_pre(self, index):
        # YOLOv5's original __getitem__ method, renamed
        ...

    def __getitem__(self, index):
        # note: index - 1 and index + 1 need bounds handling at the ends of the dataset
        previous_image = self.getitem_pre(index - 1)
        current_image = self.getitem_pre(index)
        after_image = self.getitem_pre(index + 1)
        # then combine the previous, current and next images however you want
        # and return the combined result
        ... |
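The wrapping idea above can be sketched as a small self-contained class. This is not YOLOv5 code: `StackedFrames` is a made-up stand-in that holds frames in memory as (channels, height, width) arrays, clamps the neighbour indices at both ends of the sequence, and stacks the three frames along the channel axis. Label handling is left out.

```python
import numpy as np


class StackedFrames:
    """Return each frame together with its previous and next neighbours."""

    def __init__(self, frames):
        self.frames = frames  # list of (C, H, W) arrays

    def __len__(self):
        return len(self.frames)

    def getitem_pre(self, index):
        # Stand-in for the original single-image loading method.
        return self.frames[index]

    def __getitem__(self, index):
        prev_i = max(index - 1, 0)              # clamp at the first frame
        next_i = min(index + 1, len(self) - 1)  # clamp at the last frame
        parts = [self.getitem_pre(i) for i in (prev_i, index, next_i)]
        return np.concatenate(parts, axis=0)    # stack along the channel axis


frames = [np.full((3, 4, 4), i, dtype=np.uint8) for i in range(5)]
ds = StackedFrames(frames)
print(ds[0].shape)  # three 3-channel frames give 9 channels
```

Clamping repeats the first or last frame at the boundaries; dropping those samples instead would be an equally reasonable choice.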
@kristinatel I created a custom dataset similar to this. I then changed the utils.dataloaders file to call my new dataset here, where instead of calling LoadImagesAndLabels, I call my own dataset. Line 122 in 1aea74c
|
My data has 8 channels. I replaced the dataset in yolov5 with a self-made dataset and modified the few places that raised errors because the channel count was hard-coded to 3, but the accuracy was very low during training: after 10 epochs it was 0. Could you tell me how you modified the data preprocessing steps? |
@xiaoche-24 I suppose the preprocessing steps will depend on the nature of your dataset. My use case was that I was trying to concatenate every 2 consecutive images in my dataset and use the annotations of the second image only. First I added the
After changing the
Since the order of my images matters I also made sure to sort my self.im_files numerically before they are loaded. And finally in train.py I set augment and shuffle to false, as well as rect to false in the val_loader. |
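Sorting file names numerically, as mentioned above, matters because a plain string sort puts `frame10` before `frame2`. A minimal sketch of one way to do it, using made-up file names and a hypothetical key function `numeric_key`:

```python
import re


def numeric_key(path):
    """Split a name into text and integer parts so 'frame10' sorts after 'frame2'."""
    return [int(tok) if tok.isdigit() else tok for tok in re.split(r"(\d+)", path)]


im_files = ["frame10.jpg", "frame2.jpg", "frame1.jpg"]
print(sorted(im_files))                   # ['frame1.jpg', 'frame10.jpg', 'frame2.jpg']
print(sorted(im_files, key=numeric_key))  # ['frame1.jpg', 'frame2.jpg', 'frame10.jpg']
```

The same key could be applied to a list such as `self.im_files` before the images are loaded, so that consecutive indices correspond to consecutive frames.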
Basically, I took two images and stacked them together into a (5,960,960) array (I'm working with 5 channels). The big issue is that the default dataloader isn't terribly well documented, and it isn't clear when it changes array format (it sometimes bounces between (channels, height, width) and (height, width, channels)). I found that if you aren't careful, you can sometimes pass in an array with the wrong shape and it will still run. Also, if you had to make any changes to the portion of the dataloader that pulls labels and bounding boxes, make sure you have that in the right format too. The code bounces between (x1,y1,x2,y2) and (x,y,w,h) in different places, so if you have the wrong format you'll end up with the zero accuracy issue too. |
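The two format pitfalls described above can be made explicit with small helpers. This is a sketch, not YOLOv5 code: the function names are made up, and (x, y, w, h) is assumed to mean box centre plus width and height.

```python
import numpy as np


def hwc_to_chw(im):
    """(height, width, channels) -> (channels, height, width)."""
    return np.ascontiguousarray(im.transpose(2, 0, 1))


def check_chw(im, channels):
    """Fail loudly instead of silently running with a wrongly shaped array."""
    if im.ndim != 3 or im.shape[0] != channels:
        raise ValueError(f"expected ({channels}, H, W), got {im.shape}")
    return im


def xyxy_to_xywh(box):
    """Corner format (x1, y1, x2, y2) -> centre format (x, y, w, h)."""
    x1, y1, x2, y2 = box
    return [(x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1]


def xywh_to_xyxy(box):
    """Centre format (x, y, w, h) -> corner format (x1, y1, x2, y2)."""
    x, y, w, h = box
    return [x - w / 2, y - h / 2, x + w / 2, y + h / 2]


stacked = hwc_to_chw(np.zeros((960, 960, 5), dtype=np.uint8))
print(check_chw(stacked, channels=5).shape)  # channel axis is now first
print(xywh_to_xyxy(xyxy_to_xywh([10, 20, 30, 60])))  # round trip back to the corners
```

Calling a check like `check_chw` at the end of a custom `__getitem__` is one way to catch a transposed array before it reaches the model.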
@kristinatel I successfully ran through the 6-channel training code based on your method, but an error was reported during inference. How can I modify the detect.py? |
To run inference you will need to also modify
That is if you give detect.py 2 images at a time; if you give it more images, make sure to sort them before they are loaded. Regarding detect.py, I believe it will give an error at |
May I ask you what method you used to generate this array, and how this 5-channel array and images and labels were called during training respectively? |
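What format is your dataset in? |
@wq247726404 Hi! First of all, thank you for the detailed description. The way YOLOv5 loads and processes images and labels can differ depending on the dataset structure. My earlier approach was to stack two images into a (5,960,960) array and modify the loader function accordingly; I used a self-made dataset and called the images and labels separately during training. Our official documentation has some description of the dataset format, which you can refer to: https://docs.ultralytics.com/yolov5/training/data/. Hope this helps! |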
Question
Hi! I am trying to train a model with two consecutive images (frames) rather than one, i.e. a tensor of size WxHx6 rather than WxHx3, and use the label file of the most recent frame. I learned from previous issues that while I can create as many channels as I want in the model yaml, the dataloaders are constricted to 3 channels. I am not sure where to begin with modifying the dataloaders, do you perhaps have some tips or are able to point me in the right direction?
Thank you!