
Training on two consecutive images in the context of spatio-temporal learning #8920

Closed · 1 task done
kristinatel opened this issue Aug 10, 2022 · 30 comments
Labels: question (Further information is requested), Stale

Comments

@kristinatel

Search before asking

Question

Hi! I am trying to train a model on two consecutive images (frames) rather than one, i.e. a tensor of size WxHx6 rather than WxHx3, using the label file of the most recent frame. I learned from previous issues that while I can specify as many channels as I want in the model yaml, the dataloaders are constrained to 3 channels. I am not sure where to begin with modifying the dataloaders; do you perhaps have some tips, or are you able to point me in the right direction?

Thank you!

Additional

No response

kristinatel added the question (Further information is requested) label Aug 10, 2022
github-actions bot (Contributor) commented Aug 10, 2022

👋 Hello @kristinatel, thank you for your interest in YOLOv5 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we cannot help you.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available.

For business inquiries or professional support requests please visit https://ultralytics.com or email support@ultralytics.com.

Requirements

Python>=3.7.0 with all requirements.txt installed including PyTorch>=1.7. To get started:

git clone https://github.com/ultralytics/yolov5  # clone
cd yolov5
pip install -r requirements.txt  # install

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

CI CPU testing

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), validation (val.py), inference (detect.py) and export (export.py) on macOS, Windows, and Ubuntu every 24 hours and on every commit.

@pourmand1376 (Contributor) commented Aug 10, 2022

You don't need to touch the dataloaders. Just create a simple Python script that iterates over your images and creates a 6-channel TIFF file out of each pair of images. I suppose that shouldn't require changing YOLO and should work just fine.

EDIT: This idea doesn't work. I changed the dataloader to make it work ...
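
For reference, the stacking script might have looked like this — a minimal sketch, assuming the tifffile package and hypothetical paired filenames; per the EDIT above, OpenCV's reader later rejects the 6-channel result:

import cv2
import numpy as np
import tifffile

im1 = cv2.imread('frame_000.jpg')          # (H, W, 3) BGR
im2 = cv2.imread('frame_001.jpg')          # (H, W, 3) BGR, next frame
pair = np.concatenate((im1, im2), axis=2)  # (H, W, 6)
tifffile.imwrite('pair_000.tif', pair)     # write 6-channel TIFF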

@mullenba

I'm trying to do something similar. Working with more than 3 channels is something of a mess, apparently. It's not just the dataloader; the 3-channel limit is sprinkled throughout the code.

I've gotten as far as allowing training on >3 channels, but I can't currently get inference working. The hubconf.py custom loader doesn't like custom input configurations.

@pourmand1376 (Contributor)

Have you seen #1739? This comment claims to resolve the issue.

I also have a project which needs multi-channel input. I will test it soon.

@mullenba

I pulled down a fresh copy of the repo and moved in my custom dataloader. Adding "ch: 5" to the .yaml file did allow me to train on 5-channel images pretty easily. However, running inference with a trained model afterwards is a problem.

One change I've had to make is here.

im = im[..., :3] if im.ndim == 3 else np.tile(im[..., None], 3) # enforce 3ch input

This line forces my 5-channel input down to 3 channels, which then errors out because the model expects a 5-channel input.

@pourmand1376 (Contributor)

Aha, I see it. Did you manage to solve the problem by removing this line or changing it somehow?

@mullenba

In my case, I can change it to this and make it work:

im = im[..., :5] if im.ndim == 3 else np.tile(im[..., None], 3)  # allow my 5-channel input

but I think there's a bigger problem that I'm trying to pinpoint: I don't think the model output is the correct size, which is affecting the bounding boxes it returns and the confidence scores. This function needs to take the number of channels as an argument, but I don't have time at the moment to suggest the best way to do that.
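
A channel-agnostic variant might read the expected count from the loaded model instead of hard-coding 3 or 5. A minimal sketch, assuming the parsed model yaml is reachable from AutoShape (this attribute path is an assumption, not a documented field):

ch = self.model.yaml.get('ch', 3)  # hypothetical: channel count from the model yaml
im = im[..., :ch] if im.ndim == 3 else np.tile(im[..., None], ch)  # enforce the model's channel count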

@mullenba

Ok, I'm looking at this line.

y = self.model(x, augment, profile) # forward

For my task, I'm passing the model numpy arrays of shape [960,960,5] with 12 classes. I printed the shape of y as it comes out of the model on line 632: it's an array of [1,25500,17], which I think could only come from a [640,640,3] input (but I could be mistaken).

I'm getting the size information from #8554

@Camilochiang commented Aug 19, 2022

> I'm trying to do something similar. Working with more than 3 channels is something of a mess, apparently. It's not just the dataloader; the 3-channel limit is sprinkled throughout the code.
>
> I've gotten as far as allowing training on >3 channels, but I can't currently get inference working. The hubconf.py custom loader doesn't like custom input configurations.

Are you sure it is training on the number of bands/channels that you want? If you check LoadImagesAndLabels you will see that load_image uses cv2.imread() for reading the images, so it may be ignoring the additional channels. You need to modify that function to make it work properly. I'm actually surprised that cv2 is not complaining about the images. Which format are you passing?
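
A quick way to check what cv2 actually returns (a short sketch with an example filename; by default imread converts everything to 3-channel BGR, and even IMREAD_UNCHANGED caps TIFF at 4 channels, per the error later in this thread):

import cv2

path = 'image.tiff'                          # example path
im = cv2.imread(path)                        # default flag: always (H, W, 3) BGR
print(im.shape)
im = cv2.imread(path, cv2.IMREAD_UNCHANGED)  # keeps alpha/extra channels where the codec allows
print(im.shape)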

@mullenba

I wrote a custom data loader that creates [960,960,5] arrays. It replaces LoadImagesAndLabels.

@pourmand1376 (Contributor) commented Aug 22, 2022

I have changed my __getitem__ function in LoadImagesAndLabels to return a [7,512,512] tensor. I have also set ch: 7 in the yolov5s.yaml file. However, I get 0 mAP when I run validation. Is there anything else I should do to make this work?

@mullenba: I put a print statement at this line in common.py, but it seems execution never reaches that point.

@pourmand1376 (Contributor)

@glenn-jocher
Do you have any ideas? I've been trying this for days and still don't have a single clue how to train this model successfully. Are there any tips for training models with more than three channels?

I don't know whether the 3-channel limit is sprinkled through the code or not ...

@mullenba

@pourmand1376 Does it say it's training, but you get 0 mAP when you test? Did you check that your model is using all of the channels? Here's another point forcing 3 channels:

yolov5/train.py

Line 130 in f0e5a60

model = Model(cfg, ch=3, nc=nc, anchors=hyp.get('anchors')).to(device) # create

@pourmand1376 (Contributor) commented Aug 24, 2022

@mullenba Yes. The model trains to completion, however it never reaches anything acceptable. I have also had some runs with an mAP of 0.0001, if that helps.

Also, I checked the code you linked; that ch=3 argument is actually overridden later in DetectionModel by:

ch = self.yaml['ch'] = self.yaml.get('ch', ch) # input channels

I didn't check that the model is using all channels. How should I check that?
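
One simple (hypothetical) check is to inspect the first convolution's trained weights per input channel; channels the model effectively ignores tend to show much smaller weight magnitudes. A sketch, run from the yolov5 repo root so the checkpoint unpickles (the weights path is an example):

import torch

ckpt = torch.load('runs/train/exp/weights/last.pt', map_location='cpu')  # example path
model = ckpt['model'].float()
w = model.model[0].conv.weight      # shape [32, 7, 6, 6] for ch: 7
print(w.abs().mean(dim=(0, 2, 3)))  # mean |weight| per input channel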

For the record, this is my model's summary:

08:47:32  |base|pourmand@user01 yolov5 ±|test_dataloader ✗|→ python models/yolo.py --cfg models/yolov5s.yaml --batch 10 --device 0 --profile
models/yolo: cfg=models/yolov5s.yaml, batch_size=10, device=0, profile=True, line_profile=False, test=False
YOLOv5 🚀 v6.2-163-gf19d0634 Python-3.9.12 torch-1.12.1+cu102 CUDA:0 (Quadro RTX 8000, 48601MiB)


                 from  n    params  module                                  arguments
  0                -1  1      8128  models.common.Conv                      [7, 32, 6, 2, 2]
  1                -1  1     18560  models.common.Conv                      [32, 64, 3, 2]
  2                -1  1     18816  models.common.C3                        [64, 64, 1]
  3                -1  1     73984  models.common.Conv                      [64, 128, 3, 2]
  4                -1  2    115712  models.common.C3                        [128, 128, 2]
  5                -1  1    295424  models.common.Conv                      [128, 256, 3, 2]
  6                -1  3    625152  models.common.C3                        [256, 256, 3]
  7                -1  1   1180672  models.common.Conv                      [256, 512, 3, 2]
  8                -1  1   1182720  models.common.C3                        [512, 512, 1]
  9                -1  1    656896  models.common.SPPF                      [512, 512, 5]
 10                -1  1    131584  models.common.Conv                      [512, 256, 1, 1]
 11                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']
 12           [-1, 6]  1         0  models.common.Concat                    [1]
 13                -1  1    361984  models.common.C3                        [512, 256, 1, False]
 14                -1  1     33024  models.common.Conv                      [256, 128, 1, 1]
 15                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']
 16           [-1, 4]  1         0  models.common.Concat                    [1]
 17                -1  1     90880  models.common.C3                        [256, 128, 1, False]
 18                -1  1    147712  models.common.Conv                      [128, 128, 3, 2]
 19          [-1, 14]  1         0  models.common.Concat                    [1]
 20                -1  1    296448  models.common.C3                        [256, 256, 1, False]
 21                -1  1    590336  models.common.Conv                      [256, 256, 3, 2]
 22          [-1, 10]  1         0  models.common.Concat                    [1]
 23                -1  1   1182720  models.common.C3                        [512, 512, 1, False]
 24      [17, 20, 23]  1    229245  Detect                                  [80, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [128, 256, 512]]
YOLOv5s summary: 270 layers, 7239997 parameters, 7239997 gradients, 17.6 GFLOPs

This is also the original yolov5s.yaml summary:

models/yolo: cfg=models/yolov5s.yaml, batch_size=10, device=, profile=True, line_profile=False, test=False
YOLOv5 🚀 v6.2-53-gf0e5a60 Python-3.7.13 torch-1.12.1+cu113 CUDA:0 (Tesla T4, 15110MiB)


                 from  n    params  module                                  arguments                     
  0                -1  1      3520  models.common.Conv                      [3, 32, 6, 2, 2]              
  1                -1  1     18560  models.common.Conv                      [32, 64, 3, 2]                
  2                -1  1     18816  models.common.C3                        [64, 64, 1]                   
  3                -1  1     73984  models.common.Conv                      [64, 128, 3, 2]               
  4                -1  2    115712  models.common.C3                        [128, 128, 2]                 
  5                -1  1    295424  models.common.Conv                      [128, 256, 3, 2]              
  6                -1  3    625152  models.common.C3                        [256, 256, 3]                 
  7                -1  1   1180672  models.common.Conv                      [256, 512, 3, 2]              
  8                -1  1   1182720  models.common.C3                        [512, 512, 1]                 
  9                -1  1    656896  models.common.SPPF                      [512, 512, 5]                 
 10                -1  1    131584  models.common.Conv                      [512, 256, 1, 1]              
 11                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 12           [-1, 6]  1         0  models.common.Concat                    [1]                           
 13                -1  1    361984  models.common.C3                        [512, 256, 1, False]          
 14                -1  1     33024  models.common.Conv                      [256, 128, 1, 1]              
 15                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 16           [-1, 4]  1         0  models.common.Concat                    [1]                           
 17                -1  1     90880  models.common.C3                        [256, 128, 1, False]          
 18                -1  1    147712  models.common.Conv                      [128, 128, 3, 2]              
 19          [-1, 14]  1         0  models.common.Concat                    [1]                           
 20                -1  1    296448  models.common.C3                        [256, 256, 1, False]          
 21                -1  1    590336  models.common.Conv                      [256, 256, 3, 2]              
 22          [-1, 10]  1         0  models.common.Concat                    [1]                           
 23                -1  1   1182720  models.common.C3                        [512, 512, 1, False]          
 24      [17, 20, 23]  1    229245  Detect                                  [80, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [128, 256, 512]]
YOLOv5s summary: 270 layers, 7235389 parameters, 7235389 gradients, 16.6 GFLOPs

From the above, it is clear that the model architecture is not the problem. However, something goes wrong when training this beast ...

@mullenba

Ok, when you're running the model for validation, how are you initializing it?

For example, here's mine:

model = torch.hub.load('yolov5',  # Folder with my customized repo
                           'custom',  # This affects A LOT
                           path=YOLO_WEIGHTS,  # Path to saved weights
                           source='local',
                           autoshape=True,
                           channels=5,
                           classes=12).to("cuda:0")
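
With the enforce-3ch line patched as discussed above, inference would then look something like this (a sketch; the shapes mirror the arrays described earlier):

results = model(im)  # im: np.ndarray of shape (960, 960, 5), HWC
results.print()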

@pourmand1376 (Contributor)

Actually, I haven't changed the code there. I am using the standard syntax to load a custom yaml model (Reference).

Something like this:

	python train.py \
		--img-size 512 \
		--weights /mnt/new_ssd/projects/Anevrism/Models/pourmand/yolov5/runs/train/exp141/weights/last.pt \
		--data /mnt/new_ssd/projects/Anevrism/Data/brain_cta/output_folder/database.yaml \
		--hyp data/hyps/hyp.aneurisym.yaml \
		--epochs 200 --batch-size $(batch) --device 0 --save-period 5 --workers 2 \
		--cfg models/yolov5s.yaml \

@mullenba

You're taking a different approach to running the model than I am, so I'm not quite sure what the issue could be.

Have you checked whether it's getting the classifications correct? There could be a situation (like the one I'm currently dealing with) where it detects the classes in the image, but the bounding boxes aren't very good.

@pourmand1376 (Contributor)

That's right. Mine is the opposite: my model detects bounding boxes very well, but it doesn't detect the classes.

@kristinatel (Author) commented Aug 31, 2022

@pourmand1376 I made 6-channel TIFF files out of my image pairs, added ch: 6 to the model yaml, and upon training I get:

imdecode_(''): can't read header: OpenCV(4.6.0) /io/opencv/modules/imgcodecs/src/grfmt_tiff.cpp:152: error:
(-2:Unspecified error) in function 'int cv::TiffDecoder::normalizeChannelsNumber(int) const'
Unsupported number of channels:
    'channels >= 1 && channels <= 4'
where
    'channels' is 6
So, as @Camilochiang said, OpenCV does complain. Is there a workaround for this?

@mullenba Could you tell us how you modified your dataloader to make training work? I only wish to train on 6-channel image pairs but run inference normally on 3-channel images.

@pourmand1376 (Contributor) commented Sep 1, 2022

The TIFF method doesn't work. I changed the dataloader to return a [channel_count, ...] tensor instead.

You can do that by renaming the __getitem__ method to getitem_pre and writing your own __getitem__ that returns your stacked tensor:

class LoadImagesAndLabels(Dataset):

    def getitem_pre(self, index):
        # the original YOLOv5 __getitem__ body, renamed
        ...

    def __getitem__(self, index):
        previous_image = self.getitem_pre(index - 1)
        current_image = self.getitem_pre(index)
        next_image = self.getitem_pre(index + 1)
        # combine the previous, current and next frames however you want,
        # e.g. along the channel dimension:
        return torch.cat((previous_image, current_image, next_image), 0)

@mullenba commented Sep 5, 2022

@kristinatel I created a custom dataset similar to this.

I then changed the utils.dataloaders file to call my new dataset here: instead of calling LoadImagesAndLabels, I call my own dataset.

dataset = LoadImagesAndLabels(
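
In other words, the swap inside create_dataloader might look like this (a sketch; MyMultiChannelDataset is a hypothetical name, and it should accept the same arguments the original call passes):

dataset = MyMultiChannelDataset(path, imgsz, batch_size, ...)  # instead of LoadImagesAndLabels(...)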

github-actions bot (Contributor) commented Oct 6, 2022

👋 Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.


Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐!

github-actions bot added the Stale label Oct 6, 2022
github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) Oct 16, 2022
@xiaoche-24

My data has 8 channels. I replaced the dataset in yolov5 with a self-made dataset and modified the few places that raised errors because the channel count was hard-coded to 3, but the accuracy was very low during training; after 10 epochs the accuracy was 0. I would like to ask how you modified the data preprocessing steps.

@kristinatel (Author)

@xiaoche-24 I suppose the preprocessing steps will depend on the nature of your dataset. My use case was concatenating every 2 consecutive images in my dataset and using the annotations of the second image only. First I added ch: 6 to my model yaml.
Then I renamed the default __getitem__ in dataloaders.py to getitem_pre and created a new __getitem__ as follows:

def __getitem__(self, index):
       first_frame, _, first_frame_name, _ = LoadImagesAndLabels.getitem_pre(self, 2 * index)
       second_frame, labels_out, second_frame_name, shapes = LoadImagesAndLabels.getitem_pre(self, 2 * index + 1)
       img_pair = torch.cat((first_frame, second_frame), 0)  # stack along the channel dim -> 6 channels
       return img_pair, labels_out, second_frame_name, shapes  # labels come from the second frame only

I then changed __len__ of the LoadImagesAndLabels class to report half the count, since each __getitem__ call now consumes two of my images:

def __len__(self):
    return len(self.im_files) // 2

Since the order of my images matters, I also made sure to sort my self.im_files numerically before they are loaded. Finally, in train.py I set augment and shuffle to false, as well as rect to false in the val_loader.
After running train.py, my model was able to learn normally.
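
The numeric sort could be as simple as this (a sketch; it assumes filename stems are integer frame indices like 0.jpg, 1.jpg, ...):

from pathlib import Path

self.im_files = sorted(self.im_files, key=lambda p: int(Path(p).stem))  # sort frames by index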

@mullenba commented Dec 7, 2022

> My data has 8 channels. I replaced the dataset in yolov5 with a self-made dataset and modified the few places that raised errors because the channel count was hard-coded to 3, but the accuracy was very low during training; after 10 epochs the accuracy was 0. I would like to ask how you modified the data preprocessing steps.

Basically, I took two images and stacked them together into a (5,960,960) array (I'm working with 5 channels). The big issue is that the default dataloader isn't terribly well documented, and it isn't clear when it changes array format (it sometimes bounces between (channels, height, width) and (height, width, channels)). I found that if you aren't careful, you can sometimes pass in an array with the wrong shape and it will still run.

Also, if you had to make any changes to the portion of the dataloader that pulls labels and bounding boxes, make sure those are in the right format too. The code bounces between (x1,y1,x2,y2) and (x,y,w,h) in different places, so if you have the wrong format you'll end up with the zero-accuracy issue as well.
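
YOLOv5 ships helpers for exactly these conversions in utils/general.py, so a quick sanity test is to round-trip a label tensor through them (the sample box below is made up):

import torch
from utils.general import xywh2xyxy, xyxy2xywh  # YOLOv5 helpers

boxes_xywh = torch.tensor([[0.5, 0.5, 0.2, 0.4]])         # (x_center, y_center, w, h)
boxes_xyxy = xywh2xyxy(boxes_xywh)                        # -> (x1, y1, x2, y2)
assert torch.allclose(xyxy2xywh(boxes_xyxy), boxes_xywh)  # round-trip check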

@xiaoche-24

@kristinatel I successfully ran the 6-channel training code based on your method, but an error was reported during inference. How should I modify detect.py?

@kristinatel (Author)

> @kristinatel I successfully ran the 6-channel training code based on your method, but an error was reported during inference. How should I modify detect.py?

To run inference you will also need to modify class LoadImages in dataloaders.py, where images are read one by one; it needs to be changed to read two images at once:

else:
            # Read image pair
            # frame 1
            self.count += 1
            im01 = cv2.imread(path)  # BGR
            ...
            # frame 2 (path has advanced to the next file)
            self.count += 1
            im02 = cv2.imread(path)  # BGR
            ...
.
.
.
else:
            im1 = letterbox(im01, self.img_size, stride=self.stride, auto=self.auto)[0]  # padded resize
            im1 = im1.transpose((2, 0, 1))[::-1]  # HWC to CHW, BGR to RGB
            im1 = np.ascontiguousarray(im1)  # contiguous
            # do the same for im2 from im02
            im = np.concatenate((im1, im2), axis=0)  # (6, H, W)
return path, im, im02, self.cap, s2
# returns the concatenated im but the second frame's image/filename, as predictions are made on im02

That is, if you give detect.py two images at a time; if you pass more images, make sure to sort them before they are loaded.

Regarding detect.py, I believe it will give an error at model.warmup(imgsz=(..., 3, ...)), so you can either change the channels to 6 or skip the warmup entirely.
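
For reference, the patched warmup call might look like this (a sketch; the exact expression around it varies between YOLOv5 versions):

model.warmup(imgsz=(1, 6, *imgsz))  # was (1, 3, *imgsz); match the model's input channels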
Hope this helps.

@wq247726404

> I wrote a custom data loader that creates [960,960,5] arrays. It replaces LoadImagesAndLabels.

May I ask what method you used to generate this array, and how the 5-channel array, the images, and the labels are each used during training?

@wq247726404

> My data has 8 channels. I replaced the dataset in yolov5 with a self-made dataset and modified the few places that raised errors because the channel count was hard-coded to 3, but the accuracy was very low during training; after 10 epochs the accuracy was 0. I would like to ask how you modified the data preprocessing steps.

What format is your dataset in?

@glenn-jocher (Member)

@wq247726404 Hi! First, thanks for the detailed description. The way YOLOv5 loads and processes images and labels can vary with different dataset structures. The approach described earlier was to stack two images into a (5,960,960) array and modify the loader functions accordingly, using a self-made dataset and calling images and labels separately during training. Our official documentation has some notes on dataset formats that you can refer to: https://docs.ultralytics.com/yolov5/training/data/. Hope this helps!
