
Training Abnormality #14184

Open
DaCheng1823 opened this issue Jul 3, 2024 · 15 comments
Labels
bug Something isn't working

Comments

DaCheng1823 commented Jul 3, 2024

When I configured RT-DETR, YOLO printed out the model file but then reported an error: RuntimeError: Given groups=1, weight of size [128, 128, 3, 3], expected input[4, 256, 40, 40] to have 128 channels, but got 256 channels instead
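For reference, this error means a Conv2d layer whose weights expect 128 input channels received a 256-channel feature map. A minimal sketch that reproduces the same failure, with shapes taken from the traceback:

    import torch
    import torch.nn as nn

    # Weight of size [128, 128, 3, 3]: 128 output channels, 128 input channels
    conv = nn.Conv2d(128, 128, kernel_size=3, padding=1)

    # A 256-channel feature map, as in the traceback, triggers the mismatch
    x = torch.randn(4, 256, 40, 40)
    conv(x)  # RuntimeError: expected input[4, 256, 40, 40] to have 128 channels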

DaCheng1823 added the bug label on Jul 3, 2024
@DaCheng1823 (Author)

[screenshot attached]

@DaCheng1823 (Author)

Uploading 1.png…

@glenn-jocher (Member)

@DaCheng1823 hello,

Thank you for reaching out and providing the error details. It looks like there might be a mismatch in the channel dimensions during the model's forward pass.

To help us diagnose the issue more effectively, could you please provide a minimum reproducible example of your code? This will allow us to better understand the context and configuration you're using. You can find guidelines on how to create a reproducible example here.
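For reference, a minimal reproducible example for an RT-DETR training issue usually looks something like this (the custom YAML filename below is a placeholder for your own modified file):

    from ultralytics import RTDETR

    # Load the model definition that triggers the error
    model = RTDETR("rtdetr-custom.yaml")  # placeholder: your modified YAML

    # A short run on a small public dataset is enough to reproduce
    model.train(data="coco8.yaml", epochs=1, imgsz=640, batch=4)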

Additionally, please ensure that you are using the latest version of the Ultralytics package and dependencies. Sometimes, issues are resolved in newer releases.

Looking forward to your response so we can assist you further!


DaCheng1823 commented Jul 4, 2024

[screenshot attached]
Hello, lines 4, 5, 6, and 7 of the model file seem to be abnormal. What causes this?

@glenn-jocher (Member)

Hello @DaCheng1823,

Thank you for providing the details and the screenshot. It appears there might be a configuration issue in the model file, leading to the channel mismatch error.

To help us diagnose the issue more effectively, could you please provide a minimum reproducible example of your code? This will allow us to better understand the context and configuration you're using. You can find guidelines on how to create a reproducible example here.

Additionally, please ensure that you are using the latest version of the Ultralytics package and dependencies. Sometimes, issues are resolved in newer releases.

Looking forward to your response so we can assist you further! 😊

@DaCheng1823 (Author)

from collections import OrderedDict

import torch.nn as nn
import torch.nn.functional as F


class ConvNormLayer(nn.Module):
    def __init__(self, ch_in, ch_out, kernel_size, stride, act=None, padding=None, bias=False):
        super().__init__()
        self.conv = nn.Conv2d(
            ch_in,
            ch_out,
            kernel_size,
            stride,
            padding=(kernel_size - 1) // 2 if padding is None else padding,
            bias=bias)
        self.norm = nn.BatchNorm2d(ch_out)
        # act is the name of an activation in torch.nn.functional, e.g. 'relu'
        self.act = nn.Identity() if act is None else getattr(F, act)

    def forward(self, x):
        return self.act(self.norm(self.conv(x)))

ResNet-18 / ResNet-34 basic block:

class BasicBlock(nn.Module):
    expansion = 1

    def __init__(self, ch_in, ch_out, stride, shortcut, act='relu', variant='b'):
        super().__init__()

        self.shortcut = shortcut

        if not shortcut:
            if variant == 'd' and stride == 2:
                # Variant 'd' downsamples with avg-pool + 1x1 conv instead of a strided 1x1 conv
                self.short = nn.Sequential(OrderedDict([
                    ('pool', nn.AvgPool2d(2, 2, 0, ceil_mode=True)),
                    ('conv', ConvNormLayer(ch_in, ch_out, 1, 1))
                ]))
            else:
                self.short = ConvNormLayer(ch_in, ch_out, 1, stride)

        self.branch2a = ConvNormLayer(ch_in, ch_out, 3, stride, act=act)
        self.branch2b = ConvNormLayer(ch_out, ch_out, 3, 1, act=None)
        self.act = nn.Identity() if act is None else getattr(F, act)

    def forward(self, x):
        out = self.branch2a(x)
        out = self.branch2b(out)
        short = x if self.shortcut else self.short(x)
        out = out + short
        return self.act(out)

class Blocks(nn.Module):
    def __init__(self, ch_in, ch_out, block, count, stage_num, act='relu', variant='b'):
        super().__init__()

        # Map the block name (a string coming from the YAML) to the block class.
        # Note: returning a value from __init__ is invalid, so fail with an exception instead.
        if block == "BasicBlock":
            block = BasicBlock
        elif block == "BottleneckBlock":
            block = BottleNeck  # assumed defined alongside BasicBlock
        else:
            raise ValueError(f"Unsupported block type: {block!r}")

        self.blocks = nn.ModuleList()
        for i in range(count):
            self.blocks.append(
                block(
                    ch_in,
                    ch_out,
                    stride=2 if i == 0 and stage_num != 2 else 1,
                    shortcut=False if i == 0 else True,
                    variant=variant,
                    act=act)
            )

            if i == 0:
                ch_in = ch_out * block.expansion

    def forward(self, x):
        out = x
        for block in self.blocks:
            out = block(out)
        return out
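A quick shape check of the classes above (a sketch, assuming the definitions and imports from this comment) shows where the channel counts change between stages:

    import torch

    # Stage of two BasicBlocks: the first downsamples 64 -> 128 channels
    stage = Blocks(64, 128, block="BasicBlock", count=2, stage_num=3)

    x = torch.randn(4, 64, 80, 80)
    out = stage(x)
    print(out.shape)  # expected: torch.Size([4, 128, 40, 40])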

@DaCheng1823 (Author)

[screenshot of the YAML attached]
This is my YAML, but I got a strange model.

@DaCheng1823 (Author)

[screenshot attached]

@DaCheng1823 (Author)

[screenshot attached]

@DaCheng1823 (Author)

[screenshot attached]
When I debugged it, it showed as normal.

@DaCheng1823 (Author)

When I remove the 'b' from the YAML file, it works fine. Is this a bug? If I set up the RT-DETR network structure to match the paper but not its exact parameter settings, and then build my improvements on top of this model, can I still call RT-DETR the baseline model?
[screenshot attached]

@DaCheng1823 (Author)

This exceeds the memory of my GPU.
[screenshot attached]

@glenn-jocher (Member)

Hello @DaCheng1823,

Thank you for providing the details and the screenshots. It looks like the issue might be related to the specific configuration in your YAML file. When you removed the 'b', it worked fine, which suggests that there might be a bug or a misconfiguration related to that parameter.

Regarding your memory issue, here are a few suggestions to help manage GPU memory usage:

1. Reduce Batch Size: Lowering the batch size can significantly reduce memory usage.

       batch: 4  # example of a reduced batch size

2. Reduce Image Size: Smaller input images also cut memory usage.

       imgsz: 512  # example of a reduced image size

3. Mixed Precision Training: If supported, mixed precision (AMP) can reduce memory usage.

       model.train(data="coco8.yaml", epochs=100, imgsz=640, amp=True)

4. Model Pruning: Simplifying the model architecture by reducing the number of layers or channels can also help. (The first three options are combined in the sketch below.)
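Putting the first three options together, a single training call might look like this (illustrative values, not tuned for your hardware):

    from ultralytics import RTDETR

    model = RTDETR("rtdetr-l.yaml")  # or your custom YAML

    # Smaller batch and image size plus AMP to fit limited GPU memory
    model.train(data="coco8.yaml", epochs=100, imgsz=512, batch=4, amp=True)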

Regarding your question about using RT-DETR as a baseline model: If you are using the RT-DETR network structure as described in the paper but modifying the parameters or making improvements, it is still valid to refer to RT-DETR as your baseline model. Just make sure to clearly document the changes and improvements you have made in your work.

If you continue to experience issues, please provide a minimum reproducible example of your code here so we can assist you further.

Feel free to reach out if you have any more questions or need further assistance! 😊


DaCheng1823 commented Jul 5, 2024

When passing the block parameter, it is a string value, but in self.blocks.append, block needs to be a class object. How can I convert it? block is a class that inherits from nn.Module.
[screenshot attached]

@glenn-jocher (Member)

Hello @DaCheng1823,

Thank you for your question! It looks like you're trying to instantiate a class dynamically from a string value. You can do this by looking the class up by name, for example via globals() (or with getattr on the module that defines the blocks). Here's an example of how you could modify your code:

class Blocks(nn.Module):
    def __init__(self, ch_in, ch_out, block, count, stage_num, act='relu', variant='b'):
        super().__init__()

        # Dynamically get the class from the string
        block_class = globals()[block]

        self.blocks = nn.ModuleList()
        for i in range(count):
            self.blocks.append(
                block_class(
                    ch_in,
                    ch_out,
                    stride=2 if i == 0 and stage_num != 2 else 1,
                    shortcut=False if i == 0 else True,
                    variant=variant,
                    act=act)
            )

            if i == 0:
                ch_in = ch_out * block_class.expansion

    def forward(self, x):
        out = x
        for block in self.blocks:
            out = block(out)
        return out

In this example, globals()[block] dynamically retrieves the class from the string name. Make sure that the class name provided in the block parameter matches exactly with the class name defined in your code.
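As a follow-up, an explicit name-to-class mapping is often safer than globals(), because it fails loudly on typos and does not depend on what happens to be in the module namespace. A sketch, assuming BasicBlock and BottleNeck are defined as in the earlier comments:

    # Explicit registry of supported block types; extend as new blocks are added
    BLOCK_REGISTRY = {
        "BasicBlock": BasicBlock,
        "BottleneckBlock": BottleNeck,
    }

    def resolve_block(name):
        try:
            return BLOCK_REGISTRY[name]
        except KeyError:
            raise ValueError(f"Unknown block type: {name!r}")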

If you encounter any further issues, please provide a minimum reproducible example here to help us better understand and assist you.

Feel free to reach out if you have any more questions! 😊
