
Training Abnormality #14184

Open
DaCheng1823 opened this issue Jul 3, 2024 · 15 comments
Labels
bug Something isn't working

Comments

DaCheng1823 commented Jul 3, 2024

When I configured RT-DETR, YOLO printed out the model file but then reported an error: RuntimeError: Given groups=1, weight of size [128, 128, 3, 3], expected input[4, 256, 40, 40] to have 128 channels, but got 256 channels instead
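For reference, this error means a Conv2d layer whose weights expect 128 input channels received a 256-channel feature map. A minimal sketch that reproduces the same failure, with shapes taken from the traceback:

    import torch
    import torch.nn as nn

    # Weight of size [128, 128, 3, 3]: 128 output channels, 128 input channels
    conv = nn.Conv2d(128, 128, kernel_size=3, padding=1)

    # A 256-channel feature map, as in the traceback, triggers the mismatch
    x = torch.randn(4, 256, 40, 40)
    conv(x)  # RuntimeError: expected input[4, 256, 40, 40] to have 128 channels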

DaCheng1823 added the bug label on Jul 3, 2024
@DaCheng1823 (Author)

[screenshot attached]

@DaCheng1823 (Author)

Uploading 1.png…

@glenn-jocher (Member)

@DaCheng1823 hello,

Thank you for reaching out and providing the error details. It looks like there might be a mismatch in the channel dimensions during the model's forward pass.

To help us diagnose the issue more effectively, could you please provide a minimum reproducible example of your code? This will allow us to better understand the context and configuration you're using. You can find guidelines on how to create a reproducible example here.
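For reference, a minimal reproducible example for an RT-DETR training issue usually looks something like this (the custom YAML filename below is a placeholder for your own modified file):

    from ultralytics import RTDETR

    # Load the model definition that triggers the error
    model = RTDETR("rtdetr-custom.yaml")  # placeholder: your modified YAML

    # A short run on a small public dataset is enough to reproduce
    model.train(data="coco8.yaml", epochs=1, imgsz=640, batch=4)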

Additionally, please ensure that you are using the latest version of the Ultralytics package and dependencies. Sometimes, issues are resolved in newer releases.

Looking forward to your response so we can assist you further!


DaCheng1823 commented Jul 4, 2024

[screenshot attached]
Hello, lines 4, 5, 6, and 7 of the model file seem to be abnormal. What causes this?

@glenn-jocher (Member)

Hello @DaCheng1823,

Thank you for providing the details and the screenshot. It appears there might be a configuration issue in the model file, leading to the channel mismatch error.

To help us diagnose the issue more effectively, could you please provide a minimum reproducible example of your code? This will allow us to better understand the context and configuration you're using. You can find guidelines on how to create a reproducible example here.

Additionally, please ensure that you are using the latest version of the Ultralytics package and dependencies. Sometimes, issues are resolved in newer releases.

Looking forward to your response so we can assist you further! 😊

@DaCheng1823 (Author)

from collections import OrderedDict

import torch.nn as nn
import torch.nn.functional as F


class ConvNormLayer(nn.Module):
    def __init__(self, ch_in, ch_out, kernel_size, stride, act=None, padding=None, bias=False):
        super().__init__()
        self.conv = nn.Conv2d(
            ch_in,
            ch_out,
            kernel_size,
            stride,
            padding=(kernel_size - 1) // 2 if padding is None else padding,
            bias=bias)
        self.norm = nn.BatchNorm2d(ch_out)
        # act is the name of an activation in torch.nn.functional, e.g. 'relu'
        self.act = nn.Identity() if act is None else getattr(F, act)

    def forward(self, x):
        return self.act(self.norm(self.conv(x)))

ResNet-18 / ResNet-34 basic block:

class BasicBlock(nn.Module):
    expansion = 1

    def __init__(self, ch_in, ch_out, stride, shortcut, act='relu', variant='b'):
        super().__init__()

        self.shortcut = shortcut

        if not shortcut:
            if variant == 'd' and stride == 2:
                # Variant 'd' downsamples with avg-pool + 1x1 conv instead of a strided 1x1 conv
                self.short = nn.Sequential(OrderedDict([
                    ('pool', nn.AvgPool2d(2, 2, 0, ceil_mode=True)),
                    ('conv', ConvNormLayer(ch_in, ch_out, 1, 1))
                ]))
            else:
                self.short = ConvNormLayer(ch_in, ch_out, 1, stride)

        self.branch2a = ConvNormLayer(ch_in, ch_out, 3, stride, act=act)
        self.branch2b = ConvNormLayer(ch_out, ch_out, 3, 1, act=None)
        self.act = nn.Identity() if act is None else getattr(F, act)

    def forward(self, x):
        out = self.branch2a(x)
        out = self.branch2b(out)
        short = x if self.shortcut else self.short(x)
        out = out + short
        return self.act(out)

class Blocks(nn.Module):
    def __init__(self, ch_in, ch_out, block, count, stage_num, act='relu', variant='b'):
        super().__init__()

        # Map the block name (a string coming from the YAML) to the block class.
        # Note: returning a value from __init__ is invalid, so fail with an exception instead.
        if block == "BasicBlock":
            block = BasicBlock
        elif block == "BottleneckBlock":
            block = BottleNeck  # assumed defined alongside BasicBlock
        else:
            raise ValueError(f"Unsupported block type: {block!r}")

        self.blocks = nn.ModuleList()
        for i in range(count):
            self.blocks.append(
                block(
                    ch_in,
                    ch_out,
                    stride=2 if i == 0 and stage_num != 2 else 1,
                    shortcut=False if i == 0 else True,
                    variant=variant,
                    act=act)
            )

            if i == 0:
                ch_in = ch_out * block.expansion

    def forward(self, x):
        out = x
        for block in self.blocks:
            out = block(out)
        return out
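A quick shape check of the classes above (a sketch, assuming the definitions and imports from this comment) shows where the channel counts change between stages:

    import torch

    # Stage of two BasicBlocks: the first downsamples 64 -> 128 channels
    stage = Blocks(64, 128, block="BasicBlock", count=2, stage_num=3)

    x = torch.randn(4, 64, 80, 80)
    out = stage(x)
    print(out.shape)  # expected: torch.Size([4, 128, 40, 40])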

@DaCheng1823 (Author)

[screenshot of the YAML attached]
This is my YAML, but I got a strange model.

@DaCheng1823 (Author)

[screenshot attached]

@DaCheng1823 (Author)

[screenshot attached]

@DaCheng1823 (Author)

[screenshot attached]
When I debugged it, it showed as normal.

@DaCheng1823 (Author)

When I remove the 'b' from the YAML file, it works fine. Is this a bug? If I set up the RT-DETR network structure to match the paper but not its exact parameter settings, and then build my improvements on top of this model, can I still call RT-DETR the baseline model?
[screenshot attached]

@DaCheng1823 (Author)

This exceeds the memory of my GPU.
[screenshot attached]

@glenn-jocher (Member)

Hello @DaCheng1823,

Thank you for providing the details and the screenshots. It looks like the issue might be related to the specific configuration in your YAML file. When you removed the 'b', it worked fine, which suggests that there might be a bug or a misconfiguration related to that parameter.

Regarding your memory issue, here are a few suggestions to help manage GPU memory usage:

1. Reduce Batch Size: Lowering the batch size can significantly reduce memory usage.

       batch: 4  # example of a reduced batch size

2. Reduce Image Size: Smaller input images also cut memory usage.

       imgsz: 512  # example of a reduced image size

3. Mixed Precision Training: If supported, mixed precision (AMP) can reduce memory usage.

       model.train(data="coco8.yaml", epochs=100, imgsz=640, amp=True)

4. Model Pruning: Simplifying the model architecture by reducing the number of layers or channels can also help. (The first three options are combined in the sketch below.)
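Putting the first three options together, a single training call might look like this (illustrative values, not tuned for your hardware):

    from ultralytics import RTDETR

    model = RTDETR("rtdetr-l.yaml")  # or your custom YAML

    # Smaller batch and image size plus AMP to fit limited GPU memory
    model.train(data="coco8.yaml", epochs=100, imgsz=512, batch=4, amp=True)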

Regarding your question about using RT-DETR as a baseline model: If you are using the RT-DETR network structure as described in the paper but modifying the parameters or making improvements, it is still valid to refer to RT-DETR as your baseline model. Just make sure to clearly document the changes and improvements you have made in your work.

If you continue to experience issues, please provide a minimum reproducible example of your code here so we can assist you further.

Feel free to reach out if you have any more questions or need further assistance! 😊


DaCheng1823 commented Jul 5, 2024

When passing the block parameter, it is a string value, but in self.blocks.append, block needs to be a class object. How can I convert it? block is a class that inherits from nn.Module.
[screenshot attached]

@glenn-jocher (Member)

Hello @DaCheng1823,

Thank you for your question! It looks like you're trying to instantiate a class dynamically from a string value. You can do this by looking the class up by name, for example via globals() (or with getattr on the module that defines the blocks). Here's an example of how you could modify your code:

class Blocks(nn.Module):
    def __init__(self, ch_in, ch_out, block, count, stage_num, act='relu', variant='b'):
        super().__init__()

        # Dynamically get the class from the string
        block_class = globals()[block]

        self.blocks = nn.ModuleList()
        for i in range(count):
            self.blocks.append(
                block_class(
                    ch_in,
                    ch_out,
                    stride=2 if i == 0 and stage_num != 2 else 1,
                    shortcut=False if i == 0 else True,
                    variant=variant,
                    act=act)
            )

            if i == 0:
                ch_in = ch_out * block_class.expansion

    def forward(self, x):
        out = x
        for block in self.blocks:
            out = block(out)
        return out

In this example, globals()[block] dynamically retrieves the class from the string name. Make sure that the class name provided in the block parameter matches exactly with the class name defined in your code.
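As a follow-up, an explicit name-to-class mapping is often safer than globals(), because it fails loudly on typos and does not depend on what happens to be in the module namespace. A sketch, assuming BasicBlock and BottleNeck are defined as in the earlier comments:

    # Explicit registry of supported block types; extend as new blocks are added
    BLOCK_REGISTRY = {
        "BasicBlock": BasicBlock,
        "BottleneckBlock": BottleNeck,
    }

    def resolve_block(name):
        try:
            return BLOCK_REGISTRY[name]
        except KeyError:
            raise ValueError(f"Unknown block type: {name!r}")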

If you encounter any further issues, please provide a minimum reproducible example here to help us better understand and assist you.

Feel free to reach out if you have any more questions! 😊
