
About the RoI head #29

Open
UncleNiNi opened this issue Jul 20, 2024 · 4 comments
Comments

@UncleNiNi

When I replace only the model's backbone, the outputs of the two roi_extractors end up with mismatched dimensions. The multi-scale extractor (roi_extractor[0]) passes through the MFM layer, so its channel count becomes 768, but the single-scale extractor (roi_extractor[1]) still produces 256 channels; its output channel count does not change. I set the out_channels of roi_extractor[1] to 768, the same as roi_extractor[0], but the result is not the same as I expected.

bbox_feats = ss_bbox_feats + ms_bbox_feats * factor

In this line, the first term has 256 channels and the second has 768. Why is that?
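The reported failure can be reproduced in isolation: element-wise addition of RoI feature maps with mismatched channel counts fails to broadcast. A minimal sketch, using NumPy arrays as stand-ins for the actual RoI feature tensors (shapes are illustrative, taken from the channel counts described above):

```python
import numpy as np

# Hypothetical RoI features of shape (num_rois, channels, 7, 7).
ss_bbox_feats = np.zeros((2, 256, 7, 7))  # single-scale extractor output (256 ch)
ms_bbox_feats = np.zeros((2, 768, 7, 7))  # multi-scale extractor output after MFM (768 ch)
factor = 0.5

try:
    # Mirrors the failing line: the 256-channel and 768-channel axes
    # cannot be broadcast together, so the addition raises an error.
    bbox_feats = ss_bbox_feats + ms_bbox_feats * factor
except ValueError as e:
    print("channel mismatch:", e)
```

The same shape rule applies to PyTorch tensors, which is why the training step crashes at this line.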

@yuhongtian17
Owner

A Base-level model should follow this part of the cfg, right? Could you show me how you wrote your cfg?

@UncleNiNi
Author

This is my config file; I have barely changed any parameters.

rpn_head=dict(
        type='OrientedRPNHead',
        in_channels=256,
        feat_channels=256,
        version=angle_version,
        anchor_generator=dict(
            type='AnchorGenerator',
            scales=[8],
            ratios=[0.5, 1.0, 2.0],
            strides=[4, 8, 16, 32, 64]),
        bbox_coder=dict(
            type='MidpointOffsetCoder',
            angle_range=angle_version,
            target_means=[0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
            target_stds=[1.0, 1.0, 1.0, 1.0, 0.5, 0.5]),
        loss_cls=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
        loss_bbox=dict(
            type='SmoothL1Loss', beta=0.1111111111111111, loss_weight=1.0)),
    roi_skip_fpn=False,
    with_mfm=True,
    roi_head=dict(
        type='OrientedStandardRoIHeadimTED',
        bbox_roi_extractor=[dict(
            type='RotatedSingleRoIExtractor',
            roi_layer=dict(     # multi-scale
                type='RoIAlignRotated',
                out_size=7,
                sample_num=2,
                clockwise=True),
            out_channels=768,
            featmap_strides=[4, 8, 16, 32]),
        dict(
            type='RotatedSingleRoIExtractor',
            roi_layer=dict(     # single-scale: why doesn't it change the input feature map's channel count?
                type='RoIAlignRotated',
                out_size=7,
                sample_num=2,
                clockwise=True),
            out_channels=768,
            featmap_strides=[16])],
        bbox_head=dict(
            type='RotatedMAEBBoxHead',
            init_cfg=dict(type='Pretrained', checkpoint=pretrained),
            use_checkpoint=True,
            in_channels=768,
            img_size=224,
            patch_size=16, 
            embed_dim=512, 
            depth=8,
            num_heads=16, 
            mlp_ratio=4., 
            # reg_decoded_bbox=True,
            # the parameters below are copied from Oriented RCNN
            num_classes=16,
            bbox_coder=dict(
                type='DeltaXYWHAOBBoxCoder',
                angle_range=angle_version,
                norm_factor=None,
                edge_swap=True,
                proj_xy=True,
                target_means=(.0, .0, .0, .0, .0),
                target_stds=(0.1, 0.1, 0.2, 0.2, 0.1)),
            reg_class_agnostic=True,
            loss_cls=dict(
                type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
            loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0))),

@yuhongtian17
Owner

My guess is that your backbone's last-layer feature map has 256 channels. In the vit/hivit design we provide, there is a last_feat variable; when it is set to True in the cfg, the backbone outputs both a tuple of multi-scale feature maps and the last-layer feature map.

Next, in rotated_imted you can see that the tuple of multi-scale feature maps is fed into the FPN, while the last-layer feature map is kept unchanged. The FPN outputs are combined with the last-layer feature map and passed as the first argument to roi_head.forward_train().

Turning to roi_head_imted, the last object in that first argument x, i.e. the backbone's last-layer feature map, is the input to ss_bbox_roi_extractor; this is what forces ss_bbox_feats to have the same channel count as the backbone's last-layer feature map. The input to ms_bbox_roi_extractor, by contrast, first goes through mfm_fc, which changes its channel count to 768. To summarize:

channels of the backbone's last-layer feature map = in_channels of bbox_head = out_channels of bbox_roi_extractor

This design is actually dictated by imTED's motivation: one of the two information-flow branches from backbone to bbox_head (the other branch being the detector's original FPN-RPN-RoI structure) should stay fully consistent with MAE pre-training.
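The channel bookkeeping described above can be sketched as follows. This is a hedged illustration, not the repository's actual code: the function names and the role of mfm_fc are paraphrased from the explanation, and the channel numbers (256, 768) come from this thread.

```python
# Channel bookkeeping for the two RoI branches in the imTED-style head.
# Assumption: mfm_fc projects FPN features to embed_dim channels, while
# the single-scale branch reads the backbone's last feature map as-is.

def ms_branch_channels(fpn_out_channels: int, embed_dim: int) -> int:
    # Multi-scale branch: FPN features pass through mfm_fc, so the RoI
    # features come out with embed_dim channels regardless of FPN width.
    return embed_dim

def ss_branch_channels(backbone_last_channels: int) -> int:
    # Single-scale branch: the backbone's last-layer feature map is used
    # unchanged, so the RoI features keep the backbone's channel count --
    # the extractor's configured out_channels does not reproject it.
    return backbone_last_channels

# With a 256-channel backbone last layer and embed_dim=768 (the issue's setup):
ms = ms_branch_channels(256, 768)   # 768
ss = ss_branch_channels(256)        # 256 -> mismatch with bbox_head in_channels=768
print(ms, ss, ms == ss)
```

This is why swapping in a backbone whose last layer is not 768-wide breaks the addition of the two branches, and why aligning the backbone's last-layer width with bbox_head's in_channels fixes it.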

@UncleNiNi
Author

Thanks for the reminder! I just tried it, and it was indeed because the channel count of the backbone's last-layer output was not aligned. I will modify my code to fix this. Wishing you success with your research!
