Unexpected results from DotD paper #10
Hi, thanks for your attention. It seems that there is no problem with the DotD calculation. Could you please provide the config file, so that I can help you figure out the problem? Here is a link to the official implementation for your reference.

As for the mAP drop, could you please provide more details about the in-house dataset, such as the average absolute object size and the largest/smallest object sizes? The performance degradation may result from large size variation in the dataset: we found that DotD can produce sub-optimal results when the dataset contains many medium and large objects (>32*32 pixels). As a substitute, we recommend our newly released NWD-RKA (also in the above link), which may better handle tiny object detection when there is large size variation.

As for the AI-TOD dataset, we have attached a download link (BaiduPan) in this repo. You need to download the xView training set and the remaining part of AI-TOD (AI-TOD_wo_xview), then generate the whole AI-TOD dataset with the end2end tools. We will add other download links (Google Drive or OneDrive) for AI-TOD_wo_xview within a week. Please feel free to contact me if you have further issues.
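For readers following along, the core of the DotD metric can be sketched in a few lines. This is a hypothetical reimplementation, not the authors' code (see the official repo linked above for that); it assumes the commonly cited form DotD = exp(-d / S), where d is the Euclidean distance between box centers and S is the dataset's average object size:

```python
import math

def dotd(box_a, box_b, avg_size):
    """Dot Distance between two (x1, y1, x2, y2) boxes.

    Assumes DotD = exp(-d / S): d is the center-to-center Euclidean
    distance, S (`avg_size`) is the dataset's average absolute object
    size. Identical centers give 1.0; the score decays toward 0 as
    the centers move apart.
    """
    cxa, cya = (box_a[0] + box_a[2]) / 2, (box_a[1] + box_a[3]) / 2
    cxb, cyb = (box_b[0] + box_b[2]) / 2, (box_b[1] + box_b[3]) / 2
    d = math.hypot(cxa - cxb, cya - cyb)
    return math.exp(-d / avg_size)

# Identical boxes -> zero center distance -> DotD of exactly 1.0
print(dotd((0, 0, 10, 10), (0, 0, 10, 10), avg_size=30.0))  # 1.0
```

Because the score depends only on center distance (not overlap), two tiny boxes that barely miss each other can still receive a high matching score, which is the point of the metric for tiny objects.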
Thanks @Chasel-Tsui. Here is the config:

```python
model = dict(
    type='FasterRCNN',
    backbone=dict(
        type='mmcls.ConvNeXt',
        arch='base',
        out_indices=[0, 1, 2, 3],
        drop_path_rate=0.4,
        layer_scale_init_value=1.0,
        gap_before_final_norm=False,
        init_cfg=dict(
            type='Pretrained',
            checkpoint='https://download.openmmlab.com/mmclassification/v0/convnext/downstream/convnext-base_3rdparty_in21k_20220301-262fd037.pth',
            prefix='backbone.')),
    neck=dict(
        type='FPN',
        in_channels=[128, 256, 512, 1024],
        out_channels=256,
        num_outs=5),
    rpn_head=dict(
        type='RPNHead',
        in_channels=256,
        feat_channels=256,
        anchor_generator=dict(
            type='AnchorGenerator',
            scales=[8],
            ratios=[0.5, 1.0, 2.0],
            strides=[4, 8, 16, 32, 64]),
        bbox_coder=dict(
            type='DeltaXYWHBBoxCoder',
            target_means=[0.0, 0.0, 0.0, 0.0],
            target_stds=[1.0, 1.0, 1.0, 1.0]),
        loss_cls=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
        loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
    roi_head=dict(
        type='StandardRoIHead',
        bbox_roi_extractor=dict(
            type='SingleRoIExtractor',
            roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),
            out_channels=256,
            featmap_strides=[4, 8, 16, 32]),
        bbox_head=dict(
            type='Shared2FCBBoxHead',
            in_channels=256,
            fc_out_channels=1024,
            roi_feat_size=7,
            num_classes=1,
            bbox_coder=dict(
                type='DeltaXYWHBBoxCoder',
                target_means=[0.0, 0.0, 0.0, 0.0],
                target_stds=[0.1, 0.1, 0.2, 0.2]),
            reg_class_agnostic=False,
            loss_cls=dict(
                type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
            loss_bbox=dict(type='L1Loss', loss_weight=1.0))),
    train_cfg=dict(
        rpn=dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.7,
                neg_iou_thr=0.3,
                min_pos_iou=0.3,
                match_low_quality=True,
                ignore_iof_thr=-1,
                iou_calculator=dict(
                    type='DotDistOverlaps', average_size=900.0)),
            sampler=dict(
                type='RandomSampler',
                num=256,
                pos_fraction=0.5,
                neg_pos_ub=-1,
                add_gt_as_proposals=False),
            allowed_border=-1,
            pos_weight=-1,
            debug=False),
        rpn_proposal=dict(
            nms_pre=2000,
            max_per_img=1000,
            nms=dict(type='nms', iou_threshold=0.7),
            min_bbox_size=0),
        rcnn=dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.5,
                neg_iou_thr=0.5,
                min_pos_iou=0.5,
                match_low_quality=False,
                ignore_iof_thr=-1,
                iou_calculator=dict(
                    type='DotDistOverlaps', average_size=900.0)),
            sampler=dict(
                type='RandomSampler',
                num=512,
                pos_fraction=0.25,
                neg_pos_ub=-1,
                add_gt_as_proposals=True),
            pos_weight=-1,
            debug=False)),
    test_cfg=dict(
        rpn=dict(
            nms_pre=1000,
            max_per_img=1000,
            nms=dict(type='nms', iou_threshold=0.7),
            min_bbox_size=0),
        rcnn=dict(
            score_thr=0.05,
            nms=dict(type='nms', iou_threshold=0.5),
            max_per_img=100)))
```

The stats of the dataset are as follows:

Thanks again.
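The key non-standard piece of the config above is the `iou_calculator=dict(type='DotDistOverlaps', average_size=900.0)` entry, which swaps the assigner's pairwise IoU matrix for a DotD matrix. The class below is a hypothetical, dependency-free sketch of what such a calculator might compute; the real one would be registered with mmdet's `IOU_CALCULATORS` registry and operate on tensors, and it assumes the DotD form exp(-d / S):

```python
import math

class DotDistOverlaps:
    """Hypothetical sketch of a DotD-based 'iou_calculator'.

    Returns an M x N score matrix in (0, 1], shaped like the IoU
    matrix MaxIoUAssigner thresholds against, so pos_iou_thr /
    neg_iou_thr in the config apply to DotD scores instead of IoU.
    `average_size` mirrors the field of the same name in the config.
    """

    def __init__(self, average_size):
        self.average_size = average_size

    def __call__(self, bboxes1, bboxes2):
        def center(b):
            return (b[0] + b[2]) / 2, (b[1] + b[3]) / 2

        out = []
        for b1 in bboxes1:
            cx1, cy1 = center(b1)
            row = []
            for b2 in bboxes2:
                cx2, cy2 = center(b2)
                d = math.hypot(cx1 - cx2, cy1 - cy2)
                row.append(math.exp(-d / self.average_size))
            out.append(row)
        return out

calc = DotDistOverlaps(average_size=900.0)
scores = calc([[0, 0, 10, 10]], [[0, 0, 10, 10], [100, 0, 110, 10]])
```

Note that with `average_size=900.0`, even a 100-pixel center offset still scores around 0.9, well above `pos_iou_thr=0.7`, so the choice of `average_size` directly controls how permissive the assigner becomes.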
Hi, my pleasure. I notice that the "avg_annotation_area" of this dataset is 16761 pixels, which indicates that DotD is not a good candidate for it. Although the dataset contains some tiny objects, from a global perspective it may not be effective to apply tiny-object-detection strategies to a dataset whose objects are large on average. To the best of my knowledge, the DotD strategy may not be effective for a dataset containing many large objects; the AI-TOD dataset we tested on contains objects mainly at a tiny scale (smaller than 1024 pixels in area). If you still want to use DotD, one option (though I am not sure it will help) is to ensemble two models, one trained with IoU and another with DotD.
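The "average absolute object size" asked about earlier is easy to compute from COCO-style annotation areas. A minimal sketch (the sample `areas` below are made up; in practice you would read them from your annotation file), using the COCO-style 32x32 and 96x96 cutoffs, the first of which is the >32*32 threshold mentioned above:

```python
import math

# Hypothetical annotation areas (w * h, in pixels^2); replace with the
# real areas from your dataset's annotation file.
areas = [64, 256, 2500, 16761, 50176]

avg_area = sum(areas) / len(areas)
avg_size = math.sqrt(avg_area)  # side length of the "average" object, px

# COCO-style scale buckets: small < 32^2 <= medium < 96^2 <= large.
buckets = {"small": 0, "medium": 0, "large": 0}
for a in areas:
    if a < 32 ** 2:
        buckets["small"] += 1
    elif a < 96 ** 2:
        buckets["medium"] += 1
    else:
        buckets["large"] += 1
```

A dataset whose `avg_area` is 16761 px^2 has an average object side of about 129 px, which lands far into the "large" bucket and matches the diagnosis above.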
The last option would be to try our newly released NWD-RKA (https://github.com/Chasel-Tsui/mmdet-aitod), which may help with tiny object detection when objects have large size variation. But I cannot guarantee an improvement, since, as mentioned above, the dataset on average looks like a large-object detection dataset.
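For context on what NWD-RKA replaces the matching metric with, here is a hypothetical sketch of the Normalized Wasserstein Distance between two boxes, not the authors' code (check the linked mmdet-aitod repo before relying on it). It assumes the form NWD = exp(-W2 / C), where each box is modeled as a 2-D Gaussian centered at the box center with covariance diag(w^2/4, h^2/4), and C is a dataset-dependent constant:

```python
import math

def nwd(box_a, box_b, c=12.8):
    """Sketch of Normalized Wasserstein Distance for (x1, y1, x2, y2)
    boxes, assuming NWD = exp(-W2 / C). Under the Gaussian box model,
    W2^2 reduces to the squared distance between the vectors
    (cx, cy, w/2, h/2) of the two boxes. The default c=12.8 is an
    illustrative placeholder, not a recommended value.
    """
    cxa, cya = (box_a[0] + box_a[2]) / 2, (box_a[1] + box_a[3]) / 2
    wa, ha = box_a[2] - box_a[0], box_a[3] - box_a[1]
    cxb, cyb = (box_b[0] + box_b[2]) / 2, (box_b[1] + box_b[3]) / 2
    wb, hb = box_b[2] - box_b[0], box_b[3] - box_b[1]
    w2_sq = ((cxa - cxb) ** 2 + (cya - cyb) ** 2
             + ((wa - wb) / 2) ** 2 + ((ha - hb) / 2) ** 2)
    return math.exp(-math.sqrt(w2_sq) / c)
```

Unlike DotD, this score also penalizes width/height mismatch, which is one reason it may cope better with datasets that mix tiny and large objects.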
Thanks @Chasel-Tsui. I have created a new dataset in which the maximum bbox area is 2024 pixels, and DotD performed better than IoU on it.
Hi,
I came across your interesting paper "Dot Distance for Tiny Object Detection in Aerial Images".
I tried to implement it in the mmdetection framework and tested it on our in-house dataset; however, I got unexpected results. Replacing the IoU metric in the assigner with DotD caused a degradation of 0.8 mAP.
I attached here the code I wrote.
I am not sure if I missed something there. Could you please check that?
Moreover, I tried to download the AI-TOD dataset to test my implementation on it, but I could not. Could you please advise me how I can get the data?
Thanks,
Mohammed Jabreel