A minor query about the image channel number check using im.shape[0] < 5
#13029
Comments
Hello! Thanks for your detailed question and for diving deep into the YOLOv5 code! 🌟 The line The condition Your suggestion I hope this clears up the confusion! Let me know if you have any more questions. Happy coding! 😊
Thank you very much for your response! @glenn-jocher

Actually, I didn't think of the RGBA image format, and your explanation has given me inspiration. I have another small question. When I use the default training parameters ( At this point, the conditional statement in the code

# Pre-process
n, ims = (len(ims), list(ims)) if isinstance(ims, (list, tuple)) else (1, [ims])  # number, list of images
shape0, shape1, files = [], [], []  # image and inference shapes, filenames
for i, im in enumerate(ims):
    f = f"image{i}"  # filename
    if isinstance(im, (str, Path)):  # filename or uri
        im, f = Image.open(requests.get(im, stream=True).raw if str(im).startswith("http") else im), im
        im = np.asarray(exif_transpose(im))
    elif isinstance(im, Image.Image):  # PIL Image
        im, f = np.asarray(exif_transpose(im)), getattr(im, "filename", f) or f
    files.append(Path(f).with_suffix(".jpg").name)
    # if im.shape[0] < 5:  # image in CHW
    if im.ndim < 5:  # 💡 This is the modification/change.
        im = im.transpose((1, 2, 0))  # reverse dataloader .transpose(2, 0, 1)
    im = im[..., :3] if im.ndim == 3 else cv2.cvtColor(im, cv2.COLOR_GRAY2BGR)  # enforce 3ch input
    s = im.shape[:2]  # HWC
    shape0.append(s)  # image shape
    g = max(size) / max(s)  # gain
    shape1.append([int(y * g) for y in s])
    ims[i] = im if im.data.contiguous else np.ascontiguousarray(im)  # update
shape1 = [make_divisible(x, self.stride) for x in np.array(shape1).max(0)]  # inf shape
x = [letterbox(im, shape1, auto=False)[0] for im in ims]  # pad
x = np.ascontiguousarray(np.array(x).transpose((0, 3, 1, 2)))  # stack and BHWC to BCHW
x = torch.from_numpy(x).to(p.device).type_as(p) / 255  # uint8 to fp16/32

Thank you very much for your patience and response!
Hello again! I appreciate your follow-up question and the code snippet you've provided. The suggestion to use The original intent of If you're consistently finding that For now, the existing check should suffice in most scenarios, but if you're encountering specific issues with image formats, you might need to add additional checks or transformations based on your particular use case or dataset. Thank you for your keen observations, and feel free to reach out if you have more questions! 😊
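To make the difference between the two conditions concrete, here is a minimal sketch (not the YOLOv5 source) that evaluates both on a few sample array shapes. The helper names `shape_check` and `ndim_check` and the sample shapes are hypothetical and used only for illustration; the only dependency assumed is NumPy.

```python
import numpy as np


def shape_check(im: np.ndarray) -> bool:
    """Existing condition: a small first dimension is taken to mean CHW."""
    return im.shape[0] < 5


def ndim_check(im: np.ndarray) -> bool:
    """Modified condition: compares the number of dimensions, not a dimension's size."""
    return im.ndim < 5


# Hypothetical sample arrays covering the layouts discussed in this thread.
samples = {
    "HWC color": np.zeros((1080, 810, 3), dtype=np.uint8),
    "CHW color": np.zeros((3, 1080, 810), dtype=np.uint8),
    "HW grayscale": np.zeros((1080, 810), dtype=np.uint8),
}

# Map each sample name to (result of shape check, result of ndim check).
results = {name: (shape_check(im), ndim_check(im)) for name, im in samples.items()}

for name, (by_shape, by_ndim) in results.items():
    print(f"{name:13s} shape[0] < 5: {by_shape!s:5s}  ndim < 5: {by_ndim}")
```

Since an image array has two or three dimensions, `im.ndim < 5` is true for every sample, so the transpose would also run on arrays that are already HWC. On a 2-D grayscale array, `transpose((1, 2, 0))` raises a `ValueError` because three axes are requested for a two-axis array.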
@glenn-jocher Thank you very much for your reply. If we directly use To be honest, the method you have written is really great and can be applied to the majority of datasets. I suggest adding a comment after this code segment, as without any explanation, others might also find it confusing. Overall, thank you very much for your reply! 😊
@Le0v1n hello! Thank you for your understanding and for the suggestion to add a comment for clarity. It's a great idea to help others who might be reviewing the code in the future. I'll pass this feedback along to the team to consider adding a descriptive comment in the next update. We appreciate your engagement and thoughtful suggestions! If you have any more ideas or questions, feel free to share. Happy coding! 😊
Search before asking
Question
Today, I attempted to observe the operation process of YOLOv5 step by step. While reviewing the check_amp(model) function, I had some minor doubts. The specific code is in the AutoShape class within the forward method of the models/common.py file:

I reviewed the code using debug mode, with the IDE being VSCode, and the DEBUG command as follows:

In this process, the image used is '../yolov5/data/images/bus.jpg' (which is the default), and I'm not sure about the purpose of the code annotated with 💡. The shape of the image im at this time is (1080, 810, 3), so the result of im.shape[0] < 5 is False. I'm not sure if your team members wanted to check the number of channels when writing the code, but I would still be very confused even if it was changed to im.shape[-1] < 5. In my opinion, the code should be im.shape[0] <= 3.

I'm a bit unsure about your original intention here. If you have time, could you please answer my question? Thank you very much! 🤗
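The effect of the threshold can be checked on a few shapes. The sketch below is illustrative only: `looks_chw` and `to_hwc` are hypothetical helpers, not YOLOv5 functions, the `im.ndim == 3` guard is an addition made for this example, and the 4-channel case assumes an RGBA image stored channel-first, the format raised in the replies.

```python
import numpy as np


def looks_chw(im: np.ndarray, max_channels: int = 4) -> bool:
    """Guess channel-first layout: the first dimension is small enough to be a channel count."""
    return im.shape[0] <= max_channels  # equivalent to im.shape[0] < 5 when max_channels == 4


def to_hwc(im: np.ndarray, max_channels: int = 4) -> np.ndarray:
    """Return a 3-D image in HWC layout, transposing only if it looks channel-first."""
    if im.ndim == 3 and looks_chw(im, max_channels):  # ndim guard is an assumption of this sketch
        im = im.transpose((1, 2, 0))  # CHW -> HWC
    return im


hwc = np.zeros((1080, 810, 3), dtype=np.uint8)       # shape reported for bus.jpg in the question
chw_rgb = np.zeros((3, 1080, 810), dtype=np.uint8)   # channel-first RGB
chw_rgba = np.zeros((4, 1080, 810), dtype=np.uint8)  # channel-first RGBA (assumed case)

print(to_hwc(hwc).shape)       # already HWC, left unchanged
print(to_hwc(chw_rgb).shape)   # transposed to HWC
print(to_hwc(chw_rgba).shape)  # transposed to HWC with the "< 5" threshold
print(to_hwc(chw_rgba, max_channels=3).shape)  # a "<= 3" threshold leaves RGBA untransposed
```

A heuristic of this kind still misclassifies an HWC image that is fewer than five pixels tall, which is the trade-off of inferring the layout from the shape alone.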
Additional
This is not urgent; I hope it doesn't interrupt your regular work. 🥰