
Extracting coordinates instead of drawing a box #69

Open
psi43 opened this issue Jun 15, 2018 · 126 comments

@psi43

psi43 commented Jun 15, 2018

EDIT: Nevermind, got it to work!

Hey, first off, great tutorial, thank you so much.

I got it to run on ubuntu 16.04 as well with ease but I have a problem. I'm running on a CLI Ubuntu server, so instead of using an image as output, I'd just like to have the coordinates of the boxes.

I looked into the Object_detection_image.py and found where the boxes are being drawn, but it uses a function named visualize_boxes_and_labels_on_image_array to draw them.
If I try to output the np.squeeze(boxes), it returns this:

[[0.5897823  0.35585764 0.87036747 0.5124078 ]
 [0.6508235  0.13419046 0.85757935 0.2114587 ]
 [0.64070517 0.14992228 0.8580698  0.23488007]
 ...
 [0.         0.         0.         0.        ]
 [0.         0.         0.         0.        ]
 [0.         0.         0.         0.        ]]

Is there a way to just get the coordinates from that?

Thank you for your time!

EDIT:
Okay, I added a new function to the visualization_utils.py that returns the "ymin, ymax, xmin, xmax" variables, used in other functions of that file to draw the boxes.
The problem is, they look like this:
[[0.5897822976112366, 0.8703674674034119, 0.35585764050483704, 0.5124077796936035], [0.6508234739303589, 0.8575793504714966, 0.13419045507907867, 0.2114586979150772]]
I was expecting coordinates. These seem like percentages.

EDIT:
Okay, I got it to work.
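For anyone landing here with the same confusion: those values are normalized box coordinates in [ymin, xmin, ymax, xmax] order, given as fractions of the image height and width. A minimal sketch of converting them to pixel values, assuming image is the frame that was fed to the detector:

height, width, _ = image.shape
for ymin, xmin, ymax, xmax in np.squeeze(boxes):
    # Scale the normalized fractions up to pixel coordinates.
    y1, x1 = int(ymin * height), int(xmin * width)
    y2, x2 = int(ymax * height), int(xmax * width)
    print(y1, x1, y2, x2)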

@alvinxiii

I'm facing the same problem. Do you have a solution for this? Mind sharing?

@psi43
Author

psi43 commented Aug 5, 2018

I found a solution. I'll share it on here tomorrow, when I'm at work (don't have the solution at home).

@psi43
Author

psi43 commented Aug 6, 2018

add this to the utils/visualization_utils.py

def return_coordinates(
    image,
    boxes,
    classes,
    scores,
    category_index,
    instance_masks=None,
    instance_boundaries=None,
    keypoints=None,
    use_normalized_coordinates=False,
    max_boxes_to_draw=20,
    min_score_thresh=.5,
    agnostic_mode=False,
    line_thickness=4,
    groundtruth_box_visualization_color='black',
    skip_scores=False,
    skip_labels=False):
  # Create a display string (and color) for every box location, group any boxes
  # that correspond to the same location.
  box_to_display_str_map = collections.defaultdict(list)
  box_to_color_map = collections.defaultdict(str)
  box_to_instance_masks_map = {}
  box_to_instance_boundaries_map = {}
  box_to_score_map = {}
  box_to_keypoints_map = collections.defaultdict(list)
  if not max_boxes_to_draw:
    max_boxes_to_draw = boxes.shape[0]
  for i in range(min(max_boxes_to_draw, boxes.shape[0])):
    if scores is None or scores[i] > min_score_thresh:
      box = tuple(boxes[i].tolist())
      if instance_masks is not None:
        box_to_instance_masks_map[box] = instance_masks[i]
      if instance_boundaries is not None:
        box_to_instance_boundaries_map[box] = instance_boundaries[i]
      if keypoints is not None:
        box_to_keypoints_map[box].extend(keypoints[i])
      if scores is None:
        box_to_color_map[box] = groundtruth_box_visualization_color
      else:
        display_str = ''
        if not skip_labels:
          if not agnostic_mode:
            if classes[i] in category_index.keys():
              class_name = category_index[classes[i]]['name']
            else:
              class_name = 'N/A'
            display_str = str(class_name)
        if not skip_scores:
          if not display_str:
            display_str = '{}%'.format(int(100*scores[i]))
          else:
            display_str = '{}: {}%'.format(display_str, int(100*scores[i]))
        box_to_display_str_map[box].append(display_str)
        box_to_score_map[box] = scores[i]
        if agnostic_mode:
          box_to_color_map[box] = 'DarkOrange'
        else:
          box_to_color_map[box] = STANDARD_COLORS[
              classes[i] % len(STANDARD_COLORS)]

  # Draw all boxes onto image.
  coordinates_list = []
  counter_for = 0
  for box, color in box_to_color_map.items():
    ymin, xmin, ymax, xmax = box
    height, width, channels = image.shape
    ymin = int(ymin*height)
    ymax = int(ymax*height)
    xmin = int(xmin*width)
    xmax = int(xmax*width)
    coordinates_list.append([ymin, ymax, xmin, xmax, (box_to_score_map[box]*100)])
    counter_for = counter_for + 1

  return coordinates_list

add this to Object_detection_dir.py

coordinates = vis_util.return_coordinates(
                        image,
                        np.squeeze(boxes),
                        np.squeeze(classes).astype(np.int32),
                        np.squeeze(scores),
                        category_index,
                        use_normalized_coordinates=True,
                        line_thickness=8,
                        min_score_thresh=0.80)

as well as this:

textfile = open("json/"+filename_string+".json", "a")
                    textfile.write(json.dumps(coordinates))
                    textfile.write("\n")

I think this should be all.

@PraveenNellihela

This was very helpful, thank you so much.
If anyone needs to access each coordinate separately, change the third-to-last line of the code newly added to utils/visualization_utils.py, which is

coordinates_list.append([ymin, ymax, xmin, xmax, (box_to_score_map[box]*100)])

into

coordinates_list = [ymin, ymax, xmin, xmax, (box_to_score_map[box]*100)]

and then you can access the ymin, ymax, xmin, xmax values separately using ymin = coordinates[0] etc. in your object detection file.

@iqrammm

iqrammm commented Jul 6, 2019

@PraveenNellihela I have been getting an error of IndexError: list index out of range
I attached my code below; maybe I missed one of your points. Do kindly assist.

coordinates = vis_util.return_coordinates(
    frame,
    np.squeeze(boxes),
    np.squeeze(classes).astype(np.int32),
    np.squeeze(scores),
    category_index,
    use_normalized_coordinates=True,
    line_thickness=8,
    min_score_thresh=0.85)
ymin = int(coordinates[0])
ymax = int(coordinates[1])
xmin = int(coordinates[2])
xmax = int(coordinates[3])

@PraveenNellihela

PraveenNellihela commented Jul 6, 2019

(Quoting iqrammm's error and snippet above.)

Try using some form of error handling such as try/except. You are most likely getting this error when there are no detections. I think I got the same issue when the object I was trying to detect went out of the video frame; I used try/except to ignore the frames where there weren't any values, so it didn't produce an error. Hope this helps.
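For reference, a minimal sketch of that try/except guard, assuming the same variable names as in iqrammm's snippet above:

try:
    coordinates = vis_util.return_coordinates(
        frame,
        np.squeeze(boxes),
        np.squeeze(classes).astype(np.int32),
        np.squeeze(scores),
        category_index,
        use_normalized_coordinates=True,
        line_thickness=8,
        min_score_thresh=0.85)
    ymin = int(coordinates[0])
    ymax = int(coordinates[1])
    xmin = int(coordinates[2])
    xmax = int(coordinates[3])
except IndexError:
    # No detections above the threshold in this frame, so skip it.
    pass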

@iqrammm

iqrammm commented Jul 6, 2019

@PraveenNellihela Thank you for the suggestion, it worked flawlessly. Best of luck in life.

@KwonJoo

KwonJoo commented Jul 8, 2019

@iqrammm Can you share the code that you wrote, i.e. how to use try/except in this situation?
Thanks a lot :)

@iqrammm

iqrammm commented Jul 8, 2019 via email

@KwonJoo

KwonJoo commented Jul 9, 2019 via email

@iqrammm

iqrammm commented Jul 10, 2019 via email

@iqrammm

iqrammm commented Jul 10, 2019

@PraveenNellihela Do you by any chance know how to return the percentage scores as well? I tried adding scores to the return but can't seem to get it to work.

@psi43
Author

psi43 commented Jul 10, 2019

@PraveenNellihela Do you by any chance know how to return the percentage scores as well? I tried adding scores to the return but can't seem to get it to work.

Hey, the code I originally posted should already return the class that was detected and the accuracy in percent. You can get all 6 values from the return like this:

coordinates = vis_util.return_coordinates(
        image,
        np.squeeze(boxes),
        np.squeeze(classes).astype(np.int32),
        np.squeeze(scores),
        category_index,
        use_normalized_coordinates=True,
        line_thickness=8,
        min_score_thresh=0.80)

for coordinate in coordinates:
    print(coordinate)
    (y1, y2, x1, x2, accuracy, classification) = coordinate

With "accuracy" being the value you are looking for and "classification" the ID you associated with your object's class.

EDIT:
I forgot that I edited my code to include the classification after I posted the initial code here. If you want the classification ID to be returned with the coordinates, change the last few lines of your return_coordinates function to the following:

  coordinates_list = []
  counter_for = 0
  for box, color in box_to_color_map.items():
    ymin, xmin, ymax, xmax = box
    height, width, channels = image.shape
    ymin = int(ymin*height)
    ymax = int(ymax*height)
    xmin = int(xmin*width)
    xmax = int(xmax*width)
    coordinates_list.append([ymin, ymax, xmin, xmax, (box_to_score_map[box]*100), int(class_name)])
    counter_for = counter_for + 1

  return coordinates_list

int(class_name) should contain the ID of the detected object.

EDIT 2:
By the way, if anyone wants to not just extract the coordinates, but also crop the image, I recently implemented that into my program and it's just a few lines of code:

for coordinate in coordinates:
    (y1, y2, x1, x2, acc, classification) = coordinate
    height = y2-y1
    width = x2-x1
    crop = image[y1:y1+height, x1:x1+width]
    cv2.imwrite("[PATH TO WHERE THE CROP SHOULD BE SAVED]", crop)

@psi43
Author

psi43 commented Jul 10, 2019

@iqrammm in your case, with the changes you made to the return_coordinates function, it would be something like this:

        coordinates = vis_util.return_coordinates(
                         frame,
                         np.squeeze(boxes),
                         np.squeeze(classes).astype(np.int32),
                         np.squeeze(scores),
                         category_index,
                         use_normalized_coordinates=True,
                         line_thickness=10,
                         min_score_thresh=0.85)
        ymin=int(coordinates[0])
        ymax=int(coordinates[1])
        xmin=int(coordinates[2])
        xmax=int(coordinates[3])
        accuracy = float(coordinates[4])

I haven't tested the code above, but theoretically it should work like that, since the return_coordinates function returns a list with ymin, ymax, xmin, xmax, accuracy. The (box_to_score_map[box]*100) is the accuracy in percentage.

@SKY24

SKY24 commented Jul 17, 2019

for coordinate in coordinates:
    print(coordinate)
    (y1, y2, x1, x2, accuracy, classification) = coordinate

I believe this is not complete, since class_name will return the last class that was assigned to the variable.

@anzy0621

anzy0621 commented Jul 17, 2019

UPDATE: I figured out how to write the values to a text file by making some modifications in the last few lines of the object_detection.py file! Thank you

@psi43 I'm facing issues with these lines (writing the coordinates into the text file)
textfile = open("json/"+filename_string+".json", "a")
textfile.write(json.dumps(coordinates))
textfile.write("\n")

does the filename_string simply refer to an empty text file that I should create to save the outputs?

Thank you in advance, I really appreciate it!

@psi43
Author

psi43 commented Jul 20, 2019

for coordinate in coordinates:
    print(coordinate)
    (y1, y2, x1, x2, accuracy, classification) = coordinate

I believe this is not complete, since class_name will return the last class that was assigned to the variable.

I'll try to look into this. I use this code daily and have heavily modified it since (my example does work for me, but I might have missed a change), so what I looked up a few weeks ago might be very different from what I posted months ago.

(Quoting anzy0621's question above about filename_string.)

filename_string is a variable I used to write the coordinates to a json file. I was very new to python and was frustrated that building the filename out of strings and integers didn't work, so I had to do that and then cast the variable to a string. Hence the name.
If you don't use a mix of ints and strings for your filename and just want to save it to one file, just define it as filename_string = "test" or something and the json should be saved as "test.json" in the /json/ directory.
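A minimal sketch of that idea, assuming IMAGE_NAME holds the current image's filename and coordinates has already been computed:

import os

# Derive the JSON filename from the image filename, e.g. "photo_1.jpg" -> "photo_1".
filename_string = os.path.splitext(os.path.basename(IMAGE_NAME))[0]

os.makedirs("json", exist_ok=True)
textfile = open("json/" + filename_string + ".json", "a")
textfile.write(json.dumps(coordinates))
textfile.write("\n")
textfile.close()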

@SKY24

SKY24 commented Jul 22, 2019

I got it working. Is the piece that you have modified for your use case open source?

@psi43
Author

psi43 commented Jul 23, 2019

I got it working. Is the piece that you have modified for your use case open source?

Sadly no. Even if I planned on making it open source, I'd probably have to make a lot of changes, due to dumb variable names like "filename_string" and general bad code.
It works, but I would feel bad having others look at it, thinking "what an idiot". Modifying this project was one of my first few encounters with Python :/

Glad you got yours to work though! If you (or anyone else) has any more questions, I'll definitely try and answer them as best I can.

@Lhogeshwaran

Lhogeshwaran commented Aug 13, 2019

I got it working. Is the piece that you have modified for your use case open source?

@SKY24 Can you please advise on how to fix the issue you pointed out earlier? 'class_name will return the last class that was assigned to the variable'

@SKY24

SKY24 commented Aug 16, 2019

I got it working. Is the piece that you have modified for your use case open source?

@SKY24 Can you please advise on how to fix the issue you pointed out earlier? 'class_name will return the last class that was assigned to the variable'

Make the following changes and it should work.

In return_coordinates, keep a box-to-class map alongside the color map (it needs to be initialized near the other maps, e.g. box_to_class_map = collections.defaultdict(str)), and fill it where the color is assigned:

        if agnostic_mode:
          box_to_class_map[box] = classes[i]
        else:
          box_to_class_map[box] = classes[i]

Then replace the final loop with:

  # Draw all boxes onto image.
  coordinates_list = []
  counter_for = 0
  for box, class_name in box_to_class_map.items():
    ymin, xmin, ymax, xmax = box
    height, width, channels = image.shape
    ymin = int(ymin*height)
    ymax = int(ymax*height)
    xmin = int(xmin*width)
    xmax = int(xmax*width)
    data = {}
    data['ymin'] = ymin
    data['ymax'] = ymax
    data['xmin'] = xmin
    data['xmax'] = xmax
    data['confidence'] = (box_to_score_map[box]*100)
    data['className'] = int(class_name)
    coordinates_list.append(data)
    counter_for = counter_for + 1

  return coordinates_list

@SinaMojtahedi

It's an amazing thread for my work as well.

I have another question regarding object_detection_image.py.

After finishing the training session, I want to read a folder of image files and run detection on them, but instead of showing the results, save the image filename and detection score to a file (.csv maybe).

Could anyone help me with my problem?

Best,

@hiteshreddy95

Hello @psi43 ,

There is no Object_detection_dir.py, so where do I have to add this code?
"add this to Object_detection_dir.py

coordinates = vis_util.return_coordinates(
image,
np.squeeze(boxes),
np.squeeze(classes).astype(np.int32),
np.squeeze(scores),
category_index,
use_normalized_coordinates=True,
line_thickness=8,
min_score_thresh=0.80)
as well as this:

textfile = open("json/"+filename_string+".json", "a")
textfile.write(json.dumps(coordinates))
textfile.write("\n")"
Kindly awaiting your reply.

@akshay-bahulikar

Hello @psi43 ,
I used the code below to crop the image and it works very well:

for coordinate in coordinates:
    (y1, y2, x1, x2, acc, classification) = coordinate
    height = y2-y1
    width = x2-x1
    crop = image[y1:y1+height, x1:x1+width]
    cv2.imwrite("[PATH TO WHERE THE CROP SHOULD BE SAVED]", crop)

But there are multiple objects of the same label that I want to crop.
The above code only crops a single object.
Could you please help with cropping and saving multiple objects from a single image?

Thanks and Regards. :)

@psi43
Author

psi43 commented Oct 23, 2019

@SinaMojtahedi In the example I gave for cropping, where I loop through the coordinates, you could replace that with writing the coordinates and other info into a .csv file. That was actually the natural progression of my project as well: instead of cropping one image, I now go through a directory of images, extract all the coordinates and save them in a .json file with the same name as the image.
Just google for something like "python3 how to make csv files". Since you already have the data, all you need now is the saving part.
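A minimal sketch of that CSV idea, assuming coordinates comes from return_coordinates with the class included and filename is the current image path:

import csv

with open("detections.csv", "a", newline="") as f:
    writer = csv.writer(f)
    for (y1, y2, x1, x2, accuracy, classification) in coordinates:
        # One row per detection: image file, class, score and pixel box.
        writer.writerow([filename, classification, accuracy, y1, y2, x1, x2])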

@hiteshreddy95 Sorry, I made an Object_detection_dir.py by basically putting a huge for-loop around the Object_detection_image.py so it would cycle through an entire directory of images.
You can just add it to the Object_detection_image.py.

@akshay-bahulikar With that for-loop, it should crop out every object. Keep in mind that you would have to implement a counter or something so the filename changes every time, like crop_1.jpg, crop_2.jpg, etc. If you just name it crop.jpg, you will only get the last object recognized, because crop.jpg would be replaced with every iteration.
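A minimal sketch of that counter idea, assuming coordinates and image are set up as in the cropping example above:

crop_counter = 0
for (y1, y2, x1, x2, acc, classification) in coordinates:
    crop = image[y1:y2, x1:x2]
    crop_counter += 1
    # Each object gets its own file: crop_1.jpg, crop_2.jpg, ...
    cv2.imwrite("crop_{}.jpg".format(crop_counter), crop)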

@MousaAlnajjar

Hi,
Does the first solution for extracting the coordinates remove the boxes drawn around the objects?

I'm new to Python and the TF tool, so sorry for my dumb question 🙋

@psi43
Author

psi43 commented Nov 5, 2019

@MousaAlnajjar I believe it does, I can't remember for sure. But I seem to remember needing to remove them because they were saved as part of the image, not entirely sure though.

If you do want the boxes, just look at the lines below:
# Draw all boxes onto image.
in the utils/visualization_utils.py
If you added my method into that file, you should have that "Draw all boxes onto image" line twice in the file: one for the coordinate extraction and one for drawing the boxes.
Play around with that and see if you can get it to do both.
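A minimal sketch of doing both, assuming the usual variables from Object_detection_image.py:

# Draw the labelled boxes onto the image as before ...
vis_util.visualize_boxes_and_labels_on_image_array(
    image,
    np.squeeze(boxes),
    np.squeeze(classes).astype(np.int32),
    np.squeeze(scores),
    category_index,
    use_normalized_coordinates=True,
    line_thickness=8,
    min_score_thresh=0.80)

# ... and also collect the pixel coordinates without touching the drawn image.
coordinates = vis_util.return_coordinates(
    image,
    np.squeeze(boxes),
    np.squeeze(classes).astype(np.int32),
    np.squeeze(scores),
    category_index,
    use_normalized_coordinates=True,
    min_score_thresh=0.80)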

@MousaAlnajjar

I'll try it, but I want to ask you another question:
I can't find the file "object_detection_dir" in the models to add the code above to,
and I noticed that it isn't used in either the detection code or utils/visualization_utils.py.

@ManafMukred

ManafMukred commented Nov 6, 2019

@psi43 There is a weird contradiction I came across with multi-object detection. I'm doing rock, paper, scissors detection, so if I use vis_util.return_coordinates to return the classes, it returns 2 different coordinates but prints the same class (which is wrong).
But when it comes to using the drawing functionality in vis_util, it draws the 2 boxes and each label is different (which is true).
It looks like the function doesn't return more than one class per frame.
Note: ignore the false detection, as I've trained on only a few images.
This is the detection:
[screenshot of the two detected boxes]

and this is the classification:

[231, 404, 352, 616, 99.99584555625916, 'scissor']
[159, 424, 33, 216, 92.08916425704956, 'scissor']

representing y1, y2, x1, x2, accuracy, classification respectively

@jix1710

jix1710 commented Jun 21, 2020

I just solved the issue ... looks like the label is not updated in the for loop .. so if there are multiple labels in the same frame, it will return the latest one only ... I've edited the last few lines in vis_util.return_coordinates function to be like this:

# Draw all boxes onto image.
  coordinates_list = []
  counter_for = 0
  for box, color in box_to_color_map.items():
    ymin, xmin, ymax, xmax = box
    height, width, channels = image.shape
    ymin = int(ymin*height)
    ymax = int(ymax*height)
    xmin = int(xmin*width)
    xmax = int(xmax*width)
    coordinates_list.append([ymin, ymax, xmin, xmax, (box_to_score_map[box]*100),display_strs[counter_for]])
    counter_for = counter_for + 1

  return coordinates_list 

[[483.25], [681.0], ['VO: 99%'], 'V']
[[427.75], [491.0], ['TO: 96%'], 'O']
How do I get the full label name? It doesn't come out as 'VO'; the output only contains 'V'. See the last value in my output above.

@jix1710

jix1710 commented Jun 21, 2020 via email

@jix1710

jix1710 commented Jun 21, 2020 via email

@Sn0wl3r0ker

Sn0wl3r0ker commented Jun 21, 2020

(Quoting psi43's return_coordinates solution from earlier in the thread.)
Hi! This is not an issue. I added this simple loop to make the code detect all .JPG files in the testimg folder and export the .json file and the image with boxes into the json/ folder. Hope this can help someone who wants it. P.S. I'm still a noob, so I hope you can give me some advice.

1. Import some modules:

import glob
import re

2. Comment out the default IMAGE PATH lines:

# IMAGE_NAME =
# PATH_TO_IMAGE = os.path.join(CWD_PATH,IMAGE_NAME)

3. And here's the change:

for filename in sorted(glob.glob("testimg/" + "*.JPG")):
    print("\n")
    print("Start parsing " + filename + "...")
    image = cv2.imread(filename)
    image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    image_expanded = np.expand_dims(image_rgb, axis=0)

    # ... run the detection here (sess.run and vis_util.return_coordinates)
    # so that `coordinates` refers to the current image ...

    filenameJ = re.sub(r".*?\/", "", filename)
    open("json/" + filenameJ + ".json", "a").write(json.dumps(coordinates) + "\n")

    for coordinate in coordinates:
        (y1, y2, x1, x2, acc, classification) = coordinate
        height = y2 - y1
        width = x2 - x1
        crop = image[y1:y1+height, x1:x1+width]
        cv2.imwrite("json/" + filename, image)
        print(coordinate)

@jix1710

jix1710 commented Jun 22, 2020

(Quoting the display_strs fix and ManafMukred's multi-class question from above.)

Hi, my output is:

IMAGENAME  BOXCENTRLPOINT  Y-MP  C-A
0  cap_000  [[616.0] [821.0] 'VO: 87%']

These are my returned coordinates, but I need 'VO' in a separate column and the accuracy '87%' in another separate column.
How is that possible? What do I need to change in the visualization_utils code?

@psi43
Author

psi43 commented Jun 22, 2020

@jix1710 I'm sorry, but I don't understand what you're trying to ask :(.

@jix1710

jix1710 commented Jun 22, 2020

@jix1710 I'm sorry, but I don't understand what you're trying to ask :(.
I want to split the 'VO: 87%' value into separate parts, e.g. 'VO' and '87%'. That is the type of output I have.

@jix1710

jix1710 commented Jun 25, 2020

(Quoting psi43's return_coordinates solution from earlier in the thread.)

How is it possible to check whether the object detection output data is right or wrong?

@jix1710

jix1710 commented Jul 13, 2020

How can I open and process all the images at once in the object detection model?

@Sn0wl3r0ker

Sn0wl3r0ker commented Jul 15, 2020

How can I open and process all the images at once in the object detection model?

Hey man! I don't know if it's exactly right, but I think you can try my for-loop method to go through all the images in the target folder.

(Quoting my earlier comment above, with the return_coordinates solution and the glob loop over the testimg folder.)

@jix1710

jix1710 commented Jul 15, 2020 via email

@jix1710

jix1710 commented Jul 15, 2020 via email

@jix1710

jix1710 commented Aug 8, 2020

Hi everyone,
I have a problem. Basically, I am making a project for a smart refrigerator where I use object detection to know what's inside the fridge. I use two classes for a start, bottle and can, with 1 bottle and 1 can as a sample. Thankfully it can classify the bottle and the can in the frame/image, but my problem is with the code below:

coordinates_list.append([ymin, ymax, xmin, xmax, (box_to_score_map[box]*100), str(class_name)])

The class name shows this:

[358, 705, 990, 1256, 99.93228912353516, 'Bottle'] [341, 708, 284, 469, 99.69232678413391, 'Bottle']

I think it overwrites the class for the can; if I remove the bottle from the frame/image, then only 'Can' shows as the class_name.

I attached an image of the bottle and can detection.
Thank you in advance.

How do I solve this problem? I have the same issue. Any solution?

@jix1710
Copy link

jix1710 commented Aug 8, 2020

(Quoting the display_strs fix from above.)

How do I solve this problem? I have used this code ([ymin, ymax, xmin, xmax, (box_to_score_map[box]*100), display_strs[counter_for]]), but the output coordinates all have the same class name; it does not find multiple classes.

@jix1710

jix1710 commented Aug 17, 2020 via email

@jix1710

jix1710 commented Aug 17, 2020 via email

@jix1710

jix1710 commented Aug 18, 2020

(Quoting the display_strs fix from above.)

I use the same code as you, but the class name is the same every time. Please help me.
I used coordinates_list.append([ymin, ymax, xmin, xmax, (box_to_score_map[box]*100), display_str[counter_for]]).

@Raman1121

Hello all
I added the piece of code suggested by @psi43 but I am getting this error.
AttributeError: module 'object_detection.utils.visualization_utils' has no attribute 'return_coordinates'

Can anyone help me with this? I am adding the code in the right location in the visualization_utils file.

@jix1710

jix1710 commented Oct 8, 2020 via email

@Aakanksha3010

I ran OpenPose on Colab and got JSON files of the video containing the keypoints; however, I'm confused as to how to use these JSON files to calculate rep counts and other pose evaluation metrics. Please help me out!

this is my code link: https://colab.research.google.com/drive/1tl0NvOSGLzP0vEpKLjlYEHVtEuJs77it?usp=sharing

@vinilreddy36

(Quoting jix1710's question above about the class name being the same every time.)

Add this, it will solve your problem:

  # Draw all boxes onto image.
  coordinates_list = []
  counter_for = 0
  for box, color in box_to_color_map.items():
    ymin, xmin, ymax, xmax = box
    height, width, channels = image.shape
    ymin = int(ymin*height)
    ymax = int(ymax*height)
    xmin = int(xmin*width)
    xmax = int(xmax*width)
    # counter_for indexes into classes; assuming the detections are sorted by
    # score (as the API returns them), the boxes kept above line up with the
    # first entries of classes.
    if classes[counter_for] in category_index.keys():
      class_name = category_index[classes[counter_for]]['name']
    else:
      class_name = 'N/A'
    coordinates_list.append([ymin, ymax, xmin, xmax, (box_to_score_map[box]*100), str(class_name)])
    counter_for = counter_for + 1
  return coordinates_list

@unrivalle

coordinates = vis_util.return_coordinates(

It is not working; could you please share the latest code?
I am getting AttributeError: module 'object_detection.utils.visualization_utils' has no attribute 'return_coordinates'.

@vinilreddy36

coordinates = vis_util.return_coordinates(

It is not working; could you please share the latest code?
I am getting AttributeError: module 'object_detection.utils.visualization_utils' has no attribute 'return_coordinates'.

Did you add this to your utils/visualization_utils.py?

def return_coordinates(
    image,
    boxes,
    classes,
    scores,
    category_index,
    instance_masks=None,
    instance_boundaries=None,
    keypoints=None,
    use_normalized_coordinates=False,
    max_boxes_to_draw=20,
    min_score_thresh=.5,
    agnostic_mode=False,
    line_thickness=4,
    groundtruth_box_visualization_color='black',
    skip_scores=False,
    skip_labels=False):
  # Create a display string (and color) for every box location, group any boxes
  # that correspond to the same location.
  box_to_display_str_map = collections.defaultdict(list)
  box_to_color_map = collections.defaultdict(str)
  box_to_instance_masks_map = {}
  box_to_instance_boundaries_map = {}
  box_to_score_map = {}
  box_to_keypoints_map = collections.defaultdict(list)
  if not max_boxes_to_draw:
    max_boxes_to_draw = boxes.shape[0]
  for i in range(min(max_boxes_to_draw, boxes.shape[0])):
    if scores is None or scores[i] > min_score_thresh:
      box = tuple(boxes[i].tolist())
      if instance_masks is not None:
        box_to_instance_masks_map[box] = instance_masks[i]
      if instance_boundaries is not None:
        box_to_instance_boundaries_map[box] = instance_boundaries[i]
      if keypoints is not None:
        box_to_keypoints_map[box].extend(keypoints[i])
      if scores is None:
        box_to_color_map[box] = groundtruth_box_visualization_color
      else:
        display_str = ''
        if not skip_labels:
          if not agnostic_mode:
            if classes[i] in category_index.keys():
              class_name = category_index[classes[i]]['name']
            else:
              class_name = 'N/A'
            display_str = str(class_name)
        if not skip_scores:
          if not display_str:
            display_str = '{}%'.format(int(100*scores[i]))
          else:
            display_str = '{}: {}%'.format(display_str, int(100*scores[i]))
        box_to_display_str_map[box].append(display_str)
        box_to_score_map[box] = scores[i]
        if agnostic_mode:
          box_to_color_map[box] = 'DarkOrange'
        else:
          box_to_color_map[box] = STANDARD_COLORS[
              classes[i] % len(STANDARD_COLORS)]

  # Draw all boxes onto image.
  coordinates_list = []
  counter_for = 0
  for box, color in box_to_color_map.items():
    ymin, xmin, ymax, xmax = box
    height, width, channels = image.shape
    ymin = int(ymin*height)
    ymax = int(ymax*height)
    xmin = int(xmin*width)
    xmax = int(xmax*width)
    if classes[counter_for] in category_index.keys():
      class_name = category_index[classes[counter_for]]['name']
    else:
      class_name = 'N/A'
    coordinates_list.append([ymin, ymax, xmin, xmax, (box_to_score_map[box]*100), str(class_name)])
    counter_for = counter_for + 1

  return coordinates_list

@unrivalle

unrivalle commented Feb 24, 2021 via email

@unrivalle

unrivalle commented Feb 24, 2021 via email

@unrivalle

unrivalle commented Mar 3, 2021 via email

@ghost

ghost commented May 29, 2021

I read this thread and it's quite interesting to be able to crop bounding boxes after training with the TensorFlow Object Detection API.
Can anyone please put all the chunks that worked for you in one place, step by step? It would read like a tutorial and new readers would be able to implement it easily.
Thanks.

@wael-mahdi

Hi friends, I am working on a project (computer vision, video tracking, multiple objects). I need help with how to assign a unique, fixed ID to each object across frames. I know how to detect contours and draw a bounding rectangle for each object, but I cannot keep the ID fixed for each one.
Second, if anyone has the dataset called DUKE MTMC, I need it for my project. Thanks a lot to all.

@ghost

ghost commented Dec 25, 2021

You do not need to add any additional functions (return_coordinates) to determine the object coordinates. Just use the dnn.NMSBoxes command in OpenCV as follows:

# This code returns the object coordinates in a list (signs) and their labels (labels):

idxs = cv2.dnn.NMSBoxes(boxes, scores, 0.5, 1.5)

# preallocate the lists for coordinates and labels
signs = []
labels = []
for i in range(len(idxs)):
        signs.append(i)
        labels.append(i)

# ensure at least one detection exists
if len(idxs) > 0:
    # loop over the indexes we are keeping
    for i in idxs.flatten():
        # extract the bounding box coordinates
        ymin = int((boxes[0][i][0] * height))
        xmin = int((boxes[0][i][1] * width))
        ymax = int((boxes[0][i][2] * height))
        xmax = int((boxes[0][i][3] * width))
        signs[i] = [ymin,ymax,xmin,xmax]
        labels[i] = int(classes[0][i])

print(signs)
print(labels)

@Mehran970 🙏🙏🙏 Thanks a lot. It worked for multiple objects in a frame.
You made my day.
Thank you so much.
