
MaskRCNN Inference #884

Merged: 72 commits into tinygrad:master, Jun 25, 2023
Conversation

kunwar31
Contributor

@kunwar31 kunwar31 commented May 31, 2023

So far I've created base classes based on the reference implementation (thanks @wozeparrot), and I'm able to load the weights @geohot

https://github.com/mlcommons/training/tree/master/object_detection/pytorch/maskrcnn_benchmark

TODO:

  • Load weights of the saved model

  • Load reference model in torch and verify same parameters present in tinygrad model

  • Add call to each module and verify it works by comparing it with reference torch call

    • Add call to ResNetFPN
    • Add call to RPN
    • Add call to RoIHeads
  • Test model call works end to end

    • test call to ResNetFPN
    • test call to RPN
    • test call to RoIHeads
  • Add inference code

  • Remove torch functions, lower usage of .numpy()

  • Run model on test dataset, Box AP should be similar

  • Calculate inference time (s/im)
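The module-by-module checks listed above amount to running the torch reference and the tinygrad port on the same input and comparing the outputs within a tolerance. A hypothetical numpy-only harness for that comparison (the module name and outputs are illustrative, not the real ResNetFPN tensors):

```python
import numpy as np

def check_module(name, ref_out, tiny_out, atol=1e-3):
    # Compare a ported module's output against the reference output
    # (both converted to numpy); returns True when they agree within atol.
    ok = bool(np.allclose(ref_out, tiny_out, atol=atol))
    max_err = float(np.abs(ref_out - tiny_out).max())
    print(f"{name}: match={ok} max_abs_err={max_err:.2e}")
    return ok

# Hypothetical outputs, differing by a small numerical error
ref = np.array([0.1234, -0.5678, 2.5001])
out = ref + 5e-4
check_module("ResNetFPN", ref, out)  # matches at atol=1e-3
```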

@kunwar31 kunwar31 marked this pull request as draft May 31, 2023 20:12
@Marcelo5444

I started the same project today, but you are ahead of me. Maybe you need to drop the last fc layer of the backbone, right?

module = make_conv3x3(next_feature, layer_features,
                      dilation=dilation, stride=1, use_gn=use_gn)
exec(f"self.{layer_name} = module")
Collaborator

this is kinda cursed, you should be able to change the name during the weight loading process.

Contributor Author

@wozeparrot yes, a lot of things are cursed right now; I'll take these up one by one while I'm adding the calls
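As an aside on the exec line discussed here: the same dynamic attribute assignment can be done with setattr, which avoids exec entirely. A minimal sketch (the class and layer names are illustrative, not the real maskrcnn_benchmark ones):

```python
# Sketch: dynamically named submodules via setattr instead of exec.
class DummyConv:
    def __init__(self, name):
        self.name = name

class Head:
    def __init__(self, layer_names):
        for layer_name in layer_names:
            module = DummyConv(layer_name)
            # replaces exec(f"self.{layer_name} = module")
            setattr(self, layer_name, module)

h = Head(["mask_fcn1", "mask_fcn2"])
print(h.mask_fcn1.name)  # -> mask_fcn1
```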

@kunwar31
Contributor Author

I started the same project today, but you are ahead of me. Maybe you need to drop the last fc layer of the backbone, right?

Yes, they can be removed, but I won't be using them in the forward call anyway.

@geohot geohot added the bounty locked Bounty is locked to someone label May 31, 2023
@tinyb0t

tinyb0t commented Jun 1, 2023

Changes made in tinygrad/:

------------------------------------------------------------
files                             insertions       deletions
------------------------------------------------------------
tinygrad/tensor.py                         2               1
------------------------------------------------------------
lines added in the tinygrad folder: 1

@kunwar31
Contributor Author

kunwar31 commented Jun 2, 2023

@geohot So there are still some torch functions which need to be removed, but here's an example output
[image: example output]

@kunwar31
Contributor Author

kunwar31 commented Jun 2, 2023

Reference output for the same image
[image: reference output]

@kunwar31
Contributor Author

kunwar31 commented Jun 2, 2023

I'm aware that the results aren't exactly the same; this is because the ResNet block output doesn't exactly match the reference implementation (it matches with atol=1e-3).
If I use the ResNet output from the reference and everything else from my implementation, the results match exactly end to end.

@kunwar31
Contributor Author

kunwar31 commented Jun 2, 2023

confidence_threshold=0.6

Bbox outputs from tinygrad
[image: tinygrad bbox outputs]

Bbox outputs from maskrcnn_benchmark
[image: maskrcnn_benchmark bbox outputs]

@@ -435,6 +461,7 @@ def dot(self, w:Tensor) -> Tensor:

def contiguous(self): return mlops.Contiguous.apply(self)
def log(self): return mlops.Log.apply(self)
def log2(self): return mlops.Log.apply(self)/0.69314718056
Collaborator

  • (math.log(math.e)/math.log(2)) for readability.

Contributor Author

@geohot 0.69314718056 is math.log(2); to change the base from e, I divided by log(2)
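For the curious, the change-of-base identity used here is log2(x) = ln(x)/ln(2); a quick check with the same constant:

```python
import math

x = 8.0
val = math.log(x) / 0.69314718056  # ln(x) / ln(2), the constant from the diff
print(round(val, 6))  # -> 3.0
# the magic constant is math.log(2) to about 11 decimal places
print(abs(0.69314718056 - math.log(2)) < 1e-10)  # -> True
```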

bs, c, py, px = x.shape
return x.reshape(bs, c, py, 1, px, 1).expand(bs, c, py, scale_factor, px, scale_factor).reshape(bs, c, py * scale_factor, px * scale_factor)

@staticmethod
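The reshape/expand pattern in this hunk is nearest-neighbor upsampling; the same idea in a numpy sketch (using broadcast_to in place of tinygrad's expand):

```python
import numpy as np

def upsample_nearest(x, scale_factor):
    # Insert singleton axes, broadcast each to scale_factor, then flatten:
    # (bs, c, py, px) -> (bs, c, py*scale_factor, px*scale_factor)
    bs, c, py, px = x.shape
    x = x.reshape(bs, c, py, 1, px, 1)
    x = np.broadcast_to(x, (bs, c, py, scale_factor, px, scale_factor))
    return x.reshape(bs, c, py * scale_factor, px * scale_factor)

a = np.array([[[[1, 2], [3, 4]]]])  # shape (1, 1, 2, 2)
print(upsample_nearest(a, 2)[0, 0].tolist())
# -> [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```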
Collaborator

If things use numpy like this, they don't belong in tensor.py

Contributor Author

removed from tensor.py

Collaborator

These are still in tensor.py

Contributor Author

Missed it.. I've kept interpolate in tensor.py after removing the staticmethod, and moved sort and topk

@kunwar31 kunwar31 marked this pull request as draft June 20, 2023 17:02
@kunwar31 kunwar31 marked this pull request as ready for review June 21, 2023 11:15
@geohot
Collaborator

geohot commented Jun 21, 2023

Is this ready for me to test? Will run on 7900XTX and confirm it meets the target

@kunwar31
Contributor Author

Is this ready for me to test? Will run on 7900XTX and confirm it meets the target

yes @geohot, a run on the 7900XTX should take around 3-4 hours:
GPU=1 MODEL=mrcnn python examples/mlperf/model_eval.py

@geohot
Collaborator

geohot commented Jun 23, 2023

Testing now; it required mkdir datasets/COCO, but it looks like it's running

@geohot
Collaborator

geohot commented Jun 23, 2023

Made it to:
3%|████ | 136/5000 [1:35:07<56:42:24, 41.97s/it]

and got

pyopencl._cl.MemoryError: create_buffer failed: MEM_OBJECT_ALLOCATION_FAILURE

7900XTX with 24GB of VRAM

@kunwar31
Contributor Author

kunwar31 commented Jun 23, 2023

Made it to: 3%|████ | 136/5000 [1:35:07<56:42:24, 41.97s/it]

and got

pyopencl._cl.MemoryError: create_buffer failed: MEM_OBJECT_ALLOCATION_FAILURE

7900XTX with 24GB of VRAM

So I've been using OPT=1 because of the kernel fusion issue; I usually get 8 s per image on an RTX 3060 mobile. I suspect this behaviour is because of OPT=2. Could you please try OPT=1 @geohot?

@geohot
Collaborator

geohot commented Jun 23, 2023

Pulled, and rerunning with OPT=1

@geohot
Collaborator

geohot commented Jun 23, 2023

OPT=1 PYTHONPATH="." GPU=1 MODEL=mrcnn python examples/mlperf/model_eval.py

3%|████▋ | 142/5000 [27:45<15:49:38, 11.73s/it]

pyopencl._cl.MemoryError: create_buffer failed: MEM_OBJECT_ALLOCATION_FAILURE

@wozeparrot
Collaborator

I think this is actually because you're running out of kernel program space; maybe try with the method cache disabled?

@geohot
Collaborator

geohot commented Jun 24, 2023

At 150 now with method cache disabled, but this is brutally slow. ETA is over 24 hours.

@geohot
Collaborator

geohot commented Jun 25, 2023

26 hours later, congrats! Either post e-mail here or reach out to george@tinygrad.org to claim bounty

It should be faster, but the inference bounty didn't have a speed requirement :)

██████████████████████████████| 5000/5000 [26:39:09<00:00, 19.19s/it]
loading annotations into memory...
Done (t=0.35s)
creating index...      
index created!
Loading and preparing results...
DONE (t=0.54s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=20.64s).
Accumulating evaluation results...
DONE (t=3.75s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.378
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.592
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.411
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.215
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.411
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.499
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.313
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.490
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.514
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.327
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.551
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.651
loading annotations into memory...
Done (t=0.35s)
creating index...
index created!
Loading and preparing results...
DONE (t=1.88s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *segm*
DONE (t=23.95s).
Accumulating evaluation results...
DONE (t=3.62s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.342
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.559
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.363
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.155
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.368
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.506
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.293
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.448
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.468
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.271
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.505
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.623

@geohot geohot merged commit 5d3310c into tinygrad:master Jun 25, 2023
13 checks passed
@kunwar31
Contributor Author

@geohot thanks! I know this is very slow ATM, and whoever picks this up for training is going to have a very hard time. Sharing some thoughts for that person.
The 2 major reasons it's slow:

  1. Huge gathers (even doing them in numpy is slow because of the data transfer; currently this also blocks the gradient)
  2. topk (done in numpy, but the data transfers slow it down)

Both of these need some kind of X[y] instruction, so I had a hard time making them work in tinygrad. 1 could be done fully in tinygrad, but it was even slower.
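For whoever picks this up: one index-free way to express an X[y] row gather, of the kind described above, is a one-hot matmul. This stays inside plain tensor ops (comparison, broadcast, matmul), but it is expensive, which is consistent with the note that the pure-tinygrad version was even slower. A numpy sketch (names hypothetical):

```python
import numpy as np

def gather_rows(X, y):
    # Build a one-hot matrix from the indices and matmul it with X:
    # equivalent to X[y], using only ops most tensor frameworks provide.
    onehot = (y[:, None] == np.arange(X.shape[0])[None, :]).astype(X.dtype)
    return onehot @ X  # shape (len(y), X.shape[1])

X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = np.array([2, 0])
print(gather_rows(X, y).tolist())  # -> [[5.0, 6.0], [1.0, 2.0]]
```

The cost is a (len(y), len(X)) intermediate, so for the huge gathers mentioned above this trades data transfer for compute and memory.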

I have sent a paypal payment request to george@tinygrad.org, my email is kunwar31@gmail.com

Labels
bounty locked Bounty is locked to someone

5 participants