Both versions of the benchmark annotations are inaccurate... did I do anything wrong? #30
Comments
I’m not sure what inaccuracy you are referring to.
For the ground truth, the original AFLW2000-3D annotation indeed contains many errors. Its annotation process (described in the associated CVPR 2016 paper) is automatic rather than manual, with several failure cases like the ones you show here. There is a reannotated version of AFLW2000-3D as a remedy, but it is more recent, and the convention is to compare on the original one.
I think it makes more sense to evaluate on the reannotated version; alternatively, there are some recent higher-quality datasets for facial alignment.
For the model, I agree there is still some room for improvement. The model tends to produce large errors on heavily occluded or very large-pose faces, but that is what we always struggle with.
…On Tuesday, June 27, 2023, ken881015 ***@***.***> wrote:
- Hello, I really appreciate the work you completed; *SynergyNet* is not only lightweight but also keeps acceptable accuracy on AFLW2000-3D.
- Although I'm one of the trainers who can't reproduce the 3.4% NME (my best is 3.674% after fixing the code problem here
<https://urldefense.com/v3/__https://github.com/choyingw/SynergyNet/issues/18*issuecomment-1600030352__;Iw!!LIr3w8kk_Xxm!pQ5KerkncxONoAPU1G2QmGWgO3mKJwAwO7WOB8Cfuh8ULlea9I9anVPzlt_NWafh8M-lEXbCIaILwF97KqR6cxjX$>)
on the original benchmark annotation, I kept trying to analyze which kinds of images the model fails on and how to improve it. So I sorted the NME of the 2000 images, made a grid of the 48 worst images with the model's alignment on them, and show the ground-truth annotation beside each (for each pair, the left is the model output and the right is the ground truth, from the reannotated version).
[image: grid_of_worst_alignment_0~47_re_v2_fix_loss_problem_80]
<https://urldefense.com/v3/__https://user-images.githubusercontent.com/38501223/249102379-3799a5e7-5439-4466-9bf8-73dfd3703a3e.png__;!!LIr3w8kk_Xxm!pQ5KerkncxONoAPU1G2QmGWgO3mKJwAwO7WOB8Cfuh8ULlea9I9anVPzlt_NWafh8M-lEXbCIaILwF97Kpwg4-gH$>
- As you can see, some of the ground-truth annotations are not accurate: (with indices starting from 1) pairs (1,1) and (1,2) clearly have annotations not worth using as reference, while other pairs (e.g. (8,6)) show that the large NME comes from model performance rather than inaccurate annotation; in other words, there is still room for improvement.
- Here is part of my code to post-process the files (roi_box, pts68, ...) you offer in the repo and visualize the alignment on an image. For the inaccuracy problem, did I do anything wrong? Or is there any opinion you can share with us? I would really appreciate it.
# put this code in ./aflw2000_data/ and you can run it
import matplotlib.pyplot as plt
import numpy as np
from pathlib import Path

# select an image by name
img_name = "image02156.jpg"
img = plt.imread("./AFLW2000-3D_crop/" + img_name)

# choose the version of the benchmark annotation (original or reannotated)
# pts68 = np.load("./eval/AFLW2000-3D.pts68.npy")
pts68 = np.load("./eval/AFLW2000-3D-Reannotated.pts68.npy")
bbox = np.load("./eval/AFLW2000-3D_crop.roi_box.npy")
fname_list = Path("./AFLW2000-3D_crop.list").read_text().strip().split('\n')

# map landmarks from original-image coordinates into the 120x120 crop
pts68[:, 0, :] = (pts68[:, 0, :] - bbox[:, [0]]) / (bbox[:, [2]] - bbox[:, [0]]) * 120
pts68[:, 1, :] = (pts68[:, 1, :] - bbox[:, [1]]) / (bbox[:, [3]] - bbox[:, [1]]) * 120

# plot the cropped image and scatter the 68 landmarks on top
fig, ax = plt.subplots()
ax.imshow(img)
idx = fname_list.index(img_name)
ax.scatter(pts68[idx, 0, :], pts68[idx, 1, :])
fig.savefig("alignment.jpg")
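For reference, the NME I report above can be sketched with a generic bounding-box-normalized formulation. This is an assumption on my part: the helper name `nme` and the `sqrt(w*h)` normalization are illustrative and should be checked against the repo's own evaluation script.

```python
import numpy as np

def nme(pred, gt, box):
    """Bounding-box-normalized mean error for one face.

    pred, gt: (2, 68) landmark arrays; box: (x1, y1, x2, y2).
    Normalizing by sqrt(w * h) is a common AFLW2000-3D convention,
    assumed here rather than taken from the repo's code.
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    x1, y1, x2, y2 = box
    norm = np.sqrt((x2 - x1) * (y2 - y1))
    per_point = np.linalg.norm(pred - gt, axis=0)  # (68,) Euclidean errors
    return per_point.mean() / norm
```

Averaging this value over all 2000 images would give the benchmark-level NME.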
--
Ph.D. candidate of Computer Science Department
University of Southern California
|
Thanks for your clear visualization. As far as I know, annotating 3D landmarks is very challenging; some recent datasets such as the NoW benchmark or DAD-3DHeads (https://www.pinatafarm.com/research/dad-3dheads) may be a better choice. Otherwise, you can manually filter out the bad annotations in AFLW2000-Reannotation, which is reasonable in my opinion.
Random erasing, in my opinion, may help in some limited cases such as cropped-out faces, but other occlusion types such as hands or scarves are hard, since the occlusion shape is irregular. An easy hack is to add those erased images to the training set and see if the trick helps on cropped-out faces.
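As a concrete sketch of the random-erasing idea: the following is a generic implementation, not code from this repo, and the helper name `random_erase` along with the area and aspect-ratio bounds are assumptions for illustration.

```python
import numpy as np

def random_erase(img, rng, scale=(0.02, 0.2), fill=0):
    """Erase one random rectangle in an HxWxC image, returning a copy.

    `scale` bounds the erased area as a fraction of the image area;
    the aspect-ratio range (0.5, 2.0) is an assumed choice.
    """
    out = img.copy()
    h, w = out.shape[:2]
    area = h * w * rng.uniform(*scale)          # target erased area in pixels
    aspect = rng.uniform(0.5, 2.0)              # rectangle aspect ratio
    eh = min(h, max(1, int(round(np.sqrt(area / aspect)))))
    ew = min(w, max(1, int(round(np.sqrt(area * aspect)))))
    top = rng.integers(0, h - eh + 1)           # random top-left corner
    left = rng.integers(0, w - ew + 1)
    out[top:top + eh, left:left + ew] = fill    # overwrite with the fill value
    return out
```

Applying this on the fly during training, with some probability per sample, would be the usual way to wire it into an augmentation pipeline.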
About the NME, I'm not sure what causes this phenomenon (maybe the learning-rate change points), but I think it is reasonable that the best NME occurs after the milestones, since a milestone marks when the learning rate is adjusted, and the lower LR indicates better convergence. Something we have in mind (but have not fully tested yet) is that a lower final LR and longer training may help attain better minima.
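The milestone behavior described above can be sketched as a MultiStep-style decay, where the LR is multiplied by a factor at each milestone. This is a generic sketch, and the decay factor `gamma=0.1` is an assumption, not confirmed from the training script.

```python
def multistep_lr(base_lr, epoch, milestones=(48, 64), gamma=0.1):
    """LR after MultiStep-style decay: multiply by gamma at each milestone.

    milestones=(48, 64) are the defaults mentioned in this thread;
    gamma=0.1 is an assumed decay factor.
    """
    passed = sum(epoch >= m for m in milestones)  # milestones already reached
    return base_lr * (gamma ** passed)
```

Under this schedule, the LR drops sharply right at the milestones, which is consistent with the best NME appearing only after them, once the model converges at the lower rate.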
…On Tue, Jun 27, 2023 at 11:40 PM ken881015 ***@***.***> wrote:
- Thanks for the reply.
- The picture I showed is the *reannotated* version of the benchmark. However, what surprises me is that it still has some bad annotations (such as pairs (1,1) and (1,2)). So maybe I will find a new face-alignment dataset for validation. Thank you for your suggestions.
- Regarding occlusion, I am currently trying to add an augmentation technique that *randomly erases* parts of the input image to improve the model's ability to handle occlusions.
- Lastly, while tuning parameters and fixing some issues in the code, I recorded the NME (Normalized Mean Error) throughout the training process. I have a few questions that I would like to ask you:
[image: image]
<https://urldefense.com/v3/__https://user-images.githubusercontent.com/38501223/249372723-b800dd1c-1747-4e2c-af14-2bce6d9a0003.png__;!!LIr3w8kk_Xxm!tzdzerviy2QCrtXznJylGRjRSiHv2bckDBRkEF80pzUHoIOGQ-d2P7zqZSoFG_2qCl4UtqZ1GIpfeuhtUWCnpQOK$>
  - Coincidentally, between epochs 25~50, almost all of the runs show a hill-shaped curve.
  - Surprisingly, the best NME in each run happened after the milestones (default: 48, 64).
  - Do you think this phenomenon is explainable, or just heuristic?
|
For AFLW2000-3D, if there were some remedy for out-of-distribution cases (occlusion, underwater, or very large poses), the NME could still be improved, as those cases can greatly degrade the overall performance. Many previous facial-landmark methods generally focus on learning a good representation for faces but don't specifically incorporate priors for OOD data. Happy to discuss more via email.
|