
Part 1 - MagiScan 3D and Blender to Create the Pikachu Model

  • Download the free app MagiScan 3D and follow its instructions to create the 3D model.
  • Once the model is ready, export it in .glb format. At this stage the 3D scan is raw and needs a cleanup.
  • Download Blender 3.6.3 and open it.
  • File > Import > glTF 2.0 > Load the model from MagiScan3D.
  • First, on the top right, adjust the view of the object. Then switch from Object Mode to Edit Mode.

  • Select all the vertices to be deleted > Right click > Delete Vertices.

  • The final model should be clean and should look as follows.

  • File > Export > FBX (.fbx) > In the right column, set Path Mode to Copy, enable the Embed Textures toggle next to it, and save.

  • UV Editing > Image > Save As... > Save the image texture of the object (as RGBA).
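
The Blender import, cleanup, and export steps above can also be scripted. The snippet below is a minimal sketch using Blender's Python API (bpy); the file paths are placeholders, and the vertex cleanup itself is easiest to do by hand in Edit Mode as described.

```python
# Minimal scripted alternative to the GUI steps above (Blender 3.6, Scripting workspace).
# File paths are placeholders; adjust them to your own model.
import bpy

# Import the raw scan exported from MagiScan 3D.
bpy.ops.import_scene.gltf(filepath="/path/to/pikachu.glb")

# Clean up the mesh manually in Edit Mode (delete stray vertices), then export.

# Export to FBX with the texture copied and embedded, matching the GUI settings.
bpy.ops.export_scene.fbx(
    filepath="/path/to/pikachu.fbx",
    path_mode="COPY",        # Path Mode > Copy
    embed_textures=True,     # the toggle next to Copy
)
```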

Part 2 - Unity Perception for Synthetic Data Generation

  • Download Unity Hub and Unity 2022.3.21f1 (Apple Silicon).

  • Start a new High Definition 3D project.

  • Window > Package Manager > Add package from git URL > Enter com.unity.perception.
  • Window > Package Manager > Perception > Samples > Tutorial Files > Import.

  • Project tab > Assets > Create a new folder called Scene.
  • Inside the Scene folder > Create > Scene, and call it TutorialScene, then double click on it.

  • In the Hierarchy panel, double click the Main Camera.
  • In the Inspector panel of the Main Camera, modify the values according to the image.

  • Still in the Inspector panel of the Main Camera, click Add Component and add Perception Camera.
  • Edit > Project Settings > Editor > disable Asynchronous Shader Compilation.

  • Project tab > Look for "HDRP High Fidelity" in the search tab > Lit Shader Mode > Both.

  • Main Camera > Inspector > Perception Camera (Script) > Camera Labelers > +, and add BoundingBox2DLabeler first, then SemanticSegmentationLabeler.

  • Project > Assets folder > Create > Perception > ID Label Config, and rename it TutorialIdLabelConfig.

  • Project > Assets folder > Create > Perception > Semantic Segmentation Label Config, and rename it TutorialSemanticSegmentationLabelConfig.

  • Main Camera > Perception Camera (Script) > Drag and drop the newly created files to the corresponding Camera Labelers Label Config (see image).

  • Project > Scene > Drag and drop the Pikachu model (.fbx), the model texture (.png), and the background image (.png).
  • Project > Scene > Create > Material > Drag and drop the model texture (.png) to the new material's Surface Inputs > Base Map.

  • Project > Scene > Drag and drop the Pikachu model into the Hierarchy. For the moment, the Pikachu will appear without color or texture.
  • Drag and drop the material ball onto the white Pikachu in the Scene view. The Pikachu should now appear colored.
  • Hierarchy > Pikachu object > Inspector > Add Component > Labeling > Use Automatic Labeling > Labeling Scheme > Use asset name > Add to Label Config... > Select both TutorialIdLabelConfig and TutorialSemanticSegmentationLabelConfig (Add Label for both).

  • Hierarchy > Right click > 3D Object > Cube > Drag the background image and drop it on the Cube object (which should now have the texture of the background image).

  • Hierarchy > Cube > Inspector > Adjust the values of Transform according to the image.

  • Before proceeding, it might be necessary to adjust the Directional Light to better lighting values.

  • Hierarchy > Pikachu object > Inspector > Add Component > Fixed Length Scenario > Add Randomizer > RotationRandomizer > Set the values shown in the image.

  • Lastly, still in the Pikachu object's Inspector > Add Component > Rotation Randomizer Tag (already visible in the image above).
  • Now, pressing the Play button starts the data generation.

  • To find where the images are saved: Edit > Project Settings > Perception > Solo Endpoint > Base Path is the folder where the outputs are collected (click Show Folder to open it).

Part 3 - Train YOLO Model and use it in Real Time

  • It is convenient to repeat the synthetic data generation process with the object in multiple positions in the frame. In this case, repeat the generation with 4 different position-size combinations.

  • Each of the 4 data generations now produces a folder of sequences. Sequences are generated rather than single frames because the first shot is blurry: the Perception package captures screenshots faster than the object can settle into its new pose. The structure of the Unity outputs is as follows.
data
 |
 └── pika1
 |    |
 |    └── annotation_definitions.json
 |    └── metadata.json
 |    └── metric_definition.json
 |    └── sensor_definitions.json
 |    └── sequence.0
 |    └── sequence.1
 |    └── sequence.2
 |    |    |
 |    |    └── step0.camera.png
 |    |    └── step0.camera.semantic.segmentation.png
 |    |    └── step0.frame_data.json
 |    |    └── ..
 |    |    └── step4.camera.png
 |    |    └── step4.camera.semantic.segmentation.png
 |    |    └── step4.frame_data.json
 |    |
 |    └── ..
 |    └── sequence.110
 |
 └── pika2
 └── pika3
 └── pika4
  • From each sequence, extract the last frame (the 5th). Together with the frame, we collect the bounding box from the corresponding .json annotation and save it in the YOLOv8 format, namely <class_id> <x_center> <y_center> <width> <height>, with coordinates normalized to the image size. The script that does this is code/extract_frame_and_data.py; a simplified sketch is shown below.
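
A simplified sketch of that step follows. It assumes the SOLO folder layout shown above and the usual key names of the Perception 2D bounding-box annotation (origin and dimension in pixels); code/extract_frame_and_data.py in the repository is the authoritative version and may differ in details.

```python
# Sketch of the frame/label extraction, assuming the SOLO layout shown above.
# The JSON key names follow the Perception bounding-box annotation format and
# may need adjusting; code/extract_frame_and_data.py is the reference.
import json
import shutil
from pathlib import Path

DATA_DIR = Path("data/pika1")       # one generation folder (repeat for pika2..4)
OUT_IMAGES = Path("dataset/images")
OUT_LABELS = Path("dataset/labels")
PREFIX = "v1"                       # v1..v4, one prefix per generation
LAST_STEP = 4                       # the 5th (last) frame of each sequence
CLASS_ID = 0                        # single class: Pikachu

OUT_IMAGES.mkdir(parents=True, exist_ok=True)
OUT_LABELS.mkdir(parents=True, exist_ok=True)

sequences = sorted(DATA_DIR.glob("sequence.*"), key=lambda p: int(p.suffix[1:]))
for i, seq in enumerate(sequences, start=1):
    frame = seq / f"step{LAST_STEP}.camera.png"
    meta = json.loads((seq / f"step{LAST_STEP}.frame_data.json").read_text())

    capture = meta["captures"][0]
    img_w, img_h = capture["dimension"]           # image size in pixels
    boxes = capture["annotations"][0]["values"]   # 2D bounding boxes

    lines = []
    for box in boxes:
        x, y = box["origin"]                      # top-left corner (pixels)
        w, h = box["dimension"]                   # box width/height (pixels)
        # YOLOv8 format: class id, then normalized center and size.
        xc, yc = (x + w / 2) / img_w, (y + h / 2) / img_h
        lines.append(f"{CLASS_ID} {xc:.6f} {yc:.6f} {w / img_w:.6f} {h / img_h:.6f}")

    shutil.copy(frame, OUT_IMAGES / f"{PREFIX}__{i}.png")
    (OUT_LABELS / f"{PREFIX}__{i}.txt").write_text("\n".join(lines) + "\n")
```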
  • Navigate to the output of the script and create a new file, data.yaml, with the following content. This is needed during the training of the YOLOv8 model.
train: ../images
val: ../images

nc: 1
names: ['Pikachu']
  • The folder must have the following structure.
dataset
 |
 └── data.yaml
 |
 └── images
 |    |
 |    └── v1__1.png
 |    └── ..
 |    └── v1__100.png
 |    └── v2__1.png
 |    └── ..
 |    └── v2__100.png
 |    └── v3__1.png
 |    └── ..
 |    └── v3__100.png
 |    └── v4__1.png
 |    └── ..
 |    └── v4__100.png
 |
 └── labels
      |
      └── v1__1.txt
      └── ..
      └── v1__100.txt
      └── v2__1.txt
      └── ..
      └── v2__100.txt
      └── v3__1.txt
      └── ..
      └── v3__100.txt
      └── v4__1.txt
      └── ..
      └── v4__100.txt
  • Zip the folder and upload it to Colab.
  • Move to Colab > Use the notebook code/train_yolov8_model.ipynb to train a YOLOv8 model.
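
The core of that notebook reduces to a few lines with the ultralytics package. A minimal sketch, with placeholder hyperparameters (the actual notebook may use different settings):

```python
# Minimal YOLOv8 training sketch (run in Colab after unzipping the dataset).
# Epochs and image size are placeholders; the repository notebook is the reference.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")              # start from a pretrained nano model
model.train(data="dataset/data.yaml",   # the data.yaml created above
            epochs=50, imgsz=640)
# The best weights are written to runs/detect/train/weights/best.pt
```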
  • Save /content/runs/detect/train/weights/best.pt locally.
  • To run the model, connect a webcam and run code/run_realtime_pikachu_detection.py.
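
A minimal equivalent of that script, using OpenCV for the webcam feed and ultralytics for inference, could look like the sketch below; the actual code/run_realtime_pikachu_detection.py may differ in details such as camera index, window name, or confidence threshold.

```python
# Minimal real-time detection sketch; the repository script is the reference.
import cv2
from ultralytics import YOLO

model = YOLO("best.pt")          # the weights saved from the Colab training
cap = cv2.VideoCapture(0)        # default webcam (the index may differ)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    results = model(frame, verbose=False)   # run YOLOv8 inference on the frame
    annotated = results[0].plot()           # draw the detected bounding boxes
    cv2.imshow("Pikachu detection", annotated)
    if cv2.waitKey(1) & 0xFF == ord("q"):   # press 'q' to quit
        break

cap.release()
cv2.destroyAllWindows()
```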