python ./utils/trans_img2depth.py \
--input_file image.jsonl \
--output_folder <depth-map-folder/> \
--image_folder <your-image-folder/> \
--start_line 0 \
--end_line 999
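Every script in this pipeline accepts `--start_line` and `--end_line`, so a long JSONL file can be processed in slices (in parallel, or resumed after a crash). A minimal launcher sketch, assuming the line range is inclusive and your file has 10,000 records (both are assumptions; check against your data):

```python
# Hypothetical chunk launcher: run trans_img2depth.py over a long JSONL in 1000-line slices.
import subprocess

total_lines = 10_000  # assumption: number of records in image.jsonl
chunk = 1000

for start in range(0, total_lines, chunk):
    subprocess.run(
        [
            "python", "./utils/trans_img2depth.py",
            "--input_file", "image.jsonl",
            "--output_folder", "depth-map-folder/",
            "--image_folder", "your-image-folder/",
            "--start_line", str(start),
            "--end_line", str(start + chunk - 1),  # assumption: end_line is inclusive
        ],
        check=True,
    )
```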
Prepare your dataset (e.g., the image.jsonl used above) in JSONL format, one record per line:
{"image": "xxx.jpg"}
Then run:
python extract/extract_fr_img.py \
--test_task DenseCap \
--config_file ./extract/configs/GRiT_B_DenseCap_ObjectDet.yaml \
--confidence_threshold 0.55 \
--image_folder <your-image-path/> \
--input_file <path to your_image.jsonl> \
--output_file <obj_extr_from_img.jsonl> \
--start_line 0 \
--end_line 999 \
--visualize_output <visualize-output-path> \
--opts MODEL.WEIGHTS ./ckpt/grit_b_densecap_objectdet.pth
You will get <obj_extr_from_img.jsonl>:
{"image": "xxx.jpg", "extr_obj_from_img": ["obj1","obj2"], "bounding_boxes": [[206, 137, 426, 364], [418, 119, 639, 388]]}
Prepare a JSONL file that contains the image path and the corresponding description generated by MLLMs, as follows:
{"image": "xxx.jpg", "description": "xxxxxxxx."}
We utilize LLMs to extract objects from the descriptions; both a GPT-based and a LLaMA-based extraction script are provided:
- GPT version: remember to change the `query_ChatGPT` function to your own query method (see the sketch after the command below).
python extract/extract_fr_desc-gpt.py \
--input_file_path <path to your.jsonl> \
--output_file_path <obj_extr_from_desc.jsonl> \
--start_line 0 \
--end_line 999
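For reference, a minimal `query_ChatGPT` replacement using the official openai Python client might look like the following (the model name and credential setup are assumptions; adapt them to your own access):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def query_ChatGPT(prompt: str) -> str:
    # Send one user prompt and return the model's text reply.
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: any chat-completions model works here
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```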
- LLaMA version:
CUDA_VISIBLE_DEVICES=4,5,6,7 python ./extract/extract_fr_desc-llama.py \
--input_file description.jsonl \
--output_file <obj_extr_from_desc.jsonl> \
--stop_tokens "<|eot_id|>" \
--prompt_structure "<|begin_of_text|><|start_header_id|>user<|end_header_id|>{input}<|eot_id|><|start_header_id|>assistant<|end_header_id|>" \
--start_line 0 \
--end_line 999
Then you will get <obj_extr_from_desc.jsonl>:
{"image": "xxx.jpg", "extr_obj_fr_desc": ["obj1","obj2"], "description": "xxxxxxxx."}
Run:
python filter/filter_fr_desc.py \
--model_config ./filter/GroundingDINO/groundingdino/config/GroundingDINO_SwinB_cfg.py \
--model_checkpoint ./ckpt/groundingdino_swinb_ogc.pth \
--box_threshold 0.20 \
--text_threshold 0.18 \
--input_file <obj_extr_from_desc.jsonl> \
--output_file <hal_from_desc.jsonl> \
--image_folder <your-image-path/> \
--start_line 0 \
--end_line 999
Then you will get <hal_from_desc.jsonl>:
{"image": "xxx.jpg", "del_obj_from_desc": ["hal2"], "description": "xxxxxxxx."}
Run:
python fg_annotation/mask_depth.py \
--input_path <obj_extr_from_img.jsonl> \
--output_path <fg_anno.jsonl> \
--image_folder <your-image-folder/> \
--image_depth_folder <depth-map-folder/> \
--start_line 0 \
--end_line 999
Then you will get <fg_anno.jsonl>:
{"image": "xxx.jpg", "extr_obj_from_img": ["obj2"], "bounding_boxes": [[418, 119, 639, 388]], "object_depth": [83], "size": [12428], "width": 640, "height": 480}
First, concatenate <fg_anno.jsonl> and <hal_from_desc.jsonl> into <your.jsonl> as follows (a merge sketch is given after the example):
{"image": "xxx.jpg", "del_obj_from_desc": ["hal2"], "extr_obj_from_img": ["obj2"], "bounding_boxes": [[418, 119, 639, 388]], "object_depth": [83], "size": [12428], "width": 640, "height": 480, "description": "xxxxxxxx."}
Then run:
CUDA_VISIBLE_DEVICES=0,1,2,3 python ./refine/add_detail.py \
--input_file <your.jsonl> \
--output_file <refined_desc.jsonl> \
--stop_tokens "<|eot_id|>" \
--prompt_structure "<|begin_of_text|><|start_header_id|>user<|end_header_id|>{input}<|eot_id|><|start_header_id|>assistant<|end_header_id|>" \
--start_line 0 \
--end_line 999
You will get a more detailed description:
{"image": "xxx.jpg", "description": "xxxxxxxx."}