This repository contains the code and technical documentation for an investigation into using tuning methods based on Stable Diffusion to insert objects into human-object interaction videos.
To reproduce the results, or to use Tune-A-Video for other purposes, follow the detailed installation guidelines in `tuneavideo_installation.md`. The output of `conda list` for the working environment, listing all packages and their versions, is provided in `conda_list.md`.
Details on how to use the project are given below.
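Each video under `Tune-A-Video/data` is tuned with its own config file under `configs/`. As a rough sketch, a per-object config might look like the following; the field names follow the upstream Tune-A-Video config format, and all paths, prompts, and hyperparameter values here are placeholders rather than values taken from this repository:

```yaml
# Sketch of a Tune-A-Video style config (placeholder values, not from this repo).
pretrained_model_path: "checkpoints/stable-diffusion-v1-4"
output_dir: "outputs/object1"

train_data:
  video_path: "data/my_pairs/original/object1.mp4"
  prompt: "a person interacting with an object"   # description of the source video
  n_sample_frames: 24
  width: 512
  height: 512

validation_data:
  prompts:
    - "a person interacting with a different object"  # edited prompt for generation
  num_inference_steps: 50
  guidance_scale: 12.5

learning_rate: 3.0e-5
max_train_steps: 500
```

See the actual files under `configs/original/` and `configs/pretending` for the settings used in this work.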
The survey can be found at this link.
```
.
└── object-insertion-video-diffusion/
    ├── docker/
    ├── human_evaluation/
    │   ├── Images/
    │   │   └── ...
    │   ├── data_survey.csv
    │   └── analyzing_survey.ipynb
    ├── tools/
    ├── Tune-A-Video/
    │   ├── data (extract here)/
    │   │   └── my_pairs/
    │   │       ├── original/
    │   │       │   ├── object1.mp4
    │   │       │   ├── object2.mp4
    │   │       │   └── ...
    │   │       └── pretending/
    │   │           ├── object1.mp4
    │   │           └── object2.mp4
    │   ├── configs/
    │   │   ├── original/
    │   │   │   ├── object1.yml
    │   │   │   └── ...
    │   │   └── pretending/
    │   ├── scripts/
    │   └── infer_args.py
    ├── pod.yml
    ├── README.md
    ├── conda_list.md
    └── tuneavideo_installation.md
```
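The `human_evaluation/` folder holds the survey responses (`data_survey.csv`) and the notebook that analyzes them (`analyzing_survey.ipynb`). As a minimal sketch of that kind of analysis, the snippet below averages per-video ratings from a survey CSV; the column names `video` and `rating` are assumptions for illustration, not the actual headers of `data_survey.csv`:

```python
import csv
import io
from collections import Counter

# Inline sample standing in for human_evaluation/data_survey.csv.
# The "video" and "rating" column names are hypothetical.
SAMPLE = """video,rating
object1,4
object1,5
object2,3
object2,4
"""

def mean_rating_per_video(csv_text):
    """Average the per-video ratings from a survey CSV."""
    totals, counts = Counter(), Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["video"]] += int(row["rating"])
        counts[row["video"]] += 1
    return {v: totals[v] / counts[v] for v in totals}

print(mean_rating_per_video(SAMPLE))  # {'object1': 4.5, 'object2': 3.5}
```

For the real analysis, including any plots, refer to `analyzing_survey.ipynb`.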
Guidelines for contributing to the project.
If you use the provided dataset or any other part of this work, please cite:
```bibtex
@inproceedings{objectinsert2024,
  title={Object Insertion into a Video Using Diffusion Model},
  author={Israelov, Shani},
  year={2024}
}
```
This work builds on Tune-A-Video; please also cite:
```bibtex
@inproceedings{wu2023tune,
  title={Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation},
  author={Wu, Jay Zhangjie and Ge, Yixiao and Wang, Xintao and Lei, Stan Weixian and Gu, Yuchao and Shi, Yufei and Hsu, Wynne and Shan, Ying and Qie, Xiaohu and Shou, Mike Zheng},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={7623--7633},
  year={2023}
}
```