
AudioLDM2 model reproduction: forward inference #366

Merged: 8 commits merged into PaddlePaddle:develop on Jan 19, 2024


NKNaN (Contributor) commented Dec 29, 2023

Task: #250

  • text-to-audio inference now runs end to end
  • text-to-speech still needs its parameters converted and its outputs debugged


paddle-bot bot commented Dec 29, 2023

Thanks for your contribution!

luyao-cv (Collaborator) commented Jan 2, 2024

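There are quite a lot of files, and some code files duplicate what the suite already provides; those can be imported directly instead, e.g. the gpt2, latent_encoder and unet folders.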

Collaborator (review comment):

Config files like this don't need to be uploaded. Load them uniformly through a from_pretrained()-style interface instead.
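A minimal sketch of the from_pretrained() pattern the reviewer describes, assuming the config is stored as a JSON file next to the converted weights. The file name `config.json`, the class `AudioEncoderConfig` and its fields are hypothetical placeholders, not PaddleMIX classes.

```python
import json
import os
from dataclasses import dataclass, fields


@dataclass
class AudioEncoderConfig:
    """Hypothetical config class; field names and defaults are illustrative only."""

    hidden_size: int = 768
    num_layers: int = 12
    sample_rate: int = 16000

    @classmethod
    def from_pretrained(cls, model_dir, config_name="config.json"):
        # Read the JSON file shipped alongside the weights instead of a
        # config module committed to the repository.
        with open(os.path.join(model_dir, config_name), encoding="utf-8") as f:
            data = json.load(f)
        # Keep only keys this class declares; unknown entries are ignored.
        known = {field.name for field in fields(cls)}
        return cls(**{k: v for k, v in data.items() if k in known})
```

Usage would look like `AudioEncoderConfig.from_pretrained("path/to/model_dir")`; the repository then carries no per-model config files, only the class defaults.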

NKNaN (Contributor Author) replied Jan 12, 2024:

> Config files like this don't need to be uploaded. Load them uniformly through a from_pretrained()-style interface instead.

The HTSAT config is now hard-coded in the source. The original authors only released parameters for HTSAT-base, so the other config files have been deleted.

LokeZhou (Collaborator) commented Jan 2, 2024

Please provide forward-inference alignment results, either as result files or as screenshots showing that the output tensors match.
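One way to produce such an alignment summary, sketched under the assumption that the PyTorch reference and the Paddle port each dump the same tensor (final or intermediate) to NumPy arrays of identical shape. `alignment_report` and its default tolerances are hypothetical choices, not a PaddleMIX utility.

```python
import numpy as np


def alignment_report(reference, candidate, rtol=1e-5, atol=1e-5):
    """Summarize how closely `candidate` (Paddle) matches `reference` (PyTorch)."""
    ref = np.asarray(reference, dtype=np.float64)
    out = np.asarray(candidate, dtype=np.float64)
    if ref.shape != out.shape:
        raise ValueError(f"shape mismatch: {ref.shape} vs {out.shape}")
    abs_diff = np.abs(ref - out)
    # Guard against division by zero where the reference is exactly 0.
    rel_diff = abs_diff / np.maximum(np.abs(ref), 1e-12)
    return {
        "max_abs_diff": float(abs_diff.max()),
        "mean_abs_diff": float(abs_diff.mean()),
        "max_rel_diff": float(rel_diff.max()),
        # np.allclose(a, b) scales rtol by |b|, so the reference goes second.
        "allclose": bool(np.allclose(out, ref, rtol=rtol, atol=atol)),
    }
```

For example, `alignment_report(np.load("torch_out.npy"), np.load("paddle_out.npy"))`, where both file names are placeholders, prints a dict that can be pasted into the PR in place of a screenshot.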

NKNaN (Contributor Author) commented Jan 3, 2024

> Please provide forward-inference alignment results, either as result files or as screenshots showing that the output tensors match.

inference sample results.zip
These are the outputs generated with different seeds for the text prompt "Musical constellations twinkling in the night sky, forming a cosmic melody. "

NKNaN (Contributor Author) commented Jan 12, 2024

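> There are quite a lot of files, and some code files duplicate what the suite already provides; those can be imported directly instead, e.g. the gpt2, latent_encoder and unet folders.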

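That's because these models in the original work differ somewhat from the suite's existing ones in both structure and inference procedure, e.g. the roberta-base and gpt2 used here:
differences.xlsx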

I'll simplify it as much as I can.

NKNaN (Contributor Author) commented Jan 12, 2024

Converted parameter file (model_state.pdparams) and config files: https://aistudio.baidu.com/datasetdetail/252967
model_state.pdparams omits the parameters of the original work's litema module.
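A sketch of such a conversion step. Two assumptions to check against the real checkpoint: the LitEma entries share the key prefix `model_ema.`, and the caller supplies the set of `nn.Linear` weight keys. The transpose is needed because a PyTorch `nn.Linear` stores its weight as `[out_features, in_features]`, whereas a Paddle `nn.Linear` stores `[in_features, out_features]`. Tensors are assumed to have already been turned into NumPy arrays (e.g. with `.numpy()`).

```python
import numpy as np


def convert_state_dict(torch_state, linear_weight_keys, drop_prefixes=("model_ema.",)):
    """Drop EMA entries and transpose Linear weights for a Paddle state dict.

    torch_state: mapping of parameter name -> NumPy array.
    linear_weight_keys: names of nn.Linear weights that need transposing.
    drop_prefixes: key prefixes to discard (the LitEma prefix is an assumption).
    """
    converted = {}
    for name, value in torch_state.items():
        if name.startswith(tuple(drop_prefixes)):
            continue  # skip EMA shadow weights and their bookkeeping buffers
        array = np.asarray(value)
        if name in linear_weight_keys:
            array = array.T  # [out, in] (PyTorch) -> [in, out] (Paddle)
        converted[name] = array
    return converted
```

Keeping the EMA filter as a parameter makes it easy to include those weights again if EMA inference is ever needed.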

luotao1 added the HappyOpenSource 快乐开源活动issue与PR (Happy Open Source event issues and PRs) label Jan 15, 2024
LokeZhou (Collaborator) commented

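> There are quite a lot of files, and some code files duplicate what the suite already provides; those can be imported directly instead, e.g. the gpt2, latent_encoder and unet folders.
>
> That's because these models in the original work differ somewhat from the suite's existing ones in both structure and inference procedure, e.g. the roberta-base and gpt2 used here: differences.xlsx
>
> I'll simplify it as much as I can.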

If the model structure is the same and only the inference procedure differs, you can override just forward; see https://github.com/PaddlePaddle/PaddleMIX/blob/develop/paddlemix/models/qwen_vl/modeling.py#L101
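A framework-free sketch of the "reuse the structure, override only forward" idea. `SuiteBlock` is a hypothetical stand-in for an existing suite layer, and the residual add is an invented example of an inference difference. In Paddle the base would be a `paddle.nn.Layer` subclass, whose `__call__` likewise dispatches to `forward`.

```python
import numpy as np


class SuiteBlock:
    """Hypothetical stand-in for a layer that already exists in the suite."""

    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        # Parameter names follow the suite, so converted weights load unchanged.
        self.weight = rng.standard_normal((dim, dim))

    def __call__(self, *args, **kwargs):
        # Calling the object dispatches to forward(), as in paddle.nn.Layer.
        return self.forward(*args, **kwargs)

    def forward(self, x):
        return x @ self.weight


class AudioLDM2Block(SuiteBlock):
    """Reuses SuiteBlock's structure and parameters; only inference differs."""

    def forward(self, x):
        # Invented difference for illustration: an extra residual connection.
        return super().forward(x) + x
```

Because `__init__` is inherited untouched, the subclass exposes exactly the same parameter names, so the suite's weight-loading code needs no changes.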

luyao-cv (Collaborator) commented Jan 15, 2024

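> There are quite a lot of files, and some code files duplicate what the suite already provides; those can be imported directly instead, e.g. the gpt2, latent_encoder and unet folders.
>
> That's because these models in the original work differ somewhat from the suite's existing ones in both structure and inference procedure, e.g. the roberta-base and gpt2 used here: differences.xlsx
>
> I'll simplify it as much as I can.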

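The names in the network definitions don't fully match the suite's. Please align them with the suite's existing models, and wherever a structure differs from an existing model, override its forward function.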
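Aligning names usually comes down to a key-renaming pass over the converted state dict, so that the weights load into the suite's existing modules. A minimal sketch; `rename_keys` is a hypothetical helper, and any prefix mapping passed to it would have to be read off the two models' actual parameter names.

```python
def rename_keys(state_dict, prefix_map):
    """Return a copy of state_dict with key prefixes rewritten via prefix_map.

    The first matching prefix wins; keys with no matching prefix are kept as-is.
    Raises KeyError if two keys end up with the same name.
    """
    renamed = {}
    for name, value in state_dict.items():
        new_name = name
        for old_prefix, new_prefix in prefix_map.items():
            if name.startswith(old_prefix):
                new_name = new_prefix + name[len(old_prefix):]
                break
        if new_name in renamed:
            raise KeyError(f"duplicate parameter name after renaming: {new_name}")
        renamed[new_name] = value
    return renamed
```

For example, `rename_keys(state, {"transformer.h.": "decoder.layers."})` uses made-up prefixes purely to show the call shape. The duplicate check catches mappings that would silently overwrite a weight.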

NKNaN (Contributor Author) commented Jan 15, 2024

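> If the model structure is the same and only the inference procedure differs, you can override just forward; see https://github.com/PaddlePaddle/PaddleMIX/blob/develop/paddlemix/models/qwen_vl/modeling.py#L101
>
> The names in the network definitions don't fully match the suite's. Please align them with the suite's existing models, and wherever a structure differs from an existing model, override its forward function.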

OK, I'll revise it again.

LokeZhou (Collaborator) commented

LGTM. Please update to the latest PaddleMIX, and we'll merge once CI passes.

NKNaN (Contributor Author) commented Jan 18, 2024

> LGTM. Please update to the latest PaddleMIX, and we'll merge once CI passes.

OK. The autoencoder and unet need another round of changes too; I should be able to finish them today.

NKNaN (Contributor Author) commented Jan 18, 2024

Updated parameters and config: https://aistudio.baidu.com/datasetdetail/257191

```bash
python run_predict.py \
--text "Musical constellations twinkling in the night sky, forming a cosmic melody." \
--model_name_or_path "/home/aistudio/data/data252967" \
```
Collaborator (review comment):

Don't use a specific local path here.

Contributor Author (reply):

Fixed.
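A sketch of how run_predict.py could take the model location from the command line instead of a machine-specific path. Only `--text` and `--model_name_or_path` come from the command above; the parser layout and help texts are assumptions, not the script's actual code.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="AudioLDM2 text-to-audio inference")
    parser.add_argument("--text", required=True, help="Text prompt to generate audio from.")
    parser.add_argument(
        "--model_name_or_path",
        required=True,
        # No machine-specific default: users pass a model name or their own
        # local directory holding model_state.pdparams and the config.
        help="Pretrained model name or local directory with the converted weights.",
    )
    return parser
```

With this, the README can document `python run_predict.py --text "..." --model_name_or_path <model name or local directory>` without any user-specific path.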

# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
Collaborator (review comment):

This __init__.py under examples can be removed.

Contributor Author (reply):

Deleted.

# = sum_i x[i] sinc(pi * orig_freq * ((i - orig_freq) / orig_freq - j / new_freq))
# = sum_i x[i + orig_freq] sinc(pi * orig_freq * (i / orig_freq - j / new_freq))
# so y[j+new_freq] uses the same filter as y[j], but on a shifted version of x by `orig_freq`.
# This will explain the F.conv1d after, with a stride of orig_freq.
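The comments above describe a polyphase decomposition. Output sample `j + new_freq` uses the same filter as output `j`, applied to `x` shifted by `orig_freq`, so the whole resampler reduces to `new_freq` filters slid over the input with a stride of `orig_freq`. Below is a NumPy sketch of that idea, checked against a direct evaluation. It uses a plain truncated sinc low-pass with the cutoff at the lower of the two Nyquist rates. Unlike `torchaudio.functional.resample`, it applies no window or rolloff, and `width` (the filter half-width in input samples) is an illustrative parameter.

```python
import math

import numpy as np


def _lowpass(tau, cutoff, width):
    # Truncated ideal low-pass: cutoff * sinc(cutoff * tau), zero for |tau| > width.
    # np.sinc is the normalized sinc, sin(pi x) / (pi x).
    return cutoff * np.sinc(cutoff * tau) * (np.abs(tau) <= width)


def resample_direct(x, orig_freq, new_freq, width=8):
    """Reference: evaluate every output sample against every input sample."""
    cutoff = min(1.0, new_freq / orig_freq)
    n_out = math.ceil(len(x) * new_freq / orig_freq)
    i = np.arange(len(x))
    y = np.empty(n_out)
    for j in range(n_out):
        # Output sample j sits at input-sample time j * orig_freq / new_freq.
        y[j] = np.sum(x * _lowpass(i - j * orig_freq / new_freq, cutoff, width))
    return y


def resample_polyphase(x, orig_freq, new_freq, width=8):
    """Same result, using one filter per output phase and a stride of orig_freq."""
    g = math.gcd(orig_freq, new_freq)
    orig, new = orig_freq // g, new_freq // g
    cutoff = min(1.0, new / orig)
    n_out = math.ceil(len(x) * new / orig)
    n_blocks = math.ceil(n_out / new)

    # Tap offsets relative to block start m * orig; they cover every input
    # within `width` of any of the `new` output times in a block.
    taps = np.arange(-width, width + orig + 1)
    phase_times = np.arange(new)[:, None] * orig / new
    filters = _lowpass(taps[None, :] - phase_times, cutoff, width)  # (new, len(taps))

    # Zero-pad so every window is in range, then take windows with stride `orig`.
    padded = np.pad(x, (width, width + orig * n_blocks))
    frames = np.stack([padded[m * orig : m * orig + len(taps)] for m in range(n_blocks)])

    # blocks[m, j] == y[m * new + j]: the same `new` filters serve every block.
    blocks = frames @ filters.T
    return blocks.reshape(-1)[:n_out]
```

The direct version costs one full pass over `x` per output sample. The polyphase version precomputes only `new` short filters, which is why the comments justify a strided convolution.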
Collaborator (review comment):

Some of these comments can be removed as you see fit.

Contributor Author (reply):

Deleted.


# self.time_pool = max(self.cond_stage_config["crossattn_audiomae_pooled"]["params"]["time_pooling_factors"])
# self.freq_pool = max(self.cond_stage_config["crossattn_audiomae_pooled"]["params"]["freq_pooling_factors"])
# self.mae_token_num = int(512/(self.time_pool*self.freq_pool))
Collaborator (review comment):

These can be removed.

Contributor Author (reply):

Deleted.

cond_dict = self.get_input(batch)

# self.model.train()
# print("!!!!!!!!!!!!!train")
Collaborator (review comment):

Remove these.

Contributor Author (reply):

Deleted.

@@ -0,0 +1,5 @@
librosa
ppdiffusers
Collaborator (review comment):

The ppdiffusers package doesn't need to be listed here; in README.md, just point users to the installation instructions at https://github.com/PaddlePaddle/PaddleMIX/blob/develop/README.md?plain=1#L62

Contributor Author (reply):

Fixed.

LokeZhou (Collaborator) commented

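I've left a few more comments for minor fixes. Also, CI still isn't passing: is this PR based on the latest PaddleMIX? Compare closely against this unit-test script: https://github.com/PaddlePaddle/PaddleMIX/blob/develop/tests/models/test_minigpt4.py#L561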

NKNaN (Contributor Author) commented Jan 18, 2024

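> I've left a few more comments for minor fixes. Also, CI still isn't passing: is this PR based on the latest PaddleMIX? Compare closely against this unit-test script: https://github.com/PaddlePaddle/PaddleMIX/blob/develop/tests/models/test_minigpt4.py#L561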

Thanks for the review. I just ran git pull --rebase, so my local branch should now be up to date:
https://github.com/NKNaN/PaddleMIX/blob/ayase-develop/tests/models/test_minigpt4.py#L561

The CI failure is in test_save_load of tests.models.test_minigpt4.MiniGPT4VisionModelTest, which calls the method inherited from the parent class ModelTesterMixin. Should test_save_load be overridden in the MiniGPT4VisionModelTest class with a body of just pass?
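A sketch of the override being discussed, using a hypothetical stand-in for `ModelTesterMixin` that simulates the CI failure. A bare `pass` override makes the test report as passed. Decorating the override with `unittest.skip` records it as skipped with a reason, which keeps the gap visible in the CI log. Whether the PaddleMIX test suite prefers this over `pass` is an assumption.

```python
import unittest


class ModelTesterMixin:
    """Hypothetical stand-in for the shared tester mixin."""

    def test_forward(self):
        self.assertEqual(1 + 1, 2)  # placeholder for a shared check that still runs

    def test_save_load(self):
        raise AssertionError("save/load round-trip failed")  # simulates the CI failure


class MiniGPT4VisionModelTest(ModelTesterMixin, unittest.TestCase):
    # Overriding the inherited test; the skip reason is illustrative.
    @unittest.skip("save/load not supported for this model yet; tracked separately")
    def test_save_load(self):
        pass
```

All other inherited tests keep running; only the overridden one is reported as skipped.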

LokeZhou (Collaborator) commented

> MiniGPT4VisionModelTest

We'll look into that across the board later; the current CI failure doesn't block merging.

LokeZhou merged commit f049e2d into PaddlePaddle:develop Jan 19, 2024
1 of 3 checks passed
Labels: contributor, HappyOpenSource 快乐开源活动issue与PR (Happy Open Source event issues and PRs)
4 participants