CogVideo & CogVideoX Fine-Tuning Source Code Walkthrough (Part 13)
Video Caption
Most video data does not come with corresponding descriptive text, so the videos must first be converted into
textual descriptions to provide the essential training data for text-to-video models.