sekotalk v2.7: support dynamic target_video_length #973

Open
wangshankun wants to merge 2 commits into main from dev/seko_custom_target_len

@wangshankun
Collaborator

No description provided.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces support for dynamic video lengths in the Wan audio runner and RS2V inference pipeline by adding a target_video_length parameter to the CLI and SekoTalkInputs dataclass. The changes allow the latent shape and previous frame buffers to be sized based on the input info rather than a fixed configuration. However, the review feedback identifies critical issues where checking for the existence of the attribute is insufficient because the default value is UNSET, which will cause runtime crashes during mathematical operations and tensor allocation. Additionally, a typo and logic error were found in get_latent_shape_with_lat_hw where the latent height was incorrectly assigned to a misspelled frame count attribute.

latent_w = patched_w * self.config["patch_size"][2]

latent_shape = self.get_latent_shape_with_lat_hw(latent_h, latent_w)
if hasattr(self.input_info, "target_video_length"):

high

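Checking hasattr(self.input_info, "target_video_length") alone does not guarantee that the field holds a valid value. In SekoTalkInputs, this field defaults to UNSET. If the field exists but its value is UNSET, the subsequent call to get_latent_shape_with_lat_hw will raise an exception on arithmetic such as target_video_length - 1. Consider also checking whether the value is UNSET, or make sure normalize_unset_to_none() has been called before this method runs.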
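One way the suggested check could look is sketched below. This is a minimal illustration, not the project's code: `_Unset`, `InputsSketch`, and `resolve_target_video_length` are hypothetical names, the real UNSET sentinel may be defined differently, and the config value 81 is only an example. The helper also treats None as "not set", which would cover the case where normalize_unset_to_none() has already run.

```python
from dataclasses import dataclass
from typing import Any


class _Unset:
    """Hypothetical stand-in for the project's UNSET sentinel."""

    def __repr__(self) -> str:
        return "UNSET"


UNSET = _Unset()


@dataclass
class InputsSketch:
    # Hypothetical stand-in for SekoTalkInputs; only the relevant field is modeled.
    target_video_length: Any = UNSET


def resolve_target_video_length(input_info: Any, config: dict) -> int:
    """Return the per-request length if it was really set, else the config default."""
    value = getattr(input_info, "target_video_length", UNSET)
    # Treat a missing attribute, UNSET, and None all as "not provided".
    if value is UNSET or value is None:
        return config["target_video_length"]
    return value


config = {"target_video_length": 81}  # example value, assumed for illustration
default_len = resolve_target_video_length(InputsSketch(), config)
custom_len = resolve_target_video_length(InputsSketch(target_video_length=33), config)
```

With this shape, callers always receive an integer, so later arithmetic and tensor allocation never see the sentinel.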

Comment on lines +513 to +516
if hasattr(self.input_info, "target_video_length"):
    target_video_length = self.input_info.target_video_length
else:
    target_video_length = self.config["target_video_length"]

high

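The same UNSET problem applies here. If self.input_info.target_video_length is UNSET, hasattr still returns True, so target_video_length is assigned the UNSET object, which then causes a crash at line 517 when torch.zeros is called. Consider adding a check for UNSET.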
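The failure mode described here can be reproduced in plain Python, without torch. The sketch below assumes the sentinel is an ordinary object with no arithmetic support; `_Unset`, `InputsSketch`, `naive_lookup`, and `arithmetic_fails` are hypothetical names, and the config value 81 is only an example.

```python
from dataclasses import dataclass
from typing import Any


class _Unset:
    """Hypothetical stand-in for the project's UNSET sentinel."""

    def __repr__(self) -> str:
        return "UNSET"


UNSET = _Unset()


@dataclass
class InputsSketch:
    # Hypothetical stand-in for SekoTalkInputs; the field defaults to the sentinel.
    target_video_length: Any = UNSET


def naive_lookup(input_info: Any, config: dict) -> Any:
    # Mirrors the hasattr-only pattern under review.
    if hasattr(input_info, "target_video_length"):
        return input_info.target_video_length
    return config["target_video_length"]


def arithmetic_fails(value: Any) -> bool:
    """Report whether `value - 1` raises TypeError."""
    try:
        value - 1
    except TypeError:
        return True
    return False


# The attribute exists on every instance, so hasattr is True even when unset.
picked = naive_lookup(InputsSketch(), {"target_video_length": 81})
```

Here `picked` is the sentinel rather than the config default, and any size computation that uses it fails with a TypeError.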

def get_latent_shape_with_lat_hw(self, latent_h, latent_w):
def get_latent_shape_with_lat_hw(self, latent_h, latent_w, target_video_length=None):
    target_video_length = target_video_length if target_video_length is not None else self.config["target_video_length"]
    self.input_info.latent_freams = latent_h

high

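This line has two clear problems:

  1. Typo: latent_freams should be latent_frames.
  2. Logic error: assigning latent_h (the latent-space height) to a frame-count field is incorrect. In addition, SekoTalkInputs does not define this field, so the assignment dynamically adds an attribute and gives it the wrong value. If the field is not needed, delete the line; if the latent frame count does need to be recorded, use the computed value: (target_video_length - 1) // self.config["vae_stride"][0] + 1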
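The frame-count formula from the comment can be checked with a few numbers. The sketch below is a standalone illustration: `latent_frame_count` is a hypothetical helper, and the temporal stride of 4 is an assumed example value, not one taken from this PR's config.

```python
def latent_frame_count(target_video_length: int, temporal_stride: int) -> int:
    """Latent frames for a video: (target_video_length - 1) // stride + 1."""
    if target_video_length < 1 or temporal_stride < 1:
        raise ValueError("length and stride must be positive integers")
    return (target_video_length - 1) // temporal_stride + 1


# Assumed example: temporal stride of 4 (stands in for config["vae_stride"][0]).
frames_for_81 = latent_frame_count(81, 4)  # (81 - 1) // 4 + 1 = 21
frames_for_33 = latent_frame_count(33, 4)  # (33 - 1) // 4 + 1 = 9
frames_for_1 = latent_frame_count(1, 4)    # a single frame maps to 1 latent frame
```

The result depends only on the video length and the temporal stride, which is why storing latent_h in a frame-count field cannot be correct.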
