RL Training Non-convergence on OmniGen2 with EditScore-7B #18

@xiaoyaopengfei

Hi authors,

Thanks for your impressive work on EditScore! I am currently reproducing the RL training using omnigen2_edit_rl_single_machine_editscore7b. However, training is unstable: reward_mean fluctuates significantly and shows no clear upward trend.

Image

After analyzing the training logs and specific tasks, I have two major observations and would love to hear your insights:

  1. Reward Sparsity in motion_change Tasks
    In many motion_change samples, the SC_score is 0.
    Even when the model attempts an edit, EditScore often gives a zero score for Semantic Conformity, which makes the rewards sparse.
    Do you think this is caused by EditScore being too strict on complex motion, or is it a limitation of the base model's initial exploration? How did you handle these zero-reward samples during your training to avoid gradient instability?
Image
  2. Reward Inconsistency in background_change Tasks
    I noticed cases where two output images preserve the background almost equally well, yet their rewards differ significantly.
    This high reward variance for visually similar outputs seems to introduce a lot of noise into the policy gradient.
    Is this inconsistency a known behavior of the 7B reward model? If so, are there normalization techniques you found effective?
Image
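
To make the two questions above concrete, here is a minimal sketch of the kind of handling I have in mind: per-prompt (group-wise) reward normalization that assigns zero advantage to groups where every sample received the same reward, such as an all-zero SC_score group. This is only an illustration under my own assumptions. The function name, the `eps` threshold, and the grouping of rewards by prompt are hypothetical and are not taken from the EditScore or OmniGen2 code.

```python
from statistics import mean, pstdev


def group_normalized_advantages(reward_groups, eps=1e-6):
    """Normalize rewards within each group of samples from the same prompt.

    reward_groups: list of lists; each inner list holds the scalar rewards
    of all samples generated for one prompt (hypothetical layout).
    Returns a list of lists of advantages with the same shape.
    """
    advantages = []
    for group in reward_groups:
        mu = mean(group)
        sigma = pstdev(group)  # population std within the group
        if sigma < eps:
            # Every sample got (almost) the same reward, e.g. all zeros.
            # There is no ranking signal, so contribute zero advantage
            # instead of dividing by a near-zero std.
            advantages.append([0.0] * len(group))
        else:
            advantages.append([(r - mu) / sigma for r in group])
    return advantages


if __name__ == "__main__":
    groups = [
        [0.0, 0.0, 0.0, 0.0],  # sparse case: all rewards are zero
        [1.0, 3.0],            # informative case
    ]
    print(group_normalized_advantages(groups))
```

With this kind of masking, all-zero groups would not produce exploding normalized advantages, but they also would not teach the policy anything, so it does not address the sparsity itself.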

Environment & Hyperparameters:

Base Model: OmniGen2

Reward Model: EditScore-7B

Tasks: rl_abs_9tasks.jsonl

Training setup: Single machine, default parameters from the repo.

I've attached my training curve and some example cases for reference. Looking forward to your guidance!

Best regards,
Spike
