Hi authors,
Thanks for your impressive work on EditScore! I am currently reproducing the RL training using omnigen2_edit_rl_single_machine_editscore7b. However, I've encountered some stability issues where the reward_mean fluctuates significantly without a clear upward trend.
After analyzing the training logs and specific tasks, I have two major observations and would love to hear your insights:
- Reward Sparsity in motion_change Tasks
I observed that in many motion_change samples, the SC_score is frequently 0. Even when the model attempts an edit, EditScore often assigns a zero Semantic Conformity score, leading to sparse rewards.
Do you think this is caused by EditScore being too strict on complex motion, or is it a limitation of the base model's initial exploration? How did you handle these zero-reward samples during your training to avoid gradient instability?
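For context on what I mean by instability from zero-reward samples: under a group-relative scheme, a prompt whose sampled edits all score 0 has zero reward variance, so dividing by the (tiny) group std blows up the advantages. A minimal sketch of one workaround I've been considering — dropping such degenerate groups before computing advantages. The function name and the `(num_prompts, group_size)` reward layout are my own assumptions, not the repo's actual code:

```python
import numpy as np

def filter_degenerate_groups(rewards, eps=1e-6):
    """Drop prompt groups whose rewards are (near-)constant, e.g. all zero.

    rewards: array of shape (num_prompts, group_size) -- one row per prompt,
    one column per sampled edit. Hypothetical layout, assumed for illustration.
    Near-zero-variance groups carry no learning signal under group-relative
    advantages, and normalizing them by a tiny std destabilizes the gradient.
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    keep = rewards.std(axis=1) > eps  # boolean mask over prompt groups
    return rewards[keep], keep

rewards = np.array([
    [0.0, 0.0, 0.0, 0.0],  # all-zero motion_change group -> dropped
    [0.2, 0.8, 0.5, 0.1],  # informative group -> kept
])
kept, mask = filter_degenerate_groups(rewards)
print(mask)  # [False  True]
```

Is something along these lines what you did, or did you keep the zero-reward samples and handle them differently?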
- Reward Inconsistency in background_change Tasks
I noticed cases where two images have very similar background fidelity/similarity, yet their rewards differ significantly.
This high variance in rewards for similar visual outputs seems to introduce a lot of noise into the policy gradient.
Is this inconsistency a known behavior of the 7B reward model? Or are there other normalization techniques you found effective?
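To make the normalization question concrete, here is a sketch of the kind of group-relative normalization I have in mind (GRPO-style: center each prompt's rewards on the group mean, scale by the group std). This is my own illustration, not the repo's implementation; the function name and reward layout are assumed:

```python
import numpy as np

def normalize_rewards(rewards, eps=1e-8):
    """Group-relative reward normalization.

    rewards: assumed shape (num_prompts, group_size). Centering on the
    per-prompt mean removes per-prompt offsets (e.g. a strict vs. lenient
    scoring of visually similar backgrounds), and scaling by the per-prompt
    std keeps advantages on a comparable scale across tasks.
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    mean = rewards.mean(axis=1, keepdims=True)
    std = rewards.std(axis=1, keepdims=True)
    return (rewards - mean) / (std + eps)

r = np.array([[0.9, 0.3, 0.6, 0.6]])  # one prompt, four sampled edits
adv = normalize_rewards(r)            # zero-mean, unit-scale advantages
```

This only removes relative noise within a group, though; if EditScore's variance on near-identical outputs is large compared to the true quality gap, the normalized advantages are still noisy. Did you find per-group normalization sufficient, or did you also use something like reward clipping or multi-sample reward averaging?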
Environment & Hyperparameters:
Base Model: OmniGen2
Reward Model: EditScore-7B
Tasks: rl_abs_9tasks.jsonl
Training setup: Single machine, default parameters from the repo.
I've attached my training curve and some example cases for reference. Looking forward to your guidance!
Best regards,
Spike