Qwen Storyteller

Upload up to 10 images to generate a creative story.

Key Features

  • Cross-Frame Consistency: Maintains consistent character and object identity across multiple frames through visual similarity and face recognition techniques
  • Structured Reasoning: Employs chain-of-thought reasoning to analyze scenes with explicit modeling of characters, objects, settings, and narrative structure
  • Grounded Storytelling: Uses specialized XML tags to link narrative elements directly to visual entities
  • Reduced Hallucinations: Produces 12.3% fewer hallucinations than the non-fine-tuned base model
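
To make the grounded-storytelling feature more concrete, here is a minimal sketch of how a tagged story could be post-processed to recover which text spans refer to which visual entity. The tag syntax (`<gdo char1>...</gdo>`), the entity IDs (`char1`, `obj1`), and the helper names `extract_groundings` and `strip_tags` are assumptions made for illustration; this page does not specify the tag vocabulary the model actually emits, so adjust the pattern to the real output.

```python
import re
from collections import defaultdict

# ASSUMPTION: grounding tags look like <tag entity_id>text</tag>.
# The tag and ID names used here are hypothetical, not taken from the model's documentation.
TAG_PATTERN = re.compile(
    r"<(?P<tag>[a-z]+)\s+(?P<entity>[\w-]+)>(?P<text>.*?)</(?P=tag)>",
    re.DOTALL,
)


def extract_groundings(story: str) -> dict[str, list[str]]:
    """Map each entity ID to the text spans that mention it, in order of appearance."""
    groundings = defaultdict(list)
    for match in TAG_PATTERN.finditer(story):
        groundings[match.group("entity")].append(match.group("text"))
    return dict(groundings)


def strip_tags(story: str) -> str:
    """Return the plain narrative with grounding tags removed (non-nested tags only)."""
    return TAG_PATTERN.sub(lambda m: m.group("text"), story)


# Hypothetical model output, written by hand for this example.
story = (
    "<gdo char1>Mara</gdo> opened the door. "
    "<gdo char1>She</gdo> saw <gdo obj1>a lantern</gdo>."
)

print(extract_groundings(story))
print(strip_tags(story))
```

The sketch handles only flat, non-nested tags; if the model nests one tag inside another, a proper parser would be a better fit than a regular expression.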

Model trained by daniel3303, repository here.

@misc{oliveira2025storyreasoningdatasetusingchainofthought,
      title={StoryReasoning Dataset: Using Chain-of-Thought for Scene Understanding and Grounded Story Generation}, 
      author={Daniel A. P. Oliveira and David Martins de Matos},
      year={2025},
      eprint={2505.10292},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2505.10292}, 
}