Qwen Storyteller
Upload up to 10 images to generate a creative story.
Key Features
- Cross-Frame Consistency: Maintains consistent character and object identity across multiple frames through visual similarity and face recognition techniques
- Structured Reasoning: Employs chain-of-thought reasoning to analyze scenes with explicit modeling of characters, objects, settings, and narrative structure
- Grounded Storytelling: Uses specialized XML tags to link narrative elements directly to visual entities
- Reduced Hallucinations: Produces 12.3% fewer hallucinations than the non-fine-tuned base model
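
As a rough illustration of the grounded storytelling idea above, the sketch below parses a story in which narrative spans are wrapped in tags that point at visual entities. The tag name `entity`, the `id` attribute, and the IDs `char1` and `obj1` are hypothetical and chosen only for this example; the model's actual tag vocabulary is defined in the paper and repository and may differ. The helper names `extract_grounded_mentions` and `strip_tags` are likewise made up here.

```python
import re
from collections import defaultdict

# Hypothetical tag format, for illustration only:
#   <entity id="char1">some text</entity>
# The real model uses its own set of XML tags.
TAG_PATTERN = re.compile(r'<entity id="([^"]+)">(.*?)</entity>', re.DOTALL)


def extract_grounded_mentions(story: str) -> dict[str, list[str]]:
    """Map each entity ID to the text spans that refer to it, in order."""
    mentions = defaultdict(list)
    for entity_id, text in TAG_PATTERN.findall(story):
        mentions[entity_id].append(text)
    return dict(mentions)


def strip_tags(story: str) -> str:
    """Return the story as plain prose, keeping only the tagged text."""
    return TAG_PATTERN.sub(r"\2", story)


# Example story with two mentions of the same (assumed) character ID.
story = (
    '<entity id="char1">The girl</entity> picked up '
    '<entity id="obj1">a red kite</entity>. '
    'Later, <entity id="char1">she</entity> ran to the hill.'
)

mentions = extract_grounded_mentions(story)
plain = strip_tags(story)
```

Grouping mentions by ID is what makes the grounding useful downstream: every span tagged with the same ID can be traced back to the same detected character or object, so a pronoun and a noun phrase that share an ID are known to refer to one visual entity.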
Model trained by daniel3303, repository here.
@misc{oliveira2025storyreasoningdatasetusingchainofthought,
  title={StoryReasoning Dataset: Using Chain-of-Thought for Scene Understanding and Grounded Story Generation},
  author={Daniel A. P. Oliveira and David Martins de Matos},
  year={2025},
  eprint={2505.10292},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2505.10292},
}