Sprite Sheet Diffusion: Generate Game Character for Animation

Carnegie Mellon University

Abstract

In the game development process, creating character animations is a vital step that involves several stages. For 2D games, illustrators typically begin by designing the main character image, which serves as the foundation for all subsequent animations. These animations depict the character in different poses and actions, such as running, jumping, or attacking, to form a smooth motion sequence. The process requires significant manual effort, as illustrators must meticulously maintain consistency in design, proportions, and style across multiple frames of motion. Because each frame is drawn individually, the task is time-consuming and labor-intensive. Generative models, such as diffusion models, have the potential to transform this process by automating the creation of sprite sheets. Diffusion models, known for their ability to generate diverse images, can be adapted to generate character animations. By leveraging their capabilities, we can significantly reduce the manual workload for illustrators, accelerate the animation creation process, and open up new creative possibilities in game development.

Method

Model Architecture

Figure: Overview of our framework.

In this work, we adapt the framework proposed by Animate Anyone to the novel application of generating sprite sheets tailored for game character animation. The methodology comprises three key components: ReferenceNet, Pose Guider, and Motion Module. ReferenceNet encodes the appearance features of the character from a reference image using an SD-v1.5 model whose self-attention layers are replaced with spatial-attention layers; cross-attention driven by a CLIP image encoder integrates its features with the denoising network. The Pose Guider encodes motion information with four convolution layers that align the pose image to the resolution of the noise latent; the processed pose image is then added to the noisy latent before it is fed to the denoising network. To ensure temporal continuity, the Motion Module is embedded in each Res-Trans block after the spatial- and cross-attention layers, modeling smooth transitions between animation frames.
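To make the Pose Guider concrete, the following is a minimal sketch in PyTorch, assuming an SD-v1.5 latent space (4 channels at 1/8 of the image resolution). The layer widths, activation choices, and zero-initialized output projection are illustrative assumptions, not the exact configuration used in our framework.

```python
# Hypothetical Pose Guider sketch: four convolution layers that map a pose
# image to the resolution and channel count of the noise latent, whose output
# is added to the noisy latent before the denoising network.
import torch
import torch.nn as nn


class PoseGuider(nn.Module):
    def __init__(self, pose_channels: int = 3, latent_channels: int = 4):
        super().__init__()
        self.layers = nn.Sequential(
            # Three stride-2 convolutions downsample a 512x512 pose image
            # to the 64x64 resolution of the SD-v1.5 latent.
            nn.Conv2d(pose_channels, 16, kernel_size=3, stride=2, padding=1),
            nn.SiLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),
            nn.SiLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),
            nn.SiLU(),
            # Final projection to the latent channel count, zero-initialized
            # (an assumption) so training starts from the unmodified denoiser.
            nn.Conv2d(64, latent_channels, kernel_size=3, stride=1, padding=1),
        )
        nn.init.zeros_(self.layers[-1].weight)
        nn.init.zeros_(self.layers[-1].bias)

    def forward(self, pose_image: torch.Tensor, noisy_latent: torch.Tensor) -> torch.Tensor:
        # Encode the pose image and add it to the noisy latent.
        return noisy_latent + self.layers(pose_image)


# Example usage: a 512x512 pose image mapped onto a 64x64 latent.
if __name__ == "__main__":
    guider = PoseGuider()
    pose = torch.randn(1, 3, 512, 512)
    latent = torch.randn(1, 4, 64, 64)
    print(guider(pose, latent).shape)  # torch.Size([1, 4, 64, 64])
```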
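Similarly, the sketch below illustrates one way the Motion Module could apply self-attention along the frame axis after the spatial- and cross-attention layers of a Res-Trans block. The head count, channel width, and zero-initialized output projection are assumptions for illustration rather than our exact implementation.

```python
# Hypothetical temporal Motion Module sketch: attention runs across frames
# only, so each spatial position attends to its own trajectory over time.
import torch
import torch.nn as nn


class TemporalMotionModule(nn.Module):
    def __init__(self, channels: int = 320, num_heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        # Zero-initialized output projection (assumed) so the module starts as
        # an identity mapping and does not disturb the pretrained spatial layers.
        self.proj_out = nn.Linear(channels, channels)
        nn.init.zeros_(self.proj_out.weight)
        nn.init.zeros_(self.proj_out.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, height*width, channels) features for a clip.
        b, f, hw, c = x.shape
        # Fold spatial positions into the batch so attention is temporal only.
        x = x.permute(0, 2, 1, 3).reshape(b * hw, f, c)
        residual = x
        h = self.norm(x)
        h, _ = self.attn(h, h, h)
        x = residual + self.proj_out(h)
        return x.reshape(b, hw, f, c).permute(0, 2, 1, 3)


# Example usage: 8 frames of 64x64 latent features with 320 channels.
if __name__ == "__main__":
    module = TemporalMotionModule()
    feats = torch.randn(1, 8, 64 * 64, 320)
    print(module(feats).shape)  # torch.Size([1, 8, 4096, 320])
```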

Results

TBD