Image to Video Animation with AnimateDiff and IP-Adapter (A1111)

Transform images (face portraits) into dynamic videos quickly by using AnimateDiff, LCM LoRAs, and IP-Adapters within Stable Diffusion (A1111).

1. Introduction

Enter our creative realm, where the transformation of images into videos awaits! Join us as we explore the smooth integration of AnimateDiff, LCM LoRAs, and IP-Adapters, designed to bring static images to life. Our journey navigates the Stable Diffusion framework (A1111), emphasizing its exceptional performance in transforming images, specifically face portraits or images focusing on the upper body or close-up personas, into dynamic videos. Elevate your content creation skills and effortlessly animate your static images with our specialized method!

2. Requirements: Image to Video

Before embarking on the journey to craft incredible videos from images (optimal for close-up shots or face portraits), it's essential to establish the necessary prerequisites. Below, you'll discover a comprehensive list of the requirements to unleash the potential for creating remarkable videos from images.

Requirement 1: AnimateDiff Extension and LCM LoRAs

AnimateDiff is our favored extension for effortlessly generating videos or even GIFs. If you haven't yet installed the AnimateDiff extension and the LCM LoRAs, which accelerate the rendering process, you can consult the dedicated article below for instructions on downloading and installing them:

Fast Video Generation with AnimateDiff & LCM LoRA's (A1111)

Requirement 2: ControlNet

To proceed, make sure you have ControlNet installed and updated to its latest version. For detailed guidance on the installation process, refer to our comprehensive ControlNet Installation Guide, especially if you haven't installed ControlNet yet.

Requirement 3: IP-Adapter Model for ControlNet

  1. Obtain the necessary IP-Adapter models for ControlNet, conveniently available on the Hugging Face website.
  2. For this tutorial, we'll use a specific IP-Adapter model file named "ip-adapter-plus_sd15.safetensors".
  3. Once you have downloaded the IP-Adapter model, move the file to the designated directory: "stable-diffusion-webui > extensions > sd-webui-controlnet > models"


If you'd like an in-depth article on utilizing IP-Adapter models, feel free to explore it and uncover additional information about the diverse IP-Adapter models and their specific applications.

Requirement 4: Initial Image

Drawing from our experiments, we highly recommend using either a face portrait image or a body close-up image as the basis for crafting a video. You have the option to generate an initial reference image using txt2img or simply choose a beautiful picture from the internet. Nevertheless, don't hesitate to explore different reference images for a variety of outcomes.


3. ControlNet Settings (IP-Adapter Model)

Access the Stable Diffusion UI, go to the Txt2img subtab, and scroll down to locate the ControlNet settings. We will utilize the IP-Adapter control type in ControlNet, enabling image prompting. This means that our initial image will serve as the reference for the style, facial structure, and resemblance in our final video animation. If you want to learn more about image prompting with IP-Adapters, you can refer to our standalone article. For now, we will proceed with the configuration specified below:


  • Provide the canvas with the reference image.
  • Enable ControlNet and select "Pixel Perfect".
  • Control Type: "IP-Adapter".
  • Preprocessor: "ip-adapter_clip_sd15".
  • Model: "ip-adapter-plus_sd15" (the IP-Adapter model we downloaded earlier).
  • Control Weight: 1

The remaining settings can remain in their default state. Once the ControlNet settings are configured, we are prepared to move on to our AnimateDiff settings. Let's proceed.
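If you prefer scripting over clicking through the UI, the same ControlNet unit can be expressed as part of a payload for A1111's txt2img API. This is a minimal sketch assuming the sd-webui-controlnet extension's "alwayson_scripts" format; field names may vary slightly between extension versions, and the placeholder bytes stand in for your actual reference image:

```python
import base64
import json

# In practice, read your reference portrait: open("reference.png", "rb").read()
# Placeholder bytes are used here so the sketch runs standalone.
reference_b64 = base64.b64encode(b"<raw image bytes>").decode("utf-8")

# One ControlNet unit mirroring the UI settings above.
# Key names follow the sd-webui-controlnet API convention (an assumption;
# check your installed version's API docs).
controlnet_unit = {
    "enabled": True,
    "pixel_perfect": True,
    "module": "ip-adapter_clip_sd15",  # Preprocessor
    "model": "ip-adapter-plus_sd15",   # IP-Adapter model downloaded earlier
    "weight": 1.0,                     # Control Weight
    "image": reference_b64,            # Reference image for image prompting
}

payload_fragment = {
    "alwayson_scripts": {
        "ControlNet": {"args": [controlnet_unit]}
    }
}
print(json.dumps(payload_fragment)[:80])
```

This fragment would be merged into a full txt2img request body before posting it to the API.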

4. AnimateDiff Settings (Video and GIF Animation)

Next, we'll find our AnimateDiff dropdown menu within the Txt2Img subtab and customize the settings to generate a video or GIF animation from the provided image in ControlNet. The settings are listed below, but feel free to experiment with alternative configurations if desired.


  • Enable the "AnimateDiff" checkbox
  • Motion module: "mm_sd_v15_v2.ckpt"
  • Save Format: MP4 and GIF (Alternatively, consider WebM for optimal streaming and uploading due to compatibility with modern browsers and HTML5. MP4 may be preferred for higher-quality playback and broader device compatibility).
  • Number of Frames: 32
  • Frames per Second (FPS): 8

Maintain the default state for the remaining settings. For a more comprehensive understanding of the AnimateDiff extensions, it is recommended to explore the official AnimateDiff GitHub page.
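For scripted workflows, the AnimateDiff settings above can likewise be passed through the API via the extension's "alwayson_scripts" entry. The sketch below assumes the sd-webui-animatediff API argument names, which may differ across versions; it also shows the simple arithmetic behind the clip length, since duration equals frames divided by FPS:

```python
# AnimateDiff settings mirroring the UI configuration above.
# Field names follow the sd-webui-animatediff API convention (an assumption;
# verify against your installed version).
animatediff_args = {
    "enable": True,
    "model": "mm_sd_v15_v2.ckpt",  # Motion module
    "format": ["MP4", "GIF"],      # Save formats
    "video_length": 32,            # Number of Frames
    "fps": 8,                      # Frames per Second
}

# Clip duration in seconds = frames / fps, so 32 frames at 8 FPS -> 4 s.
duration = animatediff_args["video_length"] / animatediff_args["fps"]
print(f"Animation length: {duration:.0f} s")

payload_fragment = {
    "alwayson_scripts": {
        "AnimateDiff": {"args": [animatediff_args]}
    }
}
```

Raise "video_length" or lower "fps" to get a longer (or slower) animation.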


5. Txt2img Settings (LCM LoRA)

Head to the top and choose a checkpoint for generating the video animation; whether you prefer a realistic or cartoon-style outcome is entirely your call. For the tutorial's sake, we'll go with a realistic checkpoint. Here are the settings employed to produce the final video animation:


We included the LCM LoRA to speed up rendering. We avoid adding extra keywords to our positive prompt because we rely on image prompting using our reference image in ControlNet, along with the IP-Adapter.

  • Checkpoint: Realistic Vision
  • Sampling Method: LCM
  • Sampling Steps: 8
  • Width & Height: 368 x 656 (9:16 Ratio)
  • CFG Scale: 2 (This incorporates the negative prompt; selecting 1 will exclude it)
  • Seed: -1

Note: Depending on the checkpoint and the data it has been trained on, your animation might split horizontally into two strange parts if your dimensions are too large. Choose smaller width and height values and decide if you want to use "Hires. fix" to upscale your final video or GIF animation.
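Putting the txt2img settings together, a complete API request body might look like the sketch below. The LoRA tag "lcm-lora-sdv1-5" is a placeholder assumption; substitute the filename of the LCM LoRA you actually installed, and note that the webui must be launched with the --api flag for the endpoint to exist:

```python
import json

# Minimal txt2img payload mirroring the settings above.
# The LoRA tag is a placeholder; use the filename of the LCM LoRA in your
# stable-diffusion-webui/models/Lora directory.
payload = {
    "prompt": "<lora:lcm-lora-sdv1-5:1>",  # no extra keywords; the IP-Adapter reference drives the style
    "negative_prompt": "",
    "sampler_name": "LCM",  # Sampling Method
    "steps": 8,             # Sampling Steps
    "width": 368,
    "height": 656,          # roughly 9:16 portrait ratio
    "cfg_scale": 2,         # set to 1 to exclude the negative prompt
    "seed": -1,             # random seed
}

# To submit the request (requires the webui running with --api):
# import requests
# requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload)
print(json.dumps(payload, indent=2))
```

The ControlNet and AnimateDiff fragments from the earlier sections would be merged into this payload under "alwayson_scripts" before posting.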

6. Examples: Image to Video Animations

With all the settings configured, you can now click on "Generate" to experience faster video generation, thanks to the inclusion of LCM LoRA. Once the generation is complete, you can find the generated video in the specified file path: "stable-diffusion-webui\outputs\txt2img-images\AnimateDiff".

Remember, these video animations are produced without employing "Hires. Fix" and are not upscaled. Additionally, it's worth noting that using face portraits as reference images typically produces superior results compared to full-body reference images. Nevertheless, we'll still present a variety of impressive videos for you below.


7. Troubleshooting

If you encounter fuzzy or irrelevant outputs while using AnimateDiff alongside ControlNet, the problem likely stems from an incompatible ControlNet version. To resolve this, revert to a ControlNet version that works well with AnimateDiff by following these steps:

  • Navigate to the "extensions/sd-webui-controlnet" directory and open a terminal by typing "cmd" into the file location bar at the top of the window.
  • Execute the following command in the terminal: git checkout -b new_branch 10bd9b25f62deab9acb256301bbf3363c42645e7
  • Next, execute the following command in the terminal: git pull
  • Close the terminal and restart Stable Diffusion for the changes to take effect.

With a ControlNet version that is compatible with the AnimateDiff extension, this workflow should function correctly.

8. Conclusion

In conclusion, our exploration into transforming static images into dynamic videos or GIFs through AnimateDiff, LCM LoRAs, and IP-Adapters within the Stable Diffusion framework (A1111) showcases a powerful and efficient process. With streamlined settings and careful integration, this method empowers creators to effortlessly breathe life into face portraits and close-up images. The showcased examples demonstrate the impressive results, emphasizing the potential of this approach for elevating content creation. The synergy of these technologies within Stable Diffusion offers a user-friendly yet highly customizable solution, making it a valuable tool for artists, influencers, and storytellers seeking to create captivating video animations with ease.

Frequently Asked Questions

To employ this approach, content creators should install AnimateDiff and the LCM LoRAs, making sure to keep ControlNet up to date. The procedure also includes acquiring the IP-Adapter model "ip-adapter-plus_sd15.safetensors" for ControlNet, while integrating the LCM LoRA speeds up rendering.

While the method is optimized for face portraits and close-up images, it can also be applied to other types of images. The key recommendation is to use images that focus on the upper body or close-up personas for optimal results. Creators have the flexibility to experiment with different reference images, but it's worth noting that face portraits typically yield superior outcomes compared to full-body shots.
