IP Adapters: All you need to know
- Author: F4AI
IP-adapter (Image Prompt adapter) is a Stable Diffusion add-on for using images as prompts, similar to Midjourney and DALL·E 3. You can use it to copy the style, composition, or a face from the reference image.
The post will cover:
- IP-Adapter models – Plus, Face ID, Face ID v2, Face ID portrait, etc.
- How to use IP-adapters in AUTOMATIC1111 and ComfyUI.
IP-adapter models
The number of IP-adapter models is growing rapidly, and it can be difficult to keep track of them if you are not following closely.
This section gives you an overview of all IP-adapters released so far. You can find the original repositories below.
- IP-Adapter GitHub page (code)
- IP-Adapter models (Hugging Face)
- IP-Adapter Face ID models (Hugging Face)
An image encoder processes the reference image before it is fed into the IP-adapter. Two image encoders are used in IP-adapters:
- OpenClip ViT H 14 (aka SD 1.5 version, 632M parameters)
- OpenClip ViT BigG 14 (aka SDXL version, 1845M parameters)
However, things got muddled when some SDXL IP-Adapter models were also trained with the H version. For clarity, I will refer to them as the ViT H and ViT BigG versions. (ViT stands for Vision Transformer.)
The Original IP-adapter
- Image Encoder: ViT H
- Model: IP-adapter SD 1.5
The original IP-adapter uses the CLIP image encoder to extract features from the reference image. The novelty of the IP-adapter is training separate cross-attention layers for the image. This makes the IP-adapter more effective in steering the image generation process.
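To make the idea of separate cross-attention layers more concrete, here is a minimal sketch in plain Python. It assumes the usual formulation in which the image branch is added to the text branch with a scale factor. The function names and the `image_scale` parameter are mine, not the IP-Adapter code. In the real model the keys and values come from learned projection layers, whereas this toy version takes them as inputs.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention on plain lists of vectors."""
    dim = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        out.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return out


def decoupled_cross_attention(queries, text_kv, image_kv, image_scale=1.0):
    """Attend to text and image features separately, then add the results.

    text_kv and image_kv are (keys, values) pairs. image_scale is a
    hypothetical knob for how strongly the image prompt steers the output.
    """
    text_out = attention(queries, *text_kv)
    image_out = attention(queries, *image_kv)
    return [
        [t + image_scale * i for t, i in zip(t_row, i_row)]
        for t_row, i_row in zip(text_out, image_out)
    ]
```

Setting `image_scale` to 0 in this sketch reduces it to ordinary text-only cross-attention, which is one way to think about why the image prompt can be dialed up or down independently of the text prompt.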
IP-adapter model
Here’s what IP-adapter’s output looks like. It loosely follows the content of the reference image. The DreamShaper 8 model and an empty prompt were used.
Reference image | IP-adapter |
IP adapter Plus
- Image Encoder: ViT H
- Model: IP-Adapter Plus
IP-Adapter Plus uses a patch embedding scheme similar to Flamingo’s Perceiver-Resampler to encode the image. The IP-Adapter Plus model produces images that follow the original reference more closely. However, fine-grained details, like the face, are usually not copied correctly.
Reference image | IP-Adapter Plus |
IP-Adapter Plus Face
- Image Encoder: ViT H
- Model: IP-Adapter Plus Face
The IP-Adapter Plus Face model has the same architecture as IP-Adapter Plus. The model weights are fine-tuned for using a cropped face as the reference.
You should use a close-up face as the reference, like the one below.
The face is followed much more closely.
Reference | IP-Adapter Plus Face |
With IP-Adapter Plus Face, you can direct the image much more easily with the prompt.
A girl in office, white professional shirt
IP-Adapter SDXL
There are two versions of IP-Adapter SDXL. One was trained with ViT BigG, and the other was trained with ViT H.
I will use the DreamShaper SDXL model for SDXL versions of the IP-Adapter.
ViT BigG version
- Image Encoder: ViT BigG
- Model: IP-Adapter SDXL
This is the original SDXL version of the IP-Adapter. It uses the bigger image encoder BigG.
Reference image | IP Adapter SDXL |
IP Adapter SDXL
ViT H version
- Image Encoder: ViT H
- Model: IP-Adapter SDXL ViT H
Reference image | IP-Adapter SDXL ViT H |
IP-Adapter SDXL ViT H
IP-Adapter Plus SDXL
- Image Encoder: ViT H
- Model: IP-Adapter Plus SDXL ViT H
The Plus version likewise uses the patch image embeddings and the ViT H image encoder. It follows the reference images more closely.
Reference image | SDXL Plus |
IP-Adapter Plus Face SDXL
- Image Encoder: ViT H
- Model: IP-Adapter Plus Face SDXL ViT H
The IP-Adapter Plus Face SDXL model has the same architecture as the IP-Adapter Plus SDXL model but uses images of cropped faces for conditioning.
It copies the face more closely.
Reference | Plus Face SDXL |
IP-Adapter Face ID
- Image Encoder: InsightFace
- Model: IP-Adapter Face ID
- LoRA: Face ID SD 1.5
IP-Adapter Face ID uses InsightFace to extract the Face ID embedding from the reference image.
You need to use the accompanying LoRA with the Face ID model. It is recommended
A girl in office, white professional shirt <lora:ip-adapter-faceid_sd15_lora>
Not quite sure if this is working.
Reference image | Face ID SD 1.5 |
IP-Adapter Face ID SDXL
- Image Encoder: InsightFace
- Model: IP-Adapter Face ID SDXL
- LoRA: Face ID SDXL
IP-Adapter Face ID uses InsightFace to extract the Face ID embedding from the reference image.
The InsightFace model is the same as the one used for SD 1.5. There’s no SDXL version.
You need to use the accompanying LoRA with the Face ID model.
I am not able to get this one to work.
Reference image | Face ID SDXL |
IP-Adapter Face ID Plus
- Image Encoder: InsightFace and CLIP image embedding
- Model: IP-Adapter Face ID Plus
- LoRA: Face ID Plus SD 1.5
IP-Adapter Face ID Plus uses everything in the image encoder toolbox:
- InsightFace for the facial features
- CLIP image encoder for global facial features
- Perceiver-Resampler to combine them
Use the LoRA with a weight between 0.5 and 0.7.
A girl in office, white professional shirt <lora:ip-adapter-faceid-plus_sd15_lora>
Reference image | Face ID Plus |
The LoRA seems to have the effect of following the color scheme of the reference image. Removing the LoRA (or setting the weight to 0) also works.
A girl in office, white professional shirt
Reference image | Face ID Plus (No LoRA) |
IP-Adapter Face ID Plus v2
- Image Encoder: InsightFace and CLIP image embedding
- Model: IP-Adapter Face ID Plus v2
- LoRA: Face ID Plus v2 SD 1.5
IP-Adapter Face ID Plus v2 is the same as Face ID Plus except for:
- An improved model checkpoint and LoRA
- The ability to set a weight on the CLIP image embedding
The LoRA is necessary for the Face ID Plus v2 to work. Use a value between 0.5 and 1.0. The higher it is, the stronger the effect.
A girl in office, white professional shirt <lora:ip-adapter-faceid-plusv2_sd15_lora>
Reference image | Face ID Plus v2 |
IP-Adapter Face ID Plus v2 SDXL
- Image Encoder: InsightFace and CLIP image embedding
- Model: IP-Adapter Face ID Plus v2 SDXL
- LoRA: Face ID Plus v2 SDXL
IP-Adapter Face ID Plus v2 SDXL is the SDXL version of Face ID Plus v2.
It doesn’t look so hot in my test.
Reference image | Face ID Plus v2 SDXL |
IP-Adapter Face ID Portrait
- Image Encoder: InsightFace
- Model: IP-Adapter Face ID Portrait
IP-Adapter Face ID Portrait has the same model architecture as Face ID but accepts multiple images of cropped faces.
Using IP-Adapter in AUTOMATIC1111
Software setup
We will use AUTOMATIC1111, a popular and free Stable Diffusion software, in this section. You can use this GUI on Windows, Mac, or Google Colab.
Check out the Quick Start Guide if you are new to Stable Diffusion. Check out the AUTOMATIC1111 Guide if you are new to AUTOMATIC1111.
Install ControlNet Extension
You will need to install the ControlNet extension to use IP-Adapter.
1. Start AUTOMATIC1111 Web-UI normally.
2. Navigate to the Extension Page.
3. Click the Install from URL tab.
4. Enter the following URL in the URL for extension’s git repository field.
https://github.com/Mikubill/sd-webui-controlnet
5. Click the Install button.
6. Wait for the confirmation message that the installation is complete.
7. Restart AUTOMATIC1111.
You will need to select an SDXL checkpoint model to select an SDXL IP Adapter. The list won’t refresh automatically. You can change the Control Type and change back to see the new options.
Download IP-Adapter and LoRA models
For AUTOMATIC1111, you need to download the IP-Adapter and LoRA models to use them.
Pick the one you want to use in the table below.
Version | Preprocessor | IP-Adapter Model | LoRA |
---|---|---|---|
SD 1.5 | ip-adapter_clip_sd15 | ip-adapter_sd15 | |
SD 1.5 Plus | ip-adapter_clip_sd15 | ip-adapter_sd15_plus | |
SD 1.5 Plus Face | ip-adapter_clip_sd15 | ip-adapter-plus-face_sd15 | |
SD 1.5 Face ID | ip-adapter_face_id | ip-adapter-faceid_sd15 | ip-adapter-faceid_sd15_lora |
SD 1.5 Face ID Plus | ip-adapter_face_id_plus | ip-adapter-faceid-plus_sd15 | ip-adapter-faceid-plus_sd15_lora |
SD 1.5 Face ID Plus V2 | ip-adapter_face_id_plus | ip-adapter-faceid-plusv2_sd15 | ip-adapter-faceid-plusv2_sd15_lora |
SDXL | ip-adapter_clip_sdxl | ip-adapter_sdxl | |
SDXL ViT H | ip-adapter_clip_sdxl_plus_vith | ip-adapter-sdxl_vit-h | |
SDXL Plus ViT H | ip-adapter_clip_sdxl_plus_vith | ip-adapter-plus_sdxl_vit-h | |
SDXL Face ID | ip-adapter_face_id | ip-adapter-faceid_sdxl | ip-adapter-faceid_sdxl_lora |
SDXL Face ID Plus v2 | ip-adapter_face_id_plus | ip-adapter-faceid-plusv2_sdxl | ip-adapter-faceid-plusv2_sdxl_lora |
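If you switch between adapters often, it can help to keep the table as data. The sketch below encodes a few rows from the table in a Python dictionary and looks them up by version name. The `IPAdapterSetup` type and the `setup_for` helper are illustrative names of my own, not part of AUTOMATIC1111 or ControlNet.

```python
from typing import NamedTuple, Optional


class IPAdapterSetup(NamedTuple):
    preprocessor: str
    model: str
    lora: Optional[str]  # None when the table lists no LoRA


# A subset of the rows in the table above.
SETUPS = {
    "SD 1.5": IPAdapterSetup("ip-adapter_clip_sd15", "ip-adapter_sd15", None),
    "SD 1.5 Face ID Plus V2": IPAdapterSetup(
        "ip-adapter_face_id_plus",
        "ip-adapter-faceid-plusv2_sd15",
        "ip-adapter-faceid-plusv2_sd15_lora",
    ),
    "SDXL": IPAdapterSetup("ip-adapter_clip_sdxl", "ip-adapter_sdxl", None),
    "SDXL Plus ViT H": IPAdapterSetup(
        "ip-adapter_clip_sdxl_plus_vith", "ip-adapter-plus_sdxl_vit-h", None
    ),
}


def setup_for(version: str) -> IPAdapterSetup:
    """Return the preprocessor, model, and LoRA for a table row."""
    try:
        return SETUPS[version]
    except KeyError:
        raise ValueError(f"No entry for {version!r}") from None


if __name__ == "__main__":
    chosen = setup_for("SD 1.5 Face ID Plus V2")
    print(chosen.preprocessor, chosen.model, chosen.lora)
```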
Windows or Mac
Download the IP-Adapter models and put them in the folder stable-diffusion-webui > models > ControlNet.
Download the LoRA models and put them in the folder stable-diffusion-webui > models > Lora.
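As a rough sketch of the two rules above, the helper below works out where a downloaded file should go. It assumes LoRA files can be recognized by a name ending in `_lora`, as in the table. The function name and the file extensions in the example are assumptions for illustration.

```python
from pathlib import Path


def a1111_destination(webui_root: str, filename: str) -> Path:
    """Pick the AUTOMATIC1111 folder for a downloaded IP-Adapter file.

    Files whose name (without extension) ends in "_lora" go to models/Lora;
    everything else is treated as an IP-Adapter model for models/ControlNet.
    """
    is_lora = Path(filename).stem.endswith("_lora")
    subfolder = "Lora" if is_lora else "ControlNet"
    return Path(webui_root) / "models" / subfolder / filename


if __name__ == "__main__":
    for name in (
        "ip-adapter-faceid-plusv2_sd15.bin",
        "ip-adapter-faceid-plusv2_sd15_lora.safetensors",
    ):
        print(a1111_destination("stable-diffusion-webui", name))
```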
Google Colab
If you use our AUTOMATIC1111 Colab notebook,
- Put the IP-adapter models in the AI_PICS > ControlNet folder in your Google Drive.
- Put the LoRA models in the AI_PICS > Lora folder in your Google Drive.
You need to select the ControlNet extension to use the model.
Using an IP-adapter model in AUTOMATIC1111
I will use the SD 1.5 Face ID Plus V2 as an example. The usage of other IP-adapters is similar. You only need to follow the table above and select the appropriate preprocessor and model.
Step 1: Select a checkpoint model
Since we will use an SD 1.5 IP-adapter, you must select an SD 1.5 model. I will use the Dreamshaper 8 model.
On the txt2img page, select dreamshaper_8.safetensors in the Stable Diffusion Checkpoint dropdown menu.
Step 2: Enter a prompt and the LoRA
Enter a prompt and, optionally, a negative prompt. E.g.,
A woman in office, white professional shirt
disfigure, deformed, ugly
Look at the table above to see if the IP-adapter needs to be used with a LoRA. You need one for the SD 1.5 Face ID Plus V2 model.
Select the Lora tab. Select the appropriate LoRA. In our case, it is ip-adapter-faceid-plusv2_sd15_lora.
The LoRA directive should be inserted in the prompt. Set the LoRA weight between 0.6 and 1 to adjust the effect.
A woman in office, white professional shirt <lora:ip-adapter-faceid-plusv2_sd15_lora>
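AUTOMATIC1111 also accepts a weight inside the directive, written as `<lora:name:weight>`. The small helper below is a sketch of how the prompt string could be assembled with or without a weight. The function name `with_lora` is my own.

```python
from typing import Optional


def with_lora(prompt: str, lora_name: str, weight: Optional[float] = None) -> str:
    """Append a LoRA directive to a prompt, optionally with a weight."""
    if weight is None:
        tag = f"<lora:{lora_name}>"
    else:
        tag = f"<lora:{lora_name}:{weight}>"
    return f"{prompt} {tag}"


if __name__ == "__main__":
    print(with_lora(
        "A woman in office, white professional shirt",
        "ip-adapter-faceid-plusv2_sd15_lora",
        0.6,
    ))
```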
Step 3: Enter ControlNet setting
Scroll down to the ControlNet section.
Upload a reference image to the Image Canvas.
Enable: Yes
Control Type: IP-Adapter
Select the preprocessor and model according to the table above. In this example, they are:
- Preprocessor: ip-adapter_face_id_plus
- Model: ip-adapter-faceid-plusv2_sd15
Click Generate to create an image.
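If you prefer scripting over clicking, the same settings can in principle be sent through the web UI's API when it is started with the `--api` flag. The sketch below only builds the request body and does not contact a server. The field names (`alwayson_scripts`, `controlnet`, `args`, `module`, `model`, `weight`, `image`) reflect my understanding of the ControlNet extension's API and should be treated as assumptions to check against your installed version. The model string shown in the UI may also carry a hash suffix.

```python
import base64


def controlnet_unit(image_bytes: bytes, module: str, model: str, weight: float = 1.0) -> dict:
    """Describe one ControlNet unit (field names are assumptions)."""
    return {
        "enabled": True,
        "module": module,  # the preprocessor from the table
        "model": model,    # the IP-Adapter model from the table
        "weight": weight,
        "image": base64.b64encode(image_bytes).decode("ascii"),
    }


def txt2img_payload(prompt: str, negative_prompt: str, units: list) -> dict:
    """Assemble a txt2img request body with ControlNet units attached."""
    return {
        "prompt": prompt,
        "negative_prompt": negative_prompt,
        "alwayson_scripts": {"controlnet": {"args": list(units)}},
    }


if __name__ == "__main__":
    unit = controlnet_unit(
        b"<reference image bytes>",  # placeholder, read a real file instead
        "ip-adapter_face_id_plus",
        "ip-adapter-faceid-plusv2_sd15",
    )
    payload = txt2img_payload(
        "A woman in office, white professional shirt "
        "<lora:ip-adapter-faceid-plusv2_sd15_lora>",
        "disfigure, deformed, ugly",
        [unit],
    )
    print(sorted(payload))
```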
Multiple IP-adapter Face ID
You can use multiple IP-adapters to achieve something similar to the IP-adapter portrait model.
Below are 3 pictures from the LoRA training dataset.
You can use them in ControlNet Unit 0, 1 and 2. The setting for each one is the same as above, except you need to set the Control Weight to 0.3 so that they sum up to about 1.
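The 0.3 figure is simply an equal split of a total weight of about 1 across three units, rounded to one decimal place. Here is a small sketch of that arithmetic for any number of reference images. The helper name and its defaults are my own.

```python
def equal_control_weights(num_units: int, total: float = 1.0, digits: int = 1) -> list:
    """Split a total Control Weight evenly across ControlNet units."""
    if num_units < 1:
        raise ValueError("need at least one ControlNet unit")
    return [round(total / num_units, digits)] * num_units


if __name__ == "__main__":
    print(equal_control_weights(3))
```

With three units this gives 0.3 each, which matches the setting above. Treat the result as a starting point and adjust by eye.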
Write a prompt for what you want to generate.
Selfie photo of a woman, smiling, disney, mickey headwear <lora:ip-adapter-faceid-plusv2_sd15_lora>
Now you get her face without training a LoRA!
Using IP-Adapter in ComfyUI
Software setup
We will use ComfyUI to generate images in this section. It is an alternative to AUTOMATIC1111.
Read the ComfyUI installation guide and ComfyUI beginner’s guide if you are new to ComfyUI.
You will need the IP Adapter Plus custom node to use the various IP-adapters.
Install InsightFace for ComfyUI
You must install InsightFace before using the Face ID workflows.
ComfyUI has recently updated the Python version. You first need to determine what Python version you are using.
Open the File Explorer App. Navigate to the ComfyUI_windows_portable folder. In the address bar, type cmd and press Enter.
A terminal should show up.
In the terminal, type the following command and press Enter.
.\python_embeded\python.exe --version
It should show either Python 3.10.x or 3.11.x.
Download the InsightFace installation file.
Put the file in the ComfyUI_windows_portable folder.
Go back to the terminal.
If you use Python 3.10.x, run:
.\python_embeded\python.exe -m pip install .\insightface-0.7.3-cp310-cp310-win_amd64.whl
If you use Python 3.11.x, run:
.\python_embeded\python.exe -m pip install .\insightface-0.7.3-cp311-cp311-win_amd64.whl
InsightFace should now be installed.
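The choice between the two wheel files depends only on the Python minor version. The sketch below maps the output of the version command to the matching file name. It covers only the two versions mentioned above and raises an error for anything else. The function name is hypothetical.

```python
import re


def insightface_wheel(version_output: str) -> str:
    """Map 'Python 3.10.x' or 'Python 3.11.x' to the matching wheel name."""
    match = re.search(r"Python (\d+)\.(\d+)", version_output)
    if not match:
        raise ValueError(f"Could not parse a version from {version_output!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    if (major, minor) not in {(3, 10), (3, 11)}:
        raise ValueError(f"No wheel listed here for Python {major}.{minor}")
    tag = f"cp{major}{minor}"
    return f"insightface-0.7.3-{tag}-{tag}-win_amd64.whl"


if __name__ == "__main__":
    print(insightface_wheel("Python 3.11.6"))
```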
Download models and LoRAs
To use the workflows:
- Install the model files according to the instructions below the table.
- Download the workflow JSON in the workflow column.
- Select the IPAdapter Unified Loader Setting in the ComfyUI workflow.
Download the IP-adapter models and LoRAs according to the table above.
Put the IP-adapter models in the folder: ComfyUI > models > ipadapter.
Put the LoRA models in the folder: ComfyUI > models > loras.
You also need these two image encoders.
- OpenClip ViT BigG (aka SDXL – rename to CLIP-ViT-bigG-14-laion2B-39B-b160k.safetensors)
- OpenClip ViT H (aka SD 1.5 – rename to CLIP-ViT-H-14-laion2B-s32B-b79K.safetensors)
Put them in ComfyUI > models > clip_vision.
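Before loading a workflow, it can save time to confirm that the files are where ComfyUI expects them. The following sketch checks a folder layout like the one described above and lists anything missing. The function name, the `expected` mapping, and the example adapter and LoRA file extensions are assumptions for illustration.

```python
from pathlib import Path


def missing_model_files(comfy_root: str, expected: dict) -> list:
    """Return 'subfolder/filename' entries not found under ComfyUI/models."""
    models = Path(comfy_root) / "models"
    return [
        f"{subfolder}/{name}"
        for subfolder, names in expected.items()
        for name in names
        if not (models / subfolder / name).is_file()
    ]


if __name__ == "__main__":
    expected = {
        "ipadapter": ["ip-adapter-faceid-plusv2_sd15.bin"],
        "loras": ["ip-adapter-faceid-plusv2_sd15_lora.safetensors"],
        "clip_vision": ["CLIP-ViT-H-14-laion2B-s32B-b79K.safetensors"],
    }
    for entry in missing_model_files("ComfyUI", expected):
        print("missing:", entry)
```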
IP-Adapter SD 1.5
Use the following workflow for IP-Adapter SD 1.5, SD 1.5 Plus, and SD 1.5 Plus Face. Change the unified loader setting according to the table above.
IP-Adapter SDXL
Use the following workflow for IP-Adapter SDXL, SDXL ViT H, and SDXL Plus ViT H. Set the models according to the table above.
IP-Adapter Face ID SD 1.5
Use the following workflow for IP-Adapter Face ID SD 1.5. Set the models according to the table above.
IP-Adapter Face ID SDXL
Use the following workflow for IP-Adapter Face ID SDXL. Set the models according to the table above.