Releasing Flan-T5xxl_TE-only in FP32, FP16, and GGUF Formats!

[Image: Flux.1 anime illustration of a girl holding a chalkboard (1600×1600)]
  • Flan-T5xxl is a new-generation text encoder.
  • TE-only extracts only the parts needed for image generation.
  • GGUF format is an even lighter model.

Introduction

Hello, I’m Easygoing.

This time, I’ve released the FP32, FP16, and GGUF formats of Flan-T5xxl_TE-only, and I’d like to introduce them to you.

What Is Flan-T5xxl?

T5 is a text-to-text generation AI model released by Google in 2020.

| Model Name | Size (FP32 Format) | Number of Parameters |
|---|---|---|
| T5-Small | 0.24 GB | 60 million |
| T5-Base | 0.88 GB | 220 million |
| T5-Large | 3.08 GB | 770 million |
| T5-XL | 11.2 GB | 2.8 billion |
| T5-XXL | 44 GB | 11 billion |

The T5xxl model is the version of T5 with the largest number of parameters and is used as the text encoder for image generation AIs like Flux.1 and SD 3.5.

| Model Name | Release | Parameters | Tokens (Vocabulary) | Comprehensible Text |
|---|---|---|---|---|
| T5xxl | October 2020 | 11 billion | 32,000 | Long sentences & context |
| T5xxl v1.1 | June 2021 | 11 billion | 32,000 | Long sentences & context |
| Flan-T5xxl | October 2022 | 11 billion | 32,000 | Long sentences & context |

Flan-T5xxl is a fine-tuned version of T5xxl with improved accuracy.

Model List

The following models are available on Hugging Face:

[Image: Download list for the flan-t5-xxl-fused model, with comments (1600×692)]
I recommend starting with flan_t5_xxl_TE-only_FP16.safetensors.
| Model | File Size | Accuracy (SSIM Similarity) | Recommended |
|---|---|---|---|
| flan_t5_xxl_fp32.safetensors | 44.1 GB | 100% | |
| flan_t5_xxl_fp16.safetensors | 22.1 GB | 99.9% | |
| flan_t5_xxl_TE-only_FP32.safetensors | 18.7 GB | 100% | 🔺 |
| flan_t5_xxl_TE-only_FP16.safetensors | 9.4 GB | 99.9% | |
| flan_t5_xxl_TE-only_Q8_0.gguf | 5.5 GB | 99.8% | |
| flan_t5_xxl_TE-only_Q6_K.gguf | 4.4 GB | 99.7% | 🔺 |
| flan_t5_xxl_TE-only_Q5_K_M.gguf | 3.8 GB | 98.4% | 🔺 |
| flan_t5_xxl_TE-only_Q4_K_M.gguf | 3.2 GB | 95.2% | |
| flan_t5_xxl_TE-only_Q3_K_L.gguf | 2.6 GB | 84.9% | |

Flan-T5xxl_TE-only is a lightweight model from which only the text-encoder portion needed by image generation AI has been extracted.

For image generation, the full model and the TE-only model produce identical output, so using the TE-only model saves storage space.
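The TE-only extraction can be pictured as filtering a checkpoint's tensors by key prefix, keeping only the encoder stack and the shared token embedding. A minimal Python sketch of the idea follows; the key names and prefixes here are illustrative assumptions, not the actual layout of the released files.

```python
def filter_encoder_keys(state_dict_keys):
    """Keep only tensor names belonging to the text-encoder side.

    The 'encoder.' and 'shared.' prefixes are illustrative -- real key
    names depend on how the checkpoint was exported.
    """
    keep_prefixes = ("encoder.", "shared.")
    return [k for k in state_dict_keys if k.startswith(keep_prefixes)]


keys = [
    "shared.weight",                                    # shared token embedding
    "encoder.block.0.layer.0.SelfAttention.q.weight",   # encoder layer
    "decoder.block.0.layer.0.SelfAttention.q.weight",   # decoder: not needed
    "lm_head.weight",                                   # text output head: not needed
]
print(filter_encoder_keys(keys))
```

In practice the same filtering would be applied to the tensors of a loaded checkpoint before re-saving, which is why the TE-only files are less than half the size of the full models.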

The GGUF format is an even lighter version of Flan-T5xxl_TE-only_FP32, created through quantization.

The number after "Q" (e.g. Q8, Q5) indicates the quantization bit depth: smaller numbers mean lighter files but lower accuracy.

For using Flan-T5xxl in image generation, I recommend Q5_K_M or higher.
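As a rough sanity check on the file sizes in the model list above, size scales with parameters × bits per weight. Here is a short sketch; the ~4.7 billion encoder-parameter figure is inferred from the 9.4 GB FP16 TE-only file, and the K-quant bits-per-weight values are approximate since quantization scales add overhead.

```python
def estimate_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Rough model file size: parameters x bits per weight, in gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9


# ~4.7B encoder-side parameters, inferred from the 9.4 GB FP16 TE-only file
N_PARAMS = 4.7e9

# Effective bits per weight; GGUF values are approximate assumptions
for fmt, bits in [("FP32", 32), ("FP16", 16), ("Q8_0", 8.5), ("Q5_K_M", 5.5)]:
    print(f"{fmt}: ~{estimate_size_gb(N_PARAMS, bits):.1f} GB")
```

The FP32 and FP16 estimates land near the 18.7 GB and 9.4 GB figures in the table; the GGUF files come out somewhat larger than this naive estimate, likely because some tensors typically stay in higher precision.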

How to Use Flan-T5xxl!

Place the downloaded files in one of the following folders:

  • Installation Folder/models/text_encoder
  • Installation Folder/models/clip

Stable Diffusion WebUI Forge

In Stable Diffusion WebUI Forge, select the Flan-T5xxl model instead of the usual T5xxl_v1_1.

[Image: Stable Diffusion WebUI Forge operation screenshot, with comments (1600×475)]

Stable Diffusion WebUI Forge doesn’t support FP32 format, so please use the FP16 or GGUF format models.

ComfyUI

When using ComfyUI, since T5xxl models are large, keeping them in system RAM instead of VRAM can reduce model loading times.

ComfyUI-MultiGPU Custom Node

[Image: ComfyUI Manager search screen for MultiGPU, with comments (1600×850)]

DualCLIPLoaderMultiGPU and DualCLIPLoaderGGUFMultiGPU Nodes

[Image: ComfyUI DualCLIPLoaderMultiGPU and DualCLIPLoaderGGUFMultiGPU custom nodes, with comments (1600×679)]

Using the DualCLIPLoaderMultiGPU or DualCLIPLoaderGGUFMultiGPU nodes from the ComfyUI-MultiGPU custom node pack, you can explicitly load the model into system RAM by selecting cpu as the device.

In ComfyUI, you can also use FP32-format text encoders by launching with the --fp32-text-enc flag.

Try the Improved CLIP-L Too!

In addition to T5xxl, the text encoder CLIP-L can also be upgraded to a more accurate version:

Since Flan-T5xxl and CLIP-L serve different roles, upgrading both can further improve image quality.

CLIP-L is lighter than T5xxl, so I highly recommend giving it a try.

Claude and Grok’s Coding Is Amazing!

I can’t write code myself, so all the code used for this conversion was written by AI.

The two AIs I used this time are:

  • Claude 3.7 (released February 25, 2025): writes accurate code
  • Grok 3 (released February 15, 2025): web search for the latest information, and the ability to upload many files

I tried the same conversion two months ago using ChatGPT, but the code didn’t work, and it failed.

[Image: Flux.1 anime illustration of a young woman sitting at a table (1600×1600)]

This time, I had Claude 3.7 write the initial code and uploaded the error messages to Grok 3 for iterative fixes. As a result, my workflow efficiency improved dramatically and I was able to complete the models.

If you’re using AI to write code, I highly recommend trying this method!

Conclusion: Give Flan-T5xxl a Try

  • Flan-T5xxl is a new-generation text encoder.
  • TE-only extracts only the parts needed for image generation.
  • GGUF format is an even lighter model.

Image generation AIs like Flux.1 and SD 3.5 can produce high-quality illustrations, and upgrading the text encoder can boost image quality even further.

[Image: Anime illustration of a young girl holding a chalkboard that says "flan t5" and "te-only" (1600×1600)]
Text rendering is also highly accurate

One of the charms of local image generation is the ability to freely use models openly shared worldwide.

Now that the storage size disadvantage of using the Flan-T5xxl model is gone, why not take this opportunity to upgrade?

Thank you for reading to the end!