Releasing Flan-T5xxl_TE-only in FP32, FP16, and GGUF Formats!

[Image: Flux.1 (Euler) anime illustration of a girl holding a chalkboard]
  • Flan-T5xxl is a new-generation text encoder.
  • TE-only extracts only the parts needed for image generation.
  • GGUF format is an even lighter model.

Introduction

Hello, this is Easygoing.

Today, I’m excited to introduce the release of the FP32, FP16, and GGUF formats of Flan-T5xxl_TE-only.

What is Flan-T5xxl?

T5 is an AI model for text-to-text generation released by Google in 2020.

| Model Name | Size (FP32 Format) | Number of Parameters |
|---|---|---|
| T5-Small | 0.24 GB | 60 million |
| T5-Base | 0.88 GB | 220 million |
| T5-Large | 3.08 GB | 770 million |
| T5-XL | 11.2 GB | 2.8 billion |
| T5-XXL | 44 GB | 11 billion |

The T5xxl model is the largest-parameter model in the T5 family and is used as the text encoder for parsing prompts in Flux.1 and SD 3.5.

| Model Name | Release | Parameters | Tokens | Comprehensible Text |
|---|---|---|---|---|
| T5xxl | October 2020 | 11 billion | 32,000 | Long sentences & context |
| T5xxl v1.1 | June 2021 | 11 billion | 32,000 | Long sentences & context |
| Flan-T5xxl | October 2022 | 11 billion | 32,000 | Long sentences & context |

Flan-T5xxl is a fine-tuned version of T5xxl, further improving its accuracy. Using the Flan-T5xxl model in image generation AI can lead to improved prompt comprehension and enhanced image quality.

Model List

On the Hugging Face Flan-T5xxl release page, the following models are available:

[Image: Screenshot of the model listing in the Hugging Face Flan-T5xxl repository, with comments]
I recommend starting with flan_t5_xxl_TE-only_FP16.safetensors.
| Model | Size | SSIM Similarity | Recommended |
|---|---|---|---|
| FP32 | 19 GB | 100.0 % | 🔺 |
| FP16 | 9.6 GB | 98.0 % | |
| FP8 | 4.8 GB | 95.3 % | 🔺 |
| Q8_0 | 6 GB | 97.6 % | |
| Q6_K | 4.9 GB | 97.3 % | 🔺 |
| Q5_K_M | 4.3 GB | 94.8 % | |
| Q4_K_M | 3.7 GB | 96.4 % | |
[Image: Flan-T5xxl_TE-only MAE/SSIM similarity chart]

Flan-T5xxl_TE-only is a lightweight model that extracts only the text encoder portion used in image generation AI.
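The "TE-only" idea can be sketched as filtering a full checkpoint's state dict down to just the encoder tensors. This is a minimal illustration, not the actual conversion script used for this release; the key names follow the usual Hugging Face T5 layout (`encoder.…`, `decoder.…`, `shared.…`), and the tensor values here are placeholders.

```python
# Minimal sketch of "TE-only" extraction: keep only the encoder
# (and shared embedding) weights from a full T5 state dict.
# Key names follow the standard Hugging Face T5 checkpoint layout.

def extract_encoder_only(state_dict: dict) -> dict:
    """Return only tensors whose names start with 'encoder.' or 'shared.'."""
    keep = ("encoder.", "shared.")
    return {k: v for k, v in state_dict.items() if k.startswith(keep)}

# Placeholder state dict standing in for a real 44 GB checkpoint.
full = {
    "shared.weight": "...",
    "encoder.block.0.layer.0.SelfAttention.q.weight": "...",
    "decoder.block.0.layer.0.SelfAttention.q.weight": "...",
    "lm_head.weight": "...",
}
te_only = extract_encoder_only(full)  # decoder and lm_head are dropped
```

Dropping the decoder and language-model head is what shrinks the FP32 file from 44 GB to roughly 19 GB.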

The GGUF files are even lighter models, produced by quantizing Flan-T5xxl_TE-only_FP32.

The number in a GGUF quantization name (the 8 in Q8_0, for example) is the bit width per weight; fewer bits mean a smaller file but lower accuracy.

For image generation with Flan-T5xxl, I recommend using Q6_K or higher.
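The relationship between bit width and file size is simple arithmetic. The sketch below estimates sizes from a parameter count of roughly 4.75 billion, which I am inferring from the 19 GB FP32 file in the table above (4 bytes per weight); real GGUF files come out somewhat larger because each quantized block also stores scale factors.

```python
# Rough file-size estimate from quantization bit width.
# Parameter count is inferred from the 19 GB FP32 file
# (4 bytes per weight); real GGUF files are slightly larger
# because quantized blocks also carry scale factors.

def approx_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate model file size in GB for a given bit width."""
    return n_params * bits_per_weight / 8 / 1e9

N_PARAMS = 19e9 / 4  # ~4.75 billion weights

for name, bits in [("FP32", 32), ("FP16", 16), ("Q8_0", 8), ("Q6_K", 6)]:
    print(f"{name}: ~{approx_size_gb(N_PARAMS, bits):.1f} GB")
```

Halving the bit width halves the file, which is why Q6_K lands near the FP8 size while keeping noticeably higher SSIM similarity.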

How to Use Flan-T5xxl!

Place the downloaded files in one of the following folders:

  • Installation folder/models/text_encoder
  • Installation folder/models/clip
  • Installation folder/Models/CLIP
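If you script your setup, the folder choice above can be automated: try each known model folder in order and copy the file into the first one that exists. This is a hedged sketch; the folder names come from the list above, and the install root and file name are placeholders for your own setup.

```python
# Sketch: copy a downloaded text encoder into the first known
# model folder that exists under the installation root.
# Folder names follow the WebUI/ComfyUI conventions listed above;
# paths are placeholders, not fixed by this release.
import shutil
from pathlib import Path

def place_text_encoder(install_root: Path, downloaded_file: Path) -> Path:
    candidates = [
        install_root / "models" / "text_encoder",
        install_root / "models" / "clip",
        install_root / "Models" / "CLIP",
    ]
    for folder in candidates:
        if folder.is_dir():
            # copy2 preserves timestamps and returns the destination path
            return Path(shutil.copy2(downloaded_file, folder))
    raise FileNotFoundError(f"No known text-encoder folder under {install_root}")
```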

ComfyUI

When using Flux.1 in ComfyUI, load the text encoder using the DualCLIPLoader node.

[Image: DualCLIPLoader node]

As of April 13, 2025, the default DualCLIPLoader node includes a device selection option, letting you choose where the model is loaded:

  • cuda → VRAM
  • cpu → System RAM

Since Flux.1's text encoder is large, setting the device to cpu and keeping the model in system RAM improves performance in most cases.

Unless your system RAM is 16 GB or less, keeping the full-precision model in system RAM is more effective than shrinking it with GGUF, so there is currently little advantage to using the GGUF format in ComfyUI.
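In ComfyUI's API-format workflow JSON, the node settings described above might look roughly like this. The field and file names here are assumptions for illustration; export your own workflow with "Save (API Format)" to see the exact keys your ComfyUI version uses.

```python
import json

# Hedged sketch of a DualCLIPLoader entry in a ComfyUI API-format
# workflow. Field and file names are illustrative assumptions,
# not an authoritative schema.
dual_clip_loader = {
    "class_type": "DualCLIPLoader",
    "inputs": {
        "clip_name1": "flan_t5_xxl_TE-only_FP16.safetensors",
        "clip_name2": "clip_l.safetensors",
        "type": "flux",
        "device": "cpu",  # keep the large text encoder in system RAM
    },
}
print(json.dumps(dual_clip_loader, indent=2))
```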

When running Flux.1 in ComfyUI, use the FP16 format text encoder.

In ComfyUI, you can also use the FP32 text encoder by adding the --fp32-text-enc flag at startup.

Stable Diffusion WebUI Forge

In Stable Diffusion WebUI Forge, select the Flan-T5xxl model instead of the usual T5xxl_v1_1.

[Image: Stable Diffusion WebUI Forge screenshot, with comments]

To use the FP32 text encoder in Stable Diffusion WebUI Forge, launch it with the --clip-in-fp32 option.

Use the Improved CLIP-L Too!

For Flux.1 and SD 3.5, in addition to T5xxl, you can upgrade CLIP-L to a higher-accuracy version.

Since Flan-T5xxl and CLIP-L serve different functions, upgrading both can further improve image quality.

CLIP-L is lighter than T5xxl, so I highly recommend trying this upgrade as well.

Claude and Grok’s Coding is Amazing!

I don’t write code myself, so all the code used for this conversion was written by AI.

The two AIs I used are:

  • Claude 3.7 (released February 25, 2025): highly accurate code
  • Grok 3 (released February 15, 2025): can search the web for current information and accepts multiple file uploads

I tried the same conversion two months ago using ChatGPT, but the code it produced didn't work.

[Image: Flux.1 (Euler) anime illustration of a young woman sitting at a table]

This time, I had Claude 3.7 write the initial code, then uploaded error messages to Grok 3 for iterative fixes, which greatly improved efficiency and allowed me to complete the model.

If you’re using AI to write code, I highly recommend trying this approach!

Conclusion: Try Flan-T5xxl

  • Flan-T5xxl is a new-generation text encoder.
  • TE-only extracts only the necessary components.
  • GGUF format is a further lightweight model.

The biggest appeal of local image generation is the freedom to use high-quality models shared globally for free.

Flux.1 and SD 3.5 can generate high-quality illustrations, but upgrading the text encoder can further enhance image quality.

[Image: Anime illustration of a young girl holding a chalkboard that says "flan t5" and "te-only"]
Text rendering is also highly accurate.

With the release of the TE-only model, the storage size disadvantage of using Flan-T5xxl is gone, so why not give this upgrade a try?

Thank you for reading to the end!


Update History

April 20, 2025

Added details on using FP32 format in Stable Diffusion WebUI Forge.

April 15, 2025

Revised content to reflect ComfyUI updates.

April 2, 2025

Added a link to the ComfyUI-MultiGPU guide.

March 20, 2025

Updated the Flan-T5xxl model list and table.

March 15, 2025

Added the Civitai link.

March 10, 2025

Added a sample workflow for Flux.1_MultiGPU.