Releasing Flan-T5xxl_TE-only in FP32, FP16, and GGUF Formats!

[Image: Flux.1 anime illustration of a girl holding a chalkboard (1600×1600)]
  • Flan-T5xxl is a new-generation text encoder.
  • TE-only extracts only the parts needed for image generation.
  • GGUF format is an even lighter model.

Introduction

Hello, I’m Easygoing.

This time, I’ve released the FP32, FP16, and GGUF formats of Flan-T5xxl_TE-only, and I’d like to introduce them to you.

What Is Flan-T5xxl?

T5 is a text-to-text generation AI model released by Google in 2020.

| Model Name | Size (FP32 Format) | Number of Parameters |
|---|---|---|
| T5-Small | 0.24 GB | 60 million |
| T5-Base | 0.88 GB | 220 million |
| T5-Large | 3.08 GB | 770 million |
| T5-XL | 11.2 GB | 2.8 billion |
| T5-XXL | 44 GB | 11 billion |

The T5xxl model is the version of T5 with the largest number of parameters and is used as the text encoder for image generation AIs like Flux.1 and SD 3.5.

| Model Name | Release | Parameters | Tokens (Vocabulary) | Comprehensible Text |
|---|---|---|---|---|
| T5xxl | October 2020 | 11 billion | 32,000 | Long sentences & context |
| T5xxl v1.1 | June 2021 | 11 billion | 32,000 | Long sentences & context |
| Flan-T5xxl | October 2022 | 11 billion | 32,000 | Long sentences & context |

Flan-T5xxl is a fine-tuned version of T5xxl with improved accuracy.

Model List

The following models are available on Hugging Face:

[Image: Download list for the flan-t5-xxl-fused model, with comments (1600×692)]
I recommend starting with flan_t5_xxl_TE-only_FP16.safetensors.
| Model | File Size | Accuracy (SSIM Similarity) | Recommended |
|---|---|---|---|
| flan_t5_xxl_fp32.safetensors | 44.1 GB | 100% | |
| flan_t5_xxl_fp16.safetensors | 22.1 GB | 99.9% | |
| flan_t5_xxl_TE-only_FP32.safetensors | 18.7 GB | 100% | 🔺 |
| flan_t5_xxl_TE-only_FP16.safetensors | 9.4 GB | 99.9% | |
| flan_t5_xxl_TE-only_Q8_0.gguf | 5.5 GB | 99.8% | |
| flan_t5_xxl_TE-only_Q6_K.gguf | 4.4 GB | 99.7% | 🔺 |
| flan_t5_xxl_TE-only_Q5_K_M.gguf | 3.8 GB | 98.4% | 🔺 |
| flan_t5_xxl_TE-only_Q4_K_M.gguf | 3.2 GB | 95.2% | |
| flan_t5_xxl_TE-only_Q3_K_L.gguf | 2.6 GB | 84.9% | |

Flan-T5xxl_TE-only is a lightweight model from which only the text-encoder portion needed by image generation AI has been extracted.

For image generation, the full model and the TE-only model produce identical output, so using the TE-only model saves storage space.
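The TE-only extraction can be pictured as filtering a checkpoint's tensors by key prefix, keeping only the encoder stack and the shared token embedding. A minimal Python sketch of the idea follows; the key names and prefixes here are illustrative assumptions, not the actual layout of the released files.

```python
def filter_encoder_keys(state_dict_keys):
    """Keep only tensor names belonging to the text-encoder side.

    The 'encoder.' and 'shared.' prefixes are illustrative -- real key
    names depend on how the checkpoint was exported.
    """
    keep_prefixes = ("encoder.", "shared.")
    return [k for k in state_dict_keys if k.startswith(keep_prefixes)]


keys = [
    "shared.weight",                                    # shared token embedding
    "encoder.block.0.layer.0.SelfAttention.q.weight",   # encoder layer
    "decoder.block.0.layer.0.SelfAttention.q.weight",   # decoder: not needed
    "lm_head.weight",                                   # text output head: not needed
]
print(filter_encoder_keys(keys))
```

In practice the same filtering would be applied to the tensors of a loaded checkpoint before re-saving, which is why the TE-only files are less than half the size of the full models.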

The GGUF format is an even lighter version of Flan-T5xxl_TE-only_FP32, created through quantization.

The number after "Q" (e.g. Q8, Q5) indicates the quantization bit depth: smaller numbers mean lighter files but lower accuracy.

For using Flan-T5xxl in image generation, I recommend Q5_K_M or higher.
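As a rough sanity check on the file sizes in the model list above, size scales with parameters × bits per weight. Here is a short sketch; the ~4.7 billion encoder-parameter figure is inferred from the 9.4 GB FP16 TE-only file, and the K-quant bits-per-weight values are approximate since quantization scales add overhead.

```python
def estimate_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Rough model file size: parameters x bits per weight, in gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9


# ~4.7B encoder-side parameters, inferred from the 9.4 GB FP16 TE-only file
N_PARAMS = 4.7e9

# Effective bits per weight; GGUF values are approximate assumptions
for fmt, bits in [("FP32", 32), ("FP16", 16), ("Q8_0", 8.5), ("Q5_K_M", 5.5)]:
    print(f"{fmt}: ~{estimate_size_gb(N_PARAMS, bits):.1f} GB")
```

The FP32 and FP16 estimates land near the 18.7 GB and 9.4 GB figures in the table; the GGUF files come out somewhat larger than this naive estimate, likely because some tensors typically stay in higher precision.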

How to Use Flan-T5xxl!

Place the downloaded files in one of the following folders:

  • Installation Folder/models/text_encoder
  • Installation Folder/models/clip

Stable Diffusion WebUI Forge

In Stable Diffusion WebUI Forge, select the Flan-T5xxl model instead of the usual T5xxl_v1_1.

[Image: Stable Diffusion WebUI Forge operation screenshot, with comments (1600×475)]

Stable Diffusion WebUI Forge doesn’t support FP32 format, so please use the FP16 or GGUF format models.

ComfyUI

When using ComfyUI, since T5xxl models are large, keeping them in system RAM instead of VRAM can reduce model loading times.

ComfyUI-MultiGPU Custom Node

[Image: ComfyUI Manager search screen for MultiGPU, with comments (1600×850)]

DualCLIPLoaderMultiGPU and DualCLIPLoaderGGUFMultiGPU Nodes

[Image: ComfyUI DualCLIPLoaderMultiGPU and DualCLIPLoaderGGUFMultiGPU custom nodes, with comments (1600×679)]

Using the DualCLIPLoaderMultiGPU or DualCLIPLoaderGGUFMultiGPU nodes from the ComfyUI-MultiGPU custom node pack, you can explicitly load the model into system RAM by selecting cpu as the device.

In ComfyUI, you can also use FP32-format text encoders by launching with the --fp32-text-enc flag.

Try the Improved CLIP-L Too!

In addition to T5xxl, the text encoder CLIP-L can also be upgraded to a more accurate version:

Since Flan-T5xxl and CLIP-L serve different roles, upgrading both can further improve image quality.

CLIP-L is lighter than T5xxl, so I highly recommend giving it a try.

Claude and Grok’s Coding Is Amazing!

I can’t write code myself, so all the code used for this conversion was written by AI.

The two AIs I used this time are:

  • Claude 3.7 (released February 25, 2025): writes accurate code
  • Grok 3 (released February 15, 2025): web search for the latest information, and the ability to upload many files

I tried the same conversion two months ago using ChatGPT, but the code didn’t work, and it failed.

[Image: Flux.1 anime illustration of a young woman sitting at a table (1600×1600)]

This time, I had Claude 3.7 write the initial code and uploaded the error messages to Grok 3 for iterative fixes. As a result, my workflow efficiency improved dramatically and I was able to complete the models.

If you’re using AI to write code, I highly recommend trying this method!

Conclusion: Give Flan-T5xxl a Try

  • Flan-T5xxl is a new-generation text encoder.
  • TE-only extracts only the parts needed for image generation.
  • GGUF format is an even lighter model.

Image generation AIs like Flux.1 and SD 3.5 can produce high-quality illustrations, and upgrading the text encoder can boost image quality even further.

[Image: Anime illustration of a young girl holding a chalkboard that says "flan t5" and "te-only" (1600×1600)]
Text rendering is also highly accurate

One of the charms of local image generation is the ability to freely use models openly shared worldwide.

Now that the storage size disadvantage of using the Flan-T5xxl model is gone, why not take this opportunity to upgrade?

Thank you for reading to the end!