Optimize Your Flux.1 Experience! How to Use GGUF Format with Forge and ComfyUI


- Save memory with the GGUF format.
- Choose a compression format that matches your VRAM capacity.
- Use flux_tool to create the optimal GGUF format for your needs.
Introduction
Hello, I’m Easygoing.
Today, I’ll introduce the GGUF format, which lets you run Flux.1 comfortably.
Flux.1 Models Are Large in Size
As mentioned in my previous post, Flux.1, a new image generation AI, has a large model size, requiring 16GB of VRAM to run in FP16 or FP8 formats.
The GGUF format, which has recently gained popularity, uses quantized compression to significantly reduce memory usage.
In this post, I’ll explain how to use the GGUF format.
Comparing Real Illustrations
Let’s start by looking at some actual illustrations.
- Image to image
- 1440 x 1440 → Hires 2576 x 2576
FP16 (16-bit)

FP8 (8-bit)

Q8.gguf (8-bit)

Q5_K_M.gguf (5-bit)

Q3_K_L.gguf (3-bit)

Q2_K.gguf (2-bit)

Features of GGUF Format
Both FP8 and Q8.gguf use 8 bits per weight, so their file sizes are nearly the same, but Q8.gguf generally offers higher precision and better image quality.
With GGUF, image quality degrades as the bit count decreases: background detail and color depth are the first things to go.
Generally, Q4 to Q5 strikes a good balance between size and precision, with Q3 being the minimum usable threshold before the image collapses.
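As a rough rule of thumb, file size scales with bits per weight. Here's a back-of-the-envelope sketch in Python, assuming Flux.1[dev]'s roughly 12 billion parameters (real GGUF files come out slightly larger, since they also store per-block scales and keep some tensors at higher precision):

```python
params = 12e9  # Flux.1[dev] has roughly 12 billion parameters

for name, bits in [("FP16", 16), ("Q8", 8), ("Q5", 5), ("Q4", 4), ("Q3", 3), ("Q2", 2)]:
    size_gib = params * bits / 8 / 2**30  # bits -> bytes -> GiB
    print(f"{name:4s} ~ {size_gib:4.1f} GiB")
```

With 12GB of VRAM, you can see why Q4 to Q5 is the sweet spot: the base model fits with room left over for the text encoder and activations.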
What Do the Letters Mean?
Taking Q4 as an example, the letters that follow have these meanings:
- Q4: the original quantization method
- Q4_K: K-quantization (improved precision)
- Q4_K_S: K-quantized, smaller size, lower precision
- Q4_K_M: K-quantized, intermediate size and precision
- Q4_K_L: K-quantized, larger size, higher precision
Broadly, precision increases as you move down the list.
Let’s Try It Out!
Now, let’s go over how to use GGUF format models in practice.
GGUF format files for Flux.1 are available here.
Base Models
Flux.1[dev]
Flux.1[schnell]
blue_pencil-flux1
Text Encoder
T5xxl_v1_1
Flan-T5xxl
Flan-T5xxl is an improved version of T5xxl_v1_1 with better prompt comprehension and image quality, so I recommend using it.
Which One Should You Choose?
GGUF format comes in various types. Based on my environment, here are the formats that fit within VRAM limits:
Stable Diffusion webUI Forge
| VRAM | Base model | T5xxl |
| --- | --- | --- |
| 16GB | FP16 | FP16 |
| 12GB | Q5_K_S | FP16 |
ComfyUI
| VRAM | Base model | T5xxl |
| --- | --- | --- |
| 16GB | FP16 | FP16 |
| 12GB | Q6_K | Q8 |
| 8GB | Q3_K_L | Q6_K |
If VRAM usage still exceeds the limit with these models, I recommend dropping to a format one step smaller.
If your monitor is plugged into the discrete GPU rather than the motherboard's integrated output, the display itself consumes additional VRAM, so an even smaller size may be appropriate.
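If you're not sure how much VRAM you have to work with, nvidia-smi will tell you; alternatively, assuming an NVIDIA GPU and an existing PyTorch install, here's a quick check from Python:

```python
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)  # first GPU
    print(f"{props.name}: {props.total_memory / 2**30:.1f} GiB VRAM")
else:
    print("No CUDA-capable GPU detected")
```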

How to Use Them!
Here’s how to use them in practice.
Placing the Downloaded Files
Place the downloaded files in one of these locations:
Base Models
- Installation Folder/models/Stable-diffusion
- Installation Folder/models/unet
- Installation Folder/Models/Unet
Text Encoders
- Installation Folder/models/text_encoder
- Installation Folder/models/clip
- Installation Folder/Models/CLIP
Stable Diffusion webUI Forge: Super Easy!
First, here’s the generation screen in Stable Diffusion webUI Forge.

Just as with a standard-format model, select the GGUF model in the base model selection area.
For the text encoder, choose the corresponding GGUF file under the VAE/text encoder section.
To deselect an entry, press the ❌ button on its right.
- T5xxl (any GGUF format)
- clip-l
- ae.safetensors
Confirm that these three are selected in VAE/text encoder, then enter your prompt as usual and press Generate to create the image!

ComfyUI: Requires a Custom Node
To use GGUF format in ComfyUI, you need to install the ComfyUI-GGUF custom node.
Here, I’ll explain how to install it using the ComfyUI Manager.
First, launch ComfyUI and open the Manager.

Next, select Custom Nodes Manager.

Search for “gguf” to find the ComfyUI-GGUF custom node, then install it.

A restart message will appear, so restart ComfyUI.
When creating a workflow, select Unet Loader (GGUF) and Dual Clip Loader (GGUF).

This allows you to select GGUF format models.
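For reference, in the API (JSON) form of a workflow, the two loaders look roughly like the fragment below. The node IDs and file names are made up for illustration, and the exact input names are my assumption based on ComfyUI-GGUF's loader definitions, so check your own exported workflow:

```python
# Hypothetical fragment of an API-format ComfyUI workflow using the GGUF loaders
gguf_nodes = {
    "1": {
        "class_type": "UnetLoaderGGUF",                    # from ComfyUI-GGUF
        "inputs": {"unet_name": "flux1-dev-Q5_K_M.gguf"},  # file in models/unet
    },
    "2": {
        "class_type": "DualCLIPLoaderGGUF",                # loads both text encoders
        "inputs": {
            "clip_name1": "t5xxl-Q8_0.gguf",               # file in models/clip
            "clip_name2": "clip_l.safetensors",
            "type": "flux",
        },
    },
}
```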
If you’re using EasyForge, the ComfyUI-GGUF custom node is already installed!
Drawbacks of GGUF Format
GGUF format offers significant advantages in saving storage and memory, but there are drawbacks in certain cases.
It Doesn’t Speed Up Much If VRAM Is Plentiful
GGUF format files are generated by quantizing and compressing the original format.
- Example: FP16 → Q8.gguf
Q8.gguf performs its calculations on quantized integers, which is generally faster than FP16, but key parts are converted back to FP16 for processing.

So, if you have ample VRAM and can process in FP16, using GGUF might not speed things up.
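To make "quantized compression" concrete, here is a minimal NumPy sketch of the idea behind Q8-style block quantization. The block size of 32 and the exact layout are illustrative assumptions; the real ggml format differs in the details:

```python
import numpy as np

def quantize_q8(weights: np.ndarray, block_size: int = 32):
    """Toy Q8-style quantization: int8 values plus one FP16 scale per block."""
    blocks = weights.reshape(-1, block_size).astype(np.float32)
    scales = np.abs(blocks).max(axis=1, keepdims=True) / 127.0
    scales[scales == 0] = 1.0                      # avoid division by zero
    q = np.round(blocks / scales).astype(np.int8)  # 8-bit integer weights
    return q, scales.astype(np.float16)

def dequantize_q8(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Rebuild approximate FP16 weights before the actual computation."""
    return (q.astype(np.float32) * scales.astype(np.float32)).astype(np.float16)

w = np.random.randn(1024).astype(np.float16)
q, s = quantize_q8(w)
w_hat = dequantize_q8(q, s).reshape(w.shape)
print(f"max round-trip error: {np.abs(w.astype(np.float32) - w_hat.astype(np.float32)).max():.5f}")
```

The dequantize step is exactly the "converted back to FP16" part mentioned above, which is why GGUF saves memory without necessarily saving time.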
ComfyUI Might Overuse VRAM!?
ComfyUI keeps models in VRAM as much as possible to speed up processing.
When non-default custom nodes are in use, memory requirements may not be tracked correctly, so models fail to unload and VRAM overflows.
In this case, connecting an Unload All Models node before the VRAM-exceeding process resolves the issue.
Example: Connect Unload All Models between conditioning and Sampler

The Unload All Models custom node can be installed by searching for “ComfyUI-Unload-Model” in the Manager.
Both ComfyUI and ComfyUI-GGUF currently receive near-daily updates, and in my environment, whether VRAM overflows varies from day to day.

This issue will likely be resolved eventually, but the Unload All Models node is useful in other scenarios too, so it’s worth learning.
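Conceptually, an unload node just drops model weights out of VRAM before the memory-heavy step. Here is a minimal sketch of the same idea in plain PyTorch (ComfyUI's real model management is considerably more involved):

```python
import gc
import torch

def unload_models(models: list) -> None:
    """Free VRAM held by a list of models before a memory-heavy step."""
    for model in models:
        model.to("cpu")       # move weights from VRAM to system RAM
    models.clear()            # drop references so Python can collect them
    gc.collect()
    torch.cuda.empty_cache()  # hand cached blocks back to the driver
```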
Create Your Own GGUF Files!
So far, I’ve explained how to use GGUF format.
Now, I’ll show you how to create your own GGUF files.
Use flux_tool!
As introduced in a previous article, the flux_tool in Zuntan’s EasyForge includes a GGUF conversion feature.
With flux_tool, you can create GGUF files in any format you want.
How to Use ConvertGguf.bat
Let’s go over how to use ConvertGguf.bat in flux_tool.
First, prepare the model you want to convert.
When quantizing to GGUF format, the higher the precision of the original file, the better the resulting file’s quality.
- FP32 > FP16 > BF16 > FP8
So prepare the highest-precision model you can find, following this order.
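The ordering reflects how many mantissa bits each format keeps. You can see it directly by round-tripping a value through each dtype, assuming a recent PyTorch (torch.float8_e4m3fn needs PyTorch 2.1 or later):

```python
import torch

exact = torch.tensor(1 / 3, dtype=torch.float64)
for dtype in (torch.float32, torch.float16, torch.bfloat16, torch.float8_e4m3fn):
    rounded = exact.to(dtype).to(torch.float64)   # round-trip through the format
    print(f"{str(dtype):24s} error = {abs(rounded - exact).item():.1e}")
```

The error grows at each step down the list, and whatever rounding error is already in your source model gets baked into the GGUF file you produce.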
Run ConvertGguf.bat
Next, run ConvertGguf.bat from the flux_tool folder in EasyForge.


A black command prompt window will appear; drag and drop the file you want to convert onto it.

Follow the prompts and press Enter to generate the GGUF format model!
Creating Other Formats
Running ConvertGguf.bat generates Q8 and Q6_K formats by default.
To create other GGUF formats, edit and save ConvertGgufFormat.txt in the env folder of flux_tool, then rerun ConvertGguf.bat.


Example Entries
- Q8_0
- Q6_K
- Q5_K_M
- Q4_K_M
- Q3_K_L, etc.
This lets you create GGUF files in any format you want!
Conclusion: GGUF Format Is Amazing!
Here’s a summary of using GGUF format:
- Save memory with GGUF format
- Choose a compression format that fits your VRAM capacity
- Use flux_tool to create the optimal format
While I used Flux.1 as an example, GGUF format is already implemented in SD 3.5 and SDXL too.

GGUF format is a groundbreaking technology, expected to drive AI adoption on smaller devices like smartphones in the future.
We might soon see an era where anyone can generate images on their phone.
Thank you for reading to the end!