Optimize Your Flux.1 Experience! How to Use GGUF Format with Forge and ComfyUI


- Save memory with the GGUF format.
- Choose a compression format that matches your VRAM capacity.
- Use flux_tool to create the optimal GGUF format for your needs.
Introduction
Hello, I’m Easygoing.
Today, I’ll introduce the GGUF format, which lets you run Flux.1 comfortably.
Flux.1 Models Are Large in Size
As mentioned in my previous post, Flux.1, a new image generation AI, has a large model size, requiring 16GB of VRAM to run in FP16 or FP8 formats.
The GGUF format, which has recently gained popularity, uses quantized compression to significantly reduce memory usage.
In this post, I’ll explain how to use the GGUF format.
Comparing Real Illustrations
Let’s start by looking at some actual illustrations.
- Image to image
- 1440 x 1440 → Hires 2576 x 2576
FP16 (16-bit)

FP8 (8-bit)

Q8.gguf (8-bit)

Q5_K_M.gguf (5-bit)

Q3_K_L.gguf (3-bit)

Q2_K.gguf (2-bit)

Features of GGUF Format
Both FP8 and Q8.gguf use 8 bits per weight, so their file sizes are nearly the same, but Q8.gguf generally offers higher precision and better image quality.
With GGUF, image quality degrades as the bit count decreases: background detail and color depth are the first things to go.
Generally, Q4 to Q5 strikes a good balance between size and precision, with Q3 being the minimum usable threshold before the image collapses.
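As a rough rule of thumb, file size scales with bits per weight. Here's a back-of-the-envelope sketch in Python, assuming Flux.1[dev]'s roughly 12 billion parameters (real GGUF files come out slightly larger, since they also store per-block scales and keep some tensors at higher precision):

```python
params = 12e9  # Flux.1[dev] has roughly 12 billion parameters

for name, bits in [("FP16", 16), ("Q8", 8), ("Q5", 5), ("Q4", 4), ("Q3", 3), ("Q2", 2)]:
    size_gib = params * bits / 8 / 2**30  # bits -> bytes -> GiB
    print(f"{name:4s} ~ {size_gib:4.1f} GiB")
```

With 12GB of VRAM, you can see why Q4 to Q5 is the sweet spot: the base model fits with room left over for the text encoder and activations.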
What Do the Letters Mean?
Taking Q4 as an example, the letters that follow have these meanings:
- Q4: the original quantization method
- Q4_K: K-quantization (improved precision)
- Q4_K_S: K-quantized, smaller size, lower precision
- Q4_K_M: K-quantized, intermediate size and precision
- Q4_K_L: K-quantized, larger size, higher precision
Broadly, precision increases as you move down the list.
Let’s Try It Out!
Now, let’s go over how to use GGUF format models in practice.
GGUF format files for Flux.1 are available here.
Base Models
Flux.1[dev]
Flux.1[schnell]
blue_pencil-flux1
Text Encoder
T5xxl_v1_1
Flan-T5xxl
Flan-T5xxl is an improved version of T5xxl_v1_1 with better prompt comprehension and image quality, so I recommend using it.
Which One Should You Choose?
GGUF format comes in various types. Based on my environment, here are the formats that fit within VRAM limits:
Stable Diffusion webUI Forge
| VRAM | Base model | T5xxl |
| --- | --- | --- |
| 16GB | FP16 | FP16 |
| 12GB | Q5_K_S | FP16 |
ComfyUI
| VRAM | Base model | T5xxl |
| --- | --- | --- |
| 16GB | FP16 | FP16 |
| 12GB | Q6_K | Q8 |
| 8GB | Q3_K_L | Q6_K |
If VRAM usage still exceeds the limit with these models, I recommend dropping to a format one step smaller.
If your monitor is plugged into the discrete GPU rather than the motherboard's integrated output, the display itself consumes additional VRAM, so an even smaller size may be appropriate.
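If you're not sure how much VRAM you have to work with, nvidia-smi will tell you; alternatively, assuming an NVIDIA GPU and an existing PyTorch install, here's a quick check from Python:

```python
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)  # first GPU
    print(f"{props.name}: {props.total_memory / 2**30:.1f} GiB VRAM")
else:
    print("No CUDA-capable GPU detected")
```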

How to Use Them!
Here’s how to use them in practice.
Placing the Downloaded Files
Place the downloaded files in one of these locations:
Base Models
- Installation Folder/models/Stable-diffusion
- Installation Folder/models/unet
- Installation Folder/Models/Unet
Text Encoders
- Installation Folder/models/text_encoder
- Installation Folder/models/clip
- Installation Folder/Models/CLIP
Stable Diffusion webUI Forge: Super Easy!
First, here’s the generation screen in Stable Diffusion webUI Forge.

Just as with a standard-format model, select the GGUF model in the base model selection area.
For the text encoder, choose the corresponding GGUF file under the VAE/text encoder section.
To deselect an entry, press the ❌ button on its right.
- T5xxl (any GGUF format)
- clip-l
- ae.safetensors
Confirm that these three are selected in VAE/text encoder, then enter your prompt as usual and press Generate to create the image!

ComfyUI: Requires a Custom Node
To use GGUF format in ComfyUI, you need to install the ComfyUI-GGUF custom node.
Here, I’ll explain how to install it using the ComfyUI Manager.
First, launch ComfyUI and open the Manager.

Next, select Custom Nodes Manager.

Search for “gguf” to find the ComfyUI-GGUF custom node, then install it.

A restart message will appear, so restart ComfyUI.
When creating a workflow, select Unet Loader (GGUF) and Dual Clip Loader (GGUF).

This allows you to select GGUF format models.
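For reference, in the API (JSON) form of a workflow, the two loaders look roughly like the fragment below. The node IDs and file names are made up for illustration, and the exact input names are my assumption based on ComfyUI-GGUF's loader definitions, so check your own exported workflow:

```python
# Hypothetical fragment of an API-format ComfyUI workflow using the GGUF loaders
gguf_nodes = {
    "1": {
        "class_type": "UnetLoaderGGUF",                    # from ComfyUI-GGUF
        "inputs": {"unet_name": "flux1-dev-Q5_K_M.gguf"},  # file in models/unet
    },
    "2": {
        "class_type": "DualCLIPLoaderGGUF",                # loads both text encoders
        "inputs": {
            "clip_name1": "t5xxl-Q8_0.gguf",               # file in models/clip
            "clip_name2": "clip_l.safetensors",
            "type": "flux",
        },
    },
}
```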
If you’re using EasyForge, the ComfyUI-GGUF custom node is already installed!
Drawbacks of GGUF Format
GGUF format offers significant advantages in saving storage and memory, but there are drawbacks in certain cases.
It Doesn’t Speed Up Much If VRAM Is Plentiful
GGUF format files are generated by quantizing and compressing the original format.
- Example: FP16 → Q8.gguf
Q8.gguf performs its calculations on quantized integers, which is generally faster than FP16, but key parts are converted back to FP16 for processing.

So, if you have ample VRAM and can process in FP16, using GGUF might not speed things up.
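To make "quantized compression" concrete, here is a minimal NumPy sketch of the idea behind Q8-style block quantization. The block size of 32 and the exact layout are illustrative assumptions; the real ggml format differs in the details:

```python
import numpy as np

def quantize_q8(weights: np.ndarray, block_size: int = 32):
    """Toy Q8-style quantization: int8 values plus one FP16 scale per block."""
    blocks = weights.reshape(-1, block_size).astype(np.float32)
    scales = np.abs(blocks).max(axis=1, keepdims=True) / 127.0
    scales[scales == 0] = 1.0                      # avoid division by zero
    q = np.round(blocks / scales).astype(np.int8)  # 8-bit integer weights
    return q, scales.astype(np.float16)

def dequantize_q8(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Rebuild approximate FP16 weights before the actual computation."""
    return (q.astype(np.float32) * scales.astype(np.float32)).astype(np.float16)

w = np.random.randn(1024).astype(np.float16)
q, s = quantize_q8(w)
w_hat = dequantize_q8(q, s).reshape(w.shape)
print(f"max round-trip error: {np.abs(w.astype(np.float32) - w_hat.astype(np.float32)).max():.5f}")
```

The dequantize step is exactly the "converted back to FP16" part mentioned above, which is why GGUF saves memory without necessarily saving time.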
ComfyUI Might Overuse VRAM!?
ComfyUI keeps models in VRAM as much as possible to speed up processing.
When non-default custom nodes are in use, memory requirements may not be tracked correctly, so models fail to unload and VRAM overflows.
In this case, connecting an Unload All Models node before the VRAM-exceeding process resolves the issue.
Example: Connect Unload All Models between conditioning and Sampler

The Unload All Models custom node can be installed by searching for “ComfyUI-Unload-Model” in the Manager.
Both ComfyUI and ComfyUI-GGUF currently receive near-daily updates, and in my environment, whether VRAM overflows varies from day to day.

This issue will likely be resolved eventually, but the Unload All Models node is useful in other scenarios too, so it’s worth learning.
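Conceptually, an unload node just drops model weights out of VRAM before the memory-heavy step. Here is a minimal sketch of the same idea in plain PyTorch (ComfyUI's real model management is considerably more involved):

```python
import gc
import torch

def unload_models(models: list) -> None:
    """Free VRAM held by a list of models before a memory-heavy step."""
    for model in models:
        model.to("cpu")       # move weights from VRAM to system RAM
    models.clear()            # drop references so Python can collect them
    gc.collect()
    torch.cuda.empty_cache()  # hand cached blocks back to the driver
```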
Create Your Own GGUF Files!
So far, I’ve explained how to use GGUF format.
Now, I’ll show you how to create your own GGUF files.
Use flux_tool!
As introduced in a previous article, the flux_tool in Zuntan’s EasyForge includes a GGUF conversion feature.
With flux_tool, you can create GGUF files in any format you want.
How to Use ConvertGguf.bat
Let’s go over how to use ConvertGguf.bat in flux_tool.
First, prepare the model you want to convert.
When quantizing to GGUF format, the higher the precision of the original file, the better the resulting file’s quality.
- FP32 > FP16 > BF16 > FP8
So prepare the highest-precision model you can find, following this order.
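The ordering reflects how many mantissa bits each format keeps. You can see it directly by round-tripping a value through each dtype, assuming a recent PyTorch (torch.float8_e4m3fn needs PyTorch 2.1 or later):

```python
import torch

exact = torch.tensor(1 / 3, dtype=torch.float64)
for dtype in (torch.float32, torch.float16, torch.bfloat16, torch.float8_e4m3fn):
    rounded = exact.to(dtype).to(torch.float64)   # round-trip through the format
    print(f"{str(dtype):24s} error = {abs(rounded - exact).item():.1e}")
```

The error grows at each step down the list, and whatever rounding error is already in your source model gets baked into the GGUF file you produce.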
Run ConvertGguf.bat
Next, run ConvertGguf.bat from the flux_tool folder in EasyForge.


A black command prompt window will appear; drag and drop the file you want to convert onto it.

Follow the prompts and press Enter to generate the GGUF format model!
Creating Other Formats
Running ConvertGguf.bat generates Q8 and Q6_K formats by default.
To create other GGUF formats, edit and save ConvertGgufFormat.txt in the env folder of flux_tool, then rerun ConvertGguf.bat.


Example Entries
- Q8_0
- Q6_K
- Q5_K_M
- Q4_K_M
- Q3_K_L, etc.
This lets you create GGUF files in any format you want!
Conclusion: GGUF Format Is Amazing!
Here’s a summary of using GGUF format:
- Save memory with GGUF format
- Choose a compression format that fits your VRAM capacity
- Use flux_tool to create the optimal format
While I used Flux.1 as an example, GGUF format is already implemented in SD 3.5 and SDXL too.

GGUF format is a groundbreaking technology, expected to drive AI adoption on smaller devices like smartphones in the future.
We might soon see an era where anyone can generate images on their phone.
Thank you for reading to the end!