Is Negative Prompt Necessary? Unleashing AI’s Creativity!

Anime illustration of a woman with brown hair and blue eyes smiling at us in a snowy town
  • Keep CFG Scale low.
  • Use Negative Prompts sparingly.
  • Let the AI create freely and observe its original illustrations.

Introduction

Hello, this is Kimama / Easygoing.

Today, let’s dive into the topic of Negative Prompts in image generation AI.

Theme: Nighttime Snapshot

This time, our theme is a nighttime snapshot.

anime illustration of an asian woman standing in the street with a bunch of snow falling from her arms looking over her shoulder at the window.png (2579×2579)

We’ll capture the fleeting expression of someone spotted while walking through the town.

How Does AI Interpret Prompts?

We input prompts while imagining the images we want AI to generate.

Graph showing the relationship between Target Image, Conditioning and Unconditioning.png (1200×800)
  • Red: Target Image
  • Green: Conditioning (The image generated based on the input prompt)

CLIP's text encoder handles the link between the input prompt and the actual image. The image that results from the prompt alone is referred to as Conditioning.

When Prompts Fall Short

CLIP is trained on pairs of images and text captions, but its accuracy is not particularly high.

In many cases, the input prompt has a weaker effect than we expect.

As a result, image generation AI requires some method to enhance the prompt’s effectiveness.

Boosting Prompts with CFG Scale!

The first tool used to enhance prompt effectiveness is the CFG Scale.

Graph showing the relationship between Target Image, Conditioning and Unconditioning and CFG scale
  • CFG Scale: Classifier-Free Guidance Scale
    A guidance scale for diffusion models that needs no separate classifier (no external model).

CFG Scale amplifies the prompt’s effect based on the input value.

The Importance of a Baseline!

When using CFG Scale, the reference point is crucial.

Every image generation AI accumulates various kinds of noise during training and computation.

anime illustration of boy wearing winter clothes looking off to a window with snow coming up and buildings in the background in the foreground of the photo, snowy night.png (2579×2579)

Even if you generate an image without inputting a prompt, the resulting image will be affected by this noise, deviating from the true zero point.

Using Unconditioning as the Baseline!

Thus, the image generated without an input prompt is called Unconditioning, and it’s used as the baseline.

When generating an image, an arrow is drawn from Unconditioning to Conditioning, and stretching that arrow by the CFG Scale amplifies the prompt’s effect.

Graph showing the relationship between Target Image, Conditioning and Unconditioning and CFG scale 2

As shown in the graph, using CFG Scale brings the generated image closer to the target.
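
The graph can be sketched in a few lines of Python. This is only a toy illustration: the 2-D coordinates for the target, Unconditioning, and Conditioning are hypothetical values I picked, and `apply_cfg` is an illustrative helper, not a function from any library. It uses the standard classifier-free guidance combination, Unconditioning + (Conditioning − Unconditioning) × CFG Scale. In a real sampler this combination is applied to the model's predictions at every denoising step rather than to finished images.

```python
import math

def apply_cfg(uncond, cond, scale):
    """Stretch the arrow from uncond to cond by `scale`."""
    return [u + (c - u) * scale for u, c in zip(uncond, cond)]

def distance(a, b):
    """Euclidean distance between two points."""
    return math.dist(a, b)

# Hypothetical 2-D positions, chosen only to mimic the graph.
target = [4.0, 2.5]   # the image we have in mind (red)
uncond = [1.0, 0.0]   # baseline from an empty prompt
cond = [2.0, 1.0]     # what the prompt alone produces (green)

for scale in (1.0, 3.0, 10.0):
    guided = apply_cfg(uncond, cond, scale)
    print(f"CFG {scale:>4}: guided={guided}, "
          f"distance to target={distance(guided, target):.2f}")
```

With these made-up numbers, a moderate scale lands closer to the target than scale 1, while a very high scale overshoots it. Where the sweet spot falls depends entirely on the assumed coordinates.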

Shifting the Baseline with Negative Prompt!

Building on this mechanism, the Negative Prompt was developed.

As mentioned earlier, Unconditioning is typically generated with an empty prompt.

Graph showing the relationship between Target Image, Conditioning and Unconditioning and CFG scale 3
Graph showing the relationship between Target Image, Conditioning, Unconditioning, CFG scale and Negative Prompt.png (1200×800)

Left: Without Negative Prompt | Right: With Negative Prompt

In contrast, a Negative Prompt supplies a prompt for Unconditioning in place of the empty one, shifting the baseline itself.

In the right graph, using Negative Prompt adjusts the baseline, changing the arrow’s direction and bringing the image closer to the target.
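
Here is a minimal sketch of that baseline shift, again with hypothetical 2-D coordinates and an illustrative `apply_cfg` helper (neither comes from a real tool). The Conditioning and CFG Scale stay fixed, and only the Unconditioning point changes, as it would when a Negative Prompt replaces the empty prompt.

```python
import math

def apply_cfg(uncond, cond, scale):
    """Unconditioning + (Conditioning - Unconditioning) * CFG Scale."""
    return [u + (c - u) * scale for u, c in zip(uncond, cond)]

# Hypothetical positions; only the baseline differs between runs.
target = [4.0, 2.5]
cond = [2.0, 1.0]
scale = 3.0

baselines = {
    "empty prompt": [1.0, 0.0],
    "mild negative prompt": [1.0, 0.125],  # baseline nudged slightly
    "heavy negative prompt": [1.0, 1.0],   # baseline moved a long way
}

distances = {}
for label, uncond in baselines.items():
    arrow = [c - u for c, u in zip(cond, uncond)]
    guided = apply_cfg(uncond, cond, scale)
    distances[label] = math.dist(guided, target)
    print(f"{label}: arrow={arrow}, guided={guided}, "
          f"distance to target={distances[label]:.2f}")
```

In this toy setup a small shift of the baseline turns the arrow toward the target, while a large shift sends the result further away than using no Negative Prompt at all. The numbers are invented, so treat this as a picture of the mechanism, not a measurement.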

Negative Prompt Has a Big Impact!

Since Negative Prompt shifts the baseline, it has a significant impact on the entire illustration.

Negative Prompts typically include things you don’t want to generate, but their effect goes beyond simply avoiding specific elements.

Anime illustration of a woman with brown hair and blue eyes warming her hands with her breath on a snowy night in the city_cleanup.png (2456×2456)
Negatives are powerful!

With the many settings in modern image generation AI, shifting the baseline too much with Negative Prompt can make you lose your bearings.

When using Negative Prompt, it’s essential to maintain balance and aim for minimal usage.

When Is Negative Prompt Effective?

So, when is Negative Prompt particularly useful?

Stable Diffusion 1 Had Low Prompt Fidelity

In the era of Stable Diffusion 1, prompt fidelity was limited due to CLIP’s constraints.

anime illustration of young girl holding gun in snowy city alley area with heavy snowdrops and buildings behind her, and brick wall overhead from left side shot shot.png (2456×2456)
Are you even listening?

The CLIP-L in Stable Diffusion 1 had a limited vocabulary and was trained on restricted data.

To improve prompt fidelity in Stable Diffusion 1, it was necessary to set a high CFG Scale and input many Negative Prompts to pull Unconditioning in a negative direction, bringing the result closer to the target.

Cases Where Custom Models Specify Negative Prompt Use

Even with SDXL, Negative Prompt can be effective in specific cases, namely when a custom model explicitly recommends using one.

anime illustration of kid in city street wearing scarf and scarf around neck reading book, with lights in distance behind him over blue eyes as snow falles in background.png (2456×2456)
Be sure to read carefully.

Example: Recommended Negative Prompt for Animagine-XL 3.1

nsfw, lowres, (bad), text, error, fewer, extra, missing, worst quality, jpeg artifacts, low quality, watermark, unfinished, displeasing, oldest, early, chromatic aberration, signature, extra digits, artistic error, username, scan,

In such cases, the custom model is tuned with the expectation of Negative Prompt input, so using Negative Prompt results in higher-quality images.

The Direct Approach: Improving CLIP!

While CFG Scale and Negative Prompt can improve illustration quality when used well, they are quite challenging to fine-tune.

For improving prompt fidelity, the direct approach is enhancing CLIP.

Graph showing the relationship between Target Image, Conditioning, Refine CLIP and Unconditioning.png (1200×800)

By improving CLIP’s performance, Conditioning can be brought closer to the target direction, allowing higher prompt fidelity without relying on CFG Scale or Negative Prompt.

SDXL: CLIP-G, a Major Upgrade from CLIP-L

Improved CLIP-L: Matching CLIP-G’s Performance

Among the various methods tried to improve image quality, upgrading CLIP-L has been the most effective in my experience.


Update: December 31, 2024

I compared the effects of the improved CLIP-L with actual images.


Powerful Support with T5xxl

Another approach to improving prompt fidelity is the introduction of T5xxl.

Unlike CLIP, T5xxl is a text-only model with no image recognition capabilities, but its advanced text comprehension yields a richer encoding of the input prompt, which is used alongside CLIP’s.

Graph showing the relationship between Target Image and Conditioning and T5xxl and Unconditioning.png (1200×800)

Stable Diffusion 3 (June 2024) and Flux.1 (August 2024) have incorporated T5xxl, significantly improving Conditioning accuracy, and it’s been announced that Negative Prompt is no longer necessary.

What Is AI’s Creativity?

Now, let’s explore AI’s creativity in the rest of the article.

What kind of images does AI envision after receiving our prompts?

Anime illustration of a woman with brown hair and blue eyes looking at you somehow in the evening in a snowy town.png (2456×2456)
What Does AI Imagine?

When considering AI’s freest expression, Conditioning represents the image AI imagines from the prompt.

Outputting Conditioning Directly

Let’s try outputting Conditioning as is.

There are two ways to output Conditioning:

Set CFG Scale to 1

From the earlier graph, the generated image can be calculated from CFG Scale as follows:

Graph showing the relationship between Target Image, Conditioning, Unconditioning and CFG scale x1.png (1200×800)
  • Generated Image = Unconditioning + (Conditioning − Unconditioning) × CFG Scale

When CFG Scale is 1, Unconditioning is canceled out, as shown in the formula.

While some minor errors may remain, setting CFG Scale to 1 minimizes Unconditioning’s influence to near zero.
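
Expanding the formula above makes the cancellation easier to see: Generated Image = (1 − CFG Scale) × Unconditioning + CFG Scale × Conditioning. The sketch below prints those two weights for a few scales. The function names and the 2-D sample values are hypothetical and chosen only for illustration.

```python
def cfg_weights(scale):
    """Return (weight on Unconditioning, weight on Conditioning)."""
    return 1.0 - scale, scale

def apply_cfg(uncond, cond, scale):
    """Weighted form of Unconditioning + (Conditioning - Unconditioning) * scale."""
    w_uncond, w_cond = cfg_weights(scale)
    return [w_uncond * u + w_cond * c for u, c in zip(uncond, cond)]

# Hypothetical sample values.
uncond = [1.0, 0.0]
cond = [2.0, 1.0]

for scale in (1.0, 2.0, 7.0):
    w_uncond, w_cond = cfg_weights(scale)
    print(f"CFG {scale}: weight on Unconditioning={w_uncond}, "
          f"weight on Conditioning={w_cond}, "
          f"result={apply_cfg(uncond, cond, scale)}")
```

At CFG Scale 1 the weight on Unconditioning is exactly zero, so the result is Conditioning itself. Above 1 the weight turns negative, which is why the result is pushed away from the baseline.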

Input the Same Prompt for Positive and Negative

Another method is to input the exact same prompt for both Positive and Negative.

anime illustration of a girl writing something into an open book outside in the night time with a crowd on the sidewalk behind her holding her and looking at the camera.png (2576×2576)
Freedom is wonderful!

In this case, Conditioning and Unconditioning are identical, so their difference is zero and Unconditioning’s influence is eliminated completely, whatever the CFG Scale.

In this state, AI depicts the image directly inspired by the prompt, showcasing maximum creativity.
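
A quick way to check this second method is to feed the same point in as both Conditioning and Unconditioning and try several CFG Scales. As before, `apply_cfg` and the sample values are hypothetical stand-ins, and the sketch assumes the same prompt always encodes to exactly the same result on both sides.

```python
def apply_cfg(uncond, cond, scale):
    """Unconditioning + (Conditioning - Unconditioning) * CFG Scale."""
    return [u + (c - u) * scale for u, c in zip(uncond, cond)]

# Hypothetical encoding of the prompt, used for both sides.
cond = [2.0, 1.0]
uncond = list(cond)  # Negative Prompt identical to the Positive Prompt

results = {scale: apply_cfg(uncond, cond, scale) for scale in (1.0, 4.0, 12.0)}

for scale, guided in results.items():
    print(f"CFG {scale}: result={guided}, equals Conditioning: {guided == cond}")
```

Because the arrow between the two points has zero length, stretching it by any CFG Scale leaves the result sitting on Conditioning.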

Does Flux.1 Lack Variety?

Flux.1 and SD 3.5 have greatly improved prompt fidelity with the introduction of T5xxl.

Launched in August 2024, Flux.1 boasts an impressively high success rate and exceptional quality in illustration generation.

Anime illustration of a woman with brown hair and blue eyes smiling as she turns around in a snowy town.png (2576×2576)
Flux.1 is precise.

However, while using Flux.1, I’ve noticed fewer opportunities to encounter surprisingly unique illustrations compared to before.

T5xxl’s strength in interpreting prompts might, in some cases, make it overly considerate.

I’m still experimenting with recreating SDXL’s variability in Flux.1.

Conclusion: Unleashing AI’s Creativity!

  • Keep CFG Scale low.
  • Use Negative Prompt minimally.
  • Explore AI’s raw illustrations.

When I started with image generation AI, I thought it was a tool to create the illustrations I wanted.

But seeing AI’s incredible composition and expressiveness, I’ve come to believe that our role is to let AI generate freely and gently guide it toward the final product.

After much thought, I now set CFG Scale to around 1–2.

I’m grateful for the free availability of improved CLIP and high-quality models, and I look forward to continuing to enjoy image generation.

Thank you for reading to the end!