Let AI Give You Ideas! How to Automatically Generate Prompts


- Generate captions with CLIP.
- Use looping workflows to create variations.
- Make minimal corrections when necessary.
Introduction
Hello, this is Easygoing.
The header image above was laid out using Manga Editor DESU!, an AI manga creation support app published by new-sankaku.
This app offers a variety of features that are completely free!
I’ll introduce how to use this app in detail in another post. For now, let’s explore how to generate prompts automatically using image generation AI.
Theme: Supercar Concept Art
This time, the theme is supercar concept art.

Cool car illustrations are something everyone admires. Let’s see if we can recreate them using AI.
Workflow!
Here’s the workflow used for this process:
```mermaid
flowchart LR
subgraph Input
A1(Prompt)
end
subgraph SDXL
B1(Base Sketch)
end
subgraph FLUX.1-Depth-dev
C1(Fix Composition)
end
subgraph FLUX.1-dev
D1(Final Touches)
end
A1-->B1
B1-->|ControlNet<br>Depth|C1
C1-->D1
D1-.->|Captioning|A1
```
Now, let’s break down the process step by step, illustrated with examples.
Step 1: SDXL (Base Sketch)
First, we use the previous generation SDXL to create a base sketch.

SDXL anime models can produce diverse compositions. However, since the texture quality of Flux.1 is superior, this composition will be transferred to Flux.1 using ControlNet’s depth feature.
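In script form (outside ComfyUI), this base-sketch step looks roughly like the following with Hugging Face diffusers; the official SDXL base stands in for whichever anime checkpoint you prefer:

```python
# Rough diffusers equivalent of the SDXL base-sketch step.
# The official SDXL base is a stand-in; the article uses an
# anime-style SDXL checkpoint inside ComfyUI.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
).to("cuda")

base_sketch = pipe(
    prompt="night, supercar, monaco, dutch angle, close up",
    num_inference_steps=30,
).images[0]
base_sketch.save("base_sketch.png")
```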
Step 2: Depth-Anything-V2 (Depth Map Extraction)

Here, a depth map is extracted from the base sketch using Depth-Anything-V2.
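The same extraction can be sketched with the transformers depth-estimation pipeline; the small published Depth-Anything-V2 checkpoint is assumed here:

```python
# Depth-map extraction with Depth-Anything-V2 through the transformers
# depth-estimation pipeline (the small published checkpoint is used here).
from PIL import Image
from transformers import pipeline

depth_estimator = pipeline(
    "depth-estimation",
    model="depth-anything/Depth-Anything-V2-Small-hf",
)

sketch = Image.open("base_sketch.png")
depth_map = depth_estimator(sketch)["depth"]  # returned as a PIL image
depth_map.save("depth_map.png")
```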
Step 3: Flux.1-Depth-dev (Fixing Composition)

Using the extracted depth map, the official Flux.1 depth model refines the composition.
The Flux.1-Depth-dev model has depth conditioning built in, so it offers ControlNet-style composition control without the extra VRAM of a separate ControlNet; its texture quality, however, is inferior to the standard model.
Therefore, after partial rendering, we switch to the standard Flux.1 model for the final touches.
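As a rough diffusers sketch of this step, the official depth weights load as an ordinary pipeline and accept the depth map directly:

```python
# Composition transfer with the official FLUX.1-Depth-dev weights.
# Depth conditioning is built into the model, so the depth map goes in
# directly as control_image with no separate ControlNet loaded.
import torch
from diffusers import FluxControlPipeline
from PIL import Image

pipe = FluxControlPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Depth-dev",
    torch_dtype=torch.bfloat16,
).to("cuda")

refined = pipe(
    prompt="night, supercar, monaco, dutch angle, close up",
    control_image=Image.open("depth_map.png"),
    num_inference_steps=30,
    guidance_scale=10.0,  # a commonly used value for the Depth model
).images[0]
refined.save("composition_fixed.png")
```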
Step 4: FluxesCore-Dev_V1.0 (Final Touches)

Finally, we complete the illustration using FluxesCore-Dev_V1.0, a high-quality Flux.1[dev]-based custom model.
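The hand-off itself happens inside ComfyUI; a rough stand-alone approximation is an img2img pass, with the official Flux.1[dev] repo standing in for the FluxesCore-Dev checkpoint:

```python
# Stand-alone approximation of the model hand-off: an img2img pass where
# strength controls how much the second model repaints. The official
# Flux.1[dev] repo stands in for the FluxesCore-Dev checkpoint.
import torch
from diffusers import FluxImg2ImgPipeline
from PIL import Image

pipe = FluxImg2ImgPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16,
).to("cuda")

final = pipe(
    prompt="night, supercar, monaco, dutch angle, close up",
    image=Image.open("composition_fixed.png"),
    strength=0.4,  # low strength keeps the fixed composition intact
    num_inference_steps=30,
).images[0]
final.save("final.png")
```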
Reusing the Prompt for More!
The process doesn’t end here. Using the finalized image, we generate captions with CLIP and feed them into the next image generation.
```mermaid
flowchart LR
subgraph Input
A1(Prompt)
end
subgraph CLIP
B1(CLIP Text Encoder)
B2(CLIP Vision Encoder)
end
subgraph Image
D1(Illustration)
end
subgraph Captioning Model
E1(Cliption)
end
A1 --> B1
B1 --> D1
D1 --> B2
B2 --> E1
E1 --> A1
```
Note: To save storage, typically only the Text Encoder component of CLIP is distributed.
CLIP, commonly used for generating images from text, can create captions from images when combined with its Vision Encoder and a dedicated captioning model.
The captions generated from the image differ slightly from the original prompt. Repeating this process leads to gradual changes in the illustrations.
Since this workflow relies entirely on AI for prompt generation, the AI naturally produces new variations as the process continues.
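Stripped of the node wiring, the whole loop reduces to a few lines; generate_image and caption_image below are hypothetical stand-ins for the pipeline and the captioner described above:

```python
# Skeleton of the caption-feedback loop. generate_image and caption_image
# are hypothetical stand-ins for the generation pipeline and the
# CLIP-based captioner described above.
FIXED_KEYWORDS = "night, supercar, monaco, dutch angle, close up"

prompt = FIXED_KEYWORDS
for i in range(9):
    image = generate_image(prompt)       # SDXL -> Flux.1 stages
    image.save(f"iteration_{i + 1:02d}.png")
    caption = caption_image(image)       # CLIP Vision Encoder + captioner
    # The caption drifts a little every round; keeping the original
    # keywords anchors the theme while the details evolve.
    prompt = f"{caption}\n{FIXED_KEYWORDS}"
```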
Watching the Prompts Evolve!
Let’s look at how the prompts and images evolve through iterations.
First Image

night, supercar, monaco, dutch angle, close up
This is the first image, created with just the five keywords above. Using CLIP, captions from this image are generated for the next iteration.
Second Image

A white sports car is parked on a city street at night. The car is positioned on the left side of the image, with its headlights on, casting a warm glow on the pavement. The street is lined with buildings, and there are people walking in the background.
night, supercar, monaco, dutch angle, close up
The car and its color resemble the first image, but the overall vibe feels slightly different.
Ninth Image!

A red sports car is parked on a street at night, with a person walking by in the background. The car has a sleek design with a prominent front grille, round headlights, and a rear spoiler. The street is illuminated by streetlights, and there are other cars parked along the sides of the road.
night, supercar, monaco, dutch angle, close up
By the ninth iteration, the car’s color, design, and composition have changed significantly, making for a unique and engaging illustration.
This technique allows AI to generate new ideas and creative variations with each iteration.
Flux.1[schnell] Models for Commercial Use!
```mermaid
flowchart LR
subgraph Input
A1(Prompt)
end
subgraph SDXL
B1(Base Illustration)
end
subgraph Flux.1-schnell
D1(Final Touches)
end
A1-->B1
B1-->D1
D1-.->|Captioning|A1
```
The earlier examples used Flux.1[dev], whose license doesn't permit commercial use. Let's try the same process using Flux.1[schnell], a model licensed for commercial use.
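A minimal diffusers sketch of a schnell generation, for reference; the few-step, guidance-free settings are what make it cheap to loop:

```python
# Flux.1[schnell] in diffusers: distilled for few-step sampling, so it
# runs in about four steps with guidance disabled.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell",
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe(
    prompt="night, supercar, monaco, dutch angle, close up",
    num_inference_steps=4,
    guidance_scale=0.0,
    max_sequence_length=256,  # schnell's prompt-length limit
).images[0]
image.save("schnell.png")
```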
First Image

Third Image

Fifth Image

While the quality is slightly inferior to the previous model, it’s still sufficient for generating ideas.
AI's "Bias" Problem
This workflow relies on AI to handle everything from creating prompts to drawing and looping the process.
While this generates new ideas over time, correcting course becomes difficult once the loop drifts in an unwanted direction.
For instance, image generation AI tends to depict women disproportionately often.

In this workflow, once a person appears in an illustration, the loop keeps reproducing similar results.
Minimal Course Correction
To address this, we introduced minimal corrections.
Specifically, certain keywords in the generated prompts were removed.
Removed Keywords
- People: girl, woman, female, lady, boy, man, male, gentleman
- Colors: black, white, silver, blue, red

Although removing words might make the prompts grammatically awkward, advanced models like CLIP-G and T5xxl can understand the context well, so it doesn’t cause major issues.
Unlike using negative prompts, this approach doesn’t affect the overall image but simply excludes specific elements.
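In plain-Python terms, the removal amounts to one regex substitution; the whitespace cleanup at the end is an extra touch, not part of the node:

```python
# Plain-Python equivalent of the keyword removal: alternatives separated
# by | are deleted, then leftover whitespace is tidied up (the whitespace
# cleanup is added here for readability).
import re

REMOVE = r"\b(girl|woman|female|lady|boy|man|male|gentleman|" \
         r"black|white|silver|blue|red)\b"

def clean_prompt(caption: str) -> str:
    caption = re.sub(REMOVE, "", caption, flags=re.IGNORECASE)
    return re.sub(r"\s{2,}", " ", caption).strip()

print(clean_prompt("A white sports car is parked on a city street at night."))
# -> "A sports car is parked on a city street at night."
```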
Workflow!
This Flux.1 workflow is complex, so here’s a simplified version using SDXL alone:
```mermaid
flowchart LR
subgraph Input
A1(Prompt)
end
subgraph SDXL
B1(Illustration)
end
A1-->B1
B1-.->|Captioning|A1
```
Cliption_auto-prompt_SDXL 2025.1.10.json
This simplified workflow can be adapted by changing the base model or integrating it into existing workflows for various applications.
Introducing Custom Nodes!
Here are the custom nodes used in this workflow.
comfy-cliption

comfy-cliption is a custom node that uses CLIP-L to caption images. It’s lightweight and achieves high accuracy, especially when paired with improved CLIP-L models.
How to Use comfy-cliption
First, download the full CLIP-L model from the following page:

Place the downloaded Long-CLIP-L model in the following folders:
- InstallFolder/Models/CLIP
- InstallFolder/Models/InvokeClipVision or InstallFolder/models/clip_vision
Next, open ComfyUI and arrange the nodes as shown below.

The generated captions can be connected to the CLIP Text Encode node to be used as prompts for image generation.
To reuse captions for subsequent prompts, save them using the Save Text node and reload them with the Load Text node.
D2-nodes-ComfyUI

D2-nodes-ComfyUI, by da2el-AI, is a versatile node pack offering various functionalities.
In this workflow, we used the D2 Regex Replace node to remove specific words from the prompts.
How to Use D2 Regex Replace
To use the D2 Regex Replace node:

- Connect the input text you want to modify.
- Enter the words to be removed, separated by a vertical bar |.
- Running the node will delete the specified words!
Although we used it for word removal here, it supports advanced replacement using regular expressions.
More Automatic Prompt Generators
While this workflow used CLIP-L for simple prompt automation, there are other methods available.

In the future, I plan to compare various approaches, including those using large language models (LLMs), for even more sophisticated prompt automation!
Update: 2025.2.7
I compared three different automatic prompt generation methods.
Conclusion: Trust AI to Explore Creativity
- Generate captions with CLIP.
- Use looping workflows to create variations.
- Make minimal corrections when necessary.
AI has learned from billions of images, a scale no human can match.
It knows far more designs and compositions than any individual.

Recently, I’ve found the best approach is to let AI work freely and draw out ideas that I wouldn’t have thought of.
Thank you for reading to the end!