# Workflows
Workflows let you define multi-step dataset pipelines in YAML or JSON files. This is useful for reproducible dataset preparation.
## Quick Start
Create `datasety.yaml` in your project directory:
```yaml
steps:
  - command: resize
    args:
      input: ./raw
      output: ./dataset
      resolution: 1024x1024
  - command: caption
    args:
      input: ./dataset
      output: ./dataset
      trigger-word: "ohwx,"
```

Validate first, then run:

```bash
datasety workflow --dry-run
datasety workflow
```

## File Format
See the workflow command reference for full format details.
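Since workflows can also be written as JSON, here is the quick-start workflow in that form. This is a sketch assuming the JSON schema mirrors the YAML structure one-to-one (`steps` array of `command` + `args` objects); check the workflow command reference to confirm the exact schema.

```json
{
  "steps": [
    {
      "command": "resize",
      "args": {
        "input": "./raw",
        "output": "./dataset",
        "resolution": "1024x1024"
      }
    },
    {
      "command": "caption",
      "args": {
        "input": "./dataset",
        "output": "./dataset",
        "trigger-word": "ohwx,"
      }
    }
  ]
}
```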
## Real-World Pipelines
### Face/Person LoRA Training
The most common use case: prepare a face LoRA dataset from raw selfies or portrait photos. Resize to square, caption with a rare trigger word, and generate face masks so the trainer can focus loss on the subject.
```yaml
# face-lora.yaml
# Input: ./raw/ containing 15-30 portrait photos (JPG/PNG from phone camera)
# Output: ./dataset/ with resized images, captions (.txt), and masks
steps:
  - command: resize
    args:
      input: ./raw
      output: ./dataset
      resolution: 1024x1024
      crop-position: top
  - command: caption
    args:
      input: ./dataset
      output: ./dataset
      trigger-word: "ohwx person,"
  - command: mask
    args:
      input: ./dataset
      output: ./dataset/masks
      keywords: "person,face,hair"
      model: clipseg
      threshold: 0.4
      padding: 10
      blur: 5
```

```bash
datasety workflow -f face-lora.yaml --dry-run
datasety workflow -f face-lora.yaml
# Result: ./dataset/ has 001.jpg + 001.txt + masks/001.png for each image
```

### Accessory Augmentation
You have 20 photos of a person and want to expand the dataset with synthetic variations wearing different accessories. This is useful when you want the LoRA to generalize beyond the reference photos.
```yaml
# augment-accessories.yaml
# Input: ./dataset/ containing resized training images (from face LoRA step above)
# Output: ./augmented/ with synthetic edits, then re-captioned
steps:
  - command: synthetic
    args:
      input: ./dataset
      output: ./augmented/hats
      prompt: "the person is wearing a knitted beanie hat"
      steps: 4
      cfg-scale: 2.5
      seed: 42
  - command: synthetic
    args:
      input: ./dataset
      output: ./augmented/glasses
      prompt: "the person is wearing round sunglasses"
      steps: 4
      cfg-scale: 2.5
      seed: 42
  - command: synthetic
    args:
      input: ./dataset
      output: ./augmented/scarves
      prompt: "the person is wearing a red wool scarf"
      steps: 4
      cfg-scale: 2.5
      seed: 42
  - command: caption
    args:
      input: ./augmented/hats
      output: ./augmented/hats
      trigger-word: "ohwx person,"
  - command: caption
    args:
      input: ./augmented/glasses
      output: ./augmented/glasses
      trigger-word: "ohwx person,"
  - command: caption
    args:
      input: ./augmented/scarves
      output: ./augmented/scarves
      trigger-word: "ohwx person,"
```

### Product Photography LoRA
Prepare a dataset from product photos for an object LoRA. Products often have white or cluttered backgrounds, so we use non-square portrait crops to preserve product shape.
```yaml
# product-lora.yaml
# Input: ./product_photos/ containing product images (various sizes)
# Output: ./dataset/ ready for training
steps:
  - command: resize
    args:
      input: ./product_photos
      output: ./dataset
      resolution: 768x1024
      crop-position: center
  - command: caption
    args:
      input: ./dataset
      output: ./dataset
      trigger-word: "sks product,"
      florence-2-large: true
  - command: mask
    args:
      input: ./dataset
      output: ./dataset/masks
      keywords: "product,object,item"
      model: clipseg
      threshold: 0.3
```

### Upscale/Restore Training
Create a paired dataset for training an upscale or image restoration model. The degradation step creates realistic artifacts (JPEG compression, noise, blur) that the model learns to reverse.
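To make the degradation chain concrete, here is a rough, self-contained sketch of a JPEG + noise + blur chain applied at a random intensity with a fixed seed. This is an illustration of the technique only, not datasety's actual implementation; the `degrade` function and its parameters are hypothetical.

```python
import io
import random

import numpy as np
from PIL import Image, ImageFilter


def degrade(img: Image.Image, intensity: float, rng: random.Random) -> Image.Image:
    """Chain JPEG compression, Gaussian noise, and blur at a given intensity."""
    # JPEG round-trip: lower quality at higher intensity
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=int(95 - 70 * intensity))
    img = Image.open(io.BytesIO(buf.getvalue())).convert("RGB")
    # Additive Gaussian noise, seeded so control/target pairs are reproducible
    arr = np.asarray(img).astype(np.float32)
    noise_rng = np.random.default_rng(rng.randint(0, 2**32 - 1))
    arr += noise_rng.normal(0.0, 25 * intensity, arr.shape)
    img = Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8))
    # Gaussian blur as the final link in the chain
    return img.filter(ImageFilter.GaussianBlur(radius=2 * intensity))


rng = random.Random(42)                               # fixed seed, as in the workflow
clean = Image.new("RGB", (64, 64), (200, 120, 80))    # stand-in "target" image
control = degrade(clean, rng.uniform(0.3, 0.7), rng)  # degraded "control" image
```

Keeping the original as `target/` and the degraded copy as `control/` is exactly the paired layout the workflow below produces.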
```yaml
# upscale-training.yaml
# Input: ./originals/ containing high-quality source images
# Output: ./dataset/ with control/ (degraded) and target/ (original) subdirs
steps:
  - command: resize
    args:
      input: ./originals
      output: ./resized
      resolution: 1024x1024
  - command: degrade
    args:
      input: ./resized
      output: ./dataset
      type:
        - jpeg
        - noise
        - blur
      chain: true
      intensity-range: "0.3-0.7"
      paired: true
      seed: 42
  - command: align
    args:
      target: ./dataset/target
      control: ./dataset/control
  - command: caption
    args:
      input: ./dataset/target
      output: ./dataset/target
```

### Background Replacement
Generate inverted masks (everything except the subject), then use synthetic editing to change backgrounds. Useful for placing subjects in varied environments.
```yaml
# background-swap.yaml
# Input: ./portraits/ containing people photos with plain backgrounds
# Output: Three sets of re-backgrounded images
steps:
  - command: resize
    args:
      input: ./portraits
      output: ./resized
      resolution: 1024x1024
      crop-position: center
  - command: synthetic
    args:
      input: ./resized
      output: ./bg_outdoor
      prompt: "the person is standing in a sunny park with trees and grass"
      steps: 4
      cfg-scale: 2.5
      seed: 100
  - command: synthetic
    args:
      input: ./resized
      output: ./bg_studio
      prompt: "professional studio portrait with soft lighting and gray backdrop"
      steps: 4
      cfg-scale: 2.5
      seed: 100
  - command: synthetic
    args:
      input: ./resized
      output: ./bg_urban
      prompt: "the person is standing on a city street with buildings"
      steps: 4
      cfg-scale: 2.5
      seed: 100
```

### Inpainting Dataset
Create an inpainting training dataset with source images, masks, and captions. The masks mark regions to inpaint (e.g., accessories that should be removable).
```yaml
# inpainting-dataset.yaml
# Input: ./photos/ containing images of people with accessories
# Output: ./dataset/ with images, masks for accessories, and captions
steps:
  - command: resize
    args:
      input: ./photos
      output: ./dataset
      resolution: 1024x1024
      crop-position: top
  - command: mask
    args:
      input: ./dataset
      output: ./dataset/masks
      keywords: "hat,glasses,sunglasses,scarf,necklace,earring"
      model: sam3
      threshold: 0.3
      padding: 5
      blur: 3
  - command: caption
    args:
      input: ./dataset
      output: ./dataset
      florence-2-large: true
```

### Vision API Captioning with Custom Provider
Use a third-party OpenAI-compatible API for captioning when you want higher-quality descriptions than Florence-2. Works with OpenRouter, Together, or any compatible endpoint.
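For context, an OpenAI-compatible vision captioning call sends a chat-completions payload that pairs an inline base64 image with a text prompt. The sketch below builds such a payload; it shows the generic request shape these providers accept, not necessarily the exact request datasety sends, and `caption_request` is a hypothetical helper.

```python
import base64


def caption_request(image_bytes: bytes, prompt: str, model: str) -> dict:
    """Build an OpenAI-compatible chat payload with an inline base64 image."""
    data_url = "data:image/jpeg;base64," + base64.b64encode(image_bytes).decode()
    return {
        "model": model,
        "temperature": 0.3,
        "max_tokens": 200,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": data_url}},
                    {"type": "text", "text": prompt},
                ],
            }
        ],
    }


# Payload only; POSTing it to {OPENAI_BASE_URL}/chat/completions is the tool's job
req = caption_request(
    b"\xff\xd8...",  # JPEG bytes (truncated placeholder)
    "Describe this person's appearance in one detailed paragraph.",
    "gpt-5-nano",
)
```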
```yaml
# api-caption.yaml
# Requires: OPENAI_API_KEY and OPENAI_BASE_URL env vars
steps:
  - command: resize
    args:
      input: ./raw
      output: ./dataset
      resolution: 1024x1024
  - command: caption
    args:
      input: ./dataset
      output: ./dataset
      llm-api: true
      model: gpt-5-nano
      trigger-word: "ohwx person,"
      prompt: "Describe this person's appearance, clothing, pose, expression, and setting in one detailed paragraph. Do not mention image quality or photography terms."
      temperature: 0.3
      max-tokens: 200
```

```bash
# Run with OpenRouter
OPENAI_BASE_URL=https://openrouter.ai/api/v1 \
OPENAI_API_KEY=sk-or-... \
datasety workflow -f api-caption.yaml
```

### Multi-Resolution Dataset
Some trainers benefit from images at multiple resolutions. This workflow outputs the same source at three common training sizes.
```yaml
# multi-res.yaml
# Input: ./raw/ containing high-res source images (>= 2048px)
steps:
  - command: resize
    args:
      input: ./raw
      output: ./dataset_512
      resolution: 512x512
  - command: resize
    args:
      input: ./raw
      output: ./dataset_768
      resolution: 768x768
  - command: resize
    args:
      input: ./raw
      output: ./dataset_1024
      resolution: 1024x1024
  - command: caption
    args:
      input: ./dataset_1024
      output: ./dataset_512
      trigger-word: "ohwx,"
  - command: caption
    args:
      input: ./dataset_1024
      output: ./dataset_768
      trigger-word: "ohwx,"
  - command: caption
    args:
      input: ./dataset_1024
      output: ./dataset_1024
      trigger-word: "ohwx,"
```

### Sweep Then Train
Use sweep to find optimal generation parameters on a small sample, then apply the best settings to the full dataset.
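A sweep over two parameter lists expands to their Cartesian product, so three step counts crossed with three cfg-scales yields nine runs, one output directory each. A minimal sketch of that expansion (directory names mirror the `steps4_cfg2.5` style the sweep output uses):

```python
from itertools import product

steps_values = [2, 4, 8]
cfg_values = [1.5, 2.5, 3.5]

# One output directory per (steps, cfg-scale) combination
runs = [f"steps{s}_cfg{c}" for s, c in product(steps_values, cfg_values)]
print(len(runs))  # 9 combinations
```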
```bash
# Step 1: Test on 2-3 images to find the best steps + cfg-scale
mkdir ./sample && cp ./dataset/001.jpg ./dataset/002.jpg ./sample/
datasety sweep \
  -i ./sample -o ./sweep_results \
  -p "the person is wearing aviator sunglasses" \
  --steps 2,4,8 \
  --cfg-scale 1.5,2.5,3.5 \
  --seed 42 --run
# Step 2: Visually inspect ./sweep_results/steps4_cfg2.5/ etc.
# Pick the best combination, then apply to the full dataset:
```

```yaml
# full-augment.yaml
steps:
  - command: synthetic
    args:
      input: ./dataset
      output: ./augmented
      prompt: "the person is wearing aviator sunglasses"
      steps: 4
      cfg-scale: 2.5
      seed: 42
  - command: caption
    args:
      input: ./augmented
      output: ./augmented
      trigger-word: "ohwx person,"
```

### Cyanotype Style LoRA (API dataset + two models)
Train a cyanotype photographic style LoRA — the 1842 UV contact-print process producing Prussian-blue and bleached-white prints — on images generated via the FLUX API. The finished LoRAs let you:
- `FLUX.2-klein-base-4B`: add the cyanotype style to any img2img edit
- `Qwen-Image-Edit-2511`: convert any photograph to a cyanotype print
This workflow was run end-to-end and produced working LoRAs. See `examples/cyanotype_lora/` for the full output, including trained weights, dataset, and inference results.
```yaml
# cyanotype-lora.yaml
# Requires: OPENAI_API_KEY + OPENAI_BASE_URL (OpenRouter) + HF_TOKEN
steps:
  - command: character
    args:
      output: ./dataset/raw
      num-images: 30
      prompts-file: ./prompts.txt  # 30 curated cyanotype subject prompts
      image-api: true
      model: black-forest-labs/flux.2-klein-4b
      api-aspect-ratio: "1:1"
      api-image-size: 1K
      output-format: png
  - command: resize
    args:
      input: ./dataset/raw
      output: ./dataset/prepared
      resolution: 512x512
      crop-position: center
  - command: caption
    args:
      input: ./dataset/prepared
      output: ./dataset/prepared
      trigger-word: "cyanotype,"
      llm-api: true
      model: google/gemini-2.5-flash
      prompt: >
        Describe this image's subject and composition in one sentence.
        Focus on WHAT is depicted, not the style or color.
      temperature: 0.3
      max-tokens: 80
  - command: train
    args:
      input: ./dataset/prepared
      output: ./lora/cyanotype_flux4b.safetensors
      model: black-forest-labs/FLUX.2-klein-base-4B
      steps: 500
      lr: 1e-4
      lora-rank: 16
      timestep-type: sigmoid
      caption-dropout: 0.05
      lr-scheduler: cosine
      lr-warmup-steps: 50
      validation-split: 0.1
      seed: 42
  - command: train
    args:
      input: ./dataset/prepared
      output: ./lora/cyanotype_qwen.safetensors
      model: Qwen/Qwen-Image-Edit-2511
      steps: 300
      lr: 5e-5
      lora-rank: 16
      timestep-type: sigmoid
      caption-dropout: 0.05
      lr-scheduler: cosine
      lr-warmup-steps: 30
      validation-split: 0.1
      seed: 42
```

```bash
datasety workflow -f cyanotype-lora.yaml --dry-run
datasety workflow -f cyanotype-lora.yaml

# Apply trained LoRA — FLUX img2img
datasety synthetic --input-image photo.jpg --output-image out.png \
  --model black-forest-labs/FLUX.2-klein-base-4B \
  --lora ./lora/cyanotype_flux4b.safetensors:0.9 \
  --prompt "cyanotype, botanical specimen, Prussian blue and white" \
  --steps 20 --cfg-scale 3.5 --strength 0.75

# Apply trained LoRA — Qwen photo-to-cyanotype
datasety synthetic --input-image photo.jpg --output-image out.png \
  --model Qwen/Qwen-Image-Edit-2511 \
  --lora ./lora/cyanotype_qwen.safetensors:0.8 \
  --prompt "cyanotype, convert to cyanotype print style, Prussian blue and white" \
  --steps 40 --true-cfg-scale 4.0
```

### Train a LoRA from a Character Dataset
Prepare a character dataset using character (LLM-generated prompts + FLUX.2), then train a LoRA adapter on the result.
Note: Training requires the base (undistilled) model. The `character` generation step uses the fast FP8 inference model; the `train` step loads the full base model.
```yaml
# character-lora.yaml
# Input: Optional reference face image(s) at ./reference/
# Output: ./lora/character_lora.safetensors ready to use with --lora
steps:
  - command: character
    args:
      output: ./character_dataset
      num-images: 50
      llm-ollama: qwen3.5:4b
      model: black-forest-labs/FLUX.2-klein-4b-fp8
      character-description: "a young woman with short auburn hair and freckles"
      steps: 4
      seed: 42
  - command: train
    args:
      input: ./character_dataset
      output: ./lora/character_lora.safetensors
      model: black-forest-labs/FLUX.2-klein-base-4B
      steps: 500
      lr: 1e-4
      lora-rank: 16
      image-size: 512
```

```bash
datasety workflow -f character-lora.yaml --dry-run
datasety workflow -f character-lora.yaml

# Use the trained LoRA for inference
datasety synthetic \
  --input-image photo.jpg \
  --output-image result.png \
  --prompt "ohwx person in a forest" \
  --lora ./lora/character_lora.safetensors:0.8
```

### Shuffled Caption Augmentation
Generate randomized captions to add variety to a training dataset. Each image gets a randomly assembled caption from predefined text groups, which helps prevent the model from memorizing exact phrasings.
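The assembly amounts to one random pick from each `|`-separated group, joined in group order. A minimal sketch of that selection logic (illustrative, not the tool's exact implementation; `shuffled_caption` is a hypothetical helper):

```python
import random

groups = [
    "ohwx person,|a photo of ohwx,|ohwx,",
    "looking at the camera|facing forward|in a relaxed pose|smiling",
    "natural lighting|soft studio light|bright daylight|warm indoor lighting",
]


def shuffled_caption(groups: list[str], rng: random.Random) -> str:
    # Pick one alternative from each group, keeping group order
    return " ".join(rng.choice(g.split("|")) for g in groups)


rng = random.Random(42)  # fixed seed -> reproducible captions per run
caption = shuffled_caption(groups, rng)
```

With 3 × 4 × 4 alternatives, the three groups above yield 48 distinct captions.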
```yaml
# shuffle-captions.yaml
# Input: ./raw/ containing images
# Generates randomized captions from text groups
steps:
  - command: resize
    args:
      input: ./raw
      output: ./dataset
      resolution: 1024x1024
  - command: shuffle
    args:
      input: ./dataset
      output: ./dataset
      group:
        - "ohwx person,|a photo of ohwx,|ohwx,"
        - "looking at the camera|facing forward|in a relaxed pose|smiling"
        - "natural lighting|soft studio light|bright daylight|warm indoor lighting"
      seed: 42
```

### TTS Audio Dataset from YouTube
Build a TTS training dataset from a YouTube video or a directory of audio files. The `audio` command transcribes speech, slices audio at word boundaries, and outputs LJSpeech-compatible `wavs/` + `metadata.csv`.
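For reference, the LJSpeech convention for `metadata.csv` is pipe-delimited with one clip per line, `id|raw transcription|normalized transcription`, where each `id` maps to `wavs/<id>.wav`. The entries below are illustrative only; the exact columns datasety emits may differ, so inspect the generated file before wiring it into a trainer.

```csv
clip_0001|123 people attended.|One hundred twenty three people attended.
clip_0002|The quick brown fox jumps over the lazy dog.|The quick brown fox jumps over the lazy dog.
```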
```yaml
# tts-from-youtube.yaml
# Input: YouTube URL or local directory
# Output: ./tts_dataset/ with wavs/ and metadata.csv ready for Piper training
steps:
  - command: audio
    args:
      input: "https://www.youtube.com/watch?v=..."
      output: ./tts_dataset
      whisper-model: large-v3
      language: en
      normalize-numbers: true
      workers: 4
  - command: audio
    args:
      input: ./recordings/
      output: ./tts_dataset
      whisper-model: base
      language: uk
      normalize-numbers: true
      workers: 4
      resume: true
```

```bash
datasety workflow -f tts-from-youtube.yaml --dry-run
datasety workflow -f tts-from-youtube.yaml

# Resume later (skips already-processed files)
datasety workflow -f tts-from-youtube.yaml
```

Tip: Use `--workers 4` (or more) to transcribe multiple files in parallel. Use `--normalize-numbers` to expand digits like `123` into words so the TTS model pronounces them correctly.
### Upload TTS Dataset to HuggingFace
After building a TTS dataset, upload it to HuggingFace Hub with an auto-generated dataset card:
```yaml
# upload-tts.yaml
steps:
  - command: audio
    args:
      input: "https://www.youtube.com/watch?v=..."
      output: ./tts_dataset
      whisper-model: base
      language: en
      workers: 4
  - command: upload
    args:
      path: ./tts_dataset
      repo-id: your-username/my-voice-dataset
      type: audio
      private: true
```

```bash
datasety workflow -f upload-tts.yaml --dry-run
datasety workflow -f upload-tts.yaml
```

### Prepare and Upload LoRA Training Dataset
Resize, caption, and train a LoRA, then upload the adapter:
```yaml
# train-and-upload.yaml
steps:
  - command: resize
    args:
      input: ./raw
      output: ./dataset
      resolution: 1024x1024
  - command: caption
    args:
      input: ./dataset
      output: ./dataset
      trigger-word: "ohwx person,"
  - command: train
    args:
      input: ./dataset
      output: ./lora/portrait_lora.safetensors
      model: black-forest-labs/FLUX.2-klein-base-4B
      steps: 500
      lora-rank: 16
  - command: upload
    args:
      path: ./lora/portrait_lora.safetensors
      repo-id: your-username/portrait-lora
      type: model
```

```bash
datasety workflow -f train-and-upload.yaml --dry-run
datasety workflow -f train-and-upload.yaml
```