Workflows

Workflows let you chain multiple datasety commands into a single pipeline defined in a YAML or JSON file, making dataset preparation reproducible.

Quick Start

Create a datasety.yaml file in your project directory:

yaml
steps:
  - command: resize
    args:
      input: ./raw
      output: ./dataset
      resolution: 1024x1024
  - command: caption
    args:
      input: ./dataset
      output: ./dataset
      trigger-word: "ohwx,"

Validate first, then run:

bash
datasety workflow --dry-run
datasety workflow
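
Conceptually, a dry run parses the workflow file and checks that every step names a command and provides an args mapping before anything executes. A minimal sketch of that shape check (not datasety's actual validator), shown on the JSON form of the same workflow:

```python
import json

def validate_workflow(text: str) -> list[str]:
    """Return a list of problems found in a workflow document (empty = OK)."""
    doc = json.loads(text)
    steps = doc.get("steps")
    if not isinstance(steps, list) or not steps:
        return ["'steps' must be a non-empty list"]
    problems = []
    for i, step in enumerate(steps):
        if "command" not in step:
            problems.append(f"step {i}: missing 'command'")
        if not isinstance(step.get("args"), dict):
            problems.append(f"step {i}: 'args' must be a mapping")
    return problems

workflow = """
{
  "steps": [
    {"command": "resize",
     "args": {"input": "./raw", "output": "./dataset", "resolution": "1024x1024"}},
    {"command": "caption",
     "args": {"input": "./dataset", "output": "./dataset", "trigger-word": "ohwx,"}}
  ]
}
"""
print(validate_workflow(workflow))  # []
```

The real `--dry-run` also validates per-command arguments; this sketch only checks the outer structure.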

File Format

See the workflow command reference for full format details.

Real-World Pipelines

Face/Person LoRA Training

The most common use case: prepare a face LoRA dataset from raw selfies or portrait photos. Resize to square, caption with a rare trigger word, and generate face masks so the trainer can focus loss on the subject.

yaml
# face-lora.yaml
# Input: ./raw/ containing 15-30 portrait photos (JPG/PNG from phone camera)
# Output: ./dataset/ with resized images, captions (.txt), and masks
steps:
  - command: resize
    args:
      input: ./raw
      output: ./dataset
      resolution: 1024x1024
      crop-position: top

  - command: caption
    args:
      input: ./dataset
      output: ./dataset
      trigger-word: "ohwx person,"

  - command: mask
    args:
      input: ./dataset
      output: ./dataset/masks
      keywords: "person,face,hair"
      model: clipseg
      threshold: 0.4
      padding: 10
      blur: 5
bash
datasety workflow -f face-lora.yaml --dry-run
datasety workflow -f face-lora.yaml
# Result: ./dataset/ has 001.jpg + 001.txt + masks/001.png for each image
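
What the masks buy you at training time: trainers that support masked loss typically use the mask as a per-pixel weight on the reconstruction error, so mistakes on the subject count and mistakes on the background are ignored. A conceptual numpy sketch (not any specific trainer's implementation):

```python
import numpy as np

def masked_mse(pred: np.ndarray, target: np.ndarray, mask: np.ndarray) -> float:
    """MSE weighted by a 0..1 mask: background pixels (mask=0) contribute nothing."""
    weighted = mask * (pred - target) ** 2
    return float(weighted.sum() / np.maximum(mask.sum(), 1e-8))

# Toy 2x2 example: the only error falls on a masked-out pixel, so it is ignored.
pred   = np.array([[0.0, 1.0], [0.0, 0.0]])
target = np.array([[0.0, 0.0], [0.0, 0.0]])
mask   = np.array([[1.0, 0.0], [1.0, 1.0]])
print(masked_mse(pred, target, mask))  # 0.0
```

The `padding` and `blur` arguments soften the mask edge, which in this scheme means a gradual falloff of the loss weight around the subject.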

Accessory Augmentation

You have 20 photos of a person and want to expand the dataset with synthetic variations in which the subject wears different accessories. This helps the LoRA generalize beyond the reference photos.

yaml
# augment-accessories.yaml
# Input: ./dataset/ containing resized training images (from face LoRA step above)
# Output: ./augmented/ with synthetic edits, then re-captioned
steps:
  - command: synthetic
    args:
      input: ./dataset
      output: ./augmented/hats
      prompt: "the person is wearing a knitted beanie hat"
      steps: 4
      cfg-scale: 2.5
      seed: 42

  - command: synthetic
    args:
      input: ./dataset
      output: ./augmented/glasses
      prompt: "the person is wearing round sunglasses"
      steps: 4
      cfg-scale: 2.5
      seed: 42

  - command: synthetic
    args:
      input: ./dataset
      output: ./augmented/scarves
      prompt: "the person is wearing a red wool scarf"
      steps: 4
      cfg-scale: 2.5
      seed: 42

  - command: caption
    args:
      input: ./augmented/hats
      output: ./augmented/hats
      trigger-word: "ohwx person,"

  - command: caption
    args:
      input: ./augmented/glasses
      output: ./augmented/glasses
      trigger-word: "ohwx person,"

  - command: caption
    args:
      input: ./augmented/scarves
      output: ./augmented/scarves
      trigger-word: "ohwx person,"
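
The three synthetic + three caption steps differ only in prompt and output directory, so for longer accessory lists the workflow file can be generated rather than hand-written. A small sketch of that generation (a hypothetical helper, not part of datasety; dump `steps` with a YAML/JSON serializer to produce the file):

```python
# Generate the repetitive synthetic + caption step pairs for each accessory.
accessories = {
    "hats": "the person is wearing a knitted beanie hat",
    "glasses": "the person is wearing round sunglasses",
    "scarves": "the person is wearing a red wool scarf",
}

steps = []
for name, prompt in accessories.items():
    out = f"./augmented/{name}"
    steps.append({"command": "synthetic",
                  "args": {"input": "./dataset", "output": out, "prompt": prompt,
                           "steps": 4, "cfg-scale": 2.5, "seed": 42}})
for name in accessories:
    out = f"./augmented/{name}"
    steps.append({"command": "caption",
                  "args": {"input": out, "output": out,
                           "trigger-word": "ohwx person,"}})

print(len(steps))  # 6
```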

Product Photography LoRA

Prepare a dataset from product photos for an object LoRA. Product shots are often taller than wide, so we use a non-square portrait crop to preserve the product's proportions; and since product backgrounds are often white or cluttered, a mask step isolates the product itself.

yaml
# product-lora.yaml
# Input: ./product_photos/ containing product images (various sizes)
# Output: ./dataset/ ready for training
steps:
  - command: resize
    args:
      input: ./product_photos
      output: ./dataset
      resolution: 768x1024
      crop-position: center

  - command: caption
    args:
      input: ./dataset
      output: ./dataset
      trigger-word: "sks product,"
      florence-2-large: true

  - command: mask
    args:
      input: ./dataset
      output: ./dataset/masks
      keywords: "product,object,item"
      model: clipseg
      threshold: 0.3

Upscale/Restore Training

Create a paired dataset for training an upscale or image restoration model. The degradation step creates realistic artifacts (JPEG compression, noise, blur) that the model learns to reverse.

yaml
# upscale-training.yaml
# Input: ./originals/ containing high-quality source images
# Output: ./dataset/ with control/ (degraded) and target/ (original) subdirs
steps:
  - command: resize
    args:
      input: ./originals
      output: ./resized
      resolution: 1024x1024

  - command: degrade
    args:
      input: ./resized
      output: ./dataset
      type:
        - jpeg
        - noise
        - blur
      chain: true
      intensity-range: "0.3-0.7"
      paired: true
      seed: 42

  - command: align
    args:
      target: ./dataset/target
      control: ./dataset/control

  - command: caption
    args:
      input: ./dataset/target
      output: ./dataset/target

Background Replacement

Generate inverted masks (everything except the subject), then use synthetic editing to change backgrounds. Useful for placing subjects in varied environments.

yaml
# background-swap.yaml
# Input: ./portraits/ containing people photos with plain backgrounds
# Output: Three sets of re-backgrounded images
steps:
  - command: resize
    args:
      input: ./portraits
      output: ./resized
      resolution: 1024x1024
      crop-position: center

  - command: synthetic
    args:
      input: ./resized
      output: ./bg_outdoor
      prompt: "the person is standing in a sunny park with trees and grass"
      steps: 4
      cfg-scale: 2.5
      seed: 100

  - command: synthetic
    args:
      input: ./resized
      output: ./bg_studio
      prompt: "professional studio portrait with soft lighting and gray backdrop"
      steps: 4
      cfg-scale: 2.5
      seed: 100

  - command: synthetic
    args:
      input: ./resized
      output: ./bg_urban
      prompt: "the person is standing on a city street with buildings"
      steps: 4
      cfg-scale: 2.5
      seed: 100

Inpainting Dataset

Create an inpainting training dataset with source images, masks, and captions. The masks mark regions to inpaint (e.g., accessories that should be removable).

yaml
# inpainting-dataset.yaml
# Input: ./photos/ containing images of people with accessories
# Output: ./dataset/ with images, masks for accessories, and captions
steps:
  - command: resize
    args:
      input: ./photos
      output: ./dataset
      resolution: 1024x1024
      crop-position: top

  - command: mask
    args:
      input: ./dataset
      output: ./dataset/masks
      keywords: "hat,glasses,sunglasses,scarf,necklace,earring"
      model: sam3
      threshold: 0.3
      padding: 5
      blur: 3

  - command: caption
    args:
      input: ./dataset
      output: ./dataset
      florence-2-large: true

Vision API Captioning with Custom Provider

Use a third-party OpenAI-compatible API for captioning when you want higher-quality descriptions than Florence-2. Works with OpenRouter, Together, or any compatible endpoint.

yaml
# api-caption.yaml
# Requires: OPENAI_API_KEY and OPENAI_BASE_URL env vars
steps:
  - command: resize
    args:
      input: ./raw
      output: ./dataset
      resolution: 1024x1024

  - command: caption
    args:
      input: ./dataset
      output: ./dataset
      llm-api: true
      model: gpt-5-nano
      trigger-word: "ohwx person,"
      prompt: "Describe this person's appearance, clothing, pose, expression, and setting in one detailed paragraph. Do not mention image quality or photography terms."
      temperature: 0.3
      max-tokens: 200
bash
# Run with OpenRouter
OPENAI_BASE_URL=https://openrouter.ai/api/v1 \
OPENAI_API_KEY=sk-or-... \
datasety workflow -f api-caption.yaml
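
Under the hood, an OpenAI-compatible vision request sends the image as a base64 data URL inside a chat message. A sketch of the request body datasety would POST to `$OPENAI_BASE_URL/chat/completions` (the standard chat-completions shape; exact field choices are an assumption, datasety builds this for you):

```python
import base64

def caption_request(image_bytes: bytes, prompt: str, model: str,
                    max_tokens: int = 200, temperature: float = 0.3) -> dict:
    """Build an OpenAI-compatible chat-completions body for image captioning."""
    b64 = base64.b64encode(image_bytes).decode()
    return {
        "model": model,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    }

body = caption_request(b"\xff\xd8", "Describe this person.", "gpt-5-nano")
print(body["messages"][0]["content"][0]["type"])  # text
```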

Multi-Resolution Dataset

Some trainers benefit from images at multiple resolutions. This workflow outputs the same source at three common training sizes, then captions once from the highest-resolution (1024px) copies and writes the caption files alongside each size.

yaml
# multi-res.yaml
# Input: ./raw/ containing high-res source images (>= 2048px)
steps:
  - command: resize
    args:
      input: ./raw
      output: ./dataset_512
      resolution: 512x512

  - command: resize
    args:
      input: ./raw
      output: ./dataset_768
      resolution: 768x768

  - command: resize
    args:
      input: ./raw
      output: ./dataset_1024
      resolution: 1024x1024

  - command: caption
    args:
      input: ./dataset_1024
      output: ./dataset_512
      trigger-word: "ohwx,"

  - command: caption
    args:
      input: ./dataset_1024
      output: ./dataset_768
      trigger-word: "ohwx,"

  - command: caption
    args:
      input: ./dataset_1024
      output: ./dataset_1024
      trigger-word: "ohwx,"

Sweep Then Train

Use sweep to find optimal generation parameters on a small sample, then apply the best settings to the full dataset.

bash
# Step 1: Test on 2-3 images to find the best steps + cfg-scale
mkdir ./sample && cp ./dataset/001.jpg ./dataset/002.jpg ./sample/

datasety sweep \
    -i ./sample -o ./sweep_results \
    -p "the person is wearing aviator sunglasses" \
    --steps 2,4,8 \
    --cfg-scale 1.5,2.5,3.5 \
    --seed 42 --run

# Step 2: Visually inspect ./sweep_results/steps4_cfg2.5/ etc.
# Pick the best combination, then apply to the full dataset:
yaml
# full-augment.yaml
steps:
  - command: synthetic
    args:
      input: ./dataset
      output: ./augmented
      prompt: "the person is wearing aviator sunglasses"
      steps: 4
      cfg-scale: 2.5
      seed: 42

  - command: caption
    args:
      input: ./augmented
      output: ./augmented
      trigger-word: "ohwx person,"
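
The sweep grid in step 1 is just the Cartesian product of the parameter lists: 3 steps values x 3 cfg-scale values = 9 output folders to inspect. A sketch of how the combinations map to folder names like steps4_cfg2.5 (the naming pattern is taken from the comment above, not verified against the CLI):

```python
from itertools import product

steps_values = [2, 4, 8]
cfg_values = [1.5, 2.5, 3.5]

# One output folder per (steps, cfg-scale) combination.
runs = [f"steps{s}_cfg{c}" for s, c in product(steps_values, cfg_values)]
print(len(runs), runs[4])  # 9 steps4_cfg2.5
```

Keep the sample small: the grid grows multiplicatively, so adding a third parameter with 3 values would mean 27 runs.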

Cyanotype Style LoRA (API dataset + two models)

Train a cyanotype photographic style LoRA — the 1842 UV contact-print process producing Prussian-blue and bleached-white prints — on images generated via the FLUX API. The finished LoRAs let you:

  • FLUX.2-klein-base-4B: add the cyanotype style to any img2img edit
  • Qwen-Image-Edit-2511: convert any photograph to a cyanotype print

This workflow was run end-to-end and produced working LoRAs. See examples/cyanotype_lora/ for the full output including trained weights, dataset, and inference results.

yaml
# cyanotype-lora.yaml
# Requires: OPENAI_API_KEY + OPENAI_BASE_URL (OpenRouter) + HF_TOKEN
steps:
  - command: character
    args:
      output: ./dataset/raw
      num-images: 30
      prompts-file: ./prompts.txt # 30 curated cyanotype subject prompts
      image-api: true
      model: black-forest-labs/flux.2-klein-4b
      api-aspect-ratio: "1:1"
      api-image-size: 1K
      output-format: png

  - command: resize
    args:
      input: ./dataset/raw
      output: ./dataset/prepared
      resolution: 512x512
      crop-position: center

  - command: caption
    args:
      input: ./dataset/prepared
      output: ./dataset/prepared
      trigger-word: "cyanotype,"
      llm-api: true
      model: google/gemini-2.5-flash
      prompt: >
        Describe this image's subject and composition in one sentence.
        Focus on WHAT is depicted, not the style or color.
      temperature: 0.3
      max-tokens: 80

  - command: train
    args:
      input: ./dataset/prepared
      output: ./lora/cyanotype_flux4b.safetensors
      model: black-forest-labs/FLUX.2-klein-base-4B
      steps: 500
      lr: 1e-4
      lora-rank: 16
      timestep-type: sigmoid
      caption-dropout: 0.05
      lr-scheduler: cosine
      lr-warmup-steps: 50
      validation-split: 0.1
      seed: 42

  - command: train
    args:
      input: ./dataset/prepared
      output: ./lora/cyanotype_qwen.safetensors
      model: Qwen/Qwen-Image-Edit-2511
      steps: 300
      lr: 5e-5
      lora-rank: 16
      timestep-type: sigmoid
      caption-dropout: 0.05
      lr-scheduler: cosine
      lr-warmup-steps: 30
      validation-split: 0.1
      seed: 42
bash
datasety workflow -f cyanotype-lora.yaml --dry-run
datasety workflow -f cyanotype-lora.yaml

# Apply trained LoRA — FLUX img2img
datasety synthetic --input-image photo.jpg --output-image out.png \
    --model black-forest-labs/FLUX.2-klein-base-4B \
    --lora ./lora/cyanotype_flux4b.safetensors:0.9 \
    --prompt "cyanotype, botanical specimen, Prussian blue and white" \
    --steps 20 --cfg-scale 3.5 --strength 0.75

# Apply trained LoRA — Qwen photo-to-cyanotype
datasety synthetic --input-image photo.jpg --output-image out.png \
    --model Qwen/Qwen-Image-Edit-2511 \
    --lora ./lora/cyanotype_qwen.safetensors:0.8 \
    --prompt "cyanotype, convert to cyanotype print style, Prussian blue and white" \
    --steps 40 --true-cfg-scale 4.0

Train a LoRA from a Character Dataset

Prepare a character dataset using character (LLM-generated prompts + FLUX.2), then train a LoRA adapter on the result.

Note: Training requires the base (undistilled) model. The character generation step uses the fast FP8 inference model; the train step loads the full base model.

yaml
# character-lora.yaml
# Input: Optional reference face image(s) at ./reference/
# Output: ./lora/character_lora.safetensors ready to use with --lora
steps:
  - command: character
    args:
      output: ./character_dataset
      num-images: 50
      llm-ollama: qwen3.5:4b
      model: black-forest-labs/FLUX.2-klein-4b-fp8
      character-description: "a young woman with short auburn hair and freckles"
      steps: 4
      seed: 42

  - command: train
    args:
      input: ./character_dataset
      output: ./lora/character_lora.safetensors
      model: black-forest-labs/FLUX.2-klein-base-4B
      steps: 500
      lr: 1e-4
      lora-rank: 16
      image-size: 512
bash
datasety workflow -f character-lora.yaml --dry-run
datasety workflow -f character-lora.yaml

# Use the trained LoRA for inference
datasety synthetic \
    --input-image photo.jpg \
    --output-image result.png \
    --prompt "ohwx person in a forest" \
    --lora ./lora/character_lora.safetensors:0.8

Shuffled Caption Augmentation

Generate randomized captions to add variety to a training dataset. Each image gets a randomly assembled caption from predefined text groups, which helps prevent the model from memorizing exact phrasings.

yaml
# shuffle-captions.yaml
# Input: ./raw/ containing images
# Generates randomized captions from text groups
steps:
  - command: resize
    args:
      input: ./raw
      output: ./dataset
      resolution: 1024x1024

  - command: shuffle
    args:
      input: ./dataset
      output: ./dataset
      group:
        - "ohwx person,|a photo of ohwx,|ohwx,"
        - "looking at the camera|facing forward|in a relaxed pose|smiling"
        - "natural lighting|soft studio light|bright daylight|warm indoor lighting"
      seed: 42
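
Each caption is assembled by picking one alternative (the `|`-separated options) from every group, in group order, with a seeded RNG so runs are reproducible. A conceptual sketch of that assembly (not datasety's implementation; joining with spaces is an assumption):

```python
import random

groups = [
    "ohwx person,|a photo of ohwx,|ohwx,",
    "looking at the camera|facing forward|in a relaxed pose|smiling",
    "natural lighting|soft studio light|bright daylight|warm indoor lighting",
]

def shuffled_caption(rng: random.Random) -> str:
    """Pick one alternative from each '|'-separated group, preserving group order."""
    return " ".join(rng.choice(g.split("|")) for g in groups)

rng = random.Random(42)
for _ in range(3):
    print(shuffled_caption(rng))
```

With 3 x 4 x 4 alternatives this yields 48 possible captions, so exact phrasings rarely repeat across a small dataset.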

TTS Audio Dataset from YouTube

Build a TTS training dataset from a YouTube video or a directory of audio files. The audio command transcribes speech, slices audio at word boundaries, and outputs LJSpeech-compatible wavs/ + metadata.csv.

yaml
# tts-from-youtube.yaml
# Input: YouTube URL or local directory
# Output: ./tts_dataset/ with wavs/ and metadata.csv ready for Piper training
steps:
  - command: audio
    args:
      input: "https://www.youtube.com/watch?v=..."
      output: ./tts_dataset
      whisper-model: large-v3
      language: en
      normalize-numbers: true
      workers: 4

  - command: audio
    args:
      input: ./recordings/
      output: ./tts_dataset
      whisper-model: base
      language: uk
      normalize-numbers: true
      workers: 4
      resume: true
bash
datasety workflow -f tts-from-youtube.yaml --dry-run
datasety workflow -f tts-from-youtube.yaml

# Resume later (skips already-processed files)
datasety workflow -f tts-from-youtube.yaml

Tip: Use --workers 4 (or more) to transcribe multiple files in parallel. Use --normalize-numbers to expand digits like 123 into words so the TTS model pronounces them correctly.
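
The LJSpeech layout the audio command emits is simple: a wavs/ directory of clips plus a pipe-delimited metadata.csv pairing each clip id with its transcript. A sketch of writing and parsing that layout (illustrative only; the first row shows the number-normalized form that --normalize-numbers produces):

```python
rows = [
    ("clip_0001", "The number is one hundred twenty three."),
    ("clip_0002", "Welcome back to the channel."),
]

# metadata.csv: one 'id|transcript' line per audio file at wavs/<id>.wav
metadata = "\n".join(f"{clip_id}|{text}" for clip_id, text in rows)
print(metadata)

# Parse it back: split on the first '|' only, in case a transcript contains one.
parsed = [line.split("|", 1) for line in metadata.splitlines()]
print(parsed[0])  # ['clip_0001', 'The number is one hundred twenty three.']
```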

Upload TTS Dataset to HuggingFace

After building a TTS dataset, upload it to the HuggingFace Hub with an auto-generated dataset card:

yaml
# upload-tts.yaml
steps:
  - command: audio
    args:
      input: "https://www.youtube.com/watch?v=..."
      output: ./tts_dataset
      whisper-model: base
      language: en
      workers: 4

  - command: upload
    args:
      path: ./tts_dataset
      repo-id: your-username/my-voice-dataset
      type: audio
      private: true
bash
datasety workflow -f upload-tts.yaml --dry-run
datasety workflow -f upload-tts.yaml

Prepare and Upload LoRA Training Dataset

Resize, caption, and train a LoRA, then upload the adapter:

yaml
# train-and-upload.yaml
steps:
  - command: resize
    args:
      input: ./raw
      output: ./dataset
      resolution: 1024x1024

  - command: caption
    args:
      input: ./dataset
      output: ./dataset
      trigger-word: "ohwx person,"

  - command: train
    args:
      input: ./dataset
      output: ./lora/portrait_lora.safetensors
      model: black-forest-labs/FLUX.2-klein-base-4B
      steps: 500
      lora-rank: 16

  - command: upload
    args:
      path: ./lora/portrait_lora.safetensors
      repo-id: your-username/portrait-lora
      type: model
bash
datasety workflow -f train-and-upload.yaml --dry-run
datasety workflow -f train-and-upload.yaml

Released under the MIT License.