datasety

Dataset - it's easy!

One tool for the full dataset pipeline: resize, caption, align, generate, mask, filter, and degrade images; train LoRA adapters and TTS voices; upload to HuggingFace; and automate it all with workflows.

📐

Resize & Crop

Batch resize to exact dimensions with top/center/bottom crop. Supports single-image and directory modes.
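
As a rough illustration of what top/center/bottom cropping involves, here is a minimal sketch of the geometry. It assumes the image is first scaled so it covers the target size, excess height is then trimmed according to the anchor, and any excess width is trimmed evenly; `crop_box` is a hypothetical helper, not part of datasety's API.

```python
def crop_box(src_w, src_h, dst_w, dst_h, anchor="center"):
    """Cover-then-crop geometry for a resize to exactly dst_w x dst_h.

    Returns (scaled_w, scaled_h, box), where box = (left, top, right, bottom)
    is the crop to apply to the scaled image. Hypothetical helper.
    """
    # Smallest uniform scale that makes the image cover the target in both dimensions.
    scale = max(dst_w / src_w, dst_h / src_h)
    scaled_w = max(dst_w, round(src_w * scale))
    scaled_h = max(dst_h, round(src_h * scale))

    # Assumption: excess width is always trimmed evenly from both sides.
    left = (scaled_w - dst_w) // 2

    # The anchor decides where the excess height is removed.
    extra_h = scaled_h - dst_h
    offsets = {"top": 0, "center": extra_h // 2, "bottom": extra_h}
    if anchor not in offsets:
        raise ValueError(f"unknown anchor: {anchor!r}")
    top = offsets[anchor]
    return scaled_w, scaled_h, (left, top, left + dst_w, top + dst_h)


print(crop_box(2000, 3000, 1024, 1024, anchor="top"))
# → (1024, 1536, (0, 0, 1024, 1024))
```

With Pillow, these numbers would plug into `img.resize((scaled_w, scaled_h)).crop(box)`; whether datasety rounds exactly this way is an assumption.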

📝

Caption

Caption images locally with Florence-2 (0.23B or 0.77B) or through any OpenAI-compatible vision API, using templates or custom prompts.
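
A caption template such as `"[trigger] {{caption}}"` (used in the example pipeline on this page) appears to insert each generated caption at `{{caption}}` and keep the rest of the text verbatim. A minimal sketch of that substitution, assuming `{{name}}`-style placeholders; `render_template` is hypothetical, not datasety's API.

```python
import re

# Matches {{name}} with optional inner whitespace, e.g. {{caption}} or {{ caption }}.
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_template(template, **values):
    """Replace {{name}} placeholders; all other text (e.g. "[trigger]") is kept verbatim.

    Hypothetical helper, not datasety's API.
    """
    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"missing value for placeholder: {name}")
        return values[name]

    return _PLACEHOLDER.sub(substitute, template)


print(render_template("[trigger] {{caption}}", caption="a red bicycle leaning on a wall"))
# → [trigger] a red bicycle leaning on a wall
```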

🎨

Synthetic Editing

Edit images with FLUX.2, Qwen, SDXL, LongCat, or HunyuanImage, with support for LoRA, GGUF quantization, and CPU offload.

🎭

Segmentation Masks

Create text-prompted masks with SAM 3, SAM 2, or CLIPSeg, with options to pad, blur, or invert them.
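
Padding and inversion are simple per-pixel operations on a binary mask. The sketch below shows them on a mask stored as a list of rows with values 0 or 255; modeling padding as a square dilation is an assumption about how datasety grows masks, and both helpers are hypothetical. In Pillow, the same operations correspond to `ImageFilter.MaxFilter(2 * radius + 1)` and `ImageOps.invert`.

```python
def pad_mask(mask, radius):
    """Grow foreground (non-zero) pixels by `radius` using a square neighborhood.

    `mask` is a list of rows with values 0 or 255; the result uses the same values.
    """
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                # Paint the (2*radius+1) x (2*radius+1) window, clipped to the image.
                for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                    for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                        out[yy][xx] = 255
    return out


def invert_mask(mask):
    """Swap foreground and background of a 0/255 mask."""
    return [[255 - v for v in row] for row in mask]


mask = [[0] * 5 for _ in range(5)]
mask[2][2] = 255                    # single foreground pixel in the center
padded = pad_mask(mask, radius=1)   # grows into a 3x3 block
print(sum(v == 255 for row in padded for v in row))
# → 9
```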

🔎

Content Filter

Filter datasets by content with CLIP (any text query) or NudeNet (NSFW detection), then move, copy, delete, or keep the matching files.
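
CLIP-based filtering typically compares an image embedding with a text embedding by cosine similarity and acts on images that score above a threshold. A dependency-free sketch of that decision step, assuming the embeddings were already computed by a CLIP model (for example with `transformers`' `CLIPModel.get_image_features` and `get_text_features`); `select_matches` and the threshold value are hypothetical, and a real threshold would need tuning per query.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_matches(image_embeddings, text_embedding, threshold):
    """Return names of images whose similarity to the text query is at least `threshold`.

    `image_embeddings` maps file name -> precomputed CLIP image embedding (assumed).
    """
    return [
        name
        for name, emb in image_embeddings.items()
        if cosine_similarity(emb, text_embedding) >= threshold
    ]


# Toy 2-D "embeddings" for illustration only; real CLIP vectors have hundreds of dimensions.
images = {"a.jpg": [1.0, 0.0], "b.jpg": [0.0, 1.0], "c.jpg": [0.9, 0.1]}
print(select_matches(images, text_embedding=[1.0, 0.0], threshold=0.9))
# → ['a.jpg', 'c.jpg']
```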

🧑

Character Generation

Build identity-preserving datasets from reference faces, using LLM-generated prompts.

🔍

Parameter Sweep

Grid search over steps, CFG scale, strength, and LoRA. Generates inspectable YAML workflows.
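
A parameter sweep is the Cartesian product of the value lists. This sketch expands a grid into workflow steps shaped like the `command`/`args` entries of the workflow example on this page and prints them as JSON (workflows accept YAML or JSON). The command name `"synthetic"` and the argument keys are hypothetical placeholders, not confirmed datasety options; a LoRA dimension could be added as one more grid key.

```python
import itertools
import json


def build_sweep(base_args, grid, command="synthetic"):
    """Expand a parameter grid into one workflow step per combination.

    `command` and all argument names are hypothetical placeholders,
    not confirmed datasety option names.
    """
    names = list(grid)
    steps = []
    # itertools.product yields every combination, varying the last parameter fastest.
    for combo in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, combo))
        # Give each run its own output directory so results can be compared side by side.
        tag = "_".join(f"{name}{value}" for name, value in params.items())
        args = {**base_args, **params, "output": f"{base_args['output']}/{tag}"}
        steps.append({"command": command, "args": args})
    return {"steps": steps}


workflow = build_sweep(
    {"input": "./dataset", "output": "./sweep"},
    {"steps": [20, 30], "cfg_scale": [3.5, 5.0], "strength": [0.6]},
)
print(json.dumps(workflow, indent=2))  # 2 x 2 x 1 = 4 steps
```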

🔄

Workflows

Define multi-step pipelines in YAML/JSON. Dry-run validates everything before execution.

🎥

Audio Dataset

Build TTS datasets from video, audio, or YouTube, with Whisper transcription, word-boundary alignment, and parallel workers.
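
Word-boundary alignment means clip edges fall between words rather than inside them. A minimal sketch, assuming word timestamps converted to `(text, start, end)` tuples, such as those available from openai-whisper's `transcribe(..., word_timestamps=True)`; `segment_at_word_boundaries` and the `max_seconds` limit are hypothetical.

```python
def segment_at_word_boundaries(words, max_seconds):
    """Group (text, start, end) word tuples into clips that never split a word.

    A new clip starts when adding the next word would make the current clip
    longer than `max_seconds`; a single over-long word becomes its own clip.
    """
    segments, current = [], []
    for word in words:
        _, _, end = word
        # Clip length if this word were appended: its end minus the clip's first start.
        if current and end - current[0][1] > max_seconds:
            segments.append(current)
            current = []
        current.append(word)
    if current:
        segments.append(current)
    return [
        {"start": seg[0][1], "end": seg[-1][2], "text": " ".join(w[0] for w in seg)}
        for seg in segments
    ]


words = [("Hello", 0.0, 0.4), ("world", 0.5, 0.9), ("this", 1.6, 1.8),
         ("is", 1.9, 2.0), ("fine", 2.1, 2.6)]
for clip in segment_at_word_boundaries(words, max_seconds=1.5):
    print(clip)
# → {'start': 0.0, 'end': 0.9, 'text': 'Hello world'}
# → {'start': 1.6, 'end': 2.6, 'text': 'this is fine'}
```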

📹

Video Dataset

Build video datasets from YouTube or local files, with speech-based segmentation, timestamp-based file naming, and either stream-copy or re-encoding.

🔗

Align Pairs

Prepare control/target training pairs: match their dimensions, enforce multiples of 32, and unify file formats.
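
Snapping to a multiple of 32 is integer arithmetic. This sketch picks one shared size for a control/target pair by taking the smaller value of each dimension and rounding it down to the nearest multiple, so neither image needs upscaling. That policy is an assumption (it can shift the aspect ratio slightly, which a real tool might handle by cropping), and both helpers are hypothetical.

```python
def snap_down(value, multiple=32):
    """Round down to a multiple of `multiple`, but never below one multiple."""
    return max(multiple, (value // multiple) * multiple)


def aligned_size(control_size, target_size, multiple=32):
    """Shared (width, height) for a control/target pair.

    Assumed policy: take the smaller value of each dimension so neither image is
    upscaled, then snap it down to the required multiple.
    """
    width = min(control_size[0], target_size[0])
    height = min(control_size[1], target_size[1])
    return snap_down(width, multiple), snap_down(height, multiple)


print(aligned_size((1000, 700), (1024, 768)))
# → (992, 672)
```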

📉

Degrade

Nine degradation types for upscaler training, including JPEG artifacts, noise, blur, and pixelation. Implemented in pure Pillow.
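
Pixelation, one of the listed degradations, can be expressed as averaging fixed-size tiles. The sketch below works on a grayscale image stored as a list of rows so it runs without dependencies; it is not datasety's Pillow implementation, and `pixelate` is hypothetical. With Pillow, a similar effect is commonly obtained by downscaling and upscaling with `Image.Resampling.NEAREST`, and JPEG artifacts by saving with a low `quality` value.

```python
def pixelate(pixels, block):
    """Pixelate a grayscale image (list of rows) by averaging block x block tiles.

    Edge tiles are smaller when the size is not a multiple of `block`.
    """
    h, w = len(pixels), len(pixels[0])
    out = [row[:] for row in pixels]
    for by in range(0, h, block):
        for bx in range(0, w, block):
            ys = range(by, min(by + block, h))
            xs = range(bx, min(bx + block, w))
            tile = [pixels[y][x] for y in ys for x in xs]
            mean = round(sum(tile) / len(tile))
            # Fill the whole tile with its mean value.
            for y in ys:
                for x in xs:
                    out[y][x] = mean
    return out


img = [[0, 10, 20, 30],
       [40, 50, 60, 70],
       [80, 90, 100, 110],
       [120, 130, 140, 150]]
print(pixelate(img, block=2)[0])
# → [25, 25, 45, 45]
```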

🎲

Shuffle Captions

Generate random captions from groups of text, loaded inline, from a file, or from a URL, with a seed for reproducible results.
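
One reading of random caption generation from text groups is: draw one fragment from each group and join them. The sketch below does that with a seeded `random.Random`, so the same seed reproduces the same captions. The one-per-group rule, the `", "` separator, and `shuffle_captions` itself are assumptions, not datasety's documented behavior.

```python
import random


def shuffle_captions(groups, count, seed=None, separator=", "):
    """Build `count` captions by drawing one fragment from each group.

    A private random.Random(seed) keeps results reproducible without touching
    the global random state. Hypothetical helper.
    """
    rng = random.Random(seed)
    return [separator.join(rng.choice(group) for group in groups) for _ in range(count)]


groups = [
    ["a photo of [trigger]", "a portrait of [trigger]"],
    ["smiling", "looking away", "laughing"],
    ["studio lighting", "golden hour", "overcast"],
]
for caption in shuffle_captions(groups, count=3, seed=42):
    print(caption)
```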

🧠

LoRA Fine-Tuning

Train LoRA adapters for FLUX.2-klein, SDXL, and Qwen from image + caption datasets, with flow-matching or DDPM training. Adapters are saved as .safetensors.
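
Flow matching and DDPM differ mainly in how a training input is noised and what the network is asked to predict. Below are the textbook conventions on plain lists: rectified-flow interpolation with a velocity target, and DDPM with an epsilon (noise) target. Some papers swap the roles of t = 0 and t = 1, and whether datasety's trainer uses these exact conventions, timestep sampling, or loss weighting is not stated, so treat this as background rather than its implementation.

```python
import math


def flow_matching_pair(x0, noise, t):
    """Rectified-flow convention: x_t = (1 - t) * x0 + t * noise; target velocity = noise - x0."""
    xt = [(1 - t) * a + t * n for a, n in zip(x0, noise)]
    target = [n - a for a, n in zip(x0, noise)]
    return xt, target


def ddpm_pair(x0, noise, alpha_bar):
    """DDPM epsilon prediction: x_t = sqrt(abar) * x0 + sqrt(1 - abar) * noise; target = noise."""
    xt = [math.sqrt(alpha_bar) * a + math.sqrt(1 - alpha_bar) * n for a, n in zip(x0, noise)]
    return xt, list(noise)


def mse(pred, target):
    """Mean squared error between model output and target, the basic loss for both objectives."""
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(target)


x0, noise = [1.0, 2.0], [0.0, -1.0]
print(flow_matching_pair(x0, noise, t=0.5))
# → ([0.5, 0.5], [-1.0, -3.0])
```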

🎤

TTS Training

Train Piper TTS voices from audio datasets (metadata.csv + wavs/). Supports multi-GPU training via PyTorch Lightning and a background voice watcher for real-time testing.
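
Before training, it helps to confirm that every row of `metadata.csv` has a matching file in `wavs/`. This sketch assumes the LJSpeech-style layout commonly used for Piper training: pipe-delimited `id|text` rows with no header and audio at `wavs/<id>.wav`. `check_dataset` is hypothetical, and `id|speaker|text` rows are handled only by taking the last field as the transcript.

```python
import csv
from pathlib import Path


def check_dataset(root):
    """Validate an LJSpeech-style dataset: metadata.csv rows `id|text` and wavs/<id>.wav.

    Returns (ok_rows, problems). Hypothetical helper; the layout is an assumption.
    """
    root = Path(root)
    ok, problems = [], []
    with open(root / "metadata.csv", encoding="utf-8", newline="") as f:
        # QUOTE_NONE: quotation marks inside transcripts are kept as literal text.
        reader = csv.reader(f, delimiter="|", quoting=csv.QUOTE_NONE)
        for lineno, row in enumerate(reader, start=1):
            if not row:
                continue  # skip blank lines
            if len(row) < 2 or not row[-1].strip():
                problems.append(f"line {lineno}: expected 'id|text', got {row!r}")
                continue
            utt_id, text = row[0], row[-1]
            if not (root / "wavs" / f"{utt_id}.wav").is_file():
                problems.append(f"line {lineno}: missing wavs/{utt_id}.wav")
                continue
            ok.append((utt_id, text))
    return ok, problems


# Usage (the path is hypothetical):
# ok, problems = check_dataset("./my_voice")
```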

📤

Upload to Hub

Upload datasets and models to HuggingFace Hub. Auto-generates HF-compliant README cards with YAML frontmatter.
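
A Hub README card is Markdown preceded by a YAML frontmatter block between `---` lines. The sketch below builds one with standard metadata keys (`license`, `pretty_name`, `tags`); which fields datasety actually fills in is not stated here, and `dataset_card` is hypothetical. The `huggingface_hub` library also provides `DatasetCard` and `DatasetCardData` for this purpose.

```python
import json


def dataset_card(pretty_name, license_id, tags, description):
    """Build README.md text: YAML frontmatter between '---' lines, then Markdown.

    Strings are emitted with json.dumps, which yields valid YAML double-quoted
    scalars, so values containing ':' or '#' stay safe. Hypothetical helper.
    """
    lines = ["---", f"license: {license_id}", f"pretty_name: {json.dumps(pretty_name)}", "tags:"]
    lines += [f"- {json.dumps(tag)}" for tag in tags]
    lines += ["---", "", f"# {pretty_name}", "", description, ""]
    return "\n".join(lines)


print(dataset_card("Portraits: v1", "mit", ["lora", "image"],
                   "Cropped 1024x1024 portraits with captions."))
```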

Quick Install

```bash
pip install datasety            # core
pip install "datasety[all]"     # everything (quotes keep shells like zsh from expanding the brackets)
```

Example Pipeline

```bash
# 1. Resize raw photos
datasety resize -i ./raw -o ./dataset -r 1024x1024

# 2. Generate captions with a template
datasety caption -i ./dataset -o ./dataset --template "[trigger] {{caption}}"

# 3. Generate face masks for focused training
datasety mask -i ./dataset -o ./masks -k "face,hair"
```

Or define it as a workflow:

```yaml
# datasety.yaml
steps:
  - command: resize
    args: { input: ./raw, output: ./dataset, resolution: 1024x1024 }
  - command: caption
    args:
      { input: ./dataset, output: ./dataset, template: "[trigger] {{caption}}" }
  - command: mask
    args: { input: ./dataset, output: ./masks, keywords: "face,hair" }
```

```bash
datasety workflow --dry-run    # validate
datasety workflow              # execute
```
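
A dry run can catch mistakes without touching any files. This sketch validates a workflow with the same `steps`/`command`/`args` shape as the YAML example above, loaded from JSON here because Python's standard library has no YAML parser. It checks command names and verifies that each step's `input` either exists or is the `output` of an earlier step. The list of known commands and the checks themselves are assumptions about what `--dry-run` does, and `dry_run` is hypothetical.

```python
import json
from pathlib import Path

# Hypothetical subset of command names, taken from the examples on this page.
KNOWN_COMMANDS = {"resize", "caption", "mask"}


def dry_run(workflow, base_dir="."):
    """Check a workflow without executing it; return a list of error messages."""
    steps = workflow.get("steps")
    if not isinstance(steps, list) or not steps:
        return ["workflow must define a non-empty 'steps' list"]
    errors = []
    produced = set()  # outputs that earlier steps would create
    for i, step in enumerate(steps, start=1):
        if not isinstance(step, dict):
            errors.append(f"step {i}: expected a mapping with 'command' and 'args'")
            continue
        command = step.get("command")
        if command not in KNOWN_COMMANDS:
            errors.append(f"step {i}: unknown command {command!r}")
        args = step.get("args", {})
        if not isinstance(args, dict):
            errors.append(f"step {i}: 'args' must be a mapping")
            continue
        src = args.get("input")
        # An input is fine if it already exists on disk or an earlier step writes it.
        if src and src not in produced and not (Path(base_dir) / src).exists():
            errors.append(f"step {i}: input {src!r} does not exist and no earlier step produces it")
        if args.get("output"):
            produced.add(args["output"])
    return errors


workflow = json.loads("""
{"steps": [
  {"command": "resize", "args": {"input": "./raw", "output": "./dataset", "resolution": "1024x1024"}},
  {"command": "caption", "args": {"input": "./dataset", "output": "./dataset"}},
  {"command": "mask", "args": {"input": "./dataset", "output": "./masks", "keywords": "face,hair"}}
]}
""")
for problem in dry_run(workflow) or ["ok"]:
    print(problem)
```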
