filter

Filter, curate, or clean datasets based on image content using CLIP (arbitrary text queries) or NudeNet (NSFW label detection).

Usage

```bash
datasety filter --input ./dataset --output ./rejected --query "leg,male face" --action move
```

Supported Models

| Model | Description | Speed |
| --- | --- | --- |
| clip (default) | CLIP ViT-B/32: zero-shot classification against any text query | ~50 ms/img |
| nudenet | NudeNet ONNX: NSFW detection with 18 fixed labels | ~10 ms/img |
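
To make the interaction between a comma-separated query string and a confidence threshold concrete, here is a minimal sketch of one plausible matching rule. It is not datasety's actual implementation: the helper names `parse_queries` and `is_match` are hypothetical, the per-query scores are written by hand rather than computed by CLIP, and treating an image as a match when any single query reaches the threshold is an assumption.

```python
from __future__ import annotations


def parse_queries(raw: str) -> list[str]:
    """Split a comma-separated query string into trimmed, non-empty queries."""
    return [q.strip() for q in raw.split(",") if q.strip()]


def is_match(scores: dict[str, float], threshold: float = 0.5) -> bool:
    """Assumed rule: an image matches if ANY query scores at or above the threshold."""
    return any(score >= threshold for score in scores.values())


queries = parse_queries("leg, male face")
# Hand-written scores standing in for real model output (illustrative only).
scores = {"leg": 0.72, "male face": 0.10}
print(queries, is_match(scores))
```

Under this reading, raising the threshold makes the filter stricter, so fewer images are acted on.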

NudeNet Labels

FEMALE_GENITALIA_COVERED, FACE_FEMALE, BUTTOCKS_EXPOSED,
FEMALE_BREAST_EXPOSED, FEMALE_GENITALIA_EXPOSED, MALE_BREAST_EXPOSED,
ANUS_EXPOSED, FEET_EXPOSED, BELLY_COVERED, FEET_COVERED,
ARMPITS_COVERED, ARMPITS_EXPOSED, FACE_MALE, BELLY_EXPOSED,
MALE_GENITALIA_EXPOSED, ANUS_COVERED, FEMALE_BREAST_COVERED,
BUTTOCKS_COVERED
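
Because the label set is fixed, a label string can be checked before a long run starts. The sketch below validates a comma-separated label string against the 18 labels listed above. The function name `parse_labels` is hypothetical, and both the upper-casing of input and the choice to raise on unknown labels are assumptions, not documented datasety behavior.

```python
from __future__ import annotations

# The 18 labels listed in this section.
NUDENET_LABELS = frozenset({
    "FEMALE_GENITALIA_COVERED", "FACE_FEMALE", "BUTTOCKS_EXPOSED",
    "FEMALE_BREAST_EXPOSED", "FEMALE_GENITALIA_EXPOSED", "MALE_BREAST_EXPOSED",
    "ANUS_EXPOSED", "FEET_EXPOSED", "BELLY_COVERED", "FEET_COVERED",
    "ARMPITS_COVERED", "ARMPITS_EXPOSED", "FACE_MALE", "BELLY_EXPOSED",
    "MALE_GENITALIA_EXPOSED", "ANUS_COVERED", "FEMALE_BREAST_COVERED",
    "BUTTOCKS_COVERED",
})


def parse_labels(raw: str) -> list[str]:
    """Split a comma-separated label string and reject labels outside the fixed set."""
    labels = [part.strip().upper() for part in raw.split(",") if part.strip()]
    unknown = [label for label in labels if label not in NUDENET_LABELS]
    if unknown:
        raise ValueError(f"Unknown NudeNet label(s): {', '.join(unknown)}")
    return labels


print(parse_labels("FEMALE_BREAST_EXPOSED,MALE_GENITALIA_EXPOSED"))
```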

Actions

| Action | Behavior |
| --- | --- |
| move | Move matching images to --output (default) |
| copy | Copy matching images to --output |
| delete | Permanently delete matching images (requires --confirm) |
| keep | Inverse: keep matching images and move or delete everything that does NOT match |

With --action keep:

  • If --output is set, non-matching images are moved there (safe).
  • Without --output, non-matching images are deleted (requires --confirm).
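
The action rules above can be summarized as a small decision function. This is a sketch of the documented behavior only, not datasety's code: the name `plan_for_image` is hypothetical, `--invert` and `--dry-run` are ignored, and raising an error when `--confirm` is missing is an assumption about how the requirement is enforced.

```python
def plan_for_image(action: str, matched: bool, has_output: bool, confirm: bool) -> str:
    """Return what happens to one image: 'move', 'copy', 'delete', or 'skip'."""
    if action == "keep":
        if matched:
            return "skip"  # matching images stay where they are
        # Non-matching images are moved if --output is set, otherwise deleted.
        effective = "move" if has_output else "delete"
    elif action in ("move", "copy", "delete"):
        if not matched:
            return "skip"  # only matching images are acted on
        effective = action
    else:
        raise ValueError(f"unknown action: {action!r}")

    if effective == "delete" and not confirm:
        # Assumption: a destructive step without --confirm is refused.
        raise ValueError("deleting files requires --confirm")
    return effective


print(plan_for_image("keep", matched=False, has_output=True, confirm=False))
print(plan_for_image("keep", matched=False, has_output=False, confirm=True))
```

The first call corresponds to the safe variant (non-matching images are moved to the output directory); the second corresponds to the destructive variant.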

Companion files (.txt, .caption, .json) that belong to an image are always moved, copied, or deleted together with it.
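
As an illustration of what handling companion files together with an image can look like, the sketch below moves an image and any sidecar files that share its stem. The helper names are hypothetical, and the naming convention (`cat.jpg` pairs with `cat.txt`, not `cat.jpg.txt`) is an assumption; datasety's own lookup may differ.

```python
from __future__ import annotations

import shutil
import tempfile
from pathlib import Path

COMPANION_SUFFIXES = (".txt", ".caption", ".json")


def companions_of(image: Path) -> list[Path]:
    """Existing sidecar files that share the image's stem (assumed convention)."""
    candidates = (image.with_suffix(suffix) for suffix in COMPANION_SUFFIXES)
    return [path for path in candidates if path.exists()]


def move_with_companions(image: Path, dest_dir: Path) -> list[Path]:
    """Move an image and its companions into dest_dir; return the new paths."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for src in [image, *companions_of(image)]:
        target = dest_dir / src.name
        shutil.move(str(src), str(target))
        moved.append(target)
    return moved


# Demo in a throwaway directory so no real data is touched.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "cat.jpg").write_bytes(b"")
    (root / "cat.txt").write_text("a cat")
    moved = move_with_companions(root / "cat.jpg", root / "rejected")
    print(sorted(path.name for path in moved))
```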

Options

| Option | Description | Default |
| --- | --- | --- |
| --input, -i | Input directory | (required) |
| --output, -o | Output directory for matched/rejected images | |
| --query, -q | Comma-separated text queries (CLIP) | |
| --labels, -l | Comma-separated NudeNet labels | |
| --model | clip or nudenet | clip |
| --action | move, copy, delete, or keep | move |
| --threshold | Confidence threshold (0.0-1.0) | 0.5 |
| --device | auto, cpu, cuda, or mps | auto |
| --invert | Invert match logic (act on non-matches) | false |
| --confirm | Required for destructive actions | false |
| --preserve-structure | Keep subfolder hierarchy in output (with --recursive) | false |
| --log | Write CSV log of all decisions to this path | |
| --dry-run | Preview detections without modifying files | false |
| --recursive, -R | Search input directory recursively | false |
| --progress | Show tqdm progress bar | false |
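
The effect of `--preserve-structure` is easiest to see as a path computation. The sketch below shows one way a destination path could be derived; `destination_for` is a hypothetical helper, and placing files directly in the output root when the flag is off is an assumption (the source does not say how name collisions are resolved in that case).

```python
from pathlib import Path


def destination_for(image: Path, input_dir: Path, output_dir: Path,
                    preserve_structure: bool) -> Path:
    """Compute where an image found under input_dir would land in output_dir."""
    if preserve_structure:
        # Keep the path relative to the input root, e.g. cats/a.jpg.
        return output_dir / image.relative_to(input_dir)
    # Assumed default: flatten into the output root.
    return output_dir / image.name


image = Path("dataset/cats/a.jpg")
print(destination_for(image, Path("dataset"), Path("filtered"), True))
print(destination_for(image, Path("dataset"), Path("filtered"), False))
```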

Examples

```bash
# Move images with legs or male faces to a reject folder
datasety filter -i ./dataset -o ./rejected --query "leg,male face" --action move

# Delete NSFW images using NudeNet
datasety filter -i ./dataset --labels "FEMALE_BREAST_EXPOSED,MALE_GENITALIA_EXPOSED" \
    --action delete --model nudenet --threshold 0.6 --confirm

# Keep only images matching "hat and socks", move the rest
datasety filter -i ./dataset -o ./rejected --query "hat and socks" --action keep

# Preview without changes
datasety filter -i ./dataset --query "blurry,low quality" --action delete --dry-run -R

# Scan nested folders, preserve structure, write log
datasety filter -i ./dataset -o ./filtered --query "outdoor scene" \
    --action copy -R --preserve-structure --log filter_log.csv
```
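
When the command is driven from a script, it can help to assemble the argument list programmatically and preview it first with `--dry-run`. The sketch below uses only flags from the options table; `build_filter_command` is a hypothetical helper, not part of datasety, and it only builds the command without running it.

```python
from __future__ import annotations

import shlex


def build_filter_command(
    input_dir: str,
    *,
    output_dir: str | None = None,
    query: str | None = None,
    labels: str | None = None,
    model: str = "clip",
    action: str = "move",
    recursive: bool = False,
    dry_run: bool = False,
    confirm: bool = False,
    log: str | None = None,
) -> list[str]:
    """Assemble an argv list for `datasety filter` from documented options."""
    cmd = ["datasety", "filter", "--input", input_dir,
           "--model", model, "--action", action]
    # Options that take a value are added only when provided.
    for flag, value in (("--output", output_dir), ("--query", query),
                        ("--labels", labels), ("--log", log)):
        if value is not None:
            cmd += [flag, value]
    # Boolean switches are added only when enabled.
    for flag, enabled in (("--recursive", recursive), ("--dry-run", dry_run),
                          ("--confirm", confirm)):
        if enabled:
            cmd.append(flag)
    return cmd


preview = build_filter_command("./dataset", query="blurry,low quality",
                               action="delete", recursive=True, dry_run=True)
print(shlex.join(preview))
# To execute: subprocess.run(preview, check=True), with datasety on PATH.
```

Passing the arguments as a list avoids shell quoting problems with queries that contain spaces or commas.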

Released under the MIT License.