
Image analysis

 

Here is a list of 100 command prompts for image analysis, organized by stage of a typical computer vision workflow. After each category there is a brief, illustrative Python (OpenCV) sketch of what the corresponding code might look like; the file paths, thresholds, kernel sizes, and coordinates in the sketches are placeholders, not recommendations.

1. Image Acquisition & Loading

  1. Load the image from the file path [path/to/image.jpg].

  2. Capture a live frame from the [webcam/video source].

  3. Download and load an image from the URL [http://...].

  4. Read all images from the directory [path/to/dataset/].

  5. Decode the [Base64 string] and load it as an image.

  6. Extract and load a specific frame [N] from [video_file.mp4].

  7. Create a blank [width]x[height] image with a [black/white] background.
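
A minimal sketch of the acquisition prompts above, assuming OpenCV (cv2) and NumPy are installed; the file path, device index, and frame number are placeholders.

    import cv2
    import numpy as np

    # Load an image from disk (imread returns None if the path is wrong).
    img = cv2.imread("path/to/image.jpg")

    # Capture a single frame from the default webcam (device 0).
    cap = cv2.VideoCapture(0)
    ok, frame = cap.read()
    cap.release()

    # Seek to frame N of a video file and decode it (N = 100 here, purely illustrative).
    video = cv2.VideoCapture("video_file.mp4")
    video.set(cv2.CAP_PROP_POS_FRAMES, 100)
    ok, frame_n = video.read()
    video.release()

    # Create a blank 640x480 black canvas (height, width, channels).
    blank = np.zeros((480, 640, 3), dtype=np.uint8)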

2. Preprocessing & Transformation

  1. Convert the image to grayscale.

  2. Convert the image from the [RGB] color space to [HSV].

  3. Convert the image from the [RGB] color space to [LAB].

  4. Resize the image to a fixed [width]x[height] (e.g., 224x224).

  5. Scale the image by a factor of [N], maintaining aspect ratio.

  6. Rotate the image by [N] degrees.

  7. Flip the image [horizontally].

  8. Flip the image [vertically].

  9. Crop the image to the region of interest (ROI) defined by [x, y, width, height].

  10. Pad the image with a [N]-pixel border using [zero/reflection] padding.

  11. Apply a perspective warp to the image using the 4-point transform [source_points] to [dest_points].

  12. Apply an affine transformation (rotate, scale, shear) using [matrix].
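
The preprocessing prompts map closely onto OpenCV's color-space and geometry functions. A rough sketch, noting that OpenCV loads images in BGR order and that the sizes, angles, and crop coordinates below are only examples:

    import cv2

    img = cv2.imread("path/to/image.jpg")
    h, w = img.shape[:2]

    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)         # grayscale
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)           # BGR -> HSV

    resized = cv2.resize(img, (224, 224))                # fixed size, e.g. for a CNN input

    # Rotate 30 degrees about the centre via an affine warp.
    M = cv2.getRotationMatrix2D((w / 2, h / 2), 30, 1.0)
    rotated = cv2.warpAffine(img, M, (w, h))

    flipped = cv2.flip(img, 1)                           # 1 = horizontal, 0 = vertical

    roi = img[50:250, 100:300]                           # crop by y/x slicing

    padded = cv2.copyMakeBorder(img, 10, 10, 10, 10, cv2.BORDER_REFLECT)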

3. Filtering & Enhancement

  1. Apply a Gaussian blur with a [N]x[N] kernel to reduce noise.

  2. Apply a median filter with a [N]x[N] kernel to remove salt-and-pepper noise.

  3. Apply a bilateral filter to smooth the image while preserving edges.

  4. Sharpen the image using a [sharpening kernel/unsharp mask].

  5. Perform a 2D convolution on the image with a [custom kernel].

  6. Adjust the image brightness by [N]% and contrast by [M]%.

  7. Apply histogram equalization to improve image contrast.

  8. Apply CLAHE (Contrast Limited Adaptive Histogram Equalization).

  9. Perform gamma correction with a gamma value of [N].
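
A sketch of the filtering and enhancement prompts, again with OpenCV; kernel sizes, blend weights, and the gamma value are illustrative defaults.

    import cv2
    import numpy as np

    img = cv2.imread("path/to/image.jpg")
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    blurred = cv2.GaussianBlur(img, (5, 5), 0)           # Gaussian smoothing
    median = cv2.medianBlur(img, 5)                      # salt-and-pepper removal
    smoothed = cv2.bilateralFilter(img, 9, 75, 75)       # edge-preserving smoothing

    # Unsharp mask: weight the original up and the blurred copy down.
    sharpened = cv2.addWeighted(img, 1.5, blurred, -0.5, 0)

    equalized = cv2.equalizeHist(gray)                   # global histogram equalization
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8)).apply(gray)

    # Gamma correction via a lookup table (gamma = 1.5 is illustrative).
    gamma = 1.5
    table = np.array([(i / 255.0) ** (1.0 / gamma) * 255 for i in range(256)], dtype=np.uint8)
    gamma_corrected = cv2.LUT(img, table)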

4. Binarization & Thresholding

  1. Apply a simple binary threshold at a [threshold_value].

  2. Apply Otsu's thresholding to automatically binarize the image.

  3. Apply adaptive (mean/Gaussian) thresholding to handle varying lighting conditions.

  4. Invert the binary image (white to black and black to white).

  5. Perform a "distance transform" on the binary image.
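
The thresholding prompts in one short OpenCV sketch; the fixed threshold of 127 and the adaptive block size are placeholders.

    import cv2

    gray = cv2.imread("path/to/image.jpg", cv2.IMREAD_GRAYSCALE)

    _, simple = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)

    # Otsu picks the threshold automatically; the 0 passed in is ignored.
    _, otsu = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # Adaptive thresholding handles uneven lighting (block size 11, offset 2).
    adaptive = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                     cv2.THRESH_BINARY, 11, 2)

    inverted = cv2.bitwise_not(otsu)

    # Distance transform: each foreground pixel gets its distance to the nearest background pixel.
    dist = cv2.distanceTransform(otsu, cv2.DIST_L2, 5)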

5. Morphological Operations

  1. Erode the white regions of the binary image with a [N]x[N] kernel.

  2. Dilate the white regions of the binary image with a [N]x[N] kernel.

  3. Perform an "opening" (erode then dilate) to remove small noise objects.

  4. Perform a "closing" (dilate then erode) to fill small holes in objects.

  5. Get the morphological gradient (outline) of the objects.

  6. Apply a "top-hat" transform to find bright spots.

  7. Apply a "black-hat" transform to find dark spots.

  8. Perform skeletonization (medial axis transform) on the binary objects.
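
All of the morphological prompts except skeletonization reduce to erode/dilate or cv2.morphologyEx with a structuring element; the 5x5 kernel and the mask file name below are placeholders.

    import cv2
    import numpy as np

    binary = cv2.imread("binary_mask.png", cv2.IMREAD_GRAYSCALE)
    kernel = np.ones((5, 5), np.uint8)

    eroded = cv2.erode(binary, kernel)
    dilated = cv2.dilate(binary, kernel)
    opened = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)        # remove small noise
    closed = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)       # fill small holes
    gradient = cv2.morphologyEx(binary, cv2.MORPH_GRADIENT, kernel)  # object outlines
    tophat = cv2.morphologyEx(binary, cv2.MORPH_TOPHAT, kernel)      # bright spots
    blackhat = cv2.morphologyEx(binary, cv2.MORPH_BLACKHAT, kernel)  # dark spots

    # Skeletonization is usually done outside core OpenCV, e.g. with
    # skimage.morphology.skeletonize on a boolean version of the mask.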

6. Image Attributes & Histograms

  1. Get the image dimensions (width, height, channels).

  2. Extract and list all EXIF metadata (camera, GPS, settings).

  3. Calculate and plot the image's intensity histogram.

  4. Calculate and plot the color histogram for all [3] channels.

  5. Calculate the mean and standard deviation of all pixel intensities.

  6. Calculate the dominant color(s) in the image.

  7. Perform a Fourier Transform (FFT) and display the magnitude spectrum.
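
A sketch of the attribute and histogram prompts; k = 3 clusters for dominant colors is arbitrary, and EXIF extraction (not shown) is typically done with Pillow's Image.getexif().

    import cv2
    import numpy as np

    img = cv2.imread("path/to/image.jpg")
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    height, width, channels = img.shape               # dimensions
    mean, std = cv2.meanStdDev(img)                   # per-channel mean / standard deviation

    # Intensity histogram of the grayscale image (256 bins).
    hist = cv2.calcHist([gray], [0], None, [256], [0, 256])

    # Dominant colors via k-means on the flattened pixel list.
    pixels = img.reshape(-1, 3).astype(np.float32)
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
    _, labels, centers = cv2.kmeans(pixels, 3, None, criteria, 5, cv2.KMEANS_RANDOM_CENTERS)

    # FFT magnitude spectrum of the grayscale image.
    spectrum = 20 * np.log(np.abs(np.fft.fftshift(np.fft.fft2(gray))) + 1)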

7. Feature Extraction

  1. Detect edges using the Canny edge detector with [low] and [high] thresholds.

  2. Detect edges using the [Sobel] operator on the [X/Y] axis.

  3. Detect corners using the Harris corner detector.

  4. Detect corners using the Shi-Tomasi ("Good Features to Track") algorithm.

  5. Detect keypoints and compute descriptors using SIFT (Scale-Invariant Feature Transform).

  6. Detect keypoints and compute descriptors using ORB (Oriented FAST and Rotated BRIEF).

  7. Extract Histogram of Oriented Gradients (HOG) features.

  8. Detect lines in the image using the Hough Line Transform.

  9. Detect circles in the image using the Hough Circle Transform.

  10. Describe the image's texture using Haralick texture features (GLCM).
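
The classical feature-extraction prompts in one sketch; the Canny thresholds, Hough parameters, and corner count are illustrative, and SIFT is used the same way as ORB via cv2.SIFT_create().

    import cv2
    import numpy as np

    gray = cv2.imread("path/to/image.jpg", cv2.IMREAD_GRAYSCALE)

    edges = cv2.Canny(gray, 100, 200)                        # low/high thresholds
    sobel_x = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)     # gradient along X

    # Shi-Tomasi ("Good Features to Track") corners.
    corners = cv2.goodFeaturesToTrack(gray, maxCorners=100, qualityLevel=0.01, minDistance=10)

    # ORB keypoints and binary descriptors.
    orb = cv2.ORB_create()
    keypoints, descriptors = orb.detectAndCompute(gray, None)

    # Hough transforms for line segments and circles.
    lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=80, minLineLength=30, maxLineGap=10)
    circles = cv2.HoughCircles(gray, cv2.HOUGH_GRADIENT, dp=1.2, minDist=50)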

8. Segmentation & Contour Analysis

  1. Find all contours of the objects in the binary image.

  2. Draw all detected contours on the original image.

  3. Calculate the bounding box for each detected contour.

  4. Calculate the minimum enclosing circle for each contour.

  5. Calculate the area, perimeter, and centroid of each contour.

  6. Filter contours based on a minimum [area/perimeter].

  7. Perform image segmentation using K-means clustering on pixel colors.

  8. Perform semantic segmentation to classify each pixel (e.g., 'sky', 'road', 'car').

  9. Perform instance segmentation to identify individual object instances.

  10. Segment the foreground from the background using the GrabCut algorithm.

  11. Segment the image using the Watershed algorithm.
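
Semantic and instance segmentation need a trained deep model, but the contour and GrabCut prompts are plain OpenCV. A sketch with an illustrative area filter and seed rectangle:

    import cv2
    import numpy as np

    img = cv2.imread("path/to/image.jpg")
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    annotated = cv2.drawContours(img.copy(), contours, -1, (0, 255, 0), 2)

    for c in contours:
        area = cv2.contourArea(c)
        if area < 100:                                 # filter out small contours
            continue
        perimeter = cv2.arcLength(c, True)
        x, y, w, h = cv2.boundingRect(c)
        M = cv2.moments(c)
        cx, cy = int(M["m10"] / M["m00"]), int(M["m01"] / M["m00"])   # centroid

    # Foreground/background split with GrabCut, seeded by a rough rectangle.
    mask = np.zeros(img.shape[:2], np.uint8)
    bgd, fgd = np.zeros((1, 65), np.float64), np.zeros((1, 65), np.float64)
    cv2.grabCut(img, mask, (50, 50, 300, 300), bgd, fgd, 5, cv2.GC_INIT_WITH_RECT)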

9. Object Detection

  1. Detect all [faces] in the image using a Haar cascade classifier.

  2. Detect all [cars, pedestrians] in the image using a pre-trained [YOLO] model.

  3. Detect all [objects] in the image using a pre-trained [SSD/Faster R-CNN] model.

  4. Draw bounding boxes and confidence scores for all detected objects.

  5. Perform template matching to find all occurrences of a [template_image] within the main image.

  6. Count the number of [objects/blobs] in the image.

  7. Find all connected components in the binary image.
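
YOLO, SSD, and Faster R-CNN detections require a downloaded pre-trained network, so the sketch below sticks to the parts that ship with OpenCV itself: a Haar-cascade face detector, template matching, and connected-component counting. The template file name is a placeholder.

    import cv2

    img = cv2.imread("path/to/image.jpg")
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Haar-cascade face detection with the cascade file bundled with OpenCV.
    cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in faces:
        cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)

    # Template matching: location of the best match of a smaller patch.
    template = cv2.imread("template_image.png", cv2.IMREAD_GRAYSCALE)
    result = cv2.matchTemplate(gray, template, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(result)

    # Connected components as a simple blob counter on a binary mask.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    num_labels, labels = cv2.connectedComponents(binary)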

10. Recognition & Classification

  1. Classify the entire image into a category (e.g., 'cat', 'dog', 'landscape') using a pre-trained [ResNet/EfficientNet].

  2. Extract the top [N] predicted labels and their confidence probabilities.

  3. Perform Optical Character Recognition (OCR) to extract all text from the image.

  4. Detect and read license plates from the image.

  5. Detect [facial landmarks] (e.g., eyes, nose, mouth).

  6. Perform facial recognition to identify the [person's name] against a database.

  7. Detect and recognize [hand gestures].

  8. Perform human pose estimation and extract the [body keypoint skeleton].

  9. Determine if the image is [blurry/in focus].
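
Most recognition prompts rely on pre-trained deep models. A hedged sketch assuming torchvision is available for classification, with OCR left as a comment (pytesseract needs the separate Tesseract binary) and a common variance-of-Laplacian blur check at the end.

    import cv2
    import torch
    from PIL import Image
    from torchvision import models

    # Top-5 classification with a pre-trained ResNet-50 (torchvision >= 0.13 weights API).
    weights = models.ResNet50_Weights.DEFAULT
    model = models.resnet50(weights=weights).eval()
    image = Image.open("path/to/image.jpg").convert("RGB")
    with torch.no_grad():
        probs = model(weights.transforms()(image).unsqueeze(0)).softmax(dim=1)
    top5 = torch.topk(probs, 5)

    # OCR is commonly done with pytesseract:
    #   import pytesseract; text = pytesseract.image_to_string(image)

    # Blur heuristic: low variance of the Laplacian suggests an out-of-focus image.
    gray = cv2.imread("path/to/image.jpg", cv2.IMREAD_GRAYSCALE)
    is_blurry = cv2.Laplacian(gray, cv2.CV_64F).var() < 100.0    # threshold is illustrative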

11. Advanced & Generative Analysis

  1. Perform image inpainting to [remove an object] and fill the background.

  2. Colorize a [grayscale] image.

  3. Perform image-to-image translation (e.g., [day to night/sketch to photo]).

  4. Generate a text caption for the image.

  5. Perform Visual Question Answering (VQA) based on the image and the prompt: "[Question?]".

  6. Detect anomalies or defects in [manufacturing/medical] images.

  7. Stitch multiple images together to create a [panorama].

  8. Estimate the [depth map] from a [single/stereo] image.

  9. Reconstruct a 3D model from [multiple 2D images].

  10. Determine if the image is [AI-generated/manipulated/a deepfake].
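
Captioning, VQA, colorization, and deepfake detection all need dedicated generative or learned models, but inpainting, panorama stitching, and stereo depth are built into OpenCV. A sketch with placeholder file names and an arbitrary inpainting mask:

    import cv2
    import numpy as np

    img = cv2.imread("path/to/image.jpg")

    # Inpainting: remove the pixels under a mask and fill them from the surroundings.
    mask = np.zeros(img.shape[:2], np.uint8)
    mask[100:150, 200:260] = 255                       # region to remove
    inpainted = cv2.inpaint(img, mask, 3, cv2.INPAINT_TELEA)

    # Panorama stitching from overlapping views.
    images = [cv2.imread(p) for p in ["left.jpg", "middle.jpg", "right.jpg"]]
    status, panorama = cv2.Stitcher_create().stitch(images)
    if status == cv2.Stitcher_OK:
        cv2.imwrite("panorama.jpg", panorama)

    # Depth as disparity from a rectified stereo pair.
    left = cv2.imread("left_rectified.png", cv2.IMREAD_GRAYSCALE)
    right = cv2.imread("right_rectified.png", cv2.IMREAD_GRAYSCALE)
    disparity = cv2.StereoBM_create(numDisparities=64, blockSize=15).compute(left, right)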

12. Evaluation & Output

  1. Calculate the Intersection over Union (IoU) between [predicted_box] and [ground_truth_box].

  2. Calculate the [accuracy/precision/recall] for the classification model.

  3. Generate a confusion matrix for the classifier.

  4. Overlay the [segmentation mask] on the original image with [N]% transparency.

  5. Save the processed image with all annotations to [output/path.png].
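
Classification metrics and confusion matrices are usually computed with scikit-learn's metrics module; IoU and mask overlays are simple enough to write directly. A closing sketch with illustrative boxes and a placeholder mask file:

    import cv2
    import numpy as np

    def iou(box_a, box_b):
        # Intersection over Union of two [x1, y1, x2, y2] boxes.
        x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
        x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
        inter = max(0, x2 - x1) * max(0, y2 - y1)
        area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
        area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
        return inter / float(area_a + area_b - inter)

    print(iou([50, 50, 150, 150], [100, 100, 200, 200]))      # prints roughly 0.14

    # Overlay a binary mask at 40% opacity and save the annotated result.
    img = cv2.imread("path/to/image.jpg")
    mask = cv2.imread("mask.png", cv2.IMREAD_GRAYSCALE)
    overlay = img.copy()
    overlay[mask > 0] = (0, 0, 255)                           # paint mask pixels red (BGR)
    blended = cv2.addWeighted(overlay, 0.4, img, 0.6, 0)
    cv2.imwrite("output/annotated.png", blended)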
