Tasks
Explore ML tasks
Find the right models, datasets, and demos for each ML task area.
Natural Language Processing
Text Generation
Generate text given a prompt, including open-ended generation and conditional generation.
Text Classification
Classify text into predefined categories such as sentiment, topic, or intent.
Token Classification
Label individual tokens in a sequence, as in named entity recognition (NER), part-of-speech (POS) tagging, and chunking.
Question Answering
Extract or generate answers to questions based on a given context passage.
Summarization
Produce a shorter version of a document while preserving key information.
Translation
Convert text from one natural language to another.
Fill-Mask
Predict masked tokens in a sequence, which is the core pre-training objective for BERT-style models.
Sentence Similarity
Compute semantic similarity scores between pairs of sentences.
Feature Extraction
Extract dense vector embeddings from text for downstream tasks like search and clustering.
Zero-Shot Classification
Classify text into categories never seen during training using natural language labels.
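As a rough illustration of how the Sentence Similarity and Feature Extraction entries above fit together, the sketch below scores sentence pairs by the cosine similarity of their embedding vectors. The three-dimensional vectors are hand-written placeholders, not the output of any real model, and `cosine_similarity` is a hypothetical helper defined here rather than a library function. Cosine similarity is one common choice of score, assumed here for illustration.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical embeddings, hand-written for illustration only.
# A real feature-extraction model would produce much longer vectors.
embeddings = {
    "A cat sits on the mat.": [0.9, 0.1, 0.0],
    "A kitten rests on the rug.": [0.8, 0.2, 0.1],
    "Stock prices fell sharply.": [0.0, 0.1, 0.9],
}

query = "A cat sits on the mat."

# Score every other sentence against the query sentence.
scores = {
    sentence: cosine_similarity(embeddings[query], vector)
    for sentence, vector in embeddings.items()
    if sentence != query
}

# The highest-scoring sentence is the closest match under this measure.
best_match = max(scores, key=scores.get)
print(best_match)
```

The same scoring step underlies embedding-based search and clustering: items are compared by the vectors a model assigns them rather than by their surface text.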
Computer Vision
Image Classification
Assign one or more labels to an input image from a fixed set of categories.
Object Detection
Locate multiple objects within an image with bounding boxes and classify each one.
Image Segmentation
Assign a label to each pixel in an image, grouping pixels by class (semantic segmentation) or by individual object (instance segmentation).
Depth Estimation
Predict per-pixel depth maps from a single RGB image.
Image-to-Image
Transform or enhance an input image, for example through style transfer, super-resolution, or inpainting.
Text-to-Image
Generate photorealistic or artistic images from natural language prompts.
Image-to-Text
Generate descriptive text captions or OCR output from images.
Video Classification
Label video clips with action or event categories.
Audio
Automatic Speech Recognition
Transcribe spoken audio to text, a core task for voice assistants and captioning.
Text-to-Speech
Convert written text to natural-sounding speech audio.
Audio Classification
Classify audio clips into categories such as music genre, speaker, or sound event.
Audio-to-Audio
Transform audio signals, for example through noise reduction, speech enhancement, or source separation.
Multimodal
Visual Question Answering
Answer natural language questions about the content of an image.
Document Question Answering
Extract answers from structured documents such as PDFs, forms, and charts.
Image-to-Video
Animate a still image into a short video clip.
Text-to-Video
Generate video from natural language descriptions.
Tabular & Other
Tabular Classification
Predict categorical labels from structured tabular features.
Tabular Regression
Predict continuous numerical targets from structured tabular features.
Reinforcement Learning
Train agents to maximize cumulative reward through interaction with an environment.
Graph Machine Learning
Learn from graph-structured data, with tasks such as node classification, link prediction, and graph classification.