Predict masked tokens in a sequence — the core pre-training objective for BERT-style models.
182k models
74k models
22k models
18k models