Featured Datasets
- XIfr/Openalex-2005-2025
- nick007x/arxiv-papers
- Building-Energy-LCA-HEIG-VD/EcoDynElec-Results
- nguyenphuc2003/nguyenphuc2003
- hourouu/Unet_data
- sayakpaul/sample-datasets
- natgillin/translations-raw
  Frozen, canonical raw bitext consolidated from upstream alvations/mtdata-raw* snapshots (since deleted). This is the read-only source of truth for downstream quality-filtering pipelines.
  - 31,663 parquet files (1566.8 GB)
  - 49 language pairs under data/<src-tgt>/
  - Schema: 5 columns (see below)
  - Read-only for downstream pipelines. Do not delete or modify.
  Schema
  Each parquet file has 5 columns:
  column | type | description
  source | string | …
  See the full description on the dataset page: https://huggingface.co/datasets/natgillin/translations-raw.
- Konst12/1
- disentangled-vla/rw
- InteliLab/ict_s2s_refactored
- spacenship/whiskeyClassification
- lethanhhai1986/lethanhhai1986
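The natgillin/translations-raw entry above says its parquet files are grouped by language pair under data/<src-tgt>/. A minimal sketch of reading one pair without downloading the full 1566.8 GB might look like the following. It assumes the Hugging Face `datasets` library is installed, that the files sit directly inside each pair directory with a `.parquet` extension, and that a pair such as `en-de` exists; the helper names `pair_glob` and `stream_pair` are hypothetical, and none of these details are confirmed by the listing.

```python
def pair_glob(src: str, tgt: str) -> str:
    """Build a glob for one language pair under data/<src-tgt>/.

    Assumption: parquet files live directly in the pair directory.
    """
    return f"data/{src}-{tgt}/*.parquet"


def stream_pair(src: str, tgt: str):
    """Stream rows for one language pair instead of downloading everything."""
    # Imported lazily so pair_glob works even without the library installed.
    from datasets import load_dataset

    return load_dataset(
        "natgillin/translations-raw",
        data_files=pair_glob(src, tgt),
        split="train",
        streaming=True,  # read-only access; nothing in the repo is modified
    )


if __name__ == "__main__":
    # "en" and "de" are placeholder codes, not confirmed pairs in the dataset.
    print(pair_glob("en", "de"))
```

Streaming suits a frozen, read-only source like this one, since a downstream filtering pipeline can iterate over rows without keeping a local copy.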
Discover
Datasets
Browse public and private datasets with the same focused discovery workflow used by models.