KShivendu/dbpedia-entities-openai-1M
OpenAI 1M with DBPedia Entities is a dataset of one million samples intended for feature-extraction tasks. Each sample has an `_id`, a `title`, a `text`, and an `openai` field holding a 1536-dimensional float32 embedding generated with the text-embedding-ada-002 model. The text is in English. The dataset was created in June 2023 to benchmark pgvector and VectorDB (Qdrant) performance, with a later expansion to ten million vectors planned. It is derived from the first one million entries of the BeIR/DBpedia-Entity dataset.
Description
Dataset Overview
Basic Information
- License: MIT
- Size: 1M < n < 10M
- Language: English (en)
Features
- _id: string
- title: string
- text: string
- openai: sequence of float32 (1536‑dimensional)
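The feature list above can be checked programmatically before records are loaded into a vector store. The following is a minimal sketch, not an official loader: `validate_record` and `iter_records` are hypothetical helper names, and `iter_records` assumes the Hugging Face `datasets` package is installed and that the training split is exposed under the name "train".

```python
from typing import Any, Iterator, Mapping

EMBEDDING_DIM = 1536  # dimension stated in the feature list
EXPECTED_FIELDS = ("_id", "title", "text", "openai")


def validate_record(record: Mapping[str, Any]) -> None:
    """Raise ValueError if a record does not match the documented schema."""
    missing = [name for name in EXPECTED_FIELDS if name not in record]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    for name in ("_id", "title", "text"):
        if not isinstance(record[name], str):
            raise ValueError(f"{name} should be a string")
    vector = record["openai"]
    if len(vector) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM} values, got {len(vector)}")


def iter_records(limit: int) -> Iterator[Mapping[str, Any]]:
    """Stream the first `limit` records without downloading the full dataset.

    Assumption: the `datasets` package is installed and the split is "train".
    """
    from datasets import load_dataset  # lazy import; optional dependency

    stream = load_dataset(
        "KShivendu/dbpedia-entities-openai-1M", split="train", streaming=True
    )
    for index, record in enumerate(stream):
        if index >= limit:
            break
        yield record
```

A typical use would be `for record in iter_records(100): validate_record(record)` as a sanity check before a bulk import.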
Splits
- Training Set:
  - Samples: 1,000,000
  - Size: 12,383,152 bytes
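When planning a benchmark, it can help to estimate how much memory the raw vectors alone occupy. The sketch below is only arithmetic on the sample count, the 1536 dimension, and the 4-byte float32 width; it ignores the text fields and any index overhead, and it is independent of the size figure listed in the card. The function name `raw_embedding_bytes` is hypothetical.

```python
NUM_VECTORS = 1_000_000  # samples in the training split
EMBEDDING_DIM = 1536     # values per embedding
BYTES_PER_FLOAT32 = 4    # a float32 occupies 4 bytes


def raw_embedding_bytes(num_vectors: int, dim: int) -> int:
    """Bytes needed to hold the vectors as a dense float32 array."""
    return num_vectors * dim * BYTES_PER_FLOAT32


raw_bytes = raw_embedding_bytes(NUM_VECTORS, EMBEDDING_DIM)
print(raw_bytes)                        # 6144000000
print(f"{raw_bytes / 1024**3:.2f} GiB")  # 5.72 GiB
```

At the planned ten million vectors, the same calculation gives ten times this figure.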
Task Category
- Feature Extraction
Dataset Name
- Pretty Name: OpenAI 1M with DBPedia Entities
Embedding Details
- Dimension: 1536
- Embedding Text: title (string) + text (string)
- Model: text-embedding-ada-002
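The embedding details above suggest two small building blocks for a benchmark: reconstructing the text that was embedded, and computing exact nearest neighbours to serve as ground truth when measuring the recall of an approximate index. The sketch below uses only the standard library. The single-space separator in `build_embedding_text` is an assumption, since the card does not say how title and text were joined, and all function names are hypothetical.

```python
import math
from typing import List, Sequence


def build_embedding_text(title: str, text: str, separator: str = " ") -> str:
    """Join title and text; the separator is an assumption, not documented."""
    return f"{title}{separator}{text}"


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float], vectors: Sequence[Sequence[float]], k: int) -> List[int]:
    """Exact (brute-force) search: indices of the k most similar vectors."""
    ranked = sorted(
        range(len(vectors)),
        key=lambda i: cosine_similarity(query, vectors[i]),
        reverse=True,
    )
    return ranked[:k]
```

Brute-force search is linear in the number of vectors, so in practice it would be run on a small query sample and compared against the results returned by the index under test.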
Source
Organization: hugging_face
Created: Unknown