KShivendu/dbpedia-entities-openai-1M

OpenAI 1M with DBPedia Entities is a dataset of one million samples intended for feature-extraction tasks. Each sample has an `_id`, a `title`, a `text`, and an `openai` field holding a 1536-dimensional float32 embedding generated with the text-embedding-ada-002 model. The text is in English. The dataset was created in June 2023 to benchmark pgvector and the Qdrant vector database, with a planned expansion to ten million vectors. It is derived from the first one million entries of the BeIR/DBpedia-Entity dataset.
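As a rough illustration of how the samples described above might be inspected, the sketch below streams a few rows instead of downloading the full dataset. It assumes the Hugging Face `datasets` package is installed and that the split is named `train`; the helper name `stream_first_rows` is hypothetical and not part of any library.

```python
from itertools import islice

# Repository name as given in the title of this page.
DATASET_ID = "KShivendu/dbpedia-entities-openai-1M"


def stream_first_rows(n=3, split="train"):
    """Yield the first n rows without downloading the whole dataset.

    Assumptions (not stated on this page): the `datasets` package is
    installed and the split is called "train".
    """
    # Imported lazily so that defining this helper needs no extra packages.
    from datasets import load_dataset

    # streaming=True returns an iterable that fetches rows on demand.
    rows = load_dataset(DATASET_ID, split=split, streaming=True)
    yield from islice(rows, n)
```

Iterating over `stream_first_rows()` and printing `row["title"]` together with `len(row["openai"])` would be one quick way to confirm the field names and the embedding length. Network access is required once iteration starts.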

Updated 2/19/2024
hugging_face

Description

Dataset Overview

Basic Information

  • License: MIT
  • Size category: 1M < n < 10M
  • Language: English (en)

Features

  • _id: string
  • title: string
  • text: string
  • openai: sequence of float32 (1536-dimensional)
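The feature list above can be turned into a small sanity check for individual records. This is a minimal sketch that assumes each record arrives as a plain Python dict with the embedding as a list of floats; the function name `validate_record` is hypothetical.

```python
EMBEDDING_DIM = 1536  # dimension stated in the feature list


def validate_record(record):
    """Return a list of schema problems; an empty list means the record passes."""
    problems = []
    # The three text fields are all declared as strings.
    for key in ("_id", "title", "text"):
        if not isinstance(record.get(key), str):
            problems.append(f"{key} should be a string")
    # The embedding must be a sequence of exactly EMBEDDING_DIM floats.
    vector = record.get("openai")
    if not isinstance(vector, (list, tuple)):
        problems.append("openai should be a sequence of floats")
    elif len(vector) != EMBEDDING_DIM:
        problems.append(f"openai has {len(vector)} values, expected {EMBEDDING_DIM}")
    elif not all(isinstance(v, float) for v in vector):
        problems.append("openai should contain only floats")
    return problems
```

If the embeddings are loaded as NumPy arrays rather than lists, the sequence check would need to be adapted accordingly.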

Splits

  • Training Set:
    • Samples: 1,000,000
    • Size: 12,383,152 bytes
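A quick back-of-the-envelope check helps put the listed size in context. The arithmetic below uses only the figures from this page and 4 bytes per float32 value. It suggests that the reported number cannot be the size of the uncompressed embeddings, although this page does not say what it measures instead.

```python
NUM_SAMPLES = 1_000_000      # from the split description
EMBEDDING_DIM = 1536         # from the feature list
BYTES_PER_FLOAT32 = 4
REPORTED_BYTES = 12_383_152  # size listed for the training split

# Raw storage needed for the embedding vectors alone.
raw_embedding_bytes = NUM_SAMPLES * EMBEDDING_DIM * BYTES_PER_FLOAT32

print(raw_embedding_bytes)                    # 6144000000
print(round(raw_embedding_bytes / 2**30, 2))  # 5.72 (GiB)
print(REPORTED_BYTES / NUM_SAMPLES)           # about 12.4 bytes per sample
```

At roughly 12 bytes per sample, the listed size is smaller than a single embedding (6,144 bytes), so it is best treated as reported metadata rather than as a download or memory estimate.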

Task Category

  • Feature Extraction

Dataset Name

  • Pretty Name: OpenAI 1M with DBPedia Entities

Embedding Details

  • Dimension: 1536
  • Embedding Text: title (string) + text (string)
  • Model: text-embedding-ada-002
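Because the dataset was built for benchmarking vector search, an exact brute-force search is a natural reference point for judging approximate indexes. The sketch below ranks rows by cosine similarity in pure Python. It is an illustrative baseline, not the method used by the dataset's authors, and the helper names `cosine_similarity` and `exact_top_k` are hypothetical.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def exact_top_k(query, rows, k=3):
    """Return the k (similarity, _id) pairs most similar to query, best first."""
    scored = ((cosine_similarity(query, row["openai"]), row["_id"]) for row in rows)
    return heapq.nlargest(k, scored)


# Toy 3-dimensional vectors stand in for the real 1536-dimensional embeddings.
toy_rows = [
    {"_id": "a", "openai": [1.0, 0.0, 0.0]},
    {"_id": "b", "openai": [0.0, 1.0, 0.0]},
    {"_id": "c", "openai": [1.0, 1.0, 0.0]},
]
print([doc_id for _, doc_id in exact_top_k([1.0, 0.0, 0.0], toy_rows, k=2)])  # ['a', 'c']
```

A pure-Python scan like this is far too slow for one million vectors; it is meant only to show what result an approximate index is expected to reproduce on a small sample.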

Topics

Natural Language Processing
Text Embedding

Source

Organization: hugging_face

Created: Unknown
