JUHE API Marketplace

omegalabsinc/omega-multimodal

The OMEGA Labs Bittensor Subnet dataset is a multimodal dataset aimed at accelerating artificial general intelligence (AGI) research and development. Provided via the Bittensor decentralized network, it aspires to become the world's largest multimodal dataset, encompassing a wide range of human knowledge and creation. The dataset includes over 1 million hours of video and more than 30 million two-minute video clips, covering over 50 scene types and more than 15,000 action phrases. Advanced models are used to map video components into a unified latent space, facilitating the development of powerful AGI models with potential impact across multiple industries.

Updated 4/21/2025

Description

OMEGA Labs Bittensor Subnet Dataset Summary

Overview

The OMEGA Labs Bittensor Subnet Dataset is designed to accelerate Artificial General Intelligence (AGI) research by providing a large-scale, multimodal dataset. This dataset includes over 1 million hours of footage and more than 30 million 2-minute video clips, covering over 50 scenarios and 15,000+ action phrases. It leverages advanced models to translate video components into a unified latent space, facilitating the development of AGI models.

Key Features

  • Constant Stream of Fresh Data: The dataset is regularly updated, with an estimated 5 million new videos added daily.
  • Rich Data: Data quality is ensured through a reward system based on diversity, richness, and relevance of the data.
  • Latent Representations: Pre-computed ImageBind embeddings for video, audio, and captions are provided.
  • Empowering Digital Agents: The dataset supports the development of intelligent agents capable of complex task navigation and user assistance.
  • Flexible Metadata: Users can filter the dataset by various criteria, including topic relevance and cosine similarity.
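
The similarity-based filtering described above can be sketched as follows. This is an illustrative stand-in, not the dataset's own tooling: the row layout mirrors the documented column names, but the embedding dimension, threshold, and sample values are arbitrary placeholders (real ImageBind embeddings are much higher-dimensional).

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two 1-D embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def filter_by_similarity(rows, query_embed, threshold=0.5):
    """Keep rows whose precomputed video_embed scores at or above the threshold."""
    return [
        row for row in rows
        if cosine_similarity(row["video_embed"], query_embed) >= threshold
    ]

# Toy 4-dimensional embeddings, purely for illustration.
rows = [
    {"video_id": "a", "video_embed": [1.0, 0.0, 0.0, 0.0]},
    {"video_id": "b", "video_embed": [0.0, 1.0, 0.0, 0.0]},
]
query = [0.9, 0.1, 0.0, 0.0]
matches = filter_by_similarity(rows, query, threshold=0.5)
# Only clip "a" is close enough to the query embedding to survive the filter.
```

The same pattern would apply to the audio and description embeddings, since all three modalities are mapped into the same latent space.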

Dataset Structure

The dataset includes the following columns:

  • video_id: Unique identifier for each video clip.
  • youtube_id: Original YouTube video ID.
  • description: Description of the video content.
  • views: Number of views on the original YouTube video.
  • start_time: Start time of the clip within the original video.
  • end_time: End time of the clip within the original video.
  • video_embed: Latent representation of the video content.
  • audio_embed: Latent representation of the audio content.
  • description_embed: Latent representation of the video description.
  • description_relevance_score: Relevance score of the video description.
  • query_relevance_score: Relevance score of the video to the search query.
  • query: Search query used to retrieve the video.
  • submitted_at: Timestamp of when the video was added to the dataset.
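
As a concrete illustration of this schema, the sketch below builds a record with the documented columns and derives the clip length from its time offsets. Every value here, including the YouTube ID and scores, is a fabricated placeholder rather than a real dataset entry.

```python
# Hypothetical record matching the documented column layout.
# Embedding vectors are truncated placeholders.
record = {
    "video_id": "clip_0001",
    "youtube_id": "abc123XYZ_0",           # placeholder, not a real YouTube ID
    "description": "A person assembling furniture",
    "views": 1200,
    "start_time": 30.0,                    # seconds into the source video
    "end_time": 150.0,
    "video_embed": [0.12, -0.03],          # truncated latent vectors
    "audio_embed": [0.07, 0.21],
    "description_embed": [-0.05, 0.14],
    "description_relevance_score": 0.81,
    "query_relevance_score": 0.74,
    "query": "furniture assembly",
    "submitted_at": "2025-04-21T00:00:00Z",
}

def clip_duration(rec):
    """Length of the clip in seconds, from its start/end offsets."""
    return rec["end_time"] - rec["start_time"]

def is_relevant(rec, min_score=0.7):
    """Simple gate on both precomputed relevance scores (threshold is arbitrary)."""
    return (rec["description_relevance_score"] >= min_score
            and rec["query_relevance_score"] >= min_score)

duration = clip_duration(record)  # 120.0 seconds, consistent with the 2-minute clips
```

In practice the dataset is hosted on Hugging Face, so records would typically be read with the `datasets` library rather than constructed by hand as above.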

Applications

The dataset is applicable for various AGI research and development tasks, including:

  • Unified Representation Learning: Training models to learn across different modalities.
  • Any-to-Any Models: Developing models that can translate between various modalities.
  • Digital Agents: Creating intelligent agents for complex task management.
  • Immersive Gaming: Enhancing gaming environments with realistic physics and interactions.
  • Video Understanding: Advancing video processing tasks like transcription, motion analysis, and object detection.



Topics

Artificial General Intelligence
Multimodal Learning

Source

Organization: hugging_face

Created: Unknown
