Dataset asset, Open Source Community, Data Analysis, Video Advertising

LAMBDA

This dataset supports analysis and evaluation of video advertisement effectiveness, with a focus on how well ads are remembered. Each sample includes a video identifier (video_id), a recall score (recall_score), a YouTube video ID (youtube_id), and ad details (ad_details). The ad_details field is a structured feature with sub‑features such as Audio, Brand, and Duration. The dataset is split into a training set (1,964 samples) and a test set (219 samples). Total size is 6,102,866 bytes, with a download size of 2,551,503 bytes.

Source: Hugging Face
Created: Jul 3, 2024
Updated: Jul 3, 2024

Dataset Overview

Dataset Information

  • Name: Long Term Memorability of Advertisements (LAMBDA)
  • License: MIT
  • Task Categories:
    • Text Classification
    • Text Generation
    • Question Answering
  • Tags:
    • Memorability
    • Long‑Term Memorability
    • Advertising Memorability

Dataset Structure

  • Features:
    • video_id: Identifier of the sample, type int64
    • recall_score: Memorability score of the video, range 0–1, type float64
    • youtube_id: YouTube ID of the video, type string
    • ad_details: Ad‑level and scene‑level features for each video, containing:
      • Audio: type string
      • Brand: type string
      • Duration: type string
      • Orientation: type string
      • Pace: type string
      • Scenes: List of scene items, each with sub‑features:
        • Colors: type string
        • Description: type string
        • Emotions: type string
        • Number: type string
        • Photography Style: type string
        • Tags: type string
        • Text Shown: type string
        • Tone: type string
        • Visual Complexity: type string
      • Title: type string
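
The nested layout above can be sketched as a plain Python record. This is a minimal illustration, assuming each row arrives as a dictionary whose keys match the listed feature names and that Scenes is a list of per‑scene dictionaries. The placeholder values and the helpers `make_scene`, `flatten_scenes`, and `is_valid_recall` are hypothetical and not part of the dataset or any library.

```python
from typing import Any

# Scene-level sub-features, as listed in the dataset structure.
SCENE_KEYS = [
    "Colors", "Description", "Emotions", "Number", "Photography Style",
    "Tags", "Text Shown", "Tone", "Visual Complexity",
]


def make_scene(number: str) -> dict[str, str]:
    """Build a placeholder scene; every sub-feature is a string."""
    scene = {key: f"placeholder {key.lower()}" for key in SCENE_KEYS}
    scene["Number"] = number
    return scene


# Hypothetical record: all values are invented for illustration only.
sample: dict[str, Any] = {
    "video_id": 1,
    "recall_score": 0.5,
    "youtube_id": "placeholder_id",
    "ad_details": {
        "Audio": "placeholder audio",
        "Brand": "placeholder brand",
        "Duration": "placeholder duration",
        "Orientation": "placeholder orientation",
        "Pace": "placeholder pace",
        "Scenes": [make_scene("1"), make_scene("2")],
        "Title": "placeholder title",
    },
}


def is_valid_recall(score: float) -> bool:
    """recall_score is documented as a float in the range 0 to 1."""
    return 0.0 <= score <= 1.0


def flatten_scenes(record: dict[str, Any]) -> list[dict[str, Any]]:
    """Return one flat row per scene, repeating the ad-level fields."""
    details = record["ad_details"]
    ad_level = {k: v for k, v in details.items() if k != "Scenes"}
    rows = []
    for scene in details["Scenes"]:
        row = {
            "video_id": record["video_id"],
            "recall_score": record["recall_score"],
            **ad_level,
        }
        # Prefix scene keys so they cannot collide with ad-level keys.
        row.update({f"scene_{k}": v for k, v in scene.items()})
        rows.append(row)
    return rows


rows = flatten_scenes(sample)
```

Flattening to one row per scene is only one possible design; it is convenient for tabular analysis but repeats the recall score across scenes of the same ad, so per‑ad statistics should be computed on the unflattened records.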

Data Splits

  • Training Set:
    • Sample count: 1,964
    • Bytes: approximately 5,490,622
  • Test Set:
    • Sample count: 219
    • Bytes: approximately 612,244

Dataset Sizes

  • Download Size: 2,551,503 bytes
  • Total Size: 6,102,866 bytes

Configuration

  • Default Configuration:
    • Training path: data/train-*
    • Test path: data/test-*
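
The two path patterns above are glob patterns that select the files belonging to each split. The sketch below shows how such patterns resolve against a file listing, using only the Python standard library. The shard file names in `repo_files` are hypothetical examples, and `files_for_split` is an illustrative helper, not part of any dataset tooling.

```python
from fnmatch import fnmatchcase

# Split patterns from the default configuration.
DATA_FILES = {"train": "data/train-*", "test": "data/test-*"}

# Hypothetical file listing; the real shard names may differ.
repo_files = [
    "README.md",
    "data/train-00000-of-00001.parquet",
    "data/test-00000-of-00001.parquet",
]


def files_for_split(split: str, files: list[str]) -> list[str]:
    """Return the files whose path matches the split's glob pattern."""
    pattern = DATA_FILES[split]
    return [path for path in files if fnmatchcase(path, pattern)]


train_files = files_for_split("train", repo_files)
test_files = files_for_split("test", repo_files)
```

`fnmatchcase` is used instead of `fnmatch` so that matching is case‑sensitive and behaves the same on every operating system.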

Citation

@misc{s2024longtermadmemorabilityunderstanding,
  title={Long‑Term Ad Memorability: Understanding and Generating Memorable Ads},
  author={Harini S I and Somesh Singh and Yaman K Singla and Aanisha Bhattacharyya and Veeky Baths and Changyou Chen and Rajiv Ratn Shah and Balaji Krishnamurthy},
  year={2024},
  eprint={2309.00378},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2309.00378}
}
