Tags: Open Source Community, Real Estate, Artificial Intelligence

Real Estate Dataset

This dataset contains real‑estate information intended for a vector‑search‑based property recommendation system. Each attribute is embedded to support natural‑language queries and AI‑enhanced response generation.

Source: GitHub
Created: Oct 2, 2024
Updated: Oct 2, 2024
Signals: 227 views
Availability: Linked source ready
Overview

Dataset description and usage context

Real Estate Vector Search API Dataset Overview

The project implements a vector‑search‑based real‑estate recommendation system using MongoDB, OpenAI embeddings, and Flask. Users can issue natural‑language queries to search for properties, leveraging vector similarity to retrieve relevant listings and generate AI‑enhanced responses.

Dataset

  • File name: dataset.csv
  • Location: /data/dataset.csv
  • Purpose: Contains real‑estate records used to generate embedding vectors stored in MongoDB.

Data Loading and Embedding

  • Script: load_data.py
  • Location: /scripts/load_data.py
  • Function:
    • Load real‑estate data
    • Generate an embedding vector for each property
    • Store data and vectors in MongoDB
    • Create the required vector‑search index
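
The loading and embedding steps above can be sketched as follows. This is a minimal illustration, not the actual contents of load_data.py: the helper names (`row_to_text`, `build_documents`) and the injected `embed` callable are hypothetical, and only the `embedding_vector` field name is taken from the pipeline under Technical Details. In the real script, `embed` would presumably wrap a call to the OpenAI embeddings API and the resulting documents would be written to MongoDB; both are left out so the sketch runs with the standard library alone.

```python
import csv
import io
from typing import Callable, Dict, Iterable, List


def row_to_text(row: Dict[str, str]) -> str:
    """Flatten one CSV row into a 'key: value' string suitable for embedding."""
    # Skip empty cells so they do not add noise to the embedded text.
    return ", ".join(f"{key}: {value}" for key, value in row.items() if value)


def build_documents(
    rows: Iterable[Dict[str, str]],
    embed: Callable[[str], List[float]],
) -> List[dict]:
    """Attach an embedding vector to every row, ready for insertion."""
    documents = []
    for row in rows:
        doc = dict(row)
        # Field name matches the "path" used by the $vectorSearch stage.
        doc["embedding_vector"] = embed(row_to_text(row))
        documents.append(doc)
    return documents


def fake_embed(text: str) -> List[float]:
    """Stand-in for a real embedding call (assumption: returns a list of floats)."""
    return [float(len(text)), float(text.count(","))]


# Tiny in-memory CSV; the column names are borrowed from the projection example.
sample_csv = io.StringIO("brokered_by,status,price\n101,for_sale,150000\n102,sold,\n")
docs = build_documents(csv.DictReader(sample_csv), fake_embed)
```

Passing the embedding function in as a parameter keeps the row handling testable without network access or an API key.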

Technology Stack

  • Programming Language: Python 3.8+
  • Framework: Flask
  • Database: MongoDB Atlas
  • API: OpenAI API

Installation and Execution

  1. Clone the repository:
    git clone https://github.com/yourusername/vector_search_project.git
    cd vector_search_project
    
  2. Create and activate a virtual environment:
    python -m venv venv
    source venv/bin/activate   # Windows: `venv\Scripts\activate`
    
  3. Install dependencies:
    pip install -r requirements.txt
    
  4. Set environment variables by creating a .env file containing:
    OPENAI_API_KEY=your_openai_api_key
    MONGO_URI=your_mongodb_connection_string
    
  5. Load data and generate embeddings:
    python scripts/load_data.py
    
  6. Start the application:
    python app.py
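
Steps 5 and 6 depend on both variables from step 4 being present when the scripts start. The sketch below shows one way to fail fast if either is missing. `load_settings` is a hypothetical helper, not part of the repository, and it reads from a mapping such as `os.environ`; how the project itself loads the .env file is not stated here.

```python
import os
from typing import Dict, Mapping

REQUIRED_VARS = ("OPENAI_API_KEY", "MONGO_URI")


def load_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return the required settings, or raise if any is missing or empty."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}


# Example with an explicit mapping instead of the real process environment.
settings = load_settings(
    {"OPENAI_API_KEY": "your_openai_api_key", "MONGO_URI": "your_mongodb_connection_string"}
)
```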
    

API Usage

  • Endpoint: POST /vector_search
  • Request Body:
    {
      "query": "3 bedroom house in Aguadilla under $200,000"
    }
    
  • Response:
    {
      "response": "Detailed AI‑generated response about matching properties",
      "source_information": "Information about the properties used to generate the response"
    }
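
A request like the one above could be sent from Python with the standard library alone. The base URL `http://127.0.0.1:5000` is an assumption (Flask's default development address); the actual host and port depend on how app.py is started. `build_search_request` and `search` are hypothetical helper names.

```python
import json
import urllib.request

DEFAULT_BASE_URL = "http://127.0.0.1:5000"  # assumption: Flask development default


def build_search_request(query: str, base_url: str = DEFAULT_BASE_URL) -> urllib.request.Request:
    """Build a POST /vector_search request with a JSON body."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/vector_search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(query: str, base_url: str = DEFAULT_BASE_URL) -> dict:
    """Send the request and decode the JSON reply (requires the app to be running)."""
    with urllib.request.urlopen(build_search_request(query, base_url)) as reply:
        return json.loads(reply.read().decode("utf-8"))


# Building the request does not touch the network, so this line is safe to run offline.
request = build_search_request("3 bedroom house in Aguadilla under $200,000")
```

With the server running, `search(...)` would return a dictionary whose `response` and `source_information` keys follow the response shape shown above.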
    

Example Queries

  1. Basic location and bedroom query:
    {"query": "3 bedroom houses in Aguadilla"}
    
  2. Price‑range query:
    {"query": "homes under $150,000 in San Juan"}
    
  3. Complex feature query:
    {"query": "large houses with more than 2000 square feet and a pool"}
    

Technical Details

  • Vector Search Implementation: Uses MongoDB Atlas Vector Search through the following aggregation pipeline, where query_embedding is the embedding vector computed for the user's query:
    pipeline = [
        {
            "$vectorSearch": {
                "index": "vector_index",
                "queryVector": query_embedding,
                "path": "embedding_vector",
                "numCandidates": 150,
                "limit": 5
            }
        },
        {
            "$project": {
                "_id": 0,
                "brokered_by": 1,
                "status": 1,
                "price": 1,
                # ... other fields
            }
        }
    ]
    
  • Embedding Generation: Uses OpenAI’s text-embedding-3-small model.
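
To illustrate what the `$vectorSearch` stage computes conceptually, here is a brute-force top-k search in plain Python. It is a stand-in, not MongoDB's implementation: Atlas searches an approximate nearest-neighbour index, and cosine similarity is assumed here because the similarity function configured on `vector_index` is not stated. The toy 2-dimensional vectors and the `top_k` helper are made up for the example.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # A zero vector has no direction; treat it as matching nothing.
    return dot / norm if norm else 0.0


def top_k(query: Sequence[float], documents: List[Dict], limit: int = 5) -> List[Dict]:
    """Score every document exactly and keep the `limit` best matches."""
    ranked = sorted(
        documents,
        key=lambda doc: cosine_similarity(query, doc["embedding_vector"]),
        reverse=True,
    )
    return ranked[:limit]


listings = [
    {"price": 150000, "embedding_vector": [1.0, 0.0]},
    {"price": 320000, "embedding_vector": [0.0, 1.0]},
    {"price": 180000, "embedding_vector": [0.8, 0.6]},
]
best = top_k([1.0, 0.1], listings, limit=2)
```

In the pipeline above, `numCandidates` bounds how many approximate neighbours are considered before the `limit` best are returned. The exhaustive version here has no such trade-off, but its cost grows linearly with the size of the collection.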

Frequently Asked Questions

  1. No results returned:
    • Verify that the vector index was correctly created.
    • Ensure documents contain embedding vectors.
    • Confirm that the query embedding dimensionality matches the stored vectors.
  2. MongoDB connection issues:
    • Check the MongoDB URI in the .env file.
    • Make sure your client's IP address is on the IP access list in MongoDB Atlas.
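
The document-level checks in the first FAQ item can be automated. The sketch below scans documents (plain dicts here; in practice they would be read from the MongoDB collection) and reports those with a missing or wrongly sized `embedding_vector`. The default expected size of 1536 is the default output dimension of text-embedding-3-small and is an assumption about this project, since the model can also be asked for shorter vectors. `find_embedding_problems` is a hypothetical helper.

```python
from typing import Dict, Iterable, List, Tuple


def find_embedding_problems(
    documents: Iterable[Dict], expected_dim: int = 1536
) -> List[Tuple[int, str]]:
    """Return (position, reason) pairs for documents unusable in vector search."""
    problems = []
    for position, doc in enumerate(documents):
        vector = doc.get("embedding_vector")
        if not isinstance(vector, list) or not vector:
            problems.append((position, "missing embedding_vector"))
        elif len(vector) != expected_dim:
            problems.append((position, f"dimension {len(vector)} != {expected_dim}"))
    return problems


# Toy data with 3-dimensional vectors so the example stays readable.
sample = [
    {"price": 150000, "embedding_vector": [0.1, 0.2, 0.3]},
    {"price": 99000},
    {"price": 210000, "embedding_vector": [0.1, 0.2]},
]
report = find_embedding_problems(sample, expected_dim=3)
```

An empty report does not rule out index problems; it only confirms that the stored vectors are present and consistently sized.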

Contributing

  1. Fork the repository
  2. Create a branch for your feature
  3. Commit your changes
  4. Push the branch
  5. Open a Pull Request

License

This project is released under the MIT License – see the LICENSE file for details.
