
Real Estate Dataset

This dataset contains real‑estate information intended for a vector‑search‑based property recommendation system. Each attribute is embedded to support natural‑language queries and AI‑enhanced response generation.

Updated 10/2/2024
github

Description

Real Estate Vector Search API Dataset Overview

Overview

The project implements a vector‑search‑based real‑estate recommendation system using MongoDB, OpenAI embeddings, and Flask. Users can issue natural‑language queries to search for properties, leveraging vector similarity to retrieve relevant listings and generate AI‑enhanced responses.

Dataset

  • File name: dataset.csv
  • Location: /data/dataset.csv
  • Purpose: Contains real‑estate records used to generate embedding vectors stored in MongoDB.

Data Loading and Embedding

  • Script: load_data.py
  • Location: /scripts/load_data.py
  • Function:
    • Load real‑estate data
    • Generate an embedding vector for each property
    • Store data and vectors in MongoDB
    • Create the required vector‑search index
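
The steps above might be sketched as follows. This is a minimal sketch under stated assumptions, not the repository's actual load_data.py: the helper names (row_to_text, build_documents, read_rows), the sample column names, and the choice to join every column into one string before embedding are all hypothetical. Only the embedding_vector field name comes from the pipeline shown under Technical Details. The embedding function is passed in as a parameter so the sketch runs without an API key.

```python
import csv
import io
from typing import Callable, Dict, List

Embedder = Callable[[str], List[float]]


def read_rows(csv_text: str) -> List[Dict[str, str]]:
    """Parse CSV text into one dict per property record."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def row_to_text(row: Dict[str, str]) -> str:
    """Flatten a record into a single string to embed (assumed format)."""
    return ", ".join(f"{key}: {value}" for key, value in row.items() if value != "")


def build_documents(rows: List[Dict[str, str]], embed: Embedder) -> List[dict]:
    """Attach an embedding vector to each record, ready for insertion."""
    documents = []
    for row in rows:
        document = dict(row)
        # Field name matches the "path" used by the $vectorSearch stage.
        document["embedding_vector"] = embed(row_to_text(row))
        documents.append(document)
    return documents


if __name__ == "__main__":
    sample = "status,price,city\nfor_sale,150000,Aguadilla\n"
    # Stand-in embedder for illustration only; a real one would call the
    # OpenAI embeddings endpoint with the text-embedding-3-small model.
    fake_embed = lambda text: [float(len(text))]
    print(build_documents(read_rows(sample), fake_embed))
```

In a real run, the resulting documents would then be written to MongoDB (for example with PyMongo's insert_many) before the vector-search index is created.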

Technology Stack

  • Programming Language: Python 3.8+
  • Framework: Flask
  • Database: MongoDB Atlas
  • API: OpenAI API

Installation and Execution

  1. Clone the repository:
    git clone https://github.com/yourusername/vector_search_project.git
    cd vector_search_project
    
  2. Create and activate a virtual environment:
    python -m venv venv
    source venv/bin/activate   # Windows: `venv\Scripts\activate`
    
  3. Install dependencies:
    pip install -r requirements.txt
    
  4. Set environment variables: Create a .env file containing:
    OPENAI_API_KEY=your_openai_api_key
    MONGO_URI=your_mongodb_connection_string
    
  5. Load data and generate embeddings:
    python scripts/load_data.py
    
  6. Start the application:
    python app.py
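
Steps 5 and 6 both depend on the two variables from step 4. A small pre-flight check such as the one below can make a missing value obvious before either script fails. It is only a sketch: the function name missing_settings is hypothetical, and it assumes the variables have already been loaded into the process environment (for example by a .env loader).

```python
import os
from typing import List, Mapping, Optional

# Variable names taken from the .env example in step 4.
REQUIRED_VARS = ("OPENAI_API_KEY", "MONGO_URI")


def missing_settings(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        raise SystemExit("Missing environment variables: " + ", ".join(missing))
    print("Environment looks complete.")
```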
    

API Usage

  • Endpoint: POST /vector_search
  • Request Body:
    {
      "query": "3 bedroom house in Aguadilla under $200,000"
    }
    
  • Response:
    {
      "response": "Detailed AI‑generated response about matching properties",
      "source_information": "Information about the properties used to generate the response"
    }
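
A client call to this endpoint could look like the sketch below, which uses only the Python standard library. The base URL http://localhost:5000 is an assumption (Flask's default development address), and the function names are hypothetical. The response keys are the ones listed above.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"  # assumption: Flask's default dev address


def build_search_request(query: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST /vector_search request without sending it."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/vector_search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(query: str, base_url: str = BASE_URL) -> dict:
    """Send the query and return the decoded JSON response."""
    with urllib.request.urlopen(build_search_request(query, base_url)) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Requires the application from step 6 to be running.
    result = search("3 bedroom house in Aguadilla under $200,000")
    print(result["response"])
    print(result["source_information"])
```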
    

Example Queries

  1. Basic location and bedroom query:
    {"query": "3 bedroom houses in Aguadilla"}
    
  2. Price‑range query:
    {"query": "homes under $150,000 in San Juan"}
    
  3. Complex feature query:
    {"query": "large houses with more than 2000 square feet and a pool"}
    

Technical Details

  • Vector Search Implementation: Utilises MongoDB’s vector‑search feature via the following aggregation pipeline:
    pipeline = [
        {
            "$vectorSearch": {
                "index": "vector_index",
                "queryVector": query_embedding,
                "path": "embedding_vector",
                "numCandidates": 150,
                "limit": 5
            }
        },
        {
            "$project": {
                "_id": 0,
                "brokered_by": 1,
                "status": 1,
                "price": 1,
                # ... other fields
            }
        }
    ]
    
  • Embedding Generation: Uses OpenAI’s text-embedding-3-small model.
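
The two points above are linked: the index must declare the same vector length that the embedding model produces, which is 1536 by default for text-embedding-3-small. The sketch below shows one plausible index definition and a helper that builds the pipeline. The function names are hypothetical, and the cosine similarity setting and the added score field are assumptions rather than details confirmed by the project.

```python
from typing import Any, Dict, List

# Default output size of OpenAI's text-embedding-3-small model.
EMBEDDING_DIMENSIONS = 1536


def vector_index_definition(similarity: str = "cosine") -> Dict[str, Any]:
    """Atlas Vector Search index definition (similarity is an assumption)."""
    return {
        "fields": [
            {
                "type": "vector",
                "path": "embedding_vector",
                "numDimensions": EMBEDDING_DIMENSIONS,
                "similarity": similarity,
            }
        ]
    }


def build_pipeline(
    query_embedding: List[float], limit: int = 5, num_candidates: int = 150
) -> List[Dict[str, Any]]:
    """Build the aggregation pipeline, rejecting vectors of the wrong size."""
    if len(query_embedding) != EMBEDDING_DIMENSIONS:
        raise ValueError(
            f"expected {EMBEDDING_DIMENSIONS} dimensions, got {len(query_embedding)}"
        )
    return [
        {
            "$vectorSearch": {
                "index": "vector_index",
                "queryVector": query_embedding,
                "path": "embedding_vector",
                "numCandidates": num_candidates,
                "limit": limit,
            }
        },
        {
            "$project": {
                "_id": 0,
                "brokered_by": 1,
                "status": 1,
                "price": 1,
                # Not in the original pipeline: exposes the similarity score.
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]
```

The returned list can be passed to a PyMongo collection's aggregate method. Checking the vector length up front turns a silent empty result into an explicit error.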

Frequently Asked Questions

  1. No results returned:
    • Verify that the vector index was correctly created.
    • Ensure documents contain embedding vectors.
    • Confirm that the query embedding dimensionality matches the stored vectors.
  2. MongoDB connection issues:
    • Check the MongoDB URI in the .env file.
    • Make sure your client's IP address is on the IP access list in MongoDB Atlas.
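
The "no results" checks can be partly automated. The sketch below counts documents that lack an embedding or whose vector length differs from the expected dimensionality. The function name diagnose_embeddings is hypothetical, and it works on plain dictionaries, so the documents could come from a PyMongo find() call or any other source.

```python
from typing import Dict, Iterable, Mapping


def diagnose_embeddings(
    documents: Iterable[Mapping],
    expected_dim: int,
    field: str = "embedding_vector",
) -> Dict[str, int]:
    """Count documents with missing or wrongly sized embedding vectors."""
    total = missing = wrong_dimension = 0
    for document in documents:
        total += 1
        vector = document.get(field)
        if not vector:
            missing += 1
        elif len(vector) != expected_dim:
            wrong_dimension += 1
    return {"total": total, "missing": missing, "wrong_dimension": wrong_dimension}


if __name__ == "__main__":
    sample = [
        {"price": 150000, "embedding_vector": [0.1, 0.2, 0.3]},
        {"price": 200000},  # never embedded
        {"price": 90000, "embedding_vector": [0.1, 0.2]},  # wrong size
    ]
    print(diagnose_embeddings(sample, expected_dim=3))
```

Any non-zero count for missing or wrong_dimension suggests re-running the loading step with the same embedding model that the application uses for queries.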

Contributing

  1. Fork the repository
  2. Create a branch for your feature
  3. Commit your changes
  4. Push the branch
  5. Open a Pull Request

License

This project is released under the MIT License – see the LICENSE file for details.


Topics

Real Estate
Artificial Intelligence

Source

Organization: github

Created: 10/2/2024
