
Real Estate Dataset

This dataset contains real‑estate records intended for a vector‑search‑based property recommendation system. Each property record is converted to an embedding vector to support natural‑language queries and AI‑enhanced response generation.

Source

github

Created

Oct 2, 2024

Updated

Oct 2, 2024

Real Estate Vector Search API Dataset Overview

Overview

The project implements a vector‑search‑based real‑estate recommendation system using MongoDB, OpenAI embeddings, and Flask. Users can issue natural‑language queries to search for properties, leveraging vector similarity to retrieve relevant listings and generate AI‑enhanced responses.
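As a rough illustration of what "vector similarity" means here, the sketch below ranks listings by cosine similarity with a brute-force scan. It is not the project's code: in the project the ranking is delegated to MongoDB, the similarity metric configured on the index is not stated in the source, and the helper names cosine_similarity and top_k are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query_vector, listings, k=5):
    """Return the k listings whose 'embedding_vector' is closest to the query."""
    scored = [
        (cosine_similarity(query_vector, item["embedding_vector"]), item)
        for item in listings
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in scored[:k]]


# Toy two-dimensional vectors; real embeddings have far more dimensions.
listings = [
    {"id": "a", "embedding_vector": [1.0, 0.0]},
    {"id": "b", "embedding_vector": [0.0, 1.0]},
    {"id": "c", "embedding_vector": [0.9, 0.1]},
]
best = top_k([1.0, 0.0], listings, k=2)
```

A vector index avoids this full scan by searching approximately, which is why the pipeline in Technical Details sets a candidate count as well as a result limit.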

Dataset

File name: dataset.csv
Location: /data/dataset.csv
Purpose: Contains real‑estate records used to generate embedding vectors stored in MongoDB.

Data Loading and Embedding

Script: load_data.py
Location: /scripts/load_data.py
Function:
- Load real‑estate data
- Generate an embedding vector for each property
- Store data and vectors in MongoDB
- Create the required vector‑search index
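
The first three steps above can be sketched as follows. This is a minimal illustration, not the contents of load_data.py: the source does not show how a property is turned into text before embedding, so row_to_text simply joins every non-empty column, and the embedding function is passed in as a parameter (embed_fn) so the sketch runs without network access. The sample columns other than status and price are made up for the demonstration.

```python
import csv
import io


def row_to_text(row):
    """Flatten one CSV row into a 'column: value' string to embed."""
    return ", ".join(
        f"{key}: {value}" for key, value in row.items() if value not in (None, "")
    )


def build_documents(csv_file, embed_fn):
    """Read rows from an open CSV file and attach an embedding to each one."""
    documents = []
    for row in csv.DictReader(csv_file):
        document = dict(row)
        # Field name taken from the "path" used in the search pipeline.
        document["embedding_vector"] = embed_fn(row_to_text(row))
        documents.append(document)
    return documents


# In-memory CSV and a stand-in embedding function, for demonstration only.
sample = io.StringIO("status,price,city\nfor_sale,150000,San Juan\nsold,,Aguadilla\n")
documents = build_documents(sample, embed_fn=lambda text: [float(len(text))])
```

With a real embedding function in place of the stand-in, the resulting list could be written to a collection with PyMongo's insert_many.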

Technology Stack

Programming Language: Python 3.8+
Framework: Flask
Database: MongoDB Atlas
API: OpenAI API

Installation and Execution

Clone the repository:

```
git clone https://github.com/yourusername/vector_search_project.git
cd vector_search_project
```

Create and activate a virtual environment:

```
python -m venv venv
source venv/bin/activate   # Windows: venv\Scripts\activate
```

Install dependencies:
```
pip install -r requirements.txt
```

Set environment variables: Create a .env file containing:

```
OPENAI_API_KEY=your_openai_api_key
MONGO_URI=your_mongodb_connection_string
```
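
A small startup check can catch a missing variable before any API call is made. The sketch below assumes the two variables are already present in the process environment; the source does not say how the project reads the .env file, and load_settings is a hypothetical helper name.

```python
import os

REQUIRED_SETTINGS = ("OPENAI_API_KEY", "MONGO_URI")


def load_settings(environ=None):
    """Return the required settings, raising if any is missing or empty."""
    environ = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_SETTINGS if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_SETTINGS}


# Passing a plain dict instead of os.environ keeps the example self-contained.
settings = load_settings(
    {"OPENAI_API_KEY": "your_openai_api_key", "MONGO_URI": "your_mongodb_connection_string"}
)
```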

Load data and generate embeddings:
```
python scripts/load_data.py
```

Start the application:
```
python app.py
```

API Usage

Endpoint: POST /vector_search

Request Body:

```
{
  "query": "3 bedroom house in Aguadilla under $200,000"
}
```

Response:

```
{
  "response": "Detailed AI‑generated response about matching properties",
  "source_information": "Information about the properties used to generate the response"
}
```
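
A client call to this endpoint might look like the sketch below, which uses only the Python standard library. The base URL http://localhost:5000 is an assumption (Flask's default development port); the source does not state the host or port, and build_search_request and search are hypothetical helper names.

```python
import json
import urllib.request


def build_search_request(query, base_url="http://localhost:5000"):
    """Build the POST request for /vector_search without sending it."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/vector_search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(query, base_url="http://localhost:5000"):
    """Send the query and return the decoded JSON response as a dict."""
    with urllib.request.urlopen(build_search_request(query, base_url)) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_search_request("3 bedroom house in Aguadilla under $200,000")
```

With the application running, search(...) would return a dictionary with the "response" and "source_information" keys shown above.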

Example Queries

Basic location and bedroom query:

{"query": "3 bedroom houses in Aguadilla"}

Price‑range query:

{"query": "homes under $150,000 in San Juan"}

Complex feature query:

{"query": "large houses with more than 2000 square feet and a pool"}

Technical Details

Vector Search Implementation: Uses the $vectorSearch stage of MongoDB Atlas Vector Search in the following aggregation pipeline:

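```
pipeline = [
    {
        # Approximate nearest-neighbour search over the stored embeddings
        "$vectorSearch": {
            "index": "vector_index",         # name of the Atlas vector-search index
            "queryVector": query_embedding,  # embedding of the user's query text
            "path": "embedding_vector",      # document field holding each property's vector
            "numCandidates": 150,            # neighbours considered during the search
            "limit": 5                       # documents returned
        }
    },
    {
        # Return only the listed fields and drop MongoDB's _id
        "$project": {
            "_id": 0,
            "brokered_by": 1,
            "status": 1,
            "price": 1,
            # ... other fields
        }
    }
]
```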
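
The pipeline above expects an index named vector_index over the embedding_vector field. The source does not show the index definition, so the following is only a sketch of one that would be consistent with the pipeline. Two values are assumptions: 1536 dimensions (the default output size of text-embedding-3-small) and cosine similarity. vector_index_definition is a hypothetical helper name.

```python
ALLOWED_SIMILARITIES = ("cosine", "euclidean", "dotProduct")


def vector_index_definition(path="embedding_vector", num_dimensions=1536, similarity="cosine"):
    """Build an Atlas Vector Search index definition for one vector field."""
    if similarity not in ALLOWED_SIMILARITIES:
        raise ValueError(f"similarity must be one of {ALLOWED_SIMILARITIES}")
    return {
        "fields": [
            {
                "type": "vector",
                "path": path,                      # must match "path" in $vectorSearch
                "numDimensions": num_dimensions,   # must match the embedding model's output size
                "similarity": similarity,          # assumption: not stated in the source
            }
        ]
    }


definition = vector_index_definition()
```

Recent PyMongo releases can create such an index by wrapping the definition in a SearchIndexModel with type "vectorSearch" and passing it to create_search_index on the collection; whether load_data.py does it this way is not stated.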

Embedding Generation: Uses OpenAI’s text-embedding-3-small model.
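
A call to that model through the OpenAI Python SDK (version 1.x) might be wrapped as in the sketch below. The client is passed in as a parameter so the example runs offline against a stand-in object; embed_text and FakeEmbeddings are hypothetical names, and the project's own wrapper may differ.

```python
from types import SimpleNamespace


def embed_text(client, text, model="text-embedding-3-small"):
    """Return the embedding vector for one piece of text."""
    response = client.embeddings.create(model=model, input=text)
    return response.data[0].embedding


class FakeEmbeddings:
    """Stand-in that mimics the shape of the SDK response, for offline use only."""

    def create(self, model, input):
        return SimpleNamespace(data=[SimpleNamespace(embedding=[0.0] * 1536)])


fake_client = SimpleNamespace(embeddings=FakeEmbeddings())
vector = embed_text(fake_client, "3 bedroom house in Aguadilla")

# With the real SDK the client would be created as:
#   from openai import OpenAI
#   client = OpenAI()  # reads OPENAI_API_KEY from the environment
```

The same function has to be used for both stored properties and incoming queries, since vectors from different models are not comparable.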

Frequently Asked Questions

No results returned:
- Verify that the vector index was correctly created.
- Ensure documents contain embedding vectors.
- Confirm that the query embedding dimensionality matches the stored vectors.

MongoDB connection issues:
- Check the MongoDB URI in the .env file.
- Make sure your client's IP address is on the IP access list in MongoDB Atlas.
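
The two document-level checks under "No results returned" can be automated with a short scan such as the one below. It is a sketch under assumptions: the expected size of 1536 is the default for text-embedding-3-small, the field name comes from the search pipeline, and find_embedding_problems is a hypothetical helper that works on any iterable of dictionaries (for example, documents fetched from the collection).

```python
def find_embedding_problems(documents, expected_dim=1536, field="embedding_vector"):
    """Return (position, reason) pairs for documents with a missing or mis-sized vector."""
    problems = []
    for index, document in enumerate(documents):
        vector = document.get(field)
        if not isinstance(vector, (list, tuple)) or len(vector) == 0:
            problems.append((index, "missing embedding"))
        elif len(vector) != expected_dim:
            problems.append(
                (index, f"expected {expected_dim} dimensions, found {len(vector)}")
            )
    return problems


# Toy documents with four-dimensional vectors, for demonstration only.
sample_documents = [
    {"price": 150000, "embedding_vector": [0.1] * 4},
    {"price": 90000},
    {"price": 210000, "embedding_vector": [0.1] * 3},
]
problems = find_embedding_problems(sample_documents, expected_dim=4)
```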

Contributing

Fork the repository
Create a branch for your feature
Commit your changes
Push the branch
Open a Pull Request

License

This project is released under the MIT License – see the LICENSE file for details.
