Real Estate Dataset
This dataset contains real‑estate information intended for a vector‑search‑based property recommendation system. Each attribute is embedded to support natural‑language queries and AI‑enhanced response generation.
Real Estate Vector Search API
Overview
The project implements a vector‑search‑based real‑estate recommendation system using MongoDB, OpenAI embeddings, and Flask. Users can issue natural‑language queries to search for properties, leveraging vector similarity to retrieve relevant listings and generate AI‑enhanced responses.
Dataset
- File name: `dataset.csv`
- Location: `/data/dataset.csv`
- Purpose: Contains real‑estate records used to generate embedding vectors stored in MongoDB.
Data Loading and Embedding
- Script: `load_data.py`
- Location: `/scripts/load_data.py`
- Function:
  - Load real‑estate data
  - Generate an embedding vector for each property
  - Store data and vectors in MongoDB
  - Create the required vector‑search index
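The load-and-embed steps above can be sketched roughly as follows. This is a minimal illustration, not the contents of `load_data.py`: the helper names `property_to_text` and `build_documents` are hypothetical, and the embedding call is passed in as a plain function so the sketch runs without network access. The field name `embedding_vector` matches the path used by the search pipeline in this README; the `city` column in the usage note is an assumed example.

```python
from typing import Callable, Dict, Iterable, List


def property_to_text(row: Dict[str, str]) -> str:
    """Flatten one record into a single string suitable for embedding.

    Empty values are skipped so they do not add noise to the text.
    """
    return ", ".join(
        f"{key}: {value}" for key, value in row.items() if value not in ("", None)
    )


def build_documents(
    rows: Iterable[Dict[str, str]],
    embed: Callable[[str], List[float]],
) -> List[dict]:
    """Attach an embedding vector to every record.

    `embed` is any function mapping text to a vector; in the real project it
    would presumably wrap a call to the OpenAI embeddings API (assumption).
    """
    documents = []
    for row in rows:
        doc = dict(row)  # copy so the input rows are left untouched
        doc["embedding_vector"] = embed(property_to_text(row))
        documents.append(doc)
    return documents
```

In practice the rows would likely come from `csv.DictReader` over `data/dataset.csv`, and the resulting documents would be written to MongoDB, for example with `collection.insert_many(documents)`.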
Technology Stack
- Programming Language: Python 3.8+
- Framework: Flask
- Database: MongoDB Atlas
- API: OpenAI API
Installation and Execution
- Clone the repository:
    git clone https://github.com/yourusername/vector_search_project.git
    cd vector_search_project
- Create and activate a virtual environment:
    python -m venv venv
    source venv/bin/activate  # Windows: `venv\Scripts\activate`
- Install dependencies:
    pip install -r requirements.txt
- Set environment variables by creating a `.env` file containing:
    OPENAI_API_KEY=your_openai_api_key
    MONGO_URI=your_mongodb_connection_string
- Load data and generate embeddings:
    python scripts/load_data.py
- Start the application:
    python app.py
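Before running the loader or the app, it can help to confirm that both settings are actually visible to the process. The check below is a small sketch under one assumption: the values from `.env` have already been loaded into the environment (the README does not say which loader the project uses). The function name `missing_settings` is hypothetical.

```python
import os
from typing import List, Mapping

# The two variables this README asks for in the .env file.
REQUIRED_SETTINGS = ("OPENAI_API_KEY", "MONGO_URI")


def missing_settings(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names of required settings that are unset or empty."""
    return [name for name in REQUIRED_SETTINGS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        raise SystemExit("Missing settings: " + ", ".join(missing))
    print("All required settings are present.")
```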
API Usage
- Endpoint: `POST /vector_search`
- Request Body:
    { "query": "3 bedroom house in Aguadilla under $200,000" }
- Response:
    { "response": "Detailed AI‑generated response about matching properties", "source_information": "Information about the properties used to generate the response" }
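A client call to this endpoint might look like the sketch below, which uses only the Python standard library. The base URL `http://localhost:5000` is an assumption (Flask's default development address), and the helper names `build_search_request` and `search` are hypothetical.

```python
import json
import urllib.request


def build_search_request(
    query: str, base_url: str = "http://localhost:5000"
) -> urllib.request.Request:
    """Build the POST /vector_search request with a JSON body."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/vector_search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(query: str, base_url: str = "http://localhost:5000") -> dict:
    """Send the query and return the decoded JSON response.

    Per this README, the result should carry the keys `response` and
    `source_information`.
    """
    with urllib.request.urlopen(build_search_request(query, base_url)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the application running, `search("3 bedroom house in Aguadilla under $200,000")["response"]` would return the generated answer text.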
Example Queries
- Basic location and bedroom query:
    {"query": "3 bedroom houses in Aguadilla"}
- Price‑range query:
    {"query": "homes under $150,000 in San Juan"}
- Complex feature query:
    {"query": "large houses with more than 2000 square feet and a pool"}
Technical Details
- Vector Search Implementation: Utilises MongoDB’s vector‑search feature via the following aggregation pipeline:
    pipeline = [
        {
            "$vectorSearch": {
                "index": "vector_index",
                "queryVector": query_embedding,
                "path": "embedding_vector",
                "numCandidates": 150,
                "limit": 5
            }
        },
        {
            "$project": {
                "_id": 0,
                "brokered_by": 1,
                "status": 1,
                "price": 1,
                # ... other fields
            }
        }
    ]
- Embedding Generation: Uses OpenAI’s `text-embedding-3-small` model.
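To make the pipeline reusable across queries, it can be wrapped in a small builder function. The sketch below keeps the index name, path, and default values from the pipeline above; the function name `build_pipeline` is hypothetical, and the extra `score` field (exposed via `$meta: "vectorSearchScore"`) is an addition for illustration that is not part of the original projection. Only the three projected fields named in this README are included.

```python
from typing import Any, Dict, List, Sequence


def build_pipeline(
    query_embedding: Sequence[float],
    limit: int = 5,
    num_candidates: int = 150,
) -> List[Dict[str, Any]]:
    """Return a $vectorSearch aggregation pipeline for one query vector."""
    # Atlas requires numCandidates to be at least as large as limit.
    if num_candidates < limit:
        raise ValueError("num_candidates must be >= limit")
    return [
        {
            "$vectorSearch": {
                "index": "vector_index",
                "queryVector": list(query_embedding),
                "path": "embedding_vector",
                "numCandidates": num_candidates,
                "limit": limit,
            }
        },
        {
            "$project": {
                "_id": 0,
                "brokered_by": 1,
                "status": 1,
                "price": 1,
                # Added for illustration: similarity score of each match.
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]
```

The result can be passed directly to `collection.aggregate(...)`. A larger `num_candidates` generally improves recall at the cost of latency.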
Frequently Asked Questions
- No results returned:
  - Verify that the vector index was correctly created.
  - Ensure documents contain embedding vectors.
  - Confirm that the query embedding dimensionality matches the stored vectors.
- MongoDB connection issues:
  - Check the MongoDB URI in the `.env` file.
  - Make sure the IP is whitelisted in MongoDB Atlas.
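Two of the "no results" checks above (missing vectors and mismatched dimensionality) can be automated on a sample of documents. The helper below is a hypothetical diagnostic, not part of the project; it assumes vectors are stored under `embedding_vector`, as in the search pipeline.

```python
from typing import Iterable, List, Mapping, Sequence


def find_embedding_problems(
    documents: Iterable[Mapping],
    query_embedding: Sequence[float],
    field: str = "embedding_vector",
) -> List[str]:
    """Report documents whose stored vector is missing or has the wrong size."""
    expected = len(query_embedding)
    problems = []
    for position, doc in enumerate(documents):
        vector = doc.get(field)
        if not vector:
            problems.append(f"document {position}: missing '{field}'")
        elif len(vector) != expected:
            problems.append(
                f"document {position}: {len(vector)} dimensions, "
                f"query has {expected}"
            )
    return problems
```

A sample could be drawn with something like `collection.find().limit(20)`; an empty result list means the sampled documents are consistent with the query vector.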
Contributing
- Fork the repository
- Create a branch for your feature
- Commit your changes
- Push the branch
- Open a Pull Request
License
This project is released under the MIT License – see the LICENSE file for details.
Source
Organization: github
Created: 10/2/2024