Dataset asset, Open Source Community, Real Estate, Artificial Intelligence
Real Estate Dataset
This dataset contains real‑estate information intended for a vector‑search‑based property recommendation system. Each attribute is embedded to support natural‑language queries and AI‑enhanced response generation.
Source
github
Created
Oct 2, 2024
Updated
Oct 2, 2024
Overview
Dataset description and usage context
Real Estate Vector Search API Dataset Overview
Overview
The project implements a vector‑search‑based real‑estate recommendation system using MongoDB, OpenAI embeddings, and Flask. Users can issue natural‑language queries to search for properties, leveraging vector similarity to retrieve relevant listings and generate AI‑enhanced responses.
Dataset
- File name: `dataset.csv`
- Location: `/data/dataset.csv`
- Purpose: Contains real‑estate records used to generate embedding vectors stored in MongoDB.
Data Loading and Embedding
- Script: `load_data.py`
- Location: `/scripts/load_data.py`
- Function:
  - Load real‑estate data
  - Generate an embedding vector for each property
  - Store data and vectors in MongoDB
  - Create the required vector‑search index
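The steps above can be sketched roughly as follows. This is a minimal illustration, not the actual contents of `load_data.py`: the helper names (`row_to_text`, `build_documents`, `vector_index_definition`) are hypothetical, the embedding function is passed in so the sketch runs without an API key, and the cosine similarity setting is an assumption. The `embedding_vector` field name comes from the pipeline under Technical Details, and 1536 is the default output size of `text-embedding-3-small`.

```python
import csv
import io
from typing import Callable, Dict, Iterable, List

EMBEDDING_FIELD = "embedding_vector"  # field name taken from the pipeline's "path"


def row_to_text(row: Dict[str, str]) -> str:
    """Flatten one CSV row into a 'column: value' string to be embedded."""
    return "; ".join(f"{key}: {value}" for key, value in row.items() if value)


def build_documents(
    rows: Iterable[Dict[str, str]],
    embed: Callable[[str], List[float]],
) -> List[dict]:
    """Attach an embedding vector to every row; `embed` is supplied by the caller."""
    documents = []
    for row in rows:
        doc = dict(row)
        doc[EMBEDDING_FIELD] = embed(row_to_text(row))
        documents.append(doc)
    return documents


def vector_index_definition(num_dimensions: int = 1536) -> dict:
    """Atlas Vector Search index definition; cosine similarity is an assumption."""
    return {
        "fields": [
            {
                "type": "vector",
                "path": EMBEDDING_FIELD,
                "numDimensions": num_dimensions,
                "similarity": "cosine",
            }
        ]
    }


if __name__ == "__main__":
    # Hypothetical sample row; the real column set lives in data/dataset.csv.
    sample_csv = "brokered_by,status,price\n101,for_sale,185000\n"
    rows = csv.DictReader(io.StringIO(sample_csv))
    # Stand-in embedder so the sketch runs offline; a real one would call OpenAI.
    docs = build_documents(rows, embed=lambda text: [float(len(text))])
    print(docs[0])
```

In a real run, the resulting documents would be inserted into a MongoDB collection and the index definition registered under the name `vector_index`.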
Technology Stack
- Programming Language: Python 3.8+
- Framework: Flask
- Database: MongoDB Atlas
- API: OpenAI API
Installation and Execution
- Clone the repository:
    git clone https://github.com/yourusername/vector_search_project.git
    cd vector_search_project
- Create and activate a virtual environment:
    python -m venv venv
    source venv/bin/activate  # Windows: `venv\Scripts\activate`
- Install dependencies:
    pip install -r requirements.txt
- Set environment variables: create a `.env` file containing:
    OPENAI_API_KEY=your_openai_api_key
    MONGO_URI=your_mongodb_connection_string
- Load data and generate embeddings:
    python scripts/load_data.py
- Start the application:
    python app.py
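A missing or misspelled variable is an easy mistake at this stage, so a small startup check can help. The sketch below is an assumption about how such a check might look; `Settings` and `load_settings` are hypothetical names, not part of the project. Note that a `.env` file is not read into the process environment automatically; a loader such as python-dotenv is commonly used for that, but this README does not say which mechanism the project relies on.

```python
import os
from typing import Mapping, NamedTuple


class Settings(NamedTuple):
    openai_api_key: str
    mongo_uri: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the two variables named in the README and fail early if one is absent."""
    required = ("OPENAI_API_KEY", "MONGO_URI")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return Settings(env["OPENAI_API_KEY"], env["MONGO_URI"])
```

Calling `load_settings()` at the top of `app.py` or `load_data.py` would surface configuration problems before any request to OpenAI or MongoDB is made.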
API Usage
- Endpoint: `POST /vector_search`
- Request Body:
    { "query": "3 bedroom house in Aguadilla under $200,000" }
- Response:
    {
      "response": "Detailed AI‑generated response about matching properties",
      "source_information": "Information about the properties used to generate the response"
    }
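As an illustration, a client could call the endpoint with nothing but the standard library. The base URL below is an assumption (the Flask development server's default address), and `build_search_request` and `vector_search` are hypothetical helper names. Only the path, the method, and the `query` field come from the description above.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:5000"  # assumption: Flask development server default


def build_search_request(query: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST /vector_search request without sending it."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/vector_search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def vector_search(query: str, base_url: str = BASE_URL) -> dict:
    """Send the request and decode the JSON reply (requires the app to be running)."""
    with urllib.request.urlopen(build_search_request(query, base_url)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the application running, `vector_search("3 bedroom house in Aguadilla under $200,000")` should return a dictionary with the `response` and `source_information` keys described above.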
Example Queries
- Basic location and bedroom query:
    {"query": "3 bedroom houses in Aguadilla"}
- Price‑range query:
    {"query": "homes under $150,000 in San Juan"}
- Complex feature query:
    {"query": "large houses with more than 2000 square feet and a pool"}
Technical Details
- Vector Search Implementation: Utilises MongoDB’s vector‑search feature via the following aggregation pipeline:
    pipeline = [
        {
            "$vectorSearch": {
                "index": "vector_index",
                "queryVector": query_embedding,
                "path": "embedding_vector",
                "numCandidates": 150,
                "limit": 5
            }
        },
        {
            "$project": {
                "_id": 0,
                "brokered_by": 1,
                "status": 1,
                "price": 1,
                # ... other fields
            }
        }
    ]
- Embedding Generation: Uses OpenAI’s `text-embedding-3-small` model.
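To make the idea of vector similarity concrete, the sketch below ranks documents by brute-force cosine similarity in plain Python. It is only a conceptual stand-in: Atlas performs an approximate nearest-neighbor search over the index (with `numCandidates` bounding how many candidates are considered), and the similarity function actually used is whatever the index definition specifies. Cosine is assumed here, and `cosine_similarity` and `top_k` are hypothetical helper names.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 for a zero vector)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query: Sequence[float],
    docs: List[dict],
    k: int = 5,
    field: str = "embedding_vector",
) -> List[Tuple[float, dict]]:
    """Score every document against the query and keep the k best matches."""
    scored = [(cosine_similarity(query, doc[field]), doc) for doc in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

The default `k=5` mirrors the `limit` value in the pipeline above. An exhaustive scan like this scales linearly with the collection size, which is why a dedicated vector index is used in practice.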
Frequently Asked Questions
- No results returned:
  - Verify that the vector index was correctly created.
  - Ensure documents contain embedding vectors.
  - Confirm that the query embedding dimensionality matches the stored vectors.
- MongoDB connection issues:
  - Check the MongoDB URI in the `.env` file.
  - Make sure your client's IP address is whitelisted (on the IP access list) in MongoDB Atlas.
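The "no results" checks on embedding vectors and dimensionality can be automated with a small diagnostic. The sketch below is a hypothetical helper (`diagnose_documents` is not part of the project) that inspects a sample of documents, for example a handful fetched from the collection, and reports any that lack an `embedding_vector` or whose vector length differs from the query embedding's.

```python
from typing import Iterable, List, Sequence


def diagnose_documents(
    query_vector: Sequence[float],
    documents: Iterable[dict],
    field: str = "embedding_vector",
) -> List[str]:
    """Return human-readable problems found in a sample of stored documents."""
    problems = []
    expected = len(query_vector)
    for position, doc in enumerate(documents):
        vector = doc.get(field)
        if not vector:
            # The field is absent or empty, so this document cannot match.
            problems.append(f"document {position}: missing '{field}'")
        elif len(vector) != expected:
            problems.append(
                f"document {position}: {len(vector)} dimensions, query has {expected}"
            )
    return problems
```

An empty list means the sampled documents are consistent with the query embedding, which points the investigation towards the index definition instead.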
Contributing
- Fork the repository
- Create a branch for your feature
- Commit your changes
- Push the branch
- Open a Pull Request
License
This project is released under the MIT License – see the LICENSE file for details.