Dataset asset | Open Source Community | Housing Data | Census Analysis

California Housing

This dataset is a modified version of the California Housing dataset, sourced from Luís Torgo’s page (University of Porto). The original data came from the now‑defunct StatLib repository and can also be obtained from StatLib mirrors. It is constructed from the 1990 U.S. Census, where each row represents a census block group. The dataset includes attributes such as longitude, latitude, median housing age, total rooms, total bedrooms, population, households, median income, median house value, and ocean proximity.

Source: github
Created: May 14, 2023
Updated: Dec 10, 2023
Overview

Dataset description and usage context

California Housing Dataset Overview

Data Source

  • This dataset is a modified version of the California Housing dataset. The original data was first published in the StatLib repository and is distributed via Luís Torgo’s page (University of Porto).
  • The dataset is built from 1990 U.S. Census data for California, with each row representing a census block group.

Data Adjustments

  • Randomly removed 207 values from the total_bedrooms column to illustrate handling of missing data.
  • Added a categorical attribute ocean_proximity to describe the relative position of each block group to the ocean.
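
The missing total_bedrooms values need to be handled before most models can use the column. One common option is median imputation, sketched below with the Python standard library only. The sample rows are hypothetical stand-ins (not real records), and the helper name impute_total_bedrooms is an assumption made for illustration.

```python
import csv
import io
from statistics import median

# Hypothetical sample rows (not real records); an empty field marks a missing value.
SAMPLE_CSV = """total_rooms,total_bedrooms,households
900,130,120
7000,,1100
1500,190,180
1300,240,220
1600,,260
"""


def impute_total_bedrooms(rows):
    """Fill missing total_bedrooms with the median of the observed values."""
    observed = [float(r["total_bedrooms"]) for r in rows if r["total_bedrooms"] != ""]
    fill = median(observed)
    filled = []
    for r in rows:
        value = r["total_bedrooms"]
        filled.append({**r, "total_bedrooms": float(value) if value != "" else fill})
    return filled, fill


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
missing_before = sum(1 for r in rows if r["total_bedrooms"] == "")
filled_rows, fill_value = impute_total_bedrooms(rows)
print(f"missing before: {missing_before}, fill value: {fill_value}")
```

Median imputation is only one choice; dropping the affected rows or the whole column are alternatives, and the better option depends on the downstream model.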

Data Description

  • Attribute List:

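    • longitude: longitude of the block group
    • latitude: latitude of the block group
    • housing_median_age: median age of the houses in the block group
    • total_rooms: total number of rooms in the block group
    • total_bedrooms: total number of bedrooms in the block group (207 values missing)
    • population: population of the block group
    • households: number of households in the block group
    • median_income: median income in the block group
    • median_house_value: median house value in the block group
    • ocean_proximity: position of the block group relative to the ocean (categorical)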
  • ocean_proximity Category Statistics:

    • <1H OCEAN: 9,136
    • INLAND: 6,551
    • NEAR OCEAN: 2,658
    • NEAR BAY: 2,290
    • ISLAND: 5
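
The category counts above can be checked with a little arithmetic: they sum to 20,640 rows, and the ISLAND category accounts for only five of them. The short sketch below recomputes the total and the share of each category. The variable names are illustrative, not part of the dataset.

```python
# Category counts as listed in the statistics above.
ocean_proximity_counts = {
    "<1H OCEAN": 9136,
    "INLAND": 6551,
    "NEAR OCEAN": 2658,
    "NEAR BAY": 2290,
    "ISLAND": 5,
}

total_rows = sum(ocean_proximity_counts.values())
shares = {name: count / total_rows for name, count in ocean_proximity_counts.items()}

print(f"total rows: {total_rows}")
for name, share in shares.items():
    print(f"{name:<12}{share:7.2%}")
```

Because ISLAND is so rare, a random train/test split may leave one side with no ISLAND rows at all, which is worth keeping in mind when encoding this attribute.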

Dataset Characteristics

  • The dataset combines geographic attributes (longitude, latitude, ocean_proximity) with housing and population attributes, which makes it well suited to studying how median_house_value varies with location.
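
A simple first look at the relationship between house value and location is to compare the median of median_house_value across the ocean_proximity categories. The sketch below does this with the Python standard library. The value pairs are hypothetical placeholders (not real records), and median_value_by_category is an illustrative helper name.

```python
from collections import defaultdict
from statistics import median

# Hypothetical (ocean_proximity, median_house_value) pairs, not real records.
sample = [
    ("NEAR BAY", 400000.0),
    ("NEAR BAY", 300000.0),
    ("INLAND", 100000.0),
    ("INLAND", 120000.0),
    ("INLAND", 80000.0),
    ("<1H OCEAN", 250000.0),
]


def median_value_by_category(pairs):
    """Group house values by category and return the median of each group."""
    groups = defaultdict(list)
    for category, value in pairs:
        groups[category].append(value)
    return {category: median(values) for category, values in groups.items()}


summary = median_value_by_category(sample)
# Print categories from highest to lowest median value.
for category, value in sorted(summary.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{category:<12}{value:>12,.0f}")
```

On the full dataset the same grouping would use the real rows in place of the placeholder list. Categories with very few rows, such as ISLAND, give unstable medians.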