Philadelphia Housing Price Prediction

A Predictive Modeling Framework to Support Property Tax Assessment

Daisy, Liz, Johnny, Parker

2026-03-17

Research Question & Motivation

Research Question

  • Background: Philadelphia is seeking to improve its Automated Valuation Model.
  • Research question: How can spatial and demographic information make residential sale price prediction more accurate?
  • Analytical scope: Residential market sales.

Motivation

  • Property tax assessment depends on credible estimates of market value.
  • Inaccurate valuations can lead to unequal policy outcomes.
  • Better predictions help strengthen the AVM.

Data Sources

Property Data

  • Residential sales records
  • Sale price
  • Interior area
  • Bedrooms / bathrooms
  • Year built
  • Property type

Neighborhood Data

  • ACS census variables
  • Median income
  • Poverty
  • Education
  • Demographic structure

Spatial Data

  • Transit access
  • Park proximity
  • Distance-based accessibility
  • Other city or amenity measures

Together, these sources capture property characteristics, neighborhood market conditions, and location.

Data Source & Cleaning

Source: OPA Properties Public Dataset - 2023-2024 Residential Sales, Philadelphia


Filter 1: Basic Info

  • Residential: Single Family & Multi Family
  • Years: sales in 2023 & 2024
  • State: PA only

Filter 2: Implausible Values

  • Living area: > 0 sq ft
  • Bedrooms: > 0
  • Sale price: > $50,000

Filter 3: Missing Values

  • Sale price
  • Living area
  • Bedrooms & bathrooms
  • All model variables

Final sample: 23,372 residential sales across Philadelphia, 2023-2024.
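
The three filters above can be sketched as a single predicate. This is a minimal illustration rather than the authors' pipeline: the field names (`category`, `sale_year`, `state`, and so on) and the sample records are hypothetical, and only a subset of the model variables is checked for missing values.

```python
# Hypothetical field names and records; the real OPA schema may differ.
RESIDENTIAL_TYPES = {"SINGLE FAMILY", "MULTI FAMILY"}
REQUIRED = ("sale_price", "living_area", "bedrooms", "bathrooms")


def keep_sale(rec):
    """Return True if a sale record passes all three filters."""
    # Filter 3: drop records with missing key fields first,
    # so the numeric comparisons below are safe.
    if any(rec.get(field) is None for field in REQUIRED):
        return False
    # Filter 1: residential type, sale year, and state.
    if rec.get("category") not in RESIDENTIAL_TYPES:
        return False
    if rec.get("sale_year") not in (2023, 2024):
        return False
    if rec.get("state") != "PA":
        return False
    # Filter 2: implausible values.
    return (
        rec["living_area"] > 0
        and rec["bedrooms"] > 0
        and rec["sale_price"] > 50_000
    )


base = {"category": "SINGLE FAMILY", "sale_year": 2023, "state": "PA",
        "sale_price": 250_000, "living_area": 1200, "bedrooms": 3, "bathrooms": 1}
sales = [
    base,                          # kept
    {**base, "sale_price": 1},     # dropped: implausible price
    {**base, "state": "NJ"},       # dropped: not PA
    {**base, "bedrooms": None},    # dropped: missing value
]
cleaned = [rec for rec in sales if keep_sale(rec)]
```

Checking for missing values first is a design choice made here so that the later comparisons never see `None`; the order of the filters does not change which records survive.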

Spatial Distribution of Residential Sale Prices

Properties Sold by Census Tract

Key takeaways:

  • Sales are unevenly distributed across census tracts, with market activity concentrated in a limited number of neighborhoods and much thinner transaction coverage in others.

Spatial Autocorrelation of Sale Prices

Key takeaways:

  • High-High clusters concentrate in Center City and Chestnut Hill, with smaller clusters in the Far Northeast, around UPenn, and at the far southern tip.
  • Low-Low clusters dominate North and West Philadelphia.
  • The absence of significant autocorrelation at the boundary between these clusters suggests an abrupt spatial discontinuity.
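
The High-High and Low-Low labels are the usual vocabulary of a local Moran's I (LISA) analysis. Assuming that is the method behind the map, the quadrant logic can be sketched as below. The four-tract example, its log prices, and its neighbor lists are made up for illustration, and the permutation test that decides which clusters count as significant is omitted.

```python
from statistics import mean, pstdev


def lisa_quadrants(values, neighbors):
    """Label each area by its own z-score and its neighbors' mean z-score.

    values    -- one number per area (e.g. mean log sale price)
    neighbors -- neighbors[i] lists the indices adjacent to area i
    """
    mu, sd = mean(values), pstdev(values)
    z = [(v - mu) / sd for v in values]
    results = []
    for i, nbrs in enumerate(neighbors):
        lag = mean(z[j] for j in nbrs)   # row-standardized spatial lag
        local_i = z[i] * lag             # local Moran's I for area i
        own = "High" if z[i] > 0 else "Low"
        around = "High" if lag > 0 else "Low"
        results.append((f"{own}-{around}", local_i))
    return results


# Hypothetical tracts arranged in a line: 0 - 1 - 2 - 3
log_prices = [13.0, 12.8, 11.2, 11.0]
adjacency = [[1], [0, 2], [1, 3], [2]]
labels = [label for label, _ in lisa_quadrants(log_prices, adjacency)]
```

In this toy layout the two expensive tracts come out High-High and the two cheap ones Low-Low. Tracts 1 and 2, which sit on the boundary, have spatial lags close to zero and therefore small local statistics, which is one way a sharp price break can appear as non-significant on a cluster map.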

Model Performance Improves with Each Layer

10-Fold Cross-Validation Results
Model | RMSE (log) | Approx. Error (%) | MAE (log) | R-squared
M7: + Neighborhood FE | 0.4223 | 52.5 | 0.2734 | 0.6216
M5b: + Spatial | 0.4437 | 55.8 | 0.2974 | 0.5822
M4: + Census | 0.4554 | 57.7 | 0.3088 | 0.5597
M3: Structural Only | 0.5369 | 71.1 | 0.3948 | 0.3882

Bottom line: Model performance improves at each stage, with the largest gain (R-squared 0.388 to 0.560) coming from the census-based neighborhood variables. The final model with neighborhood fixed effects performs best, though prediction error remains meaningful.
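
The Approx. Error (%) figures are consistent with converting the log-scale RMSE to a multiplicative error, exp(RMSE) - 1. That mapping is inferred from the reported numbers rather than stated by the authors. A short check, with a hypothetical function name:

```python
import math


def log_rmse_to_pct(rmse_log):
    """Convert an RMSE on the log-price scale to an approximate percent error."""
    return (math.exp(rmse_log) - 1.0) * 100.0


# Log-scale RMSE values from the cross-validation results.
cv_rmse = {"M7": 0.4223, "M5b": 0.4437, "M4": 0.4554, "M3": 0.5369}
approx_error = {model: round(log_rmse_to_pct(r), 1) for model, r in cv_rmse.items()}
```

Under this reading, an RMSE of 0.4223 means a typical prediction is off by a factor of about 1.53 on the upward side, which is why the error is still described as meaningful.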

Most Influential Predictors

Key Takeaways:

  • Living area is the strongest single predictor.

  • Socioeconomics outweigh amenities, suggesting that who lives nearby matters more than what’s nearby.

  • Bathrooms signal overall quality.

Final Model Data Errors

Note

M7 predicts mid-range prices well but systematically overestimates low-price properties and struggles with luxury outliers, which is consistent with limited training data at the extremes.

Final Model Spatial Errors

The residuals from M7 are relatively dispersed rather than forming large, continuous clusters, which suggests that the final model captures much of the broad spatial structure in the Philadelphia housing market. At the same time, some localized market variation is still not fully explained.

Summary

Model Accuracy: RMSE = 0.422 (log scale), R-squared = 0.622 (10-fold CV)

Top Predictors

  • Neighborhood fixed effects (R-squared 0.582 -> 0.622 over the spatial model)
  • Living area (coef = 0.538): strongest structural predictor
  • BA+ education share (coef = 0.428) & poverty rate (coef = -0.146): socioeconomics outweigh physical amenities

Key Recommendations

  • Current AVM likely undervalues transit-accessible and gentrifying properties
  • Sub-models by neighborhood type would improve accuracy in rapidly changing markets
  • Flag high-residual properties for manual review before formal tax assessment
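
The last recommendation could be implemented along these lines. This is a sketch under stated assumptions: the two-standard-deviation cutoff, the function name, and the example prices are all hypothetical, and residuals are taken on the log scale to match the model's error metric.

```python
import math
from statistics import mean, pstdev


def flag_for_review(actual_prices, predicted_prices, k=2.0):
    """Return indices whose log residual lies more than k SDs from the mean."""
    residuals = [
        math.log(actual) - math.log(predicted)
        for actual, predicted in zip(actual_prices, predicted_prices)
    ]
    mu, sd = mean(residuals), pstdev(residuals)
    return [i for i, r in enumerate(residuals) if abs(r - mu) > k * sd]


# Hypothetical sales: the last property sold far above its predicted value.
actual = [200_000, 210_000, 195_000, 205_000, 190_000, 198_000, 203_000, 600_000]
predicted = [200_000] * len(actual)
needs_review = flag_for_review(actual, predicted)
```

A fixed multiple of the standard deviation is only one possible rule. A percentile cutoff, or a threshold set separately within each neighborhood, may suit thin markets better.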

Thank You

Data Sources

  • City of Philadelphia, Office of Property Assessment. OPA Properties Public Dataset. https://opendataphilly.org
  • U.S. Census Bureau. American Community Survey 5-Year Estimates, 2023. Via tidycensus R package.
  • SEPTA. Transit Stops, Spring 2025. OpenDataPhilly.
  • City of Philadelphia, Parks & Recreation. Street Tree Inventory, 2025. OpenDataPhilly.
  • School District of Philadelphia. Schools Dataset. OpenDataPhilly.
  • City of Philadelphia. Neighborhood Boundaries. OpenDataPhilly.