A Predictive Modeling Framework to Support Property Tax Assessment
2026-03-17
Together, these sources capture property characteristics, neighborhood market conditions, and location information.
Source: OPA Properties Public Dataset - 2023-2024 Residential Sales, Philadelphia
Filter 1: Basic Info
Filter 2: Implausible Values
Filter 3: Missing Values
23,372 residential sales across Philadelphia, 2023-2024.
Key takeaways:
High-High clusters concentrate in Center City and Chestnut Hill, with smaller clusters in the Far Northeast, around UPenn, and at the city's far southern tip.
Low-Low clusters dominate North and West Philadelphia.
The absence of significant autocorrelation at the boundaries between these clusters suggests an abrupt spatial discontinuity in prices rather than a gradual transition.
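The High-High and Low-Low labels are the standard quadrants of a local Moran's I (LISA) analysis. The sketch below shows one way such labels can be computed. It assumes binary neighbor lists that are row-standardized, a conditional permutation test with a two-sided pseudo p-value, and a 0.05 cutoff; the deck does not state which weights or cutoff the study used, and `local_morans_i` is a hypothetical helper, not the study's code.

```python
import random


def local_morans_i(values, neighbors, n_perm=999, alpha=0.05, seed=0):
    """Local Moran's I (LISA) with row-standardized weights (illustrative sketch).

    neighbors[i] lists the indices adjacent to unit i. Significance uses a
    conditional permutation test: z[i] stays fixed and its neighbors are
    replaced by random draws from the other units. Returns a list of
    (I_i, pseudo_p, label) tuples.
    """
    n = len(values)
    xbar = sum(values) / n
    z = [v - xbar for v in values]
    m2 = sum(zi * zi for zi in z) / n           # variance of the deviations
    rng = random.Random(seed)
    results = []
    for i in range(n):
        nbrs = neighbors[i]
        if not nbrs:                             # units with no neighbors cannot be tested
            results.append((0.0, 1.0, "Not significant"))
            continue
        k = len(nbrs)
        lag = sum(z[j] for j in nbrs) / k        # spatial lag with weights 1/k
        local_i = z[i] * lag / m2
        others = [z[j] for j in range(n) if j != i]
        extreme = 0
        for _ in range(n_perm):
            sim_lag = sum(rng.sample(others, k)) / k
            # Compare |lag| directly; for z[i] != 0 this matches |I_sim| >= |I_obs|.
            if abs(sim_lag) >= abs(lag):
                extreme += 1
        p = (extreme + 1) / (n_perm + 1)
        if p > alpha:
            label = "Not significant"
        elif z[i] > 0:
            label = "High-High" if lag > 0 else "High-Low"
        else:
            label = "Low-High" if lag > 0 else "Low-Low"
        results.append((local_i, p, label))
    return results


# Toy example: a smooth gradient along a line; neighbors are units within 3 steps.
vals = [float(i) for i in range(40)]
nb = [[j for j in range(max(0, i - 3), min(40, i + 4)) if j != i] for i in range(40)]
res = local_morans_i(vals, nb)
labels = [lab for _, _, lab in res]
print(labels[0], labels[20], labels[39])
```

On this toy gradient the low end is flagged Low-Low, the high end High-High, and the middle unit is not significant, which mirrors the cluster-versus-transition reading above.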
| Model | RMSE (log) | Approx. Error (%) | MAE (log) | R-squared |
|---|---|---|---|---|
| M7: + Neighborhood FE | 0.4223 | 52.5 | 0.2734 | 0.6216 |
| M5b: + Spatial | 0.4437 | 55.8 | 0.2974 | 0.5822 |
| M4: + Census | 0.4554 | 57.7 | 0.3088 | 0.5597 |
| M3: Structural Only | 0.5369 | 71.1 | 0.3948 | 0.3882 |
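Bottom line: Model performance improves at each stage, with the largest gain from adding census-based neighborhood context (M3 to M4: RMSE falls from 0.537 to 0.455 and R-squared rises from 0.388 to 0.560). The final model with neighborhood fixed effects (M7) performs best, though its approximate prediction error of 52.5% remains meaningful.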
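The table does not define the Approx. Error (%) column, but every row matches the back-transformation (exp(RMSE) - 1) × 100, which reads a log-scale RMSE as a typical multiplicative error (for M7, a factor of about 1.53). The sketch below recomputes the column; `approx_error_pct` is a hypothetical helper name.

```python
import math


def approx_error_pct(rmse_log: float) -> float:
    """Express a log-scale RMSE as a percentage multiplicative error.

    A log error of e means the prediction is off by a factor of exp(e),
    i.e. (exp(e) - 1) * 100 percent above the true price.
    """
    return (math.exp(rmse_log) - 1.0) * 100.0


# RMSE (log) values copied from the model comparison table.
table = {
    "M7: + Neighborhood FE": 0.4223,
    "M5b: + Spatial": 0.4437,
    "M4: + Census": 0.4554,
    "M3: Structural Only": 0.5369,
}

for model, rmse in table.items():
    print(f"{model}: {approx_error_pct(rmse):.1f}%")
```

This prints 52.5%, 55.8%, 57.7% and 71.1%, matching the table. The same log error in the other direction, exp(-0.4223), corresponds to a prediction about 34% below the true price, so the percentage is the upper-side reading of the error.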
Key takeaways:
Living area is the strongest single predictor.
Socioeconomic variables outweigh amenities, suggesting that who lives nearby matters more than what’s nearby.
Bathroom count likely signals overall property quality.
Note
M7 predicts mid-range prices well but systematically overestimates low-price properties and struggles with luxury outliers, a pattern consistent with limited training data at the extremes.
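The over- and under-estimation pattern in the note can be checked by sorting properties by observed (log) price, splitting them into equal-count bins, and averaging the signed error in each bin. A minimal sketch, assuming the sign convention predicted minus actual (so a positive bin mean indicates overestimation); `binned_bias` is a hypothetical helper and the data are synthetic.

```python
from statistics import mean


def binned_bias(actual, predicted, n_bins=5):
    """Mean signed error (predicted - actual) within equal-count bins of the actual value.

    Bins are formed after sorting by the actual value, so bin 0 holds the
    cheapest properties. A positive mean means the model overestimates there.
    """
    pairs = sorted(zip(actual, predicted))
    n = len(pairs)
    bias = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if chunk:                                # skip empty bins when n < n_bins
            bias.append(mean(p - a for a, p in chunk))
    return bias


# Toy data: predictions shrunk halfway toward the mean of the actual values.
actual = [float(a) for a in range(1, 11)]
center = mean(actual)
predicted = [center + 0.5 * (a - center) for a in actual]
b = binned_bias(actual, predicted)
print(b)  # positive in the lowest bin, negative in the highest
```

A model that shrinks predictions toward the mean produces this signature: positive bias for the cheapest bin and negative bias for the most expensive one.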
The residuals from M7 are relatively dispersed rather than forming large, continuous clusters, which suggests that the final model captures much of the broad spatial structure in the Philadelphia housing market. At the same time, some localized market variation remains unexplained.
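One way to put a number on "relatively dispersed" is a global Moran's I on the M7 residuals: values near the null expectation of -1/(n - 1) indicate little remaining spatial autocorrelation, while clearly positive values indicate clustering. A minimal sketch, assuming binary (0/1) contiguity weights supplied as neighbor lists; the deck does not say which statistic or weights were used, and the function names are hypothetical.

```python
def morans_i(values, neighbors):
    """Global Moran's I with binary weights: w_ij = 1 if j is in neighbors[i], else 0."""
    n = len(values)
    xbar = sum(values) / n
    z = [v - xbar for v in values]
    s0 = sum(len(nb) for nb in neighbors)        # S0: sum of all weights
    cross = sum(z[i] * z[j] for i in range(n) for j in neighbors[i])
    return (n / s0) * cross / sum(zi * zi for zi in z)


def expected_i(n):
    """Expected value of Moran's I under no spatial autocorrelation."""
    return -1.0 / (n - 1)


# Toy check on a line of 10 units with adjacent neighbors.
line_nb = [[j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)]
print(morans_i([1, -1] * 5, line_nb))      # alternating pattern: strongly negative
print(morans_i(list(range(10)), line_nb))  # smooth gradient: strongly positive
print(expected_i(10))                      # null benchmark for n = 10
```

In practice the residual value would be compared with `expected_i(n)` and a permutation distribution rather than read in isolation.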
Model Accuracy: RMSE = 0.422 (log scale), R-squared = 0.622 (10-fold CV)
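The 10-fold cross-validation behind these figures can be illustrated with a short loop. This is a sketch under stated assumptions: a one-predictor least-squares line (`fit_line`, hypothetical) stands in for M7, folds are assigned by a seeded shuffle, and RMSE and R-squared are computed on the pooled out-of-fold predictions. Averaging per-fold metrics is an equally common convention and gives slightly different numbers.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (a stand-in for the real model)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x


def kfold_cv(xs, ys, fit, k=10, seed=0):
    """Pooled out-of-fold RMSE and R-squared from k-fold cross-validation."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]        # k roughly equal, disjoint folds
    preds = [0.0] * len(xs)
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in test:                           # predict only on the held-out fold
            preds[i] = model(xs[i])
    sse = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ybar = sum(ys) / len(ys)
    sst = sum((y - ybar) ** 2 for y in ys)
    return math.sqrt(sse / len(ys)), 1.0 - sse / sst


# Toy data: a linear relation with alternating +/-1 noise.
xs = [float(x) for x in range(50)]
ys = [2.0 + 3.0 * x + (1.0 if x % 2 else -1.0) for x in range(50)]
rmse, r2 = kfold_cv(xs, ys, fit_line, k=10)
print(f"RMSE = {rmse:.3f}, R-squared = {r2:.3f}")
```

Because every prediction comes from a model that never saw that observation, the pooled metrics estimate out-of-sample accuracy rather than in-sample fit.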
Top Predictors
Key Recommendations
Data Sources