Digital CEQR 

Predicting Neighborhood Socio-economic Condition in NYC


Problem Statement




Our Approach



Prediction of Neighborhood Gentrification Risk

Logistic Regression

Our final supervised learning model predicts gentrification with 82% accuracy and a recall close to 87%; the accuracy score rose to this level after the elimination step. Based on the model results, we infer that a decrease in evenness of income distribution, a decrease in evenness of educational attainment, a decrease in the racial composition, a decrease in crowding within structures with 10 or more units, and spatial lags for home value, rent and income distribution all play a major role in indicating whether or not a tract has gentrified.
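To make the accuracy and recall figures above concrete, here is a minimal sketch of a logistic regression classifier and the two metrics. It is not the project's actual pipeline: the data is synthetic, the two feature names are hypothetical stand-ins for tract-level predictors, and the model is fitted with plain gradient descent using only the Python standard library (in practice a library such as scikit-learn would likely be used).

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(X, w, b, threshold=0.5):
    # Label a tract 1 ("gentrified") when the predicted probability reaches the threshold.
    return [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= threshold else 0
            for xi in X]

def accuracy(y_true, y_pred):
    # Share of tracts whose label was predicted correctly.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred):
    # Share of truly gentrified tracts that the model flagged.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0

# Synthetic, standardized data (assumption): two hypothetical features per tract,
# e.g. change in income evenness and spatial lag of rent.
random.seed(0)
X, y = [], []
for _ in range(200):
    label = random.randint(0, 1)
    center = 1.0 if label else -1.0
    X.append([random.gauss(center, 1.0), random.gauss(center, 1.0)])
    y.append(label)

X_train, y_train = X[:150], y[:150]
X_test, y_test = X[150:], y[150:]

w, b = fit_logistic(X_train, y_train)
y_pred = predict(X_test, w, b)
print("accuracy:", round(accuracy(y_test, y_pred), 2))
print("recall:", round(recall(y_test, y_pred), 2))
```

Recall is reported alongside accuracy because, for a displacement-risk screen, missing a tract that is gentrifying is usually the costlier error.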


K-means Clustering

Based on the clustering results, the two types of tracts differed markedly in the change in median income, educational attainment, the population living in poverty, the unemployed population, minority race population composition, housing in structures with 10 or more units, and the change in mobile units. In tracts where incomes rose significantly within the five-year period, the population living in structures with 10 or more units declined, indicating possible ongoing displacement within tracts that have gentrified.

The logistic regression results point to five factors that play a major role in indicating whether or not a tract has gentrified:


Decrease in evenness of income distribution

Decrease in crowding within structures with 10 or more units

Decrease in evenness of educational attainment

Spatial lags for home value, rent and income distribution

Decrease in the racial composition
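The two-group split from the K-means analysis can be illustrated with a small sketch. This is an illustration under stated assumptions rather than the project's code: the tract values are synthetic, the two features (change in median income and change in the share of residents in structures with 10 or more units) are chosen only for demonstration, and the farthest-point initialization is a simplification used here to keep the example deterministic.

```python
import random

def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def kmeans(points, k=2, iterations=50):
    """Lloyd's algorithm with a deterministic farthest-point initialization."""
    # Start from the first point, then repeatedly add the point farthest
    # from all centroids chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points,
                       key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(farthest))

    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return labels, centroids

# Synthetic tracts (assumption). Each point is
# [change in median income, change in share living in 10+ unit structures].
random.seed(1)
rising = [[random.gauss(0.30, 0.03), random.gauss(-0.10, 0.03)] for _ in range(30)]
stable = [[random.gauss(0.02, 0.03), random.gauss(0.01, 0.03)] for _ in range(30)]
tracts = rising + stable

labels, centroids = kmeans(tracts, k=2)
for c, center in enumerate(centroids):
    size = labels.count(c)
    print(f"cluster {c}: {size} tracts, "
          f"mean income change {center[0]:+.2f}, "
          f"mean 10+ unit share change {center[1]:+.2f}")
```

Comparing the cluster means feature by feature, as the print loop does, mirrors how the two tract types are contrasted in the clustering discussion.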

Web Tool

We developed a visualization tool to show the results of this project. On the tool's page, users can select a visualization showing the trend of each of the five factors described above. The page also presents the conclusions of the analysis for the Red Hook area, which is of particular interest to the sponsors of this project. This is a minimum viable product version of the proposed tool. We believe later versions could incorporate the analysis for other regions so that the work can be generalized.



With the support of machine learning techniques, we are able to predict the risk of neighborhood gentrification in New York City using the five-year ACS demographics and the social vulnerability index dataset. The findings of this research also suggest that changes in income and ethnicity-related factors play important roles in predicting neighborhood gentrification risk. Because city agencies currently rely on project-specific CEQR analysis that excludes macro-level demographic and economic factors, our results can potentially give them a better understanding of city-wide displacement risk during the decision-making process.


The K-means clustering results demonstrate that a declining population in structures with 10 or more units is also an important indicator of gentrification. This finding suggests a remedy for the current CEQR analysis approach, which considers only low-income tenants living in 1-4 unit buildings vulnerable to displacement and excludes those living in larger buildings.


Our Team


 Project Sponsored by

Reach Out for Any Questions!!

©2020 by Digital CEQR Team.
