Boston House Prices Regression Project

by Kavon Benbow In this project I will be using certain variables in the “Boston House Prices-Advanced Regression Techniques” dataset from Kaggle. The problem/challenge that I focused on was predicting house prices by using some aspects of the dataset. Introduction Figuring out housing prices has been a problem/issue for many years and its only going…

by Kavon Benbow

In this project I will be using certain variables in the “Boston House Prices-Advanced Regression Techniques” dataset from Kaggle. The problem/challenge that I focused on was predicting house prices by using some aspects of the dataset.

Photo by Michael Tuszynski on Pexels.com

Introduction

Figuring out housing prices has been a problem/issue for many years and its only going to get worse. So in this project we will be predicting house prices using a random forest model and a dataset which includes various aspects of the houses in Boston. The goal is to make a model that accurately predict house prices such as number of rooms and age of the property.

Key Question

The import question her his quite simple in that we are just predicting house prices but the other question is how are we going to do this.

Introducing the data

The dataset that I decided to use was the “Boston House Prices-Advanced Regression Techniques” that I found from Kaggle. This dataset is filled with different valuable about housing in the Boston area it such as the per capita crime rate by town (CRIM) and (TAX) full property tax rate per 10,000.

https://www.kaggle.com/datasets/fedesoriano/the-boston-houseprice-data/data

Pre-Processing The Data

For the pre-processing of the data I started out with loading my data with some of the variables that I was using. For this specific dataset I did not see a opportunity to take out any null values. I also had to decide which variables will be more important to include in my model. I also loaded my library and decided to chose the Random Forest Model.

Model Selection

For this dataset I decided to use the random forest model. The reasoning of this was because the high accuracy that the model offers. On top of that the model uses a lot of decision trees and the more decision trees that you use the more accurate that the model will be. Random forest also was helpful in dealing with the null and missing values.

Model Training

This model was trained to predict housing prices in Boston and the way it was done was I put the data within the Random Forest model and the model takes it a creates multiple decision trees. Each tree is then trained with a sample of the data each sample has slightly different mixtures of characteristics of the house. The model then splits the data in order to find the best price for the house. Once trained each tree will make a prediction on the price of the house based on the results it got.

Model Evaluation

The model performed well we know this because the root mean square error (RSME) was between $3,200–$4,670, which is pretty accurate when you are talking about high house prices. This model can predict the house price with a low margin of error so it could be used for multiple cities and even states.

Model Tuning

Model tuning was important in this project especially concerning the random forest model. I adjusted parameters like the number of trees that were used (n_estimators) and the depth of each tree (max_depth). I also used RandomizedSearchCV to improve the model by testing different randomized settings to save time.

file:///Users/kavon/Downloads/Untitled14.html

Predictions and Final Thoughts

The idea and point of this project was to predict housing prices for neighborhoods in Boston, Massachusetts. The random forest model was decided to be the best one for the dataset. This project could be used to help other people find out prices for houses in many other places. The result was pleasing with little margin of error so the model did work well.

Tags:

Leave a comment