I tried this once. The main challenge was having too many variables relative to the sample size.
You listed only a small subset of the variables that can each move the price by more than 20%.
The most important ones, when I looked, seemed to be:
- Neighbourhood (even street in some cases)
- Number of Bedrooms
- Year of construction
- Year of last renovation
- Floor (apartment)
- View (especially for apartments)
- Number of Bathrooms
- Year of sale
That last one, year of sale, matters because it determines which sales you can use as training data. Do you use five years of data even though local real estate prices have been increasing at 10-20% per year? Or can prices be normalized to a common year, even though some neighbourhoods are changing at different rates than others?
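To make the normalization question concrete, here is a rough sketch of one way it could be done: restate each past sale in a common reference year using a compound annual growth rate per neighbourhood. The function name, the neighbourhood labels, the years, and the growth rates are all made up for illustration; in practice the rates would have to be estimated from the data, which is its own problem.

```python
def normalize_price(price, sale_year, ref_year, annual_growth):
    """Restate a sale price in ref_year terms using compound annual growth."""
    years = ref_year - sale_year
    return price * (1.0 + annual_growth) ** years


# Hypothetical growth rates per neighbourhood (not real figures).
growth_by_neighbourhood = {"A": 0.10, "B": 0.20}

# Two hypothetical sales at the same price, five years before the reference year.
sales = [
    {"neighbourhood": "A", "year": 2000, "price": 300_000},
    {"neighbourhood": "B", "year": 2000, "price": 300_000},
]

REF_YEAR = 2005
adjusted = [
    normalize_price(
        s["price"], s["year"], REF_YEAR, growth_by_neighbourhood[s["neighbourhood"]]
    )
    for s in sales
]

for s, a in zip(sales, adjusted):
    print(s["neighbourhood"], round(a))
```

Two identical sale prices end up far apart after adjustment, so a single city-wide rate would misprice one of the two neighbourhoods over a five-year window.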
I restricted my research to 1BR apartments in the city of Vancouver. There are some unique challenges for real estate in Canada, but basically the best sample size I could get was about 2000 sales (I think this was all the 1BR sales in a given year). Very few had information on all of the variables I listed above. Suffice it to say, you can't train a model of any quality on that data.
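As a back-of-the-envelope illustration of why missing fields hurt so much, the sketch below filters records down to those with every variable present, and estimates how many complete rows to expect if each field were filled in independently with the same probability. The field names, the 70% fill rate, and the independence assumption are all hypothetical; only the roughly 2000-sale sample size and the eight variables come from the description above.

```python
# Hypothetical field names for the eight variables listed above.
FEATURES = [
    "neighbourhood", "bedrooms", "year_built", "year_renovated",
    "floor", "view", "bathrooms", "year_sold",
]


def complete_cases(rows, features=FEATURES):
    """Keep only the records that have a non-missing value for every feature."""
    return [r for r in rows if all(r.get(f) is not None for f in features)]


def expected_complete_rows(n_rows, n_features, fill_rate):
    """Expected number of complete rows if each field is present
    independently with probability fill_rate (a simplifying assumption)."""
    return n_rows * fill_rate ** n_features


# Tiny made-up example: one full record, one missing the view.
full = {f: 1 for f in FEATURES}
partial = dict(full, view=None)
usable = complete_cases([full, partial])
print(len(usable))

# 2000 sales, 8 variables, assumed 70% fill rate per field.
print(round(expected_complete_rows(2000, len(FEATURES), 0.7)))
```

Under those assumptions only about a hundred of the 2000 sales would be usable without imputation, spread across many neighbourhoods, floors, and views.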