Water Availability Sources for Land Value Prediction Using Machine Learning Methods
Accurately predicting land values requires a solid understanding of the geographical, economic, and management factors affecting water availability in a given region. To this end, most parcel valuation studies use hedonic regression, or one of its variants, to model final land prices as the aggregation of all the intrinsic and market-driven characteristics affecting the land unit. However, the predictive capability of hedonic pricing models is limited when they are fitted to datasets of increasing complexity across time and space, especially those derived from merging multiple layers of information. In this paper, we use a land valuation dataset, originally comprising 65,000 sale-transaction records paired with 160 possible geographical, socioeconomic, and agricultural features, to evaluate three penalized regression methods (LASSO, Ridge, and Elastic-Net) and two ensemble learning methods (Random Forest and Gradient Boosting Regression). We expect these learning algorithms to increase land-price prediction accuracy relative to conventional hedonic pricing models. This project will also evaluate the generalization capabilities of these algorithms in terms of the error attributable to bias (BIAS) and variance (VAR) in the predictions. Future work is expected to also assess local stationarity, as well as potential sources of dependence between observations in our dataset.
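The bias (BIAS) and variance (VAR) error sources mentioned above can be illustrated with a minimal sketch. It is not the pipeline used in this study: the data are synthetic, the model is a one-feature Ridge regression fitted in closed form, and every name and parameter (`true_price`, `fit_ridge_1d`, `bias_variance`, the penalty values) is a hypothetical choice made for illustration. The sketch refits the model on many resampled training sets and splits the mean squared error against the noise-free target into squared bias and variance.

```python
import random
import statistics


def true_price(x):
    # Hypothetical noise-free relation between one feature and land price.
    return 2.0 * x


def fit_ridge_1d(xs, ys, lam):
    # Closed-form Ridge slope for a single feature with no intercept:
    # beta = sum(x * y) / (sum(x^2) + lam)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def bias_variance(lam, n_train=30, n_trials=500, noise_sd=1.0, seed=0):
    """Return (bias_sq, var, mse), each averaged over fixed test points.

    mse is measured against the noise-free target, so it equals
    bias_sq + var up to floating-point rounding.
    """
    rng = random.Random(seed)
    test_xs = [0.5, 1.0, 1.5, 2.0]
    preds = [[] for _ in test_xs]  # predictions per test point, across trials

    for _ in range(n_trials):
        # Draw a fresh synthetic training set and refit the model.
        xs = [rng.uniform(0.0, 2.0) for _ in range(n_train)]
        ys = [true_price(x) + rng.gauss(0.0, noise_sd) for x in xs]
        slope = fit_ridge_1d(xs, ys, lam)
        for i, x in enumerate(test_xs):
            preds[i].append(slope * x)

    bias_sq_terms, var_terms, mse_terms = [], [], []
    for x, p in zip(test_xs, preds):
        target = true_price(x)
        bias_sq_terms.append((statistics.fmean(p) - target) ** 2)
        var_terms.append(statistics.pvariance(p))
        mse_terms.append(statistics.fmean((v - target) ** 2 for v in p))

    return (
        statistics.fmean(bias_sq_terms),
        statistics.fmean(var_terms),
        statistics.fmean(mse_terms),
    )


if __name__ == "__main__":
    for lam in (0.0, 10.0, 100.0):
        bias_sq, var, mse = bias_variance(lam)
        print(f"lambda={lam:6.1f}  BIAS^2={bias_sq:.4f}  VAR={var:.4f}  MSE={mse:.4f}")
```

In this toy setting, a larger penalty shrinks the fitted slope, which tends to raise the squared bias while lowering the variance across training sets. The same resampling idea could in principle be applied to any of the five methods evaluated here, although the closed-form fit would have to be replaced by the corresponding estimator.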