VRI Regression Analysis

Crown closure is a key forest metric representing the proportion of ground covered by tree canopies. It influences habitat suitability, light availability, and stand productivity. This project develops a regression-based machine learning model that predicts crown closure from key forest inventory attributes in Vegetation Resource Inventory (VRI) data. It follows a structured machine learning workflow (data preprocessing, exploratory analysis, feature selection, model training, and evaluation) aimed at optimizing model performance.

The dataset is sourced from the British Columbia Vegetation Resource Inventory (VRI), which provides stand-level forest attributes derived from remote sensing and field surveys. The target variable is crown closure (%), the percentage of the forest floor covered by tree canopies. The predictor variables are basal area (m²/ha), VRI live stems per hectare, projected age (years), projected height (m), and whole stem biomass per hectare (Mg/ha).

The original dataset covers the whole province of British Columbia. In ArcGIS Pro, I filtered it to exclude all projects except TFL 38, a tree farm licence located northwest of Squamish, BC, and removed all fields not needed for the analysis.
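For readers without ArcGIS Pro, the same two filters (keep only the TFL 38 records, keep only the modelling fields) can be sketched in pandas. This is a hypothetical sketch: the field names `PROJECT`, `BASAL_AREA`, `LIVE_STEMS_PER_HA`, `PROJ_AGE`, `PROJ_HEIGHT`, and `WHOLE_STEM_BIOMASS_PER_HA`, the value `"TFL 38"`, and the table contents are placeholders, not confirmed VRI field names or data (only `CROWN_CLOSURE` appears in the project code). Because a GeoPandas `GeoDataFrame` is a subclass of the pandas `DataFrame`, the same boolean mask and column selection should also apply to the VRI layer loaded as a GeoDataFrame.

```python
import pandas as pd

# Hypothetical stand-in for the province-wide VRI attribute table.
vri = pd.DataFrame({
    "PROJECT": ["TFL 38", "TFL 38", "TFL 39", "TSA 30"],
    "CROWN_CLOSURE": [55.0, 70.0, 40.0, 65.0],
    "BASAL_AREA": [32.1, 48.7, 20.3, 41.0],
    "LIVE_STEMS_PER_HA": [800, 1150, 600, 950],
    "PROJ_AGE": [85, 140, 60, 110],
    "PROJ_HEIGHT": [24.5, 33.0, 18.2, 29.8],
    "WHOLE_STEM_BIOMASS_PER_HA": [210.0, 390.5, 120.4, 300.2],
    "OWNER": ["A", "A", "B", "C"],  # example of a field not needed for modelling
})

# Fields kept for modelling: the target plus the five predictors.
# With a GeoDataFrame, also keep "geometry" if the result will be mapped.
keep_cols = [
    "CROWN_CLOSURE", "BASAL_AREA", "LIVE_STEMS_PER_HA",
    "PROJ_AGE", "PROJ_HEIGHT", "WHOLE_STEM_BIOMASS_PER_HA",
]

# Row filter (TFL 38 only) and column filter (drop unneeded fields) in one step.
vri_tfl38 = vri.loc[vri["PROJECT"] == "TFL 38", keep_cols].reset_index(drop=True)
print(vri_tfl38.shape)
```

Doing both filters in a single `.loc` call avoids an intermediate copy of the full provincial table.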

Note: Selected code snippets are included below to highlight key parts of the project. The complete Jupyter Notebook, including all code and outputs, is available through the link at the bottom of the page.

Code Snippet 1


"""Analyze and Visualize VRI Data

This module analyzes and visualizes the VRI data using GeoPandas, Matplotlib, and Seaborn:
- Computes and displays summary statistics for the numeric columns
- Retrieves and prints the Coordinate Reference System (CRS)
- Lists column names and their data types
- Plots a map of the stand polygons with black edges
- Creates a scatter plot matrix for numeric columns
- Computes and displays a correlation matrix for numeric columns
"""
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# VRI_cleaned is the filtered TFL 38 GeoDataFrame prepared earlier in the notebook.

# Show every row and column when printing wide tables
pd.set_option("display.max_rows", None, "display.max_columns", None)

print("\n--- Summary Statistics ---")
print(VRI_cleaned.describe())

print("\n--- CRS ---")
print(VRI_cleaned.crs)

# Column names and their data types
print("\n--- Column Data Types ---")
print(VRI_cleaned.dtypes)

# Map of the stand polygons
VRI_cleaned.plot(figsize=(10, 10), edgecolor="black")
plt.title("TFL 38 VRI - Map")
plt.show()

# Select the numeric columns once and reuse them below
numeric_columns = VRI_cleaned.select_dtypes(include="number")

# Scatter plot matrix of the numeric fields
sns.pairplot(numeric_columns)
plt.show()

print("\n--- Correlation Matrix ---")
print(numeric_columns.corr())

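The full correlation matrix lists every pair of numeric fields. For feature selection, a narrower view is often more useful: each candidate predictor's correlation with crown closure, ranked by absolute strength. The sketch below uses a small synthetic table; the predictor names and the data-generating formula are invented for illustration, and on the real data `numeric_columns` from the snippet above would take the place of `df`.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 200

# Synthetic stand-in for the numeric VRI fields (hypothetical names and values).
basal_area = rng.uniform(5, 80, n)
df = pd.DataFrame({
    "BASAL_AREA": basal_area,
    "PROJ_AGE": rng.uniform(20, 250, n),  # generated independently of the target
})
df["CROWN_CLOSURE"] = np.clip(0.8 * basal_area + rng.normal(0, 5, n), 0, 100)

# Pearson correlation of each predictor with the target, strongest first.
target_corr = (
    df.corr()["CROWN_CLOSURE"]
    .drop("CROWN_CLOSURE")
    .sort_values(key=np.abs, ascending=False)
)
print(target_corr)
```

Sorting by absolute value keeps strong negative correlations near the top, which a plain descending sort would push to the bottom.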
Code Snippet 2


"""Model Training and Evaluation for VRI Data

This module trains and evaluates machine learning models on PCA-transformed VRI data:
- Splits the dataset into features (all columns except 'CROWN_CLOSURE') and the target variable
- Performs an 80/20 train-test split with a fixed random seed
- Defines Random Forest, SVR, and Gradient Boosting regressor models
- Evaluates each model with 5-fold cross-validation on the training set
- Reports the mean cross-validation MSE and RMSE for each model
"""
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.svm import SVR

# Split the data into features (X) and target variable (y)
X = df_pca.drop(columns="CROWN_CLOSURE")  # Principal components
y = df_pca["CROWN_CLOSURE"]  # Target variable: crown closure (%)

# Split into training and testing sets (80/20 split)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define the models
rf_model = RandomForestRegressor(random_state=42)
svr_model = SVR()
gbr_model = GradientBoostingRegressor(random_state=42)
models = {"Random Forest": rf_model, "SVR": svr_model, "Gradient Boosting": gbr_model}

# 5-fold cross-validation on the training set. scikit-learn treats higher scores
# as better, so MSE is returned negated; flip the sign to report a positive MSE.
for name, model in models.items():
    neg_mse = cross_val_score(model, X_train, y_train, cv=5, scoring="neg_mean_squared_error")
    mse = -np.mean(neg_mse)
    print(f"{name} CV MSE: {mse:.2f}, RMSE: {np.sqrt(mse):.2f}")

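Snippet 2 starts from `df_pca`, whose construction is not shown here. One plausible way to build it is sketched below under stated assumptions: the predictors are standardized before PCA (PCA is scale-sensitive and the predictors are in very different units), and just enough components are kept to explain 95% of the variance. The scaling step, the 95% threshold, the column names, and the synthetic data are all assumptions, not necessarily what the notebook does.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 150

# Hypothetical stand-in for the cleaned TFL 38 table, with correlated predictors.
height = rng.uniform(5, 45, n)
df = pd.DataFrame({
    "BASAL_AREA": 1.5 * height + rng.normal(0, 4, n),
    "LIVE_STEMS_PER_HA": rng.uniform(200, 2000, n),
    "PROJ_AGE": 4 * height + rng.normal(0, 15, n),
    "PROJ_HEIGHT": height,
    "WHOLE_STEM_BIOMASS_PER_HA": 9 * height + rng.normal(0, 30, n),
})
df["CROWN_CLOSURE"] = np.clip(1.2 * df["BASAL_AREA"] + rng.normal(0, 6, n), 0, 100)

features = df.drop(columns="CROWN_CLOSURE")

# Standardize so no predictor dominates the components because of its units.
scaled = StandardScaler().fit_transform(features)

# A float n_components keeps the smallest number of components whose
# cumulative explained variance reaches that fraction.
pca = PCA(n_components=0.95)
components = pca.fit_transform(scaled)

df_pca = pd.DataFrame(
    components,
    columns=[f"PC{i + 1}" for i in range(components.shape[1])],
)
df_pca["CROWN_CLOSURE"] = df["CROWN_CLOSURE"].to_numpy()  # target is not transformed

print(pca.explained_variance_ratio_.round(3))
print(df_pca.shape)
```

Because the scaler and PCA here are fitted on all rows before the train-test split, the test set slightly influences the features. For a stricter evaluation, both steps could be placed inside a scikit-learn `Pipeline` so they are fitted only on the training folds.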
Model Performance Evaluation

Model performance was assessed using Mean Squared Error (MSE) and Root Mean Squared Error (RMSE), both before and after hyperparameter tuning. The final performance of each model is shown below:

Among the three models, Random Forest performed best, achieving the lowest RMSE (8.23 percentage points of crown closure), which indicates the most accurate crown closure predictions. SVR had the highest error of the three, indicating it fit this dataset least well.
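RMSE is the square root of MSE, so it is expressed in the same units as the target (percent crown closure), which makes it easier to interpret than MSE. A minimal sketch of one way to compare default and tuned Random Forest performance on a held-out test set is shown below. The parameter grid, the synthetic data, and the choice of `GridSearchCV` as the tuning method are assumptions for illustration; the notebook's actual search space and tuning approach may differ.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.default_rng(42)
n = 300

# Synthetic stand-in for PCA features and crown closure (%).
X = rng.normal(size=(n, 3))
y = np.clip(50 + 15 * X[:, 0] - 8 * X[:, 1] + rng.normal(0, 5, n), 0, 100)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Before tuning: default hyperparameters.
baseline = RandomForestRegressor(random_state=42).fit(X_train, y_train)
baseline_rmse = np.sqrt(mean_squared_error(y_test, baseline.predict(X_test)))

# After tuning: a small illustrative grid searched with 5-fold CV on the training set.
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 10]}
search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid,
    cv=5,
    scoring="neg_mean_squared_error",
)
search.fit(X_train, y_train)

# GridSearchCV refits the best configuration on the full training set by default.
y_pred = search.best_estimator_.predict(X_test)
test_mse = mean_squared_error(y_test, y_pred)
test_rmse = np.sqrt(test_mse)  # same units as crown closure (%)

print("Best parameters:", search.best_params_)
print(f"Baseline test RMSE: {baseline_rmse:.2f}")
print(f"Tuned test MSE: {test_mse:.2f}, RMSE: {test_rmse:.2f}")
```

The test set is used only once, after tuning, so the reported RMSE is not biased by the hyperparameter search.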

Summary

This project used regression-based machine learning to predict crown closure in TFL 38 from attributes in BC's VRI dataset. After the dataset was filtered and prepared in ArcGIS Pro, models were trained on PCA-transformed versions of five variables: basal area, live stem density, projected age, projected height, and whole stem biomass. Among the models tested, Random Forest performed best, with an RMSE of 8.23 percentage points. These results demonstrate the potential of VRI data and machine learning to support forest structure analysis and management decisions.

Visit the VRI Regression Analysis Project on GitHub
