What can data tell us about nature?

We often focus on how business or society can be understood through the information it produces, but the natural and physical sciences also stand to benefit. In this project, observations of sections of the Roosevelt National Forest were used to predict the predominant tree cover type from cartographic (geography-related) variables alone.


An expedition through a project and a national forest

Understanding

Cartographic variables, such as elevation or hill slope, stand in contrast to remotely sensed information like temperature or satellite imagery. Because cartography and forest management draw on specialized subject matter knowledge, a thorough understanding of the available data was crucial.

Iterating

No solution to a new problem ever comes from the first try. Instead, learning from previous successes and failures makes the most of future attempts. Various data representations, prediction models, and ensemble techniques were evaluated to reach a final solution.

Analyzing

The results of any project must be put into proper context to inform future work and decisions. The relationship of this project to others in natural environments or geography was also assessed for new paths forward.

Matthew Montrone at Pexels

A dataset that needed more exploration than most

With data from the US Geological Survey and US Forest Service, there was a wide range of subject matter to become familiar with.

Two is better than one

As with other problems that place few constraints on the solution, an ensemble of different models led to more accurate predictions than any single model on its own.
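To illustrate why combining models can help, here is a minimal sketch of one common ensemble technique, majority voting. It is an assumption for illustration only: this page does not say which ensemble method the project used. The labels, the three "models", and the helper names `majority_vote` and `accuracy` are all hypothetical and are not taken from the actual dataset or submission.

```python
from collections import Counter


def majority_vote(model_predictions):
    """Combine per-model label lists into one list by plurality vote.

    Ties go to the label predicted by the earliest model in the list.
    """
    combined = []
    for votes in zip(*model_predictions):
        counts = Counter(votes)
        best = max(counts.values())
        # Pick the first label (in model order) that reaches the top count.
        combined.append(next(v for v in votes if counts[v] == best))
    return combined


def accuracy(predicted, actual):
    """Fraction of positions where the predicted label matches the actual one."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical cover types for five forest sections (illustrative only).
actual = ["spruce", "pine", "aspen", "pine", "spruce"]

# Three hypothetical models, each wrong on a different section.
model_a = ["spruce", "pine", "pine", "pine", "spruce"]
model_b = ["spruce", "aspen", "aspen", "pine", "spruce"]
model_c = ["pine", "pine", "aspen", "pine", "spruce"]

ensemble = majority_vote([model_a, model_b, model_c])

for name, preds in [("a", model_a), ("b", model_b), ("c", model_c), ("ensemble", ensemble)]:
    print(name, accuracy(preds, actual))
```

In this toy setup each model errs on a different section, so the other two outvote it every time. Real models often share mistakes, so the gain from an ensemble is usually smaller than in this sketch.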

Yaroslav Shuraev at Pexels
Roosevelt National Forest, US Forest Service

A piece of a unique and important field

Areas that are not directly commercializable often receive less attention, but that does not diminish their value to science or to society.

from Roosevelt National Forest, courtesy of US Forest Service

Acknowledgements:

This project was originally hosted as a Kaggle competition in ~2015. After the competition closed, the contributors encountered it and designed a submission that serves as the basis for the website.

The original acknowledgements from the Kaggle competition are repeated below:

“Kaggle is hosting this competition for the machine learning community to use for fun and practice. This dataset was provided by Jock A. Blackard and Colorado State University. We also thank the UCI machine learning repository for hosting the dataset. If you use the problem in publication, please cite:

Bache, K. & Lichman, M. (2013). UCI Machine Learning Repository. Irvine, CA: University of California, School of Information and Computer Science.”

Aidan Jackson

Contributor & Website Author