Revvin’ up your Rengine

Listen to the PC roar

Data under tension

Beggin’ you to click and run

I was thrilled when Laura Poggio presented a slide with a Venn diagram with sections labelled “Danger Zone” at the Pedometrics 2017 conference in Wageningen, June 2017. It brought me back to the 1980s and the song was stuck in my head for a few days.

Laura’s keynote was on Fusing data and expert knowledge for digital soil assessment, and she highlighted various tools that pedometricians use to solve their problems. The diagram is shown below:

*Diagram modified after Santacruz, 2016*

The Venn diagram is a modification of Drew Conway and Michael Malak’s Data Science diagram. The main point is that there are a few Danger Zones we should avoid in Pedometrics.

It is interesting to compare it to Tom Hengl’s pedometrics diagram, made more than 10 years ago. The difference probably reflects how much “data science” has grown.

The “danger zone” diagram depicts the common tools used by pedometricians nowadays: statistics (and maths), spatial sciences, domain expertise (soil science) and hacking (programming) skills. The intersection of all four skills creates “good” Spatial Data Science.

However, the mixture of 2 or 3 skills can create interesting zones which we often see and experience nowadays.

A soil scientist can do lots of statistical analysis without ever using spatial statistics, which is *spatially-unaware data science*. That is still quite common, and we need to make those practitioners aware that there is an additional benefit when spatial information is considered.

Someone can create a beautiful soil map using correct statistics and raster packages in R and still be practising *domain-unaware data science*. This is commonly done by machine learning professionals who tend to apply algorithms without an understanding of the domain they’re analysing. I think this should be called the *Triple Danger Zone*, for the reasons described below. Many have taken a ride into this zone. It is a reminder that basic soil science is as important as hacking and statistical skills.

Then there are the *Danger* and *Double Danger Zones*, where someone with great hacking (or R programming) skills works with a soil scientist to analyse their data. With the proliferation of R packages and freely available software, one can simply plug soil data into a model without understanding the mathematical and statistical principles behind it. We can blindly use machine learning models and pull out the best R², yet lack an understanding of what the assumptions of the model are and what the parameters mean. Conway called these “know enough to be dangerous”, but I think *domain-unaware data science* could be more dangerous in pedometrics.
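To make the point about chasing the best R² concrete, here is a minimal sketch in Python using only NumPy. Everything in it is an assumption for illustration: the data are purely synthetic random numbers standing in for a hypothetical soil property and 30 hypothetical covariates, and the function names are made up. It fits an ordinary least-squares model to noise and compares the R² on the fitted data with the R² on independent data.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot


def fit_and_score(n_train=40, n_test=40, n_covariates=30, seed=0):
    """Fit OLS to pure noise; return (fitted R2, hold-out R2).

    The target is random and unrelated to the covariates, so any
    apparent skill on the training data is overfitting.
    """
    rng = np.random.default_rng(seed)
    # Hypothetical covariates and a hypothetical soil property: all noise.
    x_train = rng.normal(size=(n_train, n_covariates))
    x_test = rng.normal(size=(n_test, n_covariates))
    y_train = rng.normal(size=n_train)
    y_test = rng.normal(size=n_test)

    # Add an intercept column and solve the least-squares problem.
    a_train = np.column_stack([np.ones(n_train), x_train])
    a_test = np.column_stack([np.ones(n_test), x_test])
    coef, *_ = np.linalg.lstsq(a_train, y_train, rcond=None)

    r2_train = r_squared(y_train, a_train @ coef)
    r2_test = r_squared(y_test, a_test @ coef)
    return r2_train, r2_test


r2_train, r2_test = fit_and_score()
print(f"R2 on the data used for fitting: {r2_train:.2f}")
print(f"R2 on independent data:          {r2_test:.2f}")
```

The fitted R² tends to look impressive simply because 30 coefficients are estimated from 40 observations, while the hold-out R² shows that the model has learned nothing. This is only a toy illustration of why a reported R² needs to be read alongside the model's assumptions and validation design.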

The proliferation of models and hacking skills sounds cool, but we don’t want to drive ourselves onto a Highway to the Danger Zone. I am not claiming to be an expert on all of these, but it is always good to acknowledge the limitations of our models, and collaboration with people with different skills can keep soil data science progressing.