| Statistics for geoscientists
How does climate change impact creatures on this planet? That’s a tough question! To each its own climate zone sounds so caring. When I posted the polar bear on this page, I could not make up my mind about the tiger or the gorilla. Not surprisingly, the United Nations has chosen the gorilla! It does make sense since gorillas are so much more approachable than tigers. What is surprising indeed is that gorillas do trust any humans at all.
Sharing climate zones is what life on our little planet is all about, the more so since climate zones where seasons come and go are a fact of life. The trouble is that the temperature of a climate zone can only be measured with a finite degree of precision. That’s why working with sound statistics makes sense. Yet too many geoscientists grew up on a steady diet of surreal geostatistics, where degrees of freedom matter much less than degrees Celsius, and where spatial dependence between measured temperatures in ordered sets is assumed without applying Fisher's F-test to the variance of a set and the first variance term of the ordered set. So it’s about time to start working with real statistics when studying climate change. In fact, those who take the study of climate change seriously should not play silly kriging and smoothing games.

The most important steps in charting a sampling variogram are to derive Riemann sums for the ordered set of measured values, and to verify spatial dependence by applying Fisher’s F-test to the variance of the set of measured values and the first variance term of the ordered set. It makes more sense to verify spatial dependence by applying Fisher's F-test than it does to infer spatial dependence without proof and then interpolate between measured values. Sampling variograms show where orderliness in sampling units or sample spaces dissipates into randomness. The concept of degrees of freedom sets real statistics apart from surreal geostatistics. Find out first why degrees of freedom matter so much more than degrees Celsius when we study climate change. The rest are details!
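The two steps just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the spreadsheet templates themselves: it assumes equally spaced, equally weighted values, and it assumes that the variance term at a given lag is the sum of squared differences between values that many steps apart, divided by twice the number of differences. The function names are hypothetical. The degrees of freedom needed to look up the tabulated F-value are not computed here, because the text above does not state them.

```python
from statistics import variance


def variance_term(values, lag=1):
    """Variance term of the ordered set at the given lag.

    Assumption: sum of squared differences between values `lag` steps
    apart, divided by twice the number of such differences.
    """
    n = len(values)
    if not 1 <= lag < n:
        raise ValueError("lag must be between 1 and len(values) - 1")
    squared = [(values[i + lag] - values[i]) ** 2 for i in range(n - lag)]
    return sum(squared) / (2 * (n - lag))


def sampling_variogram(values, max_lag):
    """Variance terms of the ordered set for lags 1..max_lag."""
    return [variance_term(values, lag) for lag in range(1, max_lag + 1)]


def observed_f(values):
    """Observed F-value: variance of the set over the first variance term."""
    return variance(values) / variance_term(values, 1)


# A perfectly ordered toy set: the first variance term is far below
# the variance of the set, so the observed F-value is high.
ordered = [1, 2, 3, 4, 5]
print(sampling_variogram(ordered, 2))  # [0.5, 2.0]
print(observed_f(ordered))             # 5.0
```

The observed F-value is then compared with the tabulated F-value at 95% probability for the applicable degrees of freedom. An observed value above the tabulated one points to a significant degree of spatial dependence in the ordered set.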
Historical Adjusted Climate Data base for Canada
I applied for and was granted permission to access Environment Canada's massive data base.
I downloaded several sets of monthly temperatures for a few interesting locations. The first set was for Coral Harbour, Territory of Nunavut. Stats and charts for annual temperatures are derived with Excel spreadsheet templates.
Coral Harbour, Period 1933-2007
Iqaluit, Period 1946-2007
Calgary International Airport, Period 1913-2007
Ottawa International Airport, Period 1939-2007
Toronto International Airport, Period 1940-2008
Vancouver International Airport, Period 1937-2007
Victoria International Airport, Period 1948-2007
Lower troposphere global temperature: 1979-2008
The first step in this statistical analysis is to derive the sampling variogram and determine where spatial dependence in our own sample space of time dissipates into randomness. The lag of a sampling variogram shows why it makes sense to partition the complete set into annual subsets. Several spreadsheet templates give relevant statistics such as Fisher's F-test, Student's t-test, the chi-square test, and 95% confidence limits for annual subsets.
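As a rough illustration of what such a template computes for one annual subset, here is a minimal Python sketch. It assumes twelve monthly temperatures per year and uses the textbook confidence limits for a mean based on Student's t-distribution; the function names are hypothetical, and the Excel templates themselves may be laid out differently. The t-value is passed in by the caller; 2.201 is the two-sided 95% value for 11 degrees of freedom.

```python
from math import sqrt
from statistics import mean, variance

# Two-sided 95% Student's t-value for 12 - 1 = 11 degrees of freedom.
T_95_DF11 = 2.201


def annual_subsets(monthly, months_per_year=12):
    """Partition a monthly series into complete annual subsets.

    Trailing months that do not fill a whole year are dropped.
    """
    usable = len(monthly) - len(monthly) % months_per_year
    return [monthly[i:i + months_per_year]
            for i in range(0, usable, months_per_year)]


def confidence_limits(subset, t_value):
    """Lower and upper symmetric confidence limits for the subset mean."""
    centre = mean(subset)
    half_width = t_value * sqrt(variance(subset) / len(subset))
    return centre - half_width, centre + half_width


# Made-up monthly temperatures for two years (illustration only).
series = [-30, -31, -26, -17, -6, 3, 9, 7, 1, -7, -17, -25,
          -29, -30, -25, -16, -5, 4, 10, 8, 2, -6, -16, -24]
for year in annual_subsets(series):
    lower, upper = confidence_limits(year, T_95_DF11)
    print(round(mean(year), 2), round(lower, 2), round(upper, 2))
```

Keeping the t-value as an argument makes the dependence on degrees of freedom explicit: a subset with fewer measured values needs a larger t-value and gets a wider confidence range.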
Wolfer 1749-1924 sunspot data
Box and Jenkins' 1976 Time Series Analysis gives sunspot data for 1770 to 1869. I do not have the larger set to which MATLAB applied Fourier analysis. Fisher's F-test for spatial dependence cannot be applied to Fourier transforms. I have used annual sunspot counts from 1749 to 1924. Here is the sampling variogram for this set. The next step was to partition the set into subsets such that each covers a single cycle. I deleted all but the first variance terms. I also deleted the term where each cycle reverses. Here are the statistics for all subsets. The final step is to derive lower and upper limits of symmetric 95% confidence ranges for all subsets that together constitute Wolfer 1749-1924. Here's some kind of control chart. Sunspots do vary from year to year as a function of the sun's surface. That's why a weighting factor is applied to each annual count. It makes sense to take weighting factors into account when deriving the statistics for historical sunspot counts.
Wikipedia on Kriging
The most important step in verifying the validity of 95% confidence intervals in Figure 1 is to test for spatial dependence between measured values in the ordered set that underpins this figure. Fisher's F-test shows the observed F-value between the first variance term of the ordered set and the variance of the set to be below the tabulated F-value at 95% probability. Hence, the set of measured values is randomly distributed within its sample space. It is just as much a scientific fraud to interpolate within this sample space as it would be to extrapolate beyond it. Study this file and see what happens when Agterberg's functionally dependent distance-weighted averages are inserted between a set of measured values. Is it scientific fraud or just a touch of junk statistics? Many years after Matheron forgot to derive variances of length-weighted average lead and silver grades, I show how to work with weighting factors when measured values are unevenly spaced within a sample space.
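To make the point about weighting factors concrete, here is a minimal Python sketch of a weighted average and its variance. It rests on assumptions that are mine, not the linked file's: the measured values are treated as independent with a common variance, that variance is estimated by the ordinary unweighted sample variance, and the weights (for example core lengths) are known without error. The function names are hypothetical.

```python
from statistics import variance


def weighted_mean(values, weights):
    """Weighted average of measured values, e.g. length-weighted grades."""
    if len(values) != len(weights) or not values:
        raise ValueError("values and weights must be non-empty and equal in length")
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be positive")
    total = sum(weights)
    return sum(w * x for w, x in zip(weights, values)) / total


def variance_of_weighted_mean(values, weights):
    """Variance of the weighted average.

    Assumption: independent values with a common variance, so the
    variance of the weighted average is that variance times the sum
    of squared normalized weights.
    """
    if len(values) != len(weights) or len(values) < 2:
        raise ValueError("need at least two values with matching weights")
    total = sum(weights)
    normalized = [w / total for w in weights]
    return variance(values) * sum(w * w for w in normalized)


grades = [2.0, 4.0, 6.0]    # made-up measured values
lengths = [1.0, 1.0, 2.0]   # made-up weighting factors
print(weighted_mean(grades, lengths))              # 4.5
print(variance_of_weighted_mean(grades, lengths))  # 1.5
```

With equal weights the second function reduces to the familiar variance of the set divided by the number of measured values, which is a quick check that the weighting factors are handled consistently.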
|