Statistics
for geoscientists
How does climate change impact creatures
on this planet? That’s a tough question! “To each
its own climate zone” sounds so caring. When I posted the
polar bear on this page, I could not make up my mind about
the tiger or the gorilla. Not surprisingly, the United
Nations has chosen the gorilla! It does make sense since
gorillas are so much more approachable than tigers. What
is surprising indeed is that gorillas do trust any humans
at all.
Sharing climate zones is what life
on our little planet is all about. The more so since climate
zones where seasons come and go are a fact of life. The
problem is that the temperature of a climate zone may
only be measured with some finite degree of precision.
That’s why working with sound statistics makes sense.
A further problem is that too many geoscientists grew up
on a steady diet of surreal geostatistics, where degrees of
freedom matter much less than degrees Celsius, and where
spatial dependence between measured temperatures in ordered
sets is assumed without applying Fisher's F-test to the
variance of a set and the first variance term of the ordered
set. So it’s about time to start working with
real statistics when studying climate change. In fact,
those who take the study of climate change seriously
should not play silly assume, krige and smooth games.
The most important steps in charting a
sampling variogram are to derive Riemann sums for the
ordered set of measured values, and to verify spatial
dependence by applying Fisher’s F-test to the variance
of the set of measured values and the first variance term
of the ordered set. It makes more sense to verify spatial
dependence by applying Fisher's F-test than it does to
infer spatial dependence without proof, and to interpolate
between measured values. Sampling variograms show where
orderliness in sampling units or sample spaces dissipates
into randomness. The concept of degrees of freedom sets
real statistics apart from surreal geostatistics. Find
out first why degrees of freedom matter so much more
than degrees Celsius when we study climate change. The
rest are details!
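
The two steps named above can be sketched in Python. This is a minimal
sketch under stated assumptions, not the spreadsheet templates used on
this site: it assumes equally spaced measured values, takes the variance
term at lag j as the sum of squared differences between values j
positions apart divided by 2(n - j), and leaves the tabulated F-value
(and the degrees of freedom used to look it up) to the reader. All
function names are hypothetical.

```python
def variance(values):
    """Variance of the set, with n - 1 degrees of freedom."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def variance_term(values, lag=1):
    """Variance term of the ordered set at a given lag.

    Assumed form: sum of squared differences between values that are
    `lag` positions apart, divided by 2 * (n - lag).
    """
    n = len(values)
    diffs = [values[i + lag] - values[i] for i in range(n - lag)]
    return sum(d * d for d in diffs) / (2 * (n - lag))


def sampling_variogram(values, max_lag):
    """Variance terms of the ordered set for lags 1..max_lag."""
    return [variance_term(values, lag) for lag in range(1, max_lag + 1)]


def f_ratio(values):
    """Observed F-value: variance of the set over the first variance term."""
    return variance(values) / variance_term(values, 1)


def is_spatially_dependent(values, f_tabulated):
    """True if the observed F-value exceeds the tabulated F-value.

    `f_tabulated` must be looked up by the caller at the chosen
    probability level and degrees of freedom.
    """
    return f_ratio(values) > f_tabulated
```

For a strictly ordered set such as 1, 2, 3, 4, 5, 6 the variance of the
set is 3.5 and the first variance term is 0.5, so the observed F-value
is 7.0. Whether that is significant depends on the tabulated value for
the applicable degrees of freedom.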
Historical Adjusted Climate Database for Canada
I applied for and was granted permission to access
Environment Canada's massive database.
I downloaded several sets of monthly temperatures for
a few interesting locations. The first set was for Coral
Harbour, Territory of Nunavut. Stats and charts for annual
temperatures are derived with Excel spreadsheet templates.
Coral Harbour, Period 1933-2007
Iqaluit, Period 1946-2007
Calgary International Airport, Period 1913-2007
Ottawa International Airport, Period 1939-2007
Toronto International Airport, Period 1940-2008
Vancouver International Airport, Period 1937-2007
Victoria International Airport, Period 1948-2007
Lower troposphere global temperature: 1979-2008
The first step in this statistical analysis is to derive
the sampling variogram
and determine where spatial dependence in our own sample
space of time dissipates into randomness. The lag of a
sampling variogram shows why it makes sense to partition
the complete set into annual subsets. Several spreadsheet
templates give relevant statistics such as Fisher's F-test,
Student's t-test, the chi-square test, and 95% confidence
limits for annual subsets.
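
Partitioning a monthly series into annual subsets and deriving 95%
confidence limits for each annual mean might look like the Python
sketch below. It is an illustration only and makes assumptions the
templates may not share: the series starts in January and has no gaps,
the limits are based on the ordinary variance of each subset, and the
caller supplies the tabulated t-value (2.201 for 11 degrees of freedom
at 95% probability, two-sided). The function names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def annual_subsets(monthly, months_per_year=12):
    """Split an ordered monthly series into complete annual subsets.

    Any incomplete year at the end of the series is dropped.
    """
    n_years = len(monthly) // months_per_year
    return [
        monthly[y * months_per_year:(y + 1) * months_per_year]
        for y in range(n_years)
    ]


def confidence_limits(subset, t_tabulated):
    """Lower and upper limits of a symmetric confidence range for the mean.

    Uses the ordinary sample variance of the subset (n - 1 degrees of
    freedom); `t_tabulated` is the tabulated Student's t-value for
    those degrees of freedom.
    """
    centre = mean(subset)
    std_error = sqrt(variance(subset) / len(subset))
    return centre - t_tabulated * std_error, centre + t_tabulated * std_error


T_95_DF11 = 2.201  # tabulated t-value, 95% two-sided, 11 degrees of freedom
```

Calling `confidence_limits(year, T_95_DF11)` for every `year` returned by
`annual_subsets(monthly)` gives one pair of limits per annual subset.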
Wolfer 1749-1924 sunspot data
Box and Jenkins' 1976 Time Series Analysis gives
sunspot data for 1770 to 1869. I do not have the larger
set to which MATLAB applied Fourier analysis. Fisher's
F-test for spatial dependence cannot be applied to Fourier
transforms. I have used annual sunspot counts from 1749
to 1924. Here is the sampling variogram for this set.
The next step was to partition
the set into subsets such that each covers a single cycle.
I deleted all but the first variance terms. I also deleted
the very term where each cycle reverses. Here are the
statistics for all subsets.
The final step is to derive lower and upper limits of
symmetric 95% confidence ranges for all subsets that together
constitute Wolfer 1749-1924. Here's some kind of control
chart. Sunspots do vary from year to year as a function
of the sun's surface. That's why a weighting factor is
applied to each annual count. It makes sense to take
weighting factors into account when deriving the statistics
for historical sunspot counts.
Wikipedia on Kriging
The most important step in verifying the validity of 95%
confidence intervals in Figure 1 is to test for spatial
dependence between measured values in the ordered set that
underpins this figure. Fisher's F-test shows the observed
F-value between the first variance term of the ordered set
and the variance of the set to be below the tabulated
F-value at 95% probability. Hence, the set of measured
values is randomly distributed within its sample space.
It is just as much a scientific fraud to interpolate within
this sample space as it would be to extrapolate beyond it.
Study this file and see what happens when Agterberg's
functionally dependent distance-weighted averages are
inserted between a set of measured values. Is it scientific
fraud or just a touch of junk statistics? Many years after
Matheron forgot to derive variances of length-weighted
average lead and silver grades, I show how to work with
weighting factors when measured values are unevenly spaced
within a sample space.
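
As a rough illustration of what working with weighting factors
involves, here is a short Python sketch of a weighted average and its
variance. It uses the standard textbook result for stochastically
independent measured values (each variance is multiplied by the square
of its normalized weighting factor) and is not necessarily the
procedure in the file referred to above. The function names are
hypothetical, and the weights could be lengths, time spans, or any
other positive weighting factors.

```python
def weighted_average(values, weights):
    """Weighted average of measured values."""
    total = sum(weights)
    return sum(w * x for w, x in zip(weights, values)) / total


def variance_of_weighted_average(variances, weights):
    """Variance of a weighted average of independent measured values.

    Each variance is multiplied by the square of its normalized
    weighting factor, and the products are summed.
    """
    total = sum(weights)
    return sum((w / total) ** 2 * v for w, v in zip(weights, variances))
```

With values 2 and 4, weights 1 and 3, and unit variances, the weighted
average is 3.5 and its variance is 0.625. Equal weights would give the
familiar variance of 0.5 for the arithmetic mean of two values.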