Ecology Lab, PCB 3043L
Ecologists employ a wide range of statistical analyses to analyze and describe the data they collect. You will therefore need a solid foundation in basic statistical methods in order to study ecology. This lab introduces several basic statistical techniques for analyzing and describing the data collected during the field sampling lab in the wetlands of Hennington Pond.
The first order of business in any data analysis exercise is to set up your data in tabular format in a spreadsheet (Excel, Access, Quattro Pro, Lotus, etc.). A well-prepared spreadsheet will make statistical analysis relatively seamless (some packages, such as Excel, allow you to do basic statistics right there in your spreadsheet). This is best done by “thinking in columns”. Set up separate columns for date, time, sample number, sample site, treatment, and any other information you will need to identify each row of data. Next, set up columns for the independent variables that you measured with each data point, such as temperature, water depth, soil depth, etc. Finally, set up columns for the dependent variables that you measured. These are the actual parameters that you set out to quantify in order to directly answer your questions (plant biomass/m², number of plant species/m², number of stems/m², etc.). Once your data are entered into a clean spreadsheet, you are ready to begin your statistical analysis.
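As a rough illustration of “thinking in columns”, the sketch below lays out a small table with identifier columns first, then independent variables, then dependent variables. The column names, site names, and every value are invented placeholders, not class data, and the same layout works in any spreadsheet package.

```python
import csv
import io

# Hypothetical column layout: identifiers, then independent variables,
# then dependent variables. All names and values are invented placeholders.
COLUMNS = [
    "date", "time", "sample_number", "site", "treatment",  # identifiers
    "water_depth_cm", "temperature_c",                      # independent
    "species_per_m2", "biomass_g_per_m2",                   # dependent
]

ROWS = [
    ["2000-09-12", "10:15", 1, "north_shore", "quadrat", 12.0, 27.5, 4, 310.0],
    ["2000-09-12", "10:40", 2, "north_shore", "quadrat", 25.0, 27.1, 6, 455.0],
    ["2000-09-12", "11:05", 3, "south_shore", "quadrat", 8.0, 28.0, 3, 220.0],
]


def to_csv(columns, rows):
    """Write one header line plus one line per sample (one row = one sample)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)
    return buffer.getvalue()


def read_column(csv_text, name):
    """Pull a single column back out by its heading (values come back as text)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row[name] for row in reader]


table = to_csv(COLUMNS, ROWS)
print(table)
print(read_column(table, "water_depth_cm"))
```

Because every row carries its own identifiers, any single column can be pulled out and analyzed without losing track of which sample it came from.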
A huge range of statistical tools is available to you as an ecologist. It is clearly beyond the scope of this lab, and even this class, to teach you everything you need to know to be an ecological biometrician. However, the following basic statistical tests will likely be handy tools for you this semester:
1. Descriptive statistics: These include the mean, median, mode, standard deviation, standard error, coefficient of variation, etc. These are called descriptive because they tell you a great deal about your data and allow you to condense your data into more easily interpreted terms. For example, if you have 4 replicate samples of the same variable, it is much easier to graph these data as the mean ± the standard deviation of these 4 replicates. The coefficient of variation (the standard deviation divided by the mean) allows you to compare the variability of your 4 replicates about this mean with the variability about other means in a way that normalizes your comparison for the different magnitudes of those means. And so on.
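The descriptive statistics named above can be sketched in a few lines of Python using only the standard library. The function name `describe` and the four replicate values are assumptions made for illustration; the standard deviation here is the sample version (dividing by n - 1), which is what most spreadsheet packages report by default.

```python
import math
import statistics


def describe(values):
    """Return common descriptive statistics for one set of replicates."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    return {
        "n": n,
        "mean": mean,
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "sd": sd,
        "se": sd / math.sqrt(n),         # standard error of the mean
        "cv_percent": 100.0 * sd / mean,  # coefficient of variation
    }


# Four invented replicate measurements of the same variable.
replicates = [2.0, 4.0, 4.0, 6.0]
summary = describe(replicates)

for name, value in summary.items():
    print(f"{name}: {value}")
```

The mean and standard deviation are what you would plot as a bar with error bars; the coefficient of variation is the number to use when comparing the spread of groups whose means differ greatly in size.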
2. T-test (Student’s t-test): This simple statistic allows you to test whether your data are significantly different from some expected value. For example, you measure water temperature in Hennington Pond every day at noon for the month of September to test the hypothesis that September 2000 has been significantly hotter for campus aquatic systems than past years. Fortunately, you have access to daily water temperature data from last September, and from every September in the 1990s. You set up your spreadsheet with 2000 data paired in rows with 1999 data, each row being the measurement from the same day in September. A t-test will tell you whether 2000 was significantly hotter than 1999. Replace the 1999 data with the means of daily water temperatures for each September day for the 1990s, and you can see if September 2000 has been significantly hotter than the 1990s in general. Comparing one dataset to another by pairing their values row by row, as in this example, is called a paired t-test, appropriately enough!
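For the paired comparison described above, the t statistic is the mean of the row-by-row differences divided by the standard error of those differences. The sketch below assumes five invented pairs of temperatures and a hypothetical function name `paired_t`; it computes only t and the degrees of freedom, since turning t into a p-value needs a t table or a statistics package.

```python
import math
import statistics


def paired_t(first, second):
    """Return (t, degrees of freedom) for a paired t-test on two equal-length lists."""
    if len(first) != len(second):
        raise ValueError("paired data must have the same number of rows")
    differences = [a - b for a, b in zip(first, second)]
    n = len(differences)
    mean_diff = statistics.mean(differences)
    se_diff = statistics.stdev(differences) / math.sqrt(n)
    return mean_diff / se_diff, n - 1


# Invented noon temperatures (deg C) for the same five calendar days.
this_year = [30.0, 31.0, 29.0, 32.0, 30.0]
last_year = [29.0, 29.0, 28.0, 30.0, 29.0]

t_value, df = paired_t(this_year, last_year)
print(f"t = {t_value:.3f} with {df} degrees of freedom")
```

With 5 pairs there are 4 degrees of freedom, so the absolute value of t would be compared with the tabulated critical value (2.776 for a two-tailed test at the 0.05 level). A real September dataset would have about 30 pairs and a correspondingly smaller critical value.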
3. Regression: In many cases, you will want to know how your variable of interest relates to some other, independent measurements you made. For example, how does plant species number or plant biomass vary with water depth in a wetland? To answer this question, you should have a water depth measurement for every count of species number or biomass (in separate columns in your spreadsheet). Your question hypothesizes that species number or biomass is dependent on water depth, making the former your dependent (or Y-axis) variable and the latter your independent (or X-axis) variable. Plotting these data as a “scattergram” will give you a visual idea of whether there is any relationship. A regression will tell you whether this relationship is significant (a flat line means that water depth does not help explain variation in species number or biomass, and the regression will thus not be significant). If it is significant, the slope tells you whether the two are positively related (increases in water depth are associated with increases in species number or biomass) or negatively related (increases in water depth are associated with decreases in species number or biomass). Finally, the coefficient of determination, or r², will tell you how much variability in species number or biomass is explained by changing water depth (an r² of 0.80 means that water depth explains 80% of the variability you observed in your dependent variable).
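A least-squares regression of this kind can be sketched as follows, with water depth as the independent (X) variable and species number as the dependent (Y) variable. The function name `fit_line` and the five data pairs are invented for illustration, and the sketch reports only the slope, intercept, and r²; it does not test significance, which your statistics package will do for you.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept, plus r squared."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products about the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy ** 2) / (sxx * syy)
    return slope, intercept, r_squared


# Invented example: water depth (cm) and species counted per quadrat.
water_depth = [5.0, 10.0, 15.0, 20.0, 25.0]
species_count = [8.0, 7.0, 5.0, 4.0, 2.0]

slope, intercept, r_squared = fit_line(water_depth, species_count)
print(f"species = {slope:.2f} * depth + {intercept:.2f}, r2 = {r_squared:.3f}")
```

In this made-up dataset the slope is negative, so deeper water goes with fewer species, and r² is close to 1 because the invented points fall almost exactly on a line. Field data will rarely be this tidy.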
4. Correlation: This is a simple calculation of how two variables relate to each other. It is very similar to the regression test, except that you generate a correlation coefficient (r instead of r²) and that there is no a priori assumption of dependence and independence in your variables. Thus, you will not generate a predictive linear relationship (Y = mX + b), as you do with regression, but you will be able to tell whether the correlation is significant or not.
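The correlation coefficient r can be sketched the same way. The function below computes the Pearson correlation coefficient, which is an assumption about which correlation is meant here, and the two variables and their values are invented. Swapping the two lists gives the same r, which reflects the absence of any dependent or independent role.

```python
import math


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    saa = sum((ai - mean_a) ** 2 for ai in a)
    sbb = sum((bi - mean_b) ** 2 for bi in b)
    sab = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    return sab / math.sqrt(saa * sbb)


# Two invented variables measured on the same five samples.
variable_one = [1.0, 2.0, 3.0, 4.0, 5.0]
variable_two = [2.0, 1.0, 4.0, 3.0, 5.0]

r = pearson_r(variable_one, variable_two)
print(f"r = {r:.2f}")
print(f"same value with the variables swapped: {pearson_r(variable_two, variable_one):.2f}")
```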
5. Analysis of Variance (ANOVA): This is probably the most-used single statistical tool in experimental ecology. The theory here is that the mathematics of ANOVA allow you to partition the variability you observe in your dependent variable into predictable and chance partitions. The predictable partitions are those aspects of your experiment that you attempted to control (your treatments, if you will). Chance variation is the random component of variability that we cannot do anything about other than reduce it, wherever possible, and separate it out, which ANOVA does for us. For example, you set up an experiment to test whether plant species count or biomass in wetlands varies more because of differences in water depth, because of differences in which lake on campus you sampled, or both. Your dependent variable(s) are obvious by now. Your treatments are “lake” and “water depth”. Set up your spreadsheet with columns for your dependent variables, lake, and water depth, such that each row contains the lake code name and water depth measured when you sampled species counts or biomass. ANOVA will tell you whether a significant portion of the variability you measured in your dependent variables is explained by lake differences, by water depth differences, or by some interaction between lake and water depth. There are numerous variations (pardon the pun!) on the ANOVA theme.
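To show how ANOVA partitions variability, the sketch below works through the simplest case, a one-way ANOVA with a single treatment (“lake”). This is a simplification of the two-treatment example above: it has no water depth factor and no interaction term. The lake names, the species counts, and the function name `one_way_anova` are all invented for illustration.

```python
def one_way_anova(groups):
    """Partition variability into between-group and within-group parts.

    `groups` maps a treatment level (here a lake name) to its list of values.
    Returns the F ratio with its two degrees of freedom.
    """
    all_values = [v for values in groups.values() for v in values]
    grand_mean = sum(all_values) / len(all_values)

    ss_between = 0.0  # "predictable" partition: explained by the treatment
    ss_within = 0.0   # "chance" partition: scatter inside each group
    for values in groups.values():
        group_mean = sum(values) / len(values)
        ss_between += len(values) * (group_mean - grand_mean) ** 2
        ss_within += sum((v - group_mean) ** 2 for v in values)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Invented species counts from three replicate plots at each of three lakes.
species_by_lake = {
    "lake_a": [4.0, 5.0, 6.0],
    "lake_b": [7.0, 8.0, 9.0],
    "lake_c": [5.0, 6.0, 7.0],
}

f_ratio, df_between, df_within = one_way_anova(species_by_lake)
print(f"F = {f_ratio:.2f} with {df_between} and {df_within} degrees of freedom")
```

The F ratio is the treatment variability divided by the chance variability; the larger it is relative to the tabulated critical value for those degrees of freedom, the stronger the evidence for a treatment effect. A two-treatment design with interaction follows the same logic but is best left to your statistics package.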
Complete the following data analysis worksheet using the statistical tools noted and the data you collected, as a class, from the Field Sampling exercise in the wetlands of Hennington Pond. Your TA will provide you with a class dataset that compiles the sampling of all groups in your class. Be sure to identify what spreadsheet, statistics, and graphics software packages you used to generate your report.
Answer/address all of the following questions on your own paper. In many cases, this will require computer printouts of spreadsheets, graphics, or statistical output.
1. Explicitly state the questions/hypotheses that you set out to test in these labs.
2. (a) For biomass, first calculate plant biomass per m² for all 3 methods that you employed (preferably in your spreadsheet). Annotate the methods you used to calculate all three. (b) Present the compiled class field data in 2 spreadsheets: one spreadsheet should include all of your Braun-Blanquet and species count data, and the other should include all plant biomass data. (c) Include a key to your column headings, if they are not clear. Recall that you have at least 1 dependent variable for each of the 4 sampling methods that you used (plot/quadrat, point-quarter, transect, and biomass).
3. (a) Calculate means and standard deviations for the replicates of your plot/quadrat Braun-Blanquet samples and count samples, (b) of your point-quarter Braun-Blanquet samples and count samples, (c) of your transect Braun-Blanquet samples and count samples, and (d) of the 3 different methods you used to calculate biomass. (e) Plot these means as bar graphs, and show standard deviations as error bars.
4. (a) Use a paired t-test to determine whether the Braun-Blanquet and count methods are comparable for the 3 different sampling techniques in which you used them. Note that comparable = not significantly different. (b) If they are, which do you recommend using and why? If they are not, which do you think is the better method? (c) Does it seem to matter which method you use with which sampling technique?
5. (a) Plot means and standard deviations of your data from the 3 different methods of determining aboveground biomass (bar plots will probably work best). (b) Use ANOVA to determine whether there is a difference between the results you have from each (Hint: Your treatment here is method type). (c) Briefly discuss your results and, if you did see a treatment effect, note which of the 3 methods you think is most accurate and which is least accurate.
6. Now let’s look at relationships between your dependent variables, namely (1) species counts, from Braun-Blanquet and actual counts, and (2) biomass, from the 3 methods, and some independent variables. (a) What key independent variables did you measure that may help explain variation in species count and plant biomass? (b) What statistical tool will best tell you if these relationships are in fact significant? (c) Run these analyses, and show your results. (d) Then show a plot of dependent vs. independent variables for all relationships that you found to be significant (Hint: While it is possible to do this with the means from your replicate plots, you will increase your sample size and thus your chance of observing a relationship if you use every individual quadrat plot). Be careful NOT to mix and match data: That is, don’t try to look for a relationship between water level and species counts using both Braun-Blanquet data and actual count data unless your t-test showed no difference between the two.