Ecology Lab, PCB 3043L

Lab #2 – Sept. 18-19

DATA ANALYSIS TECHNIQUES

GENERAL LAB INTRODUCTION

Ecologists use a wide range of statistical methods to analyze and describe the data they collect, so you will need a solid foundation in basic statistics in order to study ecology. This lab introduces several basic statistical techniques, which you will apply to the data collected during the field sampling lab in the wetlands of Hennington Pond.

The first order of business in any data analysis exercise is to set up your data in tabular format in a spreadsheet (Excel, Access, Quattro Pro, Lotus, etc.). A well-prepared spreadsheet will make statistical analysis relatively seamless (some packages, such as Excel, allow you to do basic statistics right there in your spreadsheet). This is best done by “thinking in columns”. Set up separate columns for date, time, sample number, sample site, treatment, and any other information you will need to identify each row of data. Next, set up columns for the independent variables that you measured with each data point, such as temperature, water depth, soil depth, etc. Finally, set up columns for the dependent variables that you measured. These are the actual parameters that you set out to quantify in order to directly answer your questions (plant biomass/m², number of plant species/m², number of stems/m², etc.). Once your data are entered into a clean spreadsheet, you are ready to begin your statistical analysis.
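As a rough illustration of “thinking in columns”, the sketch below lays out a tiny table in which each row is one sample and each column is one identifier, independent variable, or dependent variable. The column names and every value are made up for illustration; they are not the class dataset, and the helper functions `load_rows` and `column` are hypothetical names.

```python
import csv
import io

# Hypothetical example table: identifying columns first, then the independent
# variable (water depth), then the dependent variables. All values are invented.
RAW = """date,site,quadrat,water_depth_cm,species_per_m2,biomass_g_per_m2
2000-09-11,A,1,5,7,210.5
2000-09-11,A,2,12,6,185.0
2000-09-11,B,1,20,4,140.2
2000-09-11,B,2,31,3,98.7
"""

def load_rows(text):
    """Parse CSV text into a list of dicts, converting the numeric columns."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        rec["quadrat"] = int(rec["quadrat"])
        rec["water_depth_cm"] = float(rec["water_depth_cm"])
        rec["species_per_m2"] = int(rec["species_per_m2"])
        rec["biomass_g_per_m2"] = float(rec["biomass_g_per_m2"])
        rows.append(rec)
    return rows

def column(rows, name):
    """Pull one column out of the table as a list, in row order."""
    return [r[name] for r in rows]

rows = load_rows(RAW)
depths = column(rows, "water_depth_cm")
species = column(rows, "species_per_m2")
```

Because every row carries its own site, quadrat, and water depth, any dependent variable can later be grouped or plotted against any independent variable without re-entering data.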

There is a huge range of statistical tools available to you as an ecologist. It is clearly beyond the scope of this lab, and even this class, to teach you everything you need to know to be an ecological biometrician. However, the following basic statistical tests will likely be handy tools for you this semester:

1. Descriptive statistics: These include the mean, median, mode, standard deviation, standard error, coefficient of variation, etc. These are called descriptive because they tell you a great deal about your data, and allow you to condense your data into more easily interpreted terms. For example: If you have 4 replicate samples of the same variable, it is much easier to graph these data as the mean ± the standard deviation of these 4 replicates. The coefficient of variation (the standard deviation divided by the mean, usually expressed as a percentage) allows you to compare the variability of your 4 replicates about this mean with the variability about other means in a way that normalizes your comparison for the different magnitudes of those means.

2. t-test (Student’s t-test): This simple statistic allows you to test whether your data are significantly different from some expected value. For example, you measure water temperature in Hennington Pond every day at noon for the month of September to test the hypothesis that September 2000 has been significantly hotter for campus aquatic systems than past years. Fortunately, you have access to daily water temperature data from last September, and from every September in the 1990s. You set up your spreadsheet with 2000 data paired in rows with 1999 data, each row being the measurement from the same day in September. A t-test will tell you whether 2000 was significantly hotter than 1999. Replace the 1999 data with the means of daily water temperatures for each September day for the 1990s, and you can see if September 2000 has been significantly hotter than the 1990s in general. Comparing one dataset to another by pairing their measurements row by row in this way is called, appropriately enough, a paired t-test.

3. Regression: In many cases, you will want to know how your variable of interest relates to some other, independent measurements you made. For example, how does plant species number or plant biomass vary with water depth in a wetland? To answer this question, you should have a water depth measurement for every count of species number or biomass (in separate columns in your spreadsheet). Your question hypothesizes that species number or biomass is dependent on water depth, making the former your dependent (or Y-axis) variable and the latter your independent (or X-axis) variable. Plotting these data as a “scattergram” will give you a visual idea of whether there is any relationship. A regression will tell you whether this relationship is significant (a flat line means that water depth does not help explain variation in species number or biomass, and the regression will thus not be significant). If it is significant, the sign of the slope tells you whether the two are positively related (species number or biomass increases as water depth increases) or negatively related (species number or biomass decreases as water depth increases). Finally, the coefficient of determination, or r², will tell you how much of the variability in species number or biomass is explained by water depth (an r² of 0.80 means that water depth explains 80% of the variability you observed in your dependent variable).

4. Correlation: This is a simple calculation of how two variables relate to each other. It is very similar to the regression test, except that you generate a correlation coefficient (r instead of r²) and that there is no a priori assumption of dependence and independence in your variables. Thus, you will not generate a predictive linear relationship (Y = mX + b), as you do with regression, but you will be able to tell whether the correlation is significant or not.

5. Analysis of Variance (ANOVA): This is probably the most-used single statistical tool in experimental ecology. The theory here is that the mathematics of ANOVA allows you to partition the variability you observe in your dependent variable into predictable and chance partitions. The predictable partitions are those aspects of your experiment that you attempted to control (your treatments, if you will). Chance variation is the random component of variability that we cannot do anything about other than reduce it, wherever possible, and separate it out, which ANOVA does for us. For example, you set up an experiment to test whether plant species count or biomass in wetlands varies more because of differences in water depth, because of differences in which lake on campus you sampled, or both. Your dependent variable(s) are obvious by now. Your treatments are “lake” and “water depth”. Set up your spreadsheet with columns for your dependent variables, lake, and water depth, such that each row contains the lake code name and water depth measured when you sampled species counts or biomass. ANOVA will tell you whether a significant portion of the variability you measured in your dependent variables is explained by lake differences, by water depth differences, or by some interaction between lake and water depth. There are numerous variations (pardon the pun!) on the ANOVA theme.
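To make the descriptive statistics and the paired t-test from the list above more concrete, here is a minimal sketch using only the Python standard library. The function names `describe` and `paired_t` are hypothetical, and all numbers are invented for illustration (they are not real Hennington Pond temperatures). The sketch stops at the t statistic and its degrees of freedom; a p-value would come from a t table or a statistics package.

```python
import math
import statistics

def describe(values):
    """Return n, mean, sample SD, standard error, and coefficient of variation (%)."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)      # sample SD, uses the n - 1 denominator
    se = sd / math.sqrt(n)             # standard error of the mean
    cv = 100.0 * sd / mean             # SD scaled by the mean, as a percentage
    return {"n": n, "mean": mean, "sd": sd, "se": se, "cv_percent": cv}

def paired_t(a, b):
    """Paired t statistic and degrees of freedom for two row-matched samples."""
    if len(a) != len(b):
        raise ValueError("paired samples must have the same length")
    diffs = [x - y for x, y in zip(a, b)]   # one difference per paired row
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1

# Made-up replicate measurements of one variable.
replicates = [2.0, 4.0, 4.0, 6.0]
summary = describe(replicates)

# Made-up noon temperatures for the same five calendar days in two years.
temps_2000 = [25.0, 26.0, 27.0, 26.0, 28.0]
temps_1999 = [24.0, 24.0, 26.0, 25.0, 26.0]
t_stat, df = paired_t(temps_2000, temps_1999)
```

With these invented numbers the t statistic is about 5.7 on 4 degrees of freedom, which exceeds the tabulated two-tailed critical value of 2.776 at the 0.05 level, so the two made-up years would be judged significantly different.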
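The regression and correlation items in the list above rest on the same sums of squares, which the following sketch computes by hand. It assumes a simple straight-line (least-squares) fit of Y = mX + b; `linear_fit` is a hypothetical helper name, and the depth and species values are invented. Testing whether the slope or r is significant is left to a statistics package.

```python
import math

def linear_fit(x, y):
    """Least-squares slope and intercept, plus correlation r and r squared."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)                         # spread in X
    syy = sum((yi - mean_y) ** 2 for yi in y)                         # spread in Y
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))  # co-variation
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)      # correlation coefficient
    return slope, intercept, r, r * r   # r * r is the coefficient of determination

# Made-up data: water depth (independent, X) and species count (dependent, Y).
water_depth_cm = [5.0, 10.0, 15.0, 20.0, 25.0]
species_count = [9.0, 8.0, 6.0, 5.0, 2.0]

slope, intercept, r, r_squared = linear_fit(water_depth_cm, species_count)
```

Here the slope is negative (species count falls as depth increases) and r² is roughly 0.96. Note that r is symmetric in X and Y, which is why correlation needs no assumption about which variable is dependent, whereas the slope and intercept change if the two are swapped.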
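The partitioning idea behind ANOVA can be sketched for the simplest case, a one-way design with a single treatment. This is a deliberate simplification of the two-treatment (lake and water depth) example described above, not a full two-way ANOVA with interaction. The lake names and species counts are invented, and `one_way_anova` is a hypothetical helper name.

```python
def one_way_anova(groups):
    """Partition total variability into between-group and within-group parts.

    `groups` maps a treatment label to its list of measurements.
    Returns the sums of squares, degrees of freedom, and the F ratio.
    """
    all_values = [v for values in groups.values() for v in values]
    grand_mean = sum(all_values) / len(all_values)

    ss_between = 0.0   # predictable part: explained by the treatment
    ss_within = 0.0    # chance part: scatter of replicates around their group mean
    for values in groups.values():
        group_mean = sum(values) / len(values)
        ss_between += len(values) * (group_mean - grand_mean) ** 2
        ss_within += sum((v - group_mean) ** 2 for v in values)

    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return {
        "ss_between": ss_between, "ss_within": ss_within, "ss_total": ss_total,
        "df_between": df_between, "df_within": df_within, "f": f_ratio,
    }

# Made-up species counts from three hypothetical lakes, three replicates each.
counts_by_lake = {
    "lake_A": [4.0, 5.0, 6.0],
    "lake_B": [6.0, 7.0, 8.0],
    "lake_C": [9.0, 10.0, 11.0],
}
result = one_way_anova(counts_by_lake)
```

For these invented counts the between-lake and within-lake sums of squares are 38 and 6, which add up to the total of 44, and F = 19 on 2 and 6 degrees of freedom. That exceeds the tabulated 0.05 critical value of 5.14, so lake would be judged to explain a significant share of the variability.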

LAB INSTRUCTIONS

Complete the following data analysis worksheet using the statistical tools noted and the data you collected, as a class, from the Field Sampling exercise in the wetlands of Hennington Pond. Your TA will provide you with a class dataset that compiles the sampling of all groups in your class. Be sure to identify what spreadsheet, statistics, and graphics software packages you used to generate your report.

Ecology Lab, PCB 3043L

Labs #1 & 2 – Due Sept. 25 & 26

FIELD SAMPLING & DATA ANALYSIS WORKSHEET

Answer/address all of the following questions on your own paper. In many cases, this will require computer printouts of spreadsheets, graphics, or statistical output.

1. Explicitly state the questions/hypotheses that you set out to test in these labs.

2. (a) For biomass, first calculate plant biomass per m² for all 3 methods that you employed (preferably in your spreadsheet). Annotate the methods you used to calculate all three. (b) Present the compiled class field data in 2 spreadsheets—one spreadsheet should include all of your Braun-Blanquet and species count data, the other should include all plant biomass data. (c) Include a key to your column headings, if they are not clear. Recall that you have at least 1 dependent variable for each of the 4 sampling methods that you used (plot/quadrat, point-quarter, transect, and biomass).

3. (a) Calculate means and standard deviations for the replicates of your plot/quadrat Braun-Blanquet samples and count samples, (b) of your point-quarter Braun-Blanquet samples and count samples, (c) of your transect Braun-Blanquet samples and count samples, and (d) of the 3 different methods you used to calculate biomass. (e) Plot these means as bar graphs, and show standard deviations as error bars.

4. (a) Use a paired t-test to determine whether the Braun-Blanquet and count methods are comparable for the 3 different sampling techniques in which you used them. Note that comparable = not significantly different. (b) If they are, which do you recommend using and why? If they are not, which do you think is the better method? (c) Does it seem to matter which method you use with which sampling technique?

5. (a) Plot means and standard deviations of your data from the 3 different methods of determining aboveground biomass—bar plots will probably work best. (b) Use ANOVA to determine whether there is a difference between the results you have from each (Hint: Your treatment here is method type). (c) Briefly discuss your results and, if you did see a treatment effect, note which of the 3 methods you think is most accurate and which is least accurate.

6. Now let’s look at relationships between your dependent variables—1) species counts, from Braun-Blanquet and actual counts, and 2) biomass, from the 3 methods—and some independent variables. (a) What key independent variables did you measure that may help explain variation in species count and plant biomass? (b) What statistical tool will best tell you if these relationships are in fact significant? (c) Run these analyses, and show your results. (d) Then show a plot of dependent vs. independent variables for all relationships that you found that were significant (Hint: While it is possible to do this with the means from your replicate plots, you will increase your sample size and thus your chance of observing a relationship if you use every individual quadrat plot). Be careful NOT to mix and match data: That is, don’t try to look for a relationship between water level and species counts using both Braun-Blanquet data and actual count data unless your t-test showed no difference between the two.