What is pooling in statistics




















For example, a tissue sample might be considered a pool of single cells. However, more typically when we discuss pooling we mean that biological replicates are processed together at some stage, such as RNA extraction or microarray hybridization.

Typically, samples are pooled because not enough experimental material can be obtained from a single individual. In this era of single-cell processing, pooling is seldom called for in well-established protocols, but it may still be needed for protocols that require substantial amounts of starting material. On the other hand, pooling is similar to averaging, and properly done pooling can be used to improve precision when the processing costs are high relative to the costs of obtaining biological replicates.
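The averaging argument can be made concrete with a simple variance model. This is only a sketch under assumptions that are not stated above: each individual contributes equally to its pool, biological variation between individuals has variance $\sigma_b^2$, technical (processing) variation has variance $\sigma_t^2$, and the two sources are independent. The symbols $k$ (individuals per pool) and $m$ (number of pools processed) are introduced here purely for illustration.

$$
\operatorname{Var}(\text{one pooled measurement}) = \frac{\sigma_b^2}{k} + \sigma_t^2,
\qquad
\operatorname{Var}(\text{mean of } m \text{ pools}) = \frac{1}{m}\left(\frac{\sigma_b^2}{k} + \sigma_t^2\right)
$$

Under this model, with the number of processed samples $m$ held fixed, increasing $k$ shrinks the biological component without adding any processing runs, which is why pooling is most attractive when processing is the expensive step.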

Suppose that you are comparing gene expression in wild-type versus mutant plants in the root tip and that you have the resources to measure 3 biological replicates of each type of plant.

The table shows an estimate for the variance of the data within each group. Although the smallest sample variance Group C: 1. A parameter value such as 2. If we assume that the variances of the groups are equal, the pooled variance formula provides a way to estimate the common variance. The graph at the top of this article visualizes the information in the table and uses a reference line to indicate the pooled variance.

The blue markers indicate the sample variances of each group. The confidence intervals for the population variances are shown by the vertical lines. The pooled variance is indicated by the horizontal reference line. It is the weighted average of the sample variances. If you think that all groups have the same variance, the pooled variance estimates that common variance.

In two-sample t tests and ANOVA tests, you often assume that the variance is constant across different groups in the data. The pooled variance is an estimate of the common variance.
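For the two-sample case, the role of the pooled variance can be written out explicitly. The notation is assumed here for illustration and does not come from the text above: $n_1, n_2$ are the group sizes, $\bar{x}_1, \bar{x}_2$ the group means, and $s_1^2, s_2^2$ the sample variances.

$$
s_p^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2},
\qquad
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}
$$

Under the equal-variance assumption, $t$ is compared against a t distribution with $n_1 + n_2 - 2$ degrees of freedom.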

It is a weighted average of the sample variances for each group, where the larger groups are weighted more heavily than smaller groups. You can download a SAS program that computes the pooled variance and creates the graphs in this article.
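The weighted average can be sketched in a few lines of Python. This is a minimal illustration, not the SAS program mentioned above; the function name `pooled_variance` and the data are made up for the example. Each sample variance is weighted by its degrees of freedom, n_i - 1, rather than by the raw group size.

```python
from statistics import variance


def pooled_variance(groups):
    """Pool the sample variances of several groups.

    Each group's sample variance is weighted by its degrees of
    freedom (n_i - 1), so larger groups count more heavily.
    """
    weighted_sum = 0.0
    total_df = 0
    for values in groups:
        df = len(values) - 1
        if df < 1:
            raise ValueError("each group needs at least two observations")
        weighted_sum += df * variance(values)  # equals the group's sum of squares
        total_df += df
    return weighted_sum / total_df


# Hypothetical data, invented only to illustrate the calculation
group_a = [4.0, 6.0, 8.0]            # n = 3, sample variance 4.0
group_b = [1.0, 2.0, 3.0, 4.0, 5.0]  # n = 5, sample variance 2.5

print(pooled_variance([group_a, group_b]))  # (2*4.0 + 4*2.5) / (2 + 4) = 3.0
```

Because the weights are degrees of freedom, the result is the same as adding up the within-group sums of squares and dividing by the total degrees of freedom.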

Although the pooled variance provides an estimate for a common variance among groups, it is not always clear when the assumption of a common variance is valid. The visualization in this article can help, or you can perform a formal "homogeneity of variance" test as part of an ANOVA analysis.
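The text does not say which homogeneity test is meant. As one possibility, the sketch below computes the statistic for Bartlett's test, a classical choice that is known to be sensitive to departures from normality. The function name and data are hypothetical, and the code stops at the test statistic rather than producing a p-value.

```python
import math
from statistics import variance


def bartlett_statistic(groups):
    """Bartlett's test statistic for equality of variances.

    Assumes at least two groups, each with two or more observations
    and a nonzero sample variance (log(0) is undefined).
    """
    k = len(groups)
    dfs = [len(g) - 1 for g in groups]
    variances = [variance(g) for g in groups]
    total_df = sum(dfs)  # N - k

    # Pooled variance: sample variances weighted by degrees of freedom
    pooled = sum(df * v for df, v in zip(dfs, variances)) / total_df

    numerator = total_df * math.log(pooled) - sum(
        df * math.log(v) for df, v in zip(dfs, variances)
    )
    correction = 1 + (sum(1 / df for df in dfs) - 1 / total_df) / (3 * (k - 1))
    return numerator / correction


# Hypothetical data: the second group is far more variable than the first
similar = [[1.0, 2.0, 3.0], [11.0, 12.0, 13.0], [5.0, 6.0, 7.0]]
different = [[1.0, 2.0, 3.0], [0.0, 10.0, 20.0]]

print(bartlett_statistic(similar))    # zero: all sample variances are equal
print(bartlett_statistic(different))  # clearly positive
```

Under the hypothesis of equal variances (and normal data), the statistic is approximately chi-square distributed with k - 1 degrees of freedom, so large values argue against pooling.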

The Oxford English Dictionary defines pool as: pool, v. Another example would be: you measure blood levels of substance X in males and females. Whether it is statistically correct to pool the two groups depends very much on the specific case.

Greg Snow

To clarify (because I'm also trying to combine variances from the literature): what you're saying is that to get an 'average' variance for multiple populations, I can take a weighted mean of the calculated variances?

How would I weight those variances?

Thus, the following equalities hold:

As an example, the familiar equation for the estimated standard deviation of n data points about their mean $\bar{x}$ is:

$$
s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}
$$

The numerator under the square root sign is a sum of squared deviations of the individual data points about their mean, and the denominator is the number of degrees of freedom. Because one degree of freedom has been used to calculate the mean, only n - 1 of the deviations are independent.
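The same calculation can be spelled out step by step. This is a small sketch with invented numbers; the function name `sample_std` is hypothetical, and the result is checked against `statistics.stdev` from the Python standard library, which also divides by n - 1.

```python
import math
from statistics import stdev


def sample_std(values):
    """Estimated standard deviation of the values about their mean."""
    n = len(values)
    mean = sum(values) / n  # uses up one degree of freedom
    sum_of_squares = sum((x - mean) ** 2 for x in values)  # numerator
    degrees_of_freedom = n - 1  # denominator
    return math.sqrt(sum_of_squares / degrees_of_freedom)


# Hypothetical replicate results, made up for illustration
replicates = [2.0, 4.0, 6.0]

print(sample_std(replicates))  # sqrt(8 / 2) = 2.0
print(stdev(replicates))       # standard-library result for comparison
```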

Why would you want to use an approximation, anyway, when you can do it correctly just as easily? So much for the basics. Figure 1 depicts a system that is used for making routine measurements.

Samples 1 and 2 are physical, tangible materials that are submitted for measurement. The numerical results (the intangible statistical samples) are shown as sets of circles, green for Sample 1 and blue for Sample 2. The numerical results and their means are shown on a measurement axis. The vertical dimension in this lower part of the figure has no meaning; the information is simply spread out upward. Let me step back a moment and emphasize something in the last paragraph.

We all know what a sample is. We can touch it. You have to be careful about this. There are a lot of words we have in common, but our meanings are different. Many measurement laboratories make multiple measurements on each sample that is submitted. These are called replicate measurements—they are repeated measurements of the same thing.

In Figure 1, each sample is measured three times—there are three replicate results for each sample. No one is surprised by this. Everyone seems to make replicate measurements. But why do we do this?


