Analysis of variance (分散分析)

Related: analysis of variance

Wikipedia preview

Source (authority): Wikipedia, the free encyclopedia (ウィキペディア), 2016/07/24 23:18:11 (JST)

wiki ja

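Analysis of variance (分散分析, bunsan bunseki; abbreviated ANOVA) is a method of statistical hypothesis testing that judges the effects of factors and their interactions by decomposing the variation in observed data into error variation and the variation due to each factor and their interactions.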

Its basic methods were established in the 1920s and 1930s by the statistician and geneticist Ronald Fisher. For this reason it is also called "Fisher's analysis of variance" or "Fisher's ANOVA".

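The basic procedure is as follows. First, the sum of squares of the data is decomposed into components, separating the variation due to factor effects from the variation due to error. Next, each sum of squares is divided by its degrees of freedom to obtain a mean square. Then the F value is computed, with the mean square explained by a factor effect (or interaction) as the numerator and the mean square explained by error as the denominator (the F-test). The significance of each effect is judged against a chosen significance level.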
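The procedure above can be sketched for the simplest, one-way case. This is a minimal illustration using only the Python standard library; the function name `one_way_anova` and the three groups of data are hypothetical.

```python
from statistics import fmean

def one_way_anova(groups):
    """Return (ss_between, ss_within, df_between, df_within, F) for a one-way layout."""
    all_obs = [y for g in groups for y in g]
    grand_mean = fmean(all_obs)
    means = [fmean(g) for g in groups]
    # Step 1: split the variation into factor (between-group) and error (within-group) parts
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    # Step 2: divide each sum of squares by its degrees of freedom to get mean squares
    df_between = len(groups) - 1
    df_within = len(all_obs) - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    # Step 3: F = factor mean square / error mean square
    return ss_between, ss_within, df_between, df_within, ms_between / ms_within

groups = [[6, 8, 4, 5, 3, 4], [8, 12, 9, 11, 6, 8], [13, 9, 11, 8, 7, 12]]
ssb, ssw, dfb, dfw, f = one_way_anova(groups)
print(ssb, ssw, dfb, dfw, round(f, 3))  # 84.0 68.0 2 15 9.265
```

The resulting F would then be compared with the critical value of the F-distribution with (2, 15) degrees of freedom at the chosen significance level.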

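To examine the nature of an interaction in detail, tests of simple main effects or interaction contrasts can be used. When the effect of a factor with three or more levels is significant, multiple comparisons are needed to find out specifically which groups differ from which. Therefore, depending on the purpose of the analysis, conclusions cannot be drawn from analysis of variance alone, and it is important to use it in combination with these methods.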
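As one example of a multiple-comparison adjustment, Holm's step-down method can be sketched as follows. The function name `holm_adjust` is hypothetical, and the raw pairwise p-values are made-up placeholders for the output of, for example, pairwise t-tests.

```python
def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The p-value at 0-based rank r (smallest first) is multiplied by (m - r), capped at 1 ...
        candidate = min(1.0, (m - rank) * p_values[i])
        # ... and adjusted values are kept non-decreasing along the sorted order
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted

# Hypothetical raw p-values for the three pairwise comparisons of groups A, B, C
raw = {"A-B": 0.010, "A-C": 0.040, "B-C": 0.030}
adj = holm_adjust(list(raw.values()))
for name, p in zip(raw, adj):
    print(name, round(p, 3))
```

Each adjusted p-value is compared directly with the significance level; with these numbers only A-B (0.03) would be declared significant at 0.05, while A-C and B-C (both 0.06) would not.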

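There are various analysis-of-variance models, and using the method appropriately means choosing among them according to the nature of the data, the type of factorial design, and the hypothesis to be tested (one-way analysis of variance, regression analysis of variance, analysis of covariance, and so on). It is now understood that analysis of variance can be treated as part of the general linear model and of structural equation modeling, and further extensions are possible (for example, analysis of variance on latent variables).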

Software

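Analysis of variance can be performed in major statistical packages such as SAS and SPSS. The R language also has functions for analysis of variance. There is also software specialized for analysis of variance and the associated multiple comparisons, much of which is free.

js-STAR^[1]: A JavaScript port of "STAR", originally created by 田中敏 (professor at Shinshu University). It can perform analysis of variance with up to three factors, tests of simple main effects, and multiple comparisons (LSD, HSD, Bonferroni, and Holm methods) in a single run. It also handles χ² tests, correlation coefficients, and more. It can be used directly on the web or downloaded. The interface is simple and easy to use. The theoretical background of its design appears to be based on 『ユーザーのための教育・心理統計と実験計画法－方法の理解から論文の書き方まで－』 by 田中敏 and 山際勇一郎 (教育出版, 1992, revised edition)^[who?]. 『実践データ心理解析－問題の発想・データ処理・論文の作成－』 (新曜社, 2006, revised edition) explains in detail how to use js-STAR, how to read analysis-of-variance tables, and how to report the results in papers.
ANOVA4 on the Web^[2]: Created by 桐木建始 (Kiriki Kenshi, professor at Hiroshima Jogakuin University). It can perform analysis of variance with up to four factors and multiple comparisons (Ryan's method) in a single run. It runs directly in the browser and requires no installation. The use of ANOVA4 is explained in 『わかって楽しい心理統計法入門』 (北大路書房, 2007).
Analysis of variance program (AIST-ANOVA)^[3]: An Excel add-in developed by the Applied Statistics Section, Material Properties and Metrological Statistics Division, National Metrology Institute of Japan (NMIJ), National Institute of Advanced Industrial Science and Technology (AIST). It can perform analysis of variance with up to 10 factors.
Excel NAG 統計解析アドイン (Excel NAG Statistical Add-in)^[4]: An add-in that adds analysis-of-variance functions to Excel. Academic and free trial versions are available.
ezANOVA^[5]: Standalone software, so it requires neither a browser nor Excel.
MAANOVA^[6]: Performs analysis of variance in MATLAB.
Origin: Performs analysis of variance within Origin, including one-way and two-way designs, designs with replication, and post-hoc tests.

There are also programs written in the general-purpose C language^[7] and independently written functions for R^[8]. These require a computing environment for the respective programming language and the skill to operate it.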

Footnotes

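^ Nakano Hiroyuki. “js-STAR 2012”. Retrieved April 4, 2012.
^ Kiriki Kenshi (2002). “ANOVA4 on the Web”. Retrieved April 4, 2012.
^ 粒子計測研究室 NMIJ/AIST (2011). “不確かさWeb 分散分析プログラム” [Uncertainty Web: analysis of variance program]. Retrieved April 4, 2012.
^ 日本ニューメリカルアルゴリズムズグループ株式会社 (2012). “Excel NAG 統計解析アドイン” [Excel NAG Statistical Add-in]. Retrieved April 4, 2012.
^ Chris Rorden. “ezANOVA free statistical software”. Retrieved April 4, 2012.
^ Bioconductor. “maanova”. Retrieved April 4, 2012.
^ H.Akiba (February 27, 1995). “２因子多水準分散分析” [Two-factor multilevel analysis of variance]. Retrieved April 4, 2012.
^ 渡辺利夫. “R Language”. Retrieved April 4, 2012.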

External links

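私のための統計処理 (Statistical processing for myself): analysis of variance
Visual ANOVA: a Flash program illustrating the idea behind analysis of variance
反復測定（度）分散分析/基礎と応用 (Repeated-measures analysis of variance: basics and applications): explanation of the calculation methods and program downloads; also detailed on multiple comparisons, the sphericity assumption, and sphericity tests.
心理生理学データの分散分析 (Analysis of variance for psychophysiological data): explains the use of repeated-measures analysis of variance and how to report it in papers; also covers multiple comparisons and sphericity tests.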

wiki en


[Figure: biologist and statistician Ronald Fisher]

Analysis of variance (ANOVA) is a collection of statistical models and their associated procedures (such as "variation" among and between groups) used to analyze the differences among group means, developed by statistician and evolutionary biologist Ronald Fisher. In the ANOVA setting, the observed variance in a particular variable is partitioned into components attributable to different sources of variation. In its simplest form, ANOVA provides a statistical test of whether or not the means of several groups are equal, and therefore generalizes the t-test to more than two groups. ANOVAs are useful for comparing (testing) three or more means (groups or variables) for statistical significance. It is conceptually similar to multiple two-sample t-tests, but is more conservative (it results in fewer type I errors) and is therefore suited to a wide range of practical problems.

1 History
2 Motivating example
3 Background and terminology
- 3.1 Design-of-experiments terms
- 3.2 Fixed-effects models
- 3.3 Random-effects models
- 3.4 Mixed-effects models
4 Assumptions of ANOVA
- 4.1 Textbook analysis using a normal distribution
- 4.2 Randomization-based analysis
  - 4.2.1 Unit-treatment additivity
  - 4.2.2 Derived linear model
  - 4.2.3 Statistical models for observational data
- 4.3 Summary of assumptions
5 Characteristics of ANOVA
6 Logic of ANOVA
- 6.1 Partitioning of the sum of squares
- 6.2 The F-test
- 6.3 Extended logic
7 ANOVA for a single factor
8 ANOVA for multiple factors
9 Worked numeric examples
10 Associated analysis
- 10.1 Preparatory analysis
  - 10.1.1 The number of experimental units
  - 10.1.2 Power analysis
  - 10.1.3 Effect size
- 10.2 Follow-up analysis
  - 10.2.1 Model confirmation
  - 10.2.2 Follow-up tests
11 Study designs and ANOVAs
12 ANOVA cautions
13 Generalizations
- 13.1 Connection to Linear Regression
  - 13.1.1 Example
14 See also
15 Footnotes
16 Notes
17 References
18 Further reading
19 External links

History

While the analysis of variance reached fruition in the 20th century, antecedents extend centuries into the past according to Stigler.^[1] These include hypothesis testing, the partitioning of sums of squares, experimental techniques and the additive model. Laplace was performing hypothesis testing in the 1770s.^[2] The development of least-squares methods by Laplace and Gauss circa 1800 provided an improved method of combining observations (over the existing practices of astronomy and geodesy). It also initiated much study of the contributions to sums of squares. Laplace soon knew how to estimate a variance from a residual (rather than a total) sum of squares.^[3] By 1827 Laplace was using least squares methods to address ANOVA problems regarding measurements of atmospheric tides.^[4] Before 1800 astronomers had isolated observational errors resulting from reaction times (the "personal equation") and had developed methods of reducing the errors.^[5] The experimental methods used in the study of the personal equation were later accepted by the emerging field of psychology,^[6] which developed strong (full factorial) experimental methods to which randomization and blinding were soon added.^[7] An eloquent non-mathematical explanation of the additive effects model was available in 1885.^[8]

Ronald Fisher introduced the term variance and proposed its formal analysis in a 1918 article, "The Correlation Between Relatives on the Supposition of Mendelian Inheritance".^[9] His first application of the analysis of variance was published in 1921.^[10] Analysis of variance became widely known after being included in Fisher's 1925 book "Statistical Methods for Research Workers".

Randomization models were developed by several researchers. The first was published in Polish by Neyman in 1923.^[11]

One of the attributes of ANOVA which ensured its early popularity was computational elegance. The structure of the additive model allows solution for the additive coefficients by simple algebra rather than by matrix calculations. In the era of mechanical calculators this simplicity was critical. The determination of statistical significance also required access to tables of the F-distribution, which were supplied by early statistics texts.

Motivating example

[Figures: three illustrations of dog-weight groupings, captioned "No fit", "Fair fit" and "Very good fit"]

The analysis of variance can be used as an exploratory tool to explain observations. A dog show provides an example. A dog show is not a random sampling of the breed: it is typically limited to dogs that are adult, pure-bred, and exemplary. A histogram of dog weights from a show might plausibly be rather complex, like the yellow-orange distribution shown in the illustrations. Suppose we wanted to predict the weight of a dog based on a certain set of characteristics of each dog. Before we could do that, we would need to explain the distribution of weights by dividing the dog population into groups based on those characteristics. A successful grouping will split dogs such that (a) each group has a low variance of dog weights (meaning the group is relatively homogeneous) and (b) the mean of each group is distinct (if two groups have the same mean, then it isn't reasonable to conclude that the groups are, in fact, separate in any meaningful way).

In the illustrations to the right, each group is identified as X₁, X₂, etc. In the first illustration, we divide the dogs according to the product (interaction) of two binary groupings: young vs old, and short-haired vs long-haired (thus, group 1 is young, short-haired dogs, group 2 is young, long-haired dogs, etc.). Since the distributions of dog weight within each of the groups (shown in blue) have a large variance, and since the means are very close across groups, grouping dogs by these characteristics does not produce an effective way to explain the variation in dog weights: knowing which group a dog is in does not allow us to make any reasonable statements as to what that dog's weight is likely to be. Thus, this grouping fails to fit the distribution we are trying to explain (yellow-orange).

An attempt to explain the weight distribution by grouping dogs as (pet vs working breed) and (less athletic vs more athletic) would probably be somewhat more successful (fair fit). The heaviest show dogs are likely to be big strong working breeds, while breeds kept as pets tend to be smaller and thus lighter. As shown by the second illustration, the distributions have variances that are considerably smaller than in the first case, and the means are more reasonably distinguishable. However, the significant overlap of distributions, for example, means that we cannot reliably say that X₁ and X₂ are truly distinct (i.e., it is perhaps reasonably likely that splitting dogs according to the flip of a coin—by pure chance—might produce distributions that look similar).

An attempt to explain weight by breed is likely to produce a very good fit. All Chihuahuas are light and all St Bernards are heavy. The difference in weights between Setters and Pointers does not justify separate breeds. The analysis of variance provides the formal tools to justify these intuitive judgments. A common use of the method is the analysis of experimental data or the development of models. The method has some advantages over correlation: not all of the data must be numeric, and one result of the method is a judgment of the confidence in an explanatory relationship.
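The idea of a grouping that "fits" can be made concrete as the share of total variation that the grouping explains, SS_between / SS_total (often called eta squared). The sketch below assumes eight invented dog weights in kilograms, grouped once by coat length and once by breed (with setters and pointers pooled); the function name and all numbers are hypothetical.

```python
from statistics import fmean

def eta_squared(groups):
    """Share of the total sum of squares explained by the grouping (SS_between / SS_total)."""
    all_obs = [y for g in groups for y in g]
    grand = fmean(all_obs)
    ss_total = sum((y - grand) ** 2 for y in all_obs)
    ss_between = sum(len(g) * (fmean(g) - grand) ** 2 for g in groups)
    return ss_between / ss_total

# The same eight hypothetical dogs, grouped in two different ways
by_coat = [[3, 70, 30, 25], [2, 65, 28, 27]]       # short-haired vs long-haired
by_breed = [[3, 2], [70, 65], [30, 28, 25, 27]]    # Chihuahuas, St Bernards, setters/pointers
print(round(eta_squared(by_coat), 3), round(eta_squared(by_breed), 3))  # 0.001 0.994
```

Grouping by coat explains almost none of the variation, while grouping by breed explains nearly all of it, mirroring the "no fit" and "very good fit" cases described above.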

Background and terminology

ANOVA is a particular form of statistical hypothesis testing heavily used in the analysis of experimental data. A test result (calculated from the null hypothesis and the sample) is called statistically significant if it is deemed unlikely to have occurred by chance, assuming the truth of the null hypothesis. A statistically significant result, when a probability (p-value) is less than a threshold (significance level), justifies the rejection of the null hypothesis, but only if the a priori probability of the null hypothesis is not high.

In the typical application of ANOVA, the null hypothesis is that all groups are simply random samples of the same population. For example, when studying the effect of different treatments on similar samples of patients, the null hypothesis would be that all treatments have the same effect (perhaps none). Rejecting the null hypothesis would imply that different treatments result in altered effects.

By construction, hypothesis testing limits the rate of Type I errors (false positives) to a significance level. Experimenters also wish to limit Type II errors (false negatives). The rate of Type II errors depends largely on sample size (the rate will increase for small numbers of samples), significance level (when the standard of proof is high, the chances of overlooking a discovery are also high) and effect size (a smaller effect size is more prone to Type II error).

The terminology of ANOVA is largely from the statistical design of experiments. The experimenter adjusts factors and measures responses in an attempt to determine an effect. Factors are assigned to experimental units by a combination of randomization and blocking to ensure the validity of the results. Blinding keeps the weighing impartial. Responses show a variability that is partially the result of the effect and is partially random error.

ANOVA is the synthesis of several ideas and it is used for multiple purposes. As a consequence, it is difficult to define concisely or precisely.

"Classical ANOVA for balanced data does three things at once:

As exploratory data analysis, an ANOVA is an organization of an additive data decomposition, and its sums of squares indicate the variance of each component of the decomposition (or, equivalently, each set of terms of a linear model).
Comparisons of mean squares, along with an F-test ... allow testing of a nested sequence of models.
Closely related to the ANOVA is a linear model fit with coefficient estimates and standard errors."^[12]

In short, ANOVA is a statistical tool used in several ways to develop and confirm an explanation for the observed data.

Additionally:

It is computationally elegant and relatively robust against violations of its assumptions.
ANOVA provides industrial-strength (multiple-sample comparison) statistical analysis.
It has been adapted to the analysis of a variety of experimental designs.

As a result: ANOVA "has long enjoyed the status of being the most used (some would say abused) statistical technique in psychological research."^[13] ANOVA "is probably the most useful technique in the field of statistical inference."^[14]

ANOVA is difficult to teach, particularly for complex experiments, with split-plot designs being notorious.^[15] In some cases the proper application of the method is best determined by problem pattern recognition followed by the consultation of a classic authoritative text.^[16]

Design-of-experiments terms

(Condensed from the NIST Engineering Statistics handbook: Section 5.7. A Glossary of DOE Terminology.)^[17]

Balanced design: An experimental design where all cells (i.e. treatment combinations) have the same number of observations.
Blocking: A schedule for conducting treatment combinations in an experimental study such that any effects on the experimental results due to a known change in raw materials, operators, machines, etc., become concentrated in the levels of the blocking variable. The reason for blocking is to isolate a systematic effect and prevent it from obscuring the main effects. Blocking is achieved by restricting randomization.
Design: A set of experimental runs which allows the fit of a particular model and the estimate of effects.
DOE: Design of experiments. An approach to problem solving involving collection of data that will support valid, defensible, and supportable conclusions.^[18]
Effect: How changing the settings of a factor changes the response. The effect of a single factor is also called a main effect.
Error: Unexplained variation in a collection of observations. DOEs typically require understanding of both random error and lack of fit error.
Experimental unit: The entity to which a specific treatment combination is applied.
Factors: Process inputs an investigator manipulates to cause a change in the output.
Lack-of-fit error: Error that occurs when the analysis omits one or more important terms or factors from the process model. Including replication in a DOE allows separation of experimental error into its components: lack of fit and random (pure) error.
Model: Mathematical relationship which relates changes in a given response to changes in one or more factors.
Random error: Error that occurs due to natural variation in the process. Random error is typically assumed to be normally distributed with zero mean and a constant variance. Random error is also called experimental error.
Randomization: A schedule for allocating treatment material and for conducting treatment combinations in a DOE such that the conditions in one run neither depend on the conditions of the previous run nor predict the conditions in the subsequent runs.^[nb 1]
Replication: Performing the same treatment combination more than once. Including replication allows an estimate of the random error independent of any lack of fit error.
Responses: The output(s) of a process. Sometimes called dependent variable(s).
Treatment: A treatment is a specific combination of factor levels whose effect is to be compared with other treatments.
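The blocking, randomization and balanced-design entries can be illustrated with a randomized complete block layout: every block receives every treatment once, and randomization is restricted to the run order within each block. The function name, treatment labels and batch names below are hypothetical.

```python
import random

def randomized_block_layout(treatments, blocks, seed=None):
    """Randomized complete block design: each block gets every treatment exactly once,
    in an independently shuffled run order."""
    rng = random.Random(seed)
    layout = {}
    for block in blocks:
        order = list(treatments)
        rng.shuffle(order)  # randomization restricted to the order within this block
        layout[block] = order
    return layout

# Hypothetical example: three treatments run on four batches of raw material (the blocks)
plan = randomized_block_layout(["A", "B", "C"], ["batch1", "batch2", "batch3", "batch4"], seed=1)
for block, order in plan.items():
    print(block, order)
```

Because each treatment appears once per block, the design is balanced, and any batch-to-batch shift is concentrated in the block effect rather than in the treatment comparisons.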

Classes of models

There are three classes of models used in the analysis of variance, and these are outlined here.

Fixed-effects models

The fixed-effects model (class I) of analysis of variance applies to situations in which the experimenter applies one or more treatments to the subjects of the experiment to see whether the response variable values change. This allows the experimenter to estimate the ranges of response variable values that the treatment would generate in the population as a whole.

Random-effects models

The random-effects model (class II) is used when the treatments are not fixed. This occurs when the various factor levels are sampled from a larger population. Because the levels themselves are random variables, some assumptions and the method of contrasting the treatments (a multi-variable generalization of simple differences) differ from the fixed-effects model.^[19]
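For a balanced one-way random-effects model, a common way to estimate the between-level variance component is the method of moments based on expected mean squares: E[MS_between] = σ²_error + n·σ²_levels and E[MS_within] = σ²_error, where n is the common group size. The sketch assumes a balanced design; the function name and data are hypothetical, and truncating negative estimates at zero is one convention among several.

```python
from statistics import fmean

def variance_components(groups):
    """Method-of-moments estimates (sigma2_levels, sigma2_error) for a balanced
    one-way random-effects model."""
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes equal group sizes")
    k = len(groups)
    grand = fmean([y for g in groups for y in g])
    means = [fmean(g) for g in groups]
    ms_between = n * sum((m - grand) ** 2 for m in means) / (k - 1)
    ms_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g) / (k * (n - 1))
    # E[MS_between] - E[MS_within] = n * sigma2_levels; negative estimates are set to zero
    sigma2_levels = max(0.0, (ms_between - ms_within) / n)
    return sigma2_levels, ms_within

# Hypothetical responses under three randomly sampled levels (e.g. textbooks), six units each
groups = [[6, 8, 4, 5, 3, 4], [8, 12, 9, 11, 6, 8], [13, 9, 11, 8, 7, 12]]
print(variance_components(groups))
```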

Mixed-effects models

A mixed-effects model (class III) contains experimental factors of both fixed and random-effects types, with appropriately different interpretations and analysis for the two types.

Example: Teaching experiments could be performed by a college or university department to find a good introductory textbook, with each text considered a treatment. The fixed-effects model would compare a list of candidate texts. The random-effects model would determine whether important differences exist among a list of randomly selected texts. The mixed-effects model would compare the (fixed) incumbent texts to randomly selected alternatives.

Defining fixed and random effects has proven elusive, with competing definitions arguably leading toward a linguistic quagmire.^[20]

Assumptions of ANOVA

The analysis of variance has been studied from several approaches, the most common of which uses a linear model that relates the response to the treatments and blocks. Note that the model is linear in parameters but may be nonlinear across factor levels. Interpretation is easy when data is balanced across factors but much deeper understanding is needed for unbalanced data.

Textbook analysis using a normal distribution

The analysis of variance can be presented in terms of a linear model, which makes the following assumptions about the probability distribution of the responses:^[21]^[22]^[23]^[24]

Independence of observations – this is an assumption of the model that simplifies the statistical analysis.
Normality – the distributions of the residuals are normal.
Equality (or "homogeneity") of variances, called homoscedasticity — the variance of data in groups should be the same.

The separate assumptions of the textbook model imply that the errors are independently, identically, and normally distributed for fixed effects models, that is, that the errors ($\varepsilon$) are independent and $\varepsilon \sim N(0,\sigma^{2})$.
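The homogeneity-of-variance assumption is often checked with a Levene-type test. The sketch below computes the Brown-Forsythe variant, which is a one-way ANOVA F statistic applied to absolute deviations from each group's median; the function name is hypothetical, and the statistic would be referred to an F-distribution with (k - 1, N - k) degrees of freedom for k groups and N observations.

```python
from statistics import fmean, median

def brown_forsythe_statistic(groups):
    """Levene-type statistic (Brown-Forsythe variant): one-way ANOVA F computed on
    absolute deviations of each observation from its own group's median."""
    z = [[abs(y - median(g)) for y in g] for g in groups]
    all_z = [v for g in z for v in g]
    grand = fmean(all_z)
    means = [fmean(g) for g in z]
    k, n = len(z), len(all_z)
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(z, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(z, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

groups = [[6, 8, 4, 5, 3, 4], [8, 12, 9, 11, 6, 8], [13, 9, 11, 8, 7, 12]]
print(brown_forsythe_statistic(groups))
```

Large values suggest that the group spreads differ, in which case the textbook F-test's equal-variance assumption is in doubt.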

Randomization-based analysis

In a randomized controlled experiment, the treatments are randomly assigned to experimental units, following the experimental protocol. This randomization is objective and declared before the experiment is carried out. The objective random-assignment is used to test the significance of the null hypothesis, following the ideas of C. S. Peirce and Ronald Fisher. This design-based analysis was discussed and developed by Francis J. Anscombe at Rothamsted Experimental Station and by Oscar Kempthorne at Iowa State University.^[25] Kempthorne and his students make an assumption of unit treatment additivity, which is discussed in the books of Kempthorne and David R. Cox.^[citation needed]
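A randomization test in this spirit can be approximated by Monte Carlo: re-randomize the observed responses to treatment groups of the original sizes and count how often the F statistic is at least as large as the observed one. This is a minimal sketch assuming a completely randomized one-factor experiment; the function names, the number of re-randomizations and the data are hypothetical.

```python
import random
from statistics import fmean

def f_statistic(groups):
    """One-way ANOVA F statistic (treatment mean square / error mean square)."""
    all_obs = [y for g in groups for y in g]
    grand = fmean(all_obs)
    means = [fmean(g) for g in groups]
    k, n = len(groups), len(all_obs)
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def permutation_p_value(groups, n_perm=10_000, seed=0):
    """Monte Carlo randomization test of 'all treatments have the same effect'."""
    rng = random.Random(seed)
    observed = f_statistic(groups)
    pooled = [y for g in groups for y in g]
    sizes = [len(g) for g in groups]
    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # a fresh random assignment of responses to treatments
        regrouped, start = [], 0
        for size in sizes:   # keep the original group sizes
            regrouped.append(pooled[start:start + size])
            start += size
        if f_statistic(regrouped) >= observed:
            at_least_as_large += 1
    # Counting the observed assignment itself keeps the estimate strictly positive
    return (at_least_as_large + 1) / (n_perm + 1)

groups = [[6, 8, 4, 5, 3, 4], [8, 12, 9, 11, 6, 8], [13, 9, 11, 8, 7, 12]]
print(permutation_p_value(groups))
```

No normal distribution is assumed here; the reference distribution comes entirely from the random-assignment procedure.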

Unit-treatment additivity

In its simplest form, the assumption of unit-treatment additivity^[nb 2] states that the observed response $y_{i,j}$ from experimental unit $i$ when receiving treatment $j$ can be written as the sum of the unit's response $y_{i}$ and the treatment-effect $t_{j}$, that is^[26]^[27]^[28]

$y_{i,j}=y_{i}+t_{j}.$

The assumption of unit-treatment additivity implies that, for every treatment $j$, the $j$th treatment has exactly the same effect $t_{j}$ on every experimental unit.

The assumption of unit treatment additivity usually cannot be directly falsified, according to Cox and Kempthorne. However, many consequences of treatment-unit additivity can be falsified. For a randomized experiment, the assumption of unit-treatment additivity implies that the variance is constant for all treatments. Therefore, by contraposition, a necessary condition for unit-treatment additivity is that the variance is constant.

The use of unit treatment additivity and randomization is similar to the design-based inference that is standard in finite-population survey sampling.

Derived linear model

Kempthorne uses the randomization-distribution and the assumption of unit treatment additivity to produce a derived linear model, very similar to the textbook model discussed previously.^[29] The test statistics of this derived linear model are closely approximated by the test statistics of an appropriate normal linear model, according to approximation theorems and simulation studies.^[30] However, there are differences. For example, the randomization-based analysis results in a small but (strictly) negative correlation between the observations.^[31]^[32] In the randomization-based analysis, there is no assumption of a normal distribution and certainly no assumption of independence. On the contrary, the observations are dependent!

The randomization-based analysis has the disadvantage that its exposition involves tedious algebra and extensive time. Since the randomization-based analysis is complicated and is closely approximated by the approach using a normal linear model, most teachers emphasize the normal linear model approach. Few statisticians object to model-based analysis of balanced randomized experiments.

Statistical models for observational data

However, when applied to data from non-randomized experiments or observational studies, model-based analysis lacks the warrant of randomization.^[33] For observational data, the derivation of confidence intervals must use subjective models, as emphasized by Ronald Fisher and his followers. In practice, the estimates of treatment effects from observational studies are often inconsistent. Such "statistical models" and observational data are useful for suggesting hypotheses, which should be treated very cautiously by the public.^[34]

Summary of assumptions

The normal-model based ANOVA analysis assumes the independence, normality and homogeneity of the variances of the residuals. The randomization-based analysis assumes only the homogeneity of the variances of the residuals (as a consequence of unit-treatment additivity) and uses the randomization procedure of the experiment. Both these analyses require homoscedasticity, as an assumption for the normal-model analysis and as a consequence of randomization and additivity for the randomization-based analysis.

However, studies of processes that change variances rather than means (called dispersion effects) have been successfully conducted using ANOVA.^[35] There are no necessary assumptions for ANOVA in its full generality, but the F-test used for ANOVA hypothesis testing has assumptions and practical limitations which are of continuing interest.

Problems which do not satisfy the assumptions of ANOVA can often be transformed to satisfy the assumptions. The property of unit-treatment additivity is not invariant under a "change of scale", so statisticians often use transformations to achieve unit-treatment additivity. If the response variable is expected to follow a parametric family of probability distributions, then the statistician may specify (in the protocol for the experiment or observational study) that the responses be transformed to stabilize the variance.^[36] Also, a statistician may specify that logarithmic transforms be applied to the responses, which are believed to follow a multiplicative model.^[27]^[37] According to Cauchy's functional equation theorem, the logarithm is the only continuous transformation that transforms real multiplication to addition.^[citation needed]
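The effect of a logarithmic transform on a multiplicative model can be checked numerically. In the hypothetical example below, the treatment multiplies each unit's response by a fixed factor of 3, so treatment differences vary from unit to unit on the raw scale but are constant (equal to log 3) after taking logarithms.

```python
import math

# Hypothetical multiplicative model: treatment scales each unit's response by a fixed factor
unit_baseline = [2.0, 5.0, 10.0, 40.0]              # unit responses y_i
treatment_factor = {"control": 1.0, "treated": 3.0}

raw = {t: [u * f for u in unit_baseline] for t, f in treatment_factor.items()}
raw_diff = [b - a for a, b in zip(raw["control"], raw["treated"])]
log_diff = [math.log(b) - math.log(a) for a, b in zip(raw["control"], raw["treated"])]

print(raw_diff)                          # [4.0, 10.0, 20.0, 80.0]: not additive
print([round(d, 6) for d in log_diff])   # [1.098612, 1.098612, 1.098612, 1.098612]: additive
```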

Characteristics of ANOVA

ANOVA is used in the analysis of comparative experiments, those in which only the difference in outcomes is of interest. The statistical significance of the experiment is determined by a ratio of two variances. This ratio is independent of several possible alterations to the experimental observations: Adding a constant to all observations does not alter significance. Multiplying all observations by a constant does not alter significance. So the statistical significance of an ANOVA result is independent of constant bias and scaling errors, as well as of the units used in expressing observations. In the era of mechanical calculation it was common to subtract a constant from all observations (when equivalent to dropping leading digits) to simplify data entry.^[38]^[39] This is an example of data coding.
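The invariance can be checked numerically with a one-way F statistic: subtracting a constant (data coding) or multiplying by a constant (a change of units) leaves F unchanged up to floating-point rounding. The function name and data are hypothetical.

```python
from statistics import fmean

def f_statistic(groups):
    """One-way ANOVA F statistic (treatment mean square / error mean square)."""
    all_obs = [y for g in groups for y in g]
    grand = fmean(all_obs)
    means = [fmean(g) for g in groups]
    k, n = len(groups), len(all_obs)
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

groups = [[6, 8, 4, 5, 3, 4], [8, 12, 9, 11, 6, 8], [13, 9, 11, 8, 7, 12]]
coded = [[y - 100 for y in g] for g in groups]       # subtract a constant (data coding)
rescaled = [[2.54 * y for y in g] for g in groups]   # multiply by a constant (change of units)

f_raw, f_coded, f_rescaled = (f_statistic(x) for x in (groups, coded, rescaled))
print(f_raw, f_coded, f_rescaled)  # equal up to floating-point rounding
```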

Logic of ANOVA

The calculations of ANOVA can be characterized as computing a number of means and variances, dividing two variances and comparing the ratio to a handbook value to determine statistical significance. Calculating a treatment effect is then trivial, "the effect of any treatment is estimated by taking the difference between the mean of the observations which receive the treatment and the general mean."^[40]
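With hypothetical one-way data, the quoted rule amounts to subtracting the grand mean from each treatment mean; in this balanced example the estimated effects sum to zero.

```python
from statistics import fmean

# Hypothetical observations for three treatments
groups = {"a1": [6, 8, 4, 5, 3, 4], "a2": [8, 12, 9, 11, 6, 8], "a3": [13, 9, 11, 8, 7, 12]}
grand_mean = fmean([y for g in groups.values() for y in g])
# Estimated effect of each treatment: its mean minus the general (grand) mean
effects = {name: fmean(g) - grand_mean for name, g in groups.items()}
print(grand_mean, effects)  # 8.0 {'a1': -3.0, 'a2': 1.0, 'a3': 2.0}
```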

Partitioning of the sum of squares

ANOVA uses traditional standardized terminology. The definitional equation of sample variance is $s^{2}=\textstyle {\frac {1}{n-1}}\sum (y_{i}-{\bar {y}})^{2}$ , where the divisor is called the degrees of freedom (DF), the summation is called the sum of squares (SS), the result is called the mean square (MS) and the squared terms are deviations from the sample mean. ANOVA estimates 3 sample variances: a total variance based on all the observation deviations from the grand mean, an error variance based on all the observation deviations from their appropriate treatment means and a treatment variance. The treatment variance is based on the deviations of treatment means from the grand mean, the result being multiplied by the number of observations in each treatment to account for the difference between the variance of observations and the variance of means.

The fundamental technique is a partitioning of the total sum of squares SS into components related to the effects used in the model. For example, for a simplified ANOVA with one type of treatment at different levels,

$SS_{\text{Total}}=SS_{\text{Error}}+SS_{\text{Treatments}}$

The number of degrees of freedom DF can be partitioned in a similar way, $DF_{\text{Total}}=DF_{\text{Error}}+DF_{\text{Treatments}}$. One of these components (that for error) specifies a chi-squared distribution which describes the associated sum of squares, while the same is true for "treatments" if there is no treatment effect.
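Both partitions can be verified numerically for a one-way layout. The data below are hypothetical; because the means here are exact in binary floating point, the two sides agree exactly.

```python
from statistics import fmean

groups = [[6, 8, 4, 5, 3, 4], [8, 12, 9, 11, 6, 8], [13, 9, 11, 8, 7, 12]]
all_obs = [y for g in groups for y in g]
grand = fmean(all_obs)

# Sums of squares: total, treatments (between groups), error (within groups)
ss_total = sum((y - grand) ** 2 for y in all_obs)
ss_treatments = sum(len(g) * (fmean(g) - grand) ** 2 for g in groups)
ss_error = sum((y - fmean(g)) ** 2 for g in groups for y in g)

# Degrees of freedom partition in the same way
df_total = len(all_obs) - 1
df_treatments = len(groups) - 1
df_error = len(all_obs) - len(groups)

print(ss_total, ss_treatments + ss_error)  # 152.0 152.0
print(df_total, df_treatments + df_error)  # 17 17
```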

The F-test

The F-test is used for comparing the factors of the total deviation. For example, in one-way, or single-factor ANOVA, statistical significance is tested for by comparing the F test statistic

$F={\frac {MS_{\text{Treatments}}}{MS_{\text{Error}}}}={\frac {SS_{\text{Treatments}}/(I-1)}{SS_{\text{Error}}/(n_{T}-I)}}$

where MS is mean square, $I$ = number of treatments and $n_{T}$ = total number of cases, to the F-distribution with $I-1$ and $n_{T}-I$ degrees of freedom. The F-distribution is a natural choice because the test statistic is the ratio of two scaled sums of squares, each of which follows a scaled chi-squared distribution.

The ratio of the expected mean squares is $1+{n\sigma _{\text{Treatment}}^{2}}/{\sigma _{\text{Error}}^{2}}$ (where n is the treatment sample size), which is 1 when there is no treatment effect. As values of F increase above 1, the evidence is increasingly inconsistent with the null hypothesis. Two obvious experimental methods of increasing F are increasing the sample size and reducing the error variance by tight experimental controls.

There are two methods of concluding the ANOVA hypothesis test, both of which produce the same result:

The textbook method is to compare the observed value of F with the critical value of F determined from tables. The critical value of F is a function of the degrees of freedom of the numerator and the denominator and the significance level (α). If F ≥ F_Critical, the null hypothesis is rejected.
The computer method calculates the probability (p-value) of a value of F greater than or equal to the observed value. The null hypothesis is rejected if this probability is less than or equal to the significance level (α).
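For the computer method, the upper-tail probability of the F-distribution can be written with the regularized incomplete beta function: P(F ≥ f) = I_x(d2/2, d1/2) with x = d2 / (d2 + d1·f). A library routine such as `scipy.stats.f.sf` would normally be used; the sketch below is a standard-library-only alternative that assumes the continued-fraction evaluation (modified Lentz method) popularized by Numerical Recipes. All function names are hypothetical.

```python
import math

def _betacf(a, b, x, max_iter=200, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def betainc_regularized(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    # Use the continued fraction directly where it converges quickly, else use symmetry
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def f_test_p_value(f, df1, df2):
    """Upper-tail probability P(F >= f) for an F(df1, df2) distribution."""
    return betainc_regularized(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))

print(f_test_p_value(630 / 68, 2, 15))
```

For F ≈ 9.26 on (2, 15) degrees of freedom this gives a p-value of roughly 0.0024, so the null hypothesis would be rejected at α = 0.05.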

The ANOVA F-test is known to be nearly optimal in the sense of minimizing false negative errors for a fixed rate of false positive errors (i.e. maximizing power for a fixed significance level). For example, to test the hypothesis that various medical treatments have exactly the same effect, the F-test's p-values closely approximate the permutation test's p-values: the approximation is particularly close when the design is balanced.^[30]^[41] Such permutation tests characterize tests with maximum power against all alternative hypotheses, as observed by Rosenbaum.^[nb 3] The ANOVA F-test (of the null-hypothesis that all treatments have exactly the same effect) is recommended as a practical test, because of its robustness against many alternative distributions.^[42]^[nb 4]

Extended logic

ANOVA consists of separable parts; partitioning sources of variance and hypothesis testing can be used individually. ANOVA is used to support other statistical tools. Regression is first used to fit more complex models to data, then ANOVA is used to compare models with the objective of selecting simple(r) models that adequately describe the data. "Such models could be fit without any reference to ANOVA, but ANOVA tools could then be used to make some sense of the fitted models, and to test hypotheses about batches of coefficients."^[43] "[W]e think of the analysis of variance as a way of understanding and structuring multilevel models—not as an alternative to regression but as a tool for summarizing complex high-dimensional inferences ..."^[43]
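The model-comparison use of ANOVA can be sketched as an F-test between a reduced model and a fuller model that contains it, using residual sums of squares (RSS) and residual degrees of freedom. With hypothetical one-way data, comparing a common-mean model with a separate-means model reproduces the one-way ANOVA F; the function name `nested_f` is hypothetical.

```python
from statistics import fmean

def nested_f(rss_reduced, df_reduced, rss_full, df_full):
    """F statistic for a reduced model nested in a fuller model.
    rss_* are residual sums of squares, df_* are residual degrees of freedom."""
    numerator = (rss_reduced - rss_full) / (df_reduced - df_full)
    denominator = rss_full / df_full
    return numerator / denominator, df_reduced - df_full, df_full

groups = [[6, 8, 4, 5, 3, 4], [8, 12, 9, 11, 6, 8], [13, 9, 11, 8, 7, 12]]
all_obs = [y for g in groups for y in g]

# Reduced model: one common mean for every observation
rss_reduced = sum((y - fmean(all_obs)) ** 2 for y in all_obs)
df_reduced = len(all_obs) - 1
# Fuller model: a separate mean for each group
rss_full = sum((y - fmean(g)) ** 2 for g in groups for y in g)
df_full = len(all_obs) - len(groups)

f, df1, df2 = nested_f(rss_reduced, df_reduced, rss_full, df_full)
print(round(f, 3), df1, df2)  # 9.265 2 15, identical to the one-way ANOVA F
```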

ANOVA for a single factor

The simplest experiment suitable for ANOVA analysis is the completely randomized experiment with a single factor. More complex experiments with a single factor involve constraints on randomization and include completely randomized blocks and Latin squares (and variants: Graeco-Latin squares, etc.). The more complex experiments share many of the complexities of multiple factors. A relatively complete discussion of the analysis (models, data summaries, ANOVA table) of the completely randomized experiment is available.
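A Latin square for k treatments can be built cyclically and then randomized by permuting its rows and columns, which preserves the property that each treatment appears exactly once in every row and every column. This is a minimal sketch; randomization schemes for Latin squares differ between texts, and the function names and labels are hypothetical.

```python
import random

def cyclic_latin_square(treatments):
    """Row r, column c receives treatment (r + c) mod k."""
    k = len(treatments)
    return [[treatments[(r + c) % k] for c in range(k)] for r in range(k)]

def randomized_latin_square(treatments, seed=None):
    """Randomly permute the rows and columns of a cyclic square (still a Latin square)."""
    rng = random.Random(seed)
    square = cyclic_latin_square(treatments)
    rng.shuffle(square)              # permute rows
    cols = list(range(len(treatments)))
    rng.shuffle(cols)                # permute columns
    return [[row[c] for c in cols] for row in square]

# Hypothetical layout: four treatments, rows = days, columns = operators
for row in randomized_latin_square(["A", "B", "C", "D"], seed=7):
    print(" ".join(row))
```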

ANOVA for multiple factors

ANOVA generalizes to the study of the effects of multiple factors. When the experiment includes observations at all combinations of levels of each factor, it is termed factorial. Factorial experiments are more efficient than a series of single factor experiments and the efficiency grows as the number of factors increases.^[44] Consequently, factorial designs are heavily used.

The use of ANOVA to study the effects of multiple factors has a complication. In a 3-way ANOVA with factors x, y and z, the ANOVA model includes terms for the main effects (x, y, z) and terms for interactions (xy, xz, yz, xyz). All terms require hypothesis tests. The proliferation of interaction terms increases the risk that some hypothesis test will produce a false positive by chance. Fortunately, experience says that high order interactions are rare.^[45]^[verification needed] The ability to detect interactions is a major advantage of multiple factor ANOVA. Testing one factor at a time hides interactions, but produces apparently inconsistent experimental results.^[44]
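The growth in the number of hypothesis tests can be made concrete. The sketch enumerates the main-effect and interaction terms of a full factorial model with factors x, y and z and, treating the tests as independent for simplicity (they are not exactly independent, since they share the error mean square), computes the chance of at least one false positive when every null hypothesis is true. The function name is hypothetical.

```python
from itertools import combinations

def anova_terms(factors):
    """All main-effect and interaction terms of a full factorial model."""
    return ["".join(c) for r in range(1, len(factors) + 1) for c in combinations(factors, r)]

terms = anova_terms(["x", "y", "z"])
print(terms)  # ['x', 'y', 'z', 'xy', 'xz', 'yz', 'xyz']

alpha = 0.05
m = len(terms)
# Probability of at least one false positive among m independent tests at level alpha
print(round(1 - (1 - alpha) ** m, 3))  # 0.302
```

With k factors there are 2^k - 1 terms, so the number of tests roughly doubles with each added factor.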

Caution is advised when encountering interactions: test interaction terms first and expand the analysis beyond ANOVA if interactions are found. Texts vary in their recommendations regarding the continuation of the ANOVA procedure after encountering an interaction. Interactions complicate the interpretation of experimental data. Neither the calculations of significance nor the estimated treatment effects can be taken at face value. "A significant interaction will often mask the significance of main effects."^[46] Graphical methods are recommended to enhance understanding. Regression is often useful. A lengthy discussion of interactions is available in Cox (1958).^[47] Some interactions can be removed (by transformations) while others cannot.
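The masking effect can be illustrated with a balanced two-way layout with replication. The hypothetical crossover data below have a strong interaction but main-effect sums of squares of exactly zero; the function name `two_way_anova` is hypothetical, and the sketch assumes the same number of replicates in every cell.

```python
from statistics import fmean

def two_way_anova(cells):
    """Sums of squares, degrees of freedom and F ratios for a balanced two-way layout.
    cells[i][j] holds the replicate observations for level i of A and level j of B."""
    a, b, r = len(cells), len(cells[0]), len(cells[0][0])
    grand = fmean([y for row in cells for cell in row for y in cell])
    mean_a = [fmean([y for cell in row for y in cell]) for row in cells]
    mean_b = [fmean([y for row in cells for y in row[j]]) for j in range(b)]
    cell_mean = [[fmean(cell) for cell in row] for row in cells]

    ss = {
        "A": b * r * sum((m - grand) ** 2 for m in mean_a),
        "B": a * r * sum((m - grand) ** 2 for m in mean_b),
        # Interaction: what remains of each cell mean after removing both main effects
        "AB": r * sum((cell_mean[i][j] - mean_a[i] - mean_b[j] + grand) ** 2
                      for i in range(a) for j in range(b)),
        "Error": sum((y - cell_mean[i][j]) ** 2
                     for i in range(a) for j in range(b) for y in cells[i][j]),
    }
    df = {"A": a - 1, "B": b - 1, "AB": (a - 1) * (b - 1), "Error": a * b * (r - 1)}
    ms_error = ss["Error"] / df["Error"]
    f = {k: (ss[k] / df[k]) / ms_error for k in ("A", "B", "AB")}
    return ss, df, f

# Hypothetical crossover data: B helps at level 1 of A and hurts at level 2
cells = [[[1, 3], [9, 11]],
         [[9, 11], [1, 3]]]
ss, df, f = two_way_anova(cells)
print(ss)  # {'A': 0.0, 'B': 0.0, 'AB': 128.0, 'Error': 8.0}
print(f)   # {'A': 0.0, 'B': 0.0, 'AB': 64.0}
```

Both factors clearly matter here, yet their main-effect F ratios are zero; only the interaction term reveals the effect.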

A variety of techniques are used with multiple factor ANOVA to reduce expense. One technique used in factorial designs is to minimize replication (possibly no replication with support of analytical trickery) and to combine groups when effects are found to be statistically (or practically) insignificant. An experiment with many insignificant factors may collapse into one with a few factors supported by many replications.^[48]

Worked numeric examples

Several fully worked numerical examples are available. A simple case uses one-way (a single factor) analysis. A more complex case uses two-way (two-factor) analysis.
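The following is a compact one-way computation in the same spirit. It partitions the total sum of squares into between-group and within-group parts, divides each by its degrees of freedom, and forms the F ratio. The three samples are illustrative numbers. `one_way_anova` is a hypothetical helper, not a library function.

```python
def one_way_anova(groups):
    """Return the one-way ANOVA table entries for a list of samples."""
    all_obs = [y for g in groups for y in g]
    n_total, k = len(all_obs), len(groups)
    grand_mean = sum(all_obs) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group (treatment) and within-group (error) sums of squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    df_between, df_within = k - 1, n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between, "ss_within": ss_within,
        "df_between": df_between, "df_within": df_within,
        "ms_between": ms_between, "ms_within": ms_within,
        "F": ms_between / ms_within,
    }

groups = [[6, 8, 4, 5, 3, 4], [8, 12, 9, 11, 6, 8], [13, 9, 11, 8, 7, 12]]
table = one_way_anova(groups)
print(table["ss_between"], table["ss_within"], round(table["F"], 2))  # 84.0 68.0 9.26
```

For these numbers, F ≈ 9.26 on (2, 15) degrees of freedom. This exceeds the 5% critical value of F(2, 15), about 3.68, so the equal-means hypothesis would be rejected at that level. Computing a p-value requires the F distribution, which the standard library does not provide. `scipy.stats.f_oneway` is one commonly used implementation.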

Associated analysis

Some analysis is required in support of the design of the experiment while other analysis is performed after changes in the factors are formally found to produce statistically significant changes in the responses. Because experimentation is iterative, the results of one experiment alter plans for following experiments.

Preparatory analysis

The number of experimental units

In the design of an experiment, the number of experimental units is planned to satisfy the goals of the experiment. Experimentation is often sequential.

Early experiments are often designed to provide mean-unbiased estimates of treatment effects and of experimental error. Later experiments are often designed to test a hypothesis that a treatment effect has an important magnitude; in this case, the number of experimental units is chosen so that the experiment is within budget and has adequate power, among other goals.

Reporting sample size analysis is generally required in psychology. "Provide information on sample size and the process that led to sample size decisions."^[49] The analysis, which is written in the experimental protocol before the experiment is conducted, is examined in grant applications and by administrative review boards.

Besides the power analysis, there are less formal methods for selecting the number of experimental units. These include graphical methods based on limiting the probability of false negative errors, graphical methods based on an expected variation increase (above the residuals) and methods based on achieving a desired confidence interval.^[50]
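One such method can be sketched under two assumptions: a known standard deviation σ and a normal approximation. It picks the smallest n whose confidence interval for a mean has half-width at most E, that is, n ≥ (zσ/E)². The function name is hypothetical.

```python
import math
from statistics import NormalDist

def n_for_half_width(sigma, half_width, confidence=0.95):
    """Smallest n with z * sigma / sqrt(n) <= half_width (normal approximation,
    sigma assumed known)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / half_width) ** 2)

print(n_for_half_width(sigma=10, half_width=2))  # 97
```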

Power analysis

Power analysis is often applied in the context of ANOVA in order to assess the probability of successfully rejecting the null hypothesis if we assume a certain ANOVA design, effect size in the population, sample size and significance level. Power analysis can assist in study design by determining what sample size would be required in order to have a reasonable chance of rejecting the null hypothesis when the alternative hypothesis is true.^[51]^[52]^[53]^[54]
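Power for a one-way design is usually computed from the noncentral F distribution. The sketch below substitutes a dependency-free Monte Carlo simulation. It first simulates normal data under the null hypothesis to find an empirical critical value, then counts rejections under assumed group means. The means, σ, n, number of simulations and seed are all illustrative assumptions, and the estimates carry simulation error.

```python
import random

def f_statistic(groups):
    """One-way ANOVA F ratio (between-group MS over within-group MS)."""
    obs = [y for g in groups for y in g]
    n, k = len(obs), len(groups)
    grand = sum(obs) / n
    means = [sum(g) / len(g) for g in groups]
    ssb = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ssw = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    return (ssb / (k - 1)) / (ssw / (n - k))

def simulate_f(means, sigma, n, sims, rng):
    """F statistics from `sims` simulated normal experiments with n units per group."""
    return [f_statistic([[rng.gauss(mu, sigma) for _ in range(n)] for mu in means])
            for _ in range(sims)]

def estimate_power(means, sigma, n, alpha=0.05, sims=2000, seed=0):
    """Monte Carlo estimate of the probability of rejecting equal means."""
    rng = random.Random(seed)
    # Empirical (1 - alpha) quantile of F when all group means are equal.
    null_f = sorted(simulate_f([0.0] * len(means), sigma, n, sims, rng))
    critical = null_f[int((1 - alpha) * sims)]
    alt_f = simulate_f(means, sigma, n, sims, rng)
    return sum(f > critical for f in alt_f) / sims

print(estimate_power([0.0, 0.5, 1.0], sigma=1.0, n=10))
print(estimate_power([0.0, 0.5, 1.0], sigma=1.0, n=30))
```

Increasing n per group raises the estimated power. One way to choose a sample size is to search over n until the estimate reaches a target such as 0.8.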

Effect size

Several standardized measures of effect have been proposed for ANOVA to summarize the strength of the association between a predictor(s) and the dependent variable (e.g., η², ω², or ƒ²) or the overall standardized difference (Ψ) of the complete model. Standardized effect-size estimates facilitate comparison of findings across studies and disciplines. However, while standardized effect sizes are commonly used in much of the professional literature, a non-standardized measure of effect size that has immediately "meaningful" units may be preferable for reporting purposes.^[55]
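For a one-way table, these standardized measures follow directly from the sums of squares:

- η² = SS_between / SS_total
- ω² = (SS_between − df_between · MS_within) / (SS_total + MS_within)
- Cohen's f² = η² / (1 − η²)

The sketch computes them from illustrative ANOVA-table values. As an estimate of the population proportion of variance explained, ω² is less biased than η².

```python
def effect_sizes(ss_between, ss_within, df_between, df_within):
    """Eta squared, omega squared and Cohen's f squared for a one-way ANOVA table."""
    ss_total = ss_between + ss_within
    ms_within = ss_within / df_within
    eta_sq = ss_between / ss_total
    omega_sq = (ss_between - df_between * ms_within) / (ss_total + ms_within)
    f_sq = eta_sq / (1 - eta_sq)
    return eta_sq, omega_sq, f_sq

# Illustrative table: SS_between = 84 on 2 df, SS_within = 68 on 15 df.
eta_sq, omega_sq, f_sq = effect_sizes(84, 68, 2, 15)
print(round(eta_sq, 3), round(omega_sq, 3), round(f_sq, 3))
```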

Follow-up analysis

It is always appropriate to carefully consider outliers. They have a disproportionate impact on statistical conclusions and are often the result of errors.

Model confirmation

It is prudent to verify that the assumptions of ANOVA have been met. Residuals are examined or analyzed to confirm homoscedasticity and gross normality.^[56] Residuals should look like zero-mean, normally distributed noise when plotted against anything, including time and the modeled data values. Trends hint at interactions among factors or among observations. One rule of thumb: "If the largest standard deviation is less than twice the smallest standard deviation, we can use methods based on the assumption of equal standard deviations and our results will still be approximately correct."^[57]
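A minimal numeric version of these checks, assuming a one-way layout with illustrative data, computes the residuals (each observation minus its group mean) and applies the quoted rule of thumb to the group standard deviations. Plotting the residuals would need a plotting library and is omitted. The helper names are hypothetical.

```python
from statistics import mean, stdev

def residuals_by_group(groups):
    """Residuals y - group mean for a one-way layout."""
    out = []
    for g in groups:
        m = mean(g)
        out.append([y - m for y in g])
    return out

def sd_ratio_ok(groups, max_ratio=2.0):
    """Rule of thumb: largest group SD less than max_ratio times the smallest."""
    sds = [stdev(g) for g in groups]
    return max(sds) / min(sds) < max_ratio, sds

groups = [[6, 8, 4, 5, 3, 4], [8, 12, 9, 11, 6, 8], [13, 9, 11, 8, 7, 12]]
res = residuals_by_group(groups)
ok, sds = sd_ratio_ok(groups)
print([round(s, 2) for s in sds], ok)
```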

Follow-up tests

A statistically significant effect in ANOVA is often followed up with one or more different follow-up tests. This can be done in order to assess which groups are different from which other groups or to test various other focused hypotheses. Follow-up tests are often distinguished in terms of whether they are planned (a priori) or post hoc. Planned tests are determined before looking at the data and post hoc tests are performed after looking at the data.

Often one of the "treatments" is none, so the treatment group can act as a control. Dunnett's test (a modification of the t-test) tests whether each of the other treatment groups has the same mean as the control.^[58]

Post hoc tests such as Tukey's range test most commonly compare every group mean with every other group mean and typically incorporate some method of controlling for Type I errors. Comparisons, which are most commonly planned, can be either simple or compound. Simple comparisons compare one group mean with one other group mean. Compound comparisons typically compare two sets of group means where one set has two or more groups (e.g., compare the average of the means of groups A, B and C with the mean of group D). Comparisons can also test for trends, such as linear and quadratic relationships, when the independent variable has ordered levels.
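Simple comparisons, compound comparisons and trend tests can all be written as linear contrasts Σ cᵢ·ȳᵢ whose coefficients sum to zero. The sketch computes a contrast estimate and its standard error √(MS_within · Σ cᵢ²/nᵢ). The group means, group sizes and MS_within are illustrative. The coefficients (−3, −1, 1, 3) are the usual linear-trend coefficients for four equally spaced levels with equal group sizes.

```python
import math

def contrast(means, ns, coeffs, ms_within):
    """Estimate and standard error of the linear contrast sum(c_i * mean_i)."""
    if abs(sum(coeffs)) > 1e-12:
        raise ValueError("contrast coefficients must sum to zero")
    estimate = sum(c * m for c, m in zip(coeffs, means))
    se = math.sqrt(ms_within * sum(c * c / n for c, n in zip(coeffs, ns)))
    return estimate, se

means = [5.0, 9.0, 10.0, 12.0]  # illustrative means of groups A, B, C, D
ns = [6, 6, 6, 6]
ms_within = 4.5                 # illustrative within-group mean square

# Compound comparison: average of the means of A, B and C versus the mean of D.
compound = contrast(means, ns, [1/3, 1/3, 1/3, -1], ms_within)
# Linear trend across four ordered levels.
linear = contrast(means, ns, [-3, -1, 1, 3], ms_within)
print(compound, linear)
```

Dividing an estimate by its standard error gives a t statistic on the within-group degrees of freedom. Tukey's test instead refers pairwise differences to the studentized range distribution, which is not shown here.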

Following ANOVA with pair-wise multiple-comparison tests has been criticized on several grounds.^[55]^[59] There are many such tests (10 in one table) and recommendations regarding their use are vague or conflicting.^[60]^[61]

Study designs and ANOVAs

There are several types of ANOVA. Many statisticians base ANOVA on the design of the experiment,^[62] especially on the protocol that specifies the random assignment of treatments to subjects; the protocol's description of the assignment mechanism should include a specification of the structure of the treatments and of any blocking. It is also common to apply ANOVA to observational data using an appropriate statistical model.^{[citation needed]}

Some popular designs use the following types of ANOVA:

One-way ANOVA is used to test for differences among two or more independent groups (means), e.g., different levels of urea application in a crop, different levels of antibiotic action on several bacterial species,^[63] or different levels of effect of some medicine on groups of patients. Typically, however, the one-way ANOVA is used to test for differences among at least three groups, since the two-group case can be covered by a t-test.^[64] When there are only two means to compare, the (pooled-variance) t-test and the ANOVA F-test are equivalent; the relation between ANOVA and t is given by F = t².
Factorial ANOVA is used when the experimenter wants to study the interaction effects among the treatments.
Repeated measures ANOVA is used when the same subjects are used for each treatment (e.g., in a longitudinal study).
Multivariate analysis of variance (MANOVA) is used when there is more than one response variable.
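The identity F = t² for two groups can be checked numerically. The sketch compares the pooled-variance (Student) two-sample t statistic with the one-way ANOVA F for the same illustrative data.

```python
import math

def pooled_t(a, b):
    """Two-sample t statistic with pooled variance (equal-variance assumption)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ss = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    sp2 = ss / (na + nb - 2)
    return (ma - mb) / math.sqrt(sp2 * (1 / na + 1 / nb))

def anova_f(groups):
    """One-way ANOVA F statistic."""
    obs = [y for g in groups for y in g]
    n, k = len(obs), len(groups)
    grand = sum(obs) / n
    means = [sum(g) / len(g) for g in groups]
    ssb = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ssw = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    return (ssb / (k - 1)) / (ssw / (n - k))

a, b = [6, 8, 4, 5, 3, 4], [8, 12, 9, 11, 6, 8]
t, f = pooled_t(a, b), anova_f([a, b])
print(round(t ** 2, 6), round(f, 6))  # 12.0 12.0
```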

ANOVA cautions

Balanced experiments (those with an equal sample size for each treatment) are relatively easy to interpret; unbalanced experiments offer more complexity. For single factor (one way) ANOVA, the adjustment for unbalanced data is easy, but the unbalanced analysis lacks both robustness and power.^[65] For more complex designs the lack of balance leads to further complications. "The orthogonality property of main effects and interactions present in balanced data does not carry over to the unbalanced case. This means that the usual analysis of variance techniques do not apply. Consequently, the analysis of unbalanced factorials is much more difficult than that for balanced designs."^[66] In the general case, "The analysis of variance can also be applied to unbalanced data, but then the sums of squares, mean squares, and F-ratios will depend on the order in which the sources of variation are considered."^[43] The simplest techniques for handling unbalanced data restore balance by either throwing out data or by synthesizing missing data. More complex techniques use regression.
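The order dependence in the last quotation can be reproduced with sequential (Type I) sums of squares. The sketch fits nested least-squares models for an additive model with two two-level factors A and B, then attributes sums of squares in both entry orders. The small Gauss-Jordan solver is written for this example and is not a library API. The data and cell sizes are illustrative.

```python
def rss(X, y):
    """Residual sum of squares of the least-squares fit of y on the columns of X."""
    p = len(X[0])
    # Augmented normal equations [X'X | X'y], solved by Gauss-Jordan elimination.
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yk for r, yk in zip(X, y))] for i in range(p)]
    for c in range(p):
        pivot = max(range(c, p), key=lambda row: abs(A[row][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for row in range(p):
            if row != c:
                factor = A[row][c] / A[c][c]
                A[row] = [x - factor * z for x, z in zip(A[row], A[c])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    fitted = [sum(b * x for b, x in zip(beta, r)) for r in X]
    return sum((yk - fk) ** 2 for yk, fk in zip(y, fitted))

def sequential_ss(a, b, y):
    """(SS_A, SS_B) for both entry orders; a and b are 0/1 factor indicators."""
    r0 = rss([[1.0] for _ in y], y)
    ra = rss([[1.0, ai] for ai in a], y)
    rb = rss([[1.0, bi] for bi in b], y)
    rab = rss([[1.0, ai, bi] for ai, bi in zip(a, b)], y)
    return {"A then B": (r0 - ra, ra - rab), "B then A": (rb - rab, r0 - rb)}

a = [0, 0, 0, 0, 1, 1, 1, 1]
b = [0, 0, 0, 1, 0, 1, 1, 1]   # cell sizes 3, 1, 1, 3: unbalanced
y = [2, 3, 4, 6, 5, 9, 10, 8]  # illustrative responses
for order, (ss_a, ss_b) in sequential_ss(a, b, y).items():
    print(order, round(ss_a, 3), round(ss_b, 3))
```

With these unbalanced cells, the sum of squares credited to A depends on whether A enters before or after B. The combined sum of squares for A and B is the same in both orders. With equal cell sizes, the two orders agree.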

ANOVA is (in part) a significance test. The American Psychological Association holds the view that simply reporting significance is insufficient and that reporting confidence bounds is preferred.^[55]

While ANOVA is conservative (in maintaining a significance level) against multiple comparisons in one dimension, it is not conservative against comparisons in multiple dimensions.^[67]

Generalizations

ANOVA is considered to be a special case of linear regression^[68]^[69] which in turn is a special case of the general linear model.^[70] All consider the observations to be the sum of a model (fit) and a residual (error) to be minimized.

The Kruskal–Wallis test and the Friedman test are nonparametric tests, which do not rely on an assumption of normality.^[71]^[72]
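The Kruskal–Wallis statistic replaces the observations with their ranks in the pooled sample:

H = 12/(N(N+1)) · Σ Rⱼ²/nⱼ − 3(N+1)

Here Rⱼ is the rank sum of group j. The sketch gives tied values their average rank but omits the usual tie correction. For moderate group sizes, H is referred to a χ² distribution with k − 1 degrees of freedom. The data are illustrative.

```python
def average_ranks(values):
    """Ranks 1..N, with tied values given the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic without the tie correction."""
    pooled = [y for g in groups for y in g]
    ranks = average_ranks(pooled)
    n = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        r_sum = sum(ranks[start:start + len(g)])
        total += r_sum ** 2 / len(g)
        start += len(g)
    return 12 * total / (n * (n + 1)) - 3 * (n + 1)

print(kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```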

Connection to Linear Regression

Below we make clear the connection between multi-way ANOVA and linear regression. Linearly re-order the data so that the $k$th observation is associated with a response $y_{k}$ and factors $Z_{k,b}$, where $b\in \{1,2,\dots,B\}$ denotes the different factors and $B$ is the total number of factors. In one-way ANOVA $B=1$ and in two-way ANOVA $B=2$. Furthermore, we assume the $b$th factor has $I_{b}$ levels. Now, we can one-hot encode the factors into the $\prod_{b=1}^{B} I_{b}$-dimensional vector $v_{k}$.

The one-hot encoding function $g_{b}:\{1,\dots,I_{b}\}\to \{0,1\}^{I_{b}}$ is defined such that the $l$th entry of $g_{b}(Z_{k,b})$ is

$$g_{b}(Z_{k,b})_{l}=\begin{cases}1 & \text{if } l=Z_{k,b}\\ 0 & \text{otherwise}\end{cases}$$
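The sketch below implements $g_{b}$ for 1-based levels. The text above states only the dimension of $v_{k}$, so the sketch assumes that the per-factor vectors are combined by a Kronecker product. This yields one indicator for each of the $\prod_{b} I_{b}$ cells. Regressing $y_{k}$ on these cell indicators without an intercept makes $X^{\top}X$ diagonal, so each least-squares coefficient is simply a cell mean (the full-interaction ANOVA model). The helper names and data are illustrative.

```python
def one_hot(level, n_levels):
    """g_b: level l in {1, ..., I_b} -> indicator vector in {0,1}^{I_b}."""
    return [1 if l == level else 0 for l in range(1, n_levels + 1)]

def kron(u, v):
    """Kronecker product of two vectors."""
    return [a * b for a in u for b in v]

def encode(z, n_levels):
    """Assumed construction of v_k: Kronecker product of the per-factor one-hot vectors."""
    vec = [1]
    for level, levels in zip(z, n_levels):
        vec = kron(vec, one_hot(level, levels))
    return vec

def fit_cell_means(Z, y, n_levels):
    """Least squares on cell indicators: X'X is diagonal (cell counts), so each
    coefficient is the mean response of its cell (None for an empty cell)."""
    V = [encode(z, n_levels) for z in Z]
    p = len(V[0])
    counts = [sum(v[j] for v in V) for j in range(p)]
    sums = [sum(v[j] * yk for v, yk in zip(V, y)) for j in range(p)]
    return [s / c if c else None for s, c in zip(sums, counts)]

# Two-way layout (B = 2) with I_1 = 2 and I_2 = 3 levels, giving 6 cells.
Z = [(1, 1), (1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3), (2, 3)]
y = [4.0, 6.0, 7.0, 3.0, 5.0, 8.0, 2.0, 4.0]
print(encode((2, 3), (2, 3)))     # [0, 0, 0, 0, 0, 1]
print(fit_cell_means(Z, y, (2, 3)))  # [5.0, 7.0, 3.0, 5.0, 8.0, 3.0]
```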


wiki en


Biologist and statistician Ronald Fisher

Analysis of variance (ANOVA) is a collection of statistical models and their associated procedures (such as "variation" among and between groups) used to analyze the differences among group means, developed by statistician and evolutionary biologist Ronald Fisher. In the ANOVA setting, the observed variance in a particular variable is partitioned into components attributable to different sources of variation. In its simplest form, ANOVA provides a statistical test of whether or not the means of several groups are equal, and therefore generalizes the t-test to more than two groups. ANOVAs are useful for comparing (testing) three or more means (groups or variables) for statistical significance. It is conceptually similar to multiple two-sample t-tests, but is more conservative (it results in fewer Type I errors) and is therefore suited to a wide range of practical problems.

1 History
2 Motivating example
3 Background and terminology
- 3.1 Design-of-experiments terms
4 Classes of models
- 4.1 Fixed-effects models
- 4.2 Random-effects models
- 4.3 Mixed-effects models
5 Assumptions of ANOVA
- 5.1 Textbook analysis using a normal distribution
- 5.2 Randomization-based analysis
  - 5.2.1 Unit-treatment additivity
  - 5.2.2 Derived linear model
  - 5.2.3 Statistical models for observational data
- 5.3 Summary of assumptions
6 Characteristics of ANOVA
7 Logic of ANOVA
- 7.1 Partitioning of the sum of squares
- 7.2 The F-test
- 7.3 Extended logic
8 ANOVA for a single factor
9 ANOVA for multiple factors
10 Worked numeric examples
11 Associated analysis
- 11.1 Preparatory analysis
  - 11.1.1 The number of experimental units
  - 11.1.2 Power analysis
  - 11.1.3 Effect size
- 11.2 Follow-up analysis
  - 11.2.1 Model confirmation
  - 11.2.2 Follow-up tests
12 Study designs and ANOVAs
13 ANOVA cautions
14 Generalizations
- 14.1 Connection to linear regression
  - 14.1.1 Example
15 See also
16 Footnotes
17 Notes
18 References
19 Further reading
20 External links

History

While the analysis of variance reached fruition in the 20th century, antecedents extend centuries into the past according to Stigler.^[1] These include hypothesis testing, the partitioning of sums of squares, experimental techniques and the additive model. Laplace was performing hypothesis testing in the 1770s.^[2] The development of least-squares methods by Laplace and Gauss circa 1800 provided an improved method of combining observations (over the existing practices then used in astronomy and geodesy). It also initiated much study of the contributions to sums of squares. Laplace soon knew how to estimate a variance from a residual (rather than a total) sum of squares.^[3] By 1827 Laplace was using least squares methods to address ANOVA problems regarding measurements of atmospheric tides.^[4] Before 1800 astronomers had isolated observational errors resulting from reaction times (the "personal equation") and had developed methods of reducing the errors.^[5] The experimental methods used in the study of the personal equation were later accepted by the emerging field of psychology,^[6] which developed strong (full factorial) experimental methods to which randomization and blinding were soon added.^[7] An eloquent non-mathematical explanation of the additive effects model was available in 1885.^[8]

Ronald Fisher introduced the term variance and proposed its formal analysis in a 1918 article The Correlation Between Relatives on the Supposition of Mendelian Inheritance.^[9] His first application of the analysis of variance was published in 1921.^[10] Analysis of variance became widely known after being included in Fisher's 1925 book Statistical Methods for Research Workers.

Randomization models were developed by several researchers. The first was published in Polish by Neyman in 1923.^[11]

One of the attributes of ANOVA which ensured its early popularity was computational elegance. The structure of the additive model allows solution for the additive coefficients by simple algebra rather than by matrix calculations. In the era of mechanical calculators this simplicity was critical. The determination of statistical significance also required access to tables of the F function which were supplied by early statistics texts.

Motivating example

(Illustrations: no fit; fair fit; very good fit.)

The analysis of variance can be used as an exploratory tool to explain observations. A dog show provides an example. A dog show is not a random sampling of the breed: it is typically limited to dogs that are adult, pure-bred, and exemplary. A histogram of dog weights from a show might plausibly be rather complex, like the yellow-orange distribution shown in the illustrations. Suppose we wanted to predict the weight of a dog based on a certain set of characteristics of each dog. Before we could do that, we would need to explain the distribution of weights by dividing the dog population into groups based on those characteristics. A successful grouping will split dogs such that (a) each group has a low variance of dog weights (meaning the group is relatively homogeneous) and (b) the mean of each group is distinct (if two groups have the same mean, then it isn't reasonable to conclude that the groups are, in fact, separate in any meaningful way).

In the illustrations to the right, each group is identified as X₁, X₂, etc. In the first illustration, we divide the dogs according to the product (interaction) of two binary groupings: young vs old, and short-haired vs long-haired (thus, group 1 is young, short-haired dogs, group 2 is young, long-haired dogs, etc.). Since the distributions of dog weight within each of the groups (shown in blue) have a large variance, and since the means are very close across groups, grouping dogs by these characteristics does not produce an effective way to explain the variation in dog weights: knowing which group a dog is in does not allow us to make any reasonable statements as to what that dog's weight is likely to be. Thus, this grouping fails to fit the distribution we are trying to explain (yellow-orange).

An attempt to explain the weight distribution by grouping dogs as (pet vs working breed) and (less athletic vs more athletic) would probably be somewhat more successful (fair fit). The heaviest show dogs are likely to be big strong working breeds, while breeds kept as pets tend to be smaller and thus lighter. As shown by the second illustration, the distributions have variances that are considerably smaller than in the first case, and the means are more reasonably distinguishable. However, the significant overlap of distributions, for example, means that we cannot reliably say that X₁ and X₂ are truly distinct (i.e., it is perhaps reasonably likely that splitting dogs according to the flip of a coin—by pure chance—might produce distributions that look similar).

An attempt to explain weight by breed is likely to produce a very good fit. All Chihuahuas are light and all St Bernards are heavy. The difference in weights between Setters and Pointers does not justify separate breeds. The analysis of variance provides the formal tools to justify these intuitive judgments. A common use of the method is the analysis of experimental data or the development of models. The method has some advantages over correlation: not all of the data must be numeric, and one result of the method is a judgment of the confidence in an explanatory relationship.

Background and terminology

ANOVA is a particular form of statistical hypothesis testing heavily used in the analysis of experimental data. A test result (calculated from the null hypothesis and the sample) is called statistically significant if it is deemed unlikely to have occurred by chance, assuming the truth of the null hypothesis. A statistically significant result, when a probability (p-value) is less than a threshold (significance level), justifies the rejection of the null hypothesis, but only if the prior probability of the null hypothesis is not high.

In the typical application of ANOVA, the null hypothesis is that all groups are simply random samples of the same population. For example, when studying the effect of different treatments on similar samples of patients, the null hypothesis would be that all treatments have the same effect (perhaps none). Rejecting the null hypothesis would imply that different treatments result in altered effects.

By construction, hypothesis testing limits the rate of Type I errors (false positives) to a significance level. Experimenters also wish to limit Type II errors (false negatives). The rate of Type II errors depends largely on sample size (the rate will increase for small numbers of samples), significance level (when the standard of proof is high, the chances of overlooking a discovery are also high) and effect size (a smaller effect size is more prone to Type II error).

The terminology of ANOVA is largely from the statistical design of experiments. The experimenter adjusts factors and measures responses in an attempt to determine an effect. Factors are assigned to experimental units by a combination of randomization and blocking to ensure the validity of the results. Blinding keeps the weighing impartial. Responses show a variability that is partially the result of the effect and is partially random error.

ANOVA is the synthesis of several ideas and it is used for multiple purposes. As a consequence, it is difficult to define concisely or precisely.

"Classical ANOVA for balanced data does three things at once:

As exploratory data analysis, an ANOVA is an organization of an additive data decomposition, and its sums of squares indicate the variance of each component of the decomposition (or, equivalently, each set of terms of a linear model).
Comparisons of mean squares, along with an F-test ... allow testing of a nested sequence of models.
Closely related to the ANOVA is a linear model fit with coefficient estimates and standard errors."^[12]

In short, ANOVA is a statistical tool used in several ways to develop and confirm an explanation for the observed data.

Additionally:

It is computationally elegant and relatively robust against violations of its assumptions.
ANOVA provides industrial strength (multiple sample comparison) statistical analysis.
It has been adapted to the analysis of a variety of experimental designs.

As a result: ANOVA "has long enjoyed the status of being the most used (some would say abused) statistical technique in psychological research."^[13] ANOVA "is probably the most useful technique in the field of statistical inference."^[14]

ANOVA is difficult to teach, particularly for complex experiments, with split-plot designs being notorious.^[15] In some cases the proper application of the method is best determined by problem pattern recognition followed by the consultation of a classic authoritative text.^[16]

Design-of-experiments terms

(Condensed from the NIST Engineering Statistics handbook: Section 5.7. A Glossary of DOE Terminology.)^[17]

Balanced design: An experimental design where all cells (i.e. treatment combinations) have the same number of observations.
Blocking: A schedule for conducting treatment combinations in an experimental study such that any effects on the experimental results due to a known change in raw materials, operators, machines, etc., become concentrated in the levels of the blocking variable. The reason for blocking is to isolate a systematic effect and prevent it from obscuring the main effects. Blocking is achieved by restricting randomization.
Design: A set of experimental runs which allows the fit of a particular model and the estimate of effects.
DOE: Design of experiments. An approach to problem solving involving collection of data that will support valid, defensible, and supportable conclusions.^[18]
Effect: How changing the settings of a factor changes the response. The effect of a single factor is also called a main effect.
Error: Unexplained variation in a collection of observations. DOEs typically require understanding of both random error and lack-of-fit error.
Experimental unit: The entity to which a specific treatment combination is applied.
Factors: Process inputs an investigator manipulates to cause a change in the output.
Lack-of-fit error: Error that occurs when the analysis omits one or more important terms or factors from the process model. Including replication in a DOE allows separation of experimental error into its components: lack of fit and random (pure) error.
Model: Mathematical relationship which relates changes in a given response to changes in one or more factors.
Random error: Error that occurs due to natural variation in the process. Random error is typically assumed to be normally distributed with zero mean and a constant variance. Random error is also called experimental error.
Randomization: A schedule for allocating treatment material and for conducting treatment combinations in a DOE such that the conditions in one run neither depend on the conditions of the previous run nor predict the conditions in the subsequent runs.^{[nb 1]}
Replication: Performing the same treatment combination more than once. Including replication allows an estimate of the random error independent of any lack of fit error.
Responses: The output(s) of a process. Sometimes called dependent variable(s).
Treatment: A treatment is a specific combination of factor levels whose effect is to be compared with other treatments.

Classes of models

There are three classes of models used in the analysis of variance, and these are outlined here.

Fixed-effects models

The fixed-effects model (class I) of analysis of variance applies to situations in which the experimenter applies one or more treatments to the subjects of the experiment to see whether the response variable values change. This allows the experimenter to estimate the ranges of response variable values that the treatment would generate in the population as a whole.

Random-effects models

The random-effects model (class II) is used when the treatments are not fixed. This occurs when the various factor levels are sampled from a larger population. Because the levels themselves are random variables, some assumptions and the method of contrasting the treatments (a multi-variable generalization of simple differences) differ from the fixed-effects model.^[19]
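For a balanced one-way random-effects model, the expected mean squares are E[MS_within] = σ² and E[MS_between] = σ² + n·σ_a². This gives the ANOVA (method-of-moments) estimator σ̂_a² = (MS_between − MS_within)/n. The sketch implements that estimator and sets negative estimates to zero, which is one common convention. The data are illustrative, for example scores under k randomly sampled levels.

```python
def variance_components(groups):
    """ANOVA (method-of-moments) estimates (sigma_a^2, sigma^2) for a balanced
    one-way random-effects model y_ij = mu + a_i + e_ij."""
    k, n = len(groups), len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes a balanced design")
    means = [sum(g) / n for g in groups]
    grand = sum(means) / k
    ms_between = n * sum((m - grand) ** 2 for m in means) / (k - 1)
    ms_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g) / (k * (n - 1))
    # E[MS_between] = sigma^2 + n * sigma_a^2 and E[MS_within] = sigma^2
    sigma2_a = max(0.0, (ms_between - ms_within) / n)  # negative estimates set to 0
    return sigma2_a, ms_within

# Illustrative scores under three randomly sampled factor levels (e.g. textbooks).
groups = [[6, 8, 4, 5, 3, 4], [8, 12, 9, 11, 6, 8], [13, 9, 11, 8, 7, 12]]
print(variance_components(groups))
```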

Mixed-effects models

A mixed-effects model (class III) contains experimental factors of both fixed and random-effects types, with appropriately different interpretations and analysis for the two types.

Example: Teaching experiments could be performed by a college or university department to find a good introductory textbook, with each text considered a treatment. The fixed-effects model would compare a list of candidate texts. The random-effects model would determine whether important differences exist among a list of randomly selected texts. The mixed-effects model would compare the (fixed) incumbent texts to randomly selected alternatives.

Defining fixed and random effects has proven elusive, with competing definitions arguably leading toward a linguistic quagmire.^[20]

Assumptions of ANOVA

The analysis of variance has been studied from several approaches, the most common of which uses a linear model that relates the response to the treatments and blocks. Note that the model is linear in parameters but may be nonlinear across factor levels. Interpretation is easy when data is balanced across factors but much deeper understanding is needed for unbalanced data.

Textbook analysis using a normal distribution

The analysis of variance can be presented in terms of a linear model, which makes the following assumptions about the probability distribution of the responses:^[21]^[22]^[23]^[24]

Independence of observations – this is an assumption of the model that simplifies the statistical analysis.
Normality – the distributions of the residuals are normal.
Equality (or "homogeneity") of variances, called homoscedasticity — the variance of data in groups should be the same.

The separate assumptions of the textbook model imply that the errors are independently, identically, and normally distributed for fixed effects models, that is, that the errors ($\varepsilon$) are independent and $\varepsilon \sim N(0,\sigma^{2})$.

Randomization-based analysis

In a randomized controlled experiment, the treatments are randomly assigned to experimental units, following the experimental protocol. This randomization is objective and declared before the experiment is carried out. The objective random-assignment is used to test the significance of the null hypothesis, following the ideas of C. S. Peirce and Ronald Fisher. This design-based analysis was discussed and developed by Francis J. Anscombe at Rothamsted Experimental Station and by Oscar Kempthorne at Iowa State University.^[25] Kempthorne and his students make an assumption of unit treatment additivity, which is discussed in the books of Kempthorne and David R. Cox.^{[citation needed]}
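A randomization test in this spirit enumerates every way the protocol could have assigned the treatments. For each assignment it recomputes the F statistic, and it reports the proportion of assignments at least as extreme as the one observed. The sketch assumes two treatments, a design small enough to enumerate completely, and illustrative data. It is a plain permutation test, not Kempthorne's derived linear model.

```python
from itertools import combinations

def f_stat(groups):
    """One-way ANOVA F ratio: between-group mean square over within-group mean square."""
    obs = [y for g in groups for y in g]
    n, k = len(obs), len(groups)
    grand = sum(obs) / n
    means = [sum(g) / len(g) for g in groups]
    ssb = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ssw = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    return (ssb / (k - 1)) / (ssw / (n - k))

def randomization_test(a, b):
    """Exact randomization p-value for F over every possible assignment of
    len(a) of the pooled units to the first treatment."""
    pooled = list(a) + list(b)
    observed = f_stat([list(a), list(b)])
    n, k = len(pooled), len(a)
    extreme = total = 0
    for chosen in combinations(range(n), k):
        chosen_set = set(chosen)
        first = [pooled[i] for i in chosen]
        second = [pooled[i] for i in range(n) if i not in chosen_set]
        total += 1
        # Relative tolerance so that mirror-image assignments count as ties.
        if f_stat([first, second]) >= observed * (1 - 1e-9):
            extreme += 1
    return observed, extreme / total

observed, p_value = randomization_test([6, 8, 4], [12, 9, 11])
print(round(observed, 3), p_value)
```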

Unit-treatment additivity

In its simplest form, the assumption of unit-treatment additivity^{[nb 2]} states that the observed response $y_{i,j}$ from experimental unit $i$ when receiving treatment $j$ can be written as the sum of the unit's response $y_{i}$ and the treatment effect $t_{j}$, that is,^[26]^[27]^[28]

$$y_{i,j}=y_{i}+t_{j}.$$

The assumption of unit-treatment additivity implies that, for every treatment $j$, the $j$th treatment has exactly the same effect $t_{j}$ on every experimental unit.

The assumption of unit treatment additivity usually cannot be directly falsified, according to Cox and Kempthorne. However, many consequences of treatment-unit additivity can be falsified. For a randomized experiment, the assumption of unit-treatment additivity implies that the variance is constant for all treatments. Therefore, by contraposition, a necessary condition for unit-treatment additivity is that the variance is constant.

The use of unit treatment additivity and randomization is similar to the design-based inference that is standard in finite-population survey sampling.

Derived linear model

Kempthorne uses the randomization-distribution and the assumption of unit treatment additivity to produce a derived linear model, very similar to the textbook model discussed previously.^[29] The test statistics of this derived linear model are closely approximated by the test statistics of an appropriate normal linear model, according to approximation theorems and simulation studies.^[30] However, there are differences. For example, the randomization-based analysis results in a small but (strictly) negative correlation between the observations.^[31]^[32] In the randomization-based analysis, there is no assumption of a normal distribution and certainly no assumption of independence. On the contrary, the observations are dependent!

The randomization-based analysis has the disadvantage that its exposition involves tedious algebra and extensive time. Since the randomization-based analysis is complicated and is closely approximated by the approach using a normal linear model, most teachers emphasize the normal linear model approach. Few statisticians object to model-based analysis of balanced randomized experiments.

Statistical models for observational data

However, when applied to data from non-randomized experiments or observational studies, model-based analysis lacks the warrant of randomization.^[33] For observational data, the derivation of confidence intervals must use subjective models, as emphasized by Ronald Fisher and his followers. In practice, the estimates of treatment effects from observational studies are often inconsistent. "Statistical models" and observational data are useful for suggesting hypotheses, which the public should treat very cautiously.^[34]

Summary of assumptions

The normal-model based ANOVA analysis assumes the independence, normality and homogeneity of the variances of the residuals. The randomization-based analysis assumes only the homogeneity of the variances of the residuals (as a consequence of unit-treatment additivity) and uses the randomization procedure of the experiment. Both these analyses require homoscedasticity, as an assumption for the normal-model analysis and as a consequence of randomization and additivity for the randomization-based analysis.

However, studies of processes that change variances rather than means (called dispersion effects) have been successfully conducted using ANOVA.^[35] There are no necessary assumptions for ANOVA in its full generality, but the F-test used for ANOVA hypothesis testing has assumptions and practical limitations which are of continuing interest.

Problems which do not satisfy the assumptions of ANOVA can often be transformed to satisfy the assumptions. The property of unit-treatment additivity is not invariant under a "change of scale", so statisticians often use transformations to achieve unit-treatment additivity. If the response variable is expected to follow a parametric family of probability distributions, then the statistician may specify (in the protocol for the experiment or observational study) that the responses be transformed to stabilize the variance.^[36] Also, a statistician may specify that logarithmic transforms be applied to the responses, which are believed to follow a multiplicative model.^[27]^[37] According to Cauchy's functional equation theorem, the logarithm is the only continuous transformation that transforms real multiplication to addition.^{[citation needed]}
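The following is a minimal sketch of the multiplicative case. It assumes hypothetical unit levels and a treatment that multiplies every response by 1.5. On the raw scale the treatment effect differs from unit to unit. After taking logarithms, every unit shows the same additive effect, $\log 1.5$.

```python
import math

# Hypothetical multiplicative model: response = unit level * treatment factor.
unit_level = [10.0, 20.0, 40.0, 80.0]
factor = {"control": 1.0, "treated": 1.5}

raw = {t: [u * f for u in unit_level] for t, f in factor.items()}
logged = {t: [math.log(y) for y in ys] for t, ys in raw.items()}

# Raw scale: the treatment effect (a difference) grows with the unit level.
raw_diffs = [b - a for a, b in zip(raw["control"], raw["treated"])]
# Log scale: every unit shows the same additive effect, log(1.5).
log_diffs = [b - a for a, b in zip(logged["control"], logged["treated"])]

print(raw_diffs)  # [5.0, 10.0, 20.0, 40.0]
print(log_diffs)  # each approximately 0.405 = log(1.5)
```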

Characteristics of ANOVA

ANOVA is used in the analysis of comparative experiments, those in which only the difference in outcomes is of interest. The statistical significance of the experiment is determined by a ratio of two variances. This ratio is unaffected by several possible alterations to the experimental observations. Adding a constant to all observations does not alter significance, and neither does multiplying all observations by a constant. So the statistical significance of an ANOVA result is independent of constant bias and scaling errors, as well as of the units used in expressing observations. In the era of mechanical calculation it was common to subtract a constant from all observations (when equivalent to dropping leading digits) to simplify data entry.^[38]^[39] This is an example of data coding.
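The invariance can be checked numerically. This sketch uses hypothetical data for three treatments. It computes the one-way F ratio three times: for the original observations, for coded observations with a constant subtracted, and for rescaled observations after a change of units. The three values agree up to floating-point rounding.

```python
def f_statistic(groups):
    """One-way ANOVA F = MS_treatment / MS_error."""
    n_total = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand = sum(sum(g) for g in groups) / n_total
    ss_treat = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_error = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    df_treat, df_error = len(groups) - 1, n_total - len(groups)
    return (ss_treat / df_treat) / (ss_error / df_error)


# Hypothetical measurements from three treatments.
data = [[12.1, 13.4, 11.8], [14.2, 15.0, 14.7], [11.0, 10.4, 12.3]]
coded = [[y - 10.0 for y in g] for g in data]     # subtract a constant (data coding)
rescaled = [[2.54 * y for y in g] for g in data]  # change of measurement units

for label, groups in (("original", data), ("coded", coded), ("rescaled", rescaled)):
    print(label, f_statistic(groups))  # the same F, up to floating-point rounding
```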

Logic of ANOVA

The calculations of ANOVA can be characterized as computing a number of means and variances, dividing two variances and comparing the ratio to a handbook value to determine statistical significance. Calculating a treatment effect is then trivial, "the effect of any treatment is estimated by taking the difference between the mean of the observations which receive the treatment and the general mean."^[40]
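The following sketch uses hypothetical data for three treatments. Each estimated effect is the treatment mean minus the grand mean. In a balanced design these estimates sum to zero.

```python
# Hypothetical observations keyed by treatment.
obs = {"A": [20.0, 22.0, 21.0], "B": [25.0, 27.0, 26.0], "C": [18.0, 17.0, 19.0]}

all_values = [y for ys in obs.values() for y in ys]
grand_mean = sum(all_values) / len(all_values)
# Estimated effect of each treatment: its mean minus the general (grand) mean.
effects = {t: sum(ys) / len(ys) - grand_mean for t, ys in obs.items()}

print(grand_mean)  # about 21.667
print(effects)     # A about -0.667, B about 4.333, C about -3.667
```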

Partitioning of the sum of squares

ANOVA uses traditional standardized terminology. The definitional equation of sample variance is $s^{2}=\textstyle {\frac {1}{n-1}}\sum (y_{i}-{\bar {y}})^{2}$ , where the divisor is called the degrees of freedom (DF), the summation is called the sum of squares (SS), the result is called the mean square (MS) and the squared terms are deviations from the sample mean. ANOVA estimates 3 sample variances: a total variance based on all the observation deviations from the grand mean, an error variance based on all the observation deviations from their appropriate treatment means and a treatment variance. The treatment variance is based on the deviations of treatment means from the grand mean, the result being multiplied by the number of observations in each treatment to account for the difference between the variance of observations and the variance of means.

The fundamental technique is a partitioning of the total sum of squares SS into components related to the effects used in the model. For example, the model for a simplified ANOVA with one type of treatment at different levels partitions it as

$$SS_{\text{Total}}=SS_{\text{Error}}+SS_{\text{Treatments}}$$

The number of degrees of freedom DF can be partitioned in a similar way:

$$DF_{\text{Total}}=DF_{\text{Error}}+DF_{\text{Treatments}}$$

One of these components (that for error) specifies a chi-squared distribution which describes the associated sum of squares. The same is true for "treatments" if there is no treatment effect.
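The partitions above can be verified on a small hypothetical data set. The sketch below computes the total, treatment and error sums of squares, degrees of freedom and mean squares for a balanced one-way layout. It checks that the total sum of squares and the total degrees of freedom split exactly into their treatment and error parts. It also confirms a property of the balanced design: the treatment mean square equals the number of observations per treatment times the sample variance of the treatment means.

```python
from statistics import variance


def anova_table(groups):
    """Sums of squares, degrees of freedom and mean squares for a one-way ANOVA."""
    all_obs = [y for g in groups for y in g]
    grand = sum(all_obs) / len(all_obs)
    means = [sum(g) / len(g) for g in groups]
    ss_total = sum((y - grand) ** 2 for y in all_obs)
    ss_error = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    ss_treat = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    df_total, df_treat = len(all_obs) - 1, len(groups) - 1
    df_error = len(all_obs) - len(groups)
    return {
        "SS": (ss_total, ss_treat, ss_error),
        "DF": (df_total, df_treat, df_error),
        "MS": (ss_total / df_total, ss_treat / df_treat, ss_error / df_error),
    }


# Hypothetical balanced data: three treatments, three observations each.
groups = [[20.0, 22.0, 21.0], [25.0, 27.0, 26.0], [18.0, 17.0, 19.0]]
table = anova_table(groups)
ss_total, ss_treat, ss_error = table["SS"]
df_total, df_treat, df_error = table["DF"]
print(ss_total, ss_treat + ss_error)  # both about 104
print(df_total, df_treat + df_error)  # 8 8
# Balanced case: MS_treatment is n times the sample variance of the group means.
print(3 * variance([sum(g) / len(g) for g in groups]), table["MS"][1])  # both about 49
```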

The F-test

The F-test is used for comparing the factors of the total deviation. For example, in one-way, or single-factor ANOVA, statistical significance is tested by comparing the F test statistic

$$F={\frac {\text{variance between treatments}}{\text{variance within treatments}}}={\frac {MS_{\text{Treatments}}}{MS_{\text{Error}}}}={\frac {SS_{\text{Treatments}}/(I-1)}{SS_{\text{Error}}/(n_{T}-I)}}$$

where MS is mean square, $I$ is the number of treatments and $n_{T}$ is the total number of cases,

to the F-distribution with $I-1$ and $n_{T}-I$ degrees of freedom. The F-distribution is a natural candidate because the test statistic is the ratio of two scaled sums of squares, each of which follows a scaled chi-squared distribution.
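The following is a minimal one-way sketch of this statistic using hypothetical unbalanced data. It returns F together with its numerator and denominator degrees of freedom, $I-1$ and $n_T-I$.

```python
def one_way_f(groups):
    """Return (F, I - 1, n_T - I) for a one-way layout given as a list of groups."""
    i = len(groups)                    # number of treatments, I
    n_t = sum(len(g) for g in groups)  # total number of cases, n_T
    means = [sum(g) / len(g) for g in groups]
    grand = sum(sum(g) for g in groups) / n_t
    ms_treat = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means)) / (i - 1)
    ms_error = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g) / (n_t - i)
    return ms_treat / ms_error, i - 1, n_t - i


# Hypothetical unbalanced data: group sizes 4, 3 and 5.
f, df_num, df_den = one_way_f([[6.1, 5.8, 6.4, 6.0], [7.2, 7.5, 6.9], [5.5, 5.9, 5.2, 5.7, 5.4]])
print(f, df_num, df_den)  # df_num = 2, df_den = 9
```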

The ratio of the expected mean squares, $E[MS_{\text{Treatments}}]/E[MS_{\text{Error}}]$, is $1+{n\sigma _{\text{Treatment}}^{2}}/{\sigma _{\text{Error}}^{2}}$ (where $n$ is the treatment sample size), which is 1 when there is no treatment effect. As values of F increase above 1, the evidence is increasingly inconsistent with the null hypothesis. Two apparent experimental methods of increasing F are increasing the sample size and reducing the error variance through tight experimental controls.
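The following is a worked instance of the expression $1+n\sigma^{2}_{\text{Treatment}}/\sigma^{2}_{\text{Error}}$. It uses hypothetical values: $n=5$, $\sigma^{2}_{\text{Treatment}}=2$ and $\sigma^{2}_{\text{Error}}=4$. It also shows the no-effect case:

```latex
1+\frac{n\,\sigma^{2}_{\text{Treatment}}}{\sigma^{2}_{\text{Error}}}
  = 1+\frac{5\times 2}{4} = 3.5,
\qquad
\sigma^{2}_{\text{Treatment}} = 0 \;\Rightarrow\; 1+\frac{5\times 0}{4} = 1 .
```

Doubling $n$ to 10 raises the value to $1+10\times 2/4=6$. This shows directly why larger samples tend to increase F when a treatment effect exists.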

There are two methods of concluding the ANOVA hypothesis test, both of which produce the same result:

The textbook method is to compare the observed value of F with the critical value of F determined from tables. The critical value of F is a function of the degrees of freedom of the numerator and the denominator and the significance level (α). If F ≥ F_Critical, the null hypothesis is rejected.
The computer method calculates the probability (p-value) of a value of F greater than or equal to the observed value. The null hypothesis is rejected if this probability is less than or equal to the significance level (α).
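Both methods can be sketched without a statistics library. The code below is an illustrative implementation, not a reference one. It evaluates the F-distribution tail probability through the regularized incomplete beta function, computed with the standard continued-fraction expansion, and uses that for the p-value. It finds the critical value by bisection in place of a printed table. The observed F of 4.7 with 2 and 12 degrees of freedom is hypothetical. In practice a statistics package would supply these values.

```python
import math


def _beta_continued_fraction(a, b, x, max_iter=300, eps=1e-15):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        # Each iteration applies the even (2m) and then the odd (2m + 1) coefficient.
        even = m * (b - m) * x / ((a + 2 * m - 1.0) * (a + 2 * m))
        odd = -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1.0))
        for coef in (even, odd):
            d = 1.0 + coef * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + coef / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def f_p_value(f, df1, df2):
    """Computer method: P(F >= f) for an F(df1, df2) variable."""
    if f <= 0.0:
        return 1.0
    return reg_inc_beta(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))


def f_critical(alpha, df1, df2):
    """Textbook method: upper-alpha critical value, found here by bisection."""
    lo, hi = 0.0, 1.0
    while f_p_value(hi, df1, df2) > alpha:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if f_p_value(mid, df1, df2) > alpha:
            lo = mid
        else:
            hi = mid
    return hi


alpha, df1, df2 = 0.05, 2, 12
f_obs = 4.7  # hypothetical observed F
f_crit = f_critical(alpha, df1, df2)
p = f_p_value(f_obs, df1, df2)
print(f_crit, p)
print(f_obs >= f_crit, p <= alpha)  # both methods reach the same decision
```

The critical value is defined as the point where the tail probability equals α. The two decision rules therefore agree for every observed F, apart from floating-point ties exactly at the boundary.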

The ANOVA F-test is known to be nearly optimal in the sense of minimizing false negative errors for a fixed rate of false positive errors (i.e. maximizing power for a fixed significance level). For example, to test the hypothesis that various medical treatments have exactly the same effect, the F-test's p-values closely approximate the permutation test's p-values. The approximation is particularly close when the design is balanced.^[30]^[41] Such permutation tests characterize tests with maximum power against all alternative hypotheses, as observed by Rosenbaum.^{[nb 3]} The ANOVA F-test (of the null hypothesis that all treatments have exactly the same effect) is recommended as a practical test, because of its robustness against many alternative distributions.^[42]^{[nb 4]}
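A permutation (randomization) test of the null hypothesis that all treatments have the same effect can be sketched as follows. The code repeatedly reshuffles the pooled observations among the groups and recomputes F each time. It draws a fixed number of random shuffles, which is a Monte Carlo approximation rather than full enumeration, and it uses hypothetical data. Its p-value can be compared with the F-distribution p-value reported by a statistics package.

```python
import random


def f_statistic(groups):
    """One-way ANOVA F = MS_treatment / MS_error."""
    n_total = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand = sum(sum(g) for g in groups) / n_total
    ss_treat = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_error = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    return (ss_treat / (len(groups) - 1)) / (ss_error / (n_total - len(groups)))


def permutation_p_value(groups, n_resamples=2000, seed=0):
    """Monte Carlo permutation p-value for 'all treatments have the same effect'."""
    rng = random.Random(seed)
    sizes = [len(g) for g in groups]
    pooled = [y for g in groups for y in g]
    f_obs = f_statistic(groups)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # random re-assignment of responses to treatments
        regrouped, start = [], 0
        for size in sizes:
            regrouped.append(pooled[start:start + size])
            start += size
        if f_statistic(regrouped) >= f_obs:
            hits += 1
    # Count the observed assignment too, so the estimate is never exactly zero.
    return (hits + 1) / (n_resamples + 1)


# Hypothetical data for three treatments.
print(permutation_p_value([[5.2, 4.8, 5.5, 5.0], [6.1, 6.4, 5.9, 6.6], [5.0, 5.3, 4.9, 5.4]]))
```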

Extended logic

ANOVA consists of separable parts; partitioning sources of variance and hypothesis testing can be used individually. ANOVA is used to support other statistical tools. Regression is first used to fit more complex models to data, then ANOVA is used to compare models with the objective of selecting simple(r) models that adequately describe the data. "Such models could be fit without any reference to ANOVA, but ANOVA tools could then be used to make some sense of the fitted models, and to test hypotheses about batches of coefficients."^[43] "[W]e think of the analysis of variance as a way of understanding and structuring multilevel models—not as an alternative to regression but as a tool for summarizing complex high-dimensional inferences ..."^[43]
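The model-comparison use of ANOVA can be sketched with the standard F statistic for nested linear models, $F = [(RSS_r - RSS_f)/(df_r - df_f)] / (RSS_f/df_f)$. Here RSS is a residual sum of squares and df is the corresponding residual degrees of freedom. Taking the reduced model to be a single common mean and the full model to be one mean per group reproduces the one-way ANOVA F. The data are hypothetical.

```python
def nested_f(rss_reduced, df_reduced, rss_full, df_full):
    """F statistic comparing a reduced model nested inside a fuller one.
    rss_* are residual sums of squares, df_* their residual degrees of freedom."""
    numerator = (rss_reduced - rss_full) / (df_reduced - df_full)
    denominator = rss_full / df_full
    return numerator / denominator


# One-way ANOVA viewed as a model comparison (hypothetical data).
groups = [[20.0, 22.0, 21.0], [25.0, 27.0, 26.0], [18.0, 17.0, 19.0]]
obs = [y for g in groups for y in g]
grand = sum(obs) / len(obs)
# Reduced model: one common mean; its RSS is the total sum of squares.
rss_common = sum((y - grand) ** 2 for y in obs)
# Full model: one mean per group; its RSS is the error sum of squares.
rss_groups = sum((y - sum(g) / len(g)) ** 2 for g in groups for y in g)
f = nested_f(rss_common, len(obs) - 1, rss_groups, len(obs) - len(groups))
print(f)  # about 49, the same value as the one-way ANOVA F for this data
```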

ANOVA for a single factor

The simplest experiment suitable for ANOVA analysis is the completely randomized experiment with a single factor. More complex experiments with a single factor involve constraints on randomization and include completely randomized blocks and Latin squares (and variants: Graeco-Latin squares, etc.). The more complex experiments share many of the complexities of multiple factors. A relatively complete discussion of the analysis (models, data summaries, ANOVA table) of the completely randomized experiment is available.

ANOVA for multiple factors

ANOVA generalizes to the study of the effects of multiple factors. When the experiment includes observations at all combinations of levels of each factor, it is termed factorial. Factorial experiments are more efficient than a series of single factor experiments and the efficiency grows as the number of factors increases.^[44] Consequently, factorial designs are heavily used.

The use of ANOVA to study the effects of multiple factors has a complication. In a 3-way ANOVA with factors x, y and z, the ANOVA model includes terms for the main effects (x, y, z) and terms for interactions (xy, xz, yz, xyz). All terms require hypothesis tests. The proliferation of interaction terms increases the risk that some hypothesis test will produce a false positive by chance. Fortunately, experience says that high order interactions are rare.^[45]^{[verification needed]} The ability to detect interactions is a major advantage of multiple factor ANOVA. Testing one factor at a time hides interactions, but produces apparently inconsistent experimental results.^[44]
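The number of terms grows quickly: a full factorial model in $k$ factors has $2^k - 1$ main-effect and interaction terms. The sketch below lists these terms. It then estimates the chance of at least one false positive when each term is tested at $\alpha = 0.05$. This estimate rests on the simplifying assumption that the tests are independent. In practice, F-tests that share one error mean square are not exactly independent.

```python
from itertools import combinations


def model_terms(factors):
    """Main-effect and interaction terms of a full factorial model."""
    return ["".join(combo)
            for k in range(1, len(factors) + 1)
            for combo in combinations(factors, k)]


terms = model_terms(["x", "y", "z"])
print(terms)  # ['x', 'y', 'z', 'xy', 'xz', 'yz', 'xyz']

alpha = 0.05
# Probability of at least one false positive across all terms, assuming
# (for simplicity) independent tests, each run at level alpha.
familywise = 1 - (1 - alpha) ** len(terms)
print(round(familywise, 3))  # 0.302
```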

Caution is advised when encountering interactions: test interaction terms first and expand the analysis beyond ANOVA if interactions are found. Texts vary in their recommendations regarding the continuation of the ANOVA procedure after encountering an interaction. Interactions complicate the interpretation of experimental data. Neither the calculations of significance nor the estimated treatment effects can be taken at face value. "A significant interaction will often mask the significance of main effects."^[46] Graphical methods are recommended to enhance understanding. Regression is often useful. A lengthy discussion of interactions is available in Cox (1958).^[47] Some interactions can be removed (by transformations) while others cannot.

A variety of techniques are used with multiple factor ANOVA to reduce expense. One technique used in factorial designs is to minimize replication (possibly no replication with support of analytical trickery) and to combine groups when effects are found to be statistically (or practically) insignificant. An experiment with many insignificant factors may collapse into one with a few factors supported by many replications.^[48]

Worked numeric examples

Several fully worked numerical examples are available. A simple case uses one-way (a single factor) analysis. A more complex case uses two-way (two-factor) analysis.

Associated analysis

Some analysis is required in support of the design of the experiment while other analysis is performed after changes in the factors are formally found to produce statistically significant changes in the responses. Because experimentation is iterative, the results of one experiment alter plans for following experiments.

Preparatory analysis

The number of experimental units

In the design of an experiment, the number of experimental units is planned to satisfy the goals of the experiment. Experimentation is often sequential.

Early experiments are often designed to provide mean-unbiased estimates of treatment effects and of experimental error. Later experiments are often designed to test a hypothesis that a treatment effect has an important magnitude; in this case, the number of experimental units is chosen so that the experiment is within budget and has adequate power, among other goals.

Reporting sample size analysis is generally required in psychology. "Provide information on sample size and the process that led to sample size decisions."^[49] The analysis, which is written in the experimental protocol before the experiment is conducted, is examined in grant applications and administrative review boards.

Besides power analysis, there are less formal methods for selecting the number of experimental units. These include graphical methods based on limiting the probability of false negative errors, graphical methods based on an expected variation increase (above the residuals) and methods based on achieving a desired confidence interval.^[50]

Power analysis

Power analysis is often applied in the context of ANOVA in order to assess the probability of successfully rejecting the null hypothesis if we assume a certain ANOVA design, effect size in the population, sample size and significance level. Power analysis can assist in study design by determining what sample size would be required in order to have a reasonable chance of rejecting the null hypothesis when the alternative hypothesis is true.^[51]^[52]^[53]^[54]
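A simulation-based power analysis can be sketched as follows. The assumptions are all hypothetical: three groups with normally distributed responses, a common standard deviation, equal group sizes and $\alpha = 0.05$. Restricting the sketch to three groups (numerator df = 2) means the critical value can be written in closed form, $F_{\text{crit}} = (d_2/2)(\alpha^{-2/d_2} - 1)$. With more groups, an F quantile function from a statistics package would be needed.

```python
import random


def f_statistic(groups):
    """One-way ANOVA F = MS_treatment / MS_error."""
    n_total = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand = sum(sum(g) for g in groups) / n_total
    ss_treat = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_error = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    return (ss_treat / (len(groups) - 1)) / (ss_error / (n_total - len(groups)))


def simulated_power(group_means, sigma, n_per_group, alpha=0.05, reps=2000, seed=1):
    """Monte Carlo power of the one-way F-test for exactly three normal groups."""
    if len(group_means) != 3:
        raise ValueError("this sketch handles exactly three groups")
    d2 = 3 * n_per_group - 3
    f_crit = (d2 / 2) * (alpha ** (-2 / d2) - 1)  # exact for numerator df = 2
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        groups = [[rng.gauss(mu, sigma) for _ in range(n_per_group)] for mu in group_means]
        if f_statistic(groups) >= f_crit:
            rejections += 1
    return rejections / reps


# Hypothetical design: means 0, 0.5 and 1.0, sigma = 1, 20 units per group.
print(simulated_power([0.0, 0.5, 1.0], 1.0, 20))
```

One way to get a rough sample size is to increase `n_per_group` until the estimated power reaches a target such as 0.8.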

Effect size

Several standardized measures of effect have been proposed for ANOVA to summarize the strength of the association between a predictor(s) and the dependent variable (e.g., η², ω², or ƒ²) or the overall standardized difference (Ψ) of the complete model. Standardized effect-size estimates facilitate comparison of findings across studies and disciplines. However, while standardized effect sizes are commonly used in much of the professional literature, a non-standardized measure of effect size that has immediately "meaningful" units may be preferable for reporting purposes.^[55]
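The following sketch computes three of these measures for a one-way, between-subjects design. It uses common textbook formulas: $\eta^2 = SS_{\text{Treatments}}/SS_{\text{Total}}$, $\omega^2 = (SS_{\text{Treatments}} - (I-1)MS_{\text{Error}})/(SS_{\text{Total}} + MS_{\text{Error}})$ and Cohen's $f^2 = \eta^2/(1-\eta^2)$. The data are hypothetical. Other designs, such as repeated measures or multiple factors, use different formulas.

```python
def effect_sizes(groups):
    """eta^2, omega^2 and Cohen's f^2 for a one-way, between-subjects design."""
    obs = [y for g in groups for y in g]
    grand = sum(obs) / len(obs)
    means = [sum(g) / len(g) for g in groups]
    ss_total = sum((y - grand) ** 2 for y in obs)
    ss_treat = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    df_treat = len(groups) - 1
    ms_error = (ss_total - ss_treat) / (len(obs) - len(groups))
    eta_sq = ss_treat / ss_total
    omega_sq = (ss_treat - df_treat * ms_error) / (ss_total + ms_error)
    f_sq = eta_sq / (1 - eta_sq)
    return eta_sq, omega_sq, f_sq


# Hypothetical balanced data for three treatments.
eta_sq, omega_sq, f_sq = effect_sizes([[20.0, 22.0, 21.0], [25.0, 27.0, 26.0], [18.0, 17.0, 19.0]])
print(eta_sq, omega_sq, f_sq)  # omega^2 is smaller than eta^2 (it corrects for bias)
```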

Follow-up analysis

It is always appropriate to carefully consider outliers. They have a disproportionate impact on statistical conclusions and are often the result of errors.

Model confirmation

It is prudent to verify that the assumptions of ANOVA have been met. Residuals are examined or analyzed to confirm homoscedasticity and gross normality.^[56] Residuals should have the appearance of (zero mean normal distribution) noise when plotted as a function of anything including time and modeled data values. Trends hint at interactions among factors or among observations. One rule of thumb: "If the largest standard deviation is less than twice the smallest standard deviation, we can use methods based on the assumption of equal standard deviations and our results will still be approximately correct."^[57]
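The quoted rule of thumb is easy to apply directly. This sketch compares group sample standard deviations for two hypothetical data sets. It is a rough screen, not a formal test of equal variances.

```python
from statistics import stdev


def equal_sd_rule_of_thumb(groups):
    """True when the largest group standard deviation is less than twice the smallest."""
    sds = [stdev(g) for g in groups]
    return max(sds) < 2 * min(sds)


# Hypothetical data: similar spreads, then the same data with one much wider group.
similar = [[5.1, 4.8, 5.5, 5.0], [6.2, 6.6, 5.9, 6.4], [4.0, 4.5, 3.8, 4.3]]
one_wide = [[5.1, 4.8, 5.5, 5.0], [6.2, 9.6, 3.9, 7.4], [4.0, 4.5, 3.8, 4.3]]
print(equal_sd_rule_of_thumb(similar))   # True
print(equal_sd_rule_of_thumb(one_wide))  # False
```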

Follow-up tests

A statistically significant effect in ANOVA is often followed up with one or more different follow-up tests. This can be done in order to assess which groups are different from which other groups or to test various other focused hypotheses. Follow-up tests are often distinguished in terms of whether they are planned (a priori) or post hoc. Planned tests are determined before looking at the data and post hoc tests are performed after looking at the data.

Often one of the "treatments" is none, so the treatment group can act as a control. Dunnett's test (a modification of the t-test) tests whether each of the other treatment groups has the same mean as the control.^[58]

Post hoc tests such as Tukey's range test most commonly compare every group mean with every other group mean and typically incorporate some method of controlling for Type I errors. Comparisons, which are most commonly planned, can be either simple or compound. Simple comparisons compare one group mean with one other group mean. Compound comparisons typically compare two sets of group means where one set has two or more groups (e.g., compare the average of the means of groups A, B and C with the mean of group D). Comparisons can also look at tests of trend, such as linear and quadratic relationships, when the independent variable involves ordered levels.
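Planned comparisons of this kind are usually expressed as contrasts. A contrast is a weighted sum $\sum_j c_j \bar{y}_j$ with $\sum_j c_j = 0$, and its standard error is $\sqrt{MS_{\text{Error}}\sum_j c_j^2/n_j}$. The sketch below uses hypothetical summary statistics for four ordered groups A to D. It computes three contrasts:

- a simple comparison of A against B;
- the compound comparison of A, B and C against D;
- a linear trend, with weights -3, -1, 1, 3 for four equally spaced levels (a quadratic trend would use 1, -1, -1, 1).

Dividing an estimate by its standard error gives a t statistic on the error degrees of freedom. No multiple-comparison adjustment is applied.

```python
import math


def contrast(means, sizes, weights, ms_error):
    """Estimate sum(c_j * mean_j) and its standard error sqrt(MS_error * sum(c_j^2 / n_j))."""
    if abs(sum(weights)) > 1e-12:
        raise ValueError("contrast weights must sum to zero")
    estimate = sum(c * m for c, m in zip(weights, means))
    se = math.sqrt(ms_error * sum(c * c / n for c, n in zip(weights, sizes)))
    return estimate, se


# Hypothetical summary statistics for groups A, B, C, D (ordered levels).
means, sizes, ms_error = [10.0, 12.0, 13.0, 17.0], [5, 5, 5, 5], 4.0

simple = contrast(means, sizes, [1, -1, 0, 0], ms_error)              # A vs B
compound = contrast(means, sizes, [1 / 3, 1 / 3, 1 / 3, -1], ms_error)  # mean of A, B, C vs D
linear_trend = contrast(means, sizes, [-3, -1, 1, 3], ms_error)       # linear trend
print(simple, compound, linear_trend)
```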

Following ANOVA with pair-wise multiple-comparison tests has been criticized on several grounds.^[55]^[59] There are many such tests (10 in one table) and recommendations regarding their use are vague or conflicting.^[60]^[61]

Study designs and ANOVAs

There are several types of ANOVA. Many statisticians base ANOVA on the design of the experiment,^[62] especially on the protocol that specifies the random assignment of treatments to subjects; the protocol's description of the assignment mechanism should include a specification of the structure of the treatments and of any blocking. It is also common to apply ANOVA to observational data using an appropriate statistical model.^{[citation needed]}

Some popular designs use the following types of ANOVA:

One-way ANOVA is used to test for differences among two or more independent groups (means), e.g. different levels of urea application in a crop, or different levels of antibiotic action on several bacterial species,^[63] or different levels of effect of some medicine on groups of patients. Typically, however, the one-way ANOVA is used to test for differences among at least three groups, since the two-group case can be covered by a t-test.^[64] When there are only two means to compare, the t-test and the ANOVA F-test are equivalent; the relation between ANOVA and t is given by F = t².
Factorial ANOVA is used when the experimenter wants to study the interaction effects among the treatments.
Repeated measures ANOVA is used when the same subjects are used for each treatment (e.g., in a longitudinal study).
Multivariate analysis of variance (MANOVA) is used when there is more than one response variable.
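The identity $F = t^2$ for the two-group case of one-way ANOVA can be checked numerically. This sketch uses hypothetical data. It computes the pooled-variance two-sample t statistic and the one-way ANOVA F for the same two groups.

```python
import math


def pooled_t(a, b):
    """Two-sample t statistic using the pooled variance."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    ss = sum((y - mean_a) ** 2 for y in a) + sum((y - mean_b) ** 2 for y in b)
    pooled_var = ss / (len(a) + len(b) - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var * (1 / len(a) + 1 / len(b)))


def f_statistic(groups):
    """One-way ANOVA F = MS_treatment / MS_error."""
    n_total = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand = sum(sum(g) for g in groups) / n_total
    ss_treat = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_error = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    return (ss_treat / (len(groups) - 1)) / (ss_error / (n_total - len(groups)))


a = [4.2, 5.1, 3.9, 4.8]       # hypothetical group 1
b = [5.6, 6.3, 5.9, 6.8, 6.1]  # hypothetical group 2
t = pooled_t(a, b)
print(t ** 2, f_statistic([a, b]))  # equal up to floating-point rounding
```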

ANOVA cautions

Balanced experiments (those with an equal sample size for each treatment) are relatively easy to interpret; unbalanced experiments offer more complexity. For single-factor (one-way) ANOVA, the adjustment for unbalanced data is easy, but the unbalanced analysis lacks both robustness and power.^[65] For more complex designs the lack of balance leads to further complications. "The orthogonality property of main effects and interactions present in balanced data does not carry over to the unbalanced case. This means that the usual analysis of variance techniques do not apply. Consequently, the analysis of unbalanced factorials is much more difficult than that for balanced designs."^[66] In the general case, "The analysis of variance can also be applied to unbalanced data, but then the sums of squares, mean squares, and F-ratios will depend on the order in which the sources of variation are considered."^[43] The simplest techniques for handling unbalanced data restore balance by either throwing out data or by synthesizing missing data. More complex techniques use regression.

ANOVA is (in part) a significance test. The American Psychological Association holds the view that simply reporting significance is insufficient and that reporting confidence bounds is preferred.^[55]

While ANOVA is conservative (in maintaining a significance level) against multiple comparisons in one dimension, it is not conservative against comparisons in multiple dimensions.^[67]

Generalizations

ANOVA is considered to be a special case of linear regression^[68]^[69] which in turn is a special case of the general linear model.^[70] All consider the observations to be the sum of a model (fit) and a residual (error) to be minimized.

The Kruskal–Wallis test and the Friedman test are nonparametric tests, which do not rely on an assumption of normality.^[71]^[72]
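The following sketch computes the Kruskal–Wallis statistic, $H = \frac{12}{N(N+1)}\sum_j R_j^2/n_j - 3(N+1)$. Here $R_j$ is the rank sum of group $j$ and $N$ is the total number of observations. Tied values receive average ranks, but the usual tie correction is omitted for brevity. For large samples, $H$ is compared with a chi-squared distribution on $I-1$ degrees of freedom. The data are hypothetical.

```python
def average_ranks(values):
    """Ranks 1..N, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """H statistic without tie correction: 12/(N(N+1)) * sum(R_j^2/n_j) - 3(N+1)."""
    pooled = [y for g in groups for y in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n * (n + 1)) * total - 3 * (n + 1)


print(kruskal_wallis_h([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]))  # about 7.2
```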

Connection to linear regression

Below we make clear the connection between multi-way ANOVA and linear regression. Linearly re-order the data so that the $k^{\text{th}}$ observation is associated with a response $y_{k}$ and factors $Z_{k,b}$, where $b\in \{1,2,\ldots ,B\}$ denotes the different factors and $B$ is the total number of factors. In one-way ANOVA $B=1$ and in two-way ANOVA $B=2$. Furthermore, we assume the $b^{\text{th}}$ factor has $I_{b}$ levels. Now, we can one-hot encode the factors into the $\prod _{b=1}^{B}I_{b}$ dimensional vector $v_{k}$.

The one-hot encoding function $g_{b}:I_{b}\mapsto \{0,1\}^{I_{b}}$ is defined such that the $i^{\text{th}}$ entry of $g_{b}(Z_{k,b})$ is

$$g_{b}(Z_{k,b})_{i}={\begin{cases}1&{\text{if }}i=Z_{k,b}\\0&{\text{otherwise}}\end{cases}}$$
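For a single factor, the encoding can be sketched directly. The sketch makes several assumptions:

- Levels are numbered from 0 rather than 1, following Python indexing.
- The data are hypothetical.
- The design matrix has one indicator column per level and no intercept column. Adding an intercept to a full set of indicators would make $X^{\mathsf T}X$ singular.

Solving the least-squares normal equations then returns the group means as coefficients. This is the sense in which one-way ANOVA is a linear regression.

```python
def one_hot(level, n_levels):
    """Map a level index (0-based here) to its indicator vector of length n_levels."""
    return [1.0 if i == level else 0.0 for i in range(n_levels)]


def least_squares(X, y):
    """Solve the normal equations (X^T X) beta = X^T y by Gauss-Jordan elimination."""
    p = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(row[r] * row[c] for row in X) for c in range(p)]
         + [sum(row[r] * yk for row, yk in zip(X, y))] for r in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))  # partial pivoting
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [u - factor * v for u, v in zip(A[r], A[col])]
    return [A[r][p] / A[r][r] for r in range(p)]


# Hypothetical one-way data: factor level Z_k (0, 1 or 2) and response y_k.
levels = [0, 0, 0, 1, 1, 1, 2, 2, 2]
y = [20.0, 22.0, 21.0, 25.0, 27.0, 26.0, 18.0, 17.0, 19.0]
X = [one_hot(z, 3) for z in levels]
beta = least_squares(X, y)
print(beta)  # the fitted coefficients are the three group means (21, 26, 18)
```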

Wikipedia preview

Source (authority): Wikipedia, the free encyclopedia (Japanese edition), article "ANOVA" (http://ja.wikipedia.org/wiki/ANOVA), 2017/03/24 05:15:22 (JST)


wiki en


[Image: biologist and statistician Ronald Fisher]

Analysis of variance (ANOVA) is a collection of statistical models used to analyze the differences among group means and their associated procedures (such as "variation" among and between groups), developed by statistician and evolutionary biologist Ronald Fisher. In the ANOVA setting, the observed variance in a particular variable is partitioned into components attributable to different sources of variation. In its simplest form, ANOVA provides a statistical test of whether or not the means of several groups are equal, and therefore generalizes the t-test to more than two groups. ANOVAs are useful for comparing (testing) three or more means (groups or variables) for statistical significance. ANOVA is conceptually similar to multiple two-sample t-tests, but it is more conservative (it results in fewer Type I errors) and is therefore suited to a wide range of practical problems.

1 History
2 Motivating example
3 Background and terminology
- 3.1 Design-of-experiments terms
4 Classes of models
- 4.1 Fixed-effects models
- 4.2 Random-effects models
- 4.3 Mixed-effects models
5 Assumptions of ANOVA
- 5.1 Textbook analysis using a normal distribution
- 5.2 Randomization-based analysis
  - 5.2.1 Unit-treatment additivity
  - 5.2.2 Derived linear model
  - 5.2.3 Statistical models for observational data
- 5.3 Summary of assumptions
6 Characteristics of ANOVA
7 Logic of ANOVA
- 7.1 Partitioning of the sum of squares
- 7.2 The F-test
- 7.3 Extended logic
8 ANOVA for a single factor
9 ANOVA for multiple factors
10 Worked numeric examples
11 Associated analysis
- 11.1 Preparatory analysis
  - 11.1.1 The number of experimental units
  - 11.1.2 Power analysis
  - 11.1.3 Effect size
- 11.2 Follow-up analysis
  - 11.2.1 Model confirmation
  - 11.2.2 Follow-up tests
12 Study designs and ANOVAs
13 ANOVA cautions
14 Generalizations
- 14.1 Connection to linear regression
  - 14.1.1 Example
15 See also
16 Footnotes
17 Notes
18 References
19 Further reading
20 External links

History

While the analysis of variance reached fruition in the 20th century, antecedents extend centuries into the past according to Stigler.^[1] These include hypothesis testing, the partitioning of sums of squares, experimental techniques and the additive model. Laplace was performing hypothesis testing in the 1770s.^[2] The development of least-squares methods by Laplace and Gauss circa 1800 provided an improved method of combining observations (over the existing practices then used in astronomy and geodesy). It also initiated much study of the contributions to sums of squares. Laplace soon knew how to estimate a variance from a residual (rather than a total) sum of squares.^[3] By 1827 Laplace was using least squares methods to address ANOVA problems regarding measurements of atmospheric tides.^[4] Before 1800 astronomers had isolated observational errors resulting from reaction times (the "personal equation") and had developed methods of reducing the errors.^[5] The experimental methods used in the study of the personal equation were later accepted by the emerging field of psychology,^[6] which developed strong (full factorial) experimental methods to which randomization and blinding were soon added.^[7] An eloquent non-mathematical explanation of the additive effects model was available in 1885.^[8]

Ronald Fisher introduced the term variance and proposed its formal analysis in a 1918 article The Correlation Between Relatives on the Supposition of Mendelian Inheritance.^[9] His first application of the analysis of variance was published in 1921.^[10] Analysis of variance became widely known after being included in Fisher's 1925 book Statistical Methods for Research Workers.

Randomization models were developed by several researchers. The first was published in Polish by Neyman in 1923.^[11]

One of the attributes of ANOVA which ensured its early popularity was computational elegance. The structure of the additive model allows solution for the additive coefficients by simple algebra rather than by matrix calculations. In the era of mechanical calculators this simplicity was critical. The determination of statistical significance also required access to tables of the F function which were supplied by early statistics texts.

Motivating example

[Figures: three panels titled "No fit", "Fair fit" and "Very good fit"]

The analysis of variance can be used as an exploratory tool to explain observations. A dog show provides an example. A dog show is not a random sampling of the breed: it is typically limited to dogs that are adult, pure-bred, and exemplary. A histogram of dog weights from a show might plausibly be rather complex, like the yellow-orange distribution shown in the illustrations. Suppose we wanted to predict the weight of a dog based on a certain set of characteristics of each dog. Before we could do that, we would need to explain the distribution of weights by dividing the dog population into groups based on those characteristics. A successful grouping will split dogs such that (a) each group has a low variance of dog weights (meaning the group is relatively homogeneous) and (b) the mean of each group is distinct (if two groups have the same mean, then it isn't reasonable to conclude that the groups are, in fact, separate in any meaningful way).

In the illustrations to the right, each group is identified as X₁, X₂, etc. In the first illustration, the dogs are divided according to the product (interaction) of two binary groupings: young vs old, and short-haired vs long-haired (thus, group 1 is young, short-haired dogs, group 2 is young, long-haired dogs, etc.). The distributions of dog weight within each of the groups (shown in blue) have a large variance, and the means are very close across groups. Grouping dogs by these characteristics therefore does not produce an effective way to explain the variation in dog weights: knowing which group a dog is in does not allow us to make any reasonable statements as to what that dog's weight is likely to be. Thus, this grouping fails to fit the distribution we are trying to explain (yellow-orange).

An attempt to explain the weight distribution by grouping dogs as (pet vs working breed) and (less athletic vs more athletic) would probably be somewhat more successful (fair fit). The heaviest show dogs are likely to be big strong working breeds, while breeds kept as pets tend to be smaller and thus lighter. As shown by the second illustration, the distributions have variances that are considerably smaller than in the first case, and the means are more reasonably distinguishable. However, the significant overlap of distributions, for example, means that we cannot reliably say that X₁ and X₂ are truly distinct (i.e., it is perhaps reasonably likely that splitting dogs according to the flip of a coin—by pure chance—might produce distributions that look similar).

An attempt to explain weight by breed is likely to produce a very good fit. All Chihuahuas are light and all St Bernards are heavy. The difference in weights between Setters and Pointers does not justify separate breeds. The analysis of variance provides the formal tools to justify these intuitive judgments. A common use of the method is the analysis of experimental data or the development of models. The method has some advantages over correlation: not all of the data must be numeric, and one result of the method is a judgment of the confidence in an explanatory relationship.

Background and terminology

ANOVA is a particular form of statistical hypothesis testing heavily used in the analysis of experimental data. A test result (calculated from the null hypothesis and the sample) is called statistically significant if it is deemed unlikely to have occurred by chance, assuming the truth of the null hypothesis. A statistically significant result, when a probability (p-value) is less than a threshold (significance level), justifies the rejection of the null hypothesis, but only if the a priori probability of the null hypothesis is not high.

In the typical application of ANOVA, the null hypothesis is that all groups are simply random samples of the same population. For example, when studying the effect of different treatments on similar samples of patients, the null hypothesis would be that all treatments have the same effect (perhaps none). Rejecting the null hypothesis would imply that different treatments result in altered effects.

By construction, hypothesis testing limits the rate of Type I errors (false positives) to a significance level. Experimenters also wish to limit Type II errors (false negatives). The rate of Type II errors depends largely on sample size (the rate will increase for small numbers of samples), significance level (when the standard of proof is high, the chances of overlooking a discovery are also high) and effect size (a smaller effect size is more prone to Type II error).

The terminology of ANOVA is largely from the statistical design of experiments. The experimenter adjusts factors and measures responses in an attempt to determine an effect. Factors are assigned to experimental units by a combination of randomization and blocking to ensure the validity of the results. Blinding keeps the weighing impartial. Responses show a variability that is partially the result of the effect and is partially random error.

ANOVA is the synthesis of several ideas and it is used for multiple purposes. As a consequence, it is difficult to define concisely or precisely.

"Classical ANOVA for balanced data does three things at once:

As exploratory data analysis, an ANOVA is an organization of an additive data decomposition, and its sums of squares indicate the variance of each component of the decomposition (or, equivalently, each set of terms of a linear model).
Comparisons of mean squares, along with an F-test ... allow testing of a nested sequence of models.
Closely related to the ANOVA is a linear model fit with coefficient estimates and standard errors."^[12]

In short, ANOVA is a statistical tool used in several ways to develop and confirm an explanation for the observed data.

Additionally:

It is computationally elegant and relatively robust against violations of its assumptions.
ANOVA provides industrial strength (multiple sample comparison) statistical analysis.
It has been adapted to the analysis of a variety of experimental designs.

As a result: ANOVA "has long enjoyed the status of being the most used (some would say abused) statistical technique in psychological research."^[13] ANOVA "is probably the most useful technique in the field of statistical inference."^[14]

ANOVA is difficult to teach, particularly for complex experiments, with split-plot designs being notorious.^[15] In some cases the proper application of the method is best determined by problem pattern recognition followed by the consultation of a classic authoritative text.^[16]

Design-of-experiments terms

(Condensed from the NIST Engineering Statistics handbook: Section 5.7. A Glossary of DOE Terminology.)^[17]

Balanced design: An experimental design where all cells (i.e. treatment combinations) have the same number of observations.
Blocking: A schedule for conducting treatment combinations in an experimental study such that any effects on the experimental results due to a known change in raw materials, operators, machines, etc., become concentrated in the levels of the blocking variable. The reason for blocking is to isolate a systematic effect and prevent it from obscuring the main effects. Blocking is achieved by restricting randomization.
Design: A set of experimental runs which allows the fit of a particular model and the estimate of effects.
DOE: Design of experiments. An approach to problem solving involving collection of data that will support valid, defensible, and supportable conclusions.^[18]
Effect: How changing the settings of a factor changes the response. The effect of a single factor is also called a main effect.
Error: Unexplained variation in a collection of observations. DOEs typically require understanding of both random error and lack of fit error.
Experimental unit: The entity to which a specific treatment combination is applied.
Factors: Process inputs an investigator manipulates to cause a change in the output.
Lack-of-fit error: Error that occurs when the analysis omits one or more important terms or factors from the process model. Including replication in a DOE allows separation of experimental error into its components: lack of fit and random (pure) error.
Model: Mathematical relationship which relates changes in a given response to changes in one or more factors.
Random error: Error that occurs due to natural variation in the process. Random error is typically assumed to be normally distributed with zero mean and a constant variance. Random error is also called experimental error.
Randomization: A schedule for allocating treatment material and for conducting treatment combinations in a DOE such that the conditions in one run neither depend on the conditions of the previous run nor predict the conditions in the subsequent runs.^{[nb 1]}
Replication: Performing the same treatment combination more than once. Including replication allows an estimate of the random error independent of any lack of fit error.
Responses: The output(s) of a process. Sometimes called dependent variable(s).
Treatment: A treatment is a specific combination of factor levels whose effect is to be compared with other treatments.

Classes of models

There are three classes of models used in the analysis of variance, and these are outlined here.

Fixed-effects models

The fixed-effects model (class I) of analysis of variance applies to situations in which the experimenter applies one or more treatments to the subjects of the experiment to see whether the response variable values change. This allows the experimenter to estimate the ranges of response variable values that the treatment would generate in the population as a whole.

Random-effects models

Random effects model (class II) is used when the treatments are not fixed. This occurs when the various factor levels are sampled from a larger population. Because the levels themselves are random variables, some assumptions and the method of contrasting the treatments (a multi-variable generalization of simple differences) differ from the fixed-effects model.^[19]

Mixed-effects models

A mixed-effects model (class III) contains experimental factors of both fixed and random-effects types, with appropriately different interpretations and analysis for the two types.

Example: Teaching experiments could be performed by a college or university department to find a good introductory textbook, with each text considered a treatment. The fixed-effects model would compare a list of candidate texts. The random-effects model would determine whether important differences exist among a list of randomly selected texts. The mixed-effects model would compare the (fixed) incumbent texts to randomly selected alternatives.

Defining fixed and random effects has proven elusive, with competing definitions arguably leading toward a linguistic quagmire.^[20]

Assumptions of ANOVA

The analysis of variance has been studied from several approaches, the most common of which uses a linear model that relates the response to the treatments and blocks. Note that the model is linear in parameters but may be nonlinear across factor levels. Interpretation is easy when data is balanced across factors but much deeper understanding is needed for unbalanced data.

Textbook analysis using a normal distribution

The analysis of variance can be presented in terms of a linear model, which makes the following assumptions about the probability distribution of the responses:^[21]^[22]^[23]^[24]

Independence of observations – this is an assumption of the model that simplifies the statistical analysis.
Normality – the distributions of the residuals are normal.
Equality (or "homogeneity") of variances, called homoscedasticity — the variance of data in groups should be the same.

The separate assumptions of the textbook model imply that the errors are independently, identically, and normally distributed for fixed effects models, that is, that the errors ( $\varepsilon$ ) are independent and

$$\varepsilon \sim N(0,\sigma^{2}).$$

Randomization-based analysis

In a randomized controlled experiment, the treatments are randomly assigned to experimental units, following the experimental protocol. This randomization is objective and declared before the experiment is carried out. The objective random-assignment is used to test the significance of the null hypothesis, following the ideas of C. S. Peirce and Ronald Fisher. This design-based analysis was discussed and developed by Francis J. Anscombe at Rothamsted Experimental Station and by Oscar Kempthorne at Iowa State University.^[25] Kempthorne and his students make an assumption of unit treatment additivity, which is discussed in the books of Kempthorne and David R. Cox.^{[citation needed]}

Unit-treatment additivity

In its simplest form, the assumption of unit-treatment additivity^{[nb 2]} states that the observed response $y_{i,j}$ from experimental unit $i$ when receiving treatment $j$ can be written as the sum of the unit's response $y_{i}$ and the treatment-effect $t_{j}$ , that is ^[26]^[27]^[28]

$$y_{i,j}=y_{i}+t_{j}.$$

The assumption of unit-treatment additivity implies that, for every treatment $j$, the $j$th treatment has exactly the same effect $t_{j}$ on every experimental unit.

The assumption of unit treatment additivity usually cannot be directly falsified, according to Cox and Kempthorne. However, many consequences of treatment-unit additivity can be falsified. For a randomized experiment, the assumption of unit-treatment additivity implies that the variance is constant for all treatments. Therefore, by contraposition, a necessary condition for unit-treatment additivity is that the variance is constant.

The use of unit treatment additivity and randomization is similar to the design-based inference that is standard in finite-population survey sampling.

Derived linear model

Kempthorne uses the randomization-distribution and the assumption of unit treatment additivity to produce a derived linear model, very similar to the textbook model discussed previously.^[29] The test statistics of this derived linear model are closely approximated by the test statistics of an appropriate normal linear model, according to approximation theorems and simulation studies.^[30] However, there are differences. For example, the randomization-based analysis results in a small but (strictly) negative correlation between the observations.^[31]^[32] In the randomization-based analysis, there is no assumption of a normal distribution and certainly no assumption of independence. On the contrary, the observations are dependent!

The randomization-based analysis has the disadvantage that its exposition involves tedious algebra and extensive time. Since the randomization-based analysis is complicated and is closely approximated by the approach using a normal linear model, most teachers emphasize the normal linear model approach. Few statisticians object to model-based analysis of balanced randomized experiments.

Statistical models for observational data

However, when applied to data from non-randomized experiments or observational studies, model-based analysis lacks the warrant of randomization.^[33] For observational data, the derivation of confidence intervals must use subjective models, as emphasized by Ronald Fisher and his followers. In practice, the estimates of treatment effects from observational studies are often inconsistent. "Statistical models" and observational data are useful for suggesting hypotheses that should be treated very cautiously by the public.^[34]

Summary of assumptions

The normal-model based ANOVA analysis assumes the independence, normality and homogeneity of the variances of the residuals. The randomization-based analysis assumes only the homogeneity of the variances of the residuals (as a consequence of unit-treatment additivity) and uses the randomization procedure of the experiment. Both these analyses require homoscedasticity, as an assumption for the normal-model analysis and as a consequence of randomization and additivity for the randomization-based analysis.

However, studies of processes that change variances rather than means (called dispersion effects) have been successfully conducted using ANOVA.^[35] There are no necessary assumptions for ANOVA in its full generality, but the F-test used for ANOVA hypothesis testing has assumptions and practical limitations which are of continuing interest.

Problems which do not satisfy the assumptions of ANOVA can often be transformed to satisfy the assumptions. The property of unit-treatment additivity is not invariant under a "change of scale", so statisticians often use transformations to achieve unit-treatment additivity. If the response variable is expected to follow a parametric family of probability distributions, then the statistician may specify (in the protocol for the experiment or observational study) that the responses be transformed to stabilize the variance.^[36] Also, a statistician may specify that logarithmic transforms be applied to the responses, which are believed to follow a multiplicative model.^[27]^[37] According to Cauchy's functional equation theorem, the logarithm is the only continuous transformation that transforms real multiplication to addition.^{[citation needed]}

Characteristics of ANOVA

ANOVA is used in the analysis of comparative experiments, those in which only the difference in outcomes is of interest. The statistical significance of the experiment is determined by a ratio of two variances. This ratio is independent of several possible alterations to the experimental observations: adding a constant to all observations does not alter significance, and multiplying all observations by a constant does not alter significance. So the statistical significance of an ANOVA result is independent of constant bias and scaling errors as well as of the units used in expressing observations. In the era of mechanical calculation it was common to subtract a constant from all observations (when equivalent to dropping leading digits) to simplify data entry.^[38]^[39] This is an example of data coding.
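The invariance of the F ratio under data coding can be checked numerically. The sketch below assumes a one-way layout and hypothetical measurements; the helper `one_way_f` is illustrative, not taken from any particular package.

```python
from statistics import fmean

def one_way_f(groups):
    """One-way ANOVA F statistic: MS_Treatments / MS_Error."""
    all_obs = [y for g in groups for y in g]
    grand = fmean(all_obs)
    ss_treat = sum(len(g) * (fmean(g) - grand) ** 2 for g in groups)
    ss_error = sum((y - fmean(g)) ** 2 for g in groups for y in g)
    df_treat = len(groups) - 1
    df_error = len(all_obs) - len(groups)
    return (ss_treat / df_treat) / (ss_error / df_error)

# Hypothetical measurements for three treatments.
groups = [[10.1, 10.4, 9.8], [11.0, 11.3, 10.9], [10.2, 10.6, 10.5]]
f_raw = one_way_f(groups)

# Code the data: subtract a constant (drop leading digits), then change units.
coded = [[(y - 10.0) * 100 for y in g] for g in groups]
f_coded = one_way_f(coded)
print(f_raw, f_coded)  # equal up to floating-point rounding
```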

Logic of ANOVA

The calculations of ANOVA can be characterized as computing a number of means and variances, dividing two variances and comparing the ratio to a handbook value to determine statistical significance. Calculating a treatment effect is then trivial, "the effect of any treatment is estimated by taking the difference between the mean of the observations which receive the treatment and the general mean."^[40]
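The quoted rule for a treatment effect amounts to one subtraction per treatment. A minimal sketch, assuming a balanced one-way design and hypothetical data; with balanced data the estimated effects sum to zero.

```python
from statistics import fmean

def treatment_effects(groups):
    """Estimate each treatment effect as its mean minus the general (grand) mean."""
    grand = fmean([y for g in groups.values() for y in g])
    return {name: fmean(obs) - grand for name, obs in groups.items()}

# Hypothetical balanced data: three treatments, three observations each.
groups = {"A": [4.0, 5.0, 6.0], "B": [7.0, 8.0, 9.0], "C": [1.0, 2.0, 3.0]}
effects = treatment_effects(groups)
print(effects)  # {'A': 0.0, 'B': 3.0, 'C': -3.0}
```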

Partitioning of the sum of squares

ANOVA uses traditional standardized terminology. The definitional equation of sample variance is $s^{2}=\textstyle {\frac {1}{n-1}}\sum (y_{i}-{\bar {y}})^{2}$ , where the divisor is called the degrees of freedom (DF), the summation is called the sum of squares (SS), the result is called the mean square (MS) and the squared terms are deviations from the sample mean. ANOVA estimates 3 sample variances: a total variance based on all the observation deviations from the grand mean, an error variance based on all the observation deviations from their appropriate treatment means and a treatment variance. The treatment variance is based on the deviations of treatment means from the grand mean, the result being multiplied by the number of observations in each treatment to account for the difference between the variance of observations and the variance of means.

The fundamental technique is a partitioning of the total sum of squares SS into components related to the effects used in the model. For example, for a simplified ANOVA with one type of treatment at different levels, the partition is:

$$SS_{\text{Total}}=SS_{\text{Error}}+SS_{\text{Treatments}}$$

The number of degrees of freedom DF can be partitioned in a similar way: one of these components (that for error) specifies a chi-squared distribution which describes the associated sum of squares, while the same is true for "treatments" if there is no treatment effect.

$$DF_{\text{Total}}=DF_{\text{Error}}+DF_{\text{Treatments}}$$
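Both partitions can be verified on a small data set. This sketch assumes a one-way layout with hypothetical, deliberately unbalanced data; the treatment sum of squares weights each squared deviation of a treatment mean by its group size, as described above, and each mean square is SS divided by DF.

```python
from statistics import fmean

def partition(groups):
    """Return (SS, DF) for Total, Error and Treatments in a one-way layout."""
    all_obs = [y for g in groups for y in g]
    grand = fmean(all_obs)
    ss_total = sum((y - grand) ** 2 for y in all_obs)
    ss_error = sum((y - fmean(g)) ** 2 for g in groups for y in g)
    # Squared deviations of treatment means from the grand mean, weighted by group size.
    ss_treat = sum(len(g) * (fmean(g) - grand) ** 2 for g in groups)
    n_total, n_groups = len(all_obs), len(groups)
    return {
        "Total": (ss_total, n_total - 1),
        "Error": (ss_error, n_total - n_groups),
        "Treatments": (ss_treat, n_groups - 1),
    }

# Hypothetical unbalanced data.
groups = [[4.0, 5.0, 6.0], [7.0, 8.0, 9.0, 8.0], [1.0, 2.0]]
parts = partition(groups)
for source, (ss, df) in parts.items():
    print(f"{source:<11} SS={ss:8.3f} DF={df:2d} MS={ss / df:7.3f}")
```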

The F-test

The F-test is used for comparing the factors of the total deviation. For example, in one-way, or single-factor ANOVA, statistical significance is tested for by comparing the F test statistic

$$F=\frac{\text{variance between treatments}}{\text{variance within treatments}}$$

$$F=\frac{MS_{\text{Treatments}}}{MS_{\text{Error}}}=\frac{SS_{\text{Treatments}}/(I-1)}{SS_{\text{Error}}/(n_{T}-I)}$$

where MS is mean square, $I$ = number of treatments and $n_{T}$ = total number of cases

to the F-distribution with $I-1$ , $n_{T}-I$ degrees of freedom. Using the F-distribution is a natural candidate because the test statistic is the ratio of two scaled sums of squares each of which follows a scaled chi-squared distribution.
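The statistic and its two degrees of freedom follow directly from the formula. A minimal sketch, assuming a one-way design and hypothetical data; `f_statistic` is an illustrative name.

```python
from statistics import fmean

def f_statistic(groups):
    """F = MS_Treatments / MS_Error with (I - 1, n_T - I) degrees of freedom."""
    i = len(groups)                    # I: number of treatments
    n_t = sum(len(g) for g in groups)  # n_T: total number of cases
    grand = fmean([y for g in groups for y in g])
    ss_treat = sum(len(g) * (fmean(g) - grand) ** 2 for g in groups)
    ss_error = sum((y - fmean(g)) ** 2 for g in groups for y in g)
    f = (ss_treat / (i - 1)) / (ss_error / (n_t - i))
    return f, (i - 1, n_t - i)

# Hypothetical data: SS_Treatments = 54 and SS_Error = 6.
groups = [[4.0, 5.0, 6.0], [7.0, 8.0, 9.0], [1.0, 2.0, 3.0]]
f, dfs = f_statistic(groups)
print(f, dfs)  # 27.0 (2, 6)
```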

The expected value of F is $1+{n\sigma _{\text{Treatment}}^{2}}/{\sigma _{\text{Error}}^{2}}$ (where n is the treatment sample size) which is 1 for no treatment effect. As values of F increase above 1, the evidence is increasingly inconsistent with the null hypothesis. Two apparent experimental methods of increasing F are increasing the sample size and reducing the error variance by tight experimental controls.

There are two methods of concluding the ANOVA hypothesis test, both of which produce the same result:

The textbook method is to compare the observed value of F with the critical value of F determined from tables. The critical value of F is a function of the degrees of freedom of the numerator and the denominator and the significance level (α). If F ≥ F_Critical, the null hypothesis is rejected.
The computer method calculates the probability (p-value) of a value of F greater than or equal to the observed value. The null hypothesis is rejected if this probability is less than or equal to the significance level (α).
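Both decision rules can be sketched without statistical tables in the special case of a numerator with 2 degrees of freedom (three treatments). There the F(2, ν) survival function has the closed form $P(F \ge x) = (1 + 2x/\nu)^{-\nu/2}$. This is an assumption-limited illustration: general numerator degrees of freedom need the regularized incomplete beta function, for example through a library routine such as SciPy's `scipy.stats.f.sf`. The observed F = 27.0 on (2, 6) degrees of freedom is a hypothetical input.

```python
def f_sf_df1_2(x, df2):
    """P(F >= x) for an F(2, df2) variate; this closed form holds only when df1 == 2."""
    return (1.0 + 2.0 * x / df2) ** (-df2 / 2.0)

def f_crit_df1_2(alpha, df2):
    """Critical value with P(F >= F_crit) = alpha; again only valid when df1 == 2."""
    return df2 / 2.0 * (alpha ** (-2.0 / df2) - 1.0)

# Assumed observed statistic and design: F = 27.0 on (2, 6) degrees of freedom.
f_obs, df2, alpha = 27.0, 6, 0.05

# Textbook method: compare F with the critical value (about 5.14 here).
reject_by_table = f_obs >= f_crit_df1_2(alpha, df2)
# Computer method: compare the p-value with alpha.
p_value = f_sf_df1_2(f_obs, df2)
reject_by_p = p_value <= alpha
print(round(f_crit_df1_2(alpha, df2), 3), p_value, reject_by_table, reject_by_p)
```

Because the critical value and the p-value come from the same distribution, the two rules always reach the same conclusion.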

The ANOVA F-test is known to be nearly optimal in the sense of minimizing false negative errors for a fixed rate of false positive errors (i.e. maximizing power for a fixed significance level). For example, to test the hypothesis that various medical treatments have exactly the same effect, the F-test's p-values closely approximate the permutation test's p-values: The approximation is particularly close when the design is balanced.^[30]^[41] Such permutation tests characterize tests with maximum power against all alternative hypotheses, as observed by Rosenbaum.^{[nb 3]} The ANOVA F–test (of the null-hypothesis that all treatments have exactly the same effect) is recommended as a practical test, because of its robustness against many alternative distributions.^[42]^{[nb 4]}
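For comparison with the F-test, a permutation p-value can be approximated by repeatedly reshuffling the treatment labels and recomputing F. The sketch below is a Monte Carlo approximation rather than a full enumeration of all assignments; the function names, the `n_perm` default and the fixed `seed` are illustrative assumptions.

```python
import random
from statistics import fmean

def f_statistic(groups):
    """One-way ANOVA F = MS_Treatments / MS_Error."""
    grand = fmean([y for g in groups for y in g])
    ss_treat = sum(len(g) * (fmean(g) - grand) ** 2 for g in groups)
    ss_error = sum((y - fmean(g)) ** 2 for g in groups for y in g)
    n_t, i = sum(map(len, groups)), len(groups)
    return (ss_treat / (i - 1)) / (ss_error / (n_t - i))

def permutation_p_value(groups, n_perm=5000, seed=0):
    """Monte Carlo permutation p-value for the one-way ANOVA F statistic."""
    rng = random.Random(seed)
    sizes = [len(g) for g in groups]
    pooled = [y for g in groups for y in g]
    observed = f_statistic(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # re-randomize treatment labels, as under the null hypothesis
        regrouped, start = [], 0
        for size in sizes:
            regrouped.append(pooled[start:start + size])
            start += size
        if f_statistic(regrouped) >= observed:
            hits += 1
    # Count the observed labelling itself so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)

# Hypothetical, clearly separated groups.
p = permutation_p_value([[4.0, 5.0, 6.0], [7.0, 8.0, 9.0], [1.0, 2.0, 3.0]])
print(p)
```

With samples this small the permutation and F-test p-values need not agree closely; the approximation discussed above concerns realistic designs.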

Extended logic

ANOVA consists of separable parts; partitioning sources of variance and hypothesis testing can be used individually. ANOVA is used to support other statistical tools. Regression is first used to fit more complex models to data, then ANOVA is used to compare models with the objective of selecting simple(r) models that adequately describe the data. "Such models could be fit without any reference to ANOVA, but ANOVA tools could then be used to make some sense of the fitted models, and to test hypotheses about batches of coefficients."^[43] "[W]e think of the analysis of variance as a way of understanding and structuring multilevel models—not as an alternative to regression but as a tool for summarizing complex high-dimensional inferences ..."^[43]

ANOVA for a single factor

The simplest experiment suitable for ANOVA analysis is the completely randomized experiment with a single factor. More complex experiments with a single factor involve constraints on randomization and include completely randomized blocks and Latin squares (and variants: Graeco-Latin squares, etc.). The more complex experiments share many of the complexities of multiple factors. A relatively complete discussion of the analysis (models, data summaries, ANOVA table) of the completely randomized experiment is available.

ANOVA for multiple factors

ANOVA generalizes to the study of the effects of multiple factors. When the experiment includes observations at all combinations of levels of each factor, it is termed factorial. Factorial experiments are more efficient than a series of single factor experiments and the efficiency grows as the number of factors increases.^[44] Consequently, factorial designs are heavily used.

The use of ANOVA to study the effects of multiple factors has a complication. In a 3-way ANOVA with factors x, y and z, the ANOVA model includes terms for the main effects (x, y, z) and terms for interactions (xy, xz, yz, xyz). All terms require hypothesis tests. The proliferation of interaction terms increases the risk that some hypothesis test will produce a false positive by chance. Fortunately, experience says that high order interactions are rare.^[45]^{[verification needed]} The ability to detect interactions is a major advantage of multiple factor ANOVA. Testing one factor at a time hides interactions, but produces apparently inconsistent experimental results.^[44]
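The growth in the number of terms, and a rough sense of the resulting chance of at least one false positive, can be sketched as follows. The error-rate formula assumes independent tests with every null hypothesis true; that is a simplification, since the F-tests within one ANOVA share the same MS_Error and are therefore not independent.

```python
from itertools import combinations

def anova_terms(factors):
    """All main-effect and interaction terms of a full factorial ANOVA model."""
    return ["".join(c) for r in range(1, len(factors) + 1)
            for c in combinations(factors, r)]

def chance_of_any_false_positive(n_tests, alpha=0.05):
    """P(at least one false positive), assuming independent tests with all nulls true."""
    return 1 - (1 - alpha) ** n_tests

terms = anova_terms(["x", "y", "z"])
print(terms)  # ['x', 'y', 'z', 'xy', 'xz', 'yz', 'xyz']
print(round(chance_of_any_false_positive(len(terms)), 3))  # about 0.30
```

A k-factor design has $2^k - 1$ such terms, so four factors already give 15 tests.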

Caution is advised when encountering interactions; test interaction terms first and expand the analysis beyond ANOVA if interactions are found. Texts vary in their recommendations regarding the continuation of the ANOVA procedure after encountering an interaction. Interactions complicate the interpretation of experimental data. Neither the calculations of significance nor the estimated treatment effects can be taken at face value. "A significant interaction will often mask the significance of main effects."^[46] Graphical methods are recommended to enhance understanding. Regression is often useful. A lengthy discussion of interactions is available in Cox (1958).^[47] Some interactions can be removed (by transformations) while others cannot.

A variety of techniques are used with multiple factor ANOVA to reduce expense. One technique used in factorial designs is to minimize replication (possibly no replication with support of analytical trickery) and to combine groups when effects are found to be statistically (or practically) insignificant. An experiment with many insignificant factors may collapse into one with a few factors supported by many replications.^[48]

Worked numeric examples

Several fully worked numerical examples are available. A simple case uses one-way (a single factor) analysis. A more complex case uses two-way (two-factor) analysis.

Associated analysis

Some analysis is required in support of the design of the experiment while other analysis is performed after changes in the factors are formally found to produce statistically significant changes in the responses. Because experimentation is iterative, the results of one experiment alter plans for following experiments.

Preparatory analysis

The number of experimental units

In the design of an experiment, the number of experimental units is planned to satisfy the goals of the experiment. Experimentation is often sequential.

Early experiments are often designed to provide mean-unbiased estimates of treatment effects and of experimental error. Later experiments are often designed to test a hypothesis that a treatment effect has an important magnitude; in this case, the number of experimental units is chosen so that the experiment is within budget and has adequate power, among other goals.

Reporting sample size analysis is generally required in psychology. "Provide information on sample size and the process that led to sample size decisions."^[49] The analysis, which is written in the experimental protocol before the experiment is conducted, is examined in grant applications and administrative review boards.

Besides the power analysis, there are less formal methods for selecting the number of experimental units. These include graphical methods based on limiting the probability of false negative errors, graphical methods based on an expected variation increase (above the residuals) and methods based on achieving a desired confidence interval.^[50]

Power analysis

Power analysis is often applied in the context of ANOVA in order to assess the probability of successfully rejecting the null hypothesis if we assume a certain ANOVA design, effect size in the population, sample size and significance level. Power analysis can assist in study design by determining what sample size would be required in order to have a reasonable chance of rejecting the null hypothesis when the alternative hypothesis is true.^[51]^[52]^[53]^[54]

Effect size

Several standardized measures of effect have been proposed for ANOVA to summarize the strength of the association between a predictor(s) and the dependent variable (e.g., η², ω², or ƒ²) or the overall standardized difference (Ψ) of the complete model. Standardized effect-size estimates facilitate comparison of findings across studies and disciplines. However, while standardized effect sizes are commonly used in much of the professional literature, a non-standardized measure of effect size that has immediately "meaningful" units may be preferable for reporting purposes.^[55]
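For a one-way fixed-effects design, common textbook formulas are $\eta^2 = SS_{\text{Treatments}}/SS_{\text{Total}}$, $\omega^2 = (SS_{\text{Treatments}} - DF_{\text{Treatments}}\,MS_{\text{Error}})/(SS_{\text{Total}} + MS_{\text{Error}})$ and Cohen's $f^2 = \eta^2/(1-\eta^2)$. The sketch below applies them to hypothetical data; other designs (for example repeated measures) use different variants, so treat it as an illustration of the one-way case only.

```python
from statistics import fmean

def effect_sizes(groups):
    """eta^2, omega^2 and Cohen's f^2 for a one-way fixed-effects ANOVA."""
    grand = fmean([y for g in groups for y in g])
    ss_treat = sum(len(g) * (fmean(g) - grand) ** 2 for g in groups)
    ss_error = sum((y - fmean(g)) ** 2 for g in groups for y in g)
    ss_total = ss_treat + ss_error
    df_treat = len(groups) - 1
    ms_error = ss_error / (sum(map(len, groups)) - len(groups))
    eta_sq = ss_treat / ss_total
    omega_sq = (ss_treat - df_treat * ms_error) / (ss_total + ms_error)
    return {"eta^2": eta_sq, "omega^2": omega_sq, "f^2": eta_sq / (1 - eta_sq)}

# Hypothetical data: SS_Treatments = 54, SS_Error = 6, MS_Error = 1.
sizes = effect_sizes([[4.0, 5.0, 6.0], [7.0, 8.0, 9.0], [1.0, 2.0, 3.0]])
print(sizes)
```

An unstandardized alternative is simply a difference of group means reported in the original units of measurement.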

Follow-up analysis

It is always appropriate to carefully consider outliers. They have a disproportionate impact on statistical conclusions and are often the result of errors.

Model confirmation

It is prudent to verify that the assumptions of ANOVA have been met. Residuals are examined or analyzed to confirm homoscedasticity and gross normality.^[56] Residuals should have the appearance of (zero mean normal distribution) noise when plotted as a function of anything including time and modeled data values. Trends hint at interactions among factors or among observations. One rule of thumb: "If the largest standard deviation is less than twice the smallest standard deviation, we can use methods based on the assumption of equal standard deviations and our results will still be approximately correct."^[57]
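The quoted rule of thumb and the residuals used in these checks take only a few lines. A minimal sketch, assuming a one-way model in which the residual of an observation is its deviation from its group mean; the data are hypothetical.

```python
from statistics import stdev

def equal_sd_rule_of_thumb(groups):
    """True if the largest group SD is less than twice the smallest (sample SDs, n - 1 divisor)."""
    sds = [stdev(g) for g in groups]
    return max(sds) < 2 * min(sds)

def residuals(groups):
    """One-way model residuals: each observation minus its group mean."""
    return [[y - sum(g) / len(g) for y in g] for g in groups]

print(equal_sd_rule_of_thumb([[4.0, 5.0, 6.0], [7.0, 8.5, 10.0]]))  # True: SDs 1.0 and 1.5
print(equal_sd_rule_of_thumb([[4.0, 5.0, 6.0], [1.0, 5.0, 9.0]]))   # False: SDs 1.0 and 4.0
```

The residuals would then be plotted against time, fitted values and so on, looking for trends rather than patternless noise.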

Follow-up tests

A statistically significant effect in ANOVA is often followed up with one or more different follow-up tests. This can be done in order to assess which groups are different from which other groups or to test various other focused hypotheses. Follow-up tests are often distinguished in terms of whether they are planned (a priori) or post hoc. Planned tests are determined before looking at the data and post hoc tests are performed after looking at the data.

Often one of the "treatments" is none, so the treatment group can act as a control. Dunnett's test (a modification of the t-test) tests whether each of the other treatment groups has the same mean as the control.^[58]

Post hoc tests such as Tukey's range test most commonly compare every group mean with every other group mean and typically incorporate some method of controlling for Type I errors. Comparisons, which are most commonly planned, can be either simple or compound. Simple comparisons compare one group mean with one other group mean. Compound comparisons typically compare two sets of group means where one set has two or more groups (e.g., compare the average of the group means of groups A, B and C with the mean of group D). Comparisons can also look at tests of trend, such as linear and quadratic relationships, when the independent variable involves ordered levels.
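A compound comparison such as "A, B and C versus D" can be written as a contrast $\sum_j c_j \bar{y}_j$ whose coefficients sum to zero. Its standard error under the usual one-way model is $\sqrt{MS_{\text{Error}} \sum_j c_j^2 / n_j}$, and the ratio of estimate to standard error is referred to a t distribution with $n_T - I$ degrees of freedom. A minimal sketch with hypothetical data and an illustrative `contrast` helper:

```python
from math import sqrt
from statistics import fmean

def contrast(groups, coeffs):
    """Estimate sum(c_j * mean_j) and its standard error using the pooled MS_Error."""
    if abs(sum(coeffs.values())) > 1e-9:
        raise ValueError("contrast coefficients must sum to zero")
    n_t = sum(len(g) for g in groups.values())
    ss_error = sum((y - fmean(g)) ** 2 for g in groups.values() for y in g)
    ms_error = ss_error / (n_t - len(groups))
    estimate = sum(c * fmean(groups[name]) for name, c in coeffs.items())
    se = sqrt(ms_error * sum(c ** 2 / len(groups[name]) for name, c in coeffs.items()))
    return estimate, se

groups = {"A": [4.0, 5.0, 6.0], "B": [7.0, 8.0, 9.0],
          "C": [1.0, 2.0, 3.0], "D": [2.0, 3.0, 4.0]}
# Compound comparison: average of the means of A, B and C versus the mean of D.
est, se = contrast(groups, {"A": 1 / 3, "B": 1 / 3, "C": 1 / 3, "D": -1.0})
print(est, se, est / se)  # t ratio on n_T - I = 8 degrees of freedom
```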

Following ANOVA with pair-wise multiple-comparison tests has been criticized on several grounds.^[55]^[59] There are many such tests (10 in one table) and recommendations regarding their use are vague or conflicting.^[60]^[61]

Study designs and ANOVAs

There are several types of ANOVA. Many statisticians base ANOVA on the design of the experiment,^[62] especially on the protocol that specifies the random assignment of treatments to subjects; the protocol's description of the assignment mechanism should include a specification of the structure of the treatments and of any blocking. It is also common to apply ANOVA to observational data using an appropriate statistical model.^{[citation needed]}

Some popular designs use the following types of ANOVA:

One-way ANOVA is used to test for differences among two or more independent groups (means), e.g., different levels of urea application in a crop, or different levels of antibiotic action on several bacterial species,^[63] or different levels of effect of some medicine on groups of patients. Typically, however, the one-way ANOVA is used to test for differences among at least three groups, since the two-group case can be covered by a t-test.^[64] When there are only two means to compare, the t-test and the ANOVA F-test are equivalent; the relation between ANOVA and t is given by F = t².
Factorial ANOVA is used when the experimenter wants to study the interaction effects among the treatments.
Repeated measures ANOVA is used when the same subjects are used for each treatment (e.g., in a longitudinal study).
Multivariate analysis of variance (MANOVA) is used when there is more than one response variable.
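The relation F = t² for two groups can be confirmed numerically. The sketch assumes the pooled-variance (Student) two-sample t-test, which is the version equivalent to one-way ANOVA; the data are hypothetical and the group sizes deliberately unequal.

```python
from math import sqrt
from statistics import fmean

def pooled_t(a, b):
    """Two-sample Student t statistic using the pooled (equal-variance) estimate."""
    ma, mb = fmean(a), fmean(b)
    ss_within = sum((y - ma) ** 2 for y in a) + sum((y - mb) ** 2 for y in b)
    sp2 = ss_within / (len(a) + len(b) - 2)
    return (ma - mb) / sqrt(sp2 * (1 / len(a) + 1 / len(b)))

def one_way_f(groups):
    """One-way ANOVA F = MS_Treatments / MS_Error."""
    all_obs = [y for g in groups for y in g]
    grand = fmean(all_obs)
    ss_treat = sum(len(g) * (fmean(g) - grand) ** 2 for g in groups)
    ss_error = sum((y - fmean(g)) ** 2 for g in groups for y in g)
    return (ss_treat / (len(groups) - 1)) / (ss_error / (len(all_obs) - len(groups)))

# Hypothetical two-group data.
a = [4.0, 5.0, 6.0, 5.5]
b = [7.0, 8.0, 6.5]
t = pooled_t(a, b)
f = one_way_f([a, b])
print(t * t, f)  # the two values agree up to rounding
```

Welch's unequal-variance t-test does not satisfy this identity in general.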

ANOVA cautions

Balanced experiments (those with an equal sample size for each treatment) are relatively easy to interpret; unbalanced experiments offer more complexity. For single factor (one-way) ANOVA, the adjustment for unbalanced data is easy, but the unbalanced analysis lacks both robustness and power.^[65] For more complex designs the lack of balance leads to further complications. "The orthogonality property of main effects and interactions present in balanced data does not carry over to the unbalanced case. This means that the usual analysis of variance techniques do not apply. Consequently, the analysis of unbalanced factorials is much more difficult than that for balanced designs."^[66] In the general case, "The analysis of variance can also be applied to unbalanced data, but then the sums of squares, mean squares, and F-ratios will depend on the order in which the sources of variation are considered."^[43] The simplest techniques for handling unbalanced data restore balance by either throwing out data or by synthesizing missing data. More complex techniques use regression.

ANOVA is (in part) a significance test. The American Psychological Association holds the view that simply reporting significance is insufficient and that reporting confidence bounds is preferred.^[55]

While ANOVA is conservative (in maintaining a significance level) against multiple comparisons in one dimension, it is not conservative against comparisons in multiple dimensions.^[67]

Generalizations

ANOVA is considered to be a special case of linear regression^[68]^[69] which in turn is a special case of the general linear model.^[70] All consider the observations to be the sum of a model (fit) and a residual (error) to be minimized.

The Kruskal–Wallis test and the Friedman test are nonparametric tests, which do not rely on an assumption of normality.^[71]^[72]
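The Kruskal–Wallis test replaces the observations by their ranks in the pooled sample and computes $H = \frac{12}{N(N+1)}\sum_j R_j^2/n_j - 3(N+1)$, where $R_j$ is the rank sum of group $j$. For large samples H is compared with a chi-squared distribution with $I-1$ degrees of freedom. The sketch assigns average ranks to ties but, as a simplifying assumption, omits the usual tie correction.

```python
def kruskal_wallis_h(groups):
    """Kruskal–Wallis H statistic (average ranks for ties, no tie correction)."""
    pooled = sorted(y for g in groups for y in g)
    n = len(pooled)
    # Average rank (1-based) for each distinct value in the pooled sample.
    rank_of = {}
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + j) / 2 + 1
        i = j + 1
    total = sum(sum(rank_of[y] for y in g) ** 2 / len(g) for g in groups)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)

# Hypothetical data: rank sums 15, 24 and 6.
h = kruskal_wallis_h([[4.0, 5.0, 6.0], [7.0, 8.0, 9.0], [1.0, 2.0, 3.0]])
print(h)
```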

Connection to linear regression

Below we make clear the connection between multi-way ANOVA and linear regression. Linearly re-order the data so that the $k^{\text{th}}$ observation is associated with a response $y_{k}$ and factors $Z_{k,b}$, where $b\in \{1,2,\ldots ,B\}$ denotes the different factors and $B$ is the total number of factors. In one-way ANOVA $B=1$ and in two-way ANOVA $B=2$. Furthermore, we assume the $b^{\text{th}}$ factor has $I_{b}$ levels. Now, we can one-hot encode the factors into the $\sum_{b=1}^{B}I_{b}$ dimensional vector $v_{k}$.

The one-hot encoding function $g_{b}:I_{b}\mapsto \{0,1\}^{I_{b}}$ is defined such that the $i^{th}$ entry of $g_{b}(Z_{k,b})$ is

$$g_{b}(Z_{k,b})_{i}=\begin{cases}1 & \text{if } i=Z_{k,b},\\ 0 & \text{otherwise.}\end{cases}$$
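The encoding $g_b$, together with the concatenation of the per-factor codes into $v_k$, can be sketched in a few lines. Levels are assumed to be numbered from 1 to $I_b$, matching the $i^{\text{th}}$-entry convention; the function names are hypothetical. The resulting vector has length $\sum_b I_b$.

```python
def one_hot(level, n_levels):
    """g_b: map a level in 1..n_levels to a 0/1 vector with a single 1."""
    if not 1 <= level <= n_levels:
        raise ValueError("level out of range")
    return [1 if i == level else 0 for i in range(1, n_levels + 1)]

def encode(factor_levels, levels_per_factor):
    """Concatenate the one-hot codes of all B factors into one vector v_k."""
    v = []
    for z, n_levels in zip(factor_levels, levels_per_factor):
        v.extend(one_hot(z, n_levels))
    return v

# Two factors with I_1 = 2 and I_2 = 3 levels; observation k has Z = (2, 1).
print(encode((2, 1), (2, 3)))  # [0, 1, 1, 0, 0]
```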

The vector $v_{k}$ is the concatenation of all of the above vectors for all $b$. Thus, $<math xmlns="http://www.w3.org/1998/Math/MathML" alttext="{\displaystyle v_{k}=[g_{1}(Z_{k,1}),g_{2}(Z_{k,2}),\ldots ,g_{B}(Z_{k,B})]}">$

 <semantics>
   <mrow class="MJX-TeXAtom-ORD">
     <mstyle displaystyle="true" scriptlevel="0">
       <msub>
         <mi>v</mi>
         <mrow class="MJX-TeXAtom-ORD">
           <mi>k</mi>
         </mrow>
       </msub>
       <mo>=</mo>
       <mo stretchy="false">[</mo>
       <msub>
         <mi>g</mi>
         <mrow class="MJX-TeXAtom-ORD">
           <mn>1</mn>
         </mrow>
       </msub>
       <mo stretchy="false">(</mo>
       <msub>
         <mi>Z</mi>
         <mrow class="MJX-TeXAtom-ORD">
           <mi>k</mi>
           <mo>,</mo>
           <mn>1</mn>
         </mrow>
       </msub>
       <mo stretchy="false">)</mo>
       <mo>,</mo>
       <msub>
         <mi>g</mi>
         <mrow class="MJX-TeXAtom-ORD">
           <mn>2</mn>
         </mrow>
       </msub>
       <mo stretchy="false">(</mo>
       <msub>
         <mi>Z</mi>
         <mrow class="MJX-TeXAtom-ORD">
           <mi>k</mi>
           <mo>,</mo>
           <mn>2</mn>
         </mrow>
       </msub>
       <mo stretchy="false">)</mo>
       <mo>,</mo>
       <mo>…</mo>
       <mo>,</mo>
       <msub>
         <mi>g</mi>
         <mrow class="MJX-TeXAtom-ORD">
           <mi>B</mi>
         </mrow>
       </msub>
       <mo stretchy="false">(</mo>
       <msub>
         <mi>Z</mi>
         <mrow class="MJX-TeXAtom-ORD">
           <mi>k</mi>
           <mo>,</mo>
           <mi>B</mi>
         </mrow>
       </msub>
       <mo stretchy="false">)</mo>
       <mo stretchy="false">]</mo>
     </mstyle>
   </mrow>
   <annotation encoding="application/x-tex">{\displaystyle v_{k}=[g_{1}(Z_{k,1}),g_{2}(Z_{k,2}),\ldots ,g_{B}(Z_{k,B})]}</annotation>
 </semantics>

</math>. In order to obtain a fully general $<math xmlns="http://www.w3.org/1998/Math/MathML" alttext="{\displaystyle B}">$

 <semantics>
   <mrow class="MJX-TeXAtom-ORD">
     <mstyle displaystyle="true" scriptlevel="0">
       <mi>B</mi>
     </mstyle>
   </mrow>
   <annotation encoding="application/x-tex">{\displaystyle B}</annotation>
 </semantics>

</math>-way interaction ANOVA we must also concatenate every additional interaction term in the vector $<math xmlns="http://www.w3.org/1998/Math/MathML" alttext="{\displaystyle v_{k}}">$

 <semantics>
   <mrow class="MJX-TeXAtom-ORD">
     <mstyle displaystyle="true" scriptlevel="0">
       <msub>
         <mi>v</mi>
         <mrow class="MJX-TeXAtom-ORD">
           <mi>k</mi>
         </mrow>
       </msub>
     </mstyle>
   </mrow>
   <annotation encoding="application/x-tex">{\displaystyle v_{k}}</annotation>
 </semantics>

</math> and then add an intercept term. Let that vector be $<math xmlns="http://www.w3.org/1998/Math/MathML" alttext="{\displaystyle X_{k}}">$

 <semantics>
   <mrow class="MJX-TeXAtom-ORD">
     <mstyle displaystyle="true" scriptlevel="0">
       <msub>
         <mi>X</mi>
         <mrow class="MJX-TeXAtom-ORD">
           <mi>k</mi>
         </mrow>
       </msub>
     </mstyle>
   </mrow>
   <annotation encoding="application/x-tex">{\displaystyle X_{k}}</annotation>
 </semantics>

</math>.
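A minimal sketch of $g_b$ and of the main-effect vector $v_k$ in plain Python, assuming levels are coded as integers starting at 1 as in the text. The function names are illustrative, and the interaction terms and intercept that complete $X_k$ are deliberately left out.

```python
def one_hot(level, n_levels):
    """g_b: map a level in {1, ..., n_levels} to a 0/1 vector with a single 1."""
    if not 1 <= level <= n_levels:
        raise ValueError(f"level {level} outside 1..{n_levels}")
    return [1 if i == level else 0 for i in range(1, n_levels + 1)]


def main_effect_vector(levels, n_levels):
    """v_k: concatenate the one-hot encodings g_1(Z_k1), ..., g_B(Z_kB).

    levels   -- (Z_k1, ..., Z_kB), the observed level of each factor
    n_levels -- (I_1, ..., I_B), the number of levels of each factor
    The result has sum(n_levels) entries.
    """
    v = []
    for z, n in zip(levels, n_levels):
        v.extend(one_hot(z, n))
    return v


print(main_effect_vector((2, 1), (2, 3)))  # [0, 1, 1, 0, 0]
```

The length of the result is $\sum_b I_b$, one block of indicators per factor.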

With this notation in place, we now have the exact connection with linear regression: we simply regress the response $y_{k}$ against the vector $X_{k}$. However, there is a concern about identifiability. Within each factor the one-hot entries sum to 1, the same value as the intercept term, so the columns of the design matrix are linearly dependent and the parameters are not uniquely determined. To overcome this, we assume that the parameters within each set of effects (each main effect and each set of interactions) sum to zero. From here, one can use F-statistics or other methods to determine the relevance of the individual factors.
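The sketch below illustrates this for the one-way case ($B=1$). It is an illustration under stated assumptions, not a general implementation: it imposes the sum-to-zero constraint by effect coding (the last level is coded as all $-1$), solves the normal equations with a small hand-written Gaussian elimination rather than a statistics library, and forms the usual F-statistic from the regression and residual sums of squares. All function names and data are hypothetical. For the unbalanced toy data used here, the regression F matches the group-mean formula, $(15/2)/(40.5/5)$.

```python
def effect_code(level, n_levels):
    """Sum-to-zero (effect) coding of a level in {1, ..., n_levels}.

    Returns n_levels - 1 entries. The last level is coded as all -1, so its
    implied effect is minus the sum of the others: the level effects sum to zero.
    """
    if level == n_levels:
        return [-1.0] * (n_levels - 1)
    return [1.0 if i == level else 0.0 for i in range(1, n_levels)]


def solve(a, b):
    """Solve the square linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def regression_f(y, z, n_levels):
    """One-way ANOVA F statistic from a least-squares fit of y on [1, effect codes of z]."""
    X = [[1.0] + effect_code(level, n_levels) for level in z]
    p = len(X[0])
    # Normal equations (X^T X) beta = X^T y.
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yk for row, yk in zip(X, y)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in X]
    grand_mean = sum(y) / len(y)
    sse = sum((yk - fk) ** 2 for yk, fk in zip(y, fitted))  # residual SS
    sst = sum((yk - grand_mean) ** 2 for yk in y)           # total SS
    df_model, df_error = n_levels - 1, len(y) - n_levels
    return ((sst - sse) / df_model) / (sse / df_error)


y = [4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 1.0, 2.0]
z = [1, 1, 1, 2, 2, 3, 3, 3]  # unbalanced: group sizes 3, 2, 3
print(regression_f(y, z, 3))  # about 0.926, i.e. (15/2) / (40.5/5)
```

Effect coding is one way to impose the sum-to-zero constraint; dropping one column (reference coding) would give different parameter estimates but the same fitted values, and therefore the same F-statistic.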

Example

Consider a two-way example with interaction, in which the first factor has 2 levels and the second factor has 3 levels.

Define $a_{i}=1$ if $Z_{k,1}=i$ and $b_{i}=1$ if $Z_{k,2}=i$, i.e. $a$ is the one-hot encoding of the first factor and $b$ is the one-hot encoding of the second factor.

With that,

$$X_{k}=[a_{1},a_{2},b_{1},b_{2},b_{3},a_{1}\times b_{1},a_{1}\times b_{2},a_{1}\times b_{3},a_{2}\times b_{1},a_{2}\times b_{2},a_{2}\times b_{3},1]$$

where the last term is an intercept term. For a more concrete example suppose that

$$\begin{aligned}Z_{k,1}&=2\\Z_{k,2}&=1\end{aligned}$$

Then,

$$X_{k}=[0,1,1,0,0,0,0,0,1,0,0,1]$$
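The construction of $X_k$ in this example can be reproduced with a short plain-Python sketch. The function names are hypothetical, and levels are coded from 1 as in the text; the interaction entries are generated with the first factor's index varying slowest, matching the order $a_1\times b_1, a_1\times b_2, \ldots, a_2\times b_3$ used above.

```python
def one_hot(level, n_levels):
    """Return the 0/1 indicator vector of a level coded 1..n_levels."""
    return [1 if i == level else 0 for i in range(1, n_levels + 1)]


def two_way_design_row(z1, z2, n1, n2):
    """X_k = [a_1..a_n1, b_1..b_n2, a_i * b_j (i outer, j inner), intercept 1]."""
    a = one_hot(z1, n1)
    b = one_hot(z2, n2)
    interactions = [ai * bj for ai in a for bj in b]  # a1*b1, a1*b2, ..., a2*b3
    return a + b + interactions + [1]


print(two_way_design_row(2, 1, 2, 3))
# [0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1]
```

Each row has exactly four ones: one for each main effect, one interaction cell, and the intercept, which is the source of the linear dependence that the sum-to-zero constraints resolve.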

Footnotes

^ Randomization is a term used in multiple ways in this material. "Randomization has three roles in applications: as a device for eliminating biases, for example from unobserved explanatory variables and selection effects; as a basis for estimating standard errors; and as a foundation for formally exact significance tests." Cox (2006, page 192) Hinkelmann and Kempthorne use randomization both in experimental design and for statistical analysis.
^ Unit-treatment additivity is simply termed additivity in most texts. Hinkelmann and Kempthorne add adjectives and distinguish between additivity in the strict and broad senses. This allows a detailed consideration of multiple error sources (treatment, state, selection, measurement and sampling) on page 161.
^ Rosenbaum (2002, page 40) cites Section 5.7 (Permutation Tests), Theorem 2.3 (actually Theorem 3, page 184) of Lehmann's Testing Statistical Hypotheses (1959).
^ The F-test for the comparison of variances has a mixed reputation. It is not recommended as a hypothesis test to determine whether two different samples have the same variance. It is recommended for ANOVA where two estimates of the variance of the same sample are compared. While the F-test is not generally robust against departures from normality, it has been found to be robust in the special case of ANOVA. Citations from Moore & McCabe (2003): "Analysis of variance uses F statistics, but these are not the same as the F statistic for comparing two population standard deviations." (page 554) "The F test and other procedures for inference about variances are so lacking in robustness as to be of little use in practice." (page 556) "[The ANOVA F-test] is relatively insensitive to moderate nonnormality and unequal variances, especially when the sample sizes are similar." (page 763) ANOVA assumes homoscedasticity, but it is robust. The statistical test for homoscedasticity (the F-test) is not robust. Moore & McCabe recommend a rule of thumb.

Notes

^ Diez, David M; Barr, Christopher D; Cetinkaya-Rundel, Mine (2017). OpenIntro Statistics (3rd ed.). OpenIntro. Retrieved 11 November 2017.
^ Stigler (1986)
^ Stigler (1986, p 134)
^ Stigler (1986, p 153)
^ Stigler (1986, pp 154–155)
^ Stigler (1986, pp 240–242)
^ Stigler (1986, Chapter 7 – Psychophysics as a Counterpoint)
^ Stigler (1986, p 253)
^ Stigler (1986, pp 314–315)
^ The Correlation Between Relatives on the Supposition of Mendelian Inheritance. Ronald A. Fisher. Transactions of the Royal Society of Edinburgh. 1918. (volume 52, pages 399–433)
^ On the "Probable Error" of a Coefficient of Correlation Deduced from a Small Sample. Ronald A. Fisher. Metron, 1: 3–32 (1921)
^ Scheffé (1959, p 291, "Randomization models were first formulated by Neyman (1923) for the completely randomized design, by Neyman (1935) for randomized blocks, by Welch (1937) and Pitman (1937) for the Latin square under a certain null hypothesis, and by Kempthorne (1952, 1955) and Wilk (1955) for many other designs.")
^ Gelman (2005, p 2)
^ Howell (2002, p 320)
^ Montgomery (2001, p 63)
^ Gelman (2005, p 1)
^ Gelman (2005, p 5)
^ "Section 5.7. A Glossary of DOE Terminology". NIST Engineering Statistics handbook. NIST. Retrieved 5 April 2012.
^ "Section 4.3.1 A Glossary of DOE Terminology". NIST Engineering Statistics handbook. NIST. Retrieved 14 Aug 2012.
^ Montgomery (2001, Chapter 12: Experiments with random factors)
^ Gelman (2005, pp. 20–21)
^ Snedecor, George W.; Cochran, William G. (1967). Statistical Methods (6th ed.). p. 321.
^ Cochran & Cox (1992, p 48)
^ Howell (2002, p 323)
^ Anderson, David R.; Sweeney, Dennis J.; Williams, Thomas A. (1996). Statistics for business and economics (6th ed.). Minneapolis/St. Paul: West Pub. Co. pp. 452–453. ISBN 978-0-314-06378-6.
^ Anscombe (1948)
^ Kempthorne (1979, p 30)
^ ^a ^b Cox (1958, Chapter 2: Some Key Assumptions)
^ Hinkelmann and Kempthorne (2008, Volume 1, Throughout. Introduced in Section 2.3.3: Principles of experimental design; The linear model; Outline of a model)
^ Hinkelmann and Kempthorne (2008, Volume 1, Section 6.3: Completely Randomized Design; Derived Linear Model)
^ ^a ^b Hinkelmann and Kempthorne (2008, Volume 1, Section 6.6: Completely randomized design; Approximating the randomization test)
^ Bailey (2008, Chapter 2.14 "A More General Model" in Bailey, pp. 38–40)
^ Hinkelmann and Kempthorne (2008, Volume 1, Chapter 7: Comparison of Treatments)
^ Kempthorne (1979, pp 125–126, "The experimenter must decide which of the various causes that he feels will produce variations in his results must be controlled experimentally. Those causes that he does not control experimentally, because he is not cognizant of them, he must control by the device of randomization." "[O]nly when the treatments in the experiment are applied by the experimenter using the full randomization procedure is the chain of inductive inference sound. It is only under these circumstances that the experimenter can attribute whatever effects he observes to the treatment and the treatment only. Under these circumstances his conclusions are reliable in the statistical sense.")
^ Freedman [full citation needed]
^ Montgomery (2001, Section 3.8: Discovering dispersion effects)
^ Hinkelmann and Kempthorne (2008, Volume 1, Section 6.10: Completely randomized design; Transformations)
^ Bailey (2008)
^ Montgomery (2001, Section 3-3: Experiments with a single factor: The analysis of variance; Analysis of the fixed effects model)
^ Cochran & Cox (1992, p 2 example)
^ Cochran & Cox (1992, p 49)
^ Hinkelmann and Kempthorne (2008, Volume 1, Section 6.7: Completely randomized design; CRD with unequal numbers of replications)
^ Moore and McCabe (2003, page 763)
^ ^a ^b ^c Gelman (2008)
^ ^a ^b Montgomery (2001, Section 5-2: Introduction to factorial designs; The advantages of factorials)
^ Belle (2008, Section 8.4: High-order interactions occur rarely)
^ Montgomery (2001, Section 5-1: Introduction to factorial designs; Basic definitions and principles)
^ Cox (1958, Chapter 6: Basic ideas about factorial experiments)
^ Montgomery (2001, Section 5-3.7: Introduction to factorial designs; The two-factor factorial design; One observation per cell)
^ Wilkinson (1999, p 596)
^ Montgomery (2001, Section 3-7: Determining sample size)
^ Howell (2002, Chapter 8: Power)
^ Howell (2002, Section 11.12: Power (in ANOVA))
^ Howell (2002, Section 13.7: Power analysis for factorial experiments)
^ Moore and McCabe (2003, pp 778–780)
^ ^a ^b ^c Wilkinson (1999, p 599)
^ Montgomery (2001, Section 3-4: Model adequacy checking)
^ Moore and McCabe (2003, p 755, Qualifications to this rule appear in a footnote.)
^ Montgomery (2001, Section 3-5.8: Experiments with a single factor: The analysis of variance; Practical interpretation of results; Comparing means with a control)
^ Hinkelmann and Kempthorne (2008, Volume 1, Section 7.5: Comparison of Treatments; Multiple Comparison Procedures)
^ Howell (2002, Chapter 12: Multiple comparisons among treatment means)
^ Montgomery (2001, Section 3-5: Practical interpretation of results)
^ Cochran & Cox (1957, p 9, "[T]he general rule [is] that the way in which the experiment is conducted determines not only whether inferences can be made, but also the calculations required to make them.")
^ One-way/single factor ANOVA. Biomedical Statistics Archived 7 November 2014 at the Wayback Machine
^ Student (1908). "The Probable Error of a Mean". Biometrika. 6 (1): 1–25. doi:10.1093/biomet/6.1.1.
^ Montgomery (2001, Section 3-3.4: Unbalanced data)
^ Montgomery (2001, Section 14-2: Unbalanced data in factorial design)
^ Wilkinson (1999, p 600)
^ Gelman (2005, p.1) (with qualification in the later text)
^ Montgomery (2001, Section 3.9: The Regression Approach to the Analysis of Variance)
^ Howell (2002, p 604)
^ Howell (2002, Chapter 18: Resampling and nonparametric approaches to data)
^ Montgomery (2001, Section 3-10: Nonparametric methods in the analysis of variance)

References

Anscombe, F. J. (1948). "The Validity of Comparative Experiments". Journal of the Royal Statistical Society. Series A (General). 111 (3): 181–211. doi:10.2307/2984159. JSTOR 2984159. MR 0030181.
Bailey, R. A. (2008). Design of Comparative Experiments. Cambridge University Press. ISBN 978-0-521-68357-9. Pre-publication chapters are available on-line.
Belle, Gerald van (2008). Statistical rules of thumb (2nd ed.). Hoboken, N.J: Wiley. ISBN 978-0-470-14448-0.
Cochran, William G.; Cox, Gertrude M. (1992). Experimental designs (2nd ed.). New York: Wiley. ISBN 978-0-471-54567-5.
Cohen, Jacob (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Routledge. ISBN 978-0-8058-0283-2.
Cohen, Jacob (1992). "A power primer". Psychological Bulletin. 112 (1): 155–159. doi:10.1037/0033-2909.112.1.155. PMID 19565683.
Cox, David R. (1958). Planning of experiments. Reprinted as ISBN 978-0-471-57429-3.
Cox, D. R. (2006). Principles of statistical inference. Cambridge; New York: Cambridge University Press. ISBN 978-0-521-68567-2.
Freedman, David A. (2005). Statistical Models: Theory and Practice. Cambridge University Press. ISBN 978-0-521-67105-7.
Gelman, Andrew (2005). "Analysis of variance? Why it is more important than ever". The Annals of Statistics. 33: 1–53. arXiv:math/0504499. doi:10.1214/009053604000001048.
Gelman, Andrew (2008). "Variance, analysis of". The new Palgrave dictionary of economics (2nd ed.). Basingstoke, Hampshire; New York: Palgrave Macmillan. ISBN 978-0-333-78676-5.
Hinkelmann, Klaus & Kempthorne, Oscar (2008). Design and Analysis of Experiments. I and II (Second ed.). Wiley. ISBN 978-0-470-38551-7.
Howell, David C. (2002). Statistical methods for psychology (5th ed.). Pacific Grove, CA: Duxbury/Thomson Learning. ISBN 978-0-534-37770-0.
Kempthorne, Oscar (1979). The Design and Analysis of Experiments (Corrected reprint of (1952) Wiley ed.). Robert E. Krieger. ISBN 978-0-88275-105-4.
Lehmann, E.L. (1959). Testing Statistical Hypotheses. John Wiley & Sons.
Montgomery, Douglas C. (2001). Design and Analysis of Experiments (5th ed.). New York: Wiley. ISBN 978-0-471-31649-7.
Moore, David S. & McCabe, George P. (2003). Introduction to the Practice of Statistics (4e). W H Freeman & Co. ISBN 0-7167-9657-0.
Rosenbaum, Paul R. (2002). Observational Studies (2nd ed.). New York: Springer-Verlag. ISBN 978-0-387-98967-9.
Scheffé, Henry (1959). The Analysis of Variance. New York: Wiley.
Stigler, Stephen M. (1986). The history of statistics: the measurement of uncertainty before 1900. Cambridge, Mass: Belknap Press of Harvard University Press. ISBN 978-0-674-40340-6.
Wilkinson, Leland (1999). "Statistical Methods in Psychology Journals: Guidelines and Explanations". American Psychologist. 54 (8): 594–604. CiteSeerX 10.1.1.120.4818. doi:10.1037/0003-066X.54.8.594.

External links

SOCR ANOVA Activity and interactive applet.
Examples of all ANOVA and ANCOVA models with up to three treatment factors, including randomized block, split plot, repeated measures, and Latin squares, and their analysis in R (University of Southampton)
NIST/SEMATECH e-Handbook of Statistical Methods, section 7.4.3: "Are the means equal?"
Analysis of variance: Introduction
One Way & Two Way ANOVA Calculator

</raw>

</toggledisplay>

English Journal

Chemometric evaluation of trace metals in Prunus persica L. Batsch and Malus domestica from Minićevo (Serbia).

Alagić SČ, Tošić SB, Dimitrijević MD, Petrović JV, Medić DV.
Food Chemistry (Food Chem). 2017 Feb 15;217:568-75. doi: 10.1016/j.foodchem.2016.09.006. Epub 2016 Sep 6.
The samples of spatial soils and different organs of Prunus persica L. Batsch and Malus domestica were analyzed by methods such as inductively coupled plasma optical emission spectroscopy (ICP-OES), Hierarchical Cluster Analysis (HCA), One-way ANOVA, and calculation of biological accumulation factor
PMID 27664673

Fluorescence spectroscopy and principal component analysis of soy protein hydrolysate fractions and the potential to assess their antioxidant capacity characteristics.

Ranamukhaarachchi SA, Peiris RH, Moresoli C.
Food Chemistry (Food Chem). 2017 Feb 15;217:469-75. doi: 10.1016/j.foodchem.2016.08.029. Epub 2016 Aug 11.
The potential of intrinsic fluorescence and principal component analysis (PCA) to characterize the antioxidant capacity of soy protein hydrolysates (SPH) during sequential ultrafiltration (UF) and nanofiltration (NF) was evaluated. SPH was obtained by enzymatic hydrolysis of soy protein isolate. Ant
PMID 27664660

Time resolved fluorescence of cow and goat milk powder.

Brandao MP, de Carvalho Dos Anjos V, Bell MJ.
Spectrochimica Acta. Part A, Molecular and Biomolecular Spectroscopy (Spectrochim Acta A Mol Biomol Spectrosc). 2017 Jan 15;171:193-199. doi: 10.1016/j.saa.2016.08.007. Epub 2016 Aug 8.
Milk powder is an international dairy commodity. Goat and cow milk powders are significant sources of nutrients and the investigation of the authenticity and classification of milk powder is particularly important. The use of time-resolved fluorescence techniques to distinguish chemical composition
PMID 27529767

Japanese Journal

A study of organizational structure and care-worker turnover in special nursing homes for the elderly

日本福祉大学健康科学論集 = The Journal of Health Sciences 19, 1-10, 2016-03-30
NAID 120005747620

Effects of tsugi-ashi and ayumi-ashi footwork on judo throwing techniques: o-goshi, uchi-mata and o-soto-gari

秋田大学教育文化学部研究紀要教育科学 71, 51-57, 2016-03-01
NAID 120005741821

Impacts of Extensive Reading on Students at a University

梅光言語文化研究 (7), 19-31, 2016-03
NAID 40020778359

分散分析 (analysis of variance)

English: analysis of variance, ANOVA

See: 分散分析

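[1] Nakano Hiroyuki. “js-STAR 2012”. Retrieved 4 April 2012.

[2] Kiriki Kenshi (2002). “ANOVA4 on the Web”. Retrieved 4 April 2012.

[3] 粒子計測研究室 NMIJ/AIST (Particle Measurement Laboratory) (2011). “不確かさWeb 分散分析プログラム” (Uncertainty Web: analysis of variance program). Retrieved 4 April 2012.

[4] 日本ニューメリカルアルゴリズムズグループ株式会社 (Numerical Algorithms Group Japan) (2012). “Excel NAG 統計解析アドイン” (Excel NAG statistical analysis add-in). Retrieved 4 April 2012.

[5] Chris Rorden. “ezANOVA free statistical software”. Retrieved 4 April 2012.

[6] Bioconductor. “maanova”. Retrieved 4 April 2012.

[7] H.Akiba (27 February 1995). “２因子多水準分散分析” (two-factor multilevel analysis of variance). Retrieved 4 April 2012.

[8] 渡辺利夫 (30 January 2007). “R Language”. Retrieved 4 April 2012.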

[20] Randomization is a term used in multiple ways in this material. "Randomization has three roles in applications: as a device for eliminating biases, for example from unobserved explanatory variables and selection effects; as a basis for estimating standard errors; and as a foundation for formally exact significance tests." Cox (2006, p. 192). Hinkelmann and Kempthorne use randomization both in experimental design and for statistical analysis.

[28] Unit-treatment additivity is simply termed additivity in most texts. Hinkelmann and Kempthorne add adjectives and distinguish between additivity in the strict and broad senses. This allows a detailed consideration of multiple error sources (treatment, state, selection, measurement and sampling) on p. 161.

[45] Rosenbaum (2002, p. 40) cites Section 5.7 (Permutation Tests), Theorem 2.3 (actually Theorem 3, p. 184) of Lehmann's Testing Statistical Hypotheses (1959).

[47] The F-test for the comparison of variances has a mixed reputation. It is not recommended as a hypothesis test of whether two different samples have the same variance. It is recommended for ANOVA, where two estimates of the variance of the same sample are compared. Although the F-test is not generally robust against departures from normality, it has been found to be robust in the special case of ANOVA. Citations from Moore & McCabe (2003): "Analysis of variance uses F statistics, but these are not the same as the F statistic for comparing two population standard deviations." (p. 554) "The F test and other procedures for inference about variances are so lacking in robustness as to be of little use in practice." (p. 556) "[The ANOVA F-test] is relatively insensitive to moderate nonnormality and unequal variances, especially when the sample sizes are similar." (p. 763) ANOVA assumes homoscedasticity but is robust to moderate violations of it, whereas the statistical test for homoscedasticity (the F-test) is not robust. Moore & McCabe instead recommend a rule of thumb: equal standard deviations may be assumed if the largest sample standard deviation is less than twice the smallest.
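Note [47] turns on the point that the ANOVA F statistic compares two estimates of the same error variance, not the variances of two different samples. The minimal sketch below, using only the Python standard library and hypothetical function names (`one_way_anova_f`, `sd_ratio_ok`), computes the one-way F ratio as the between-group mean square over the within-group mean square, and checks the commonly cited "largest standard deviation less than twice the smallest" rule of thumb. The threshold of 2 is exposed as an assumed parameter; this is an illustration, not a substitute for a statistics package.

```python
from statistics import mean, stdev


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    F is the ratio of two estimates of the same error variance:
    the between-group mean square over the within-group mean square.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)

    # Variation of the group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of each observation around its own group mean (error variation).
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    # Mean square = sum of squares divided by its degrees of freedom.
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between / ms_within, df_between, df_within


def sd_ratio_ok(groups, limit=2.0):
    """Rule of thumb (assumed threshold): largest group SD < limit * smallest SD."""
    sds = [stdev(g) for g in groups]
    return max(sds) < limit * min(sds)


if __name__ == "__main__":
    data = [[4, 5, 6], [6, 7, 8], [9, 10, 11]]
    f, df1, df2 = one_way_anova_f(data)
    print(f"F({df1}, {df2}) = {f:.2f}")  # F(2, 6) = 19.00
    print("equal-SD rule of thumb satisfied:", sd_ratio_ok(data))
```

For real analyses, `scipy.stats.f_oneway(*groups)` returns the same F statistic together with a p-value; the hand-written version only makes the two variance estimates visible.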

[1] Diez, David M.; Barr, Christopher D.; Cetinkaya-Rundel, Mine (2017). OpenIntro Statistics (3rd ed.). OpenIntro. Retrieved 11 November 2017.

[2] Stigler (1986)

[3] Stigler (1986, p. 134)

[4] Stigler (1986, p. 153)

[5] Stigler (1986, pp. 154–155)

[6] Stigler (1986, pp. 240–242)

[7] Stigler (1986, Chapter 7: Psychophysics as a Counterpoint)

[8] Stigler (1986, p. 253)

[9] Stigler (1986, pp. 314–315)

[10] Fisher, Ronald A. (1918). "The Correlation Between Relatives on the Supposition of Mendelian Inheritance". Transactions of the Royal Society of Edinburgh. 52: 399–433.

[11] Fisher, Ronald A. (1921). "On the 'Probable Error' of a Coefficient of Correlation Deduced from a Small Sample". Metron. 1: 3–32.

[12] Scheffé (1959, p. 291, "Randomization models were first formulated by Neyman (1923) for the completely randomized design, by Neyman (1935) for randomized blocks, by Welch (1937) and Pitman (1937) for the Latin square under a certain null hypothesis, and by Kempthorne (1952, 1955) and Wilk (1955) for many other designs.")

[13] Gelman (2005, p. 2)

[14] Howell (2002, p. 320)

[15] Montgomery (2001, p. 63)

[16] Gelman (2005, p. 1)

[17] Gelman (2005, p. 5)

[18] "Section 5.7. A Glossary of DOE Terminology". NIST Engineering Statistics handbook. NIST. Retrieved 5 April 2012.

[19] "Section 4.3.1. A Glossary of DOE Terminology". NIST Engineering Statistics handbook. NIST. Retrieved 14 August 2012.

[21] Montgomery (2001, Chapter 12: Experiments with random factors)

[22] Gelman (2005, pp. 20–21)

[23] Snedecor, George W.; Cochran, William G. (1967). Statistical Methods (6th ed.). p. 321.

[24] Cochran & Cox (1992, p. 48)

[25] Howell (2002, p. 323)

[26] Anderson, David R.; Sweeney, Dennis J.; Williams, Thomas A. (1996). Statistics for business and economics (6th ed.). Minneapolis/St. Paul: West Pub. Co. pp. 452–453. ISBN 978-0-314-06378-6.

[27] Anscombe (1948)

[29] Kempthorne (1979, p. 30)

[30] Cox (1958, Chapter 2: Some Key Assumptions)

[31] Hinkelmann and Kempthorne (2008, Volume 1, used throughout; introduced in Section 2.3.3: Principles of experimental design; The linear model; Outline of a model)

[32] Hinkelmann and Kempthorne (2008, Volume 1, Section 6.3: Completely Randomized Design; Derived Linear Model)

[33] Hinkelmann and Kempthorne (2008, Volume 1, Section 6.6: Completely randomized design; Approximating the randomization test)

[34] Bailey (2008, Section 2.14: "A More General Model", pp. 38–40)

[35] Hinkelmann and Kempthorne (2008, Volume 1, Chapter 7: Comparison of Treatments)

[36] Kempthorne (1979, pp. 125–126, "The experimenter must decide which of the various causes that he feels will produce variations in his results must be controlled experimentally. Those causes that he does not control experimentally, because he is not cognizant of them, he must control by the device of randomization." "[O]nly when the treatments in the experiment are applied by the experimenter using the full randomization procedure is the chain of inductive inference sound. It is only under these circumstances that the experimenter can attribute whatever effects he observes to the treatment and the treatment only. Under these circumstances his conclusions are reliable in the statistical sense.")

[37] Freedman (full citation needed)

[38] Montgomery (2001, Section 3-8: Discovering dispersion effects)

[39] Hinkelmann and Kempthorne (2008, Volume 1, Section 6.10: Completely randomized design; Transformations)

[40] Bailey (2008)

[41] Montgomery (2001, Section 3-3: Experiments with a single factor: The analysis of variance; Analysis of the fixed effects model)

[42] Cochran & Cox (1992, p. 2, example)

[43] Cochran & Cox (1992, p. 49)

[44] Hinkelmann and Kempthorne (2008, Volume 1, Section 6.7: Completely randomized design; CRD with unequal numbers of replications)

[46] Moore and McCabe (2003, p. 763)

[48] Gelman (2008)

[49] Montgomery (2001, Section 5-2: Introduction to factorial designs; The advantages of factorials)

[50] van Belle (2008, Section 8.4: High-order interactions occur rarely)

[51] Montgomery (2001, Section 5-1: Introduction to factorial designs; Basic definitions and principles)

[52] Cox (1958, Chapter 6: Basic ideas about factorial experiments)

[53] Montgomery (2001, Section 5-3.7: Introduction to factorial designs; The two-factor factorial design; One observation per cell)

[54] Wilkinson (1999, p. 596)

[55] Montgomery (2001, Section 3-7: Determining sample size)

[56] Howell (2002, Chapter 8: Power)

[57] Howell (2002, Section 11.12: Power (in ANOVA))

[58] Howell (2002, Section 13.7: Power analysis for factorial experiments)

[59] Moore and McCabe (2003, pp. 778–780)

[60] Wilkinson (1999, p. 599)

[61] Montgomery (2001, Section 3-4: Model adequacy checking)

[62] Moore and McCabe (2003, p. 755; qualifications to this rule appear in a footnote)

[63] Montgomery (2001, Section 3-5.8: Experiments with a single factor: The analysis of variance; Practical interpretation of results; Comparing means with a control)

[64] Hinkelmann and Kempthorne (2008, Volume 1, Section 7.5: Comparison of Treatments; Multiple Comparison Procedures)

[65] Howell (2002, Chapter 12: Multiple comparisons among treatment means)

[66] Montgomery (2001, Section 3-5: Practical interpretation of results)

[67] Cochran & Cox (1957, p. 9, "[T]he general rule [is] that the way in which the experiment is conducted determines not only whether inferences can be made, but also the calculations required to make them.")

[68] "One-way/single factor ANOVA". Biomedical Statistics. Archived 7 November 2014 at the Wayback Machine.

[69] Student (1908). "The Probable Error of a Mean". Biometrika. 6: 1–25. doi:10.1093/biomet/6.1.1.

[70] Montgomery (2001, Section 3-3.4: Unbalanced data)

[71] Montgomery (2001, Section 14-2: Unbalanced data in factorial design)

[72] Wilkinson (1999, p. 600)

[73] Gelman (2005, p. 1) (with qualification in the later text)

[74] Montgomery (2001, Section 3-9: The Regression Approach to the Analysis of Variance)

[75] Howell (2002, p. 604)

[76] Howell (2002, Chapter 18: Resampling and nonparametric approaches to data)

[77] Montgomery (2001, Section 3-10: Nonparametric methods in the analysis of variance)

Design of experiments
Scientific method	Scientific experiment Statistical design Control Internal and external validity Experimental unit Blinding Optimal design: Bayesian Random assignment Randomization Restricted randomization Replication versus subsampling Sample size
Treatment and blocking	Treatment Effect size Contrast Interaction Confounding Orthogonality Blocking Covariate Nuisance variable
Models and inference	Linear regression Ordinary least squares Bayesian Random effect Mixed model Hierarchical model: Bayesian Analysis of variance (Anova) Cochran's theorem Manova (multivariate) Ancova (covariance) Compare means Multiple comparison
Designs Completely randomized	Factorial Fractional factorial Plackett-Burman Taguchi Response surface methodology Polynomial and rational modeling Box-Behnken Central composite Block Generalized randomized block design (GRBD) Latin square Graeco-Latin square Orthogonal array Latin hypercube Repeated measures design Crossover study Randomized controlled trial Sequential analysis Sequential probability ratio test
Glossary Category Statistics portal Statistical outline Statistical topics

