## 24 Jun Describe the characteristics of the normal curve and explain why the curve, in sample distributions, never perfectly matches the normal curve. Why is the bell curve used t

Length: 4 to 6 pages not including title page and reference page.

References: Include a minimum of 3 scholarly resources.

Your paper should demonstrate thoughtful consideration of the ideas and concepts presented in the course and provide new thoughts and insights relating directly to this topic. Your response should reflect scholarly writing and current APA standards. Be sure to adhere to Northcentral University's Academic Integrity Policy.

Frank, J., & Klar, B. (2016). Methods to test for equality of two normal distributions. Statistical Methods and Applications, 25(4), 581-599

Li, J. C.-H. (2016). Effect size measures in a two-independent-samples case with nonnormal and nonhomogeneous data. Behavior Research Methods

Introduction to Business Statistics (7th ed.)

NCU School of Business Best Practice Guide for Quantitative Research Design and Methods in Dissertations

Week 4 – Assignment: Apply the Normal Distribution

Instructions

For this week’s assignment, you will present your answers to the following questions in a formal paper format. Please separate each question with a short heading, e.g., normal curve, bell curve, etc.

Begin with a brief introduction in which you explain the importance of normal distribution.

Next, address the following questions in order:

Describe the characteristics of the normal curve and explain why the curve, in sample distributions, never perfectly matches the normal curve.

Why is the bell curve used to represent the normal distribution? Why not a different shape?

Why is the central limit theorem important in statistics?

What does the central limit theorem inform us about the sampling distribution of the sample means?

Imagine that you recently took an exam for certification in your field. The certifying agency has published the results of the exam and 75% of the test takers in your group scored below the average. In a normal distribution, half of the scores would fall above the mean and the other half below. How can what the certifying agency published be true?

Why do researchers use z-scores to determine probabilities? What are the advantages to using z-scores?

Conclude with a brief discussion of how the concept of probability might affect research that you might undertake in your dissertation project. In other words, how would a basic understanding of probability concepts aid you in analyzing and interpreting data?


6/23/22, 3:09 PM BUS-7105 v3: Statistics I (7103872203) – BUS-7105 v3: Statistics I (7103872203)

https://ncuone.ncu.edu/d2l/le/content/258948/printsyllabus/PrintSyllabus 1/7

Week 4

BUS-7105 v3: Statistics I (7103872203)

Normal Distribution and the Central Limit Theorem

Determining whether or not a phenomenon exists, in a statistical sense, is based on the

principles of probability.

Given a set of data gathered during a study we might ask, “What is the probability of X?”

We use this decision rule to analyze the data.

More specifically, say a manager is interested in her workers’ intentions to quit their jobs.

Knowing this is important to planning staffing needs. She might suggest, based on

observation, or perhaps a literature review, that given the climate of the work

environment (e.g., hours, how demanding the jobs are, etc.) 20% of her employees would

quit their jobs if a reasonable alternative surfaced. She could ask her workers,

anonymously, if they are currently searching for another position and compare the results

to see if her 20% prediction is accurate. Armed with those data she might initiate training

to better support her workers.

Traditional parametric statistics tools are based on the assumption that all data are

“normally distributed.” In other words, they fall under a bell curve where there is the

probability that a few instances occur at the high end of the scale and a few at the low

end with the remaining instances occurring around the middle.

For example, the manager above may be interested also in levels of performance at work.

She may define this as the number of performance errors that her workers make on a

monthly basis. Her hypothesis might be that her workers’ performance is normally

distributed. She could observe her workers over a month and plot their error rates. If her

workers' error rates are “normally distributed,” there would be a few who commit more

than the average number of errors and a few who commit almost no errors. The remaining

people would fall around the average or the high point of the curve.

The normal distribution is perhaps the most common distribution used in business

applications. It is easy to recognize a normal distribution because when a histogram is

constructed from normally distributed data, the shape is "bell-shaped." The distribution

occurs when studying characteristics of people and animals, such as height, weight, and

IQ. It also arises when studying measurement errors as well as in many theoretic

situations related to hypothesis testing.

In the real world of research and data analysis, our samples seldom fall within a normal

distribution. However, the rules of statistics will work even in these skewed samples.

Most behavioral phenomena are normally distributed at the population level even if

samples may be somewhat skewed (e.g., too many outliers on one end, or tail, of the

distribution). Given the reality of a normal distribution at the population level, we can

generally trust our findings even in a skewed sample, provided it was randomly

assembled.

The normal distribution also arises in many business applications. For example, if a

histogram is made of the daily percentage changes in stock prices, it is usually close to normal. In a

production process, suppose that measurements are taken of a critical aspect, and then a

histogram is created. If the process is working properly, then the measurements should

be normally distributed. But if excess variation, caused for example by poor machine

adjustments or untrained operators, leads the process to produce defects, this can be

detected in a histogram that departs from the bell-shaped curve.

The following images are depictions of four common distributions found in sample

research. We noted in the introduction to Section 2 that distributions are fairly normal at

the population level, but not so much at the sample level. The images on the left are

histograms from datasets drawn from samples that follow somewhat normal distributions.

The image on the upper right represents something close to bimodal or having two

modes. It is possible that a dataset can have many modes. The histogram on the bottom

right represents a positive or right skew. The distribution is skewed to the right because

of a range of outlying values at the high, or positive, end of the scale. It is also possible to

have a left skew for which outliers at the low, or negative, end of the scale are pulling the

tail of the distribution to the left. An example of a right skew is the salary distribution

of a firm in which the CEO is paid ten times as much as the other workers. An example of

a left skew is a typical grading distribution for a graduate classroom. Most people who

enroll in graduate programs are motivated and want to do well so they, collectively, earn a

lot of A’s and B’s. Typically grades of C and lower are not considered passing so there is

motivation to do well. Only in extreme instances do we see C’s, D’s, and F’s in the

graduate classroom.
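The pull of a few extreme values on the mean can be checked numerically. The sketch below uses a made-up payroll (the head counts and salary figures are illustrative assumptions, not course data) in which one CEO earns ten times the typical salary. It shows the mean landing above the median, so that well over half of the values fall below the average.

```python
import statistics

# Hypothetical payroll: 99 workers plus one CEO paid ten times the typical salary.
workers = [40_000] * 25 + [50_000] * 50 + [60_000] * 24
ceo = [500_000]
salaries = workers + ceo

mean_salary = statistics.fmean(salaries)      # pulled upward by the CEO
median_salary = statistics.median(salaries)   # unaffected by the size of the outlier

# Share of employees earning less than the mean.
share_below_mean = sum(s < mean_salary for s in salaries) / len(salaries)

print(f"mean   = {mean_salary:,.0f}")
print(f"median = {median_salary:,.0f}")
print(f"share below the mean = {share_below_mean:.0%}")
```

In a symmetric distribution the mean and median coincide and half of the values lie on each side; in this right-skewed example they separate, which is one way a large majority of a group can sit below the average.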

Figure 3. Four common distributions found in sample research

One of the most important and best-known facts about a normal distribution is something

called the "Empirical Rule." The rule gives the approximate areas beneath the normal

curve. Given such a curve we assume that the total area under the curve is equal to 1 or

100%. The rule tells us that, given a normal (bell-shaped) distribution with mean m and

standard deviation s:

68.3% of the total area is between m – s and m + s

95.4% of the total area is between m – 2s and m + 2s

99.7% of the total area is between m – 3s and m + 3s

So, in a normal distribution, the standard deviation has a very specific meaning, as noted

above. But in reality, when working with non-normally distributed sample data, the

standard deviation is less readily interpretable and more useful in informing other

hypothesis testing formulas.
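The percentages in the Empirical Rule can be reproduced from the standard normal distribution. The following is a minimal sketch using Python's standard library; the helper name `area_within` and the example values (a score of 130 on a scale with mean 100 and standard deviation 15) are assumptions chosen for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1


def area_within(k: float) -> float:
    """Area under the normal curve within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {area_within(k):.1%}")

# A z-score expresses a raw value as a distance from the mean in standard deviations.
# Hypothetical example: a score of 130 on a scale with mean 100 and SD 15.
mean, sd, score = 100.0, 15.0, 130.0
z = (score - mean) / sd
print(f"z = {z:.2f}, share of scores below = {std_normal.cdf(z):.1%}")
```

Because every normal distribution becomes the standard normal after this rescaling, one table (or one function call) answers probability questions for any mean and standard deviation.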

Figure 4. The normal distribution and the empirical rule.

The Central Limit Theorem tells us that the sampling distribution of the mean of

independent random observations will be approximately normal if the sample size is large enough. In

other words, if we take a population of 100,000 people, for example, and pull every

possible sample of 30 from it, calculate the means of each sample and plot them under a

distribution curve, the result will be a near-normal distribution. There is plenty of

evidence that the math will work correctly. This principle is what gives us confidence that

traditional statistical tools will function meaningfully in sample research.
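The thought experiment above can be approximated by simulation. Drawing every possible sample of 30 is not feasible, so this sketch draws 2,000 random samples instead; the exponential population, the seeds, and the function names are all assumptions made for illustration. It compares the spread of the sample means with the population standard deviation divided by the square root of the sample size, and checks that the means are much less skewed than the population.

```python
import math
import random
import statistics


def skewness(values):
    """Population skewness: mean cubed deviation over the cubed standard deviation."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - mu) / sd) ** 3 for v in values])


def sample_means(population, sample_size, num_samples, seed=0):
    """Means of random samples drawn without replacement from the population."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.sample(population, sample_size))
        for _ in range(num_samples)
    ]


# A deliberately non-normal (right-skewed) population of 100,000 values.
pop_rng = random.Random(42)
population = [pop_rng.expovariate(1.0) for _ in range(100_000)]

means = sample_means(population, sample_size=30, num_samples=2_000)

pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

print(f"population mean {pop_mean:.3f}, mean of sample means {statistics.fmean(means):.3f}")
print(f"expected spread {pop_sd / math.sqrt(30):.3f}, observed spread {statistics.pstdev(means):.3f}")
print(f"population skewness {skewness(population):.2f}, skewness of means {skewness(means):.2f}")
```

Even with a strongly skewed population, the sample means cluster symmetrically around the population mean, which is the behavior the theorem describes.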

The question then becomes, how large a sample is large enough to make statistics work?

The answer to this depends on two circumstances. First, we must ask whether or not the

population is normally distributed. Research findings indicate that personality, and most

attitudinal, measures are near-normal at the population level. This may or may not hold

true for other measurable phenomena. Secondly, there are requirements for how

accurately the sample resembles the population. Random selection or assignment can

help with this, but the reality is that in sample research we often use who and what we

have available. Therefore, the question becomes more salient.

Most statisticians and textbook authors will indicate that a sample size of 30 (n=30) is an

adequate sample size when pulling from a population that is near-normally distributed. If

we know the population to be non-normal, we should then gather as many subjects as

possible to mitigate this.

Books and Resources for this Week

Frank, J., & Klar, B. (2016). Methods to test for equality of two normal distributions. Statistical Methods and Applications, 25(4), 581-599.

Be sure to review this week's resources carefully. You are expected to apply the

information from these resources when you prepare your assignments.

Li, J. C.-H. (2016). Effect size measures in a two-independent-samples case with nonnormal and nonhomogeneous data. Behavior Research Methods…

Introduction to Business Statistics (7th ed.)

NCU School of Business Best Practice Guide for Quantitative Research Design and Methods in Dissertations

Week 4 – Assignment: Apply the Normal Distribution Assignment

Due June 26 at 11:59 PM

Upload your document and click the Submit to Dropbox button.

Stat Methods Appl (2016) 25:581–599 DOI 10.1007/s10260-016-0353-z

ORIGINAL PAPER

Methods to test for equality of two normal distributions

Julian Frank1 · Bernhard Klar1

Accepted: 18 January 2016 / Published online: 29 January 2016 © Springer-Verlag Berlin Heidelberg 2016

Abstract Statistical tests for two independent samples under the assumption of normality are applied routinely by most practitioners of statistics. Likewise, presumably each introductory course in statistics treats some statistical procedures for two independent normal samples. Often, the classical two-sample model with equal variances is introduced, emphasizing that a test for equality of the expected values is a test for equality of both distributions as well, which is the actual goal. In a second step, usually the assumption of equal variances is discarded. The two-sample t test with Welch correction and the F test for equality of variances are introduced. The first test is solely treated as a test for the equality of central location, as well as the second as a test for the equality of scatter. Typically, there is no discussion if and to which extent testing for equality of the underlying normal distributions is possible, which is quite unsatisfactory regarding the motivation and treatment of the situation with equal variances. It is the aim of this article to investigate the problem of testing for equality of two normal distributions, and to do so using knowledge and methods adequate to statistical practitioners as well as to students in an introductory statistics course. The power of the different tests discussed in the article is examined empirically. Finally, we apply the tests to several real data sets to illustrate their performance. In particular, we consider several data sets arising from intelligence tests since there is a large body of research supporting the existence of sex differences in mean scores or in variability in specific cognitive abilities.

Keywords Fisher combination method · Minimum combination method · Likelihood ratio test · Two-sample model

B Bernhard Klar [email protected]

1 Department of Mathematics, Karlsruhe Institute of Technology (KIT), Englerstr. 2, 76199 Karlsruhe, Germany

1 Introduction

Statistical tests for two independent samples under the assumption of normality are applied routinely by most practitioners of statistics. Likewise, statistical inference for two independent normal samples is of great relevance in every introductory statistics course. There, the approach is often quite similar: First, the importance of shift models is stated, motivating the classical two-sample model with equal variances (see, e.g., Bickel and Doksum 2006, page 4). The ultimate aim is to compare both distributions. If normality is assumed, this corresponds to a test for equality of the expected values, i.e. Student’s t test. In a second step, usually the assumption of equal variances is discarded. The two-sample t test with Welch correction is introduced, however, at most times without going into details of Welch’s distribution approximation. The introduction and adjacent discussion on the F test for equality of variances often varies in the level of detail. Welch’s t test is solely treated as a test for the equality of central location, as well as the F test as a test for the equality of scatter. Typically, there is no discussion if and to which extent testing for equality of the underlying normal distributions is possible. Not only is this astonishing looking at the motivation of the classical t test, but also due to (at least) two other reasons: For one thing lectures continue with general procedures for testing nested parametric models, including in particular likelihood-ratio tests. For another, when it comes to dealing with the one-way anova, you rarely fail to see the problem of multiple testing being mentioned, along with suitable corrections including, most of the times, the Bonferroni correction.

In some textbooks testing for equality of variances is merely left as an exercise if not outright skipped. A possible reason for this could be seen in the non-robustness of this particular test against deviations from the normal distribution. Still, as no alternative tests are even alluded to, students get the impression that differences in scatter are more or less irrelevant – variance is a statistical Cinderella. Yet, everybody actually applying statistical procedures knows very well that differences in variance and location are of comparable importance.

Summing up, it can be said that from a practical point of view, given a two-sample model under normality, the aim has to be to judge whether the two samples originate from basically similar distributions or not. However, in many cases the classical and, of course, very comfortable assumption of equal variances has no grounding. In the midst of these considerations, discussion in lectures and textbooks stops without further ado and the students (and maybe some lecturers as well) are left without a clue how to deal with this situation.

It is the aim of the following article to investigate the problem of testing for the equality of two normal distributions, and to do so using knowledge and methods adequate to statistical practitioners as well as to students in an introductory course in mathematical statistics. Mathematically speaking, the following testing problem will be considered: Let $X_1, \ldots, X_m, Y_1, \ldots, Y_n$ be independent normally distributed random variables, where $X_i \sim N(\mu, \sigma^2)$ for all $i = 1, \ldots, m$ and $Y_j \sim N(\nu, \tau^2)$ for all $j = 1, \ldots, n$. In contrast to Student’s t test, we do not make further assumptions about the parameters, so that $(\mu, \nu, \sigma^2, \tau^2) \in \Theta = \mathbb{R}^2 \times (0, \infty)^2$ is arbitrary. It is the objective to test if the two samples stem from identical distributions. The corresponding testing problem is given by the following hypothesis and alternative:

$$H_0 : \vartheta = (\mu, \nu, \sigma^2, \tau^2) \in \Theta_0 = \left\{ \vartheta \in \Theta : \mu = \nu,\ \sigma^2 = \tau^2 \right\} \quad \text{vs.} \quad H_1 : \vartheta \in \Theta \setminus \Theta_0. \tag{1}$$

The first classical approach is to develop a likelihood-ratio test. Doing so is a simple way to obtain an asymptotically valid test. In Sect. 2 the likelihood-ratio test statistic is derived, and different approximations of the distribution of the test statistic under H0 found in the literature are summed up. Among them one can find an asymptotic expansion proposed by Muirhead (1982), as well as a recently developed method to derive the exact distribution by numerical integration (Zhang et al. 2012).

A further approach is to combine different p values as illustrated in Sect. 3. For this procedure the hypothesis $H_0$ is obtained by combining the hypotheses of both t and F test. Performing both tests using the same data $(x_1, \ldots, x_m)$ and $(y_1, \ldots, y_n)$, the resulting p values can be combined yielding a new test statistic and, thus, a test result for (1). Most combination methods require the combined tests to be independent under $H_0$, which holds in the case under consideration. In the specific case of Fisher’s method, the same approach, but applied in a slightly different way, can be found in Perng and Littell (1976).
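As a rough illustration of what combining two p values means, the sketch below applies the standard form of Fisher's method to two independent p values. It is not taken from Sect. 3 of the article, which is not reproduced here; the function name and the example p values are assumptions. For two independent tests, the statistic $-2(\log p_1 + \log p_2)$ follows a chi-square distribution with 4 degrees of freedom under the null hypothesis, whose upper tail has a simple closed form.

```python
import math


def fisher_combine_two(p1: float, p2: float) -> float:
    """Combined p value from Fisher's method for two independent p values.

    Uses the closed-form upper tail of the chi-square distribution
    with 4 degrees of freedom: P(T >= t) = exp(-t/2) * (1 + t/2).
    """
    if not (0.0 < p1 <= 1.0 and 0.0 < p2 <= 1.0):
        raise ValueError("p values must lie in (0, 1]")
    statistic = -2.0 * (math.log(p1) + math.log(p2))
    return math.exp(-statistic / 2.0) * (1.0 + statistic / 2.0)


# Hypothetical p values from a t test (location) and an F test (scatter).
print(f"{fisher_combine_two(0.05, 0.05):.4f}")
print(f"{fisher_combine_two(0.40, 0.01):.4f}")
```

Two individually borderline results can add up to stronger joint evidence against the hypothesis that both the means and the variances agree.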

In Sect. 4, power of the different tests is compared empirically. The ability of each method to correctly detect the alternative differs with respect to whether there is a difference in expectation, variance, or both. Loughin (2004) compares the method of combining the p values without regard to a specific testing problem. However, it is instructive to apply these methods directly to the problem at hand and compare them with the likelihood-ratio tests in Sect. 2.

Situations where one is interested in differences in variability as well as in means can be found almost everywhere. A long list of such applications is compiled in Gastwirth et al. (2009). We discuss in Sect. 5 several examples from two subject areas, namely engineering and psychology. In particular, we consider several data sets arising from mental or intelligence tests since there is a large body of research supporting the existence of sex differences in specific cognitive abilities, some favouring men, some favouring women, sometimes differences are found in mean scores, or in variability, or in both.

2 The likelihood ratio test

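A classic approach in order to construct a test for $H_0$ is the application of the maximum likelihood method. The unrestricted maximum likelihood estimator $\hat\vartheta$ is given by

$$\hat\vartheta = \left( \hat\mu, \hat\nu, \hat\sigma^2, \hat\tau^2 \right) = \left( \bar X,\ \bar Y,\ \frac{1}{m} \sum_{i=1}^{m} (X_i - \bar X)^2,\ \frac{1}{n} \sum_{j=1}^{n} (Y_j - \bar Y)^2 \right),$$

with $\bar X = \frac{1}{m} \sum_{i=1}^{m} X_i$, $\bar Y = \frac{1}{n} \sum_{j=1}^{n} Y_j$, while the maximum likelihood estimator $\hat\vartheta_0$ under $H_0$ is given by

$$\hat\vartheta_0 = \left( \hat\mu_0, \hat\mu_0, \hat\sigma_0^2, \hat\sigma_0^2 \right)$$

with $\hat\mu_0 = \frac{m \bar X + n \bar Y}{m+n}$ and $\hat\sigma_0^2 = \frac{1}{m+n} \left( \sum_{i=1}^{m} (X_i - \hat\mu_0)^2 + \sum_{j=1}^{n} (Y_j - \hat\mu_0)^2 \right)$. Denoting the likelihood function by $L(\vartheta)$, the likelihood ratio statistic $\Lambda_{m,n}$ is equal to

$$\Lambda_{m,n} = \frac{L(\hat\vartheta_0)}{L(\hat\vartheta)} = \frac{\left( 2\pi \hat\sigma_0^2 \right)^{-\frac{m+n}{2}} \exp\left( -\frac{1}{2\hat\sigma_0^2} \left( \sum_{i=1}^{m} (x_i - \hat\mu_0)^2 + \sum_{j=1}^{n} (y_j - \hat\mu_0)^2 \right) \right)}{\left( 2\pi \hat\sigma^2 \right)^{-\frac{m}{2}} \exp\left( -\frac{1}{2\hat\sigma^2} \sum_{i=1}^{m} (x_i - \hat\mu)^2 \right) \cdot \left( 2\pi \hat\tau^2 \right)^{-\frac{n}{2}} \exp\left( -\frac{1}{2\hat\tau^2} \sum_{j=1}^{n} (y_j - \hat\nu)^2 \right)} = \frac{\left( \hat\sigma^2 \right)^{\frac{m}{2}} \cdot \left( \hat\tau^2 \right)^{\frac{n}{2}}}{\left( \hat\sigma_0^2 \right)^{\frac{m+n}{2}}}.$$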

Assuming $\frac{m}{m+n} \to p$ for $m + n \to \infty$ and some $p \in (0, 1)$, it follows from the general theory of likelihood ratio tests, given that $\Theta_0$ and $\Theta$ have dimensions 2 and 4, that

$$-2 \log \Lambda_{m,n} \xrightarrow{D} \chi^2_2 \quad \text{for } m + n \to \infty \text{ under } H_0 \tag{2}$$

(Hogg et al. 2005, pp. 351–353). Hence, an asymptotic level $\alpha$ test rejects $H_0$ if

$$-2 \log \Lambda_{m,n} \ge \chi^2_{2;1-\alpha}, \tag{3}$$

where $\chi^2_{2;p}$ denotes the p-quantile of the $\chi^2$-distribution with 2 degrees of freedom.
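The asymptotic test can be sketched in a few lines. The code below is an illustrative implementation under stated assumptions, not the authors' software: the function name and the example data are made up, both samples are assumed to have nonzero variance, and the p value relies on the large-sample approximation in (2). It uses the fact that the chi-square distribution with 2 degrees of freedom has distribution function $1 - e^{-u/2}$, so its $(1-\alpha)$-quantile is $-2 \log \alpha$.

```python
import math


def likelihood_ratio_test(x, y, alpha=0.05):
    """Asymptotic likelihood ratio test for equality of two normal distributions.

    Returns (statistic, critical_value, p_value, reject), where statistic is
    -2 log of the likelihood ratio and the reference distribution is chi-square
    with 2 degrees of freedom.
    """
    m, n = len(x), len(y)
    x_bar = sum(x) / m
    y_bar = sum(y) / n

    # Unrestricted maximum likelihood variance estimates (divisors m and n).
    sigma2 = sum((xi - x_bar) ** 2 for xi in x) / m
    tau2 = sum((yj - y_bar) ** 2 for yj in y) / n

    # Estimates under the hypothesis of a common mean and a common variance.
    mu0 = (m * x_bar + n * y_bar) / (m + n)
    sigma0_2 = (
        sum((xi - mu0) ** 2 for xi in x) + sum((yj - mu0) ** 2 for yj in y)
    ) / (m + n)

    # Log of the closed-form ratio: variances raised to m/2, n/2 and (m+n)/2.
    log_ratio = 0.5 * (
        m * math.log(sigma2) + n * math.log(tau2) - (m + n) * math.log(sigma0_2)
    )
    statistic = -2.0 * log_ratio

    critical_value = -2.0 * math.log(alpha)  # chi-square(2) quantile at 1 - alpha
    p_value = math.exp(-statistic / 2.0)     # chi-square(2) upper tail
    return statistic, critical_value, p_value, statistic >= critical_value


# Hypothetical data: two small samples with clearly different locations.
stat, crit, p, reject = likelihood_ratio_test([0, 1, 2, 3, 4], [10, 11, 12, 13, 14])
print(f"-2 log ratio = {stat:.2f}, critical value = {crit:.2f}, reject = {reject}")
```

With samples this small the chi-square approximation is rough, which is the motivation for the finite-sample corrections discussed next.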

Typically, fairly large sample sizes are needed to use these asymptotic results for finite samples. However, there are several approaches available to transform the test statistic or to determine a more exact distribution in order to improve the finite sample behaviour.

Pearson and Neyman (1930) directly considered $\Lambda_{m,n}$, showing that under $H_0$, the limiting distribution is the uniform distribution $U(0, 1)$ [note that, if $Z$ is uniformly distributed, $-2 \log Z$ is exponentially distributed with mean 2, or $\chi^2_2$-distributed; hence, this result is in agreement with (2)]. They proposed to approximate the exact distribution of $\Lambda_{m,n}$ for finite $n$ and $m$ by a beta distribution matching the first two moments.

Muirhead (1982) considered an asymptotic expansion of the distribution of the likelihood ratio test statistic under multivariate normality; in the univariate case, we obtain the following corollary.

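Corollary 2.1 Let $F_{\chi^2_q}$ denote the distribution function of the $\chi^2_q$-distribution. It holds under $H_0$:

$$P_{H_0}\left( -2\rho \log \Lambda_{m,n} \le u \right) = F_{\chi^2_2}(u) + \frac{\gamma}{\rho^2 (m+n)^2} \left( F_{\chi^2_6}(u) - F_{\chi^2_2}(u) \right) + O\left( (m+n)^{-3} \right),$$

with

$$\rho = 1 - \frac{22}{24(m+n)} \left( \frac{m+n}{m} + \frac{m+n}{n} - 1 \right),$$

$$\gamma = \frac{1}{2} \left( \left( \frac{m+n}{m} \right)^2 + \left( \frac{m+n}{n} \right)^2 - 1 \right) - \frac{121}{96} \left( \frac{m+n}{m} + \frac{m+n}{n} - 1 \right)^2.$$

Table 1 Comparison of the $\chi^2$- and Muirhead-approximation for m = 10 and n = 20

| p | Asymptotic $\chi^2$: $F^{-1}_{\chi^2_2}(p)$ | Asymptotic $\chi^2$: p-quantile of $-2 \log \Lambda_{10,20}$ | Muirhead approximation: $F^{-1}_{10,20}(p)$ | Muirhead approximation: p-quantile of $-2\rho \log \Lambda_{10,20}$ |
|---|---|---|---|---|
| 0.75 | 2.77 | 3.13 | 2.70 | 2.79 |
| 0.90 | 4.61 | 5.19 | 4.46 | 4.64 |
| 0.95 | 5.99 | 6.74 | 5.77 | 6.02 |
| 0.99 | 9.21 | 10.33 | 8.74 | 9.22 |
| 0.999 | 13.82 | 15.48 | 12.78 | 13.83 |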

Hence, the function $F_{m,n}$, defined by

$$F_{m,n}(u) = F_{\chi^2_2}(u) + \frac{\gamma}{\rho^2 (m+n)^2} \left( F_{\chi^2_6}(u) - F_{\chi^2_2}(u) \right),$$

is an approximation of the distribution function of $-2\rho \log \Lambda_{m,n}$ under the hypothesis. Then, an approximate test of $H_0$ against $H_1$ rejects $H_0$ if

$$-2\rho \log \Lambda_{m,n} \ge F^{-1}_{m,n}(1 - \alpha). \tag{4}$$
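The approximate distribution function is straightforward to evaluate numerically. The following sketch computes the constants and the approximation for given sample sizes; the function names are assumptions, and the chi-square distribution functions use the closed form that is available for even degrees of freedom. As a plausibility check it evaluates the approximation at two of the tabulated quantiles for m = 10 and n = 20 (2.70 and 5.77), which should return values close to 0.75 and 0.95.

```python
import math


def chi2_cdf_even(u: float, df: int) -> float:
    """Chi-square distribution function for an even number of degrees of freedom."""
    if df % 2 != 0 or df <= 0:
        raise ValueError("df must be a positive even integer")
    if u <= 0:
        return 0.0
    half = u / 2.0
    partial_sum = sum(half ** i / math.factorial(i) for i in range(df // 2))
    return 1.0 - math.exp(-half) * partial_sum


def muirhead_constants(m: int, n: int):
    """Scaling factor rho and correction constant gamma from the corollary."""
    total = m + n
    s = total / m + total / n - 1.0
    rho = 1.0 - 22.0 / (24.0 * total) * s
    gamma = 0.5 * ((total / m) ** 2 + (total / n) ** 2 - 1.0) - 121.0 / 96.0 * s ** 2
    return rho, gamma


def muirhead_cdf(u: float, m: int, n: int) -> float:
    """Approximate distribution function of -2 * rho * log(likelihood ratio)."""
    rho, gamma = muirhead_constants(m, n)
    f2 = chi2_cdf_even(u, 2)
    f6 = chi2_cdf_even(u, 6)
    return f2 + gamma / (rho ** 2 * (m + n) ** 2) * (f6 - f2)


rho, gamma = muirhead_constants(10, 20)
print(f"rho = {rho:.4f}, gamma = {gamma:.4f}")
for u in (2.70, 5.77):
    print(f"F_10,20({u}) = {muirhead_cdf(u, 10, 20):.4f}")
```

Inverting this function, for instance by bisection, gives the critical value needed for the rejection rule in (4).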

The improvement achieved by this expansion is illustrated in Table 1. There, the quantiles of $F_{\chi^2_2}$ and $F_{10,20}$ are compared with the simulated quantiles of $-2 \log \Lambda_{10,20}$ and $-2\rho \log \Lambda_{10,20}$ (based on $10^5$ replications) for sample sizes m = 10 and n = 20. Table 1 indicates that the empirical and theoretical levels are much closer for the Muirhead-approximation than the test based on asymptotic $\chi^2$ results. For practical purposes, the empirical level of the test based on the expansion is sufficiently close to the theoretical level even for small sample sizes.

There have also been several approaches to determine the exact distribution of $\Lambda_{m,n}$ in more or less computable form. Jain et al. (1975) developed computable but complicated series representations for the density and distribution function. Nagar and Gupta (2004) tabulated the distribution of $\Lambda_{m,n}$ for the balanced case m = n. Zhang et al. (2012) determined the exact distribution of $\Lambda_{m,n}$ as

P ( �m,n &