Problem 2-1, Comparing a single mean to a specified value
==Problem statement==
''The breaking strength of a fiber is required to be at least 150 psi. Past experience has indicated that the standard deviation of breaking strength is σ = 3 psi. A random sample of four specimens is tested, and the results are y<sub>1</sub> = 145, y<sub>2</sub> = 153, y<sub>3</sub> = 150, and y<sub>4</sub> = 147.''
<ol style="list-style-type:lower-latin">
<li>''State the hypotheses that you think should be tested in this experiment.''</li>
<li>''Test these hypotheses using α = 0.05. What are your conclusions?''</li>
<li>''Find the ''P''-value for the test in part (b).''</li>
<li>''Construct a 95 percent confidence interval on the mean breaking strength.''</li>
</ol>

==Solution==
[[Image:Gaussian1.png|thumb|left|'''Figure 1:''' Our data compared to a theoretical Gaussian distribution.]]
===Section A: Choosing hypotheses===
In this problem we are given a set of four data points. These data points all come from a distribution of breaking strengths which has an unknown mean μ. We will call this the ''true distribution''. Previous experience indicates that breaking strengths follow a Gaussian ''theoretical distribution'' with a standard deviation of 3 psi, so we assume the same for the true distribution. Our task is to determine whether the true mean, which is impossible to know exactly, is at least 150 psi. We plot the data and the theoretical distribution in figure 1.

Since the sample mean <math>\overline{y}</math> only approximates the true mean, we quantify its uncertainty with the ''standard error of the mean'' (SEM), <math>\sigma/\sqrt{n}=1.5</math> psi, where n = 4 is the number of data points.
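As a quick check of the arithmetic, the sample mean and the SEM can be computed directly from the four measurements. The following is a minimal sketch in Python (our choice of language for illustration; the variable names are not part of the original problem):

```python
from math import sqrt
from statistics import mean

# Breaking strengths (psi) of the four specimens in the problem statement.
strengths = [145, 153, 150, 147]
sigma = 3.0  # known standard deviation (psi), from past experience

n = len(strengths)
y_bar = mean(strengths)  # sample mean, our estimate of the true mean
sem = sigma / sqrt(n)    # standard error of the mean

print(f"n = {n}, sample mean = {y_bar:.2f} psi, SEM = {sem:.2f} psi")
```

The SEM depends only on σ and n, so it is known before any data are collected; only the sample mean changes from one sample to the next.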
<table border=1 cellspacing=0 cellpadding=4>
<tr><th bgcolor="#eeeeee"></th><th bgcolor="#eeeeee">Distribution type</th><th bgcolor="#eeeeee">Mean</th><th bgcolor="#eeeeee">Standard deviation</th></tr>
<tr><th bgcolor="#eeeeee">True distribution</th><td>Normal</td><td><math>\mu\approx\overline{y}=148.75</math></td><td><math>\sigma=3</math></td></tr>
<tr><th bgcolor="#eeeeee">Theoretical distribution</th><td>Normal</td><td><math>\mu_0=150</math></td><td><math>\sigma_0=3</math></td></tr>
</table>

We first state two hypotheses. The null hypothesis is that our data come from the theoretical distribution: the true mean μ = μ<sub>0</sub>. Our alternative hypothesis states that the data come from a distribution centered around a different mean. There are three choices for the alternative hypothesis: μ < 150, μ > 150, and μ ≠ 150. We adopt the convention that the alternative hypothesis describes the case in which the requirement is not met. Here, the breaking strength of the fiber is required to be at least 150 psi, so we choose μ < 150 as our alternative hypothesis. Formally, we state our hypotheses as:<br/>
<center>H<sub>0</sub>: μ = 150<br/>
H<sub>1</sub>: μ < 150</center>
<br clear='all'>
[[Image:Gaussian2.png|thumb|left|'''Figure 2:''' Our plot after normalizing.]]
===Section B: Z-values===
For convenience, we start by standardizing our theoretical distribution to have a mean of zero and a standard deviation of one. To do this, we first center the distribution around zero by subtracting the theoretical mean (150) from each point in the distribution. We then divide each point by the standard deviation (3). The sample mean can be standardized in the same manner, giving (148.75 − 150)/3 = −0.417. We plot the normalized distribution and sample mean in figure 2.

We now assume that the null hypothesis is true and ask whether or not this assumption makes sense. If the assumption is true, the standardized sample mean is most likely to be close to zero.
To test this, we define a range over which we consider our sample mean to be unacceptable, the ''rejection region''. If the sample mean is in the rejection region, it is too far below zero and we reject the null hypothesis. We define the upper boundary of this region to be z<sub>α</sub>, where α = 0.05. Graphically, given a standard Gaussian distribution, the area under the curve to the left of z<sub>0.05</sub> is equal to 5% of the total area. You can either look up z<sub>α</sub> in a table or calculate it using a software package. Using Excel, the appropriate function is <tt>=NORMSINV(alpha)</tt>. The corresponding function in R is <tt>qnorm(alpha)</tt>. Using one of these methods, we find that z<sub>0.05</sub> = −1.645.

We then find the z-value of our data and compare it to z<sub>α</sub>. The formula for the z-value is as follows: <center><math>z=(\overline{y}-\mu_0)(\frac{1}{\sigma})(\sqrt{n})</math></center> Because we have already standardized our data, μ<sub>0</sub> = 0 and σ = 1, so this formula simplifies to <math>\overline{y} \sqrt{n}=-0.417\cdot2=-0.833</math>. Note that the formula above performs the standardization itself when it is applied to the original values: (148.75 − 150)(1/3)(2) = −0.833.

The z-value can be interpreted as the distance between the sample mean and μ<sub>0</sub>, measured in units of the standard error, so that a given distance produces a more extreme z-value at larger sample sizes. If the true mean really is below μ<sub>0</sub>, a larger sample makes our z-value more likely to fall in the rejection region, because we are more certain of the accuracy of our sample mean.

The rejection region for our z-value is from negative infinity to z<sub>α</sub>. Our z-value of −0.833 is greater than z<sub>α</sub> = −1.645, so it lies outside the rejection region. Therefore, we cannot reject the null hypothesis.

[[Image:Gaussian3.png|thumb|left|'''Figure 3:''' Illustrating the P-value.]]
===Section C: P-values===
Another way to judge how compatible our data are with the null hypothesis is to calculate the P-value.
If we were to redo the experiment, taking four new data points, the P-value gives us the probability, assuming the null hypothesis is true, of our new sample mean being at least as extreme as our original sample mean. Graphically, if we extend the rejection region until it reaches our z-value, the P-value is equal to the area of the shaded region (see figure 3). To calculate the P-value in Excel, use <tt>=NORMSDIST(-ABS(z))</tt>. In R, use <tt>pnorm(-abs(z))</tt>. (We use the negative absolute value because <tt>NORMSDIST</tt> and <tt>pnorm</tt> integrate from negative infinity to the z-value. If the z-value is positive, we instead want to integrate from the z-value to positive infinity, which by symmetry is equivalent to integrating from negative infinity to the negative of the z-value.) For this problem, we find that the P-value is 0.202. Because 0.202 is greater than α = 0.05, we again cannot reject the null hypothesis.

Note that a P-value of 0.5 indicates that the sample mean is equal to the mean of the theoretical distribution. You can see this graphically by noting that the z-value will be zero in this case, and integrating the standardized theoretical distribution up to zero covers half of the area. (Recall that the total area under a standard Gaussian curve is one.) The smaller the P-value is relative to 0.5, the greater the distance between the two means.

<div style="float:left; vertical-align: top; padding-right: 20px; padding-bottom: 20px;">[[Image:Gaussian4.png|thumb|none|'''Figure 4:''' The confidence interval about the sample mean.]]<br>
[[Image:Gaussian5.png|thumb|none|'''Figure 5:''' The confidence interval about the theoretical mean.]]</div>
===Section D: Confidence intervals===
We now return to our original data set and theoretical distribution with the mean of 150 psi; that is, we will no longer use our normalized space. We will now calculate the range of sample means for which we would not reject the hypothesis that the breaking strength of our fiber is at least 150 psi, given an α of 0.05. This range is known as the confidence interval about the sample mean.
To calculate this interval, we ask what sample mean would give us a z-value equal to z<sub>α</sub>. We can determine this by substituting z<sub>α</sub> for z in the formula for z, and solving for <math>\overline{y}</math>: <center><math>z_\alpha=(\overline{y}-\mu_0)(\frac{1}{\sigma})(\sqrt{n}) \Rightarrow \overline{y} = \mu_0+\frac{z_\alpha \sigma}{\sqrt{n}}=150+(-1.645)(1.5)=147.53</math></center> This is the lower limit of our confidence interval. Because any sample mean greater than 150 is acceptable, the upper limit of the confidence interval is infinity. We plot this interval in figure 4. Formally, our confidence interval about the sample mean is <center><math>147.53 < \overline{y} < \infty</math></center>

We next calculate a confidence interval about the mean of the theoretical distribution, μ<sub>0</sub>. This gives us the range of minimum breaking strengths we could have specified and still found our data acceptable. We can calculate this in much the same way as the previous confidence interval: substitute z<sub>α</sub> for z in the formula for z, but this time solve for μ<sub>0</sub>: <center><math>\mu_0=\overline{y}-\frac{z_\alpha \sigma}{\sqrt{n}}=148.75-(-1.645)(1.5)=151.22</math></center> This is the upper limit of our confidence interval. The lower limit is zero, because we simply require the theoretical mean to be less than this number. Formally, our confidence interval about the theoretical mean is <center><math>0 \le \mu_0 < 151.22</math></center>
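The calculations in Sections B through D can be reproduced in a few lines of code. The sketch below is one possible way to do so, using Python's standard-library <tt>statistics.NormalDist</tt> in place of the Excel and R functions mentioned earlier. The function name <tt>lower_tail_z_test</tt> and the dictionary keys are hypothetical names chosen for illustration, and the sketch assumes, as this solution does, a known σ and the alternative hypothesis μ < μ<sub>0</sub>.

```python
from math import sqrt
from statistics import NormalDist, mean


def lower_tail_z_test(data, mu0, sigma, alpha=0.05):
    """Test H0: mu = mu0 against H1: mu < mu0 when sigma is known.

    Illustrative helper (not from the original text) that mirrors the
    steps of Sections B-D.
    """
    std_normal = NormalDist()  # standard Gaussian: mean 0, sd 1
    n = len(data)
    y_bar = mean(data)
    sem = sigma / sqrt(n)                # standard error of the mean
    z = (y_bar - mu0) / sem              # z-value of the sample mean
    z_alpha = std_normal.inv_cdf(alpha)  # plays the role of qnorm(alpha)
    return {
        "y_bar": y_bar,
        "z": z,
        "z_alpha": z_alpha,
        "reject_h0": z < z_alpha,             # is z in the rejection region?
        "p_value": std_normal.cdf(-abs(z)),   # plays the role of pnorm(-abs(z))
        "y_bar_lower": mu0 + z_alpha * sem,   # lower limit for the sample mean
        "mu0_upper": y_bar - z_alpha * sem,   # upper limit for mu0
    }


result = lower_tail_z_test([145, 153, 150, 147], mu0=150, sigma=3)
for name, value in result.items():
    # Booleans are printed as-is; numeric values are rounded for display.
    print(name, value if isinstance(value, bool) else round(value, 3))
```

Up to rounding, the printed values should agree with the z-value, P-value, and interval limits derived by hand in the sections above.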