# stats homework using R

I’m stuck on a Statistics question and need an explanation.

#Name:

#Student ID:

rm(list=ls())

source("Rallfun-v33.txt")

#PART 1

#A company claims that, when exposed to their toothpaste, 45% of all bacteria related to gingivitis are killed, on average. You run 10 tests and find that the percentages of bacteria killed in each test were:

# 38%, 44%, 62%, 72%, 43%, 40%, 43%, 42%, 39%, 41%

# Assuming normality, you will test the hypothesis that the average percentage of bacteria killed was 45% at alpha=0.05.

#1.1) Write out the Null and Alternative hypotheses

#1.2) Calculate the T-statistic and use Method 1 (which we saw in class) to determine if the average percentage of bacteria killed was 45%. Do it by “hand”.

#Hint: Method 1 is to compare T to a critical value “c”.

#1.3) Do you reject or fail to reject the null?

################################################

#PART 2

#Now, let’s not assume normality

#2.1) Using the same data as in Part 1, test the hypothesis that the 20% trimmed mean is 45%.

#2.2) Do you reject or fail to reject the null?

#2.3) Assuming your test in 2.1 is the truth, what type of error did you make in #1.3?

################################################

#PART 3

#In a study of court administration, the following times to disposition (in minutes) were recorded for twenty cases:

# 42, 90, 84, 87, 116, 95, 86, 99, 93, 92, 121, 71, 66, 98, 79, 102, 60, 112, 105, 98

#Assuming normality, you will test the hypothesis that the average time to disposition was 99 minutes at alpha=0.05.

#3.1) Write out the Null and Alternative hypotheses

#3.2) Calculate the T-statistic and use Method 2 (which we saw in class) to determine if the average time to disposition was 99 minutes. Do it by “hand”.

#Hint: Method 2 is to evaluate the confidence interval.

#3.3) Do you reject or fail to reject the null?

################################################

#PART 4

#Now, let’s not assume normality

#4.1) Using the same data as in Part 3, test the hypothesis that the 20% trimmed mean is 99 minutes.

#4.2) Do you reject or fail to reject the null?

#4.3) Assuming your test in 4.1 is the truth, what type of error did you make in #3.3?

################################################

#PART 5

#Suppose you run an experiment, and observe the following values:

# 12, 20, 34, 45, 34, 36, 37, 50, 11, 32, 29

#You will test the hypothesis that the average was 25 at alpha=0.05.

#5.1) Write out the Null and Alternative hypotheses. Conduct the hypothesis test assuming normality. Use the “t.test” function. Do you reject or fail to reject the null?

#5.2) Conduct the hypothesis test without assuming normality. Do you reject or fail to reject the null?

#5.3) Assuming the answer in #5.2 is the truth, what type of error (if any) did you make in #5.1 by assuming normality?

——————————————————————————————

Lab 7 - Lecture Notes (FOR YOUR REFERENCE)

#Lab 7 - Contents

#1. Formulating Hypotheses

#2. T-statistics by Hand

#3. Alpha Level

#4. Evaluating Our Results

#5. Using the t.test function

#6. T-tests with Trimmed Means (trimci function)

#7. Type 1 and Type 2 Errors

# Last week we talked about computations for when the Population

#Variance is known and unknown.

# Given that we rarely know the population variance,

#we will use the T-distribution for all of today’s lab.

#We will primarily work with the dataset brfss09_lab7.txt:

#########################################################################################################################

#Behavioral Risk Factor Surveillance System 2009 (BRFSS09) Data Dictionary:

#————————————————————————————————————————

#id: "Subject ID" Values [1, 998]

#physhlth: "# Days past month physical health poor" Values [1, 30]

#menthlth: "# Days past month mental health poor" Values [1, 30]

#hlthplan: "Have healthcare coverage?" Values 1=Yes, 2=No

#age: "Age in Years" Values [18, 99]

#sex: "Biologic Sex" Values 0=Female, 1=Male

#fruit_day: "# of servings of fruit per day" Values [0, 20]

#alcgrp: "Alcohol Consumption Groups" Values 1=None, 2=1-2 drinks/day, 3=3 or more drinks/day

#smoke: "Smoking Status" Values 0=Never, 1=Current EveryDay, 2=Current SomeDays, 3=Former

#bmi: "Body Mass Index" Values [14, 70]

#mi: "Myocardial Infarction (heart attack)" Values 0=No, 1=Yes

#————————————————————————————————————————

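# For today’s lab, let’s start by reading in our data file

# 'brfss09_lab7.txt' into an object called mydata.

# (Assumes a whitespace-delimited file with a header row; adjust if not.)

mydata = read.table("brfss09_lab7.txt", header=TRUE)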

#This file contains:

dim(mydata)#100 Subjects, 11 variables

#With the following variables:

names(mydata)

# We have collected this data and would like to know

#if the values we have found in our sample are different

#from the reported values in the literature.

# For example, it has been reported that the average BMI

# in the population is 27.5. We would like to know if the

#values in our sample are different from this value.

#———————————————————————————

# 1. Formulating Hypotheses

#———————————————————————————

#Step 1 of determining if our BMI values differ from the

#national average of 27.5 is to formulate our hypotheses

#We have TWO hypotheses

#1) The Null Hypothesis: H0: mu = 27.5

#2) The Alternative Hypothesis: HA: mu != 27.5

#NOTE: mu=Population Mean

#The above hypotheses are Two-Sided.

#By this I mean that we are looking to see whether the mean

#BMI is greater than (>) OR less than (<) 27.5.

# A one-sided hypothesis test would look like:

#H0: mu <= 27.5

#HA: mu > 27.5

#OR

#H0: mu >= 27.5

#HA: mu < 27.5

#We will always use two-sided tests in this class,

#and in practice two-sided tests are also the most common.

#Once we have our hypotheses we will evaluate them

#and determine one of two outcomes:

# A) Reject the Null Hypothesis

# B) Fail to Reject the Null Hypothesis

#———————————————————————————

# 2. T-statistics by Hand (well... with help from the computer)

#———————————————————————————

#Recall from the last lab that the formula for a T-statistic is:

# T = (SampleMean - PopMean) / (SampleSD/sqrt(N))

#Another way to write this would be:

# T = (xbar - mu) / (s/sqrt(N))

#In this instance PopMean (mu) is the NULL hypothesis

#value we are testing against.

#We can solve for the other values that we don’t yet know:

mu=27.5

xbar=mean(mydata$bmi) #28.22

s=sd(mydata$bmi) #6.32

N=100

T = (xbar - mu) / (s/sqrt(N)) #note: this overwrites R's built-in T (shorthand for TRUE)

T #1.14

#We end up with a T value of ~ 1.14

#But how does this tell us if our mean is different from 27.5 ???!!!

#Before we move on, I want us to think about why we need

#to evaluate if our mean of 28.22 is different from 27.5.

#Certainly we can see that these are different numbers,

#so what are we really asking here?

#One way to think about it is that we are asking if our

#sample mean of 28.22 is different from 27.5 simply due to chance.

#Think of a coin tossing example:

#Your friend tosses a coin in the air and it lands on heads

#3 times in a row!

#While kinda cool, that is probably just random chance.

#What about if it landed on heads 100 times in a row?!

#You would probably think she was cheating somehow!

#Though it is possible to get 100 heads in a row

#by chance alone, it is very unlikely.
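
#A quick sketch of the arithmetic behind the coin example. It assumes a fair
#coin (probability 0.5 of heads) and independent tosses; the variable names
#are illustrative only.

```r
p_heads <- 0.5                 # assumed: a fair coin
p_3_in_a_row   <- p_heads^3    # probability of 3 heads in a row
p_100_in_a_row <- p_heads^100  # probability of 100 heads in a row
p_3_in_a_row    # 0.125, i.e. 1 time in 8: not surprising
p_100_in_a_row  # roughly 7.9e-31: possible, but essentially never by chance
```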

#The point at which we say that something is random vs not

#is determined by our alpha level.

#———————————————————————————

# 3. Alpha Level

#———————————————————————————

# The alpha level is chosen a priori (ahead of time)

#and sets the threshold for deciding whether a result

#can be attributed to random chance.

# A common alpha level is 0.05.

# We typically reject the null (i.e., conclude the result is not just chance)

#when a result at least as extreme as ours (e.g., 28.22) would occur

#less than 5% of the time by chance if the null were true.

#Recall from Lab 6 that we use the alpha level

#to help figure out critical values (c)

# c=qt(1-(alpha/2), df)
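
#As a sketch of how c depends on the degrees of freedom (df = N - 1), the
#call below computes two-sided critical values at alpha = 0.05 for
#N = 10, 20 and 100, plus a very large sample. As df grows, c approaches
#the normal-distribution value of about 1.96 (the known-variance case).

```r
alpha  <- 0.05
dfs    <- c(9, 19, 99, 1e6)          # N = 10, 20, 100, and a very large N
c_vals <- qt(1 - alpha/2, df = dfs)  # two-sided critical values
round(c_vals, 3)                     # 2.262 2.093 1.984 1.960
qnorm(1 - alpha/2)                   # normal critical value, about 1.96
```

#Smaller samples need a larger |T| before we reject, because s is a noisier
#estimate of the population SD when N is small.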

#———————————————————————————

# 4. Evaluating our Results

#———————————————————————————

# There are 3 ways to evaluate if our mean of 28.22

# is different from the null of 27.5

# All three ways will yield the same conclusion.

#1) Compare T to a critical value (c)

#2) Evaluate the Confidence interval

#3) Compare the p value to our alpha level

###########################################################

#1) Compare T to a critical value (c)

#In order to compute the critical value (c),

#we must know the alpha level.

#We will choose a value of 0.05 (which is standard)

alpha=0.05

df=100-1

c=qt(1-(alpha/2), df)

#We can then compare the absolute value of T (|T|)

#to the critical value c

#A) If |T| > c, then Reject the Null Hypothesis

#B) If |T| < c, then Fail to Reject the Null Hypothesis

#Let’s look at T and c

abs(T)

c

#What decision do we make about the Null Hypothesis????
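
#For reference, here is a sketch of Method 1 that uses only the rounded summary
#values reported in the comments above (xbar = 28.22, s = 6.32, N = 100), so it
#runs without the data file. The bmi_ variable names are illustrative, and
#results from the full data may differ slightly because of rounding.

```r
bmi_mu    <- 27.5   # null value
bmi_xbar  <- 28.22  # rounded sample mean (from the comments above)
bmi_s     <- 6.32   # rounded sample SD (from the comments above)
bmi_N     <- 100
bmi_alpha <- 0.05

bmi_T <- (bmi_xbar - bmi_mu) / (bmi_s / sqrt(bmi_N))  # about 1.14
bmi_c <- qt(1 - bmi_alpha/2, df = bmi_N - 1)           # about 1.98

abs(bmi_T) > bmi_c  # FALSE: |T| < c, so we fail to reject H0
```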

###########################################################

#2) Evaluate the Confidence interval

#Rather than compare T to c,

#we could instead compute the confidence interval.

#Recall the formula for the Confidence interval is:

#LB = xbar - c*(s/sqrt(N))

#UB = xbar + c*(s/sqrt(N))

LB = xbar - c*(s/sqrt(N))

UB = xbar + c*(s/sqrt(N))

#A) If mu is not within the Confidence Interval,

#then Reject the Null Hypothesis

#B) If mu is within the Confidence Interval,

#then Fail to Reject the Null Hypothesis

#Let’s look at LB and UB

LB

UB

mu

#What decision do we make about the Null Hypothesis????
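
#Why Methods 1 and 2 always agree: rearranging the interval formulas above,
#with the same symbols (xbar, s, N, c, mu), gives

$$
\mathrm{LB} \le \mu \le \mathrm{UB}
\iff \bar{x} - c\,\frac{s}{\sqrt{N}} \le \mu \le \bar{x} + c\,\frac{s}{\sqrt{N}}
\iff -c \le \frac{\bar{x} - \mu}{s/\sqrt{N}} \le c
\iff |T| \le c
$$

#So mu falls inside the confidence interval exactly when |T| <= c (fail to
#reject), and outside it exactly when |T| > c (reject).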

###########################################################

#3) Compare the p value to our alpha level

#Lastly, we could find the probability value (or p-value)

#for the T statistic we created.

#We can do this by using the pt() function we learned

#about last week in lab 6.

#The formula for computing a two-sided p-value from a T-statistic is:

# pval = 2*(1-pt(abs(T), df))

pval = 2*(1-pt(abs(T), df))

#We then compare the p-value to our alpha level

#A) If pval < alpha, then Reject the Null Hypothesis

#B) If pval > alpha, then Fail to Reject the Null Hypothesis

#Let’s look at our p-value.

pval

alpha

#What decision do we make about the Null Hypothesis????
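
#A sketch of why Method 3 agrees with Method 1. The p-value formula above
#returns exactly alpha when |T| equals the critical value c, and a smaller
#p-value whenever |T| is larger than c. The T values below are arbitrary
#illustrations, and the _demo names and pval_from_T are hypothetical helpers.

```r
alpha_demo <- 0.05
df_demo    <- 99                              # e.g. N = 100, as in the BMI example
c_demo     <- qt(1 - alpha_demo/2, df_demo)

# Two-sided p-value, same formula as above
pval_from_T <- function(T_stat, df) 2 * (1 - pt(abs(T_stat), df))

pval_from_T(c_demo, df_demo)                  # equals alpha (0.05) at the boundary |T| = c

T_values <- c(0.5, 1.14, 2.5, 3)              # arbitrary illustrative T values
p_values <- pval_from_T(T_values, df_demo)
data.frame(T = T_values,
           p = round(p_values, 4),
           reject_by_p = p_values < alpha_demo,
           reject_by_c = abs(T_values) > c_demo)  # the two decision columns match
```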

#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#

#Exercise 4-1:

#Evaluate if the mean age from our sample (mydata) is different

#from the population mean age of 56

# A) Write down the Null and Alternative Hypotheses

# B) Calculate the T-statistic by hand

# C) Evaluate the Null hypothesis by using ALL 3 methods that

# we just discussed

# D) Based on the results in C, do you Reject or Fail to Reject

# the Null Hypothesis?

#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#

#A)

#B)

#C)

#Method 1: Compare T to a critical value (c)

#Method 2: Evaluate the Confidence interval

#Method 3: Compare the p value to our alpha level

#D)

#———————————————————————————

# 5. Using the t.test function

#———————————————————————————

#^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^#

# One Sample T-Test: t.test(data$variable, mu = value)

#^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^#

#It was really awesome that we figured out T by hand!

#And then figured out the confidence intervals and P values!

#From now on, let’s just use a program to do all this for us.

#By default, t.test reports a 95% confidence interval (conf.level = 0.95),

#which corresponds to an alpha level of 0.05.

t.test(mydata$age, mu=56)

# t.test(mydata$bmi, mu=27.5)

#Much simpler!
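
#To check that t.test reproduces the by-hand calculations, here is a sketch on
#R's built-in mtcars data (not part of this lab). It tests whether mean fuel
#economy (mpg) equals a hypothetical null value of 20. The components
#statistic, parameter, conf.int and p.value belong to the object t.test returns.

```r
mpg        <- mtcars$mpg   # built-in dataset, used only for illustration
mu_demo    <- 20           # hypothetical null value
n_mpg      <- length(mpg)
alpha_demo <- 0.05

# By hand, following the formulas in sections 2 and 4
se_hand <- sd(mpg) / sqrt(n_mpg)
T_hand  <- (mean(mpg) - mu_demo) / se_hand
c_hand  <- qt(1 - alpha_demo/2, df = n_mpg - 1)
ci_hand <- c(mean(mpg) - c_hand * se_hand, mean(mpg) + c_hand * se_hand)
p_hand  <- 2 * (1 - pt(abs(T_hand), df = n_mpg - 1))

# With t.test (conf.level = 0.95 by default)
res <- t.test(mpg, mu = mu_demo)
res$statistic   # matches T_hand
res$parameter   # degrees of freedom, n_mpg - 1
res$conf.int    # matches ci_hand
res$p.value     # matches p_hand
```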

#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#

#Exercise 5-1: Use the t.test function to evaluate if

#A) the mean number of days of poor physical health (physhlth) is different

# from the population mean of 10? Reject the Null?

#B) the mean servings of fruit per day (fruit_day) is different from

# the population mean of 4? Reject the Null?

#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#

#A)

#B)

#———————————————————————————

# 6. T-test with Trimmed Means

#———————————————————————————

#To use the T-test with trimmed means,

#we will need to load in the source code ‘Rallfun-v33.txt’

#The trimmed mean T-test is beneficial in that it relies less

#on the data being normally distributed: trimming the extremes

#makes it less sensitive to outliers and heavy tails.
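
#To illustrate why trimming helps, the sketch below compares the ordinary mean
#and the 20% trimmed mean on R's built-in islands data (landmass areas in
#thousands of square miles, heavily skewed by a few continents). It uses base
#R's mean() with its trim argument, not the trimci function itself.

```r
isl <- islands          # built-in, strongly right-skewed data
mean(isl)               # ordinary mean, pulled up by the few largest values
mean(isl, trim = 0.2)   # 20% trimmed mean: drops the lowest and highest 20% first
median(isl)             # for comparison
```

#The trimmed mean lands much closer to the median, because a handful of
#extreme values can no longer drag it around.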

#^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^#

# Trimmed Mean T-Test:

# trimci(data$variable, tr=0.2, alpha=0.05, null.value=0)

#^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^#

#For example, to test whether the 20% trimmed mean of age

#equals 56, I could do:

trimci(mydata$age, null.value=56)
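
#Roughly what trimci computes, as we understand it: a Tukey-McLaughlin t-test
#that uses the 20% trimmed mean, a standard error based on the Winsorized
#variance, and df = N - 2g - 1, where g = floor(tr*N) values are trimmed from
#each tail. The name trim_t_sketch is hypothetical. This is a sketch for
#understanding, not a replacement for trimci from Rallfun-v33.txt.

```r
trim_t_sketch <- function(x, tr = 0.2, alpha = 0.05, null.value = 0) {
  x <- x[!is.na(x)]
  n <- length(x)
  g <- floor(tr * n)                # number of values trimmed from each tail
  y <- sort(x)
  # Winsorize: pull the g lowest/highest values in to the nearest kept value
  lo <- y[g + 1]
  hi <- y[n - g]
  w  <- pmin(pmax(x, lo), hi)
  se <- sqrt(var(w)) / ((1 - 2 * tr) * sqrt(n))
  df <- n - 2 * g - 1
  tmean <- mean(x, trim = tr)       # base R trims floor(n * tr) from each end
  test  <- (tmean - null.value) / se
  crit  <- qt(1 - alpha/2, df)
  list(estimate  = tmean,
       ci        = c(tmean - crit * se, tmean + crit * se),
       test.stat = test,
       df        = df,
       p.value   = 2 * (1 - pt(abs(test), df)))
}

# Usage on built-in data (illustration only)
trim_t_sketch(mtcars$mpg, null.value = 20)
```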

#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#

#Exercise 6-1: Use the trimci function to evaluate if

#A) the 20% trimmed mean of days of poor physical health (physhlth) is

# different from the population mean of 10? Reject the Null?

#B) the 20% trimmed mean of fruit servings per day (fruit_day) is different

#from the population mean of 4? Reject the Null?

#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#

#A)

#B)

#———————————————————————————

# 7. Type 1 and Type 2 Errors

#———————————————————————————

#You may have reached different conclusions for the

#questions in Ex. 5-1 and 6-1, depending upon the method that we used.

#This brings us to discussing Type 1 and Type 2 Errors.

#A Type 1 error is when our test tells us to reject the null,

#but in truth the null is correct (we should not have rejected it).

#A Type 2 error is when our test tells us to fail to reject the

#null, but in truth the null is false (we should have rejected it).

#The following 2×2 square might make this easier to see.

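#                         Truth
#                  |    H0     |    HA     |
#  ----------------|-----------|-----------|
#  My Test: H0     |  Correct  |  Type 2   |
#  ----------------|-----------|-----------|
#  My Test: HA     |  Type 1   |  Correct  |
#  ----------------|-----------|-----------|
#
#(My Test: H0 = we fail to reject H0; My Test: HA = we reject H0)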
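
#A simulation sketch of the two error types. The setup is assumed only for
#illustration: samples of 20 from a normal distribution with SD 6, a null value
#of 27.5, and (for Type 2) a true mean of 29. When H0 is true, the share of
#rejections estimates the Type 1 error rate, which should be close to alpha.
#When HA is true, the share of failures to reject estimates the Type 2 error
#rate. The helper rejects() is hypothetical.

```r
set.seed(1)                  # fixed seed so the results are reproducible
n_sims    <- 10000           # number of simulated studies
n_obs     <- 20              # sample size in each simulated study (assumed)
mu0       <- 27.5            # null value
sim_alpha <- 0.05
c_sim     <- qt(1 - sim_alpha/2, df = n_obs - 1)

# For each simulated sample, TRUE if the two-sided t-test rejects H0: mu = mu0
rejects <- function(true_mean) {
  x <- matrix(rnorm(n_obs * n_sims, mean = true_mean, sd = 6), nrow = n_obs)
  T_sim <- (colMeans(x) - mu0) / (apply(x, 2, sd) / sqrt(n_obs))
  abs(T_sim) > c_sim
}

type1_rate <- mean(rejects(true_mean = mu0))  # H0 true: every rejection is a Type 1 error
type2_rate <- mean(!rejects(true_mean = 29))  # HA true: every failure to reject is a Type 2 error
type1_rate  # should be close to 0.05
type2_rate  # large here: a 1.5-point difference is small relative to SD 6 with n_obs = 20
```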

#For the next exercise, let’s assume that the result of our trimmed mean test is the truth.

#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#

#Exercise 7-1:

#A) What type of error (if any) did we make when evaluating the mean

#of physhlth in exercise 5-1?

#B) What type of error (if any) did we make when evaluating the mean

#of fruit_day in exercise 5-1?

#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#

#A)

#B)