I’m stuck on a Statistics question and need an explanation.
I chose Suffolk County, New York.
For the next three modules we will be looking at outdoor air quality data for counties in New York, Vermont, Massachusetts, New Jersey, and Pennsylvania. For the Module 7 Assignment with Discussion, select one county in New York, Vermont, Massachusetts, New Jersey, or Pennsylvania that has not already been claimed by a classmate. It does not have to be the county you live in, but it must be different from every county already claimed. Create a main post to claim your county: leave the body empty and use the county name as the title. Later, respond to your own main post with your discussion (responses to the questions posed) and attach your Excel file (including work/commands/results/labels) to that response.
You will not receive credit if you post/select a county that has already been claimed by a classmate.
We will obtain a new data set for this module that we will continue to use for the remainder of the term. All external links below will open in a new window.
- Visit the United States Environmental Protection Agency’s Air Data web site.
- Select Download Daily Data (second option under Download Data).
- Select the Pollutant PM2.5.
- Select the Year 2018.
- Select a County.
- For the Monitor Site category, if multiple monitoring sites exist in the geographic area you selected, select only one of them. Do not select All Sites.
- Click on Get Data.
- In a few moments a link will appear at the bottom of that same page. Click on that link to download the data to your own computer. Save the data file in an Excel format (.xls or .xlsx). You may have to use the “Save As” feature in Excel to do this.
Time to check out your data!
Take a look at the names of the columns. We will be analyzing fine particulate pollution referred to as PM2.5. The PM2.5 concentrations are given (Daily Mean PM2.5 Concentration) as well as the air quality index for that pollutant (Daily_AQI_Value). Please take a few minutes right now to learn more about the Air Quality Index (AQI) which is calculated for four major air pollutants regulated by the Clean Air Act: AQI Brochure.
Look in your data set for the AQS_Parameter_Code column and scroll down this column. There are two different codes you might see here, namely 88101 and 88502. These denote the reference method used for measuring mass concentrations of PM2.5. Code 88101 denotes a single-filter, 24-hour balanced model PQ200 PM2.5 sampler with WINS, while Code 88502 denotes an R&P model 2025 PM2.5 sequential air sampler with VSCC. Each is considered an acceptable method for collecting the PM2.5 particulate measurements. If your data set has both measurement types, delete all rows for one of the two codes (either one) so that every remaining row uses the same method.
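The assignment does this step in Excel, but the filtering logic can be sketched in Python for anyone who wants to double-check their row counts. This is only an illustration: the function name `keep_one_parameter_code` is hypothetical, the rows are made up, and keeping the more frequent code is an assumption of this sketch (the assignment allows either code to be kept).

```python
from collections import Counter

def keep_one_parameter_code(rows, code_column="AQS_Parameter_Code", keep=None):
    """Return only the rows that share a single parameter code.

    If `keep` is not given, the most frequent code is kept. That default is
    an assumption of this sketch; the assignment permits keeping either code.
    """
    if not rows:
        return []
    if keep is None:
        counts = Counter(row[code_column] for row in rows)
        keep = counts.most_common(1)[0][0]
    return [row for row in rows if row[code_column] == keep]

# Made-up rows for illustration only, not real EPA data.
example_rows = [
    {"AQS_Parameter_Code": 88101, "Daily_AQI_Value": 42},
    {"AQS_Parameter_Code": 88502, "Daily_AQI_Value": 55},
    {"AQS_Parameter_Code": 88101, "Daily_AQI_Value": 61},
]
filtered = keep_one_parameter_code(example_rows)
print(len(filtered), "rows kept")
```

Passing `keep=88502` explicitly would keep the other measurement type instead.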
Please edit the name of the sheet with the data on it, giving it the name “PM2.5 Data”. Create a second sheet and give that sheet/tab the name “Mod 7” (as we will be working in this same Excel file for the remainder of the term). For the activities of Module 7 please work in the Mod 7 tab you have created.
Use Excel’s COUNTIF function to find the number of days with a PM2.5 AQI value over 50. On such days, the air quality conditions are considered not to be “Good” per the EPA’s Air Quality Index. Use this to find the proportion of days for which the air quality was not good in your sample (phat). A success will be a day in which the PM2.5 AQI is above 50.
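In Excel, a formula of the form `=COUNTIF(range, ">50")` counts the cells in `range` that exceed 50; the range itself depends on where the AQI column sits in your sheet. As a cross-check, the same count and the sample proportion can be sketched in Python. The function name `proportion_above` and the AQI readings below are hypothetical and exist only to show the arithmetic.

```python
def proportion_above(aqi_values, threshold=50):
    """Return (successes, n, phat), where a success is an AQI strictly above threshold."""
    n = len(aqi_values)
    if n == 0:
        raise ValueError("need at least one daily AQI value")
    successes = sum(1 for value in aqi_values if value > threshold)
    return successes, n, successes / n

# Made-up AQI readings for illustration only, not real EPA data.
example_aqi = [42, 55, 38, 61, 50, 47, 72, 33]
successes, n, phat = proportion_above(example_aqi)
print(successes, n, phat)
```

Note that a day with an AQI of exactly 50 is not a success, because the assignment defines a success as a value above 50.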
We first must check that the conditions of the Central Limit Theorem apply for estimating proportions in a population.
- The Random and Independent condition is met by the EPA’s collection agencies.
- The Large Sample condition must be checked (by you). If phat is the proportion of days with AQI above 50, then both n*phat and n*(1-phat) must be greater than or equal to 10 to meet the Large Sample condition. Within your Excel file, clearly label and show this condition is met.
- The Big Population condition is met for our data.
When these three conditions are met, we can use the Normal distribution to find probabilities concerning the sample proportion. If your data set does not meet the Large Sample condition, obtain a new data set for a different county that has not already been selected by a classmate. You will then need to check the three conditions of the Central Limit Theorem again, making sure your new data set meets these conditions. Within your Excel file, clearly label and show this condition is met.
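The Large Sample check described above can be sketched as follows. Because n*phat equals the number of successes and n*(1-phat) equals the number of failures, the sketch compares those two counts directly against 10, which also avoids floating-point rounding. The function name `large_sample_ok` and the counts used are hypothetical.

```python
def large_sample_ok(successes, n, minimum=10):
    """Check that both n*phat and n*(1-phat) are at least `minimum`.

    n*phat is the success count and n*(1-phat) is the failure count,
    so the integer counts are compared directly.
    """
    failures = n - successes
    return successes >= minimum and failures >= minimum

# Hypothetical counts for illustration only, not real EPA data.
print(large_sample_ok(30, 120))  # 30 successes and 90 failures
print(large_sample_ok(5, 120))   # only 5 successes
```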
Clearly label cells with the names and values for the following: number of successes in sample, sample size, sample proportion of successes, z value multiplier for 95% confidence interval, the estimated standard error, and the confidence interval. By hand calculate the estimated standard error and the confidence interval (using a calculator to do the math) using formulas 7.2 from our text. Confirm your results using StatCrunch, inserting your StatCrunch results into your worksheet in the Excel file. Next, find the 90% confidence interval – by hand or with StatCrunch, and clearly include this information in your Excel file.
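A minimal sketch of the confidence interval calculation, assuming the text's formulas 7.2 are the usual large-sample ones: estimated standard error sqrt(phat*(1-phat)/n) and interval phat ± z*·SE. That correspondence is an assumption, since the textbook is not quoted here. The function name `proportion_ci` and the counts (30 successes in 120 days) are hypothetical and should be replaced with your own values.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, level=0.95):
    """Return (phat, z, se, (lower, upper)) for a large-sample proportion interval."""
    phat = successes / n
    # Two-sided multiplier: leave (1 - level) / 2 in each tail of the standard Normal.
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    se = sqrt(phat * (1 - phat) / n)  # estimated standard error
    margin = z * se
    return phat, z, se, (phat - margin, phat + margin)

# Hypothetical counts for illustration only, not real EPA data.
phat, z, se, (lower, upper) = proportion_ci(30, 120, level=0.95)
print(f"phat={phat:.4f}  z={z:.3f}  SE={se:.4f}  CI=({lower:.4f}, {upper:.4f})")
```

Calling the same function with `level=0.90` gives the 90% interval for comparison with the StatCrunch output.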
Create a single post addressing all of the questions below (using complete and descriptive sentences). Attach your Excel data file (*.xls or *.xlsx) to your discussion post. The Excel file will include not only the data, but also the use of commands and clearly labeled results.
How is the 90% confidence interval you found different from the 95% confidence interval? Why is this so? What concerns might you have regarding the actual proportion of days with AQI that is not “Good”? Without doing any calculations, what might you guess would be the 99% confidence interval for your data? Which confidence interval is likely the most reasonable to consider (90%, 95%, or 99%) and why?
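One way to reason about these questions is to look at the z multiplier alone: with the same sample, the standard error does not change, so the width of the interval is driven entirely by the multiplier. The short sketch below prints the two-sided standard Normal multipliers for the three confidence levels; the helper name `z_multiplier` is hypothetical.

```python
from statistics import NormalDist

def z_multiplier(level):
    """Two-sided standard Normal multiplier for a given confidence level."""
    return NormalDist().inv_cdf(1 - (1 - level) / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z* = {z_multiplier(level):.3f}")
# prints:
# 90%: z* = 1.645
# 95%: z* = 1.960
# 99%: z* = 2.576
```

Because the multiplier grows with the confidence level, a higher level always gives a wider interval around the same phat.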
In addition to your main post, you are to respond to at least two other students’ main posts. Substantive responses to other students’ main posts might include the following: (1) a detailed comparison of your results to their results including proposing why the results might (or might not) differ between your two regions, or (2) detailed and polite recommendations for improvements should their work be incorrect or not fully meet the assignment’s requirements.