09 Sep All students are to complete simple regression preliminary analysis.? Examine the correlations between all interval and ratio level variables and select, for your investigation, the 2 varia
- This Assignment requires you to submit to Moodle a simple regression preliminary report write up. Submit in PDF format only
You will perform a simple regression pre-assessment to assist you in identifying evidence for (or against) the feasibility of performing a simple regression analysis. That is, you are NOT being requested to run a full simple regression analysis of this dataset. Please DO NOT submit simple regression tables with model or coefficient significance interpretations.
DIRECTIONS:
1) All students are to complete simple regression preliminary analysis. Examine the correlations between all interval and ratio level variables and select, for your investigation, the 2 variables with the highest correlation. Report the correlation table and your 2 variable choices.
2) Reference the preliminary analysis guide and preliminary report writeup exemplar attached to this assignment. Also reference the readings assigned in Moodle, textbook, and notes from this week posted to Moodle to support your efforts.
Be sure to address ALL of the following, in your write up, including all supportive numerical summaries and charts formatted to class expectations:
- Has the linearity assumption been met? Why or why not?
- Has the normality assumption been met? Why or why not?
- Has the independence assumption been met? Why or why not?
- Has the equal variances assumption been met? Why or why not?
- Is a transformation warranted? Why or why not? Why or why not?
NO PLAGIARISM
Simple Regression Preliminary Analysis Report Guide
Please submit preliminary report writeup, to Moodle discussion as an attachment in PDF format only. Tables, charts, and graphs should be formatted as expected for the class. Font size of 11 pt or higher. DO NOT SUBMIT YOUR EXCEL DATASET. Copy and paste the numerical prompts below into your submission document and record your responses , as applicable, directly beneath. Include the following in your submission:
Simple Regression Pre-analysis (include ALL charts, graphs and tables you generate formatted to class expectations). Assume all variables (with the exception of the subject variable) are ratio level.
1. YOU MUST HAVE the Data Analysis Tool Pack add-in installed to complete this assignment. Please review the notes for the week regarding how to use the Regression tool within the Data Analysis Tool Pack.
2. Use Excel to calculate and the report the correlation number for all candidate variables for this analysis. Interpret the result of the correlation number by discussing the linear strength and direction of the relations observed .
3. Select 1 variable to use as an explanatory and one as a response variable. State your choices
4. Use Excel to generate and report a scatter plot for the 2 variables. The Explanatory variable should be on the horizontal axis. Interpret the form and direction of the plot. Format chart to class expectations.
5. Using only the visual inspection of the scatter plot, is a transformation warranted? Please review course notes and use what you notice about the dot pattern within the scatter plot as evidence to support your answer.
6. Use the Data Analysis Tool pack to run the regression analysis over your variables to generate a residual plot in MS Excel for these variables. DO NOT INTERPRET NOR REPORT THE NUMERICAL SUMMARIES generated in the regression tables. Only examine the residual plot generated by Excel. Copy and Paste ONLY the residual plot into your preliminary report. Using only your visual inspection of the residual plot, is a transformation warranted? Review course notes and use what you notice about the dot pattern within the residual plot to support your answer.
7. Check the 4 simple regression assumptions (linearity, normality, equal variance, independence). Reference class notes so that you are clear how to perform these checks using the Excel worksheet . Create 4 subheadings to discuss EACH of the 4 assumptions indicating if the assumptions were met or were violated. Under each subheading, also include (or reference by name and number ‘Table 1, Figure 2 etc …) any tables or charts you’ve generated as evidence for your conclusions. Format all charts and tables to class expectations (see exemplar). Your claims regarding the assumptions must be based on content contained within the tables, numerical summaries , or charts that you generated. Please reference them by name in your write up. For instance “ It can be seen in Figure 1 ….” Etc …
,
Simple Regression Preliminary Analysis Report
Assumption for the analysis: Let us assume each row in dataset indicates the values in a calendar
month.
a) comp.million : Compensation given to ‘n’ the employees working at the time, in millions. b)
sales : Sales in millions.
c) No. employees : Total number of employees working in the given month.
d) Cap. Investment : Amount invested in the company in a given month in millions.
e) Manufacturing : Indication if the row values are relevant to manufacturing or not.
1. YOU MUST HAVE the Data Analysis Tool Pack add-in installed to complete this assignment. Please
review the notes for the week regarding how to use the Regression tool within the Data Analysis
Tool Pack.
2. Use Excel to calculate and the report the correlation number for all candidate variables for this
analysis. Interpret the result of the correlation number by discussing the linear strength and
direction of the relations observed.
A: The correlation coefficient can range between -1 and +1. A value 0 means there is no linear
correlation between the 2 variables. A value of 1 means there is a perfectly positive linear correlation
and a positive number, between 0 and 1 indicates that a variable increases in value along with the other
by the said factor. -1 means there is a perfectly negative correlation and a negative value indicates that a
value decreases as the other value increases by the said factor.
With the above understanding, we can interpret that compensation, sales, no. of employees, and
investment variables are all strongly corelated linearly, although not perfectly since the numbers
highlighted in green are very close to 1. This indicates that an increase of each of these variables, has a
direct impact on the others positively.
Manufacturing, on the other hand, has weak positive correlation with compensation, employees and
investment (highlighted in blue) and a slightly negative impact on sales(highlighted in red). This indicates
that the sales values relatively drops with manufacturing, but manufacturing variable doesn’t have a
major impact on rest of the variables.
Table 1: Correlation matrix of all candidate variables in dataset
comp.million sales no.employees cap.investment manufacturing
comp.million 1 sales 0.950554719 1 no.employees 0.955076813
0.997065477 1 cap.investment 0.922777565 0.994771967 0.990264565 1
manufacturing 0.015793034 -0.003322363 0.055742601 0.01950762 1
3. Select 1 variable to use as an explanatory and one as a response variable. State your choices
A: Explanatory variable (x) – Capital Investment
Response variable (y) – Sales
4. Use Excel to generate and report a scatter plot for the 2 variables. The Explanatory variable should
be on the horizontal axis. Interpret the form and direction of the plot. Format chart to class
expectations.
A: We can interpret from the above graph that with the increase in investment, the sales have
increased. Hence, we can conclude that the variables being studied have a positive linear
association.
Figure 1: Scatter plot of Sales with respect to Capital investment (both in millions)
5. Using only the visual inspection of the scatter plot, is a transformation warranted? Please review
course notes and use what you notice about the dot pattern within the scatter plot as evidence to
support your answer.
A: Figure 1, shows a positive linear association and if we were to draw a regression line, it would be
pretty close to the plotted variables. Transformations are warranted when ‘residual’ values are high
and thus applied on explanatory variable, response variable, or both, on a trial and error basis until
we find a best fit to match a linear regression equation. Transformations can be a Log, Ln, square
root, reciprocal or exponential functions. Once we have an equation that brings the scatter plot dots
close enough to the regression line, we can then predict values on both higher and lower ends of
the variables by evaluating the equation. But since in this case, we have a scatter plot that is already
close to the imaginary regression line, a transformation is not warranted.
6. Use the Data Analysis Tool pack to run the regression analysis over your variables to generate a
residual plot in MS Excel for these variables. DO NOT INTERPRET NOR REPORT THE NUMERICAL
SUMMARIES generated in the regression tables. Only examine the residual plot generated by Excel.
Copy and Paste ONLY the residual plot into your preliminary report. Using only your visual
0.0
100.0
200.0
300.0
400.0
500.0
600.0
700.0
800.0
900.0
0.0 50.0 100.0 150.0 200.0 250.0 300.0
Capital investment in millions
inspection of the residual plot, is a transformation warranted? Review course notes and use what
you notice about the dot pattern within the residual plot to support your answer.
A: Residuals and residual plots determine how well a regression line describes the data. This can be
confirmed if no obvious pattern is observed in a residual plot and if the residuals are relatively small
in size. In Figure 2 below, though we do not see an obvious pattern, we do see 2 values (marked in
red) that are far from the regression line with a deviation of about +40 and -60 units. Also, 4 plots
between +20 and +40 units on the Residual axis(marked in green). This does not warrant for much
of a deviation, but since in our case, we are dealing with values in millions, I believe this can have a
significant effect and warrants a transformation. As we move from left to right on the x-axis, we can
see that the spread increases in the top half indicating that our transformation may be warranted.
Figure 2: Excel graph of Residual plot of Sales (response variable)
7. Check the 4 simple regression assumptions (linearity, normality, equal variance, independence).
Reference class notes so that you are clear how to perform these checks using the Excel worksheet .
Create 4 subheadings to discuss EACH of the 4 assumptions indicating if the assumptions were met
or were violated. Under each subheading, also include (or reference by name and number ‘Table
1, Figure 2 etc …) any tables or charts you’ve generated as evidence for your conclusions. Format all
charts and tables to class expectations (see exemplar). Your claims regarding the assumptions must
be based on content contained within the tables, numerical summaries , or charts that you
generated. Please reference them by name in your write up. For instance “ It can be seen in Figure
1 ….” Etc …
A: For a dataset to be eligible for a simple linear regression, all 4 assumptions mentioned below
should be satisfied. I will assess the same for sales and capital investment variables assessing their
properties for each.
-80
-60
-40
-20
0
20
40
60
0 100 200 300 400 500 600 700 800 900
Sales
Linearity: Linearity can be checked from a scatter plot to see what type of correlation it displays i.e.
positive, negative, or none. Since the scatter plot below displays positive linearity, the linearity
assumption is met and Sales and Capital investment data are eligible for linear regression analysis.
Figure 3: Regression line for the scatter plot data
Independence: Here, we only see if there exists a positive, negative or no correlation in the residual
plot and since Figure 2 showing residual plot of sales does not display correlation between the 2
variables and appear to be random, we are good to use this data for a linear regression. To support
this theory, applying =correl(<predicted_sales>,<residuals>) formula in excel fetches the
value -2.98663𝑒!"# which is very close to zero.
Normality: To check the normality, a probability plot of residuals are used as shown in Figure 4
below. The rule is that normality assumption is considered met if the dot pattern closely follows the
trend line, and depending on what pattern it follows, we can deduce the shape of the curve to be
normal, skewed left or right, thick or thin tailed. Figure 4 shows the plots following the trend line
and this specific pattern indicates the data is skewed to the left thus proving the data is good fit for
linear regression.
y = 3.0108x + 23.287 R² = 0.9896
0.0
100.0
200.0
300.0
400.0
500.0
600.0
700.0
800.0
900.0
1000.0
0.0 50.0 100.0 150.0 200.0 250.0 300.0
Capital Investment in millions
Figure 4: Normality probability Plot of residuals of Sales
Equal variance: To meet the assumption of variance, the variance of residuals should statistically be
the same across all values of the explanatory variable – meaning it should have a scattered dot
pattern where the variance across values is consistent. In figure 5 below we can notice that most of
the values after ~75 on the x-axis is positive, leading to believe it is NOT statistically same across the
dataset, but again, the dot plot at the extreme right shows a -60 on the y-axis. Due to this, I’m
inclined to trust that the data meets the assumption of variance to an extent and considering the
rest of the assumptions being met, I conclude the data qualifies for linear regression.
Figure 5: Excel graph of Residual plot of Capital investment (exploratory variable)
-200
0
200
400
600
800
1000
-10 10 30 50 70 90 110
Sample Percentile
-80
-60
-40
-20
0
20
40
60
0.0 50.0 100.0 150.0 200.0 250.0 300.0
Capital Investment
,
Sheet1
| Region | NumberOfEmployees | InventoryValue | AverageMonthlyOrders |
| 1 | 4 | 690.23 | 277 |
| 1 | 2 | 786.10 | 254 |
| 1 | 5 | 764.79 | 244 |
| 1 | 5 | 766.59 | 259 |
| 1 | 6 | 729.11 | 238 |
| 1 | 7 | 687.91 | 245 |
| 1 | 7 | 724.88 | 238 |
| 1 | 8 | 817.39 | 232 |
| 1 | 8 | 785.23 | 258 |
| 1 | 8 | 756.49 | 265 |
| 1 | 9 | 757.32 | 257 |
| 1 | 9 | 670.83 | 231 |
| 1 | 10 | 684.18 | 235 |
| 1 | 10 | 789.23 | 243 |
| 1 | 10 | 705.89 | 250 |
| 1 | 10 | 704.03 | 261 |
| 1 | 11 | 699.69 | 262 |
| 1 | 11 | 756.45 | 231 |
| 1 | 11 | 728.59 | 214 |
| 1 | 11 | 668.52 | 286 |
| 1 | 11 | 683.30 | 294 |
| 1 | 11 | 656.70 | 281 |
| 1 | 11 | 826.58 | 289 |
| 1 | 11 | 715.32 | 283 |
| 1 | 11 | 747.24 | 286 |
| 1 | 11 | 720.59 | 283 |
| 1 | 12 | 811.53 | 280 |
| 1 | 12 | 747.63 | 273 |
| 1 | 12 | 725.63 | 273 |
| 1 | 12 | 769.11 | 267 |
| 2 | 12 | 726.56 | 273 |
| 2 | 12 | 781.35 | 271 |
| 2 | 12 | 767.29 | 279 |
| 2 | 12 | 688.26 | 275 |
| 2 | 13 | 762.97 | 269 |
| 2 | 13 | 771.11 | 285 |
| 2 | 13 | 739.76 | 289 |
| 2 | 14 | 637.04 | 293 |
| 2 | 14 | 828.24 | 284 |
| 2 | 14 | 740.42 | 294 |
| 2 | 14 | 787.41 | 309 |
| 2 | 14 | 786.27 | 309 |
| 2 | 15 | 674.31 | 314 |
| 2 | 15 | 731.20 | 294 |
| 2 | 15 | 748.92 | 287 |
| 2 | 16 | 746.96 | 289 |
| 2 | 16 | 712.93 | 309 |
| 2 | 16 | 806.78 | 320 |
| 2 | 16 | 789.56 | 296 |
| 2 | 17 | 748.03 | 295 |
| 2 | 17 | 713.43 | 295 |
| 2 | 17 | 706.83 | 313 |
| 2 | 17 | 668.78 | 310 |
| 2 | 17 | 673.44 | 326 |
| 2 | 17 | 726.98 | 318 |
| 2 | 17 | 648.94 | 308 |
| 2 | 17 | 729.23 | 293 |
| 2 | 17 | 662.34 | 298 |
| 2 | 17 | 764.95 | 313 |
| 2 | 18 | 699.68 | 293 |
| 2 | 18 | 624.28 | 317 |
| 3 | 18 | 779.76 | 322 |
| 3 | 18 | 802.43 | 304 |
| 3 | 18 | 705.70 | 325 |
| 3 | 18 | 769.53 | 315 |
| 3 | 18 | 697.75 | 312 |
| 3 | 18 | 703.19 | 291 |
| 3 | 18 | 802.11 | 321 |
| 3 | 18 | 829.01 | 323 |
| 3 | 19 | 642.04 | 307 |
| 3 | 19 | 743.34 | 303 |
| 3 | 19 | 806.15 | 317 |
| 3 | 19 | 726.79 | 334 |
| 3 | 19 | 798.96 | 329 |
