Chat with us, powered by LiveChat All students are to complete simple regression preliminary analysis.? Examine the correlations between all interval and ratio level variables and select, for your investigation, the 2 varia - EssayAbode

All students are to complete simple regression preliminary analysis.? Examine the correlations between all interval and ratio level variables and select, for your investigation, the 2 varia

 

  • This Assignment requires you to submit to Moodle a simple regression preliminary report write up. Submit in PDF format only

 You will perform a simple regression pre-assessment to assist you in identifying evidence for (or against) the feasibility of performing a simple regression analysis.  That is, you are NOT being requested to run a full simple regression analysis of this dataset. Please DO NOT submit simple regression tables with model or coefficient significance interpretations.

DIRECTIONS: 

1) All students are to complete simple regression preliminary analysis.  Examine the correlations between all interval and ratio level variables and select, for your investigation, the 2 variables with the highest correlation. Report the correlation table and your 2 variable choices.

2) Reference the preliminary analysis guide and preliminary report writeup exemplar attached to this assignment.  Also reference the readings assigned in Moodle, textbook,  and notes from this week posted to Moodle to support your efforts. 

Be sure to address ALL of the following, in your write up, including all supportive numerical summaries and charts formatted to class expectations:

  • Has the linearity assumption been met? Why or why not? 
  • Has the normality assumption been met? Why or why not? 
  • Has the independence assumption been met? Why or why not? 
  • Has the equal variances assumption been met? Why or why not?
  • Is a transformation warranted? Why or why not? Why or why not? 

NO PLAGIARISM

Simple Regression Preliminary Analysis Report Guide

Please submit preliminary report writeup, to Moodle discussion as an attachment in PDF format only. Tables, charts, and graphs should be formatted as expected for the class. Font size of 11 pt or higher.  DO NOT SUBMIT YOUR EXCEL DATASET.  Copy and paste the numerical prompts below into your submission document and record your responses , as applicable, directly beneath. Include the following in your submission:

Simple Regression Pre-analysis (include ALL charts, graphs and tables you generate formatted to class expectations). Assume all variables (with the exception of the subject variable) are ratio level.

1. YOU MUST HAVE the Data Analysis Tool Pack add-in installed to complete this assignment. Please review the notes for the week regarding how to use the Regression tool within the Data Analysis Tool Pack.

2. Use Excel to calculate and the report the correlation number for all candidate variables for this analysis. Interpret the result of the correlation number by discussing the linear strength and direction of the relations observed .

3. Select 1 variable to use as an explanatory and one as a response variable. State your choices

4. Use Excel to generate and report a scatter plot for the 2 variables. The Explanatory variable should be on the horizontal axis. Interpret the form and direction of the plot. Format chart to class expectations.

5. Using only the visual inspection of the scatter plot, is a transformation warranted? Please review course notes and use what you notice about the dot pattern within the scatter plot as evidence to support your answer.

6. Use the Data Analysis Tool pack to run the regression analysis over your variables to generate a residual plot in MS Excel for these variables. DO NOT INTERPRET NOR REPORT THE NUMERICAL SUMMARIES generated in the regression tables. Only examine the residual plot generated by Excel. Copy and Paste ONLY the residual plot into your preliminary report. Using only your visual inspection of the residual plot, is a transformation warranted? Review course notes and use what you notice about the dot pattern within the residual plot to support your answer.

7. Check the 4 simple regression assumptions (linearity, normality, equal variance, independence). Reference class notes so that you are clear how to perform these checks using the Excel worksheet . Create 4 subheadings to discuss EACH of the 4 assumptions indicating if the assumptions were met or were violated. Under each subheading, also include (or reference by name and number ‘Table 1, Figure 2 etc …) any tables or charts you’ve generated as evidence for your conclusions. Format all charts and tables to class expectations (see exemplar). Your claims regarding the assumptions must be based on content contained within the tables, numerical summaries , or charts that you generated. Please reference them by name in your write up. For instance “ It can be seen in Figure 1 ….” Etc …

,

Simple Regression Preliminary Analysis Report

Assumption for the analysis: Let us assume each row in dataset indicates the values in a calendar

month.

a) comp.million : Compensation given to ‘n’ the employees working at the time, in millions. b)

sales : Sales in millions.

c) No. employees : Total number of employees working in the given month.

d) Cap. Investment : Amount invested in the company in a given month in millions.

e) Manufacturing : Indication if the row values are relevant to manufacturing or not.

1. YOU MUST HAVE the Data Analysis Tool Pack add-in installed to complete this assignment. Please

review the notes for the week regarding how to use the Regression tool within the Data Analysis

Tool Pack.

2. Use Excel to calculate and the report the correlation number for all candidate variables for this

analysis. Interpret the result of the correlation number by discussing the linear strength and

direction of the relations observed.

A: The correlation coefficient can range between -1 and +1. A value 0 means there is no linear

correlation between the 2 variables. A value of 1 means there is a perfectly positive linear correlation

and a positive number, between 0 and 1 indicates that a variable increases in value along with the other

by the said factor. -1 means there is a perfectly negative correlation and a negative value indicates that a

value decreases as the other value increases by the said factor.

With the above understanding, we can interpret that compensation, sales, no. of employees, and

investment variables are all strongly corelated linearly, although not perfectly since the numbers

highlighted in green are very close to 1. This indicates that an increase of each of these variables, has a

direct impact on the others positively.

Manufacturing, on the other hand, has weak positive correlation with compensation, employees and

investment (highlighted in blue) and a slightly negative impact on sales(highlighted in red). This indicates

that the sales values relatively drops with manufacturing, but manufacturing variable doesn’t have a

major impact on rest of the variables.

Table 1: Correlation matrix of all candidate variables in dataset

comp.million sales no.employees cap.investment manufacturing

comp.million 1 sales 0.950554719 1 no.employees 0.955076813

0.997065477 1 cap.investment 0.922777565 0.994771967 0.990264565 1

manufacturing 0.015793034 -0.003322363 0.055742601 0.01950762 1

3. Select 1 variable to use as an explanatory and one as a response variable. State your choices

A: Explanatory variable (x) – Capital Investment

Response variable (y) – Sales

4. Use Excel to generate and report a scatter plot for the 2 variables. The Explanatory variable should

be on the horizontal axis. Interpret the form and direction of the plot. Format chart to class

expectations.

A: We can interpret from the above graph that with the increase in investment, the sales have

increased. Hence, we can conclude that the variables being studied have a positive linear

association.

Figure 1: Scatter plot of Sales with respect to Capital investment (both in millions)

5. Using only the visual inspection of the scatter plot, is a transformation warranted? Please review

course notes and use what you notice about the dot pattern within the scatter plot as evidence to

support your answer.

A: Figure 1, shows a positive linear association and if we were to draw a regression line, it would be

pretty close to the plotted variables. Transformations are warranted when ‘residual’ values are high

and thus applied on explanatory variable, response variable, or both, on a trial and error basis until

we find a best fit to match a linear regression equation. Transformations can be a Log, Ln, square

root, reciprocal or exponential functions. Once we have an equation that brings the scatter plot dots

close enough to the regression line, we can then predict values on both higher and lower ends of

the variables by evaluating the equation. But since in this case, we have a scatter plot that is already

close to the imaginary regression line, a transformation is not warranted.

6. Use the Data Analysis Tool pack to run the regression analysis over your variables to generate a

residual plot in MS Excel for these variables. DO NOT INTERPRET NOR REPORT THE NUMERICAL

SUMMARIES generated in the regression tables. Only examine the residual plot generated by Excel.

Copy and Paste ONLY the residual plot into your preliminary report. Using only your visual

0.0

100.0

200.0

300.0

400.0

500.0

600.0

700.0

800.0

900.0

0.0 50.0 100.0 150.0 200.0 250.0 300.0

Capital investment in millions

inspection of the residual plot, is a transformation warranted? Review course notes and use what

you notice about the dot pattern within the residual plot to support your answer.

A: Residuals and residual plots determine how well a regression line describes the data. This can be

confirmed if no obvious pattern is observed in a residual plot and if the residuals are relatively small

in size. In Figure 2 below, though we do not see an obvious pattern, we do see 2 values (marked in

red) that are far from the regression line with a deviation of about +40 and -60 units. Also, 4 plots

between +20 and +40 units on the Residual axis(marked in green). This does not warrant for much

of a deviation, but since in our case, we are dealing with values in millions, I believe this can have a

significant effect and warrants a transformation. As we move from left to right on the x-axis, we can

see that the spread increases in the top half indicating that our transformation may be warranted.

Figure 2: Excel graph of Residual plot of Sales (response variable)

7. Check the 4 simple regression assumptions (linearity, normality, equal variance, independence).

Reference class notes so that you are clear how to perform these checks using the Excel worksheet .

Create 4 subheadings to discuss EACH of the 4 assumptions indicating if the assumptions were met

or were violated. Under each subheading, also include (or reference by name and number ‘Table

1, Figure 2 etc …) any tables or charts you’ve generated as evidence for your conclusions. Format all

charts and tables to class expectations (see exemplar). Your claims regarding the assumptions must

be based on content contained within the tables, numerical summaries , or charts that you

generated. Please reference them by name in your write up. For instance “ It can be seen in Figure

1 ….” Etc …

A: For a dataset to be eligible for a simple linear regression, all 4 assumptions mentioned below

should be satisfied. I will assess the same for sales and capital investment variables assessing their

properties for each.

-80

-60

-40

-20

0

20

40

60

0 100 200 300 400 500 600 700 800 900

Sales

Linearity: Linearity can be checked from a scatter plot to see what type of correlation it displays i.e.

positive, negative, or none. Since the scatter plot below displays positive linearity, the linearity

assumption is met and Sales and Capital investment data are eligible for linear regression analysis.

Figure 3: Regression line for the scatter plot data

Independence: Here, we only see if there exists a positive, negative or no correlation in the residual

plot and since Figure 2 showing residual plot of sales does not display correlation between the 2

variables and appear to be random, we are good to use this data for a linear regression. To support

this theory, applying =correl(<predicted_sales>,<residuals>) formula in excel fetches the

value -2.98663𝑒!"# which is very close to zero.

Normality: To check the normality, a probability plot of residuals are used as shown in Figure 4

below. The rule is that normality assumption is considered met if the dot pattern closely follows the

trend line, and depending on what pattern it follows, we can deduce the shape of the curve to be

normal, skewed left or right, thick or thin tailed. Figure 4 shows the plots following the trend line

and this specific pattern indicates the data is skewed to the left thus proving the data is good fit for

linear regression.

y = 3.0108x + 23.287 R² = 0.9896

0.0

100.0

200.0

300.0

400.0

500.0

600.0

700.0

800.0

900.0

1000.0

0.0 50.0 100.0 150.0 200.0 250.0 300.0

Capital Investment in millions

Figure 4: Normality probability Plot of residuals of Sales

Equal variance: To meet the assumption of variance, the variance of residuals should statistically be

the same across all values of the explanatory variable – meaning it should have a scattered dot

pattern where the variance across values is consistent. In figure 5 below we can notice that most of

the values after ~75 on the x-axis is positive, leading to believe it is NOT statistically same across the

dataset, but again, the dot plot at the extreme right shows a -60 on the y-axis. Due to this, I’m

inclined to trust that the data meets the assumption of variance to an extent and considering the

rest of the assumptions being met, I conclude the data qualifies for linear regression.

Figure 5: Excel graph of Residual plot of Capital investment (exploratory variable)

-200

0

200

400

600

800

1000

-10 10 30 50 70 90 110

Sample Percentile

-80

-60

-40

-20

0

20

40

60

0.0 50.0 100.0 150.0 200.0 250.0 300.0

Capital Investment

,

Sheet1

Related Tags

AcademicAPAAssignmentBusinessCapstoneCollegeConclusionCourseDayDiscussionDouble SpacedEssayEnglishFinanceGeneralGraduateHistoryInformationJustifyLiteratureManagementMarketMastersMathMinimumMLANursingOrganizationalOutlinePagesPaperPresentationQuestionsQuestionnaireReferenceResponseResponseSchoolSubjectSlidesSourcesStudentSupportTimes New RomanTitleTopicsWordWriteWriting

Region NumberOfEmployees InventoryValue AverageMonthlyOrders
1 4 690.23 277
1 2 786.10 254
1 5 764.79 244
1 5 766.59 259
1 6 729.11 238
1 7 687.91 245
1 7 724.88 238
1 8 817.39 232
1 8 785.23 258
1 8 756.49 265
1 9 757.32 257
1 9 670.83 231
1 10 684.18 235
1 10 789.23 243
1 10 705.89 250
1 10 704.03 261
1 11 699.69 262
1 11 756.45 231
1 11 728.59 214
1 11 668.52 286
1 11 683.30 294
1 11 656.70 281
1 11 826.58 289
1 11 715.32 283
1 11 747.24 286
1 11 720.59 283
1 12 811.53 280
1 12 747.63 273
1 12 725.63 273
1 12 769.11 267
2 12 726.56 273
2 12 781.35 271
2 12 767.29 279
2 12 688.26 275
2 13 762.97 269
2 13 771.11 285
2 13 739.76 289
2 14 637.04 293
2 14 828.24 284
2 14 740.42 294
2 14 787.41 309
2 14 786.27 309
2 15 674.31 314
2 15 731.20 294
2 15 748.92 287
2 16 746.96 289
2 16 712.93 309
2 16 806.78 320
2 16 789.56 296
2 17 748.03 295
2 17 713.43 295
2 17 706.83 313
2 17 668.78 310
2 17 673.44 326
2 17 726.98 318
2 17 648.94 308
2 17 729.23 293
2 17 662.34 298
2 17 764.95 313
2 18 699.68 293
2 18 624.28 317
3 18 779.76 322
3 18 802.43 304
3 18 705.70 325
3 18 769.53 315
3 18 697.75 312
3 18 703.19 291
3 18 802.11 321
3 18 829.01 323
3 19 642.04 307
3 19 743.34 303
3 19 806.15 317
3 19 726.79 334
3 19 798.96 329