The final project for the course is a technical blog post related to a data analysis project you will work on piecemeal over the course of the semester. 

The project is very open-ended. The objective is to demonstrate your skill in asking meaningful questions of your data and answering them with the results of a data analysis in R / Rmarkdown, as well as your proficiency in interpreting and presenting those results. The goal is not to conduct an exhaustive data analysis. The data analysis part should meet the following criteria:

1. Perform exploratory data analysis summarizing your data using descriptive statistics / summary statistics and visualizations relevant to your questions or ones that highlight some interesting insight.

2. Demonstrate at least two of the following techniques we have learned in class that help answer your question: PCA, hypothesis testing / confidence interval, regression analysis (linear / logistic).
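As a rough illustration of two of the listed techniques, the sketch below runs a hypothesis test with a confidence interval and a logistic regression in base R. It uses only the first 20 rows of the Patient Diabetes Data table attached to this assignment, and the object names (`diabetes`, `glucose_test`, `fit`) are placeholders chosen for this example. Treating `Outcome` as the response and `Glucose` as the predictor is an assumption made for illustration, not a required research question.

```r
# First 20 rows of the Patient Diabetes Data table attached to this assignment.
diabetes <- read.table(header = TRUE, text = "Pregnancies Glucose BloodPressure SkinThickness Insulin BMI DiabetesPedigreeFunction Age Outcome
6 148 72 35 0 33.6 0.627 50 1
1 85 66 29 0 26.6 0.351 31 0
8 183 64 0 0 23.3 0.672 32 1
1 89 66 23 94 28.1 0.167 21 0
0 137 40 35 168 43.1 2.288 33 1
5 116 74 0 0 25.6 0.201 30 0
3 78 50 32 88 31 0.248 26 1
10 115 0 0 0 35.3 0.134 29 0
2 197 70 45 543 30.5 0.158 53 1
8 125 96 0 0 0 0.232 54 1
4 110 92 0 0 37.6 0.191 30 0
10 168 74 0 0 38 0.537 34 1
10 139 80 0 0 27.1 1.441 57 0
1 189 60 23 846 30.1 0.398 59 1
5 166 72 19 175 25.8 0.587 51 1
7 100 0 0 0 30 0.484 32 1
0 118 84 47 230 45.8 0.551 31 1
7 107 74 0 0 29.6 0.254 31 1
1 103 30 38 83 43.3 0.183 33 0
1 115 70 30 96 34.6 0.529 32 1
")

# Technique 1: hypothesis test / confidence interval.
# Welch two-sample t-test comparing mean Glucose between the two Outcome groups.
# The interval is for mean(Outcome == 0) minus mean(Outcome == 1).
glucose_test <- t.test(Glucose ~ Outcome, data = diabetes)
print(glucose_test$estimate)
print(glucose_test$conf.int)
print(glucose_test$p.value)

# Technique 2: logistic regression of the 0/1 Outcome on Glucose.
fit <- glm(Outcome ~ Glucose, data = diabetes, family = binomial)
print(summary(fit)$coefficients)

# Exponentiated coefficients are odds ratios per one-unit increase.
print(exp(coef(fit)))
```

With only 20 rows the estimates are very noisy, so this shows the mechanics rather than a result worth reporting. A real analysis would use the full dataset and handle questionable values first.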

Proposal

The first task is to identify the dataset, understand the data, and write the questions you plan to answer using that dataset. You may pick a dataset from one of the resources mentioned on this webpage. The proposal should meet the following criteria:

1. Perform checks to determine quality of the data (missing values, outliers, etc.)

2. Proposal on what questions you are interested in answering from the data

3. Initial visualizations and, if required, transformations to get the data ready
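The data quality checks in this list could look something like the following base R sketch, run on the first ten rows of the attached Patient Diabetes Data table. Two things here are assumptions rather than facts stated in the assignment: that a zero in the clinical measurements stands for a value that was not recorded, and that the 1.5 × IQR rule is an acceptable outlier convention. The helper `iqr_outliers` is a hypothetical name introduced for this example.

```r
# First ten rows of the Patient Diabetes Data table attached to this assignment.
diabetes <- read.table(header = TRUE, text = "Pregnancies Glucose BloodPressure SkinThickness Insulin BMI DiabetesPedigreeFunction Age Outcome
6 148 72 35 0 33.6 0.627 50 1
1 85 66 29 0 26.6 0.351 31 0
8 183 64 0 0 23.3 0.672 32 1
1 89 66 23 94 28.1 0.167 21 0
0 137 40 35 168 43.1 2.288 33 1
5 116 74 0 0 25.6 0.201 30 0
3 78 50 32 88 31 0.248 26 1
10 115 0 0 0 35.3 0.134 29 0
2 197 70 45 543 30.5 0.158 53 1
8 125 96 0 0 0 0.232 54 1
")

# Explicit missing values: count NA entries per column.
na_counts <- colSums(is.na(diabetes))
print(na_counts)

# ASSUMPTION (not stated in the assignment): a zero in these measurements
# means the value was not recorded, since a zero reading is implausible.
zero_as_missing <- c("Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI")
zero_counts <- colSums(diabetes[zero_as_missing] == 0)
print(zero_counts)

# Recode those zeros as NA so later summaries can skip them with na.rm = TRUE.
cleaned <- diabetes
cleaned[zero_as_missing] <- lapply(
  cleaned[zero_as_missing],
  function(x) replace(x, x == 0, NA)
)

# Hypothetical helper: row positions outside the 1.5 * IQR fences.
iqr_outliers <- function(x) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  fence <- 1.5 * (q[2] - q[1])
  which(x < q[1] - fence | x > q[2] + fence)
}
print(iqr_outliers(cleaned$DiabetesPedigreeFunction))
```

Whether to drop, impute, or keep flagged values depends on the question being asked, so the proposal should state which choice was made and why.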

A good reference for ideas on questions and EDA in general: https://r4ds.had.co.nz/exploratory-data-analysis.html#questions

More information on the format:

It should be about 2 or more pages long, not exceeding 10 pages including the appendix. It should include roughly the following sections:

1. Background or context of the selected data – sources, description of how it was collected, time period it represents, context in which it was collected if available, perhaps why you selected it

2. Description of the data – how big is it (number of observations, variables), how many numeric variables, how many categorical variables, description of the variables

3. Goal – What questions you plan to answer with the data.

4. Analysis – Descriptive statistics and visualization of key variables

5. Summary of findings from the analysis and further questions for future analysis

6. References – links to the data or analysis sources you have referenced for the report

7. Appendix – all visualizations that do not directly support your questions can go here
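For the data description and analysis sections of the proposal, a minimal base R sketch such as the one below can report the size of the dataset, the variable types, and a few descriptive statistics. It uses the first ten rows of the attached Patient Diabetes Data table. Treating `Outcome` as a categorical variable is an assumption based on it taking only the values 0 and 1, and all object names are illustrative.

```r
# First ten rows of the Patient Diabetes Data table attached to this assignment.
diabetes <- read.table(header = TRUE, text = "Pregnancies Glucose BloodPressure SkinThickness Insulin BMI DiabetesPedigreeFunction Age Outcome
6 148 72 35 0 33.6 0.627 50 1
1 85 66 29 0 26.6 0.351 31 0
8 183 64 0 0 23.3 0.672 32 1
1 89 66 23 94 28.1 0.167 21 0
0 137 40 35 168 43.1 2.288 33 1
5 116 74 0 0 25.6 0.201 30 0
3 78 50 32 88 31 0.248 26 1
10 115 0 0 0 35.3 0.134 29 0
2 197 70 45 543 30.5 0.158 53 1
8 125 96 0 0 0 0.232 54 1
")

# ASSUMPTION: Outcome only takes the values 0 and 1, so treat it as categorical.
diabetes$Outcome <- factor(diabetes$Outcome)

# Size of the data and a count of each variable type.
n_obs <- nrow(diabetes)
n_vars <- ncol(diabetes)
n_numeric <- sum(sapply(diabetes, is.numeric))
n_categorical <- sum(sapply(diabetes, is.factor))
cat(n_obs, "observations,", n_vars, "variables:",
    n_numeric, "numeric,", n_categorical, "categorical\n")

# Structure and per-variable summary statistics.
str(diabetes)
print(summary(diabetes))

# Descriptive statistic for a key variable, split by the categorical variable.
group_means <- aggregate(Glucose ~ Outcome, data = diabetes, FUN = mean)
print(group_means)
```

A matching visualization could be drawn with `boxplot(Glucose ~ Outcome, data = diabetes)`, which is left out of the block so that it prints text output only.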

Final Write-up

The project should include:

1. Introduction: What is your research question? Why do you care? Why should others care? If you know of any other related work done by others, please include a brief description.

2. Data: Include context about the data covering:

a. Data source: Include the citation for your data, and provide a link to the source.

b. Data collection: How was the data collected?

c. Cases: What are the cases (units of observation or experiment)? What do the rows represent in your dataset?

d. Variables: What are the variables you will be studying?

e. Type of study: Was it an observational study or an experiment?

f. Data clean-up: (Optional) If you had to do any data clean up (missing values, outliers, transformation), include a very brief description of your steps.

3. Exploratory Data Analysis: Summarize your data using descriptive statistics / summary statistics and visualizations relevant to your questions or ones that highlight some interesting insight. Additional plots not relevant to your research question can be included in the appendix.

4. Data Analysis: Pick and perform two of the following techniques we have learned in class that help answer your question about the dataset: PCA, hypothesis testing / confidence interval, regression analysis (linear / logistic).

5. Conclusion: Summarize your findings and include a discussion of what you have learned about your data through this project. You may also want to include limitations of your approach and include ideas for possible future work. 

6. References: Include links that you have referenced for this project.
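For the Data Analysis item, PCA is one of the allowed techniques. The base R sketch below shows one way it might be set up with `prcomp`, using the first 20 rows of the attached Patient Diabetes Data table. It runs on the raw values without any clean-up, which is a simplification: zeros that may stand for missing measurements would distort the components in a real analysis. The object names are illustrative.

```r
# First 20 rows of the Patient Diabetes Data table attached to this assignment.
diabetes <- read.table(header = TRUE, text = "Pregnancies Glucose BloodPressure SkinThickness Insulin BMI DiabetesPedigreeFunction Age Outcome
6 148 72 35 0 33.6 0.627 50 1
1 85 66 29 0 26.6 0.351 31 0
8 183 64 0 0 23.3 0.672 32 1
1 89 66 23 94 28.1 0.167 21 0
0 137 40 35 168 43.1 2.288 33 1
5 116 74 0 0 25.6 0.201 30 0
3 78 50 32 88 31 0.248 26 1
10 115 0 0 0 35.3 0.134 29 0
2 197 70 45 543 30.5 0.158 53 1
8 125 96 0 0 0 0.232 54 1
4 110 92 0 0 37.6 0.191 30 0
10 168 74 0 0 38 0.537 34 1
10 139 80 0 0 27.1 1.441 57 0
1 189 60 23 846 30.1 0.398 59 1
5 166 72 19 175 25.8 0.587 51 1
7 100 0 0 0 30 0.484 32 1
0 118 84 47 230 45.8 0.551 31 1
7 107 74 0 0 29.6 0.254 31 1
1 103 30 38 83 43.3 0.183 33 0
1 115 70 30 96 34.6 0.529 32 1
")

# PCA on the eight measured variables; Outcome is held out as a label.
predictors <- diabetes[, names(diabetes) != "Outcome"]

# Center and scale so variables on large scales (e.g. Insulin) do not dominate.
pca <- prcomp(predictors, center = TRUE, scale. = TRUE)

# Proportion of total variance explained by each component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(cumsum(var_explained), 3))

# Loadings of the first two components: which variables drive them.
print(round(pca$rotation[, 1:2], 2))

# Scores on the first two components, summarized by Outcome group.
scores <- data.frame(pca$x[, 1:2], Outcome = factor(diabetes$Outcome))
print(aggregate(cbind(PC1, PC2) ~ Outcome, data = scores, FUN = mean))
```

The cumulative variance output helps decide how many components to keep, and the loadings are what the write-up would interpret in relation to the research question.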

Patient Diabetes Data (1)

Pregnancies Glucose BloodPressure SkinThickness Insulin BMI DiabetesPedigreeFunction Age Outcome
6 148 72 35 0 33.6 0.627 50 1
1 85 66 29 0 26.6 0.351 31 0
8 183 64 0 0 23.3 0.672 32 1
1 89 66 23 94 28.1 0.167 21 0
0 137 40 35 168 43.1 2.288 33 1
5 116 74 0 0 25.6 0.201 30 0
3 78 50 32 88 31 0.248 26 1
10 115 0 0 0 35.3 0.134 29 0
2 197 70 45 543 30.5 0.158 53 1
8 125 96 0 0 0 0.232 54 1
4 110 92 0 0 37.6 0.191 30 0
10 168 74 0 0 38 0.537 34 1
10 139 80 0 0 27.1 1.441 57 0
1 189 60 23 846 30.1 0.398 59 1
5 166 72 19 175 25.8 0.587 51 1
7 100 0 0 0 30 0.484 32 1
0 118 84 47 230 45.8 0.551 31 1
7 107 74 0 0 29.6 0.254 31 1
1 103 30 38 83 43.3 0.183 33 0
1 115 70 30 96 34.6 0.529 32 1
3 126 88 41 235 39.3 0.704 27 0
8 99 84 0 0 35.4 0.388 50 0
7 196 90 0 0 39.8 0.451 41 1
9 119 80 35 0 29 0.263 29 1
11 143 94 33 146 36.6 0.254 51 1
10 125 70 26 115 31.1 0.205 41 1
7 147 76 0 0 39.4 0.257 43 1
1 97 66 15 140 23.2 0.487 22 0
13 145 82 19 110 22.2 0.245 57 0
5 117 92 0 0 34.1 0.337 38 0
5 109 75 26 0 36 0.546 60 0
3 158 76 36 245 31.6 0.851 28 1
3 88 58 11 54 24.8 0.267 22 0
6 92 92 0 0 19.9 0.188 28 0
10 122 78 31 0 27.6 0.512 45 0
4 103 60 33 192 24 0.966 33 0
11 138 76 0 0 33.2 0.42 35 0
9 102 76 37 0 32.9 0.665 46 1
2 90 68 42 0 38.2 0.503 27 1
4 111 72 47 207 37.1 1.39 56 1
3 180 64 25 70 34 0.271 26 0
7 133 84 0 0 40.2 0.696 37 0
7 106 92 18 0 22.7 0.235 48 0
9 171 110 24 240 45.4 0.721 54 1
7 159 64 0 0 27.4 0.294 40 0
0 180 66 39 0 42 1.893 25 1
1 146 56 0 0 29.7 0.564 29 0
2 71 70 27 0 28 0.586 22 0
7 103 66 32 0 39.1 0.344 31 1
7 105 0 0 0 0 0.305 24 0
1 103 80 11 82 19.4 0.491 22 0
1 101 50 15 36 24.2 0.526 26 0
5 88 66 21 23 24.4 0.342 30 0
8 176 90 34 300 33.7 0.467 58 1
7 150 66 42 342 34.7 0.718 42 0
1 73 50 10 0 23 0.248 21 0
7 187 68 39 304 37.7 0.254 41 1
0 100 88 60 110 46.8 0.962 31 0
0 146 82 0 0 40.5 1.781 44 0
0 105 64 41 142 41.5 0.173 22 0
2 84 0 0 0 0 0.304 21 0
8 133 72 0 0 32.9 0.27 39 1
5 44 62 0 0 25 0.587 36 0
2 141 58 34 128 25.4 0.699 24 0
7 114 66 0 0 32.8 0.258 42 1
5 99 74 27 0 29 0.203 32 0
0 109 88 30 0 32.5 0.855 38 1
2 109 92 0 0 42.7 0.845 54 0
1 95 66 13 38 19.6 0.334 25 0
4 146 85 27 100 28.9 0.189 27 0
2 100 66 20 90 32.9 0.867 28 1
5 139 64 35 140 28.6 0.411 26 0
13 126 90 0 0 43.4 0.583 42 1
4 129 86 20 270 35.1 0.231 23 0
1 79 75 30 0 32 0.396 22 0
1 0 48 20 0 24.7 0.14 22 0
7 62 78 0 0 32.6 0.391 41 0
5 95 72 33 0 37.7 0.37 27 0
0 131 0 0 0 43.2 0.27 26 1
2 112 66 22 0 25 0.307 24 0
3 113 44 13 0 22.4 0.14 22 0
2 74 0 0 0 0 0.102 22 0
7 83 78 26 71 29.3 0.767 36 0
0 101 65 28 0 24.6 0.237 22 0
5 137 108 0 0 48.8 0.227 37 1
2 110 74 29 125 32.4 0.698 27 0
13 106 72 54 0 36.6 0.178 45 0
2 100 68 25 71 38.5 0.324 26 0
15 136 70 32 110 37.1 0.153 43 1
1 107 68 19 0 26.5 0.165 24 0
1 80 55 0 0 19.1 0.258 21 0
4 123 80 15 176 32 0.443 34 0
7 81 78 40 48 46.7 0.261 42 0
4 134 72 0 0 23.8 0.277 60 1
2 142 82 18 64 24.7 0.761 21 0
6 144 72 27 228 33.9 0.255 40 0
2 92 62 28 0 31.6 0.13 24 0
1 71 48 18 76 20.4 0.323 22 0
6 93 50 30 64 28.7 0.356 23 0
1 122 90 51 220 49.7 0.325 31 1
1 163 72 0 0 39 1.222 33 1
1 151 60 0 0 26.1 0.179 22 0
0 125 96 0 0 22.5 0.262 21 0
1 81 72 18 40 26.6 0.283 24 0
2 85 65 0 0 39.6 0.93 27 0
1 126 56 29 152 28.7 0.801 21 0
1 96 122 0 0 22.4 0.207 27 0
4 144 58 28 140 29.5 0.287 37 0
3 83 58 31 18 34.3 0.336 25 0
0 95 85 25 36 37.4 0.247 24 1
3 171 72 33 135 33.3 0.199 24 1
8 155 62 26 495 34 0.543 46 1
1 89 76 34 37 31.2 0.192 23 0
4 76 62 0 0 34 0.391 25 0
7 160 54 32 175 30.5 0.588 39 1
4 146 92 0 0 31.2 0.539 61 1
5 124 74 0 0 34 0.22 38 1
5 78 48 0 0 33.7 0.654 25 0
4 97 60 23 0 28.2 0.443 22 0
4 99 76 15 51 23.2 0.223 21 0
0 162 76 56 100 53.2 0.759 25 1
6 111 64 39 0 34.2 0.26 24 0
2 107 74 30 100 33.6 0.404 23 0
5 132 80 0 0 26.8 0.186 69 0
0 113 76 0 0 33.3 0.278 23 1
1 88 30 42 99 55 0.496 26 1
3 120 70 30 135 42.9 0.452 30 0
1 118 58 36 94 33.3 0.261 23 0
1 117 88 24 145 34.5 0.403 40 1
0 105 84 0 0 27.9 0.741 62 1
4 173 70 14 168 29.7 0.361 33 1
9 122 56 0 0 33.3 1.114 33 1
3 170 64 37 225 34.5 0.356 30 1
8 84 74 31 0 38.3 0.457 39 0
2 96 68 13 49 21.1 0.647 26 0
2 125 60 20 140 33.8 0.088 31 0
0 100 70 26 50 30.8 0.597 21 0
0 93 60 25 92 28.7 0.532 22 0
0 129 80 0 0 31.2 0.703 29 0
5 105 72 29 325 36.9 0.159 28 0
3 128 78 0 0 21.1 0.268 55 0
5 106 82 30 0 39.5 0.286 38 0
2 108 52 26 63 32.5 0.318 22 0
10 108 66 0 0 32.4 0.272 42 1
4 154 62 31 284 32.8 0.237 23 0
0 102 75 23 0 0 0.572 21 0
9 57 80 37 0 32.8 0.096 41 0
2 106 64 35 119 30.5 1.4 34 0
5 147 78 0 0 33.7 0.218 65 0
2 90 70 17 0 27.3 0.085 22 0
1 136 74 50 204 37.4 0.399 24 0
4 114 65 0 0 21.9 0.432 37 0
9 156 86 28 155 34.3 1.189 42 1
1 153 82 42 485 40.6 0.687 23 0
8 188 78 0 0 47.9 0.137 43 1
7 152 88 44 0 50 0.337 36 1
2 99 52 15 94 24.6 0.637 21 0
1 109 56 21 135 25.2 0.833 23 0
2 88 74 19 53 29 0.229 22 0
17 163 72 41 114 40.9 0.817 47 1
4 151 90 38 0 29.7 0.294 36 0
7 102 74 40 105 37.2 0.204 45 0
0 114 80 34 285 44.2 0.167 27 0
2 100 64 23 0 29.7 0.368 21 0
0 131 88 0 0 31.6 0.743 32 1
6 104 74 18 156 29.9 0.722 41 1
3 148 66 25 0 32.5 0.256 22 0
4 120 68 0 0 29.6 0.709 34 0
4 110 66 0 0 31.9 0.471 29 0
3 111 90 12 78 28.4 0.495 29 0
6 102 82 0 0 30.8 0.18 36 1
6 134 70 23 130 35.4 0.542 29 1
2 87 0 23 0 28.9 0.773 25 0
1 79 60 42 48 43.5 0.678 23 0
2 75 64 24 55 29.7 0.37 33 0
8 179 72 42 130 32.7 0.719 36 1
6 85 78 0 0 31.2 0.382 42 0
0 129 110 46 130 67.1 0.319 26 1
5 143 78 0 0 45 0.19 47 0
5 130 82 0 0 39.1 0.956 37 1
6 87 80 0 0 23.2 0.084 32 0
0 119 64 18 92 34.9 0.725 23 0
1 0 74 20 23 27.7 0.299 21 0
5 73 60 0 0 26.8 0.268 27 0
4 141 74 0 0 27.6 0.244 40 0
7 194 68 28 0 35.9 0.745 41 1
8 181 68 36 495 30.1 0.615 60 1
1 128 98 41 58 32 1.321 33 1
8 109 76 39 114 27.9 0.64 31 1
5 139 80 35 160 31.6 0.361 25 1
3 111 62 0 0 22.6 0.142 21 0
9 123 70 44 94 33.1 0.374 40 0
7 159 66 0 0 30.4 0.383 36 1
11 135 0 0 0 52.3 0.578 40 1
8 85 55 20 0 24.4 0.136 42 0
5 158 84 41 210 39.4 0.395 29 1
1 105 58 0 0 24.3 0.187 21 0
3 107 62 13 48 22.9 0.678 23 1
4 109 64 44 99 34.8 0.905 26 1
4 148 60 27 318 30.9 0.15 29 1
0 113 80 16 0 31 0.874 21 0
1 138 82 0 0 40.1 0.236 28 0
0 108 68 20 0 27.3 0.787 32 0
2 99 70 16 44 20.4 0.235 27 0
6 103 72 32 190 37.7 0.324 55 0
5 111 72 28 0 23.9 0.407 27 0
8 196 76 29 280 37.5 0.605 57 1
5 162 104 0 0 37.7 0.151 52 1
1 96 64 27 87 33.2 0.289 21 0
7 184 84 33 0 35.5 0.355 41 1
2 81 60 22 0 27.7 0.29 25 0
0 147 85 54 0 42.8 0.375 24 0
7 179 95 31 0 34.2 0.164 60 0
0 140 65 26 130 42.6 0.431 24 1
9 112 82 32 175 34.2 0.26 36 1
12 151 70 40 271 41.8 0.742 38 1
5 109 62 41 129 35.8 0.514 25 1
6 125 68 30 120 30 0.464 32 0
5 85 74 22 0 29 1.224 32 1
5 112 66 0 0 37.8 0.261 41 1
0 177 60 29 478 34.6 1.072 21 1
2 158 90 0 0 31.6 0.805 66 1
7 119 0 0 0 25.2 0.209 37 0
7 142 60 33 190 28.8 0.687 61 0
1 100 66 15 56 23.6 0.666 26 0
1 87 78 27 32 34.6 0.101 22 0
0 101 76 0 0 35.7 0.198 26 0
3 162 52 38 0 37.2 0.652 24 1
4 197 70 39 744 36.7 2.329 31 0
0 117 80 31 53 45.2 0.089 24 0
4 142 86 0 0 44 0.645 22 1
6 134 80 37 370 46.2 0.238 46 1
1 79 80 25 37 25.4 0.583 22 0
4 122 68 0 0 35 0.394 29 0
3 74 68 28 45 29.7 0.293 23 0
4 171 72 0 0 43.6 0.479 26 1
7 181 84 21 192 35.9 0.586 51 1
0 179 90 27 0 44.1 0.686 23 1
9 164 84 21 0 30.8 0.831 32 1
0 104 76 0 0 18.4 0.582 27 0
1 91 64 24 0 29.2 0.192 21 0
4 91 70 32 88 33.1 0.446 22 0
3 139 54 0 0 25.6 0.402 22 1
6 119 50 22 176 27.1 1.318 33 1
2 146 76 35 194 38.2 0.329 29 0
9 184 85 15 0 30 1.213 49 1
10 122 68 0 0 31.2 0.258 41 0
0 165 90 33 680 52.3 0.427 23 0
9 124 70 33 402 35.4 0.282 34 0
1 111 86 19 0 30.1 0.143 23 0
9 106 52 0 0 31.2 0.38 42 0
2 129 84 0 0 28 0.284 27 0
2 90 80 14 55 24.4 0.249 24 0
0 86 68 32 0 35.8 0.238 25 0
12 92 62 7 258 27.6 0.926 44 1
1 113 64 35 0 33.6 0.543 21 1
3 111 56 39 0 30.1 0.557 30 0
2 114 68 22 0 28.7 0.092 25 0
1 193 50 16 375 25.9 0.655 24 0
11 155 76 28 150 33.3 1.353 51 1
3 191 68 15 130 30.9 0.299 34 0
3 141 0 0 0 30 0.761 27 1
4 95 70 32 0 32.1 0.612 24 0
3 142 80 15 0 32.4 0.2 63 0
4 123 62 0 0 32 0.226 35 1
5 96 74 18 67 33.6 0.997 43 0
0 138 0 0 0 36.3 0.933
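The table above is whitespace-delimited, so it can be read with `read.table`. The last row shown is cut off after seven values, and the sketch below shows one way to load such data without guessing the missing fields. It uses only the final three rows as they appear above. The names `raw_text`, `incomplete`, and `complete_rows` are placeholders for this example.

```r
# Header plus the last three rows of the table as they appear above.
# The final row has only seven of the nine fields.
raw_text <- "Pregnancies Glucose BloodPressure SkinThickness Insulin BMI DiabetesPedigreeFunction Age Outcome
4 123 62 0 0 32 0.226 35 1
5 96 74 18 67 33.6 0.997 43 0
0 138 0 0 0 36.3 0.933
"

# fill = TRUE pads short rows with NA instead of stopping with an error.
diabetes <- read.table(text = raw_text, header = TRUE, fill = TRUE)
print(diabetes)

# Identify rows with any missing field and keep only the complete ones.
incomplete <- !complete.cases(diabetes)
print(which(incomplete))
complete_rows <- diabetes[!incomplete, ]
print(nrow(complete_rows))
```

For a file on disk, the same call would take a file path in place of `text = raw_text`. Dropping the incomplete row is one option; going back to the original data source to recover it is usually better.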
