For each game, calculate the total global sales as well as the total sales (still for each game) in each of North America, Europe, Japan, and other parts of the world.

Question 1 (3 points)

Use the vgsales data from the file vgsales.xlsx. For each game, calculate the total global sales as well as the total sales (still for each game) in each of North America, Europe, Japan, and other parts of the world.  

Put these data together as one table in which the left-most column is the game name, the middle four columns are the total sales in NA, EU, JP, and other regions, and the right-most column is the total global sales. Sort the data by total global sales in descending order and show only the top 10 rows (i.e., the 10 games with the highest total global sales).

It is fine to take a screenshot of your data in RStudio, provided that the font is large enough that the TA can read it; you do not need to export the data from R and create a “pretty” table in another program.

Hint re importing data: To import the vgsales data into R, you can first convert the data to a CSV file and import the data as demonstrated in class with read_csv().  Or you can use the read_excel() function from the readxl package.
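As a minimal sketch of the CSV route, the block below writes a tiny made-up CSV to a temporary file so it runs without the course file, then reads it with readr's read_csv(). The three column names and values are placeholders, not the real vgsales contents; with the real data you would pass the path of your converted vgsales CSV instead.

```r
library(readr)

# Made-up stand-in for the converted vgsales CSV, written to a temp file
path <- tempfile(fileext = ".csv")
writeLines(c("Name,Platform,NA_Sales",
             "Game A,X,1.5",
             "Game B,Y,0.7"), path)

# read_csv() guesses column types (Name/Platform -> character, NA_Sales -> double)
vgsales <- read_csv(path)

# Alternative that reads the Excel file directly (needs the readxl package):
# vgsales <- readxl::read_excel("vgsales.xlsx")
```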

Hint re calculations: once the data are in R, you should be able to create the requested table with one set of piped-together commands. This is not a requirement; you will be awarded full credit as long as you create the requested table in R, any way you like.
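One way such a piped chain could look is sketched below with dplyr on a made-up four-row stand-in for vgsales. The column names (Name, Platform, NA_Sales, EU_Sales, JP_Sales, Other_Sales, Global_Sales) are an assumption based on the commonly distributed version of this dataset, so check them with names() on the real data. Grouping by Name matters under the assumption that each row is one game on one platform, so a game sold on several platforms spans several rows.

```r
library(dplyr)

# Made-up stand-in data: one row per game per platform (assumed layout)
vgsales <- tibble(
  Name         = c("Game A", "Game A", "Game B", "Game C"),
  Platform     = c("X", "Y", "X", "Y"),
  NA_Sales     = c(1.0, 0.5, 2.0, 0.1),
  EU_Sales     = c(0.5, 0.5, 1.0, 0.1),
  JP_Sales     = c(0.2, 0.0, 0.1, 0.3),
  Other_Sales  = c(0.1, 0.1, 0.2, 0.0),
  Global_Sales = c(1.8, 1.1, 3.3, 0.5)
)

top_games <- vgsales %>%
  group_by(Name) %>%                      # one group per game, across platforms
  summarise(
    NA_total     = sum(NA_Sales),
    EU_total     = sum(EU_Sales),
    JP_total     = sum(JP_Sales),
    Other_total  = sum(Other_Sales),
    Global_total = sum(Global_Sales)      # right-most column
  ) %>%
  arrange(desc(Global_total)) %>%         # highest global sales first
  slice_head(n = 10)                      # keep at most the top 10 games

print(top_games)
```

With only three toy games the result has three rows; on the real data slice_head(n = 10) keeps ten.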

Question 2 (5 points)

Import the Order and OrderDetail datasets from order.csv and orderdetail.csv. Use these datasets to calculate the total revenue (in millions) per shipping region. Also calculate the percent of revenue for each shipping region.  Order the rows by total revenue such that the shipping region with the largest total revenue is at the top.

Revenue can be calculated as Unit Price * Quantity * (1 – Discount).
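As a quick worked check of the formula, here is a small helper; the function name line_revenue is hypothetical, and it assumes Discount is stored as a fraction (0.15 meaning 15% off). Because R arithmetic is vectorized, the same expression works on whole columns inside mutate().

```r
# Hypothetical helper: revenue for one or many line items.
# Assumes discount is a fraction between 0 and 1.
line_revenue <- function(unit_price, quantity, discount) {
  unit_price * quantity * (1 - discount)
}

# 10 units at a unit price of 20 with a 15% discount: 20 * 10 * 0.85
line_revenue(20, 10, 0.15)  # 170

# Vectorized over several line items at once
line_revenue(c(20, 5), c(10, 4), c(0.15, 0))
```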

Your table should have nine rows (one per shipping region) and three columns (the shipping region, the total revenue in millions, and revenue per region as a percent of all revenue).  
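A minimal sketch of the join-then-aggregate pipeline, using dplyr and tiny made-up tables. The column names (id and shipregion in the orders table; orderid, unitprice, quantity, discount in the details table) are assumptions for illustration, so match them to the real headers. The toy data has two regions rather than nine.

```r
library(dplyr)

# Real data would come from, e.g.:
# orders  <- readr::read_csv("order.csv")
# details <- readr::read_csv("orderdetail.csv")
# Made-up stand-ins with assumed column names:
orders <- tibble(
  id         = c(1, 2, 3),
  shipregion = c("North", "South", "North")
)
details <- tibble(
  orderid   = c(1, 1, 2, 3),
  unitprice = c(100, 50, 200, 10),
  quantity  = c(10, 4, 5, 20),
  discount  = c(0, 0.5, 0.1, 0)
)

region_revenue <- details %>%
  inner_join(orders, by = c("orderid" = "id")) %>%    # attach each line item's region
  mutate(revenue = unitprice * quantity * (1 - discount)) %>%
  group_by(shipregion) %>%
  summarise(revenue_millions = sum(revenue) / 1e6) %>%
  mutate(pct_of_total = 100 * revenue_millions / sum(revenue_millions)) %>%
  arrange(desc(revenue_millions))                     # largest region on top

print(region_revenue)
```

The percent column is computed after summarise(), so sum(revenue_millions) there is the grand total across all regions.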

Question 3 (5 points)

Continue to use the Order and OrderDetail data from question 2, as well as the revenue values you calculated. Use the “unaggregated” dataset with 621,883 rows (each row is a line item from an order). Drop the 73 rows with a missing (i.e., NA) value for the shipping year.

Then use facet_grid() to create a grid of histograms on this line-item data. 

  • Each plot in the grid is a histogram of log(revenue). Revenue has a strongly positively (right) skewed distribution, so we plot its natural log (R's log() function computes the natural log).
  • Each row of the grid is a shipping region.
  • Each column of the grid is the year in which an order was shipped.

Before creating the plot, you will need to create a ship_year variable. To do that, use the following code to help you:

mutate(ship_date = as.Date(shippeddate, format = "%m/%d/%Y"),
       ship_year = lubridate::year(ship_date))

This code converts the shippeddate variable into a date with the as.Date() function. Because as.Date() needs to know how the date is currently written, we pass it the format string "%m/%d/%Y" (month/day/four-digit year). We then use the year() function from the lubridate package to “extract” the year values.
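To see what the two new columns contain, the sketch below applies the same mutate() step to three made-up shippeddate strings, one of them missing. It assumes the dplyr and lubridate packages are installed; the values are illustrative only.

```r
library(dplyr)

# Made-up ship dates in month/day/year form; NA mimics a missing ship date
line_items <- tibble(shippeddate = c("7/4/2013", "12/31/2014", NA))

line_items <- line_items %>%
  mutate(ship_date = as.Date(shippeddate, format = "%m/%d/%Y"),  # character -> Date
         ship_year = lubridate::year(ship_date))                 # Date -> year number

print(line_items)
# A missing shippeddate gives NA for both ship_date and ship_year
```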

You can use code like filter(!is.na(ship_year)) to remove rows where the shipping year has a missing value.
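Putting the filter and the plot together, a minimal sketch with ggplot2 is below. The data are simulated (log-normal, so right-skewed like revenue) with made-up regions and years, and the column names shipregion, ship_year, and revenue are assumptions; bins = 20 is an arbitrary choice. In facet_grid(), the variable left of ~ defines the rows and the one on the right defines the columns.

```r
library(dplyr)
library(ggplot2)

# Simulated line items (made-up): right-skewed revenue, two regions, two years
set.seed(1)
items <- tibble(
  shipregion = rep(c("North", "South"), each = 40),
  ship_year  = rep(c(2013, 2014), times = 40),
  revenue    = rlnorm(80, meanlog = 6, sdlog = 1)
)
items$ship_year[c(3, 17)] <- NA          # a couple of missing ship years

plot_data <- items %>% filter(!is.na(ship_year))   # drop rows with no ship year

p <- ggplot(plot_data, aes(x = log(revenue))) +
  geom_histogram(bins = 20) +
  facet_grid(shipregion ~ ship_year) +   # rows = region, columns = ship year
  labs(x = "log(revenue)", y = "Number of line items")

print(p)
```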

Question 4 (5 points)

Use the smartphone customer dataset.  Scale the 6 phone-use variables (gaming, chat, maps, video, social, and reading).  Then run k-means on all 6 variables with K=3.  Answer the following two questions:

  • How many customers are in each cluster?
  • What is the within-cluster sum of squares value?
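A minimal base-R sketch of the scale-then-cluster steps is below, run on random made-up data standing in for the smartphone customer file (only the six column names come from the assignment). k-means starts from random centers, so set.seed() makes the run reproducible; nstart = 25 (keep the best of 25 random starts) is an assumed setting, not one specified in the assignment.

```r
# Random stand-in for the smartphone data; only the column names are from the text
set.seed(42)
phones <- data.frame(
  gaming  = rnorm(150, 60, 20),
  chat    = rnorm(150, 40, 15),
  maps    = rnorm(150, 10, 5),
  video   = rnorm(150, 50, 25),
  social  = rnorm(150, 45, 20),
  reading = rnorm(150, 20, 10)
)

use_vars <- c("gaming", "chat", "maps", "video", "social", "reading")
scaled <- scale(phones[, use_vars])   # center each column and divide by its sd

set.seed(123)                         # k-means uses random starting centers
km <- kmeans(scaled, centers = 3, nstart = 25)

km$size           # number of customers in each cluster
km$tot.withinss   # total within-cluster sum of squares (= sum(km$withinss))
```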

Question 5 (2 points)

Plot gaming vs. reading minutes as a scatter plot and color the points according to their cluster assignment from question 4. Why do the clusters “overlap” in this plot (i.e., points from different clusters “mix” near the cluster boundaries), when the clusters did not overlap in the k-means example we did in class?
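For the plotting step only, a self-contained ggplot2 sketch is below. It regenerates random made-up data and re-runs kmeans() so it can run on its own (nstart = 25 and the seeds are assumptions); with the real data you would reuse your question 4 result. Converting the cluster numbers to a factor gives each cluster a distinct discrete color instead of a continuous gradient.

```r
library(ggplot2)

# Random stand-in data with the six assignment variable names
set.seed(42)
vars <- c("gaming", "chat", "maps", "video", "social", "reading")
phones <- as.data.frame(matrix(rnorm(150 * 6, mean = 30, sd = 10), ncol = 6,
                               dimnames = list(NULL, vars)))

# Cluster on all six scaled variables, as in question 4
set.seed(123)
km <- kmeans(scale(phones[, vars]), centers = 3, nstart = 25)
phones$cluster <- factor(km$cluster)   # discrete colors per cluster

# Plot unscaled gaming vs. reading minutes, colored by cluster
p <- ggplot(phones, aes(x = gaming, y = reading, color = cluster)) +
  geom_point(alpha = 0.7) +
  labs(x = "Gaming minutes", y = "Reading minutes", color = "Cluster")

print(p)
```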
