For each game, calculate the total global sales as well as the total sales (still for each game) in each of North America, Europe, Japan, and other parts of the world.

Posted 17 Aug at 15:05h in business

Question 1 (3 points)

Use the vgsales data from the file vgsales.xlsx. For each game, calculate the total global sales as well as the total sales (still for each game) in each of North America, Europe, Japan, and other parts of the world.

Put these data together as one table where the left-most column is the game name, the middle 4 columns are the total sales in NA, EU, JP, and other sales, and the right-most column is the total global sales. Sort the data by total global sales in descending order. Show only the top-10 rows (ie, the 10 games with the highest total global sales).

It is fine to take a screenshot of your data in RStudio, provided that the font is large enough that the TA can read it; you do not need to export the data from R and create a “pretty” table in another program.

Hint re importing data: To import the vgsales data into R, you can first convert the data to a CSV file and import the data as demonstrated in class with read_csv(). Or you can use the read_excel() function from the readxl package.
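Both import routes can be wrapped in one small function. The sketch below is only an illustration: the helper name load_vgsales() is hypothetical, and it uses base R's read.csv() in place of readr's read_csv() from class so that it runs without extra packages. readxl::read_excel() is only called when the file extension is .xlsx or .xls.

```r
# Hypothetical helper: choose a reader based on the file extension.
# readxl is only needed (and only checked for) when reading an Excel file.
load_vgsales <- function(path) {
  ext <- tolower(tools::file_ext(path))
  if (ext %in% c("xlsx", "xls")) {
    if (!requireNamespace("readxl", quietly = TRUE)) {
      stop("Install the readxl package to read Excel files.")
    }
    as.data.frame(readxl::read_excel(path))
  } else if (ext == "csv") {
    utils::read.csv(path, stringsAsFactors = FALSE)
  } else {
    stop("Unsupported file type: ", ext)
  }
}

# Usage with a tiny hypothetical CSV written to a temporary file
tmp <- tempfile(fileext = ".csv")
writeLines(c("Name,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales",
             "Game A,1.0,0.5,0.2,0.1,1.8"), tmp)
vg <- load_vgsales(tmp)
print(vg)
```

With the real file, the call would simply be load_vgsales("vgsales.xlsx") or load_vgsales("vgsales.csv").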

Hint re calculations: once the data are in R, you should be able to create the requested table with one set of piped-together commands. This is not a requirement; you will be awarded full credit as long as you create the requested table in R using any approach you like.
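A minimal sketch of the per-game totals, assuming the column names Name, NA_Sales, EU_Sales, JP_Sales, Other_Sales, and Global_Sales (as in the publicly available vgsales file; check your copy). Because the same game can appear once per platform, rows are summed by game name. The data below are made-up toy values. In class the same steps would likely be one dplyr pipe with group_by() and summarise(); this version uses base R's aggregate() so it runs without extra packages.

```r
# Hypothetical toy data: one row per game per platform, so a game name
# can appear several times and its sales must be summed.
vgsales <- data.frame(
  Name         = c("Game A", "Game A", "Game B", "Game C"),
  NA_Sales     = c(1.0, 0.5, 2.0, 0.3),
  EU_Sales     = c(0.5, 0.2, 1.0, 0.1),
  JP_Sales     = c(0.2, 0.1, 0.0, 0.4),
  Other_Sales  = c(0.1, 0.1, 0.2, 0.0),
  Global_Sales = c(1.8, 0.9, 3.2, 0.8)
)

sales_cols <- c("NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales", "Global_Sales")

# Sum each regional column and the global column within each game name;
# the result has Name first, then the 4 regions, then Global_Sales.
totals <- aggregate(vgsales[sales_cols], by = list(Name = vgsales$Name), FUN = sum)

# Sort by total global sales, largest first, and keep at most 10 rows
totals <- totals[order(totals$Global_Sales, decreasing = TRUE), ]
top10 <- head(totals, 10)
rownames(top10) <- NULL
print(top10)
```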

Question 2 (5 points)

Import the Order and OrderDetail datasets from order.csv and orderdetail.csv. Use these datasets to calculate the total revenue (in millions) per shipping region. Also calculate the percent of revenue for each shipping region. Order the rows by total revenue such that the shipping region with the largest total revenue is at the top.

Revenue can be calculated as Unit Price * Quantity * (1 – Discount).

Your table should have nine rows (one per shipping region) and three columns (the shipping region, the total revenue in millions, and revenue per region as a percent of all revenue).
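The steps for this question (join, per-item revenue, per-region totals, percent share, ordering) can be sketched in base R as below. The column names orderid, shipregion, unitprice, quantity, and discount, and all values, are hypothetical; substitute the names used in order.csv and orderdetail.csv.

```r
# Hypothetical toy versions of the Order and OrderDetail tables
orders <- data.frame(
  orderid    = c(1, 2, 3),
  shipregion = c("North America", "Western Europe", "North America")
)
details <- data.frame(
  orderid   = c(1, 1, 2, 3),
  unitprice = c(10, 20, 100, 50),
  quantity  = c(2, 1, 3, 4),
  discount  = c(0, 0.5, 0.1, 0)
)

# Attach each line item to its order's shipping region (inner join on orderid)
items <- merge(details, orders, by = "orderid")

# Revenue per line item: Unit Price * Quantity * (1 - Discount)
items$revenue <- items$unitprice * items$quantity * (1 - items$discount)

# Total revenue per region, its percent of all revenue, and the total in millions
by_region <- aggregate(revenue ~ shipregion, data = items, FUN = sum)
by_region$pct_revenue <- 100 * by_region$revenue / sum(by_region$revenue)
by_region$revenue_millions <- by_region$revenue / 1e6

# Largest region first; keep only the three requested columns
by_region <- by_region[order(by_region$revenue, decreasing = TRUE),
                       c("shipregion", "revenue_millions", "pct_revenue")]
print(by_region)
```

In this toy example Western Europe has 270 in revenue and North America has 20 + 10 + 200 = 230, so Western Europe comes first with 54% of the total.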

Question 3 (5 points)

Continue to use the Order and OrderDetail data from question 2, as well as the revenue values you calculated. Use the “unaggregated” dataset with 621,883 rows (each row is a line item from an order). Drop the 73 rows with a missing (ie, N/A) value for the shipping year.

Then use facet_grid() to create a grid of histograms on this line-item data.

Each plot in the grid will be a histogram of log(revenue).

Revenue has a strongly positively (right-) skewed distribution, so we plot the natural log of revenue (R's log() function computes the natural log by default).

Each row of the grid should be a Shipping Region.

Each column of the grid will be the year in which an order was shipped.

Before creating the plot, you will need to create a ship_year variable. To do that, use the following code to help you:

mutate(ship_date = as.Date(shippeddate, "%m/%d/%Y"),
       ship_year = lubridate::year(ship_date))

This code converts the shippeddate variable into a date using the as.Date() function. We have to tell as.Date() how the date is currently written, so we pass the format "%m/%d/%Y" (month/day/four-digit year). Then we use the year() function from the lubridate package to extract the year.
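To see what the format string does, and where the missing shipping years come from, here is a small sketch on made-up shippeddate strings. An empty or missing string cannot be parsed, so as.Date() returns NA for it. The year is extracted with base R's format() here instead of lubridate::year(), purely so the example needs no extra package; both give the year as a number.

```r
# Hypothetical shippeddate strings; "" and NA stand in for unshipped orders
shippeddate <- c("7/16/2013", "12/1/2014", "", NA)

# "%m/%d/%Y" = month/day/four-digit year (%y would mean a two-digit year)
ship_date <- as.Date(shippeddate, "%m/%d/%Y")

# Base-R alternative to lubridate::year(): format as the year, then convert
ship_year <- as.integer(format(ship_date, "%Y"))
print(ship_year)
```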

You can use code like filter(!is.na(ship_year)) to remove rows where the shipping year has a missing value.
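Once ship_year exists, the remaining steps are: drop rows with a missing year, take the log of revenue, and facet by region (rows) and year (columns). The sketch below uses simulated line items (hypothetical column names shipregion, ship_year, revenue) and base-R row filtering, which does the same thing as filter(!is.na(ship_year)). The plot itself is drawn only if ggplot2 is installed; in facet_grid(rows ~ cols), the variable on the left of ~ defines the rows.

```r
# Hypothetical simulated line items; NA in ship_year = order not yet shipped
set.seed(1)
items <- data.frame(
  shipregion = rep(c("North America", "Western Europe"), each = 50),
  ship_year  = rep(c(2013, 2014, NA, 2015, 2016), times = 20),
  revenue    = rlnorm(100, meanlog = 5, sdlog = 1)  # positive, right-skewed
)

# Drop line items with a missing shipping year
items <- items[!is.na(items$ship_year), ]

# Natural log of revenue (log() uses base e by default)
items$log_revenue <- log(items$revenue)

# Grid of histograms: one row per shipping region, one column per ship year
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(items, aes(x = log_revenue)) +
    geom_histogram(bins = 20) +
    facet_grid(shipregion ~ ship_year)
  print(p)
}
```

If any real line item had a revenue of exactly 0, log() would return -Inf for it, so it is worth checking for zeros before plotting.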

Question 4 (5 points)

Use the smartphone customer dataset. Scale the 6 phone-use variables (gaming, chat, maps, video, social, and reading). Then run k-means on all 6 variables with K=3. Answer the following two questions:

How many customers are in each cluster?
What is the within-cluster sum of squares value?
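A minimal sketch of the scaling and clustering steps with base R's scale() and kmeans(), run on simulated data because the smartphone dataset is not included here. The data frame name phone and all values are hypothetical. kmeans() reports cluster sizes in km$size, the per-cluster sums of squares in km$withinss, and their total in km$tot.withinss.

```r
# Hypothetical simulated customers: 6 phone-use variables (minutes per day)
set.seed(42)
n <- 300
phone <- data.frame(
  gaming  = c(rnorm(100, 60, 10), rnorm(100, 10, 5),  rnorm(100, 20, 5)),
  chat    = c(rnorm(100, 20, 5),  rnorm(100, 50, 10), rnorm(100, 15, 5)),
  maps    = rnorm(n, 10, 3),
  video   = c(rnorm(100, 30, 8),  rnorm(100, 20, 5),  rnorm(100, 60, 10)),
  social  = c(rnorm(100, 15, 5),  rnorm(100, 70, 10), rnorm(100, 25, 5)),
  reading = c(rnorm(100, 5, 2),   rnorm(100, 10, 3),  rnorm(100, 40, 8))
)

# Standardize each variable to mean 0, sd 1 so no variable dominates distances
phone_scaled <- scale(phone)

# k-means with K = 3; nstart = 25 runs several random starts and keeps the best
km <- kmeans(phone_scaled, centers = 3, nstart = 25)

print(km$size)          # number of customers in each cluster
print(km$tot.withinss)  # total within-cluster sum of squares
```

Cluster numbers are arbitrary labels, and results can change with the random starts, so setting a seed (as above) makes the answers reproducible.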
Question 5 (2 points)

Plot gaming vs reading minutes as a scatter plot and color the points according to their cluster assignment from question 4. Why do the clusters “overlap” in this plot (ie, the points “mix” near the cluster boundaries), when the clusters did not overlap in the k-means example we did in class?
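One way to draw this plot in base R is sketched below on random stand-in data (the object names and values are hypothetical). The point to notice is that the cluster labels come from kmeans() on all six scaled variables, while the axes show only two of the original variables.

```r
# Hypothetical stand-in data: 200 customers, 6 phone-use variables (minutes)
set.seed(7)
vars <- c("gaming", "chat", "maps", "video", "social", "reading")
phone <- matrix(rexp(200 * 6, rate = 1 / 30), ncol = 6,
                dimnames = list(NULL, vars))

# Cluster on all 6 scaled variables, as in question 4
km <- kmeans(scale(phone), centers = 3, nstart = 25)

# Scatter plot of 2 of the 6 variables, colored by cluster assignment
plot(phone[, "gaming"], phone[, "reading"],
     col = km$cluster, pch = 19,
     xlab = "gaming (minutes)", ylab = "reading (minutes)")
legend("topright", legend = paste("Cluster", 1:3), col = 1:3, pch = 19)
```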
