- Business Task
- Entity Relationship Diagram
- Solution
Please note that all the information regarding the case study has been sourced from the following link: here.
I also published this on Medium!
Danny is expanding his new Pizza Empire and at the same time, he wants to Uberize it, so Pizza Runner was launched!
Danny started by recruiting “runners” to deliver fresh pizza from Pizza Runner Headquarters (otherwise known as Danny’s house) and also maxed out his credit card to pay freelance developers to build a mobile app to accept orders from customers.
Looking at the customer_orders
table below, we can see that there are
- In the
exclusions
column, there are missing/ blank spaces ' ' and null values. - In the
extras
column, there are missing/ blank spaces ' ' and null values.
Our course of action to clean the table:
- Create a temporary table with all the columns
- Remove null values in
exlusions
andextras
columns and replace with blank space ' '.
CREATE TEMP TABLE customer_orders_temp AS
SELECT
order_id,
customer_id,
pizza_id,
CASE
WHEN exclusions IS null OR exclusions LIKE 'null' THEN ' '
ELSE exclusions
END AS exclusions,
CASE
WHEN extras IS NULL or extras LIKE 'null' THEN ' '
ELSE extras
END AS extras,
order_time
FROM pizza_runner.customer_orders;
This is how the clean customers_orders_temp
table looks like and we will use this table to run all our queries.
Looking at the runner_orders
table below, we can see that there are
- In the
exclusions
column, there are missing/ blank spaces ' ' and null values. - In the
extras
column, there are missing/ blank spaces ' ' and null values
Our course of action to clean the table:
- In
pickup_time
column, remove nulls and replace with blank space ' '. - In
distance
column, remove "km" and nulls and replace with blank space ' '. - In
duration
column, remove "minutes", "minute" and nulls and replace with blank space ' '. - In
cancellation
column, remove NULL and null and and replace with blank space ' '.
CREATE TEMP TABLE runner_orders_temp AS
SELECT
order_id,
runner_id,
CASE
WHEN pickup_time LIKE 'null' THEN ' '
ELSE pickup_time
END AS pickup_time,
CASE
WHEN distance LIKE 'null' THEN ' '
WHEN distance LIKE '%km' THEN TRIM('km' from distance)
ELSE distance
END AS distance,
CASE
WHEN duration LIKE 'null' THEN ' '
WHEN duration LIKE '%mins' THEN TRIM('mins' from duration)
WHEN duration LIKE '%minute' THEN TRIM('minute' from duration)
WHEN duration LIKE '%minutes' THEN TRIM('minutes' from duration)
ELSE duration
END AS duration,
CASE
WHEN cancellation IS NULL or cancellation LIKE 'null' THEN ' '
ELSE cancellation
END AS cancellation
FROM pizza_runner.runner_orders;
Then, we alter the pickup_time
, distance
and duration
columns to the correct data type.
ALTER TABLE runner_orders_temp
ALTER COLUMN pickup_time DATETIME,
ALTER COLUMN distance FLOAT,
ALTER COLUMN duration INT;
This is how the clean runner_orders_temp
table looks like and we will use this table to run all our queries.
SELECT COUNT(*) AS pizza_order_count
FROM customer_orders_temp;
Answer:
- Total of 14 pizzas were ordered.
SELECT COUNT(DISTINCT order_id) AS unique_order_count
FROM customer_orders_temp;
Answer:
- There are 10 unique customer orders.
SELECT
runner_id,
COUNT(order_id) AS successful_orders
FROM #runner_orders
WHERE distance != 0
GROUP BY runner_id;
Answer:
- Runner 1 has 4 successful delivered orders.
- Runner 2 has 3 successful delivered orders.
- Runner 3 has 1 successful delivered order.
SELECT
p.pizza_name,
COUNT(c.pizza_id) AS delivered_pizza_count
FROM #customer_orders AS c
JOIN #runner_orders AS r
ON c.order_id = r.order_id
JOIN pizza_names AS p
ON c.pizza_id = p.pizza_id
WHERE r.distance != 0
GROUP BY p.pizza_name;
Answer:
- There are 9 delivered Meatlovers pizzas and 3 Vegetarian pizzas.
SELECT
c.customer_id,
p.pizza_name,
COUNT(p.pizza_name) AS order_count
FROM #customer_orders AS c
JOIN pizza_names AS p
ON c.pizza_id= p.pizza_id
GROUP BY c.customer_id, p.pizza_name
ORDER BY c.customer_id;
Answer:
- Customer 101 ordered 2 Meatlovers pizzas and 1 Vegetarian pizza.
- Customer 102 ordered 2 Meatlovers pizzas and 2 Vegetarian pizzas.
- Customer 103 ordered 3 Meatlovers pizzas and 1 Vegetarian pizza.
- Customer 104 ordered 1 Meatlovers pizza.
- Customer 105 ordered 1 Vegetarian pizza.
WITH pizza_count_cte AS
(
SELECT
c.order_id,
COUNT(c.pizza_id) AS pizza_per_order
FROM #customer_orders AS c
JOIN #runner_orders AS r
ON c.order_id = r.order_id
WHERE r.distance != 0
GROUP BY c.order_id
)
SELECT
MAX(pizza_per_order) AS pizza_count
FROM pizza_count_cte;
Answer:
- Maximum number of pizza delivered in a single order is 3 pizzas.
SELECT
c.customer_id,
SUM(
CASE WHEN c.exclusions <> ' ' OR c.extras <> ' ' THEN 1
ELSE 0
END) AS at_least_1_change,
SUM(
CASE WHEN c.exclusions = ' ' AND c.extras = ' ' THEN 1
ELSE 0
END) AS no_change
FROM #customer_orders AS c
JOIN #runner_orders AS r
ON c.order_id = r.order_id
WHERE r.distance != 0
GROUP BY c.customer_id
ORDER BY c.customer_id;
Answer:
- Customer 101 and 102 likes his/her pizzas per the original recipe.
- Customer 103, 104 and 105 have their own preference for pizza topping and requested at least 1 change (extra or exclusion topping) on their pizza.
SELECT
SUM(
CASE WHEN exclusions IS NOT NULL AND extras IS NOT NULL THEN 1
ELSE 0
END) AS pizza_count_w_exclusions_extras
FROM #customer_orders AS c
JOIN #runner_orders AS r
ON c.order_id = r.order_id
WHERE r.distance >= 1
AND exclusions <> ' '
AND extras <> ' ';
Answer:
- Only 1 pizza delivered that had both extra and exclusion topping. That’s one fussy customer!
SELECT
DATEPART(HOUR, [order_time]) AS hour_of_day,
COUNT(order_id) AS pizza_count
FROM #customer_orders
GROUP BY DATEPART(HOUR, [order_time]);
Answer:
- Highest volume of pizza ordered is at 13 (1:00 pm), 18 (6:00 pm) and 21 (9:00 pm).
- Lowest volume of pizza ordered is at 11 (11:00 am), 19 (7:00 pm) and 23 (11:00 pm).
SELECT
FORMAT(DATEADD(DAY, 2, order_time),'dddd') AS day_of_week, -- add 2 to adjust 1st day of the week as Monday
COUNT(order_id) AS total_pizzas_ordered
FROM #customer_orders
GROUP BY FORMAT(DATEADD(DAY, 2, order_time),'dddd');
Answer:
- There are 5 pizzas ordered on Friday and Monday.
- There are 3 pizzas ordered on Saturday.
- There is 1 pizza ordered on Sunday.
SELECT
DATEPART(WEEK, registration_date) AS registration_week,
COUNT(runner_id) AS runner_signup
FROM runners
GROUP BY DATEPART(WEEK, registration_date);
Answer:
- On Week 1 of Jan 2021, 2 new runners signed up.
- On Week 2 and 3 of Jan 2021, 1 new runner signed up.
2. What was the average time in minutes it took for each runner to arrive at the Pizza Runner HQ to pickup the order?
WITH time_taken_cte AS
(
SELECT
c.order_id,
c.order_time,
r.pickup_time,
DATEDIFF(MINUTE, c.order_time, r.pickup_time) AS pickup_minutes
FROM #customer_orders AS c
JOIN #runner_orders AS r
ON c.order_id = r.order_id
WHERE r.distance != 0
GROUP BY c.order_id, c.order_time, r.pickup_time
)
SELECT
AVG(pickup_minutes) AS avg_pickup_minutes
FROM time_taken_cte
WHERE pickup_minutes > 1;
Answer:
- The average time taken in minutes by runners to arrive at Pizza Runner HQ to pick up the order is 15 minutes.
WITH prep_time_cte AS
(
SELECT
c.order_id,
COUNT(c.order_id) AS pizza_order,
c.order_time,
r.pickup_time,
DATEDIFF(MINUTE, c.order_time, r.pickup_time) AS prep_time_minutes
FROM #customer_orders AS c
JOIN #runner_orders AS r
ON c.order_id = r.order_id
WHERE r.distance != 0
GROUP BY c.order_id, c.order_time, r.pickup_time
)
SELECT
pizza_order,
AVG(prep_time_minutes) AS avg_prep_time_minutes
FROM prep_time_cte
WHERE prep_time_minutes > 1
GROUP BY pizza_order;
Answer:
- On average, a single pizza order takes 12 minutes to prepare.
- An order with 3 pizzas takes 30 minutes at an average of 10 minutes per pizza.
- It takes 16 minutes to prepare an order with 2 pizzas which is 8 minutes per pizza — making 2 pizzas in a single order the ultimate efficiency rate.
SELECT
c.customer_id,
AVG(r.distance) AS avg_distance
FROM #customer_orders AS c
JOIN #runner_orders AS r
ON c.order_id = r.order_id
WHERE r.duration != 0
GROUP BY c.customer_id;
Answer:
(Assuming that distance is calculated from Pizza Runner HQ to customer’s place)
- Customer 104 stays the nearest to Pizza Runner HQ at average distance of 10km, whereas Customer 105 stays the furthest at 25km.
Edit 08/10/21: Thanks to my reader, Ankush Taneja on Medium who caught my mistake. I've amended to the correct solution. Also, I was doing this case study using SQL Server few months ago, but I'm using PostgreSQL on SQLpad now so there could be a slight difference to the syntax.
Firstly, I'm going to filter results with non-null duration first just to have a feel. You can skip this step and go straight to the answer.
SELECT
order_id, duration
FROM #runner_orders
WHERE duration not like ' ';
SELECT MAX(duration::NUMERIC) - MIN(duration::NUMERIC) AS delivery_time_difference
FROM runner_orders2
where duration not like ' ';
Answer:
- The difference between longest (40 minutes) and shortest (10 minutes) delivery time for all orders is 30 minutes.
6. What was the average speed for each runner for each delivery and do you notice any trend for these values?
SELECT
r.runner_id,
c.customer_id,
c.order_id,
COUNT(c.order_id) AS pizza_count,
r.distance, (r.duration / 60) AS duration_hr ,
ROUND((r.distance/r.duration * 60), 2) AS avg_speed
FROM #runner_orders AS r
JOIN #customer_orders AS c
ON r.order_id = c.order_id
WHERE distance != 0
GROUP BY r.runner_id, c.customer_id, c.order_id, r.distance, r.duration
ORDER BY c.order_id;
Answer:
(Average speed = Distance in km / Duration in hour)
- Runner 1’s average speed runs from 37.5km/h to 60km/h.
- Runner 2’s average speed runs from 35.1km/h to 93.6km/h. Danny should investigate Runner 2 as the average speed has a 300% fluctuation rate!
- Runner 3’s average speed is 40km/h
SELECT
runner_id,
ROUND(100 * SUM(
CASE WHEN distance = 0 THEN 0
ELSE 1 END) / COUNT(*), 0) AS success_perc
FROM #runner_orders
GROUP BY runner_id;
Answer:
- Runner 1 has 100% successful delivery.
- Runner 2 has 75% successful delivery.
- Runner 3 has 50% successful delivery
(It’s not right to attribute successful delivery to runners as order cancellations are out of the runner’s control.)
WITH toppings_cte AS (
SELECT
pizza_id,
REGEXP_SPLIT_TO_TABLE(toppings, '[,\s]+')::INTEGER AS topping_id
FROM pizza_runner.pizza_recipes)
SELECT
t.topping_id, pt.topping_name,
COUNT(t.topping_id) AS topping_count
FROM toppings_cte t
INNER JOIN pizza_runner.pizza_toppings pt
ON t.topping_id = pt.topping_id
GROUP BY t.topping_id, pt.topping_name
ORDER BY topping_count DESC;
Solution
4. Generate an order item for each record in the customers_orders table in the format of one of the following:
- Meat Lovers
- Meat Lovers - Exclude Beef
- Meat Lovers - Extra Bacon
- Meat Lovers - Exclude Cheese, Bacon - Extra Mushroom, Peppers
5. Generate an alphabetically ordered comma separated ingredient list for each pizza order from the customer_orders table and add a 2x in front of any relevant ingredients
7. What is the total quantity of each ingredient used in all delivered pizzas sorted by most frequent first?
- If a Meat Lovers pizza costs $12 and Vegetarian costs $10 and there were no charges for changes - how much money has Pizza Runner made so far if there are no delivery fees?
- What if there was an additional $1 charge for any pizza extras?
- Add cheese is $1 extra
- The Pizza Runner team now wants to add an additional ratings system that allows customers to rate their runner, how would you design an additional table for this new dataset generate a schema for this new table and insert your own data for ratings for each successful customer order between 1 to 5.
- Using your newly generated table - can you join all of the information together to form a table which has the following information for successful deliveries?
- customer_id
- order_id
- runner_id
- rating
- order_time
- pickup_time
- Time between order and pickup
- Delivery duration
- Average speed
- Total number of pizzas
- If a Meat Lovers pizza was $12 and Vegetarian $10 fixed prices with no cost for extras and each runner is paid $0.30 per kilometre traveled - how much money does Pizza Runner have left over after these deliveries?