아직은 GCP 뱃지가 없지만, 이번 챌린지를 계속 진행하려면
- Pre-requisite: Earn the Build and Optimize Data Warehouses with BigQuery skill badge. Already earned it? Great! Don't have it yet? Be sure to earn it by July 19, 2022.
7월19 일까지 Earn the Build and Optimize Data Warehouses with BigQuery skill badge 를 얻어야 한다.
첫번째 과제를 끝내면
이런식으로 진행중인 퀘스트를 볼 수 있게 된다.
이렇게 많은 LAB을 도전해서 통과해야 뱃지를 받을 수 있다.
오늘 7/3일 일요일 안으로 이 퀘스트를 모두 완료해 보려고 한다.
이번 포스팅에서는 2번째 과정인
Creating a Data Warehouse Through Joins and Unions
과정에 대해서 공부하려고 한다.
이번에는 CLI 환경이 아닌 GUI 환경에서 빅쿼리를 하는 과제다.
Open the BigQuery console
- In the Google Cloud Console, select Navigation menu > BigQuery.
The Welcome to BigQuery in the Cloud Console message box opens. This message box provides a link to the quickstart guide and the release notes.
- Click Done.
The BigQuery console opens.
Create a new dataset to store your tables
First, create a new dataset titled ecommerce in BigQuery.
- In the left pane, click on the name of your BigQuery project (qwiklabs-gcp-xxxx).
- Click on the three dots next to your project name, then select CREATE DATASET.
The Create dataset dialog opens.
- Set the Dataset ID to ecommerce, leave all other options at their default values.
Click Create dataset.
이커머스라는 데이터셋을 빅쿼리에서 생성한다.
모든 옵션들은 default로 설정한다.
- Click on the Disable Editor Tabs link to enable the Query Editor.
Scenario: Your marketing team provided you and your data science team all of the product reviews for your ecommerce website. You partner with them to create a data warehouse in BigQuery which joins together data from three sources:
- Website ecommerce data
- Product inventory stock levels and lead times
- Product review sentiment analysis
In this lab, you examine a new dataset based on product reviews.
시나리오: 귀하의 마케팅 팀은 귀하와 귀하의 데이터 과학 팀에서 귀하의 전자상거래 웹사이트에 대한 모든 제품 리뷰를 제공했습니다. 이들과 협력하여 BigQuery에서 세 가지 소스의 데이터를 결합하는 데이터 웨어하우스를 만듭니다.
Explore the product sentiment dataset
Your data science team has run all of your product reviews through the API and provided you with the average sentiment score and magnitude for each of your products.
First, create a copy the table that the data science team made so you can read it:
빅쿼리에서 이커머스라는 데이터셋을 만들었다.
감정 점수 순으로 내림차순 정렬하기
반대로 감정 점수가 낮은 것을 내림차순으로 정렬하기
SELECT
SKU,
name,
sentimentScore,
sentimentMagnitude
FROM
`data-to-insights.ecommerce.products`
WHERE sentimentScore IS NOT NULL
ORDER BY
sentimentScore
LIMIT 5
Join datasets to find insights
Scenario It's the first of the month and your inventory team has informed you that the orderedQuantity field in the product inventory dataset is out of date. They need your help to query the total sales by product for 08/01/2017 and reference that against the current stock levels in inventory to see which products need to be resupplied first.
시나리오 매월 1일이며 재고 팀에서 제품 재고 데이터 세트의 ordersQuantity 필드가 오래되었다고 알려왔습니다. 그들은 2017년 8월 1일에 대한 제품별 총 매출을 쿼리하고 이를 재고의 현재 재고 수준과 비교하여 어떤 제품이 먼저 재공급되어야 하는지 확인하는 데 도움이 필요합니다.
Calculate daily sales volume by productSKU
Create a new table in your ecommerce dataset with the below requirements:
- Title it sales_by_sku_20170801
- Source the data from data-to-insights.ecommerce.all_sessions_raw
- Include only distinct results
- Return productSKU
- Return the total quantity ordered (productQuantity). Hint: Use a SUM() with a IFNULL condition
- Filter for only sales on 20170801
- ORDER BY the SKUs with the most orders first
Possible Solution:
# pull what sold on 08/01/2017
CREATE OR REPLACE TABLE ecommerce.sales_by_sku_20170801 AS
SELECT DISTINCT
productSKU,
SUM(IFNULL(productQuantity,0)) AS total_ordered
FROM
`data-to-insights.ecommerce.all_sessions_raw`
WHERE date = '20170801'
GROUP BY productSKU
ORDER BY total_ordered DESC #462 skus sold
Next, enrich your sales data with product inventory information by joining the two datasets.
Join sales data and inventory data
Using a JOIN, enrich the website ecommerce data with the following fields from the product inventory dataset:
- name
- stockLevel
- restockingLeadTime
- sentimentScore
- sentimentMagnitude
# calculate ratio and filter
SELECT DISTINCT
website.productSKU,
website.total_ordered,
inventory.name,
inventory.stockLevel,
inventory.restockingLeadTime,
inventory.sentimentScore,
inventory.sentimentMagnitude,
SAFE_DIVIDE(website.total_ordered, inventory.stockLevel) AS ratio
FROM
ecommerce.sales_by_sku_20170801 AS website
LEFT JOIN `data-to-insights.ecommerce.products` AS inventory
ON website.productSKU = inventory.SKU
# gone through more than 50% of inventory for the month
WHERE SAFE_DIVIDE(website.total_ordered,inventory.stockLevel) >= .50
ORDER BY total_ordered DESC
Create a new empty table to store sales by productSKU for 08/02/2017
For the schema, specify the following fields:
- table name is ecommerce.sales_by_sku_20170802
- productSKU STRING
- total_ordered as an INT64 field
CREATE OR REPLACE TABLE ecommerce.sales_by_sku_20170802
(
productSKU STRING,
total_ordered INT64
);
INSERT INTO ecommerce.sales_by_sku_20170802
(productSKU, total_ordered)
VALUES('GGOEGHPA002910', 101)
Append together historical data
There are multiple ways to append together data that has the same schema. Two common ways are using UNIONs and table wildcards.
- Union is an SQL operator that appends together rows from different result sets.
- Table wildcards enable you to query multiple tables using concise SQL statements. Wildcard tables are available only in standard SQL.
SELECT * FROM ecommerce.sales_by_sku_20170801
UNION ALL
SELECT * FROM ecommerce.sales_by_sku_20170802
'Learn to Earn Data Challenge' 카테고리의 다른 글
Build and Execute MySQL, PostgreSQL, and SQLServer to Data Catalog Connectors (0) | 2022.07.03 |
---|---|
Working with JSON, Arrays, and Structs in BigQuery (0) | 2022.07.03 |
Troubleshooting and Solving Data Join Pitfalls (0) | 2022.07.03 |
Creating Date-Partitioned Tables in BigQuery (0) | 2022.07.03 |
Learn to Earn Data Challenge 도전 (0) | 2022.07.03 |