Learn to Earn Data Challenge

Working with JSON, Arrays, and Structs in BigQuery

with_AI 2022. 7. 3. 18:41

5번째 퀘스트에 도전했다.

 

JSON, 배열 그리고 구조체가 빅쿼리에서 어떻게 동작할까

 

In traditional relational database SQL, you would look at the repetition of names and immediately think to split the above table into two separate tables: Fruit Items and People. That process is called normalization (going from one table to many). This is a common approach for transactional databases like mySQL.

For data warehousing, data analysts often go the reverse direction (denormalization) and bring many separate tables into one large reporting table.

 

기존의 관계형 데이터베이스 SQL에서는 이름의 반복을 보고 즉시 위의 테이블을 과일 항목과 사람이라는 두 개의 개별 테이블로 분할하는 것을 생각할 것입니다. 이 프로세스를 정규화(한 테이블에서 여러 테이블로 이동)라고 합니다. 이것은 mySQL과 같은 트랜잭션 데이터베이스에 대한 일반적인 접근 방식입니다. 데이터 웨어하우징의 경우 데이터 분석가는 종종 반대 방향(비정규화)으로 이동하여 많은 개별 테이블을 하나의 큰 보고 테이블로 가져옵니다.

 

Both of these tables are exactly the same. There are two key learnings here:

  • An array is simply a list of items in brackets [ ]
  • BigQuery visually displays arrays as flattened. It simply lists the value in the array vertically (note that all of those values still belong to a single row)

배열 타입의 데이터가 존재한다.

 

JSON 형태로도 데이터를 볼 수 있다.

Introduction to STRUCTs

You may have wondered why the field alias hit.page.pageTitle looks like three fields in one separated by periods. Just as ARRAY values give you the flexibility to go deep into the granularity of your fields, another data type allows you to go wide in your schema by grouping related fields together. That SQL data type is the STRUCT data type.

The easiest way to think about a STRUCT is to consider it conceptually like a separate table that is already pre-joined into your main table.

A STRUCT can have:

  • one or many fields in it
  • the same or different data types for each field
  • it's own alias

Sounds just like a table right?

 

STRUCT 소개 필드 별칭 hit.page.pageTitle이 마침표로 구분된 하나의 세 필드처럼 보이는 이유가 궁금할 것입니다. ARRAY 값을 사용하면 필드의 세분성을 깊숙이 들어갈 수 있는 유연성을 제공하는 것처럼 다른 데이터 유형을 사용하면 관련 필드를 함께 그룹화하여 스키마에서 더 넓게 이동할 수 있습니다. 해당 SQL 데이터 유형은 STRUCT 데이터 유형입니다. STRUCT에 대해 생각하는 가장 쉬운 방법은 개념적으로 기본 테이블에 이미 사전 조인된 별도의 테이블처럼 생각하는 것입니다. STRUCT는 다음을 가질 수 있습니다. 하나 이상의 필드 각 필드에 대해 동일하거나 다른 데이터 유형 자신의 별명이다 딱 테이블 같죠?

 

Recap of STRUCTs:

  • A SQL STRUCT is simply a container of other data fields which can be of different data types. The word struct means data structure. Recall the example from earlier:
  • STRUCT(``"Rudisha" as name, [23.4, 26.3, 26.4, 26.1] as splits``)`` AS runner
  • STRUCTs are given an alias (like runner above) and can conceptually be thought of as a table inside of your main table.
  • STRUCTs (and ARRAYs) must be unpacked before you can operate over their elements. Wrap an UNNEST() around the name of the struct itself or the struct field that is an array in order to unpack and flatten it.

STRUCT 요약: SQL STRUCT는 단순히 다른 데이터 유형일 수 있는 다른 데이터 필드의 컨테이너입니다. 구조체라는 단어는 데이터 구조를 의미합니다. 이전의 예를 생각해 보십시오. STRUCT(``"Rudisha" as name, [23.4, 26.3, 26.4, 26.1] as splits``)`` AS runner STRUCT는 별칭(위의 runner와 유사)이 주어지며 개념적으로 기본 테이블 내부의 테이블로 생각할 수 있습니다. STRUCT(및 ARRAY)는 해당 요소에 대해 작업하기 전에 압축을 풀어야 합니다. 압축을 풀고 병합하기 위해 구조체 자체 또는 배열인 구조체 필드의 이름 주위에 UNNEST()를 래핑합니다.