Learn to Earn Data Challenge

Form Parsing Using Document AI

with_AI 2022. 7. 4. 22:10

Overview

As consumers, we are used to filling out forms to apply for insurance, make insurance claims, specify healthcare preferences, apply for employment, set up tax withholdings, and so on. Businesses on the other side of these transactions receive a form that they need to parse, extract specific pieces of data from, and use to populate a database.

In this lab you will use Google Cloud's Document AI solution to parse forms within a Jupyter Notebook so that you can automatically extract information from digitally scanned paper forms.

In this lab, you will learn how to:

  • Create a Jupyter Notebook instance on Cloud Vertex AI.
  • Create a Service Account so that you can automate form processing.
  • Upload a PDF document to Cloud Storage.
  • Invoke Document AI.
  • Parse the response using low-level functions based on the visual layout of the form.
  • Parse the response using high-level functions based on the semantic structure of the form.
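
The last two steps differ mainly in which part of the response they read. Here is a minimal sketch, assuming the response has already been converted to a dict shaped like Document AI's JSON Document: a top-level `text` string, `pages[].paragraphs[].layout.textAnchor` for the visual layout, and `pages[].formFields[]` for name/value pairs. Both views resolve text anchors back into the full text. The helper names (`anchor_text`, `paragraphs`, `form_fields`) and the sample document are hypothetical, made up for illustration; they are not part of the lab.

```python
def anchor_text(document, text_anchor):
    """Join the slices of document["text"] referenced by a textAnchor."""
    parts = []
    for segment in text_anchor.get("textSegments", []):
        # JSON omits startIndex when it is 0 and may encode indices as strings.
        start = int(segment.get("startIndex", 0))
        end = int(segment["endIndex"])
        parts.append(document["text"][start:end])
    return "".join(parts)


def paragraphs(document):
    """Low-level view: the text of every paragraph, in page order."""
    return [
        anchor_text(document, para["layout"]["textAnchor"]).strip()
        for page in document.get("pages", [])
        for para in page.get("paragraphs", [])
    ]


def form_fields(document):
    """High-level view: {field name: field value} from detected form fields."""
    fields = {}
    for page in document.get("pages", []):
        for field in page.get("formFields", []):
            name = anchor_text(document, field["fieldName"]["textAnchor"]).strip()
            value = anchor_text(document, field["fieldValue"]["textAnchor"]).strip()
            fields[name] = value
    return fields


# Hypothetical, hand-written stand-in for a parsed response.
sample = {
    "text": "Name: Jane Doe\nCity: Springfield\n",
    "pages": [{
        "paragraphs": [
            {"layout": {"textAnchor": {"textSegments": [
                {"endIndex": "15"}]}}},
            {"layout": {"textAnchor": {"textSegments": [
                {"startIndex": "15", "endIndex": "33"}]}}},
        ],
        "formFields": [
            {"fieldName": {"textAnchor": {"textSegments": [
                {"endIndex": "5"}]}},
             "fieldValue": {"textAnchor": {"textSegments": [
                {"startIndex": "6", "endIndex": "14"}]}}},
            {"fieldName": {"textAnchor": {"textSegments": [
                {"startIndex": "15", "endIndex": "20"}]}},
             "fieldValue": {"textAnchor": {"textSegments": [
                {"startIndex": "21", "endIndex": "32"}]}}},
        ],
    }],
}

print(paragraphs(sample))
print(form_fields(sample))
```

The layout view returns raw lines that still need to be split into labels and values, while the form-field view already pairs each label with its value. That is the practical difference between the two parsing styles.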

What you will build

You will parse a campaign disclosure form that all US political campaigns are required to file. From this form, you will pull out the cash that the campaign has on hand.
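
Once the form fields are available as name/value pairs, pulling out a single figure comes down to finding the matching field and parsing its amount. Here is a minimal sketch, assuming the fields have already been collected into a plain dict. The helper names, the field labels, and the amounts below are invented placeholders, not values taken from the actual form.

```python
import re
from decimal import Decimal


def parse_amount(raw):
    """Turn text such as '$ 1,234.56' into a Decimal; None if no number is found."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", raw)
    if match is None:
        return None
    return Decimal(match.group(0).replace(",", ""))


def find_amount(fields, label):
    """Amount of the first field whose name contains label (case-insensitive)."""
    for name, value in fields.items():
        if label.lower() in name.lower():
            return parse_amount(value)
    return None


# Hypothetical field names and values, for illustration only.
fields = {
    "Name of Committee:": "Example Committee",
    "Cash on Hand at Close of Reporting Period:": "$ 12,345.67",
}

cash = find_amount(fields, "cash on hand")
print(cash)
```

Matching on a substring of the label rather than the exact string is a deliberate choice here, since OCR output often varies in punctuation and spacing.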

 

 


 

This quest is about automatically extracting information from scanned paper forms.

 

https://youtu.be/HcqpanDadyQ

 

The YouTube link for a preview is as follows.