Data Pipeline Orchestrator: Apache Airflow

# 1. What is a data pipeline orchestrator?

- Define and schedule tasks

- Monitor runs and handle errors

- Coordinate dependencies between tasks

- Decide the execution order of tasks

- Move data between systems (ETL)

- Run tasks sequentially or in parallel
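
The responsibilities above can be sketched in plain Python. This is a toy illustration, not Airflow code: the task names and the `run_pipeline` helper are hypothetical. It assumes Python 3.9+ for the standard library's `graphlib.TopologicalSorter`, which resolves the dependency order, and uses a thread pool to run independent tasks in parallel.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
DEPENDENCIES = {
    "extract": set(),
    "transform_a": {"extract"},
    "transform_b": {"extract"},
    "load": {"transform_a", "transform_b"},
}


def run_task(name):
    """Stand-in for real work (a query, a script, an API call)."""
    print(f"running {name}")
    return name


def run_pipeline(dependencies, max_workers=2):
    """Run tasks in dependency order and return the batches that ran."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Tasks whose upstream tasks have all finished.
            ready = sorted(sorter.get_ready())
            # Tasks in one batch do not depend on each other,
            # so they can safely run in parallel.
            list(pool.map(run_task, ready))
            sorter.done(*ready)
            batches.append(ready)
    return batches


batches = run_pipeline(DEPENDENCIES)
```

Here `extract` runs first, the two transforms run together, and `load` runs last. A real orchestrator adds scheduling, retries, and monitoring on top of this ordering logic.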

# 2. Similar products

- Apache Oozie, Temporal (Go-based, grew out of Uber's Cadence), AWS Step Functions

 

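# 3. Components

- Web Server : serves the UI used to monitor DAGs and tasks

- Metadata Database : stores the state of DAGs, runs, and tasks

- Scheduler : decides when tasks are due and hands them to the executor

- Executor : determines how and where tasks are run

- Worker : actually runs the tasks

- Triggerer : runs the asynchronous triggers used by deferrable tasks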

 

# 4. DAG

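- A DAG (directed acyclic graph) is a set of tasks plus their dependencies; a task is the unit of execution

- Tasks can be written in Python, Bash, or SQL

- Operator types : Action operators (execute something), Transfer operators (move data from one system to another), Sensor operators (wait for a condition to be met)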
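
To make the three operator categories concrete, here is a minimal sketch in plain Python. These classes are hypothetical stand-ins, not Airflow's own classes (in Airflow, `BashOperator` and `PythonOperator` are examples of action operators). The `poke_interval` and `timeout` names are borrowed from Airflow's sensor parameters, but the behavior shown is only an assumption about the general idea.

```python
import time


class ActionOperator:
    """Toy action operator: executes a callable and returns its result."""

    def __init__(self, func):
        self.func = func

    def execute(self):
        return self.func()


class TransferOperator:
    """Toy transfer operator: copies records from a source to a destination."""

    def __init__(self, source, destination):
        self.source = source
        self.destination = destination

    def execute(self):
        self.destination.extend(self.source)
        return len(self.source)  # number of records moved


class SensorOperator:
    """Toy sensor: polls a condition until it is true or the timeout expires."""

    def __init__(self, condition, poke_interval=0.01, timeout=1.0):
        self.condition = condition
        self.poke_interval = poke_interval
        self.timeout = timeout

    def execute(self):
        deadline = time.monotonic() + self.timeout
        while time.monotonic() < deadline:
            if self.condition():
                return True
            time.sleep(self.poke_interval)
        raise TimeoutError("condition not met before timeout")


# Hypothetical data: wait for rows, move them, then act on them.
source_rows = [1, 2, 3]
warehouse = []

wait_for_rows = SensorOperator(lambda: len(source_rows) > 0)
move_rows = TransferOperator(source_rows, warehouse)
sum_rows = ActionOperator(lambda: sum(warehouse))

results = [op.execute() for op in (wait_for_rows, move_rows, sum_rows)]
```

The sensor gates the rest of the chain, the transfer moves the data, and the action does the actual work on it.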

 

# 5. Architecture

- Single node : Web UI, Queue, Scheduler, Metadata DB, and Executor all run on one machine

- Multi node : the components of the single-node setup are separated by function and run on different nodes
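
The way these components hand work to each other can be sketched as a toy single-process model. Everything here is hypothetical and simplified: a dict stands in for the Metadata DB, a `queue.Queue` for the Queue, and the `scheduler`, `worker`, and `web_ui` functions for the real services. In a multi-node setup the same roles would run as separate processes on separate machines, sharing the database and the queue.

```python
from queue import Queue

metadata_db = {}      # stands in for the Metadata DB: task name -> state
task_queue = Queue()  # stands in for the Queue between scheduler and workers


def scheduler(task_names):
    """Mark tasks as queued and push them onto the queue."""
    for name in task_names:
        metadata_db[name] = "queued"
        task_queue.put(name)


def worker(tasks):
    """Pull tasks off the queue, run them, and record the outcome."""
    while not task_queue.empty():
        name = task_queue.get()
        metadata_db[name] = "running"
        try:
            tasks[name]()
            metadata_db[name] = "success"
        except Exception:
            metadata_db[name] = "failed"


def web_ui():
    """Read-only view of task states, as a monitoring UI would show them."""
    return dict(sorted(metadata_db.items()))


# Hypothetical tasks: one succeeds, one raises ZeroDivisionError.
tasks = {"extract": lambda: None, "load": lambda: 1 / 0}

scheduler(tasks)
worker(tasks)
status = web_ui()
```

Because the components only communicate through the shared state and the queue, moving them onto separate nodes changes the deployment but not the flow.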