Data Pipeline Orchestrator: Apache Airflow
mind:
2024. 2. 13. 13:21
# 1. What is a data pipeline orchestrator?
- Define and schedule tasks
- Monitor runs and handle errors
- Coordinate dependencies between tasks
- Control the execution order of tasks
- Move data between systems (ETL)
- Run tasks serially or in parallel
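
The responsibilities listed above (defining tasks, resolving dependencies, ordering execution, handling errors) can be illustrated with a minimal sketch in plain Python. This is not Airflow code: the task names, the `run_task` stand-in, and the retry policy are all hypothetical, and the ordering relies only on the standard library's `graphlib`.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical ETL pipeline: each task maps to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
}

def run_task(name, attempt):
    """Stand-in for real work; 'transform' fails once to exercise the retry path."""
    if name == "transform" and attempt == 1:
        raise RuntimeError("simulated transient failure")
    return f"{name} ok"

def run_pipeline(deps, max_attempts=2):
    """Run tasks in dependency order, retrying each failed task up to max_attempts."""
    log = []
    for name in TopologicalSorter(deps).static_order():
        for attempt in range(1, max_attempts + 1):
            try:
                log.append(run_task(name, attempt))
                break  # task succeeded, move on to the next one
            except RuntimeError as err:
                log.append(f"{name} failed (attempt {attempt}): {err}")
        else:
            # Reached only when every attempt failed.
            raise RuntimeError(f"{name} exhausted its retries")
    return log

log = run_pipeline(dependencies)
for line in log:
    print(line)
```

A real orchestrator adds scheduling, persistence of task state, and a UI on top of this core loop, but the dependency-ordered execution with retries is the same idea.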
# 2. Similar products
- Apache Oozie, Temporal (Go-based, forked from Uber's Cadence), AWS Step Functions
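# 3. Components
- Web Server : UI for monitoring and managing DAGs and task runs
- Metadata Database : stores the state of DAGs, task instances, and runs
- Scheduler : decides when tasks are due and hands them to the executor
- Executor : determines how and where tasks run
- Worker : executes the tasks
- Triggerer : runs the triggers for deferred tasks (deferrable operators)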
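# 4. DAG (Directed Acyclic Graph)
- A set of tasks linked by dependencies; a task is the unit of execution
- Tasks can be written in Python, Bash, or SQL
- Operator types : Action operator (performs work), Transfer operator (moves data between systems), Sensor operator (waits for a condition)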
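
To make the three operator categories concrete, here is a rough sketch in plain Python. The class names, constructor parameters, and the `execute` method are hypothetical stand-ins chosen for illustration, not Airflow's actual operator API; in Airflow itself, `BashOperator` and `PythonOperator` are examples of action operators.

```python
import time

class ActionOperator:
    """Performs a unit of work, here by calling a Python function."""
    def __init__(self, task_id, fn):
        self.task_id = task_id
        self.fn = fn

    def execute(self):
        return self.fn()

class TransferOperator:
    """Moves data from a source to a destination (both plain lists here)."""
    def __init__(self, task_id, source, destination):
        self.task_id = task_id
        self.source = source
        self.destination = destination

    def execute(self):
        self.destination.extend(self.source)
        return len(self.source)  # number of rows moved

class SensorOperator:
    """Polls a condition at a fixed interval until it holds or a timeout expires."""
    def __init__(self, task_id, condition, poke_interval=0.01, timeout=1.0):
        self.task_id = task_id
        self.condition = condition
        self.poke_interval = poke_interval
        self.timeout = timeout

    def execute(self):
        deadline = time.monotonic() + self.timeout
        while time.monotonic() < deadline:
            if self.condition():
                return True
            time.sleep(self.poke_interval)
        raise TimeoutError(f"{self.task_id} timed out")

# Hypothetical in-memory "systems" standing in for a staging area and a warehouse.
staging, warehouse = [], []

tasks = [
    ActionOperator("extract", lambda: staging.extend([1, 2, 3])),
    SensorOperator("wait_for_rows", lambda: len(staging) > 0),
    TransferOperator("load", staging, warehouse),
]

# Run the tasks in list order and collect each task's return value.
results = {task.task_id: task.execute() for task in tasks}
print(results, warehouse)
```

The tasks run in list order here for simplicity; in a DAG the order would come from the declared dependencies instead.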
# 5. Architecture
- Single node : Web UI, Queue, Scheduler, Metadata DB, and Executor all run on one machine
- Multi node : the single-node components are separated by role onto different machines
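
The way a queue sits between the scheduler and the workers can be sketched with threads standing in for separate machines. This is a simplified illustration under stated assumptions, not Airflow's implementation: `task_queue`, `metadata_db`, `scheduler`, `worker`, and `run` are hypothetical names, the tasks are treated as independent (no dependencies), and "running" a task only records a state.

```python
import queue
import threading

task_queue = queue.Queue()    # stand-in for the queue between scheduler and workers
metadata_db = {}              # stand-in for the metadata database: task_id -> state
db_lock = threading.Lock()    # protects metadata_db across threads

def scheduler(task_ids):
    """Mark each task as queued, then hand it to the queue."""
    for task_id in task_ids:
        with db_lock:
            metadata_db[task_id] = "queued"
        task_queue.put(task_id)

def worker():
    """Pull tasks until a None sentinel arrives, recording the final state."""
    while True:
        task_id = task_queue.get()
        if task_id is None:
            break
        with db_lock:
            metadata_db[task_id] = "success"

def run(task_ids, num_workers):
    """Start the workers, schedule the tasks, then shut the workers down."""
    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for w in workers:
        w.start()
    scheduler(task_ids)
    for _ in workers:
        task_queue.put(None)  # one sentinel per worker
    for w in workers:
        w.join()
    return dict(metadata_db)

states = run(["extract", "transform", "load"], num_workers=2)
print(states)
```

With `num_workers=1` this resembles the single-node layout; raising the worker count loosely mirrors a multi-node setup, where the workers would be separate processes or machines sharing the same queue and metadata database.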