A Look at Installing Airflow 2.2.3 in Docker Containers


The previous article took a quick look at Airflow's concepts and use cases. Today we'll install Airflow with Docker and, by actually using it, get a deeper feel for what Airflow can do.

Airflow Containerized Deployment

Host environment (Alibaba Cloud):

  • OS: Ubuntu 20.04.3 LTS
  • Kernel: Linux 5.4.0-91-generic

Installing Docker

The Docker installation follows the official documentation[1]. On a clean system there is no need to uninstall old versions first. Since this is a cloud host, it is worth taking a snapshot beforehand in case the configuration breaks the environment.

# Update the package index
sudo apt-get update
sudo apt-get install \
    ca-certificates \
    curl \
    gnupg \
    lsb-release
# Add Docker's official GPG key
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
# Set up the Docker stable repository
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
# List the installable docker-ce versions
root@bigdata1:~# apt-cache madison docker-ce
docker-ce | 5:20.10.12~3-0~ubuntu-focal | https://download.docker.com/linux/ubuntu focal/stable amd64 Packages
docker-ce | 5:20.10.11~3-0~ubuntu-focal | https://download.docker.com/linux/ubuntu focal/stable amd64 Packages
docker-ce | 5:20.10.10~3-0~ubuntu-focal | https://download.docker.com/linux/ubuntu focal/stable amd64 Packages
docker-ce | 5:20.10.9~3-0~ubuntu-focal | https://download.docker.com/linux/ubuntu focal/stable amd64 Packages
# Install command format
# sudo apt-get install docker-ce=<VERSION_STRING> docker-ce-cli=<VERSION_STRING> containerd.io
# Install a specific version
sudo apt-get install docker-ce=5:20.10.12~3-0~ubuntu-focal docker-ce-cli=5:20.10.12~3-0~ubuntu-focal containerd.io
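Before moving on, a quick sanity check along these lines confirms the daemon is working:

# Confirm the installed version
sudo docker --version
# Pull and run a throwaway container to verify the daemon can fetch and start images
sudo docker run --rm hello-world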

Optimizing the Docker Configuration

The settings below go in /etc/docker/daemon.json. Under registry-mirrors, configure one or more registry acceleration addresses, such as Alibaba Cloud's mirror:

{
  "data-root": "/var/lib/docker",
  "exec-opts": [
    "native.cgroupdriver=systemd"
  ],
  "registry-mirrors": [
    "https://****.mirror.aliyuncs.com"
  ],
  "storage-driver": "overlay2",
  "storage-opts": [
    "overlay2.override_kernel_check=true"
  ],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m",
    "max-file": "3"
  }
}

Configure Docker to Start on Boot

systemctl daemon-reload
systemctl enable --now docker.service
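Changes to /etc/docker/daemon.json only take effect after the daemon restarts, so it is worth restarting and confirming the settings were picked up; a quick sketch:

# Restart to apply daemon.json (enable --now already started the service once)
sudo systemctl restart docker.service
# Verify the cgroup driver, storage driver and registry mirrors
docker info | grep -iE -A1 'cgroup driver|storage driver|registry mirrors'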

Containerized Installation of Airflow

Choosing a Database

According to the official documentation, the recommended metadata databases are MySQL 8+ and PostgreSQL 9.6+. The official docker-compose script[2] uses PostgreSQL, so we need to adjust the docker-compose.yml content for MySQL.
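A convenient starting point is to download the official compose file for the matching Airflow version (the URL from reference [2]) and edit it in place, for example:

# Fetch the official docker-compose.yaml for Airflow 2.2.3
curl -LfO 'https://airflow.apache.org/docs/apache-airflow/2.2.3/docker-compose.yaml'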


version: '3'
x-airflow-common:
  &airflow-common
  # In order to add custom dependencies or upgrade provider packages you can use your extended image.
  # Comment the image line, place your Dockerfile in the directory where you placed the docker-compose.yaml
  # and uncomment the "build" line below. Then run `docker-compose build` to build the images.
  image: ${AIRFLOW_IMAGE_NAME:-apache/airflow:2.2.3}
  # build: .
  environment:
    &airflow-common-env
    AIRFLOW__CORE__EXECUTOR: CeleryExecutor
    AIRFLOW__CORE__SQL_ALCHEMY_CONN: mysql+mysqldb://airflow:aaaa@mysql/airflow  # replaced with a MySQL connection string
    AIRFLOW__CELERY__RESULT_BACKEND: db+mysql://airflow:aaaa@mysql/airflow  # replaced with a MySQL connection string
    AIRFLOW__CELERY__BROKER_URL: redis://:xxxx@redis:6379/0  # Redis authentication is enabled for security; replace xxxx with the Redis password
    AIRFLOW__CORE__FERNET_KEY: ''
    AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: 'true'
    AIRFLOW__CORE__LOAD_EXAMPLES: 'true'
    AIRFLOW__API__AUTH_BACKEND: 'airflow.api.auth.backend.basic_auth'
    _PIP_ADDITIONAL_REQUIREMENTS: ${_PIP_ADDITIONAL_REQUIREMENTS:-}
  volumes:
    - ./dags:/opt/airflow/dags
    - ./logs:/opt/airflow/logs
    - ./plugins:/opt/airflow/plugins
  user: "${AIRFLOW_UID:-50000}:0"
  depends_on:
    &airflow-common-depends-on
    redis:
      condition: service_healthy
    mysql:  # changed to the mysql service name
      condition: service_healthy

services:
  mysql:
    image: mysql:8.0.27  # changed to a recent MySQL 8 image
    environment:
      MYSQL_ROOT_PASSWORD: bbbb  # password for the MySQL root account
      MYSQL_USER: airflow
      MYSQL_PASSWORD: aaaa  # password for the airflow user
      MYSQL_DATABASE: airflow
    command:
      - --default-authentication-plugin=mysql_native_password  # set the default authentication plugin
      - --collation-server=utf8mb4_general_ci  # collation recommended by the official docs
      - --character-set-server=utf8mb4  # character set recommended by the official docs
    volumes:
      - /apps/airflow/mysqldata8:/var/lib/mysql  # persist MySQL data
      - /apps/airflow/my.cnf:/etc/my.cnf  # persist the MySQL config file
    healthcheck:
      test: mysql --user=$$MYSQL_USER --password=$$MYSQL_PASSWORD -e 'SHOW DATABASES;'  # health check command
      interval: 5s
      retries: 5
    restart: always

  redis:
    image: redis:6.2
    expose:
      - 6379
    command: redis-server --requirepass xxxx  # start redis-server with password authentication
    healthcheck:
      test: ["CMD", "redis-cli", "-a", "xxxx", "ping"]  # health check using the password
      interval: 5s
      timeout: 30s
      retries: 50
    restart: always

  airflow-webserver:
    <<: *airflow-common
    command: webserver
    ports:
      - 8080:8080
    healthcheck:
      test: ["CMD", "curl", "--fail", "http://localhost:8080/health"]
      interval: 10s
      timeout: 10s
      retries: 5
    restart: always
    depends_on:
      <<: *airflow-common-depends-on
      airflow-init:
        condition: service_completed_successfully

  airflow-scheduler:
    <<: *airflow-common
    command: scheduler
    healthcheck:
      test: ["CMD-SHELL", 'airflow jobs check --job-type SchedulerJob --hostname "$${HOSTNAME}"']
      interval: 10s
      timeout: 10s
      retries: 5
    restart: always
    depends_on:
      <<: *airflow-common-depends-on
      airflow-init:
        condition: service_completed_successfully

  airflow-worker:
    <<: *airflow-common
    command: celery worker
    healthcheck:
      test:
        - "CMD-SHELL"
        - 'celery --app airflow.executors.celery_executor.app inspect ping -d "celery@$${HOSTNAME}"'
      interval: 10s
      timeout: 10s
      retries: 5
    environment:
      <<: *airflow-common-env
      # Required to handle warm shutdown of the celery workers properly
      # See https://airflow.apache.org/docs/docker-stack/entrypoint.html#signal-propagation
      DUMB_INIT_SETSID: "0"
    restart: always
    depends_on:
      <<: *airflow-common-depends-on
      airflow-init:
        condition: service_completed_successfully

  airflow-triggerer:
    <<: *airflow-common
    command: triggerer
    healthcheck:
      test: ["CMD-SHELL", 'airflow jobs check --job-type TriggererJob --hostname "$${HOSTNAME}"']
      interval: 10s
      timeout: 10s
      retries: 5
    restart: always
    depends_on:
      <<: *airflow-common-depends-on
      airflow-init:
        condition: service_completed_successfully

  airflow-init:
    <<: *airflow-common
    entrypoint: /bin/bash
    # yamllint disable rule:line-length
    command:
      - -c
      - |
        function ver() {
          printf "%04d%04d%04d%04d" $${1//./ }
        }
        airflow_version=$$(gosu airflow airflow version)
        airflow_version_comparable=$$(ver $${airflow_version})
        min_airflow_version=2.2.0
        min_airflow_version_comparable=$$(ver $${min_airflow_version})
        if (( airflow_version_comparable < min_airflow_version_comparable )); then
          echo
          echo -e "\033[1;31mERROR!!!: Too old Airflow version $${airflow_version}!\e[0m"
          echo "The minimum Airflow version supported: $${min_airflow_version}. Only use this or higher!"
          echo
          exit 1
        fi
        if [[ -z "${AIRFLOW_UID}" ]]; then
          echo
          echo -e "\033[1;33mWARNING!!!: AIRFLOW_UID not set!\e[0m"
          echo "If you are on Linux, you SHOULD follow the instructions below to set "
          echo "AIRFLOW_UID environment variable, otherwise files will be owned by root."
          echo "For other operating systems you can get rid of the warning with manually created .env file:"
          echo "    See: https://airflow.apache.org/docs/apache-airflow/stable/start/docker.html#setting-the-right-airflow-user"
          echo
        fi
        one_meg=1048576
        mem_available=$$(($$(getconf _PHYS_PAGES) * $$(getconf PAGE_SIZE) / one_meg))
        cpus_available=$$(grep -cE 'cpu[0-9]+' /proc/stat)
        disk_available=$$(df / | tail -1 | awk '{print $$4}')
        warning_resources="false"
        if (( mem_available < 4000 )) ; then
          echo
          echo -e "\033[1;33mWARNING!!!: Not enough memory available for Docker.\e[0m"
          echo "At least 4GB of memory required. You have $$(numfmt --to iec $$((mem_available * one_meg)))"
          echo
          warning_resources="true"
        fi
        if (( cpus_available < 2 )); then
          echo
          echo -e "\033[1;33mWARNING!!!: Not enough CPUS available for Docker.\e[0m"
          echo "At least 2 CPUs recommended. You have $${cpus_available}"
          echo
          warning_resources="true"
        fi
        if (( disk_available < one_meg * 10 )) ; then
          echo
          echo -e "\033[1;33mWARNING!!!: Not enough Disk space available for Docker.\e[0m"
          echo "At least 10 GBs recommended. You have $$(numfmt --to iec $$((disk_available * 1024 )))"
          echo
          warning_resources="true"
        fi
        if [[ $${warning_resources} == "true" ]]; then
          echo
          echo -e "\033[1;33mWARNING!!!: You have not enough resources to run Airflow (see above)!\e[0m"
          echo "Please follow the instructions to increase amount of resources available:"
          echo "   https://airflow.apache.org/docs/apache-airflow/stable/start/docker.html#before-you-begin"
          echo
        fi
        mkdir -p /sources/logs /sources/dags /sources/plugins
        chown -R "${AIRFLOW_UID}:0" /sources/{logs,dags,plugins}
        exec /entrypoint airflow version
    # yamllint enable rule:line-length
    environment:
      <<: *airflow-common-env
      _AIRFLOW_DB_UPGRADE: 'true'
      _AIRFLOW_WWW_USER_CREATE: 'true'
      _AIRFLOW_WWW_USER_USERNAME: ${_AIRFLOW_WWW_USER_USERNAME:-airflow}
      _AIRFLOW_WWW_USER_PASSWORD: ${_AIRFLOW_WWW_USER_PASSWORD:-airflow}
    user: "0:0"
    volumes:
      - .:/sources

  airflow-cli:
    <<: *airflow-common
    profiles:
      - debug
    environment:
      <<: *airflow-common-env
      CONNECTION_CHECK_MAX_COUNT: "0"
    # Workaround for entrypoint issue. See: https://github.com/apache/airflow/issues/16252
    command:
      - bash
      - -c
      - airflow

  flower:
    <<: *airflow-common
    command: celery flower
    ports:
      - 5555:5555
    healthcheck:
      test: ["CMD", "curl", "--fail", "http://localhost:5555/"]
      interval: 10s
      timeout: 10s
      retries: 5
    restart: always
    depends_on:
      <<: *airflow-common-depends-on
      airflow-init:
        condition: service_completed_successfully
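Note that the compose file bind-mounts /apps/airflow/my.cnf into the MySQL container; if that path does not exist on the host, Docker will create a directory in its place, so it is worth creating the file up front. A minimal sketch whose contents simply mirror the command-line flags above (illustrative values, tune to your needs):

# Create a minimal my.cnf matching the mysql command-line flags above
sudo mkdir -p /apps/airflow
printf '%s\n' '[mysqld]' \
    'default-authentication-plugin=mysql_native_password' \
    'character-set-server=utf8mb4' \
    'collation-server=utf8mb4_general_ci' | sudo tee /apps/airflow/my.cnf > /dev/null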

Relative to the official docker-compose.yaml, only the x-airflow-common, MySQL, and Redis settings were changed. The next step is to start the containers, but a few directories for persistence have to be created first:

mkdir -p ./dags ./logs ./plugins
# Note: AIRFLOW_UID must be the UID of a regular (non-root) user, and that user
# must have permission to create these persistence directories
echo -e "AIRFLOW_UID=$(id -u)" > .env
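The same .env file can also override the web UI credentials that airflow-init reads (the _AIRFLOW_WWW_USER_* variables in the compose file above, both defaulting to airflow). For example, with illustrative values:

# Optional: override the default airflow/airflow web login
echo "_AIRFLOW_WWW_USER_USERNAME=admin" >> .env
echo "_AIRFLOW_WWW_USER_PASSWORD=change-me" >> .env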

If AIRFLOW_UID does not belong to a regular user, the containers will fail at startup with an error saying the airflow module cannot be found.

docker-compose up airflow-init  # initialize the database and create the tables
docker-compose up -d            # create and start the Airflow containers
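Once the stack is up, the per-service healthchecks defined above make it easy to watch readiness; something like:

# All services should eventually report a healthy state
docker-compose ps
# Tail the logs of a single service if it is slow to come up, e.g. the scheduler
docker-compose logs -f airflow-scheduler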


If a container ends up in the unhealthy state, use docker inspect $container_name to find out why. With that, the Airflow installation is complete.
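For unhealthy containers, the health-probe history is usually the fastest way to pinpoint the failing check; a focused variant of the docker inspect call above:

# Show only the health-probe results for a container
docker inspect --format '{{json .State.Health}}' $container_name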

References

[1] Install Docker Engine on Ubuntu: https://docs.docker.com/engine/install/ubuntu/

[2] Official docker-compose.yaml: https://airflow.apache.org/docs/apache-airflow/2.2.3/docker-compose.yaml

Original post: https://mp.weixin.qq.com/s/VncpyXcTtlvnDkFrsAZ5lQ
