黄玮
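Using ansible also places dependency requirements on the target runtime environment.
Agent-based automation management systems, typified by puppet, require the corresponding agent to be installed in the "base image" in advance, along with any necessary runtime configuration.
These requirements can be met by pre-installing the dependencies above while customizing the "base image".
However, this approach is not "elegant".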
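A "base image" should have the following properties:
Consider the following common requirements in managing a server cluster:
The procedures behind all of these requirements need to be automated.
This is the automation problem of going from one to two, to three, and on to everything.
How do we solve the automation problem of going from one to two, to three, and on to everything?
Solving the automation problem of going from one to two, to three, and on to everything (from a basic system to a programmable, configurable one)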
# Pre-install cloud-init and openssh-server when building the unattended-install image or PXE image
sudo apt update && sudo apt install cloud-init genisoimage
mkdir -p ~/workspace/cloud-init && cd ~/workspace/cloud-init
cat << EOF > ~/workspace/cloud-init/meta-data
instance-id: 1
local-hostname: cuc-cloud-init
EOF
cat << EOF > ~/workspace/cloud-init/user-data
#cloud-config
password: mypassword
chpasswd: { expire: False }
ssh_pwauth: True
EOF
genisoimage -output init-cidata.iso -volid cidata -joliet -rock user-data meta-data
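Download init-cidata.iso to the host system and attach init-cidata.iso to the virtual machine's optical drive.
Reboot the current virtual machine, then log in and check:
# The hostname has changed to cuc-cloud-init, as set in the example configuration above
# A new user `ubuntu` now exists on the system
id ubuntu
# Enter the password `mypassword` set in the example configuration above
su ubuntu
# Check the current user identity
id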
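#cloud-config
Marks the file as a cloud-config file, the most commonly used type of User-Data.
password: mypassword
The user that the Ubuntu distribution creates by default is named ubuntu. The password directive here is equivalent to first creating the user ubuntu and then setting its password to mypassword.
A "hashed password" can be created with mkpasswd:
sudo apt install whois
mkpasswd -m help # list the hash algorithms supported by the installed version
mkpasswd -m sha-512 mypassword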
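chpasswd: { expire: False }
chpasswd changes the password of users that already exist on the system, or sets their forced password-expiry policy. It supports the expire and list attributes. The list key accepts a list of username:password entries, written either as a YAML list or as a multi-line string with one pair per line.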
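The two forms of the list key can be sketched as follows. This is only an illustration: the account names alice and bob, both passwords, and the temporary working directory are assumptions made for the example, and chpasswd only affects accounts that already exist.

```shell
# Sketch: write two user-data files that use the chpasswd `list` key.
# alice, bob and the passwords are hypothetical placeholders.
workdir=$(mktemp -d)

cat << 'EOF' > "$workdir/user-data"
#cloud-config
chpasswd:
  expire: False
  # Form 1: multi-line string, one username:password pair per line
  list: |
    alice:demo-password-1
    bob:demo-password-2
EOF

cat << 'EOF' > "$workdir/user-data.list-form"
#cloud-config
chpasswd:
  expire: False
  # Form 2: YAML list of username:password entries
  list:
    - alice:demo-password-1
    - bob:demo-password-2
EOF

echo "wrote $workdir/user-data and $workdir/user-data.list-form"
```

Either file can then be checked with the schema validation command and packed into the seed ISO in the same way as the first example.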
ssh_pwauth: True
ssh_pwauth sets the value of the PasswordAuthentication field in /etc/ssh/sshd_config.
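One way to confirm the result after boot is to read the value back from the file. The helper below is a minimal sketch: the function name sshd_password_auth is made up for this example, and it deliberately ignores Include directives and Match blocks, so it only reports the first uncommented setting in the single file it is given.

```shell
# Print the first uncommented PasswordAuthentication value in an sshd_config
# file (sshd uses the first value it obtains for a keyword), or "unset".
sshd_password_auth() {
  awk 'tolower($1) == "passwordauthentication" { print tolower($2); found = 1; exit }
       END { if (!found) print "unset" }' "${1:-/etc/ssh/sshd_config}"
}
```

Calling sshd_password_auth without arguments reads /etc/ssh/sshd_config. With ssh_pwauth: True applied, the expected answer is yes.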
Supported distributions | Supported public clouds | Supported private clouds |
---|---|---|
Ubuntu, SLES/openSUSE, RHEL/CentOS, Fedora, Gentoo Linux, Debian, ArchLinux, FreeBSD, NetBSD, OpenBSD | Amazon Web Services, Microsoft Azure, Google Cloud Platform, Oracle Cloud Infrastructure, Softlayer, Rackspace Public Cloud, IBM Cloud, Digital Ocean, Bigstep, Hetzner, Joyent, CloudSigma, Alibaba Cloud, OVH, OpenNebula, Exoscale, Scaleway, CloudStack, AltCloud, SmartOS, HyperOne, Rootbox | Bare metal installs, OpenStack, LXD, KVM, Metal-as-a-Service (MAAS) |
cloud-init --version
# /usr/bin/cloud-init 19.4-33-gbb4131a2-0ubuntu1~18.04.1
Given the version information above, the official v19.4 documentation is the recommended reading.
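The source of cloud-init's configuration data, which usually comes from the user (e.g. user-data) or from the cloud that created the configuration drive (e.g. meta-data). In cloud computing, an instance generally means a "cloud host instance". For Virtualbox, every local virtual machine can also be regarded as an "instance".
After Cloud-Init has run on the current system, it saves the "metadata" in /run/cloud-init/instance-data.json. The metadata for the "first example" looks like this: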
{
"base64_encoded_keys": [],
"ds": {
"_doc": "EXPERIMENTAL: The structure and format of content scoped under the 'ds' key may change in subsequent releases of cloud-init.",
"meta_data": {
"dsmode": "net",
"instance-id": 1,
"local-hostname": "cuc-cloud-init"
}
},
"sensitive_keys": [],
"v1": {
"_beta_keys": [
"subplatform"
],
"availability-zone": null,
"availability_zone": null,
"cloud-name": "unknown",
"cloud_name": "unknown",
"instance-id": "1",
"instance_id": "1",
"local-hostname": "cuc-cloud-init",
"local_hostname": "cuc-cloud-init",
"platform": "nocloud",
"public_ssh_keys": [],
"region": null,
"subplatform": "config-disk (/dev/sr0)"
}
}
Listed | in | no | particular | order |
---|---|---|---|---|
Alibaba Cloud (AliYun) | Alt Cloud | Azure | CloudSigma | CloudStack |
Config Drive | Digital Ocean | E24Cloud | Amazon EC2 | Exoscale |
Fallback/None | Google Compute Engine | MAAS | NoCloud | OpenNebula |
OpenStack | Oracle | OVF | Rbx Cloud | SmartOS Datasource |
ZStack |
cloud-id
# nocloud
Recall the command we used earlier to create the ISO image file:
genisoimage -output init-cidata.iso -volid cidata -joliet -rock user-data meta-data
-volid cidata sets the "volume ID" of the newly created ISO file to cidata.
user-data and meta-data are the two files written into the ISO.
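Recall the contents of the earlier meta-data file:
instance-id: 1
local-hostname: cuc-cloud-init
Here instance-id: 1 tells the cloud-init engine whether this is the instance's "first boot".
# Inspect the traces this instance left on the current system after booting once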
ls /var/lib/cloud/instances/1/
# boot-finished datasource obj.pkl sem user-data.txt.i vendor-data.txt.i
# cloud-config.txt handlers scripts user-data.txt vendor-data.txt
The boot-finished file records the boot time for this instance, for example:
cat /var/lib/cloud/instances/1/boot-finished
# 11.28 - Tue, 14 Apr 2020 07:33:47 +0000 - v. 19.4-33-gbb4131a2-0ubuntu1~18.04.1
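While the directory structure above exists and the contents of init-cidata.iso are unchanged, the cloud-init operations defined on the disc are not run again when the virtual machine boots, even if the ISO stays attached to the virtual machine's optical drive.
To re-run the operations defined in init-cidata.iso, change instance-id in the meta-data file to a value that has no same-named subdirectory under /var/lib/cloud/instances/.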
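Picking such a value can be scripted. The following is a minimal sketch under a few assumptions: instance ids are plain integers as in the example above, the helper names fresh_instance_id and write_meta_data are invented here, and the script runs inside the virtual machine where /var/lib/cloud/instances/ lives.

```shell
# Print the smallest positive integer that is not yet used as a
# subdirectory name under the given instances directory.
fresh_instance_id() {
  dir=${1:-/var/lib/cloud/instances}
  n=1
  while [ -e "$dir/$n" ]; do
    n=$((n + 1))
  done
  echo "$n"
}

# Write a meta-data file that carries the fresh instance-id.
# $1: output file, $2: instances directory, $3: hostname (both optional).
write_meta_data() {
  out=$1
  instances_dir=${2:-/var/lib/cloud/instances}
  hostname=${3:-cuc-cloud-init}
  {
    echo "instance-id: $(fresh_instance_id "$instances_dir")"
    echo "local-hostname: $hostname"
  } > "$out"
}
```

For example, write_meta_data meta-data regenerates the file in the current directory, after which the genisoimage command can be run again to rebuild the ISO.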
# Check the user-data file for syntax errors
cloud-init devel schema -c user-data --annotate
# Valid cloud-config file user-data
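Which modules Cloud-Init loads in each "stage". The contents of /etc/cloud/cloud.cfg help in understanding the different "boot stages". The file defines 3 stages in total: init, config, and final.
What each stage can accomplish follows from the functions of the "modules" it contains.
/etc/cloud/cloud.cfg example
# ref: https://cloudinit.readthedocs.io/en/19.4/topics/modules.html
# The "modules" defined for each stage below are all documented in detail at the official URL above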
# The modules that run in the 'init' stage
cloud_init_modules:
- migrator
- seed_random
- bootcmd
- write-files
- growpart
- resizefs
- disk_setup
- mounts
- set_hostname
- update_hostname
- update_etc_hosts
- ca-certs
- rsyslog
- users-groups
- ssh
# The modules that run in the 'config' stage
cloud_config_modules:
# Emit the cloud config ready event
# this can be used by upstart jobs for 'start on cloud-config'.
- emit_upstart
- snap
- ssh-import-id
- locale
- set-passwords
- grub-dpkg
- apt-pipelining
- apt-configure
- ubuntu-advantage
- ntp
- timezone
- disable-ec2-metadata
- runcmd
- byobu
# The modules that run in the 'final' stage
cloud_final_modules:
- package-update-upgrade-install
- fan
- landscape
- lxd
- ubuntu-drivers
- puppet
- chef
- mcollective
- salt-minion
- rightscale_userdata
- scripts-vendor
- scripts-per-once
- scripts-per-boot
- scripts-per-instance
- scripts-user
- ssh-authkey-fingerprints
- keys-to-console
- phone-home
- final-message
- power-state-change
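According to the "Boot Stages" section of the official documentation, the init stage can be further divided into a local phase and a network phase.
Listing a module under a stage works like "importing a function" or "importing a library": if a module is not "defined" for a given "boot stage", the corresponding directives cannot be used in user-data.
Looking back at the "first example":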
#cloud-config
password: mypassword
chpasswd: { expire: False }
ssh_pwauth: True
The three configuration directives above, password, chpasswd, and ssh_pwauth, are all defined in the set-passwords module.
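To check which stage actually lists a given module, something like the following could be used. It is a rough sketch: the function name stage_of_module is made up here, it assumes cloud.cfg uses the three cloud_*_modules keys shown earlier, and it treats hyphens and underscores in module names as interchangeable.

```shell
# Print the cloud.cfg stage key(s) whose list contains the given module.
# $1: module name, $2: path to cloud.cfg (defaults to /etc/cloud/cloud.cfg).
# Returns non-zero when no stage lists the module.
stage_of_module() {
  awk -v want="$1" '
    BEGIN { gsub(/-/, "_", want) }
    # A stage key starts a new module list
    /^cloud_(init|config|final)_modules:/ { stage = $1; sub(/:$/, "", stage); next }
    # Any other top-level key ends the current list
    /^[^ \t#-]/ { stage = "" }
    # List items inside a stage: strip the dash, trailing comment, normalise
    stage != "" && /^[ \t]*-[ \t]*/ {
      name = $0
      sub(/^[ \t]*-[ \t]*/, "", name)
      sub(/[ \t]*(#.*)?$/, "", name)
      gsub(/-/, "_", name)
      if (name == want) { print stage; found = 1 }
    }
    END { exit !found }
  ' "${2:-/etc/cloud/cloud.cfg}"
}
```

With the cloud.cfg shown earlier, stage_of_module set-passwords should report cloud_config_modules, which matches the stage where the three directives are handled.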
# The top level settings are used as module
# and system configuration.
# A set of users which may be applied and/or used by various modules
# when a 'default' entry is found it will reference the 'default_user'
# from the distro configuration specified below
users:
- default
# If this is set, 'root' will not be able to ssh in and they
# will get a message to login instead as the default $user
disable_root: true
# This will cause the set+update hostname module to not operate (if true)
preserve_hostname: false
# Example datasource config
# datasource:
#   Ec2:
#     metadata_urls: [ 'blah.com' ]
#     timeout: 5 # (defaults to 50 seconds)
#     max_wait: 10 # (defaults to 120 seconds)
# The module definitions for the 3 stages are omitted here
# System and/or distro specific settings
# (not accessible to handlers/transforms)
system_info:
  # This will affect which distro class gets used
  distro: ubuntu
  # Default user name + that default users groups (if added/used)
  default_user:
    name: ubuntu
    lock_passwd: True
    gecos: Ubuntu
    groups: [adm, audio, cdrom, dialout, dip, floppy, lxd, netdev, plugdev, sudo, video]
    sudo: ["ALL=(ALL) NOPASSWD:ALL"]
    shell: /bin/bash
  # Automatically discover the best ntp_client
  ntp_client: auto
  # Other config here will be given to the distro class and/or path classes
  paths:
    cloud_dir: /var/lib/cloud/
    templates_dir: /etc/cloud/templates/
    upstart_dir: /etc/init/
  package_mirrors:
    - arches: [i386, amd64]
      failsafe:
        primary: http://archive.ubuntu.com/ubuntu
        security: http://security.ubuntu.com/ubuntu
      search:
        primary:
          - http://%(ec2_region)s.ec2.archive.ubuntu.com/ubuntu/
          - http://%(availability_zone)s.clouds.archive.ubuntu.com/ubuntu/
          - http://%(region)s.clouds.archive.ubuntu.com/ubuntu/
        security: []
    - arches: [arm64, armel, armhf]
      failsafe:
        primary: http://ports.ubuntu.com/ubuntu-ports
        security: http://ports.ubuntu.com/ubuntu-ports
      search:
        primary:
          - http://%(ec2_region)s.ec2.ports.ubuntu.com/ubuntu-ports/
          - http://%(availability_zone)s.clouds.ports.ubuntu.com/ubuntu-ports/
          - http://%(region)s.clouds.ports.ubuntu.com/ubuntu-ports/
        security: []
    - arches: [default]
      failsafe:
        primary: http://ports.ubuntu.com/ubuntu-ports
        security: http://ports.ubuntu.com/ubuntu-ports
  ssh_svcname: ssh
machine-id
The /etc/machine-id file contains the unique machine ID of the local system that is set during installation or boot. The machine ID is a single newline-terminated, hexadecimal, 32-character, lowercase ID. When decoded from hexadecimal, this corresponds to a 16-byte/128-bit value. This ID may not be all zeros.
systemd-machine-id-setup(1) may be used by installer tools to initialize the machine ID at install time, but /etc/machine-id may also be written using any other means.
For operating system images which are created once and used on multiple machines, for example for containers or in the cloud, /etc/machine-id should be an empty file in the generic file system image. An ID will be generated during boot and saved to this file if possible. Having an empty file in place is useful because it allows a temporary file to be bind-mounted over the real file, in case the image is used read-only.
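The format rules quoted above (32 lowercase hexadecimal characters, not all zeros) are easy to check mechanically. The helper below is a small sketch of such a check; the name is_valid_machine_id is invented for this example and is not part of systemd.

```shell
# Succeed only if the file holds a single 32-character lowercase
# hexadecimal ID that is not all zeros.
# $1: file to check (defaults to /etc/machine-id).
is_valid_machine_id() {
  [ -r "${1:-/etc/machine-id}" ] || return 1
  id=$(cat "${1:-/etc/machine-id}")
  case "$id" in
    # Reject anything that is not a lowercase hex digit (including newlines)
    *[!0123456789abcdef]*) return 1 ;;
  esac
  # Exactly 32 characters, and something must remain after removing zeros
  [ "${#id}" -eq 32 ] && [ -n "$(printf '%s' "$id" | tr -d 0)" ]
}
```

An empty /etc/machine-id, which is what a generic image is supposed to ship, fails this check by design. It should pass again once an ID has been generated at boot.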
On Debian and its derivatives, such as Kali, the steps above may all fail to update /etc/machine-id.
# Virtual machines cloned through Virtualbox's multi-attach image mechanism use virtual disks that share the same disk uuid
sudo blkid /dev/sda1
# /dev/sda1: UUID="dff30eeb-7332-438d-964c-d5c7f4d357f7" BLOCK_SIZE="4096" TYPE="ext4" PARTUUID="f0f6b9b0-01"
ls -l /dev/disk/by-uuid
# lrwxrwxrwx 1 root root 10 Dec 3 00:32 dff30eeb-7332-438d-964c-d5c7f4d357f7 -> ../../sda1
# The value of /etc/machine-id is unrelated to the disk uuid
# On a LAN, identical partition uuid values on different hosts cause no problems, but duplicate machine-id values must be avoided
# The most direct effect of a duplicate machine-id: for clients that obtain an IP address over DHCP through netplan, the default DHCP request policy causes IP address conflicts on the LAN
# Beyond that, duplicate machine-ids may cause nondeterministic errors in distributed cluster systems
# ref-1: https://unix.stackexchange.com/questions/402999/is-it-ok-to-change-etc-machine-id
# ref-2: https://documentation.suse.com/external-tree/en-us/suma/4.0/suse-manager/administration/tshoot-registerclones.html
# TL;DR On Debian and its derivatives, the value of /etc/machine-id is copied from /var/lib/dbus/machine-id at boot
sudo rm /var/lib/dbus/machine-id /etc/machine-id
## Method from ref-1
sudo dbus-uuidgen --ensure=/etc/machine-id
# Copy the contents of /etc/machine-id to /var/lib/dbus/machine-id
sudo dbus-uuidgen --ensure
## Method from ref-2
# When /etc/machine-id is empty or missing, create /var/lib/dbus/machine-id and write a machine-id into it
sudo dbus-uuidgen --ensure
# Copy the contents of /var/lib/dbus/machine-id to /etc/machine-id
sudo systemd-machine-id-setup
# The ref-1 and ref-2 methods above have the same effect on Kali
# Verify the machine-id
cat /etc/machine-id
cat /var/lib/dbus/machine-id
# Reboot the system so that the configuration change takes effect
sudo reboot
According to user wickedchicken's answer on SO:
systemd-networkd uses a different method to generate the DUID than dhclient. dhclient by default uses the link-layer address while systemd-networkd uses the contents of /etc/machine-id. Since the VMs were cloned, they have the same machine-id and the DHCP server returns the same IP for both.
To fix, replace the contents of one or both of /etc/machine-id. This can be anything, but deleting the file and running systemd-machine-id-setup will create a random machine-id in the same way done on machine setup.
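Before touching anything, it can help to find out which cloned machines actually share an ID. The sketch below assumes the IDs have already been collected as "hostname machine-id" pairs, one per line. The function name find_duplicate_machine_ids is made up for this example.

```shell
# Read "hostname machine-id" pairs on stdin and print every machine-id
# that is shared by more than one host, followed by those hosts.
find_duplicate_machine_ids() {
  awk '
    NF >= 2 {
      count[$2]++
      hosts[$2] = hosts[$2] (hosts[$2] == "" ? "" : " ") $1
    }
    END {
      for (id in count)
        if (count[id] > 1) print id ": " hosts[id]
    }
  ' | sort
}
```

The pairs could be gathered, for instance, by running cat /etc/machine-id on each clone over ssh and prefixing each result with the host name. An empty output means no two hosts share a machine-id.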
Another solution: edit /etc/netplan/01-netcfg.yaml and add the parameter dhcp-identifier: mac to the network interface's dhcp configuration.
# ref: https://netplan.io/examples#integration-with-a-windows-dhcp-server
network:
  version: 2
  ethernets:
    enp0s3:
      dhcp4: yes
    enp8s0:
      dhcp4: yes
      dhcp-identifier: mac
Why a reboot is needed after changing machine-id
The following is quoted from man dbus-uuidgen:
If you try to change an existing machine-id on a running system, it will probably result in bad things happening. Don’t try to change this file. Also, don’t make it the same on two different systems; it needs to be different anytime there are two different kernels running.
The following conclusion is quoted from user hvd's answer on SO:
So after doing this, definitely don’t continue using the system without rebooting.
cloud-init
The current value of /etc/machine-id
The files currently in the /etc/netplan/ directory
The contents of /etc/netplan/01-netcfg.yaml
user-data
meta-data
user-data example
#cloud-config
users:
  - name: cuc
    lock_passwd: false # false keeps password login enabled. Default: true (password login disabled)
    passwd: 123456 # !!DEMO only!! cloud-init treats passwd as a password hash (e.g. from mkpasswd), so use a hashed value in production.
    ssh_authorized_keys:
      - <ssh-pub-key-1>
      - <ssh-pub-key-2>
# run the following cmds after system is booted and rootfs is mounted
runcmd:
  - echo -n '' > /etc/machine-id # clear but not delete
  - /bin/systemd-machine-id-setup
# Install additional packages on first boot
#
# Default: none
#
# if packages are specified, this apt_update will be set to true
#
# packages may be supplied as a single package name or as a list
# with the format [<package>, <version>] wherein the specific
# package version will be installed.
packages:
- python3 # required by ansible remote
## poweroff or reboot system after finished
# default: none
#
# power_state can be used to make the system shutdown, reboot or
# halt after boot is finished. This same thing can be achieved by
# user-data scripts or by runcmd by simply invoking 'shutdown'.
#
# Doing it this way ensures that cloud-init is entirely finished with
# modules that would be executed, and avoids any error/log messages
# that may go to the console as a result of system services like
# syslog being taken down while cloud-init is running.
#
# If you delay '+5' (5 minutes) and have a timeout of
# 120 (2 minutes), then the max time until shutdown will be 7 minutes.
# cloud-init will invoke 'shutdown +5' after the process finishes, or
# when 'timeout' seconds have elapsed.
#
# delay: form accepted by shutdown. default is 'now'. other format
#        accepted is +m (m in minutes)
# mode: required. must be one of 'poweroff', 'halt', 'reboot'
# message: provided as the message argument to 'shutdown'. default is none.
# timeout: the amount of time to give the cloud-init process to finish
#          before executing shutdown.
# condition: apply state change only if condition is met.
#            May be boolean True (always met), or False (never met),
#            or a command string or list to be executed.
#            command's exit code indicates:
#               0: condition met
#               1: condition not met
#            other exit codes will result in 'not met', but are reserved
#            for future use.
#
power_state:
  delay: "now"
  mode: reboot
  message: Make new machine-id take effect
  timeout: 10
  condition: True
meta-data example
instance-id: 1
local-hostname: ansible-slave
The current value of /etc/machine-id
The files currently in the /etc/netplan/ directory