|
Hi @sk4zuzu. I've noticed some issues with the systemd unit execution when no IPv4 configuration is declared on the bridge. If the IPv4 configuration is missing from the `addrs`:

```yaml
addrs:
  - cidr: "{{ ansible_default_ipv4.address ~ '/' ~ ansible_default_ipv4.prefix }}"
    metric: 400
    gw: "{{ ansible_default_ipv4.gateway }}"
    dns: ["{{ ansible_default_ipv4.gateway }}"]
```

this leads to the systemd unit failing with a bash error. The "compiled" bash contains an `if` with an empty body in this case:

```shell
[root@sm15 ovs]# tail -n +90 /usr/local/sbin/opennebula-ovs.sh
# --- Nameserver configuration
if type -p systemctl resolvectl &>/dev/null && systemctl is-active --quiet systemd-resolved; then
  # dpdk0
  # dpdk1
  # ovsbr0
fi
# --- Networking refresh
# ovsbr0
# ---
log 'OVS network configuration completed successfully'
exit 0
[root@sm15 ovs]# if type -p systemctl resolvectl &>/dev/null && systemctl is-active --quiet systemd-resolved; then
# dpdk0
# dpdk1
# ovsbr0
fi
-bash: syntax error near unexpected token `fi'
```

This also leads to losing connectivity post reboot. In general, we shouldn't tinker with IPv4-related networking configuration if we don't define it on the bridge. We could possibly also do this transparently, by detecting whether the interfaces used as bridge ports have IP configuration, but perhaps this is very complex. |
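A minimal sketch (not the actual template logic) of how the generated script could avoid the empty-body `if`: only emit the resolvectl section when at least one interface actually declares DNS servers. The function name and input format here are assumptions for illustration only.

```shell
# Sketch: emit the systemd-resolved section only when there is something
# to put inside it, so bash never sees an empty 'if ... then fi' body.
emit_resolver_block() {
  local entries="$1"   # newline-separated "<iface> <dns>" pairs (assumed format)
  [ -z "$entries" ] && return 0   # nothing declared -> emit nothing at all
  printf 'if type -p resolvectl &>/dev/null && systemctl is-active --quiet systemd-resolved; then\n'
  while read -r iface dns; do
    printf '  resolvectl dns %s %s\n' "$iface" "$dns"
  done <<< "$entries"
  printf 'fi\n'
}
```

The same guard could live in the jinja template itself (wrapping the whole `if` in a `{% if %}` over the collected DNS entries), or the template could emit a `:` no-op into bodies that would otherwise be empty.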
|
We also need support for traffic mirroring. We validated the method to issue mirroring as follows. The bash logic to create mirrors:

```shell
if [[ -n $OVS_MIRROR ]]; then
  ovs-vsctl add-port $BRIDGE $GRE \
    -- set interface $GRE type=gre options:remote_ip=$MIRROR_HOST \
    -- --id=@p get port $GRE \
    -- --id=@m create mirror name=$OVS_MIRROR select-all=true output-port=@p \
    -- set bridge $BRIDGE mirrors=@m
fi
```

We can split that into port creation:

```shell
ovs-vsctl add-port $BRIDGE $GRE \
  -- set interface $GRE type=gre \
  -- set interface $GRE options=remote_ip=$MIRROR_HOST
```

and mirror creation:

```shell
ovs-vsctl --id=@p get port $GRE \
  -- --id=@m create mirror name=$OVS_MIRROR select-all=true output-port=@p \
  -- set bridge $BRIDGE mirrors=@m
```

I tried setting that in the port:

```yaml
ovsbr0:
  set:
    - tag: 741
iface:
  ovsbr0:
    set:
      - mtu_request: 1500
  gre1:
    set:
      - type: >-
          gre options:remote_ip=sm14 -- --id@p get port gre1 -- --id=@m create mirror name=m0
          select-all=true output-port=@p -- set bridge ovsbr0 mirrors=@m
```

which yielded an error. So, in general, we need the ability to add GRE mirrors, and it looks like the jinja logic doesn't handle mirroring. I propose this interface to address it. Does it make sense? First we declare the GRE port as an interface (this requires the current logic to skip the udev query):

```yaml
ovs:
  iface:
    gre1:
      set:
        - type: gre
        - options: remote_ip=10.0.1.15
```

then we declare the mirror:

```yaml
ovs:
  mirror:
    m0:
      set:
        - output-port: gre1
        - select_all: true
      bridge: ovsbr0
```

The mirror cannot be created without adding it to a bridge in the same command; otherwise it could be possible to declare it on the bridge:

```yaml
br:
  ovsbr0:
    ports: [bond0, gre1]
    set:
      - datapath_type: netdev
      - mirrors: m0
```
|
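As a sketch, the declarative `mirror:` entry could expand into a single `ovs-vsctl` transaction (mirror creation plus attachment to the bridge in one command, as required). The helper below only prints the command so the expansion can be inspected without a running OVS; the names (`m0`, `gre1`, `ovsbr0`) are from the example above and the function itself is hypothetical.

```shell
# Render the single-transaction mirror command for a declared mirror.
# Printing (instead of executing) keeps the expansion testable offline.
render_mirror_cmd() {
  local name="$1" out_port="$2" bridge="$3"
  printf 'ovs-vsctl --id=@p get port %s -- --id=@m create mirror name=%s select-all=true output-port=@p -- set bridge %s mirrors=@m\n' \
    "$out_port" "$name" "$bridge"
}

render_mirror_cmd m0 gre1 ovsbr0
```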
|
@dann1 the first problem is just a mistake, |
|
There is also an edge case with DPDK on Mellanox cards. Maybe we can do some driver detection prior to the binding and skip it if the driver is already DPDK-capable. As of now we only know this is an issue with:

```yaml
vmnic0:
  set:
    - type: dpdk
    - options:dpdk-devargs: '0000:01:00.0'
    - mtu_request: 9126
  driver: vfio-pci
paco111:
  set:
    - type: dpdk
    - options:dpdk-devargs: '0000:81:00.0'
    - mtu_request: 9126
  driver: mlx5_core
```
|
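A hedged sketch of the driver-detection idea: read the device's current driver from sysfs and skip the `vfio-pci` rebind when the kernel driver is already DPDK-capable (Mellanox's mlx5_core is bifurcated and must not be rebound). The driver list and the sysfs-root override (used only for testing) are assumptions, not one-deploy code.

```shell
# Drivers assumed DPDK-capable without rebinding (illustrative list).
DPDK_CAPABLE_DRIVERS="mlx5_core mlx4_core"

# Resolve the driver currently bound to a PCI device via sysfs.
current_driver() {
  local dev="$1" sysfs="${2:-/sys/bus/pci/devices}"
  local link="$sysfs/$dev/driver"
  [ -e "$link" ] && basename "$(readlink -f "$link")"
}

# Returns 0 when the device should still be bound to vfio-pci.
needs_rebind() {
  local drv
  drv="$(current_driver "$1" "${2:-}")" || return 0   # unbound -> bind it
  case " $DPDK_CAPABLE_DRIVERS " in
    *" $drv "*) return 1 ;;   # bifurcated driver: leave it alone
    *)          return 0 ;;
  esac
}
```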
|
@dann1 I see, so it's a similar issue to SR-IOV, so unhardcoding the driver is required, I guess. 🤔 |
|
@dann1 Please take a look at the last 3 commits, they should be handling:
For the Mellanox card you can use … I'll try to implement the … |
|
Mirroring port traffic in the OVS should be implemented at the API level / ovs driver and not in the configuration phase. I will skip this in one-deploy, at least for now. |
|
Systemd unit doesn't fail anymore 👌 Some more feedback.

**Checksum offload**

I see the following in the jinja scripting logic … While doing testing before the ovs role automation existed, we didn't need this change. Is there some unwanted behavior when not disabling the offloading in the DPDK bridge? We did, however, find a seemingly related problem when consuming DPDK bridges with regular OVS system interfaces (instead of …).

**NetworkManager and Netplan**

For the use case that warranted this OVS role, disabling the network configuration logic is perfect. In this case the management interfaces are the same ones becoming bonds. However, if we don't declare IPv4 configuration over the bridge, then there is no need to disable the OS network management logic. If the server management interfaces are not used as OVS bridge interfaces, then the host becomes unreachable on the next power cycle. Do you think it is reasonable to avoid this deactivation if no …?

**Permissions**

In RHEL, the …

```shell
# Create DPDK vhost sockets directory
mkdir -p /var/lib/one/vhost-sockets
chown oneadmin:oneadmin /var/lib/one/vhost-sockets
```

In the project-specific logic we made a compatible-ish … Would you like me to upload this ansible logic to a branch as a reference for the needed changes? |
|
@dann1 Hi, thank you for the feedback! 👍 😍
We actually have a more serious problem, it seems: another needed change is integration with the infra playbook, as there's a scenario where FE VMs are expected to use … |
|
There is yet another problem of conflicting OVS/DPDK and SR-IOV roles when their configs point to overlapping PCI devices... 😞 This one is a bit awkward to solve. |
- Handle both kernel and DPDK networking in OVS
- Allow for creation of arbitrary number of bonds and bridges
- Assert that 'ovs' structure is valid before applying any changes
- Fix up checksum offloading for Debian-like distros
- Make OVS configuration persistent (across reboots)

Signed-off-by: Michal Opala <sk4zuzu@gmail.com>
- Use external_ids to mark and manage only a subset of OVS resources
- Auto-cleanup OVS ports and bridges managed by one-deploy
- Allow for interface and bridge re-configurations (limited)
- Re-assemble bond resources on interface changes (fix)
- Make it possible to move interfaces between different OVS bridges / bonds
- Improve readability of opennebula-ovs.sh.jinja template

Signed-off-by: Michal Opala <sk4zuzu@gmail.com>
- Allow for setting options on "internal" ports
- Recognize "internal" ports automatically matching br names
- Update README.md to include "internal" port example
Signed-off-by: Michal Opala <sk4zuzu@gmail.com>
- Add ovs.port dictionary
- Remove all/ALL prefixes from variable names and messages

Signed-off-by: Michal Opala <sk4zuzu@gmail.com>
Signed-off-by: Michal Opala <sk4zuzu@gmail.com>
Signed-off-by: Michal Opala <sk4zuzu@gmail.com>
Signed-off-by: Michal Opala <sk4zuzu@gmail.com>
In this case we can assume the bridge already exists (someone did the manual configuration that one-deploy would do), and verify it with an assertion. But yes, this is more of an infra role, and ideally we would have it executed early to avoid the chicken-and-egg problem. Agree.
What is the issue here? AFAIK, a PCI device would either be used for extracting VFs, PT, or DPDK. |
|
The issue is that PCI passthrough, both in one-deploy and OpenNebula itself, uses wildcards, so you can effectively set the driver for a range of devices. But then you can, on top of that, run OVS/DPDK and claim the same PCI devices twice (by mistake), and if a device is already in use, either driverctl or dpdk-devbind.py may hang forever, which is horrible UX. I think the fix here should be that the PCI passthrough part from the helper/pci role is executed first; then we need to compare findings from the lspci_devices fact (produced by the helper/pci role) with what the user declared for DPDK, and fail early without executing any operations on any devices. 🤔 Or we could remove driver management from the ovs role and rely on the pci role exclusively.
WDYT? 🤔 |
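A hypothetical early check for the double-claim problem: compare the PCI addresses discovered for passthrough with the ones declared for DPDK, and fail before touching any device. Variable names and the plain-string interface are illustrative, not the real one-deploy facts.

```shell
# Fail fast when the same PCI address appears in both lists
# (whitespace-separated PCI addresses, e.g. "0000:01:00.0 0000:81:00.0").
assert_no_pci_overlap() {
  local passthrough="$1" dpdk="$2" dev
  for dev in $dpdk; do
    case " $passthrough " in
      *" $dev "*)
        echo "ERROR: $dev is claimed by both PCI passthrough and OVS/DPDK" >&2
        return 1 ;;
    esac
  done
  return 0
}
```

Running this before any driverctl/dpdk-devbind.py invocation would turn the hang-forever scenario into an immediate, explainable failure.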
- Move 'global' repository management to pre.yml
- Rely on already existing per-role repository management
- Use helper/pci role to pre-configure PCI/SR-IOV early
- Move openvswitch role invocation to pre.yml

Signed-off-by: Michal Opala <sk4zuzu@gmail.com>
Signed-off-by: Michal Opala <sk4zuzu@gmail.com>
Signed-off-by: Michal Opala <sk4zuzu@gmail.com>
Signed-off-by: Michal Opala <sk4zuzu@gmail.com>
Signed-off-by: Michal Opala <sk4zuzu@gmail.com>
|
@dann1 Hi :)
I handled this differently; after some testing I think it's better to exclusively move to OVS (I don't want to maintain a double-standard kind of solution, as I would have to make it reversible both ways, which is a small nightmare 🤗), and just require the user to define some IP address -> 3421a66
I looked at this one, I think we'd do this in another PR. 🤔
I think this commit 9acb255 should help 👍😇
I managed to make it so all non-nebula PCI/SR-IOV and OVS/DPDK code is run from the pre playbook, which helps with testing and sets the stage for infra role. |
Host connectivity can still be borked if you give it a placeholder IP, but it seems super reasonable to avoid maintenance nightmares. In the role doc, maybe we should state that using this role means handing over the IP configuration to …
OK, since the selinux changes will come in another PR, I think we are good to merge. Do you want an issue opened for selinux, or will you make a PR directly? I just don't want it to be forgotten, since VMs need this. Thank you VERY much for your patience 🤗 |
Signed-off-by: Michal Opala <sk4zuzu@gmail.com>