We've got a very specific requirement which I want to solve with Open vSwitch. It already partly works, so can you show me what I'm missing here?
Requirement: a Docker container connected to a mac-vlan interface exposes services on a specific port (needs to broadcast on the local network). We need to have the services available on a different port - and there is no way to configure the port on which the service is run. We already tried different approaches (reverse proxy, docker --ports directives, etc.) which didn't work for various reasons, mostly because we still have to stick to the IP of the mac-vlan interface.
That base setup is pretty much fixed; my primary goal is to get it working this way, and I think it should be possible.
Environment: Arch Linux with kernel core/linux 5.10.9 and packages community/openvswitch 2.14.1-1, community/docker 1:20.10.2-4
Enter Open vSwitch: we created the mac-vlan interface on an OVS Bridge and want to use OpenFlow directives to change the port.
# ovs-vsctl show
... output omitted
    Bridge br1
        Port br1.200
            tag: 200
            Interface br1.200 <<< our container is connected here
                type: internal
        Port br1
            Interface br1
                type: internal
        Port patch-br0
            Interface patch-br0 <<< uplink to OVS bridge with physical interface
                type: patch
                options: {peer=patch-br1}
Using Nginx for demonstration, should work with any container...
# docker network create -d macvlan --subnet=172.16.0.0/20 --ip-range=172.16.13.0/29 --gateway=172.16.0.1 -o parent=br1.200 mv.200
# docker run -d --name web --network mv.200 nginx
So far so good: curl http://172.16.13.0 (which is the container web in this case) returns the 'Welcome to nginx!' default page. Now we're trying the following OpenFlow configurations to make the container's service accessible on port 9080.
Variant 1:
# ovs-ofctl dump-flows br1
cookie=0x0, duration=1647.225s, table=0, n_packets=16, n_bytes=1435, priority=50,ct_state=-trk,tcp,nw_dst=172.16.13.0,tp_dst=9080 actions=ct(table=0)
cookie=0x0, duration=1647.223s, table=0, n_packets=3, n_bytes=234, priority=50,ct_state=+new+trk,tcp,nw_dst=172.16.13.0,tp_dst=9080 actions=ct(commit,nat(dst=172.16.13.0:80)),NORMAL
cookie=0x0, duration=1647.221s, table=0, n_packets=11, n_bytes=956, priority=50,ct_state=+est+trk,tcp,nw_dst=172.16.13.0,tp_dst=9080 actions=ct(nat),NORMAL
cookie=0x0, duration=1647.219s, table=0, n_packets=0, n_bytes=0, priority=50,ct_state=-trk,tcp,nw_src=172.16.13.0,tp_src=80 actions=ct(table=0)
cookie=0x0, duration=1647.217s, table=0, n_packets=12, n_bytes=2514, priority=50,ct_state=+trk,tcp,nw_src=172.16.13.0,tp_src=80 actions=ct(nat),NORMAL
cookie=0x0, duration=84061.461s, table=0, n_packets=309364, n_bytes=36251324, priority=0 actions=NORMAL
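For reference, the five priority-50 flows in the dump above could be generated by a small script along these lines. This is only a sketch: the variable names, the function name variant1_flows and the file name variant1.flows are hypothetical, and the addresses and ports are taken from the dump. It prints the flows by default so they can be reviewed before loading.

```shell
#!/bin/sh
# Sketch only: prints the variant 1 flows in the syntax accepted by ovs-ofctl.
# Names are hypothetical; addresses and ports are copied from the dump above.
SERVER=172.16.13.0   # container address on the macvlan network
PUBLIC_PORT=9080     # port clients connect to
SERVICE_PORT=80      # port nginx actually listens on

variant1_flows() {
    cat <<EOF
priority=50,ct_state=-trk,tcp,nw_dst=${SERVER},tp_dst=${PUBLIC_PORT} actions=ct(table=0)
priority=50,ct_state=+new+trk,tcp,nw_dst=${SERVER},tp_dst=${PUBLIC_PORT} actions=ct(commit,nat(dst=${SERVER}:${SERVICE_PORT})),NORMAL
priority=50,ct_state=+est+trk,tcp,nw_dst=${SERVER},tp_dst=${PUBLIC_PORT} actions=ct(nat),NORMAL
priority=50,ct_state=-trk,tcp,nw_src=${SERVER},tp_src=${SERVICE_PORT} actions=ct(table=0)
priority=50,ct_state=+trk,tcp,nw_src=${SERVER},tp_src=${SERVICE_PORT} actions=ct(nat),NORMAL
EOF
}

# Dry run: print the flows for review.
variant1_flows

# To load them (as root, on the OVS host):
#   variant1_flows > variant1.flows
#   ovs-ofctl add-flows br1 variant1.flows
```

The priority=0 NORMAL flow is not included, since it was already present on the bridge.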
Outcome variant 1:
Now curl http://172.16.13.0:9080 only works if there is already an active flow; the first attempt breaks (tcpdump -i br1.200 on the server):
Client > Server : 172.16.1.51:46056 > 172.16.13.0:80 SYN
Server > Client : 172.16.13.0:80 > 172.16.1.51:46056 SYN ACK
Client > Server : 172.16.1.51:46056 > 172.16.13.0:9080 ACK (destination port not translated)
Server > Client : 172.16.13.0:9080 > 172.16.1.51:46056 RST (unknown to server)
Server > Client : 172.16.13.0:80 > 172.16.1.51:46056 SYN ACK
Client > Server : 172.16.1.51:46056 > 172.16.13.0:80 RST (already ACK'ed)
Client > Server : 172.16.1.51:46058 > 172.16.13.0:80 SYN (second curl)
Server > Client : 172.16.13.0:80 > 172.16.1.51:46058 SYN ACK
Client > Server : 172.16.1.51:46058 > 172.16.13.0:80 ACK (now with correct port 80)
... (normal TCP connection from here)
Packet #3 should be covered by flow #3, but apparently it does not work the way I thought.
# ovs-appctl dpctl/dump-conntrack | grep 172.16.13.0
tcp,orig=(src=172.16.1.51,dst=172.16.13.0,sport=46056,dport=9080),reply=(src=172.16.13.0,dst=172.16.1.51,sport=80,dport=46056),protoinfo=(state=CLOSING)
tcp,orig=(src=172.16.1.51,dst=172.16.13.0,sport=46058,dport=9080),reply=(src=172.16.13.0,dst=172.16.1.51,sport=80,dport=46058),protoinfo=(state=TIME_WAIT)
Can you help me understand why the ct(nat) action of the +trk+est flow does not work for the first connection (but does work for the second one)?
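One way to check which flow each packet of the handshake would hit is ofproto/trace with its --ct-next option, which sets the ct_state assumed after a ct() recirculation. The sketch below only prints the commands rather than running them. The helper name trace_cmd is hypothetical, and it assumes that client traffic enters br1 through patch-br0. As far as I understand, the trace shows which OpenFlow rules match for the assumed ct_state; it does not replay the real conntrack entries in the datapath.

```shell
#!/bin/sh
# Sketch only: prints ofproto/trace invocations instead of running them.
# Assumption: client traffic enters br1 through the patch port patch-br0.
BRIDGE=br1
IN_PORT=patch-br0

trace_cmd() {
    # $1: client source port
    # $2: ct_state flags to assume after the ct() recirculation
    printf "ovs-appctl ofproto/trace --ct-next '%s' %s 'in_port=%s,tcp,nw_src=172.16.1.51,nw_dst=172.16.13.0,tp_src=%s,tp_dst=9080'\n" \
        "$2" "$BRIDGE" "$IN_PORT" "$1"
}

trace_cmd 46056 trk,new   # the initial SYN
trace_cmd 46056 trk,est   # the client's ACK (packet #3)
```

The printed commands would be run as root on the OVS host; comparing the two traces should show whether the ACK is matched by flow #3 as intended.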
Variant 2: (add mod_tp_dst to flow #2)
# ovs-ofctl dump-flows br1
cookie=0x0, duration=6182.935s, table=0, n_packets=0, n_bytes=0, priority=50,ct_state=-trk,tcp,nw_dst=172.16.13.0,tp_dst=9080 actions=ct(table=0)
cookie=0x0, duration=6182.931s, table=0, n_packets=0, n_bytes=0, priority=50,ct_state=+new+trk,tcp,nw_dst=172.16.13.0,tp_dst=9080 actions=mod_tp_dst:80,ct(commit,nat(dst=172.16.13.0:80)),NORMAL
cookie=0x0, duration=6182.928s, table=0, n_packets=0, n_bytes=0, priority=50,ct_state=+est+trk,tcp,nw_dst=172.16.13.0,tp_dst=9080 actions=ct(nat),NORMAL
cookie=0x0, duration=6182.925s, table=0, n_packets=0, n_bytes=0, priority=50,ct_state=-trk,tcp,nw_src=172.16.13.0,tp_src=80 actions=ct(table=0)
cookie=0x0, duration=6182.923s, table=0, n_packets=0, n_bytes=0, priority=50,ct_state=+trk,tcp,nw_src=172.16.13.0,tp_src=80 actions=ct(nat),NORMAL
cookie=0x0, duration=81462.938s, table=0, n_packets=302990, n_bytes=35637543, priority=0 actions=NORMAL
Outcome variant 2:
Running curl http://172.16.13.0:9080, the situation is improved a bit over variant 1 (tcpdump -i eth0 on the client):
Client > Server : 172.16.1.51:45974 > 172.16.13.0:9080 SYN
Server > Client : 172.16.13.0:80 > 172.16.1.51:45974 SYN ACK (response source port not translated)
Client > Server : 172.16.1.51:45974 > 172.16.13.0:80 RST (unknown to client)
Client > Server : 172.16.1.51:45974 > 172.16.13.0:9080 SYN (retransmission)
Server > Client : 172.16.13.0:9080 > 172.16.1.51:45974 SYN ACK (now with correct port 9080)
Client > Server : 172.16.1.51:45974 > 172.16.13.0:9080 ACK
That way the connection always works, but it also adds the SYN retransmission timeout to the session setup delay.
# ovs-appctl dpctl/dump-conntrack | grep 172.16.13.0
tcp,orig=(src=172.16.1.51,dst=172.16.13.0,sport=45974,dport=80),reply=(src=172.16.13.0,dst=172.16.1.51,sport=80,dport=45974),protoinfo=(state=SYN_SENT)
tcp,orig=(src=172.16.1.51,dst=172.16.13.0,sport=45974,dport=9080),reply=(src=172.16.13.0,dst=172.16.1.51,sport=80,dport=1355),protoinfo=(state=TIME_WAIT)
Can you help me understand why the first SYN ACK is received untranslated? Flow #5, with ct_state=+trk and actions=ct(nat), should have covered that.
Thanks for reading this long post. I'm thankful for any hints!