# [SOLVED] iproute2 ignores connection marks set with nftables

## 39

Hello!

I currently use a VPN service to connect to the internet.

This VPN provider sometimes has slow servers, so I would like to keep more than one connection open and load-balance all connections over the different interfaces.

I actually planned on putting this in a script, add some nftables counters to it and make it configurable. But right now, baby steps.

I thought of it like this:

 Mark every outgoing packet with an incremental fwmark from 390 to 392 (I have 3 interfaces right now)

 Have an ip rule for each mark-interface connection
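To make the marking plan concrete, here is a minimal sketch of the sequence that a rule like `numgen inc mod 3 offset 390` is expected to cycle through. It is plain shell arithmetic, not nftables itself; `next_mark` is a hypothetical helper name, and the sketch makes no claim about which value the kernel counter starts with.

```shell
#!/bin/sh
# Simulates an incrementing generator with a modulus and an offset.
# next_mark is a hypothetical helper, not part of nftables or iproute2.
next_mark() {
    # $1 = running counter, $2 = modulus (number of interfaces), $3 = offset
    echo $(( ($1 % $2) + $3 ))
}

i=0
while [ "$i" -lt 6 ]; do
    echo "packet $i -> mark $(next_mark "$i" 3 390)"
    i=$((i + 1))
done
```

With three interfaces the marks only ever take the values 390, 391 and 392, so three ip rules are enough.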

Right now, I have the following setup:

nft list ruleset

```
table inet wgbalance {

   set wgblanace-vpnservers {

      type ipv4_addr . inet_service

      elements = { 185.213.154.68 . 51820,

              193.32.127.69 . 51820,

              193.32.127.70 . 51820 }

   }

   chain nat {

      type nat hook postrouting priority security; policy accept;

      oif "wgbalance0" snat ip to 10.67.177.66

      oif "wgbalance1" snat ip to 10.64.239.90

      oif "wgbalance2" snat ip to 10.67.221.61

   }

   chain output {

      type filter hook output priority filter; policy accept;

      ct state { established, related } accept

      ip daddr 1.1.1.1 meta nftrace set 1

      meta nfproto ipv4 ip daddr . udp dport @wgblanace-vpnservers meta mark set 0x00000027 ct mark set meta mark

      ip daddr 1.1.1.1 ct mark set numgen inc mod 3 offset 390

   }

   chain forward {

      type filter hook forward priority filter; policy accept;

   }

}
```

wg

```
[CUT FOR BREVITY]

interface: wgbalance0

  public key: [REDACTED]

  private key: (hidden)

  listening port: [REDACTED]

peer: C3jAgPirUZG6sNYe4VuAgDEYunENUyG34X42y+SBngQ=

  endpoint: 193.32.127.69:51820

  allowed ips: 0.0.0.0/0

  latest handshake: 11 seconds ago

  transfer: 10.60 KiB received, 43.55 KiB sent

  persistent keepalive: every 20 seconds

interface: wgbalance1

  public key: [REDACTED]

  private key: (hidden)

  listening port: [REDACTED]

peer: BLNHNoGO88LjV/wDBa7CUUwUzPq/fO2UwcGLy56hKy4=

  endpoint: 185.213.154.68:51820

  allowed ips: 0.0.0.0/0

  latest handshake: 12 seconds ago

  transfer: 10.60 KiB received, 43.55 KiB sent

  persistent keepalive: every 20 seconds

interface: wgbalance2

  public key: [REDACTED]

  private key: (hidden)

  listening port: [REDACTED]

peer: dV/aHhwG0fmp0XuvSvrdWjCtdyhPDDFiE/nuv/1xnRM=

  endpoint: 193.32.127.70:51820

  allowed ips: 0.0.0.0/0

  latest handshake: 1 minute, 5 seconds ago

  transfer: 10.60 KiB received, 43.55 KiB sent

  persistent keepalive: every 20 seconds

```

ip rule show

```
0:   from all lookup local

0:   from all fwmark 0x186 lookup rt_wgbalance0

0:   from all fwmark 0x187 lookup rt_wgbalance1

0:   from all fwmark 0x188 lookup rt_wgbalance2

0:   from all fwmark 0x390 lookup rt_wgbalance0

0:   from all fwmark 0x391 lookup rt_wgbalance1

0:   from all fwmark 0x392 lookup rt_wgbalance2

1:   from 172.16.100.0/24 lookup 1

32766:   from all lookup main

32767:   from all lookup default

```

And all of my tables contain a blackhole route, so all traffic that would go through VPN will get dropped (for testing purposes)

```
# ip route show table rt_wgbalance0

blackhole default

# ip route show table rt_wgbalance1

blackhole default

# ip route show table rt_wgbalance2

blackhole default

```

Next, I tried to ping 1.1.1.1 to see if it works. It should fail, since all packets to 1.1.1.1 get a mark that should be blackhole routed.

```
PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.

64 bytes from 1.1.1.1: icmp_seq=1 ttl=57 time=38.1 ms

--- 1.1.1.1 ping statistics ---

1 packets transmitted, 1 received, 0% packet loss, time 0ms

rtt min/avg/max/mdev = 38.144/38.144/38.144/0.000 ms

```

The fwmark is present in the traffic flow, as indicated by conntrack:

```
# conntrack -L --dst 1.1.1.1

icmp     1 28 src=172.16.0.XX dst=1.1.1.1 type=8 code=0 id=51153 src=1.1.1.1 dst=172.16.0.XX type=0 code=0 id=51153 mark=391 use=1

conntrack v1.4.6 (conntrack-tools): 1 flow entries have been shown.

```

This means the ip routing rule is not applying correctly. I can also verify this by adding an IP address, adding an ip rule for it, and pinging from that address:

```
ip add add 192.168.200.1/24 dev enp5s0

ip rule add from 192.168.200.1/24 table rt_wgbalance0

# ping -I 192.168.200.1 1.1.1.1 -c 1

PING 1.1.1.1 (1.1.1.1) from 192.168.200.1 : 56(84) bytes of data.

--- 1.1.1.1 ping statistics ---

1 packets transmitted, 0 received, 100% packet loss, time 0ms

```

I am fairly certain that I am missing something in the kernel - something like "support for packet marks in routing filters".

I also checked my config for the word "mark", but everything looks okay...

```
# cat /proc/config.gz | gzip -d | grep -i "MARK"

# CONFIG_GUP_BENCHMARK is not set

CONFIG_NETWORK_SECMARK=y

CONFIG_NF_CONNTRACK_MARK=y

CONFIG_NF_CONNTRACK_SECMARK=y

CONFIG_NETFILTER_XT_MARK=y

CONFIG_NETFILTER_XT_CONNMARK=y

CONFIG_NETFILTER_XT_TARGET_CONNMARK=y

CONFIG_NETFILTER_XT_TARGET_CONNSECMARK=y

CONFIG_NETFILTER_XT_TARGET_HMARK=y

CONFIG_NETFILTER_XT_TARGET_MARK=y

CONFIG_NETFILTER_XT_TARGET_SECMARK=y

CONFIG_NETFILTER_XT_MATCH_CONNMARK=y

CONFIG_NETFILTER_XT_MATCH_MARK=y

CONFIG_IP_SET_HASH_IPMARK=m

CONFIG_BRIDGE_EBT_MARK=y

CONFIG_BRIDGE_EBT_MARK_T=y

# CONFIG_NET_SCH_DSMARK is not set

# CONFIG_CLS_U32_MARK is not set

# CONFIG_NET_ACT_CONNMARK is not set

CONFIG_RAID6_PQ_BENCHMARK=y

# CONFIG_TRACEPOINT_BENCHMARK is not set

# CONFIG_RING_BUFFER_BENCHMARK is not set

# CONFIG_FIND_BIT_BENCHMARK is not set

```

Just in case, I also enabled logging of martian packets.

/etc/sysctl.conf

```
net.ipv4.conf.default.log_martians = 1

net.ipv4.conf.all.log_martians = 1

```

...but dmesg stays quiet, so that's probably not it either...

Has anyone ever had a similar issue? This has already cost me over 2 days. It works on Alpine, so it has to be something to do with my Gentoo box...

All comments are appreciated, thank you for reading this far!

Last edited by 39 on Fri Jun 11, 2021 9:11 pm; edited 1 time in total

----------

## user

Hi 39,

how about decimal vs hexadecimal?

hex

```
0:   from all fwmark 0x391 lookup rt_wgbalance1
```

dec

```
ip daddr 1.1.1.1 ct mark set numgen inc mod 3 offset 390

icmp     1 28 src=172.16.0.XX dst=1.1.1.1 type=8 code=0 id=51153 src=1.1.1.1 dst=172.16.0.XX type=0 code=0 id=51153 mark=391 use=1
```
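For anyone checking the arithmetic, a quick sketch with the shell's printf shows how the same digits map to different marks depending on the base. The helper names `to_hex` and `to_dec` are made up for illustration. The assumption here is that iproute2 parses a bare number as decimal and prints marks in hex, while the nft rule and the conntrack output above use decimal.

```shell
#!/bin/sh
# to_hex / to_dec are hypothetical helper names for illustration only.
to_hex() { printf '0x%x\n' "$1"; }
to_dec() { printf '%d\n' "$1"; }

to_hex 390     # the mark from the nft rule (decimal 390) -> 0x186
to_dec 0x390   # what "fwmark 0x390" actually matches     -> 912
to_hex 186     # what a bare "fwmark 186" turns into      -> 0xba
```

So a connection mark of 391 corresponds to `fwmark 0x187` (or plain `fwmark 391`), not to `fwmark 0x391`.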

----------

## 39

 *user wrote:*   

> how about decimal vs hexadecimal?

Thank you very much for your response. I wrote a short Bash script to quickly reproduce the issue I am facing.

```
# Routes

ip route add table rt_wgbalance0  blackhole default

ip route add table rt_wgbalance1  blackhole default

ip route add table rt_wgbalance2  blackhole default

nft add table inet classify

nft add chain inet classify output \{type filter hook output priority filter \; policy accept\;\}

nft add rule inet classify output ip daddr 1.1.1.1 ct mark set numgen inc mod 3 offset 390

# 390 in dec is 0x186 in hex; try the bare digits 186 (ip parses this as decimal, i.e. 0xba)

ip rule add fwmark 186 lookup rt_wgbalance0

ip rule add fwmark 187 lookup rt_wgbalance1

ip rule add fwmark 188 lookup rt_wgbalance2

# HEX

ip rule add fwmark 0x390 lookup rt_wgbalance0

ip rule add fwmark 0x391 lookup rt_wgbalance1

ip rule add fwmark 0x392 lookup rt_wgbalance2

# 390 as dec

ip rule add fwmark 390 lookup rt_wgbalance0

ip rule add fwmark 391 lookup rt_wgbalance1

ip rule add fwmark 392 lookup rt_wgbalance2

# 0x390 in hex format is 912 in dec

ip rule add fwmark 912 lookup rt_wgbalance0

ip rule add fwmark 913 lookup rt_wgbalance1

ip rule add fwmark 914 lookup rt_wgbalance2

```

ip rule show

```
0:   from all fwmark 0x390 lookup rt_wgbalance0

0:   from all fwmark 0x391 lookup rt_wgbalance0

0:   from all fwmark 0x392 lookup rt_wgbalance0

0:   from all fwmark 0x391 lookup rt_wgbalance1

0:   from all fwmark 0x392 lookup rt_wgbalance2

0:   from all fwmark 0x186 lookup rt_wgbalance0

0:   from all fwmark 0x187 lookup rt_wgbalance1

0:   from all fwmark 0x188 lookup rt_wgbalance2

0:   from all fwmark 0xba lookup rt_wgbalance0

0:   from all fwmark 0xbb lookup rt_wgbalance1

0:   from all fwmark 0xbc lookup rt_wgbalance2

```

This should rule out any format errors, right? Still the same issue, unfortunately.

----------

## user

Next layer of the issue: connection mark vs. packet mark.

You set the connection mark (ct mark):

```
nft add rule inet classify output ip daddr 1.1.1.1 ct mark set numgen inc mod 3 offset 390
```

but the ip rules expect a packet mark (fwmark):

```
0:   from all fwmark 0x186 lookup rt_wgbalance0

0:   from all fwmark 0x187 lookup rt_wgbalance1

0:   from all fwmark 0x188 lookup rt_wgbalance2
```

Try setting the packet mark (meta mark) from the connection mark (ct mark), for example:

```
nft add rule inet classify output ip daddr 1.1.1.1 meta mark set ct mark
```

----------

## 39

 *user wrote:*   

> try to set packet mark (meta mark) based on connection mark (ct mark) like
> 
> ```
> nft add rule inet classify output ip daddr 1.1.1.1 meta mark set ct mark
> ```
> ...

 

I added that at the bottom of my ruleset and checked it with nft monitor trace. The rule gets evaluated, but it does not seem to do the trick either.

```
trace id f46c5347 inet classify output packet: oif "XXXXXX" ip saddr 172.16.0.XX ip daddr 1.1.1.1 ip dscp cs0 ip ecn not-ect ip ttl 64 ip id 2478 ip protocol icmp ip length 84 icmp code net-unreachable icmp id 12770 icmp sequence 1 @th,64,96 XXXXXX 

trace id f46c5347 inet classify output rule ip daddr 1.1.1.1 meta nftrace set 1 (verdict continue)

trace id f46c5347 inet classify output rule ip daddr 1.1.1.1 ct mark set numgen inc mod 3 offset 390 (verdict continue)

trace id f46c5347 inet classify output rule ip daddr 1.1.1.1 meta mark set ct mark (verdict continue)

trace id f46c5347 inet classify output verdict continue meta mark 0x00000186 

trace id f46c5347 inet classify output policy accept meta mark 0x00000186 

trace id f46c5347 ip vmnat postrouting packet: oif "XXXXXX" ip saddr 172.16.0.XX ip daddr 1.1.1.1 ip dscp cs0 ip ecn not-ect ip ttl 64 ip id 2478 ip length 84 icmp code net-unreachable icmp id 12770 icmp sequence 1 @th,64,96 XXXXXX 

trace id f46c5347 ip vmnat postrouting verdict continue meta mark 0x00000186

trace id f46c5347 ip vmnat postrouting policy accept meta mark 0x00000186 

```

----------

## user

Did you dig further into how packets flow through netfilter?

Modifying the packet mark in a type filter hook output chain is too late for the ip rule fwmark logic to respect it.

Try an nftables type route hook output chain instead, so the packet is rerouted when, e.g., the packet mark (which is what you want) changes.

----------

## 39

 *user wrote:*   

> Did you dig further into how packets flow through netfilter?
> 
> Modifying the packet mark in a type filter hook output chain is too late for the ip rule fwmark logic to respect it.
> 
> Try an nftables type route hook output chain instead, so the packet is rerouted when, e.g., the packet mark (which is what you want) changes.

 

Boom, works!

Thank you so very much! I didn't even know there was an entirely different chain type that reroutes locally generated traffic...

For reference, this is what finally works:

```
# Routing table setup

ip route add table rt_wgbalance0  blackhole default

ip route add table rt_wgbalance1  blackhole default

ip route add table rt_wgbalance2  blackhole default

nft add table inet classify

nft add chain inet classify output \{type route hook output priority filter \; policy accept\;\}

# This needs to be route, not filter -----^^^^^

nft add rule inet classify output ip daddr 1.1.1.1 ct mark set numgen inc mod 3 offset 390

nft add rule inet classify output ip daddr 1.1.1.1 meta mark set ct mark

ip rule add fwmark 390 lookup rt_wgbalance0

ip rule add fwmark 391 lookup rt_wgbalance1

ip rule add fwmark 392 lookup rt_wgbalance2
```
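As a step toward the configurable script mentioned in the first post, the working commands can be generated for any number of tunnels. This is only a dry-run sketch: it prints the commands instead of running them, `gen_balance_cmds` is a hypothetical function name, and the `rt_wgbalanceN` table names are assumed to follow the numbering used above.

```shell
#!/bin/sh
# Dry run: prints the nft / ip commands rather than executing them.
# gen_balance_cmds is a hypothetical helper; table names rt_wgbalanceN
# are assumed to exist, as in the setup above.
gen_balance_cmds() {
    base=$1    # first mark, e.g. 390
    count=$2   # number of tunnels / routing tables
    dest=$3    # destination to classify, e.g. 1.1.1.1

    echo "nft add table inet classify"
    # type route (not filter), so a changed mark triggers a new route lookup
    echo "nft add chain inet classify output '{ type route hook output priority filter; policy accept; }'"
    echo "nft add rule inet classify output ip daddr $dest ct mark set numgen inc mod $count offset $base"
    echo "nft add rule inet classify output ip daddr $dest meta mark set ct mark"

    i=0
    while [ "$i" -lt "$count" ]; do
        echo "ip rule add fwmark $((base + i)) lookup rt_wgbalance$i"
        i=$((i + 1))
    done
}

gen_balance_cmds 390 3 1.1.1.1
```

After reviewing the printed commands, they could be piped to `sh` as root. One thing to keep in mind: as written, the numgen rule matches every packet to the destination, so restricting it to new connections (for example with `ct state new`) may be needed to keep a connection on one tunnel. That variation was not tested in this thread.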

----------

