
F-Stack behavior at high CPS #1076

@Sai-Raveendra-Kandregula

Description

Hello maintainers. I have an application built on F-Stack that works as a transparent proxy and modifies the TCP payload when the application rules require it.

I was able to reach a fairly high connection rate (around 13k CPS) on a single CPU core with all of the traffic being modified. At that rate, CPU utilization is around 100%. Increasing the CPS even slightly beyond that point results in delayed connection deletions, and active connections keep piling up, eventually making my application unresponsive, probably due to mbuf exhaustion (this is speculation, but we also saw a case where ff_ipc tools such as netstat stopped responding too).

If I have hit the maximum performance of my application, I would prefer that packets be dropped rather than the application becoming unresponsive.

Please let me know if something in my configuration is causing this. We slightly modified F-Stack to support memif configuration (which meant disabling RSS, otherwise mandatory) and to program IPv6 ipfw rules.

F-Stack Version: 1.24
Configuration:

[dpdk]
# Hexadecimal bitmask of cores to run on.
lcore_mask=$lcore_mask

# Huge Page Memory to use
memory=$huge_page_memory

# Size of F-Stack internal packet dispatch ring (default is 2048)
dispatch_ring_size=16384

# Number of memory channels.
channel=4

# Promiscuous mode of NIC, default: enabled.
promiscuous=1
numa_on=1

# TX checksum offload skip, default: disabled.
# We need this switch enabled in the following cases:
# -> The application wants to enforce a wrong checksum for testing purposes
# -> Some cards advertise the offload capability but do not actually calculate the checksum.
tx_csum_offoad_skip=0

# TCP segment offload, default: disabled.
tso=0

# HW vlan strip, default: enabled.
vlan_strip=1

# Set [vlanN]'s addresses like [portN] later;
# the format is the same as port_list.
# Set VLAN filter IDs to enable L3/L4 RSS below the VLAN header (no longer enabled after f-stack-1.22).
vlan_filter=1,2,4-6

# Sleep when no packets are incoming.
# unit: microseconds
idle_sleep=1

# Delay (0-100) before sending when fewer than 32 packets are queued.
# default 100 us.
# If set to 0, packets are sent immediately.
# If set >100, the delay is 100 us.
# unit: microseconds
pkt_tx_delay=100

# Use the symmetric Receive-Side Scaling (RSS) key, default: disabled.
symmetric_rss=0

port_list=$port_list

# Number of vdev.
nb_vdev=0

# Number of bond.
nb_bond=0

# Number of memif.
nb_memif=$nb_memif

[memif0]
id=13
socket_abstract=no
role=client
rsize=14

[port$index]
addr=$ip
netmask=$netmask
broadcast=$broadcast_ip
gateway=$gw_ip
promiscuous=1
rx_queue_size=4096
tx_queue_size=4096

# This is expected to remain static during the application lifecycle. Modifying it with any internal commands might break user commands.
if_name=$if_name

# FreeBSD network performance tuning configurations.
# Most native FreeBSD configurations are supported.
[freebsd.boot]
# If using rack/bbr, which depend on HPTS, set a larger value of hz; for example, 1000000 means one tick is 1 us.
hz=1000000

# Block out a range of descriptors to avoid overlap
# with the kernel's descriptor space.
# You can increase this value according to your app.
fd_reserve=1024

kern.ipc.maxsockets=1048576

# With a 100K session burst, your SYN cache buckets will overflow immediately, causing F-Stack 
# to drop incoming SYNs. Increasing hashsize to 32,768 (always use powers of 2) drastically 
# reduces hash collisions during the burst.
net.inet.tcp.syncache.hashsize=32768
net.inet.tcp.syncache.bucketlimit=100

net.inet.tcp.tcbhashsize=262144

kern.ncallout=2097152

kern.features.inet6=1

# H-TCP Congestion Control for a more aggressive increase in speed on higher
# latency, high bandwidth networks with some packet loss.
cc_htcp_load=YES

# The hostcache cachelimit is the number of IP addresses in the host cache list.
# Setting the value to zero (0) stops any IP address connection information from
# being cached and negates the need for "net.inet.tcp.hostcache.expire". We
# find that disabling the hostcache increases burst data rates by 2x if a subnet
# was incorrectly graded as slow on a previous connection. A host cache entry
# holds the client's cached TCP connection details and metrics (RTT, SSTHRESH
# and RTTVAR) that the server can use to improve the performance of future
# connections between the same two hosts. When a TCP connection is completed,
# the server caches information about the connection until an expire timeout.
# If a new connection with the same client is initiated before the cache entry
# has expired, the connection uses the cached details to set up its internal
# variables. This pre-cached setup allows the client and server to reach optimal
# performance significantly faster because the server does not need to go
# through the usual steps of re-learning the optimal parameters for the
# connection. To view the current host cache stats, use
# "sysctl net.inet.tcp.hostcache.list".
net.inet.tcp.hostcache.cachelimit=0

# 30 second host cache (hostcache.expire is specified in seconds)
# net.inet.tcp.hostcache.expire=30

# Enable the optimized version of soreceive() for stream (TCP) sockets.
# soreceive_stream() only does one sockbuf unlock/lock per receive independent
# of the length of data to be moved into the uio compared to soreceive() which
# unlocks/locks per *mbuf*. soreceive_stream() can significantly reduce CPU
# usage and lock contention when receiving fast TCP streams. Additional gains
# are obtained when the receiving application is using SO_RCVLOWAT to batch up
# some data before a read (and wakeup) is done.
# (default 0)
net.inet.tcp.soreceive_stream=1

[freebsd.sysctl]
kern.ipc.somaxconn=1048576
kern.ipc.maxsockbuf=16777216

net.add_addr_allfibs=1

net.link.ether.inet.maxhold=5

net.inet.tcp.sendspace=65536
net.inet.tcp.recvspace=65536
net.inet.tcp.nolocaltimewait=1
net.inet.tcp.cc.algorithm=cubic
net.inet.tcp.sendbuf_max=16777216
net.inet.tcp.recvbuf_max=16777216
net.inet.tcp.sendbuf_auto=1
net.inet.tcp.recvbuf_auto=1
net.inet.tcp.sendbuf_inc=16384
#net.inet.tcp.recvbuf_inc=524288
net.inet.tcp.sack.enable=1
net.inet.tcp.blackhole=2
net.inet.tcp.msl=5000
net.inet.tcp.delayed_ack=1
net.inet.tcp.rfc1323=1
# Enable keepalive on all TCP connections so idle, dead ones get closed
net.inet.tcp.always_keepalive=1
# Start keepalive probes after 5 seconds of idle time
net.inet.tcp.keepidle=5000
# Interval between keepalive probes: 5 seconds
net.inet.tcp.keepintvl=5000
# Number of unanswered probes before dropping the connection
net.inet.tcp.keepcnt=3
# Recycle closed FIN_WAIT_2 (zombie) connections faster
net.inet.tcp.fast_finwait2_recycle=1
# Time (in milliseconds) a FIN_WAIT_2 socket waits before it is forcibly closed.
net.inet.tcp.finwait2_timeout=5000
# Prevent SYN-FIN flood attacks
net.inet.tcp.drop_synfin=1

net.inet.tcp.syncookies=1
net.inet.tcp.keepinit=5000
net.inet.tcp.syncache.rexmtlimit=1

net.inet.udp.blackhole=1
net.inet.ip.redirect=0
net.inet.ip.forwarding=1

net.inet6.ip6.auto_linklocal=0
net.inet6.ip6.accept_rtadv=0
net.inet6.icmp6.rediraccept=1
net.inet6.ip6.forwarding=1

# Set the default stack: freebsd, rack or bbr. You may need to increase 'freebsd.boot.hz' when using rack or bbr.
net.inet.tcp.functions_default=freebsd
# Needed by bbr; enable it when using bbr.
net.inet.tcp.hpts.skip_swi=0
# Interval between calls to hpts_timeout_dir. default min 250us, max 256-512ms, default 512ms.
net.inet.tcp.hpts.minsleep=10
# [25600-51200]
net.inet.tcp.hpts.maxsleep=25600
