The server runs Linux kernel 5.5.0-050500-generic on Ubuntu 20.04 LTS. I have two interfaces connected to an OVS bridge. In the normal flow, packets are forwarded from one interface to the other on the bridge while I ping from an external traffic generator (a standard NIC with two interfaces, each in a different namespace). That works fine. When I run iperf/iperf3, however, the kernel crashes. The kernel log at that time is as follows.

[  589.827773] kernel BUG at ./include/linux/skbuff.h:4470!
[  589.827812] invalid opcode: 0000 [#1] SMP NOPTI
[  589.827818] CPU: 49 PID: 0 Comm: swapper/49 Tainted: G           OE     5.5.0-050500-generic #202001262030
[  589.827820] Hardware name: Dell Inc. PowerEdge R740/0WXD1Y, BIOS 2.6.4 04/09/2020
[  589.827881] Code: 28 89 47 2c e9 66 ff ff ff 48 8d 5f 50 48 89 df e8 ee a3 45 fa 84 c0 0f 84 52 ff ff ff 48 89 df e8 ae f6 45 fa e9 45 ff ff ff <0f> 0b 0f 1f 80 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 54 53 48
[  589.827889] RSP: 0018:ffffb1e0872cc660 EFLAGS: 00010202
[  589.827896] RAX: 0000000000000008 RBX: ffff935334acd300 RCX: 0000000000000001
[  589.827899] RDX: 37815ffd09b20000 RSI: ffff934b67091000 RDI: ffff935334acd300
[  589.827901] RBP: ffffb1e0872cc698 R08: ffff9357175114ac R09: 0000000000000001
[  589.827904] R10: 0000000000000128 R11: 0000000000000178 R12: ffff934b67091000
[  589.827906] R13: ffff934b67094000 R14: ffff93571578c480 R15: 0000000000000001
[  589.827909] FS:  0000000000000000(0000) GS:ffff93571fc00000(0000) knlGS:0000000000000000
[  589.827914] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  589.827917] CR2: 00007f3cac01b468 CR3: 000000097160a001 CR4: 00000000007606e0
[  589.827920] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  589.827922] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  589.827924] PKRU: 55555554
[  589.827927] Call Trace:
[  589.827930]  <IRQ>
[  589.827942]  dev_hard_start_xmit+0x91/0x1f0
[  589.827953]  ? validate_xmit_skb+0x2f0/0x340
[  589.827965]  sch_direct_xmit+0x113/0x340
[  589.827976]  __dev_queue_xmit+0x57e/0x9d0
[  589.827986]  ? reweight_entity+0x16d/0x1b0
[  589.827995]  dev_queue_xmit+0x10/0x20
[  589.828007]  ovs_vport_send+0xa3/0x140 [openvswitch]
[  589.828014]  do_output+0x59/0x170 [openvswitch]
[  589.828022]  do_execute_actions+0x9ae/0x9d0 [openvswitch]
[  589.828031]  ? timerqueue_add+0x9b/0xb0
[  589.828044]  ? enqueue_hrtimer+0x3d/0x90
[  589.828054]  ? ktime_get+0x3e/0xa0
[  589.828062]  ? __update_load_avg_cfs_rq+0x1eb/0x2c0
[  589.828066]  ? attach_entity_load_avg+0x132/0x1a0
[  589.828071]  ? kmem_cache_alloc_node+0x1b3/0x260
[  589.828079]  ovs_execute_actions+0x48/0x110 [openvswitch]
[  589.828086]  ovs_dp_process_packet+0x99/0x1c0 [openvswitch]
[  589.828101]  ? netdev_create+0x40/0x40 [openvswitch]
[  589.828114]  ? ovs_ct_update_key+0x4d/0x110 [openvswitch]
[  589.828122]  ? netdev_create+0x40/0x40 [openvswitch]
[  589.828130]  ovs_vport_receive+0x77/0xd0 [openvswitch]
[  589.828135]  ? __update_load_avg_cfs_rq+0x1eb/0x2c0
[  589.828139]  ? account_entity_enqueue+0xa7/0xd0
[  589.828149]  ? __enqueue_entity+0x96/0xa0
[  589.828161]  ? enqueue_entity+0x116/0x660
[  589.828170]  ? record_times+0x1b/0x90
[  589.828179]  ? native_smp_send_reschedule+0x2a/0x40
[  589.828190]  netdev_frame_hook+0xca/0x190 [openvswitch]
[  589.828196]  __netif_receive_skb_core+0x2db/0xf70
[  589.828210]  ? get_page_from_freelist+0x1dc/0x390
[  589.828218]  ? tcp4_gro_receive+0x136/0x1a0
[  589.828225]  __netif_receive_skb_list_core+0x126/0x2c0
[  589.828231]  netif_receive_skb_list_internal+0x1d5/0x300
[  589.828237]  gro_normal_list.part.0+0x1e/0x40
[  589.828247]  napi_complete_done+0x91/0x140
[  589.828273]  efx_poll+0x282/0x580 [sfc]
[  589.828280]  net_rx_action+0x147/0x3b0
[  589.828289]  __do_softirq+0xe1/0x2d6
[  589.828297]  irq_exit+0xae/0xb0
[  589.828302]  do_IRQ+0x5a/0xf0
[  589.828306]  common_interrupt+0xf/0xf
[  589.828308]  </IRQ>
[  589.828316] RIP: 0010:cpuidle_enter_state+0xca/0x3e0
    What kernel is that? Where did you get it? Why are you not using an official Ubuntu kernel? Feb 27, 2021 at 14:26

1 Answer

Go back to standard Ubuntu kernels (currently, v5.4):

sudo apt update && sudo apt install linux-generic
sudo apt-get autoremove "linux-image-unsigned-5.5.0-*"

Or, if you really do need a later version, you can fetch a reasonably modern & supported (currently, v5.8) kernel by installing the hardware-enablement branch:

sudo apt-get install linux-generic-hwe-20.04

The kernel that caused this is likely a Canonical-provided "mainline" build: a one-off binary only meant to help you diagnose kernel problems. Do not run unsupported mainline builds in production, and stop running them as soon as you have figured out whatever bug you were tracing with them.
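To check quickly whether a machine is running such a build, a rough heuristic is to look at the release string. This is only a sketch: it assumes the naming convention seen in the log above, where the mainline build carries a six-digit, zero-padded field (5.5.0-050500-generic), while distribution kernels use a short ABI number (for example 5.4.0-66-generic). The function name is_mainline_build is made up for this example.

```shell
#!/bin/sh
# Heuristic only: assumes mainline builds are named like 5.5.0-050500-generic
# (six-digit zero-padded field after the first dash), while distribution
# kernels look like 5.4.0-66-generic. is_mainline_build is a hypothetical helper.
is_mainline_build() {
    case "$1" in
        *-[0-9][0-9][0-9][0-9][0-9][0-9]*-*) return 0 ;;  # looks like a mainline build
        *) return 1 ;;                                     # looks like a distribution kernel
    esac
}

running="$(uname -r)"
if is_mainline_build "$running"; then
    echo "$running looks like a mainline build"
else
    echo "$running does not look like a mainline build"
fi
```

A match is a hint rather than proof; the package list (dpkg -l 'linux-image*') shows where the kernel actually came from.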

OVS has been broken many times and will be broken again, and the issue you ran into is likely fixed in all (distribution or upstream) supported versions.

However, do try to ask the person who installed this kernel why they did so.

Sure, it's bad to have a server running an abandoned kernel that has not received any attention for a year. But the issue that led to that decision might also have had the potential for grave business impact, and if you cannot test after the kernel switch whether you have reintroduced an old bug, that might end badly.
