Hey guys,
we have some issues with a SLES 11SP4 machine. In the /var/log/messages and the dmesg as well, of course, there are some kernel deadlock messages. See code below.
These deadlocks are mostly pointing to the kworkerprocess. Sometimes they affect other processes as well.
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.784995] warn_alloc_failed: 2 callbacks suppressed
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.784998] kworker/0:1: page allocation failure: order:1, mode:0x20
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.785008] Pid: 0, comm: kworker/0:1 Tainted: G X 3.0.101-77-default #1
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.785009] Call Trace:
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.785033] [<ffffffff81004b95>] dump_trace+0x75/0x300
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.785045] [<ffffffff81466bb3>] dump_stack+0x69/0x6f
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.785053] [<ffffffff81102ff6>] warn_alloc_failed+0xc6/0x170
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.785056] [<ffffffff81104c0f>] __alloc_pages_slowpath+0x56f/0x820
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.785059] [<ffffffff811050a9>] __alloc_pages_nodemask+0x1e9/0x200
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.785070] [<ffffffff81147be3>] kmem_getpages+0x53/0x190
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.785073] [<ffffffff81148a16>] fallback_alloc+0x196/0x270
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.785075] [<ffffffff81149607>] kmem_cache_alloc+0x207/0x2a0
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.785087] [<ffffffff813995e2>] sk_prot_alloc+0x42/0x1b0
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.785090] [<ffffffff8139ac3a>] sk_clone+0x1a/0x320
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.785098] [<ffffffff813e9220>] inet_csk_clone+0x10/0xb0
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.785102] [<ffffffff814027e1>] tcp_create_openreq_child+0x21/0x4b0
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.785104] [<ffffffff81400b37>] tcp_v4_syn_recv_sock+0x47/0x240
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.785106] [<ffffffff8140264f>] tcp_check_req+0x2df/0x450
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.785108] [<ffffffff81401727>] tcp_v4_hnd_req+0x67/0x130
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.785110] [<ffffffff8140197b>] tcp_v4_do_rcv+0x18b/0x200
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.785112] [<ffffffff81401f87>] tcp_v4_rcv+0x597/0x7f0
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.785115] [<ffffffff813dea38>] ip_local_deliver_finish+0x118/0x200
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.785118] [<ffffffff813a760a>] __netif_receive_skb+0x60a/0x640
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.785124] [<ffffffff813acf30>] netif_receive_skb+0x90/0xa0
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.785134] [<ffffffffa02036f1>] vmxnet3_rq_rx_complete+0x4b1/0xa60 [vmxnet3]
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.785144] [<ffffffffa0203e04>] vmxnet3_poll_rx_only+0x44/0xd0 [vmxnet3]
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.785147] [<ffffffff813adfda>] net_rx_action+0x10a/0x2b0
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.785157] [<ffffffff8106925f>] __do_softirq+0x11f/0x260
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.785163] [<ffffffff81472fdc>] call_softirq+0x1c/0x30
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.785165] [<ffffffff81004695>] do_softirq+0x65/0xa0
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.785168] [<ffffffff81069025>] irq_exit+0xc5/0xe0
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.785171] [<ffffffff81003a5f>] do_IRQ+0x5f/0xe0
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.785174] [<ffffffff8146a193>] common_interrupt+0x13/0x13
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.785181] [<ffffffff810300a2>] native_safe_halt+0x2/0x10
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.785186] [<ffffffff8100b195>] default_idle+0x145/0x150
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.785189] [<ffffffff81002116>] cpu_idle+0x66/0xb0
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.785196] kworker/0:1: page allocation failure: order:1, mode:0x20
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.785197] Pid: 0, comm: kworker/0:1 Tainted: G X 3.0.101-77-default #1
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.785198] Call Trace:
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.785200] [<ffffffff81004b95>] dump_trace+0x75/0x300
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.785202] [<ffffffff81466bb3>] dump_stack+0x69/0x6f
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.785204] [<ffffffff81102ff6>] warn_alloc_failed+0xc6/0x170
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.785206] [<ffffffff81104c0f>] __alloc_pages_slowpath+0x56f/0x820
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.785208] [<ffffffff811050a9>] __alloc_pages_nodemask+0x1e9/0x200
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.785211] [<ffffffff81147be3>] kmem_getpages+0x53/0x190
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.785213] [<ffffffff81148a16>] fallback_alloc+0x196/0x270
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.785215] [<ffffffff81149607>] kmem_cache_alloc+0x207/0x2a0
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.785218] [<ffffffff813995e2>] sk_prot_alloc+0x42/0x1b0
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.785219] [<ffffffff8139ac3a>] sk_clone+0x1a/0x320
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.785222] [<ffffffff813e9220>] inet_csk_clone+0x10/0xb0
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.785224] [<ffffffff814027e1>] tcp_create_openreq_child+0x21/0x4b0
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.785226] [<ffffffff81400b37>] tcp_v4_syn_recv_sock+0x47/0x240
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.785227] [<ffffffff8140264f>] tcp_check_req+0x2df/0x450
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.785229] [<ffffffff81401727>] tcp_v4_hnd_req+0x67/0x130
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.785231] [<ffffffff8140197b>] tcp_v4_do_rcv+0x18b/0x200
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.785233] [<ffffffff81401f87>] tcp_v4_rcv+0x597/0x7f0
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.785235] [<ffffffff813dea38>] ip_local_deliver_finish+0x118/0x200
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.785238] [<ffffffff813a760a>] __netif_receive_skb+0x60a/0x640
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.785240] [<ffffffff813acf30>] netif_receive_skb+0x90/0xa0
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.785243] [<ffffffffa02036f1>] vmxnet3_rq_rx_complete+0x4b1/0xa60 [vmxnet3]
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.785247] [<ffffffffa0203e04>] vmxnet3_poll_rx_only+0x44/0xd0 [vmxnet3]
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.785250] [<ffffffff813adfda>] net_rx_action+0x10a/0x2b0
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.785252] [<ffffffff8106925f>] __do_softirq+0x11f/0x260
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.785255] [<ffffffff81472fdc>] call_softirq+0x1c/0x30
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.785257] [<ffffffff81004695>] do_softirq+0x65/0xa0
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.785259] [<ffffffff81069025>] irq_exit+0xc5/0xe0
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.785261] [<ffffffff81003a5f>] do_IRQ+0x5f/0xe0
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.785263] [<ffffffff8146a193>] common_interrupt+0x13/0x13
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.785266] [<ffffffff810300a2>] native_safe_halt+0x2/0x10
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.785268] [<ffffffff8100b195>] default_idle+0x145/0x150
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.785270] [<ffffffff81002116>] cpu_idle+0x66/0xb0
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.984156] kworker/0:1: page allocation failure: order:1, mode:0x20
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.984159] Pid: 0, comm: kworker/0:1 Tainted: G X 3.0.101-77-default #1
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.984161] Call Trace:
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.984171] [<ffffffff81004b95>] dump_trace+0x75/0x300
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.984176] [<ffffffff81466bb3>] dump_stack+0x69/0x6f
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.984180] [<ffffffff81102ff6>] warn_alloc_failed+0xc6/0x170
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.984183] [<ffffffff81104c0f>] __alloc_pages_slowpath+0x56f/0x820
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.984186] [<ffffffff811050a9>] __alloc_pages_nodemask+0x1e9/0x200
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.984190] [<ffffffff81147be3>] kmem_getpages+0x53/0x190
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.984193] [<ffffffff81148a16>] fallback_alloc+0x196/0x270
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.984196] [<ffffffff81149607>] kmem_cache_alloc+0x207/0x2a0
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.984201] [<ffffffff813995e2>] sk_prot_alloc+0x42/0x1b0
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.984203] [<ffffffff8139ac3a>] sk_clone+0x1a/0x320
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.984207] [<ffffffff813e9220>] inet_csk_clone+0x10/0xb0
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.984210] [<ffffffff814027e1>] tcp_create_openreq_child+0x21/0x4b0
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.984212] [<ffffffff81400b37>] tcp_v4_syn_recv_sock+0x47/0x240
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.984214] [<ffffffff8140264f>] tcp_check_req+0x2df/0x450
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.984216] [<ffffffff81401727>] tcp_v4_hnd_req+0x67/0x130
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.984218] [<ffffffff8140197b>] tcp_v4_do_rcv+0x18b/0x200
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.984220] [<ffffffff81401f87>] tcp_v4_rcv+0x597/0x7f0
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.984223] [<ffffffff813dea38>] ip_local_deliver_finish+0x118/0x200
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.984226] [<ffffffff813a760a>] __netif_receive_skb+0x60a/0x640
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.984229] [<ffffffff813acf30>] netif_receive_skb+0x90/0xa0
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.984240] [<ffffffffa02036f1>] vmxnet3_rq_rx_complete+0x4b1/0xa60 [vmxnet3]
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.984250] [<ffffffffa0203e04>] vmxnet3_poll_rx_only+0x44/0xd0 [vmxnet3]
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.984253] [<ffffffff813adfda>] net_rx_action+0x10a/0x2b0
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.984258] [<ffffffff8106925f>] __do_softirq+0x11f/0x260
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.984262] [<ffffffff81472fdc>] call_softirq+0x1c/0x30
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.984264] [<ffffffff81004695>] do_softirq+0x65/0xa0
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.984267] [<ffffffff81069025>] irq_exit+0xc5/0xe0
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.984270] [<ffffffff81003a5f>] do_IRQ+0x5f/0xe0
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.984273] [<ffffffff8146a193>] common_interrupt+0x13/0x13
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.984277] [<ffffffff810300a2>] native_safe_halt+0x2/0x10
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.984280] [<ffffffff8100b195>] default_idle+0x145/0x150
Nov 24 12:49:27 SOME-SYSTEM-NAME kernel: [11737893.984283] [<ffffffff81002116>] cpu_idle+0x66/0xb0
Anyone here, that could give us some clues where are we locking at?
I know that a deadlock is nothing necessary to take care about, but in this increasing amount i want to do something
Thank everyone for helping us out.
Greetings from germany
Ulli