ifup eth0 sometimes hangs in 'setup-in-progress' status

Observed wickedd-dhcp4 in the system log
and saw that the eth0 was trapped in the validating DHCP lease state of the wicked FSM and cannot normally move to the “up” status,
while other interfaces can be successfully brought up.
Tried to set client-id by rfc2132 in the wicked config, but the issue still happened.

Any ideas what else I can check?
My wicked version is 0.6.54. Thanks.

Ken

	Line 19612: 2019-06-05T02:18:28.166208-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth5: Validating DHCPv4 address 20.10.12.1
	Line 19627: 2019-06-05T02:18:28.237436-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth5: ignoring dhcp4 packet from 8c:dc:d4:b3:60:78 arrived in state VALIDATING
	Line 19644: 2019-06-05T02:18:28.245048-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth5: ignoring dhcp4 packet from 8c:dc:d4:b3:60:78 arrived in state VALIDATING
	Line 19648: 2019-06-05T02:18:28.246817-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth4: Validating DHCPv4 address 20.10.11.254
	Line 19658: 2019-06-05T02:18:28.383760-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth5: timeout in state VALIDATING
	Line 19798: 2019-06-05T02:18:28.474400-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth4: timeout in state VALIDATING
	Line 20478: 2019-06-05T02:18:28.598274-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth5: timeout in state VALIDATING
	Line 20482: 2019-06-05T02:18:28.654751-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth4: timeout in state VALIDATING
	Line 20490: 2019-06-05T02:18:28.762583-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth5: ignoring dhcp4 packet from 8c:dc:d4:b3:60:78 arrived in state VALIDATING
	Line 20492: 2019-06-05T02:18:28.763486-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth4: ignoring dhcp4 packet from 8c:dc:d4:b3:60:78 arrived in state VALIDATING
	Line 20507: 2019-06-05T02:18:28.769886-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth5: ignoring dhcp4 packet from 8c:dc:d4:b3:60:78 arrived in state VALIDATING
	Line 20509: 2019-06-05T02:18:28.770268-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth4: ignoring dhcp4 packet from 8c:dc:d4:b3:60:78 arrived in state VALIDATING
	Line 20515: 2019-06-05T02:18:28.771407-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth7: Validating DHCPv4 address 20.10.11.244
	Line 20518: 2019-06-05T02:18:28.784714-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth5: timeout in state VALIDATING
	Line 20534: 2019-06-05T02:18:28.857638-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth4: timeout in state VALIDATING
	Line 20558: 2019-06-05T02:18:28.983572-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth7: timeout in state VALIDATING
	Line 20565: 2019-06-05T02:18:28.986531-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth7: ignoring dhcp4 packet from 8c:dc:d4:b3:60:78 arrived in state VALIDATING
	Line 20582: 2019-06-05T02:18:29.183763-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth7: timeout in state VALIDATING
	Line 20596: 2019-06-05T02:18:29.384072-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth7: timeout in state VALIDATING
	Line 20677: 2019-06-05T02:18:29.906379-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth6: Validating DHCPv4 address 20.10.12.2
	Line 20688: 2019-06-05T02:18:29.985307-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth6: ignoring dhcp4 packet from 8c:dc:d4:b3:60:78 arrived in state VALIDATING
	Line 20699: 2019-06-05T02:18:29.987548-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth6: ignoring dhcp4 packet from 8c:dc:d4:b3:60:78 arrived in state VALIDATING
	Line 20703: 2019-06-05T02:18:29.988367-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth0: Validating DHCPv4 address 20.10.11.114
	Line 20709: 2019-06-05T02:18:30.131474-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth6: timeout in state VALIDATING
	Line 20716: 2019-06-05T02:18:30.177395-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth6: ignoring dhcp4 packet from 8c:dc:d4:b3:60:78 arrived in state VALIDATING
	Line 20718: 2019-06-05T02:18:30.178347-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth0: ignoring dhcp4 packet from 8c:dc:d4:b3:60:78 arrived in state VALIDATING
	Line 20719: 2019-06-05T02:18:30.227965-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth0: timeout in state VALIDATING
	Line 20736: 2019-06-05T02:18:30.331468-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth6: timeout in state VALIDATING
	Line 20740: 2019-06-05T02:18:30.427701-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth0: timeout in state VALIDATING
	Line 20769: 2019-06-05T02:18:30.530706-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth6: timeout in state VALIDATING
	Line 21193: 2019-06-05T02:18:31.151315-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth0: ignoring dhcp4 packet from 8c:dc:d4:b3:60:78 arrived in state VALIDATING
	Line 21197: 2019-06-05T02:18:31.371304-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth0: ignoring dhcp4 packet from 8c:dc:d4:b3:60:78 arrived in state VALIDATING
	Line 21238: 2019-06-05T02:18:31.889481-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth0: ignoring dhcp4 packet from 8c:dc:d4:b3:60:78 arrived in state VALIDATING
	Line 21240: 2019-06-05T02:18:31.890418-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth0: ignoring dhcp4 packet from 8c:dc:d4:b3:60:78 arrived in state VALIDATING
	Line 21244: 2019-06-05T02:18:32.614952-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth0: ignoring dhcp4 packet from 8c:dc:d4:b3:60:78 arrived in state VALIDATING
	Line 21248: 2019-06-05T02:18:32.891311-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth0: ignoring dhcp4 packet from 8c:dc:d4:b3:60:78 arrived in state VALIDATING
	Line 21250: 2019-06-05T02:18:33.226612-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth0: ignoring dhcp4 packet from 8c:dc:d4:b3:60:78 arrived in state VALIDATING
	Line 21312: 2019-06-05T02:18:35.853143-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth0: ignoring dhcp4 packet from 8c:dc:d4:b3:60:78 arrived in state VALIDATING
	Line 21316: 2019-06-05T02:18:36.132594-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth0: ignoring dhcp4 packet from 8c:dc:d4:b3:60:78 arrived in state VALIDATING
	Line 21343: 2019-06-05T02:18:36.254098-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth0: ignoring dhcp4 packet from 8c:dc:d4:b3:60:78 arrived in state VALIDATING
	Line 21347: 2019-06-05T02:18:36.551462-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth0: ignoring dhcp4 packet from 8c:dc:d4:b3:60:78 arrived in state VALIDATING
	Line 21351: 2019-06-05T02:18:37.741350-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth0: ignoring dhcp4 packet from 8c:dc:d4:b3:60:78 arrived in state VALIDATING
	Line 21353: 2019-06-05T02:18:37.804173-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth0: ignoring dhcp4 packet from 8c:dc:d4:b3:60:78 arrived in state VALIDATING
	Line 21355: 2019-06-05T02:18:37.829830-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth0: ignoring dhcp4 packet from 8c:dc:d4:b3:60:78 arrived in state VALIDATING
	Line 21357: 2019-06-05T02:18:37.859369-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth0: ignoring dhcp4 packet from 8c:dc:d4:b3:60:78 arrived in state VALIDATING
	... (eth0 cannot stop validating the lease)

Hi and welcome to the Forum :slight_smile:
Looking at the code, seems it’s output that is ignored since it’s already received an ACK…

See line 81 here;
https://github.com/openSUSE/wicked/blob/master/src/dhcp4/fsm.c

Does it come up manually with;

ip link set eth0 up

[QUOTE=malcolmlewis;58033]Hi and welcome to the Forum :slight_smile:
Looking at the code, seems it’s output that is ignored since it’s already received an ACK…

See line 81 here;
https://github.com/openSUSE/wicked/blob/master/src/dhcp4/fsm.c

Does it come up manually with;

ip link set eth0 up [/QUOTE]

malcolmlewis thanks for your suggestion!
I tried the command, but eth0 is still trapped in the setup-in-progress status.

DL380G10R3S10:~ # ip link set eth0 up
DL380G10R3S10:~ # ifstatus eth0
eth0            setup-in-progress
      link:     #2, state up, mtu 1500
      type:     ethernet, hwaddr 98:f2:b3:ef:07:0e
      config:   compat:suse:/etc/sysconfig/network/ifcfg-eth0
      leases:   ipv4 dhcp requesting
      leases:   ipv6 dhcp requesting

There’re 5 interfaces (eth0, eth4, eth5, eth6, eth7) requesting dhcp4 ip in my system.
In normal cases (eth4 to eth7), I can find 3 timeout in state VALIDATING for each of them,
while in the case of eth0, only 2 timeout in state VALIDATING can be seen: the last timeout doesn’t happen.
Below are lines in the system messages by grep “timeout in state”:

2019-06-05T02:18:24.426602-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth5: timeout in state INIT
2019-06-05T02:18:24.605244-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth4: timeout in state INIT
2019-06-05T02:18:25.018864-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth6: timeout in state INIT
2019-06-05T02:18:25.267331-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth7: timeout in state INIT
2019-06-05T02:18:26.296874-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth0: timeout in state INIT
2019-06-05T02:18:28.383760-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth5: timeout in state VALIDATING
2019-06-05T02:18:28.474400-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth4: timeout in state VALIDATING
2019-06-05T02:18:28.598274-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth5: timeout in state VALIDATING
2019-06-05T02:18:28.654751-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth4: timeout in state VALIDATING
2019-06-05T02:18:28.784714-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth5: timeout in state VALIDATING
2019-06-05T02:18:28.857638-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth4: timeout in state VALIDATING
2019-06-05T02:18:28.983572-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth7: timeout in state VALIDATING
2019-06-05T02:18:29.183763-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth7: timeout in state VALIDATING
2019-06-05T02:18:29.384072-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth7: timeout in state VALIDATING
2019-06-05T02:18:30.131474-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth6: timeout in state VALIDATING
2019-06-05T02:18:30.227965-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth0: timeout in state VALIDATING
2019-06-05T02:18:30.331468-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth6: timeout in state VALIDATING
2019-06-05T02:18:30.427701-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth0: timeout in state VALIDATING
2019-06-05T02:18:30.530706-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth6: timeout in state VALIDATING

Wondering the reason that makes eth0 lose its last timeout in state VALIDATING…?

BTW, I tried to commented out line#843 that calls ni_dhcp4_fsm_validate_lease() to workaround the problem, and it seems workable.
https://github.com/openSUSE/wicked/blob/master/src/dhcp4/fsm.c#L843
Do you know how to turn off the DHCP4_DO_ARP flag in the right config way? Any side-effect to disable it? Thank you.

Ken

[QUOTE=kenjlin;58049]malcolmlewis thanks for your suggestion!
I tried the command, but eth0 is still trapped in the setup-in-progress status.

DL380G10R3S10:~ # ip link set eth0 up
DL380G10R3S10:~ # ifstatus eth0
eth0            setup-in-progress
      link:     #2, state up, mtu 1500
      type:     ethernet, hwaddr 98:f2:b3:ef:07:0e
      config:   compat:suse:/etc/sysconfig/network/ifcfg-eth0
      leases:   ipv4 dhcp requesting
      leases:   ipv6 dhcp requesting

There’re 5 interfaces (eth0, eth4, eth5, eth6, eth7) requesting dhcp4 ip in my system.
In normal cases (eth4 to eth7), I can find 3 timeout in state VALIDATING for each of them,
while in the case of eth0, only 2 timeout in state VALIDATING can be seen: the last timeout doesn’t happen.
Below are lines in the system messages by grep “timeout in state”:

2019-06-05T02:18:24.426602-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth5: timeout in state INIT
2019-06-05T02:18:24.605244-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth4: timeout in state INIT
2019-06-05T02:18:25.018864-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth6: timeout in state INIT
2019-06-05T02:18:25.267331-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth7: timeout in state INIT
2019-06-05T02:18:26.296874-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth0: timeout in state INIT
2019-06-05T02:18:28.383760-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth5: timeout in state VALIDATING
2019-06-05T02:18:28.474400-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth4: timeout in state VALIDATING
2019-06-05T02:18:28.598274-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth5: timeout in state VALIDATING
2019-06-05T02:18:28.654751-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth4: timeout in state VALIDATING
2019-06-05T02:18:28.784714-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth5: timeout in state VALIDATING
2019-06-05T02:18:28.857638-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth4: timeout in state VALIDATING
2019-06-05T02:18:28.983572-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth7: timeout in state VALIDATING
2019-06-05T02:18:29.183763-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth7: timeout in state VALIDATING
2019-06-05T02:18:29.384072-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth7: timeout in state VALIDATING
2019-06-05T02:18:30.131474-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth6: timeout in state VALIDATING
2019-06-05T02:18:30.227965-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth0: timeout in state VALIDATING
2019-06-05T02:18:30.331468-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth6: timeout in state VALIDATING
2019-06-05T02:18:30.427701-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth0: timeout in state VALIDATING
2019-06-05T02:18:30.530706-05:00 DL380G10R3S15 wickedd-dhcp4[1387]: eth6: timeout in state VALIDATING

Wondering the reason that makes eth0 lose its last timeout in state VALIDATING…?

BTW, I tried to commented out line#843 that calls ni_dhcp4_fsm_validate_lease() to workaround the problem, and it seems workable.
https://github.com/openSUSE/wicked/blob/master/src/dhcp4/fsm.c#L843
Do you know how to turn off the DHCP4_DO_ARP flag in the right config way? Any side-effect to disable it? Thank you.

Ken[/QUOTE]
Hi Ken
From the man 5 ifcfg shows a /etc/sysconfig/network/ifcfg-* option of;

SEND_GRATUITOUS_ARP=auto/yes/no

Try no and see what happens.

Can you try bringing up just one interface, wait a few seconds and bring the next one up etc with the ip command?

[QUOTE=malcolmlewis;58055]Hi Ken
From the man 5 ifcfg shows a /etc/sysconfig/network/ifcfg-* option of;

SEND_GRATUITOUS_ARP=auto/yes/no

Try no and see what happens.

Can you try bringing up just one interface, wait a few seconds and bring the next one up etc with the ip command?[/QUOTE]

Hi malcolmlewis,

If I bring up just one interface, it looks normal.
The issue happens when all interfaces are set up at the same time, and I think it should be a wicked-dhcp4 bug that restricts to bring up interfaces simultaneously.
Tried SEND_GRATUITOUS_ARP=no, but it cannot disable arp validation. Thank you.

Ken

[QUOTE=kenjlin;58058]Hi malcolmlewis,

If I bring up just one interface, it looks normal.
The issue happens when all interfaces are set up at the same time, and I think it should be a wicked-dhcp4 bug that restricts to bring up interfaces simultaneously.
Tried SEND_GRATUITOUS_ARP=no, but it cannot disable arp validation. Thank you.

Ken[/QUOTE]
Hi Ken
Are you in a position to raise a Service Request (SR)? If so I think it might be time…

[QUOTE=malcolmlewis;58062]Hi Ken
Are you in a position to raise a Service Request (SR)? If so I think it might be time…[/QUOTE]

Hi malcolmlewis,

No, I have no permission to raise a SR.
I put my preliminary analysis in the openSUSE github:
https://github.com/openSUSE/wicked/issues/798

Ken