We’ve got a couple dozen SLES servers that we configured to authenticate against our Active Directory domain using the “Windows Domain Membership” module in YaST. We primarily use it for SSH login, and we also have a line in our sudoers file that grants authorization to an AD group.
Lately, some servers have suddenly lost the ability to authenticate via AD. So far I haven't found a common link that explains why these particular servers were affected.
[LIST]
[*]The first affected server is SLES12 SP1, and it broke some weeks ago. I was busy at the time and can no longer remember what was done on the server around then.
[*]The second server broke during the first step of a SLES12 SP1->SP2->SP3 upgrade (somewhere between SP1 and SP2). It is now on SP3 and still broken.
[*]The last server (so far) broke while I was preparing to upgrade it from SP1->SP2->SP3. The root partition was nearly full, so I extended the disk in VMware, booted the VM from a SLES12 SP2 ISO, went into rescue mode, and extended the partition and filesystem using fdisk and resize2fs. After doing only that, without making any changes at the OS level, AD auth broke. I stopped there, and the server is still on SLES12 SP1.
[/LIST]
The rest of my servers (mostly SLES11, but some SLES12 SP1 and SP3) are working just fine, and everything else on the affected servers appears to work normally. In fact, applications on the affected servers can themselves successfully use AD authentication (via their own means).
I’ve turned on debugging in pam_winbind.conf and compared the logs. Both successful and unsuccessful attempts start the same way:
[CODE]
2018-05-23T08:37:19.621649-04:00 server01 sshd[20783]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=172.16.xx.xx user=domain\username
2018-05-23T08:37:19.622328-04:00 server01 sshd[20783]: pam_winbind(sshd:auth): [pamh: 0x55c99b087a10] ENTER: pam_sm_authenticate (flags: 0x0001)
2018-05-23T08:37:19.622694-04:00 server01 sshd[20783]: pam_winbind(sshd:auth): getting password (0x000000d1)
2018-05-23T08:37:19.623049-04:00 server01 sshd[20783]: pam_winbind(sshd:auth): pam_get_item returned a password
2018-05-23T08:37:19.623381-04:00 server01 sshd[20783]: pam_winbind(sshd:auth): Verify user 'domain\username'
2018-05-23T08:37:19.648066-04:00 server01 sshd[20783]: pam_winbind(sshd:auth): CONFIG file: require_membership_of 'LinuxServerUsers'
2018-05-23T08:37:19.648505-04:00 server01 sshd[20783]: pam_winbind(sshd:auth): enabling krb5 login flag
2018-05-23T08:37:19.648876-04:00 server01 sshd[20783]: pam_winbind(sshd:auth): no sid given, looking up: LinuxServerUsers
[/CODE]
Then they diverge. A successful login looks like:
[CODE]
2018-05-23T09:02:27.658635-04:00 server02 sshd[12842]: pam_winbind(sshd:auth): request wbcLogonUser succeeded
2018-05-23T09:02:27.658967-04:00 server02 sshd[12842]: pam_winbind(sshd:auth): user 'domain\username' granted access
2018-05-23T09:02:27.659244-04:00 server02 sshd[12842]: pam_winbind(sshd:auth): Returned user was 'DOMAIN\username'
2018-05-23T09:02:27.659586-04:00 server02 sshd[12842]: pam_winbind(sshd:auth): [pamh: 0x55895157bc20] LEAVE: pam_sm_authenticate returning 0 (PAM_SUCCESS)
2018-05-23T09:02:27.659912-04:00 server02 sshd[12842]: pam_winbind(sshd:account): [pamh: 0x55895157bc20] ENTER: pam_sm_acct_mgmt (flags: 0x0000)
2018-05-23T09:02:27.667852-04:00 server02 sshd[12842]: pam_winbind(sshd:account): user 'DOMAIN\username' granted access
2018-05-23T09:02:27.668264-04:00 server02 sshd[12842]: pam_winbind(sshd:account): [pamh: 0x55895157bc20] LEAVE: pam_sm_acct_mgmt returning 0 (PAM_SUCCESS)
[/CODE]
On one of the affected servers, however, it continues:
[CODE]
2018-05-23T08:37:19.726146-04:00 kmicontract01 sshd[20783]: pam_winbind(sshd:auth): request wbcLogonUser failed: WBC_ERR_AUTH_ERROR, PAM error: PAM_AUTH_ERR (7), NTSTATUS: NT_STATUS_LOGON_FAILURE, Error message was: Logon failure
2018-05-23T08:37:19.726621-04:00 kmicontract01 sshd[20783]: pam_winbind(sshd:auth): user 'domain\username' denied access (incorrect password or invalid membership)
2018-05-23T08:37:19.726993-04:00 kmicontract01 sshd[20783]: pam_winbind(sshd:auth): [pamh: 0x55c99b087a10] LEAVE: pam_sm_authenticate returning 7 (PAM_AUTH_ERR)
2018-05-23T08:37:21.738222-04:00 kmicontract01 sshd[20781]: error: PAM: Authentication failure for domain\\username from 172.16.xx.xx
[/CODE]
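With this many servers to compare, it may help to pull just the wbcLogonUser outcome (and the NTSTATUS code, when there is one) out of each log. Here is a minimal Python sketch; it assumes the pam_winbind debug line format shown in the excerpts above, and the names `LOGON_RE`, `LogonAttempt` and `parse_logon_attempts` are my own, not part of Samba or PAM.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional

# Assumed format: the "request wbcLogonUser ..." lines pam_winbind logs with
# debug = yes, exactly as they appear in the excerpts in this post.
LOGON_RE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<host>\S+)\s+sshd\[(?P<pid>\d+)\]: "
    r"pam_winbind\(sshd:auth\): request wbcLogonUser (?P<result>succeeded|failed)"
    r"(?:.*?NTSTATUS: (?P<ntstatus>\w+))?"  # only present on failures
)


@dataclass
class LogonAttempt:
    timestamp: str
    host: str
    pid: int
    succeeded: bool
    ntstatus: Optional[str]  # None when the line carries no NTSTATUS


def parse_logon_attempts(lines: Iterable[str]) -> Iterator[LogonAttempt]:
    """Yield one LogonAttempt per wbcLogonUser result line; skip all other lines."""
    for line in lines:
        m = LOGON_RE.match(line.strip())
        if m is None:
            continue
        yield LogonAttempt(
            timestamp=m["ts"],
            host=m["host"],
            pid=int(m["pid"]),
            succeeded=m["result"] == "succeeded",
            ntstatus=m["ntstatus"],
        )


if __name__ == "__main__":
    sample = [
        "2018-05-23T09:02:27.658635-04:00 server02 sshd[12842]: pam_winbind(sshd:auth): request wbcLogonUser succeeded",
        "2018-05-23T08:37:19.726146-04:00 kmicontract01 sshd[20783]: pam_winbind(sshd:auth): request wbcLogonUser failed: WBC_ERR_AUTH_ERROR, PAM error: PAM_AUTH_ERR (7), NTSTATUS: NT_STATUS_LOGON_FAILURE, Error message was: Logon failure",
        "2018-05-23T08:37:19.648505-04:00 server01 sshd[20783]: pam_winbind(sshd:auth): enabling krb5 login flag",
    ]
    for attempt in parse_logon_attempts(sample):
        print(attempt.host, attempt.succeeded, attempt.ntstatus)
```

Run against the sample, this prints `server02 True None` and `kmicontract01 False NT_STATUS_LOGON_FAILURE`; the unrelated third line is ignored. Feeding it each server's log makes it easy to see which hosts only ever report NT_STATUS_LOGON_FAILURE.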
I’m using the same user in both cases, so the group memberships in the domain are identical. I’m at a loss for where to go from here with troubleshooting. As far as I can tell, these attempts are not reaching a domain controller: my account has not been locked out in AD, which should happen after 5 failed login attempts. Is there a way I can tell which domain controller a server is trying to authenticate against?
Any other suggestions? This one really has me scratching my head. Here are a couple of config files for reference (identical on working and non-working servers):
/etc/security/pam_winbind.conf (commented sections removed)
[CODE]
[global]
cached_login = no
krb5_auth = yes
krb5_ccache_type =
require_membership_of = LinuxServerUsers
debug = yes
[/CODE]
/etc/samba/smb.conf
[CODE]
[global]
workgroup = DOMAIN
passdb backend = tdbsam
printing = cups
printcap name = cups
printcap cache time = 750
cups options = raw
map to guest = Bad User
include = /etc/samba/dhcp.conf
logon path = \\%L\profiles\.msprofile
logon home = \\%L\%U\.9xprofile
logon drive = P:
usershare allow guests = No
idmap gid = 10000-20000
idmap uid = 10000-20000
kerberos method = secrets and keytab
realm = DOMAIN.COM
security = ADS
template homedir = /home/%D/%U
template shell = /bin/bash
#winbind offline logon = yes
#winbind refresh tickets = yes
[homes]
comment = Home Directories
valid users = %S, %D%w%S
browseable = No
read only = No
inherit acls = Yes
[profiles]
comment = Network Profiles Service
path = %H
read only = No
store dos attributes = Yes
create mask = 0600
directory mask = 0700
[users]
comment = All users
path = /home
read only = No
inherit acls = Yes
veto files = /aquota.user/groups/shares/
[groups]
comment = All groups
path = /home/groups
read only = No
inherit acls = Yes
[printers]
comment = All Printers
path = /var/tmp
printable = Yes
create mask = 0600
browseable = No
[print$]
comment = Printer Drivers
path = /var/lib/samba/drivers
write list = @ntadmin root
force group = ntadmin
create mask = 0664
directory mask = 0775
[/CODE]