This morning I finally resolved a problem that I have been chasing, on and off, for several weeks now. Please bear with me as I describe the problem and what I determined.
I have four SLES 11 SP4 servers that host DB2 under z/VM 6.4 on a z13s. To minimize the application outage for the upgrade from SLES 11 to SLES 12 SP2, I used the following process for each of the four servers:
- Cloned the boot volume (a 3390 mod 9) for each server to a new dasd volume (a sketch of one way to do this follows the list).
- Used a test server to upgrade each boot volume, in place, to SLES 12 SP2. This included copying or merging information from the various .rpmnew files into the production files. I used a non-production IP address so that I could play with the SLES 12 system without affecting the production server.
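For reference, the clone step was roughly along these lines. The device bus IDs below (0.0.0100 for the source, 0.0.0200 for the target) are made-up examples, and the same result could be had with FLASHCOPY or DDR at the z/VM level instead of dd:

    # Hypothetical device addresses; both volumes are 3390 mod 9s formatted identically.
    # A raw copy of the whole source volume clones the boot disk onto the new volume.
    dd if=/dev/disk/by-path/ccw-0.0.0100 of=/dev/disk/by-path/ccw-0.0.0200 bs=4096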
I booted each SLES 12 volume several times during my testing. Note that the test VM guest I did the upgrades on had only the boot volume defined.
At implementation time I shut down the running server and backed up its SLES 11 boot volume and its SLES 12 boot volume to tape from a z/OS system. I then restored each backup copy to the other disk device (the disk that had been SLES 11 became SLES 12) so that I didn't have to modify the z/VM guest definition or the production backup/restore jobs.
I successfully upgraded three of the servers to SLES 12 with no issues. The fourth one would only boot into recovery mode.
The fourth server is an exact clone of one of the other servers (same number and types of 3390 physical volumes). There are over 500 dasd volumes that make up about 20 filesystems, each of which is an LV. This SLES 12 server would only boot into recovery mode, even when I modified /etc/fstab to remove all filesystems except the boot filesystem.
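For what it's worth, the trimmed /etc/fstab looked roughly like this; the device path, volume group, and mount point names are placeholders rather than the real ones:

    # /etc/fstab - everything except the root/boot filesystem commented out
    /dev/disk/by-path/ccw-0.0.0100-part1   /             ext3   defaults   1 1
    # /dev/vgdb2/lv_data01                 /db2/data01   ext3   defaults   1 2
    # /dev/vgdb2/lv_logs01                 /db2/logs01   ext3   defaults   1 2
    # ... and so on for the remaining LVs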
While in recovery mode, I was able to run a filesystem check against each filesystem (most are ext3 while several are reiser). I was then able to manually mount each filesystem. When I rebooted, the system again went into recovery mode.
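The manual checks and mounts from the recovery shell were along these lines (again, the LV and mount point names are examples, not the real ones):

    # forced check of one of the ext3 logical volumes
    fsck.ext3 -f /dev/vgdb2/lv_data01
    # the reiser filesystems were checked with reiserfsck
    reiserfsck --check /dev/vgdb2/lv_logs01
    # then each filesystem was mounted by hand
    mount /dev/vgdb2/lv_data01 /db2/data01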
I ran the suggested 'journalctl' command and parsed through the boot messages. At a point in the boot process, after all of the dasd volumes were detected, I saw that SLES 12 was doing some sort of PV interrogation of the 527 or so volumes. It issued a message for the 28th volume (the specific volume changed with each boot) saying that whatever it was doing had timed out. It then started issuing [ERROR] messages and booted into recovery mode.
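For anyone following the same trail, the journal can be narrowed down with something like this (the -b option limits output to a single boot):

    # all messages from the current boot
    journalctl -b
    # warnings and anything more severe from the current boot
    journalctl -b -p warning
    # the previous boot (needs a persistent journal)
    journalctl -b -1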
I spent hours and hours searching the forums and Google to no avail.
It dawned on me late yesterday that this is a 'clone' of a production database server. When I checked the VM directory I found that the production server has access to five IFL engines while this server was restricted to one IFL (I thought they were set up the same at the z/VM level).
This morning I modified the VM directory to give my troubled server access to five IFLs and it booted properly. Performance Toolkit showed that this server was using over 350% of the z/VM system (i.e. over 3.5 IFLs) just to boot.
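For reference, the change amounted to giving the guest more virtual CPUs in its directory entry. The entry below is a made-up example (guest name, password, and storage sizes are placeholders, and the rest of the entry is omitted), not our actual directory:

    USER LNXDB04 XXXXXXXX 8G 8G G
       MACHINE ESA 5
       CPU 00 BASE
       CPU 01
       CPU 02
       CPU 03
       CPU 04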
It appears that SLES 12 does far more work interrogating the PVs attached to the server than SLES 11 did. The SLES 11 copy of this server boots in less than 2 minutes. The SLES 12 copy takes at least 4 minutes for the prompt to appear on the console, and several more minutes before you can log in via PuTTY.
I believe that this problem is somehow related to systemd being used instead of SysVinit. It should not take that many more CPU cycles to boot the system under SLES 12 than it does under SLES 11.
Is there a parameter to make SLES 12 act like SLES 11 when interrogating the attached dasd devices?
Harley