My environment, where the problem is occurring, consists of nine zLinux
servers running on a mainframe under a single copy of z/VM. The
database server is running SLES 10 SP4. The other eight servers are
running SLES 11 SP1. There are four servers running WebSphere
Application Server (standalone) and four running IBM HTTP Server. Each
HTTP server is paired with one WAS server (e.g., Prod HTTP 1 with Prod
WAS 1). Prod WAS 1 hosts a filesystem that is shared with the other
non-database servers via NFS; all of them have full read/write access
to it, and both the HTTP and WAS servers serve data from this
filesystem to the end users.
I’ve been experiencing a problem that appears to be triggered by
transferring a large file (3.1 GB) via scp. A SQL query is run on the
database server to generate a text report, which is saved in a
specified directory on the database server. Once an hour, a script on
the database server copies all files in that directory to a specific
directory on Prod WAS 1.
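For what it’s worth, the hourly job is essentially the following (the directory names, scp target, and loop details here are my paraphrase for illustration, not the actual script):

```shell
#!/bin/sh
# Paraphrase of the hourly transfer job; the directory names and the
# scp target below are placeholders, not the real paths.
transfer_reports() {
    src_dir=$1    # directory on the database server where reports land
    dest=$2       # scp target, e.g. prodwas1:/shared/reports/
    for f in "$src_dir"/*; do
        [ -f "$f" ] || continue          # glob matched nothing; skip
        scp -pq "$f" "$dest" || return 1 # copy each report, stop on error
    done
}
# e.g. transfer_reports /reports/outbound prodwas1:/shared/reports/
```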
When the problem occurs, three of the WAS servers (Prod WAS 2 - 4)
stop processing for several minutes, and Prod WAS 1 processes maybe
10% of the requests that it normally does. Yesterday, while the
problem was occurring, one of the HTTP servers crashed (the IBM HTTP
Server process, not zLinux itself).
It appears that scp is locking the filesystem for the duration of the
copy. Is this true? If so, any idea why it would lock the entire
filesystem rather than just the file being written?
If the filesystem really is being locked, I will modify the process to
have scp copy the file to a different filesystem and then move it to
where it needs to reside. I’d rather not jump through this hoop if I
don’t have to.
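If I do go that route, the change would look roughly like this (the staging and share paths are made up for illustration; only the final mv would touch the NFS-shared filesystem):

```shell
#!/bin/sh
# Sketch of the proposed workaround: scp lands the file on a staging
# filesystem that the app servers never read, and only the final mv
# touches the NFS-shared filesystem. All paths are placeholders.
publish_report() {
    src=$1           # incoming file (local path, or host:path for scp)
    staging_dir=$2   # directory on a separate, non-shared filesystem
    share_dir=$3     # NFS-shared directory that the servers read from
    name=$(basename "$src")
    scp -pq "$src" "$staging_dir/$name" &&   # long write hits staging only
    mv -- "$staging_dir/$name" "$share_dir/$name"
}
# e.g. publish_report dbsrv:/reports/out/r.txt /staging /shared/reports
```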
I have spent over an hour Googling for an answer to this, so I’d
appreciate any insight that you can provide.