Dropped telnet connections

We have recently upgraded our server hardware (IBM X3100 M4) and OS (SLES 11 SP2). Since this upgrade, we have had issues with dropped telnet connections.

We have approximately 130 field offices, all of which have their own local server. Users use the Glink terminal emulator (which we have used for 15+ years) on a PC running Windows XP (which we have been running for several years) to establish a telnet connection to the local server. When the user has logged in, Glink passes a command to the server in the form “program_name; exit”, and the session is associated with a pts.
Previous to the upgrade (and even now, for the most part), this performs as expected; program_name executes and, upon successful termination of the program, the user is logged out.

What we are seeing now is that, at least once a day at every field office, a single workstation (not always the same one) loses its telnet connection. A ps -ef still shows program_name running (still associated with the workstation), but no longer associated with a pts. Currently, for lack of a better solution, we have a cron job that runs every other minute, checks for this orphaned session and, if found, kills it.
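For reference, here is a minimal sketch of the kind of check such a cron job could make. It is not the actual production script: the helper name find_orphans is hypothetical, and it assumes an orphaned session shows "?" in the TTY column of ps and that program_name fits within the command-name column (which ps truncates).

```shell
#!/bin/sh
# find_orphans is a hypothetical helper, not the production cron script.
# It reads "PID TTY COMMAND" lines on stdin and prints the PIDs of the
# named program whose TTY column is "?" (no controlling terminal).
find_orphans() {
    awk -v prog="$1" '$2 == "?" && $3 == prog { print $1 }'
}

# Dry run against the live process table: print candidates, kill nothing.
# Separate -o options are used because "-o pid=,tty=" is parsed differently
# by different ps implementations.
ps -e -o pid= -o tty= -o comm= 2>/dev/null | find_orphans program_name
```

Printing the PIDs first, rather than killing straight away, makes it easy to log which sessions were orphaned and when, which can then be lined up with the xinetd log.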

This Glink / Windows XP arrangement has worked well for us in the past under various hardware configurations using AIX 5.0, SLES 8, and SLES 9 SP2. Are there TCP parameters under SLES 11 that need to be tweaked and that might be causing this issue?
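One thing that is cheap to compare between an old and a new server is the kernel's TCP keepalive settings. The sketch below only prints them; it does not imply that they are the cause. The function name show_keepalive and its optional directory argument are assumptions made for illustration; the three file names are the standard Linux keepalive settings under /proc/sys/net/ipv4.

```shell
#!/bin/sh
# show_keepalive is a hypothetical helper. It prints the three Linux TCP
# keepalive settings. The optional argument overrides the directory to read
# from, which is only useful for testing.
show_keepalive() {
    dir="${1:-/proc/sys/net/ipv4}"
    for name in tcp_keepalive_time tcp_keepalive_intvl tcp_keepalive_probes; do
        if [ -r "$dir/$name" ]; then
            printf '%s = %s\n' "$name" "$(cat "$dir/$name")"
        else
            printf '%s = (not readable)\n' "$name"
        fi
    done
}

show_keepalive
```

Running this on a SLES 11 server and on one of the older servers, if any are still available, would show whether the values differ at all.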

On 12/18/2013 11:24 AM, jsatlar wrote:

> We have recently upgraded our server hardware (IBM X3100 M4) and OS
> (SLES 11 SP2). Since this upgrade, we have had issues with dropped
> telnet connections.

Does this happen if you use SSH? I know, you’re not using it now, but if
it works while telnet does not then that would be an interesting data
point, and maybe good for the business since PuTTY and other SSH clients
are free (no idea if ‘Glink’ is or not).

> We have approximately 130 field offices, all of which have their own
> local server. Users use the Glink terminal emulator (which we have used for
> 15+ years) on a PC running Windows XP (which we have been running for
> several years) to establish a telnet connection to the local server.
> When the user has logged in, Glink passes a command to the server in the
> form “program_name; exit”, and the session is associated with a pts.
> Previous to the upgrade (and even now, for the most part), this performs
> as expected; program_name executes and, upon successful termination of
> the program, the user is logged out.

When does this termination happen? I am assuming when the user decides to
exit the program, so at the end of the day, or the end of a task, or
something. Does the length of the connection seem to matter?

> What we are seeing now is that, at least once a day at every field
> office, a single workstation (not always the same one) loses its telnet
> connection. A ps -ef still shows program_name running (still associated
> with the workstation), but no longer associated with a pts. Currently,
> for lack of a better solution, we have a cron job that runs every other
> minute, checks for this orphaned session and, if found, kills it.

Please post the contents of the /etc/xinetd.d/telnet (or whatever is
setting up your telnet service) file here within Code tags (# symbol in
the web interface toolbar) so we can try to duplicate things. Also,
anything in /var/log/messages or /var/log/firewall from the time of the
disconnect may be useful.

> This Glink / Windows XP arrangement has worked well for us in the past
> under various hardware configurations using AIX 5.0, SLES 8, and SLES 9
> SP2. Are there TCP parameters under SLES 11 that need to be tweaked and
> that might be causing this issue?

If nothing else shows up, get a LAN trace from at least the server side,
if not both the server and client sides, to see what is really happening
on the wire. Currently we do not know which side is causing the
disconnect, so while it is likely SLES (since nothing else has changed) we
do not know why SLES would do that, but we may see that reason (RST packet
from a firewall or router that does not like long connections, for
example) in a trace.

Code:

sudo /usr/sbin/tcpdump -n -s 0 -i any -w /tmp/telnet.cap port 23
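
Once a capture covering a dropped session exists, one way to narrow it down is to read the file back and show only the packets with RST or FIN set, which should indicate which address started the teardown. This is a sketch only: the names TEARDOWN_FILTER and show_teardown are made up for illustration, and the path is the one used in the capture command above.

```shell
#!/bin/sh
# Capture filter matching any TCP packet with the RST or FIN flag set.
TEARDOWN_FILTER='tcp[tcpflags] & (tcp-rst|tcp-fin) != 0'

# show_teardown is a hypothetical wrapper: it reads a saved capture file
# (first argument) and prints only the connection-teardown packets.
show_teardown() {
    tcpdump -n -r "$1" "$TEARDOWN_FILTER"
}
```

After sourcing this, run `show_teardown /tmp/telnet.cap` and compare the timestamps and source addresses of the matching packets with the time the session was lost.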


Good luck.


contents of /etc/xinetd.d/telnet:

# default: off
# description: Telnet is the old login server which is INSECURE and should \
#       therefore not be used. Use secure shell (openssh).
#       If you need telnetd not to "keep-alives" (e.g. if it runs over a ISDN \
#       uplink), add "-n". See 'man telnetd' for more details.
service telnet
{
        disable         = no
        socket_type     = stream
        protocol        = tcp
        wait            = no
        user            = root
        server          = /usr/sbin/in.telnetd
}

There does not seem to be any specific time of day at which this occurs. We have seen instances shortly after the beginning of business; sometimes the first occurrence is mid-day or later. We have even had some offices go a day or two without this issue. As for when it happens: we haven’t spoken with all users, but some of those we have spoken with indicate that, from their perspective, the program ends by itself (the terminal window closes, but they did not actually exit the program), forcing them to log back in. Other times it appears to happen at the end of the business day, perhaps because users are simply shutting down their workstations without actually exiting the program (?).

We are unable to test the SSH suggestion. Users’ workstations are designed to come up with a custom start screen that has icons which launch Glink. The version of Glink we are using does not support SSH.

We haven’t noticed anything unusual in /var/log/messages; we know the time the orphan process is killed, and checking /var/log/messages around this timeframe does not show anything unusual.

On one of our servers where this has occurred once today, /var/log/xinetd.log contains the following two lines associated with the orphan process that was killed:

13/12/18@09:32:48: START: telnet pid=17800 from=10.100.108.29
13/12/18@09:33:56: EXIT: telnet status=1 pid=17800 duration=68(sec)

Another site, which has had four such occurrences today, has the following two lines associated with one of the workstations:

13/12/18@08:06:10: START: telnet pid=23406 from=10.100.109.56
13/12/18@10:52:40: EXIT: telnet status=1 pid=23406 duration=9990(sec)
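
To compare dropped sessions with normal ones across a whole day, the START and EXIT lines can be paired by pid. The following is a rough sketch that assumes the log format is exactly as in the lines above; the function name summarize_exits is made up for illustration.

```shell
#!/bin/sh
# summarize_exits is a hypothetical helper. It reads xinetd log lines on
# stdin, remembers the client address from each telnet START line, and for
# each telnet EXIT line prints: timestamp, client, status, duration.
summarize_exits() {
    awk '
        $2 == "START:" && $3 == "telnet" {
            split($4, p, "="); split($5, f, "=")
            from[p[2]] = f[2]              # pid -> client address
        }
        $2 == "EXIT:" && $3 == "telnet" {
            split($4, s, "="); split($5, p, "="); split($6, d, "=")
            sub(/\(sec\)/, "", d[2])       # "68(sec)" -> "68"
            ts = $1; sub(/:$/, "", ts)     # drop trailing colon
            src = (p[2] in from) ? from[p[2]] : "unknown"
            print ts, src, "status=" s[2], "duration=" d[2] "s"
        }
    '
}
```

Running `summarize_exits < /var/log/xinetd.log` gives one line per finished session, so the status and duration of the sessions that were killed by the cron job can be compared with those of sessions that ended normally.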

I’d be interested in seeing the LAN trace next, then. Also, can you
duplicate this if you telnet to the system and run anything else and just
let it run forever? Perhaps telnet in and run ‘top’ just to see the
system’s top processes for fun.


Good luck.
