Official mini-condor image not working in Rancher

Configuration:

  • Official image, unmodified: htcondor/mini (the documentation for this image is on Docker Hub)
  • Platform: Kubernetes on Rancher
  • Tag: 9.2-el7
  • YAML file:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: htcondor-mini--all-in-one
  namespace: grafana-exporter
spec:
  # (selector and pod labels omitted)
  template:
    spec:
      containers:
      - image: htcondor/mini:9.2-el7
        imagePullPolicy: Always
        name: htcondor-mini--all-in-one
        resources: {}
        securityContext:
          capabilities: {}
        stdin: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        tty: true
      dnsConfig: {}
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30

The error:

The container is running, but it fails to start the HTCondor collector.
PS: it works in a local environment and on minikube.
PS 2: condor_status and condor_master do not work.
Here are the logs when the container starts in Rancher:

2021-09-15 09:26:36,908 INFO supervisord started with pid 1
2021-09-15 09:26:37,911 INFO spawned: 'condor_master' with pid 20
2021-09-15 09:26:37,912 INFO spawned: 'condor_restd' with pid 21
2021-09-15 09:26:37,917 INFO exited: condor_restd (exit status 127; not expected)
2021-09-15 09:26:37,924 INFO exited: condor_master (exit status 4; not expected)
2021-09-15 09:26:38,926 INFO spawned: 'condor_master' with pid 22
2021-09-15 09:26:38,928 INFO spawned: 'condor_restd' with pid 23
2021-09-15 09:26:38,932 INFO exited: condor_restd (exit status 127; not expected)
2021-09-15 09:26:38,936 INFO exited: condor_master (exit status 4; not expected)
2021-09-15 09:26:40,939 INFO spawned: 'condor_master' with pid 24
2021-09-15 09:26:40,943 INFO spawned: 'condor_restd' with pid 25
2021-09-15 09:26:40,947 INFO exited: condor_restd (exit status 127; not expected)
2021-09-15 09:26:40,948 INFO exited: condor_master (exit status 4; not expected)
2021-09-15 09:26:43,953 INFO spawned: 'condor_master' with pid 26
2021-09-15 09:26:43,955 INFO spawned: 'condor_restd' with pid 27
2021-09-15 09:26:43,959 INFO exited: condor_restd (exit status 127; not expected)
2021-09-15 09:26:43,968 INFO gave up: condor_restd entered FATAL state, too many start retries too quickly
2021-09-15 09:26:43,969 INFO exited: condor_master (exit status 4; not expected)
2021-09-15 09:26:44,970 INFO gave up: condor_master entered FATAL state, too many start retries too quickly
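
The restart loop in this log can be condensed into one line per program and exit status, which makes it easier to compare runs on Rancher and minikube. The following is a minimal Python sketch, not part of the original setup: `summarize_exits` is a hypothetical helper name, and the regular expression assumes the supervisord line format shown above.

```python
import re
from collections import Counter

# Matches supervisord lines such as:
#   2021-09-15 09:26:37,917 INFO exited: condor_restd (exit status 127; not expected)
EXIT_RE = re.compile(r"INFO exited: (?P<name>\S+) \(exit status (?P<code>\d+);")


def summarize_exits(log_text: str) -> Counter:
    """Count (program, exit status) pairs found in supervisord output."""
    counts = Counter()
    for line in log_text.splitlines():
        match = EXIT_RE.search(line)
        if match:
            counts[(match["name"], int(match["code"]))] += 1
    return counts


if __name__ == "__main__":
    # A short excerpt of the log above, used only as sample input.
    sample = """\
2021-09-15 09:26:37,917 INFO exited: condor_restd (exit status 127; not expected)
2021-09-15 09:26:37,924 INFO exited: condor_master (exit status 4; not expected)
2021-09-15 09:26:38,932 INFO exited: condor_restd (exit status 127; not expected)
"""
    for (name, code), n in sorted(summarize_exits(sample).items()):
        print(f"{name}: exit status {code} x{n}")
```

In the log above, each program fails with the same status on every attempt (127 for condor_restd, 4 for condor_master). An exit status of 127 conventionally signals that the command could not be executed at all, which points toward an environment or permission problem rather than a crash inside the program.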

Cause of the issue

The issue is caused by the cluster's PSP (Pod Security Policy).
By default, privilege escalation is not permitted for the condor user.

The best solution I have found so far is the following:

  • In supervisord.conf: run supervisord as the condor user
  • In supervisord.conf: keep the log, PID and socket files in /tmp
  • In the Dockerfile: change the owner of most of the folders to condor
  • In deployment.yaml: set runAsUser directly to 64 (the UID of the condor user)

Dockerfile

FROM htcondor/mini:9.2-el7

# SET WORKDIR
WORKDIR /home/condor/
RUN chown condor:condor /home/condor

# COPY SUPERVISOR
COPY supervisord.conf /etc/supervisord.conf

# Run condor_master once at build time so that it creates all of its directories
RUN condor_master

# FIX PERMISSION ISSUES FOR RANCHER
RUN chown -R condor:condor /var/log/ /tmp &&\
 chown -R restd:restd /home/restd &&\
 chmod 755 -R /home/restd

supervisord.conf:

[supervisord]
user=condor
nodaemon=true
logfile = /tmp/supervisord.log
directory = /tmp
pidfile = /tmp/supervisord.pid
childlogdir = /tmp

# The next 3 sections are needed to manage daemons with supervisorctl
[unix_http_server]
file=/tmp/supervisord.sock
chown=condor:condor
chmod=0777
user=condor

[rpcinterface:supervisor]
supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface

[supervisorctl]
serverurl=unix:///tmp/supervisord.sock

[program:condor_master]
user=condor
command=/usr/sbin/condor_master -f
autostart=true
autorestart=true
# With redirect_stderr=true, stderr is merged into stdout_logfile
# and stderr_logfile is not used
redirect_stderr=true
stdout_logfile = /var/log/condor_master.log
stderr_logfile = /var/log/condor_master.error.log
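
supervisorctl can only reach supervisord when `serverurl` in [supervisorctl] points at the same socket that [unix_http_server] declares in `file`. A small check like the one below can catch a mismatch between the two before the image is built. This is only a sketch: `socket_paths` is a hypothetical helper, and it assumes the file can be read by Python's standard `configparser`, which holds for a simple config such as this one.

```python
import configparser


def socket_paths(conf_text: str) -> tuple:
    """Return (server socket path, client socket path) from a supervisord.conf."""
    # interpolation=None keeps any '%' characters in the file from being
    # interpreted by configparser.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    server = parser.get("unix_http_server", "file")
    url = parser.get("supervisorctl", "serverurl")
    prefix = "unix://"
    # Strip the URL scheme so both values are plain filesystem paths.
    client = url[len(prefix):] if url.startswith(prefix) else url
    return server, client


if __name__ == "__main__":
    # Sample input only; in practice, read /etc/supervisord.conf instead.
    sample = """\
[unix_http_server]
file=/tmp/supervisord.sock

[supervisorctl]
serverurl=unix:///tmp/supervisord.sock
"""
    server, client = socket_paths(sample)
    print("match" if server == client else f"mismatch: {server} != {client}")
```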

deployment.yaml

apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      containers:
      - image: <condor-image>
        imagePullPolicy: Always
        name: htcondor-exporter
        ports:
        - containerPort: 8080
          name: myport
          protocol: TCP
        resources: {}
        securityContext:
          capabilities: {}
          runAsNonRoot: false
          runAsUser: 64
        stdin: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        tty: true
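
Once the pod is running, the assumptions behind this fix can be verified from inside the container: the process should run as UID 64, and /tmp and /var/log should be writable by that user. The sketch below assumes a Python 3 interpreter is available in the image, which I have not confirmed; `runtime_problems` is a hypothetical helper name.

```python
import os


def runtime_problems(expected_uid: int, writable_dirs: list) -> list:
    """Return human-readable problems with the current UID and directory access."""
    problems = []
    uid = os.getuid()
    if uid != expected_uid:
        problems.append(f"running as UID {uid}, expected {expected_uid}")
    for path in writable_dirs:
        # os.access also returns False when the path does not exist.
        if not os.access(path, os.W_OK):
            problems.append(f"{path} is not writable by UID {uid}")
    return problems


if __name__ == "__main__":
    # 64 is the runAsUser value from deployment.yaml; the directories are
    # the ones chown'ed to condor in the Dockerfile.
    for problem in runtime_problems(64, ["/tmp", "/var/log"]):
        print(problem)
```

If the script prints nothing, both assumptions hold for the running container.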