Numerous Issues with new 2021.10 installation

Deployment: Local Node
EaaSI Version: v2021.10, HEAD detached at a2d2e9e
Browser: Firefox (though not relevant)

Description:
I’m following the new installation instructions for EaaSI v2021.10. The target server is running Ubuntu 18.04, the eaasi user has passwordless sudo privileges, and most of the Ansible installation appears to work.

Attached files include:

  1. eaasi_csuci_ansible_20220531.txt, the output of the Ansible script with the issue described below
  2. eaasi_csuci_permissions_20220531.txt, the output of ls -al showing the occasional issue with eaasi-ui being owned by root (which prevents the Ansible script from accessing it)
  3. eaasi_csuci_service_20220531.txt, the output of running the docker-compose up command directly, as it appears in eaas.service

The current issue occurs when the Ansible script restarts eaas.service and then tries to connect to http://eaasi.csuci.edu/emil/environment-repository/actions/prepare. Nothing appears to happen, and the task fails. The relevant snippet from the Ansible output is:

TASK [wait for eaas-server to start up] *************************************************
fatal: [eaas-gateway]: FAILED! => changed=false
  attempts: 1
  content: ''
  elapsed: 0
  msg: 'Status code was -1 and not [200]: Request failed: <urlopen error [Errno -2] Name or service not known>'
  redirected: false
  status: -1
  url: http://eaasi.csuci.edu/emil/environment-repository/actions/prepare
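The `[Errno -2] Name or service not known` text is the system resolver's error message (EAI_NONAME on Linux), so it suggests that the host the check ran on (here `eaas-gateway`, presumably via Ansible's `uri` module) could not resolve eaasi.csuci.edu at that moment, independently of SSL or nginx. A minimal sketch for checking name resolution on the server itself with Python's standard library; `resolve_host` is a hypothetical helper, not part of EaaSI:

```python
import socket
import sys


def resolve_host(hostname, port=80):
    """Return the sorted addresses `hostname` resolves to, or [] if lookup fails."""
    try:
        infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        # e.g. "[Errno -2] Name or service not known", as in the Ansible output
        print(f"{hostname}: lookup failed ({exc})")
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # Usage: python3 resolve_check.py eaasi.csuci.edu
    for name in sys.argv[1:]:
        print(name, resolve_host(name))
```

If this prints an empty list on the server while the name resolves fine from an office machine, the server's own DNS configuration (or an /etc/hosts entry) would be the next thing to look at.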

The server is being configured with an SSL certificate chain. I re-ran the install script with SSL removed from eaasi.yaml, and the issue still appeared. I also get a 405 error when opening the above URL in the browser.

Currently, my docker ps output looks like this:

CONTAINER ID   IMAGE                                                                 COMMAND                  CREATED             STATUS             PORTS                    NAMES
038d924afcc8   eaas/eaas-appserver:v2021.10-eaasi                                    "/init"                  About an hour ago   Up About an hour                            eaas
f3a3b5018e48   nginx:stable                                                          "/docker-entrypoint.…"   About an hour ago   Up About an hour   0.0.0.0:80->80/tcp       eaasi-nginx
a3c38de86c57   registry.gitlab.com/eaasi/eaasi-client-pub/eaasi-web-api:v2021.10     "docker-entrypoint.s…"   About an hour ago   Up About an hour   0.0.0.0:8081->8081/tcp   eaasi-web-api
e942ddcbfcbb   minio/minio:RELEASE.2021-11-03T03-36-36Z                              "/usr/bin/docker-ent…"   About an hour ago   Up About an hour   9000/tcp                 minio
7094e9a7e9f2   registry.gitlab.com/eaasi/eaasi-client-pub/eaasi-front-end:v2021.10   "/docker-entrypoint.…"   About an hour ago   Up About an hour   0.0.0.0:8080->80/tcp     eaasi-front-end
f0c334889ee1   registry.gitlab.com/eaasi/eaasi-client-pub/eaasi-database:v2021.10    "docker-entrypoint.s…"   About an hour ago   Up About an hour   0.0.0.0:5432->5432/tcp   eaasi-database
a4052b4ab51c   nginx                                                                 "/docker-entrypoint.…"   About an hour ago   Up About an hour   80/tcp                   nginx
a9498903a188   jboss/keycloak:15.0.2                                                 "/opt/jboss/tools/do…"   About an hour ago   Up About an hour   8080/tcp, 8443/tcp       keycloak

This is the latest issue. I previously also had eaas.service fail to restart during installation because the eaasi user could not connect to Docker. Adding the user to the docker group appeared to fix that.
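To confirm that the group change is in place for the eaasi user, a small check with the standard `grp` and `pwd` modules can help. This is a sketch with hypothetical helper names, not an EaaSI tool; note that a new group membership only applies to login sessions started after the change.

```python
import grp
import os
import pwd


def user_in_group(username, group_name):
    """Return True if `username` is in `group_name` as primary or supplementary group."""
    try:
        group = grp.getgrnam(group_name)
        user = pwd.getpwnam(username)
    except KeyError:
        return False  # the user or the group does not exist on this machine
    # Supplementary members are listed in gr_mem; the primary group is pw_gid.
    return username in group.gr_mem or user.pw_gid == group.gr_gid


def current_user_and_group():
    """Return (user name, primary group name) of the process running this script."""
    return pwd.getpwuid(os.getuid()).pw_name, grp.getgrgid(os.getgid()).gr_name


if __name__ == "__main__":
    print("eaasi in docker:", user_in_group("eaasi", "docker"))
```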

I also have a recurring issue where a fresh installation fails because the eaasi/eaasi-ui folder is created with root:root ownership, which apparently prevents the eaasi user from accessing it.

Are you able to reproduce the issue or did it happen once? What steps can you take to repeat the issue? What did you expect to occur and what was the actual outcome?

I expected the installation script to execute without an error.

Yes, I can reproduce this after clearing the installation from the server and re-running scripts/deploy.sh.

I remove EaaSI from the server with the following commands (maybe I’m missing something):
rm -rf /eaasi
rm /etc/systemd/system/eaas.service
docker container stop $(docker ps -aq)
docker container prune -f

Urgency: If possible, please give an indication of how urgently the issue needs to be addressed - is there a timeline or deadline (e.g. upcoming demo, researcher request, etc.) that EaaSI support staff should be aware of?

I was planning to start using EaaSI with a research cohort of students next week (starting June 6th). If I can’t fix this, I’ll need to reorganize the project to not use EaaSI, at least initially, so this is rather urgent.
eaas_csuci_service_log_20220531.txt (46.4 KB)
eaasi_csuci_ansible_20220531.txt (94.0 KB)
eaasi_csuci_permissions_20220531.txt (942 Bytes)

Hi @ekaltman, this is indeed a new Ansible error to us, so we’ll take a look ASAP (and at the permissions issues in the setup). For my part, everything you’re doing sounds right: running rm -rf on the configured install directory and stopping or removing eaas.service should be enough to re-attempt a fresh install. I don’t think the Docker steps are necessary, but I’m also not aware of them hurting anything.

Would you be able to share the eaasi.yaml you’re using, either in a DM with @oooleg and me or here with the superadmin credentials redacted?

Also, just to be 100% clear: is this deploying to locally owned infrastructure or to a cloud machine (and if the latter, which service)?

Sure thing, @ethan.gates. I’ve attached the eaasi.yaml configuration file with the superadmin credentials redacted.

I’m installing to a local VM in our data center directly and not to a cloud provider.

I’m going to look a bit further into the 405 error; I’m not sure why http://eaasi.csuci.edu/emil/environment-repository/actions/prepare is not responding at this step. I don’t think it’s a certificate issue on my end, but I’ll look into the Docker container logs if I can locate them.
eaasi-redacted.yaml (1.1 KB)

Another thing: although the Ansible script failed, I can access the EaaSI login screen at http://eaasi.csuci.edu but not at https://. The SSL certificates are installed in the correct location, and the certificate chain should be valid. Could this just be because the Ansible script failed before setting up SSL properly?

I’m also trying to see whether the setup works from this point so I can try things out. eaas-orgctl hit another error: it expects the Python 3 module requests to be installed. I usually keep my base Python installation completely clean and use virtualenv.
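Since eaas-orgctl appears to need `requests` in whichever `python3` runs it, `importlib.util.find_spec` gives a quick way to see what that interpreter is missing without importing anything. A minimal sketch; on Ubuntu 18.04, installing the distribution package `python3-requests` with apt is one way to satisfy the dependency without pip-installing into the base interpreter (an assumption, not checked against EaaSI’s own docs).

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level modules in `names` that this interpreter cannot import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Run with the same python3 that eaas-orgctl uses (assumed: the system python3).
    print(sys.executable, "missing:", missing_modules(["requests"]))
```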

UPDATE: It appears that while the certificate itself is valid, the chain I was using is not. I don’t know whether that would cause the error in the Ansible script. I’ll hopefully grab a valid full chain tomorrow and see if that changes anything, though it may not, since the issue was also present without the SSL certificate.
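An incomplete chain is easy to miss because some browsers can fill in missing intermediate certificates on their own, while OpenSSL-based clients such as Python or curl usually cannot. A sketch that performs a certificate-verifying TLS handshake with Python’s `ssl` module against the system’s default CA store and reports why it failed; `check_tls` is a hypothetical helper:

```python
import socket
import ssl


def check_tls(host, port=443, timeout=5.0):
    """Try a certificate-verifying TLS handshake and describe the outcome."""
    context = ssl.create_default_context()  # verifies the chain and the hostname
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                return f"ok: {tls.version()}, subject={tls.getpeercert().get('subject')}"
    except (ssl.SSLError, ssl.CertificateError) as exc:
        # e.g. CERTIFICATE_VERIFY_FAILED when an intermediate certificate is missing
        return f"tls failed: {exc}"
    except OSError as exc:
        # connection refused, timed out, or name not resolvable
        return f"connection failed: {exc}"
```

For example, `print(check_tls("eaasi.csuci.edu"))` run from a machine that can reach port 443.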

Hi @ekaltman!

The installer log looks fine to me, but your issues might be related to your certificate or to name resolution. I can’t ping eaasi.csuci.edu from here. Is the server still running?

Could you also provide the backend’s log, located at /eaasi/log/server/server.log? It should contain useful information for debugging.
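When skimming a long server.log, it can help to pull out only the lines that look like problems. A minimal sketch, assuming that severity words such as ERROR or Java exception names appear verbatim on the relevant lines (the backend’s exact log format is not shown in this thread, so the pattern is a guess to adjust):

```python
import re
import sys
from pathlib import Path

# Assumed markers of trouble; adjust to the real server.log format.
PATTERN = re.compile(r"\b(ERROR|SEVERE|FATAL)\b|Exception")


def interesting_lines(path, limit=50):
    """Return up to `limit` (line number, text) pairs from `path` matching PATTERN."""
    hits = []
    with Path(path).open(encoding="utf-8", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            if PATTERN.search(line):
                hits.append((number, line.rstrip("\n")))
                if len(hits) >= limit:
                    break
    return hits


if __name__ == "__main__":
    log = Path(sys.argv[1] if len(sys.argv) > 1 else "/eaasi/log/server/server.log")
    if log.exists():
        for number, text in interesting_lines(log):
            print(f"{number}: {text}")
    else:
        print(f"{log} not found")
```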

The wrong permissions on eaasi-ui do indeed look strange. The installer log shows that the directory already existed when the installer ran, so could it have been created by root outside the installer? Do the permissions still mismatch if you remove the eaasi-ui directory and then rerun the installer?
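To see exactly which entries under eaasi-ui are not owned by the eaasi user after an installer run, a short walk with `os.walk` and `os.lstat` is enough. A sketch; the path `/eaasi/eaasi-ui` and the user `eaasi` are taken from this thread, and both functions are hypothetical helpers:

```python
import os
import pwd


def owner_of(path):
    """Return the user name owning `path` itself (symlinks are not followed)."""
    uid = os.lstat(path).st_uid
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)  # uid without a passwd entry


def wrong_owner(root, expected_user):
    """List `root` and every entry below it that is not owned by `expected_user`."""
    bad = [] if owner_of(root) == expected_user else [root]
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            if owner_of(full) != expected_user:
                bad.append(full)
    return bad


if __name__ == "__main__":
    target = "/eaasi/eaasi-ui"
    if os.path.exists(target):
        print(wrong_owner(target, "eaasi"))
```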

Please also note that you should always use systemctl to manage the EaaS containers:

$ sudo systemctl stop eaas      # to stop eaas
$ sudo systemctl start eaas     # to start eaas
$ sudo systemctl restart eaas   # to restart eaas

Hence, to remove EaaSI from your server, you should execute the following in that order:

$ sudo systemctl stop eaas
$ sudo rm -rf /eaasi
$ sudo rm /etc/systemd/system/eaas.service
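If the reinstall loop is repeated often, scripting these three steps keeps them in the required order and stops at the first failure. A sketch using `subprocess.run(..., check=True)`; `remove_eaasi` is a hypothetical helper, and it only prints the commands unless called with `dry_run=False`:

```python
import subprocess

# The removal steps from the post above, in the order they must run.
REMOVAL_STEPS = [
    ["sudo", "systemctl", "stop", "eaas"],
    ["sudo", "rm", "-rf", "/eaasi"],
    ["sudo", "rm", "/etc/systemd/system/eaas.service"],
]


def remove_eaasi(dry_run=True):
    """Print each step and, unless dry_run, execute it; abort on the first failure."""
    for step in REMOVAL_STEPS:
        print("+", " ".join(step))
        if not dry_run:
            # check=True raises CalledProcessError, so later steps never run after a failure
            subprocess.run(step, check=True)
    return REMOVAL_STEPS


if __name__ == "__main__":
    remove_eaasi()  # dry run; call remove_eaasi(dry_run=False) to actually remove
```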

Thanks for the info! I’ll follow that re-installation procedure; I was stopping the service inconsistently during installation. My initial issue was the service stalling and then failing on docker-compose up because of Docker user permissions.

I’ll follow that removal procedure to see whether the running service was interfering, though that seems unlikely.

eaasi.csuci.edu is not publicly accessible right now; I wanted to finish the installation before opening up access to it.

I’ll send server.log once I get in tomorrow morning (9:00 PT), since I inconveniently don’t have VPN access to the machine from home at the moment. I think the certificate is probably part of the problem, so I’ll need to resolve that as well.

As for the eaasi-ui permissions, I’ll watch for it to crop up again, but I think it happened on a fresh installation (with the /eaasi folder deleted). If it happens again, I’ll grab the Ansible log.

It is important to note that we use docker-compose to manage the containers internally (the systemd unit eaas.service is essentially a wrapper around it). Calls like docker container stop therefore bypass docker-compose’s state management and may trigger unexpected behaviour. To avoid that, always use systemctl, or the command line exactly as it appears in the eaas.service unit, e.g.:

$ cd /eaasi
$ sudo docker-compose --project-name eaas --file ./docker-compose.yaml --file ./eaasi-ui/docker-compose.yaml down

Anything else might break your installation.
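To avoid typos when running the compose command by hand, the argument list can be built in one place. A sketch that assumes eaas.service uses exactly the project name and the two compose files shown in the command above; check the unit’s actual ExecStart line before relying on it. With several `--file` options, docker-compose merges the files in the given order, so keeping the unit’s order matters.

```python
import subprocess

# Assumed to match the eaas.service unit; verify against its ExecStart line.
PROJECT = "eaas"
COMPOSE_FILES = ("./docker-compose.yaml", "./eaasi-ui/docker-compose.yaml")


def compose_command(action):
    """Build the docker-compose argv for `action` (e.g. "down" or "ps")."""
    argv = ["sudo", "docker-compose", "--project-name", PROJECT]
    for path in COMPOSE_FILES:
        argv += ["--file", path]
    return argv + [action]


def run_compose(action, install_dir="/eaasi", dry_run=True):
    """Print the command and, unless dry_run, run it from `install_dir`."""
    argv = compose_command(action)
    print(f"(cd {install_dir} && {' '.join(argv)})")
    if not dry_run:
        subprocess.run(argv, cwd=install_dir, check=True)
    return argv
```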

Server log is attached.
server.log (79.9 KB)

The backend booted and initialized without any issues. What does the following command return:

$ curl -v -X POST 'http://eaasi.csuci.edu/emil/environment-repository/actions/prepare'
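The same POST can be reproduced with Python’s standard `urllib`, which, judging by the `<urlopen error ...>` text, is also the layer the failing Ansible check goes through. A browser’s address bar sends GET rather than POST, which would plausibly explain the 405 reported earlier. A minimal sketch with hypothetical helper names:

```python
import sys
import urllib.error
import urllib.request

PREPARE_PATH = "/emil/environment-repository/actions/prepare"


def build_prepare_request(base_url):
    """Build a body-less POST to the prepare endpoint, like `curl -X POST`."""
    return urllib.request.Request(base_url.rstrip("/") + PREPARE_PATH, method="POST")


def probe(base_url, timeout=10):
    """Return (status, body or reason); status is -1 when no HTTP response arrived."""
    try:
        with urllib.request.urlopen(build_prepare_request(base_url), timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as exc:  # server answered 4xx/5xx, e.g. 405
        return exc.code, str(exc.reason)
    except urllib.error.URLError as exc:  # DNS failure, refused connection, TLS error
        return -1, str(exc.reason)
    except OSError as exc:  # e.g. a timeout while reading the response
        return -1, str(exc)


if __name__ == "__main__":
    # Usage: python3 probe.py http://eaasi.csuci.edu https://eaasi.csuci.edu
    for base in sys.argv[1:]:
        print(base, probe(base))
```

Note that when following a 301, 302 or 303 redirect, urllib re-sends a POST as a GET, so an http:// probe that gets redirected to https:// may not behave like the original POST.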

From my local office machine, inside the network:
* Trying 172.30.240.23...
* TCP_NODELAY set
* Connected to eaasi.csuci.edu (172.30.240.23) port 80 (#0)
> POST /emil/environment-repository/actions/prepare HTTP/1.1
> Host: eaasi.csuci.edu
> User-Agent: curl/7.64.1
> Accept: */*
>
< HTTP/1.1 200 OK
< Server: nginx/1.22.0
< Date: Thu, 02 Jun 2022 19:30:16 GMT
< Content-Type: application/json
< Content-Length: 69
< Connection: keep-alive
< Access-Control-Allow-Origin: *
< Access-Control-Allow-Headers: content-type, authorization
< Access-Control-Allow-Methods: POST, GET, DELETE, OPTIONS
<
* Connection #0 to host eaasi.csuci.edu left intact
{"status":"0","message":"Preparing environment-repository finished!"}* Closing connection 0

However, the installation I ran this morning (which is still running) had the same error as before.

I can connect to the login page at http:// but not https://. Is there a log location I can check for a certificate issue?

This looks good. What does the same command return when called with https instead of http?

$ curl -v -X POST 'https://eaasi.csuci.edu/emil/environment-repository/actions/prepare'
* Trying 172.30.240.23...
* TCP_NODELAY set
* Connection failed
* connect to 172.30.240.23 port 443 failed: Connection refused
* Failed to connect to eaasi.csuci.edu port 443: Connection refused
* Closing connection 0
curl: (7) Failed to connect to eaasi.csuci.edu port 443: Connection refused

I think something may be going on with the firewall. Port 443 is not open right now, though I thought it was open externally. I’ll have to contact IT, since I can’t change that on my end.

It doesn’t appear that EaaSI is listening on port 443. According to IT, the port should be reachable.

I ran nmap on the EaaSI server:

Starting Nmap 7.60 ( https://nmap.org ) at 2022-06-02 21:10 UTC
Nmap scan report for localhost (127.0.0.1)
Host is up (0.0000040s latency).
Not shown: 994 closed ports
PORT     STATE SERVICE
22/tcp   open  ssh
25/tcp   open  smtp
80/tcp   open  http
5432/tcp open  postgresql
8080/tcp open  http-proxy
8081/tcp open  blackice-icecap

Nmap done: 1 IP address (1 host up) scanned in 1.63 seconds

State    Recv-Q   Send-Q   Local Address:Port    Peer Address:Port
LISTEN   0        128      127.0.0.53%lo:53      0.0.0.0:*           users:(("systemd-resolve",pid=957,fd=13))
LISTEN   0        128      0.0.0.0:22            0.0.0.0:*           users:(("sshd",pid=1597,fd=3))
LISTEN   0        100      0.0.0.0:25            0.0.0.0:*           users:(("master",pid=1896,fd=13))
LISTEN   0        128      *:80                  *:*                 users:(("docker-proxy",pid=67145,fd=4))
LISTEN   0        128      *:8080                *:*                 users:(("docker-proxy",pid=66704,fd=4))
LISTEN   0        128      *:8081                *:*                 users:(("docker-proxy",pid=66856,fd=4))
LISTEN   0        128      [::]:22               [::]:*              users:(("sshd",pid=1597,fd=4))
LISTEN   0        128      *:5432                *:*                 users:(("docker-proxy",pid=66302,fd=4))
LISTEN   0        100      [::]:25               [::]:*              users:(("master",pid=1896,fd=14))
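Both nmap and ss show nothing bound to port 443 on the server itself, which points to a missing listener rather than only a firewall rule. One way to tell the two apart is to run the same TCP connect test on the server and from another machine: refused on the server most likely means nothing is listening, while success locally but failure remotely points at the network path. A sketch using `socket.create_connection`; the helper name is hypothetical:

```python
import socket
import sys


def port_accepts_connections(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, or name not resolvable
        return False


if __name__ == "__main__":
    # Usage: python3 port_check.py eaasi.csuci.edu  (run on the server and elsewhere)
    host = sys.argv[1] if len(sys.argv) > 1 else "localhost"
    for port in (80, 443):
        print(f"{host}:{port}", "open" if port_accepts_connections(host, port) else "closed")
```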

I can log in to the system with eaas-orgctl, but I’m not sure whether the lack of HTTPS will cause problems for the node. If not, I guess I can proceed and try to stand up some test instances.

Have you tried to re-run the installer with ssl enabled again?

Yes, SSL has been enabled throughout all the current installation attempts. Current eaasi.yaml is attached.
eaasi.yaml (1.1 KB)

The Ansible script appears to be skipping it:

skipping: [eaas-gateway] => (item={'src': '', 'dst': '/eaasi//certificates/server.crt'})  => changed=false
  ansible_loop_var: item
  item:
    dst: /eaasi//certificates/server.crt
    src: ''
  skip_reason: Conditional result was False
skipping: [eaas-gateway] => (item={'src': '', 'dst': '/eaasi//certificates/server.key'})  => changed=false
  ansible_loop_var: item
  item:
    dst: /eaasi//certificates/server.key
    src: ''
  skip_reason: Conditional result was False

I realized that the SSL settings were incorrectly nested in the eaasi.yaml I sent above, and I now have HTTPS access to the server. However, the initial failure to call http://eaasi.csuci.edu/emil/environment-repository/actions/prepare is still occurring. Should I be concerned about that?

Yes, nesting is significant in YAML, so the installer just used the defaults (SSL disabled). Currently, the backend is configured with SSL either enabled or disabled and listens on either port 80 or port 443 accordingly. I’ll check whether we properly follow redirects from http to https in that failing prepare call.
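The effect of the mis-nesting can be illustrated without a YAML parser: a key indented at the wrong level ends up outside its parent, and a lookup of the nested key silently falls back to the default. The key names below (`ssl`, `enabled`, `certificate`) and the `lookup` helper are hypothetical, not the real eaasi.yaml schema; the empty default certificate path mirrors the `src: ''` in the skipped Ansible tasks.

```python
# Hypothetical schema for illustration; the real eaasi.yaml keys may differ.
DEFAULTS = {"ssl": {"enabled": False, "certificate": "", "private_key": ""}}

# ssl:
#   enabled: true            <- nested correctly under ssl
#   certificate: /path/to/fullchain.pem
GOOD = {"ssl": {"enabled": True, "certificate": "/path/to/fullchain.pem"}}

# ssl:
# enabled: true              <- not indented, so ssl is null and this is top-level
# certificate: /path/to/fullchain.pem
BAD = {"ssl": None, "enabled": True, "certificate": "/path/to/fullchain.pem"}


def lookup(config, dotted_key, defaults=DEFAULTS):
    """Resolve e.g. "ssl.enabled" in `config`, falling back to the same path in `defaults`."""
    parts = dotted_key.split(".")
    default = defaults
    for part in parts:
        default = default[part]
    node = config
    for part in parts:
        if not isinstance(node, dict) or part not in node:
            return default  # missing at this nesting level: the default silently wins
        node = node[part]
    return node


if __name__ == "__main__":
    for name, cfg in (("good", GOOD), ("bad", BAD)):
        print(name, lookup(cfg, "ssl.enabled"), repr(lookup(cfg, "ssl.certificate")))
```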

In the meantime, could you check manually again, as you did above?