This issue first occurred when I tried following the EaaSI user guide to install a FreeDOS 1.2 resource. However, it also happens with both Windows environments I’ve tested. I think the issue may be related to me manually importing qemu before saving the environments to my local node.
The environments that do not boot are:
Windows 95 Test Env
Windows XP Professional 2002 SP3 32 Bit + Microsoft Office Home and Student 6994
Are you able to reproduce the issue or did it happen once? What steps can you take to repeat the issue? What did you expect to occur and what was the actual outcome?
This happens whenever I try to boot either environment. I know the Windows 95 Test Env was previously working on my earlier installation.
Urgency: If possible, please give an indication of how urgently the issue needs to be addressed - is there a timeline or deadline (e.g. upcoming demo, researcher request, etc.) that EaaSI support staff should be aware of?
I’m working on a summer research project with EaaSI and trying to get it functioning for student evaluative use. I can try a fresh installation again if needed.
Hey @ekaltman - the process of saving environments to your node should copy over any necessary QEMU container automatically, if that Public environment is configured to run a particular emulator/version that you don’t yet have in your node. And it seems like QEMU is running here (we’ve seen issues with the emulators not copying over properly in the past and in such cases when trying to run the copied environment you get nothing in the Access interface at all and probably an error thrown by the UI) - it just can’t find the bootable device (system drive), so something may have gone wrong in copying over the underlying QCOW images of the environment.
We could confirm that by taking QEMU/the emulators off the table - does a SheepShaver or Basilisk-based Mac environment work? E.g. “Apple Mac OS 7.5”?
Either way - could you also try saving a different PC/QEMU-based environment to your node, then immediately download a server log from Manage Node → Troubleshooting → Download Server Log and provide it here?
I downloaded “Mac OS 7.5 + Ready Set GO 4.5a” and it failed to load.
When I try to download the server.log file through the Web UI, nothing happens: it goes to the error reporting URL and hangs. It does correctly return the front-end logs, however.
I’m attaching the current server.log file as well. I initiated a save operation for “Windows 98 SE + Microsoft Works 8” around two hours ago that has still not finished. I can send another update early tomorrow once I’m back in the office.
@ekaltman, according to the provided log, all of your attempts to import and replicate environments were successful. Your specific “Windows 98” image is about 20GB in size, so depending on your internet connection it might take longer to import. Please also make sure that you have enough storage space available.
But starting emulation sessions with imported images does indeed fail. The following output might contain more information:
$ sudo docker logs nginx > nginx.log
Generating error-reports fails because of file permission problems, which should not happen. Could you please post the output of the following command here too:
@oooleg it appears that some of the tree is not owned by the eaasi user. I’m going to try changing the ownership to eaasi to see if that fixes anything. Let me know if the logs show any other potential issues.
@ekaltman, what is the output of the following command executed on your server:
$ sudo id eaasi ekaltman
It looks like some of the data was created under different users. We normally do not change filesystem ownership during installation. But it might also be caused by a mismatch of host vs. container users, under which the backend is run inside of the containers.
OK, thanks. This looks like a user-id mismatch. Backend is always running under user 1000 in the container, which maps to ekaltman user on the host. Then, processes inside of the container running under user 1000 fail to read some files owned by 1002 (eaasi) user on the host. That is the reason why generating error-reports fails.
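To see how far the mismatch extends, one could list everything in the data directory that is not owned by the uid the backend runs as. This is only a rough sketch: the function name `find_foreign_owned` and the path `/eaas-home` are placeholders and not taken from the installer, and 1000 is the container uid mentioned above.

```python
import os

def find_foreign_owned(root, expected_uid):
    """Return paths below root whose owner uid differs from expected_uid."""
    mismatched = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # lstat reports a symlink's own owner instead of its target's
            if os.lstat(path).st_uid != expected_uid:
                mismatched.append(path)
    return sorted(mismatched)

if __name__ == "__main__":
    # "/eaas-home" is a hypothetical location; substitute the real data directory
    for path in find_foreign_owned("/eaas-home", 1000):
        print(path)
```

Note that the root directory itself is not checked, only what lies below it. An empty result would suggest the ownership is consistent with the container user.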
If possible, could you try to install EaaSI as ekaltman user? It should be enough to run:
Currently, replication of environments is automatically aborted after a timeout of 1 hour. According to the log, importing “Windows 98” images seems to take longer on your server and you are running into that timeout there.
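As a back-of-the-envelope check of why a 20GB image can hit the 1 hour limit, the required sustained transfer rate can be worked out as below. This is a sketch under simple assumptions: decimal gigabytes, no protocol overhead, and a 25 Mbit/s link that is purely hypothetical. The function names are made up for illustration.

```python
def min_bandwidth_mbit(size_gb, timeout_s):
    """Sustained rate in Mbit/s needed to move size_gb (decimal GB) within timeout_s."""
    bits = size_gb * 1e9 * 8
    return bits / timeout_s / 1e6

def transfer_seconds(size_gb, rate_mbit):
    """Seconds needed to move size_gb (decimal GB) at rate_mbit Mbit/s."""
    return size_gb * 1e9 * 8 / (rate_mbit * 1e6)

# Rate needed to finish 20 GB within the 1 hour timeout
print(round(min_bandwidth_mbit(20, 3600), 1))     # 44.4
# Duration in hours at a hypothetical 25 Mbit/s connection
print(round(transfer_seconds(20, 25) / 3600, 2))  # 1.78
```

So anything much below roughly 44 Mbit/s sustained would not finish a 20GB import inside one hour.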
I have increased the timeout for you to try. Just change your eaasi.yaml to:
Any environment I attempt to download claims to have finished and shows “Saved Locally”, but still has “Run in Emulator” faded out. When I go into details I can try to run the emulation, but for QEMU it just boots to a “no bootable device” screen. Mac environments immediately fail before boot, which seems similar to the CMU issue.
There is definitely some issue with downloading the environments correctly; I don’t know whether that is on my end or the EaaSI system’s end, however. @ethan.gates I will probably try just installing and configuring environments locally, and I’ll look out for office hours.
At this point, I don’t really know what else to try. If we want to schedule some live debugging, let me know.
I ran a tail on the server.log while trying to initialize the Windows 98 run-time. It appears not to be finding the correct data. eaasi_environment_download_error_20220616.log (40.6 KB)
The Windows 98 image is downloaded to the server, as are the other images, so I’m not sure what the issue is. I’m doing a fresh reinstallation on a fresh VM to see if there are other issues.
Hi @oooleg, yes, the environments were downloading but they were loading into a “no bootable device” QEMU screen. I tried to wipe the server and reinstall everything again but am now encountering a new issue. The setup fails again at:
TASK [wait for eaas-server to start up] ***************************************************************************************************************************************************************************************************
fatal: [eaas-gateway]: FAILED! => changed=false
attempts: 1
content: ''
elapsed: 0
msg: 'Status code was -1 and not [200]: Request failed: <urlopen error [Errno -2] Name or service not known>'
redirected: false
status: -1
url: https://eaasi.csuci.edu/emil/environment-repository/actions/prepare
However, last time the EaaSI system was still reachable. It appears that now the nginx docker container (not eaasi-nginx) is failing to start. From docker ps:
ekaltman@eaasi:/$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
321719e6fd07 nginx:stable "/docker-entrypoint.…" 5 minutes ago Restarting (1) 26 seconds ago eaasi-nginx
da9ff1d72056 eaas/eaas-appserver:v2021.10-eaasi "/init" 5 minutes ago Up 5 minutes eaas
0fab64605bac registry.gitlab.com/eaasi/eaasi-client-pub/eaasi-web-api:v2021.10 "docker-entrypoint.s…" 5 minutes ago Up 5 minutes 0.0.0.0:8081->8081/tcp, :::8081->8081/tcp eaasi-web-api
5dfa2a6059b7 jboss/keycloak:15.0.2 "/opt/jboss/tools/do…" 5 minutes ago Up 5 minutes 8080/tcp, 8443/tcp keycloak
1c29c3068b13 registry.gitlab.com/eaasi/eaasi-client-pub/eaasi-front-end:v2021.10 "/docker-entrypoint.…" 5 minutes ago Up 5 minutes 0.0.0.0:8080->80/tcp, :::8080->80/tcp eaasi-front-end
0fe466dd0a0c registry.gitlab.com/eaasi/eaasi-client-pub/eaasi-database:v2021.10 "docker-entrypoint.s…" 5 minutes ago Up 5 minutes 0.0.0.0:5432->5432/tcp, :::5432->5432/tcp eaasi-database
638e6d1c543c minio/minio:RELEASE.2021-11-03T03-36-36Z "/usr/bin/docker-ent…" 5 minutes ago Up 5 minutes 9000/tcp minio
I can start a new ticket for this error if you like. docker logs nginx produces:
2022/06/17 19:03:09 [emerg] 1#1: unknown directive "js_include" in /etc/nginx/nginx.conf:20
nginx: [emerg] unknown directive "js_include" in /etc/nginx/nginx.conf:20
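For spotting containers stuck in a restart loop without reading the whole table, a small filter over `docker ps` output may help. This is a sketch only: it assumes the output was produced with `docker ps --format '{{.Names}}\t{{.Status}}'`, so each line is a name and a status separated by a tab, and the helper name `restarting_containers` is made up for illustration.

```python
def restarting_containers(ps_output):
    """Return names of containers whose status starts with 'Restarting'.

    Expects one 'name<TAB>status' pair per line.
    """
    names = []
    for line in ps_output.splitlines():
        if "\t" not in line:
            continue  # skip blank or malformed lines
        name, status = line.split("\t", 1)
        if status.strip().startswith("Restarting"):
            names.append(name.strip())
    return names

# Sample built from two rows of the listing above
sample = "eaasi-nginx\tRestarting (1) 26 seconds ago\neaas\tUp 5 minutes\n"
print(restarting_containers(sample))  # ['eaasi-nginx']
```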
Actually, images seem to be correctly replicated but are then not found when setting up an emulation session. I’m not yet sure what the root cause is in your case, especially if the server is constantly tweaked and modified.
Is there any chance to get ssh-access to your machine? Or, alternatively, could we schedule a zoom call and share screen? That might make it simpler to find the root cause of your issues.
Okay, that fixed the nginx issue and I appear to have access to the interface again. Just a note that the initial pull did not work:
➜ eaasi-installer git:(a2d2e9e) cd eaas/ansible
➜ ansible git:(60885d4) git fetch origin
➜ ansible git:(60885d4) git checkout origin master
error: pathspec 'master' did not match any file(s) known to git
So I just ignored the origin for both commands and the pull worked.
@oooleg the ssh access is currently limited to our VPN. I could open up the ssh access publicly, but I think a screenshare might be the most effective. I’m free for most of this afternoon until 4pm PT and again on Monday or Tuesday for middle of the day times (10am-3pm PT).
And just an edit to say thank you for working through this with me!