Hello,
yesterday I was affected by a power outage while deploying a machine with propellor. Once restarted, I discovered that my git repository containing propellor (aka ~/.propellor) was corrupted (some objects were empty). So I decided to make a clean clone of my central propellor repository.
What I did exactly is:
mv ~/.propellor ~/.propellor.orig
mkdir ~/.propellor
cd ~/.propellor && git init
cp ~/.propellor.orig/.git/config ~/.propellor/.git/
then I did
git pull --all
but now when I run propellor, I get this error message.
:~$ propellor
warning: Your /home/picca/.propellor is out of date..
A newer upstream version is available in /usr/src/propellor/propellor.git
To merge it, run: git merge upstream/master
Building propellor-2.7.3...
Preprocessing library propellor-2.7.3...
In-place registering propellor-2.7.3...
Preprocessing executable 'propellor' for propellor-2.7.3...
Preprocessing executable 'propellor-config' for propellor-2.7.3...
Propellor build ... done
Pull from central git repository ... done
git branch origin/master gpg signature verified; merging
Already up-to-date.
Building propellor-2.7.3...
Preprocessing library propellor-2.7.3...
In-place registering propellor-2.7.3...
Preprocessing executable 'propellor' for propellor-2.7.3...
Preprocessing executable 'propellor-config' for propellor-2.7.3...
Propellor build ... done
You need a passphrase to unlock the secret key for user: "Picca Frédéric-Emmanuel picca@debian.org"
4096-bit RSA key, ID 4696E015, created 2011-02-14
[master dc8fbd3] propellor spin
Git commit ... done
Counting objects: 1, done.
Writing objects: 100% (1/1), 862 bytes | 0 bytes/s, done.
Total 1 (delta 0), reused 0 (delta 0)
To git+ssh://xxxxxxxx/propellor
8b1647f..dc8fbd3 master -> master
Push to central git repository ... done
Shared connection to xxxxx closed.
I would like to know if you think this could be a problem in propellor (I do not know all the magic involved in propellor's deployment process).
thanks
Propellor uses a shared ssh connection to the remote host to avoid the overhead of multiple ssh connections. ssh will sometimes say "shared connection ... closed" when taking down such a connection.
Propellor only reuses such a shared connection for up to 10 minutes; if it finds an old one (perhaps from a previous run of propellor), it will ask ssh to close the old connection.
I don't think it's anything to worry about unless propellor is failing to work for some reason.
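The 10-minute rule described above can be approximated from the shell. This is a minimal sketch, not propellor's own code: the helper names `socket_is_stale` and `close_if_stale` are hypothetical, and it assumes the age of the connection can be judged from the modification time of the control socket file that ssh's `ControlPath` option points at. `ssh -O exit` is the standard OpenSSH way to ask a master connection to shut down.

```shell
#!/bin/sh
# Hypothetical helper (not part of propellor): succeeds when the file at $1
# was last modified more than $2 minutes ago.
socket_is_stale() {
    sock=$1
    max_age_minutes=$2
    # find prints the path only if its mtime is older than the limit
    [ -n "$(find "$sock" -maxdepth 0 -mmin +"$max_age_minutes" 2>/dev/null)" ]
}

# Hypothetical helper: ask the ssh master behind a stale socket to exit.
close_if_stale() {
    sock=$1
    host=$2
    if socket_is_stale "$sock" 10; then
        # -O exit sends an exit request to the master process
        ssh -o ControlPath="$sock" -O exit "$host"
    fi
}
```

Only `close_if_stale` talks to ssh; `socket_is_stale` can be tried on any file.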
OK, I understand, but in my case the connection was closed at the point where propellor should have connected as root to the host and executed the spin action.
So it only pushed to the central repository (ok) but did not proceed to the host installation.
Well, I'd expect propellor would exit nonzero if it fails to connect to the host to spin it. Did this not happen?
[2015-09-17 08:57:49 CEST] call: ssh ["-o","ControlPath=/home/picca/.ssh/propellor/xxxx.sock","-o","ControlMaster=auto","-o","ControlPersist=yes","-t","root@xxx","sh -c 'cd /usr/local/propellor && ./propellor --continue '\"'\"'SimpleRun \"xxx\"'\"'\"''"]
Shared connection to xxxxx closed.
picca@ORD03037:~/.propellor$ echo $?
0
So it seems that there is no error, but the shared connection was closed before running propellor on the host.
So here's the code that runs that ssh command:
I'm surprised it didn't fail with the error. This seems to say that ssh exited 0, but without running the command.
Also, ssh seems to have decided to take down the shared connection of its own accord, which seems strange. Normally it should leave the shared connection open.
If you're able to reproduce this reliably, look into whether making
sshCachingParams
return [] and thus getting rid of this ssh connection caching somehow avoids the problem?
I did the test without the cache system and it was no better (same error message). Then I removed /usr/local/propellor and ran propellor from scratch, and it worked.
So now I have a working propellor directory and a non-working one. I checked that putting back the old /usr/local/propellor causes the same trouble.
If you want, I can send you both versions of the directory in private.
Ah, that makes much more sense; rather than a strange ssh problem, propellor is apparently exiting 0 w/o doing anything when run with
./propellor --continue 'SimpleRun "xxx"'
or something close to that.
So, this might have to do with the old propellor not supporting SimpleRun, which was added back in 0.4.0.
Or, more likely, it's broken in some way that makes it not do anything when asked to do a SimpleRun for a particular host.
You can probably try running the old propellor with that SimpleRun parameter on the command line and get a better feel for what it's doing, and if desired, bisect or otherwise instrument the program to see why it behaved this way.
Good
root@xxxx:/usr/local/propellor.good/dist/build/propellor-config# ls -l
-rwxr-xr-x 1 root root 6359256 sept. 23 09:34 propellor-config
drwxr-xr-x 4 root root 4096 sept. 23 09:34 propellor-config-tmp
Bad
root@xxxx:/usr/local/propellor.bad/dist/build/propellor-config# ls -l
total 4
-rwxr-xr-x 1 root root 0 sept. 14 14:05 propellor-config
drwxr-xr-x 4 root root 4096 sept. 14 14:05 propellor-config-tmp
OK, so it seems that propellor-config is empty (0 bytes) in the bad version.
Now I understand better why nothing happened. So maybe the power outage occurred during the compilation of the propellor-config executable. Maybe something can be done to avoid producing this empty file, for example keeping the old version of propellor-config until the new propellor-config is ready to replace it.
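A zero-byte executable would also be consistent with the exit status of 0 seen earlier in this thread. When a shell is asked to run a file that the kernel rejects as not being a valid executable, it typically falls back to treating the file as a shell script, and an empty script succeeds without doing anything. The sketch below reproduces that with a throwaway file; the temporary path and variable names are made up for the demonstration, and it assumes the temporary directory is not mounted noexec.

```shell
#!/bin/sh
# Reproduce the symptom with a throwaway zero-byte file (hypothetical path).
workdir=$(mktemp -d)
empty_bin="$workdir/propellor-config"
: > "$empty_bin"          # create a zero-byte file
chmod +x "$empty_bin"     # mark it executable, like the bad build output

# Run it through sh -c, as the ssh command in this thread does.
sh -c "'$empty_bin' --continue 'SimpleRun \"xxx\"'"
status=$?
echo "exit status: $status"

# A cheap guard: -s is true only for files with a size greater than zero.
if [ -s "$empty_bin" ]; then
    binary_ok=yes
else
    binary_ok=no
fi
echo "binary non-empty: $binary_ok"
rm -rf "$workdir"
```

A `[ -s ... ]` check of this kind before running the binary would turn the silent no-op into a visible failure.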
Ah, I remember seeing this once myself. ghc and cabal don't get the binary updated atomically.
I've got a fix on the joeyconfig branch; it makes sure the binary is built and then atomically updates a copy that's used to run propellor.
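The general technique behind such a fix is to write the new binary under a temporary name in the same directory and then rename it over the old one, since a rename within one filesystem is atomic. The following is a minimal sketch of that idea and not the actual change on the joeyconfig branch; the function name `install_atomically` and the refusal to install an empty file are assumptions added for illustration.

```shell
#!/bin/sh
# Hypothetical helper: replace $2 with a copy of $1 without ever exposing
# a partially written file at $2.
install_atomically() {
    src=$1
    dest=$2
    # Refuse to install a missing or zero-byte build result.
    [ -s "$src" ] || return 1
    # The temporary file must be in the same directory as dest so that
    # mv is a rename on one filesystem rather than a copy.
    tmp="$dest.tmp.$$"
    cp -p "$src" "$tmp" || { rm -f "$tmp"; return 1; }
    mv -f "$tmp" "$dest"
}
```

With a helper like this, an interrupted or empty build leaves the previous copy in place. To survive a power loss as well, a real implementation would probably also need to flush the new file to disk before the rename.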