For some reason (and I'm not sure exactly when this started), when I run propellor --spin on a host while the control master socket does not yet exist, none of the output from the remote host comes back. The remote run works fine; I just have to ^C the local propellor once I see that the remote run is done (by watching top on the remote host or something). If the socket does already exist (e.g. spinning again immediately), then everything is fine.
I assume this is some issue with my local SSH version or config, but I have no clue what. Anyone have any ideas?
I think I've been seeing this too, recently.
I had not put together that it involves the ssh control socket. And am not 100% sure it does yet.
It also seemed to affect the first spin and not the second one when I was seeing it. But that was 1-2 weeks ago, and I am not currently reproducing the issue.
If you can reproduce it consistently, it would be good to check if the concurrent output layer, which involves intercepting all command output and serializing it, might be involved. If you edit src/Utility/Process/Shim.hs and make it simply import System.Process as X and remove the other import, that will bypass the concurrent output layer.
Added some debugging, and I found that processes run by concurrent-output tend to alternate between running foreground and background. So, when the socket exists and is old, it will run one more process than otherwise to stop ssh on that socket, and this will change which run method is used for subsequent processes.
However, it really shouldn't matter if a process starts in the background; concurrent-output should notice when the output lock frees up, and start displaying the process's output.
So, this theory explains why the ssh socket seems to be involved, perhaps, but it doesn't really explain what's happening to prevent the remote propellor output from being shown.
Unless some other foreground process is hanging around and keeping the output lock. Or some bug in concurrent-output..
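To make that theory concrete, here is a toy model of an output lock. It is only a sketch and not concurrent-output's actual code: Mode, OutputLock, Terminal, emit, runProcess and simulate are all names made up for illustration, and the "terminal" is just a list so the ordering can be inspected. A process that finds the lock free writes directly; one that finds it taken buffers its output and can only show it once the holder lets go.

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar
  (MVar, newEmptyMVar, newMVar, putMVar, takeMVar, tryTakeMVar)
import Data.IORef (IORef, atomicModifyIORef', newIORef, readIORef)

-- How a process ended up being run (hypothetical names, for illustration).
data Mode = Foreground | Background deriving (Eq, Show)

-- Whoever holds the lock may write to the terminal.
type OutputLock = MVar ()

-- Stand-in for the terminal: lines are stored newest first.
type Terminal = IORef [String]

emit :: Terminal -> String -> IO ()
emit term s = atomicModifyIORef' term (\xs -> (s : xs, ()))

-- Run a "process" that produces the given lines. If the lock is free,
-- write them straight away; otherwise keep them buffered until the
-- lock can be taken.
runProcess :: OutputLock -> Terminal -> [String] -> IO Mode
runProcess lock term outputLines = do
  got <- tryTakeMVar lock
  case got of
    Just () -> do
      mapM_ (emit term) outputLines
      putMVar lock ()
      return Foreground
    Nothing -> do
      takeMVar lock -- blocks for as long as the holder keeps running
      mapM_ (emit term) outputLines
      putMVar lock ()
      return Background

-- A long-running holder (think of a stray foreground process) owns the
-- lock while a second process (think of the ssh) tries to show output.
simulate :: IO ([String], Mode)
simulate = do
  lock <- newMVar ()
  term <- newIORef []
  result <- newEmptyMVar
  takeMVar lock -- the holder takes the lock before the ssh starts
  _ <- forkIO $ do
    threadDelay 200000 -- the holder runs for 0.2 seconds
    emit term "holder: exited"
    putMVar lock ()
  _ <- forkIO $
    runProcess lock term ["remote: line 1", "remote: line 2"] >>= putMVar result
  mode <- takeMVar result
  shown <- reverse <$> readIORef term
  return (shown, mode)

main :: IO ()
main = do
  (shown, mode) <- simulate
  mapM_ putStrLn shown
  print mode
```

In this model the second process always lands in the background, and its lines only show up after the holder exits. If the holder never exited, the buffered lines would never be shown, which would look like a hang even though the process itself ran fine.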
Managed to reproduce the hang once, and the ssh was indeed being run as bgProcess. However, I didn't manage to see what foreground process, if any, was running when that happened. And had no luck reproducing it again.
I added some more PROPELLOR_DEBUG output around this, so it'll tell when a process is being run by fgProcess or bgProcess.
I don't see how that could be relevant, but if you want to, edit src/Propellor/Ssh.hs and make sshCachingParams return [] and see if that changes it.
I modified spin', adding this before its final ssh:
This more or less replicates the problem reliably; the remote propellor runs but nothing gets displayed for 500 seconds until the sleep process is done. At which point the whole buffered output appears. Use cat instead and it'll hang forever.
Of course, that means ps fax shows propellor with sleep and ssh as child processes. If only ssh shows as a child process and nothing else when the problem naturally occurs, then that's a different problem than what I was able to replicate.
Anyway, this seems too fragile to leave like this, even though nothing run on the way to an ssh should run for very long. So, I'm forcing the ssh to be run in the foreground, which will certainly avoid all such problems.