splunkd, Y U NO FOREGROUND?!?
Posted: Mon, 15 April 2013 | 8 Comments
I am led to believe that splunkd (some agent for feeding log entries into the Grand Log Analysis Tool Of Our Age™) has no capability for running itself in the foreground. This is stupid. Do not make these sorts of assumptions about how the user will want to run your software. Some people use sane service-management systems that are capable of handling the daemonisation for you and automatically restart the managed process on crash. These systems are typically much easier to configure and debug, and they don’t need bloody PID files and the arguments about where to put them (tmpfs, inside or outside chroots… oh my) and who should update them and how to reliably detect that they’re out of date when they crash without causing race conditions and whether non-root-running processes should place their PID files in the same place and how do you deal with the permissions issues and… bugger that for a game of skittles.
In short, if you provide a service daemon and do not provide some well-documented means of saying “don’t background”, I will hurt you. This goes double if your shitware is not open source.
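To make the point concrete, here is a minimal sketch of what a supervisor gets for free when the daemon stays in the foreground. It is not how daemontools is implemented; the `supervise` function, its `max_restarts` and `delay` parameters, and the crashing stand-in command are all made up for illustration. A real supervisor would loop forever rather than stop after a fixed number of restarts.

```python
import subprocess
import sys
import time


def supervise(argv, max_restarts=3, delay=0.0):
    """Run argv in the foreground and restart it whenever it exits non-zero.

    Returns the list of exit codes observed, one per run.
    max_restarts and delay are illustrative knobs, not anything standard.
    """
    exit_codes = []
    for _ in range(max_restarts + 1):
        # The service is our direct child, so there is no PID file to
        # write, find, or distrust: wait() reports exactly when it died.
        child = subprocess.Popen(argv)
        code = child.wait()
        exit_codes.append(code)
        if code == 0:
            break  # clean exit, nothing to restart
        time.sleep(delay)
    return exit_codes


if __name__ == "__main__":
    # Hypothetical stand-in for a foreground daemon that keeps crashing.
    crashing = [sys.executable, "-c", "import sys; sys.exit(1)"]
    print(supervise(crashing, max_restarts=2))
```

None of this works if the program forks into the background: the supervisor's child exits immediately, and the real daemon is no longer anyone's child to wait on.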
8 Comments
From: Brian Stengaard
2013-04-15 18:01
Which sane service-management system? The description you gave doesn’t seem to describe start-stop-daemon, initctl or monit - and we are looking for something quite a lot saner than monit (the fact that start/stop is asynchronous is killing us).
/Brian
From: amrit
2013-04-16 02:55
Try: splunk start splunkd --nodaemon
There are other methods as well, but this is the most straightforward/least error-prone.
From: Matt Palmer
2013-04-16 07:36
Brian: We’re using daemontools. I’ve used monit heavily in the past, and it convinced me that the whole concept of trying to use PID files and backgrounded daemons was an inherently racy and buggy approach to the problem. So I classify anything that works that way under the heading of “insane”.
Amrit: Oh wow, dude, if that works, you’re a lifesaver. I’m assuming this is a relatively new feature? All we could find with appropriate search terms were posts from people saying “I asked Splunk support and they said it couldn’t be done”, so we didn’t spend a huge amount of time trying to dig into the problem.
From: amrit
2013-04-16 07:55
Matt: It’s basically been there forever and is something that should generally be known, as it is occasionally used when we’re trying to do advanced debugging.
I’m not sure how the information failed to make its way to the appropriate places, but I at least found the Splunk Answers question on the topic and answered there as well.
Sorry it was so difficult to find this info… If you’d managed to make your way to the IRC channel (EFnet/#splunk), I’m sure a few people there would have been able to help out. :)
From: mobinmob
2013-04-17 03:57
For daemons that don’t provide a way to run in the foreground but can be given a pidfile to use, pidsig is nice. https://github.com/chexum/pidsig It amazes me how many people still use daemontools. They work (TM), but there are much more advanced alternatives. I use perp: http://b0llix.net/perp/
From: Matt Palmer
2013-04-17 07:58
Amrit: Thanks for the pointer to the IRC channel; I’ll keep it in mind if we have any more annoyances.
mobinmob: For daemons that don’t provide a way to run in the foreground, I prefer to use a lead pipe, but thanks for the pointer to pidsig. I’ll add it to the list of possibilities for the future. We use daemontools because it is just about too simple to fail. For me, the word “advanced” is code for “complicated”, “complicated” begats “buggy”, “buggy” begats “frustration”, and “frustration” begats “insanity”. I have precious little sanity to spare.
From: mobinmob
2013-04-17 16:07
Perp is an offspring of daemontools (like runit, s6 and daemontools-encore). It is dead simple and IMHO a significant improvement over its ancestor.
From: sadig
2013-04-18 21:49
Matt,
splunkd (I think you wrote about the forwarder part of it, like the Universal Forwarder) is totally broken anyhow.
Start the forwarder, then kill -9 it.
The leftover is the pidfile.
Then wait some time, until some other process is using the very same PIDs listed in the pidfile. Start splunkforwarder and wonder why the other process died.
strace this forwarder stuff, and you’ll see it reads the pidfile, does a kill -0 to check whether the process is still running, and then tries kill -1 (which is a somewhat graceful shutdown).
After 10 or 15 attempts without success, it will just silently kill -9.
Oh wow, not even a check on the process name, just kill -9.
Took me 3 hours of debugging this one.
Awesome… we filed a ticket already.
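For illustration, the missing check described above could look something like the following sketch. It assumes a Linux-style `/proc` where `/proc/<pid>/comm` holds the process name (truncated to 15 characters by the kernel); the function names, the `proc_root` parameter, and the expected name passed in are hypothetical, and this is not what the forwarder actually does.

```python
import os


def read_comm(pid, proc_root="/proc"):
    """Return the command name recorded for pid, or None if there is no such process."""
    try:
        with open(os.path.join(proc_root, str(pid), "comm")) as f:
            return f.read().strip()
    except (FileNotFoundError, ProcessLookupError):
        return None


def pidfile_points_at(pidfile, expected_name, proc_root="/proc"):
    """True only if the pidfile names a live process with the expected name.

    A missing, empty, or garbage pidfile counts as "not ours".
    """
    try:
        with open(pidfile) as f:
            pid = int(f.read().split()[0])
    except (FileNotFoundError, ValueError, IndexError):
        return False
    return read_comm(pid, proc_root) == expected_name
```

Even with this check, the PID can be recycled between the check and the kill, so it only narrows the race rather than closing it. That is the argument for not relying on pidfiles in the first place.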