NAME
ControlFreak Tutorial
INSTALLATION
ControlFreak should work on all Unix systems and possibly others, but only Mac OS X and Linux have been tested.
Requirements
Notable ControlFreak requirements are:
AnyEvent
(recent versions 5.202+)
EV (libev interface)
Though it's not strictly necessary, EV is strongly recommended: ControlFreak hasn't been tested thoroughly with other event loops.
Log4perl
This is the logging backend of ControlFreak.
JSON::XS
Object::Tiny
Try::Tiny
Params::Util
cpanm
Other instructions will come later, once ControlFreak is on CPAN.
# install cpanm (see App::cpanminus documentation for details)
cd ~/bin
wget http://xrl.us/cpanm
chmod +x cpanm
## install cpan version
cpanm ControlFreak
## install bleeding edge
cpanm http://github.com/yannk/ControlFreak/tarball/master
BASICS
When the ControlFreak daemon cfkd is started, it opens a management socket that allows operators and programs to send instructions to the daemon.
The daemon's duty is to fork and exec services, making sure that they are running or stopped according to the commands received. cfkd is also configured with logging capabilities, so that the STDOUT and STDERR of the services it is responsible for aren't lost.
SIMPLE EXAMPLE
Start cfkd with a config file and use cfkctl
# run in the background:
$ cfkd -d
# alternatively, run cfkd in its own shell/term in the foreground:
$ cfkd
You can now use cfkctl to inspect cfkd status. This control script connects by default to a Unix socket at /tmp/cfkd.sock.
$ perl cfkctl status
# nothing! As expected, there is no service declared in cfkd
# declare a new svc1 service with a 'sleep 100' as the command
$ cfkctl load - <<END
service svc1 cmd=sleep 100
END
$ perl cfkctl status
stopped svc1
# let's start the service we have
$ cfkctl start svc1
$ cfkctl status
running svc1 2 seconds ago (Wed Nov 11 16:16:09 2009)
$ cfkctl stop svc1
$ cfkctl status
stopped svc1 3 seconds ago (Wed Nov 11 16:17:07 2009)
How it works
Let's connect directly to the management port; it will give you a glimpse of the internals.
$ socat readline unix:/tmp/cfkd.sock
(Note that you can configure a TCP port instead, so that you can telnet into it.)
Now type "command status". The server will respond with OK or ERROR (in the rest of this extract, the line after OK or ERROR is a line we type in the telnet/socat session).
command status
svc1 stopped 1257985027
OK
The management port treats the input stream of your commands as literal configuration: typing a line here is no different from putting it in the config file. So let's declare a new service svc2:
service svc2 cmd=sleep 10
OK
command start service svc2
done 1
OK
command status
svc2 starting 1257985558
svc1 stopped 1257985027
OK
If you wait a little (the time for sleep 10 to complete) you'll see:
command status
svc2 stopped 1257985567
svc1 stopped 1257985027
OK
Both services have now completed their task. Of course there are options to make a service restart automatically once it finishes. But note that if a service exits abnormally it is restarted unless you specify otherwise. (See the rest of the documentation for all the service lifecycle management options.)
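The exchange above is simple enough to script. The following Python sketch is only an illustration built from the transcript: it assumes the default socket path /tmp/cfkd.sock, that each status line is "name state epoch", and that a reply ends with a line starting with OK or ERROR. The function names send_command and parse_status are hypothetical and are not part of ControlFreak.

```python
import socket


def parse_status(lines):
    """Turn 'name state epoch' lines into dictionaries (format assumed from the transcript)."""
    services = []
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # not a status line, skip it
        entry = {"name": parts[0], "state": parts[1], "since": None}
        if len(parts) >= 3 and parts[2].isdigit():
            entry["since"] = int(parts[2])  # seconds since the Unix epoch
        services.append(entry)
    return services


def send_command(line, path="/tmp/cfkd.sock"):
    """Send one line to the management socket and collect the reply.

    Returns (ok, reply_lines). Reading stops at the first line whose
    first word is OK or ERROR, which is an assumption about the protocol.
    """
    reply = []
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)
        sock.sendall(line.encode() + b"\n")
        with sock.makefile("r") as stream:
            for raw in stream:
                text = raw.strip()
                if not text:
                    continue
                head = text.split()[0]
                if head == "OK":
                    return True, reply
                if head == "ERROR":
                    return False, reply
                reply.append(text)
    return False, reply  # connection closed before a final status line
```

With a running cfkd, something like `ok, lines = send_command("command status")` followed by `parse_status(lines)` would give the same information as the session above.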
When a service dies or exits abnormally
$ cfkctl up svc1 # make sure svc1 is up
# kill it!
$ kill -9 `cfkctl pid svc1`
$ cfkctl status
running svc1 2 seconds ago (Wed Nov 11 16:45:43 2009)
As you can see, the service has been running for only 2 seconds: it has obviously been restarted.
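The restart-on-abnormal-exit behaviour can be pictured with a small supervisor loop. This Python sketch is not how cfkd is implemented (cfkd is event driven and written in Perl); it assumes that "abnormal" means a non-zero exit status or death by signal, and the max_restarts limit is a hypothetical safeguard added for the example.

```python
import subprocess
import sys


def exited_abnormally(returncode):
    """Assumption: non-zero exit codes and signal deaths (negative values) are abnormal."""
    return returncode != 0


def supervise(cmd, max_restarts=3):
    """Run cmd, restarting it while it exits abnormally.

    Returns (number_of_starts, last_returncode). max_restarts is a
    hypothetical cap so that the example always terminates.
    """
    starts = 0
    while True:
        starts += 1
        returncode = subprocess.call(cmd)
        if not exited_abnormally(returncode) or starts > max_restarts:
            return starts, returncode


if __name__ == "__main__":
    # A child that always fails is started once, then restarted twice.
    failing = [sys.executable, "-c", "import sys; sys.exit(1)"]
    print(supervise(failing, max_restarts=2))
```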
Logging
By default ControlFreak creates a $HOME/.controlfreak directory in which it logs the main events that happened:
$ tail ~/.controlfreak/cfkd.log
- INFO 947 ControlFreak.Logger - child svc1 exited
- INFO 75 ControlFreak.Logger - new connection to admin from unix/:/Users/yann/.controlfreak/sock
- INFO 846 ControlFreak.Logger - starting svc1
- INFO 114 ControlFreak.Logger - Console exiting
- INFO 75 ControlFreak.Logger - new connection to admin from unix/:/Users/yann/.controlfreak/sock
- INFO 114 ControlFreak.Logger - Console exiting
- ERROR 966 ControlFreak.Logger - child terminated abnormally 9: Received signal 9
- INFO 846 ControlFreak.Logger - starting svc1
The fact that svc1 was abruptly killed was logged as well as the information that ControlFreak restarted it (behaviour you can alter via config, of course).
# default config file
$ cat ~/.controlfreak/log.config
log4perl.rootLogger=INFO, ALL
log4perl.appender.ALL=Log::Log4perl::Appender::File
log4perl.appender.ALL.filename=sub { $ENV{CFKD_HOME} . "/cfkd.log" }
log4perl.appender.ALL.mode=append
log4perl.appender.ALL.layout=PatternLayout
# %S = service pid
log4perl.appender.ALL.layout.ConversionPattern=%S %p %L %c - %m%n
You can alter this configuration as much as you want, then send a USR1 signal to cfkd to instruct it to reload the configuration and change its logging behavior.
The pattern layout configuration is better understood if you refer to this page: http://search.cpan.org/~mschilli/Log-Log4perl-1.25/lib/Log/Log4perl/Layout/PatternLayout.pm
Note that %S is a custom placeholder representing the pid of the service, if it exists.
Logging is as flexible as Log4perl allows, which means it's very flexible. You can also log the STDERR and STDOUT of each service independently (see the rest of the documentation), so that you never miss output that would help you understand why something is not working the way it should.
SHARING SOCKETS (use ControlFreak as a prefork server)
Another strength of ControlFreak is its ability to open a local socket and, by means of fork-and-exec, share that socket with multiple services (most likely of the same type).
The classic situation is a bunch of web workers all accepting connections on 0.0.0.0:8080: the kernel efficiently distributes the connections to these workers, which don't have to worry about managing this socket at all.
In an environment where you have a lot of web nodes behind a light proxy (like Perlbal or many others), it can greatly simplify the maintenance of your web cluster. You just have to declare one node per server in Perlbal's nodefile (10.0.0.100:8080, 10.0.0.101:8080, ...), each of which hides a number of actual workers. Of course, you manage the number of active workers using ControlFreak.
(TODO: example)
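Until a ControlFreak-specific example is written, the underlying technique can be sketched in plain Python. This is a generic illustration, not ControlFreak configuration: the parent opens one listening socket, forks workers that inherit it, and the kernel hands each incoming connection to one of them. ControlFreak additionally execs the service after forking, which this sketch skips. All function names are hypothetical, and port 0 asks the kernel for any free port.

```python
import os
import signal
import socket


def open_listener(host="127.0.0.1", port=0):
    """Open the single listening socket that all workers will share."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen(16)
    return sock


def worker_loop(listener):
    """Accept connections forever and report which worker handled each one."""
    while True:
        conn, _ = listener.accept()
        with conn:
            conn.sendall(f"handled by pid {os.getpid()}\n".encode())


def spawn_workers(listener, count):
    """Fork count workers; each child inherits the listening socket."""
    pids = []
    for _ in range(count):
        pid = os.fork()
        if pid == 0:
            try:
                worker_loop(listener)
            finally:
                os._exit(0)  # a child must never return into the parent's code
        pids.append(pid)
    return pids


def stop_workers(pids):
    """Terminate and reap the workers."""
    for pid in pids:
        os.kill(pid, signal.SIGTERM)
        os.waitpid(pid, 0)
```

The workers never bind or listen themselves; they only call accept on a socket they were handed, which is what lets a supervisor add or remove workers without touching the port.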
SHARING MEMORY - Benefit from Unix Copy-On-Write Effect
ControlFreak ships with a Perl proxy only. If you want to share memory for Ruby/Python you have to implement a proxy in that language (which is not very complicated).
Because we don't want to load a ton of stuff in the cfkd process, and because we want to keep cfkd very stable anyway, we use an intermediary process: a Proxy, whose job is to transparently manage a bunch of child services as if they were directly under cfkd control.
(TODO: example)
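Until a real example is written, the copy-on-write idea behind the proxy can be sketched as follows. This is a generic Python illustration, not the ControlFreak proxy protocol: a parent process does the expensive loading once, then forks children that read the preloaded data without loading it again. The names preload and run_child are hypothetical, and the dictionary stands in for whatever modules or data a real proxy would load. Note that in CPython reference counting writes to object headers, so pages are shared less perfectly than in some other runtimes.

```python
import os


def preload():
    """Stand-in for expensive setup done once in the proxy process."""
    return {n: n * n for n in range(1000)}


def run_child(shared, key):
    """Fork a child that reads from the preloaded data and reports one value.

    The child gets the parent's memory pages copy-on-write: nothing is
    copied until one side writes to a page.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: use the data that was loaded before the fork.
        os.close(read_fd)
        os.write(write_fd, str(shared[key]).encode())
        os._exit(0)
    # Parent: collect the child's answer and reap it.
    os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        result = reader.read()
    os.waitpid(pid, 0)
    return int(result)
```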
SECURITY
By default the ControlFreak management port is a Unix domain socket, but you can open a TCP socket as well. In both cases, but especially in the latter, you have to be careful with the security implications:
You don't want to allow anyone to create a svc 0wned cmd="rm -rf /" service...
Similarly be careful not to expose the log configuration in a way that would allow untrusted users to alter it.
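One simple precaution for the Unix socket case is to check that the socket file is not accessible to other users. The Python sketch below is only an illustration with a hypothetical helper name; it relies on the fact that on Linux connecting to a pathname socket requires write permission on the socket file, which may not hold on every Unix, so restricting the containing directory is a sensible complement.

```python
import os
import stat


def is_private(path):
    """Return True if neither group nor others have any permission on path."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0


if __name__ == "__main__":
    # /tmp/cfkd.sock is the default path mentioned earlier in this tutorial.
    sock_path = "/tmp/cfkd.sock"
    if os.path.exists(sock_path) and not is_private(sock_path):
        print(f"warning: {sock_path} is accessible to other users")
```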
TAGGING SERVICES
The config file and commands issued to the management port are intentionally kept very simple. There is no loop mechanism allowing you to declare: "I want 10 of these web workers". The rationale is that if you really need that, you can build it yourself on top of ControlFreak. To help in the process, and to help manage all these similar services, a tag system is provided.
The idea is very simple: you can attach a number of tags to services, and then refer to services using those tags.
Here is a very simple example:
service web1 cmd=sleep 100
service web1 tags=web,prod
service web2 cmd=sleep 100
service web2 tags=web
service web3 cmd=sleep 100
service web3 tags=web,stage
# the leading '@' refers to services by tag
$ cfkctl start @web
$ cfkctl status @prod
running web1 6 seconds ago (Wed Nov 11 17:30:17 2009)
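Since there is no loop mechanism, declarations for many similar workers have to be generated outside ControlFreak. The Python sketch below shows one way to do it and how tags then select services. It only uses the "service NAME cmd=..." and "service NAME tags=..." forms shown above; the function names are hypothetical, and the tag lookup mimics the '@tag' selection for illustration rather than reproducing cfkctl's actual logic.

```python
def worker_config(prefix, count, cmd, tags):
    """Generate declarations for count similar services named prefix1..prefixN."""
    lines = []
    for index in range(1, count + 1):
        name = f"{prefix}{index}"
        lines.append(f"service {name} cmd={cmd}")
        if tags:
            lines.append(f"service {name} tags={','.join(tags)}")
    return "\n".join(lines) + "\n"


def tags_from_config(config_text):
    """Collect the tags attached to each service in a config text."""
    tags_by_service = {}
    for line in config_text.splitlines():
        parts = line.split(None, 2)
        if len(parts) != 3 or parts[0] != "service":
            continue
        name, assignment = parts[1], parts[2]
        tags = tags_by_service.setdefault(name, set())
        key, _, value = assignment.partition("=")
        if key == "tags":
            tags.update(tag for tag in value.split(",") if tag)
    return tags_by_service


def services_for_tag(tag, tags_by_service):
    """Return the sorted names of the services carrying the given tag."""
    return sorted(name for name, tags in tags_by_service.items() if tag in tags)


if __name__ == "__main__":
    # The generated text could be fed to `cfkctl load -` as shown earlier.
    print(worker_config("web", 10, "sleep 100", ["web"]), end="")
```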