So this time I had to take a look at what Facebook uses for logging: Scribe.
"Scribe is a server for aggregating log data that's streamed in real
time from clients. It is designed to be scalable and reliable."
The way it works (or at least the way I configure it) is quite simple: you have a central logging server and a local logging instance on each node. Roughly:
- You log against your local instance, which acts as a proxy and delivers log messages to the central server.
- If the central server goes down, your local instance keeps accepting messages and saves them to the filesystem (configurable).
- Your local server retries sending logs to the central server from time to time (also configurable).
- When the central server comes back up, the local server sends all pending log messages and cleans up the filesystem.
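To make the buffering behaviour above concrete, here is a minimal store-and-forward sketch in Java. It is only a toy model, not Scribe's code: the class name LocalLogProxy, the BiPredicate standing in for the central server, and the in-memory queue (Scribe buffers to disk) are all assumptions I made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.BiPredicate;

/**
 * Hypothetical model of the local proxy: it forwards messages to a central
 * sink and keeps them queued (in memory here, on disk in Scribe) while the
 * sink is unavailable.
 */
final class LocalLogProxy {
    /** One pending log entry: a category plus the message text. */
    private static final class Entry {
        final String category;
        final String message;

        Entry(String category, String message) {
            this.category = category;
            this.message = message;
        }
    }

    // Stand-in for the central server: returns true when delivery succeeded.
    private final BiPredicate<String, String> central;
    private final Deque<Entry> pending = new ArrayDeque<>();

    LocalLogProxy(BiPredicate<String, String> central) {
        this.central = central;
    }

    /** Queues the message, then tries to deliver everything that is pending. */
    void log(String category, String message) {
        pending.addLast(new Entry(category, message));
        flush();
    }

    /** Delivers pending entries in order, stopping at the first failure. */
    int flush() {
        int delivered = 0;
        while (!pending.isEmpty()) {
            Entry next = pending.peekFirst();
            if (!central.test(next.category, next.message)) {
                break; // central is down: keep the entry for the next retry
            }
            pending.removeFirst();
            delivered++;
        }
        return delivered;
    }

    int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        boolean[] centralUp = {false};
        List<String> received = new ArrayList<>();
        LocalLogProxy proxy = new LocalLogProxy((category, message) -> {
            if (!centralUp[0]) {
                return false;
            }
            received.add(category + ": " + message);
            return true;
        });

        proxy.log("web", "first");
        proxy.log("web", "second");
        System.out.println("pending while central is down: " + proxy.pendingCount());

        centralUp[0] = true; // central server comes back
        proxy.flush();       // in a real setup this would run on a timer
        System.out.println("pending after recovery: " + proxy.pendingCount());
        System.out.println("received by central: " + received);
    }
}
```

The caller of log() never sees a failure; entries only leave the queue once the central side has accepted them, which is why ordering is preserved across an outage.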
Logs are organized by category, which helps with filtering.
Note that in the setup described above the central server is again a bottleneck, so it will not scale. That's why you should think about pointing the local instances at a load balancer instead of a single central server.
There's no admin console, no alarms for the server status, nothing. If the central server goes down, you won't notice it. If the local server goes down ... well, most likely your box went down, so you won't be logging to that server anyway.
In any case, you'll most likely want to write something to check the status. For that, the class com.facebook.fb303.FacebookService.Client provides a couple of methods to query the server status. It's a generic interface for Thrift services, but it will give you the basic information.
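Before wiring up the fb303 client, a much cruder check is simply whether the Scribe port accepts TCP connections. The sketch below does only that, using plain java.net and nothing from Thrift or fb303, so it tells you the process is listening, not that it is healthy. The class name ScribePortProbe is made up, and the default port 1463 is an assumption taken from Scribe's sample configs; adjust it to your setup.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper: checks only that something is listening on host:port. */
final class ScribePortProbe {
    /** Returns true if a TCP connection is accepted within timeoutMillis. */
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or the host could not be resolved.
            return false;
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "localhost";
        // 1463 is assumed here because Scribe's example configs use it.
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 1463;
        boolean up = isReachable(host, port, 2000);
        System.out.println(host + ":" + port + (up ? " is accepting connections" : " is NOT reachable"));
    }
}
```

Run from cron or a monitoring agent, this at least gives you the alarm that Scribe itself doesn't; the fb303 methods are still the better source for actual status details.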
Scribe also comes with two sample scripts for controlling the servers; you can find them in $SCRIBE_SRC_HOME/examples.
If you plan to compile Scribe, please use a Boost version below 1.46. Newer Boost versions changed the default filesystem library to v3, and Scribe uses v2. Workaround: after running ./bootstrap.sh, edit config.status and add -DBOOST_FILESYSTEM_VERSION=2 where all the other params are set (i.e. search for -DHAVE_BOOST). I currently have Scribe working with Boost 1.47 and built the client libs with Thrift 0.7.0.
I wrote a simple MBean to check the status of Scribe servers, and modified/cleaned up the log4j appender described in this article. You can take a look at the code here.