In a recent project, the logs were saved in JSON format for easy analysis. They used to be written straight to files, but MongoDB caught my eye at just the right time, so I stored the logs in MongoDB instead. Simply saving logs is pointless; what matters is discovering business trends and system performance weaknesses from them. There was already an analysis module, written in Java and running under Tomcat. Its implementation is quite heavyweight, adding a new metric is cumbersome, and analysis fails because of NFS.
I had always wanted to rewrite it, and initially planned to use Ruby on Rails, but I never found the time to learn it and do the development (I'm making excuses!). Then I ran into Node.js again at QCon 2011 in Hangzhou. I had heard of it before but had never studied it in depth. After listening to the talk by Su Qian from Taobao, I immediately had the idea of implementing this log analysis system with Node.js. The front end uses JS, the server uses JS, and even the database shell is JS. It's cool just to think about, and of course the most important thing is that the amount of code is small.
1. Use Node.js to implement server-side code
To keep a good style and write code quickly, adopting a simple framework is unavoidable. Express implements most of the functionality needed, but it takes some time to get familiar with and seems a bit heavyweight for this project. The official Node.js website has a chat demo whose code is easy to borrow: its fu.js wraps URL handling and the returning of JSON. So I used fu.js directly and rewrote server.js:
var fu = require("./fu"),
    sys = require("util"),
    url = require("url"),
    mongo = require("./request_handler");
// HOST and PORT must be defined before this line (their values are not shown here)
fu.listen(Number(process.env.PORT || PORT), HOST);
fu.get("/", fu.staticHandler("index.html"));
Isn't that too simple?! But that really is all there is to it: a server is up and running.
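To make the idea of wrapping URL handling and JSON responses more concrete, here is a minimal sketch of a fu.js-style router. It is not the actual fu.js source: the get, dispatch, and listen functions and the /ping route are hypothetical names chosen for illustration, and only res.simpleJSON mirrors the helper used in this post.

```javascript
// Minimal sketch of a fu.js-style router (hypothetical; not the real fu.js source).
var http = require("http");

var handlers = {}; // maps a URL path to its handler function

// Register a handler for requests on the given path.
function get(path, handler) {
  handlers[path] = handler;
}

// Look up the handler for the request path and call it; answer 404 otherwise.
function dispatch(req, res) {
  // Attach a helper that serializes an object and sends it as JSON.
  res.simpleJSON = function (code, obj) {
    var body = JSON.stringify(obj);
    res.writeHead(code, {
      "Content-Type": "application/json",
      "Content-Length": Buffer.byteLength(body)
    });
    res.end(body);
  };
  // req.url holds only the path and query, so a dummy base is needed to parse it.
  var path = new URL(req.url, "http://localhost").pathname;
  var handler = handlers[path];
  if (handler) {
    handler(req, res);
  } else {
    res.simpleJSON(404, { error: "Not found: " + path });
  }
}

// Start an HTTP server that routes every request through dispatch.
function listen(port, host) {
  return http.createServer(dispatch).listen(port, host);
}

// Example route (made up for this sketch).
get("/ping", function (req, res) {
  res.simpleJSON(200, { pong: true });
});
```

Calling listen(8001, "127.0.0.1") would then serve /ping as JSON. Keeping the routing table as a plain object is roughly what makes this approach feel lighter than a full framework for a handful of URLs.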
Let’s look at the request_handler.js code that handles requests:
var mongodb = require("mongodb"),
    fu = require("./fu");
// user_action (the list of actions to report on) is defined elsewhere and not shown here
// TOP 10 user actions
fu.get("/userActionTop10", function(req, res){
    mongodb.connect('mongodb://localhost:27017/log', function(err, conn){
        conn.collection('action_count', function(err, coll){
            coll.find({"value.action":{$in:user_action}}).sort({"value.count":-1}).limit(10).toArray(function(err, docs){
                if(!err){
                    var action = [];
                    var count = [];
                    for(var i = 0; i < docs.length; i++){
                        //console.log(docs[i]);
                        action.push(docs[i].value.action);
                        count.push(docs[i].value.count);
                    }
                    res.simpleJSON(200, {action:action, count:count});
                }else{
                    res.simpleJSON(500, {error:String(err)});
                }
                // Be sure to close the database connection, whether or not the query succeeded
                conn.close();
            });
        });
    });
});
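The handler boils down to sorting by count, taking the top entries, and splitting them into two parallel arrays for the chart. The sketch below shows that transformation in isolation on in-memory data. The document shape ({ value: { action, count } }) is inferred from the handler; the toChartSeries name and the sample records are made up for illustration.

```javascript
// Sketch: turn map/reduce result documents into the two arrays the chart needs.
// In the real handler MongoDB does the sorting and limiting; here it is done in memory.
function toChartSeries(docs, limit) {
  var sorted = docs.slice().sort(function (a, b) {
    return b.value.count - a.value.count; // descending by count
  });
  var top = sorted.slice(0, limit);
  return {
    action: top.map(function (d) { return d.value.action; }),
    count: top.map(function (d) { return d.value.count; })
  };
}

// Made-up sample data in the assumed shape.
var sample = [
  { _id: "login", value: { action: "login", count: 42 } },
  { _id: "search", value: { action: "search", count: 97 } },
  { _id: "logout", value: { action: "logout", count: 17 } }
];
var series = toChartSeries(sample, 2);
console.log(JSON.stringify(series));
// {"action":["search","login"],"count":[97,42]}
```

The action array becomes the x-axis ticks and the count array becomes the bar heights on the client side.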
2. Client
The most important part of a log system is visualization. Here I use jqPlot, a charting plug-in for jQuery. First, a static HTML page serves as the container for the charts:
It is almost a complete copy of a jqPlot example. Okay, I admit I'm too lazy.
Below is chart.js, which draws the charts:
/****************************** TOP 10 User Action Start *********************************/
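// Insert a div for the chart (markup assumed: a div with the id used below)
document.write('<div id="userActionTop10Chart"></div>');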
var drawUserActionTop10Chart = function(){
    if(!$("#userActionTop10Chart").attr('class')){
        $("#userActionTop10Chart").attr('class', 'small_chart');
    }
    $.ajax({
        async:false,
        url: '/userActionTop10',
        dataType:'json',
        cache: false,
        success:function(data){
            try{
                $('#userActionTop10Chart').html('');
                $.jqplot('userActionTop10Chart', [data.count], {
                    title: "TOP 10 User Action",
                    seriesDefaults:{
                        renderer:$.jqplot.BarRenderer,
                        rendererOptions: {fillToZero: true},
                        pointLabels: {
                            show:true,
                            ypadding:1
                        }
                    },
                    axesDefaults:{
                        tickRenderer:$.jqplot.CanvasAxisTickRenderer,
                        tickOptions: {
                            angle: -30,
                            fontSize: '12px'
                        }
                    },
                    axes: {
                        xaxis: {
                            renderer: $.jqplot.CategoryAxisRenderer,
                            ticks: data.action
                        },
                        yaxis: {
                            pad: 1.05
                        }
                    }
                });
            }catch(e){
                //alert(e.message);
            }
        }
    });
};
draws.push('drawUserActionTop10Chart');
/******************************* TOP 10 User Action End ************************************/
/*********** Chart Start *****************/
//Put your chart drawing function here
//1. insert a div for the chart
//2. implement the function drawing chart
//3. push the function name into the array draws
/*********** Chart End *******************/
// Draw all charts
var drawAllCharts = function(){
    for(var i = 0; i < draws.length; i++){
        eval(draws[i] + "()");
    }
    // Call itself again in 5 minutes.
    window.setTimeout(drawAllCharts, 5 * 60 * 1000);
};
// Draw the charts once the page is ready
$(function(){
    drawAllCharts();
});
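As a possible variation on the draws array, the functions themselves can be stored instead of their names, which avoids eval. This is only a sketch of that alternative, not the code used in this post: registerChart and the sample drawing functions are hypothetical, and it also counts failures so that one broken chart does not stop the others.

```javascript
// Sketch of a chart registry that stores functions rather than function names.
var draws = [];

// Add a drawing function to the registry.
function registerChart(drawFn) {
  draws.push(drawFn);
}

// Call every registered drawing function and return how many of them threw.
function drawAllCharts() {
  var failures = 0;
  draws.forEach(function (draw) {
    try {
      draw();
    } catch (e) {
      failures += 1; // keep going so the remaining charts are still drawn
    }
  });
  return failures;
}

// Hypothetical usage: these stand-ins only record that they were called.
var drawn = [];
registerChart(function drawUserActionTop10Chart() { drawn.push("userActionTop10"); });
registerChart(function brokenChart() { throw new Error("no data"); });
registerChart(function drawOther() { drawn.push("other"); });
var failed = drawAllCharts();
```

In the browser, the periodic refresh would stay the same as above, for example window.setTimeout(drawAllCharts, 5 * 60 * 1000) at the end of each pass.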
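The server-side and client-side code are both in place, so let's run it and see the result:
Did I forget something? The log analysis code.
3. Use MongoDB incremental MapReduce to implement log analysis
The MongoDB documentation has a section on Incremental MapReduce. At first I thought MongoDB implemented streaming processing and could run incremental MapReduce automatically. In the end I found that I had misunderstood: the documentation says no such thing, it only explains how to set things up so that MapReduce can be run incrementally.
For convenience, I wrote the MapReduce in MongoDB's JavaScript in a separate js file, which is then run periodically via crontab. The code of stats.js: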
// action_count_map, the map function, is not shown here; it emits a count of 1 for each action (see step 1 below)
var action_count_reduce = function(key, values){
    var count = 0;
    values.forEach(function(value){
        count += value.count;
    });
    return {action:key, count : count};
};
db.log.mapReduce(action_count_map, action_count_reduce, {query : {'action_count' : {$ne:1}}, out: {reduce:'action_count'}});
db.log.update({'action_count':{$ne:1}}, {$set:{'action_count':1}}, false, true);
The idea is very simple:
1. In map, emit a count of 1 for each action.
2. In reduce, sum the counts for the same action.
3. Run mapReduce. The query selects records whose 'action_count' is not equal to 1, that is, records that have not been counted yet. The results are stored in the 'action_count' collection, and the reduce output option means that the existing results are fed into reduce together with the new ones.
4. Set 'action_count' to 1 on all current log records to mark them as counted. I wonder whether this could also mark records that have not yet been counted. I hope someone more experienced can give me some advice!
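The four steps can be tried out without a database. The sketch below is an in-memory simulation, not MongoDB itself: mapRecord, reduceCounts, and runIncremental are hypothetical names, the sample logs are made up, and the 'action' field on each log record is an assumption based on the reduce function above.

```javascript
// In-memory simulation of the incremental map/reduce steps (not MongoDB itself).

// Map: each log record contributes a count of 1 for its action.
function mapRecord(record) {
  return { key: record.action, value: { action: record.action, count: 1 } };
}

// Reduce: sum the counts of all values that share a key.
function reduceCounts(key, values) {
  var count = 0;
  values.forEach(function (value) {
    count += value.count;
  });
  return { action: key, count: count };
}

// One incremental run: process only unflagged records, fold the new totals
// into the existing results, then flag exactly the records that were processed.
function runIncremental(logs, results) {
  var batch = logs.filter(function (r) { return r.action_count !== 1; });
  var grouped = {};
  batch.forEach(function (r) {
    var emitted = mapRecord(r);
    (grouped[emitted.key] = grouped[emitted.key] || []).push(emitted.value);
  });
  Object.keys(grouped).forEach(function (key) {
    var values = grouped[key];
    // Like out: {reduce: ...}, an existing result is reduced together with the new values.
    if (results[key]) values = [results[key]].concat(values);
    results[key] = reduceCounts(key, values);
  });
  batch.forEach(function (r) { r.action_count = 1; });
  return results;
}

var logs = [{ action: "login" }, { action: "search" }, { action: "login" }];
var results = {};
runIncremental(logs, results);      // first run counts all three records
logs.push({ action: "login" });     // a new record arrives later
runIncremental(logs, results);      // second run counts only the new record
console.log(JSON.stringify(results));
// {"login":{"action":"login","count":3},"search":{"action":"search","count":1}}
```

Because this sketch flags exactly the batch it processed, a record that arrives between the two steps is simply left for the next run. In stats.js the mapReduce and the update evaluate their queries separately, so as far as I can tell a record inserted between them could be flagged without ever being counted. One possible way around that, which I have not tested here, would be to restrict both queries to the same upper bound, such as an _id or timestamp captured before the run.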
The shell script that runs stats.js on a schedule:
Okay, that is all the code. There is nothing particularly mysterious in it, but Node.js really is a good thing.