Website
http://www.brodwall.com/johannes/blog/
Original page
http://brodwall.com/johannes/blog/2008/10/28/verbose-logging-will-disturb-your-sleep/
The problem with tailing logfiles is that your context is wrong: you spend braintime trying to find that particular logging statement (at least if you're trying to figure out the symptoms of an issue), instead of trying to figure out what is actually wrong when all the cogs run together.
Logging to a system that allows you to filter and match timestamps against other systems' logs seems like a much nicer way of spending "braintime". I keep hearing splunk.com is good, and there are a few other solutions whose names escape me right now.
Granted, I haven't used such an approach too much, but frankly I'm just tired of sifting and grepping through all those log files. I'd much rather be in a situation where I could log more and filter more afterwards/meanwhile, than sacrifice (useful) log output in order to ease my what-feels-like-stone-age approach to logging (tail + grep/sed).
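To make "filter and match timestamps against other systems' logs" concrete, here is a minimal sketch of merging two logs into one timeline and keeping only the interesting levels. It is not how splunk.com or any other product works; the line format (`YYYY-MM-DD HH:MM:SS LEVEL message`), the source names `web` and `db`, and the function names are all assumptions made up for illustration.

```python
import heapq
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp format


def parse_line(source, line):
    """Split 'YYYY-MM-DD HH:MM:SS LEVEL message' into (timestamp, source, level, message)."""
    date, time, level, message = line.split(" ", 3)
    timestamp = datetime.strptime(f"{date} {time}", TIMESTAMP_FORMAT)
    return timestamp, source, level, message


def merged_entries(logs, levels=("WARN", "ERROR")):
    """Merge logs that are each already in time order; keep only the given levels."""
    streams = [
        [parse_line(source, line) for line in lines]
        for source, lines in logs.items()
    ]
    timeline = heapq.merge(*streams, key=lambda entry: entry[0])
    return [entry for entry in timeline if entry[2] in levels]


# Hypothetical sample data standing in for two systems' log files.
web_log = [
    "2008-10-28 03:00:01 INFO request started",
    "2008-10-28 03:00:05 ERROR upstream timeout",
]
db_log = [
    "2008-10-28 03:00:03 WARN slow query",
    "2008-10-28 03:00:04 INFO checkpoint done",
]

for timestamp, source, level, message in merged_entries({"web": web_log, "db": db_log}):
    print(timestamp, source, level, message)
```

The point of the sketch is that the application can log as much as it likes, because the filtering happens afterwards on the combined timeline rather than by eye in a `tail` window.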
I *do* tail -f logfiles with great success. But not in Rails. :-)
But sure, it all depends on what exactly you're looking for in those logs...
And my point is that actually sitting there watching those logs feels like an anti-pattern in the first place (information overload is just the symptom/consequence of that), when you really should be collecting and filtering that data from a bigger perspective in order to find correlations and causes, just like you would with any other system-related data (performance graphs over time being a good example).
I think we agree that sed/grep etc. on logs is not good. I would add that if I want stats, I don't think the debug logs are where I'd get them.
What I do today is manually count the number of errors from each service and store them in an Excel sheet. Over time I judge which error is critical enough to do something about. Since we started doing this we've gone from 500 log errors/day to 50/day.
It would be nice to have some better reporting based on exception type, and cron jobs that could fill in the Excel sheet for me, but cost/value (I'm a lousy scripter and our exception structure is kind of chaotic) still makes Excel/manual counting the best choice. What do you think I should do?
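For what it's worth, the counting part can be small. Below is a rough sketch of a script that tallies ERROR lines per service and exception type and writes the result as CSV, which Excel can open. The line shape (`<date> <time> ERROR [<service>] <ExceptionType>: <message>`), the service names, and the function names are assumptions for illustration; a chaotic exception structure would need a looser pattern.

```python
import csv
import io
import re
from collections import Counter

# Assumed line shape: "<date> <time> ERROR [<service>] <ExceptionType>: <message>"
ERROR_LINE = re.compile(r"^\S+ \S+ ERROR \[(?P<service>[^\]]+)\] (?P<exception>\w+):")


def count_errors(lines):
    """Count ERROR lines per (service, exception type); other lines are ignored."""
    counts = Counter()
    for line in lines:
        match = ERROR_LINE.match(line)
        if match:
            counts[(match["service"], match["exception"])] += 1
    return counts


def write_report(counts, out):
    """Write one CSV row per (service, exception type), sorted for stable output."""
    writer = csv.writer(out)
    writer.writerow(["service", "exception", "count"])
    for (service, exception), count in sorted(counts.items()):
        writer.writerow([service, exception, count])


# Hypothetical sample lines; a cron job would read the real log file instead.
sample = [
    "2008-10-28 02:10:00 ERROR [billing] TimeoutError: gateway did not answer",
    "2008-10-28 02:11:30 INFO [billing] retry succeeded",
    "2008-10-28 02:15:12 ERROR [billing] TimeoutError: gateway did not answer",
    "2008-10-28 02:20:45 ERROR [search] ValueError: bad query",
]

report = io.StringIO()
write_report(count_errors(sample), report)
print(report.getvalue())
```

Run daily from cron with the output appended to a file, this would give the same per-service numbers as the manual count, broken down by exception type.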
What is the consequence of one such hiccup? If there is some chance that manual intervention is needed, I would log this as ERROR. If there is a sufficient compensation (a retry algorithm, say), I would log this as INFO, and log a permanent failure as ERROR. ERROR messages "wake me up in the middle of the night".
If it's in between (e.g. the user got an error message, but no manual intervention is needed) I would log as WARN and collect errors like you do. Knowing me, I would perhaps even keep track of this in the application database, but that's probably gold plating. WARN messages don't wake me up in the middle of the night.
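As a sketch of that convention in code, using Python's standard `logging` module: a hiccup that a retry compensates for is logged as INFO, the permanent failure as ERROR, and a user-visible problem that needs no intervention as WARN. The logger name `payments` and the functions `call_with_retry` and `reject_request` are hypothetical, invented for this example.

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("payments")  # hypothetical logger name


def call_with_retry(operation, attempts=3):
    """Run operation; a compensated hiccup is INFO, a permanent failure is ERROR."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception as exc:
            if attempt < attempts:
                # The retry compensates, so nobody needs to be woken up.
                logger.info("attempt %d of %d failed, retrying: %s", attempt, attempts, exc)
            else:
                # Retries are exhausted; manual intervention may be needed.
                logger.error("giving up after %d attempts: %s", attempts, exc)
                raise


def reject_request(reason):
    """The user saw an error message, but no manual intervention is needed: WARN."""
    logger.warning("request rejected: %s", reason)
```

With this split, an alert that pages someone only has to watch for ERROR, while WARN entries can be collected and counted during office hours.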
New record yesterday, only 17 errors in the log! \o/
All in all, it's very important that every team decides how it is going to use the log levels, and follows these conventions throughout the project and its maintenance.