DISQUS

DISQUS Hello! Thinking inside a bigger box is using DISQUS, a powerful comment system, to manage its comments. Learn more.

Community Page

Jump to original thread »
Author

Verbose logging will disturb your sleep

Started by jhannes · 8 months ago

When I encounter a system that has already been in development or production for a while, the first sign of real trouble is always a dirty log. You know what I’m talking about. When clicking a single link on a normal flow on a web page results in a deluge of messages in the only [... ... Continue reading »

12 comments

  • The other side of the issue may very well be that your actual approach to viewing the log is flawed. I definitely see your point, but I want to raise it with a "don't tail -f logfiles".

    The problem with tailing logfiles is that your context is wrong, you're spend braintime trying to find that particular logging statement (at least if you're trying to figure out the symptoms of an issue), instead of trying to figure out what actually is wrong when all the cogs run together.
    Logging to a system that allows you to filter and match timestamps against other systems' logs seems like a much nicer way of spending "braintime". I keep hearing splunk.com is good, and there's a few other solutions whose names escape me right now.

    Granted, I haven't used such an approach too much, but frankly I'm just tired of shifting and grepping through all those log files. I'd much rather be in a situation where I could log more and filter more afterwards/meanwhile, than sacrificing (useful) log output in order to ease my what-feels-like-stone-age approach to logging (tail + grep/sed).
  • From Rails, the world looks very different. The production.log is useless for tailing. This is basically debug-logging. I would like logging at info and error levels as well (to different files).

    I *do* tail-f logfiles with great success. But not in Rails. :-)
  • Doesn't matter if it's rails, postgres, tomcat or apache, it's the idea of actually doing the mundane task of sitting there, watching all those little things scroll by, while possibly missing the bigger picture; the overall system components interacting.

    But sure, it all depends on what exactly you're looking for in those logs...
  • My whole point is that logs scrolling by with lots of little things you don't care about (as your only log) is an anti-pattern whether you watch those logs or not.
  • :)

    And my point is that actually sitting there watching those logs feels like an anti-pattern in the first place (information overload is just the symptom/consequence of that), when you really should be collecting and filtering that data from a bigger perspective in order to find correlations and causes, just like you would with any other system related data (performance graphs over time being a good example).
  • I'm not watching the logs. I'm having them sent via mail. That is: if they only contain pertinent info.

    I think we agree that sed/grep etc on logs is not good. I would add that if I want stats, I don't think the debug-logs in where I'd get it.
  • My problem is that I have many different kind of errors. Since I'm lucky enough to live in a SOA world, I'm using 10 different service providers that each have a handful of hickups per day, and therefore my log-file has about 25-50 new lines each day (still small enough to manage with tail, I think).

    What I do today is manually count number of errors from each service and store them in an excel sheet. Over time I judge which error is critical enough to do something about. Since we started doing this we've gone from 500 log-errors/day to 50/day.

    It would be nice to have some better reporting based on exception type and cron-jobs that could fill in the excel sheet for me, but cost/value (I'm a lousy scripter and our exception structure is kind of chaotic) still makes excel/manual counting the best choice. What you think I should do?
  • Very cool case. Another nail in the SOA coffin. ;-)

    What is the consequence of one such hiccup? If there is some chance that manual intervention is needed, I would log this as ERROR. If there is a sufficient compensation (a retry algorithm, say), I would log this as INFO, and log a permanent failure as ERROR. ERROR messages "wake me up in the middle of the night".

    If it's in between (e.g. the user got an error message, but no manual intervention is needed) I would log as WARN and collect errors like you do. Knowing me, I would perhaps even keep track of this in the application database, but that's probably gold plating. WARN messages don't wake me up in the middle of the night.
  • Good idea. I'll try to get something like that in. Currently our use of WARN is more or less the same as DEBUG :)

    New record yesterday, only 17 errors in the log! \o/
  • I'm starting to wonder: Is it just me, or are we all neglecting the "SEVERE/FATAL" error category. Should we use it more?
  • Well, it is an alternative for us to use SEVERE for waking us up in the night. I feel FATAL is like the last thing that might get logged before a java.lang.Error gets thrown and the JVM dies.

    All in all it's very important that every team decides how they are going to use the log levels, and follow these conventions throughout project and maintenance.
  • Cool. 'Nuff said.

Add New Comment

Returning? Login