DISQUS

Thinking inside a bigger box: Verbose logging will disturb your sleep

  • Johan Sørensen · 1 year ago
    The other side of the issue may very well be that your actual approach to viewing the log is flawed. I definitely see your point, but I want to raise it with a "don't tail -f logfiles".

    The problem with tailing logfiles is that your context is wrong, you're spend braintime trying to find that particular logging statement (at least if you're trying to figure out the symptoms of an issue), instead of trying to figure out what actually is wrong when all the cogs run together.
    Logging to a system that allows you to filter and match timestamps against other systems' logs seems like a much nicer way of spending "braintime". I keep hearing splunk.com is good, and there's a few other solutions whose names escape me right now.

    Granted, I haven't used such an approach too much, but frankly I'm just tired of shifting and grepping through all those log files. I'd much rather be in a situation where I could log more and filter more afterwards/meanwhile, than sacrificing (useful) log output in order to ease my what-feels-like-stone-age approach to logging (tail + grep/sed).
  • jhannes · 1 year ago
    From Rails, the world looks very different. The production.log is useless for tailing. This is basically debug-logging. I would like logging at info and error levels as well (to different files).

    I *do* tail-f logfiles with great success. But not in Rails. :-)
  • Johan Sørensen · 1 year ago
    Doesn't matter if it's rails, postgres, tomcat or apache, it's the idea of actually doing the mundane task of sitting there, watching all those little things scroll by, while possibly missing the bigger picture; the overall system components interacting.

    But sure, it all depends on what exactly you're looking for in those logs...
  • jhannes · 1 year ago
    My whole point is that logs scrolling by with lots of little things you don't care about (as your only log) is an anti-pattern whether you watch those logs or not.
  • Johan Sørensen · 1 year ago
    :)

    And my point is that actually sitting there watching those logs feels like an anti-pattern in the first place (information overload is just the symptom/consequence of that), when you really should be collecting and filtering that data from a bigger perspective in order to find correlations and causes, just like you would with any other system related data (performance graphs over time being a good example).
  • jhannes · 1 year ago
    I'm not watching the logs. I'm having them sent via mail. That is: if they only contain pertinent info.

    I think we agree that sed/grep etc on logs is not good. I would add that if I want stats, I don't think the debug-logs in where I'd get it.
  • tfnico · 1 year ago
    My problem is that I have many different kind of errors. Since I'm lucky enough to live in a SOA world, I'm using 10 different service providers that each have a handful of hickups per day, and therefore my log-file has about 25-50 new lines each day (still small enough to manage with tail, I think).

    What I do today is manually count number of errors from each service and store them in an excel sheet. Over time I judge which error is critical enough to do something about. Since we started doing this we've gone from 500 log-errors/day to 50/day.

    It would be nice to have some better reporting based on exception type and cron-jobs that could fill in the excel sheet for me, but cost/value (I'm a lousy scripter and our exception structure is kind of chaotic) still makes excel/manual counting the best choice. What you think I should do?
  • jhannes · 1 year ago
    Very cool case. Another nail in the SOA coffin. ;-)

    What is the consequence of one such hiccup? If there is some chance that manual intervention is needed, I would log this as ERROR. If there is a sufficient compensation (a retry algorithm, say), I would log this as INFO, and log a permanent failure as ERROR. ERROR messages "wake me up in the middle of the night".

    If it's in between (e.g. the user got an error message, but no manual intervention is needed) I would log as WARN and collect errors like you do. Knowing me, I would perhaps even keep track of this in the application database, but that's probably gold plating. WARN messages don't wake me up in the middle of the night.
  • tfnico · 1 year ago
    Good idea. I'll try to get something like that in. Currently our use of WARN is more or less the same as DEBUG :)

    New record yesterday, only 17 errors in the log! \o/
  • jhannes · 1 year ago
    I'm starting to wonder: Is it just me, or are we all neglecting the "SEVERE/FATAL" error category. Should we use it more?
  • tfnico · 1 year ago
    Well, it is an alternative for us to use SEVERE for waking us up in the night. I feel FATAL is like the last thing that might get logged before a java.lang.Error gets thrown and the JVM dies.

    All in all it's very important that every team decides how they are going to use the log levels, and follow these conventions throughout project and maintenance.
  • jhannes · 1 year ago
    Cool. 'Nuff said.