Improve logging by splitting into multiple streams for easier cross-section analysis #1623

Open
@achimnol

To keep the volume of debug logs manageable, the Backend.AI Manager and Agent currently provide boolean [debug].log-xxxx options that enable or disable specific types of log messages, as shown in:

[debug]
# Enable or disable the debug-level logging.
enabled = false
# If set true, it does not actually delete the containers after they terminate or are terminated
# so that developers can inspect the container logs.
# This is useful for debugging errors that make containers to terminate immediately after kernel
# launches, due to bugs in initialization steps such as jail.
skip-container-deletion = false
# Enable or disable the asyncio debug mode.
asyncio = false
# Use the custom task factory to get more detailed asyncio task information; this may have performance penalties
enhanced-aiomonitor-task-info = false
# Enable the debug mode of the kernel-runner.
kernel-runner = false
# Include debug-level logs for internal events.
log-events = false
# Include debug-level logs for detailed kernel creation configs and their resource spec.
log-kernel-config = false
# Include debug-level logs for allocation maps.
log-alloc-map = false
# Include debug-level logs for statistics.
log-stats = false
# Include debug-level logs for heartbeats
log-heartbeats = false
# Set the interval of agent heartbeats in seconds.
heartbeat-interval = 20.0
# Include debug-level logs for docker event stream.
log-docker-events = false

However, this approach makes postmortem analysis at customer sites harder, because we usually turn off many of these log "sections" by default, so the relevant messages are often not recorded at all.

Let's split them out into multiple log streams with higher log levels. For example, most container engine events are currently logged at the DEBUG level, and only when log-docker-events = true; let's promote them to the INFO level and write them to a separate "agent-container-events.log" output stream.
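To make the idea concrete, here is a minimal sketch using only Python's standard logging module. It is not the actual Backend.AI logging setup: the logger names ("agent", "agent.container-events"), the "agent.log" file name, and the setup_split_loggers() helper are all hypothetical. Only the "agent-container-events.log" file name comes from the proposal above.

```python
import logging
from pathlib import Path


def setup_split_loggers(log_dir: Path) -> tuple[logging.Logger, logging.Logger]:
    """Create a main logger plus a dedicated container-events logger.

    All names here are illustrative assumptions, not Backend.AI's real config.
    """
    fmt = logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")

    # Main stream: general agent messages.
    main_log = logging.getLogger("agent")
    main_log.setLevel(logging.INFO)
    main_handler = logging.FileHandler(log_dir / "agent.log")
    main_handler.setFormatter(fmt)
    main_log.addHandler(main_handler)

    # Separate stream: container engine events, always recorded at INFO.
    events_log = logging.getLogger("agent.container-events")
    events_log.setLevel(logging.INFO)
    # Stop propagation so these records do not flood the main stream.
    events_log.propagate = False
    events_handler = logging.FileHandler(log_dir / "agent-container-events.log")
    events_handler.setFormatter(fmt)
    events_log.addHandler(events_handler)

    return main_log, events_log
```

With propagate set to False, the container events stay out of the main log file, so they can be kept enabled by default without increasing the volume of the main stream.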

This will allow easier cross-section, postmortem analysis if combined with additional log viewer tools like #1138.
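As a rough illustration of what cross-section analysis over split streams could look like, the sketch below merges several log streams into a single timeline. It assumes every line starts with an ISO 8601 timestamp containing no spaces, so that lexical order equals chronological order, and that each stream is already sorted. The function names, stream names, and sample lines are made up for this example and are not taken from #1138.

```python
import heapq
from typing import Iterable, Iterator


def tag_lines(stream_name: str, lines: Iterable[str]) -> Iterator[tuple[str, str, str]]:
    """Yield (timestamp, stream_name, line) for each non-empty line."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # Assumption: the first space-separated token is the timestamp.
        timestamp = line.split(" ", 1)[0]
        yield timestamp, stream_name, line


def merge_streams(streams: dict[str, Iterable[str]]) -> Iterator[str]:
    """Merge already-sorted streams into one timeline ordered by timestamp."""
    tagged = [tag_lines(name, lines) for name, lines in streams.items()]
    # heapq.merge lazily merges sorted inputs without loading them all at once.
    for _timestamp, name, line in heapq.merge(*tagged):
        yield f"[{name}] {line}"


# Made-up sample lines, for illustration only.
sample = {
    "agent": [
        "2023-10-12T09:00:01 INFO agent started",
        "2023-10-12T09:00:05 INFO kernel created",
    ],
    "container-events": [
        "2023-10-12T09:00:03 INFO container started",
    ],
}
timeline = list(merge_streams(sample))
```

Because the merge is lazy, the same approach works when the values are open file objects rather than in-memory lists.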
