github.com/jasei/check_docker_oomkiller

nagios/icinga check of OOM killed containers


Keywords
docker, golang, icinga, nagios, oom-killed-containers
License
MIT
Install
go get github.com/jasei/check_docker_oomkiller

Documentation

check_docker_OOMkiller

SYNOPSIS

$ check_docker_OOMkiller -l /tmp/check_docker_oomkiller
WARNING: Container 1da1ac35179e4a431244118025a4806f88ca172ffdba9e035076add06e912e6f (stress) was killed by OOM killer
exit status 1

DESCRIPTION

this is plugin for nagios, icinga or compatible monitoring

this plugin list non-running containers and check which was killed by OOM killer (means container have less memory then need)

because I don't each run of this plugin report same OOMKilled containers, is possible save last checked container (id) to file

this plugin comunicate with docker API via unix:///var/run/docker.sock and must have right rights (root/docker-group)

how it works?

this plugin list do inspect to all non-running container and check State.OOMKilled flag

this plugin do same like this command:

docker ps -a -q --filter=status=exited --filter=status=dead --filter=since=$LAST_CHECKED_CONTAINER_ID | xargs docker inspect --format "{{.State.OOMKilled}} {{.ID}} {{.Config.Image}}" | grep -E '^true'

OPTIONS

  • -l - file path to persist last checked container id
  • -w - OOM killed container is report as warning (default)
  • -c - OOM killed container is report as critical
  • --format - Format of output use go-templates like docker inspect (default "Container {{.ID}} ({{.Config.Image}}) was killed by OOM killer")
  • slack - slack token
  • slackChannel - slack (fallback) channel(s) to post message
  • slackUser - slack bot custom username (default are OOM killer)
  • debug - Enable debug mode. Debug prints are print to STDERR
  • debugFile - Redirect debug prints to file (must be set debug option too)

slack support

since version 1.1.0 this check supports slack

since version 2.0.0 we have breaking changes of slack support

  1. create new bot via https://YOUR-SLACK.slack.com/apps/manage/custom-integrations

  2. insert your token to option --slack YOUR_TOKEN

  3. use option slackChannel as fallback channel (if docker/container don't have set SLACK_CONTACT) to send message (support multiple channels) --slackChannel A --slackChannel B

  4. to send DM or docker specific channel, use docker image or container label SLACK_CONTACT

  • for send to DM use @user syntax
  • for send to channel use #channel syntax
  • is possible send to more DM/channel - use coma (,) as separator

example image label (Dockerfile)

FROM ...
LABEL SLACK_CONTACT "@user,@other_user"
...

example container label

docker run --label SLACK_CONTACT="@user,@otheruser"

WHY EXISTS THIS PLUGIN?

of course is possible find OOM killer in /var/log/messages (dmesg)

Dec 17 00:34:31 fanatica kernel: stress invoked oom-killer: gfp_mask=0x24000c0(GFP_KERNEL), order=0, oom_score_adj=0
Dec 17 00:34:31 fanatica kernel: stress cpuset=docker-1da1ac35179e4a431244118025a4806f88ca172ffdba9e035076add06e912e6f.scope mems_allowed=0
Dec 17 00:34:31 fanatica kernel: CPU: 1 PID: 12906 Comm: stress Tainted: P           OE   4.8.11-200.fc24.x86_64 #1
Dec 17 00:34:31 fanatica kernel: Hardware name: Dell Inc. Precision T1600/06NWYK, BIOS A07 10/17/2011
Dec 17 00:34:31 fanatica kernel: 0000000000000286 00000000250c8593 ffffa232b5bb3c50 ffffffffbb3e5f4d
Dec 17 00:34:31 fanatica kernel: ffffa232b5bb3d30 ffffa23554033d00 ffffa232b5bb3cb8 ffffffffbb24c308
Dec 17 00:34:31 fanatica kernel: ffffa2359d219580 ffffa23474e29e80 ffffffffbb1bcd86 0000000000080000
Dec 17 00:34:31 fanatica kernel: Call Trace:
Dec 17 00:34:31 fanatica kernel: [<ffffffffbb3e5f4d>] dump_stack+0x63/0x86
Dec 17 00:34:31 fanatica kernel: [<ffffffffbb24c308>] dump_header+0x5c/0x1d5
Dec 17 00:34:31 fanatica kernel: [<ffffffffbb1bcd86>] ? find_lock_task_mm+0x36/0x80
Dec 17 00:34:31 fanatica kernel: [<ffffffffbb1bd97c>] oom_kill_process+0x20c/0x3d0
Dec 17 00:34:31 fanatica kernel: [<ffffffffbb23ccb5>] ? mem_cgroup_iter+0x105/0x2d0
Dec 17 00:34:31 fanatica kernel: [<ffffffffbb23f2be>] mem_cgroup_out_of_memory+0x2ce/0x310
Dec 17 00:34:31 fanatica kernel: [<ffffffffbb24022b>] mem_cgroup_oom_synchronize+0x33b/0x350
Dec 17 00:34:31 fanatica kernel: [<ffffffffbb23abb0>] ? get_mem_cgroup_from_mm+0xa0/0xa0
Dec 17 00:34:31 fanatica kernel: [<ffffffffbb1be01c>] pagefault_out_of_memory+0x4c/0xc0
Dec 17 00:34:31 fanatica kernel: [<ffffffffbb062274>] mm_fault_error+0x94/0x190
Dec 17 00:34:31 fanatica kernel: [<ffffffffbb0627f4>] __do_page_fault+0x484/0x4d0
Dec 17 00:34:31 fanatica kernel: [<ffffffffbb062870>] do_page_fault+0x30/0x80
Dec 17 00:34:31 fanatica kernel: [<ffffffffbb804dc8>] page_fault+0x28/0x30
Dec 17 00:34:31 fanatica kernel: Task in /system.slice/docker-1da1ac35179e4a431244118025a4806f88ca172ffdba9e035076add06e912e6f.scope killed as a result of limit of /system.slice/docker-1da1ac35179e4a431244118025a4806f88ca172ffdba9e035076add06e912e6f.scope
Dec 17 00:34:31 fanatica kernel: memory: usage 1048576kB, limit 1048576kB, failcnt 70428
Dec 17 00:34:31 fanatica kernel: memory+swap: usage 2097080kB, limit 2097152kB, failcnt 17
Dec 17 00:34:31 fanatica kernel: kmem: usage 4708kB, limit 9007199254740988kB, failcnt 0
Dec 17 00:34:31 fanatica kernel: Memory cgroup stats for /system.slice/docker-1da1ac35179e4a431244118025a4806f88ca172ffdba9e035076add06e912e6f.scope: cache:0KB rss:1043868KB rss_huge:0KB mapped_file:0KB dirty:0KB writeback:82044KB swap:1048504KB inactive_anon:521952KB active_anon:521912KB inactive_file:0KB active_file:0KB unevictable:0KB
Dec 17 00:34:31 fanatica kernel: [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
Dec 17 00:34:31 fanatica kernel: [12870]     0 12870     1866        0       9       3       23             0 stress
Dec 17 00:34:31 fanatica kernel: [12906]     0 12906   526155   239002    1031       5   284061             0 stress
Dec 17 00:34:31 fanatica kernel: Memory cgroup out of memory: Kill process 12906 (stress) score 999 or sacrifice child
Dec 17 00:34:31 fanatica kernel: Killed process 12906 (stress) total-vm:2104620kB, anon-rss:956008kB, file-rss:0kB, shmem-rss:0kB

but you see only name of application in docker (for java application you see java invoked oom-killer: ...)

if you can pair with container, you find docker id (for example cpuset=docker-<ID>.scope) and then run docker inspect $ID

this plugin do this without the need parsing /var/log/messages file (and without parsing related problems like logrotate, format, localization, ...)