Hadoop metrics filtering

Oct 31, 2014 · Published under Uncategorized

Hadoop and HBase are complicated pieces of software with lots of stuff inside. It's essential to keep an eye on them to be sure your installation is healthy and not going to break in the next five minutes.

To do this, Hadoop has a subsystem called metrics, which provides a way to send values outside. Usually such a value is just a number representing something: the count of requests performed, the size of some buffer, etc.

The most common setup is to feed these values into Ganglia to build nice charts out of them. There are lots of tutorials on how to do this on the net: just google for «ganglia hadoop».

This article describes a less known feature, metrics filtering, which becomes unavoidable for mid-sized HBase installations.

I assume some prior knowledge of Hadoop and Ganglia.

Why filter?

Some time ago, HBase developers decided that emitting metrics about regionservers was not enough and started to produce metrics for individual regions as well. As regions tend to migrate from node to node, a new Ganglia chart will eventually be created for every region on every machine in the cluster.

Let's calculate. There are about 30 metrics per region. If you have 1000 regions live on 100 machines (pretty moderate), this gives us 3 million fresh, completely useless RRD files on your Ganglia servers. They are useless because on every region migration the partly-built chart moves to another RRD file.

I faced this problem right after the CDH4 migration, and it was disastrous, as we had a much larger cluster than in this toy example (300 machines, 20k regions). At that time, the only way to resolve it was to patch HBase to stop emitting these values. Since HBase 0.94 there is a better solution: metrics2 filters.

Filter configuration

To start filtering events, several decisions must be made:

  • filter class
  • level of filtering
  • what to filter

Filter class

There are two classes which implement the actual filtering, each with its own pattern syntax:

  • org.apache.hadoop.metrics2.filter.GlobFilter
  • org.apache.hadoop.metrics2.filter.RegexFilter

The class is chosen per filtering level, for example:

*.source.filter.class=org.apache.hadoop.metrics2.filter.GlobFilter
*.record.filter.class=${*.source.filter.class}
*.metric.filter.class=${*.source.filter.class}

Filter syntax

The filtering rule has the following syntax:


subsystem.[sink|source].sink_name.[sources|record|metric].filter.[include|exclude]

  • subsystem – kind of daemon: hbase, yarn, hdfs, etc
  • sink|source – I have no idea what this is; I just used sink and it works
  • sink_name – arbitrary name of the sink used
  • sources|record|metric – the level the filter operates on
  • include|exclude – whether the rule includes or excludes metrics. If all rules are exclude, the filter works with blacklist logic; if all are include, whitelist logic is used.
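The blacklist/whitelist semantics can be sketched as a small Python model. This is an approximation for illustration, not the actual metrics2 code; `accepts` is a hypothetical helper, and I am assuming full-match regex semantics:

```python
import re

def accepts(name, include=None, exclude=None):
    """Rough model of metrics2 pattern-filter logic (an assumption,
    not the real implementation): includes win, then excludes,
    and include-only mode rejects everything unmatched."""
    if include and re.fullmatch(include, name):
        return True                      # whitelisted
    if exclude and re.fullmatch(exclude, name):
        return False                     # blacklisted
    if include and not exclude:
        return False                     # whitelist-only mode
    return True

# blacklist logic: only exclude rules configured
print(accepts("Regions", exclude="Regions"))   # False
print(accepts("Server", exclude="Regions"))    # True
# whitelist logic: only include rules configured
print(accepts("Other", include="Server"))      # False
```

The practical takeaway: mixing include and exclude rules at one level makes the behavior harder to predict, so it's simplest to stick to one kind per level.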

Filter level

There are three levels to perform filtering:

  • sources – a large group of metrics, usually a subsystem (see below)
  • record – a set of metrics grouped together. By default, the class name is taken as the record name
  • metric – the name of the emitted metric, for example blockCacheHitCount (please note that this is the short name, not the full metric name that appears in Ganglia; the filter won't get ‘regionserver.Server.blockCacheHitCount’, only ‘blockCacheHitCount’)
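To get a feel for the two pattern syntaxes applied to a short metric name, here is a rough Python stand-in using fnmatch for glob patterns and re for regexes (the real filter classes do the matching in Java, so treat this as an approximation):

```python
import fnmatch
import re

metric = "blockCacheHitCount"

# GlobFilter-style pattern: shell-like wildcards
print(fnmatch.fnmatch(metric, "blockCache*"))             # True

# RegexFilter-style pattern: full-match regular expression
print(bool(re.fullmatch("blockCache.*Count", metric)))    # True

# the metric-level filter sees only the short name, so a pattern
# written for the short name will not match the full Ganglia name
print(fnmatch.fnmatch("regionserver.Server.blockCacheHitCount",
                      "blockCache*"))                     # False
```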

Names to filter

It's a bit tricky to get the list of metric groups to filter on, as they are hardcoded in the sources. The simplest way to find all metrics a daemon provides is the ‘Metrics dump’ tab in the web interface of the master or a regionserver. It returns JSON with all metrics, their groups and values. For example, a small part of the master's dump:

}, {
"name" : "Hadoop:service=HBase,name=Master,sub=AssignmentManger",
"modelerType" : "Master,sub=AssignmentManger",
"tag.Context" : "master",
"tag.Hostname" : "dhcp-21-64",
"ritOldestAge" : 0,
"ritCount" : 0,
"BulkAssign_num_ops" : 1,
"BulkAssign_min" : 232,
"BulkAssign_max" : 232,
"BulkAssign_mean" : 232.0,
"BulkAssign_median" : 232.0,
"BulkAssign_75th_percentile" : 232.0,
"BulkAssign_95th_percentile" : 232.0,
"BulkAssign_99th_percentile" : 232.0,
"ritCountOverThreshold" : 0,
"Assign_num_ops" : 1,
"Assign_min" : 82,
"Assign_max" : 82,
"Assign_mean" : 82.0,
"Assign_median" : 82.0,
"Assign_75th_percentile" : 82.0,
"Assign_95th_percentile" : 82.0,
"Assign_99th_percentile" : 82.0
}, {

Under the key ‘name’ we get the source and record of this set of metrics (the master's assignment manager, which does region assignment). ‘Master’ is the source (top level) and ‘AssignmentManger’ (note the typo) is the record. The final metric name will be a dot-combination of these parts (with a somewhat arbitrary lower-case transform): «master.AssignmentManger.Assign_max»
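A hypothetical Python sketch of pulling the source and record out of the ‘name’ string of such a dump entry. The key=value parsing here is my assumption about the JMX ObjectName format, sufficient for this example:

```python
import json

# one entry from the metrics dump shown above (trimmed)
entry = json.loads("""
{"name": "Hadoop:service=HBase,name=Master,sub=AssignmentManger",
 "tag.Context": "master",
 "ritCount": 0}
""")

# the part after "Hadoop:" is a comma-separated list of key=value pairs
props = dict(kv.split("=", 1)
             for kv in entry["name"].split(":", 1)[1].split(","))
source, record = props["name"], props["sub"]
print(source, record)                          # Master AssignmentManger
print(f"{source.lower()}.{record}.ritCount")   # master.AssignmentManger.ritCount
```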

So, to filter out AssignmentManager metrics from ganglia, you can write something like this:

hbase.sink.ganglia.record.filter.exclude=AssignmentManger

Rules to grab

These rules filter out metrics that are, from my point of view, not very interesting or just useless:


hbase.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
hbase.sink.ganglia.period=2
# Warning: this must be an address of gmond mentioned in gmetad's sources directive
hbase.sink.ganglia.servers=ganglia-server:8649
# select regex filter for everything
*.source.filter.class=org.apache.hadoop.metrics2.filter.RegexFilter
*.record.filter.class=${*.source.filter.class}
*.metric.filter.class=${*.source.filter.class}
# remove these messy useless pseudo-statistical metrics
hbase.sink.ganglia.metric.filter.exclude=.*_(max|min|mean|median|percentile)
# filter out region metrics completely, as Ganglia has no idea how to separate them from hosts
hbase.sink.ganglia.record.filter.exclude=Regions
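Before deploying such an exclude regex, it's worth checking it against real metric names from the dump. A small Python check (re.fullmatch is my stand-in for Java's full-match regex semantics):

```python
import re

# the exclude expression from the config above
exclude = re.compile(r".*_(max|min|mean|median|percentile)")

# pseudo-statistical metrics from the dump are matched (filtered out)
print(bool(exclude.fullmatch("BulkAssign_max")))              # True
print(bool(exclude.fullmatch("BulkAssign_75th_percentile")))  # True
# counters survive
print(bool(exclude.fullmatch("BulkAssign_num_ops")))          # False
print(bool(exclude.fullmatch("ritCount")))                    # False
```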

Warning notes

There are several things which must be kept in mind when you configure filters:

  • in hadoop-2.3.0, metrics2 supports only one filter expression per include/exclude rule per filtering level – it takes the first one and ignores the rest. As this is not clear from the documentation (maybe it has been fixed already), it's a bit confusing.
  • there is no good documentation or schema definition for the config file, so typos in the config do not cause warning messages in the log – type carefully and check twice. Personally, I wasted about three days on this: a small typo, no warnings.
  • in the sink address you should put the address of one of the gmonds your gmetad collects data from
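Because only the first expression per level is honored, several records have to be folded into a single pattern; with RegexFilter a plain alternation does it. A sketch of validating such a combined pattern (the record names here are illustrative):

```python
import re

# fold several record names into one exclude expression via alternation
combined = re.compile(r"(UgiMetrics|MetricsSystem|Regions)")

for record in ("UgiMetrics", "MetricsSystem", "Regions"):
    print(bool(combined.fullmatch(record)))   # True each time
print(bool(combined.fullmatch("Server")))     # False
```

The corresponding config line would then look like `hbase.sink.ganglia.record.filter.exclude=(UgiMetrics|MetricsSystem|Regions)`, assuming RegexFilter is selected for the record level.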

6 comments

  • tudor lapusan:

    Hi Shmuma,

    I’m the organizer of the BigData/DataScience meetup from my town, Cluj-Napoca.
    At the moment I'm struggling with Ganglia, especially with filters, and your tutorial fits my use case like a glove.

    Can I share your article on our community site ? bigdataromania.ro

    Thanks,
    Tudor.

  • akmal:

    Hi, thank you for a great article. I was wondering, is there any way to exclude several records? Let's say I want to exclude both UgiMetrics and MetricsSystem. What to do in this case?
    I’ve tried
    datanode.sink.file.record.filter.exclude=UgiMetrics
    datanode.sink.file.record.filter.exclude=MetricsSystem
    but as you’ve mentioned in this case only UgiMetrics is being excluded.

    • Hi! I think it's possible by combining these two filters into one regex (using the RegexFilter class). Unfortunately, the filter configuration file is horrible from an operational point of view: with any minor mistake, all filtering just stops working silently. So filters should be configured carefully :).

  • Sanders Zhao:

    Shmuma,

    I have a HBase/Hadoop cluster with CDH 5.2, I added the following configuration items on CM on both Master Default Group-> Advanced and Region Server Group -> Advanced.

    I could get the metrics show on ganglia web, however it seems the filter doesn’t work at all. This problem bothers me a lot, any idea about this?

    *.period=10
    *.sink.ganglia.period=10
    *.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
    *.source.filter.class=org.apache.hadoop.metrics2.filter.RegexFilter
    *.record.filter.class=${*.source.filter.class}
    *.metric.filter.class=${*.source.filter.class}
    hbase.sink.ganglia.metric.filter.exclude=.*_(max|min|mean|median|percentile)
    hbase.sink.ganglia.record.filter.exclude=Regions
    hbase.sink.ganglia.source.filter.exclude=.*Regions.*

    hbase.sink.ganglia.period=10
    hbase.sink.ganglia.servers=239.2.11.71:8649

    Thanks,
    Sanders Zhao
