Metrics¶
Metrics are used by Clusterman to record state about clusters that can be used later for autoscaling or simulation.
Clusterman uses a metrics interface API to ensure that all metric values are stored in a consistent format that can be
used both for autoscaling and simulation workloads. At present, all metric data is stored in DynamoDB, and accessed
using the ClustermanMetricsBotoClient
. In the future, the interface layer allows us to transparently change
backends if necessary.
Interacting with the Metrics Client¶
Metric Types¶
Metrics in Clusterman can be classified into one of three different types. Each metric type is stored in a separate namespace. Within each namespace, metric values are uniquely identified by their key and timestamp.
-
clusterman_metrics.
APP_METRICS
¶ metrics collected from client applications (e.g., number of application runs)
-
clusterman_metrics.
METADATA
¶ metrics collected about the cluster (e.g., current spot prices, instance types present)
-
clusterman_metrics.
SYSTEM_METRICS
¶ metrics collected about the cluster state (e.g., CPU, memory allocation)
Application metrics are designed to be read and written by the application owners to provide input into their autoscaling signals. System metrics and metadata can be read by application owners, but are written by batch jobs inside the Clusterman code base. Metadata metrics cannot be read by application owners and are only used for monitoring and simulation purposes.
Metric Keys¶
Metric keys have two components, a metric name and a set of dimensions. The metric key format is:
metric_name|dimension1=value1,dimension2=value2
This allows for metrics to be easily converted into SignalFX datapoints, where the metric name is used as the timeseries
name, and the dimensions are converted to SignalFX dimensions. The generate_key_with_dimensions()
helper
function will return the full metric key in its proper format. Use it to get the correct key when reading or writing
metrics.
Reading Metrics¶
The metrics client provides a function called ClustermanMetricsBotoClient.get_metric_values()
which can be
used to query the metrics datastore.
Note
In general, signal authors should not need to read metrics through the metrics client, because the
BaseSignal
takes care of reading metrics for the signal.
Writing Metrics¶
The metrics client provides a function called ClustermanMetricsBotoClient.get_writer()
; this function returns
an “enhanced generator” or coroutine (not an asyncio coroutine) which can be used to write metrics data into the
datastore. The generator pattern is used to allow writing to be batched together and reduce throughput capacity into
DynamoDB. See the API documentation for how to use this generator.
Example and Reference¶
DynamoDB Example Tables¶
The following tables show examples of how our data is stored in DynamoDB:
Application Metrics |
||
---|---|---|
metric name |
timestamp |
value |
app_A,my_runs |
1502405756 |
2 |
app_B,my_runs |
1502405810 |
201 |
app_B,metric2 |
1502405811 |
1.3 |
System Metrics |
||
---|---|---|
metric name |
timestamp |
value |
cpus_allocated|cluster=norcal-prod,pool=appA_pool |
1502405756 |
22 |
mem_allocated|cluster=norcal-prod,pool=appB_pool |
1502405810 |
20 |
Metadata |
||||
---|---|---|---|---|
metric name |
timestamp |
value |
<c3.xlarge, us-west-2a> |
<c3.xlarge, us-west-2c> |
spot_prices|aws_availability_zone=us-west-2a,aws_instance_type=c3.xlarge |
1502405756 |
1.30 |
||
spot_prices|aws_availability_zone=us-west-2c,aws_instance_type=c3.xlarge |
1502405756 |
5.27 |
||
fulfilled_capacity|cluster=norcal-prod,pool=seagull |
1502409314 |
4 |
20 |
Metric Name Reference¶
The following is a list of metric names and dimensions that Clusterman collects:
System Metrics¶
cpus_allocated|cluster=<cluster name>,pool=<pool>
mem_allocated|cluster=<cluster name>,pool=<pool>
disk_allocated|cluster=<cluster name>,pool=<pool>
Metadata Metrics¶
cpus_total|cluster=<cluster name>,pool=<pool>
disk_total|cluster=<cluster name>,pool=<pool>
fulfilled_capacity|cluster=<cluster name>,pool=<pool>
(separate column per InstanceMarket)mem_total|cluster=<cluster name>,pool=<pool>
spot_prices|aws_availability_zone=<availability zone>,aws_instance_type=<AWS instance type>
target_capacity|cluster=<cluster name>,pool=<pool>