Chapter 4. Reports

4.1. About Reports

A Report is an API object that provides a method to manage periodic ETL (Extract, Transform, and Load) jobs using SQL queries. Reports are composed from other Metering resources, such as ReportQueries, which provide the actual SQL query to run, and ReportDataSources, which define the data available to ReportQueries and Reports.

Many use cases are addressed out of the box by the predefined ReportQueries and ReportDataSources that come installed with metering, so you do not need to define your own unless you have a use case that they do not cover.

4.1.1. Reports

The Report custom resource is used to manage the execution and status of reports. Metering produces reports derived from usage data sources, which can be used in further analysis and filtering.

A single Report resource represents a job that manages a database table and updates it with new information according to a schedule. The Report exposes the data in that table through the reporting-operator HTTP API. Reports with the spec.schedule field set are always running and track which time periods they have collected data for. This ensures that if metering is shut down or unavailable for an extended period of time, the Report backfills the data starting where it left off. If the schedule is unset, the Report runs once for the time range specified by reportingStart and reportingEnd. By default, Reports wait for ReportDataSources to have fully imported any data covered by the reporting period. If the Report has a schedule, it waits to run until the data in the period currently being processed has finished importing.

4.1.1.1. Example Report with a Schedule

The following example Report contains information on every Pod’s CPU requests. It runs every hour, adding the last hour’s worth of data each time it runs.

apiVersion: metering.openshift.io/v1
kind: Report
metadata:
  name: pod-cpu-request-hourly
spec:
  query: "pod-cpu-request"
  reportingStart: "2019-07-01T00:00:00Z"
  schedule:
    period: "hourly"
    hourly:
      minute: 0
      second: 0

4.1.1.2. Example Report without a Schedule (Run-Once)

The following example Report contains information on every Pod’s CPU requests for July 2019, from reportingStart (midnight UTC on July 1) up to reportingEnd (midnight UTC at the start of July 31). After completion, it does not run again.

apiVersion: metering.openshift.io/v1
kind: Report
metadata:
  name: pod-cpu-request-hourly
spec:
  query: "pod-cpu-request"
  reportingStart: "2019-07-01T00:00:00Z"
  reportingEnd: "2019-07-31T00:00:00Z"

4.1.1.3. query

Names the ReportQuery used to generate the report. The report query controls the schema of the report as well as how the results are processed.

query is a required field.

Use the oc CLI to obtain a list of available ReportQuery objects:

$ oc -n openshift-metering get reportqueries
NAME                                         AGE
cluster-cpu-capacity                         23m
cluster-cpu-capacity-raw                     23m
cluster-cpu-usage                            23m
cluster-cpu-usage-raw                        23m
cluster-cpu-utilization                      23m
cluster-memory-capacity                      23m
cluster-memory-capacity-raw                  23m
cluster-memory-usage                         23m
cluster-memory-usage-raw                     23m
cluster-memory-utilization                   23m
cluster-persistentvolumeclaim-request        23m
namespace-cpu-request                        23m
namespace-cpu-usage                          23m
namespace-cpu-utilization                    23m
namespace-memory-request                     23m
namespace-memory-usage                       23m
namespace-memory-utilization                 23m
namespace-persistentvolumeclaim-request      23m
namespace-persistentvolumeclaim-usage        23m
node-cpu-allocatable                         23m
node-cpu-allocatable-raw                     23m
node-cpu-capacity                            23m
node-cpu-capacity-raw                        23m
node-cpu-utilization                         23m
node-memory-allocatable                      23m
node-memory-allocatable-raw                  23m
node-memory-capacity                         23m
node-memory-capacity-raw                     23m
node-memory-utilization                      23m
persistentvolumeclaim-capacity               23m
persistentvolumeclaim-capacity-raw           23m
persistentvolumeclaim-phase-raw              23m
persistentvolumeclaim-request                23m
persistentvolumeclaim-request-raw            23m
persistentvolumeclaim-usage                  23m
persistentvolumeclaim-usage-raw              23m
persistentvolumeclaim-usage-with-phase-raw   23m
pod-cpu-request                              23m
pod-cpu-request-raw                          23m
pod-cpu-usage                                23m
pod-cpu-usage-raw                            23m
pod-memory-request                           23m
pod-memory-request-raw                       23m
pod-memory-usage                             23m
pod-memory-usage-raw                         23m

ReportQueries with the -raw suffix are used by other ReportQueries to build more complex queries, and should not be used directly for reports.

namespace- prefixed queries aggregate Pod CPU/memory requests by namespace, providing a list of namespaces and their overall usage based on resource requests.

pod- prefixed queries are similar to namespace- prefixed queries but aggregate information by Pod rather than namespace. These queries include the Pod’s namespace and node.

node- prefixed queries return information about each node’s total available resources.

aws- prefixed queries are specific to AWS. Queries suffixed with -aws return the same data as queries of the same name without the suffix, and correlate usage with the EC2 billing data.

The aws-ec2-billing-data report is used by other queries, and should not be used as a standalone report. The aws-ec2-cluster-cost report provides a total cost based on the nodes included in the cluster, and the sum of their costs for the time period being reported on.
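Because these conventions are based only on names, they are easy to apply in a script. The following Python sketch is illustrative only: the helpers reportable_queries and group_by_prefix are hypothetical and not part of metering or the oc CLI. It filters a list of ReportQuery names, such as the NAME column of the listing above, down to the queries meant to be referenced from a Report, using only the -raw and aws-ec2-billing-data rules described here, and then groups the remaining names by prefix.

```python
def reportable_queries(names):
    """Return the ReportQuery names meant to be referenced from spec.query.

    Names ending in -raw are building blocks for other ReportQueries, and
    aws-ec2-billing-data is used by other queries, so both are excluded.
    """
    excluded = {"aws-ec2-billing-data"}
    return [n for n in names if not n.endswith("-raw") and n not in excluded]


def group_by_prefix(names):
    """Group query names by their first dash-separated word (pod, node, ...)."""
    groups = {}
    for name in names:
        groups.setdefault(name.split("-", 1)[0], []).append(name)
    return groups


# Example using a few names from the listing above.
sample = [
    "namespace-cpu-request",
    "pod-cpu-request",
    "pod-cpu-request-raw",
    "node-cpu-capacity",
    "node-cpu-capacity-raw",
]
usable = reportable_queries(sample)
print(usable)
# ['namespace-cpu-request', 'pod-cpu-request', 'node-cpu-capacity']
print(group_by_prefix(usable))
# {'namespace': ['namespace-cpu-request'], 'pod': ['pod-cpu-request'], 'node': ['node-cpu-capacity']}
```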

For a complete list of fields, use the oc CLI to get the ReportQuery as YAML and check the spec.columns field. For example, run:

$ oc -n openshift-metering get reportqueries namespace-memory-request -o yaml

The output is similar to the following:

apiVersion: metering.openshift.io/v1
kind: ReportQuery
metadata:
  name: namespace-memory-request
  labels:
    operator-metering: "true"
spec:
  columns:
  - name: period_start
    type: timestamp
    unit: date
  - name: period_end
    type: timestamp
    unit: date
  - name: namespace
    type: varchar
    unit: kubernetes_namespace
  - name: pod_request_memory_byte_seconds
    type: double
    unit: byte_seconds

4.1.1.4. schedule

The spec.schedule configuration block defines when the report runs. Its main field is period. Depending on the value of period, the hourly, daily, weekly, monthly, or cron field lets you fine-tune when the report runs.

For example, if period is set to weekly, you can add a weekly field to the spec.schedule block. The following example will run once a week on Wednesday, at 1 PM (hour 13 in the day).

...
  schedule:
    period: "weekly"
    weekly:
      dayOfWeek: "wednesday"
      hour: 13
...

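As an illustration of how the weekly fields combine, the following Python sketch computes the next run time for a weekly schedule such as the Wednesday 1 PM example above. The function next_weekly_run is hypothetical, and it assumes UTC times and that omitted minute and second fields default to 0; the reporting-operator's own scheduling logic may differ in these details.

```python
from datetime import datetime, timedelta, timezone

# Spelled-out day names, in the order used by datetime.weekday() (Monday == 0).
DAYS = ["monday", "tuesday", "wednesday", "thursday",
        "friday", "saturday", "sunday"]


def next_weekly_run(now, day_of_week, hour, minute=0, second=0):
    """Return the first weekly run time strictly after `now`.

    Assumption for illustration: omitted minute/second fields mean 0,
    and all times are UTC.
    """
    days_ahead = (DAYS.index(day_of_week.lower()) - now.weekday()) % 7
    candidate = (now + timedelta(days=days_ahead)).replace(
        hour=hour, minute=minute, second=second, microsecond=0)
    if candidate <= now:
        # Same weekday, but the run time has already passed: use next week.
        candidate += timedelta(days=7)
    return candidate


# July 1, 2019 was a Monday; the next Wednesday at 13:00 is July 3.
now = datetime(2019, 7, 1, tzinfo=timezone.utc)
print(next_weekly_run(now, "wednesday", 13))  # 2019-07-03 13:00:00+00:00
```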
4.1.1.4.1. period

The valid values of schedule.period are listed below, along with the options that can be set for each period:

hourly
- minute
- second
daily
- hour
- minute
- second
weekly
- dayOfWeek
- hour
- minute
- second
monthly
- dayOfMonth
- hour
- minute
- second
cron
- expression

In general, the hour, minute, and second fields control the time of day at which the report runs, and dayOfWeek or dayOfMonth control the day of the week or day of the month on which it runs, for weekly and monthly periods respectively.

For each of these fields, there is a range of valid values:

hour is an integer value from 0 to 23.
minute is an integer value from 0 to 59.
second is an integer value from 0 to 59.
dayOfWeek is a string value containing the day of the week, spelled out (for example, "wednesday").
dayOfMonth is an integer value from 1 to 31.

For the cron period, set expression to a standard cron expression. For example, the following expression runs the report every five minutes:

expression: "*/5 * * * *"
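The following Python sketch checks a schedule block, written as a dictionary that mirrors the YAML, against the fields and ranges listed above. The validate_schedule function is a hypothetical client-side check, not the validation that the metering operator performs. It assumes, by analogy with the weekly example, that each period's options are nested under a key named after the period (so the cron expression lives under schedule.cron.expression), and it does not parse cron expressions.

```python
# Valid integer ranges for the schedule fields listed above.
RANGES = {"hour": (0, 23), "minute": (0, 59), "second": (0, 59),
          "dayOfMonth": (1, 31)}
DAYS = {"monday", "tuesday", "wednesday", "thursday",
        "friday", "saturday", "sunday"}
# Options that may be set for each value of schedule.period.
ALLOWED = {
    "hourly": {"minute", "second"},
    "daily": {"hour", "minute", "second"},
    "weekly": {"dayOfWeek", "hour", "minute", "second"},
    "monthly": {"dayOfMonth", "hour", "minute", "second"},
    "cron": {"expression"},
}


def validate_schedule(schedule):
    """Return a list of problems in a schedule dict; an empty list means valid."""
    period = schedule.get("period")
    if period not in ALLOWED:
        return [f"unknown period: {period!r}"]
    errors = []
    # Assumed layout: options are nested under a key named after the period,
    # as in the weekly example (schedule["weekly"]["dayOfWeek"]).
    options = schedule.get(period) or {}
    for field, value in options.items():
        if field not in ALLOWED[period]:
            errors.append(f"{field} is not valid for period {period}")
        elif field in RANGES:
            low, high = RANGES[field]
            # bool is a subclass of int in Python, so reject it explicitly.
            if (isinstance(value, bool) or not isinstance(value, int)
                    or not low <= value <= high):
                errors.append(f"{field} must be an integer from {low} to {high}")
        elif field == "dayOfWeek" and str(value).lower() not in DAYS:
            errors.append(f"dayOfWeek must be a spelled-out day, got {value!r}")
    if period == "cron" and not options.get("expression"):
        errors.append("cron periods require an expression")
    return errors


print(validate_schedule({"period": "weekly",
                         "weekly": {"dayOfWeek": "wednesday", "hour": 13}}))  # []
print(validate_schedule({"period": "daily", "daily": {"hour": 24}}))
# ['hour must be an integer from 0 to 23']
```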

4.1.1.5. reportingStart

To run a Report against existing data, set the spec.reportingStart field to an RFC 3339 timestamp. The Report then runs according to its schedule starting from reportingStart rather than from the current time.

Be aware that this causes the reporting-operator to run many queries in succession, one for each schedule interval between reportingStart and the current time. This could be thousands of queries if the period is shorter than daily and reportingStart is more than a few months in the past. If reportingStart is left unset, the Report runs at the next full schedule period after the time the Report is created.

For example, if you have already collected data dating back to January 1, 2019, and you want that data included in your Report, you can create a Report with the following values:

apiVersion: metering.openshift.io/v1
kind: Report
metadata:
  name: pod-cpu-request-hourly
spec:
  query: "pod-cpu-request"
  schedule:
    period: "hourly"
  reportingStart: "2019-01-01T00:00:00Z"

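To estimate the backfill cost before setting reportingStart, you can count the schedule periods between reportingStart and the current time. The Python sketch below does this for fixed-length periods. The helper backfill_intervals is hypothetical, ignores calendar-based periods such as monthly, and treats one period as roughly one query, so the result is an approximation.

```python
from datetime import datetime, timedelta, timezone

# Lengths of the fixed-length schedule periods.
PERIOD_LENGTH = {
    "hourly": timedelta(hours=1),
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
}


def backfill_intervals(reporting_start, now, period):
    """Count complete periods between reporting_start and now.

    Each complete period corresponds roughly to one query that the
    reporting-operator runs while catching up.
    """
    elapsed = now - reporting_start
    if elapsed <= timedelta(0):
        return 0
    # timedelta // timedelta yields an int in Python 3.
    return elapsed // PERIOD_LENGTH[period]


start = datetime(2019, 1, 1, tzinfo=timezone.utc)
now = datetime(2019, 7, 1, tzinfo=timezone.utc)
# January 1 to July 1, 2019 spans 181 days.
print(backfill_intervals(start, now, "hourly"))  # 4344
print(backfill_intervals(start, now, "daily"))   # 181
```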
4.1.1.6. reportingEnd

To configure a Report to run only until a specified time, set the spec.reportingEnd field to an RFC 3339 timestamp. The Report then stops running on its schedule after it has finished generating reporting data for the period from its start time until reportingEnd. Because the schedule will most likely not align with reportingEnd, the last period in the schedule is shortened to end at the specified reportingEnd time. If reportingEnd is left unset, the Report runs forever, or until a reportingEnd is set on the Report.

For example, to create a Report that runs once a week for the month of July 2019:

apiVersion: metering.openshift.io/v1
kind: Report
metadata:
  name: pod-cpu-request-hourly
spec:
  query: "pod-cpu-request"
  schedule:
    period: "weekly"
  reportingStart: "2019-07-01T00:00:00Z"
  reportingEnd: "2019-07-31T00:00:00Z"

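The following Python sketch illustrates how the final period is shortened to end at reportingEnd, using the weekly July example above. The helper report_periods is hypothetical, and it assumes every weekly period is exactly seven days long and that the first period begins at reportingStart; the period boundaries the reporting-operator actually chooses may differ.

```python
from datetime import datetime, timedelta, timezone


def report_periods(reporting_start, reporting_end, length):
    """Split [reporting_start, reporting_end) into consecutive periods.

    Every period has the given length except the last one, which is
    truncated so that it ends exactly at reporting_end.
    """
    periods = []
    start = reporting_start
    while start < reporting_end:
        end = min(start + length, reporting_end)
        periods.append((start, end))
        start = end
    return periods


utc = timezone.utc
for start, end in report_periods(datetime(2019, 7, 1, tzinfo=utc),
                                 datetime(2019, 7, 31, tzinfo=utc),
                                 timedelta(weeks=1)):
    print(start.date(), "->", end.date())
# 2019-07-01 -> 2019-07-08
# 2019-07-08 -> 2019-07-15
# 2019-07-15 -> 2019-07-22
# 2019-07-22 -> 2019-07-29
# 2019-07-29 -> 2019-07-31   (shortened final period)
```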
4.1.1.7. runImmediately

When runImmediately is set to true, the Report runs immediately: it is queued and processed right away, without requiring additional scheduling parameters.

Note

When runImmediately is set to true, you must set both reportingStart and reportingEnd.
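The rule in the note above can be checked before a Report is submitted. The following Python sketch inspects a Report spec, written as a dictionary that mirrors the YAML, and lists any missing fields. The check_report_spec function is hypothetical and covers only two rules from this chapter: query is required, and runImmediately: true requires both reportingStart and reportingEnd. It also assumes runImmediately is set directly under spec.

```python
def check_report_spec(spec):
    """Return a list of problems with a Report spec (empty if none found).

    Covers only two rules from this chapter: spec.query is required, and
    runImmediately: true requires both reportingStart and reportingEnd.
    """
    problems = []
    if not spec.get("query"):
        problems.append("query is required")
    if spec.get("runImmediately") is True:
        for field in ("reportingStart", "reportingEnd"):
            if not spec.get(field):
                problems.append(f"{field} is required when runImmediately is true")
    return problems


spec = {
    "query": "pod-cpu-request",
    "reportingStart": "2019-07-01T00:00:00Z",
    "runImmediately": True,
}
print(check_report_spec(spec))
# ['reportingEnd is required when runImmediately is true']
```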

4.1.1.8. inputs

The spec.inputs field of a Report can be used to override or set values defined in a ReportQuery’s spec.inputs field.

It is a list of name-value pairs:

spec:
  inputs:
  - name: "NamespaceCPUUsageReportName"
    value: "namespace-cpu-usage-hourly"

The name of an input must exist in the ReportQuery’s inputs list. The value of the input must be the correct type for the input’s type.
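The following Python sketch models the relationship between a Report's inputs and its ReportQuery's inputs: each Report input must name an input that the ReportQuery declares, Report values override ReportQuery defaults, and time inputs must be valid timestamps. The resolve_inputs function is hypothetical, not the reporting-operator's implementation, and its type checking covers only the time type. The sample inputs mirror the pod-cpu.yaml snippet under Roll-up Reports.

```python
from datetime import datetime


def resolve_inputs(query_inputs, report_inputs):
    """Merge a Report's spec.inputs over a ReportQuery's spec.inputs.

    query_inputs: dicts with "name", "type", and an optional "default".
    report_inputs: {"name": ..., "value": ...} pairs from the Report.
    Returns a dict mapping input name to resolved value.
    """
    declared = {i["name"]: i for i in query_inputs}
    # Start from the ReportQuery defaults, where present.
    resolved = {n: i["default"] for n, i in declared.items() if "default" in i}
    for pair in report_inputs:
        name, value = pair["name"], pair["value"]
        if name not in declared:
            raise ValueError(f"input {name!r} is not declared by the ReportQuery")
        if declared[name].get("type") == "time":
            # Raises ValueError for a malformed timestamp. A trailing "Z" is
            # rewritten because fromisoformat rejects it before Python 3.11.
            datetime.fromisoformat(value.replace("Z", "+00:00"))
        resolved[name] = value
    return resolved


query_inputs = [
    {"name": "ReportingStart", "type": "time"},
    {"name": "NamespaceCPUUsageReportName", "type": "Report"},
    {"name": "PodCpuUsageRawDataSourceName", "type": "ReportDataSource",
     "default": "pod-cpu-usage-raw"},
]
report_inputs = [{"name": "NamespaceCPUUsageReportName",
                  "value": "namespace-cpu-usage-hourly"}]
print(resolve_inputs(query_inputs, report_inputs))
# {'PodCpuUsageRawDataSourceName': 'pod-cpu-usage-raw',
#  'NamespaceCPUUsageReportName': 'namespace-cpu-usage-hourly'}
```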

4.1.1.9. Roll-up Reports

Report data is stored in the database much like metrics themselves, and can therefore be used in aggregated or roll-up reports. A simple use case for a roll-up report is to spread the time required to produce a report over a longer period of time. Instead of requiring a monthly report to query and add up all of the data for an entire month, the task can be split into daily reports that each run over about a thirtieth of the data.

A custom roll-up report requires a custom report query. The ReportQuery template processor provides a reportTableName function that returns the database table name of a Report, given the Report’s metadata.name.

Below is a snippet taken from a built-in query:

# Taken from pod-cpu.yaml
spec:
...
  inputs:
  - name: ReportingStart
    type: time
  - name: ReportingEnd
    type: time
  - name: NamespaceCPUUsageReportName
    type: Report
  - name: PodCpuUsageRawDataSourceName
    type: ReportDataSource
    default: pod-cpu-usage-raw
...

  query: |
...
    {|- if .Report.Inputs.NamespaceCPUUsageReportName |}
      namespace,
      sum(pod_usage_cpu_core_seconds) as pod_usage_cpu_core_seconds
    FROM {| .Report.Inputs.NamespaceCPUUsageReportName | reportTableName |}
...

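The following Report, aggregated-report.yaml, sets the NamespaceCPUUsageReportName input so that the namespace-cpu-usage query reads from the table of the namespace-cpu-usage-hourly Report: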

# aggregated-report.yaml
spec:
  query: "namespace-cpu-usage"
  inputs:
  - name: "NamespaceCPUUsageReportName"
    value: "namespace-cpu-usage-hourly"

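A roll-up gives the same answer as a direct aggregation because sums are additive: summing daily totals yields the same monthly total as summing every hourly value. The following Python sketch demonstrates this with made-up pod_usage_cpu_core_seconds values for two hypothetical namespaces. It only illustrates the arithmetic; in metering, the aggregation is done by the SQL in the ReportQuery, and non-additive measures such as averages would need more care than a simple sum.

```python
from collections import defaultdict

# Made-up hourly rows (day, namespace, pod_usage_cpu_core_seconds) for a
# 30-day month; team-b simply uses twice the CPU of team-a.
SCALE = {"team-a": 1, "team-b": 2}
hourly_rows = [
    (day, ns, float((day * 10 + hour) * SCALE[ns]))
    for day in range(1, 31)
    for hour in range(24)
    for ns in SCALE
]


def sum_by(rows, key):
    """Sum the usage column (index 2) of rows, grouped by key(row)."""
    totals = defaultdict(float)
    for row in rows:
        totals[key(row)] += row[2]
    return dict(totals)


# Direct approach: a single monthly aggregation over every hourly row.
monthly_direct = sum_by(hourly_rows, key=lambda r: r[1])

# Roll-up approach: aggregate each day first, then sum the daily results.
daily = sum_by(hourly_rows, key=lambda r: (r[0], r[1]))
daily_rows = [(day, ns, total) for (day, ns), total in daily.items()]
monthly_rollup = sum_by(daily_rows, key=lambda r: r[1])

print(len(hourly_rows), len(daily_rows))  # 1440 60
print(monthly_rollup == monthly_direct)   # True
```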
4.1.1.9.1. Report Status

The execution of a scheduled report can be tracked using its status field. Any errors occurring during the preparation of a report will be recorded here.

The status field of a Report currently has two fields:

conditions: A list of conditions, each of which has a type, status, reason, and message field. Possible values of a condition’s type field are Running and Failure, indicating the current state of the scheduled report. The status is either true, false, or unknown, and the reason indicates why the condition is in its current state. The message provides a human-readable explanation of why the condition is in its current state. For detailed information on the reason values, see pkg/apis/metering/v1/util/report_util.go.
lastReportTime: Indicates the time up to which metering has collected data.
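The status fields can also be inspected from a script. The following Python sketch summarizes a Report's status from JSON, such as the output of oc -n openshift-metering get reports <name> -o json. The condition fields follow the description above; the summarize_report_status helper and its return format are hypothetical, and the sample status below is illustrative, not real operator output (its reason value is a placeholder).

```python
import json


def summarize_report_status(report_json):
    """Return (state, lastReportTime, message) from a Report's JSON.

    state is "Failure" or "Running" if a condition of that type has
    status true (Failure takes precedence), otherwise "Unknown".
    message is the Failure condition's message, if any.
    """
    status = json.loads(report_json).get("status") or {}
    active = {
        c.get("type"): c
        for c in status.get("conditions") or []
        if str(c.get("status", "")).lower() == "true"
    }
    last = status.get("lastReportTime")
    if "Failure" in active:
        return "Failure", last, active["Failure"].get("message")
    if "Running" in active:
        return "Running", last, None
    return "Unknown", last, None


# Illustrative status only; the reason value is a placeholder.
sample = json.dumps({"status": {
    "conditions": [{"type": "Running", "status": "True",
                    "reason": "PlaceholderReason", "message": "example"}],
    "lastReportTime": "2019-07-01T05:00:00Z",
}})
print(summarize_report_status(sample))
# ('Running', '2019-07-01T05:00:00Z', None)
```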

4.2. Storage Locations

A StorageLocation is a custom resource that configures where data will be stored by the reporting-operator. This includes the data collected from Prometheus, and the results produced by generating a Report custom resource.

You only need to configure a StorageLocation if you want to store data in multiple locations, such as multiple S3 buckets or both S3 and HDFS, or if you want to access a database in Hive/Presto that was not created by metering. For most users this is not a requirement, and the documentation on configuring metering is sufficient to configure all necessary storage components.

4.2.1. StorageLocation examples

The first example shows the built-in local storage option. It is configured to use Hive, and by default data is stored wherever Hive is configured to store data (HDFS, S3, or a ReadWriteMany PVC).

Local storage example

apiVersion: metering.openshift.io/v1
kind: StorageLocation
metadata:
  name: hive
  labels:
    operator-metering: "true"
spec:
  hive: 
    databaseName: metering 
    unmanagedDatabase: false

hive: If the hive section is present, the StorageLocation is configured to store data in Presto by creating the table using the Hive server. Only databaseName and unmanagedDatabase are required fields.
databaseName: The name of the database within Hive.
unmanagedDatabase: If true, this StorageLocation is not actively managed, and the databaseName is expected to already exist in Hive. If false, the reporting-operator creates the database in Hive.

The next example uses an AWS S3 bucket for storage. The path prefix (path/within/bucket in this example) is appended to the bucket name when constructing the path to use.

Remote storage example

apiVersion: metering.openshift.io/v1
kind: StorageLocation
metadata:
  name: example-s3-storage
  labels:
    operator-metering: "true"
spec:
  hive:
    databaseName: example_s3_storage
    unmanagedDatabase: false
    location: "s3a://bucket-name/path/within/bucket"

location: (optional) The filesystem URL for Presto and Hive to use for the database. This can be an hdfs:// or s3a:// filesystem URL.
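To illustrate how a location URL splits into a bucket and a path prefix within that bucket, the following Python sketch parses it with the standard library's urllib.parse.urlparse. The split_location helper is hypothetical; Hive and Presto interpret the URL themselves, and this sketch only shows its structure.

```python
from urllib.parse import urlparse


def split_location(location):
    """Split an hdfs:// or s3a:// location into (scheme, host_or_bucket, path).

    For s3a:// URLs the network location is the bucket name, and the path
    is the prefix within the bucket under which the database is stored.
    """
    parsed = urlparse(location)
    if parsed.scheme not in ("s3a", "hdfs"):
        raise ValueError(f"unsupported scheme: {parsed.scheme!r}")
    return parsed.scheme, parsed.netloc, parsed.path.lstrip("/")


print(split_location("s3a://bucket-name/path/within/bucket"))
# ('s3a', 'bucket-name', 'path/within/bucket')
```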

There are some additional optional fields that can be specified in the hive section:

(optional) defaultTableProperties: Contains configuration options for creating tables using Hive.
(optional) fileFormat: The file format used for storing files in the filesystem. See the Hive Documentation on File Storage Format for a list of options and more details.
(optional) rowFormat: Controls the Hive row format. This controls how Hive serializes and deserializes rows. See the Hive Documentation on Row Formats and SerDe for more details.

4.2.2. Default StorageLocation

If an annotation storagelocation.metering.openshift.io/is-default exists and is set to true on a StorageLocation resource, then that resource becomes the default storage resource. Any components with a storage configuration option where StorageLocation is not specified will use the default storage resource. There can be only one default storage resource. If more than one resource with the annotation exists, an error is logged because the Operator cannot determine the default.

Default storage example

apiVersion: metering.openshift.io/v1
kind: StorageLocation
metadata:
  name: example-s3-storage
  labels:
    operator-metering: "true"
  annotations:
    storagelocation.metering.openshift.io/is-default: "true"
spec:
  hive:
    databaseName: example_s3_storage
    unmanagedDatabase: false
    location: "s3a://bucket-name/path/within/bucket"

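The selection rule described above can be sketched in Python as follows. The function takes StorageLocation objects as dictionaries (for example, the items from oc -n openshift-metering get storagelocations -o json) and raises an error when more than one is annotated as the default, mirroring the ambiguity that the Operator logs. The helpers are hypothetical illustrations of the rule, not the Operator's code; in particular, comparing the annotation value case-insensitively is an assumption.

```python
DEFAULT_ANNOTATION = "storagelocation.metering.openshift.io/is-default"


def is_default(location):
    """True if a StorageLocation dict carries is-default set to "true"."""
    annotations = (location.get("metadata") or {}).get("annotations") or {}
    # Assumption: the value is compared case-insensitively.
    return str(annotations.get(DEFAULT_ANNOTATION, "")).lower() == "true"


def default_storage_location(locations):
    """Return the name of the default StorageLocation, or None if unset.

    Raises ValueError if several StorageLocations are marked as the
    default, since the default is then ambiguous.
    """
    defaults = [loc["metadata"]["name"] for loc in locations if is_default(loc)]
    if len(defaults) > 1:
        raise ValueError(f"more than one default StorageLocation: {defaults}")
    return defaults[0] if defaults else None


hive = {"metadata": {"name": "hive"}}
s3 = {"metadata": {"name": "example-s3-storage",
                   "annotations": {DEFAULT_ANNOTATION: "true"}}}
print(default_storage_location([hive, s3]))  # example-s3-storage
```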