Skip to main content

Reporting Spec Draft 1

Version 1 draft (WIP 0.3), updated 2023-12-31

Prepared by Gennady Kovshenin

Abstract

The purpose of this document is to outline and specify an approach to gathering, normalizing, processing, indexing and presenting data from WordPress databases for technical and business reporting purposes, with performance, accuracy and reliability, flexibility in mind.

Motivation

There are many WordPress plugins available that generate reports for WordPress. These fall short of client expectations in many cases: performance, accuracy, flexibility. While some plugins offer some caching and indexing capabilities, none of them offer a full-fledged, customizable report-building framework that we propose to develop.

Inspiration

There is much inspiration to be drawn from plugins like wpDataTables, and PHP frameworks like KoolReport. BI analytics tools like Looker and Tableau also offer hints at UI/UX approaches and flexibility. ElasticSearch is also a source of inspiration on how to index, process and query data in a structured manner.

Terminology

The following terminology is used throughout this document:

LMS

Any learning management system compatible with WordPress (LearnDash, Sensei, etc.).

Report

A visual representation of processed and aggregated data, usually from the later stages in a pipeline.

Data

All the data available in the WordPress database that a report is built upon.

Template

An all-encompassing set of rules and configuration placeholders defining data sources, ingestion rules, indexing, caching, processing, transformation, aggregation, formatting, layout and presentation.

Framework

A structured, staged and pipelined end-to-end approach from configuration to presentation.

Pipeline

An ordered queue of stages that are processed synchronously feeding the output of one into the input of the next one.

Stage

A stage is a subset of processes that have an input and output. Stages can have dependent stages, the ordering of stages is done in a pipeline.

Memory

RAM and in some cases diskspace.

Load

CPU and IO loads.

Architecture

We propose a framework that has the following stages:

Sourcing

The sourcing stage is responsible for gathering the necessary data. There can be several sourcing stages in a report pipeline.

Transformation

The transformation stage prunes, cleans and alters the sourced data in no particular order. The transformation stage should not prepare the data for presentation by escaping, formatting, etc. The transformations at this stage should only be purposed and optimized for data querying (unserializing, decrypting, etc.).

Caching

The caching stage optimizes data source results into ephemeral and/or dynamic caches. While not necessary for work as templates can output data directly from uncached sources, performance can be highly improved to provide dynamic, filterable report presentations. This allows us to trade off memory consumption and real-timeness for performance.

Every stage can have dependencies on other stages to have been completed. Stages are reusable and can be referenced between reports. This allows for flexible pipeline compositions.

The building blocks for a pipeline are set in a schema.

Report Schema

The report schema is loosely based on JSON Schema with no formal dialect or vocabularies existing in version 1. It defines everything required to source, build, cache and render reports.

Find the following schema here: https://bitbucket.org/tangibleinc/tangible-reporting/src/trunk/schema/

Report

schema/v1/report.annotated.mjs
/**
* This file contains annotated JSON Schema for Tangible Reporting
*
* This file cannot be used as is. A `report.json` schema file
* can be produced from this one. The same goes for all the subschemas
* that this schema references.
*/

export default {
"$schema": "/schemas/tangible-reporting/v{version}/dialect",
"$id": "/schemas/tangible-reporting/v{version}/report",

// {version} is the reporting specification version number (1)

"type": "object",
"description": "The main report definition object",
"properties": {
"version": {
"type": "string",
"description": "The tangible-reporting schema version",
"required": true,
// A reference to $id: "/schemas/tangible-reporting/v1/report"
},

"sources": {
"type": "array",
"description": "Data sourcing definitions",
"required": true,
"items": {
"type": "object",
"properties": {
"id": {
"type": "string",
"description": "A unique identifier for this source",
"required": true,
},
"type": {
"enum": ["query", "function"],
"description": "The source type",
"required": true,
},
"dependencies": {
"type": "array",
"description": "An array of dependencies (report#sources/id)",
"items": { "type": "string" }
},
"definition": {
"required": true,
"oneOf": [
{ "$ref": "/schemas/tangible-reporting/v{version}/report/query" },
{ "$ref": "/schemas/tangible-reporting/v{version}/report/function" },
]
},
},
},
},

"transforms": {
"type": "array",
"description": "Data transformation definitions",
"required": true,
"items": {
"type": "object",
"properties": {
"id": {
"type": "string",
"description": "A unique identifier for this transformation",
"required": true,
},
"type": {
"enum": ["function"],
"description": "The transform type",
"required": true,
},
"dependencies": {
"type": "array",
"description": "An array of dependencies (report#transforms/id, report#sources/id, report#caches/id)",
"items": { "type": "string" }
},
"definition": {
"required": true,
"type": {
"$ref": "/schemas/tangible-reporting/v{version}/report/function",
},
},
},
},
},

"caches": {
"type": "array",
"description": "Data cache definitions",
"required": true,
"items": {
"type": "object",
"properties": {
"id": {
"type": "string",
"description": "A unique identifier for this cache",
"required": true,
},
"dependencies": {
"type": "array",
"description": "An array of dependencies (report#transforms/id, report#sources/id)",
"items": { "type": "string" }
},
"definition": {
"required": true,
"type": {
"$ref": "/schemas/tangible-reporting/v{version}/report/cache",
},
},
},
},
},

"parameters": {
"type": "array",
"description": "Data parameters definitions",
"required": true,
"items": {
"type": "object",
"properties": {
"id": {
"type": "string",
"description": "A unique identifier for this parameter",
"required": true,
},
"definition": {
"required": true,
"type": {
"$ref": "/schemas/tangible-reporting/v{version}/report/parameter" },
}
},
},
},
},

"template": {
"type": "object",
"properties": {
"id": {
"type": "string",
"description": "A unique identifier for this template",
"required": true,
},
"definition": {
"required": true,
"type": {
"$ref": "/schemas/tangible-reporting/v{version}/report/template"
},
},
},
},
},
},
}

Query

schema/v1/report/sql.annotated.mjs
export default {
"$schema": "/schemas/tangible-reporting/v{version}/dialect",
"$id": "/schemas/tangible-reporting/v{version}/report/query",

"type": "object",
"description": "A SQL query source",
"properties": {
"id": {
"type": "string",
"description": "A unique identifier for this query",
"required": true,
},
"database": {
"type": "string",
"description": "A reference to the database to query",
"required": true,

// A database reference can be in the form of a DSN
// which has to start with "mysql://"
// Or "$wpdb" which will use the existing WordPress
// database connection
},
"statement": {
"type": "string",
"description": "A SQL parametrized $wpdb statement",
"required": true,
},
},
}

Function

schema/v1/report/function.annotated.mjs
export default {
"$schema": "/schemas/tangible-reporting/v{version}/dialect",
"$id": "/schemas/tangible-reporting/v{version}/report/function",

"type": "object",
"description": "A function to run",
"properties": {
"id": {
"type": "string",
"description": "A unique identifier for this function",
"required": true,
},
"callable": {
"type": "string",
"description": "The PHP function to call",
"required": true,
// A stringified PHP callable ("my_function" or "MyClass::my_method")
// The function should not return anything and operate on existing caches
},
"parameters": {
"type": "array",
"description": "Parameters for the function",
// Parameters can reference (report#parameters/id) via $ref
},
},
}

Parameter

schema/v1/report/parameter.annotated.mjs
export default {
"$schema": "/schemas/tangible-reporting/v{version}/dialect",
"$id": "/schemas/tangible-reporting/v{version}/report/parameter",

"type": "object",
"description": "A parameter to run",
"properties": {
"id": {
"type": "string",
"description": "A unique identifier for this parameter",
"required": true,

// Parameters are dynamic data are used throughout
// all stages of building and presenting a report
},
"type": {
"enum": ["function", "array", "literal"],
"description": "The parameter type",
"required": true,
},
"value": {
"oneOf": [
"$ref": "/schemas/tangible-reporting/v{version}/report/function",
"string",
"number",
{
"type": "array",
"items": { "$ref": "/schemas/tangible-reporting/v{version}/report/parameter" },
}
],
"description": "The parameter value"
"required": true,

// Parameter values can be functions (that get data from a request or the database),
// strings, numbers (integers, floats), or arrays of parameters.
},
},
}

Cache

schema/v1/report/cache.annotated.mjs
export default {
"$schema": "/schemas/tangible-reporting/v{version}/dialect",
"$id": "/schemas/tangible-reporting/v{version}/report/cache",

"type": "object",
"description": "A cache layer",
"properties": {
"id": {
"type": "string",
"description": "A unique identifier for this cache",
"required": true,
},
"type": {
"enum": ["db", "function"],
"description": "The cache type"
"required": true,
},
"definition": {
"oneOf": [
"$ref": "/schemas/tangible-reporting/v{version}/report/function",
"$ref": "/schemas/tangible-reporting/v{version}/report/cache/db",
],
"description": "The cache implementation"
"required": true,

// Function should cache the necessary data and return it, minding the TTL
// and invalidations on its own. The db layer does caching via Redbean.
},
"ttl": {
"type": "integer",
"description": "The amount of time to invalidate this cache in seconds."
"required": true,
},
"key": {
"type": "array",
"items": "string",
"description": "The parameters that make this cache unique.",
// Reference a report#parameter/id
},
},
}

Template

schema/v1/report/template.annotated.mjs
export default {
"$schema": "/schemas/tangible-reporting/v{version}/dialect",
"$id": "/schemas/tangible-reporting/v{version}/report/template",

"type": "object",
"description": "A template",
"properties": {
"id": {
"type": "string",
"description": "A unique identifier for this template",
"required": true,
},
"type": {
"enum": ["file", "function"],
},
"definition": {
"oneOf": [
"$ref": "/schemas/tangible-reporting/v{version}/report/function",
"string",
],
"description": "The template file location or callback"
"required": true,

// If a string is passed it's assumed to be a path.
// $abspath can be used to signal the current WordPress
// ABSPATH. This merely includes the file that has
// access to the reporting framework.
},
},
}

Database Cache

schema/v1/report/cache/db.annotated.mjs
export default {
"$schema": "/schemas/tangible-reporting/v{version}/dialect",
"$id": "/schemas/tangible-reporting/v{version}/report/cache/db",

"type": "object",
"description": "A database-driven cache layer",
"properties": {
"id": {
"type": "string",
"description": "A unique identifier for this database cache",
"required": true,
},
"database": {
"type": "string",
"description": "@see report#query/database",
"required": true,
// A MYSQL database connection to use for caching
},
},
}

Due to the lack of well-maintained JSON Schema processors in the PHP world we shall implement and maintain our minimal subset of the specification with no formal constraints beyond those implemented in the underlying business logic code.

Example

{
"$id": "/
"version": "/schemas/tangible-reporting/v1/report",

"sources": [
{
"id": "sample-report/course-progress/by-course-id",
"type": "query",
"definition": {
"id": "sample-report/course-progress/by-course-id/all",
"database": "$wpdb",

}
}
]
// TBD
}

It is highly advised to prefix id fields with the type of object to avoid conflicts when referencing objects by their id fields.

Developer API

We propose a flexible development API that provides primitives and entrypoints into schemas to trigger, modify, extend report processing, building and presentation. All classes and APIs are under the Tangible\Reporting namespace.

The API provides a set of classes which somewhat map to the JSON Schema.

Repository

The Repository class is responsible for maintaining and resolving Reports and report components (stages, templates, etc.). It is advised to start all interactions via a configured Repository. The provided Repository looks through the tangible-report WordPress Post Type, which have schemas attached in their post metas under the tr-schema meta key.

Repositories convert schemas into the relevant objects with the help of lower level classes (parsers, lexers, etc.).

Report

A Report class contains all the necessary methods to structure and output a report correctly.

$report = $repository->getReport( 'my-sample-report' );
$template = $report->getTemplate('template-name');
$template->render(); // Renders the template

A wrapper function is available get_report( $name_or_id ) that gracefully handles exceptions.

Developing Templates

Templates always have access to the $report instance they’ve been called from. Fetching data from a report has to be done via this instance.