Data Processing

Setup your machine to process your data.

Overview

Data coming from the datasource is piped through various processing nodes until it reaches the data store as final destination.

In the previous section, we learned how to set settings for datasources to connect and pull data from your database or another sources. The data will be then streaming through many processing nodes. Each processing node does a specific task upon the data stream. Together they will help to transform original data into meaningful information/data to be consumed in the report view.

Example


<?php
use \koolreport\processes\Filter;
use \koolreport\processes\Group;
class MyReport extends \koolreport\KoolReport
{
    ...
    public function setup()
    {
        $this->src('sales')
        ->query("SELECT customerName, productName,region, amount from tblPurchases")
        ->pipe(new Filter(array(
            array("region","=","asia")
        )))
        ->pipe(new Group(array(
            "by"=>"customerName",
            "sum"=>"amount"
        )))
        ->pipe($this->dataStore('sales'));
    }
}
                

In above example, we want to summarize total purchased amount per customer from Asia region. The data stream from SQL Query will filtered by region then grouped by customerName. In the end, the result of processing will be store in dataStore named "sales".

There are FOUR (4) major groups of processing nodes. They are Row, Column, Transformation and Analysis.

Table

Table process nodes contain process that apply changes to the whole data set.

Transpose

Transpose process help to transpose table, changing row to column and column to row. The new columns will be name as "c0", "c1", "c2" to "cn". The "c0" is a specical column which contains the name of column before being transposed.

Example


<?php
use \koolreport\processes\Transpose;
class MyReport extends \koolreport\KoolReport
{
    public function setup()
    {
        ...
        ->pipe(new Transpose())
        ...
    }
}                        
                        

Row

Row process nodes contain processes make changes to data rows. For example, you need to filter row by condition, limit row result and so on.

Filter

Filter process helps to filter your data based on condition. It is like the WHERE statement in SQL Query. If your data is from database, we encourage you to use the filtering feature of database. However, if your data coming from other sources such as CSV or Microsoft Excel then you need Filter.

Example


<?php
use \koolreport\processes\Filter;
class MyReport extends \koolreport\KoolReport
{
    public function setup()
    {
        ...
        ->pipe(new Filter(array(
            array("region","=","asia"),
            array("customerAge",">",50)
        )))
        ...
    }
}
                        

Operators

Below are the list of supported operators in Filter

Name description example
= Equal to array("age","=",32)
!= Not equal to array("age","!=",32)
> Greater than array("age",">",32)
< Less than array("age","<",32)
>= Greater than or equal to array("age",">=",32)
<= Less than or equal to array("age","<=",32)
contain Contain a string array("name","contain","John")
notContain Not contain a string array("name","notContain","John")
startWith Start with a string array("name","startWith","John")
notStartWith Not start with a string array("name","notStartWith","John")
endWith End with a string array("name","endWith","John")
notEndWith Not end with a string array("name","notEndWith","John")
between Between two given values array("name","between",24,32)
notBetween Not between two given values array("name","notBetween",24,32)

Or condition

By default the condition are joined with and operator, however we can change to or like below example:


...
->pipe(new Filter(array(
    array("region","=","asia"),
    'or',
    array("region","=","america")
)))
...
                        

Limit

Limit process will limit the rows returned. It works the same as LIMIT statement in SQL Query.

Example


<?php
use \koolreport\processes\Limit;
class MyReport extends \koolreport\KoolReport
{
    public function setup()
    {
        ...
        ->pipe(new Limit(array(20,10));//Limit 20 rows starting from offset 10
        ...
    }
}
                        

You may omit the offset value

Example: ->pipe(new Limit(array(10)) will return first 10 rows.

Sort

Sort process helps you to sort data based on columns. It works like the SORT BY in SQL Statement. The input array contains key=>value where key will be column name and the value is the direction of sort.

Example


<?php
use \koolreport\processes\Sort;
class MyReport extends \koolreport\KoolReport
{
    public function setup()
    {
        ...
        ->pipe(new Sort(array(
            "age"=>"asc",
            "name"=>"desc"
        ));
        ...
    }
}
                        

Sort by custom function

If you need sort differently rather than simple desc or asc, you can set your own comparison function


<?php
use \koolreport\processes\Sort;
class MyReport extends \koolreport\KoolReport
{
    public function setup()
    {
        ...
        ->pipe(new Sort(array(
            "age"=>function($a,$b){
                return $a<$b;
            },
        ));
        ...
    }
}
                        

Join

Join process help to join two data sources based on the matching of key columns.

Example


<?php
use \koolreport\processes\Join;
class MyReport extends \koolreport\KoolReport
{
    public function setup()
    {
        $user_source = $this->src('user_source')->query("select id,name from users");
        $purchase_source = $this->src('purchase_source')->query("select user_id,item,amount from purchases");
        //Note that: user_source and purchase_source can be from different database
        $join = new Join($user_source,$purchase_source,array("id"=>"user_id"));
        $join->pipe($this->dataStore('together'));
    }
}
                        

In above example, we join data from different data sources. The output data will contains data from both sources that is match "user_id" to "id".

Columns

Columns processes are those causes changes to data column such as create new columns, remove columns or alternate column meta data.

CalculatedColumn

CalculatedColumn helps you to create new column from existed ones. For example, you have price and quantity column, you want to create new column named amount which is the price x quantity. CalculateColumn will take expression to create new column.

Example


<?php
use \koolreport\processes\CalculatedColumn;
class MyReport extends \koolreport\KoolReport
{
    public function setup()
    {
        ...
        ->pipe(new CalculateColumn(array(
            "amount"=>"{price}*{quantity}",
            "recievedAmount"=>"{amount}-{fee}",
            "power"=>"pow({number_column},10)"
        )))
        ...
    }
}
                        

You may define the new column by function:


...
->pipe(new CalculateColumn(array(
    "amount"=>function($data){
        return $data["price"]*$data["quantity"];
    },
)))
...
                        

Create new column store the row number:


...
->pipe(new CalculateColumn(array(
    "rowNum"=>"{#}"
)))
...
                        

You can set custom meta data for the new column


...
->pipe(new CalculateColumn(array(
    "amount"=>array(
        "exp"=>"{price}*{quantity}",
        "type"=>"number",
    ),
    "name"=>array(
        "exp"=>function($data){
            return $data["first_name"]." ".$data["last_name"];
        },
        "type"=>"string",
    )
)))
...
                        

AggregatedColumn

AggregatedColumn are specical process help you to calculate the aggregated result of an column and create a new column to store result. For example, you have a sale_amount column, the AggregatedColumn can help you to calculate the total sale_amount and put result in column name total_sale_amount

Example


...
->pipe(new AggregatedColumn(array(
    "total_sale_amount"=>array("sum","sale_amount")
)))
...
                        

The great thing is now you are capable of calculating percentage of each sale_amount over the total_sale_amount with CalculatedColumn process.


...
->pipe(new AggregatedColumn(array(
    "total_sale_amount"=>array("sum","sale_amount")
)))
->pipe(new CalculatedColumn(array(
    "percentage"=>"{sale_amount}*100/{total_sale_amount}"
)))
...
                        

Supported Operation

Beside "sum", AggregatedColumn supports "count", "avg", "min", "max" operation.

ColumnMeta

ColumnMeta does nothing to data except that it allows you to set meta data for columns. Meta data are extra description of columns. Meta data define the column type and extra settings depended on column type.

Common properties

Name type default description
name string Set new name for column
type string unknown Type of column data which can be "string", "number" or "datetime"
hidden bool false Whether the column is hidden
label string The column name The display name of column

Number settings ("type"=>"number")

Name type default description
decimals number 0 The number of zeros after decimal point
decimalPoint string "." The decimal point
thousandSeparator string "," The separator character between each thousand
prefix string The character in front of number. If the number represent dollar currency, your prefix could be "$"
suffix string The character appeared at the end of number.

Datetime settings ("type"=>"datetime")

Name type default description
format string "Y-m-d H:i:s" The format string of datetime column

Example


<?php
use \koolreport\processes\ColumnMeta;
class MyReport extends \koolreport\KoolReport
{
    public function setup()
    {
        ...
        ->pipe(new ColumnMeta(array(
            "customerName"=>array(
                "type"=>"string",
                "label"=>"Customer Name"
            ),
            "amount"=>array(
                "type"=>"number",
                "label"=>"Amount in USD"
                "decimals"=>2,
                "prefix"=>"$ "
            ),
            "orderDate"=>array(
                "type"=>"datetime",
                "label"=>"Order Date",
                "format"=>"Y-m-d H:i:s",
            )
        )))
        ...
    }
}
                        

ColumnsSort

ColumnsSort process helps you to sort the column list by its name or its labels. For example, you have data {name:"Peter",age:35}, the ColumnsSort will turn those data to {age:35,name:"Peter"} if you choose to sort by name of columns.

Example


<?php
use \koolreport\processes\ColumnsSort;
class MyReport extends \koolreport\KoolReport
{
    public function setup()
    {
        //Sort by the name of columns
        ->pipe(new ColumnsSort(array(
            "{name}"=>"asc", 
        )))
        ...

        //Sort by the label of columns
        ->pipe(new ColumnsSort(array(
            "{label}"=>"asc", 
        )))
        ...


        //Sort by name of columns using custom function
        ->pipe(new ColumnsSort(array(
            "{name}"=>function($a,$b){
                return $a<$b;
            }, 
        )))
        ...


        //Sort by label of columns using custom function
        ->pipe(new ColumnsSort(array(
            "{label}"=>function($a,$b){
                return $a<$b;
            }, 
        )))
        ...
    }
}
                        

CopyColumn

CopyColumn will make a copy of an existed column

Example


<?php
use \koolreport\processes\CopyColumn;
class MyReport extends \koolreport\KoolReport
{
    public function setup()
    {
        ...
        ->pipe(new CopyColumn(array(
            "name"=>"copy_of_name",
            "amount"=>"copy_of_amount"
        )))
        ...
    }
}
                        

RemoveColumn

Sometime, you need to remove columns after usage to keep thing neat. RemoveColumn process help you to remove those unwanted columns.

Example


<?php
use \koolreport\processes\CopyColumn;
class MyReport extends \koolreport\KoolReport
{
    public function setup()
    {
        ...
        ->pipe(new RemoveColumn(array(
            "extra_column",
            "unimportant_column"
        )))
        ...
    }
}
                        

The "extra_column" or "unimportant_task" will be removed from datastream

ColumnRename

If you want to rename a column, you may use ColumnRename process.

Example


<?php
use \koolreport\processes\ColumnRename;
class MyReport extends \koolreport\KoolReport
{
    public function setup()
    {
        ...
        ->pipe(new ColumnRename(array(
            "old_name"=>"new_name",
            "amount"=>"sale_amount",
        )))
        ...
    }
}
                        

The "old_name" and "amount" column will be renamed to "new_name" and "sale_amount"

OnlyColumn

OnlyColumn process helps you to keep those columns data you need.

Example


<?php
use \koolreport\processes\OnlyColumn;
class MyReport extends \koolreport\KoolReport
{
    public function setup()
    {
        ...
        ->pipe(new OnlyColumn(array(
            "important_column",
            "another_important_one"
        )))
        ...
    }
}
                        

In above example, ONLY "important_column" column and "another_important_one" column will be kept.

Transformation

The Transformation processes will make change to the data cell, transform original value to another.

The Transformation process is needed when

  1. The data from source is not clean
  2. We need to categorize data for later grouping
  3. Mapping value to another

DateTimeFormat

DateTimeFormat process will transform to datetime of a column to another format. To transform it requires the original format of datetime and the format you want to convert to.

Example


<?php
use \koolreport\processes\DateTimeFormat;
class MyReport extends \koolreport\KoolReport
{
    public function setup()
    {
        ...
        ->pipe(new DateTimeFormat(array(
            "last_login_time"=>"F j, Y",
            "created_time"=>array(
                "from"=>"Y-m-d H:i:s",
                "to"=>"F j, Y"
            )
        )))
        ...
    }
}
                        

There are two ways to write settings for DateTimeFormat

  1. Long hand: "created_time"=>array("from"=>"Y-m-d","to"=>"F j, Y"). This is very clear that this created_time column will be converted from 2016-1-12 to January 12, 2016
  2. Short hand: "last_login_time"=>"F j, Y". This is used when the date format of the column (in this example is "last_login_time") has been defined with ColumnMeta. If not then the default "from" format is "Y-m-d H:i:s".

TimeBucket

When you have datetime column and you want to group those date into week of year, month, quarter or year, TimeBucket process will come into play. This process will collect the datetime and put them into bucket of your choice whether year or month or other.

TimeBucket is great when used with Group or Pivot process

The datetime data after categorized by TimeBucket can be aggregated by Group or Pivot process.

Example


<?php
use \koolreport\processes\TimeBucket;
use \koolreport\processes\Group;
class MyReport extends \koolreport\KoolReport
{
    public function setup()
    {
        ...
        ->pipe(new TimeBucket(array(
            "created_time"=>"quarter"
        )))
        ->pipe(new Group(array(
            "by"=>"created_time",
            "sum"=>"amount"
        ))
        ...
    }
}
                        

Preset time bucket

Bucket description
date Categorize datetime into date
month Categorize datetime into month
quarter Categorize dateime into quarter
week Categorize datetime into week number
year Categorize datetime into year
hourofday Categorize time into hour of day, outout value's range is [0-23]
dayofweek Categorize datetime into day of week, output value's range is [0-6] [Monday - Sunday]
dayofmonth Categorize datetime into day of month, output value's range is [1-31]
monthofyear Categorize datetime into month of year, output value's range is [1-12] [January - December]

NumberBucket

NumberBucket helps to categorize number into group of predefined range for example 0-10 or 10-20.

NumberBucket is great when used with Group or Pivot process

The number data after categorized by NumberBucket can be aggregated by Group or Pivot process.

Example


<?php
use \koolreport\processes\NumberBucket;
use \koolreport\processes\Group;
class MyReport extends \koolreport\KoolReport
{
    public function setup()
    {
        ...
        ->pipe(new NumberBucket(array(
            "age"=>array("step"=>5)
        )))
        ->pipe(new Group(array(
            "by"=>"age",
            "avg"=>"income"
        ))
        ...
    }
}
                        

Settings

For each column need bucket, we can have follow advanced settings:

Name type default description
step number *required This is range of bucket you want to set
formatString string "{from} - {to}" The format string of output. With default settings, output will be 0-10 for example.
decimals number 0 The number of decimals for {from} and {to}
thousandSeparator string "," Thousand separator format for number
decimalPoint string "." Decimal character separating number and it's decimal
prefix string The string in front of number
suffix string The string goes after number

Advanced example

This below example will count the number of people by their income ranges.


<?php
use \koolreport\processes\NumberBucket;
use \koolreport\processes\Group;
class MyReport extends \koolreport\KoolReport
{
    public function setup()
    {
        ...
        ->pipe(new NumberBucket(array(
            "income"=>array(
                "step"=>1000,
                "formatString"=>"From {from} to {to}",
                "decimals"=>0,
                "prefix"=>"$",
            )
        )))
        ->pipe(new Group(array(
            "by"=>"income",
            "count"=>"user_id"
        ))
        ...
    }
}
                        

NumberRange

NumberRange helps to set custom name for defined range. For example, you can defined income from $0 to $1000 as low, $1000 to $6000 is medium and above $6000 is high.

Example


<?php
use \koolreport\processes\NumberRange;
use \koolreport\processes\Group;
class MyReport extends \koolreport\KoolReport
{
    public function setup()
    {
        ...
        ->pipe(new NumberRange(array(
            "income"=>array(
                "low"=>array(null,999.99)
                "medium"=>array(1000,5999.99),
                "high"=>array(6000,null)
            )
        )))
        ->pipe(new Group(array(
            "by"=>"income",
            "count"=>"user_id"
        ))
        ...
    }
}
                        

StringCase

StringCase process helps you to format string

Example


<?php
use \koolreport\processes\StringCase;
class MyReport extends \koolreport\KoolReport
{
    public function setup()
    {
        ...
        ->pipe(new StringCase(array(
            "upper"=>"country,region",
            "lower"=>"address1,address2",
            "first-cap"=>"description,note",
            "all-cap"=>"name"
        )))
        ...
    }
}
                        

Case function

Name description example
lower Lowercase all data of a column "joHn DoE"=>"john doe"
upper Uppercase data of a column "joHn DoE"=>"JOHN DOE"
first-cap Capitalize the first character "john doe"=>"John doe"
all-cap Capitalize all words in string "john doe"=>"John Doe"

StringTrim

StringTrim helps to trim off the space or some of special character at beginning and at the end of string data

Example


<?php
use \koolreport\processes\StringTrim;
class MyReport extends \koolreport\KoolReport
{
    public function setup()
    {
        ...
        ->pipe(new StringTrim(array(
            "name",
            "address",
            "description"
        )))
        ...
    }
}
                        

You can set what to trim off

Set character_mask to determine what character to trim off. For example:


->pipe(new StringTrim(array(
    "name",
    "address",
    "description",
    "character_mask"=>"\t\n\r\0\x0B"
)))									
                                

ValueMap

ValueMap process helps to map from one value to other.

Example


<?php
use \koolreport\processes\ValueMap;
class MyReport extends \koolreport\KoolReport
{
    public function setup()
    {
        ...
        ->pipe(new ValueMap(array(
            "level"=>array(
                "0"=>"easy",
                "1"=>"medium",
                "2"=>"hard"
            )
        )))
        ...
    }
}
                        

Change the column meta data

Problem: It happens that the converted value may be in different type compared to original value As the above example, we map from number to string.

Solution: You can add the {meta} to set new meta data for columns


<?php
use \koolreport\processes\ValueMap;
class MyReport extends \koolreport\KoolReport
{
    public function setup()
    {
        ...
        ->pipe(new ValueMap(array(
            "level"=>array(
                0=>"Monday",
                1=>"Tuesday",
                3=>"Wednesday",
                4=>"Thursday",
                5=>"Friday",
                6=>"Saturday",
                7=>"Sunday",
                "{meta}"=>array(
                    "type"=>"string",
                )
            )
        )))
        ...
    }
}
                        

Set a custom function to map value

Sometime you need function to map value to another. Here is how to do:


<?php
use \koolreport\processes\ValueMap;
class MyReport extends \koolreport\KoolReport
{
    public function setup()
    {
        ...
        ->pipe(new ValueMap(array(
            "level"=>array(
                "{func}"=>function($value){
                    return "This is $value";
                },
                "{meta}"=>array(
                    "type"=>"string",
                )
            )
        )))
        ...
    }
}
                        

Custom

Custom is a special process which you set your own function to perform custom change to data row. Your function will receive data row as parameters. After you are done with processing, return the row data for the next process.

Example


<?php
use \koolreport\processes\Custom;
class MyReport extends \koolreport\KoolReport
{
    public function setup()
    {
        ...
        ->pipe(new Custom(function($row){
            $row["name"] = strtolower($row["name"]);
            return $row;
        }))
        ...
    }
}
                        

Analysis

Analyis processes are those summarize whole data and output meaningful information or data

Group

Group process acts like the GROUP BY statement in SQL Query.

Example


<?php
use \koolreport\processes\Group;
class MyReport extends \koolreport\KoolReport
{
    public function setup()
    {
        ...
        ->pipe(new Group(array(
            "by"=>"country,state",
            "sum"=>"polulation"
        )))
        ...
    }
}
                        

Properties

Name type default description example
by string List of columns to be group "by"=>"country,state"
sum string List of columns to be sum "sum"=>"population"
avg string List of columns to be averaged "avg"=>"income"
min string List of columns to get min value "min"=>"income"
max string List of columns to get max value "avg"=>"max"
sort bool true Whether to sort data in grouped columns "sort"=>false

Pivot

Pivot process is moved to Pivot package

The `Pivot` process has been move to the Pivot package. Pivot package contains both Pivot process to analyze data as well as Pivot widget to let user navigate data summarization.

Pivot process helps to summarize data in multi dimensions. If you have familiar with Excel Pivot table then Pivot produce results in the same way. However, Pivot process is more advanced in the way that it does not stop at two dimensions rather it can summarize data in three or more dimensions.

Example


<?php
use use \koolreport\pivot\processes\Pivot;
class MyReport extends \koolreport\KoolReport
{
    public function setup()
    {
        ...
        ->pipe(new Pivot(array(
            "dimensions"=>array(
                "row"=>"customerName",
                "column"=>"productName",
            ),
            "aggregates"=>array(
                "sum"=>"dollar_sales"
            )
        )))
        ...
    }
}
                        

Creating new process

Creating new data process is very simple. You follow this below template:


<?php

class MyProcess extends \koolreport\core\Process
{
    public function onInit()
    {
        // Get called when the process is initiated.
        // You may get the params setting from $this->params
    }
    public function onStartInput()
    {
        // Get called when the previous process prepared to send your process data.
        // You may know which process send this start input through $this->streamingSource
    }
    public function onInput($data)
    {
        // $data is the associate array in format array("customerName"=>"John Doe","amount"=>500)
        // Get the streaming source through $this->streamingSource
    }
    public function onEndInput()
    {
        // Get called when previous process ended input
        // You may know which process end by $this->streamingSource
    }
}
                
Data Sources Data Visualization