Slicing and Dicing

Let us look in more details at our last query.

The corresponding facts, in JSON, are in the FactTable array. In this case, there is only one fact.

    {
      "_id" : "633e1c7a-ef35-441e-8cc0-0431a9809f8b",
      "KeyAspects" : [
        "jppfs-cor:ConsolidatedOrNonConsolidatedAxis",
        "xbrl:Period",
        "xbrl:Entity",
        "xbrl:Concept",
        "xbrl:Unit"
      ],
      "Aspects" : {
        "jppfs-cor:ConsolidatedOrNonConsolidatedAxis" : "jppfs-cor:NonConsolidatedMember",
        "xbrl:Period" : "2014-03-31",
        "xbrl:Entity" : "http://disclosure.edinet-fsa.go.jp E04147-000",
        "xbrl:Concept" : "jppfs-cor:Assets",
        "xbrl:Unit" : "iso4217:JPY"
      },
      "Type" : "NumericValue",
      "Value" : 6913416000000,
      "Decimals" : -6,
      "Concept" : {
        "Balance" : "DEBIT",
        "PeriodType" : "instant",
        "DataType" : "xbrli:monetaryItemType"
      }
    }

There are two fields that you are already familiar with: Aspects and Value. We are now going through the others, beginning with the KeyAspects field.

Slicer aspects, dicer aspects

The behavior of slicing and dicing in cell stores is very similar to that of slicing and dicing in OLAP warehouses, for example, with MDX. It is also very similar to the behavior of pivot tables in Excel.

In short:

  • facts are grouped along dicer dimensions.
  • facts are aggregated along slicer dimensions, to the extent that an aggregation function is specified. Also, by default, slicers are not visible in the output fact table.

Key aspects: the dicers

If you are familiar with databases (even RDBMS), key aspects are the cell store equivalent of a primary key.

If you are familiar with data warehouses (OLAP), key aspects are the cell store equivalent of a dicer.

In the example shown above, you can see in the KeyAspects field that all five aspects of this fact are dicers. These five together unambiguously identify the fact at hand, so that there is only that fact in the cell with these dimensional coordinates.

Builtin behavior for slice and dice

The default behavior of the cell store is that, in a hypercube query, the builtin aspects (concept, entity, period, unit) are dicers, and so are all additional XBRL dimensions.

Also, the builtin aspects are always implicitly part of the hypercube query. If not explicitly specified by the user, they are non-filtering dicers, that is, do not restrict the domains of these four builtin aspects.

Facts contain further aspects that were added to the facts upon import, such as:

  • fiscal year and period stamps
  • the archive ID
  • a boolean stating whether the concept is an extension concept as opposed to a concept from the original taxonomy

All these further aspects are slicers by default. In particular, they will not appear by default and need to be explicitly added to the hypercube query.

In the SEC and Japan profiles, as opposed to the generic profile, some of these slicers, such as archive IDs or fiscal periods and years, are implicitly added as visible slicers for convenience.

Default values

Like in OLAP, a hypercube query implicitly contains all the XBRL dimensions it doesn't use as slicers that filter for their default value. Adding a dimension to a hypercube hence allows dicing it and looking into further values.

For the sake of experimenting, let us remove the jppfs-cor:ConsolidatedOrNonConsolidatedAxis dimension from the example above and see what happens.

The fact quoted above has disappeared: since jppfs-cor:ConsolidatedOrNonConsolidatedAxis is now missing in the hypercube, it becomes a slicer for its default value. The fact table output by the above query, contains one fact that has only has four dicers.

If you know add the dimension with, with a value of ALL, then its entire domain will appear: the default value, and its (in this case unique) member.

Key aspects scale up

The number of default dicers, or primary keys, scales up elegantly with cell stores: the number of dicers in the (unique) cell store hypercube already scales up today beyond the 10,000 threshold, as opposed to a few hundreds in an OLAP data cube, or typically just a few primary keys in a relational table.