Command Line Interface

To process a filing or a taxonomy through the command line interface (CLI), you need to mount a folder of the host machine, denoted <INFOSETGENERATOR_WORK_FOLDER> in what follows, into the Infoset Generator Docker container at /var/infoset-generator. The Infoset Generator can then access files in <INFOSETGENERATOR_WORK_FOLDER> through the mount point /var/infoset-generator inside the container. Consequently, the paths of the filing to process and of the generated infoset must all start with /var/infoset-generator. For more details, refer to https://docs.docker.com/engine/userguide/containers/dockervolumes/.
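As a sketch of this path mapping (the folder name here is hypothetical; substitute your own absolute path), a file stored under <INFOSETGENERATOR_WORK_FOLDER> on the host is visible inside the container under /var/infoset-generator:

```shell
# Hypothetical host work folder.
WORK_FOLDER=/28msec/filings
# A file on the host...
HOST_PATH="$WORK_FOLDER/source-filing.zip"
# ...is seen by the Infoset Generator at the corresponding container path:
# the host prefix is replaced by the mount point.
CONTAINER_PATH="/var/infoset-generator${HOST_PATH#"$WORK_FOLDER"}"
echo "$CONTAINER_PATH"
```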

Processing a filing or a taxonomy is a memory-intensive operation. You can specify the maximum amount of RAM (in MB) that the Infoset Generator is allowed to use through the parameter <MAX_HEAP_SIZE_MB>. Most filings and taxonomies can be processed with 4000 MB of RAM, so this is a good starting value in most cases. The limit should not be less than 1000 MB. If it exceeds the amount of free memory on your system, performance may be severely degraded and your operating system may be forced to kill running processes. If the memory limit is insufficient to process a given filing or taxonomy, an OutOfMemoryError is raised.

Processing filings

To process a single filing we can use the following command:

docker run --rm -v <INFOSETGENERATOR_FOLDER>:/etc/infoset-generator -v <INFOSETGENERATOR_WORK_FOLDER>:/var/infoset-generator infoset-generator:latest process-filing <MAX_HEAP_SIZE_MB> <ARGUMENTS> <XBRL_FILING>

where <XBRL_FILING> is one of the following:

  • a path to a folder containing the XBRL/IXBRL filing
  • a path to a .tar.gz or .zip archive containing an XBRL/IXBRL filing
  • a path to an XBRL/IXBRL instance file

Mandatory arguments:

  • One of:
    • --output-directory <PATH>, the directory in which to write the generated infoset (uncompressed), cannot be specified together with --output-archive.
    • --output-archive <PATH>, the file to which to write the generated infoset (compressed), cannot be specified together with --output-directory.

Optional arguments:

  • --profile <PROFILE>, the profile to use for processing the filing (e.g. SEC (U.S.A.), FSA (Japan), SVS (Chile), UK, DUTCH or NOPROFILE). Default is NOPROFILE.
  • --filing-detection-profile <PROFILE>, the profile used to identify which files are XBRL/IXBRL instances when <XBRL_FILING> is a compressed archive or a directory. Allowed values are: AUTO (automatic detection) and FSA (automatic detection, with identification of Audit and Public documents). For debugging purposes, the following values can also be specified: XBRL (.xbrl files), XML (.xml files), XBRLANDXML (.xbrl and .xml files).
  • --taxonomy, do not import facts and import all labels and concepts for the default component role.
  • --archive-id <ARCHIVE-ID>, the archive id of the generated infoset. If not specified, a random id is used.
  • --entity-id <ENTITY-ID>, the entity id of the generated infoset. Used only by the SEC profile.
  • --metadata-file <PATH>, the path to a JSON file containing archive metadata.
  • --entity-metadata-file <PATH>, the path to a JSON file containing entity metadata.
  • --timeout <SECONDS>, interrupt processing after the specified number of seconds. Default is 600 seconds.
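For example (a sketch only; the host folders, archive id, and file names are hypothetical), a filing could be processed with the SEC profile, an explicit archive id, and a longer timeout as follows:

```shell
docker run --rm \
  -v /28msec/infoset-generator:/etc/infoset-generator \
  -v /28msec/filings:/var/infoset-generator \
  infoset-generator:latest process-filing 4000 \
  --profile SEC \
  --archive-id my-archive-001 \
  --timeout 1200 \
  --output-archive /var/infoset-generator/processed-filing.zip \
  /var/infoset-generator/source-filing.zip
```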

Development options:

  • --pretty, turn on pretty printing.
  • --debug, produce debugging information during processing.
  • --unit-test, produce deterministic results. To be used only for unit testing.
  • --allow-missing-metadata-files, do not raise an error if the specified metadata files are not present. To be used only for unit testing.

For instance, to process a filing stored in the file /28msec/filings/source-filing.zip on the host machine and write the generated infoset to the file /28msec/filings/processed-filing.zip:

docker run --rm -v /28msec/infoset-generator:/etc/infoset-generator -v /28msec/filings:/var/infoset-generator infoset-generator:latest process-filing 4000 --output-archive /var/infoset-generator/processed-filing.zip /var/infoset-generator/source-filing.zip

Processing taxonomies

To process a single taxonomy, given its entrypoint URI(s), we can use the following command:

docker run --rm -v <INFOSETGENERATOR_FOLDER>:/etc/infoset-generator -v <INFOSETGENERATOR_WORK_FOLDER>:/var/infoset-generator infoset-generator:latest process-taxonomy <MAX_HEAP_SIZE_MB> <ARGUMENTS> <ENTRYPOINTS>

where <ENTRYPOINTS> is the space-separated list of the taxonomy entrypoint URIs.

Mandatory arguments:

  • --archive-id <ARCHIVE-ID>, the archive id of the generated taxonomy filing.
  • --entity-scheme <ENTITY-SCHEME>, the entity scheme of the generated taxonomy filing.
  • --entity-id <ENTITY-ID>, the entity id of the generated taxonomy filing.
  • One of:
    • --output-directory <PATH>, the directory in which to write the generated infoset (uncompressed), cannot be specified together with --output-archive
    • --output-archive <PATH>, the file to which to write the generated infoset (compressed), cannot be specified together with --output-directory

Optional arguments:

  • --profile <PROFILE>, the profile to use (e.g. SEC (U.S.A.), FSA (Japan), SVS (Chile), UK, DUTCH or NOPROFILE). Default is NOPROFILE.
  • --timeout <SECONDS>, interrupt processing after the specified number of seconds. Default is 600 seconds.

Development options:

  • --pretty, turn on pretty printing.
  • --debug, produce debugging information during processing.
  • --unit-test, produce deterministic results. To be used only for unit testing.

For instance, to process the taxonomy whose entrypoint is http://28.io/example.xsd and store it in the file /28msec/filings/processed-taxonomy.zip on the host machine:

docker run --rm -v /28msec/infoset-generator:/etc/infoset-generator -v /28msec/filings:/var/infoset-generator infoset-generator:latest process-taxonomy 4000 --output-archive /var/infoset-generator/processed-taxonomy.zip http://28.io/example.xsd
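If the taxonomy has multiple entrypoints, list them all at the end of the command, separated by spaces (the second URI here is hypothetical):

```shell
docker run --rm \
  -v /28msec/infoset-generator:/etc/infoset-generator \
  -v /28msec/filings:/var/infoset-generator \
  infoset-generator:latest process-taxonomy 4000 \
  --output-archive /var/infoset-generator/processed-taxonomy.zip \
  http://28.io/example.xsd http://28.io/example2.xsd
```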