Step-by-step guide to Workflow Logging
NOTE: this is a repost of an earlier post by Adalennis Buchillón Soris on the know.bi blog.
Workflow Log
Apache Hop is a data engineering and data orchestration platform that allows data engineers and data developers to visually design workflows and data pipelines to build powerful solutions.
After your project has gone through initial development and testing, knowing what is going on at runtime becomes important.
Workflow Logs in Hop allow workflow logging information to be passed to a pipeline as JSON objects for processing. The receiving pipeline can handle this logging information with all the functionality Hop pipelines have to offer, e.g. write to a relational or NoSQL database, a Kafka topic, etc.
Hop will send the logging information for each workflow you run to the Workflow Log pipeline you specify.
In this post, we’ll look at an example of how to configure and use the Workflow Log metadata to write workflow logging information to a relational database.
Step 1: Create a Workflow Log metadata object
To create a Workflow Log, click New -> Workflow Log or Metadata -> Workflow Log.
The system displays the New Workflow Log view with the following fields to be configured.
The Workflow Log can be configured as in the following example:
- Name: The name of the metadata object (workflow-log).
- Enabled: Logging is active by default (enabled).
- Logging parent workflows only: Ensures that only parent workflows are logged, avoiding redundancy in nested executions (enabled).
- Pipeline executed to capture logging: Select or create the pipeline to be used for logging the activity (${PROJECT_HOME}/code/logging/workflow-log-database.hpl). Remember that ${PROJECT_HOME} represents the root directory of your project.
You can either select an existing pipeline, specify a custom path where you plan to create your logging pipeline, or create a new one directly.
In the next step, we’ll customize the pipeline.
The only requirement is that the pipeline must begin with a Workflow Logging transform as its first transform to ensure proper log capture and processing.
- Execute at the start of a workflow: Logs are captured at the beginning of workflow execution (enabled).
- Execute at the end of a workflow: Logs are captured at the completion of the workflow (enabled).
- Execute periodically during execution: This option logs progress at regular intervals while the workflow is running. In this case, periodic logging is turned off (disabled).
- Interval in Seconds: If periodic execution is enabled, logs would be captured every 30 seconds during the workflow’s execution, providing real-time insights into its progress (30).
Finally, save the workflow log configuration.
Workflow logging will apply to any workflow you run in the current project. That may not be necessary or even desirable. If you only want logging information for a selected number of workflows, you can add them to the table below the configuration options (Capture output of the following workflows).
The screenshot below shows the single “flights-processing.hwf” workflow from my-hop-project.
Step 2: Create a new pipeline with the Workflow Logging transform
To create the pipeline, you can go to the perspective area or click the New button in the New Workflow Log dialog. Then choose a folder and a name for the pipeline.
A new pipeline is automatically created with a Workflow Logging transform connected to a Dummy transform (Save logging here).
Now it’s time to configure the Workflow Logging transform. This configuration is very simple: open the transform and set your values as in the following example:
- Transform name: choose a name for your transform; just remember that the transform name must be unique in your pipeline (log).
- Also log transform: selected by default.
Step 3: Add and configure a Table output transform
The Table Output transform allows you to load data into a database table. Table Output is equivalent to the DML operator INSERT. This transform provides configuration options for the target table and a lot of housekeeping and/or performance-related options such as Commit Size and Use batch update for inserts.
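To make that DML equivalence concrete, the sketch below shows the kind of statement Table Output effectively issues for each incoming row (batched when Use batch update for inserts is enabled). The table and column names are assumptions for illustration only; the real columns are whatever fields the Workflow Logging transform passes into this transform.

```sql
-- Illustrative only: Table Output behaves like an INSERT per incoming row.
-- Table and column names below are assumptions, not the exact Hop field names.
INSERT INTO "workflows-log" (logging_date, workflow_name, status_description, error_count)
VALUES ('2024-05-01 10:15:00', 'flights-processing', 'Workflow finished', 0);
```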
TIP: In this example we are going to log to a relational database connection, but you can also use output files. If you decide to use a database connection, make sure the database is installed and available as a prerequisite.
Add a Table Output transform by clicking anywhere in the pipeline canvas, then search for 'table output' and select Table Output.
Now it’s time to configure the Table Output transform. Open the transform and set your values as in the following example:
- Transform name: choose a name for your transform; just remember that the transform name must be unique in your pipeline (workflows logging).
- Connection: The database connection to which data will be written (logging-connection). The connection was configured using the logging-connection.json environment file, which contains the connection variables.
- Target table: The name of the table to which data will be written (workflows-log).
- Click on the SQL option to generate the SQL that creates the output table automatically.
- Execute the SQL statements. In this simple scenario, we’ll execute the SQL directly. In real-life projects, consider managing your DDL in version control and through tools like Liquibase or Flyway.
- Open the created table to see all the logging fields; a hedged sketch of what this DDL might look like follows this list.
- Close and save the transform.
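For reference, here is a minimal sketch of what the generated DDL for the workflows-log table might look like. The column names and types are assumptions based on typical workflow logging fields (dates, status, error count, log text) and a PostgreSQL-style dialect; always prefer the statement Hop generates for you via the SQL option.

```sql
-- Hedged sketch of a possible workflows-log table; column names are assumptions.
-- Use the DDL generated by the SQL option of the Table Output transform instead.
CREATE TABLE "workflows-log" (
  logging_date       TIMESTAMP,     -- when the log record was produced
  logging_phase      VARCHAR(32),   -- e.g. start, end, interval
  workflow_name      VARCHAR(255),
  workflow_filename  VARCHAR(512),
  start_date         TIMESTAMP,
  end_date           TIMESTAMP,
  error_count        INTEGER,
  status_description VARCHAR(255),
  logging_text       TEXT           -- full log text of the workflow run
);
```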
Step 4: Run a workflow and check the logs
Finally, run a workflow by clicking on the Run -> Launch option. The Workflow Log pipeline will be executed for every workflow you run.
In this case, we use a basic workflow that executes two pipelines, both included in my-hop-project.
The data of the workflow execution will be recorded in the workflows-log table.
Check the data in the table.
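A quick way to verify the run is a simple query against the target table from your SQL client. The column names below follow the assumed sketch from Step 3; adjust them to match the table you actually generated.

```sql
-- Show the most recent workflow log records (assumed column names).
SELECT logging_date, workflow_name, status_description, error_count
FROM "workflows-log"
ORDER BY logging_date DESC
LIMIT 10;
```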
Remarks
- Default logging behavior → Step 1: By default, all workflows are logged. However, users can customize which workflows they want to log by enabling or disabling logging for specific workflows in the Workflow Log settings.
- Workflow Logging transform → Step 2: When the New option is used from the Workflow Log dialog, a pipeline with the Workflow Logging transform is created automatically. If the pipeline is created from scratch, the Workflow Logging transform must be added manually as the first transform to ensure logging works.
- Log storage options → Step 3: Users can also choose alternative storage options such as output files to write logs, providing flexibility based on project requirements.
Don’t miss the YouTube video for a step-by-step walkthrough!
Next steps
You now know how to use the Workflow Log metadata type and process your workflow logging information with everything Apache Hop has to offer.
Feel free to reach out if you’d like to find out more or to discuss how we can help with workflow logging or any other aspect of your data engineering projects with Apache Hop.