Updating Database Structures with Flyway, Apache Hop, and Putki
NOTE: this is a repost of a post on the know.bi blog by Adalennis Buchillón Soris earlier.
Managing database schema changes is a critical aspect of any data engineering workflow. In this blog post, we’ll cover three practical methods to perform database migrations using Flyway, a popular open-source tool for database version control.
1. Using standalone Flyway with PostgreSQL
You can use Flyway through a standalone installation. In this setup, you manually configure and run migrations against a PostgreSQL database.
Steps to perform a Flyway database migration
Follow these steps to set up and run a database migration using Flyway:
1. Install Flyway
- Download Flyway from the official Flyway website.
- Follow the installation instructions for your operating system (e.g., Windows, macOS, or Linux).
- Ensure the Flyway executable is accessible in your terminal.
2. Set Up Your Flyway Project
Use the Flyway client to create a new project:
- This will generate the necessary project structure, including:
- A migrations folder to store your SQL migration files.
- Configuration files like flyway.toml and flyway.user.toml.
3. Create the connection to the target database
- Set an identifier: Assign a unique ID and display name for the database.
- Database details: Specify the host, port, database name, and schemas.
- Credentials: Provide the username and password for authentication.
4. Prepare your migration files
- Organize your SQL migration scripts in the migrations folder within your project directory.
- Follow Flyway’s naming convention for migration files:
Prefix:
- Starts with a version number prefixed by V. For example, V2 indicates the second version of the migration.
Separator:
- Two underscores (__) separate the version prefix from the description.
Description:
- After the double underscores, the description provides a short, meaningful explanation of the migration’s purpose.
- Each word in the description is separated by a single underscore (_) for readability.
Suffix:
- Ends with the file extension .sql, indicating it’s an SQL script.
Examples:
V1__create_table.sql
V2__add_column.sql
- Migration scripts can include SQL commands like CREATE TABLE, ALTER TABLE, or INSERT.
5. Run the migration
- Flyway provides a preconfigured command. Run the migration from the Flyway client or directly via command line:
flyway migrate \
-configFiles="/Users/user/Documents/flyway/my-flyway-project/flyway.toml,/Users/user/Documents/flyway/my-flyway-project/flyway.user.toml"
-workingDirectory="/Users/user/Documents/flyway/my-flyway-project"
-schemaModelLocation="schema-model"
-schemaModelSchemas=
-environment=target1
- Alternatively, create a bash script to run migrations:
Example bash script:
#!/bin/bash
# Set the paths to your Flyway config files and project directory
CONFIG_FILES="/Users/user/Documents/flyway/my-flyway-project/flyway.toml,/Users/user/flyway/my-flyway-project/flyway.user.toml"
WORKING_DIRECTORY="/Users/user/flyway/my-flyway-project"
SCHEMA_MODEL_LOCATION="schema-model"
SCHEMA_MODEL_SCHEMAS=""
ENVIRONMENT="target1"
# Run Flyway migrate with the specified parameters
flyway migrate \
-configFiles="$CONFIG_FILES" \
-workingDirectory="$WORKING_DIRECTORY" \
-schemaModelLocation="$SCHEMA_MODEL_LOCATION" \
-schemaModelSchemas="$SCHEMA_MODEL_SCHEMAS" \
-environment="$ENVIRONMENT"
# Check if migration was successful
if [ $? -eq 0 ]; then
echo "Migrations applied successfully!"
else
echo "Error applying migrations."
fi# Set the paths to your Flyway config files and project directoryCONFIG_FILES="/Users/user/Documents/flyway/my-flyway-project/flyway.toml,/Users/user/flyway/my-flyway-project/flyway.user.toml"WORKING_DIRECTORY="/Users/user/flyway/my-flyway-projectSCHEMA_MODEL_LOCATION="schema-model"SCHEMA_MODEL_SCHEMAS=""ENVIRONMENT="target1"# Run Flyway migrate with the specified parametersflyway migrate \-configFiles="$CONFIG_FILES" \-workingDirectory="$WORKING_DIRECTORY" \-schemaModelLocation="$SCHEMA_MODEL_LOCATION" \-schemaModelSchemas="$SCHEMA_MODEL_SCHEMAS" \-environment="$ENVIRONMENT"
6. Verify migration
- After the migration completes, verify that the changes were applied:
- Check the target database for the expected schema updates or data changes.
- Flyway maintains a “flyway_schema_history” table in the database, where it logs applied migrations. Review this table to confirm migration status.
7. Maintain your migration workflow
- Version SQL migration files correctly to ensure proper execution order.
- Test all migration scripts in a development or staging environment before production.
- If using out-of-order migrations, ensure they are intentional and well-tested to avoid unexpected issues.
2. Using Apache Hop with a Shell action
Apache Hop can automate the Flyway process by embedding it into workflows. This method is particularly useful when managing larger, more complex pipelines.
1. Set up your migration files
As with the standalone approach, organize your SQL files in a folder.
2. Create an Apache Hop workflow
- Open Apache Hop and create a new workflow.
- Add a Shell action to the canvas.
3. Configure the Shell action
- Enable script insertion: Check the “Insert script” option to embed your script directly in the action.
- Use the provided bash script: Copy and paste the Bash script mentioned earlier into the script field.
- Dynamic path configuration: Optionally, define a ${FLYWAY_PATH} variable in Apache Hop to dynamically set the Flyway installation path for flexibility across environments.
4. Execute the workflow
- Link the Shell action to other actions in your workflow as needed.
- Run the workflow to apply the migrations.
Advantages:
- Integrates database migrations into a broader data pipeline.
- Allows for centralized monitoring and logging of migration processes.
3. Using the Flyway Plugin in Putki Platform
The Flyway migrate plugin, which will be part of the first Putki release, offers a fully integrated approach to database migrations. Unlike the previous methods, this does not require a standalone Flyway installation.
Key features:
- Integration: It’s fully integrated within Putki, no need for external tools or installations.
- Database Migrations: Automate the management and application of schema changes.
- Support for Out-of-Order Migrations: Enables flexibility in applying schema changes even when they are not sequential.
- Database Connection Setup: Easily configure and manage database connections within the platform.
- Management & Monitoring: Migrate, monitor, and log schema changes all within the Putki platform.
1. Install the plugin or verify it’s installed
Ensure the Flyway plugin is installed and the necessary JDBC driver is placed in the correct directory:
2. Set up migration files
Organize your migration files as before.
3. Create a Putki workflow
- Add the Flyway migration action from the plugin library.
- Configure the action to point to your migration file directory and database connection.
- Flyway migrate: Use a unique name in your workflow for the Flyway action.
- Connection: Select your database connection.
- SQL Folder: Add the path to the folder containing your migration files.
- You can also enable advanced options like “Allow out-of-order migration files” for more flexibility. Flyway runs migrations sequentially by default; this option allows out-of-order execution.
Out-of-order example: If V3 has already been applied and V2 is missing, enabling outOfOrder allows Flyway to run V2 afterward.
4. Run the workflow
- Connect the Flyway action to other workflow actions as needed.
- Execute the workflow to apply the migrations.
- Review the execution results to ensure everything is correct and verify your migration in the database.
Key takeaways
- Database compatibility: While the example uses a PostgreSQL database, Flyway supports many popular SQL databases out of the box, such as PostgreSQL, MySQL, Oracle, and SQL Server. For other databases, you can still use Flyway by simply including the appropriate JDBC driver.
- Database connection: In Apache Hop and Putki, database connections are managed as metadata objects, allowing centralized, reusable configurations for workflows and pipelines. For setup details, refer to the Apache Hop documentation or the Putki platform docs.
- Migration handling: In this example, we first create the director table with fields like director_id, first_name, last_name, and birth_date. In the second migration (V2), we add the nationality field to the table. Flyway automatically tracks which migrations have already been applied, ensuring they are not executed again.
- Migration files & naming conventions: The migration files in the example are stored in a folder within the Putki project, but you are free to choose any folder you prefer, as long as you reference it correctly. It’s important to follow Flyway’s naming convention.
- The Flyway plugin in Putki offers several key benefits:
- Integration: Fully integrated within the Putki platform, eliminating the need for external tools.
- Flexibility: Supports out-of-order migrations, allowing for more flexibility when applying schema changes.
- Ease of Use: Simplifies database connection setup and migration management, reducing manual steps.
- Monitoring: Enables efficient tracking and logging of schema changes, all within Putki.
Conclusion
- For quick and simple migrations, the standalone Flyway setup is a great choice.
- For automation within data pipelines, Apache Hop with a Shell action offers flexibility.
- For seamless integration and advanced capabilities, the Flyway plugin in Putki Platform is an optimal solution.
Choose the method that best fits your workflow, and ensure your database schemas evolve smoothly alongside your applications.
Don’t miss the YouTube video for the Flyway plugin step-by-step walkthrough!
Stay connected
If you have any questions or run into issues, contact us and we’ll be happy to help.