Chapter 5. Train the model by using your prepared data


To train the model, you can use the Red Hat Training Hub and the KubeFlow Trainer Operator.

With the Red Hat Training Hub, you can simplify and accelerate fine-tuning and customizing a foundation model with your own data. The Training Hub is an algorithm-focused interface for common LLM training, continual learning, and reinforcement learning techniques.

5.1. Explore the Training Hub examples

The Training Hub repository hosts multiple cookbooks for using different LLM algorithms such as supervised fine tuning (SFT) and Orthogonal Subspace Fine Tuning (OSFT)/Continual Learning. OSFT is a training algorithm built by the Red Hat AI Innovation team. With OSFT, you can continually post-train a fine-tuned model to expand its knowledge on new data. You can tinker with the Training Hub cookbooks from a workbench within your OpenShift AI project.

To get started with the Training Hub, explore the provided examples.

Prerequisites

Procedure

  1. To access the Training Hub examples, clone the Training Hub Git repository:

    • To clone the https://github.com/Red-Hat-AI-Innovation-Team/training_hub.git repository from JupyterLab, follow the steps in Clone an example Git repository.

    • To create a local clone of the repository, run the following command:

        git clone https://github.com/Red-Hat-AI-Innovation-Team/training_hub
  2. Go to the examples directory to view the Training Hub notebooks, Python scripts, and documentation.

    • For a quick overview and descriptions of the supported algorithms and features, with links to examples and getting started code, see the top-level README file.
    • For detailed parameter documentation, see the docs directory.
    • For hands-on learning with the interactive notebooks, see the notebooks directory.
    • For pre-written, configurable Python scripts to run training algorithms with various language models, see the scripts directory.

5.2. Estimate memory usage

Use the memory estimator to estimate how much memory you need to run and train a specific model, and to check whether your configured GPUs can support the model. The memory estimator currently supports only the supervised fine-tuning (SFT) and Orthogonal Subspace Fine-Tuning (OSFT) algorithms. See the following example files in the Training Hub Git repository:

  • For the Memory Estimator API, see the src/training_hub/profiling/memory_estimator.py file.
  • For an example notebook that uses the API, see the notebooks/memory_estimator_example.ipynb file.

You can use the SFT and OSFT algorithms in the Training Hub.

Use SFT to fine-tune a model on supervised datasets with support for:

  • Single-node and multi-node distributed training
  • Configurable training parameters, such as epochs, batch size, and learning rate
  • InstructLab-Training backend integration
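As a sketch of how those configurable parameters might be gathered for a run. The function and parameter names below are illustrative assumptions, not the confirmed Training Hub API; see the docs directory for the actual parameter documentation.

```python
# Hypothetical sketch only: the parameter names below are assumptions,
# not the confirmed Training Hub API. See the docs directory for the
# actual parameter documentation.

def build_sft_config(model_path: str, data_path: str, output_dir: str,
                     num_epochs: int = 3, batch_size: int = 8,
                     learning_rate: float = 2e-5) -> dict:
    """Collect the configurable SFT parameters named above in one place."""
    return {
        "model_path": model_path,
        "data_path": data_path,
        "ckpt_output_dir": output_dir,
        "num_epochs": num_epochs,
        "effective_batch_size": batch_size,
        "learning_rate": learning_rate,
    }

config = build_sft_config("granite-7b", "train.jsonl", "checkpoints/")
# A real single-node run would then hand this config to the SFT entry
# point, for example (assumed call): from training_hub import sft; sft(**config)
```

Keeping the tunable values in one structure makes it easy to sweep epochs, batch size, or learning rate across runs without touching the training call itself.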

Use OSFT to fine-tune a model while controlling how much of its existing behavior to preserve, with support for:

  • Single-node and multi-node distributed training
  • Configurable training parameters, such as epochs, batch size, and learning rate
  • RHAI Innovation Mini-Trainer backend integration

The examples/docs directory contains information and examples that show how to use each algorithm.

Here is a performance comparison of OSFT and SFT in the Training Hub:

  • Memory scaling: OSFT memory scales linearly with the unfreeze rank ratio (URR), an OSFT hyperparameter with a value between 0 and 1 that represents the fraction of the matrix rank that is unfrozen and updated during fine-tuning.

    A rough comparison can be expressed as OSFT memory ≈ 3r × SFT memory, where r is the URR, the fraction of the matrix rank being fine-tuned. At URR = 1/3, OSFT and SFT have similar memory usage.

    In most post-training setups, URR values below 1/3 are sufficient for learning new tasks, making OSFT notably lighter in memory.

  • Training time: On datasets of equal size, OSFT typically takes about twice as long per phase. However, because OSFT does not require replay buffers from past tasks (unlike SFT), the total training time across multiple phases or tasks is lower, with clear benefits as the number of tasks grows. Because OSFT supports continual learning without maintaining or reusing old data, it enables lighter, single-pass end-to-end runs.
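These two rules of thumb can be written out as a toy model. The arithmetic is illustrative only: for real estimates, use the memory estimator described above, and note that modeling SFT as replaying all earlier tasks is a simplifying assumption.

```python
def osft_memory_estimate(sft_memory_gb: float, urr: float) -> float:
    """Rough rule from above: OSFT memory ≈ 3 * URR * SFT memory."""
    if not 0.0 < urr <= 1.0:
        raise ValueError("unfreeze rank ratio (URR) must be in (0, 1]")
    return 3.0 * urr * sft_memory_gb

def total_time_sft_with_replay(num_tasks: int, time_per_task: float) -> float:
    """Toy model: SFT replays all earlier tasks, so task k costs k units."""
    return sum(k * time_per_task for k in range(1, num_tasks + 1))

def total_time_osft(num_tasks: int, time_per_task: float) -> float:
    """Toy model: OSFT is ~2x slower per phase but trains each task once."""
    return 2.0 * num_tasks * time_per_task

# At URR = 1/3, OSFT and SFT need about the same memory:
print(osft_memory_estimate(60.0, 1 / 3))       # → 60.0
# Below 1/3, OSFT is lighter, for example at URR = 0.1:
print(round(osft_memory_estimate(60.0, 0.1)))  # → 18

# OSFT's total-time advantage grows with the number of tasks:
for n in (1, 3, 5):
    print(n, total_time_sft_with_replay(n, 1.0), total_time_osft(n, 1.0))
```

In this toy model, SFT is faster for a single task, the two approaches match at three tasks, and OSFT pulls ahead beyond that, mirroring the comparison above.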

If you want to implement distributed training across multiple nodes to meet the needs of your training workloads, you can use the KubeFlow Trainer Operator (KFTO). KFTO abstracts the underlying infrastructure complexity of distributed training and fine-tuning of models. The iterative process of fine-tuning significantly reduces the time and resources required compared to training models from scratch.

Learn more about the KubeFlow Trainer Operator in the following OpenShift AI documentation:

The KubeFlow Trainer Operator supports distributed fine-tuning by using Training Hub, abstracting the complexity of distributed training. It seamlessly manages scaling and orchestration for you, allowing you to focus on your domain-specific fine-tuning logic by using the simplified Training Hub APIs.

For a comprehensive tutorial on fine-tuning with the Training Hub across distributed nodes with the KubeFlow Trainer, follow these guided examples:
