Chapter 5. Tuning a model by using the Training Operator
To tune a model by using the Kubeflow Training Operator, you configure and run a training job.
Optionally, you can use Low-Rank Adaptation (LoRA) to efficiently fine-tune large language models, such as Llama 3. The integration optimizes computational requirements and reduces memory footprint, allowing fine-tuning on consumer-grade GPUs. The solution combines PyTorch Fully Sharded Data Parallel (FSDP) and LoRA to enable scalable, cost-effective model training and inference, enhancing the flexibility and performance of AI workloads within OpenShift environments.
5.1. Configuring the training job
Before you can use a training job to tune a model, you must configure the training job. The example training job in this section is based on the IBM and Hugging Face tuning example provided in GitHub.
Prerequisites
- You have logged in to OpenShift.
- You have access to a data science cluster that is configured to run distributed workloads as described in Managing distributed workloads.
- You have created a data science project. For information about how to create a project, see Creating a data science project.
- You have Admin access for the data science project.
  - If you created the project, you automatically have Admin access.
  - If you did not create the project, your cluster administrator must give you Admin access.
- You have access to a model.
- You have access to data that you can use to train the model.
Procedure
In a terminal window, if you are not already logged in to your OpenShift cluster, log in to the OpenShift CLI as shown in the following example:
$ oc login <openshift_cluster_url> -u <username> -p <password>

Configure a training job, as follows:
- Create a YAML file named config_trainingjob.yaml. Add the ConfigMap object definition as follows:

  Example training-job configuration

  kind: ConfigMap
  apiVersion: v1
  metadata:
    name: training-config
    namespace: kfto
  data:
    config.json: |
      {
        "model_name_or_path": "bigscience/bloom-560m",
        "training_data_path": "/data/input/twitter_complaints.json",
        "output_dir": "/data/output/tuning/bloom-twitter",
        "save_model_dir": "/mnt/output/model",
        "num_train_epochs": 10.0,
        "per_device_train_batch_size": 4,
        "per_device_eval_batch_size": 4,
        "gradient_accumulation_steps": 4,
        "save_strategy": "no",
        "learning_rate": 1e-05,
        "weight_decay": 0.0,
        "lr_scheduler_type": "cosine",
        "include_tokens_per_second": true,
        "response_template": "\n### Label:",
        "dataset_text_field": "output",
        "padding_free": ["huggingface"],
        "multipack": [16],
        "use_flash_attn": false
      }

- Optional: To fine-tune with Low-Rank Adaptation (LoRA), update the config.json section as follows:
  - Set the peft_method parameter to "lora".
  - Add the lora_r, lora_alpha, lora_dropout, bias, and target_modules parameters.

  Example LoRA configuration

  ...
  "peft_method": "lora",
  "lora_r": 8,
  "lora_alpha": 8,
  "lora_dropout": 0.1,
  "bias": "none",
  "target_modules": ["all-linear"]
  }
- Optional: To fine-tune with Quantized Low-Rank Adaptation (QLoRA), update the config.json section as follows:
  - Set the use_flash_attn parameter to true.
  - Set the peft_method parameter to "lora".
  - Add the LoRA parameters: lora_r, lora_alpha, lora_dropout, bias, and target_modules.
  - Add the QLoRA mandatory parameters: auto_gptq, torch_dtype, and fp16. If required, add the QLoRA optional parameters: fused_lora and fast_kernels.

  Example QLoRA configuration

  ...
  "use_flash_attn": true,
  "peft_method": "lora",
  "lora_r": 8,
  "lora_alpha": 8,
  "lora_dropout": 0.1,
  "bias": "none",
  "target_modules": ["all-linear"],
  "auto_gptq": ["triton_v2"],
  "torch_dtype": "float16",
  "fp16": true,
  "fused_lora": ["auto_gptq", true],
  "fast_kernels": [true, true, true]
  }
- Edit the metadata of the training-job configuration as shown in the following table.

  Table 5.1. Training-job configuration metadata

  Parameter | Value
  name      | Name of the training-job configuration
  namespace | Name of your project

- Edit the parameters of the training-job configuration as shown in the following table.

  Table 5.2. Training-job configuration parameters

  Parameter | Value
  model_name_or_path | Name of the pre-trained model, or the path to the model in the training-job container; in this example, the model name is taken from the Hugging Face web page
  training_data_path | Path to the training data that you set in the training_data.yaml ConfigMap
  output_dir | Output directory for the model
  save_model_dir | Directory where the tuned model is saved
  num_train_epochs | Number of epochs for training; in this example, the training job is set to run 10 times
  per_device_train_batch_size | Batch size, the number of dataset examples to process together; in this example, the training job processes 4 examples at a time
  per_device_eval_batch_size | Batch size, the number of dataset examples to process together per GPU, TPU core, or CPU; in this example, the training job processes 4 examples at a time
  gradient_accumulation_steps | Number of gradient accumulation steps
  save_strategy | How often model checkpoints are saved; the default value is "epoch" (save a checkpoint every epoch); other possible values are "steps" (save a checkpoint for every training step) and "no" (do not save checkpoints)
  save_total_limit | Number of model checkpoints to save; omit if save_strategy is set to "no" (no model checkpoints saved)
  learning_rate | Learning rate for the training
  weight_decay | Weight decay to apply
  lr_scheduler_type | Optional: Scheduler type to use; the default value is "linear"; other possible values are "cosine", "cosine_with_restarts", "polynomial", "constant", and "constant_with_warmup"
  include_tokens_per_second | Optional: Whether to compute the number of tokens per second per device, for training speed metrics
  response_template | Template formatting for the response
  dataset_text_field | Dataset field for training output, as set in the training_data.yaml ConfigMap
  padding_free | Whether to process multiple examples in a single batch without adding padding tokens that waste compute resources; if used, this parameter must be set to ["huggingface"]
  multipack | Whether to balance the number of tokens processed on each device during multi-GPU training, to minimize waiting time; you can experiment with different values to find the optimum value for your training job
  use_flash_attn | Whether to use flash attention
  peft_method | Tuning method: for full fine-tuning, omit this parameter; for LoRA and QLoRA, set to "lora"; for prompt tuning, set to "pt" (see the sketch after these steps)
  lora_r | LoRA: Rank of the low-rank decomposition
  lora_alpha | LoRA: Scales the low-rank matrices to control their influence on the model's adaptations
  lora_dropout | LoRA: Dropout rate applied to the LoRA layers, a regularization technique to prevent overfitting
  bias | LoRA: Whether to adapt bias terms in the model; setting the bias to "none" indicates that no bias terms are adapted
  target_modules | LoRA: Names of the modules to apply LoRA to; to include all linear layers, set to ["all-linear"]; optional parameter for some models
  auto_gptq | QLoRA: Sets 4-bit GPTQ-LoRA with AutoGPTQ; when used, this parameter must be set to ["triton_v2"]
  torch_dtype | QLoRA: Tensor datatype; when used, this parameter must be set to "float16"
  fp16 | QLoRA: Whether to use half-precision floating-point format; when used, this parameter must be set to true
  fused_lora | QLoRA: Whether to use fused LoRA for more efficient LoRA training; if used, this parameter must be set to ["auto_gptq", true]
  fast_kernels | QLoRA: Whether to use fast cross-entropy, rope, and rms loss kernels; if used, this parameter must be set to [true, true, true]
- Save your changes in the config_trainingjob.yaml file.
- Apply the configuration to create the training-config object:

  $ oc apply -f config_trainingjob.yaml
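The LoRA and QLoRA examples above cover two of the peft_method values described in Table 5.2. For prompt tuning, a minimal sketch of the corresponding config.json change (only the relevant line is shown; whether your model needs additional prompt-tuning parameters is not covered by this example, so verify against the fms-hf-tuning documentation):

  ...
  "peft_method": "pt"
  }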
Create the training data.
Note: The training data in this simple example is for demonstration purposes only, and is not suitable for production use. The usual method for providing training data is to use persistent volumes, as sketched below.
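For illustration, a minimal sketch of the persistent-volume approach, assuming a PVC named training-data that already contains your dataset (the claim name is a placeholder, not part of this example):

  # In the PyTorchJob pod template, mount the PVC in place of the dataset ConfigMap:
  volumeMounts:
    - mountPath: /data/input
      name: dataset-volume
  volumes:
    - name: dataset-volume
      persistentVolumeClaim:
        claimName: training-data  # hypothetical PVC that holds your dataset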
- Create a YAML file named training_data.yaml. Add the following ConfigMap object definition:

  kind: ConfigMap
  apiVersion: v1
  metadata:
    name: twitter-complaints
    namespace: kfto
  data:
    twitter_complaints.json: |
      [
        {"Tweet text":"@HMRCcustomers No this is my first job","ID":0,"Label":2,"text_label":"no complaint","output":"### Text: @HMRCcustomers No this is my first job\n\n### Label: no complaint"},
        {"Tweet text":"@KristaMariePark Thank you for your interest! If you decide to cancel, you can call Customer Care at 1-800-NYTIMES.","ID":1,"Label":2,"text_label":"no complaint","output":"### Text: @KristaMariePark Thank you for your interest! If you decide to cancel, you can call Customer Care at 1-800-NYTIMES.\n\n### Label: no complaint"},
        {"Tweet text":"@EE On Rosneath Arial having good upload and download speeds but terrible latency 200ms. Why is this.","ID":3,"Label":1,"text_label":"complaint","output":"### Text: @EE On Rosneath Arial having good upload and download speeds but terrible latency 200ms. Why is this.\n\n### Label: complaint"},
        {"Tweet text":"Couples wallpaper, so cute. :) #BrothersAtHome","ID":4,"Label":2,"text_label":"no complaint","output":"### Text: Couples wallpaper, so cute. :) #BrothersAtHome\n\n### Label: no complaint"},
        {"Tweet text":"@mckelldogs This might just be me, but-- eyedrops? Artificial tears are so useful when you're sleep-deprived and sp… https:\/\/t.co\/WRtNsokblG","ID":5,"Label":2,"text_label":"no complaint","output":"### Text: @mckelldogs This might just be me, but-- eyedrops? Artificial tears are so useful when you're sleep-deprived and sp… https:\/\/t.co\/WRtNsokblG\n\n### Label: no complaint"},
        {"Tweet text":"@Yelp can we get the exact calculations for a business rating (for example if its 4 stars but actually 4.2) or do we use a 3rd party site?","ID":6,"Label":2,"text_label":"no complaint","output":"### Text: @Yelp can we get the exact calculations for a business rating (for example if its 4 stars but actually 4.2) or do we use a 3rd party site?\n\n### Label: no complaint"},
        {"Tweet text":"@nationalgridus I have no water and the bill is current and paid. Can you do something about this?","ID":7,"Label":1,"text_label":"complaint","output":"### Text: @nationalgridus I have no water and the bill is current and paid. Can you do something about this?\n\n### Label: complaint"},
        {"Tweet text":"@JenniferTilly Merry Christmas to as well. You get more stunning every year ��","ID":9,"Label":2,"text_label":"no complaint","output":"### Text: @JenniferTilly Merry Christmas to as well. You get more stunning every year ��\n\n### Label: no complaint"}
      ]

- Replace the example namespace value kfto with the name of your project.
- Replace the example training data with your training data.
- Save your changes in the training_data.yaml file.
- Apply the configuration to create the training data:

  $ oc apply -f training_data.yaml
Create a persistent volume claim (PVC), as follows:
- Create a YAML file named trainedmodelpvc.yaml. Add the following PersistentVolumeClaim object definition:

  apiVersion: v1
  kind: PersistentVolumeClaim
  metadata:
    name: trained-model
    namespace: kfto
  spec:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 50Gi

- Replace the example namespace value kfto with the name of your project, and update the other parameters to suit your environment. To calculate the storage value, multiply the model size by the number of epochs, and add a little extra as a buffer (see the example calculation after these steps).
Save your changes in the
trainedmodelpvc.yamlfile. Apply the configuration to create a Persistent Volume Claim (PVC) for the training job:
$ oc apply -f trainedmodelpvc.yaml
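As an illustration of the storage calculation, assuming a hypothetical 4 GiB model trained for the 10 epochs used in this example:

  # 4 GiB per saved checkpoint x 10 epochs = 40 GiB, plus a buffer,
  # so a request like the example's 50Gi leaves reasonable headroom:
  storage: 50Gi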
Verification
- In the OpenShift console, select your project from the Project list.
- Click ConfigMaps and verify that the training-config and twitter-complaints ConfigMaps are listed.
- Click Search. From the Resources list, select PersistentVolumeClaim and verify that the trained-model PVC is listed.
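If you prefer the command line, an equivalent check with the OpenShift CLI (assuming the example kfto namespace; substitute your project name):

  $ oc get configmaps training-config twitter-complaints -n kfto
  $ oc get pvc trained-model -n kfto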
5.2. Running the training job
You can run a training job to tune a model. The example training job in this section is based on the IBM and Hugging Face tuning example provided in GitHub.
Prerequisites
- You have access to a data science cluster that is configured to run distributed workloads as described in Managing distributed workloads.
- You have created a data science project. For information about how to create a project, see Creating a data science project.
- You have Admin access for the data science project.
  - If you created the project, you automatically have Admin access.
  - If you did not create the project, your cluster administrator must give you Admin access.
- You have access to a model.
- You have access to data that you can use to train the model.
- You have configured the training job as described in Configuring the training job.
Procedure
In a terminal window, log in to the OpenShift CLI as shown in the following example:
$ oc login <openshift_cluster_url> -u <username> -p <password>

Create a PyTorch training job, as follows:
- Create a YAML file named pytorchjob.yaml. Add the following PyTorchJob object definition:

  apiVersion: kubeflow.org/v1
  kind: PyTorchJob
  metadata:
    name: kfto-demo
    namespace: kfto
  spec:
    pytorchReplicaSpecs:
      Master:
        replicas: 1
        restartPolicy: Never
        template:
          spec:
            containers:
              - env:
                  - name: SFT_TRAINER_CONFIG_JSON_PATH
                    value: /etc/config/config.json
                image: 'quay.io/modh/fms-hf-tuning:release'
                imagePullPolicy: IfNotPresent
                name: pytorch
                volumeMounts:
                  - mountPath: /etc/config
                    name: config-volume
                  - mountPath: /data/input
                    name: dataset-volume
                  - mountPath: /data/output
                    name: model-volume
            volumes:
              - configMap:
                  items:
                    - key: config.json
                      path: config.json
                  name: training-config
                name: config-volume
              - configMap:
                  name: twitter-complaints
                name: dataset-volume
              - name: model-volume
                persistentVolumeClaim:
                  claimName: trained-model
    runPolicy:
      suspend: false

- Replace the example namespace value kfto with the name of your project, and update the other parameters to suit your environment.
- Edit the parameters of the PyTorch training job to provide the details for your training job and environment.
- Save your changes in the pytorchjob.yaml file.
- Apply the configuration to run the PyTorch training job:

  $ oc apply -f pytorchjob.yaml
Verification
- In the OpenShift console, select your project from the Project list.
- Click Workloads → Pods and verify that the <training-job-name>-master-0 pod is listed.
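Alternatively, you can check from the CLI (assuming the example kfto namespace and kfto-demo job name):

  $ oc get pytorchjobs -n kfto
  $ oc get pods -n kfto | grep kfto-demo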
5.3. Monitoring the training job
When you run a training job to tune a model, you can monitor the progress of the job. The example training job in this section is based on the IBM and Hugging Face tuning example provided in GitHub.
Prerequisites
- You have access to a data science cluster that is configured to run distributed workloads as described in Managing distributed workloads.
- You have created a data science project. For information about how to create a project, see Creating a data science project.
- You have Admin access for the data science project.
  - If you created the project, you automatically have Admin access.
  - If you did not create the project, your cluster administrator must give you Admin access.
- You have access to a model.
- You have access to data that you can use to train the model.
- You are running the training job as described in Running the training job.
Procedure
- In the OpenShift console, select your project from the Project list.
- Click Workloads → Pods. Search for the pod that corresponds to the PyTorch job, that is, <training-job-name>-master-0. For example, if the training job name is kfto-demo, the pod name is kfto-demo-master-0.
- Click the pod name to open the pod details page.
- Click the Logs tab to monitor the progress of the job and view status updates, as shown in the following example output:

  0%| | 0/10 [00:00<?, ?it/s]
  10%|█ | 1/10 [01:10<10:32, 70.32s/it]
  {'loss': 6.9531, 'grad_norm': 1104.0, 'learning_rate': 9e-06, 'epoch': 1.0}
  10%|█ | 1/10 [01:10<10:32, 70.32s/it]
  20%|██ | 2/10 [01:40<06:13, 46.71s/it]
  30%|███ | 3/10 [02:26<05:25, 46.55s/it]
  {'loss': 2.4609, 'grad_norm': 736.0, 'learning_rate': 7e-06, 'epoch': 2.0}
  30%|███ | 3/10 [02:26<05:25, 46.55s/it]
  40%|████ | 4/10 [03:23<05:02, 50.48s/it]
  50%|█████ | 5/10 [03:41<03:13, 38.66s/it]
  {'loss': 1.7617, 'grad_norm': 328.0, 'learning_rate': 5e-06, 'epoch': 3.0}
  50%|█████ | 5/10 [03:41<03:13, 38.66s/it]
  60%|██████ | 6/10 [04:54<03:22, 50.58s/it]
  {'loss': 3.1797, 'grad_norm': 1016.0, 'learning_rate': 4.000000000000001e-06, 'epoch': 4.0}
  60%|██████ | 6/10 [04:54<03:22, 50.58s/it]
  70%|███████ | 7/10 [06:03<02:49, 56.59s/it]
  {'loss': 2.9297, 'grad_norm': 984.0, 'learning_rate': 3e-06, 'epoch': 5.0}
  70%|███████ | 7/10 [06:03<02:49, 56.59s/it]
  80%|████████ | 8/10 [06:38<01:39, 49.57s/it]
  90%|█████████ | 9/10 [07:22<00:48, 48.03s/it]
  {'loss': 1.4219, 'grad_norm': 684.0, 'learning_rate': 1.0000000000000002e-06, 'epoch': 6.0}
  90%|█████████ | 9/10 [07:22<00:48, 48.03s/it]
  100%|██████████| 10/10 [08:25<00:00, 52.53s/it]
  {'loss': 1.9609, 'grad_norm': 648.0, 'learning_rate': 0.0, 'epoch': 6.67}
  100%|██████████| 10/10 [08:25<00:00, 52.53s/it]
  {'train_runtime': 508.0444, 'train_samples_per_second': 0.197, 'train_steps_per_second': 0.02, 'train_loss': 2.63125, 'epoch': 6.67}
  100%|██████████| 10/10 [08:28<00:00, 52.53s/it]
  100%|██████████| 10/10 [08:28<00:00, 50.80s/it]

  In the example output, the solid blocks indicate progress bars.
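You can also stream the same logs from the CLI (assuming the example kfto-demo job in the kfto namespace):

  $ oc logs -f kfto-demo-master-0 -n kfto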
Verification
- The <training-job-name>-master-0 pod is running.
- The Logs tab provides information about the job progress and job status.