Chapter 3. Training and evaluating the model


RHEL AI can use your taxonomy tree and synthetic data to create a newly trained model that includes your domain-specific knowledge or skills. You run the full training and evaluation process on the synthetic dataset that you generated. The LAB-optimized technique of multi-phase training is a type of LLM training that goes through multiple stages of training and evaluation. In each phase, RHEL AI runs the training process and produces model checkpoints, scores those checkpoints in the evaluation process, and selects the highest-scoring checkpoint. The best-scoring checkpoint from the final phase is your newly trained LLM.

The entire process creates a new model that is trained and evaluated using the synthetic data from your taxonomy tree.
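
The per-phase selection step can be illustrated with a minimal shell sketch. It assumes that checkpoint names and scores are already available as whitespace-separated pairs, one per line. The best_checkpoint function name and the scores shown are hypothetical and are not part of the ilab CLI, which performs this selection for you during multi-phase training.

```shell
#!/bin/sh
# Hypothetical helper: read "<checkpoint> <score>" pairs on stdin and
# print the name of the checkpoint with the highest score.
best_checkpoint() {
    awk 'NR == 1 || $2 + 0 > max { max = $2 + 0; best = $1 }
         END { if (NR > 0) print best }'
}

# Illustrative scores only; real scores come from the evaluation step.
printf '%s\n' \
    'samples_1711 6.10' \
    'samples_1945 6.81' \
    'samples_1456 5.92' | best_checkpoint
```

With the illustrative input, the function prints the name of the pair that has the largest second field.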

3.1. Training your model on the data

You can use Red Hat Enterprise Linux AI to train a model with your synthetically generated data. The following procedures show how to do this using the LAB multi-phase training strategy.

Important

Red Hat Enterprise Linux AI general availability does not support training and inference serving at the same time. If you have an inference server running, you must stop it before you start the training process.

Prerequisites

  • You installed RHEL AI with the bootable container image.
  • You downloaded the granite-7b-starter model.
  • You created a custom qna.yaml file with knowledge data.
  • You ran the synthetic data generation (SDG) process.
  • You downloaded the prometheus-8x7b-v2-0 judge model.
  • You have root user access on your machine.
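
As a quick check on the SDG prerequisite, the following sketch looks for the most recent training file of each kind. It assumes that the files follow the knowledge_train_msgs_<timestamp>.jsonl and skills_train_msgs_<timestamp>.jsonl naming that appears in the example paths in this section, so that sorting by name also sorts by time. The latest_dataset function is a hypothetical helper, not an ilab command.

```shell
#!/bin/sh
# Hypothetical helper: print the newest <prefix>_<timestamp>.jsonl file in a
# directory. Relies on the timestamp in the file name sorting lexicographically.
latest_dataset() {
    prefix="$1"
    dir="${2:-$HOME/.local/share/instructlab/datasets}"
    for f in "$dir"/"$prefix"_*.jsonl; do
        [ -e "$f" ] || continue    # skip the unexpanded glob when nothing matches
        printf '%s\n' "$f"
    done | LC_ALL=C sort | tail -n 1
}

# Prints nothing if no matching file exists yet.
latest_dataset knowledge_train_msgs
latest_dataset skills_train_msgs
```

An empty result for either call suggests that SDG has not produced that kind of file in the assumed directory.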

Procedure

  1. Run multi-phase training and evaluation with the following command, using the data files generated by SDG.

    Note

    You can use the --enable-serving-output flag with the ilab model train command to display the training logs.

    $ ilab model train --strategy lab-multiphase \
            --phased-phase1-data ~/.local/share/instructlab/datasets/<knowledge-train-messages-jsonl-file> \
            --phased-phase2-data ~/.local/share/instructlab/datasets/<skills-train-messages-jsonl-file>

    where

    <knowledge-train-messages-jsonl-file>
    The location of the knowledge_train_msgs_<timestamp>.jsonl file generated during SDG. RHEL AI trains the student model granite-7b-starter using the data from this .jsonl file. Example path: ~/.local/share/instructlab/datasets/knowledge_train_msgs_2024-08-13T20_54_21.jsonl.
    <skills-train-messages-jsonl-file>
    The location of the skills_train_msgs_<timestamp>.jsonl file generated during SDG. RHEL AI trains the student model granite-7b-starter using the data from this .jsonl file. Example path: ~/.local/share/instructlab/datasets/skills_train_msgs_2024-08-13T20_54_21.jsonl.

    Important

    This process can take a long time, depending on your hardware specifications.

    1. The first phase trains the model using the synthetic data from your knowledge contribution.

      Example output of training knowledge

      Training Phase 1/2...
      TrainingArgs for current phase: TrainingArgs(model_path='/opt/app-root/src/.cache/instructlab/models/granite-7b-starter', chat_tmpl_path='/opt/app-root/lib64/python3.11/site-packages/instructlab/training/chat_templates/ibm_generic_tmpl.py', data_path='/tmp/jul19-knowledge-26k.jsonl', ckpt_output_dir='/tmp/e2e/phase1/checkpoints', data_output_dir='/opt/app-root/src/.local/share/instructlab/internal', max_seq_len=4096, max_batch_len=55000, num_epochs=2, effective_batch_size=128, save_samples=0, learning_rate=2e-05, warmup_steps=25, is_padding_free=True, random_seed=42, checkpoint_at_epoch=True, mock_data=False, mock_data_len=0, deepspeed_options=DeepSpeedOptions(cpu_offload_optimizer=False, cpu_offload_optimizer_ratio=1.0, cpu_offload_optimizer_pin_memory=False, save_samples=None), disable_flash_attn=False, lora=LoraOptions(rank=0, alpha=32, dropout=0.1, target_modules=('q_proj', 'k_proj', 'v_proj', 'o_proj'), quantize_data_type=<QuantizeDataType.NONE: None>))

    2. Then, RHEL AI evaluates all of the checkpoints from phase 1 model training using the Massive Multitask Language Understanding (MMLU) benchmark and passes the best-performing checkpoint to the next phase of training.

      Example output of evaluating knowledge

      MMLU evaluation for Phase 1...
      INFO 2024-08-15 01:23:40,975 lm-eval:152: Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234
      INFO 2024-08-15 01:23:40,976 lm-eval:189: Initializing hf model, with arguments: {'pretrained': '/tmp/e2e/phase1/checkpoints/hf_format/samples_26112', 'dtype': 'bfloat16'}
      
      Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]
      Loading checkpoint shards:  33%|███▎      | 1/3 [00:01<00:02,  1.28s/it]
      Loading checkpoint shards:  67%|██████▋   | 2/3 [00:02<00:01,  1.15s/it]
      Loading checkpoint shards: 100%|██████████| 3/3 [00:02<00:00,  1.36it/s]
      Loading checkpoint shards: 100%|██████████| 3/3 [00:02<00:00,  1.16it/s]

    3. The second phase trains the model using the synthetic data from your skills contribution.

      Example output of training skills

      Training Phase 2/2...
      TrainingArgs for current phase: TrainingArgs(model_path='/tmp/e2e/phase1/checkpoints/hf_format/samples_52096', chat_tmpl_path='/opt/app-root/lib64/python3.11/site-packages/instructlab/training/chat_templates/ibm_generic_tmpl.py', data_path='/usr/share/instructlab/sdg/datasets/skills.jsonl', ckpt_output_dir='/tmp/e2e/phase2/checkpoints', data_output_dir='/opt/app-root/src/.local/share/instructlab/internal', max_seq_len=4096, max_batch_len=55000, num_epochs=2, effective_batch_size=3840, save_samples=0, learning_rate=2e-05, warmup_steps=25, is_padding_free=True, random_seed=42, checkpoint_at_epoch=True, mock_data=False, mock_data_len=0, deepspeed_options=DeepSpeedOptions(cpu_offload_optimizer=False, cpu_offload_optimizer_ratio=1.0, cpu_offload_optimizer_pin_memory=False, save_samples=None), disable_flash_attn=False, lora=LoraOptions(rank=0, alpha=32, dropout=0.1, target_modules=('q_proj', 'k_proj', 'v_proj', 'o_proj'), quantize_data_type=<QuantizeDataType.NONE: None>))

    4. Then, RHEL AI evaluates all of the checkpoints from phase 2 model training using the Multi-turn Benchmark (MT-Bench) and returns the best performing checkpoint as the fully trained output model.

      Example output of evaluating skills

      MT-Bench evaluation for Phase 2...
      Using gpus from --gpus or evaluate config and ignoring --tensor-parallel-size configured in serve vllm_args
      INFO 2024-08-15 10:04:51,065 instructlab.model.backends.backends:437: Trying to connect to model server at http://127.0.0.1:8000/v1
      INFO 2024-08-15 10:04:53,580 instructlab.model.backends.vllm:208: vLLM starting up on pid 79388 at http://127.0.0.1:54265/v1
      INFO 2024-08-15 10:04:53,580 instructlab.model.backends.backends:450: Starting a temporary vLLM server at http://127.0.0.1:54265/v1
      INFO 2024-08-15 10:04:53,580 instructlab.model.backends.backends:465: Waiting for the vLLM server to start at http://127.0.0.1:54265/v1, this might take a moment... Attempt: 1/300
      INFO 2024-08-15 10:04:58,003 instructlab.model.backends.backends:465: Waiting for the vLLM server to start at http://127.0.0.1:54265/v1, this might take a moment... Attempt: 2/300
      INFO 2024-08-15 10:05:02,314 instructlab.model.backends.backends:465: Waiting for the vLLM server to start at http://127.0.0.1:54265/v1, this might take a moment... Attempt: 3/300
      INFO 2024-08-15 10:06:07,611 instructlab.model.backends.backends:472: vLLM engine successfully started at http://127.0.0.1:54265/v1

  2. After training is complete, a confirmation message displays your best-performing checkpoint.

    Example output of a complete multi-phase training run

    Training finished! Best final checkpoint: samples_1945 with score: 6.813759384

    Make a note of this checkpoint because the path is necessary for evaluation and serving.
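
If you saved the training output to a file, a small shell sketch like the following can pull the checkpoint name and score out of the confirmation line so that you can reuse them later. It assumes the message has the format shown in the example output of a complete run. The parse_best_checkpoint function name is hypothetical and is not part of the ilab CLI.

```shell
#!/bin/sh
# Hypothetical helper: read training output on stdin and print
# "<checkpoint> <score>" taken from the last confirmation line found.
parse_best_checkpoint() {
    sed -n 's/^.*Best final checkpoint: \([^ ]*\) with score: \([0-9.]*\).*$/\1 \2/p' |
        tail -n 1
}

# Example using the confirmation line from the sample output.
echo 'Training finished! Best final checkpoint: samples_1945 with score: 6.813759384' |
    parse_best_checkpoint
```

The first field of the result is the value to substitute for the <checkpoint> placeholder in the evaluation commands.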

Verification

  • When you train a model with ilab model train, multiple checkpoints are saved with the samples_ prefix, named for the number of data points they have been trained on. These are saved to the ~/.local/share/instructlab/phased/ directory.

    $ ls ~/.local/share/instructlab/phased/<phase1-or-phase2>/checkpoints/

    Example output of the new models

    samples_1711 samples_1945 samples_1456 samples_1462 samples_1903
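
Because the number after the samples_ prefix reflects how many data points a checkpoint was trained on, it can be convenient to list checkpoints in that order. The following is a minimal sketch, assuming the checkpoint directory contains entries named samples_<number>. The list_checkpoints_by_samples function and the CKPT_DIR variable are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Hypothetical helper: list samples_<N> entries in a directory,
# sorted numerically by N (fewest training samples first).
list_checkpoints_by_samples() {
    dir="$1"
    for path in "$dir"/samples_*; do
        [ -e "$path" ] || continue    # nothing matched the glob
        basename "$path"
    done | sort -t_ -k2,2n
}

# Set CKPT_DIR to your checkpoint directory; defaults to the current directory.
list_checkpoints_by_samples "${CKPT_DIR:-.}"
```

The last line of the listing is the checkpoint that has seen the most training data, which is not necessarily the best-scoring one.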

3.2. Evaluating your new model

If you want to measure the improvements of your new model, you can compare its performance to the base model with the evaluation process. You can also chat with the model directly to qualitatively identify whether the new model has learned the knowledge you created. If you want more quantitative results of the model improvements, you can run the evaluation process in the RHEL AI CLI with the following procedure.

Prerequisites

  • You installed RHEL AI with the bootable container image.
  • You created a custom qna.yaml file with skills or knowledge.
  • You ran the synthetic data generation process.
  • You trained the model using the RHEL AI training process.
  • You downloaded the prometheus-8x7b-v2-0 judge model.
  • You have root user access on your machine.

Procedure

  1. Navigate to your working Git branch where you created your knowledge data set.
  2. You can now run the evaluation process on different benchmarks. Each command needs the path to the trained samples_ checkpoint that you want to evaluate. You can access these checkpoints in your ~/.local/share/instructlab/checkpoints folder.

    1. If you want to measure how your knowledge contributions have impacted your model, run the mmlu_branch benchmark by executing the following command:

      $ ilab model evaluate \
          --benchmark mmlu_branch \
          --model ~/.local/share/instructlab/phased/<phase1-or-phase2>/checkpoints/hf_format/<checkpoint> \
          --tasks-dir ~/.local/share/instructlab/datasets/<node-dataset> \
          --base-model ~/.cache/instructlab/models/granite-7b-starter

      where

      <checkpoint>
      Specify the best scored checkpoint file generated during multi-phase training.
      <node-dataset>
      Specify the latest node_datasets directory that is in the ~/.local/share/instructlab/datasets/ directory.

      Example output

      # KNOWLEDGE EVALUATION REPORT
      
      ## BASE MODEL
      /home/<example-user>/.local/share/instructlab/models/instructlab/granite-7b-starter
      
      ## MODEL
      /home/<example-user>/.local/share/instructlab/models/instructlab/granite-7b-starter
      
      ### AVERAGE:
      +1.0(across 1)

    2. Optional with the sample taxonomy: If you want to measure how your skills contributions have impacted your model, run the mt_bench_branch benchmark by executing the following command:

      $ ilab model evaluate \
          --benchmark mt_bench_branch \
          --model ~/.local/share/instructlab/checkpoints/hf_format/<checkpoint> \
          --judge-model ~/.cache/instructlab/models/prometheus-8x7b-v2-0 \
          --branch <worker-branch> \
          --base-branch main \
          --gpus <num-gpus> \
          --enable-serving-output

      where

      <checkpoint>
      Specify the best scored checkpoint file generated during multi-phase training.
      <worker-branch>
      Specify the branch you used when adding data to your taxonomy tree.
      <num-gpus>

      Specify the number of GPUs you want to use for evaluation.

      Note

      Customizing skills is not currently supported in this version of Red Hat Enterprise Linux AI.

      Example output

      # SKILL EVALUATION REPORT
      
      ## BASE MODEL
      /home/example/.local/share/instructlab/models/instructlab/granite-7b-lab
      
      ## MODEL
      /home/example/.local/share/instructlab/models/instructlab/granite-7b-lab
      
      ### IMPROVEMENTS:
      1. compositional_skills/extraction/receipt/markdown/qna.yaml (+4.0)
      ...
      
      ### REGRESSIONS:
      1. compositional_skills/extraction/abstractive/title/qna.yaml (-5.0)
      ...
      
      ### NO CHANGE:
      2. compositional_skills/extraction/commercial_lease_agreement/csv/qna.yaml
      ...
      
      ### ERROR RATE:
      0.32

  3. Optional: If you did not run multi-phase training, you can manually evaluate each checkpoint using the MMLU and MT-Bench benchmarks. These benchmarks evaluate any model against a standardized set of knowledge or skills, which allows you to compare the scores of your own model against other LLMs. If you ran multi-phase training, these evaluations were already performed as part of the training process.

    1. If you want to see the evaluation score of your new model against a standardized set of knowledge data, run the mmlu benchmark with the following command:

      $ ilab model evaluate --benchmark mmlu --model ~/.local/share/instructlab/checkpoints/hf_format/<checkpoint>

      where

      <checkpoint>

      Specify one of the checkpoint files generated during multi-phase training.

      Example output

      # KNOWLEDGE EVALUATION REPORT
      
      ## MODEL
      /home/<example-user>/.local/share/instructlab/models/instructlab/granite-7b-lab
      
      ### AVERAGE:
      0.45 (across 3)
      
      ### SCORES:
      mmlu_abstract_algebra - 0.35
      mmlu_anatomy - 0.44
      mmlu_astronomy - 0.55

    2. If you want to see the evaluation score of your new model against a standardized set of skills, run the mt_bench benchmark with the following command:

      $ ilab model evaluate --benchmark mt_bench --model ~/.local/share/instructlab/checkpoints/hf_format/<checkpoint> --enable-serving-output

      where

      <checkpoint>

      Specify one of the checkpoint files generated during multi-phase training.

      Example output

      # SKILL EVALUATION REPORT
      
      ## MODEL
      /home/<example-user>/.local/share/instructlab/models/instructlab/granite-7b-lab
      
      ### AVERAGE:
      8.07 (across 91)
      
      ### TURN ONE:
      8.64
      
      ### TURN TWO:
      7.19
      
      ### ERROR RATE:
      0.43
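
To compare runs, it can help to pull the headline number out of a saved report. The following sketch assumes the report layout shown in the example outputs in this section, where the average appears on the first non-empty line after the ### AVERAGE: heading. The report_average function is a hypothetical helper, not an ilab subcommand.

```shell
#!/bin/sh
# Hypothetical helper: read an evaluation report on stdin and print the
# value that follows the "### AVERAGE:" heading, without "(across N)".
report_average() {
    awk '
        /^[ \t]*### AVERAGE:/ { found = 1; next }
        found && NF {
            sub(/\(.*$/, "")      # drop the "(across N)" suffix
            gsub(/[ \t]/, "")     # trim remaining whitespace
            print
            exit
        }
    '
}

# Example using part of the sample MT-Bench report.
report_average <<'EOF'
# SKILL EVALUATION REPORT

### AVERAGE:
8.07 (across 91)

### TURN ONE:
8.64
EOF
```

Running the same helper on the reports for two different checkpoints gives a pair of numbers that you can compare directly.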

© 2024 Red Hat, Inc.