Chapter 4. YAML creation practices for optimizing model performance
The guidelines referenced in Adding knowledge to your taxonomy tree and Adding skills to your taxonomy tree provide standard instructions for creating skills and knowledge YAML files. However, there are ways you can improve your YAML file to optimize the generated synthetic data and create a higher-quality model.
Diverse and comprehensive content in the context field of the YAML file
Each context block should contain a variety of information and format types from your document. This allows the model to learn different methods of presenting information. These presentation types can include paragraphs, different types of tables, lists, procedures, and definitions.
The context block should be a comprehensive example from your document. The total length of the context content and the Q&A pairs should not exceed 750 tokens.
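The 750-token limit is easier to respect if you check it while drafting. The following Python sketch totals an approximate token count for one context block and its Q&A pairs. It is a hypothetical helper, not part of any InstructLab tool, and it assumes that splitting text into words and punctuation marks is a reasonable proxy for tokens. The tokenizer your model actually uses will produce different counts, so leave some margin below the limit.

```python
# Minimal sketch: estimate whether one seed example stays under the
# 750-token guideline. ASSUMPTION: tokens are approximated by counting
# words and punctuation marks; your model's real tokenizer will differ.
import re

TOKEN_BUDGET = 750  # limit stated in the guidelines


def approx_token_count(text: str) -> int:
    """Hypothetical approximation: count word runs and punctuation marks."""
    return len(re.findall(r"\w+|[^\w\s]", text))


def seed_example_tokens(context: str, qa_pairs: list[tuple[str, str]]) -> int:
    """Approximate total tokens for a context block plus its Q&A pairs."""
    total = approx_token_count(context)
    for question, answer in qa_pairs:
        total += approx_token_count(question) + approx_token_count(answer)
    return total


def within_budget(context: str, qa_pairs: list[tuple[str, str]],
                  budget: int = TOKEN_BUDGET) -> bool:
    """True if the approximate total does not exceed the budget."""
    return seed_example_tokens(context, qa_pairs) <= budget


if __name__ == "__main__":
    context = "Preheat the oven to 375 F. Beat two eggs into the butter and sugar."
    pairs = [("How many eggs does the recipe use?", "The recipe uses two eggs.")]
    print(seed_example_tokens(context, pairs), within_budget(context, pairs))
```

Because the heuristic only approximates tokenization, a result close to 750 is a signal to trim the context rather than proof that the example fits.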
Writing effective questions
The questions should align with the type of questions you want the model to be capable of answering. Each question should be unique and reference the information in the context field. Writing questions as full sentences improves the generated synthetic data and the quality of the model's responses.
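To catch repeated or fragmentary questions before you generate data, you can run a quick check such as the following Python sketch. The helper names are hypothetical, and both rules are assumptions: two questions are treated as duplicates when they match after lowercasing and removing trailing punctuation and extra whitespace, and a full-sentence question is approximated as text of at least four words that ends with a question mark.

```python
# Minimal sketch: flag repeated or fragmentary questions in a seed example.
# ASSUMPTION: duplicates are questions that match after lowercasing,
# collapsing whitespace, and trimming trailing punctuation.
import re


def normalize(question: str) -> str:
    """Lowercase, collapse whitespace, and drop trailing punctuation."""
    collapsed = re.sub(r"\s+", " ", question.strip().lower())
    return collapsed.rstrip("?.! ")


def find_duplicate_questions(questions: list[str]) -> list[str]:
    """Return each question whose normalized form appeared earlier."""
    seen: set[str] = set()
    duplicates = []
    for question in questions:
        key = normalize(question)
        if key in seen:
            duplicates.append(question)
        else:
            seen.add(key)
    return duplicates


def looks_like_full_sentence(question: str, min_words: int = 4) -> bool:
    """Hypothetical heuristic: ends with '?' and has several words."""
    return question.strip().endswith("?") and len(question.split()) >= min_words
```

A heuristic like this cannot judge whether a question is meaningful, so use it only to surface candidates for manual review.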
Writing effective answers
The answer should respond directly to the question and reflect the type of answer you want the model to be capable of providing. Answers should be written in complete sentences and reference the original question. Writing answers as full sentences improves the generated synthetic data and the quality of the model's responses.
The answer should not be copied directly from the context block; copying can cause the model to learn extraction instead of reasoning.
The information needed to answer the question must be in the same context block. If the information is in a separate context block, or is not present at all, the model can hallucinate.
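One way to spot answers that were copied from the context is to measure the longest run of consecutive words that an answer shares with its context block. The Python sketch below does this with a word-level longest-common-substring computation. The helper names and the 8-word threshold are assumptions for illustration. A short shared phrase, such as a product name, is normal, so treat a flagged answer as a prompt to rephrase rather than as a hard error.

```python
# Minimal sketch: flag answers that copy a long run of words from the context.
# ASSUMPTION: a shared run of 8 or more consecutive words counts as copying;
# choose a threshold that suits your documents.
import re


def words(text: str) -> list[str]:
    """Lowercased word tokens, ignoring punctuation."""
    return re.findall(r"\w+", text.lower())


def longest_shared_run(answer: str, context: str) -> int:
    """Length of the longest run of consecutive words found in both texts."""
    a, c = words(answer), words(context)
    best = 0
    # prev[j] holds the length of the shared run ending at a[i-1] and c[j-1]
    prev = [0] * (len(c) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(c) + 1)
        for j in range(1, len(c) + 1):
            if a[i - 1] == c[j - 1]:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def copies_context(answer: str, context: str, max_run: int = 8) -> bool:
    """True if the answer shares at least max_run consecutive words with the context."""
    return longest_shared_run(answer, context) >= max_run
```

An answer that repeats a whole sentence from the context produces a long shared run, while a rephrased answer such as "You need around two eggs to make 24 chocolate chip cookies." typically shares only short phrases.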
Example of a high-quality question-and-answer pair
- question: How many eggs are needed to make roughly 24 chocolate chip cookies?
  answer: You need around two eggs to make 24 chocolate chip cookies.
When to use multiple documents or multiple qna.yaml files
If multiple documents are related to a similar subject or domain, it is recommended to use a single qna.yaml file. Each qna.yaml file must contain a single document type; you cannot mix document types in one YAML file.
If the documents are not related, it is recommended to use separate qna.yaml files.
Adding links to your YAML file
Models can memorize links, so you can add them to your YAML file. However, it is recommended to avoid adding hyperlinks that change frequently.