Creating skills and knowledge YAML files
Guidelines on creating skills and knowledge YAML files
Chapter 1. Customizing your taxonomy tree
You can modify a taxonomy tree with knowledge or skills data in your RHEL AI environment to create your own custom Granite Large Language Model (LLM). On RHEL AI, knowledge and skills data sets are formatted in YAML. This YAML configuration is called a qna.yaml file, where "qna" stands for question and answer. A taxonomy tree is a categorization and information classification method that holds your qna.yaml files.
The following documentation sections describe how to create skill and knowledge qna.yaml files for your taxonomy tree.
The following knowledge document types are currently supported for training the starter Granite LLM:
- Markdown
1.1. Overview of skill and knowledge
You can use skill and knowledge sets and specify domain-specific information to teach your custom model.
- Knowledge: A dataset that consists of information and facts. When you create knowledge data for a model, you provide it with additional data and information so the model can answer questions more accurately.
- Skills: A dataset where you can teach the model how to do a task. Skills on RHEL AI are split into the following categories:
  - Compositional skills: Skills that allow AI models to perform specific tasks or functions. There are two types of compositional skills:
    - Freeform compositional skills: Performative skills that do not require additional context or information to function.
    - Grounded compositional skills: Performative skills that require additional context. For example, you can teach the model to read a table, where the additional context is an example of the table layout.
  - Foundational skills: Skills that involve math, reasoning, and coding.
Chapter 2. Adding knowledge to your taxonomy tree
You can customize your taxonomy tree so a model can learn domain-specific information. For RHEL AI, your knowledge data is hosted in a Git repository. The qna.yaml file for a knowledge contribution shows the model how to read the document that you want to teach it. Each qna.yaml file for knowledge contains a set of key-value entries with the following keys:
| Field | Description | Constraints | Example |
|---|---|---|---|
| version | The taxonomy schema version used in the qna.yaml file. | The currently supported value for this parameter is 3. | |
| created_by | The name or username of the contributor. | - | |
| domain | The subject or category of the knowledge document. The domain is used in prompts to the teacher model during synthetic data generation and adds additional context. | It is recommended that your | For a knowledge document about the Phoenix constellation, the domain would be astronomy. For a knowledge document about Health insurance information, the domain would be |
| seed_examples | The field that contains the question-and-answer pairs with context from the knowledge document. | At least five seed examples are required in your qna.yaml file. | |
| context | A chunk of information taken exactly from the knowledge document. Highlight different types of content, including tables, paragraphs, or lists, to help guide the teacher model. | Each qna.yaml file needs five context blocks, and each context block has a maximum token count of 500 tokens. | |
| questions_and_answers | The field that contains the questions and answers for your model to learn. | Each | |
| question | A question related to and grounded in the relevant context. Provide a variety of questions and question types, including fact-based, reasoning, or clarifications. | Maximum token count of 250 tokens. | |
| answer | The answer to the specified question. Answers should be in complete sentences and be referenced in the context. | Maximum token count of 250 tokens. | |
| document_outline | A brief summary of the document, similar to a thesis statement. This provides a high-level context for the content of the document. | Must be detailed and reference the content in the context fields. | |
| document | The field that contains the source of your knowledge data. | - | |
| repo | The URL to your git repository that contains your knowledge files. | - | |
| commit | The full commit hash that corresponds to the document in the repository. | - | - |
| patterns | Contains the file(s) in your git repository. | Valid document types include .md or .pdf. | |
Additional resources for creating a qna.yaml
- For a full qna.yaml file with example parameters, see the Sample knowledge YAML specifications documentation.
- For full guidelines on the YAML curation process, see the YAML creation practices for optimizing model performance documentation.
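The structural constraints for a knowledge qna.yaml file can be checked with a short script before you run synthetic data generation. The following Python sketch is not part of the RHEL AI toolset. It assumes that the qna.yaml file was already parsed into a dictionary by a YAML library, the function name check_knowledge_qna is hypothetical, and treating every listed key as required is an assumption. It checks only the version value, the minimum of five seed examples, and the minimum of three question-and-answer pairs per context that this chapter requires.

```python
# Hypothetical helper, not part of the ilab CLI. It assumes the qna.yaml
# file was already parsed into a dict by a YAML library of your choice.
TOP_LEVEL_KEYS = ("version", "created_by", "domain", "seed_examples",
                  "document_outline", "document")
DOCUMENT_KEYS = ("repo", "commit", "patterns")


def check_knowledge_qna(qna):
    """Return a list of readable problems; an empty list means none were found."""
    # Assumption: every key listed in the table is treated as required.
    problems = [f"missing key: {key}" for key in TOP_LEVEL_KEYS if key not in qna]

    if qna.get("version") != 3:
        problems.append("version must be 3")

    seeds = qna.get("seed_examples") or []
    if len(seeds) < 5:
        problems.append(f"need at least 5 seed examples, found {len(seeds)}")

    for number, seed in enumerate(seeds, start=1):
        if not str(seed.get("context", "")).strip():
            problems.append(f"seed example {number}: context is empty")
        pairs = seed.get("questions_and_answers") or []
        if len(pairs) < 3:
            problems.append(
                f"seed example {number}: need at least 3 question-and-answer "
                f"pairs, found {len(pairs)}"
            )

    document = qna.get("document") or {}
    problems += [f"document is missing: {key}" for key in DOCUMENT_KEYS
                 if key not in document]
    return problems
```

A check like this does not replace ilab taxonomy diff, which remains the supported way to validate the taxonomy tree. It only catches the counting rules early.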
2.1. Creating a knowledge YAML file
The following procedure shows how to create a qna.yaml file that teaches your LLM about your provided knowledge files using the RHEL AI toolset.
Prerequisites
- You installed RHEL AI with the bootable container image.
- You installed the git CLI.
- You initialized InstructLab and can use the ilab CLI.
- You have root user access on your machine.
Procedure
- Because you are hosting your knowledge files in a git repository, check out a working branch before updating your taxonomy.
- Navigate to the taxonomy folder. RHEL AI includes a ready-made taxonomy tree to interact with.
- Navigate to the knowledge folder in the taxonomy directory.
- Add directories and folders in the taxonomy tree where you want to add your knowledge qna.yaml file.
  Example file path in the taxonomy tree
  taxonomy/knowledge/technical_documents/product_customer_cases/qna.yaml
- Using your desired text editor, create the qna.yaml file. Your YAML file must be named qna.yaml.
  Note: For SDG to run properly, you must include at least five context chunks and three question and answer seeds per context value in the questions_and_answers parameter.
- Add the necessary keys to the qna.yaml file and save your changes. For more information about formatting your qna.yaml file, see "Sample knowledge YAML specifications".
Verification
To verify that your knowledge qna.yaml file is in the proper format, you can run the following command:
$ ilab taxonomy diff
The CLI displays whether your taxonomy tree and qna.yaml file are valid and properly formatted. The CLI also shows you where you can fix any errors you encounter.
Example output of valid taxonomy tree and qna.yaml file
knowledge/technical_documents/product_customer_cases/qna.yaml
Taxonomy in /taxonomy/ is valid :)
Example output of invalid taxonomy tree and qna.yaml file with errors
9:15 error syntax error: mapping values are not allowed here (syntax)
Reading taxonomy failed with the following error: 1 taxonomy with errors! Exiting.
2.1.1. Sample knowledge YAML specifications
The qna.yaml file for a knowledge contribution shows the model how to read the document that you want to teach it. On RHEL AI, the synthetic data generation (SDG) process uses your qna.yaml seed examples to create a large quantity of artificial data. This process gives the model more data to learn from, rather than relying exclusively on the provided samples.
Example knowledge qna.yaml file
version: 3
domain: astronomy
document_outline: |
  Information about the Phoenix Constellation including the
  history, characteristics, and features of the stars in the constellation.
created_by: <user-name>
seed_examples:
  - context: |
      **Phoenix** is a minor constellation in the southern sky. Named after the mythical
      phoenix, it was first depicted on a celestial atlas by Johann Bayer in his 1603
      *Uranometria*. The French explorer and astronomer Nicolas Louis de Lacaille charted the brighter stars
      and gave their Bayer designations in 1756. The constellation stretches from roughly −39 degrees to −57
      degrees declination, and from 23.5h to 2.5h of right ascension. The constellations Phoenix, Grus,
      Pavo and Tucana are known as the Southern Birds.
    questions_and_answers:
      - question: |
          What is the Phoenix constellation?
        answer: |
          The Phoenix constellation is a minor constellation in the southern sky.
      - question: |
          Who charted the Phoenix constellation?
        answer: |
          The Phoenix constellation was charted by French explorer and
          astronomer Nicolas Louis de Lacaille.
      - question: |
          How far does the Phoenix constellation stretch?
        answer: |
          The Phoenix constellation stretches from roughly −39° to −57°
          declination, and from 23.5h to 2.5h of right ascension.
  - context: |
      Phoenix was the largest of the 12 constellations established by Petrus Plancius from the observations
      of Pieter Dirkszoon Keyser and Frederick de Houtman. It first appeared on a 35cm diameter celestial globe
      published in 1597 (or 1598) in Amsterdam by Plancius with Jodocus Hondius. The first depiction of this
      constellation in a celestial atlas was in Johann Bayer's *Uranometria* of 1603. De Houtman included it in
      his southern star catalog the same year under the Dutch name *Den voghel Fenicx*, "The Bird Phoenix",
      symbolising the phoenix of classical mythology. One name of the brightest star Alpha Phoenicis—Ankaa—is
      derived from the Arabic: العنقاء, romanized: al-‘anqā’, lit. 'the phoenix', and was coined sometime after
      1800 in relation to the constellation.
    questions_and_answers:
      - question: |
          What is the brightest star in the Phoenix constellation
          called?
        answer: |
          Alpha Phoenicis or Ankaa is the brightest star in the Phoenix
          Constellation.
      - question: Where did the Phoenix constellation first appear?
        answer: |
          The Phoenix constellation first appeared on a 35-cm diameter
          celestial globe published in 1597 (or 1598) in Amsterdam by
          Plancius with Jodocus Hondius.
      - question: |
          What does "The Bird Phoenix" symbolize?
        answer: |
          "The Bird Phoenix" symbolizes the phoenix of classical mythology.
  - context: |
      Phoenix is a small constellation bordered by Fornax and Sculptor to the north, Grus to the west,
      Tucana to the south, touching on the corner of Hydrus to the south, and Eridanus to the east and southeast.
      The bright star Achernar is nearby. The three-letter abbreviation for the constellation, as adopted by
      the International Astronomical Union in 1922, is "Phe". The official constellation boundaries,
      as set by Belgian astronomer Eugène Delporte in 1930, are defined by a polygon of 10 segments.
      In the equatorial coordinate system, the right ascension coordinates of these borders lie
      between 23<sup>h</sup> 26.5<sup>m</sup> and 02<sup>h</sup> 25.0<sup>m</sup>,
      while the declination coordinates are between −39.31° and −57.84°. This means it remains
      below the horizon to anyone living north of the 40th parallel in the Northern
      Hemisphere, and remains low in the sky for anyone living north of the equator. It is most
      visible from locations such as Australia and South Africa during late Southern Hemisphere spring.
      Most of the constellation lies within, and can be located by, forming a triangle of the bright
      stars Achernar, Fomalhaut and Beta Ceti—Ankaa lies roughly in the centre of this.
    questions_and_answers:
      - question: What are the characteristics of the Phoenix constellation?
        answer: |
          Phoenix is a small constellation bordered by Fornax and Sculptor to
          the north, Grus to the west, Tucana to the south, touching on the
          corner of Hydrus to the south, and Eridanus to the east and southeast.
          The bright star Achernar is nearby.
      - question: |
          When is the phoenix constellation most visible?
        answer: |
          Phoenix is most visible from locations such as Australia and
          South Africa during late Southern Hemisphere spring.
      - question: |
          What are the Phoenix Constellation boundaries?
        answer: |
          The official constellation boundaries for Phoenix, as set by Belgian
          astronomer Eugène Delporte in 1930, are defined by a polygon of 10
          segments.
  - context: |
      Ten stars have been found to have planets to date, and four planetary systems have been
      discovered with the SuperWASP project. HD 142 is a yellow giant that has an apparent magnitude
      of 5.7, and has a planet (HD 142b) 1.36 times the mass of Jupiter which orbits every 328 days.
      HD 2039 is a yellow subgiant with an apparent magnitude of 9.0 around 330 light years away which
      has a planet (HD 2039) six times the mass of Jupiter. WASP-18 is a star of magnitude 9.29 which
      was discovered to have a hot Jupiter-like planet (WASP-18b) taking less than a day to orbit the star.
      The planet is suspected to be causing WASP-18 to appear older than it really is. WASP-4 and WASP-5
      are solar-type yellow stars around 1000 light years distant and of 13th magnitude, each with a single
      planet larger than Jupiter. WASP-29 is an orange dwarf of spectral type K4V and visual magnitude 11.3,
      which has a planetary companion of similar size and mass to Saturn. The planet completes an orbit
      every 3.9 days.
    questions_and_answers:
      - question: In the Phoenix constellation, how many stars have planets?
        answer: |
          In the Phoenix constellation, ten stars have been found to have
          planets to date, and four planetary systems have been discovered
          with the SuperWASP project.
      - question: |
          What is HD 142?
        answer: |
          HD 142 is a yellow giant that has an apparent magnitude of 5.7, and
          has a planet (HD 142 b) 1.36 times the mass of Jupiter which
          orbits every 328 days.
      - question: |
          Are WASP-4 and WASP-5 solar-type yellow stars?
        answer: |
          Yes, WASP-4 and WASP-5 are solar-type yellow stars around 1000 light
          years distant and of 13th magnitude, each with a single planet
          larger than Jupiter.
  - context: |
      The constellation does not lie on the galactic plane of the Milky Way, and there are no prominent star
      clusters. NGC 625 is a dwarf irregular galaxy of apparent magnitude 11.0 and lying some 12.7 million
      light years distant. Only 24000 light years in diameter, it is an outlying member of the Sculptor Group.
      NGC 625 is thought to have been involved in a collision and is experiencing a burst of active star formation.
      NGC 37 is a lenticular galaxy of apparent magnitude 14.66. It is approximately 42 kiloparsecs (137,000
      light-years) in diameter and about 12.9 billion years old. Robert's Quartet (composed of the irregular galaxy
      NGC 87, and three spiral galaxies NGC 88, NGC 89 and NGC 92) is a group of four galaxies located around 160 million
      light-years away which are in the process of colliding and merging. They are within a circle of radius of 1.6 arcmin,
      corresponding to about 75,000 light-years. Located in the galaxy ESO 243-49 is HLX-1, an intermediate-mass
      black hole—the first one of its kind identified. It is thought to be a remnant of a dwarf galaxy that was absorbed
      in a collision with ESO 243-49. Before its discovery, this class of black hole was only hypothesized.
    questions_and_answers:
      - question: |
          Is the Phoenix Constellation part of the Milky Way?
        answer: |
          The Phoenix constellation does not lie on the galactic plane of
          the Milky Way, and there are no prominent star clusters.
      - question: |
          How many light years away is NGC 625?
        answer: |
          NGC 625 is some 12.7 million light years away and is an outlying
          member of the Sculptor Group.
      - question: |
          What is Robert's Quartet composed of?
        answer: |
          Robert's Quartet is composed of the irregular galaxy NGC 87,
          and three spiral galaxies NGC 88, NGC 89 and NGC 92.
document:
  repo: https://github.com/<profile>/<repo-name>
  commit: <commit hash>
  patterns:
    - phoenix_constellation.md
    - phoenix_history.md
- version: Specify the version of the knowledge qna.yaml format. Currently, the valid value is 3.
- domain: Specify the subject or category of the document. For example, "Technical Documents" or "Installation Guides".
- document_outline: Specify an outline of the document’s contents. It is recommended to reference the subjects that you include in the context parameter in the document_outline field. For example, if the document was an installation guide and each context included details on different cloud providers, the document_outline would be "Installation guides for AWS, GCP, and Azure".
- created_by: Specify your name or git username.
- context: Specify a paragraph of your knowledge data. This is the content that your questions and answers are based on. The format of the context block must match the format of your knowledge file. For example, if your knowledge document is in Markdown, your context block must also be in the Markdown format.
- question: Specify a question for the model. The question should be based on the information in the context field. For example, "What is the latest version of the product?".
- answer: Specify the desired response from the model. The information for the answer must come from the context block, but must not be copied from it, and the answer must be in complete sentences. For example, "The latest version of the product is version 1.5".
- repo: Specify the URL to the repository that holds your knowledge files.
- commit: Specify the SHA of the commit from your git repository of your knowledge files.
- patterns: Specify the documents in your git repository. Valid document type values include .md or .pdf. A single qna.yaml file can only reference one document type; mixing file types within the same qna.yaml file is not supported.
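The rule that a single qna.yaml file can reference only one document type can be checked from the patterns list alone. The following Python sketch is an illustration under stated assumptions: the function name document_type is hypothetical, it is not part of the ilab CLI, and it infers the document type only from the file extension of each pattern.

```python
from pathlib import PurePosixPath

# Document types that this chapter lists as valid for the patterns field.
SUPPORTED_SUFFIXES = {".md", ".pdf"}


def document_type(patterns):
    """Return the one file extension shared by all patterns, or raise ValueError."""
    if not patterns:
        raise ValueError("patterns is empty")
    # Compare extensions case-insensitively so README.MD counts as .md.
    suffixes = {PurePosixPath(pattern).suffix.lower() for pattern in patterns}
    unsupported = suffixes - SUPPORTED_SUFFIXES
    if unsupported:
        raise ValueError(f"unsupported document types: {sorted(unsupported)}")
    if len(suffixes) > 1:
        raise ValueError(f"patterns mix document types: {sorted(suffixes)}")
    return suffixes.pop()
```

For the sample file in this section, calling document_type with phoenix_constellation.md and phoenix_history.md returns .md, while a list that contains both an .md and a .pdf file raises an error.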
2.2. Creating a knowledge markdown file
On Red Hat Enterprise Linux AI version 1.5, you must host your knowledge documentation and data in a git repository and in markdown format. You can use the standard git workflow to create and upload files to your repository. There are various open source markdown conversion tools you can use, including:
- Pandoc: An open source conversion tool.
- Visual Studio Code with the Markdown All in One extension: You can open your document in Visual Studio Code and use the Markdown All in One extension to convert it to Markdown.
- IBM Deepsearch/Docling: Bundles PDF document conversion to JSON and markdown in a self-contained package.
Procedure
- Select your preferred git hosting platform. You can use any platform on RHEL AI as long as it’s compatible with git.
- Convert your documents into the .md markdown format. You can use any markdown conversion software you want for your knowledge data.
  The following list includes guidelines for knowledge markdown files:
  - All documents must be text; images are not currently supported.
  - Remove any footnotes from your documents.
  - Tables must be in markdown format.
  - Charts and graphs are currently not supported.
- Make a note of your file name and commit hash. These values are used in your qna.yaml file.
- Create and upload an md file into your git repository.
  Example markdown of a knowledge document
# Phoenix (constellation)

**Phoenix** is a minor constellation in the southern sky. Named after the mythical phoenix, it was first depicted on a celestial atlas by Johann Bayer in his 1603 *Uranometria*. The French explorer and astronomer Nicolas Louis de Lacaille charted the brighter stars and gave their Bayer designations in 1756. The constellation stretches from roughly −39 degrees to −57 degrees declination, and from 23.5h to 2.5h of right ascension. The constellations Phoenix, Grus, Pavo and Tucana, are known as the Southern Birds.

The brightest star, Alpha Phoenicis, is named Ankaa, an Arabic word meaning 'the Phoenix'. It is an orange giant of apparent magnitude 2.4. Next is Beta Phoenicis, actually a binary system composed of two yellow giants with a combined apparent magnitude of 3.3. Nu Phoenicis has a dust disk, while the constellation has ten star systems with known planets and the recently discovered galaxy clusters El Gordo and the Phoenix Cluster—located 7.2 and 5.7 billion light years away respectively, two of the largest objects in the visible universe. Phoenix is the radiant of two annual meteor showers: the Phoenicids in December, and the July Phoenicids.

## History

Phoenix was the largest of the 12 constellations established by Petrus Plancius from the observations of Pieter Dirkszoon Keyser and Frederick de Houtman. It first appeared on a 35-cm diameter celestial globe published in 1597 (or 1598) in Amsterdam by Plancius with Jodocus Hondius. The first depiction of this constellation in a celestial atlas was in Johann Bayer's *Uranometria* of 1603. De Houtman included it in his southern star catalog the same year under the Dutch name *Den voghel Fenicx*, "The Bird Phoenix", symbolizing the phoenix of classical mythology. One name of the brightest star Alpha Phoenicis—Ankaa—is derived from the Arabic: العنقاء, romanized: al-‘anqā’, lit. 'the phoenix', and was coined sometime after 1800 in relation to the constellation.

Celestial historian Richard Allen noted that unlike the other constellations introduced by Plancius and La Caille, Phoenix has actual precedent in ancient astronomy, as the Arabs saw this formation as representing young ostriches, *Al Ri'āl*, or as a griffin or eagle. In addition, the same group of stars was sometimes imagined by the Arabs as a boat, *Al Zaurak*, on the nearby river Eridanus. He observed, "the introduction of a Phoenix into modern astronomy was, in a measure, by adoption rather than by invention." The Chinese incorporated Phoenix's brightest star, Ankaa (Alpha Phoenicis), and stars from the adjacent constellation Sculptor to depict *Bakui*, a net for catching birds. Phoenix and the neighboring constellation of Grus together were seen by Julius Schiller as portraying Aaron the High Priest. These two constellations, along with nearby Pavo and Tucana, are called the Southern Birds.

## Characteristics

Phoenix is a small constellation bordered by Fornax and Sculptor to the north, Grus to the west, Tucana to the south, touching on the corner of Hydrus to the south, and Eridanus to the east and southeast. The bright star Achernar is nearby. The three-letter abbreviation for the constellation, as adopted by the International Astronomical Union in 1922, is "Phe". The official constellation boundaries, as set by Belgian astronomer Eugène Delporte in 1930, are defined by a polygon of 10 segments. In the equatorial coordinate system, the right ascension coordinates of these borders lie between 23<sup>h</sup> 26.5<sup>m</sup> and 02<sup>h</sup> 25.0<sup>m</sup>, while the declination coordinates are between −39.31° and −57.84°. This means it remains below the horizon to anyone living north of the 40th parallel in the Northern Hemisphere, and remains low in the sky for anyone living north of the equator. It is most visible from locations such as Australia and South Africa during late Southern Hemisphere spring.

Most of the constellation lies within, and can be located by, forming a triangle of the bright stars Achernar, Fomalhaut and Beta Ceti—Ankaa lies roughly in the centre of this.
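Two of the guidelines for knowledge markdown files, no images and no footnotes, can be spot-checked before you commit a converted document. The following Python sketch is a rough illustration, not a RHEL AI tool. The function name lint_knowledge_markdown is hypothetical, and the patterns assume the common inline image syntax, HTML img tags, and the [^label] footnote extension. Other image or footnote notations are not detected.

```python
import re

# Inline markdown images such as ![alt](file.png) and raw HTML <img> tags.
IMAGE = re.compile(r"!\[[^\]]*\]\([^)]*\)|<img\b", re.IGNORECASE)
# Footnote references and definitions in the [^label] extension syntax.
FOOTNOTE = re.compile(r"\[\^[^\]]+\]")


def lint_knowledge_markdown(text):
    """Return (line_number, message) tuples for lines that break the guidelines."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        if IMAGE.search(line):
            findings.append((number, "image found; images are not supported"))
        if FOOTNOTE.search(line):
            findings.append((number, "footnote found; remove footnotes"))
    return findings
```

An empty result means only that these two patterns were not found. Charts, graphs, and tables that are not in markdown format still need a manual review.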
Chapter 3. Adding skills to your taxonomy tree
You can teach the starter model custom skills by populating the qna.yaml file with your domain-specific skill. Each qna.yaml file for skills contains a set of key-value entries with the following keys:
| Field | Description | Constraints |
|---|---|---|
| version | The version of the skill qna.yaml format. | The currently supported value for this parameter is 2. |
| created_by | Your Git username or name of contributor. | None |
| task_description | A description of your skill and its function. | None |
| seed_examples | A collection of key and value entries. | Each |
| context | Grounded skills require the user to provide additional context containing information that the model needs to know for executing the skill. | This field is required for grounded skills. Each |
| question | Specify a question for the model. | Each |
| answer | Specify an answer for the model. | Each |
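The structure of a skill qna.yaml file can be checked in the same spirit as a knowledge file. The following Python sketch is hypothetical and not part of the ilab CLI. It assumes that the file was already parsed into a dictionary, that created_by and task_description must be present, and that the caller states whether the skill is grounded. It also applies the minimum of five question and answer pairs that the procedure in this chapter requires for SDG.

```python
def check_skill_qna(qna, grounded=False):
    """Return a list of readable problems; an empty list means none were found."""
    problems = []
    if qna.get("version") != 2:
        problems.append("version must be 2")
    # Assumption: both descriptive keys are treated as required.
    for key in ("created_by", "task_description"):
        if not qna.get(key):
            problems.append(f"missing key: {key}")

    seeds = qna.get("seed_examples") or []
    if len(seeds) < 5:
        problems.append(f"need at least 5 seed examples, found {len(seeds)}")

    for number, seed in enumerate(seeds, start=1):
        for key in ("question", "answer"):
            if not str(seed.get(key, "")).strip():
                problems.append(f"seed example {number}: missing {key}")
        # Only grounded skills need a context block for every seed example.
        if grounded and not str(seed.get("context", "")).strip():
            problems.append(f"seed example {number}: grounded skills require context")
    return problems
```

Passing grounded=True is a design choice in this sketch: the file itself does not say which kind of skill it holds, so the caller supplies that information, for example based on the directory the file lives in.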
3.1. Creating a skill YAML file
You can customize your taxonomy tree so the model can learn new skills for your desired use cases. The following procedure shows how to create a taxonomy tree that contains your skill qna.yaml file.
Prerequisites
- You installed RHEL AI with the bootable container image.
- You initialized InstructLab and can use the ilab CLI.
- You have root user access on your machine.
Procedure
- Navigate to the compositional_skills folder in the taxonomy directory.
  Important: Creating foundational skills, which are skills that involve creating code, is not currently supported on RHEL AI version 1.5.
- Based on the directories that exist in the tree, select where in the tree you want to add your skill qna.yaml file.
  Example file path in the taxonomy tree
  taxonomy/compositional_skills/grounded/<add_example>/qna.yaml
- Using your desired text editor, create the qna.yaml file.
  Note: For SDG to run properly, you must include at least five question and answer pair examples in your qna.yaml file.
- Add the necessary keys to the qna.yaml file and save your changes. For more information on formatting your qna.yaml file, see "Sample skill YAML specifications".
Verification
To verify that your skill is in the proper format, you can run the following command:
$ ilab taxonomy diff
The CLI displays whether your taxonomy tree and qna.yaml file are valid and properly formatted. The CLI also displays where to fix any errors you may encounter.
Example output of valid taxonomy tree and qna.yaml file
compositional_skills/writing/freeform/<example>/qna.yaml
Taxonomy in /taxonomy/ is valid :)
Example output of invalid taxonomy tree and qna.yaml file with errors
6:11 error syntax error: mapping values are not allowed here (syntax)
Reading taxonomy failed with the following error: 1 taxonomy with errors! Exiting.
3.2. Sample skills YAML specifications
Skills use a question and answer layout similar to that of knowledge YAML files. On RHEL AI, the synthetic data generation (SDG) process uses your qna.yaml seed examples to create a large set of artificial data for the model to learn, rather than relying exclusively on user-generated data.
The order of the question, answer, and context pairs does not influence the SDG or training process. There are multiple types of skills that are split into categories: freeform, grounded, and foundational skills. You can see samples of each category in the following examples:
Example freeform compositional skill qna.yaml file
version: 2
created_by: <user-name>
task_description: 'Teach the model how to rhyme.'
seed_examples:
  - question: What are 5 words that rhyme with horn?
    answer: warn, torn, born, thorn, and corn.
  - question: What are 5 words that rhyme with cat?
    answer: bat, gnat, rat, vat, and mat.
  - question: What are 5 words that rhyme with poor?
    answer: door, shore, core, bore, and tore.
  - question: What are 5 words that rhyme with bank?
    answer: tank, rank, prank, sank, and drank.
  - question: What are 5 words that rhyme with bake?
    answer: wake, lake, steak, make, and quake.
Example grounded compositional skill qna.yaml file
version: 2
created_by: <user-name>
task_description: This skill provides the ability to read a markdown-formatted table.
seed_examples:
  - context: |
      | **Breed** | **Size** | **Barking** | **Energy** |
      |----------------|--------------|-------------|------------|
      | Afghan Hound | 25-27 in | 3/5 | 4/5 |
      | Labrador | 22.5-24.5 in | 3/5 | 5/5 |
      | Cocker Spaniel | 14.5-15.5 in | 3/5 | 4/5 |
      | Poodle (Toy) | <= 10 in | 4/5 | 4/5 |
    question: |
      Which breed has the most energy?
    answer: |
      The breed with the most energy is the Labrador.
  - context: |
      | **Name** | **Date** | **Color** | **Letter** | **Number** |
      |----------|----------|-----------|------------|------------|
      | George | Mar 5 | Green | A | 1 |
      | Gráinne | Dec 31 | Red | B | 2 |
      | Abigail | Jan 17 | Yellow | C | 3 |
      | Bhavna | Apr 29 | Purple | D | 4 |
      | Rémy | Sep 9 | Blue | E | 5 |
    question: |
      What is Gráinne's letter and what is her color?
    answer: |
      Gráinne's letter is B and her color is red.
  - context: |
      | Banana | Apple | Blueberry | Strawberry |
      |--------|------------|-----------|------------|
      | Yellow | Red, Green | Blue | Red |
      | Large | Medium | Small | Small |
      | Peel | Peel | No peel | No peel |
    question: |
      Which fruit is blue, small, and has no peel?
    answer: |
      The blueberry is blue, small, and has no peel.
- version: Specify the version of the skill qna.yaml format.
- created_by: Specify your name or git username.
- task_description: Specify a description of your skill and its function.
- context: Specify additional context containing information that the model needs to know for executing the skill. Required for grounded skills.
- question: Specify a question for the model.
- answer: Specify the desired response from the model.
Chapter 4. YAML creation practices for optimizing model performance
The guidelines referenced in Adding knowledge to your taxonomy tree and Adding skills to your taxonomy tree provide standard instructions on creating skills and knowledge YAML files. However, there are ways you can improve your YAML file to optimize the generated synthetic data and create a higher quality model.
Diverse and comprehensive content in the context field of the YAML file
Each context block should contain a variety of information and format types from your document. This allows the model to learn different methods of presenting information. These presentation types can include paragraphs, different types of tables, lists, procedures, and definitions.
The context block should be a comprehensive example from your document. The total length of the context content and the Q&A pairs should not exceed 750 tokens.
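The 750-token budget for a context block and its Q&A pairs can be estimated before you run SDG. The following Python sketch is only a rough approximation: it counts whitespace-separated words, which is an assumption and not the tokenizer that RHEL AI uses. Real token counts are typically higher than word counts, so treat a passing result as a first indication only. The function names are hypothetical.

```python
def estimate_tokens(text):
    """Very rough estimate: one token per whitespace-separated word (assumption)."""
    return len(text.split())


def seed_example_budget(seed, limit=750):
    """Return (estimated_total, within_limit) for one knowledge seed example."""
    total = estimate_tokens(seed["context"])
    for pair in seed["questions_and_answers"]:
        total += estimate_tokens(pair["question"]) + estimate_tokens(pair["answer"])
    return total, total <= limit
```

If an accurate count matters, replace estimate_tokens with a call to the tokenizer of the model that you train and keep the rest of the sketch unchanged.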
Writing effective questions
The questions should align with the type of questions you want the model to be capable of answering. Each question should be unique and reference the information in the context field. Including full sentence questions improves the generated synthetic data and improves model response quality.
Writing effective answers
The answer directly responds to the question and should reflect the type of answer you want the model to be capable of providing. Answers should be in complete sentences and reference the original question. Including full sentence answers improves the generated synthetic data and improves model response quality.
The answer should not be copied directly from the context block, because this can cause the model to learn extraction instead of reasoning.
The information to answer the question must be in the context block. If the information is in a separate context block, or is not referenced at all, the model can hallucinate.
Example of a high-quality question-and-answer pair
- question: How many eggs are needed to make roughly 24 chocolate chip cookies?
  answer: You need around two eggs to make 24 chocolate chip cookies.
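The guidance that an answer should not be copied directly from the context block can be spot-checked with a simple string comparison. The following Python sketch is a minimal illustration with hypothetical function names. It only detects an answer that appears verbatim in the context after whitespace and letter case are normalized. It does not judge whether a reworded answer is actually supported by the context.

```python
import re


def normalize(text):
    """Lowercase and collapse whitespace so line wrapping does not matter."""
    return re.sub(r"\s+", " ", text).strip().lower()


def is_copied_from_context(answer, context):
    """Return True when the whole answer appears verbatim in the context."""
    # Drop a trailing period so an answer that ends a sentence still matches
    # when the same words continue mid-sentence in the context.
    answer_norm = normalize(answer).rstrip(".")
    return bool(answer_norm) and answer_norm in normalize(context)
```

Answers that this check flags are candidates for rewording in your own complete sentences.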
When to use multiple documents or multiple qna.yaml files
If multiple documents are related to a similar subject or domain, it is recommended to use a single qna.yaml file. Each qna.yaml file must contain a single document type; you cannot mix document types in your YAML file.
If the documents are not related, it is recommended to use separate qna.yaml files.
Adding Links in your YAML file
Models can memorize links, so you can add them to your YAML file. However, it is recommended to avoid adding hyperlinks if they change frequently.