Tips & Tricks
📦 Preparing your data
Technically speaking, Llama 2 and Nous Hermes 2 should both be able to "understand" either templating format. However, we have seen that different base models perform better when their training data follows a specific formatting schema, and those schemas are reflected in our recommendations below. Feel free to experiment for yourself and see what gets the best results for your dataset and use case!
When fine-tuning on llama-7b-chat, we recommend formatting each data sample like this:
{ "inputs": "<s>[INST] <<SYS>>\n{{ system_prompt }}\n<</SYS>>\n\n{{ user_message }} [/INST] {{ response }} </s>" }
Note that the Llama 2 template offers a system prompt space to help shape the LLM responses; this is optional and may be omitted.
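Because the escape sequences are easy to get wrong by hand, you may prefer to build each sample programmatically. A minimal sketch (the helper name is ours, not part of the Gradient SDK):

```python
import json

def format_llama2_sample(user_message, response, system_prompt=None):
    """Build one Llama 2 chat-format training sample as a JSONL line."""
    # The system prompt block is optional and omitted entirely when unset.
    sys_block = f"<<SYS>>\n{system_prompt}\n<</SYS>>\n\n" if system_prompt else ""
    text = f"<s>[INST] {sys_block}{user_message} [/INST] {response} </s>"
    return json.dumps({"inputs": text})

line = format_llama2_sample(
    user_message="What is the address of the company known as Gradient?",
    response="123 Main Street, San Francisco, CA 94107",
    system_prompt="You are a helpful assistant.",
)
```

Each returned string is one line of your JSONL training file, with the newlines correctly escaped by `json.dumps`.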
For nous-hermes2, we recommend formatting each data sample like this:
{ "inputs": "<s>### Instruction:\n{{ user_message }}\n\n### Response:\n{{ response }}</s>" }
Example data points:
Llama 2 7b
{ "inputs": "<s>[INST] <<SYS>>\nYou are a helpful assistant who gives concise answers to questions about technology companies.\n<</SYS>>\n\nWhat is the address of the company known as Gradient? [/INST] 123 Main Street, San Francisco, CA 94107 </s>" }
Nous Hermes 2
{ "inputs": "<s>### Instruction:\nWhat is the address of the company known as Gradient?\n\n### Response:\n123 Main Street, San Francisco, CA 94107 </s>" }
It is important to have the correct number and placement of \n characters, spaces, and punctuation to get the best model performance.
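Since whitespace mistakes are easy to introduce and hard to spot, a quick validation pass over your JSONL file can catch them before you fine-tune. A minimal sketch for the Nous Hermes 2 template (the checks and function name are ours):

```python
import json

def check_nous_hermes_sample(jsonl_line):
    """Return a list of formatting problems found in one Nous Hermes 2 sample."""
    problems = []
    text = json.loads(jsonl_line)["inputs"]
    if not text.startswith("<s>### Instruction:\n"):
        problems.append("must start with '<s>### Instruction:\\n'")
    if "\n\n### Response:\n" not in text:
        problems.append("missing '\\n\\n### Response:\\n' separator")
    if not text.endswith("</s>"):
        problems.append("must end with '</s>'")
    return problems

good = '{"inputs": "<s>### Instruction:\\nWhat is Gradient?\\n\\n### Response:\\nAn AI platform.</s>"}'
issues = check_nous_hermes_sample(good)  # empty list when the sample is well-formed
```

Running this over every line of a training file before uploading is cheap insurance against silently degraded fine-tuning runs.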
⚙️ Adjust how your model learns
There are a few techniques you can use to adjust how your model is being fine-tuned.
Learning rate and rank
Parameter | Description | Python |
---|---|---|
Learning Rate | Determines how fast a model updates its knowledge during fine-tuning. | learning_rate: Optional[float] = None |
Rank | Sets the size of the low-rank weight updates applied during fine-tuning. A lower rank reduces the number of trainable parameters and computational needs while retaining performance. A higher rank is recommended for higher-complexity tasks and when you have more net-new data samples that are "dissimilar" to the data the foundation model was trained on (i.e., your own private data rather than publicly available datasets). | rank: Optional[int] = None |
You can set your learning rate and rank when you create your model adapter:
new_model_adapter = base_model.create_model_adapter(
    name="my test model adapter",
    learning_rate=0.1,
    rank=8,
)
To learn more about rank, please see the paper LoRA: Low-Rank Adaptation of Large Language Models.
Note: Learning rate and rank cannot be adjusted after initial fine-tuning. We also do not currently support passing an epoch argument.
Multiplier
You can also set a multiplier in your training data to adjust the learning rate on each sample. The multiplier applies a scaling factor to the learning rate so that you can control how quickly or slowly a model learns during fine-tuning. For example, consider these two lines in a fine-tuning dataset JSONL file:
{ "inputs": "<s>### Instruction:\nWhat is my name?\n\n### Response:\nYour name is Taylor</s>", "fineTuningParameters": { "multiplier": 2.0 } }
{ "inputs": "<s>### Instruction:\nWhat is my mother's name?\n\n### Response:\nHer name is Andrea</s>", "fineTuningParameters": { "multiplier": 1.5 } }
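If you are weighting many samples, it can be easier to attach the multiplier programmatically when you serialize your dataset. A small sketch using the fineTuningParameters field shown above (the helper name is ours):

```python
import json

def with_multiplier(inputs, multiplier=None):
    """Serialize one training sample, optionally attaching a learning-rate multiplier."""
    sample = {"inputs": inputs}
    if multiplier is not None:
        # Scales the learning rate for this sample only.
        sample["fineTuningParameters"] = {"multiplier": multiplier}
    return json.dumps(sample)

lines = [
    with_multiplier("<s>### Instruction:\nWhat is my name?\n\n### Response:\nYour name is Taylor</s>", 2.0),
    with_multiplier("<s>### Instruction:\nWhat day is it?\n\n### Response:\nIt is Tuesday</s>"),
]
```

Samples without a multiplier simply use the adapter's base learning rate.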
🔧 Adjust how your model generates completions
You can use a few parameters to adjust how your model generates a completion.
Completion length
Parameter | Description | Python | CLI |
---|---|---|---|
Completion Length | Adjusts your completion length by setting the maximum number of tokens. | max_generated_token_count: Optional[int] = None | --max-generated-token-count=<int> |
When using the Python SDK, you can add this parameter to the complete function:
completion = new_model_adapter.complete(query=sample_query, max_generated_token_count=128).generated_output
In the CLI, the parameter can be added at the end of your existing command:
$ gradient model complete 3c7a5aec-94c0-42bb-956c-5c186ff20bc7_model_adapter "<s>### Instruction:\nWhat is Gradient?\n\n### Response:\n" --max-generated-token-count=128
Temperature, top-k and top-p
You can use a few techniques to influence the content of the output generated from your model.
Parameter | Description | Python | CLI |
---|---|---|---|
Temperature | Adjusts the "sharpness" of the probability distribution. Higher temperature (>1) results in more randomness; lower temperature (closer to 0) results in more deterministic outputs. | temperature: Optional[float] = None | --temperature=<float> |
Top-K | Restricts the model to pick from k most likely words, adding diversity without extreme randomness. The value of k may need to be adjusted in different contexts. Common values are in the range of 40-100. | top_k: Optional[int] = None | --top-k=<int> |
Top-P | Restricts the model to pick the next word from the smallest subset of words whose cumulative probability reaches p. This may add more diversity than top-k because the number of words considered can vary. P is a float between 0 and 1, and in practice it is typically between 0.7-0.95. | top_p: Optional[float] = None | --top-p=<float> |
You can use a combination of these techniques to achieve the balance between diversity and coherency that you desire. We recommend experimenting to find the optimal settings for your specific use case.
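To build intuition for how these three parameters interact, here is a self-contained sketch of the standard temperature / top-k / top-p filtering pipeline. This illustrates the general sampling technique, not Gradient's internal implementation:

```python
import math

def filter_distribution(logits, temperature=1.0, top_k=None, top_p=None):
    """Return the token distribution after temperature, top-k, and top-p filtering."""
    # Temperature scaling: divide logits before the softmax.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Token indices sorted by probability, most likely first.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set(order)
    if top_k is not None:
        # Top-k: keep only the k most likely tokens.
        keep &= set(order[:top_k])
    if top_p is not None:
        # Top-p: keep the smallest prefix whose cumulative probability reaches p.
        kept, cum = set(), 0.0
        for i in order:
            kept.add(i)  # always keep at least the most likely token
            cum += probs[i]
            if cum >= top_p:
                break
        keep &= kept
    # Renormalize over the surviving tokens.
    mass = sum(probs[i] for i in keep)
    return [probs[i] / mass if i in keep else 0.0 for i in range(len(probs))]
```

With a high temperature the distribution flattens (more tokens get meaningful mass), while aggressive top-k or top-p values zero out the tail entirely, which is why the two are often tuned together.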
When using the Python SDK, you can add these parameters to the complete function:
completion = new_model_adapter.complete(query=sample_query, temperature=0.7, top_k=50).generated_output
In the CLI, the parameters can be added at the end of your existing command:
$ gradient model complete 3c7a5aec-94c0-42bb-956c-5c186ff20bc7_model_adapter "<s>### Instruction:\nWhat is Gradient?\n\n### Response:\n" --temperature=0.7 --top-k=50
🔁 Batch fine-tuning
There are often limits on how much data a model can be trained on at one time, which makes fine-tuning a custom LLM on a large dataset difficult.
Gradient makes this process easier by supporting batch fine-tuning. You can continue to train your models on virtually unlimited amounts of data by running the fine-tune command in the CLI with additional training data files. Note that if you turn off your computer, the fine-tuning process will halt and you will need to restart it (so don't close your laptop!).
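One simple way to batch-fine-tune over a large dataset is to split your JSONL into chunk files first and then run the fine-tune command once per chunk. A minimal chunking sketch (the file naming scheme is ours):

```python
from pathlib import Path

def split_jsonl(src_path, out_dir, batch_size):
    """Split one large JSONL file into numbered chunk files of batch_size lines each."""
    lines = Path(src_path).read_text().splitlines()
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    chunk_paths = []
    for n, start in enumerate(range(0, len(lines), batch_size)):
        p = out / f"chunk_{n:04d}.jsonl"
        p.write_text("\n".join(lines[start:start + batch_size]) + "\n")
        chunk_paths.append(p)
    return chunk_paths
```

You can then loop over the returned chunk files, passing each one to the fine-tune command in turn.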