Overview
Retrieval-augmented generation (RAG) improves the quality of LLM-generated responses by grounding the model in external sources of knowledge that supplement the LLM’s internal representation of information. RAG is particularly useful for reducing hallucinations and for citing specific sources in business use cases. With our Accelerator Block for RAG, you get a fully managed, production-grade RAG service that can be set up in seconds.
If you want to learn more about RAG, check out our RAG 101 guide here.
Create an account and workspace
If you haven't already, go to gradient.ai, click sign up and create your account. Once you have verified your email, log in to the account. Click "Create New Workspace" and give it a name.
You can see your workspaces at any time by going to https://auth.gradient.ai/select-workspace.
Create a RAG and add files
You can use either the web UI or the SDK to create a RAG and add files to it.
Using the Gradient UI
Gradient allows you to easily drop in the data you want to use, and the RAG pipeline is automatically constructed behind the scenes.
- Collect all the relevant documents you want your LLM to reference when generating a response.
- Log in to Gradient and select the workspace you want to use.
- Select the “RAG Collections” tab in the left sidebar, then click the “Create” button. Each RAG Collection should include the full set of documents you want the LLM to reference during completions for a particular use case.
- Name your RAG Collection and start adding files. Keep your browser tab open while the files are uploading.
Once the files have been uploaded, they are automatically chunked, embedded, and stored in a vector database behind the scenes. This process may take several minutes, so feel free to check back in a bit.
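To make the "embedded and stored in a vector database" step more concrete, here is a minimal, purely illustrative sketch in plain Python. It is not how Gradient implements its pipeline, which this guide does not describe. The `embed` function, the `cosine` helper, and the `ToyVectorStore` class are all hypothetical names, and word counts stand in for a real embedding model.

```python
import math
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (an assumption, not a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyVectorStore:
    """Hypothetical in-memory store; not part of the Gradient SDK."""

    def __init__(self) -> None:
        self.entries: List[Tuple[str, Counter]] = []

    def add(self, chunk: str) -> None:
        # Store each chunk next to its vector so it can be searched later.
        self.entries.append((chunk, embed(chunk)))

    def search(self, query: str, top_k: int = 1) -> List[str]:
        # Rank stored chunks by similarity to the query, highest first.
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:top_k]]


store = ToyVectorStore()
for chunk in [
    "refunds are issued within 30 days",
    "shipping takes five business days",
]:
    store.add(chunk)

best = store.search("how long does shipping take", top_k=1)
print(best)
```

A managed service replaces the word-count vectors with a learned embedding model and the Python list with a real vector database, but the add-then-search shape stays the same.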
Using the SDK
The SDK is best suited for integrating RAG into your application or business workflow.
Reference our SDK Quickstart to get set up. From there, follow the steps below.
- Create your RAG with an initial set of files. Once the files have been uploaded, they will be automatically chunked, embedded, and stored into a vector database behind the scenes for you. This process may take several minutes, so feel free to check back in a bit.

    from gradientai import Gradient

    # Assumes the SDK is configured as described in the SDK Quickstart.
    gradient = Gradient()

    rag_collection = gradient.create_rag_collection(
        name="RAG with two sample text files",
        slug="bge-large",
        filepaths=[
            "samples/a.txt",
            "samples/b.txt",
        ],
    )
- If you'd like to specify chunking parameters (such as chunk size and chunk overlap), you can optionally specify the parser. Currently, you can use SimpleNodeParser to supply the chunk size and overlap values.

    from gradientai import Gradient, SimpleNodeParser

    gradient = Gradient()

    # Custom chunk size and overlap for this collection.
    custom_parser = SimpleNodeParser(chunk_size=1024, chunk_overlap=20)

    rag_collection = gradient.create_rag_collection(
        name="RAG with chunking parameters specified",
        parser=custom_parser,
        slug="bge-large",
        filepaths=[
            "samples/a.txt",
        ],
    )
- Follow the steps below to add additional files to your RAG.
    - You can find the ID on the “RAG Collections” tab in the Gradient UI. You will use this ID to reference the RAG in other parts of the Gradient API, SDK, or CLI.
    - Use the code snippet to add more files to your existing RAG.

        from gradientai import Gradient

        gradient = Gradient()

        # ID copied from the "RAG Collections" tab.
        rag_id = "12345678-abcd-ef01-1234-9876543210ab_rag_config"
        rag_collection = gradient.get_rag_collection(id_=rag_id)
        rag_collection.add_files(filepaths=["docs/one.txt", "docs/two.txt"])
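The chunk size and chunk overlap parameters mentioned in the steps above can be illustrated with a small standalone sketch. This is not the SimpleNodeParser implementation. The `chunk_words` function is a hypothetical helper, and it measures size and overlap in whitespace-separated words, which is an assumption; the real parser may count tokens or use different splitting rules.

```python
from typing import List


def chunk_words(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into word windows of chunk_size that share chunk_overlap words.

    Hypothetical helper for illustration; units are words, not tokens.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    words = text.split()
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


sample = "one two three four five six seven eight nine ten"
chunks = chunk_words(sample, chunk_size=4, chunk_overlap=1)
for c in chunks:
    print(c)
```

Each chunk repeats the last word of the previous one, so a sentence that straddles a boundary still appears in full in at least one chunk when the overlap is large enough. Larger overlaps improve that coverage at the cost of storing more duplicated text.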
Use your RAG for completions
Once your RAG Collection is set up, you can reference the RAG when running completions using Gradient.
Using the SDK
When using the SDK for completions, simply select and pass in the ID of the RAG collection you want the LLM to reference for that completion request.
Note that you can find the ID on the “RAG Collections” tab in the Gradient UI.
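When a RAG Collection ID is passed with a completion request, retrieval happens on the service side, and this guide does not describe the internals. As a conceptual sketch only, the snippet below shows how RAG systems commonly combine retrieved chunks with the user's question so the model can ground its answer and cite sources. The `build_grounded_prompt` function, the prompt wording, and the sample file names are all hypothetical and are not part of the Gradient SDK.

```python
from typing import List, Tuple


def build_grounded_prompt(question: str, retrieved: List[Tuple[str, str]]) -> str:
    """Build a prompt from (source_name, chunk_text) pairs and a question.

    Hypothetical helper: numbers each chunk so the answer can cite it.
    """
    context_lines = [
        f"[{i}] ({source}) {text}"
        for i, (source, text) in enumerate(retrieved, start=1)
    ]
    context = "\n".join(context_lines)
    return (
        "Answer the question using only the context below. "
        "Cite sources by their bracketed number.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


prompt = build_grounded_prompt(
    "How long does shipping take?",
    [
        ("shipping.txt", "Shipping takes five business days."),
        ("returns.txt", "Refunds are issued within 30 days."),
    ],
)
print(prompt)
```

Numbering the chunks and keeping the source name next to each one is what lets a response point back to a specific document.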
Using the "Model Testing" tab
You can also easily try out your new RAG Collection in the "Model Testing" tab. Simply toggle on RAG in the settings and select the name of the RAG Collection to use.