Overview

Retrieval-augmented generation (RAG) improves the quality of LLM-generated responses by grounding the model in external sources of knowledge that supplement the LLM’s internal representation of information. RAG is particularly useful for reducing hallucinations and citing specific sources in business use cases. With our Accelerator Block for RAG, you get a fully managed, production-grade RAG service that can be set up in seconds.

To learn more about RAG, check out our RAG 101 guide.

Create an account and workspace

If you haven't already, go to gradient.ai, click "Sign up", and create your account. Once you have verified your email, log in. Click "Create New Workspace" and give the workspace a name.

You can see your workspaces at any time by going to https://auth.gradient.ai/select-workspace.

Create a RAG and add files

You can use either the web UI or the SDK to create a RAG and add files to it.

Using the Gradient UI

Gradient allows you to easily drop in the data you want to use, and the RAG pipeline is automatically constructed behind the scenes.

  1. Collect all the relevant documents you want your LLM to reference when generating a response.

  2. Log into Gradient and select the workspace you want to use.

  3. Select the “RAG Collections” tab in the left sidebar, then click the “Create” button. Each RAG Collection should include the full set of documents you want the LLM to reference during completions for a particular use case.

  4. Name your RAG Collection and start adding files. Keep your browser tab open while the files are uploading.

Once the files have been uploaded, they are automatically chunked, embedded, and stored in a vector database for you. This process may take several minutes, so check back after a short while.
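
To make the chunk, embed, and store steps more concrete, here is a minimal, self-contained sketch of the general idea. It is not Gradient's implementation: the word-based chunker, the bag-of-words "embedding", and the `ToyVectorStore` class are all hypothetical stand-ins chosen so the example runs without any external service. A managed service would typically use a neural embedding model and a real vector database instead.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, chunk_size: int = 8, overlap: int = 2) -> list[str]:
    """Split text into overlapping chunks of at most `chunk_size` words."""
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def embed(text: str) -> Counter:
    """Toy embedding (assumption): a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """Hypothetical in-memory stand-in for a vector database."""

    def __init__(self) -> None:
        self.entries: list[tuple[str, str, Counter]] = []  # (source, chunk, vector)

    def add_document(self, source: str, text: str) -> None:
        for chunk in chunk_text(text):
            self.entries.append((source, chunk, embed(chunk)))

    def search(self, query: str, top_k: int = 1) -> list[tuple[str, str]]:
        query_vector = embed(query)
        ranked = sorted(
            self.entries,
            key=lambda entry: cosine(query_vector, entry[2]),
            reverse=True,
        )
        return [(source, chunk) for source, chunk, _ in ranked[:top_k]]


store = ToyVectorStore()
store.add_document("a.txt", "The refund policy allows returns within thirty days of purchase.")
store.add_document("b.txt", "Support is available by email on weekdays from nine to five.")

top_source, top_chunk = store.search("What is the refund policy?")[0]
print(top_source, "->", top_chunk)
```

At query time, the chunks closest to the question are the ones handed to the LLM as context, which is why each RAG Collection should contain the full set of documents for its use case.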

Using the SDK

The SDK is best suited for integrating RAG into your application or business workflow.

Refer to our SDK Quickstart to get set up. From there, follow the steps below.

  1. Create your RAG with an initial set of files. Once the files have been uploaded, they are automatically chunked, embedded, and stored in a vector database for you. This process may take several minutes, so check back after a short while.

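    from gradientai import Gradient
    
    # Assumes SDK credentials are already configured (see the SDK Quickstart).
    gradient = Gradient()
    
    # Create the RAG Collection and upload the initial files in one call.
    rag_collection = gradient.create_rag_collection(
        name="RAG with two sample text files",
        slug="bge-large",  # embedding model slug
        filepaths=[
            "samples/a.txt",
            "samples/b.txt",
        ],
    )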
    
  2. To add more files to an existing RAG, follow the steps below.

    1. Find the ID of your RAG on the “RAG Collections” tab in the Gradient UI. You will use this ID to reference the RAG in other parts of the Gradient API, SDK, or CLI.

    2. Use the following code snippet to add more files to your existing RAG.

      from gradientai import Gradient
      
      gradient = Gradient()
      
      # Replace with the ID shown on the "RAG Collections" tab.
      rag_id = "12345678-abcd-ef01-1234-9876543210ab_rag_config"
      rag_collection = gradient.get_rag_collection(id_=rag_id)
      
      # Upload additional files to the existing RAG Collection.
      rag_collection.add_files(filepaths=["docs/one.txt", "docs/two.txt"])
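
When a RAG draws on many documents, listing every path by hand gets tedious. The sketch below shows one way to build the `filepaths` list from a directory using only the Python standard library. The helper name `collect_filepaths` and the default `.txt` suffix filter are assumptions made for illustration; check which file types your workspace supports before widening the filter.

```python
from pathlib import Path


def collect_filepaths(directory: str, suffixes: tuple[str, ...] = (".txt",)) -> list[str]:
    """Return sorted paths of non-empty files under `directory` with an allowed suffix.

    Hypothetical helper: the suffix filter is an assumption, not a Gradient requirement.
    """
    root = Path(directory)
    if not root.is_dir():
        raise FileNotFoundError(f"Not a directory: {directory}")

    paths = []
    for path in sorted(root.rglob("*")):
        # Skip directories, unsupported extensions, and empty files.
        if path.is_file() and path.suffix.lower() in suffixes and path.stat().st_size > 0:
            paths.append(str(path))
    return paths


# Example usage with the SDK objects from the steps above:
# rag_collection.add_files(filepaths=collect_filepaths("docs"))
```

Filtering out missing or empty files up front surfaces path problems locally, before any upload starts.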
      

Use your RAG for completions

Once your RAG Collection is set up, you can reference the RAG when running completions using Gradient.

Using the SDK

When using the SDK for completions, pass in the ID of the RAG Collection you want the LLM to reference for that completion request.

You can find the ID on the “RAG Collections” tab in the Gradient UI.
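
Conceptually, referencing a RAG Collection during a completion means that relevant chunks are retrieved and supplied to the LLM alongside your query. The sketch below illustrates that idea in plain Python. It is an assumption-laden illustration, not the Gradient SDK call and not Gradient's internal prompt format: `build_grounded_prompt` and the prompt wording are hypothetical, and the managed service handles this step for you.

```python
def build_grounded_prompt(question: str, retrieved: list[tuple[str, str]]) -> str:
    """Combine retrieved (source, chunk) pairs and a question into one prompt.

    Hypothetical illustration of grounding; not Gradient's prompt format.
    """
    # Number each chunk so the model can cite the context it relied on.
    context_lines = [
        f"[{i}] ({source}) {chunk}"
        for i, (source, chunk) in enumerate(retrieved, start=1)
    ]
    parts = [
        "Answer the question using only the numbered context below.",
        "Cite the context numbers you rely on.",
        "",
        "Context:",
        *context_lines,
        "",
        f"Question: {question}",
        "Answer:",
    ]
    return "\n".join(parts)


prompt = build_grounded_prompt(
    "What is the refund window?",
    [
        ("a.txt", "Returns are accepted within thirty days."),
        ("b.txt", "Support is available on weekdays."),
    ],
)
print(prompt)
```

Numbering the chunks and keeping their source names is what makes it possible for a response to cite specific sources.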

Using the "Model Testing" tab

You can also easily try out your new RAG Collection in the "Model Testing" tab. Simply toggle on RAG in the settings and select the name of the RAG Collection to use.