🦙 Overview of the LlamaIndex Integration

LlamaIndex is a framework for Retrieval Augmented Generation (RAG) in AI applications. It allows users to retrieve relevant information from a vector database and produce LLM completions based on this additional context.
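To make the retrieve-then-generate flow concrete, here is a minimal sketch in plain Python. It is not how LlamaIndex or Gradient implement it: the bag-of-words "embedding", the helper names (`embed`, `cosine`, `retrieve`, `build_prompt`), and the sample documents are all hypothetical stand-ins for a real embedding model and vector database.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, documents: list[str], top_k: int = 1) -> list[str]:
    """Return the top_k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:top_k]


def build_prompt(query: str, context: list[str]) -> str:
    """Prepend the retrieved context to the question before it is sent to an LLM."""
    joined = "\n".join(context)
    return f"Context:\n{joined}\n\nQuestion: {query}\nAnswer:"


# Hypothetical two-document "knowledge base".
docs = [
    "Koalas sleep for most of the day in eucalyptus trees.",
    "Wombats dig long burrows with their strong claws.",
]
question = "Where do koalas sleep?"
prompt = build_prompt(question, retrieve(question, docs))
```

The resulting `prompt` string is what would be passed to the LLM; a real setup swaps the toy pieces for an embedding model and a vector index.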

The Gradient and LlamaIndex integration lets users implement RAG on top of their private, custom models on the Gradient platform. With LlamaIndex, users can generate completions enriched with additional information retrieved from an indexed knowledge base.

Query Gradient LLM directly

You can run completions on your prompts for Gradient models directly from LlamaIndex.

from llama_index.llms import GradientBaseModelLLM

# You can also use a model adapter you've trained with GradientModelAdapterLLM
llm = GradientBaseModelLLM(
    ...  # constructor arguments are not shown here
)

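Calls to a hosted model generally need credentials. The sketch below assumes the Gradient client reads them from the environment variables `GRADIENT_ACCESS_TOKEN` and `GRADIENT_WORKSPACE_ID`; that is an assumption not stated on this page, and `missing_gradient_env` is a hypothetical helper, not part of either library. It only checks that the variables are set before any model call is attempted.

```python
import os
from typing import Mapping

# Assumed variable names; confirm them against the Gradient documentation.
REQUIRED_ENV_VARS = ("GRADIENT_ACCESS_TOKEN", "GRADIENT_WORKSPACE_ID")


def missing_gradient_env(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]


missing = missing_gradient_env()
if missing:
    print("Set these environment variables first: " + ", ".join(missing))
```

Checking up front gives a clearer error than a failed request later in the pipeline.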
Retrieval Augmented Generation (RAG) with Gradient Embeddings

By leveraging the Gradient Embeddings API integration with LlamaIndex, you can enable RAG for your applications.

To start, simply set up the Gradient embeddings:

from llama_index import VectorStoreIndex, SimpleDirectoryReader, ServiceContext
from llama_index.embeddings import GradientEmbedding

embed_model = GradientEmbedding(
    ...  # constructor arguments are not shown here
)

service_context = ServiceContext.from_defaults(
    chunk_size=1024, llm=llm, embed_model=embed_model
)

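The `chunk_size` setting caps how large each indexed piece of a document can be. As a rough illustration of fixed-size chunking, the sketch below splits text into groups of whitespace-separated words. This is a simplification under stated assumptions: LlamaIndex's own splitter counts tokens rather than words and handles boundaries differently, and `chunk_words` is a hypothetical helper.

```python
def chunk_words(text: str, chunk_size: int) -> list[str]:
    """Split text into consecutive chunks of at most chunk_size whitespace-separated words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


# Tiny chunk size so the effect is visible on a short string.
chunks = chunk_words("one two three four five", chunk_size=2)
```

Smaller chunks give more precise retrieval hits but less surrounding context per hit; larger chunks do the opposite.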
Then load the documents, build the index, and create a query engine:

documents = SimpleDirectoryReader("../australian_animals/data").load_data()
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
query_engine = index.as_query_engine()

# Retrieve relevant chunks from the index and generate an answer with the LLM
response = query_engine.query("How many thumbs do koalas have?")
print(response)


You can see an example of using LlamaIndex with Gradient here.