LlamaIndex
🦙 Overview of the LlamaIndex Integration
LlamaIndex is a framework for Retrieval-Augmented Generation (RAG) in AI applications. It lets users retrieve relevant information from a vector database and produce LLM completions grounded in that additional context.
The Gradient and LlamaIndex integration lets users implement RAG on top of their private custom models on the Gradient platform. With the LlamaIndex framework, users can generate enhanced completions based on additional information retrieved from an indexed knowledge base.
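To illustrate the idea behind RAG independent of any particular framework, here is a minimal pure-Python sketch: documents are embedded, the documents closest to the query are retrieved, and they are prepended to the prompt before it is sent to the LLM. The bag-of-words "embedding" and cosine scoring below are illustrative stand-ins for a real embedding model and vector database, and the sample documents are hypothetical.

```python
from collections import Counter
import math

def embed(text):
    # Toy "embedding": bag-of-words counts. A real system uses a learned
    # embedding model (e.g. via the Gradient Embeddings API).
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# A tiny "knowledge base": each document is stored with its embedding.
documents = [
    "Koalas have two thumbs on each front paw.",
    "Kangaroos can hop at high speed.",
]
index = [(doc, embed(doc)) for doc in documents]

def retrieve(query, k=1):
    # Return the k documents most similar to the query.
    qv = embed(query)
    ranked = sorted(index, key=lambda p: cosine(qv, p[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

query = "How many thumbs do koalas have?"
context = "\n".join(retrieve(query))
# The retrieved context is prepended to the prompt sent to the LLM.
prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
```

In a real deployment, LlamaIndex handles the embedding, indexing, retrieval, and prompt assembly steps for you; the sketch only shows the underlying flow.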
Query Gradient LLM directly
You can run completions on your prompts for Gradient models directly from LlamaIndex.
from llama_index.llms import GradientBaseModelLLM
# You can also use a model adapter you've trained with GradientModelAdapterLLM
llm = GradientBaseModelLLM(
    base_model_slug="llama2-7b-chat",
    max_tokens=400,
)
Retrieval Augmented Generation (RAG) with Gradient Embeddings
By leveraging the Gradient Embeddings API integration with LlamaIndex, you can enable RAG for your applications.
To start, simply set up the Gradient embeddings:
import os

from llama_index import VectorStoreIndex, SimpleDirectoryReader, ServiceContext
from llama_index.embeddings import GradientEmbedding

embed_model = GradientEmbedding(
    gradient_access_token=os.environ["GRADIENT_ACCESS_TOKEN"],
    gradient_workspace_id=os.environ["GRADIENT_WORKSPACE_ID"],
    gradient_model_slug="bge-large",
)
service_context = ServiceContext.from_defaults(
    chunk_size=1024, llm=llm, embed_model=embed_model
)
Then set up the documents, query, and index:
documents = SimpleDirectoryReader("../australian_animals/data").load_data()
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
query_engine = index.as_query_engine()
response = query_engine.query("How many thumbs do koalas have?")
print(response)
Example
You can see an example of using LlamaIndex with Gradient here.