[langchain] 로컬 RAG 애플리케이션 만들기

Langchain 공식문서의 내용을 정리한 것입니다. 내용 및 예제는 일부 변경하였지만 가능한 구조는 유지했습니다.

Build a Local RAG Application - https://python.langchain.com/docs/tutorials/local_rag/

llama.cpp, Ollama, llamafile과 같은 프로젝트의 인기는 LLM을 로컬에서 실행하는 것의 중요성을 강조한다.

LangChain은 로컬에서 실행할 수 있는 많은 오픈 소스 LLM 제공업체와 통합되어 있다.

이 가이드는 Ollama를 통해 LLaMA 3.1을 로컬(예: 노트북)에서 실행하는 방법을 보여준다. 로컬 임베딩과 로컬 LLM을 사용한다. 그러나 원하신다면 LlamaCPP와 같은 다른 로컬 제공업체를 설정하고 교체할 수 있다.

참고: 이 가이드는 사용 중인 특정 로컬 모델에 맞게 입력 프롬프트를 형식화하는 챗 모델 래퍼를 사용한다. 그러나 텍스트 입력/출력 LLM 래퍼를 사용하여 로컬 모델에 직접 프롬프트를 제공하는 경우, 특정 모델에 맞춘 프롬프트를 사용해야 할 수도 있다. 이는 종종 특수 토큰의 포함을 요구한다. LLaMA 2의 예시는 다음과 같다.

준비

먼저 Ollama를 설정해야 한다.

그들의 GitHub 리포지토리에서 제공하는 지침은 다음과 같이 요약된다.

데스크탑 앱 다운로드 및 실행: Ollama의 데스크탑 앱을 다운로드하여 실행한다.
명령줄에서 모델 가져오기

: 다음 옵션 목록에서 모델을 가져온다. 이 가이드에서는 다음 모델이 필요하다.
- 일반 용도 모델인 llama3.1:8b는 ollama pull llama3.1:8b와 같은 명령어로 가져올 수 있다.
- 텍스트 임베딩 모델인 nomic-embed-text는 ollama pull nomic-embed-text와 같은 명령어로 가져올 수 있다.
앱 실행: 앱이 실행되면 모든 모델이 자동으로 localhost:11434에서 제공된다.

모델 선택은 하드웨어 성능에 따라 달라질 수 있다.

다음으로, 로컬 임베딩, 벡터 저장소 및 추론에 필요한 패키지를 설치한다.

# Document loading, retrieval methods and text splitting
%pip install -qU langchain langchain_community

# Local vector store via Chroma
%pip install -qU langchain_chroma

# Local inference and embeddings via Ollama
%pip install -qU langchain_ollama

# Web Loader
% pip install -qU beautifulsoup4

사용 가능한 임베딩 모델의 전체 목록은 이 페이지를 참조하실 수 있습니다.

문서 로딩

이제 예제 문서를 로드하고 분할해 보자.

Lilian Weng의 에이전트에 관한 블로그 게시물을 예제로 사용할 것이다.

from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

loader = WebBaseLoader("https://lilianweng.github.io/posts/2023-06-23-agent/")
data = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
all_splits = text_splitter.split_documents(data)

다음 단계에서는 벡터 저장소를 초기화한다. 우리는 nomic-embed-text를 사용하지만, 다른 제공업체나 옵션도 탐색할 수 있다.

from langchain_chroma import Chroma
from langchain_ollama import OllamaEmbeddings

local_embeddings = OllamaEmbeddings(model="nomic-embed-text")

vectorstore = Chroma.from_documents(documents=all_splits, embedding=local_embeddings)

이제 작동하는 벡터 저장소가 준비되었다! 유사성 검색이 작동하는지 테스트해 보자.:

question = "What are the approaches to Task Decomposition?"
docs = vectorstore.similarity_search(question)
len(docs)

docs[0]

Document(metadata={'description': 'Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\nAgent System Overview In a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:', 'language': 'en', 'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'title': "LLM Powered Autonomous Agents | Lil'Log"}, page_content='Task decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.')

다음으로 모델을 설정한다. 여기서는 Ollama의 llama3.1:8b를 사용하지만, 하드웨어 설정에 따라 다른 제공업체나 모델 옵션을 탐색할 수 있다.

from langchain_ollama import ChatOllama

model = ChatOllama(
    model="llama3.1:8b",
)

설정이 제대로 되었는지 확인하기 위해 테스트해 보자.

response_message = model.invoke(
    "Simulate a rap battle between Stephen Colbert and John Oliver"
)

print(response_message.content)

**The scene is set: a packed arena, the crowd on their feet. In the blue corner, we have Stephen Colbert, aka "The O'Reilly Factor" himself. In the red corner, the challenger, John Oliver. The judges are announced as Tina Fey, Larry Wilmore, and Patton Oswalt. The crowd roars as the two opponents face off.**

**Stephen Colbert (aka "The Truth with a Twist"):**
Yo, I'm the king of satire, the one they all fear
My show's on late, but my jokes are clear
I skewer the politicians, with precision and might
They tremble at my wit, day and night

**John Oliver:**
....

**The crowd goes wild as both opponents take a bow. The rap battle may be over, but the satire war is just beginning...

체인에서 사용하기

우리는 검색된 문서와 간단한 프롬프트를 전달하여 두 모델 중 하나로 요약 체인을 생성할 수 있다.

이 체인은 제공된 입력 키 값을 사용하여 프롬프트 템플릿을 형식화하고, 형식화된 문자열을 지정된 모델에 전달한다.

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_template(
    "Summarize the main themes in these retrieved docs: {docs}"
)


# Convert loaded documents into strings by concatenating their content
# and ignoring metadata
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


chain = {"docs": format_docs} | prompt | model | StrOutputParser()

question = "What are the approaches to Task Decomposition?"

docs = vectorstore.similarity_search(question)

chain.invoke(docs)

'The main themes in these documents are:\n\n1. **Task Decomposition**: The process of breaking down complex tasks into smaller, manageable subgoals is crucial for efficient task handling.\n2. **Autonomous Agent System**: A system powered by Large Language Models (LLMs) that can perform planning, reflection, and refinement to improve the quality of final results.\n3. **Challenges in Planning and Decomposition**:\n\t* Long-term planning and task decomposition are challenging for LLMs.\n\t* Adjusting plans when faced with unexpected errors is difficult for LLMs.\n\t* Humans learn from trial and error, making them more robust than LLMs in certain situations.\n\nOverall, the documents highlight the importance of task decomposition and planning in autonomous agent systems powered by LLMs, as well as the challenges that still need to be addressed.'

Q&A

로컬 모델과 벡터 저장소를 사용하여 질문-답변을 수행할 수도 있다. 다음은 간단한 문자열 프롬프트를 사용한 예이다.

from langchain_core.runnables import RunnablePassthrough

RAG_TEMPLATE = """
You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.

<context>
{context}
</context>

Answer the following question:

{question}"""

rag_prompt = ChatPromptTemplate.from_template(RAG_TEMPLATE)

chain = (
    RunnablePassthrough.assign(context=lambda input: format_docs(input["context"]))
    | rag_prompt
    | model
    | StrOutputParser()
)

question = "What are the approaches to Task Decomposition?"

docs = vectorstore.similarity_search(question)

# Run
chain.invoke({"context": docs, "question": question})

'Task decomposition can be done through (1) simple prompting using LLM, (2) task-specific instructions, or (3) human inputs. This approach helps break down large tasks into smaller, manageable subgoals for efficient handling of complex tasks. It enables agents to plan ahead and improve the quality of final results through reflection and refinement.'

검색(retrieval)용 Q&A

마지막으로, 문서를 수동으로 전달하는 대신 사용자 질문에 따라 벡터 저장소에서 자동으로 검색할 수 있다.

retriever = vectorstore.as_retriever()

qa_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | rag_prompt
    | model
    | StrOutputParser()
)

question = "What are the approaches to Task Decomposition?"

qa_chain.invoke(question)

'Task decomposition can be done through (1) simple prompting in Large Language Models (LLM), (2) using task-specific instructions, or (3) with human inputs. This process involves breaking down large tasks into smaller, manageable subgoals for efficient handling of complex tasks.'

저작자표시 (새창열림)

준비

문서 로딩

체인에서 사용하기

Q&A

검색(retrieval)용 Q&A

티스토리툴바