pip install graphrag
pip install ollama
1. Install Ollama
Download the Ollama installation package directly from ModelScope.
modelscope download --model=modelscope/ollama-linux --local_dir ./ollama-linux
# Run the Ollama install script
sudo chmod 777 ./ollama-linux/ollama-modelscope-install.sh
sh ./ollama-linux/ollama-modelscope-install.sh
# Start Ollama, preferably in the background; this process must keep running and must not be interrupted
ollama serve
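Since everything later depends on the server staying up, it can help to confirm that it is reachable before going on. A minimal sketch, assuming Ollama is listening on its default address http://localhost:11434 (the helper name is_ollama_up is made up for this example):

```python
import http.client
import urllib.request


def is_ollama_up(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers at base_url with status 200."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, DNS failure, HTTP error status, malformed reply.
        return False
```

Calling is_ollama_up() from another terminal while ollama serve is running should return True; if it returns False, check that the server process has not been interrupted.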
2. Download the models
Deploy the following two models in Ollama: mistral and nomic-embed-text.
# llm
ollama pull mistral
# embedding
ollama pull nomic-embed-text
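To double-check that both pulls succeeded, the server's model list can be queried. The sketch below assumes the default address http://localhost:11434 and the /api/tags endpoint, whose response lists installed models under a "models" key with names such as "mistral:latest". The helper names are made up for this example.

```python
import json
import urllib.request

REQUIRED_MODELS = ("mistral", "nomic-embed-text")


def missing_models(tags: dict, required=REQUIRED_MODELS) -> list[str]:
    """Return the required model names that are absent from an /api/tags response."""
    # Installed names carry a tag suffix such as ":latest"; compare base names only.
    installed = {m["name"].split(":", 1)[0] for m in tags.get("models", [])}
    return [name for name in required if name not in installed]


def fetch_tags(base_url: str = "http://localhost:11434") -> dict:
    """Fetch the list of locally installed models from a running Ollama server."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return json.load(resp)
```

With the server running, missing_models(fetch_tags()) should return an empty list once both models are installed.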
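3. Initialize the working directory
Create a new directory ./ragtest/input under the current directory to hold the files needed to initialize GraphRAG. GraphRAG will later generate its other configuration files and directories under ragtest; the input directory is where we place the txt files to be parsed ourselves.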
mkdir -p ./ragtest/input
Run the graphrag.index --init command. Since we already created a directory named ragtest in the previous step, we can run the following:
python -m graphrag.index --init --root ./ragtest
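Before moving on, a quick sanity check of the workspace can catch a forgotten input file. This is only a sketch: it looks for the settings.yaml that the init step generates and for at least one .txt file in input. The function name workspace_problems is made up for this example.

```python
from pathlib import Path


def workspace_problems(root: str = "./ragtest") -> list[str]:
    """Return a list of human-readable problems found in the GraphRAG workspace."""
    root_path = Path(root)
    problems = []
    if not (root_path / "settings.yaml").is_file():
        problems.append("settings.yaml not found; has --init been run?")
    input_dir = root_path / "input"
    if not input_dir.is_dir():
        problems.append("input directory is missing")
    elif not any(input_dir.glob("*.txt")):
        problems.append("input directory contains no .txt files")
    return problems
```

An empty list from workspace_problems() means the directory layout described above is in place.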
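4. Modify settings.yaml and the other configuration
By default GraphRAG uses an OpenAI key. To switch to a local LLM, the corresponding code and configuration have to be changed. I have already worked out the steps; just follow them below.
1) The settings.yaml configuration file
Edit the corresponding entries in settings.yaml under the ragtest directory, at the positions marked in red.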
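The highlighted positions are not reproduced in this text, so the following is only a guess at the kind of change involved, not the author's exact edits. It assumes the generated settings.yaml has an llm section and an embeddings.llm section, each with model and api_base keys, and that Ollama's OpenAI-compatible endpoint lives at http://localhost:11434/v1. The function names are made up for this example.

```python
import copy

# Assumption: Ollama's OpenAI-compatible endpoint on the default port.
OLLAMA_API_BASE = "http://localhost:11434/v1"


def apply_ollama_overrides(settings: dict) -> dict:
    """Return a copy of the settings with the LLM and embedding sections pointed at Ollama."""
    updated = copy.deepcopy(settings)
    llm = updated.setdefault("llm", {})
    llm["model"] = "mistral"
    llm["api_base"] = OLLAMA_API_BASE
    emb_llm = updated.setdefault("embeddings", {}).setdefault("llm", {})
    emb_llm["model"] = "nomic-embed-text"
    emb_llm["api_base"] = OLLAMA_API_BASE
    return updated


def rewrite_settings_file(path: str = "./ragtest/settings.yaml") -> None:
    """Apply the overrides to a settings file in place (requires the third-party PyYAML)."""
    import yaml  # imported here so the pure function above has no extra dependency

    with open(path, encoding="utf-8") as f:
        settings = yaml.safe_load(f)
    with open(path, "w", encoding="utf-8") as f:
        # Note: safe_dump does not preserve the comments in the generated file.
        yaml.safe_dump(apply_ollama_overrides(settings), f, sort_keys=False)
```

Because rewriting the file this way discards its comments, editing the same keys by hand is probably the safer route; the sketch mainly shows which keys are likely involved.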
2) openai_embeddings_llm.py, the embedding script
import ollama  # add next to the file's existing imports


class OpenAIEmbeddingsLLM(BaseLLM[EmbeddingInput, EmbeddingOutput]):
    _client: OpenAIClientTypes
    _configuration: OpenAIConfiguration

    def __init__(self, client: OpenAIClientTypes, configuration: OpenAIConfiguration):
        self._client = client
        self._configuration = configuration

    async def _execute_llm(
        self, input: EmbeddingInput, **kwargs: Unpack[LLMInput]
    ) -> EmbeddingOutput | None:
        # Kept from the original; the Ollama call below uses a fixed model name instead.
        args = {
            "model": self._configuration.model,
            **(kwargs.get("model_parameters") or {}),
        }
        # Embed each input text with the local nomic-embed-text model.
        embedding_list = []
        for inp in input:
            embedding = ollama.embeddings(model="nomic-embed-text", prompt=inp)
            embedding_list.append(embedding["embedding"])
        return embedding_list
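Comment out the original OpenAIEmbeddingsLLM class in the script and replace it with the new code above. If you cannot find the file, search for it with the following command: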
find / -name openai_embeddings_llm.py
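Searching the whole filesystem can be slow and may turn up copies from other environments. As an alternative sketch, Python can be asked where the installed package lives and only that directory searched. The helper name find_in_package is made up for this example; it must be run with the same interpreter that has graphrag installed.

```python
import importlib.util
from pathlib import Path


def find_in_package(package: str, filename: str) -> list[Path]:
    """Return every file named `filename` inside the installed `package`."""
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return []  # package not installed, or a single-file module
    matches = []
    for location in spec.submodule_search_locations:
        matches.extend(sorted(Path(location).rglob(filename)))
    return matches
```

For this step the call would be find_in_package("graphrag", "openai_embeddings_llm.py"). The same helper works for embedding.py in the next step, though a generic name like that may match more than one file.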
3) embedding.py, the local query script
# Needs `import asyncio` at the top of embedding.py, alongside the existing numpy import (np).
# Assumption: self.ollama_client (a synchronous Ollama client) and self.embedding_dim
# are set in the class's __init__.
def embed(self, text: str, **kwargs: Any) -> list[float]:
    """Embed text using Ollama's nomic-embed-text model."""
    try:
        embedding = self.ollama_client.embeddings(model="nomic-embed-text", prompt=text)
        return embedding["embedding"]
    except Exception as e:
        self._reporter.error(
            message="Error embedding text",
            details={self.__class__.__name__: str(e)},
        )
        # Fall back to a zero vector so the query can continue.
        return np.zeros(self.embedding_dim).tolist()

async def aembed(self, text: str, **kwargs: Any) -> list[float]:
    """Embed text using Ollama's nomic-embed-text model asynchronously."""
    try:
        # A synchronous client call cannot be awaited directly; run it in a worker thread.
        embedding = await asyncio.to_thread(
            self.ollama_client.embeddings, model="nomic-embed-text", prompt=text
        )
        return embedding["embedding"]
    except Exception as e:
        self._reporter.error(
            message="Error embedding text asynchronously",
            details={self.__class__.__name__: str(e)},
        )
        return np.zeros(self.embedding_dim).tolist()
Comment out the two original functions in embedding.py and replace them with the new code above.
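The two methods share one idea: try the Ollama embedding call and, if anything goes wrong, return a zero vector of the expected length so the query does not crash. The standalone sketch below shows that pattern outside GraphRAG. FallbackEmbedder and FakeClient are made-up names; the fake client only imitates the embeddings(model=..., prompt=...) call shape. Running the blocking call through asyncio.to_thread assumes the client is synchronous.

```python
import asyncio


class FallbackEmbedder:
    """Embeds text through a client, returning a zero vector when the call fails."""

    def __init__(self, client, embedding_dim: int, model: str = "nomic-embed-text"):
        # client: any object with embeddings(model=..., prompt=...) -> {"embedding": [...]}
        self.client = client
        self.embedding_dim = embedding_dim
        self.model = model

    def embed(self, text: str) -> list[float]:
        try:
            return self.client.embeddings(model=self.model, prompt=text)["embedding"]
        except Exception:
            # Same fallback as in the patched methods: a vector of zeros.
            return [0.0] * self.embedding_dim

    async def aembed(self, text: str) -> list[float]:
        # Run the blocking call in a worker thread so the event loop is not blocked.
        return await asyncio.to_thread(self.embed, text)


class FakeClient:
    """Hypothetical stand-in for an Ollama client, used only for demonstration."""

    def embeddings(self, model: str, prompt: str) -> dict:
        if not prompt:
            raise ValueError("empty prompt")
        return {"embedding": [float(len(prompt)), 1.0]}
```

One thing to keep in mind with this design: a zero vector hides the failure from downstream similarity search, so the error log is the only sign that an embedding was not really computed.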
5. Run GraphRAG
Once the previous steps are done, run the indexer to build the graph.
python -m graphrag.index --root ./ragtest
Output log:
Logging enabled at ragtest/output/20240903-093443/reports/indexing-engine.log
⠹ GraphRAG Indexer
├── Loading Input (text) - 1 files loaded (0 filtered) ━━━━━━ 100% 0:00:… 0:00:…
/usr/local/lib/python3.10/site-packages/numpy/core/fromnumeric.py:59:
FutureWarning: 'DataFrame.swapaxes' is deprecated and will be removed in a
future version. Please use 'DataFrame.transpose' instead.
return bound(*args, **kwds)
⠼ GraphRAG Indexer
├── Loading Input (text) - 1 files loaded (0 filtered) ━━━━━━ 100% 0:00:… 0:00:…
└── create_base_text_units
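The console output is mostly progress bars, so the log file named in the first line is the better place to look when something goes wrong. A small sketch for locating the most recent one, assuming the layout shown in the log above (output/<timestamp>/reports/indexing-engine.log under the root directory). The function name latest_run_log is made up for this example.

```python
from pathlib import Path
from typing import Optional


def latest_run_log(root: str = "./ragtest") -> Optional[Path]:
    """Return the indexing-engine.log of the most recent run, or None if there is none."""
    output_dir = Path(root) / "output"
    if not output_dir.is_dir():
        return None
    # Run folders are named like 20240903-093443, so name order is chronological order.
    runs = sorted((p for p in output_dir.iterdir() if p.is_dir()), reverse=True)
    for run in runs:
        log = run / "reports" / "indexing-engine.log"
        if log.is_file():
            return log
    return None
```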
From: https://blog.csdn.net/m0_38007743/article/details/141852489