Vearch
Vearch is a vector search infrastructure for deep learning and AI applications.
Setting up
Follow the Vearch installation instructions.
To use this integration, install langchain-community with pip install -qU langchain-community, then install one of the two Vearch client packages:
%pip install --upgrade --quiet  vearch
# OR
%pip install --upgrade --quiet  vearch_cluster
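Only one of the two packages has to be present. In the example that follows, the local (standalone) store is created with flag=0 and the cluster store with flag=1. The following is a minimal sketch for checking which client is importable in the current environment. It assumes the import names match the pip package names (vearch and vearch_cluster), which you should verify against your installed versions.

```python
from importlib.util import find_spec


def available_vearch_backends():
    """Report which Vearch client modules can be imported.

    Assumption: the import names equal the pip package names.
    """
    candidates = ("vearch", "vearch_cluster")
    # find_spec returns None when a top-level module cannot be located.
    return {name: find_spec(name) is not None for name in candidates}


print(available_vearch_backends())
```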
Example
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores.vearch import Vearch
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from transformers import AutoModel, AutoTokenizer
# Replace with your local model path
model_path = "/data/zhx/zhx/langchain-ChatGLM_new/chatglm2-6b"
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModel.from_pretrained(model_path, trust_remote_code=True).half().cuda(0)
Loading checkpoint shards: 100%|██████████| 7/7 [00:07<00:00,  1.01s/it]
query = "你好!"  # "Hello!"
response, history = model.chat(tokenizer, query, history=[])
print(f"Human: {query}\nChatGLM:{response}\n")
query = "你知道凌波微步吗,你知道都有谁学会了吗?"  # "Do you know Lingbo Weibu? Do you know who has learned it?"
response, history = model.chat(tokenizer, query, history=history)
print(f"Human: {query}\nChatGLM:{response}\n")
Human: 你好!
ChatGLM:你好👋!我是人工智能助手 ChatGLM2-6B,很高兴见到你,欢迎问我任何问题。
Human: 你知道凌波微步吗,你知道都有谁学会了吗?
ChatGLM:凌波微步是一种步伐,最早出自《倚天屠龙记》。在电视剧《人民的名义》中,侯亮平也学会了凌波微步。
# Add your local knowledge file
file_path = "/data/zhx/zhx/langchain-ChatGLM_new/knowledge_base/天龙八部/lingboweibu.txt"  # your local file path
loader = TextLoader(file_path, encoding="utf-8")
documents = loader.load()
# Split the text into chunks; the chunks are embedded when they are added to the store
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
texts = text_splitter.split_documents(documents)
# Replace with your embedding model path
embedding_path = "/data/zhx/zhx/langchain-ChatGLM_new/text2vec/text2vec-large-chinese"
embeddings = HuggingFaceEmbeddings(model_name=embedding_path)
No sentence-transformers model found with name /data/zhx/zhx/langchain-ChatGLM_new/text2vec/text2vec-large-chinese. Creating a new one with MEAN pooling.
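The splitter above is configured with chunk_size=500 and chunk_overlap=100, so neighbouring chunks share up to 100 characters and a passage cut at a chunk boundary still appears whole in one of them. The sketch below illustrates only that overlap idea with a plain sliding window. It is a simplification: RecursiveCharacterTextSplitter also tries to break on separators such as newlines, which this hypothetical sliding_chunks function ignores.

```python
def sliding_chunks(text, chunk_size=500, chunk_overlap=100):
    """Cut text into fixed-size character windows that overlap.

    Simplified illustration only; not the algorithm used by
    RecursiveCharacterTextSplitter.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
        start += step
    return chunks


sample = "x" * 1200
print([len(c) for c in sliding_chunks(sample)])  # [500, 500, 400]
```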
# First, add your documents to the Vearch vector store
vearch_standalone = Vearch.from_documents(
    texts,
    embeddings,
    path_or_url="/data/zhx/zhx/langchain-ChatGLM_new/knowledge_base/localdb_new_test",
    table_name="localdb_new_test",
    flag=0,
)
print("***************after is cluster res*****************")
vearch_cluster = Vearch.from_documents(
    texts,
    embeddings,
    path_or_url="http://test-vearch-langchain-router.vectorbase.svc.ht1.n.jd.local",
    db_name="vearch_cluster_langchian",
    table_name="tobenumone",
    flag=1,
)
docids ['18ce6747dca04a2c833e60e8dfd83c04', 'aafacb0e46574b378a9f433877ab06a8', '9776bccfdd8643a8b219ccee0596f370']
***************after is cluster res*****************
docids ['1841638988191686991', '-4519586577642625749', '5028230008472292907']
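In the two calls above, flag=0 is paired with a local directory, while flag=1 is paired with a router URL and a db_name. The following sketch shows a hypothetical helper, vearch_kwargs, that derives those keyword arguments from the target. The helper is not part of LangChain or Vearch, and it assumes that any http(s) URL means cluster mode.

```python
def vearch_kwargs(path_or_url, table_name, db_name=None):
    """Build keyword arguments for Vearch.from_documents (hypothetical helper).

    Assumption: an http(s) URL means cluster mode (flag=1); anything else
    is treated as a local directory for standalone mode (flag=0).
    """
    is_cluster = path_or_url.startswith(("http://", "https://"))
    kwargs = {
        "path_or_url": path_or_url,
        "table_name": table_name,
        "flag": 1 if is_cluster else 0,
    }
    if is_cluster and db_name is not None:
        kwargs["db_name"] = db_name
    return kwargs


# "./localdb" and the router URL below are placeholder values.
print(vearch_kwargs("./localdb", "localdb_new_test"))
print(vearch_kwargs("http://router.example", "tobenumone", db_name="demo_db"))
# Intended use (requires a Vearch client to be installed):
# store = Vearch.from_documents(texts, embeddings, **vearch_kwargs(...))
```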
query = "你知道凌波微步吗,你知道都有谁会凌波微步?"  # "Do you know Lingbo Weibu? Do you know who can perform it?"
vearch_standalone_res = vearch_standalone.similarity_search(query, 3)
for idx, tmp in enumerate(vearch_standalone_res):
    print(f"{'#'*20}第{idx+1}段相关文档{'#'*20}\n\n{tmp.page_content}\n")
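# Combine your local knowledge with the query. The Chinese prompt below reads:
# "Based on the following information, answer the user's question as accurately
# as possible. Background information: {context} Answer this user question: {query}"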
context = "".join([tmp.page_content for tmp in vearch_standalone_res])
new_query = f"基于以下信息,尽可能准确的来回答用户的问题。背景信息:\n {context} \n 回答用户这个问题:{query}\n\n"
response, history = model.chat(tokenizer, new_query, history=[])
print(f"********ChatGLM:{response}\n")
print("***************************after is cluster res******************************")
query_c = "你知道凌波微步吗,你知道都有谁会凌波微步?"
cluster_res = vearch_cluster.similarity_search(query_c, 3)
for idx, tmp in enumerate(cluster_res):
    print(f"{'#'*20}第{idx+1}段相关文档{'#'*20}\n\n{tmp.page_content}\n")
# Combine your local knowledge with the query (same prompt as for the standalone store)
context_c = "".join([tmp.page_content for tmp in cluster_res])
new_query_c = f"基于以下信息,尽可能准确的来回答用户的问题。背景信息:\n {context_c} \n 回答用户这个问题:{query_c}\n\n"
response_c, history_c = model.chat(tokenizer, new_query_c, history=[])
print(f"********ChatGLM:{response_c}\n")
####################第1段相关文档####################
午饭过后,段誉又练“凌波微步”,走一步,吸一口气,走第二步时将气呼出,六十四卦走完,四肢全无麻痹之感,料想呼吸顺畅,便无害处。第二次再走时连走两步吸一口气,再走两步始行呼出。这“凌波微步”是以动功修习内功,脚步踏遍六十四卦 一个周天,内息自然而然地也转了一个周天。因此他每走一遍,内力便有一分进益。
这般练了几天,“凌波微步”已走得颇为纯熟,不须再数呼吸,纵然疾行,气息也已无所窒滞。心意既畅,跨步时渐渐想到《洛神赋》中那些与“凌波微步”有关的句子:“仿佛兮若轻云之蔽月,飘飘兮若流风之回雪”,“竦轻躯以鹤立,若将飞而未翔”,“体迅飞凫,飘忽若神”,“动无常则,若危若安。进止难期,若往若还”。
百度简介
凌波微步是「逍遥派」独门轻功身法,精妙异常。
凌波微步乃是一门极上乘的轻功,所以列于卷轴之末,以易经八八六十四卦为基础,使用者按特定顺序踏着卦象方位行进,从第一步到最后一步正好行走一个大圈。此步法精妙异常,原是要待人练成「北冥神功」,吸人内力,自身内力已【颇为深厚】之后再练。
####################第2段相关文档####################
《天龙八部》第五回 微步縠纹生
卷轴中此外诸种经脉修习之法甚多,皆是取人内力的法门,段誉虽自语宽解,总觉习之有违本性,单是贪多务得,便非好事,当下暂不理会。
卷到卷轴末端,又见到了“凌波微步”那四字,登时便想起《洛神赋》中那些句子来:“凌波微步,罗袜生尘……转眄流精,光润玉颜。含辞未吐,气若幽兰。华容婀娜,令我忘餐。”曹子建那些千古名句,在脑海中缓缓流过:“秾纤得衷,修短合度,肩若削成,腰如约素。延颈秀项,皓质呈露。芳泽无加,铅华弗御。云髻峨峨,修眉连娟。丹唇外朗,皓齿内鲜。明眸善睐,靥辅承权。瑰姿艳逸,仪静体闲。柔情绰态,媚于语言……”这些句子用在木婉清身上,“这话倒也有理”;但如用之于神仙姊姊,只怕更为适合。想到神仙姊姊的姿容体态,“皎若太阳升朝霞,灼若芙蓉出绿波”,但觉依她吩咐行事,实为人生至乐,心想:“我先来练这‘凌波微步’,此乃  逃命之妙法,非害人之手段也,练之有百利而无一害。”
####################第3段相关文档####################
《天龙八部》第二回 玉壁月华明
再展帛卷,长卷上源源皆是裸女画像,或立或卧,或现前胸,或见后背。人像的面容都是一般,但或喜或愁,或含情凝眸,或轻嗔薄怒,神情各异。一共有三十六幅图像,每幅像上均有颜色细线,注明穴道部位及练功法诀。
帛卷尽处题着“凌波微步”四字,其后绘的是无数足印,注明“妇妹”、“无妄”等等字样,尽是《易经》中的方位。段誉前几日还正全心全意地钻研《易经》,一见到这些名称,登时精神大振,便似遇到故交良友一般。只见足印密密麻麻,不知有几千百个,自一个足印至另一个足印均有绿线贯串,线上绘有箭头,最后写着一行字道:“步法神妙,保身避敌,待积内力,再取敌命。”
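The standalone and cluster runs above repeat the same three steps: retrieve the top 3 chunks, concatenate their page_content into a context string, and send the combined prompt to the model. The sketch below factors those steps into one function. It relies only on the similarity_search(query, k) call used above; answer_with_context, chat_fn, FakeStore, and echo_chat are hypothetical names introduced for illustration, and the prompt is an English rendering of the Chinese one rather than the original string.

```python
from types import SimpleNamespace

# English rendering of the Chinese prompt used above (a translation, not the
# original string).
PROMPT_TEMPLATE = (
    "Based on the following information, answer the user's question as "
    "accurately as possible. Background information:\n {context} \n"
    " Answer this user question: {query}\n\n"
)


def answer_with_context(store, chat_fn, query, k=3):
    """Retrieve k documents from store, build the prompt, and ask the model.

    store:   any object with similarity_search(query, k) whose results
             expose a page_content attribute.
    chat_fn: hypothetical callable that takes a prompt string and returns
             the model's reply as a string.
    """
    docs = store.similarity_search(query, k)
    context = "".join(doc.page_content for doc in docs)
    prompt = PROMPT_TEMPLATE.format(context=context, query=query)
    return chat_fn(prompt), docs


class FakeStore:
    """Stand-in for a vector store so the sketch runs without Vearch."""

    def similarity_search(self, query, k):
        return [SimpleNamespace(page_content=f"passage {i}. ") for i in range(k)]


def echo_chat(prompt):
    """Stand-in for the model: returns the prompt it was given."""
    return prompt


reply, docs = answer_with_context(FakeStore(), echo_chat, "Who knows Lingbo Weibu?")
print(reply)
```

With the objects from this example, store could be vearch_standalone or vearch_cluster, and chat_fn could be lambda p: model.chat(tokenizer, p, history=[])[0].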