A library for efficient similarity search and storage of deep learning vectors.


Keywords
real, time, index, vector, nearest, neighbors, audio-search, cloud-native, deep-learning, distributed-systems, document-retrieval, embeddings, face-recognition, image-search, knn-search, langchain, llms, machine-learning, vector-database, vector-search, vectors, video-search, visual-search
License
Apache-2.0
Install
pip install vearch==3.3.2


Overview

Vearch is a scalable distributed system for efficient similarity search of deep learning vectors.

Document

Quick start

Install Vearch

Deploy a Vearch cluster on k8s

Add charts through the repo

$ helm repo add vearch https://vearch.github.io/vearch-helm
$ helm repo update && helm install my-release vearch/vearch

Add charts from a local clone

$ git clone https://github.com/vearch/vearch-helm.git
$ cd vearch-helm
$ make
$ helm install my-release ./charts -f ./charts/values.yaml

Start with docker-compose

$ cd cloud
$ cp ../config/config.toml.example config.toml
$ docker-compose up

Compile from source code

To compile the source code and build a distributed vector search system with a RESTful API, see SourceCompileDeployment.md.

Deploy a visual search system

Vearch can be used to build a complete visual search system that indexes billions of images. This also requires the image retrieval plugin, which handles object detection and feature extraction. For more information, see Quickstart.md.

Use the Python SDK

The Vearch Python SDK lets you use Vearch locally. Install it with `pip install vearch`. For more information, see APIPythonSDK.md.

APIs and Use Cases

LowLevelAPI

VisualSearchAPI

PythonSDKAPI

Components

Vearch Architecture


Master: Responsible for schema management, cluster-level metadata, and resource coordination.
Router: Provides the RESTful API (`create`, `delete`, `search`, and `update`), request routing, and result merging.
PartitionServer (PS): Hosts document partitions with Raft-based replication.
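
The Router's request routing and result merging can be illustrated with a small Python sketch. It is not Vearch's implementation: it assumes, hypothetically, that a document's partition is chosen by hashing its ID modulo the partition count, and that each PartitionServer returns its local top-k hits as `(distance, doc_id)` pairs sorted by ascending distance. `route_partition` and `merge_topk` are illustrative names, not part of any Vearch API.

```python
import hashlib
import heapq
from itertools import islice
from typing import Iterable, List, Tuple

# Hypothetical hit format: (distance, doc_id); a smaller distance is a closer match.
Hit = Tuple[float, str]


def route_partition(doc_id: str, num_partitions: int) -> int:
    """Pick a partition for a document by hashing its ID (assumed scheme)."""
    # A stable digest gives the same answer in every process, unlike hash().
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def merge_topk(partition_results: Iterable[List[Hit]], k: int) -> List[Hit]:
    """Merge per-partition top-k lists (each sorted ascending) into a global top-k."""
    # heapq.merge lazily interleaves already-sorted inputs, so only the
    # first k items of the merged stream are ever pulled.
    return list(islice(heapq.merge(*partition_results), k))


if __name__ == "__main__":
    for doc in ["img-001", "img-002", "img-003", "img-004"]:
        print(doc, "-> partition", route_partition(doc, 3))

    # Each (hypothetical) partition has already returned its local top-2.
    ps0 = [(0.12, "img-004"), (0.40, "img-001")]
    ps1 = [(0.05, "img-002"), (0.90, "img-007")]
    ps2 = [(0.33, "img-003"), (0.35, "img-009")]
    print(merge_topk([ps0, ps1, ps2], k=3))
```

MD5 is used here only as a stable, well-distributed hash, because Python randomizes the built-in `hash()` of strings per process and a router must map the same ID to the same partition every time. Since every partition's list is already sorted, the merge never has to sort the full union of results.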

Gamma is the core vector search engine, built on faiss. It stores, indexes, and retrieves both vectors and scalars.
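
To make "storing, indexing and retrieving vectors and scalars" concrete, here is a brute-force (exact) search sketch in plain Python. It is only a conceptual model: Gamma itself relies on faiss-based indexes rather than a linear scan, and the `Document` layout, the `search` signature, and the `scalar_filter` callback are hypothetical. Each document carries a vector plus scalar fields; a query first filters on the scalars and then ranks the surviving vectors by squared L2 distance.

```python
import heapq
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass
class Document:
    # Hypothetical record layout: one vector field plus arbitrary scalar fields.
    doc_id: str
    vector: Sequence[float]
    scalars: Dict[str, object] = field(default_factory=dict)


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance; it ranks neighbors in the same order as L2."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def search(
    docs: List[Document],
    query: Sequence[float],
    k: int,
    scalar_filter: Callable[[Dict[str, object]], bool] = lambda s: True,
) -> List[Tuple[float, str]]:
    """Exact k-nearest-neighbor search over the docs that pass the scalar filter."""
    candidates = (
        (squared_l2(d.vector, query), d.doc_id)
        for d in docs
        if scalar_filter(d.scalars)
    )
    # nsmallest keeps a bounded heap of size k instead of sorting every candidate.
    return heapq.nsmallest(k, candidates)


if __name__ == "__main__":
    corpus = [
        Document("a", [0.0, 0.0], {"category": "shoes", "price": 50}),
        Document("b", [1.0, 0.0], {"category": "shoes", "price": 120}),
        Document("c", [0.1, 0.1], {"category": "bags", "price": 80}),
        Document("d", [0.9, 0.2], {"category": "shoes", "price": 70}),
    ]
    query = [1.0, 0.1]
    # Without a filter, "b" and "d" are the two nearest vectors.
    print(search(corpus, query, k=2))
    # Restricting to price < 100 drops "b", so "d" and "c" are returned.
    print(search(corpus, query, k=2, scalar_filter=lambda s: s["price"] < 100))
```

A linear scan like this is exact but costs O(n) per query; that cost is why a production engine uses approximate indexes instead. The per-partition top-k it returns has the same shape a Router-style merge step would consume.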

Benchmarks

Demo

(Demo animation: docs/img/plugin/main_process.gif)

Reference

If you use Vearch in a research paper, please cite:

@misc{li2019design,
      title={The Design and Implementation of a Real Time Visual Search System on JD E-commerce Platform}, 
      author={Jie Li and Haifeng Liu and Chuanghua Gui and Jianyu Chen and Zhenyun Ni and Ning Wang},
      year={2019},
      eprint={1908.07389},
      archivePrefix={arXiv},
      primaryClass={cs.IR}
}

Community

You can report bugs or ask questions on the issues page of the repository.

For public discussion of Vearch or for questions, you can also send email to vearch-maintainers@groups.io.

Our Slack: https://vearchwrokspace.slack.com

Known Users

Companies are welcome to register their names in issue #230 (listed below in order of registration).

科大讯飞 飞搜科技 君库科技 爱奇艺 人民科技 趣头条 网易严选 咸唐科技 华为技术 OPPO 汽车之家 芯翌智能 图灵机器人 金山云 汇智通信 小红书 VIVO 京东 中原消费金融

License

Licensed under the Apache License, Version 2.0. For details, see LICENSE and NOTICE.