A PHP Client for Jina
A tool to connect to Jina with PHP. This client will not work without a running Jina installation.
For setup instructions, see: Jina Installation
Jina Documentation
For more information about Jina, see: Jina
Install with composer command
composer require dco-ai/php-jina
Install using composer.json
From GitHub directly:
Add the following to your composer.json file, creating the file if it does not exist.
{
    "repositories": [
        {
            "type": "vcs",
            "url": "https://github.com/Dco-ai/php-jina.git"
        }
    ],
    "require": {
        "dco-ai/php-jina": "dev-main"
    }
}
Or from Packagist:
{
    "require": {
        "dco-ai/php-jina": "v1.*"
    }
}
Now run composer update to install the package.
Configuration
This client needs to know a few things about your Jina project to make the connection.
The configuration is an associative array with the following fields:
| Attribute | Type | Description |
|---|---|---|
| url (required) | string | The endpoint of your Jina application. This can be a public URL, or a private one if this client is used on the same network. |
| port (required) | string | The port used by your Jina application. |
| endpoints (required) | associative array | Maps each endpoint to the HTTP method this client uses when making the curl request. Jina allows custom endpoints, so the client needs to know how to handle them. An example map is shown under Usage. |
| dataStore (optional) | associative array | Identifies the Data Store being used. Interaction with Data Stores inside DocArray differs, so this client needs to know which one is in use to handle certain functionality correctly. If no dataStore is identified, the default functions are used, which may cause unintended results. |
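Since three of these fields are required, it can help to check a configuration before handing it to the client. The sketch below is not part of php-jina; `missingJinaConfigKeys()` is a hypothetical helper that only tests for the required keys listed in the table above.

```php
<?php
// Hypothetical helper (not provided by php-jina): returns the required
// configuration keys that are absent from $config.
function missingJinaConfigKeys(array $config): array
{
    $required = ["url", "port", "endpoints"];
    // array_diff keeps the original indexes, so re-index the result.
    return array_values(array_diff($required, array_keys($config)));
}

$config = [
    "url" => "localhost",
    "port" => "1234",
    "endpoints" => ["/index" => "POST", "/search" => "POST"],
];

$missing = missingJinaConfigKeys($config);
if ($missing === []) {
    echo "config has all required keys\n";
} else {
    echo "missing keys: " . implode(", ", $missing) . "\n";
}
```

The check only looks at key presence; it does not verify that the URL is reachable or that the endpoint methods match your Jina application.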
Usage
A small example is in src/example.php. It shows how to load the class and then create and update Jina's Document and DocumentArray structures.
First include the package in the header:
<?php
use DcoAi\PhpJina\JinaClient;
Then instantiate the JinaClient class with your configuration:
$config = [
    "url" => "localhost", // The URL or endpoint of your Jina installation
    "port" => "1234", // The port used for your Jina installation
    "endpoints" => [ // The active endpoints in your Jina application with the corresponding method
        "/status" => "GET",
        "/post" => "POST",
        "/index" => "POST",
        "/search" => "POST",
        "/delete" => "DELETE",
        "/update" => "PUT",
        "/" => "GET"
    ]
];
$jina = new JinaClient($config);
Now you can use these functions:
// This creates a Document structure that you can add data to
$d = $jina->document();
// This creates a DocumentArray that Documents can be added to
$da = $jina->documentArray();
// This adds a Document to a DocumentArray
$jina->addDocument($da, $d);
// This sends the DocumentArray to your Jina application and returns the result
$jina->submit("/index", $da);
Structures
Document
| Attribute | Type | Description |
|---|---|---|
| id | string | A hexdigest that represents a unique document ID. It is recommended to let Jina set this value. |
| blob | bytes | The raw binary content of this document, which often represents the original document |
| tensor | | The ndarray of the image/audio/video document |
| text | string | A text document |
| granularity | int | The depth of the recursive chunk structure |
| adjacency | int | The width of the recursive match structure |
| parent_id | string | The parent id from the previous granularity |
| weight | float | The weight of this document |
| uri | string | A URI of the document; it can be a local file path, a remote URL starting with http or https, or a data URI |
| modality | string | An identifier for the modality this document belongs to, in the scope of multi/cross-modal search |
| mime_type | string | The MIME type of this document; required for blob content, and can be guessed for other content |
| offset | float | The offset of the doc |
| location | float | The position of the doc; this could be the start and end index of a string, the x,y (top, left) coordinates of an image crop, or the timestamp of an audio clip |
| chunks | array | Array of the sub-documents of this document (recursive structure) |
| matches | array | Array of matched documents on the same level (recursive structure) |
| embedding | | The embedding of this document |
| tags | | A structured data value, consisting of fields that map to dynamically typed values |
| scores | /stdClass | Scores performed on the document; each element corresponds to a metric |
| evaluations | | Evaluations performed on the document; each element corresponds to a metric |
DocumentArray
| Attribute | Type | Description |
|---|---|---|
| data | array | An array of Documents |
| parameters | | A key/value set of custom instructions to be passed along with the request to Jina |
| targetExecutor | string | A string indicating an Executor to target. Default targets all Executors |
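To make the two structures concrete, here is a minimal sketch that mirrors the field names from the tables above using plain stdClass objects. It does not use php-jina itself; in real code `$jina->document()` and `$jina->documentArray()` create these structures, and whether they serialize exactly like this is an assumption. The `limit` parameter and the `indexer` Executor name are made up for illustration.

```php
<?php
// Illustration only: plain objects standing in for a Document and a
// DocumentArray, using the attribute names from the tables above.
$doc = new stdClass();
$doc->text = "hello world";
$doc->tags = new stdClass();
$doc->tags->env = "dev";

$da = new stdClass();
$da->data = [$doc];                 // array of Documents
$da->parameters = new stdClass();
$da->parameters->limit = 5;         // hypothetical custom parameter
$da->targetExecutor = "indexer";    // hypothetical Executor name

echo json_encode($da, JSON_PRETTY_PRINT), "\n";
```

Printing the structure this way is a quick check of what would be sent before calling `submit()`.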
Filters
Filters are unique to each Data Store in DocArray. The structure, and how filters are passed, depends on how you have your Executors set up. Every example here assumes your Executors accept a filter key in the parameters section of the request. If your Executors accept filters in a different way, you will need to modify the request accordingly.
In this client you build a filter by chaining together filter functions. First, create an instance of the Filter class with the useFilterFormatter() function.
use DcoAi\PhpJina\JinaClient;
// set the config and create a new instance of the JinaClient
$config = [
    "url" => "localhost",
    "port" => "1234",
    "endpoints" => [
        "/status" => "GET",
        "/post" => "POST",
        "/index" => "POST",
        "/search" => "POST",
        "/delete" => "DELETE",
        "/update" => "PUT",
        "/" => "GET",
    ]
];
$jina = new JinaClient($config);
// create a new instance of the filter class
$filterBuilder = $jina->useFilterFormatter();
Now that you have the filter this is how you would chain together a basic filter:
$filterBuilder
    ->and()
        ->equal("env", "dev")
        ->equal("userId", "2")
    ->endAnd()
    ->or()
        ->notEqual("env", "2")
        ->greaterThan("id", "5")
    ->endOr()
    ->equal("env", "dev")
    ->notEqual("env", "prod");
Some Data Stores have grouping operators, like and and or, that allow you to group filters together. If the Data Store has these operators, each opening function has a corresponding closing function.
Once you have built your filter, you need to retrieve it from the Filter class and add it to the request. This is not done automatically.
// Let's make an empty DocumentArray
$da = $jina->documentArray();
// And add the filter to the parameters
$da->parameters->filter = $filterBuilder->createFilter();
// Print the DocumentArray and see what we got
print_r(json_encode($da, JSON_PRETTY_PRINT));
This filter will produce a string like this:
{
    "data": [],
    "parameters": {
        "filter": [
            {
                "$and": [
                    {
                        "env": {
                            "$eq": "dev"
                        }
                    },
                    {
                        "userId": {
                            "$eq": "2"
                        }
                    }
                ]
            },
            {
                "$or": [
                    {
                        "env": {
                            "$ne": "2"
                        }
                    },
                    {
                        "id": {
                            "$gt": 5
                        }
                    }
                ]
            },
            {
                "env": {
                    "$eq": "dev"
                }
            },
            {
                "env": {
                    "$ne": "prod"
                }
            }
        ]
    }
}
This example is a bit complicated and probably not useful, but it shows what can be done.
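If your Executors expect filters somewhere other than the parameters section, it may be easier to reason about the structure as plain PHP arrays. The sketch below hand-builds the first two groups of the output shown above without the filter builder; it is only an illustration of the shape, not how the client produces it internally. Note the single quotes around keys such as '$and': in double quotes PHP would try to interpolate them as variables.

```php
<?php
// Plain-array equivalent of the $and and $or groups from the output above.
// Single-quoted keys stop PHP from treating "$and" as a variable.
$filter = [
    ['$and' => [
        ["env" => ['$eq' => "dev"]],
        ["userId" => ['$eq' => "2"]],
    ]],
    ['$or' => [
        ["env" => ['$ne' => "2"]],
        ["id" => ['$gt' => 5]],
    ]],
];

// Wrap it the same way the request above does.
$request = ["data" => [], "parameters" => ["filter" => $filter]];
echo json_encode($request, JSON_PRETTY_PRINT), "\n";
```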
Default Filter
DocArray has a default filter structure that can be used by this client without any configuration changes. See the DocArray documentation for details.
This is a list of the operators that are supported by the Default filter. The $column is the field you are filtering on and the $value is the value you are filtering on.
| Query Operator | Chainable Function | Description |
|---|---|---|
| $eq | equal($column, $value) | Equal to (number, string) |
| $ne | notEqual($column, $value) | Not equal to (number, string) |
| $gt | greaterThan($column, $value) | Greater than (number) |
| $gte | greaterThanEqual($column, $value) | Greater than or equal to (number) |
| $lt | lessThan($column, $value) | Less than (number) |
| $lte | lessThanEqual($column, $value) | Less than or equal to (number) |
| $in | in($column, $value) | Is in an array |
| $nin | notIn($column, $value) | Not in an array |
| $regex | regex($column, $value) | Match the specified regular expression |
| $size | size($column, $value) | Match array/dict fields that have the specified size |
| $exists | exists($column, $value) | Matches documents that have the specified field; predefined fields having a default value (for example empty string, or 0) are considered as not existing |
The list of combining functions for the Default filter is here:
| Operator | Chainable Function | Closing Function | Description |
|---|---|---|---|
| $and | and() | endAnd() | Join query clauses with a logical AND by chaining operator functions between these two functions. |
| $or | or() | endOr() | Join query clauses with a logical OR by chaining operator functions between these two functions. |
| $not | not() | endNot() | Inverts the effect of a query expression that is chained between these two functions. |
AnnLite Filter
This filter uses the AnnLite Data Store and is very similar to the Default filter, with some minor differences. See the AnnLite documentation for details.
To use this filter you must add the "type" => "annlite" key to the dataStore array in the configuration.
use DcoAi\PhpJina\JinaClient;
// set the config and create a new instance of the JinaClient
$config = [
    "url" => "localhost",
    "port" => "1234",
    "endpoints" => [
        "/status" => "GET",
        "/post" => "POST",
        "/index" => "POST",
        "/search" => "POST",
        "/delete" => "DELETE",
        "/update" => "PUT",
        "/" => "GET",
    ],
    "dataStore" => [
        "type" => "annlite",
    ]
];
$jina = new JinaClient($config);
Like all other filters you can build it using the chaining method. Here are the specific fields using this Data Store:
| Query Operator | Chainable Function | Description |
|---|---|---|
| $eq | equal($column, $value) | Equal to (number, string) |
| $ne | notEqual($column, $value) | Not equal to (number, string) |
| $gt | greaterThan($column, $value) | Greater than (number) |
| $gte | greaterThanEqual($column, $value) | Greater than or equal to (number) |
| $lt | lessThan($column, $value) | Less than (number) |
| $lte | lessThanEqual($column, $value) | Less than or equal to (number) |
| $in | in($column, $value) | Is in an array |
| $nin | notIn($column, $value) | Not in an array |
The list of combining functions for the AnnLite filter is here:
| Operator | Chainable Function | Closing Function | Description |
|---|---|---|---|
| $and | and() | endAnd() | Join query clauses with a logical AND by chaining operator functions between these two functions. |
| $or | or() | endOr() | Join query clauses with a logical OR by chaining operator functions between these two functions. |
Weaviate Filter
This filter uses the Weaviate Data Store, which uses GraphQL as its query language. Since the query depends on the schema in the database, the client needs to connect to your Weaviate instance and retrieve the schema to build the query. This is done automatically, but you will need to add the Weaviate url and port to the dataStore array when creating the JinaClient instance.
See the Weaviate documentation for details.
To use this filter you must add the "type" => "weaviate" key to the dataStore array in the configuration.
use DcoAi\PhpJina\JinaClient;
// set the config and create a new instance of the JinaClient
$config = [
    "url" => "localhost",
    "port" => "1234",
    "endpoints" => [
        "/status" => "GET",
        "/post" => "POST",
        "/index" => "POST",
        "/search" => "POST",
        "/delete" => "DELETE",
        "/update" => "PUT",
        "/" => "GET",
    ],
    "dataStore" => [
        "type" => "weaviate",
        "url" => "localhost",
        "port" => "8080",
    ]
];
$jina = new JinaClient($config);
Here are the specific fields using this Data Store:
| Query Operator | Chainable Function | Description |
|---|---|---|
| Not | not($column, $value) | Exclude the value from the query |
| Equal | equal($column, $value) | Equal to the value |
| NotEqual | notEqual($column, $value) | Not equal to the value |
| GreaterThan | greaterThan($column, $value) | Greater than the value |
| GreaterThanEqual | greaterThanEqual($column, $value) | Greater than or equal to the value |
| LessThan | lessThan($column, $value) | Less than the value |
| LessThanEqual | lessThanEqual($column, $value) | Less than or equal to the value |
| Like | like($column, $value) | Allows you to do string searches based on partial match |
| WithinGeoRange | withinGeoRange($column, $value) | A special case of the Where filter is with geoCoordinates. If you've set the geoCoordinates property type, you can search in an area based on distance. |
| IsNull | isNull($column, $value=true or false) | Allows you to filter for objects where given properties are null or not null. Note that zero-length arrays and empty strings are equivalent to a null value. |
The list of combining functions for the Weaviate filter is here:
| Operator | Chainable Function | Closing Function | Description |
|---|---|---|---|
| $and | and() | endAnd() | Join query clauses with a logical AND by chaining operator functions between these two functions. |
| $or | or() | endOr() | Join query clauses with a logical OR by chaining operator functions between these two functions. |
All Other Data Store Filters
Currently, these are not supported but are planned for future releases. If you would like to contribute to this project please feel free to submit a pull request and reach out to me for any questions.
Response
To save on data transfer and memory when making calls to your Jina application this client will clean up the request and response automatically by removing any key where the value is not set. Keep this in mind when performing evaluations on the response by checking if the key exists first.
If you want all the values returned, you can set a flag when using the submit() function:
// Setting the third parameter to false keeps empty values in the response
$jina->submit("/index", $da, false);
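Because unset keys are stripped by default, code that reads a response should not assume a field is present. The sketch below uses a hand-written JSON string as a stand-in for a cleaned response and assumes the response is handled as decoded objects; `docField()` is a hypothetical helper, not part of php-jina.

```php
<?php
// Hypothetical helper: read a field that may have been stripped from the
// response, falling back to a default instead of raising a warning.
function docField(object $doc, string $key, $default = null)
{
    return $doc->$key ?? $default;
}

// Stand-in for a cleaned response: "text" is set, "uri" was removed.
$response = json_decode('{"data":[{"id":"abc123","text":"hello"}]}');
$first = $response->data[0];

echo docField($first, "text", ""), "\n";        // field is present
echo docField($first, "uri", "(no uri)"), "\n"; // field was stripped
```

The null coalescing operator also treats a field explicitly set to null as missing, which matches how the cleaned response behaves.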