You can further limit the vector similarity search by providing a filter based on specific metadata criteria.
Queries with metadata filters only return vectors whose metadata matches the filter.
Upstash Vector allows you to filter keys whose values are strings, numbers, booleans, objects, or arrays.
Filtering is implemented as a combination of in-filtering and post-filtering. Every query is assigned a filtering budget,
which determines the number of candidate vectors that can be compared against the filter during query execution. If this
budget is exceeded, the system falls back to post-filtering. Therefore, with highly selective filters, fewer
than `topK` vectors may be returned.
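To see why post-filtering can return fewer than `topK` results, here is a toy model in plain Python. It is only a sketch of the general idea (select the nearest candidates first, filter afterwards) and does not reproduce the actual budget logic; the function name and sample data are hypothetical.

```python
# Toy model of post-filtering: the filter runs after the nearest candidates
# are selected, so non-matching candidates are dropped without replacement.
def post_filter_query(scored, predicate, top_k):
    """scored is a list of (similarity, metadata) pairs."""
    # Pick the top_k most similar vectors first...
    candidates = sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_k]
    # ...then drop those whose metadata fails the filter.
    return [pair for pair in candidates if predicate(pair[1])]

scored = [
    (0.99, {"country": "Turkey"}),
    (0.95, {"country": "Germany"}),
    (0.90, {"country": "France"}),
    (0.40, {"country": "Turkey"}),
]
results = post_filter_query(scored, lambda m: m["country"] == "Turkey", top_k=3)
print(len(results))  # 1, although top_k is 3 and two vectors match the filter
```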
A filter has a syntax that resembles SQL, which consists of operators on object keys and boolean operators to combine them.
Assuming you have metadata like below:
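As an illustration, metadata for a single vector could look like the following Python dictionary. Every key and value here is hypothetical and chosen only to cover the supported value types.

```python
# Hypothetical metadata attached to one vector.
metadata = {
    "city": "Istanbul",            # string
    "country": "Turkey",           # string
    "is_capital": False,           # boolean
    "population": 15460000,        # number
    "geography": {                 # nested object
        "continent": "Asia",
    },
    "economy": {
        "major_industries": ["Tourism", "Textiles", "Finance"],  # array
    },
}
```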
Then, you can query similar vectors with a filter like below:
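A filter over such metadata might look like the string below. The key names are hypothetical, and the `matches` function is only a plain-Python reading of the same condition, not how the service evaluates it.

```python
# Filter string in the SQL-like syntax (key names are hypothetical).
query_filter = "population >= 1000000 AND geography.continent = 'Asia'"

# Plain-Python reading of the same condition, for illustration only.
def matches(metadata):
    return (
        metadata["population"] >= 1000000
        and metadata["geography"]["continent"] == "Asia"
    )

istanbul = {"population": 15460000, "geography": {"continent": "Asia"}}
berlin = {"population": 3600000, "geography": {"continent": "Europe"}}
print(matches(istanbul), matches(berlin))  # True False
```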
The equals operator filters keys whose value is equal to the given literal.
It is applicable to string, number, and boolean values.
The not equals operator filters keys whose value is not equal to the given literal.
It is applicable to string, number, and boolean values.
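A short sketch of both operators follows, assuming they are written `=` and `!=` as in SQL. The keys and values are hypothetical, and the Python comparisons only mirror the intended meaning.

```python
# Hypothetical filters using the equals and not equals operators.
equals_filter = "country = 'Turkey'"
not_equals_filter = "population != 12500000"

# Plain-Python reading of each filter against sample metadata.
metadata = {"country": "Turkey", "population": 15460000}
equals_match = metadata["country"] == "Turkey"
not_equals_match = metadata["population"] != 12500000
print(equals_match, not_equals_match)  # True True
```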
The less than operator filters keys whose value is less than the given literal.
It is applicable to number values.
The less than or equals operator filters keys whose value is less than or equal to the given literal.
It is applicable to number values.
The greater than operator filters keys whose value is greater than the given literal.
It is applicable to number values.
The greater than or equals operator filters keys whose value is greater than or equal to the given literal.
It is applicable to number values.
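The four ordering operators can be modelled with Python's `operator` module. This sketch assumes they are written `<`, `<=`, `>`, and `>=`; the `compare` helper and the sample value are hypothetical.

```python
import operator

# Map each comparison operator of the filter syntax to its Python counterpart.
COMPARISONS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}

def compare(value, op, literal):
    """Evaluate `value op literal` for a numeric metadata value."""
    return COMPARISONS[op](value, literal)

population = 15460000
# Reads like the filter: population >= 1000000 AND population < 20000000
in_range = compare(population, ">=", 1000000) and compare(population, "<", 20000000)
print(in_range)  # True
```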
The glob operator filters keys whose value matches the given UNIX glob pattern.
It is applicable to string values.
It is a case-sensitive operator.
The glob operator supports the following wildcards:
`*` matches zero or more characters.
`?` matches exactly one character.
`[]` matches one character from the list:
`[abc]` matches either `a`, `b`, or `c`.
`[a-z]` matches one character in the range from `a` to `z`.
`[^abc]` matches any one character other than `a`, `b`, or `c`.
`[^a-z]` matches any one character outside the range from `a` to `z`.
For example, the filter below would only match city names whose second character is `s` or `z`,
and whose last character is anything other than `m` to `z`.
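A filter matching that description could be written as shown in the sketch below, assuming the operator keyword is `GLOB` and the key is named `city`. The matching is approximated locally with Python's `fnmatch`, which spells negated sets `[!...]` rather than `[^...]`, so the helper (hypothetical) translates between the two.

```python
from fnmatch import fnmatchcase

# Hypothetical filter: second character is s or z, last character is not m-z.
city_filter = "city GLOB '?[sz]*[^m-z]'"

def glob_match(value, pattern):
    """Case-sensitive glob match; converts [^...] to fnmatch's [!...]."""
    return fnmatchcase(value, pattern.replace("[^", "[!"))

pattern = "?[sz]*[^m-z]"
cities = ["Istanbul", "Izmir", "Ankara", "Osaka", "ISTANBUL"]
matched = [city for city in cities if glob_match(city, pattern)]
print(matched)  # ['Istanbul', 'Osaka']
```

`Izmir` is rejected because it ends in `r`, `Ankara` because its second character is `n`, and `ISTANBUL` because the match is case sensitive.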
The not glob operator filters keys whose value does not match the given UNIX glob pattern.
It is applicable to string values.
It has the same properties as the glob operator.
For example, the filter below would only match city names whose first character is anything other than `A`.
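Such a filter might be written as in this sketch, assuming the keyword is `NOT GLOB` and the key is named `city`. The local check with `fnmatchcase` only models the semantics.

```python
from fnmatch import fnmatchcase

# Hypothetical filter: keep cities that do not start with an uppercase A.
city_filter = "city NOT GLOB 'A*'"

cities = ["Ankara", "Istanbul", "Amsterdam", "ankara"]
kept = [city for city in cities if not fnmatchcase(city, "A*")]
print(kept)  # ['Istanbul', 'ankara']
```

The lowercase `ankara` is kept because the pattern is case sensitive.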
The in operator filters keys whose value is equal to any of the given literals.
It is applicable to string, number, and boolean values.
Semantically, it is equivalent to the equals operator applied to each of the given literals, combined with the `OR` boolean operator.
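The equivalence can be checked with a small sketch. The filter strings assume an SQL-style `IN (...)` spelling and a hypothetical `country` key; the Python code compares the two readings.

```python
# Two hypothetical filters that should select the same vectors.
in_filter = "country IN ('Germany', 'Turkey', 'France')"
or_filter = "country = 'Germany' OR country = 'Turkey' OR country = 'France'"

literals = ("Germany", "Turkey", "France")
in_results = {}
for country in ["Turkey", "Spain"]:
    via_in = country in literals                      # IN reading
    via_or = any(country == lit for lit in literals)  # chained OR reading
    in_results[country] = (via_in, via_or)

print(in_results)  # {'Turkey': (True, True), 'Spain': (False, False)}
```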
The not in operator filters keys whose value is not equal to any of the given literals.
It is applicable to string, number, and boolean values.
Semantically, it is equivalent to the not equals operator applied to each of the given literals, combined with the `AND` boolean operator.
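The same kind of check works for the negated form. The filter strings again assume an SQL-style `NOT IN (...)` spelling and a hypothetical `country` key.

```python
# Two hypothetical filters that should select the same vectors.
not_in_filter = "country NOT IN ('Germany', 'Hungary')"
and_filter = "country != 'Germany' AND country != 'Hungary'"

excluded = ("Germany", "Hungary")
not_in_results = {}
for country in ["Turkey", "Germany"]:
    via_not_in = country not in excluded                # NOT IN reading
    via_and = all(country != lit for lit in excluded)   # chained AND reading
    not_in_results[country] = (via_not_in, via_and)

print(not_in_results)  # {'Turkey': (True, True), 'Germany': (False, False)}
```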
The contains operator filters keys whose value contains the given literal.
It is applicable to array values.
The not contains operator filters keys whose value does not contain the given literal.
It is applicable to array values.
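A brief sketch of both operators on an array value follows. The key `economy.major_industries` and its contents are hypothetical; Python's `in` and `not in` stand in for the membership test.

```python
# Hypothetical filters on an array-valued key.
contains_filter = "economy.major_industries CONTAINS 'Tourism'"
not_contains_filter = "economy.major_industries NOT CONTAINS 'Steel'"

industries = ["Tourism", "Textiles", "Finance"]
contains_tourism = "Tourism" in industries      # CONTAINS reading
not_contains_steel = "Steel" not in industries  # NOT CONTAINS reading
print(contains_tourism, not_contains_steel)  # True True
```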
Operators above can be combined with `AND` and `OR` boolean operators to form compound filters.
Boolean operators can be grouped with parentheses to give them higher precedence.
When no parentheses are provided in ambiguous filters, `AND` has higher precedence than `OR`.
So, a filter of the form A OR B AND C is equivalent to A OR (B AND C).
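Python's `and` and `or` follow the same precedence rule, so the grouping can be verified exhaustively. The filter strings in the comments are hypothetical examples.

```python
from itertools import product

# Unparenthesised: country = 'Turkey' OR population > 10000000 AND is_capital = 1
# Equivalent to:   country = 'Turkey' OR (population > 10000000 AND is_capital = 1)
same_as_grouped_and = all(
    (a or b and c) == (a or (b and c))
    for a, b, c in product([False, True], repeat=3)
)

# Grouping the OR instead changes the result for some inputs.
differs_from_grouped_or = any(
    (a or b and c) != ((a or b) and c)
    for a, b, c in product([False, True], repeat=3)
)
print(same_as_grouped_and, differs_from_grouped_or)  # True True
```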
It is possible to filter nested object keys by referencing them with the `.` accessor.
Nested objects can be at arbitrary depths, so more than one `.` accessor can be used in the same identifier.
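The way a dotted identifier walks into nested objects can be sketched as follows. The `resolve` helper and the metadata are hypothetical and only model the lookup.

```python
from functools import reduce

def resolve(metadata, identifier):
    """Follow each `.`-separated key of the identifier into nested dicts."""
    return reduce(lambda obj, key: obj[key], identifier.split("."), metadata)

metadata = {
    "geography": {
        "continent": "Asia",
        "coordinates": {"latitude": 41.0082, "longitude": 28.9784},
    }
}
# Corresponds to a filter such as: geography.coordinates.latitude >= 35.0
latitude = resolve(metadata, "geography.coordinates.latitude")
print(latitude >= 35.0)  # True
```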
Apart from the `CONTAINS` and `NOT CONTAINS` operators, individual array elements can also be filtered by referencing them by index with the `[]` accessor.
Indexing is zero-based.
It is also possible to index from the back using the `#` character with negative values.
`#` can be thought of as the number of elements in the array, so `[#-1]` references the last element.
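The two index forms can be modelled with a small helper that treats `#` as the array length. The `element_at` function and the sample array are hypothetical.

```python
import re

def element_at(array, accessor):
    """Resolve `[n]` or `[#-n]` against a list; `#` stands for len(array)."""
    m = re.fullmatch(r"\[(#-)?(\d+)\]", accessor)
    if m is None:
        raise ValueError(f"unsupported accessor: {accessor}")
    n = int(m.group(2))
    index = len(array) - n if m.group(1) else n
    return array[index]

industries = ["Tourism", "Textiles", "Finance"]
# Corresponds to filters such as: economy.major_industries[0] = 'Tourism'
first = element_at(industries, "[0]")
last = element_at(industries, "[#-1]")
print(first, last)  # Tourism Finance
```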
Identifiers, which denote the object keys, should match the pattern `[a-zA-Z_][a-zA-Z_0-9.[\]#-]*`. In simpler terms, they should
start with a character from the English alphabet or `_`, and can continue with the same characters plus numbers and accessors
like `.`, `[0]`, or `[#-1]`.
Boolean literals can be written as `1` or `0`.
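The identifier pattern can be tried out with Python's `re` module. This is only a local check of the pattern `[a-zA-Z_][a-zA-Z_0-9.[\]#-]*`; the helper name and the sample identifiers are hypothetical.

```python
import re

# The identifier pattern, compiled as a Python regular expression.
IDENTIFIER = re.compile(r"[a-zA-Z_][a-zA-Z_0-9.[\]#-]*")

def is_valid_identifier(name):
    """Return True when the whole string matches the identifier pattern."""
    return IDENTIFIER.fullmatch(name) is not None

checks = {
    name: is_valid_identifier(name)
    for name in [
        "population",
        "geography.coordinates.latitude",
        "economy.major_industries[#-1]",
        "0population",   # starts with a digit
        "city name",     # contains a space
    ]
}
print(checks)
```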