Metadata Filtering
You can further limit the vector similarity search by providing a filter based on specific metadata criteria.
Queries with metadata filters only return vectors whose metadata matches the filter.
Upstash Vector allows you to filter keys which have the following value types:
- string
- number
- boolean
- object
- array
Filtering is implemented as a combination of in-filtering and post-filtering. Every query is assigned a filtering budget,
which determines the number of candidate vectors that can be compared against the filter during query execution. If this
budget is exceeded, the system falls back to post-filtering. Therefore, with highly selective filters, fewer
than topK vectors may be returned.
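To see why a selective filter can return fewer than topK results, consider a toy model of pure post-filtering, where the topK nearest candidates are picked first and the filter is applied afterwards. This is only an illustrative sketch: the function name, scores, and metadata are made up, and it does not describe how Upstash Vector implements its filtering budget.

```python
def post_filter_query(scored_vectors, top_k, predicate):
    """Toy post-filtering: pick the top_k most similar vectors, then drop non-matching ones."""
    candidates = sorted(scored_vectors, key=lambda v: v["score"], reverse=True)[:top_k]
    return [v for v in candidates if predicate(v["metadata"])]


# Hypothetical candidates with similarity scores already computed.
scored_vectors = [
    {"id": "a", "score": 0.98, "metadata": {"population": 500_000}},
    {"id": "b", "score": 0.95, "metadata": {"population": 15_000_000}},
    {"id": "c", "score": 0.91, "metadata": {"population": 200_000}},
    {"id": "d", "score": 0.80, "metadata": {"population": 12_000_000}},
    {"id": "e", "score": 0.75, "metadata": {"population": 20_000_000}},
]

results = post_filter_query(
    scored_vectors,
    top_k=3,
    predicate=lambda metadata: metadata["population"] > 10_000_000,
)
print([v["id"] for v in results])
```

Three vectors match the filter overall, but only one of them is among the three nearest candidates, so the post-filtered result holds a single entry instead of three.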
Filter Syntax
A filter has a syntax that resembles SQL, which consists of operators on object keys and boolean operators to combine them.
Assuming a vector has metadata attached, you can query similar vectors with a filter over its keys.
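As an illustration, the sketch below defines a hypothetical metadata object and a filter string over its keys. The key names and values are invented for this example. The query call in the trailing comment assumes the upstash-vector Python SDK and its filter parameter, so check the SDK reference before relying on it.

```python
# Hypothetical metadata stored alongside a vector (all keys are illustrative).
metadata = {
    "city": "Istanbul",
    "country": "Turkey",
    "is_capital": False,
    "population": 15_460_000,
    "geography": {"continent": "Asia"},
    "economy": {"major_industries": ["Tourism", "Textiles", "Finance"]},
}

# A filter combining a comparison on a number key with an equality check
# on a nested string key, joined with the AND boolean operator.
metadata_filter = "population >= 1000000 AND geography.continent = 'Asia'"

# Assumed usage with the upstash-vector Python SDK (not executed here):
#   index.query(vector=[...], top_k=5, include_metadata=True, filter=metadata_filter)
print(metadata_filter)
```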
Operators
Equals (=)
The equals operator filters keys whose value is equal to the given literal.
It is applicable to string, number, and boolean values.
Not Equals (!=)
The not equals operator filters keys whose value is not equal to the given literal.
It is applicable to string, number, and boolean values.
Less Than (<)
The less than operator filters keys whose value is less than the given literal.
It is applicable to number values.
Less Than or Equals (<=)
The less than or equals operator filters keys whose value is less than or equal to the given literal.
It is applicable to number values.
Greater Than (>)
The greater than operator filters keys whose value is greater than the given literal.
It is applicable to number values.
Greater Than or Equals (>=)
The greater than or equals operator filters keys whose value is greater than or equal to the given literal.
It is applicable to number values.
Glob
The glob operator filters keys whose value matches the given UNIX glob pattern.
It is applicable to string values.
It is a case sensitive operator.
The glob operator supports the following wildcards:
- * matches zero or more characters.
- ? matches exactly one character.
- [] matches one character from the list:
  - [abc] matches either a, b, or c.
  - [a-z] matches one of the range of characters from a to z.
  - [^abc] matches any one character other than a, b, or c.
  - [^a-z] matches any one character other than a to z.
For example, the filter city GLOB '?[sz]*[^m-z]' would only match city names whose second character is s or z, and whose last character is anything other than m to z.
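The wildcard rules above can be tried out with a small Python sketch. It uses the pattern ?[sz]*[^m-z] (second character s or z, last character outside m to z) and approximates the matching with the standard fnmatch module, which is an assumption made for illustration and not the engine Upstash Vector uses. The helper name glob_match and the city names are hypothetical.

```python
from fnmatch import fnmatchcase


def glob_match(value: str, pattern: str) -> bool:
    """Case-sensitive glob match that accepts [^...] for negated character sets."""
    # fnmatch spells negation as [!...], so translate the [^...] form first.
    return fnmatchcase(value, pattern.replace("[^", "[!"))


pattern = "?[sz]*[^m-z]"
for city in ["Istanbul", "Osaka", "Oslo", "Ankara", "ISTANBUL"]:
    print(city, glob_match(city, pattern))
```

Istanbul and Osaka match. Oslo fails because it ends with o, Ankara fails because its second character is n, and ISTANBUL fails because the match is case sensitive and S is not in the set [sz].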
Not Glob
The not glob operator filters keys whose value does not match the given UNIX glob pattern.
It is applicable to string values.
It has the same properties as the glob operator.
For example, the filter city NOT GLOB 'A*' would only match city names whose first character is anything other than A.
In
The in operator filters keys whose value is equal to any of the given literals.
It is applicable to string, number, and boolean values.
Semantically, it is equivalent to the equals operator applied to each of the given literals, with the OR boolean operator in between.
Not In
The not in operator filters keys whose value is not equal to any of the given literals.
It is applicable to string, number, and boolean values.
Semantically, it is equivalent to the not equals operator applied to each of the given literals, with the AND boolean operator in between.
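The equivalences for the in and not in operators can be spelled out by building both forms of a filter as strings. In this sketch the helper names, the country key, and its values are hypothetical, and the parenthesized list after IN is assumed to follow the usual SQL form. String literals containing quotes are not handled.

```python
def quote(literal):
    """Render a Python value as a filter literal (strings single quoted, booleans as 1 or 0)."""
    if isinstance(literal, bool):
        return "1" if literal else "0"
    if isinstance(literal, str):
        return f"'{literal}'"
    return str(literal)


def expand_in(key, literals):
    """Rewrite an IN check as equality checks joined with OR."""
    return " OR ".join(f"{key} = {quote(lit)}" for lit in literals)


def expand_not_in(key, literals):
    """Rewrite a NOT IN check as inequality checks joined with AND."""
    return " AND ".join(f"{key} != {quote(lit)}" for lit in literals)


countries = ["Germany", "Turkey", "France"]
in_list = ", ".join(quote(c) for c in countries)
print(f"country IN ({in_list})  <=>  {expand_in('country', countries)}")
print(f"country NOT IN ({in_list})  <=>  {expand_not_in('country', countries)}")
```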
Contains
The contains operator filters keys whose value contains the given literal.
It is applicable to array values.
Not Contains
The not contains operator filters keys whose value does not contain the given literal.
It is applicable to array values.
Boolean Operators
Operators above can be combined with AND and OR boolean operators to form compound filters.
Boolean operators can be grouped with parentheses to have higher precedence.
When no parentheses are provided in ambiguous filters, AND will have higher precedence than OR. So, a filter of the form X AND Y OR Z, where X, Y, and Z stand for any operator expressions, would be equivalent to (X AND Y) OR Z.
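The precedence rule can be checked with ordinary boolean logic. The sketch below uses Python's and / or, which have the same relative precedence, as a stand-in for the filter operators. It is not a filter evaluator.

```python
from itertools import product

# Python's `and` also binds tighter than `or`, so an unparenthesized
# expression groups the AND part first for every combination of inputs.
for x, y, z in product([False, True], repeat=3):
    assert (x and y or z) == ((x and y) or z)

# Grouping the OR part first is a different filter. One input where they differ:
x, y, z = False, True, True
and_first = (x and y) or z
or_first = x and (y or z)
print(and_first, or_first)
```

Parentheses are therefore only needed when the OR part should be evaluated before the AND part.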
Filtering Nested Objects
It is possible to filter nested object keys by referencing them with the . accessor.
Nested objects can be at arbitrary depths, so more than one . accessor can be used in the same identifier.
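One way to picture the . accessor is as a walk through nested dictionaries, one key per segment. The sketch below resolves dotted identifiers against a hypothetical metadata object. The resolve helper and the key names are made up for illustration.

```python
from functools import reduce

# Hypothetical metadata with objects nested two levels deep.
metadata = {
    "geography": {
        "continent": "Asia",
        "coordinates": {"latitude": 41.0082, "longitude": 28.9784},
    }
}


def resolve(metadata, identifier):
    """Follow each `.`-separated key of the identifier into the nested dict."""
    return reduce(lambda obj, key: obj[key], identifier.split("."), metadata)


print(resolve(metadata, "geography.continent"))
print(resolve(metadata, "geography.coordinates.latitude"))
```

With metadata of this shape, filters could then be written against identifiers such as geography.continent or geography.coordinates.latitude.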
Filtering Array Elements
Apart from the CONTAINS and NOT CONTAINS operators, individual array elements can also be filtered by referencing them with the [] accessor by their indexes.
Indexing is zero based.
Also, it is possible to index from the back using the # character with negative values. # can be thought of as the number of elements in the array, so [#-1] would reference the last element.
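The indexing rules can be modeled with a few lines of Python. This is a sketch of the semantics described above, not the service's parser. The resolve_element helper and the sample array are hypothetical.

```python
import re

# Matches [n] for indexing from the front and [#-n] for indexing from the back.
ACCESSOR = re.compile(r"\[(#-)?(\d+)\]")


def resolve_element(array, accessor):
    """Return the array element referenced by a [n] or [#-n] accessor."""
    match = ACCESSOR.fullmatch(accessor)
    if match is None:
        raise ValueError(f"unsupported accessor: {accessor}")
    from_back, offset = match.group(1), int(match.group(2))
    # `#` stands for the number of elements, so [#-1] is index len(array) - 1.
    index = len(array) - offset if from_back else offset
    return array[index]


industries = ["Tourism", "Textiles", "Finance"]
print(resolve_element(industries, "[0]"))
print(resolve_element(industries, "[#-1]"))
print(resolve_element(industries, "[#-3]"))
```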
Miscellaneous
- Identifiers (the left side of the operators) should be of the form [a-zA-Z_][a-zA-Z_0-9.[\]#-]*. In simpler terms, they should start with a character from the English alphabet or _, and can continue with the same characters plus digits and accessors like ., [0], or [#-1].
- The string literals (strings on the right side of the operators) can be either single or double quoted.
- Boolean literals are represented as 1 or 0.
- The operators, boolean operators, and boolean literals are case insensitive.
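The identifier rule can be checked on the client side before a filter is sent. The sketch below compiles the pattern given above with Python's re module. The is_valid_identifier helper is hypothetical, and the check only covers the character rule, not whether the accessors are well formed.

```python
import re

# Same pattern as above; `[` is escaped inside the character class for clarity.
IDENTIFIER = re.compile(r"[a-zA-Z_][a-zA-Z_0-9.\[\]#-]*")


def is_valid_identifier(identifier: str) -> bool:
    """Return True if the whole string matches the identifier pattern."""
    return IDENTIFIER.fullmatch(identifier) is not None


for candidate in ["geography.continent", "economy.major_industries[#-1]", "_id", "9lives", "city name"]:
    print(candidate, is_valid_identifier(candidate))
```

The first three candidates are accepted. 9lives is rejected because it starts with a digit, and city name is rejected because it contains a space.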