Este conteúdo não está disponível no idioma selecionado.

Chapter 2. Creating Ickle queries


Data Grid provides an Ickle query language that lets you create relational and full-text queries.

2.1. Ickle queries

To use the API, call the cache .query() method and provide the query string.

For instance:

// Remote Query using protobuf
Query<Transaction> q = remoteCache.query("from sample_bank_account.Transaction where amount > 20");

// Embedded Query using Java Objects
Query<Transaction> q = cache.query("from org.infinispan.sample.Book where price > 20");

// Execute the query
QueryResult<Book> queryResult = q.execute();
Note

A query will always target a single entity type and is evaluated over the contents of a single cache. Running a query over multiple caches or creating queries that target several entity types (joins) is not supported.

Executing the query and fetching the results is as simple as invoking the execute() method of the Query object. Once executed, calling execute() on the same instance will re-execute the query.

2.1.1. Pagination

You can limit the number of returned results by using the Query.maxResults(int maxResults). This can be used in conjunction with Query.startOffset(long startOffset) to achieve pagination of the result set.

// sorted by year and match all books that have "clustering" in their title
// and return the third page of 10 results
Query<Book> query = cache.query("FROM org.infinispan.sample.Book WHERE title like '%clustering%' ORDER BY year").startOffset(20).maxResults(10)
Note

If you don’t explicitly set the maxResults for a query instance, Data Grid limits the number of results returned by the query to 100. You can change the default limit by setting the query.default-max-results cache property.

2.1.2. Number of hits

The QueryResult object includes the .hitCount() method, which returns a hit count value that represents the total number of results from a query, regardless of any pagination parameter.

Additionally, QueryResult object contains a boolean value returned by the .isExact() method which indicates whether the hit count number is exact or a lower bound. The hit count is only available for indexed queries for performance reasons.

2.1.2.1. Hit count accuracy

You can limit the required accuracy of hit counts by setting hit-count-accuracy attribute. When dealing with large data sets, precise hit counts can impact performance. Setting a limit to the hit count accuracy, lets you achieve faster query responses while ensuring that the provided hit counts remain sufficiently accurate for your application’s needs.

The default accuracy of the hit-count-accuracy attribute is limited to 10000. This means that for any query, Data Grid provides exact hit count up to maximum of 10,000. If the effective hit count is higher than 10,000, Data Grid returns a lower bound estimate of the count. You can change the default limit by setting the query.hit-count-accuracy cache property. Alternatively, it can be set on each query instance.

When the actual hit count exceeds the limit set by the hit-count-accuracy, the .isExact() method or the hit_count_exact JSON field will be false, indicating that the returned hit count is an estimation. Setting this value to Integer.MAX would return accurate results for any query, but this can severely impact query performance.

For optimal performance set the property value slightly above the expected hit count. If you do not require accurate hit counts, set it to a low value.

2.1.3. Iteration

The Query object has the .iterator() method to obtain the results lazily. It returns an instance of CloseableIterator that must be closed after usage.

Note

The iteration support for Remote Queries is currently limited, as it will first fetch all entries to the client before iterating.

2.1.4. Named query parameters

Instead of building a new Query object for every execution it is possible to include named parameters in the query which can be substituted with actual values before execution. This allows a query to be defined once and be efficiently executed many times. Parameters can only be used on the right-hand side of an operator and are defined when the query is created by supplying an object produced by the org.infinispan.query.dsl.Expression.param(String paramName) method to the operator instead of the usual constant value. Once the parameters have been defined they can be set by invoking either Query.setParameter(parameterName, value) or Query.setParameters(parameterMap) as shown in the examples below. ⁠

// Defining a query to search for various authors and publication years
Query<Book> query = cache.query("SELECT title FROM org.infinispan.sample.Book WHERE author = :authorName AND publicationYear = :publicationYear").build();

// Set actual parameter values
query.setParameter("authorName", "Doe");
query.setParameter("publicationYear", 2010);

// Execute the query
List<Book> found = query.execute().list();

Alternatively, you can supply a map of actual parameter values to set multiple parameters at once: ⁠

Setting multiple named parameters at once

Map<String, Object> parameterMap = new HashMap<>();
parameterMap.put("authorName", "Doe");
parameterMap.put("publicationYear", 2010);

query.setParameters(parameterMap);

Note

A significant portion of the query parsing, validation and execution planning effort is performed during the first execution of a query with parameters. This effort is not repeated during subsequent executions leading to better performance compared to a similar query using constant values instead of query parameters.

2.1.5. Query execution

The Query API provides two methods for executing Ickle queries on a cache:

  • Query.execute() runs a SELECT statement and returns a result.
  • Query.executeStatement() runs a DELETE statement and modifies data.
Note

You should always invoke executeStatement() to modify data and invoke execute() to get the result of a query.

2.2. Ickle query language syntax

The Ickle query language is subset of the JPQL query language, with some extensions for full-text.

The parser syntax has some notable rules:

  • Whitespace is not significant.
  • Wildcards are not supported in field names.
  • A field name or path must always be specified, as there is no default field.
  • && and || are accepted instead of AND or OR in both full-text and JPA predicates.
  • ! may be used instead of NOT.
  • A missing boolean operator is interpreted as OR.
  • String terms must be enclosed with either single or double quotes.
  • Fuzziness and boosting are not accepted in arbitrary order; fuzziness always comes first.
  • != is accepted instead of <>.
  • Boosting cannot be applied to >,>=,<,<= operators. Ranges may be used to achieve the same result.

2.2.1. Filtering operators

Ickle support many filtering operators that can be used for both indexed and non-indexed fields.

OperatorDescriptionExample

in

Checks that the left operand is equal to one of the elements from the Collection of values given as argument.

FROM Book WHERE isbn IN ('ZZ', 'X1234')

like

Checks that the left argument (which is expected to be a String) matches a wildcard pattern that follows the JPA rules.

FROM Book WHERE title LIKE '%Java%'

=

Checks that the left argument is an exact match of the given value.

FROM Book WHERE name = 'Programming Java'

!=

Checks that the left argument is different from the given value.

FROM Book WHERE language != 'English'

>

Checks that the left argument is greater than the given value.

FROM Book WHERE price > 20

>=

Checks that the left argument is greater than or equal to the given value.

FROM Book WHERE price >= 20

<

Checks that the left argument is less than the given value.

FROM Book WHERE year < 2020

<=

Checks that the left argument is less than or equal to the given value.

FROM Book WHERE price ⇐ 50

between

Checks that the left argument is between the given range limits.

FROM Book WHERE price BETWEEN 50 AND 100

2.2.2. Boolean conditions

Combining multiple attribute conditions with logical conjunction (and) and disjunction (or) operators in order to create more complex conditions is demonstrated in the following example. The well known operator precedence rule for boolean operators applies here, so the order of the operators is irrelevant. Here and operator still has higher priority than or even though or was invoked first.

# match all books that have "Data Grid" in their title
# or have an author named "Manik" and their description contains "clustering"

FROM org.infinispan.sample.Book WHERE title LIKE '%Data Grid%' OR author.name = 'Manik' AND description like '%clustering%'

Boolean negation has highest precedence among logical operators and applies only to the next simple attribute condition.

# match all books that do not have "Data Grid" in their title and are authored by "Manik"
FROM org.infinispan.sample.Book WHERE title != 'Data Grid' AND author.name = 'Manik'

2.2.3. Nested conditions

Changing the precedence of logical operators is achieved with parenthesis:

# match all books that have an author named "Manik" and their title contains
# "Data Grid" or their description contains "clustering"
FROM org.infinispan.sample.Book WHERE author.name = 'Manik' AND ( title like '%Data Grid%' OR description like '% clustering%')

2.2.4. Projections with SELECT statements

In some use cases returning the whole domain object is overkill if only a small subset of the attributes are actually used by the application, especially if the domain entity has embedded entities. The query language allows you to specify a subset of attributes (or attribute paths) to return - the projection. If projections are used then the QueryResult.list() will not return the whole domain entity but will return a List of Object[], each slot in the array corresponding to a projected attribute.

# match all books that have "Data Grid" in their title or description
# and return only their title and publication year
SELECT title, publicationYear FROM org.infinispan.sample.Book WHERE title like '%Data Grid%' OR description like '%Data Grid%'

2.2.4.1. Project cache entry version

It is possible to project the cache entry version, using the version projection function.

# return the title, publication year and the cache entry version
SELECT b.title, b.publicationYear, version(b) FROM org.infinispan.sample.Book b WHERE b.title like '%Data Grid%'

2.2.4.2. Project cache entry value

It is possible to project the cache entry value together with other projections. It can be used for instance to project the cache entry value together with the cache entry version in the same Object[] returned hit.

# return the cache entry value and the cache entry version
SELECT b, version(b) FROM org.infinispan.sample.Book b WHERE b.title like '%Data Grid%'

2.2.4.3. Project the score

If the query is indexed, it is possible to project the score obtained by each matching together with other projections. It can be used for instance to project the cache entry value together with the score in the same Object[] returned hit.

# return the cache entry value and the the score of the matching
SELECT b, score(b) FROM org.infinispan.sample.Book b WHERE b.title like '%Data Grid%'

Sorting

Ordering the results based on one or more attributes or attribute paths is done with the ORDER BY clause. If multiple sorting criteria are specified, then the order will dictate their precedence.

# match all books that have "Data Grid" in their title or description
# and return them sorted by the publication year and title
FROM org.infinispan.sample.Book WHERE title like '%Data Grid%' ORDER BY publicationYear DESC, title ASC

2.2.5. Grouping and aggregation

Data Grid has the ability to group query results according to a set of grouping fields and construct aggregations of the results from each group by applying an aggregation function to the set of values that fall into each group. Grouping and aggregation can only be applied to projection queries (queries with one or more field in the SELECT clause).

The supported aggregations are: avg, sum, count, max, and min.

The set of grouping fields is specified with the GROUP BY clause and the order used for defining grouping fields is not relevant. All fields selected in the projection must either be grouping fields or else they must be aggregated using one of the grouping functions described below. A projection field can be aggregated and used for grouping at the same time. A query that selects only grouping fields but no aggregation fields is legal. ⁠ Example: Grouping Books by author and counting them.

SELECT author, COUNT(title) FROM org.infinispan.sample.Book WHERE title LIKE '%engine%' GROUP BY author
Note

A projection query in which all selected fields have an aggregation function applied and no fields are used for grouping is allowed. In this case the aggregations will be computed globally as if there was a single global group.

Aggregations

You can apply the following aggregation functions to a field:

Table 2.1. Index merge attributes
Aggregation functionDescription

avg()

Computes the average of a set of numbers. Accepted values are primitive numbers and instances of java.lang.Number. The result is represented as java.lang.Double. If there are no non-null values the result is null instead.

count()

Counts the number of non-null rows and returns a java.lang.Long. If there are no non-null values the result is 0 instead.

max()

Returns the greatest value found. Accepted values must be instances of java.lang.Comparable. If there are no non-null values the result is null instead.

min()

Returns the smallest value found. Accepted values must be instances of java.lang.Comparable. If there are no non-null values the result is null instead.

sum()

Computes the sum of a set of Numbers. If there are no non-null values the result is null instead. The following table indicates the return type based on the specified field.

Table 2.2. Table sum return type
Field TypeReturn Type

Integral (other than BigInteger)

Long

Float or Double

Double

BigInteger

BigInteger

BigDecimal

BigDecimal

Evaluation of queries with grouping and aggregation

Aggregation queries can include filtering conditions, like usual queries. Filtering can be performed in two stages: before and after the grouping operation. All filter conditions defined before invoking the groupBy() method will be applied before the grouping operation is performed, directly to the cache entries (not to the final projection). These filter conditions can reference any fields of the queried entity type, and are meant to restrict the data set that is going to be the input for the grouping stage. All filter conditions defined after invoking the groupBy() method will be applied to the projection that results from the projection and grouping operation. These filter conditions can either reference any of the groupBy() fields or aggregated fields. Referencing aggregated fields that are not specified in the select clause is allowed; however, referencing non-aggregated and non-grouping fields is forbidden. Filtering in this phase will reduce the amount of groups based on their properties. Sorting can also be specified similar to usual queries. The ordering operation is performed after the grouping operation and can reference any of the groupBy() fields or aggregated fields.

2.2.6. DELETE statements

You can delete entities from Data Grid caches with the following syntax:

DELETE FROM <entityName> [WHERE condition]
  • Reference only single entities with <entityName>. DELETE queries cannot use joins.
  • WHERE conditions are optional.

DELETE queries cannot use any of the following:

  • Projections with SELECT statements
  • Grouping and aggregation
  • ORDER BY clauses
Tip

Invoke the Query.executeStatement() method to execute DELETE statements.

2.3. Full-text queries

You can perform full-text searches with the Ickle query language.

2.3.1. Fuzzy queries

To execute a fuzzy query add ~ along with an integer, representing the distance from the term used, after the term. For instance

FROM sample_bank_account.Transaction WHERE description : 'cofee'~2

2.3.2. Range queries

To execute a range query define the given boundaries within a pair of braces, as seen in the following example:

FROM sample_bank_account.Transaction WHERE amount : [20 to 50]

2.3.3. Phrase queries

A group of words can be searched by surrounding them in quotation marks, as seen in the following example:

FROM sample_bank_account.Transaction WHERE description : 'bus fare'

2.3.4. Proximity queries

To execute a proximity query, finding two terms within a specific distance, add a ~ along with the distance after the phrase. For instance, the following example will find the words canceling and fee provided they are not more than 3 words apart:

FROM sample_bank_account.Transaction WHERE description : 'canceling fee'~3

2.3.5. Wildcard queries

To search for "text" or "test", use the ? single-character wildcard search:

FROM sample_bank_account.Transaction where description : 'te?t'

To search for "test", "tests", or "tester", use the * multi-character wildcard search:

FROM sample_bank_account.Transaction where description : 'test*'

2.3.6. Regular expression queries

Regular expression queries can be performed by specifying a pattern between /. Ickle uses Lucene’s regular expression syntax, so to search for the words moat or boat the following could be used:

FROM sample_library.Book  where title : /[mb]oat/

2.3.7. Boosting queries

Terms can be boosted by adding a ^ after the term to increase their relevance in a given query, the higher the boost factor the more relevant the term will be. For instance to search for titles containing beer and wine with a higher relevance on beer, by a factor of 3, the following could be used:

FROM sample_library.Book WHERE title : beer^3 OR wine

2.4. Vector search queries

You can perform vector kNN searches with the Ickle query language using the special operator <-> to define predicates.

This is an example of kNN query:

from play.Item i where i.myVector <-> [7,7,7]~3

This query will find the items that have the myVector fields that are within 3 nearest neighbourhood from the vector [7,7,7].

Notice that in order to use this kind of search the entity, in our example play.Item, has to be @Indexed and the field, in our example myVector, should be annotated with @Vector.

We support two kinds of vector field types:

  • byte / Byte (to work with byte vectors)
  • float / Float (to work with float vectors)

You can have different vector fields on the same entity, but in any case you can have only one vector predicate on your queries.

2.4.1. Vector search parameters

Both the k-value and the vector can be passed as query parameter. The k-value scalar can be expressed with the usual placeholder :k in the Ickle text. For the vector we can use either a placeholder for each term of the vector:

Query<Item> query = cache.query("from play.Item i where i.floatVector <-> [:a,:b,:c]~:k");
query.setParameter("a", 1);
query.setParameter("b", 4.3);
query.setParameter("c", 3.3);
query.setParameter("k", 4);

Or a placeholder can be used for the entire vector:

Query<Item> query = cache.query("from play.Item i where i.floatVector <-> [:a]~:k");
query.setParameter("a", new float[]{7.1f, 7.0f, 3.1f});
query.setParameter("k", 3);

2.4.3. Filtering entities

Instead of applying the kNN search to the entire population of entities of a given type, you can limiting the searching set applying classic predicates (match, full-text-search, range, …​) to the kNN queries, defining what we call a filtering clause.

A filtering clause can contain any kind of predicates with the only exception of kNN predicates that cannot be included.

For instance the following query:

Query<Object[]> query = remoteCache.query(
   "select score(i), i from Item i where i.floatVector <-> [:a]~:k filtering (i.buggy : 'cat' or i.text : 'code')");
query.setParameter("a", new float[]{7, 7, 7});
query.setParameter("k", 3);

Will return the nearest 3 items from the point [7,7,7] selecting only the items that have a text containing the term cat or code.

The filtering queries is a way to apply the classic indexed search to the new vector search.

2.4.4. Vector field attributes

It is always required to specify the dimension of the vector field.

The other mapping attributes are optional, since Infinispan will have a default for each of them. You can configure them, for instance, to tune the desired accuracy / performance.

2.4.4.1. Similarity

Different VectorSimilarity algorithms are supported

ValueDistanceScoreNote

L2

  

This is the Data Grid default

INNER_PRODUCT

  

To use this similarity efficiently, both index and search vectors must be normalized

MAX_INNER_PRODUCT

  

This similarity does not require vector normalization

COSINE

  

This similarity cannot be of zero magnitude, e.g. when a vector is all zeroes: [0,0,0,…​ 0,0], the cosine is not defined and will result in an error.

2.4.4.2. Beanwidth

Changing the beamWidth you can modify the the size of the dynamic list used during k-NN graph creation. It affects how vectors are stored. Higher values lead to a more accurate graph but slower indexing speed. The Data Grid default is 512.

2.4.4.3. Max Connections

The number of neighbors each node will be connected to in the HNSW graph. Modifying this value will have an impact on memory consumption. It is recommended to keep this value between 2 and 100. The Data Grid default is 16.

Red Hat logoGithubRedditYoutubeTwitter

Aprender

Experimente, compre e venda

Comunidades

Sobre a documentação da Red Hat

Ajudamos os usuários da Red Hat a inovar e atingir seus objetivos com nossos produtos e serviços com conteúdo em que podem confiar.

Tornando o open source mais inclusivo

A Red Hat está comprometida em substituir a linguagem problemática em nosso código, documentação e propriedades da web. Para mais detalhes veja oBlog da Red Hat.

Sobre a Red Hat

Fornecemos soluções robustas que facilitam o trabalho das empresas em plataformas e ambientes, desde o data center principal até a borda da rede.

© 2024 Red Hat, Inc.