Chapter 4. Configuring Cache Encoding


Data Grid stores your data in a specific format that can be converted on the fly when you read from and write to caches. Configure the storage format by specifying a MediaType for keys and values, which describes the format of the data.

Data Grid can also convert data between different storage formats to handle interoperability between different client protocols and when using custom code to process data.

4.1. Cache Encoding and Client Interoperability

The encoding that you use for your data affects client interoperability and capabilities such as Data Grid Search.

Table 4.1. Protobuf Format

Store data in Protobuf format to use it with…   Compatible
Data Grid Console                               Yes
REST clients                                    Yes
Java Hot Rod clients                            Yes
Non-Java Hot Rod clients                        Yes
Data Grid Search                                Yes
Custom Java objects                             Yes

Table 4.2. Text-Based Format

Store data in a text-based format to use it with…   Compatible
Data Grid Console                                   Yes
REST clients                                        Yes
Java Hot Rod clients                                Yes
Non-Java Hot Rod clients                            Yes
Data Grid Search                                    No
Custom Java objects                                 No

Table 4.3. Marshalled Java Objects

Marshalled Java objects are compatible with…   Compatible
Data Grid Console                              No
REST clients                                   Yes
Java Hot Rod clients                           Yes
Non-Java Hot Rod clients                       No
Data Grid Search                               No

Table 4.4. Unmarshalled Java Objects

Plain Old Java Objects (POJOs) are not recommended but are compatible with…   Compatible
Data Grid Console                                                             No
REST clients                                                                  Yes
Java Hot Rod clients                                                          Yes
Non-Java Hot Rod clients                                                      No
Data Grid Search                                                              Yes. However, you must annotate entities to search with POJOs and make your classes available to Data Grid Server.
Custom Java objects                                                           Yes

4.1.1. Configuring Cache Encoding for Memcached Clients

Data Grid Server disables the Memcached endpoint by default. If you enable the Memcached endpoint, you should configure caches with a suitable encoding for Memcached clients.

Important

The Memcached endpoint does not support authentication. For security purposes you should use dedicated caches for Memcached clients. You should not use REST or Hot Rod clients to interact on the same data set as Memcached clients.

Procedure

  1. Configure cache encoding to use text/plain for keys.
  2. Specify any appropriate MediaType, other than application/x-java-object, for values.

    Memcached clients can handle keys as text/plain only. Values can be any MediaType that Data Grid stores as byte[], which can be Protobuf, marshalled Java objects, or a text-based format.

    <encoding>
      <key media-type="text/plain"/>
      <value media-type="application/x-protostream"/>
    </encoding>
Tip

The Memcached endpoint includes a client-encoding attribute that converts the encoding of values.

For example, as in the preceding configuration example, you store values encoded as Protobuf. If you want Memcached clients to read and write values as JSON, you can use the following configuration:

<memcached-connector cache="memcachedCache" client-encoding="application/json"/>

4.2. Configuring Encoding for Data Grid Caches

Define the MediaType that Data Grid uses to encode your data when writing and reading to and from the cache.

Tip

When you define a MediaType, you specify the format of your data to Data Grid.

If you want to use the Data Grid Console, Hot Rod clients, and REST clients interchangeably, specify the application/x-protostream MediaType so Data Grid encodes data in Protobuf format.

Procedure

  • Specify a MediaType for keys and values in your Data Grid cache configuration.

    • Declaratively: Set the encoding element.
    • Programmatically: Use the encoding() method.

Declarative examples

  • Use the same encoding for keys and values:
<local-cache>
  <encoding media-type="application/x-protostream"/>
</local-cache>
  • Use a different encoding for keys and values:
<cache>
   <encoding>
      <key media-type="application/x-java-object"/>
      <value media-type="application/xml; charset=UTF-8"/>
   </encoding>
</cache>

Programmatic examples

  • Use the same encoding for keys and values:
ConfigurationBuilder cfg = new ConfigurationBuilder();

cfg
  .encoding()
    .mediaType("application/x-protostream")
  .build();
  • Use a different encoding for keys and values:
ConfigurationBuilder cfg = new ConfigurationBuilder();

cfg.encoding().key().mediaType("text/plain");
cfg.encoding().value().mediaType("application/json");

4.3. Storing Data in Protobuf Format

Storing data in the cache as Protobuf-encoded entries provides a platform-independent configuration that enables you to perform cache operations from any client.

Note

When you configure indexing for Data Grid Search, Data Grid automatically stores keys and values with the application/x-protostream media type.

Procedure

  1. Specify application/x-protostream as the MediaType for keys and values as follows:

    <distributed-cache name="mycache">
       <encoding>
          <key media-type="application/x-protostream"/>
          <value media-type="application/x-protostream"/>
       </encoding>
    </distributed-cache>
  2. Configure your clients.

Hot Rod clients must register Protocol Buffers schema definitions that describe entities and client marshallers.

Data Grid converts between application/x-protostream and application/json so REST clients only need to send the following headers to read and write JSON formatted data:

  • Accept: application/json for read operations.
  • Content-Type: application/json for write operations.
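
As a rough illustration of these two headers, the following sketch builds (but does not send) a write request and a read request with the JDK java.net.http client. The host, port, cache name, key, and the /rest/v2/caches/ path in the URI are assumptions for illustration, as is the RestJsonRequests class name; adjust them to match your deployment.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class RestJsonRequests {

   // Assumed location of a cache entry; replace with your own server, cache, and key.
   static final URI ENTRY =
         URI.create("http://localhost:11222/rest/v2/caches/mycache/person1");

   // Write: the Content-Type header declares that the request body is JSON.
   static HttpRequest write(String json) {
      return HttpRequest.newBuilder(ENTRY)
            .header("Content-Type", "application/json")
            .PUT(HttpRequest.BodyPublishers.ofString(json))
            .build();
   }

   // Read: the Accept header asks the server to return the value as JSON.
   static HttpRequest read() {
      return HttpRequest.newBuilder(ENTRY)
            .header("Accept", "application/json")
            .GET()
            .build();
   }
}
```

Passing either request to java.net.http.HttpClient.send() would perform the operation against a running server.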

4.4. Storing Data in Text-Based Formats

Configure Data Grid to store data in a text-based format such as text/plain, application/json, or application/xml.

Procedure

  1. Specify a text-based storage format as the MediaType for keys and values.
  2. Optionally specify a character set such as UTF-8.

    The following example configures Data Grid to store entries with the text/plain; charset=UTF-8 format:

    <cache>
       <encoding>
          <key media-type="text/plain; charset=UTF-8"/>
          <value media-type="text/plain; charset=UTF-8"/>
       </encoding>
    </cache>
  3. Configure your clients.

Hot Rod clients can use org.infinispan.commons.marshall.StringMarshaller to handle plain text, JSON, XML, or any other text-based format.

You can also use text-based formats with the ProtoStream marshaller. ProtoStream can handle String and byte[] types natively, without the need to create Serialization Contexts and register Protobuf schemas (.proto files).

REST clients must send the correct headers with requests:

  • Accept: text/plain; charset=UTF-8 for read operations.
  • Content-Type: text/plain; charset=UTF-8 for write operations.

4.5. Storing Marshalled Java Objects

Java Hot Rod clients can handle Java objects that represent entities and perform marshalling to serialize objects to, and deserialize them from, byte arrays. C++, C#, and JavaScript Hot Rod clients can also handle objects in their respective languages.

If you store entries in the cache as marshalled Java objects, you should configure the cache with the MediaType of the marshalled storage.

Procedure

  1. Specify the MediaType that matches your marshaller implementation.

    • Protostream marshaller: Configure the MediaType as application/x-protostream.
    • JBoss marshalling: Configure the MediaType as application/x-jboss-marshalling.
    • Java serialization: Configure the MediaType as application/x-java-serialized-object.
  2. Configure your clients.

Because REST clients are most suitable for handling text formats, you should use primitives such as java.lang.String for keys. Otherwise, REST clients must handle keys as byte[] using a supported binary encoding.

REST clients can read values for cache entries in XML or JSON format.

Equality Considerations

When storing data in binary format, Data Grid uses the WrappedBytes interface for keys and values. This wrapper class transparently handles serialization and deserialization on demand. Internally, it may hold a reference to the wrapped object itself or to the serialized byte array representation of the object. This affects the behavior of equality, which is important to note if you implement an equals() method on keys.

The equals() method of the wrapper class either compares binary representations (byte arrays) or delegates to the wrapped object instance’s equals() method, depending on whether both instances being compared are in serialized or deserialized form at the time of comparison. If the two instances are in different forms, one of them is either serialized or deserialized so that both can be compared in the same form.
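
The following is a simplified sketch of why this matters for keys with a custom equals() method. It is not the WrappedBytes implementation: LazyWrapper and CaseInsensitiveKey are hypothetical classes, the sketch uses plain Java serialization, and it always deserializes when the two forms differ.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical key whose equals() ignores case, so equal keys can have different bytes.
final class CaseInsensitiveKey implements Serializable {
   private static final long serialVersionUID = 1L;
   final String name;

   CaseInsensitiveKey(String name) { this.name = name; }

   @Override
   public boolean equals(Object o) {
      return o instanceof CaseInsensitiveKey
            && ((CaseInsensitiveKey) o).name.equalsIgnoreCase(name);
   }

   @Override
   public int hashCode() { return name.toLowerCase(Locale.ROOT).hashCode(); }
}

// Hypothetical wrapper that holds either an object or its serialized bytes.
final class LazyWrapper {
   private final Object instance; // deserialized form, or null
   private final byte[] bytes;    // serialized form, or null

   private LazyWrapper(Object instance, byte[] bytes) {
      this.instance = instance;
      this.bytes = bytes;
   }

   static LazyWrapper ofObject(Object o) { return new LazyWrapper(o, null); }
   static LazyWrapper ofBytes(byte[] b) { return new LazyWrapper(null, b); }

   static byte[] serialize(Object o) {
      try (ByteArrayOutputStream baos = new ByteArrayOutputStream();
           ObjectOutputStream out = new ObjectOutputStream(baos)) {
         out.writeObject(o);
         out.flush();
         return baos.toByteArray();
      } catch (IOException e) {
         throw new IllegalStateException("Unable to serialize", e);
      }
   }

   static Object deserialize(byte[] b) {
      try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(b))) {
         return in.readObject();
      } catch (IOException | ClassNotFoundException e) {
         throw new IllegalStateException("Unable to deserialize", e);
      }
   }

   private Object object() { return instance != null ? instance : deserialize(bytes); }

   @Override
   public boolean equals(Object other) {
      if (!(other instanceof LazyWrapper)) return false;
      LazyWrapper that = (LazyWrapper) other;
      if (bytes != null && that.bytes != null) {
         // Both serialized: compare the binary representations
         return Arrays.equals(bytes, that.bytes);
      }
      // Otherwise bring both to object form and delegate to the object's equals()
      return object().equals(that.object());
   }

   @Override
   public int hashCode() { return object().hashCode(); }
}
```

In this sketch, two keys that differ only in case are equal when compared as objects but not when both are compared as byte arrays, so the result depends on the form the keys are in. Keys whose equals() agrees with byte-for-byte equality of their serialized form avoid that ambiguity.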

4.6. Storing Unmarshalled Java Objects

You can store data as deserialized Plain Old Java Objects (POJOs) instead of storing data in a binary format.

Storing POJOs instead of a binary format is not recommended because it requires Data Grid to serialize data on client read operations and deserialize data on write operations. To handle client interoperability with custom code, you should convert data on demand.

Procedure

  1. Specify application/x-java-object as the MediaType for keys and values as follows:

    <distributed-cache name="my-cache">
       <encoding>
          <key media-type="application/x-java-object"/>
          <value media-type="application/x-java-object"/>
       </encoding>
    </distributed-cache>
  2. Put class files for all custom objects on the Data Grid server classpath.

    Add JAR files that contain custom classes and/or service providers for marshaller implementations in the server/lib directory.

    ├── server
    │   ├── lib
    │   │   ├── UserObjects.jar
    │   │   └── README.txt
  3. Configure your clients.

Hot Rod clients require no changes. The only requirement is that the marshaller used in the client is available in the server/lib directory so Data Grid can deserialize the objects.

Note

ProtoStream and Java Serialization marshallers are already available on the server.

REST clients must use either JSON or XML so Data Grid can convert to and from Java objects.

4.7. Data Encoding

Encoding is the data conversion operation done by Data Grid caches before storing data, and when reading back from storage.

4.7.1. Overview

Encoding lets you work with one data format during API calls (map, listeners, stream, and so on) while Data Grid stores the data in a different format.

Data conversions are handled by instances of org.infinispan.commons.dataconversion.Encoder:

public interface Encoder {

   /**
    * Convert data in the read/write format to the storage format.
    *
    * @param content data to be converted, never null.
    * @return Object in the storage format.
    */
   Object toStorage(Object content);

   /**
    * Convert from storage format to the read/write format.
    *
    * @param content data as stored in the cache, never null.
    * @return data in the read/write format
    */
   Object fromStorage(Object content);

   /**
     * Returns the {@link MediaType} produced by this encoder or null if the storage format is not known.
     */
   MediaType getStorageFormat();
}
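
To make the two directions concrete, here is a minimal, self-contained sketch. It declares a hypothetical SimpleEncoder interface that mirrors only the toStorage and fromStorage pair so that it compiles without Data Grid on the classpath; a real encoder implements org.infinispan.commons.dataconversion.Encoder instead. The Utf8StringEncoder name is also an assumption.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the two conversion methods of Encoder.
interface SimpleEncoder {
   Object toStorage(Object content);
   Object fromStorage(Object content);
}

// API calls read and write String; the storage format is a UTF-8 byte[].
final class Utf8StringEncoder implements SimpleEncoder {

   @Override
   public Object toStorage(Object content) {
      // Read/write format (String) -> storage format (byte[])
      return ((String) content).getBytes(StandardCharsets.UTF_8);
   }

   @Override
   public Object fromStorage(Object content) {
      // Storage format (byte[]) -> read/write format (String)
      return new String((byte[]) content, StandardCharsets.UTF_8);
   }
}
```

The two methods are inverses of each other, so a value that goes through toStorage and then fromStorage comes back unchanged.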

4.7.2. Default encoders

Data Grid automatically picks the Encoder depending on the cache configuration. The table below shows which internal Encoder is used for several configurations:

Mode             Configuration          Encoder                   Description
Embedded/Server  Default                IdentityEncoder           Passthrough encoder, no conversion done.
Embedded         StorageType.OFF_HEAP   GlobalMarshallerEncoder   Uses the Data Grid internal marshaller to convert to byte[]. May delegate to the configured marshaller in the cache manager.
Embedded         StorageType.BINARY     BinaryEncoder             Uses the Data Grid internal marshaller to convert to byte[], except for primitives and String.
Server           StorageType.OFF_HEAP   IdentityEncoder           Stores byte[] values directly as received from remote clients.

4.7.3. Overriding programmatically

You can programmatically override the encoding used for both keys and values by calling one of the .withEncoding() method variants from AdvancedCache.

For example, consider the following cache configured as OFF_HEAP:

// Read and write POJO, storage will be byte[] since for
// OFF_HEAP the GlobalMarshallerEncoder is used internally:
cache.put(1, new Pojo());
Pojo value = cache.get(1);

// Get the content in its stored format by overriding
// the internal encoder with a no-op encoder (IdentityEncoder)
Cache<?,?> rawContent = cache.getAdvancedCache().withEncoding(IdentityEncoder.class);
byte[] marshalled = (byte[]) rawContent.get(1);

The override can be useful for operations that do not require decoding, such as counting the number of entries or calculating the size of the byte[] values in an OFF_HEAP cache.

4.7.4. Defining Custom Encoders

A custom encoder can be registered in the EncoderRegistry.

Caution

Ensure that the registration is done in every node of the cluster, before starting the caches.

Consider a custom encoder used to compress/decompress with gzip:

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

import org.infinispan.commons.dataconversion.Encoder;
import org.infinispan.commons.dataconversion.MediaType;

public class GzipEncoder implements Encoder {

   @Override
   public Object toStorage(Object content) {
      assert content instanceof String;
      return compress(content.toString());
   }

   @Override
   public Object fromStorage(Object content) {
      assert content instanceof byte[];
      return decompress((byte[]) content);
   }

   private byte[] compress(String str) {
      try (ByteArrayOutputStream baos = new ByteArrayOutputStream();
           GZIPOutputStream gzip = new GZIPOutputStream(baos)) {
         gzip.write(str.getBytes(StandardCharsets.UTF_8));
         // Write the gzip trailer before reading the buffer
         gzip.finish();
         return baos.toByteArray();
      } catch (IOException e) {
         throw new RuntimeException("Unable to compress", e);
      }
   }

   private String decompress(byte[] compressed) {
      try (Reader reader = new InputStreamReader(
            new GZIPInputStream(new ByteArrayInputStream(compressed)), StandardCharsets.UTF_8)) {
         StringBuilder result = new StringBuilder();
         char[] buffer = new char[1024];
         int read;
         // Read characters rather than lines so that line breaks are preserved
         while ((read = reader.read(buffer)) != -1) {
            result.append(buffer, 0, read);
         }
         return result.toString();
      } catch (IOException e) {
         throw new RuntimeException("Unable to decompress", e);
      }
   }

   @Override
   public MediaType getStorageFormat() {
      return MediaType.parse("application/gzip");
   }

   @Override
   public boolean isStorageFormatFilterable() {
      return false;
   }

   @Override
   public short id() {
      return 10000;
   }
}

Register the encoder as follows:

GlobalComponentRegistry registry = cacheManager.getGlobalComponentRegistry();
EncoderRegistry encoderRegistry = registry.getComponent(EncoderRegistry.class);
encoderRegistry.registerEncoder(new GzipEncoder());

You can then use the encoder to write and read data from a cache:

AdvancedCache<String, String> cache = ...

// Decorate cache with the newly registered encoder, without encoding keys (IdentityEncoder)
// but compressing values
AdvancedCache<String, String> compressingCache = (AdvancedCache<String, String>) cache.withEncoding(IdentityEncoder.class, GzipEncoder.class);

// All values will be stored compressed...
compressingCache.put("297931749", "0412c789a37f5086f743255cfa693dd5");

// ... but API calls deal with Strings
String stringValue = compressingCache.get("297931749");

// Bypassing the value encoder to obtain the value as it is stored
Object value = compressingCache.withEncoding(IdentityEncoder.class).get("297931749");

// value is a byte[] which is the compressed value

4.8. Transcoders and Data Conversion

Data Grid uses org.infinispan.commons.dataconversion.Transcoder to convert data between MediaType formats.

public interface Transcoder {

   /**
    * Transcodes content between two different {@link MediaType}.
    *
    * @param content         Content to transcode.
    * @param contentType     The {@link MediaType} of the content.
    * @param destinationType The target {@link MediaType} to convert.
    * @return the transcoded content.
    */
   Object transcode(Object content, MediaType contentType, MediaType destinationType);

   /**
    * @return all the {@link MediaType} handled by this Transcoder.
    */
   Set<MediaType> getSupportedMediaTypes();
}
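
As an illustration of the transcode contract, the following self-contained sketch converts text stored as byte[] between two character sets. SimpleTranscoder is a hypothetical, simplified stand-in that uses String media types so that the example compiles without Data Grid on the classpath; a real implementation uses org.infinispan.commons.dataconversion.Transcoder and MediaType. The TextCharsetTranscoder name is also an assumption.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified stand-in for Transcoder that uses String media types.
interface SimpleTranscoder {
   Object transcode(Object content, String contentType, String destinationType);
   Set<String> getSupportedMediaTypes();
}

// Converts text held as byte[] from one character set to another.
final class TextCharsetTranscoder implements SimpleTranscoder {

   private static final Map<String, Charset> CHARSETS = Map.of(
         "text/plain; charset=UTF-8", StandardCharsets.UTF_8,
         "text/plain; charset=ISO-8859-1", StandardCharsets.ISO_8859_1);

   @Override
   public Object transcode(Object content, String contentType, String destinationType) {
      Charset from = CHARSETS.get(contentType);
      Charset to = CHARSETS.get(destinationType);
      if (from == null || to == null) {
         throw new IllegalArgumentException(
               "Unsupported conversion: " + contentType + " -> " + destinationType);
      }
      // Decode with the source charset, then encode with the destination charset
      String text = new String((byte[]) content, from);
      return text.getBytes(to);
   }

   @Override
   public Set<String> getSupportedMediaTypes() {
      return CHARSETS.keySet();
   }
}
```

Unlike an encoder, which always converts between one read/write format and one storage format, a transcoder receives both the source and the destination type with each call.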

4.8.1. Converting Data on Demand

You can deploy and run custom code on Data Grid, such as tasks, listeners, and merge policies. Custom code on Data Grid can directly access data but must also interoperate with clients that access the same data through different endpoints. For example, you can create tasks that handle custom objects while Hot Rod clients read and write data in binary format.

In this case, you can configure application/x-protostream as the cache encoding to store data in binary format, and then configure your custom code to perform cache operations using a different MediaType.

For example:

DefaultCacheManager cacheManager = new DefaultCacheManager();

// The cache will store POJO for keys and values
ConfigurationBuilder cfg = new ConfigurationBuilder();
cfg.encoding().key().mediaType("application/x-java-object");
cfg.encoding().value().mediaType("application/x-java-object");

cacheManager.defineConfiguration("mycache", cfg.build());

Cache<Integer, Person> cache = cacheManager.getCache("mycache");

cache.put(1, new Person("John","Doe"));

// Wraps cache using 'application/x-java-object' for keys but JSON for values
Cache<Integer, byte[]> jsonValuesCache = (Cache<Integer, byte[]>) cache.getAdvancedCache().withMediaType("application/x-java-object", "application/json");

byte[] json = jsonValuesCache.get(1);

This returns the value in JSON format:

{
   "_type":"org.infinispan.sample.Person",
   "name":"John",
   "surname":"Doe"
}

4.8.2. Installing Transcoders in Embedded Deployments

Data Grid Server includes transcoders by default. However, when running Data Grid as a library, you must add the following to your project:

org.infinispan:infinispan-server-core

4.8.3. Transcoders and Encoders

Usually a cache operation involves either no data conversion or only one:

  • No conversion by default for caches used in embedded or server mode.
  • Encoder-based conversion for embedded caches that have no MediaType configured but use OFF_HEAP or BINARY storage.
  • Transcoder-based conversion for caches used in server mode with multiple REST and Hot Rod clients that send and receive data in different formats. Those caches have a MediaType configured that describes the storage.

However, you can use encoders and transcoders simultaneously for advanced use cases.

For example, consider a cache that stores marshalled objects (with JBoss Marshalling) but, for security reasons, needs a transparent encryption layer so that "plain" data is never written to an external store. Clients should still be able to read and write data in multiple formats.

You can achieve this by configuring the cache with the MediaType that describes the storage, regardless of the encoding layer:

ConfigurationBuilder cfg = new ConfigurationBuilder();
cfg.encoding().key().mediaType("application/x-jboss-marshalling");
cfg.encoding().value().mediaType("application/x-jboss-marshalling");

You can add transparent encryption by decorating the cache with a special Encoder that encrypts data when storing and decrypts it when retrieving, for example:

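class Scrambler implements Encoder {

   @Override
   public Object toStorage(Object content) {
      // Encrypt data (left unimplemented in this example)
      throw new UnsupportedOperationException("Encryption is not implemented");
   }

   @Override
   public Object fromStorage(Object content) {
      // Decrypt data (left unimplemented in this example)
      throw new UnsupportedOperationException("Decryption is not implemented");
   }

   @Override
   public boolean isStorageFormatFilterable() {
      return false;
   }

   @Override
   public MediaType getStorageFormat() {
      return new MediaType("application", "scrambled");
   }

   @Override
   public short id() {
      // Return the id of this encoder; 10001 is a placeholder value for this example
      return 10001;
   }
}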

To make sure all data written to the cache will be stored encrypted, it’s necessary to decorate the cache with the Encoder above and perform all cache operations in this decorated cache:

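// Decorate the cache with the Scrambler encoder and use the decorated cache for all operations
Cache<Object, Object> secureStorageCache = (Cache<Object, Object>) cache.getAdvancedCache().withEncoding(Scrambler.class);
secureStorageCache.put(k, v);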

The capability of reading data in multiple formats can be added by decorating the cache with the desired MediaType:

// Obtain a stream of values in XML format from the secure cache
secureStorageCache.getAdvancedCache().withMediaType("application/xml","application/xml").values().stream();

Internally, Data Grid first applies the encoder fromStorage operation to obtain the entries in "application/x-jboss-marshalling" format, and then applies a second conversion to "application/xml" using the appropriate Transcoder.
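
The order of the two conversions can be sketched with plain JDK functions. Everything here is a stand-in rather than Data Grid code: ReadWritePipeline is a hypothetical class, XOR scrambling takes the place of real encryption (it offers no security), a String to UTF-8 conversion takes the place of the Transcoder, and the write path is assumed to mirror the read path described above.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

final class ReadWritePipeline {

   // Stand-in for the Encoder: XOR scrambling, which is NOT real encryption.
   static byte[] scramble(byte[] data) {
      byte[] out = new byte[data.length];
      for (int i = 0; i < data.length; i++) {
         out[i] = (byte) (data[i] ^ 0x5A);
      }
      return out;
   }

   // Stand-ins for the Transcoder between the client format (String)
   // and the configured storage format (UTF-8 bytes).
   static final Function<String, byte[]> TO_STORAGE_TYPE =
         s -> s.getBytes(StandardCharsets.UTF_8);
   static final Function<byte[], String> TO_CLIENT_TYPE =
         b -> new String(b, StandardCharsets.UTF_8);

   // Write (assumed order): transcode to the storage format, then apply the encoder.
   static byte[] write(String clientValue) {
      return TO_STORAGE_TYPE.andThen(ReadWritePipeline::scramble).apply(clientValue);
   }

   // Read: undo the encoder first, then transcode to the requested format.
   static String read(byte[] stored) {
      Function<byte[], byte[]> fromStorage = ReadWritePipeline::scramble; // XOR is its own inverse
      return fromStorage.andThen(TO_CLIENT_TYPE).apply(stored);
   }
}
```

The point of the sketch is the ordering: the encoder is the layer closest to storage, so data must be unscrambled before any format conversion can make sense of it.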
