Chapter 5. Data Encoding and MediaTypes
Encoding is the data conversion operation done by Data Grid caches before storing data, and when reading back from storage.
5.1. Overview
Encoding lets API calls (map operations, listeners, streams, and so on) work with one data format while the cache stores the data in a different format.
The data conversions are handled by instances of org.infinispan.commons.dataconversion.Encoder:
    public interface Encoder {

       /**
        * Convert data in the read/write format to the storage format.
        *
        * @param content data to be converted, never null.
        * @return Object in the storage format.
        */
       Object toStorage(Object content);

       /**
        * Convert from storage format to the read/write format.
        *
        * @param content data as stored in the cache, never null.
        * @return data in the read/write format
        */
       Object fromStorage(Object content);

       /**
        * Returns the {@link MediaType} produced by this encoder or null if the storage format is not known.
        */
       MediaType getStorageFormat();
    }
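To make the contract concrete, here is a minimal sketch that uses only the JDK. `StorageEncoder` is a hypothetical stand-in for the interface above (it omits `getStorageFormat()` because `MediaType` is a Data Grid class), and `Utf8Encoder` and `EncodedStore` are illustrative names, not Data Grid classes. The sketch shows callers working with String values while the backing map holds byte[].

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

class EncoderSketch {
   public static void main(String[] args) {
      EncodedStore store = new EncodedStore(new Utf8Encoder());
      store.put("greeting", "hello");
      // Callers read the value back in the read/write format (String)
      System.out.println(store.get("greeting"));
      // The backing map holds the storage format (byte[])
      System.out.println(store.rawGet("greeting") instanceof byte[]);
   }
}

// Hypothetical stand-in for org.infinispan.commons.dataconversion.Encoder (JDK only).
interface StorageEncoder {
   Object toStorage(Object content);

   Object fromStorage(Object content);
}

// Illustrative encoder: callers see String, storage holds UTF-8 bytes.
class Utf8Encoder implements StorageEncoder {
   @Override
   public Object toStorage(Object content) {
      return ((String) content).getBytes(StandardCharsets.UTF_8);
   }

   @Override
   public Object fromStorage(Object content) {
      return new String((byte[]) content, StandardCharsets.UTF_8);
   }
}

// Illustrative store that encodes values on write and decodes them on read.
class EncodedStore {
   private final Map<String, Object> storage = new HashMap<>();
   private final StorageEncoder valueEncoder;

   EncodedStore(StorageEncoder valueEncoder) {
      this.valueEncoder = valueEncoder;
   }

   void put(String key, String value) {
      storage.put(key, valueEncoder.toStorage(value));
   }

   String get(String key) {
      Object stored = storage.get(key);
      return stored == null ? null : (String) valueEncoder.fromStorage(stored);
   }

   // Returns the value exactly as stored, bypassing the encoder.
   Object rawGet(String key) {
      return storage.get(key);
   }
}
```

The caller never sees the byte[] unless it asks for the raw form, which is the same separation the Encoder interface provides between the read/write format and the storage format.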
5.2. Default encoders
Data Grid automatically picks the Encoder depending on the cache configuration. The table below shows which internal Encoder is used for several configurations:
Mode | Configuration | Encoder | Description |
---|---|---|---|
Embedded/Server | Default | IdentityEncoder | Passthrough encoder, no conversion done |
Embedded | StorageType.OFF_HEAP | GlobalMarshallerEncoder | Use the Data Grid internal marshaller to convert to byte[]. May delegate to the configured marshaller in the cache manager. |
Embedded | StorageType.BINARY | BinaryEncoder | Use the Data Grid internal marshaller to convert to byte[], except for primitives and String. |
Server | StorageType.OFF_HEAP | IdentityEncoder | Store byte[]s directly as received by remote clients |
5.3. Overriding programmatically
You can programmatically override the encoding used for both keys and values by calling one of the .withEncoding() method variants on AdvancedCache.
For example, consider the following cache configured as OFF_HEAP:
    // Read and write POJOs; storage will be byte[] because, for
    // OFF_HEAP, the GlobalMarshallerEncoder is used internally:
    cache.put(1, new Pojo());
    Pojo value = cache.get(1);

    // Get the content in its stored format by overriding
    // the internal encoder with a no-op encoder (IdentityEncoder)
    Cache<?,?> rawContent = cache.getAdvancedCache().withEncoding(IdentityEncoder.class);
    byte[] marshalled = (byte[]) rawContent.get(1);
The override is useful when an operation on the cache does not require decoding, such as counting the number of entries or calculating the total size of the byte[] values in an OFF_HEAP cache.
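The idea of measuring stored data without decoding it can be sketched with the JDK alone. In the example below, Java serialization stands in for the Data Grid marshaller and a plain map stands in for the cache storage; `RawViewSketch`, `marshall`, `unmarshall`, and `storedBytes` are hypothetical names used only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

class RawViewSketch {
   public static void main(String[] args) {
      // Stand-in for the storage of an OFF_HEAP cache: values are kept as byte[]
      Map<Integer, byte[]> storage = new HashMap<>();
      storage.put(1, marshall("first value"));
      storage.put(2, marshall(Integer.valueOf(42)));

      // Neither of these two operations unmarshalls a value
      System.out.println("entries: " + storage.size());
      System.out.println("stored bytes: " + storedBytes(storage));

      // Decoding happens only when the object itself is needed
      System.out.println("decoded: " + unmarshall(storage.get(1)));
   }

   // Assumption: Java serialization is used here in place of the Data Grid marshaller.
   static byte[] marshall(Serializable value) {
      try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
           ObjectOutputStream out = new ObjectOutputStream(bytes)) {
         out.writeObject(value);
         out.flush();
         return bytes.toByteArray();
      } catch (IOException e) {
         throw new IllegalStateException("Unable to marshall", e);
      }
   }

   static Object unmarshall(byte[] stored) {
      try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(stored))) {
         return in.readObject();
      } catch (IOException | ClassNotFoundException e) {
         throw new IllegalStateException("Unable to unmarshall", e);
      }
   }

   // Sums the lengths of the raw byte[] values; no value is unmarshalled.
   static long storedBytes(Map<Integer, byte[]> storage) {
      long total = 0;
      for (byte[] value : storage.values()) {
         total += value.length;
      }
      return total;
   }
}
```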
5.4. Defining custom Encoders
A custom encoder can be registered in the EncoderRegistry.
Ensure that the registration is done in every node of the cluster, before starting the caches.
Consider a custom encoder used to compress/decompress with gzip:
    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.util.zip.GZIPInputStream;
    import java.util.zip.GZIPOutputStream;

    import org.infinispan.commons.dataconversion.Encoder;
    import org.infinispan.commons.dataconversion.MediaType;

    public class GzipEncoder implements Encoder {

       @Override
       public Object toStorage(Object content) {
          assert content instanceof String;
          return compress(content.toString());
       }

       @Override
       public Object fromStorage(Object content) {
          assert content instanceof byte[];
          return decompress((byte[]) content);
       }

       private byte[] compress(String str) {
          try (ByteArrayOutputStream baos = new ByteArrayOutputStream();
               GZIPOutputStream gos = new GZIPOutputStream(baos)) {
             gos.write(str.getBytes(StandardCharsets.UTF_8));
             // Close explicitly so the gzip trailer is written before the buffer is read
             gos.close();
             return baos.toByteArray();
          } catch (IOException e) {
             throw new RuntimeException("Unable to compress", e);
          }
       }

       private String decompress(byte[] compressed) {
          try (GZIPInputStream gis = new GZIPInputStream(new ByteArrayInputStream(compressed));
               ByteArrayOutputStream out = new ByteArrayOutputStream()) {
             // Copy raw bytes rather than reading line by line, so line breaks are preserved
             byte[] buffer = new byte[4096];
             int read;
             while ((read = gis.read(buffer)) != -1) {
                out.write(buffer, 0, read);
             }
             return new String(out.toByteArray(), StandardCharsets.UTF_8);
          } catch (IOException e) {
             throw new RuntimeException("Unable to decompress", e);
          }
       }

       @Override
       public MediaType getStorageFormat() {
          return MediaType.parse("application/gzip");
       }

       @Override
       public boolean isStorageFormatFilterable() {
          return false;
       }

       @Override
       public short id() {
          return 10000;
       }
    }
Register the encoder as follows:

    GlobalComponentRegistry registry = cacheManager.getGlobalComponentRegistry();
    EncoderRegistry encoderRegistry = registry.getComponent(EncoderRegistry.class);
    encoderRegistry.registerEncoder(new GzipEncoder());
And then be used to write and read data from a cache:
    AdvancedCache<String, String> cache = ...

    // Decorate cache with the newly registered encoder, without encoding keys (IdentityEncoder)
    // but compressing values
    AdvancedCache<String, String> compressingCache = (AdvancedCache<String, String>) cache.withEncoding(IdentityEncoder.class, GzipEncoder.class);

    // All values will be stored compressed...
    compressingCache.put("297931749", "0412c789a37f5086f743255cfa693dd5");

    // ... but API calls deal with String
    String stringValue = compressingCache.get("297931749");

    // Bypassing the value encoder to obtain the value as it is stored
    Object value = compressingCache.withEncoding(IdentityEncoder.class).get("297931749");

    // value is a byte[] which is the compressed value
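The compression logic can be exercised without a running cache. The sketch below uses only the JDK; `GzipRoundTrip` and its method names are illustrative and are not part of Data Grid. It checks that the stored form starts with the gzip magic bytes (0x1f, 0x8b) and that a value containing a line break survives the round trip.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipRoundTrip {
   public static void main(String[] args) {
      String original = "line one\nline two";
      byte[] stored = compress(original);

      // The stored form is gzip data, recognizable by its two magic bytes
      System.out.println(isGzip(stored));
      // Decompressing yields the original text, line break included
      System.out.println(original.equals(decompress(stored)));
   }

   static byte[] compress(String text) {
      ByteArrayOutputStream baos = new ByteArrayOutputStream();
      try (GZIPOutputStream gos = new GZIPOutputStream(baos)) {
         gos.write(text.getBytes(StandardCharsets.UTF_8));
      } catch (IOException e) {
         throw new UncheckedIOException(e);
      }
      // The gzip stream is closed at this point, so the trailer has been written
      return baos.toByteArray();
   }

   static String decompress(byte[] compressed) {
      try (GZIPInputStream gis = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
         ByteArrayOutputStream out = new ByteArrayOutputStream();
         byte[] buffer = new byte[4096];
         int read;
         while ((read = gis.read(buffer)) != -1) {
            out.write(buffer, 0, read);
         }
         return new String(out.toByteArray(), StandardCharsets.UTF_8);
      } catch (IOException e) {
         throw new UncheckedIOException(e);
      }
   }

   static boolean isGzip(byte[] data) {
      return data.length >= 2 && (data[0] & 0xff) == 0x1f && (data[1] & 0xff) == 0x8b;
   }
}
```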
5.5. MediaType
A Cache can optionally be configured with an org.infinispan.commons.dataconversion.MediaType for keys and values. By describing the data format of the cache, Data Grid is able to convert data on the fly during cache operations.
The MediaType configuration is more suitable when storing binary data. When using server mode, it’s common to have a MediaType configured and clients such as REST or Hot Rod reading and writing in different formats.
Data conversions between MediaType formats are handled by instances of org.infinispan.commons.dataconversion.Transcoder:
    public interface Transcoder {

       /**
        * Transcodes content between two different {@link MediaType}.
        *
        * @param content         Content to transcode.
        * @param contentType     The {@link MediaType} of the content.
        * @param destinationType The target {@link MediaType} to convert.
        * @return the transcoded content.
        */
       Object transcode(Object content, MediaType contentType, MediaType destinationType);

       /**
        * @return all the {@link MediaType} handled by this Transcoder.
        */
       Set<MediaType> getSupportedMediaTypes();
    }
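As a rough illustration of this contract, the sketch below implements a JDK-only stand-in. `SimpleTranscoder` is a hypothetical interface that mirrors the shape of Transcoder but uses plain strings in place of `MediaType`, and `StringTextTranscoder` is an illustrative implementation that converts between a Java String and UTF-8 text. Neither is a Data Grid class.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class TranscoderSketch {
   static final String OBJECT = "application/x-java-object";
   static final String TEXT = "text/plain";

   public static void main(String[] args) {
      SimpleTranscoder transcoder = new StringTextTranscoder();

      // Java object -> text/plain bytes
      Object asText = transcoder.transcode("hello", OBJECT, TEXT);
      System.out.println(asText instanceof byte[]);

      // text/plain bytes -> Java object
      Object asObject = transcoder.transcode(asText, TEXT, OBJECT);
      System.out.println(asObject);
   }

   // Hypothetical stand-in for Transcoder; media types are plain strings here.
   interface SimpleTranscoder {
      Object transcode(Object content, String contentType, String destinationType);

      Set<String> getSupportedMediaTypes();
   }

   // Illustrative transcoder between a Java String and UTF-8 encoded text.
   static class StringTextTranscoder implements SimpleTranscoder {
      @Override
      public Object transcode(Object content, String contentType, String destinationType) {
         if (contentType.equals(destinationType)) {
            return content; // nothing to convert
         }
         if (OBJECT.equals(contentType) && TEXT.equals(destinationType)) {
            return ((String) content).getBytes(StandardCharsets.UTF_8);
         }
         if (TEXT.equals(contentType) && OBJECT.equals(destinationType)) {
            return new String((byte[]) content, StandardCharsets.UTF_8);
         }
         throw new IllegalArgumentException(
               "Unsupported conversion: " + contentType + " to " + destinationType);
      }

      @Override
      public Set<String> getSupportedMediaTypes() {
         return new HashSet<>(Arrays.asList(OBJECT, TEXT));
      }
   }
}
```

Unlike an Encoder, which always maps between one read/write format and one storage format, a transcoder is told both the source and the destination type on every call.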
5.5.1. Configuration
Declarative:

    <cache>
       <encoding>
          <key media-type="application/x-java-object; type=java.lang.Integer"/>
          <value media-type="application/xml; charset=UTF-8"/>
       </encoding>
    </cache>
Programmatic:
    ConfigurationBuilder cfg = new ConfigurationBuilder();
    cfg.encoding().key().mediaType("text/plain");
    cfg.encoding().value().mediaType("application/json");
5.5.2. Overriding the MediaType Programmatically
You can decorate the Cache with a different MediaType so that cache operations send and receive data in a format other than the configured one.
Example:
    DefaultCacheManager cacheManager = new DefaultCacheManager();

    // The cache will store POJO for keys and values
    ConfigurationBuilder cfg = new ConfigurationBuilder();
    cfg.encoding().key().mediaType("application/x-java-object");
    cfg.encoding().value().mediaType("application/x-java-object");

    cacheManager.defineConfiguration("mycache", cfg.build());

    Cache<Integer, Person> cache = cacheManager.getCache("mycache");

    cache.put(1, new Person("John","Doe"));

    // Wraps cache using 'application/x-java-object' for keys but JSON for values
    Cache<Integer, byte[]> jsonValuesCache = (Cache<Integer, byte[]>) cache.getAdvancedCache().withMediaType("application/x-java-object", "application/json");

    byte[] json = jsonValuesCache.get(1);
The call returns the value in JSON format:

    {
       "_type":"org.infinispan.sample.Person",
       "name":"John",
       "surname":"Doe"
    }
Most Transcoders are installed when server mode is used. In library mode, add the extra dependency org.infinispan:infinispan-server-core to the project.
5.5.3. Transcoders and Encoders
Usually, a cache operation involves at most one data conversion:
- No conversion by default on caches used in embedded or server mode;
- Encoder based conversion for embedded caches without MediaType configured, but using OFF_HEAP or BINARY;
- Transcoder based conversion for caches used in server mode with multiple REST and Hot Rod clients sending and receiving data in different formats. Those caches will have MediaType configured describing the storage.
However, encoders and transcoders can be used together for advanced use cases.
For example, consider a cache that stores marshalled objects (using JBoss Marshalling) and that, for security reasons, needs a transparent encryption layer so that "plain" data is never written to an external store. Clients should still be able to read and write data in multiple formats.
This can be achieved by configuring the cache with the MediaType that describes the storage, regardless of the encoding layer:
    ConfigurationBuilder cfg = new ConfigurationBuilder();
    cfg.encoding().key().mediaType("application/x-jboss-marshalling");
    cfg.encoding().value().mediaType("application/x-jboss-marshalling");
The transparent encryption can be added by decorating the cache with a special Encoder that encrypts data when storing it and decrypts data when retrieving it, for example:

    class Scrambler implements Encoder {

       public Object toStorage(Object content) {
          // Encrypt data
       }

       public Object fromStorage(Object content) {
          // Decrypt data
       }

       @Override
       public boolean isStorageFormatFilterable() {

       }

       public MediaType getStorageFormat() {
          return new MediaType("application", "scrambled");
       }

       @Override
       public short id() {
          //return id
       }
    }
To make sure all data written to the cache will be stored encrypted, it’s necessary to decorate the cache with the Encoder above and perform all cache operations in this decorated cache:
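    // Decorate the cache once, then perform every operation through the decorated instance
    Cache<Object, Object> secureStorageCache = (Cache<Object, Object>) cache.getAdvancedCache().withEncoding(Scrambler.class);
    secureStorageCache.put(k, v);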
The capability of reading data in multiple formats can be added by decorating the cache with the desired MediaType:
    // Obtain a stream of values in XML format from the secure cache
    secureStorageCache.getAdvancedCache().withMediaType("application/xml", "application/xml").values().stream();
Internally, Data Grid first applies the encoder's fromStorage operation to obtain the entries, which will be in "application/x-jboss-marshalling" format, and then converts them to "application/xml" by using the appropriate Transcoder.
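The order of these two steps on the read path can be sketched with the JDK alone. The example below is an illustration under loud assumptions: the XOR scrambler is a toy and is not encryption, UTF-8 text takes the place of the "application/x-jboss-marshalling" storage format, and the XML conversion is a trivial wrapper with no escaping. All names (`ReadPathSketch`, `scramble`, `transcodeToXml`, `readAsXml`) are hypothetical.

```java
import java.nio.charset.StandardCharsets;

class ReadPathSketch {
   public static void main(String[] args) {
      // Write path: the encoder scrambles the content before it is stored
      byte[] stored = toStorage("John".getBytes(StandardCharsets.UTF_8));

      // Read path: decode first, then transcode to the requested format
      System.out.println(readAsXml(stored));
   }

   // Toy reversible scrambler (XOR with a fixed byte). NOT encryption; illustration only.
   static byte[] scramble(byte[] data) {
      byte[] out = new byte[data.length];
      for (int i = 0; i < data.length; i++) {
         out[i] = (byte) (data[i] ^ 0x5A);
      }
      return out;
   }

   // Encoder step: read/write format to storage format.
   static byte[] toStorage(byte[] content) {
      return scramble(content);
   }

   // Encoder step: storage format back to the format described by the cache MediaType.
   static byte[] fromStorage(byte[] stored) {
      return scramble(stored);
   }

   // Transcoder step: configured media type (UTF-8 text in this sketch) to XML.
   // No XML escaping is performed; the input is assumed to contain no markup characters.
   static String transcodeToXml(byte[] text) {
      return "<value>" + new String(text, StandardCharsets.UTF_8) + "</value>";
   }

   // The encoder is undone first, and only then is the result transcoded.
   static String readAsXml(byte[] stored) {
      return transcodeToXml(fromStorage(stored));
   }
}
```

Swapping the two steps would hand scrambled bytes to the transcoder, which is why the encoder's fromStorage always runs before the media type conversion.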