Chapter 4. Configuring Cache Encoding
Data Grid saves your data in a specific format that can be converted on the fly when you write to and read from caches. Configure the storage format by specifying a MediaType for keys and values, which describes the format of the data.
Data Grid can also convert data between different storage formats to handle interoperability between different client protocols and when using custom code to process data.
4.1. Cache Encoding and Client Interoperability
The encoding that you use for your data affects client interoperability and capabilities such as Data Grid Search.
Store data in Protobuf format to use it with… | |
---|---|
Data Grid Console | Yes |
REST clients | Yes |
Java Hot Rod clients | Yes |
Non-Java Hot Rod clients | Yes |
Data Grid Search | Yes |
Custom Java objects | Yes |
Store data in a text-based format to use it with… | |
---|---|
Data Grid Console | Yes |
REST clients | Yes |
Java Hot Rod clients | Yes |
Non-Java Hot Rod clients | Yes |
Data Grid Search | No |
Custom Java objects | No |
Marshalled Java objects are compatible with… | |
---|---|
Data Grid Console | No |
REST clients | Yes |
Java Hot Rod clients | Yes |
Non-Java Hot Rod clients | No |
Data Grid Search | No |
Plain Old Java Objects (POJOs) are not recommended but compatible with… | |
---|---|
Data Grid Console | No |
REST clients | Yes |
Java Hot Rod clients | Yes |
Non-Java Hot Rod clients | No |
Data Grid Search | Yes. However, you must annotate entities to search with POJOs and make your classes available to Data Grid Server. |
Custom Java objects | Yes |
4.1.1. Configuring Cache Encoding for Memcached Clients
Data Grid Server disables the Memcached endpoint by default. If you enable the Memcached endpoint, you should configure caches with a suitable encoding for Memcached clients.
The Memcached endpoint does not support authentication. For security purposes, you should use dedicated caches for Memcached clients. You should not use REST or Hot Rod clients to interact with the same data set as Memcached clients.
Procedure
- Configure cache encoding to use text/plain for keys. Specify any appropriate MediaType, other than application/x-java-object, for values.
Memcached clients can handle keys as text/plain only. Values can be any MediaType that Data Grid stores as byte[], which can be Protobuf, marshalled Java objects, or a text-based format.

    <encoding>
      <key media-type="text/plain"/>
      <value media-type="application/x-protostream"/>
    </encoding>

The Memcached endpoint includes a client-encoding attribute that converts the encoding of values.
For example, the preceding configuration stores values encoded as Protobuf. If you want Memcached clients to read and write values as JSON, you can use the following configuration:

    <memcached-connector cache="memcachedCache" client-encoding="application/json"/>

4.2. Configuring Encoding for Data Grid Caches
Define the MediaType that Data Grid uses to encode your data when writing to and reading from the cache.
When you define a MediaType, you specify the format of your data to Data Grid.
If you want to use the Data Grid Console, Hot Rod clients, and REST clients interchangeably, specify the application/x-protostream MediaType so Data Grid encodes data in Protobuf format.
Procedure
Specify a MediaType for keys and values in your Data Grid cache configuration.
- Declaratively: Set the encoding attribute.
- Programmatically: Use the encoding() method.
Declarative examples
- Use the same encoding for keys and values:

    <local-cache>
      <encoding media-type="application/x-protostream"/>
    </local-cache>

- Use a different encoding for keys and values:

    <cache>
      <encoding>
        <key media-type="application/x-java-object"/>
        <value media-type="application/xml; charset=UTF-8"/>
      </encoding>
    </cache>

Programmatic examples
- Use the same encoding for keys and values:

    ConfigurationBuilder cfg = new ConfigurationBuilder();
    cfg.encoding()
       .mediaType("application/x-protostream")
       .build();

- Use a different encoding for keys and values:

    ConfigurationBuilder cfg = new ConfigurationBuilder();
    cfg.encoding().key().mediaType("text/plain");
    cfg.encoding().value().mediaType("application/json");

4.3. Storing Data in Protobuf Format
Storing data in the cache as Protobuf-encoded entries provides a platform-independent configuration that enables you to perform cache operations from any client.
When you configure indexing for Data Grid Search, Data Grid automatically stores keys and values with the application/x-protostream media type.
Procedure
Specify application/x-protostream as the MediaType for keys and values as follows:

    <distributed-cache name="mycache">
      <encoding>
        <key media-type="application/x-protostream"/>
        <value media-type="application/x-protostream"/>
      </encoding>
    </distributed-cache>

- Configure your clients.
Hot Rod clients must register Protocol Buffers schema definitions that describe entities and client marshallers.
Data Grid converts between application/x-protostream and application/json so REST clients only need to send the following headers to read and write JSON formatted data:
- Accept: application/json for read operations.
- Content-Type: application/json for write operations.
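The two headers can be illustrated with the JDK HTTP client (Java 11 or later). This is a minimal sketch rather than a prescribed client: the class name, host, port, cache name, and key in the URL are assumptions made for illustration, and the requests are only built here, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper; adjust the URL to match your own server, cache, and key.
final class RestJsonRequests {
    // Assumed entry URL for a cache named "mycache" and a key "person1"
    static final String ENTRY_URL = "http://localhost:11222/rest/v2/caches/mycache/person1";

    // Read operation: ask the server to return the stored value as JSON
    static HttpRequest read() {
        return HttpRequest.newBuilder(URI.create(ENTRY_URL))
                .header("Accept", "application/json")
                .GET()
                .build();
    }

    // Write operation: tell the server that the request body is JSON
    static HttpRequest write(String json) {
        return HttpRequest.newBuilder(URI.create(ENTRY_URL))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }
}
```

To send a request, pass it to `java.net.http.HttpClient#send` together with a body handler such as `HttpResponse.BodyHandlers.ofString()`.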
4.4. Storing Data in Text-Based Formats
Configure Data Grid to store data in a text-based format such as text/plain, application/json, or application/xml.
Procedure
- Specify a text-based storage format as the MediaType for keys and values.
Optionally specify a character set such as UTF-8.
The following example configures Data Grid to store entries with the text/plain; charset=UTF-8 format:

    <cache>
      <encoding>
        <key media-type="text/plain; charset=UTF-8"/>
        <value media-type="text/plain; charset=UTF-8"/>
      </encoding>
    </cache>

- Configure your clients.
Hot Rod clients can use org.infinispan.commons.marshall.StringMarshaller to handle plain text, JSON, XML, or any other text-based format.
You can also use text-based formats with the ProtoStream marshaller. ProtoStream can handle String and byte[] types natively, without the need to create Serialization Contexts and register Protobuf schemas (.proto files).
REST clients must send the correct headers with requests:
- Accept: text/plain; charset=UTF-8 for read operations.
- Content-Type: text/plain; charset=UTF-8 for write operations.
4.5. Storing Marshalled Java Objects
Java Hot Rod clients can handle Java objects that represent entities and perform marshalling to serialize and deserialize objects to and from byte[] arrays. C++, C#, and JavaScript Hot Rod clients can also handle objects in the respective languages.
If you store entries in the cache as marshalled Java objects, you should configure the cache with the MediaType of the marshalled storage.
Procedure
Specify the MediaType that matches your marshaller implementation.
- ProtoStream marshaller: Configure the MediaType as application/x-protostream.
- JBoss marshalling: Configure the MediaType as application/x-jboss-marshalling.
- Java serialization: Configure the MediaType as application/x-java-serialized-object.
- Configure your clients.
Because REST clients are most suitable for handling text formats, you should use primitives such as java.lang.String for keys. Otherwise, REST clients must handle keys as byte[] using a supported binary encoding.
REST clients can read values for cache entries in XML or JSON format.
Equality Considerations
When storing data in binary format, Data Grid uses the WrappedBytes interface for keys and values. This wrapper transparently takes care of serialization and deserialization on demand. Internally, it can hold a reference to the wrapped object itself or to the serialized, byte array representation of the object. This affects the behavior of equality, which is important to note if you implement an equals() method on keys.
The equals() method of the wrapper class either compares binary representations (byte arrays) or delegates to the wrapped object instance's equals() method, depending on whether both instances being compared are in serialized or deserialized form at the time of comparison. If the two instances are in different forms, one of them is serialized or deserialized so that both are in the same form before they are compared.
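The equality behavior described above can be sketched with a small standalone class. This is a hypothetical illustration that uses plain Java serialization; it is not Data Grid's WrappedBytes implementation, and the class name `LazyWrapper` and its factory methods are invented for the example. When the two sides are in different forms, this sketch chooses to serialize the object side and compare bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical illustration only; not Data Grid's wrapper implementation.
final class LazyWrapper {
    private final Serializable instance; // non-null when held in deserialized form
    private final byte[] bytes;          // non-null when held in serialized form

    private LazyWrapper(Serializable instance, byte[] bytes) {
        this.instance = instance;
        this.bytes = bytes;
    }

    static LazyWrapper ofInstance(Serializable instance) {
        return new LazyWrapper(instance, null);
    }

    static LazyWrapper ofBytes(byte[] bytes) {
        return new LazyWrapper(null, bytes.clone());
    }

    // Serialize a value with standard Java serialization
    static byte[] serialize(Serializable value) {
        try (ByteArrayOutputStream out = new ByteArrayOutputStream();
             ObjectOutputStream oos = new ObjectOutputStream(out)) {
            oos.writeObject(value);
            oos.flush();
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof LazyWrapper)) return false;
        LazyWrapper that = (LazyWrapper) other;
        // Both deserialized: delegate to the wrapped object's equals()
        if (instance != null && that.instance != null) {
            return instance.equals(that.instance);
        }
        // Otherwise compare bytes, serializing whichever side is still an object
        byte[] left = bytes != null ? bytes : serialize(instance);
        byte[] right = that.bytes != null ? that.bytes : serialize(that.instance);
        return Arrays.equals(left, right);
    }

    @Override
    public int hashCode() {
        // Hash the serialized form so both forms of the same value hash alike
        return Arrays.hashCode(bytes != null ? bytes : serialize(instance));
    }
}
```

In this sketch, a key class whose equals() ignores a field that is still serialized could compare equal in object form but unequal in byte form, which is why equals() on keys should agree with the serialized representation.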
4.6. Storing Unmarshalled Java Objects
You can store data as deserialized Plain Old Java Objects (POJOs) instead of storing data in a binary format.
Storing POJOs instead of a binary format is not recommended because it requires Data Grid to serialize data on client read operations and deserialize data on write operations. To handle client interoperability with custom code, you should convert data on demand.
Procedure
Specify application/x-java-object as the MediaType for keys and values as follows:

    <distributed-cache name="my-cache">
      <encoding>
        <key media-type="application/x-java-object"/>
        <value media-type="application/x-java-object"/>
      </encoding>
    </distributed-cache>

Put class files for all custom objects on the Data Grid server classpath.
Add JAR files that contain custom classes and/or service providers for marshaller implementations in the server/lib directory.

    ├── server
    │   ├── lib
    │   │   ├── UserObjects.jar
    │   └── README.txt

- Configure your clients.
There are no changes required for Hot Rod clients. The only requirement is that the marshaller used in the client is available in the server/lib directory so Data Grid can deserialize the objects.
ProtoStream and Java Serialization marshallers are already available on the server.
REST clients must use either JSON or XML so Data Grid can convert to and from Java objects.
4.7. Data Encoding
Encoding is the data conversion operation done by Data Grid caches before storing data, and when reading back from storage.
4.7.1. Overview
Encoding lets API calls (map operations, listeners, streams, and so on) work with one data format while Data Grid stores the data in a different format.
The data conversions are handled by instances of org.infinispan.commons.dataconversion.Encoder:

    public interface Encoder {

       /**
        * Convert data in the read/write format to the storage format.
        *
        * @param content data to be converted, never null.
        * @return Object in the storage format.
        */
       Object toStorage(Object content);

       /**
        * Convert from storage format to the read/write format.
        *
        * @param content data as stored in the cache, never null.
        * @return data in the read/write format
        */
       Object fromStorage(Object content);

       /**
        * Returns the {@link MediaType} produced by this encoder or null if the storage format is not known.
        */
       MediaType getStorageFormat();
    }

4.7.2. Default encoders
Data Grid automatically picks the Encoder depending on the cache configuration. The table below shows which internal Encoder is used for several configurations:
Mode | Configuration | Encoder | Description |
---|---|---|---|
Embedded/Server | Default | IdentityEncoder | Passthrough encoder, no conversion done |
Embedded | StorageType.OFF_HEAP | GlobalMarshallerEncoder | Use the Data Grid internal marshaller to convert to byte[]. May delegate to the configured marshaller in the cache manager. |
Embedded | StorageType.BINARY | BinaryEncoder | Use the Data Grid internal marshaller to convert to byte[], except for primitives and String. |
Server | StorageType.OFF_HEAP | IdentityEncoder | Store byte[]s directly as received by remote clients |
4.7.3. Overriding programmatically
You can programmatically override the encoding used for both keys and values by calling the .withEncoding() method variants from AdvancedCache.
For example, consider the following cache configured as OFF_HEAP:

    // Read and write POJOs; storage is byte[] because
    // OFF_HEAP uses the GlobalMarshallerEncoder internally:
    cache.put(1, new Pojo());
    Pojo value = cache.get(1);

    // Get the content in its stored format by overriding
    // the internal encoder with a no-op encoder (IdentityEncoder)
    Cache<?,?> rawContent = cache.getAdvancedCache().withEncoding(IdentityEncoder.class);
    byte[] marshalled = (byte[]) rawContent.get(1);

The override can be useful if an operation in the cache does not require decoding, such as counting the number of entries or calculating the size of the byte[] content of an OFF_HEAP cache.
4.7.4. Defining Custom Encoders
A custom encoder can be registered in the EncoderRegistry.
Ensure that the registration is done in every node of the cluster, before starting the caches.
Consider a custom encoder used to compress/decompress with gzip:

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.util.zip.GZIPInputStream;
    import java.util.zip.GZIPOutputStream;

    import org.infinispan.commons.dataconversion.Encoder;
    import org.infinispan.commons.dataconversion.MediaType;

    public class GzipEncoder implements Encoder {

       @Override
       public Object toStorage(Object content) {
          assert content instanceof String;
          return compress(content.toString());
       }

       @Override
       public Object fromStorage(Object content) {
          assert content instanceof byte[];
          return decompress((byte[]) content);
       }

       private byte[] compress(String str) {
          try (ByteArrayOutputStream baos = new ByteArrayOutputStream();
               GZIPOutputStream gos = new GZIPOutputStream(baos)) {
             gos.write(str.getBytes(StandardCharsets.UTF_8));
             // Close explicitly so the gzip trailer is written before reading the buffer
             gos.close();
             return baos.toByteArray();
          } catch (IOException e) {
             throw new RuntimeException("Unable to compress", e);
          }
       }

       private String decompress(byte[] compressed) {
          try (GZIPInputStream gis = new GZIPInputStream(new ByteArrayInputStream(compressed));
               ByteArrayOutputStream out = new ByteArrayOutputStream()) {
             // Copy raw bytes so that line terminators in the original value are preserved
             byte[] buffer = new byte[4096];
             int read;
             while ((read = gis.read(buffer)) != -1) {
                out.write(buffer, 0, read);
             }
             return new String(out.toByteArray(), StandardCharsets.UTF_8);
          } catch (IOException e) {
             throw new RuntimeException("Unable to decompress", e);
          }
       }

       @Override
       public MediaType getStorageFormat() {
          return MediaType.parse("application/gzip");
       }

       @Override
       public boolean isStorageFormatFilterable() {
          return false;
       }

       @Override
       public short id() {
          return 10000;
       }
    }

Register the encoder as follows:

    GlobalComponentRegistry registry = cacheManager.getGlobalComponentRegistry();
    EncoderRegistry encoderRegistry = registry.getComponent(EncoderRegistry.class);
    encoderRegistry.registerEncoder(new GzipEncoder());

You can then use the encoder to write and read data from a cache:

    AdvancedCache<String, String> cache = ...

    // Decorate the cache with the newly registered encoder: keys are not encoded
    // (IdentityEncoder) and values are compressed
    AdvancedCache<String, String> compressingCache =
          (AdvancedCache<String, String>) cache.withEncoding(IdentityEncoder.class, GzipEncoder.class);

    // All values are stored compressed...
    compressingCache.put("297931749", "0412c789a37f5086f743255cfa693dd5");

    // ...but API calls deal with String
    String stringValue = compressingCache.get("297931749");

    // Bypass the value encoder to obtain the value as it is stored
    Object value = compressingCache.withEncoding(IdentityEncoder.class).get("297931749");

    // value is a byte[] that holds the compressed value

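To see what the value encoder does to a single entry without starting a cache, the compression round trip can be reproduced with the JDK alone. The following is a minimal standalone sketch: the class `GzipRoundTrip` is hypothetical, uses no Data Grid types, and its two methods only mirror the roles of toStorage() and fromStorage().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical standalone class; no Data Grid types are involved.
final class GzipRoundTrip {

    // Plays the role of toStorage(): String in, gzip-compressed byte[] out
    static byte[] toStorage(String content) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(baos)) {
            gzip.write(content.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The gzip stream is closed at this point, so the trailer has been written
        return baos.toByteArray();
    }

    // Plays the role of fromStorage(): gzip-compressed byte[] in, String out
    static String fromStorage(byte[] stored) {
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(stored));
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buffer = new byte[4096];
            int read;
            while ((read = gzip.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling `GzipRoundTrip.fromStorage(GzipRoundTrip.toStorage(value))` returns a string equal to `value`, including any line terminators, while the intermediate byte[] starts with the gzip magic bytes 0x1f 0x8b.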
4.8. Transcoders and Data Conversion
Data Grid uses org.infinispan.commons.dataconversion.Transcoder to convert data between MediaType formats.

    public interface Transcoder {

       /**
        * Transcodes content between two different {@link MediaType}.
        *
        * @param content         Content to transcode.
        * @param contentType     The {@link MediaType} of the content.
        * @param destinationType The target {@link MediaType} to convert.
        * @return the transcoded content.
        */
       Object transcode(Object content, MediaType contentType, MediaType destinationType);

       /**
        * @return all the {@link MediaType} handled by this Transcoder.
        */
       Set<MediaType> getSupportedMediaTypes();
    }

4.8.1. Converting Data on Demand
You can deploy and run custom code on Data Grid, such as tasks, listeners, and merge policies. Custom code on Data Grid can directly access data but must also interoperate with clients that access the same data through different endpoints. For example, you can create tasks that handle custom objects while Hot Rod clients read and write data in binary format.
In this case, you can configure application/x-protostream as the cache encoding to store data in binary format, then configure your custom code to perform cache operations using a different MediaType.
For example:

    DefaultCacheManager cacheManager = new DefaultCacheManager();

    // The cache will store POJOs for keys and values
    ConfigurationBuilder cfg = new ConfigurationBuilder();
    cfg.encoding().key().mediaType("application/x-java-object");
    cfg.encoding().value().mediaType("application/x-java-object");

    cacheManager.defineConfiguration("mycache", cfg.build());

    Cache<Integer, Person> cache = cacheManager.getCache("mycache");

    cache.put(1, new Person("John","Doe"));

    // Wraps the cache using 'application/x-java-object' for keys but JSON for values
    Cache<Integer, byte[]> jsonValuesCache = (Cache<Integer, byte[]>) cache.getAdvancedCache().withMediaType("application/x-java-object", "application/json");

    byte[] json = jsonValuesCache.get(1);

This returns the value in JSON format:

    {
       "_type":"org.infinispan.sample.Person",
       "name":"John",
       "surname":"Doe"
    }

4.8.2. Installing Transcoders in Embedded Deployments
Data Grid Server includes transcoders by default. However, when running Data Grid as a library, you must add the following to your project:
org.infinispan:infinispan-server-core
4.8.3. Transcoders and Encoders
Usually a cache operation involves at most one data conversion:
- No conversion by default for caches used in embedded or server mode;
- Encoder-based conversion for embedded caches without a MediaType configured, but using OFF_HEAP or BINARY;
- Transcoder-based conversion for caches used in server mode with multiple REST and Hot Rod clients sending and receiving data in different formats. Those caches have a MediaType configured that describes the storage.
However, it is possible to use both encoders and transcoders simultaneously for advanced use cases.
For example, consider a cache that stores marshalled objects (with JBoss marshalling) where, for security reasons, a transparent encryption layer should prevent "plain" data from being written to an external store. Clients should be able to read and write data in multiple formats.
This can be achieved by configuring the cache with the MediaType that describes the storage regardless of the encoding layer:

    ConfigurationBuilder cfg = new ConfigurationBuilder();
    cfg.encoding().key().mediaType("application/x-jboss-marshalling");
    cfg.encoding().value().mediaType("application/x-jboss-marshalling");

The transparent encryption can be added by decorating the cache with a special Encoder that encrypts data when storing and decrypts data when retrieving, for example:

    class Scrambler implements Encoder {

       public Object toStorage(Object content) {
          // Encrypt data
       }

       public Object fromStorage(Object content) {
          // Decrypt data
       }

       @Override
       public boolean isStorageFormatFilterable() {
       }

       public MediaType getStorageFormat() {
          return new MediaType("application", "scrambled");
       }

       @Override
       public short id() {
          //return id
       }
    }

To make sure all data written to the cache is stored encrypted, decorate the cache with the Encoder above and perform all cache operations on this decorated cache:

    Cache<Object, Object> secureStorageCache =
          (Cache<Object, Object>) cache.getAdvancedCache().withEncoding(Scrambler.class);
    secureStorageCache.put(k, v);

The capability of reading data in multiple formats can be added by decorating the cache with the desired MediaType:

    // Obtain a stream of values in XML format from the secure cache
    secureStorageCache.getAdvancedCache().withMediaType("application/xml","application/xml").values().stream();

Internally, Data Grid first applies the encoder fromStorage operation to obtain the entries in "application/x-jboss-marshalling" format and then applies a successive conversion to "application/xml" by using the appropriate Transcoder.
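The order of the two conversions on a read can be sketched as a simple function composition. The following standalone example is an illustration under loud assumptions: the class `ReadPipeline` is hypothetical, a reversible XOR stands in for the encoder (it is not real encryption), and a trivial text wrapper stands in for the transcoder (it does not escape XML and does not handle JBoss marshalling).

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

// Hypothetical stand-ins for an Encoder and a Transcoder; no Data Grid types are used.
final class ReadPipeline {

    // Stand-in for the encoder: XOR is its own inverse, so the same method
    // serves as both toStorage and fromStorage. NOT real encryption.
    static byte[] xor(byte[] data) {
        byte[] out = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            out[i] = (byte) (data[i] ^ 0x5A);
        }
        return out;
    }

    // Stand-in for the transcoder: wraps UTF-8 text in an XML element (no escaping)
    static String toXml(byte[] plain) {
        return "<string>" + new String(plain, StandardCharsets.UTF_8) + "</string>";
    }

    // A read applies the encoder's fromStorage step first, then the transcoder
    static String read(byte[] stored) {
        Function<byte[], byte[]> fromStorage = ReadPipeline::xor;
        Function<byte[], String> transcode = ReadPipeline::toXml;
        return fromStorage.andThen(transcode).apply(stored);
    }
}
```

Reversing the order would hand scrambled bytes to the transcoder, which is why the encoder step has to run first on reads.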