15.3. Custom Text Extractors
15.3.1. The Text Extraction Framework
A text extractor is a plain old Java object (POJO). To create an extractor, write a Java class that extends a single abstract class, TextExtractor.
The abstract class also contains fields and getters (not shown here) for the name and logger, which the hierarchical database sets automatically during repository initialization.
There are two abstract methods that must be implemented: supportsMimeType(...) and extractFrom(...). The first is straightforward: return true for every MIME type that the extractor can process. The extractFrom(...) method is the core of the implementation: it should process the BINARY value's contents and write the searchable text to the supplied Output object.
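The division of labor between the two methods can be illustrated with a minimal, self-contained sketch. It does not use the real framework classes: BinarySource, TextOutput, ExtractorBase, and recordText are hypothetical stand-ins invented here for the BINARY value, the Output object, and the TextExtractor base class, and the actual method signatures may differ. The sketch also assumes the content is UTF-8 plain text.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the repository's BINARY value (not the real API).
interface BinarySource {
    InputStream openStream() throws IOException;
}

// Hypothetical stand-in for the Output object that receives searchable text.
interface TextOutput {
    void recordText(String text);
}

// Hypothetical stand-in for the abstract TextExtractor base class.
abstract class ExtractorBase {
    abstract boolean supportsMimeType(String mimeType);

    abstract void extractFrom(BinarySource binary, TextOutput output) throws Exception;
}

// Example extractor that only handles plain-text content.
class PlainTextExtractor extends ExtractorBase {
    @Override
    boolean supportsMimeType(String mimeType) {
        // Accept "text/plain" with or without parameters such as "; charset=utf-8".
        return mimeType != null && mimeType.startsWith("text/plain");
    }

    @Override
    void extractFrom(BinarySource binary, TextOutput output) throws Exception {
        // try-with-resources closes the stream even if reading fails.
        try (InputStream stream = binary.openStream()) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int read;
            while ((read = stream.read(chunk)) != -1) {
                buffer.write(chunk, 0, read);
            }
            // Assumes UTF-8 content; a real extractor would detect the encoding.
            output.recordText(new String(buffer.toByteArray(), StandardCharsets.UTF_8));
        }
    }
}

class PlainTextExtractorDemo {
    // Runs the extractor over an in-memory value and returns the recorded text.
    static String run(String content) {
        StringBuilder recorded = new StringBuilder();
        BinarySource binary =
            () -> new ByteArrayInputStream(content.getBytes(StandardCharsets.UTF_8));
        try {
            new PlainTextExtractor().extractFrom(binary, recorded::append);
        } catch (Exception e) {
            throw new IllegalStateException("extraction failed", e);
        }
        return recorded.toString();
    }

    public static void main(String[] args) {
        System.out.println(run("hello world"));
    }
}
```

In this sketch the extractor manages the stream itself, which is why the read loop sits inside a try-with-resources block.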
Note that processStream(...) is a utility method that extractFrom(...) can call. It opens the BINARY value's stream, processes the content, and ensures that the stream is always closed. Your extractFrom(...) implementation can therefore delegate the stream handling to it.
This can make your implementation a little easier, but feel free to implement extractFrom(...) so that it processes the stream directly.
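The delegation to processStream(...) described above might look roughly like the following sketch. It is self-contained and does not depend on the real framework: StreamSource, TextSink, StreamOperation, SketchExtractor, and TrackingStream are hypothetical names introduced for illustration, and the processStream shown here is only a model of the open, process, always-close behavior, not the actual library method.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the BINARY value (not the real API).
interface StreamSource {
    InputStream openStream() throws Exception;
}

// Hypothetical stand-in for the Output object.
interface TextSink {
    void recordText(String text);
}

// Hypothetical callback that is applied to the opened stream.
interface StreamOperation<T> {
    T execute(InputStream stream) throws Exception;
}

// Hypothetical stand-in for the abstract TextExtractor base class.
abstract class SketchExtractor {
    abstract boolean supportsMimeType(String mimeType);

    abstract void extractFrom(StreamSource binary, TextSink output) throws Exception;

    // Model of the utility: open the stream, run the operation, always close.
    protected final <T> T processStream(StreamSource binary, StreamOperation<T> operation)
            throws Exception {
        InputStream stream = binary.openStream();
        try {
            return operation.execute(stream);
        } finally {
            stream.close();
        }
    }
}

class Utf8TextExtractor extends SketchExtractor {
    @Override
    boolean supportsMimeType(String mimeType) {
        return "text/plain".equals(mimeType);
    }

    @Override
    void extractFrom(StreamSource binary, TextSink output) throws Exception {
        // The extractor never opens or closes the stream itself.
        String text = processStream(binary, Utf8TextExtractor::readAll);
        output.recordText(text);
    }

    // Reads the whole stream as UTF-8 text (an assumption made for this sketch).
    private static String readAll(InputStream stream) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int read;
        while ((read = stream.read(chunk)) != -1) {
            buffer.write(chunk, 0, read);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }
}

// In-memory stream that remembers whether close() was called.
class TrackingStream extends ByteArrayInputStream {
    boolean closed;

    TrackingStream(byte[] data) {
        super(data);
    }

    @Override
    public void close() throws IOException {
        closed = true;
        super.close();
    }
}

class ProcessStreamDemo {
    public static void main(String[] args) throws Exception {
        TrackingStream stream =
            new TrackingStream("indexed text".getBytes(StandardCharsets.UTF_8));
        StringBuilder recorded = new StringBuilder();
        new Utf8TextExtractor().extractFrom(() -> stream, recorded::append);
        System.out.println(recorded + " / stream closed: " + stream.closed);
    }
}
```

Keeping the open and close logic in one shared utility means each extractor only supplies the code that reads the stream, so a forgotten close() cannot leak the underlying resource.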