Package org.apache.uima.cas.impl
Class BinaryCasSerDes6
java.lang.Object
    org.apache.uima.cas.impl.BinaryCasSerDes6
public class BinaryCasSerDes6 extends java.lang.Object
User-callable serialization and deserialization of the CAS in a compressed binary format.

This serializes/deserializes the state of the CAS. It can map type systems, so the sending and receiving type systems do not have to be the same.
- Types and features are matched by name, and features must have the same range (slot kind).
- Types and/or features present in one type system but not in the other are skipped over.

The header tells the reader the format and the compression level.

How to Serialize:
1) Create an instance of this class.
   a) If doing a delta serialization, pass in the mark and a ReuseInfo object that was created after deserializing this CAS initially.
   b) If serializing to a target with a different type system, pass the target's TypeSystemImpl object so the serialization can filter the types for the target.
2) Call serialize() to serialize the CAS.
3) If serializing to a target from which you expect to receive back a delta CAS, create a ReuseInfo object from this object and reuse it for deserializing the delta CAS.

TypeSystemImpl objects are lazily augmented by customized TypeInfo instances for each type encountered in serializing or deserializing. These are preserved for future calls, so their setup / initialization is only needed the first time.

TypeSystemImpl objects are also lazily augmented by typeMappers for individual different target type systems; these too are preserved and reused on future calls.

Compressed binary CASes are designed to be "self-describing": the format of the compressed binary CAS, including version info, is inserted at the beginning so that a proper deserialization method can be chosen automatically.

The compressed binary format implemented by this class supports type system mapping. Types in the source which are not in the target (or vice versa) are omitted. Types with "extra" features have their extra features omitted (or, on deserialization, they are set to their default value: null, 0, etc.).
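The name-based matching described above can be illustrated with a small, self-contained sketch. It does not use the UIMA API: a type system is modelled here as a plain map from type name to (feature name to range name), and the class `NameBasedTypeMapperSketch`, its methods, and the choice to throw `IllegalArgumentException` on a range mismatch are all assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in: a "type system" is a map of type name -> (feature name -> range name). */
final class NameBasedTypeMapperSketch {

    /** Builds an ordered feature map from name/range pairs, e.g. features("begin", "int"). */
    static Map<String, String> features(String... nameRangePairs) {
        Map<String, String> m = new LinkedHashMap<>();
        for (int i = 0; i < nameRangePairs.length; i += 2) {
            m.put(nameRangePairs[i], nameRangePairs[i + 1]);
        }
        return m;
    }

    /**
     * For each source type that also exists in the target, lists the features present in both.
     * Types or features missing on the target side are skipped. A feature that exists on both
     * sides with different ranges is treated as an incompatibility (an assumption of this sketch).
     */
    static Map<String, List<String>> mapByName(
            Map<String, Map<String, String>> source,
            Map<String, Map<String, String>> target) {
        Map<String, List<String>> mapped = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, String>> type : source.entrySet()) {
            Map<String, String> targetFeatures = target.get(type.getKey());
            if (targetFeatures == null) {
                continue; // type not in the target: skipped entirely
            }
            List<String> kept = new ArrayList<>();
            for (Map.Entry<String, String> feature : type.getValue().entrySet()) {
                String targetRange = targetFeatures.get(feature.getKey());
                if (targetRange == null) {
                    continue; // "extra" feature not in the target: skipped
                }
                if (!targetRange.equals(feature.getValue())) {
                    throw new IllegalArgumentException(
                            "range mismatch for " + type.getKey() + ":" + feature.getKey());
                }
                kept.add(feature.getKey());
            }
            mapped.put(type.getKey(), kept);
        }
        return mapped;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> source = new LinkedHashMap<>();
        source.put("Token", features("begin", "int", "end", "int", "pos", "String"));
        source.put("Sentence", features("begin", "int", "end", "int"));

        Map<String, Map<String, String>> target = new LinkedHashMap<>();
        target.put("Token", features("begin", "int", "end", "int"));

        System.out.println(mapByName(source, target)); // {Token=[begin, end]}
    }
}
```

In this toy model the `Sentence` type and the `pos` feature are dropped because the target does not define them, which mirrors the "skipped over" rule in the description.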
Feature slots which hold references to types not in the target type system are replaced with 0 (null).

How to Deserialize:
1) Get an appropriate CAS to deserialize into. For a delta CAS, it does not have to be empty, but it must be the originating CAS from which the delta was produced.
2) If the target type system is the same as the CAS's and the serialized form is not delta, call aCAS.reinit(source). Otherwise, create an instance of this class -> xxx.
   a) If the object being deserialized has a different type system, set the "target" type system to the TypeSystemImpl instance of the object being deserialized.
   b) If delta deserializing, pass in the ReuseInfo object created when the CAS was serialized.
3) Call xxx.deserialize(inputStream).

Compression/Decompression works in two stages:
- application of Zip/Unzip to particular sub-collections of CAS data, grouped according to similar data distribution
- collection of like kinds of data (to make the zipping more effective)
There can be up to ~20 of these collections, such as control info, float exponents, and string chars.

Deserialization:
- Read all bytes.
- Create separate ByteArrayInputStreams for each segment.
- Create appropriate unzip data input streams for these.

Data that is slow or expensive to set up is created lazily and cached:
- extra type system info: lazily created and added to the shared TypeSystemImpl object, set up per type actually referenced
- mapper for type system: lazily created and added to the shared TypeSystemImpl object in an identity-map cache (size limit = 10 per source type system?); the key is the target TypeSystemImpl.

Defaulting:
- flags: doMeasurements, compressLevel, CompressStrategy
- Per serialize call: cas, output, [target ts], [mark for delta]
- Per deserialize call: cas, input, [target ts], whether-to-save-info-for-delta-serialization

CASImpl has an instance method with defaulting args for serialization.
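The two-stage compression and the deserialization steps described above can be sketched with JDK classes only. This is not the actual on-the-wire format of this class: the two segments (ints and strings), the length-prefixed layout, and the names `SegmentedZipSketch`, `write`, `read`, and `Decoded` are assumptions chosen to keep the example small.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

final class SegmentedZipSketch {

    /** Simple holder for the decoded segments. */
    static final class Decoded {
        final int[] ints;
        final String[] strings;

        Decoded(int[] ints, String[] strings) {
            this.ints = ints;
            this.strings = strings;
        }
    }

    /** Collects like kinds of data into separate segments and deflates each segment on its own. */
    static byte[] write(int[] ints, String[] strings) throws IOException {
        ByteArrayOutputStream intSegment = new ByteArrayOutputStream();
        ByteArrayOutputStream stringSegment = new ByteArrayOutputStream();
        try (DataOutputStream intOut = new DataOutputStream(new DeflaterOutputStream(intSegment));
             DataOutputStream stringOut = new DataOutputStream(new DeflaterOutputStream(stringSegment))) {
            for (int v : ints) {
                intOut.writeInt(v);
            }
            for (String s : strings) {
                stringOut.writeUTF(s);
            }
        } // closing the streams finishes both deflaters

        ByteArrayOutputStream all = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(all);
        out.writeInt(ints.length);
        out.writeInt(strings.length);
        for (ByteArrayOutputStream segment : new ByteArrayOutputStream[] {intSegment, stringSegment}) {
            out.writeInt(segment.size()); // each segment is preceded by its compressed length
            segment.writeTo(out);
        }
        out.flush();
        return all.toByteArray();
    }

    /** Reads all bytes, wraps each segment in its own ByteArrayInputStream, and inflates it. */
    static Decoded read(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        int[] ints = new int[in.readInt()];
        String[] strings = new String[in.readInt()];
        DataInputStream intIn = openSegment(in);
        DataInputStream stringIn = openSegment(in);
        for (int i = 0; i < ints.length; i++) {
            ints[i] = intIn.readInt();
        }
        for (int i = 0; i < strings.length; i++) {
            strings[i] = stringIn.readUTF();
        }
        return new Decoded(ints, strings);
    }

    private static DataInputStream openSegment(DataInputStream in) throws IOException {
        byte[] segment = new byte[in.readInt()];
        in.readFully(segment);
        return new DataInputStream(new InflaterInputStream(new ByteArrayInputStream(segment)));
    }

    public static void main(String[] args) throws IOException {
        Decoded d = read(write(new int[] {1, 2, 3}, new String[] {"control", "chars"}));
        System.out.println(Arrays.toString(d.ints) + " " + Arrays.toString(d.strings));
    }
}
```

Keeping each kind of data in its own deflater stream means values with a similar distribution are compressed together, which is the stated reason for grouping before zipping.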
CASImpl has reinit, which works with compressed binary serialization objects if there is no type mapping. If there is type mapping, use
new BinaryCasSerDes6(cas, marker-or-null, targetTypeSystem (for stream being deserialized), reuseInfo-or-null).deserialize(in-stream)

Use Cases, filtering and delta
**************************************************************************
* (de)serialize * filter? * delta? * Use case
**************************************************************************
* serialize     *   N     *   N    * Saving a Cas,
*               *         *        * sending Cas to service with identical ts
**************************************************************************
* serialize     *   Y     *   N    * sending Cas to service with
*               *         *        * different ts (a guaranteed subset)
**************************************************************************
* serialize     *   N     *   Y    * returning Cas to client
*               *         *        * uses info saved when deserializing
*               *         *        * (?? saving just a delta to disk??)
**************************************************************************
* serialize     *   Y     *   Y    * NOT SUPPORTED (not needed)
**************************************************************************
* deserialize   *   N     *   N    * reading/(receiving) CAS, identical TS
**************************************************************************
* deserialize   *   Y     *   N    * reading/receiving CAS, different TS
*               *         *        * ts not guaranteed to be superset
*               *         *        * for "reading" case.
**************************************************************************
* deserialize   *   N     *   Y    * receiving CAS, identical TS
*               *         *        * uses info saved when serializing
**************************************************************************
* deserialize   *   Y     *   Y    * receiving CAS, different TS (tgt a feature subset)
*               *         *        * uses info saved when serializing
**************************************************************************
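The use-case table above can also be read as a lookup keyed by direction, filtering, and delta. The following sketch encodes it that way; the class `SerDesUseCaseSketch`, its enum, and the use of `UnsupportedOperationException` for the unsupported row are hypothetical and are not part of this class's API.

```java
final class SerDesUseCaseSketch {

    enum Direction { SERIALIZE, DESERIALIZE }

    /** Returns the use case from the table for a combination, or throws for the unsupported row. */
    static String useCase(Direction direction, boolean filter, boolean delta) {
        if (direction == Direction.SERIALIZE) {
            if (filter && delta) {
                throw new UnsupportedOperationException(
                        "serialize with filter and delta: NOT SUPPORTED (not needed)");
            }
            if (filter) {
                return "sending CAS to service with different ts (a guaranteed subset)";
            }
            if (delta) {
                return "returning CAS to client; uses info saved when deserializing";
            }
            return "saving a CAS, or sending CAS to service with identical ts";
        }
        if (filter && delta) {
            return "receiving CAS, different TS (tgt a feature subset); uses info saved when serializing";
        }
        if (filter) {
            return "reading/receiving CAS, different TS";
        }
        if (delta) {
            return "receiving CAS, identical TS; uses info saved when serializing";
        }
        return "reading/(receiving) CAS, identical TS";
    }

    public static void main(String[] args) {
        System.out.println(useCase(Direction.DESERIALIZE, true, false));
    }
}
```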
-
-
Nested Class Summary
Nested Classes
Modifier and Type, Class, Description
static class BinaryCasSerDes6.CompressLevel
- Compression alternatives
static class BinaryCasSerDes6.CompressStrat
static class BinaryCasSerDes6.ReuseInfo
- Info reused for 1) multiple serializations of the same CAS to multiple targets (a speedup), or 2) for delta CAS serialization, where it represents the fsStartIndex info before any mods were done which could change that info, or 3) for deserializing with a delta CAS, where it represents the fsStartIndex info at the time the CAS was serialized out.
-
Constructor Summary
Constructors
Constructor, Description
BinaryCasSerDes6(AbstractCas cas)
- Setup to serialize (not delta) or deserialize (not delta) using binary compression, no type mapping but only processing reachable Feature Structures
BinaryCasSerDes6(AbstractCas cas, BinaryCasSerDes6.ReuseInfo rfs)
- Setup to serialize (not delta) or deserialize (maybe delta) using binary compression, no type mapping and only processing reachable Feature Structures
BinaryCasSerDes6(AbstractCas cas, BinaryCasSerDes6.ReuseInfo rfs, boolean storeTS, boolean storeTSI)
- Setup to serialize (not delta) or deserialize (maybe delta) using binary compression, no type mapping, optionally storing TSI, and only processing reachable Feature Structures
BinaryCasSerDes6(AbstractCas cas, MarkerImpl mark, TypeSystemImpl tgtTs, BinaryCasSerDes6.ReuseInfo rfs)
- Setup to serialize (maybe delta) or deserialize (maybe delta) using binary compression, with type mapping and only processing reachable Feature Structures
BinaryCasSerDes6(AbstractCas cas, MarkerImpl mark, TypeSystemImpl tgtTs, BinaryCasSerDes6.ReuseInfo rfs, boolean doMeasurements)
- Setup to serialize (maybe delta) or deserialize (maybe delta) using binary compression, with type mapping and only processing reachable Feature Structures, output measurements
BinaryCasSerDes6(AbstractCas aCas, MarkerImpl mark, TypeSystemImpl tgtTs, BinaryCasSerDes6.ReuseInfo rfs, boolean doMeasurements, BinaryCasSerDes6.CompressLevel compressLevel, BinaryCasSerDes6.CompressStrat compressStrategy)
- Setup to serialize or deserialize using binary compression, with (optional) type mapping and only processing reachable Feature Structures
BinaryCasSerDes6(AbstractCas cas, TypeSystemImpl tgtTs)
- Setup to serialize (not delta) or deserialize (not delta) using binary compression, with type mapping and only processing reachable Feature Structures
-
Method Summary
Modifier and Type, Method, Description
boolean compareCASes(CASImpl c1, CASImpl c2)
- Compare 2 CASes, with perhaps different type systems.
void deserialize(java.io.InputStream istream)
void deserialize(java.io.InputStream istream, AllowPreexistingFS allowPreexistingFS)
- Version used by uima-as to read delta CAS from remote parallel steps
void deserializeAfterVersion(java.io.DataInputStream istream, boolean isDelta, AllowPreexistingFS allowPreexistingFS)
BinaryCasSerDes6.ReuseInfo getReuseInfo()
SerializationMeasures serialize(java.lang.Object out)
- Serializes the CAS.
-
-
-
Constructor Detail
-
BinaryCasSerDes6
public BinaryCasSerDes6(AbstractCas aCas, MarkerImpl mark, TypeSystemImpl tgtTs, BinaryCasSerDes6.ReuseInfo rfs, boolean doMeasurements, BinaryCasSerDes6.CompressLevel compressLevel, BinaryCasSerDes6.CompressStrat compressStrategy) throws ResourceInitializationException
Setup to serialize or deserialize using binary compression, with (optional) type mapping and only processing reachable Feature Structures
- Parameters:
aCas
- required - refs the CAS being serialized or deserialized into
mark
- if not null, is the serialization mark for delta serialization. Unused for deserialization.
tgtTs
- if not null, is the target type system. For serialization, this is a subset of the CAS's TS
rfs
- For delta serialization: must not be null, and is the value saved after deserializing the original, before any modifications / additions were made.
  For normal serialization: can be null, but if not null, is used in place of recalculating, for speed-up.
  For delta deserialization: must not be null, and is the value saved after serializing to the service.
  For normal deserialization: must be null.
doMeasurements
- if true, measurements are done (on serialization)
compressLevel
- if not null, specifies enum instance for compress level
compressStrategy
- if not null, specifies enum instance for compress strategy
- Throws:
ResourceInitializationException
- if the target type system is incompatible with the source type system
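The four rules for the rfs parameter can be summarized as a small check. This is only a sketch of the documented contract: the source does not say whether the constructor performs such validation, and `ReuseInfoRulesSketch`, its `Mode` enum, and the use of `Object` in place of BinaryCasSerDes6.ReuseInfo are hypothetical.

```java
final class ReuseInfoRulesSketch {

    /** Hypothetical labels for the four cases listed for the rfs parameter. */
    enum Mode { SERIALIZE, DELTA_SERIALIZE, DESERIALIZE, DELTA_DESERIALIZE }

    /** Throws if rfs violates the documented rule for the given mode. Object stands in for ReuseInfo. */
    static void checkReuseInfo(Mode mode, Object rfs) {
        switch (mode) {
            case DELTA_SERIALIZE:
            case DELTA_DESERIALIZE:
                if (rfs == null) {
                    throw new IllegalArgumentException(mode + " requires a non-null ReuseInfo");
                }
                break;
            case DESERIALIZE:
                if (rfs != null) {
                    throw new IllegalArgumentException("normal deserialization requires a null ReuseInfo");
                }
                break;
            case SERIALIZE:
                break; // optional: a non-null value is reused instead of recalculating
        }
    }

    /** Convenience wrapper that reports validity instead of throwing. */
    static boolean isValid(Mode mode, Object rfs) {
        try {
            checkReuseInfo(mode, rfs);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(Mode.DELTA_SERIALIZE, null)); // false
        System.out.println(isValid(Mode.SERIALIZE, null));       // true
    }
}
```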
-
BinaryCasSerDes6
public BinaryCasSerDes6(AbstractCas cas) throws ResourceInitializationException
Setup to serialize (not delta) or deserialize (not delta) using binary compression, no type mapping but only processing reachable Feature Structures
- Parameters:
cas
- the CAS being serialized or deserialized into
- Throws:
ResourceInitializationException
- never thrown
-
BinaryCasSerDes6
public BinaryCasSerDes6(AbstractCas cas, TypeSystemImpl tgtTs) throws ResourceInitializationException
Setup to serialize (not delta) or deserialize (not delta) using binary compression, with type mapping and only processing reachable Feature Structures
- Parameters:
cas
- the CAS being serialized or deserialized into
tgtTs
- the target type system
- Throws:
ResourceInitializationException
- if the target type system is incompatible with the source type system
-
BinaryCasSerDes6
public BinaryCasSerDes6(AbstractCas cas, MarkerImpl mark, TypeSystemImpl tgtTs, BinaryCasSerDes6.ReuseInfo rfs) throws ResourceInitializationException
Setup to serialize (maybe delta) or deserialize (maybe delta) using binary compression, with type mapping and only processing reachable Feature Structures
- Parameters:
cas
- the CAS being serialized or deserialized into
mark
- if not null, the serialization mark for delta serialization
tgtTs
- the target type system
rfs
- Reused Feature Structure information - required for both delta serialization and delta deserialization
- Throws:
ResourceInitializationException
- if the target type system is incompatible with the source type system
-
BinaryCasSerDes6
public BinaryCasSerDes6(AbstractCas cas, MarkerImpl mark, TypeSystemImpl tgtTs, BinaryCasSerDes6.ReuseInfo rfs, boolean doMeasurements) throws ResourceInitializationException
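Setup to serialize (maybe delta) or deserialize (maybe delta) using binary compression, with type mapping and only processing reachable Feature Structures, output measurements
- Parameters:
cas
- the CAS being serialized or deserialized into
mark
- if not null, the serialization mark for delta serialization
tgtTs
- the target type system
rfs
- Reused Feature Structure information - speed up on serialization, required on delta deserialization
doMeasurements
- if true, measurements are done (on serialization)
- Throws: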
ResourceInitializationException
- if the target type system is incompatible with the source type system
-
BinaryCasSerDes6
public BinaryCasSerDes6(AbstractCas cas, BinaryCasSerDes6.ReuseInfo rfs) throws ResourceInitializationException
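Setup to serialize (not delta) or deserialize (maybe delta) using binary compression, no type mapping and only processing reachable Feature Structures
- Parameters:
cas
- the CAS being serialized or deserialized into
rfs
- Reused Feature Structure information - required on delta deserialization
- Throws: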
ResourceInitializationException
- never thrown
-
BinaryCasSerDes6
public BinaryCasSerDes6(AbstractCas cas, BinaryCasSerDes6.ReuseInfo rfs, boolean storeTS, boolean storeTSI) throws ResourceInitializationException
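Setup to serialize (not delta) or deserialize (maybe delta) using binary compression, no type mapping, optionally storing TSI, and only processing reachable Feature Structures
- Parameters:
cas
- the CAS being serialized or deserialized into
rfs
- Reused Feature Structure information - required on delta deserialization
storeTS
- -
storeTSI
- -
- Throws: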
ResourceInitializationException
- never thrown
-
-
Method Detail
-
getReuseInfo
public BinaryCasSerDes6.ReuseInfo getReuseInfo()
-
serialize
public SerializationMeasures serialize(java.lang.Object out) throws java.io.IOException
Serializes the CAS.
- Parameters:
out
- where the serialized CAS is written
- Returns:
- null or serialization measurements (depending on setting of doMeasurements)
- Throws:
java.io.IOException
- passthru
-
deserialize
public void deserialize(java.io.InputStream istream) throws java.io.IOException
- Parameters:
istream
- input stream
- Throws:
java.io.IOException
- passthru
-
deserialize
public void deserialize(java.io.InputStream istream, AllowPreexistingFS allowPreexistingFS) throws java.io.IOException
Version used by uima-as to read delta CAS from remote parallel steps
- Parameters:
istream
- input stream
allowPreexistingFS
- what to do if item already exists below the mark
- Throws:
java.io.IOException
- passthru
-
deserializeAfterVersion
public void deserializeAfterVersion(java.io.DataInputStream istream, boolean isDelta, AllowPreexistingFS allowPreexistingFS) throws java.io.IOException
- Throws:
java.io.IOException
-
compareCASes
public boolean compareCASes(CASImpl c1, CASImpl c2)
Compare 2 CASes, with perhaps different type systems. If the type systems are different, construct a type mapper and use that to selectively ignore types or features not in the other type system. The mapper filters C1 -> C2. Only feature structures reachable via indexes or refs are compared. The order must match.
- Parameters:
c1
- CAS to compare
c2
- CAS to compare
- Returns:
- true if equal (for types / features in both)
-
-