Package com.fasterxml.jackson.core.sym
Class ByteQuadsCanonicalizer
- java.lang.Object
-
- com.fasterxml.jackson.core.sym.ByteQuadsCanonicalizer
-
public final class ByteQuadsCanonicalizer extends java.lang.Object
Replacement for BytesToNameCanonicalizer which aims at more localized memory access due to flattening of name quad data. Performance improvement is modest for simple JSON document data binding (maybe 3%), but should help more for larger symbol tables, or for binary formats like Smile.
Hash area is divided into 4 sections:
- Primary area (1/2 of total size), direct match from hash (LSB)
- Secondary area (1/4 of total size), match from hash (LSB) >> 1
- Tertiary area (1/8 of total size), match from hash (LSB) >> 2
- Spill-over area (remaining 1/8) with linear scan, insertion order
Within every area, entries are 4 ints, where 1 - 3 ints contain 1 - 12 UTF-8 encoded bytes of name (null-padded), and last int is offset in _names that contains actual name Strings.
- Since:
- 2.6
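The section fractions above can be turned into concrete offsets. The following is a minimal sketch, assuming a power-of-two number of primary slots and 4 ints per entry; the class HashAreaLayout and its members are hypothetical names for illustration and are not part of Jackson. The slot mapping follows the class description literally and ignores tertiary bucketing.

```java
// Illustrative only: offsets derived from the 1/2, 1/4, 1/8, 1/8 split described above.
// HashAreaLayout and its members are hypothetical, not Jackson internals.
final class HashAreaLayout {
    static final int INTS_PER_ENTRY = 4; // each entry is 4 ints (16 bytes)

    final int primarySlots;   // assumed to be a power of two
    final int secondaryStart; // int offset where the secondary area begins
    final int tertiaryStart;  // int offset where the tertiary area begins
    final int spilloverStart; // int offset where the spill-over area begins
    final int mainAreaEnd;    // int offset just past the spill-over area

    HashAreaLayout(int primarySlots) {
        this.primarySlots = primarySlots;
        int primaryInts = primarySlots * INTS_PER_ENTRY; // 1/2 of the hash area
        int totalInts = primaryInts * 2;
        secondaryStart = primaryInts;                     // next 1/4
        tertiaryStart = secondaryStart + totalInts / 4;   // next 1/8
        spilloverStart = tertiaryStart + totalInts / 8;   // remaining 1/8
        mainAreaEnd = totalInts;
    }

    // Primary slot comes straight from the low bits of the hash.
    int primaryOffset(int hash) {
        return (hash & (primarySlots - 1)) * INTS_PER_ENTRY;
    }

    // Secondary slot: primary slot index shifted right by 1.
    int secondaryOffset(int hash) {
        int slot = (hash & (primarySlots - 1)) >> 1;
        return secondaryStart + slot * INTS_PER_ENTRY;
    }

    // Tertiary slot: primary slot index shifted right by 2 (bucketing ignored here).
    int tertiaryOffset(int hash) {
        int slot = (hash & (primarySlots - 1)) >> 2;
        return tertiaryStart + slot * INTS_PER_ENTRY;
    }

    public static void main(String[] args) {
        HashAreaLayout layout = new HashAreaLayout(64);
        System.out.println("secondary=" + layout.secondaryStart
                + " tertiary=" + layout.tertiaryStart
                + " spillover=" + layout.spilloverStart
                + " end=" + layout.mainAreaEnd);
    }
}
```

With 64 primary slots the main area spans 512 ints, which matches the statement that _hashSize is at most 1/8 of the underlying array length.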
-
-
Field Summary
Fields (Modifier and Type, Field, Description):
- protected int _count: Total number of Strings in the symbol table; only used for child tables.
- protected boolean _failOnDoS: Flag that indicates whether we should throw an exception if enough hash collisions are detected (true); or just worked around (false).
- protected int[] _hashArea: Primary hash information area: consists of 2 * _hashSize entries of 16 bytes (4 ints), arranged in a cascading lookup structure (details of which may be tweaked depending on expected rates of collisions).
- protected boolean _hashShared: Flag that indicates whether underlying data structures for the main hash area are shared or not.
- protected int _hashSize: Number of slots for primary entries within _hashArea; which is at most 1/8 of actual size of the underlying array (4-int slots, primary covers only half of the area; plus, additional area for longer symbols after hash area).
- protected boolean _intern: Whether canonical symbol Strings are to be intern()ed before added to the table or not.
- protected int _longNameOffset: Offset within _hashArea that follows main slots and contains quads for longer names (13 bytes or longer), and points to the first available int that may be used for appending quads of the next long name.
- protected java.lang.String[] _names: Array that contains String instances matching entries in _hashArea.
- protected ByteQuadsCanonicalizer _parent: Reference to the root symbol table, for child tables, so that they can merge table information back as necessary.
- protected int _secondaryStart: Offset within _hashArea where secondary entries start.
- protected int _seed: Seed value we use as the base to make hash codes non-static between different runs, but still stable for lifetime of a single symbol table instance.
- protected int _spilloverEnd: Pointer to the offset within spill-over area where there is room for more spilled over entries (if any).
- protected java.util.concurrent.atomic.AtomicReference<com.fasterxml.jackson.core.sym.ByteQuadsCanonicalizer.TableInfo> _tableInfo: Member that is only used by the root table instance: root passes immutable state info to child instances, and children may return new state if they add entries to the table.
- protected int _tertiaryShift: Constant that determines size of buckets for tertiary entries: 1 << _tertiaryShift is the size, and shift value is also used for translating from primary offset into tertiary bucket (shift right by 4 + _tertiaryShift).
- protected int _tertiaryStart: Offset within _hashArea where tertiary entries start.
-
Method Summary
Methods (Modifier and Type, Method, Description):
- protected void _reportTooManyCollisions()
- java.lang.String addName(java.lang.String name, int q1)
- java.lang.String addName(java.lang.String name, int[] q, int qlen)
- java.lang.String addName(java.lang.String name, int q1, int q2)
- java.lang.String addName(java.lang.String name, int q1, int q2, int q3)
- int bucketCount()
- int calcHash(int q1)
- int calcHash(int[] q, int qlen)
- int calcHash(int q1, int q2)
- int calcHash(int q1, int q2, int q3)
- static ByteQuadsCanonicalizer createRoot(): Factory method to call to create a symbol table instance with a randomized seed value.
- protected static ByteQuadsCanonicalizer createRoot(int seed)
- java.lang.String findName(int q1)
- java.lang.String findName(int[] q, int qlen)
- java.lang.String findName(int q1, int q2)
- java.lang.String findName(int q1, int q2, int q3)
- int hashSeed()
- ByteQuadsCanonicalizer makeChild(int flags): Factory method used to create actual symbol table instance to use for parsing.
- boolean maybeDirty(): Method called to quickly check whether a child symbol table may have gotten additional entries.
- int primaryCount(): Method mostly needed by unit tests; calculates number of entries that are in the primary slot set.
- void release(): Method called by the using code to indicate it is done with this instance.
- int secondaryCount(): Method mostly needed by unit tests; calculates number of entries in secondary buckets.
- int size()
- int spilloverCount(): Method mostly needed by unit tests; calculates number of entries in shared spill-over area.
- int tertiaryCount(): Method mostly needed by unit tests; calculates number of entries in tertiary buckets.
- java.lang.String toString()
- int totalCount()
-
-
-
Field Detail
-
_parent
protected final ByteQuadsCanonicalizer _parent
Reference to the root symbol table, for child tables, so that they can merge table information back as necessary.
-
_tableInfo
protected final java.util.concurrent.atomic.AtomicReference<com.fasterxml.jackson.core.sym.ByteQuadsCanonicalizer.TableInfo> _tableInfo
Member that is only used by the root table instance: root passes immutable state info to child instances, and children may return new state if they add entries to the table. Child tables do NOT use the reference.
-
_seed
protected final int _seed
Seed value we use as the base to make hash codes non-static between different runs, but still stable for lifetime of a single symbol table instance. This is done for security reasons, to avoid potential DoS attack via hash collisions.
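To illustrate why a per-table seed makes hash codes differ between runs while staying stable within one table, here is a minimal sketch. The mixing steps in seededHash are a hypothetical example chosen for illustration; they are not claimed to be the function Jackson uses in calcHash.

```java
import java.util.Random;

// Illustrative only: a seeded integer hash. The mixing steps are an assumption,
// not Jackson's actual calcHash implementation.
final class SeededHashSketch {
    static int seededHash(int q1, int seed) {
        int h = q1 ^ seed;   // fold the seed in first, so equal input hashes differently per seed
        h += (h >>> 16);     // mix high bits into low bits
        h ^= (h << 3);
        h += (h >>> 12);
        return h;
    }

    public static void main(String[] args) {
        int seed = new Random().nextInt(); // differs between runs
        int quad = 0x6E616D65;             // the bytes of "name" packed into one int
        // Within one run (one seed) the hash is stable:
        System.out.println(seededHash(quad, seed) == seededHash(quad, seed));
    }
}
```

Because an attacker cannot predict the seed, precomputing a set of names that all collide becomes much harder.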
-
_intern
protected boolean _intern
Whether canonical symbol Strings are to be intern()ed before added to the table or not.
NOTE: non-final to allow disabling intern()ing in case of excessive collisions.
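As a reminder of what intern()ing buys, the sketch below shows the standard String.intern() behavior: equal Strings collapse to one canonical instance, so callers can compare by reference. The class name InternSketch is hypothetical and the example does not touch the symbol table itself.

```java
// Illustrative only: standard String.intern() semantics, independent of Jackson.
final class InternSketch {
    static boolean sameInstance(String a, String b) {
        return a == b; // reference comparison on purpose
    }

    public static void main(String[] args) {
        String literal = "name";            // literals are interned by the JVM
        String copy = new String("name");   // distinct object with equal content
        System.out.println(sameInstance(literal, copy));
        System.out.println(sameInstance(literal, copy.intern()));
    }
}
```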
-
_failOnDoS
protected final boolean _failOnDoS
Flag that indicates whether we should throw an exception if enough hash collisions are detected (true); or just worked around (false).
- Since:
- 2.4
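The two behaviors of the flag can be sketched as follows. This is a simplified illustration, assuming a hypothetical fixed limit on spilled entries; the class CollisionGuardSketch, its threshold, and the exception message are assumptions and do not reproduce Jackson's actual collision detection.

```java
// Illustrative only: fail-fast versus work-around handling of excessive collisions.
// The threshold and all names here are hypothetical.
final class CollisionGuardSketch {
    private final boolean failOnDoS;
    private final int maxSpillover; // hypothetical limit on spilled entries
    private int spilloverCount;

    CollisionGuardSketch(boolean failOnDoS, int maxSpillover) {
        this.failOnDoS = failOnDoS;
        this.maxSpillover = maxSpillover;
    }

    /** Records one spilled entry; returns false once the limit is exceeded and tolerated. */
    boolean recordSpillover() {
        spilloverCount++;
        if (spilloverCount <= maxSpillover) {
            return true;
        }
        if (failOnDoS) {
            throw new IllegalStateException("Too many hash collisions: " + spilloverCount);
        }
        return false; // worked around: caller could, for example, stop caching new names
    }

    public static void main(String[] args) {
        CollisionGuardSketch lenient = new CollisionGuardSketch(false, 1);
        System.out.println(lenient.recordSpillover());
        System.out.println(lenient.recordSpillover());
    }
}
```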
-
_hashArea
protected int[] _hashArea
Primary hash information area: consists of 2 * _hashSize entries of 16 bytes (4 ints), arranged in a cascading lookup structure (details of which may be tweaked depending on expected rates of collisions).
-
_hashSize
protected int _hashSize
Number of slots for primary entries within _hashArea; which is at most 1/8 of actual size of the underlying array (4-int slots, primary covers only half of the area; plus, additional area for longer symbols after hash area).
-
_secondaryStart
protected int _secondaryStart
Offset within _hashArea where secondary entries start.
-
_tertiaryStart
protected int _tertiaryStart
Offset within _hashArea where tertiary entries start.
-
_tertiaryShift
protected int _tertiaryShift
Constant that determines size of buckets for tertiary entries: 1 << _tertiaryShift is the size, and shift value is also used for translating from primary offset into tertiary bucket (shift right by 4 + _tertiaryShift).
Default value is 2, for buckets of 4 slots; grows bigger with bigger table sizes.
-
_count
protected int _count
Total number of Strings in the symbol table; only used for child tables.
-
_names
protected java.lang.String[] _names
Array that contains String instances matching entries in _hashArea.
-
_spilloverEnd
protected int _spilloverEnd
Pointer to the offset within spill-over area where there is room for more spilled over entries (if any). Spill-over area is within fixed-size portion of _hashArea.
-
_longNameOffset
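protected int _longNameOffset
Offset within _hashArea that follows main slots and contains quads for longer names (13 bytes or longer), and points to the first available int that may be used for appending quads of the next long name.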
-
_hashShared
protected boolean _hashShared
Flag that indicates whether underlying data structures for the main hash area are shared or not. If they are, then they need to be handled in copy-on-write way, i.e. if they need to be modified, a copy needs to be made first; at this point it will not be shared any more, and can be modified.
This flag needs to be checked both when adding new main entries, and when adding new collision list queues (i.e. creating a new collision list head entry).
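The copy-on-write handling described here can be sketched in a few lines. This is a minimal illustration, assuming a single int array and a single flag; the class CopyOnWriteAreaSketch and its methods are hypothetical and much simpler than the real table.

```java
import java.util.Arrays;

// Illustrative only: copy-on-write guarded by a "shared" flag. Names are hypothetical.
final class CopyOnWriteAreaSketch {
    private int[] hashArea;
    private boolean hashShared;

    CopyOnWriteAreaSketch(int[] sharedArea) {
        this.hashArea = sharedArea;
        this.hashShared = true; // starts out borrowing another owner's array
    }

    void set(int index, int value) {
        if (hashShared) {
            // First modification: take a private copy so the original stays untouched
            hashArea = Arrays.copyOf(hashArea, hashArea.length);
            hashShared = false;
        }
        hashArea[index] = value;
    }

    int get(int index) {
        return hashArea[index];
    }

    boolean isShared() {
        return hashShared;
    }

    public static void main(String[] args) {
        int[] shared = {1, 2, 3};
        CopyOnWriteAreaSketch area = new CopyOnWriteAreaSketch(shared);
        area.set(0, 9);
        System.out.println(shared[0] + " " + area.get(0) + " " + area.isShared());
    }
}
```

Reads never trigger a copy, so tables that only look up existing names keep sharing the original array.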
-
-
Method Detail
-
createRoot
public static ByteQuadsCanonicalizer createRoot()
Factory method to call to create a symbol table instance with a randomized seed value.
- Returns:
- Root instance to use for constructing new child instances
-
createRoot
protected static ByteQuadsCanonicalizer createRoot(int seed)
-
makeChild
public ByteQuadsCanonicalizer makeChild(int flags)
Factory method used to create actual symbol table instance to use for parsing.
- Parameters:
flags - Bit flags of active JsonFactory.Features enabled.
- Returns:
- Actual canonicalizer instance that can be used by a parser
-
release
public void release()
Method called by the using code to indicate it is done with this instance. This lets instance merge accumulated changes into parent (if need be), safely and efficiently, and without calling code having to know about parent information.
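The root/child life cycle (createRoot, makeChild, release) can be pictured with a toy model. The sketch below is an assumption-laden simplification: it stores names in a List instead of a hash area, its makeChild takes no flags, and the class SymbolTableSketch is hypothetical. It only illustrates the idea of children working on a snapshot and publishing back through an AtomicReference.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Illustrative only: a toy root/child symbol table. Not Jackson's implementation.
final class SymbolTableSketch {
    private final SymbolTableSketch parent;                // null for the root
    private final AtomicReference<List<String>> tableInfo; // used by the root only
    private List<String> names;                            // child's view of the symbols
    private boolean dirty;                                 // true once the child added entries

    private SymbolTableSketch(SymbolTableSketch parent, List<String> snapshot) {
        this.parent = parent;
        this.tableInfo = (parent == null) ? new AtomicReference<>(snapshot) : null;
        this.names = snapshot;
    }

    static SymbolTableSketch createRoot() {
        return new SymbolTableSketch(null, Collections.<String>emptyList());
    }

    SymbolTableSketch makeChild() {
        return new SymbolTableSketch(this, tableInfo.get()); // child starts from root's snapshot
    }

    void add(String name) {
        if (!dirty) {
            names = new ArrayList<>(names); // copy before the first write
            dirty = true;
        }
        names.add(name);
    }

    void release() {
        if (parent == null || !dirty) {
            return; // nothing to merge back
        }
        List<String> mine = Collections.unmodifiableList(names);
        List<String> current = parent.tableInfo.get();
        if (mine.size() > current.size()) {
            // Publish only if nobody else replaced the state in the meantime
            parent.tableInfo.compareAndSet(current, mine);
        }
        dirty = false;
    }

    int size() {
        return (parent == null) ? tableInfo.get().size() : names.size();
    }

    public static void main(String[] args) {
        SymbolTableSketch root = createRoot();
        SymbolTableSketch child = root.makeChild();
        child.add("id");
        child.release();
        System.out.println(root.size());
    }
}
```

The calling code never touches the root's state directly; it only calls release() on the child, which matches the intent described above.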
-
size
public int size()
- Returns:
- Number of symbol entries contained by this canonicalizer instance
-
bucketCount
public int bucketCount()
- Returns:
- number of primary slots table has currently
-
maybeDirty
public boolean maybeDirty()
Method called to quickly check whether a child symbol table may have gotten additional entries. Used for checking to see if a child table should be merged into shared table.
- Returns:
- Whether main hash area has been modified
-
hashSeed
public int hashSeed()
-
primaryCount
public int primaryCount()
Method mostly needed by unit tests; calculates number of entries that are in the primary slot set. These are "perfect" entries, accessible with a single lookup.
- Returns:
- Number of entries in the primary hash area
-
secondaryCount
public int secondaryCount()
Method mostly needed by unit tests; calculates number of entries in secondary buckets.
- Returns:
- Number of entries in the secondary hash area
-
tertiaryCount
public int tertiaryCount()
Method mostly needed by unit tests; calculates number of entries in tertiary buckets.
- Returns:
- Number of entries in the tertiary hash area
-
spilloverCount
public int spilloverCount()
Method mostly needed by unit tests; calculates number of entries in shared spill-over area.
- Returns:
- Number of entries in the linear spill-over area
-
totalCount
public int totalCount()
-
toString
public java.lang.String toString()
- Overrides:
toString in class java.lang.Object
-
findName
public java.lang.String findName(int q1)
-
findName
public java.lang.String findName(int q1, int q2)
-
findName
public java.lang.String findName(int q1, int q2, int q3)
-
findName
public java.lang.String findName(int[] q, int qlen)
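The q and qlen arguments carry the name's UTF-8 bytes packed four to an int. The sketch below shows one way such quads could be built. The byte order within each int (most significant byte first) and the position of the null padding are assumptions made for illustration; the class QuadPackingSketch is hypothetical and the layout a real parser produces may differ.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: packs UTF-8 bytes into ints, 4 bytes per int.
// Byte order and padding position are assumptions, not Jackson's documented layout.
final class QuadPackingSketch {
    static int[] toQuads(String name) {
        byte[] bytes = name.getBytes(StandardCharsets.UTF_8);
        int[] quads = new int[(bytes.length + 3) / 4];
        for (int i = 0; i < bytes.length; i++) {
            int shift = 24 - 8 * (i % 4);                 // first byte lands in the top 8 bits
            quads[i / 4] |= (bytes[i] & 0xFF) << shift;
        }
        return quads; // unused trailing bytes of the last int stay 0 (null padding)
    }

    public static void main(String[] args) {
        for (int quad : toQuads("abcde")) {
            System.out.println(Integer.toHexString(quad));
        }
    }
}
```

Under this layout a name of 1 to 4 bytes fits the single-quad overload, 5 to 8 bytes the two-quad overload, 9 to 12 bytes the three-quad overload, and anything longer needs the array form.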
-
addName
public java.lang.String addName(java.lang.String name, int q1)
-
addName
public java.lang.String addName(java.lang.String name, int q1, int q2)
-
addName
public java.lang.String addName(java.lang.String name, int q1, int q2, int q3)
-
addName
public java.lang.String addName(java.lang.String name, int[] q, int qlen)
-
calcHash
public int calcHash(int q1)
-
calcHash
public int calcHash(int q1, int q2)
-
calcHash
public int calcHash(int q1, int q2, int q3)
-
calcHash
public int calcHash(int[] q, int qlen)
-
_reportTooManyCollisions
protected void _reportTooManyCollisions()
-
-