Class PackageParser

java.lang.Object
org.apache.tika.parser.AbstractParser
org.apache.tika.parser.AbstractEncodingDetectorParser
org.apache.tika.parser.pkg.PackageParser
All Implemented Interfaces:
Serializable, org.apache.tika.parser.Parser

public class PackageParser extends org.apache.tika.parser.AbstractEncodingDetectorParser
Parser for various packaging formats. Package entries are written to the XHTML event stream as <div class="package-entry"> elements that contain the (optional) entry name as an <h1> element and the full structured body content of the parsed entry.

Users must have the JCE Unlimited Strength jars installed for encryption to work with 7Z files (see COMPRESS-299 and TIKA-1521). If the jars are not installed, an IOException is thrown, possibly wrapped in a TikaException.
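The shape of the per-entry output described above can be illustrated with a small JDK-only sketch. It is not Tika's implementation and does not call Tika at all: the class `PackageEntrySketch` and its methods `sampleZip`, `escape`, and `render` are hypothetical names, and the `<p>` wrapper is only a stand-in for the structured body content that a real parser would produce for each entry.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical illustration only; this class is not part of Tika.
public class PackageEntrySketch {

    // Builds a small ZIP archive in memory so the example needs no files on disk.
    static byte[] sampleZip() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes, StandardCharsets.UTF_8)) {
            zip.putNextEntry(new ZipEntry("a.txt"));
            zip.write("alpha".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
            zip.putNextEntry(new ZipEntry("b.txt"));
            zip.write("beta".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    // Minimal escaping for text placed inside markup.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Emits one <div class="package-entry"> per archive entry, with the entry
    // name in an <h1> and the entry text in a <p> (a stand-in for the real body).
    static String render(byte[] zipBytes) throws IOException {
        StringBuilder out = new StringBuilder();
        try (ZipInputStream zip = new ZipInputStream(
                new ByteArrayInputStream(zipBytes), StandardCharsets.UTF_8)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                // readAllBytes stops at the end of the current entry.
                String body = new String(zip.readAllBytes(), StandardCharsets.UTF_8);
                out.append("<div class=\"package-entry\">")
                   .append("<h1>").append(escape(entry.getName())).append("</h1>")
                   .append("<p>").append(escape(body)).append("</p>")
                   .append("</div>\n");
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.print(render(sampleZip()));
    }
}
```

In the real parser these elements are emitted as SAX events to the ContentHandler passed to parse, not as a concatenated string.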

  • Constructor Summary

    Constructors
    Constructor
    Description
    PackageParser()
     
    PackageParser(org.apache.tika.detect.EncodingDetector encodingDetector)
     
  • Method Summary

    Modifier and Type
    Method
    Description
    Set<org.apache.tika.mime.MediaType>
    getSupportedTypes(org.apache.tika.parser.ParseContext context)
     
    protected static org.apache.tika.metadata.Metadata
    handleEntryMetadata(String name, Date createAt, Date modifiedAt, Long size, org.apache.tika.sax.XHTMLContentHandler xhtml)
     
    boolean
    isDetectCharsetsInEntryNames()
     
    void
    parse(InputStream stream, ContentHandler handler, org.apache.tika.metadata.Metadata metadata, org.apache.tika.parser.ParseContext context)
     
    void
    setDetectCharsetsInEntryNames(boolean detectCharsetsInEntryNames)
    Sets whether to run the default charset detector against entry names in ZipFiles.

    Methods inherited from class org.apache.tika.parser.AbstractEncodingDetectorParser

    getEncodingDetector, getEncodingDetector, setEncodingDetector

    Methods inherited from class org.apache.tika.parser.AbstractParser

    parse

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Constructor Details

    • PackageParser

      public PackageParser()
    • PackageParser

      public PackageParser(org.apache.tika.detect.EncodingDetector encodingDetector)
  • Method Details

    • handleEntryMetadata

      protected static org.apache.tika.metadata.Metadata handleEntryMetadata(String name, Date createAt, Date modifiedAt, Long size, org.apache.tika.sax.XHTMLContentHandler xhtml) throws SAXException, IOException, org.apache.tika.exception.TikaException
      Throws:
      SAXException
      IOException
      org.apache.tika.exception.TikaException
    • getSupportedTypes

      public Set<org.apache.tika.mime.MediaType> getSupportedTypes(org.apache.tika.parser.ParseContext context)
    • parse

      public void parse(InputStream stream, ContentHandler handler, org.apache.tika.metadata.Metadata metadata, org.apache.tika.parser.ParseContext context) throws IOException, SAXException, org.apache.tika.exception.TikaException
      Throws:
      IOException
      SAXException
      org.apache.tika.exception.TikaException
    • setDetectCharsetsInEntryNames

      @Field public void setDetectCharsetsInEntryNames(boolean detectCharsetsInEntryNames)
      Sets whether to run the default charset detector against entry names in ZipFiles. The default is true.
      Parameters:
      detectCharsetsInEntryNames - true to run charset detection on entry names; false to skip it
    • isDetectCharsetsInEntryNames

      public boolean isDetectCharsetsInEntryNames()
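
Why charset detection on entry names (see setDetectCharsetsInEntryNames) can matter is easier to see with a small JDK-only sketch. It does not use Tika or its detector; the class `EntryNameCharsetSketch` and the method `decodeName` are hypothetical, and the choice of ISO-8859-1 as the archive's name encoding is an assumption made for illustration. The point is only that the same raw name bytes decode to different strings depending on the charset assumed.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; this class is not part of Tika.
public class EntryNameCharsetSketch {

    // Decodes the raw bytes of an entry name using the given charset.
    static String decodeName(byte[] rawName, Charset charset) {
        return new String(rawName, charset);
    }

    public static void main(String[] args) {
        // An entry name with accented characters, written with Unicode escapes
        // so the source file encoding does not affect the example.
        String original = "r\u00e9sum\u00e9.txt";

        // Assume the archive stored the name as ISO-8859-1 bytes.
        byte[] raw = original.getBytes(StandardCharsets.ISO_8859_1);

        // Decoding with the wrong charset replaces the malformed bytes,
        // so the name no longer matches the original.
        String asUtf8 = decodeName(raw, StandardCharsets.UTF_8);
        System.out.println(asUtf8.equals(original));   // false

        // Decoding with the charset that was actually used recovers the name.
        String asLatin1 = decodeName(raw, StandardCharsets.ISO_8859_1);
        System.out.println(asLatin1.equals(original)); // true
    }
}
```

A detector run against the raw name bytes is one way to pick the decoding charset when the archive does not declare it reliably; disabling it with setDetectCharsetsInEntryNames(false) skips that step.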