
fix TIKA-2623: "get embedded resources in PDF/doc files" (by OhadR) #233

Open
OhadR wants to merge 2 commits into apache:master from OhadR:master

OhadR commented Apr 5, 2018

Motivation: support extracting embedded files from PDF, Word (doc/docx), and similar documents.

I have refactored FileEmbeddedDocumentExtractor: I moved it from tika-cli to tika-parsers, so that applications that depend on tika-parsers, but not on tika-app, can use it.

tika-core might be a better home for this class (alongside 'ParsingEmbeddedDocumentExtractor'), but that would require changing its pom.xml to add dependencies on commons-io, poi, etc., and we do not want to add dependencies to tika-core.

CELLEBRITE\OhadR added 2 commits April 3, 2018 03:56
…io and apache-poi that tika-parser already use (tika-core needs to change its pom.xml if we want to place it in core)
OhadR changed the title from fix TIKA-2623: "get embedded resources in doc files" (by OhadR) to fix TIKA-2623: "get embedded resources in PDF/doc files" (by OhadR) on Apr 7, 2018