Prevent copyright page in PDF from being indexed

  • Prevent copyright page in PDF from being indexed

    We have a lot of PDF files on our website and the Zoom indexer works well with them. I would like to exclude the Copyright page and Table of Contents at the beginning of each file because these are of no value in the search index. I tried the following in the Content Filtering section but it seems to be filtering out the entire PDF file, rather than the two pages that I want excluded.

    -copyright
    -contents

    Is there something else I can do?
    Thanks

  • #2
    With an HTML page it is possible to exclude parts of a page.

    But you can't exclude particular pages within a PDF file.

    • #3
      It might be possible, but I suspect it would be quite resource intensive and may cripple some of the utility of the PDF document. It might also depend on how the document was originally created. If you can create the original document as images rather than text, then you can get Acrobat to OCR just the pages that you want indexed. (It sounds a bit clunky, but it depends on exactly what you're trying to achieve and how many documents are involved. I haven't experimented, but you might even be able to get Acrobat's batch commands to automate it.)

      • #4
        Thanks for the responses. The copyright notice is half a page of text, so I could probably put it in the file as an image, but I was hoping there might be another way.
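
One possible "other way", which nobody in the thread proposed or tested, is to leave the published PDFs alone and give the indexer stripped copies instead. The sketch below is only an illustration under stated assumptions: it uses the third-party `pypdf` package (not part of Zoom), it assumes the copyright page and table of contents are always the first two pages, and the function names `pages_to_keep` and `strip_front_pages` are made up for this example. Search results would then point at the stripped copies unless the URLs are mapped back to the originals by some other means.

```python
from pathlib import Path


def pages_to_keep(total_pages: int, skip_front: int = 2) -> list[int]:
    """Return zero-based indices of the pages left after dropping the first skip_front pages."""
    if skip_front < 0:
        raise ValueError("skip_front must be non-negative")
    # min() guards against documents shorter than the number of pages to skip.
    return list(range(min(skip_front, total_pages), total_pages))


def strip_front_pages(src: Path, dst: Path, skip_front: int = 2) -> int:
    """Write a copy of src to dst without its first skip_front pages; return the page count written."""
    # Assumption: pypdf is installed (pip install pypdf). It is imported here
    # so that pages_to_keep() stays usable without the dependency.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(str(src))
    writer = PdfWriter()
    keep = pages_to_keep(len(reader.pages), skip_front)
    for index in keep:
        writer.add_page(reader.pages[index])
    with open(dst, "wb") as out_file:
        writer.write(out_file)
    return len(keep)
```

Calling `strip_front_pages(Path("manual.pdf"), Path("index_copies/manual.pdf"))` (both paths hypothetical, and the output folder must already exist) would produce the copy to index. If the front matter varies in length from file to file, a fixed `skip_front` will not be reliable and the page count would need to be supplied per document.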
