PDF Index Strategy


  • PDF Index Strategy

    I'm evaluating the product and don't have the PDF plugin to test this yet, so I'm not sure whether this would work.

    Suppose I have a simple PHP page that presents a PDF file with a short description and a link to the PDF. I would like Zoom to index the PDF file, but return the corresponding PHP page as the link on the results page instead of the PDF file itself.

    Is that possible? Anyone have a potential workaround for that scenario?

  • #2
    Why don't you want a link to the PDF in the results?

    Some possible solutions are:
    1) Use the rewrite links feature in Zoom. This will only work if you can come up with a search-and-replace pattern that transforms the URLs in the manner you require for the site.
    2) Include a key text extract of the PDF file on the PHP page, then don't index the PDF file at all.
    3) Use .desc files to add metadata to the PDF file, then allow direct links to the PDF file. This assumes the reason for the PHP page was to associate extra keywords with the PDF file.
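
As a rough illustration of option 1, the kind of search-and-replace pattern involved can be sketched in Python. This is not Zoom's configuration syntax, only a model of the transformation. It assumes a hypothetical naming convention in which every PDF has a landing page with the same base name (for example, /docs/manual.pdf and /docs/manual.php); the function name is made up for this example.

```python
import re

# Hypothetical convention: each PDF has a landing page with the same
# base name, e.g. /docs/manual.pdf -> /docs/manual.php
PDF_SUFFIX = re.compile(r"\.pdf$", re.IGNORECASE)


def landing_page_url(url: str) -> str:
    """Return the landing-page URL for a PDF URL; leave other URLs unchanged."""
    return PDF_SUFFIX.sub(".php", url)
```

If the site's PDFs and landing pages do not share a predictable naming scheme like this, a single pattern will not cover them, which is the limitation mentioned in option 1.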


    • #3
      Thanks, I will explore the URL rewriting and other options.

      The reason for having them go to a PHP page instead of directly to the PDF is twofold:
      1) To provide a landing page that gives an overview of the PDF, to help the user determine whether it is indeed what they were looking for. It's not as easy to tell by going directly to the PDF.
      2) To run a security check to see whether they have permission to actually view the PDF file.


      • #4
        In that case I would add a rule to your server's .htaccess file to do a server-side redirect from the PDF file to the PHP file, and then only provide access to the PDF via a PHP script, or use a similar scheme. This assumes you are using Apache and not IIS.
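
A minimal sketch of such a redirect, assuming Apache with mod_rewrite enabled and an .htaccess file in the document root. The docs/ directory and the view.php landing script are hypothetical names chosen for illustration, not part of the original setup.

```apache
# Hypothetical layout: PDFs live under /docs/, landing page is /view.php
RewriteEngine On

# Redirect direct requests for a PDF to the landing page,
# passing the document name along as a query parameter
RewriteRule ^docs/(.+)\.pdf$ /view.php?doc=$1 [R=302,L]
```

With a rule like this in place, the PHP script would need to read the file from a location the rule does not cover (or from outside the web root) when it serves the PDF to an authorized user.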

        Zoom is not a file security system, and it would be very naive to believe that your PDF documents are secure because the search engine doesn't return a link to them.


        • #5
          Although I am naive, I was aware that this would not fully protect the PDF files.

          However, in the .htaccess scenario, wouldn't that also prevent Zoom from indexing the PDF files? Or would I have to get fancy with my .htaccess file and give Zoom permission to index them?

          I was also thinking that for MP3 files, even from a non-security standpoint, I would want to index the MP3 metadata but have the search results link to a landing page for that MP3 instead of directly to the file. I guess that is doable by also putting the metadata in the PHP page. For a PDF, though, I would prefer to index the entire PDF file instead of having to duplicate it on the PHP landing page.

          Thanks
