Scanning versus indexing

  • Scanning versus indexing

    I'm wondering if I'm missing something obvious...

    Our site is driven by a patched version of VBulletin, and the interesting content is stored as documents (usually PDFs). Access to everything is script-controlled, so, for example, people get at a particular document with something like getscript.php?action=retrieve... which fetches the content and downloads it to the browser. The raw PDFs are not visible on the site.

    Zoom does a super job of finding everything, but along the way it also indexes all the surrounding junk in the forums (usernames, headers, footers, and so on). I've read the instructions for integrating with bulletin boards, which helps, but...

    What I think I'd like to do is scan everything and follow links, but only index the content that is retrieved by the final getscript.php?action=retrieve call.

    Is this straightforward to set up?

    I suppose I could achieve the same result by pre-scanning my database to produce a dynamic search config file, telling Zoom exactly which scripts to index.

    Is there another way?

    thanks

  • #2
    To avoid indexing the content in the VBulletin DB while still following the links to the PDFs, make a small customisation to VBulletin.

    Add the <!--ZOOMSTOP--> tag to the header of the page and the <!--ZOOMRESTART--> tag to the footer, so that the text in between is not indexed.

    You will still need to use the skip list recommended for forums to skip unwanted pages, however.
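
To illustrate what the tags in the reply above are meant to achieve, here is a minimal Python sketch. It is a simplified model of the idea (text between <!--ZOOMSTOP--> and <!--ZOOMRESTART--> is left out of the index), not Zoom's actual parser. The sample page, its class names, and the function name `indexable_text` are hypothetical, and the sketch takes one reading of the reply: the stop tag goes in the header template and the restart tag in the footer template. It models only which text gets indexed; per the reply, the links on the page are still followed.

```python
import re

# Simplified model: everything between <!--ZOOMSTOP--> and <!--ZOOMRESTART-->
# is dropped from the indexable text. Illustration only, not Zoom's parser.
SKIP_REGION = re.compile(r"<!--ZOOMSTOP-->.*?<!--ZOOMRESTART-->", re.DOTALL)


def indexable_text(page_html: str) -> str:
    """Return the page with every STOP/RESTART region removed."""
    return SKIP_REGION.sub("", page_html)


# Hypothetical forum page: the header template opens the skipped region and
# the footer template closes it, so the whole thread body is excluded.
page = """<html><body>
<!--ZOOMSTOP-->
<div class="header">Forum navigation</div>
<div class="post">Posted by some_user</div>
<a href="getscript.php?action=retrieve...">Download document</a>
<div class="footer">Forum footer</div>
<!--ZOOMRESTART-->
</body></html>"""

remaining = indexable_text(page)
print(remaining)
```

With the tags placed this way, none of the forum text (navigation, usernames, footer) survives in `remaining`, which is the effect the reply is after for the forum pages themselves.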
