Not indexing subfolders


  • Not indexing subfolders

    I just upgraded to v6.0 from 5.1 and subfolders seem to be downloaded by the indexer but files within those folders aren't being found or indexed (nothing showing as "skipped").

    I have a library.php page in my root directory that links to /library/, which is where I'm starting the indexer. That folder has 127 subfolders and a few PDF files in it. The PDF files are being indexed but nothing in the subfolders.

    If I start indexing at (root)/library.php, then everything in (root) gets indexed but only files in (root), no subfolders. I want to start in the /library/ subfolder and index everything within that folder and all subfolders.

    I embedded search.php in a zoom_search.php page with requisite header and footer includes. In addition to no results (other than the PDF files in /library/), I am getting the following error above <?php virtual("search.php"); ?> in zoom_search.php:

    Warning: Cannot modify header information - headers already sent by (output started at /home/wwwturf/public_html/library/zoom/zoom_search.php:9) in /home/wwwturf/public_html/library/zoom/search.php on line 99

    The index results indicate:
    Files indexed: 16 (the PDF files in the /library/ folder)
    Files skipped: 68
    URLs visited by spider: 146
    .php files found: 2

    Any help would be appreciated.

  • #2
    The header warning appears because you missed the step of checking the "Disable charset enforcing..." option, as instructed in this FAQ:
    http://www.wrensoft.com/zoom/support/faq_ssi.html

    As for the spidering questions, this FAQ should be of help:
    Q. I am indexing with spider mode but it is not finding all the pages on my web site

    Without seeing your website, it doesn't tell us much that "library.php ... links to /library/". This is because "/library/" may or may not contain a page which links to files, or it may have an automated directory listing generated by the server. Note that not all servers will list files within a directory when it is linked in such a manner.

    You need to check (or show us) whether the page at "/library/" contains links to the subfolders you are expecting the spider to follow.

    Spider mode requires HTML links for it to find subfolders, and files, etc. as explained in the above FAQ. There is no way that it can "know" all the subfolders within a folder unless your server is actually returning a directory listing at the URL given. For this behaviour, you would need to use Offline Mode instead (which would not index PHP and other dynamically generated pages).
    --Ray
    Wrensoft Web Software
    Sydney, Australia
    Zoom Search Engine



    • #3
      clarification

      Ray - I changed things around a bit and put an index.php file within the main folder that I want to index. Index.php contains links (to subfolders that actually contain the files to be indexed) within an html form like below:

      <form method="POST" action="index.php">
      <TABLE>
      <tr><td colspan="2">&nbsp;<b>2009</b></td></tr>
      <tr><TD></TD>
      <td><SELECT NAME="to">
      <OPTION value="0409/">April/May 2009
      <OPTION value="0209/">February/March 2009
      <OPTION value="0109/">January 2009

      </SELECT></TD><TD><input type="submit" name="form_submit" value="Go!"></TD></TR></TABLE></FORM>

      The form works navigationally, but when I start the indexing function in Zoom it tells me to check if the URL exists and satisfies... I have it pointed to /library/, in which the index.php file resides.

      If I remove the slash after /library the indexer goes to my root folder.

      ??

      Thanks.



      • #4
        Well, that description makes it much clearer. Forms are spider-unfriendly. A spider works by following HTML links on a page. It cannot automatically submit every form it comes across and try every combination of options to find every possible input permutation. In most cases that is simply impractical: it may be a search form, for example, and you can't guess every word that could go into the text box. Or worse yet, it might be a login/password form!

        The rest of the FAQ I previously linked to applies here (we'll probably update it to say that form-based navigation has the same problem as JS-based navigation).

        You'll need to have HTML anchor links throughout your site if you want its pages to be indexed through Spider Mode from a single start point. If this isn't possible, consider specifying additional start points by clicking on the "More" button next to the Start spider URL (see the Users Guide for more information).
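
To illustrate the point about forms versus anchor links, here is a minimal sketch of how a link-following spider might collect URLs from a page. It is not Zoom's actual implementation: the `AnchorCollector` class, the `find_links` helper and the `http://example.com/library/` base URL are all hypothetical, and the anchor version of the navigation is only one possible rewrite of the form from post #3.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AnchorCollector(HTMLParser):
    """Collects the targets of <a href="..."> tags, as a simple spider would."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Only anchors count as links. An <option value="..."> is form data
        # that is sent on submit, so a link-following spider never sees it.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, href))


def find_links(html, base_url):
    """Return the absolute URLs of all anchor links found in html."""
    parser = AnchorCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical base URL; the real site address is not given in this thread.
BASE = "http://example.com/library/"

# Form-based navigation, trimmed from the markup posted in #3.
FORM_NAV = """
<form method="POST" action="index.php">
<select name="to">
<option value="0409/">April/May 2009
<option value="0209/">February/March 2009
<option value="0109/">January 2009
</select>
<input type="submit" name="form_submit" value="Go!">
</form>
"""

# The same destinations written as plain anchor links (assumed rewrite).
ANCHOR_NAV = """
<ul>
<li><a href="0409/">April/May 2009</a></li>
<li><a href="0209/">February/March 2009</a></li>
<li><a href="0109/">January 2009</a></li>
</ul>
"""

form_links = find_links(FORM_NAV, BASE)
anchor_links = find_links(ANCHOR_NAV, BASE)
print(form_links)    # no links are found in the form
print(anchor_links)  # three subfolder URLs under /library/
```

The form version yields nothing to follow, while the anchor version yields one URL per subfolder. The relative links resolve under `/library/` only because the base URL ends with a slash; under standard URL resolution rules, a base of `http://example.com/library` would resolve `0409/` to `http://example.com/0409/`, which may or may not be related to the trailing-slash behaviour described in #3.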
        --Ray
        Wrensoft Web Software
        Sydney, Australia
        Zoom Search Engine



        • #5
          OK, thanks Ray. I'll have to rework things a bit to put hard links in the index page.
