"No files found to spider" problem


  • "No files found to spider" problem

    Hello folks,

    I'm trying to spider a group of *.pdf files residing at www.hortinfo.co.nz/climate/ - Zoom 6 tells me that there are no files to be found at that URL - but they are there!

    I've installed the PDF plugin. I have successfully spidered those files on my development (local) machine, which contains an exact duplicate of the production machine. I don't believe I'm having a firewall problem, as I can spider files at the root level of the production site.

    What am I doing wrong?

    Thanks/Bruce

  • #2
    I went to that site but just got this error, which probably explains your problem.

    Forbidden

    You don't have permission to access /climate/ on this server.



    • #3
      Listing folder contents

      Thanks for the response.

      Do I take it now that folders within a site must be set up to display contents in order for Zoom Search to do its job?

      The response you report is - as far as I am aware - simply a security option that stops "the public" getting a list of the contents of a web folder - a fairly typical security set up.

      Cheers/Bruce



      • #4
        If the files cannot be accessed on the web site, then they cannot be indexed.

        If you are using spider mode, then files need to be linked to in order for them to be found.

        See also these FAQs:
        Q. Why are some of my pages being skipped by the indexer?

        Q. Why are links in my Javascript menus being skipped?

        Q. I am indexing with spider mode but it is not finding all the pages on my web site



        • #5
          Originally posted by thomasBaine:
          Do I take it now that folders within a site must be set up to display contents in order for Zoom Search to do its job?
          No, if you use Offline Mode, you can index the files in a local folder and you would not need a directory listing from your web server to be made available.

          Originally posted by thomasBaine:
          The response you report is - as far as I am aware - simply a security option that stops "the public" getting a list of the contents of a web folder - a fairly typical security set up.
          No, a fairly typical setup would have these documents well-linked from various parts of the web site which is accessible by the user, authenticated or otherwise.

          If you are using Spider Mode, you are indexing files served by your web server. A spider can only access whatever your web server makes available. If there is a page which links to the files within these folders, then you can give the spider this page as a start point for it to follow the links.

          Otherwise, assuming you don't have any dynamically generated pages (e.g. PHP or ASP pages), you can use Offline Mode which is not dependent on your web server.

          Please see section 2.1 in the Users Guide to better understand the difference between Spider Mode and Offline Mode indexing.
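
To illustrate why a spider only finds files that are linked from a page it can reach, here is a minimal Python sketch of the link-following step. It is not Zoom's implementation; the start page `index.html` and the PDF file names are hypothetical, and only the Python standard library is used.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page they were found on.
                self.links.append(urljoin(self.base_url, value))


def find_pdf_links(html, base_url):
    """Return the absolute URLs of all links on the page that end in .pdf."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return [u for u in collector.links
            if urlparse(u).path.lower().endswith(".pdf")]


# Hypothetical start page; a real spider would download this over HTTP.
START_PAGE = """
<html><body>
<a href="rainfall.pdf">Rainfall</a>
<a href="/climate/frost.PDF">Frost</a>
<a href="about.html">About</a>
</body></html>
"""

if __name__ == "__main__":
    for url in find_pdf_links(START_PAGE,
                              "http://www.hortinfo.co.nz/climate/index.html"):
        print(url)
```

A PDF that no page links to never appears in this list, which is why an unlinked folder with directory browsing disabled yields nothing for a spider to index.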
          --Ray
          Wrensoft Web Software
          Sydney, Australia
          Zoom Search Engine



          • #6
            Thanks for the illumination

            Thanks both for your responses.

            My lack of clarity arose from the fact that I could index the collection of unlinked pdf files in the folder on my development machine using Spider mode - and the folder permissions on that site were exactly the same as those on the production site. I got my "workaround" by transferring Zoom's index files from the development machine to the production machine and everything works as I had expected.

            Cheers/Bruce



            • #7
              If you were using the same or a similar URL to the folder, then it is most likely that your development server had "Directory browsing" enabled for that folder, and your production (live) server didn't.

              If the URL to the development server is different to the production server, you may want to make sure the links in your search results point to your production server and not to the development server. If this is the case, you should use the "Rewrite links" feature (found under "Configure"->"Indexing options"). Click "Help" on that panel for examples and more information on exactly this usage scenario.
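
As a rough illustration of what rewriting links means when an index built on a development machine is moved to production, here is a small Python sketch. It only shows the general idea of swapping a URL prefix; it is not how Zoom's "Rewrite links" feature is implemented, and the development URL `http://localhost/climate/` is an assumption made for the example.

```python
def rewrite_link(url, old_prefix, new_prefix):
    """Replace old_prefix at the start of url with new_prefix.

    URLs that do not start with old_prefix are returned unchanged.
    """
    if url.startswith(old_prefix):
        return new_prefix + url[len(old_prefix):]
    return url


# Hypothetical development prefix and the production prefix from this thread.
DEV_PREFIX = "http://localhost/climate/"
PROD_PREFIX = "http://www.hortinfo.co.nz/climate/"

if __name__ == "__main__":
    print(rewrite_link("http://localhost/climate/rainfall.pdf",
                       DEV_PREFIX, PROD_PREFIX))
```

Only the leading prefix is replaced, so a link that already points at the production server passes through untouched.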
              --Ray
              Wrensoft Web Software
              Sydney, Australia
              Zoom Search Engine
