Possible issue with base URL in Spider URL dialog

  • Possible issue with base URL in Spider URL dialog

    Spider mode indexing for ASP.NET platform

    Start options:
    Spider URL: https://myApacheBugzillaServer.com/
    Base URL: https://myApacheBugzillaServer.com/

    Settings: All defaults, including: Index page and follow internal links (Default)
    Indexing Options: All except Dublin Core and Parm tag values.
    Synonyms: bug, bugzilla, etc.

    Search pattern: Bugzilla

    Results: A link to a file that doesn't exist: https://myApacheBugzillaServer.com/index.html
    It should be either:
    https://myApacheBugzillaServer.com/
    or
    https://myApacheBugzillaServer.com/index.cgi

    Note: I have not tried starting with Spider URL: https://myApacheBugzillaServer.com/index.cgi
    Richard

  • #2
    Zoom doesn't do anything that would append "index.html" to a URL. Zoom is just following links it found on your site.

    So if you are getting a URL with "index.html" appended that you believe should not be there, it is very likely coming from one of the following:
    a) One of the pages on your site links to such a URL.
    b) Your site's URL rewrite or redirection rules.
    c) JavaScript within your pages may have led Zoom to believe there is a possible link to "index.html", so it crawls that link. Your server would then likely respond, because most sites are configured to treat "index.html" the same as "index.cgi", etc. You can avoid this by disabling "Parse for links in Javascript" under "Configure"->"Spidering options".
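
To check cause (a) outside of Zoom, one option is to scan a page's HTML for anchors that resolve to the unexpected URL. The following is only a minimal sketch and not part of Zoom: the names `LinkCollector` and `links_to` are hypothetical, it looks only at `<a href>` attributes, and it does not execute JavaScript or follow redirects, so it cannot detect causes (b) or (c).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags, resolved against the page URL."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links the way a crawler would.
                self.links.append(urljoin(self.page_url, value))


def links_to(html_text, page_url, target_url):
    """Return True if any <a href> in html_text resolves to target_url."""
    collector = LinkCollector(page_url)
    collector.feed(html_text)
    return target_url in collector.links
```

Feeding it the saved source of each page on the site, with `target_url` set to the "index.html" address from the search results, should show which page (if any) contains the link.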

    You should be able to look at the spider crawling history to figure out where this URL is coming from. This is easier if you reduce the number of threads to 1, save your index log to disk, and view it in a text editor ("Configure"->"Index log"->"Save index log to file").
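
If the saved log is large, a short script can pull out the lines that mention the URL along with the lines just before them, which with a single thread are likely to show the page being crawled at that point. This is a rough sketch that assumes nothing about the log format beyond it being plain text with one entry per line; the function name `find_url_mentions` and the file name `indexlog.txt` in the usage note are hypothetical.

```python
def find_url_mentions(log_lines, url, context=2):
    """Return (line_number, preceding_lines, line) for each line containing url.

    line_number is 1-based; preceding_lines holds up to `context` earlier lines.
    """
    hits = []
    for i, line in enumerate(log_lines):
        if url in line:
            start = max(0, i - context)
            hits.append((i + 1, log_lines[start:i], line))
    return hits
```

For example, read the file with `open("indexlog.txt", encoding="utf-8", errors="replace").read().splitlines()` and pass the result together with the "index.html" URL.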
    --Ray
    Wrensoft Web Software
    Sydney, Australia
    Zoom Search Engine
