Protocol agnostic external links

  • Protocol agnostic external links

    We have not long upgraded to v7.1, and I have noticed in our logs that Zoom isn't handling protocol-agnostic links correctly. It is possible that this was also true in v6 (which we have used for years), but we never noticed.

    The problem occurs when Zoom comes across a link like <a href="//www.example.com/index.htm">external link</a>. Instead of treating this as an external link, it treats it as an internal one and ignores the // at the start, so Zoom tries to GET /siteroot/directory/www.example.com/index.htm, which of course gives a 404 error.
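For comparison, here is a minimal Python sketch (not Zoom's code) of how a protocol-relative href is expected to resolve, using the standard library's urllib.parse.urljoin, which follows RFC 3986 reference resolution. The page URL on mysite.example and both function names are hypothetical; resolve_buggy only imitates the symptom described above by stripping the leading slashes before resolving.

```python
from urllib.parse import urljoin


def resolve_correctly(page_url: str, href: str) -> str:
    # A reference starting with "//" is a network-path (protocol-relative)
    # reference: it keeps the page's scheme but replaces host and path.
    return urljoin(page_url, href)


def resolve_buggy(page_url: str, href: str) -> str:
    # Hypothetical reproduction of the reported symptom: the leading "//" is
    # dropped, so the host name is treated as a path relative to the page.
    return urljoin(page_url, href.lstrip("/"))


page = "http://mysite.example/siteroot/directory/page.htm"  # hypothetical site
link = "//www.example.com/index.htm"

print(resolve_correctly(page, link))  # http://www.example.com/index.htm
print(resolve_buggy(page, link))      # http://mysite.example/siteroot/directory/www.example.com/index.htm
```

On an https page the same link would resolve to https://www.example.com/index.htm, which is the whole point of writing links this way.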

    Is there some setting which deals with this issue? Or is it a bug?

  • #2
    We have never noticed the issue either.

    There are some good arguments for not using protocol-relative URLs, but that doesn't mean Zoom should ignore them.

    We'll have a deeper look at it next week.



    • #3
      We've investigated this and confirmed that it's a problem introduced in V7.1 build 1020 to 1021.

      The problem does not affect which links are considered part of the crawl (Zoom still resolves the protocol-relative URLs correctly for the links themselves), but it does make additional HTTP requests for these URLs. So the only downsides are extra indexing time and some extra 404 requests on your server.

      It is related to "improvements" made to the "Parse for links in Javascript" feature, which finds URLs that are not in the form of hypertext links.
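As a rough illustration only (this is not how Zoom implements the feature), a scanner that looks for URL-like strings anywhere in the page, rather than only in hypertext links, might look like the Python sketch below. The regular expression, the function name scan_for_urls and the sample page are all hypothetical. Because it ignores context, it also picks up targets such as a form's action attribute; resolving each candidate with urljoin keeps a "//host/..." string pointing at the external host.

```python
import re
from urllib.parse import urljoin

# Hypothetical pattern: any quoted string ending in a common page extension,
# whether it sits in <a href>, <form action> or a JavaScript string literal.
CANDIDATE = re.compile(r"""["']([^"'\s]+\.(?:htm|html|php|asp|aspx))["']""", re.IGNORECASE)


def scan_for_urls(page_url: str, html: str) -> list[str]:
    found: list[str] = []
    for match in CANDIDATE.finditer(html):
        # urljoin treats "//host/path" as protocol-relative, not as a local path.
        url = urljoin(page_url, match.group(1))
        if url not in found:
            found.append(url)
    return found


page = "http://mysite.example/siteroot/directory/page.htm"  # hypothetical site
html = '''
<script>var next = "//www.example.com/index.htm";</script>
<form action="/siteroot/submit.php" method="post"></form>
'''

print(scan_for_urls(page, html))
# ['http://www.example.com/index.htm', 'http://mysite.example/siteroot/submit.php']
```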

      So a workaround until the next build would be to disable this feature (under "Configure"->"Spider options"). You should find that it will stop looking for these URLs once disabled.

      --Ray
      Wrensoft Web Software
      Sydney, Australia
      Zoom Search Engine



      • #4
        Thanks for the prompt response. Now that we know what is happening, we can work around it for now.



        • #5
          We just reran the indexing with the "Parse for links in Javascript" feature turned off, and it also solved another issue which I thought was unrelated.

          The indexing process was submitting forms on the website. The only link to the submission page is the action parameter on the form, so we were surprised to see forms being submitted. I was going to raise it as a separate question, but it appears that it may be caused by the same feature as the protocol-agnostic links issue.
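If the form action was indeed being picked up by a text-scanning feature, an extractor that only follows <a href> links would not see it. A minimal Python sketch using the standard library's html.parser module illustrates the difference; the class name AnchorLinkExtractor and the sample markup are hypothetical, and this is not a description of Zoom's own spider.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AnchorLinkExtractor(HTMLParser):
    """Collects only <a href="..."> targets; <form action="..."> is ignored."""

    def __init__(self, page_url: str):
        super().__init__()
        self.page_url = page_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Only hypertext links are followed; form actions and script strings
        # never reach this list.
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            self.links.append(urljoin(self.page_url, href))


page = "http://mysite.example/siteroot/directory/page.htm"  # hypothetical site
html = '''
<a href="//www.example.com/index.htm">external link</a>
<form action="/siteroot/submit.php" method="post"><input type="submit"></form>
'''

parser = AnchorLinkExtractor(page)
parser.feed(html)
print(parser.links)  # ['http://www.example.com/index.htm']
```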
