Please introduce canonical link support!

  • Please introduce canonical link support!

    I know there's CRC, but this really doesn't work for much unless you exclude every piece of dynamic content, which in some cases is counterproductive.

    Example: /shoes/casual/sandals.html and /shoes/summer/sandals.html are the same page. But CRC cannot exclude one or the other, because all the images and product links on those pages are dynamically generated, linking to /shoes/casual/sandals/cork-soled-sandles.html and /shoes/summer/sandals/cork-soled-sandles.html respectively.

    This means that unless I exclude those vital H1 product link tags with ZOOMSTOP/ZOOMRESTART, these pages will always be duplicated in the search results, which I don't want. I also don't want to exclude them, because they make up a large share of the page content on an ecommerce store and help give Zoom's spider relevant content.

    /shoes/casual/sandals.html and /shoes/summer/sandals.html both have canonical tags pointing to /sandals.html

    /shoes/casual/sandals/cork-soled-sandles.html and /shoes/summer/sandals/cork-soled-sandles.html both have canonical tags pointing to /cork-soled-sandles.html

    So with a basic 'respect canonical links' function, it would only index /sandals.html and /cork-soled-sandles.html

    Instead, my search result will currently be
    /cork-soled-sandles.html
    /shoes/casual/sandals/cork-soled-sandles.html
    /shoes/summer/sandals/cork-soled-sandles.html
    /sandals.html
    /shoes/casual/sandals.html
    /shoes/summer/sandals.html

    - six results when there should only be two.

    I understand there *is* a way for me to stop this, by 'zoomstop'-ing pretty much everything, but this would ruin the natural weighting of the pages, as my weighting is very content-based.

    I strongly urge you to consider this request seriously. It should not be a difficult function to implement: all it needs to do is replicate the CRC function on a much more basic level. Look for a canonical tag, check whether it matches the current URL, and deal with it accordingly (skip the page, or index the canonical page if it's not already in the index).
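
    To make the idea concrete, here is a rough sketch of the 'respect canonical links' logic described above. It is only an illustration in Python, not how Zoom works internally; the names `CanonicalFinder`, `canonical_target` and `urls_to_index` are made up for this example, and it assumes the spider already has each page's URL and HTML to hand.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing tags such as <link ... />.
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        if "canonical" in rel and attr.get("href"):
            self.canonical = attr["href"]


def canonical_target(page_url, html):
    """Return the absolute canonical URL, or None if the page has no
    canonical tag or the tag points back at the page itself."""
    finder = CanonicalFinder()
    finder.feed(html)
    if finder.canonical is None:
        return None
    target = urljoin(page_url, finder.canonical)
    return None if target == page_url else target


def urls_to_index(pages):
    """pages: dict mapping URL -> HTML (hypothetical crawl result).
    Each page is replaced by its canonical URL when it declares one."""
    keep = set()
    for url, html in pages.items():
        keep.add(canonical_target(url, html) or url)
    return sorted(keep)
```

    Fed the six URLs from my example, each carrying the canonical tag described, `urls_to_index` would come back with just the two canonical URLs.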

    I really hope you can do something with this idea.

    Many thanks,
    Jack

  • #2
    Hi Jack,

    Which version and build of Zoom are you using? For some time now, Zoom has calculated the CRC based only on the content of the page, excluding HTML tags and links. This means that if the only difference between two pages is their links, it would not impair the CRC's ability to determine that the pages are the same.

    However, chances are, there's more than that difference. It's common now for pages to have shopping carts, calendars, login information, etc. so it could be different in other ways.
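
    As a rough illustration of the content-only CRC idea above, the sketch below checksums just the text of a page, so two pages that differ only in tags or link targets get the same value. This is not Zoom's actual implementation: `TextOnly` and `content_crc` are hypothetical names, `zlib.crc32` stands in for whatever checksum Zoom really uses, and the handling of ZOOMSTOP/ZOOMRESTART as HTML comments is an assumption made for the example.

```python
import zlib
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collects text nodes only; tags and their attributes (including
    link hrefs) never reach the checksum."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self.skipping = False

    def handle_comment(self, data):
        # Assumption: exclusion markers appear as HTML comments.
        marker = data.strip().upper()
        if marker == "ZOOMSTOP":
            self.skipping = True
        elif marker == "ZOOMRESTART":
            self.skipping = False

    def handle_data(self, data):
        # Note: inline script/style text also arrives here; a fuller
        # version would probably want to drop it.
        if not self.skipping:
            self.parts.append(data)


def content_crc(html):
    """CRC-32 of the page's text with whitespace runs collapsed."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    text = " ".join(" ".join(parser.parts).split())
    return zlib.crc32(text.encode("utf-8"))
```

    With something like this, the /shoes/casual/ and /shoes/summer/ versions of a product link produce the same checksum as long as the link text matches, while any difference in visible text (a cart total, a login name) changes it.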

    We agree that the <link rel="canonical" href="..." /> tag would be beneficial in this regard, and we're planning to add support for this in a future build. However, it won't be the immediate future as we're currently working on higher priority tasks. But we'll try to get it in there as soon as our development schedule allows.
    --Ray
    Wrensoft Web Software
    Sydney, Australia
    Zoom Search Engine



    • #3
      Hi Ray,

      Thanks for that - I am using build 1024 with CRC enabled.

      The only things that change on our pages are the URLs, breadcrumbs, adverts and non-visible HTML / JS. As far as I can see, any visible differences, such as the breadcrumbs, are all omitted by the use of zoomstop / zoomrestart.

      Kind regards,
      Jack
      Last edited by LOL_Jack; Jun-01-2016, 10:28 AM.
