duplicate results -- any way to weed them out?

  • duplicate results -- any way to weed them out?

    Hi,

    Apologies if this has been asked before, but I couldn't find it discussed... We ran a test search and found that the same page shows up repeatedly in the search results, something like this:

    About Our Company
    Blah blah blah blah...
    URL: http://www.oursite.com/about/index.html

    About Our Company
    Blah blah blah blah...
    URL: http://www.oursite.com/about/

    About Our Company
    Blah blah blah blah...
    URL: http://www.oursite.com/About/

    About Our Company
    Blah blah blah blah...
    URL: http://www.oursite.com/About/index.html

    Our tech guru says that I need to go through the entire site and make sure every link is exactly the same -- that the duplicates are happening because sometimes we link directly to the index page and sometimes just point to the folder, and because the folder name is sometimes capitalized and sometimes all lower case.

    Before I do that (and it's not going to be easy to maintain this level of consistency in the future!), I'm wondering if there's a way to change the settings in this search engine to make it case-insensitive, and also to interpret a link to a folder and a link to the index page in that folder as the same thing. Am I making sense?

  • #2
    Your tech guru is correct. File systems on Linux and Unix machines are case sensitive, which means URL paths served from them are also case sensitive.

    On most web hosts, /About/ and /about/ are different folders.

    "it's not going to be easy to maintain this level of consistency in the future"
    Yes, if you are coding pages by hand, you need to be careful. If you are using a tool like Dreamweaver, then your links should match the file name every time without much effort.

    Or you can do what a lot of web designers do: make a rule that every file and every link must have lower case names.
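    If checking every link by hand sounds painful, a small script can at least point out where the spellings differ. The following is only a rough sketch in Python and has nothing to do with Zoom itself: it assumes you already have a list of the link URLs used on the site, and the names `INDEX_NAMES`, `canonical_key` and `find_inconsistent_links` are made up for illustration. It groups URLs that differ only by case or by a trailing index page and reports the groups with more than one spelling.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Assumed default document names; adjust to whatever your server is configured to use.
INDEX_NAMES = {"index.html", "index.htm"}


def canonical_key(url):
    """Build a comparison key: lower-cased host and path, with a trailing index file dropped.

    The key is only for grouping. It is not a safe URL to link to, because on a
    case-sensitive host /About/ and /about/ may be different folders.
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    head, _, tail = path.rpartition("/")
    if tail.lower() in INDEX_NAMES:
        path = head + "/"
    return parts.netloc.lower() + path.lower()


def find_inconsistent_links(urls):
    """Return {key: sorted spellings} for every page that is linked in more than one way."""
    groups = defaultdict(set)
    for url in urls:
        groups[canonical_key(url)].add(url)
    return {key: sorted(variants) for key, variants in groups.items() if len(variants) > 1}


if __name__ == "__main__":
    links = [
        "http://www.oursite.com/about/index.html",
        "http://www.oursite.com/about/",
        "http://www.oursite.com/About/",
        "http://www.oursite.com/About/index.html",
    ]
    for key, variants in find_inconsistent_links(links).items():
        print(key)
        for variant in variants:
            print("   ", variant)
```

    Each group it prints is one page that is being linked to in several ways; pick one spelling per group and fix the others.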

    What you can do in Zoom is turn on CRC-32 checksum duplicate page checking (from the config window / scan options tab). This should solve some or all of your problem but will lead to slower indexing than fixing the root cause of the problem.
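    For anyone curious what checksum-based duplicate checking means in general, here is a minimal Python sketch of the idea. It is not Zoom's actual implementation (which is not described in this thread); the function name `find_duplicate_pages` and the sample data are hypothetical. The assumption is simply that a page whose content has the same CRC-32 value as an earlier page is treated as a duplicate and skipped.

```python
import zlib


def find_duplicate_pages(pages):
    """Split pages into kept and skipped URLs based on a CRC-32 of their content.

    pages: dict mapping URL -> page content as bytes.
    Returns (kept, skipped), where skipped holds (duplicate_url, first_url) pairs.
    """
    seen = {}  # checksum -> first URL seen with that content
    kept, skipped = [], []
    for url, content in pages.items():
        checksum = zlib.crc32(content)
        if checksum in seen:
            # Same checksum as an earlier page: treat it as a duplicate.
            skipped.append((url, seen[checksum]))
        else:
            seen[checksum] = url
            kept.append(url)
    return kept, skipped


if __name__ == "__main__":
    about = b"<html><title>About Our Company</title>Blah blah blah blah...</html>"
    pages = {
        "http://www.oursite.com/about/index.html": about,
        "http://www.oursite.com/about/": about,
        "http://www.oursite.com/About/": about,
        "http://www.oursite.com/About/index.html": about,
    }
    kept, skipped = find_duplicate_pages(pages)
    print("kept:", kept)
    print("skipped:", [url for url, _ in skipped])
```

    Note that in a scheme like this every page still has to be retrieved and checksummed before it can be recognised as a duplicate, and two different pages can in rare cases share a CRC-32 value.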

    -------
    David
