Indexing Filenames Only With Spaces

  • Indexing Filenames Only With Spaces

    I recently upgraded to v6 and am loving it. I have found a lot of ways to set up my local site that will be much more efficient. Below are my configuration and my problem.

    I have a local site with a large number of .xls files (thousands) that change and are moved around frequently. In the past I was able to index these slowly using the .xls plugin, but I could never index them all: because they change so often, I need to reindex nightly.

    With v6 I am able to index the filenames only, which is unbelievably fast, but it has caused a new problem. The filenames have spaces in them, which are being translated into %20 in the index. Needless to say, all of my searches are failing because of this.

    Is there a configuration option or setting that I have missed that will allow it to index the space instead of the %20? I currently have it set up for CGI/Win32 but could change this if necessary.

    Thank you in advance for any help or suggestions!!

  • #2
    You can't have a space in a URL. Spaces are always replaced by %20, which is the encoded representation of a space character. But from the indexing point of view, if you have multiple words in a file name, they should be searchable.

    We'll test this out and post our findings.
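
    To illustrate the %20 point above, here is a minimal Python sketch of URL percent-encoding using the standard library. It only shows how a space maps to %20 and back; it says nothing about how Zoom stores or indexes URLs. The filename is taken from the search results later in this thread.

```python
from urllib.parse import quote, unquote

# Filename as it appears on disk (contains spaces).
filename = "032408.TEAM TORQUE INC.1.xls"

# Percent-encoding replaces each space with %20; dots and
# alphanumerics are left unchanged.
encoded = quote(filename)
print(encoded)           # 032408.TEAM%20TORQUE%20INC.1.xls

# Decoding restores the original filename, spaces included.
print(unquote(encoded))  # 032408.TEAM TORQUE INC.1.xls
```

    The encoded and decoded forms name the same file, so the %20 in a displayed URL does not by itself mean the words were indexed incorrectly.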


    • #3
      Below are the results I get when I search for the word 'torque'. If I search for team torque, I get zero results.

      Please let me know what information, if any, I can provide to help figure this out.

      1. file:///Q:/Softcopy_System/Archive/2...UE%20INC.1.xls
      Terms matched: 1 - 7 Apr 2008 - URL: file:///Q:/Softcopy_System/Archive/2008/April/032408.TEAM%20TORQUE%20INC.4/032408.TEAM%20TORQUE%20INC.1.xls

      2. file:///Q:/Softcopy_System/Archive/2...UE%20INC.2.xls
      Terms matched: 1 - 26 Mar 2008 - URL: file:///Q:/Softcopy_System/Archive/2008/April/032408.TEAM%20TORQUE%20INC.4/032408.TEAM%20TORQUE%20INC.2.xls

      3. file:///Q:/Softcopy_System/Archive/2...UE%20INC.3.xls
      Terms matched: 1 - 26 Mar 2008 - URL: file:///Q:/Softcopy_System/Archive/2008/April/032408.TEAM%20TORQUE%20INC.4/032408.TEAM%20TORQUE%20INC.3.xls


      • #4
        You should be able to get what you want with better configuration settings.

        I presume you have added ".xls" with a file type of "Binary (Filename only)" on the "Scan options" panel.

        Try this:

        (1) Go to "Configure"->"Results Layout" in Zoom. Check the option for "Title of page". Your search result titles will appear in a tidier manner with just the filename (i.e. "032408.TEAM TORQUE INC.1.xls"), rather than the full URL with %20's in place.

        (2) The reason that "team torque" returns no results is not the %20's. It is in fact because you must have "Dots" enabled for joining words, so that "team" is not indexed as a word on its own; "032408.team" is indexed instead. You can change this behaviour by clicking on "Configure"->"Indexing options" and clearing the checkbox for "Dots".

        You will need to re-index for these changes to take effect.
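
        The effect of the "Dots" word-joining option described in (2) can be sketched as follows. This is a hypothetical tokenizer written for illustration, not Zoom's actual indexing code; the function name `tokenize` and its `join_chars` parameter are assumptions.

```python
import re

def tokenize(text, join_chars=""):
    """Split text into lowercase words.

    Characters listed in join_chars are treated as part of a word
    instead of as word separators (hypothetical model of the option).
    """
    pattern = "[A-Za-z0-9" + re.escape(join_chars) + "]+"
    return [word.lower() for word in re.findall(pattern, text)]

name = "032408.TEAM TORQUE INC.1.xls"

# Dots join words: "team" only exists inside "032408.team".
with_dots = tokenize(name, join_chars=".")
print(with_dots)     # ['032408.team', 'torque', 'inc.1.xls']

# Dots split words: "team" and "torque" are both indexed on their own.
without_dots = tokenize(name)
print(without_dots)  # ['032408', 'team', 'torque', 'inc', '1', 'xls']
```

        Under this model, a search for "torque" matches in both cases, while "team torque" only matches once dots no longer join words, which is consistent with the results reported in #3.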
        --Ray
        Wrensoft Web Software
        Sydney, Australia
        Zoom Search Engine


        • #5
          That was exactly it!!

          Thank you so much for your help. With version 6, something that used to run for 12+ hours now takes 5 minutes!!
