
Search terms within "x" words of one another


  • Search terms within "x" words of one another

    Is there a way to index and require the search terms to be within "x" number of words of each other?

    Example: the search terms are dog, cat, and flea, and the search is set to match all words.
    I am looking for "dog or cat in relation to fleas".

    The results pages will display and highlight all standalone instances of the three search terms.
    Many of the results will be totally irrelevant, i.e. the search term dog could be highlighted in a sentence related to burying bones or a dog act in a circus, or it "might" be related to fleas, which is what I was looking for.

    Is there a way to have the search report only the search terms found within "X" number of words of each other?
    Example: the sentence "... small dogs have less chance of fleas if they live indoors. The new neighbors in the house next door have just adopted a new cat and ..." etc.

    Here, if I could set the search terms dog, cat, and flea to be within 10 words of each other, I would get the search terms dog and fleas highlighted while not highlighting the search term cat, as it has more than 10 words between it and another search term. Best of all, I will have found the result narrowed to my search criteria.

    Thanks,
    Anne

  • #2
    The ranking system includes a weighting for proximity, and for exact phrase searches, the words obviously need to appear next to each other.

    But there is no way to specify X words must occur between the search terms.

    We did some research years back about how people do searches. Nobody uses this type of advanced search syntax. The vast majority of people can't even be bothered to do a two word search. So it doesn't make sense to expend development effort on a feature no one uses.
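To make the request concrete, a proximity filter of the kind Anne describes could be sketched outside of Zoom roughly as follows. This is an illustration only and not a Zoom feature. The function name `proximity_hits`, the naive prefix matching (so that "dogs" counts as a hit for "dog"), and the reading of "within X words" as a difference in word positions are all assumptions made for the example.

```python
import re

def proximity_hits(text, terms, max_gap=10):
    """Return (position, word) pairs for term hits that lie within
    max_gap word positions of a hit for a *different* search term."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    # Naive prefix match (assumption): "dogs" counts as a hit for "dog".
    hits = [(i, w, t) for i, w in enumerate(words)
            for t in terms if w.startswith(t)]
    keep = []
    for i, w, t in hits:
        # Keep the hit only if some other term occurs close enough.
        if any(t2 != t and abs(i - j) <= max_gap for j, _, t2 in hits):
            keep.append((i, w))
    return keep

sentence = ("small dogs have less chance of fleas if they live indoors. "
            "The new neighbors in the house next door have just adopted "
            "a new cat and")
result = proximity_hits(sentence, ["dog", "cat", "flea"], max_gap=10)
print(result)
```

With the sample sentence from the first post, "dogs" and "fleas" are kept because they are 5 word positions apart, while "cat" is dropped because the nearest other term is 18 positions away.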



    • #3
      Originally posted by wrensoft
      The ranking system includes a weighting for proximity, and for exact phrase searches, the words obviously need to appear next to each other.

      But there is no way to specify X words must occur between the search terms.

      I understand about development time, thanks. Do you happen to know of a search engine that will do what I want?
      I really like your search engine for use in some of my other applications, so I will keep the program.
      HOWEVER, I do need a search engine that will search as I explained, because of the application.

      Exact phrase searching is too restrictive, and multiple search words result in massive irrelevant data because of the format of the original data (constant references to other search terms in the text, but not available on the indexed page)

      Example: say I am looking for a 'rockola barrel band'.
      If I use these search terms I will get every instance of 'rockola barrel band', every instance of 'rockola', every instance of 'rockola barrel', every instance of 'barrel band', every instance of 'band', and every instance of each of the three words as standalone search terms,
      as well as every instance of 'barrel', 'barrel band', and 'band' as it relates to the 15 manufacturers of barrel bands other than rockola. That amounts to a massive amount of irrelevant data being displayed.

      I tried your weighting system and found no difference in the results... maybe I am doing it wrong.. I will try again.

      BTW, have you found why I can not spider my website and have to do all this locally and then upload? I have had no response from my last email that had the cfg file and other data attached...

      Anne



      • #4
        Originally posted by Annie Sixgun
        If I use these search terms I will get every instance of 'rockola barrel band', every instance of 'rockola',
        No, this is not the case. The default behavior is to do a boolean AND on all the search words.
        So you only get pages that contain all 3 terms, ....by default.
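The difference between matching all words (boolean AND) and matching any word (OR) can be sketched with a toy model. This is not Zoom's implementation; the page names, page text, and the `pages_matching` helper are made up for illustration.

```python
def pages_matching(pages, terms, mode="and"):
    """Return the names of pages containing all terms (mode="and")
    or at least one term (mode="or"). Whole-word, case-insensitive."""
    wanted = [t.lower() for t in terms]
    combine = all if mode == "and" else any
    matches = []
    for name, text in pages.items():
        words = set(text.lower().split())
        if combine(t in words for t in wanted):
            matches.append(name)
    return matches

# Hypothetical pages, loosely based on the example in the thread.
pages = {
    "a.html": "Rockola barrel band for sale",
    "b.html": "Seeburg barrel band",
    "c.html": "marching band",
}
query = ["rockola", "barrel", "band"]
and_result = pages_matching(pages, query, mode="and")
or_result = pages_matching(pages, query, mode="or")
print(and_result, or_result)
```

Under AND, only the page containing all three words is returned. The flood of results described in the previous post is what an OR search would produce.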

        Originally posted by Annie Sixgun
        ...have you found why I can not spider my website
        I am not aware of the issue, but maybe someone else is looking at it. I can check on Monday.



        • #5
          Originally posted by Annie Sixgun
          BTW, have you found why I can not spider my website and have to do all this locally and then upload? I have had no response from my last email that had the cfg file and other data attached...
          I replied to your email on the 17th of June. It seems like our emails to you keep getting lost.

          Please check your spam box for our missing emails.

          This is the 3rd time this has happened and the 2nd time I've addressed this.

          I mentioned this in your previous forum thread here too:
          http://www.wrensoft.com/forum/showth...-sub-directory

          I also gave you an explanation within that forum thread that was never addressed. Namely, you needed to add the ".html" extension.

          I will quote my most recent email to you below:

          Subject: Re: INDEXING problems
          Date: Fri, 17 Jun 2016 18:30:04 +1000


          Hi Anne,

          If I understand you correctly, the log file and .zcfg file you are sending me are from your previous attempts using Spider Mode. But you now say you have switched over to indexing the data locally (presumably you mean Offline Mode)? Please elaborate when you say it "doesn't function as needed".

          As for your spider indexing problems:

          The log file confirms the problems that I have described in my previous emails.

          1) You only have the ".pdf" file extension in your scan list. This won't allow the Spider to crawl .html pages to find the necessary links to the other .pdf files. You need to add the ".html" file extension.

          2) You have entered this URL as your spider URL:
          http://yoursite.com/cnl/pdfa/

          The error in the log says:
          16:31:30 - [WARNING] Could not download file: http://yoursite.com/cnl/pdfa/ (Forbidden)

          This means authentication must've been enabled and it was requiring a password.

          I go to this address on your web server now, and I get a 404 file not found error.

          I also get 404 at http://yoursite.com/

          Again, you are changing your website rapidly, so it is hard to debug your problem for you.

          When Zoom reported Forbidden, it is likely your website (or that folder) had authentication enabled. Note that if you have already entered your credentials, then you may not be seeing it in your browser since your browser has authenticated.
          --Ray
          Wrensoft Web Software
          Sydney, Australia
          Zoom Search Engine
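Point 1 in the email above (a scan list with only ".pdf" stops the spider from discovering PDFs that are linked from .html pages) can be illustrated with a small simulated crawl. This is a sketch under stated assumptions and not how Zoom is implemented: the `crawl` function, the site layout, and the rule that the start page is always opened are hypothetical.

```python
from collections import deque
from os.path import splitext

def crawl(site, start, scan_exts):
    """Breadth-first walk over a simulated site.

    site maps a URL to the links found on that page. A link is only
    followed if its file extension is in scan_exts. The start URL is
    always opened (assumption)."""
    seen = {start}
    queue = deque([start])
    visited = []
    while queue:
        url = queue.popleft()
        visited.append(url)
        for link in site.get(url, []):
            ext = splitext(link)[1].lower()
            if ext in scan_exts and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited

# Hypothetical site: two PDFs are only reachable through list.html.
site = {
    "index.html": ["a.pdf", "list.html"],
    "list.html": ["b.pdf", "c.pdf"],
}
pdf_only = crawl(site, "index.html", {".pdf"})
pdf_and_html = crawl(site, "index.html", {".pdf", ".html"})
print(pdf_only)
print(pdf_and_html)
```

With only ".pdf" in the scan list, `list.html` is never opened, so `b.pdf` and `c.pdf` are never found. Adding ".html" lets the crawl reach them.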



          • #6
            I have sent you multiple emails since the email of the 17th you quoted was sent. I have also sent you logs, config files, etc. since the email of the 17th that you keep referring to as not being addressed. It HAS been addressed, and I sent an email with the results. Other emails with information you have asked for have been sent. SO, if you have sent a reply to any of the 'other' emails, then yes, I guess they are getting lost... However, you keep making reference to this email of the 17th, so I guess either you did not get my emails or they are getting lost at your end!

            SO, right now I index on the local machine and manually upload, and BTW, I get no errors of any kind locally, however I get loads of errors about content being password protected etc.... They are the same files, just on the server.

            I also told you that by adding an html path to the search engine I also got the html files in the search results. I also asked why I can not as described, enter the location of the directory containing the pdf files, instead of starting from the base domain. And other questions.......on why things do not work as described. I have progressed so much further than your email of the 17th, and the emails I sent describe a whole new set of problems other than the 'html' problem which was addressed, and caused another problem, which was sent.


            I know that at 78 I do forget things once in a while but I have answered ALL emails I have received, and sent many questions to you, that have gone unanswered.... I do not appreciate the inference that I 'lost' your emails or have not acted on them.






            Originally posted by Ray
            I replied to your email on the 17th of June. It seems like our emails to you keep getting lost.

            Please check your spam box for our missing emails.

            This is the 3rd time this has happened and the 2nd time I've addressed this.

            I mentioned this in your previous forum thread here too:
            http://www.wrensoft.com/forum/showth...-sub-directory

            I also gave you an explanation within that forum thread that was never addressed. Namely, you needed to add the ".html" extension.

            I will quote my most recent email to you below:



            • #7
              Originally posted by Annie Sixgun
              I have sent you multiple emails since the email of the 17th you quoted was sent.
              We have not received any emails from you since the 17th of June. So something is wrong there. We haven't had any reported email issues on our end. But let's focus on the problem at hand.

              It would be easiest to continue the dialogue here in the forum. Since I haven't received anything from you after 17th of June, you may need to reiterate anything that was said in those lost emails.

              Originally posted by Annie Sixgun
              I also told you that by adding an html path to the search engine I also got the html files in the search results.
              That's good news.

              Originally posted by Annie Sixgun
              I have progressed so much further than your email of the 17th, and the emails I sent describe a whole new set of problems other than the 'html' problem which was addressed, and caused another problem, which was sent.
              Feel free to copy and paste your previous outstanding questions here, and we'll address them.

              Originally posted by Annie Sixgun
              SO, right now I index on the local machine and manually upload, and BTW, I get no errors of any kind locally, however I get loads of errors about content being password protected etc.... They are the same files, just on the server.
              If these are PDF files and the error is seen in the Index Log, then it means the PDF file has been created with a special setting that does not allow the content to be extracted unless you specify a password.

              To address this, you can either:

              (a) Specify the password necessary for these PDF files, under "Configure"->"Scan options"->Select the '.pdf' extension and click "Configure" then check "Use following password to decrypt and index protected PDF files" and enter your password.

              (b) Re-save those PDF files in Adobe Acrobat with the protection feature disabled.
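Before choosing between (a) and (b), it may help to find out which PDF files are protected. The check below is a rough heuristic using only the Python standard library, and it is not part of Zoom. It relies on the fact that an encrypted PDF carries an `/Encrypt` entry in its trailer dictionary; the function names are hypothetical, and the byte search can give a false positive if that byte sequence happens to appear elsewhere in the file.

```python
def looks_encrypted(pdf_bytes: bytes) -> bool:
    """Heuristic: an encrypted PDF has an /Encrypt entry in its trailer.
    This is a plain byte search, not a real PDF parse."""
    return b"/Encrypt" in pdf_bytes

def protected_files(paths):
    """Return the paths whose contents look like protected PDFs."""
    flagged = []
    for path in paths:
        with open(path, "rb") as handle:
            if looks_encrypted(handle.read()):
                flagged.append(path)
    return flagged

# Minimal synthetic trailers for illustration (not complete PDF files).
plain = b"%PDF-1.4\ntrailer\n<< /Size 5 /Root 1 0 R >>\n%%EOF"
locked = b"%PDF-1.4\ntrailer\n<< /Size 5 /Root 1 0 R /Encrypt 4 0 R >>\n%%EOF"
print(looks_encrypted(plain), looks_encrypted(locked))
```

Running `protected_files` over the local copies of the PDFs would give a list of files to re-save or to index with a password.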

              Originally posted by Annie Sixgun
              I know that at 78 I do forget things once in a while but I have answered ALL emails I have received, and sent many questions to you, that have gone unanswered.... I do not appreciate the inference that I 'lost' your emails or have not acted on them.
              I apologize for the unintentional inference. My persistence on the issue of lost emails is not implying your fault -- emails are usually lost due to technical issues -- spam filters that do not work accurately (they are far from perfect and all are expected to fail regularly) -- and network issues and email servers. I simply meant we need to take a closer look at all the possible points of failure in the technology.

              On that note -- let's work around those technical difficulties, and if you can repeat any of your unresolved questions here, I will address them one by one.
              --Ray
              Wrensoft Web Software
              Sydney, Australia
              Zoom Search Engine

