fuzzy search?


  • fuzzy search?

    I've seen some search engines that have this feature:

    "Fuzzy search - the program's search algorithm can look for documents that match the given words and some variation around them"

    so I can search for "animated"
    even though the document contains "anomated"

    or, with names (which is fantastic), I can search a PDF for "andrea" when someone wrote "andre"

    and it works!

    That's beautiful! Is it possible with some expression in Zoom search?!

    Thanks

  • #2
    Why would anyone want to get results for anomated when they are searching for animated? Zoom's spelling suggestions might, however, suggest "animated", depending on which words are on your site.

    The 2nd example you describe can be solved either with wildcard searches (andre*) or with the automatic stemming, depending on the word being searched for.
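
Post #2 suggests wildcards. Purely to illustrate the usual `*` / `?` semantics (Zoom's own wildcard matching is internal to the product and may differ), here is a minimal Python sketch using the standard library's `fnmatch.fnmatchcase` over a made-up word list. The `wildcard_search` helper and the word list are hypothetical. Note that `fnmatch` also treats `[...]` as a character class, which a search box might not.

```python
from fnmatch import fnmatchcase


def wildcard_search(pattern, words):
    """Return the words that match a shell-style wildcard pattern.

    '*' matches any run of characters (including none) and '?' matches
    exactly one character. Both sides are lowercased so the match is
    case-insensitive (an assumption about how a search box behaves).
    """
    pattern = pattern.lower()
    return [w for w in words if fnmatchcase(w.lower(), pattern)]


# Hypothetical index vocabulary.
words = ["andre", "andrea", "andreas", "alexandre", "sandra"]

print(wildcard_search("andre*", words))  # ['andre', 'andrea', 'andreas']
print(wildcard_search("andre?", words))  # ['andrea']
```

Because `?` stands for exactly one character, a single pattern such as "mic?ael" catches "michael" and "miclael" but not "micael", which is missing a letter.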



    • #3
      Zoom provides a number of ways in which words and variations of words are either automatically matched (in the case of stemming) or suggested as alternative searches (spelling suggestions).

      So yes, by that definition, Zoom does "fuzzy searches".

      The exact behaviour of each query will depend on the words and the variation in question. There is no magical way to always guess the right word when a word is misspelt. For example, there's no way to know that someone typing "Vat" actually meant "Cat", even though it's just one character away. Spelling suggestions may favour one over the other because it occurs more often in the index.
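
Post #3 says spelling suggestions can prefer one candidate over another because it occurs more often in the index. A minimal Python sketch of that idea, assuming a hypothetical table of index word counts: generate every string one edit away from the query (the approach popularised by Peter Norvig's spelling-corrector essay) and rank the ones that exist in the index by frequency. It is not Zoom's actual suggestion algorithm; `edits1`, `suggest` and the counts are illustrative names and data. Like Zoom, it only proposes alternatives rather than searching on them.

```python
import string


def edits1(word):
    """All strings one edit away: delete, transpose, replace or insert a letter."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in letters]
    inserts = [left + c + right for left, right in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def suggest(query, index_counts):
    """Suggest indexed words one edit from the query, most frequent first."""
    query = query.lower()
    candidates = [w for w in edits1(query) if w in index_counts and w != query]
    # Ties are broken alphabetically so the order is deterministic.
    return sorted(candidates, key=lambda w: (-index_counts[w], w))


# Hypothetical word frequencies from a search index.
index_counts = {"cat": 120, "dog": 80, "hat": 40, "vat": 3}
print(suggest("Vat", index_counts))  # ['cat', 'hat']
```

Both "cat" and "hat" are one edit from "vat"; the sketch puts "cat" first only because the made-up index contains it more often, which is exactly why it cannot know what the user meant.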
      --Ray
      Wrensoft Web Software
      Sydney, Australia
      Zoom Search Engine



      • #4
        Fuzzy search can help to correct a problem...

        When I OCR a file, the result is typically not 100% exact... so sometimes a word in the PDF text layer is not the same as the real word.

        In this case I have 3 PDF files that contain the same information, written as "michael", "miclael" and "micael", and for me the correct result is for the search engine to find all 3 documents, not to suggest another search.

        Otherwise I have to type "michael" "miclael" "micael" in the search box, and I don't know which character is wrong. I can use a wildcard like ? or *, but if it were possible to include the suggested words through a flag, that would be fuzzy search for me.

        With fuzzy search I could also define the number of characters that may differ (e.g. 1, 2 or more), which in this case corresponds to the OCR error in the word.
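
Post #4 asks for a search that tolerates a chosen number of wrong characters. One common way to make "N wrong characters" precise is Levenshtein edit distance, where each insertion, deletion or substitution counts as one. A minimal Python sketch, assuming the OCR text of each PDF is available as a plain string; the file names, texts and the `fuzzy_search` helper are hypothetical, and this is not how Zoom works. A real engine would compare the query against its indexed word list once rather than rescanning every document.

```python
import re


def levenshtein(a, b):
    """Minimum number of single-character inserts, deletes or substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        current = [i]
        for j, cb in enumerate(b, 1):
            current.append(min(previous[j] + 1,               # delete ca
                               current[j - 1] + 1,            # insert cb
                               previous[j - 1] + (ca != cb)))  # substitute (free if equal)
        previous = current
    return previous[-1]


def fuzzy_search(query, documents, max_edits):
    """Return names of documents containing a word within max_edits of the query."""
    query = query.lower()
    hits = []
    for name, text in documents.items():
        words = re.findall(r"\w+", text.lower())
        if any(levenshtein(query, w) <= max_edits for w in words):
            hits.append(name)
    return hits


# Hypothetical OCR output for four scanned PDFs.
documents = {
    "scan_a.pdf": "report on michael smith",
    "scan_b.pdf": "miclael smith signed",
    "scan_c.pdf": "witness micael smith",
    "scan_d.pdf": "unrelated text",
}
print(fuzzy_search("michael", documents, 1))  # ['scan_a.pdf', 'scan_b.pdf', 'scan_c.pdf']
print(fuzzy_search("michael", documents, 0))  # ['scan_a.pdf']
```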

        thanks

        p.s. yes, I have trouble explaining things in English..



        • #5
          Before I made the excellent decision to move our search engine to Wrensoft I used htdig, an open source package. That used fuzzy logic search and it was really good.

          Bob
          Robert Isaac
          Volvo Owners Club



          • #6
            Yes, I can see the need in some niche cases.

            My understanding is that Htdig used soundex or metaphone key database to implement fuzzy search. This is pretty much the same as how our spelling suggestions work.

            The difference, of course, is that we only offer these as suggestions, rather than automatically searching on them.

            Even the HtDig documentation states that you are going to get "weird" matches with the soundex fuzzy search algorithm. And by "weird", they mean "wrong".
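
Soundex keys come up several times in this thread. As a rough illustration, here is a minimal Python sketch of the textbook American Soundex algorithm; it is not htdig's or Zoom's code, and the `soundex` name is just for this example. Run on the three OCR variants from post #4, it gives "michael" and "micael" the same key but "miclael" a different one, which hints that phonetic keys and OCR misreads do not line up neatly.

```python
# Digit classes of American Soundex; vowels (and y) get no digit,
# and h/w are skipped entirely.
SOUNDEX_DIGITS = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for letter in letters:
        SOUNDEX_DIGITS[letter] = digit


def soundex(word):
    """Return the 4-character American Soundex key of a word, e.g. 'M240'."""
    word = "".join(ch for ch in word.lower() if "a" <= ch <= "z")
    if not word:
        return ""
    key = word[0].upper()
    previous = SOUNDEX_DIGITS.get(word[0], "")
    for ch in word[1:]:
        if ch in "hw":
            # h and w neither add a digit nor separate repeated digits.
            continue
        digit = SOUNDEX_DIGITS.get(ch, "")
        if digit and digit != previous:
            key += digit
        # A vowel resets `previous`, so a repeated digit after a vowel is kept.
        previous = digit
    return (key + "000")[:4]


for name in ("michael", "micael", "miclael"):
    print(name, soundex(name))
# michael M240
# micael M240
# miclael M244
```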

            The most accurate searching is going to be obtained by having the OCR job being as accurate as possible.



            • #7
              This is crazy!!

              "The most accurate searching is going to be obtained by having the OCR job being as accurate as possible."

              Do you have any idea how difficult it is to correct an OCR process?! For millions of documents?! I think it would take 10 people a year!?

              The solution is fuzzy search in the engine... Try to understand the problem in some cases, like judicial or legal documents. In these documents it is common and realistic for a word not to be written correctly... but we still want to find the information or the document. So in this case, strict precision of the results is a limitation...


              Thanks Bye



              • #8
                I think you will understand more once you actually use a "fuzzy" search engine that does what you think you want it to do, and find that you get really bad search results.

                "Fuzzy search" is just a vague name for any attempt at matching an approximate word. It is not an actual, specific, magical process.

                Zoom provides fuzzy searching in the form of stemming (supporting derivatives of words), synonyms, spelling suggestions, and wildcard patterns.

                Firstrebel above mentioned htdig's fuzzy search logic. Looking it up, their documentation says they provide soundex, metaphone, accents, endings, and synonyms for this search logic. Zoom actually employs all of these algorithms already (and more) in the above features we mentioned. We already do the same stuff. But they don't do what tommyk is asking for in his OP, because the words he expects to match are not considered similar by the soundex/metaphone algorithm (which is phonetically based - and "animated" does not sound like "anomated").

                In most practical cases, approximate matching (as described in the OP) only makes sense for providing spelling suggestions and NOT actual matches. If words like "cot" will match "coat" and "cost" because they are close in "edit distance" (as in your idea of "andrea" and "andre"), then your search results will be incredibly cluttered and it will be really hard to find what you are actually looking for. Far too many words are similar in appearance or spelling but have completely unrelated meanings. You will be swamping your results with irrelevant hits. It will be the opposite of an effective search engine that returns relevant results.
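
To make the "cot" example concrete, here is a minimal Python sketch that counts how many words in a small, made-up vocabulary fall within a given Levenshtein edit distance of "cot". The vocabulary and the `matches` helper are hypothetical; real numbers depend entirely on what is in the index.

```python
def edit_distance(a, b):
    """Levenshtein distance, keeping a single row of the DP table."""
    row = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        # prev_diag holds the previous row's value one column to the left.
        prev_diag, row[0] = row[0], i
        for j, cb in enumerate(b, 1):
            prev_diag, row[j] = row[j], min(row[j] + 1,       # delete ca
                                            row[j - 1] + 1,   # insert cb
                                            prev_diag + (ca != cb))  # substitute
    return row[-1]


def matches(query, vocabulary, max_edits):
    """Words in the vocabulary within max_edits of the query."""
    return [w for w in vocabulary if edit_distance(query, w) <= max_edits]


# Hypothetical index vocabulary.
vocabulary = ["cot", "coat", "cost", "cat", "cut", "dot", "colt", "scot",
              "cog", "court", "copy", "count"]

for max_edits in range(3):
    print(max_edits, len(matches("cot", vocabulary, max_edits)))
# 0 1
# 1 9
# 2 12
```

With just one edit allowed, "cot" already pulls in "coat", "cost", "dot", "cog" and five more words, most of them unrelated in meaning.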
                --Ray
                Wrensoft Web Software
                Sydney, Australia
                Zoom Search Engine



                • #9
                  Remember also that htdig and that sort of software technology dates from back in 2003/4, and things have come on quite a bit since. It was good at the time, and for the purpose we put it to, but it would not work well now. Search results with Zoom are very accurate provided things are set up properly. It is all in the preparation, just like a good meal.

                  Bob
                  Robert Isaac
                  Volvo Owners Club

