Automatic Login to DNN Website - Some pages not indexed properly


  • Automatic Login to DNN Website - Some pages not indexed properly

    Hello,

    We have a DNN website that automatically logs you in based on your IP address. If you are in Europe, it detects this from your IP and logs you into DNN as a "European" user. If you are in the US, it logs you in as a generic "US" user.

    It also stores this information in a persistent cookie, so the next time you visit the site it logs you in as the correct user (in case you manually chose a different region).

    We're trying to get Zoom to index the site in two separate instances: once logged in as a "European" user, and once logged in as a "US" user. This way, European users will be able to search for Europe-specific pages, and US users can search for US-specific pages.

    To do the auto-login, we are trying to use the "manually login with IE and use its cookies" method. I navigate to the website in IE, change it to the US user, and then index with Zoom. I then change it to Europe (so the cookie gets set to Europe) and index again (I copy the .zdat files to the appropriate US/Europe folders before re-indexing).

    The problem is that some pages are not being indexed properly. The log says Zoom finds the correct URL, downloads it, and indexes it, but when we search for words on those particular pages, nothing is found. The strange thing is that every now and then (after re-indexing a few times), it indexes the page properly and all search terms on that page are found. Searching as a European user then finds the correct European pages, and searching as a US user finds the correct US pages.

    Is there any reason why the spider/indexer occasionally picks up everything on that page, and sometimes doesn't?

    Thanks in advance.
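
The region auto-login described in this post can be sketched roughly as follows. This is only an illustration of the described behaviour (a valid persistent cookie takes precedence, otherwise the region is derived from the client IP), not DNN's actual implementation. The network ranges, the region names as dictionary keys, the fallback region, and both function names are assumptions made up for the example.

```python
import ipaddress

# Hypothetical IP ranges per region (documentation-only address blocks).
REGION_NETWORKS = {
    "Europe": [ipaddress.ip_network("198.51.100.0/24")],
    "US": [ipaddress.ip_network("203.0.113.0/24")],
}
DEFAULT_REGION = "US"  # assumed fallback when the IP matches no range


def region_from_ip(client_ip):
    """Map a client IP address to a region account name."""
    addr = ipaddress.ip_address(client_ip)
    for region, networks in REGION_NETWORKS.items():
        if any(addr in net for net in networks):
            return region
    return DEFAULT_REGION


def resolve_region(client_ip, cookie_region=None):
    """A valid region cookie wins; otherwise fall back to IP detection."""
    if cookie_region in REGION_NETWORKS:
        return cookie_region
    return region_from_ip(client_ip)
```

If the site works along these lines, any request that arrives without the region cookie is classified by the IP of the indexing machine alone, which would be one possible way for a crawl to receive content for a different region than intended.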

  • #2
    Check that you are not indexing from the cache. Under "Configure"->"Spider options" make sure to check "Reload all files (do not use cache)".

    There's a good chance that's the cause. Beyond that, it's hard to say without knowing exactly how your login method is implemented. If you can change the way your website detects the different users, you may want to modify it so that it identifies Zoom via the User-Agent string. You can change the User-Agent string if you have the Enterprise Edition ("Configure"->"Advanced"->"Spider User-Agent"). This way you can simply set one configuration to identify itself as a European spider and one as an English spider (a somewhat amusing visual image there), and your website can log them in accordingly without any manual login procedure.
    --Ray
    Wrensoft Web Software
    Sydney, Australia
    Zoom Search Engine
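
A minimal sketch of the User-Agent approach suggested above, seen from the website's side. The two User-Agent values and the function name are hypothetical; the idea is that each Zoom configuration sends its own string and the site maps that string to a region account before falling back to its normal cookie or IP logic.

```python
# Hypothetical User-Agent markers, one per Zoom configuration.
SPIDER_REGIONS = {
    "ZoomSpider-Europe": "Europe",
    "ZoomSpider-US": "US",
}


def region_for_user_agent(user_agent):
    """Return the region for a recognized spider, or None for ordinary visitors."""
    lowered = (user_agent or "").lower()
    for marker, region in SPIDER_REGIONS.items():
        if marker.lower() in lowered:
            return region
    return None
```

Returning None for unrecognized clients lets ordinary visitors continue through the existing login path unchanged.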


    • #3
      I thought it might be reloading from the cache as well, but "Reload all files (do not use cache)" is definitely checked.

      I've also tried clearing the cache manually, before indexing.

      I don't understand why it sometimes indexes the content on the page and sometimes doesn't. The log, however, always says that Zoom has found, downloaded, and indexed the page.

      Do you have any idea why it sometimes seems to index the page but not the content on it?


      • #4
        My guess is that it's not logging in as you expect, and the server is not returning the page content you are expecting. But I can only guess without seeing anything.
        --Ray
        Wrensoft Web Software
        Sydney, Australia
        Zoom Search Engine
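
One way to test the guess above is to request a problem page outside of Zoom with a known cookie and check whether the expected text is in the response. The sketch below assumes the region is carried in a single cookie header and uses "ZoomSpider" only as a placeholder User-Agent; the function names, the cookie value, and the URL in the usage note are all hypothetical.

```python
import urllib.request


def build_request(url, cookie_header, user_agent="ZoomSpider"):
    """Build a GET request that sends the given cookie and User-Agent."""
    req = urllib.request.Request(url)
    req.add_header("Cookie", cookie_header)
    req.add_header("User-Agent", user_agent)
    return req


def missing_phrases(html, expected_phrases):
    """Return the expected phrases that do not appear in the page (case-insensitive)."""
    lowered = html.lower()
    return [p for p in expected_phrases if p.lower() not in lowered]


def check_page(url, cookie_header, expected_phrases):
    """Fetch the page and report which expected phrases the server left out."""
    with urllib.request.urlopen(build_request(url, cookie_header), timeout=30) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    return missing_phrases(html, expected_phrases)
```

For example, calling `check_page("http://example.com/pricing", "region=Europe", ["European pricing"])` several times in a row would show whether the server returns the same content for the same cookie on every request.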


        • #5
          Alternatively, do you know if it is possible to use the "Automatic login on following page(URL):" feature to let Zoom log in to a standard DNN login page?

          I have used it previously on a classic ASP form, but will it work for this?


          • #6
            We've had some users report difficulty when trying to log in automatically to a DNN page that requires authentication. It turned out that some of these sites required an extra parameter to be specified, besides just the username and password. This is described in further detail here:
            http://www.wrensoft.com/forum/showthread.php?t=3530

            Because these pages often have multiple submit buttons, the user would need to specify which button Zoom should use to log in. We currently do not have an option for this in the UI. It would require more involved changes to the GUI and the software, so it has been scheduled as a V7 feature.

            Having said that, if you know how your authentication is implemented, you should verify whether this is necessary for your site. So long as the login process only requires a username and password to be submitted, Zoom can do this.
            --Ray
            Wrensoft Web Software
            Sydney, Australia
            Zoom Search Engine
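
To illustrate the "extra parameter" issue described above: a form with several submit buttons typically sends the name of the clicked button along with the credentials, and the server may use it to decide which action to run. The sketch below builds such a POST body. The field names `username`, `password`, and `login_button` are placeholders, not the names a real DNN login page uses; those would have to be read from the page's HTML.

```python
from urllib.parse import urlencode


def build_login_body(username, password, extra_fields=None):
    """Encode a login form submission, optionally with extra fields such as a submit button."""
    # Hypothetical field names; a real form defines its own.
    fields = {"username": username, "password": password}
    fields.update(extra_fields or {})
    return urlencode(fields)
```

Comparing the body a browser actually sends on a successful login with `build_login_body(user, pw)` shows whether the site expects anything beyond the username and password.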


            • #7
              Is there a work-around until version 7 comes out?
              Cheers,
              David


              • #8
                I have not personally pried into the latest version of DNN to see how their authentication method is implemented or what they allow in terms of configuring exceptions.

                My understanding is that they support installing different Authentication Providers so that you can use various methods. If there is one (or you can create one) which would allow you to specify exceptions (e.g. by identifying User-Agent or by IP address), then you can allow the ZoomSpider access to the content, and bypass the authentication method. Or, use an authentication provider which does not have the implementation issues mentioned above.
                --Ray
                Wrensoft Web Software
                Sydney, Australia
                Zoom Search Engine
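
The exception rule suggested above could look something like the following. This is a generic sketch, not a DNN Authentication Provider: the allowed address, the User-Agent marker, and the function name are assumptions. It requires both the User-Agent and the IP address to match, which is a stricter choice than either check alone, because any client can send an arbitrary User-Agent string.

```python
import ipaddress

# Hypothetical values: the indexing machine's address and the spider's User-Agent marker.
ALLOWED_SPIDER_NETWORKS = [ipaddress.ip_network("192.0.2.10/32")]
SPIDER_MARKER = "ZoomSpider"


def may_bypass_login(user_agent, client_ip):
    """Allow the bypass only for the known spider coming from a known address."""
    is_spider = SPIDER_MARKER.lower() in (user_agent or "").lower()
    addr = ipaddress.ip_address(client_ip)
    from_allowed_ip = any(addr in net for net in ALLOWED_SPIDER_NETWORKS)
    return is_spider and from_allowed_ip
```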


                • #9
                  Thanks for the reply. I'll work on it.

                  David
                  Sector Website Hosting
                  Last edited by David Morgan; Jul-06-2010, 12:22 AM.
                  Cheers,
                  David
