[ELCOLOMBIANO] - Indexing a huge web site

    We have Wrensoft Zoom Search Engine, Enterprise Edition (order number: WS73NF3137, date: 22/Feb/2013).

    Last Friday we ran a full index with up to 63 start points, each covering at most one month of links (in the format http://www.elcolombiano.com/BancoMedios/zoom/sitemap-2008-01.html, that is, one HTML file per month from 2008 up to 2013; some files have data, others don't).

    Each HTML file with data contains a batch of links to classic ASP pages. In this configuration we estimate roughly 116,455 links to index. I set Max. files to index to 65,500, reasoning that with the Enterprise Edition this limit would not apply. It does: today I saw that the limit was reached and the indexing did not complete, stalling at start point 48 of 63. Can you help us with this? We have a huge ASP site that we need to index before putting it into production. That is, we need a full index scan of about 63 months (one start point per month) covering 116,455 links/pages and counting, and Max. files to index is limiting all of this. NOTE: Your documentation says to use CGI, which is not possible in our situation because our pages are ASP files.
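A minimal Python sketch of the start-point setup described above, assuming the last month is 2013-03 (January 2008 through March 2013 gives exactly 63 monthly files). It builds the sitemap URLs from the pattern in the post and compares the link estimate with the configured limit. The helper `monthly_start_points` is hypothetical and not part of Zoom.

```python
from datetime import date

# URL pattern from the post; the end month (2013-03) is an assumption
# chosen because Jan 2008 .. Mar 2013 is exactly 63 monthly files.
BASE = "http://www.elcolombiano.com/BancoMedios/zoom/sitemap-{year}-{month:02d}.html"

MAX_FILES = 65500         # "Max. files to index" value set in the indexer
ESTIMATED_LINKS = 116455  # link estimate quoted in the post


def monthly_start_points(start: date, end: date) -> list[str]:
    """Return one sitemap URL per month, from start to end inclusive."""
    urls = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        urls.append(BASE.format(year=year, month=month))
        month += 1
        if month > 12:  # roll over into January of the next year
            year, month = year + 1, 1
    return urls


points = monthly_start_points(date(2008, 1, 1), date(2013, 3, 1))
print(len(points), "start points")
print(points[0])
print(points[-1])
# How far the estimate exceeds the per-index limit.
print(f"{ESTIMATED_LINKS - MAX_FILES:,} links over the limit")
```

Splitting the work into monthly start points does not change the total: all start points feed the same index, so the page cap applies to the sum of every month.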

  • #2
    The documentation that suggests using the CGI refers to our search script only. It isn't a suggestion for your source pages. The scripting choice for the search function is independent of the one used for the rest of the web site.

    Check out these benchmarks:
    http://www.wrensoft.com/zoom/benchmarks.html

    For 60,000 pages the ASP search times are already in the 5-second range, and you want to double this.
    For the CGI option the search times on the same data are 0.1 seconds. So the CGI option is 50 times faster, and it also uses less RAM, less CPU time and less disk I/O.

    It is 50 times faster because ASP is maybe the most inefficient programming language ever invented (and it has been discontinued in any case, due to this problem and others).
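The speed comparison can be checked with a short calculation. This sketch uses the benchmark figures quoted above and assumes, as a simplification, that search time grows roughly linearly with page count; real scaling depends on the server hardware and the index contents.

```python
# Benchmark figures quoted above, measured on roughly 60,000 pages.
BASE_PAGES = 60_000
ASP_SECONDS = 5.0
CGI_SECONDS = 0.1


def speedup(slow_seconds: float, fast_seconds: float) -> float:
    """How many times faster the fast option is than the slow one."""
    return slow_seconds / fast_seconds


def scaled_time(seconds_at_base: float, base_pages: int, target_pages: int) -> float:
    """Estimate search time at a new page count.

    Assumption: time grows linearly with the number of pages.
    """
    return seconds_at_base * target_pages / base_pages


print(f"CGI is about {speedup(ASP_SECONDS, CGI_SECONDS):.0f}x faster")
print(f"ASP at 116,455 pages: ~{scaled_time(ASP_SECONDS, BASE_PAGES, 116_455):.1f} s")
print(f"CGI at 116,455 pages: ~{scaled_time(CGI_SECONDS, BASE_PAGES, 116_455):.2f} s")
```

Under that linear assumption the ASP option lands near the 10-second mark for this site, while the CGI option stays well under a second.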

    There is also the ASP.NET option that we offer. The speeds and capacity are similar to the CGI.

    • #3
      No. That was not the answer I was hoping for.
      What I need is a way to fully index a huge number of ASP pages (roughly 116,566 and counting) and get past the Max. files limit of 65,500 when indexing 116,566 pages in one session. As I said before, I set up 63 start points to be indexed in one run, or one session. Does Max. files to index need more explanation regarding the one-session run, or something else? Bear in mind that we have the Enterprise Edition. Otherwise, how do you index more than the 65,500 limit (Max. files to index) with ASP?
      Last edited by carlosor; Mar-11-2013, 10:32 PM. Reason: Additional information

      • #4
        What I need is a way to full index of a huge bunch of ASP pages
        I understand your request.

        My first point was that doing this (if the limit didn't exist) would most likely result in a really bad user experience: search times of 10+ seconds, or even longer when searches overlap. What hardware are you using on the server?

        My second point is that it doesn't matter what scripting language your pages are written in. It is pretty much irrelevant. You can use the ASP.NET search option regardless of whether the rest of your site is in ASP, PHP, CF, ASPX or something else.

        The reason for the limit (besides the awful performance of ASP) is that we allocate just 2 bytes for the page index (~65,000 pages). This keeps the index small and the search times down. The more powerful ASP.NET and CGI options use a 4-byte index.
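To illustrate why a 2-byte page index caps the page count, here is a sketch using Python's `struct` module. Only the field widths come from the explanation above; Zoom's actual index file layout is not described in this thread, and `encode_page_id` and `fits` are hypothetical helpers.

```python
import struct

# Largest page number each field width can hold (unsigned integers).
MAX_2_BYTE = 2**16 - 1  # 65,535
MAX_4_BYTE = 2**32 - 1  # 4,294,967,295

# Little-endian unsigned short (2 bytes) and unsigned int (4 bytes).
FORMATS = {2: "<H", 4: "<I"}


def encode_page_id(page_id: int, width: int) -> bytes:
    """Pack a page number into a fixed-width field; raises struct.error if it does not fit."""
    return struct.pack(FORMATS[width], page_id)


def fits(page_id: int, width: int) -> bool:
    """True if page_id can be stored in a field of the given byte width."""
    try:
        encode_page_id(page_id, width)
        return True
    except struct.error:
        return False


for pages in (65_500, 116_455):
    last_id = pages - 1  # pages numbered from 0, so the highest id is pages - 1
    print(f"{pages:>7,} pages: 2-byte index {'ok' if fits(last_id, 2) else 'overflows'}, "
          f"4-byte index {'ok' if fits(last_id, 4) else 'overflows'}")
```

The trade-off is the one described above: every stored page reference costs twice as much space with a 4-byte field, which is why the smaller field is used where the page count allows it.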

        • #5
          Carlos, please note there is a difference between:

          (1) Indexing ASP pages

          and

          (2) Selecting the "ASP" platform in the Zoom Indexer and providing a search page/script in ASP.

          As noted above, you can index your ASP pages (the pages that make up your website), select the "CGI" or "ASP.NET" platform in the Zoom Indexer, and provide the search function in CGI or ASP.NET.
          --Ray
          Wrensoft Web Software
          Sydney, Australia
          Zoom Search Engine

          • #6
            Thank you. Now I see the difference.
            My next step is as follows, and I need some guidance here:

            I will index the ASP pages that make up the EL COLOMBIANO web site using several start points, one per month, each with links to ASP pages. The index will use the CGI or ASP.NET option, but I need to know how to integrate the CGI or ASP.NET search inside an ASP page, which will be the landing page for the search engine.

            • #7
              Please see this FAQ, in particular the section in bold, "To embed search.cgi within an ASP file":
              http://www.wrensoft.com/zoom/support...i.html#ssi_cgi

              But first, make sure you can get the CGI version working on its own (without the ASP page). Accessing "search.cgi" from your browser should give you a fully working search page before you attempt to embed the CGI within an ASP search page.
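The standalone check described above can also be scripted. This is a minimal sketch: the server URL is a placeholder, and the `zoom_query` parameter name is an assumption about how the search script reads the query, so confirm it against the Zoom documentation for your version. `search_url` and `cgi_search_works` are hypothetical helpers.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def search_url(cgi_url: str, query: str) -> str:
    """Build a search request URL for the standalone CGI.

    Assumption: the search script takes the query in a "zoom_query" parameter.
    """
    return f"{cgi_url}?{urlencode({'zoom_query': query})}"


def cgi_search_works(cgi_url: str, query: str = "colombia", timeout: float = 10.0) -> bool:
    """Return True if the CGI answers with HTTP 200 and a non-empty page.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses and
    urllib.error.URLError for connection problems; both propagate to the caller.
    """
    with urlopen(search_url(cgi_url, query), timeout=timeout) as resp:
        return resp.status == 200 and len(resp.read()) > 0
```

For example, `cgi_search_works("http://www.example.com/search.cgi")` (placeholder host) should return `True` once the CGI is set up on IIS; only then is it worth moving on to the ASP embedding step.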

              You will find details on running the CGI on your IIS server here:
              http://www.wrensoft.com/zoom/support/faq_cgi_iis.html
              --Ray
