Settings that impact Performance

  • Settings that impact Performance

    I have been a long time customer, but just recently upgraded from V5 to V8. I started an index of my website. 60 hours later, it is still running with over 10K urls in the queue. I was hoping to see performance improvements, but it is not happening.

    Thoughts on optimal configuration to get the software to run faster?

  • #2
    How many URLs were indexed with V5?

    There have been many changes between V5 and V8. My first guess would be that V8 is picking up a lot more links than V5 did, and is therefore indexing more pages than before.

    So you should confirm if it's indexing the same number of pages or not, before assuming it's a performance issue.

    If it is in fact crawling a lot more pages than before, then the question is whether it is crawling too many pages, or whether these are necessary pages that were missing from your index before.

    One feature that wasn't in V5 (or at least has changed drastically) is the "Parse for links in Javascript code" option (under "Configure"->"Spider options"). It is quite possible this is now picking up a lot more URLs than it did before. If you want to test this quickly, you can disable this option and try again.

    "60 hours with 10K URLs in the queue" sounds to me like a dynamically generated site whose URLs can go on indefinitely (e.g. a calendar module with "Next/previous month" links that keep going backwards and forwards forever). You need to make sure such links are not followed (use the "Skip options"). Past versions may simply have avoided them because they were not as good at picking up every possible link on the site.
    --Ray
    Wrensoft Web Software
    Sydney, Australia
    Zoom Search Engine
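
One way to check for the kind of runaway dynamic URLs described above is to group the crawled or queued URLs by path and count how many distinct query strings each path was seen with. The following is only a minimal sketch: it assumes you have the URLs available as a plain list (for example, copied out of the indexing log), and `find_url_traps`, its `threshold` parameter, and the sample URLs are hypothetical names made up for illustration, not part of Zoom.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def find_url_traps(urls, threshold=3):
    """Return {(host, path): count} for paths seen with at least
    `threshold` distinct query strings (a hint of an endless URL space)."""
    variants = defaultdict(set)
    for url in urls:
        parts = urlsplit(url)
        # Same host + path with a different query string counts as a new variant.
        variants[(parts.netloc, parts.path)].add(parts.query)
    return {key: len(qs) for key, qs in variants.items() if len(qs) >= threshold}


# Hypothetical sample: a calendar page linking to month after month.
sample = [
    "http://example.com/calendar.php?month=2015-01",
    "http://example.com/calendar.php?month=2015-02",
    "http://example.com/calendar.php?month=2015-03",
    "http://example.com/about.html",
]

traps = find_url_traps(sample)
for (host, path), count in traps.items():
    print(f"{host}{path}: {count} query-string variants")
```

A path that shows up with an ever-growing number of variants is a candidate for the "Skip options" mentioned above. The threshold is arbitrary and would need tuning for a real site, since legitimate pages (product listings, paginated archives) also vary by query string.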



    • #3
      It appears this version of the software obeys the crawl-delay value in the robots.txt file. I need to write a special rule for internal search. What is the official name of the V8 spider, so that I can control it via robots.txt?



      • #4
        Originally posted by jcocking:
        It appears this version of the software obeys the crawl-delay value in the robots.txt file. I need to write a special rule for internal search. What is the official name of the V8 spider, so that I can control it via robots.txt?
        You can disable robots.txt support by unchecking the option under Configure->Spider options->"Enable 'robots.txt' support".

        Otherwise, you can identify the spider by its User-Agent string: "ZoomSpider - zoomsearchengine.com [ZSEBOT]".

        If you have the Enterprise Edition, you can change its user-agent string under "Configure"->"Advanced".
        --Ray
        Wrensoft Web Software
        Sydney, Australia
        Zoom Search Engine
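
As an illustration of a spider-specific rule, the sketch below uses Python's standard `urllib.robotparser` to check how a robots.txt file with a separate section for the spider would be interpreted. The robots.txt content, the delay values, the `/search/` path, and the `load_robots` helper are all hypothetical. It also assumes that the token `ZoomSpider` in a `User-agent:` line is what matches the User-Agent string quoted above; that is how Python's parser behaves (substring match), but Zoom's own matching may differ, so test it against your own crawl.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: slow down everyone else, give the internal
# spider its own, shorter delay and keep it out of /search/.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10

User-agent: ZoomSpider
Crawl-delay: 1
Disallow: /search/
"""

# User-Agent string as quoted in the post above.
ZOOM_UA = "ZoomSpider - zoomsearchengine.com [ZSEBOT]"


def load_robots(text):
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


robots = load_robots(ROBOTS_TXT)
zoom_delay = robots.crawl_delay(ZOOM_UA)           # delay from the ZoomSpider section
other_delay = robots.crawl_delay("OtherBot/1.0")   # falls back to the * section
zoom_can_search = robots.can_fetch(ZOOM_UA, "https://example.com/search/results")

print("ZoomSpider crawl delay:", zoom_delay)
print("Other bots crawl delay:", other_delay)
print("ZoomSpider may fetch /search/:", zoom_can_search)
```

Keeping the spider-specific section separate from the `*` section lets public crawlers stay throttled while the internal indexer runs at full speed.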



        • #5
          After getting all the configuration files correct and the robots.txt file issue resolved, the performance of V8 is fantastic.

          Server specs: AWS EC2 t2.medium instance (2 vCPUs, 4 GB of memory) running Windows Server 2012
          • Files Indexed: 46,828
          • URLs Visited: 51,828
          • Files Skipped: 345,456
          • Total Words Found: 29,486,347
          • Average Words per page: 683
          • Threads: 10
          • Total bytes scanned/downloaded: 1.30GB
          • Elapsed Time: 1 hour 20 minutes and 18 seconds

          The total monthly cost of the server to index my site weekly will be less than $1.00 per month.
          • Storage: $0.20
          • Indexing: 4 weeks @ 3 hours per week @ $0.0644 = $0.78
          I did not want to leave a post about performance without mentioning the end results.

          jeff
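
The cost estimate and the crawl rate in the post above can be reproduced with a few lines of arithmetic. This is a rough sketch using only the figures quoted in the post; it assumes $0.0644 is an hourly instance rate, and the constant and function names are made up for illustration.

```python
# Figures quoted in the post above.
STORAGE_PER_MONTH = 0.20   # USD
HOURLY_RATE = 0.0644       # USD per instance-hour (assumed to be hourly)
RUNS_PER_MONTH = 4         # one index run per week
HOURS_PER_RUN = 3          # budgeted hours, more than the observed run time

FILES_INDEXED = 46_828
ELAPSED_SECONDS = 1 * 3600 + 20 * 60 + 18  # 1 hour 20 minutes 18 seconds


def monthly_cost(storage, rate, runs, hours_per_run):
    """Storage plus instance time for all indexing runs in a month."""
    return storage + rate * runs * hours_per_run


def files_per_second(files, seconds):
    """Average indexing throughput over the whole run."""
    return files / seconds


total = monthly_cost(STORAGE_PER_MONTH, HOURLY_RATE, RUNS_PER_MONTH, HOURS_PER_RUN)
throughput = files_per_second(FILES_INDEXED, ELAPSED_SECONDS)

print(f"Estimated monthly cost: ${total:.2f}")
print(f"Throughput: {throughput:.1f} files/second")
```

The indexing portion works out to 12 hours x $0.0644 = $0.7728, which the post appears to round up to $0.78; either way the monthly total stays under $1.00. Since the observed run took about 1 hour 20 minutes, the 3-hour budget leaves a comfortable margin.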



          • #6
            Thanks for the update.
