Not spidering my Wordpress content

  • Not spidering my Wordpress content

    Hi,

    I just installed Zoom 6 Pro and successfully indexed my entire site except for one important area, my blog, which Zoom isn't indexing.

    The bulk of my site consists of static HTML pages organized in folders. Zoom finds and indexes all of these (5000+ pages).

    http://www.johnkane.com/

    However, my blog runs on WordPress and all pages are dynamically generated. The permalinks don't contain file names (with or without extensions) and end in slashes:

    http://www.johnkane.com/blog/
    http://www.johnkane.com/blog/2010-02-03-mushroom-macros/

    There are no files to scan for, only URLs that return pages when requested. The URLs are linked from the blog's root page and from the pages it links to.

    Zoom's log returns the error:

    X No files to spider from http://www.johnkane.com/blog/

    In a twist, it seems I can partially solve this by unchecking "Enable robots.txt support" under the advanced spider mode options on the Spider options tab. This works even though my robots.txt file (in the root of my domain) doesn't restrict the /blog/ folder.
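
One way to double-check that a robots.txt file really leaves /blog/ alone is to run its rules through Python's standard urllib.robotparser. This is only a sketch: the robots.txt contents below are hypothetical (the real file isn't shown in this thread), and the helper name is_allowed is made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; the real file is not shown in this thread.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
"""

def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

if __name__ == "__main__":
    for url in ("http://www.johnkane.com/blog/",
                "http://www.johnkane.com/admin/login.html"):
        print(url, "allowed" if is_allowed(SAMPLE_ROBOTS_TXT, url) else "blocked")
```

If the blog URL comes back as allowed, robots.txt itself is not what is keeping a spider out, and the restriction has to come from somewhere else.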

    The problem with this approach (ignoring robots.txt) is that Zoom will crawl content in folders not referenced by my site, such as orphan or admin pages, which I don't want.

    I'm hoping I can reconfigure Zoom or WordPress or both to resolve this. Any suggestions appreciated!

  • #2
    When I looked at the HTML source of your blog's home page, I found this:
    <meta name='robots' content='noindex,nofollow' />

    This is why Zoom doesn't index the page: the tag asks all search engines to skip the page and not to follow its links. The Zoom log reports this as well:

    "Not indexing content on page: http://www.johnkane.com/blog/ (meta robots "noindex" tag found)"
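
If you want to check other pages for the same tag without reading the source by hand, a small script can pull out the robots meta directives. This is a minimal sketch using only Python's standard html.parser; the class and function names are made up for illustration, and it has nothing to do with how Zoom itself parses pages.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collects the directives from every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> are also routed here.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        for directive in attr.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)

def robots_directives(html: str) -> set:
    """Return the set of robots meta directives found in the HTML text."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives

if __name__ == "__main__":
    sample = "<html><head><meta name='robots' content='noindex,nofollow' /></head></html>"
    found = robots_directives(sample)
    print("noindex" in found, "nofollow" in found)
```

To test a live page, fetch its HTML first (for example with urllib.request) and pass the decoded text to robots_directives.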


    • #3
      You are correct! Thanks for the help.

      My blog section is new and the code is dynamically generated. The code:

      <!-- Theme Hook -->
      <meta name='robots' content='noindex,nofollow' />

      wasn't in the top-level header.php file (which I customized), so I didn't notice that the theme inserts it in the final output. However, I can and will edit the underlying file.

      One tweak should fix this for the entire blog section of my site. Interestingly, the restriction wasn't in a robots.txt file (where I was looking) but in a meta tag in my WordPress blog's HTML, even though the Zoom "Enable robots.txt support" checkbox toggled it on and off.

      It also explains why Google wasn't seeing my new blog.

      Cheers,

      jk


      • #4
        In fact, it's even easier to fix.

        I found the <meta name='robots' content='noindex,nofollow' /> code in the noindex function in the general-template.php file. The code didn't say so outright, but it suggested that this was controlled from the admin panel, and sure enough there was a WordPress privacy setting I hadn't seen.

        Privacy Settings / Blog Visibility

        - I would like my blog to be visible to everyone, including search engines (like Google, Bing, Technorati) and archivers

        - I would like to block search engines, but allow normal visitors

        I had the second option checked (I presume it's the default?); selecting the first option removed the offending meta tag.

        Thanks again,

        jk
