Help – Links to Documents in PHP Web Page Not Indexed

  • Help – Links to Documents in PHP Web Page Not Indexed

    Details:
    Intranet - All files referred to are on an intranet
    Zoom Pro v4.2 build 1000
    Indexing – Spider mode
    Plugins – Successfully downloaded and installation confirmed (pdf, doc, xls and ppt)
    Configuration – file types added to Scan Options

    I am having trouble with my Zoom installation and a FileName.php web page.

    My IT support personnel have built a web page that scans a directory of documents and lists them as links. After indexing, searches succeed for text in the link names, but Zoom does not follow the links to index any of the files in the directory.

    I need to keep the dynamic web page so that non-technical staff can publish their content without having to edit HTML (e.g. by adding files to or removing files from this intranet folder).

    I understand from posts in the forum that Zoom indexing should be able to follow the links on the dynamic PHP web page to the documents.

    The index log shows that other documents, linked by hard-coded hypertext links, are processed properly.

    I have reached the limit of my expertise and options. Provided below is a copy of a simplified version of the php web page for review.

    Code:
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
    <html>
    <head>
    <?php
        // Directory that staff drop documents into (path relative to this script)
        $dirPath = '../Documents/Advisories/avian_flu';
        $dir = opendir($dirPath);

        $files = array();

        // Compare against false so a file named "0" does not end the loop early
        while (($file = readdir($dir)) !== false) {
            // If file name begins with a period, skip it
            if (preg_match("/^\./", $file)) {
                continue;
            }

            // If file is a directory, skip it (test the full path, not just the name)
            if (is_dir($dirPath . '/' . $file)) {
                continue;
            }

            // Show only doc or pdf or xls or txt
            if (!preg_match("/\.(doc|pdf|xls|txt)$/i", $file)) {
                continue;
            }

            array_push($files, $file);
        }
        closedir($dir);

        sort($files);
    ?>
    </head>
    <body>
     <table border="0" cellpadding="0" cellspacing="0" width="95%">
       <tr>
         <td>
    <span class="subhead1"><font size=+1>Avian Flu</font></span>
    <ol>
        <?php foreach ($files as $file): ?><li><a href="/Documents/Advisories/avian_flu/<?= $file; ?>"><?= preg_replace("/\.\w+?$/", "", $file); ?></a></li>
        <?php endforeach; ?></ol>
         </td>
       </tr>
     </table>
    </body>
    </html>

  • #2
    Yes, Zoom will follow valid HTML links that link within your site. It doesn't matter if the link was made using a PHP script or not when using Spider mode.

    To start with, turn on verbose mode in Zoom, then index your site. You'll probably see that Zoom skips your links and gives a reason why each link was skipped.

    I didn't spend a lot of time looking at your script, but at a glance it appears to produce links like:
    <a href="/Documents/Advisories/avian_flu/mydocument.doc">

    So maybe the plugins for Word, Excel, etc. are not installed. Maybe the URL in the link is invalid. Maybe the link appears to be outside of your site (as determined by the base URL).

    Can you e-mail us the verbose Zoom log file, or post part of it here?

    -------
    David
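
    For anyone following along, here is a rough sketch of why a link of the form quoted above can end up outside the site. A root-relative href is resolved against the server root, not against the folder that holds the page. The helper name and the page address are assumptions made for illustration, and this is not Zoom's own code.

```php
<?php
// Hypothetical helper (not part of Zoom): resolve a root-relative href
// against the URL of the page that contains it.
// It only handles hrefs that begin with "/".
function resolve_root_relative(string $pageUrl, string $href): string
{
    $parts = parse_url($pageUrl);

    // Rebuild scheme://host[:port] and discard the page's own path
    $origin = $parts['scheme'] . '://' . $parts['host'];
    if (isset($parts['port'])) {
        $origin .= ':' . $parts['port'];
    }

    return $origin . $href;
}

// Assumed page address, for illustration only
echo resolve_root_relative(
    'http://www.mysite.com/advisories/FileName.php',
    '/Documents/Advisories/avian_flu/mydocument.doc'
), "\n";
```

    Under these assumptions the document's URL starts with http://www.mysite.com/Documents/ and not with the folder of the PHP page, so a base URL that covers only the page's folder would not include it.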


    • #3
      Really appreciate your prompt reply and suggestions.

      Unfortunately, verbose mode only shows two lines: downloading and scanning the PHP file. I will attempt to send the full log file to you as a private message.

      Your assumption about the link that is created is correct, and is borne out by two methods: hovering over the link, and clicking View Source on the constructed web page.

      The top of the log file shows successful installation of the plugins.

      The URL in the link does pull up the document so I am not suspecting that as the cause.

      I am going to try another experiment. I will move the document folder so that it is a subfolder directly below the location where the PHP file is saved. I thought Zoom would have followed the link over to a folder at the same level on the same server in the same intranet. Test results to follow.

      Kim


      • #4
        Solution - Add to the Base URL

        My problem: I didn't actually have verbose mode on.

        Verbose mode indicated that the documents were classified as external according to the Base URL.

        The solution was to add the documents folder to the Base URL entry, separating the entries with a semicolon:

        Code:
        http://www.mysite.com/advisories/;http://www.mysite.com/documents/
        Thanks for pointing me in the right direction.

        Kim
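
        A minimal sketch of the idea behind this fix, assuming a plain prefix comparison against each semicolon-separated entry. The function name and the document URL are hypothetical, and Zoom's real matching rules (for example, how it treats upper and lower case) may differ.

```php
<?php
// Hypothetical check, not Zoom's implementation: a URL is treated as
// internal when it begins with any entry in a semicolon-separated
// base URL list.
function is_internal(string $url, string $baseUrlList): bool
{
    foreach (explode(';', $baseUrlList) as $base) {
        $base = trim($base);
        // Skip empty entries, then do a case-sensitive prefix comparison
        if ($base !== '' && strncmp($url, $base, strlen($base)) === 0) {
            return true;
        }
    }
    return false;
}

// Assumed document URL, for illustration only
$doc = 'http://www.mysite.com/documents/advisories/avian_flu/mydocument.doc';

// Only the advisories folder listed: the document counts as external
var_dump(is_internal($doc, 'http://www.mysite.com/advisories/'));

// Documents folder added to the list: the document counts as internal
var_dump(is_internal($doc, 'http://www.mysite.com/advisories/;http://www.mysite.com/documents/'));
```

        With only the first entry the check returns false, which corresponds to the "external" classification in the verbose log. With both entries it returns true.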
