Getting "Suspected invalid HTML (possible unmatched quote characters)" on php files


    This only happens on PHP pages that embed enough other code to have to mix single and double quotes, and that often include snippets of source code.
    However, it only happens on a couple out of hundreds of similar pages, so presumably the warning is accurate, and I just need to figure out how the indexer calculates the line number.
    Unfortunately the pages are internal and confidential, or I could provide a sample.

    When the indexer reports that the unmatched quotes are after a given line number, how is that line number calculated?

    07:31:06 - [WARNING] Suspected invalid HTML (possible unmatched quote characters) after line 3133 on page: http://*******/fubar.php (content may not be correctly indexed)

    The reason I ask is that there aren't any mismatched quotes within several dozen lines of that line number in the source file (in either vim or DW6), and I'm not seeing any OBVIOUS booboos with vim set to check opening/closing quotes.
    The line in question is not near any PHP includes.

    Even dumping the page as a file with curl http://*******/fubar.php > bloopers.txt isn't helping so far.

    Is the line number calculated by HTML rules for <br /> tags and other "line ending" HTML tags, or just for higher-level tags like </p>, </pre>, </td>, etc.?
    Are the header, metadata, etc. included when calculating the line number, or only the body?

    Thanks...


  • #2
    Line numbers are based on the newline control characters (i.e. "\n") in the "source". But since this is a PHP script being indexed in Spider Mode, the "source" is not "fubar.php" itself, but rather the HTML output that the script generates and returns to the client.

    So yes, dumping with curl should get you something close to it. It may also be very different, though, if your script changes its behaviour based on the user-agent, or if there are additional parameters whilst crawling the site (including but not limited to authentication, cookies, etc.). Another way is to browse to that page in a browser and use "View source".
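A minimal sketch of the counting rule described above, assuming the page output has been saved locally (for example the bloopers.txt dump from the first post) and assuming numbering starts at 1 on the very first line of the output, so the doctype, <head> and metadata all count. Only "\n" ends a line; tags such as <br /> or </p> have no effect. The function name context_around and the script name context.py are hypothetical; this illustrates the rule, it is not Zoom's own code.

```python
import sys


def context_around(text: str, line_no: int, radius: int = 5) -> list[tuple[int, str]]:
    """Return (line number, line) pairs around line_no, numbered from 1.

    Lines are split on newline characters only, so <br />, </p>, </td>
    and other tags have no effect on the count. Assumes numbering starts
    at 1 on the first line of the output (doctype and <head> included).
    """
    lines = text.split("\n")
    start = max(1, line_no - radius)
    end = min(len(lines), line_no + radius)
    return [(n, lines[n - 1]) for n in range(start, end + 1)]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical script name): python context.py bloopers.txt 3133
    path, target = sys.argv[1], int(sys.argv[2])
    # newline="" disables newline translation, so a lone "\r" is not a
    # line break; only "\n" splits lines, matching the rule above.
    with open(path, encoding="utf-8", errors="replace", newline="") as f:
        for n, line in context_around(f.read(), target):
            marker = ">>" if n == target else "  "
            print(f"{marker} {n:6d}: {line.rstrip()}")
```

If the curl dump and the browser's "View source" disagree around that line, that is a hint the script is serving different output to different clients.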

    Note also that it's not reporting where the opening quote is, only where an expected character turned out to be missing. That could be a ">" expected to close an HTML tag, as well as a missing closing quote (single or double).
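Why the reported line can sit far from the real mistake can be shown with a deliberately simplified scanner. This is a hypothetical heuristic (find_unclosed is a made-up name), not Zoom's parser: inside a tag, only " and ' open a quoted value, a value is closed only by the same character, and whatever is still open when the text ends is reported. Because one missing quote shifts the pairing of every quote after it, the report can land well past the actual error.

```python
def find_unclosed(html: str) -> list[str]:
    """Very simplified tag/quote tracker (a heuristic, not Zoom's parser).

    Lines are counted by newline characters. Inside a tag, a " or '
    opens a quoted value that only the same character closes; a
    backtick is not a quote character in HTML, so it is ignored.
    Anything still open when the text ends is reported together with
    the line where it was opened and the line where the scan ended.
    Comments, <script> blocks and CDATA are not handled.
    """
    line = 1
    in_tag = False
    quote = ""  # the quote character currently awaited, if any
    tag_line = quote_line = 0
    for ch in html:
        if ch == "\n":
            line += 1
        elif quote:
            if ch == quote:
                quote = ""  # value closed, back inside the tag
        elif in_tag:
            if ch in "\"'":
                quote, quote_line = ch, line
            elif ch == ">":
                in_tag = False
        elif ch == "<":
            in_tag, tag_line = True, line

    if quote:
        return [f"unmatched {quote} opened on line {quote_line}, "
                f"still open at line {line}"]
    if in_tag:
        return [f"tag opened on line {tag_line} never closed with '>' "
                f"(scan ended at line {line})"]
    return []
```

For example, with '<a href="x.html>link</a>\n<p class="y">\ntext\n' the closing quote is missing on line 1, yet the scanner reports an unmatched " opened on line 2 and still open at line 4, because the quote before y gets paired with the stray one on line 1.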

    It might be best to get the curl dump and/or the source saved from a browser, and run it through a validator such as the W3C Markup Validation Service to take a look.
    --Ray
    Wrensoft Web Software
    Sydney, Australia
    Zoom Search Engine



    • #3
      Thanks, I'll poke at it again. I can't export the file or the curl dump as they are confidential. Hopefully I won't have to resurrect an old Perl "find the bad HTML" utility.
      I'm certain it will be a headsmacker when I find it.



      • #4
        Found it. It was in an include for navigation rollovers that changes with the current context. I had to change fonts in DW6 to see it: backtick != single quote.
        Thanks for the tip. The curl dump was definitely required.
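Since the culprit was a backtick standing in for a single quote, a font-independent check on the saved output may help next time. A minimal sketch, assuming the dump is UTF-8: it lists each character from a hand-picked (hypothetical, extend as needed) set of quote lookalikes with its line and column, counting lines by "\n". Backticks are legitimate in some places (JavaScript template literals, for one), so treat hits as candidates to inspect rather than errors. The script name lookalikes.py is also hypothetical.

```python
import sys
import unicodedata

# Characters that can pass for ' or " in some fonts. This set is an
# illustrative choice, not an exhaustive list; extend it as needed.
LOOKALIKES = {
    "\u0060",  # GRAVE ACCENT (backtick)
    "\u00b4",  # ACUTE ACCENT
    "\u2018",  # LEFT SINGLE QUOTATION MARK
    "\u2019",  # RIGHT SINGLE QUOTATION MARK
    "\u201c",  # LEFT DOUBLE QUOTATION MARK
    "\u201d",  # RIGHT DOUBLE QUOTATION MARK
    "\u2032",  # PRIME
    "\u2033",  # DOUBLE PRIME
}


def find_lookalike_quotes(text: str) -> list[tuple[int, int, str]]:
    """Return (line, column, Unicode name) for each lookalike character.

    Lines are counted by newline characters; line and column are 1-based.
    """
    hits = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in LOOKALIKES:
                hits.append((line_no, col, unicodedata.name(ch)))
    return hits


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage (hypothetical script name): python lookalikes.py bloopers.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace", newline="") as f:
        for line_no, col, name in find_lookalike_quotes(f.read()):
            print(f"line {line_no}, col {col}: {name}")
```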
