  1. #1

    How to delete all JavaScript in PHP

    Hi,

    I tried to use a regular expression to delete every script block:

    $content=eregi_replace("(<script)[\s\S]*(script>)","hello",$content);


    It doesn't work. I wonder why?

  2. #2
    Join Date
    Mar 2004
    Location
    USA
    Posts
    4,342
    From:
    http://www.phptricks.com/lesson.php?id=16

    PHP Code:
    $content = preg_replace("'<script[^>]*?>.*?</script>'si", 'Hello', $content);
    Peace,
    Testing 1.. Testing 1..2.. Testing 1..2..3...

  3. #3
    Join Date
    Feb 2003
    Location
    L.A. C.A.
    Posts
    335
    azizny, you missed a / before si...

    You could also just use the htmlspecialchars() function, which escapes the markup so any JavaScript is displayed as text instead of being run.
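
    To illustrate the htmlspecialchars() suggestion, here is a minimal sketch. The sample $content string is made up for the example, and the ENT_QUOTES and 'UTF-8' arguments are assumptions about what you would want. Note that this escapes the script rather than deleting it.

```php
<?php
// Hypothetical input: user-submitted markup containing a script tag.
$content = '<p>Hi</p><script>alert("x");</script>';

// htmlspecialchars() removes nothing; it converts <, >, & and quotes
// to HTML entities, so the browser shows the markup as plain text.
$escaped = htmlspecialchars($content, ENT_QUOTES, 'UTF-8');

echo $escaped, "\n";
// &lt;p&gt;Hi&lt;/p&gt;&lt;script&gt;alert(&quot;x&quot;);&lt;/script&gt;
```

    This suits content that should never contain HTML at all. If the surrounding markup has to survive, the regular expression approach is the one to look at.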

  4. #4
    Let me try to understand.

    '<script[^>]*?>.*?</script>'

    What is the ' for?

    I think it tries to find <script, followed by anything that is not >, and then a question mark?

    The question mark is equivalent to {0,1}, which explains <script[^>]*.

    Now, the .* is suspicious. The . does not match newline characters, and some scripts span several lines. How can it match scripts that span several lines?

    Also, what does the /si at the end do?

  5. #5
    What's the difference between preg_replace and ereg_replace?

  6. #6
    Join Date
    Feb 2005
    Location
    Australia
    Posts
    5,842
    Quote Originally Posted by hardjoko
    '<script[^>]*?>.*?</script>' What is the ' for?
    It's just a character used instead of / to define the ends of the regexp.

    Quote Originally Posted by hardjoko
    I think it tries to find <script, followed by anything that is not >, and then a question mark?

    The question mark is equivalent to {0,1}, which explains <script[^>]*.
    Yes, that's looking for anything that's not >, followed by a >. After a * the ? makes the match non-greedy rather than optional. I think it may be redundant here, but it's still a special character and not part of the search string.

    Quote Originally Posted by hardjoko
    Now, the .* is suspicious. The . does not match newline characters, and some scripts span several lines. How can it match scripts that span several lines?
    .*? means search for any number of any characters, but don't be greedy, so you can still match on the </script> afterwards.

    Quote Originally Posted by hardjoko
    Also, what does the /si at the end do?
    s makes . match newlines as well, so the multi-line string is treated as a single line; i makes the match case-insensitive.

    http://php.net/manual/en/ref.pcre.php
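
    A small sketch of what the s and i modifiers change, using the pattern quoted above. The sample $content string and the variable names are made up for the example.

```php
<?php
// Hypothetical sample: an upper-case tag whose body spans several lines.
$content = "before<SCRIPT type=\"text/javascript\">\nvar a = 1;\n</SCRIPT>after";

// No modifiers: case-sensitive, and . stops at newlines, so nothing is replaced.
$plain = preg_replace("'<script[^>]*?>.*?</script>'", 'Hello', $content);

// i only: the tags now match, but . still cannot cross the newlines.
$iOnly = preg_replace("'<script[^>]*?>.*?</script>'i", 'Hello', $content);

// s and i together: . also matches newlines, so the whole element is replaced.
$both = preg_replace("'<script[^>]*?>.*?</script>'si", 'Hello', $content);

var_dump($plain === $content); // bool(true)
var_dump($iOnly === $content); // bool(true)
echo $both, "\n";              // beforeHelloafter
```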
    Chris

    "Some problems are so complex that you have to be highly intelligent and well informed just to be undecided about them." - Laurence J. Peter

  7. #7
    I think you need the m modifier.

    This works:

    PHP Code:
    <?php
    $html = '
    <html>
    <div id="myid">
    There is no JavaScript here.
    </div>
    <script language="JavaScript">
    var s="There is JavaScript on this page.";
    var e=document.getElementById("myid");
    if (e) e.innerHTML=s;
    </script>
    </html>';

    $html = preg_replace('/(<script)(.+)(<\/script>)/mis', '', $html);

    echo $html;
    ?>

  8. #8
    Join Date
    Mar 2006
    Posts
    965
    This line:

    PHP Code:
    $html=preg_replace('/(<script)(.+)(<\/script>)/mis','',$html); 
    could probably be written like this too:

    PHP Code:
    $html=preg_replace('/(<script)(.+)(<\/script>)/siU','',$html); 

  9. #9
    Join Date
    Feb 2005
    Location
    Australia
    Posts
    5,842
    [Sigh] The original as quoted works correctly, unlike most of the corrections.

    It's not missing the end / and it does not need the m modifier, but if you change it and leave out the rather important .*? then it does need the U modifier, as in horizon's post, to make the matches non-greedy.
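
    To make the greedy versus non-greedy point concrete, here is a minimal sketch. The two-script $html sample and the variable names are made up for the example. The m modifier is left out because it only changes how ^ and $ behave, and none of these patterns use them.

```php
<?php
// Hypothetical sample: two separate script elements with text between them.
$html = '<script>a();</script><p>Keep me</p><script>b();</script>';

// Greedy .+ runs to the LAST </script>, so the paragraph in between is lost.
$greedy = preg_replace('/(<script)(.+)(<\/script>)/is', '', $html);

// U flips the default to non-greedy, so each script element matches on its own.
$ungreedy = preg_replace('/(<script)(.+)(<\/script>)/isU', '', $html);

// The originally quoted pattern is already lazy thanks to .*? and needs no U.
$lazy = preg_replace("'<script[^>]*?>.*?</script>'si", '', $html);

var_dump($greedy);   // string(0) ""
var_dump($ungreedy); // string(14) "<p>Keep me</p>"
var_dump($lazy);     // string(14) "<p>Keep me</p>"
```

    With only one script element on the page the greedy version appears to work, which is probably why the difference is easy to miss when testing.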
    Chris
