Phoenix Business Technologies Group

Posts tagged "tools"

Strip Attributes from HTML

Posted by Jim Podroskey on 2009-07-17 tagged html, php, regex, tools

Sometimes when you're redoing a site, you've got a lot of moldy old HTML you want to convert. At times like this, tools like Tidy can really come in handy.

Just recently I needed to remove all id, class, and style attributes from some spaghetti HTML. It only took a few simple regexps, but I figured I would release the script I wrote in case anyone ever needs it.

The regular expression I used is:

  $attribute = 'style'; // or 'class', or anything
  // a space, the attribute name, '=', optional whitespace/newlines, then a double-quoted value
  $pattern = '/ '.$attribute.'=[\n\r\f ]*"[\w :#;-]+"/';
  return preg_replace($pattern, '', $text);

This takes care of both the attribute and its value. It also catches the newlines that Tidy likes to insert after the '=' sign in an attribute definition.
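
To make that concrete, here is a minimal sketch that wraps the pattern in a function and runs it over a Tidy-style fragment. The function name strip_attribute and the sample markup are assumptions made up for this example; they are not taken from the downloadable script.

```php
<?php
// Hypothetical helper: wraps the pattern from the post in a function.
function strip_attribute(string $text, string $attribute): string
{
    // A space, the attribute name, '=', optional whitespace/newlines,
    // then a double-quoted value made of word chars, space, ':', '#', ';', '-'.
    // Note: \B must not appear inside the [...] class; PCRE2 (PHP 7.3+) rejects it.
    $pattern = '/ ' . preg_quote($attribute, '/') . '=[\n\r\f ]*"[\w :#;-]+"/';
    return preg_replace($pattern, '', $text);
}

// Sample input (made up): Tidy has wrapped the style value onto a new line.
$html = "<p class=\"intro\" style=\n\"color: #333;\">Hello</p>";

echo strip_attribute($html, 'style'), "\n";
// prints: <p class="intro">Hello</p>
```

The class attribute is left alone because only the attribute name passed in is matched.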

Here's how to use the script:

      Strip class, id, or style attributes from an X/HTML document.
      Usage:
        php stripattributes.php [-icsm] filename
      Options:
        -i: strip all IDs
        -c: strip all classes
        -s: strip all styles
        -m: modify the original file and do not write the buffer to stdout
      Example:
        php stripattributes.php -ic myfile.html - Strips all the classes and IDs from
                                                  myfile.html and returns it to stdout
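
The script itself is not reproduced in this post. As a rough sketch of how the options above could map onto the regular expression, assuming each flag simply selects one attribute name, something like the following would do. The function names attributes_for_flags and strip_attributes and the flag map are hypothetical and only meant to illustrate the idea.

```php
<?php
// Hypothetical mapping from the command-line flags to attribute names.
// -m is a mode switch rather than an attribute, so it is not in the map.
function attributes_for_flags(string $flags): array
{
    $map = ['i' => 'id', 'c' => 'class', 's' => 'style'];
    $attributes = [];
    foreach (str_split($flags) as $flag) {
        if (isset($map[$flag])) {
            $attributes[] = $map[$flag];
        }
    }
    return $attributes;
}

// Hypothetical helper: strips each requested attribute in turn,
// using a pattern based on the one in the post.
function strip_attributes(string $text, array $attributes): string
{
    foreach ($attributes as $attribute) {
        $pattern = '/ ' . preg_quote($attribute, '/') . '=[\n\r\f ]*"[\w :#;-]+"/';
        $text = preg_replace($pattern, '', $text);
    }
    return $text;
}

// Equivalent of passing -ic on the command line.
$html = '<div id="main" class="wide" style="color: red;">Hi</div>';
echo strip_attributes($html, attributes_for_flags('ic')), "\n";
// prints: <div style="color: red;">Hi</div>
```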

You can download it here:

http://pbtg.com/uploads/stripattributes.php.zip

Just run it from the command line with whatever options you want and it should work. I would avoid the -m option unless you have a backup, since it overwrites your original file.

Please let me know if you find any obvious instances where this doesn't work.
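
For anyone hunting for edge cases: the value part of the pattern only allows word characters, spaces, ':', '#', ';' and '-', so attributes whose values contain anything else (the '(' or '/' in a url(), say) or that use single quotes are left untouched. The sketch below shows this next to a looser alternative. The alternative is a variation for illustration only, not the pattern used in the script, and both function names are made up for the example.

```php
<?php
// Pattern based on the one in the post.
function strip_strict(string $text, string $attribute): string
{
    $pattern = '/ ' . preg_quote($attribute, '/') . '=[\n\r\f ]*"[\w :#;-]+"/';
    return preg_replace($pattern, '', $text);
}

// Hypothetical looser variant: accepts any double- or single-quoted value.
function strip_loose(string $text, string $attribute): string
{
    $pattern = '/\s+' . preg_quote($attribute, '/') . '\s*=\s*(?:"[^"]*"|\'[^\']*\')/i';
    return preg_replace($pattern, '', $text);
}

// Sample input (made up for this example).
$html = '<p style="background: url(img/bg.png)" class=\'note\'>Text</p>';

// '(', '/', '.' and ')' are outside the allowed value characters, so no match.
var_dump(strip_strict($html, 'style') === $html); // bool(true)

// The looser pattern removes both attributes.
echo strip_loose(strip_loose($html, 'style'), 'class'), "\n";
// prints: <p>Text</p>
```

Either way, a regular expression only approximates HTML parsing, so it is worth diffing the output against the original before trusting it.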

Thanks!