Strip Attributes from HTML
Sometimes when you're redoing a site, you've got a lot of moldy old HTML you want to convert. At times like this, tools like Tidy can really come in handy.
Just recently I needed to remove all id, class, and style attributes from some spaghetti HTML. It only took a few simple regexps, but I figured I would release the script I wrote in case anyone ever needs it.
The regular expression I used is:
- $attribute = 'style'; //or 'class', or anything
- $pattern = '/ '.$attribute.'=[\n\r\f ]*"[\w :#;-]+"/';
This takes care of both the attribute and its value. It'll also catch the newlines that Tidy likes to put in after the '=' sign in the attribute definition.
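To see the idea in action, here is a minimal sketch of running a pattern like this through preg_replace(). It is not lifted from the script itself: the function name stripAttribute is made up for illustration, and I've assumed `\s*` is a fair stand-in for the whitespace that can follow the '=' sign.

```php
<?php
// Hypothetical helper for illustration; not the code from stripattributes.php.
function stripAttribute(string $html, string $attribute): string
{
    // preg_quote() keeps any regex metacharacters in the attribute name literal.
    // Assumption: \s* covers the spaces and newlines Tidy may leave after '='.
    $pattern = '/ ' . preg_quote($attribute, '/') . '=\s*"[\w :#;-]+"/';

    // preg_replace() returns null on a regex error; fall back to the input.
    return preg_replace($pattern, '', $html) ?? $html;
}

// Sample input with a Tidy-style line break after "style=".
$html = "<p class=\"intro\" style=\n\"color: #333;\">Hello</p>";

// Strip one attribute at a time.
$clean = stripAttribute(stripAttribute($html, 'class'), 'style');
echo $clean, "\n"; // prints <p>Hello</p>
```

Note that the value part only allows word characters, spaces, and `:#;-`, so a value containing something like a dot or a percent sign would be left alone.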
Here's how to use the script:
Strip class, id, or style attributes from an X/HTML document.
Usage: php stripattributes.php [-icsm] filename
Options:
  -i: strip all IDs
  -c: strip all classes
  -s: strip all styles
  -m: modify the original file and do not write the buffer to stdout
Example:
  php stripattributes.php -ic myfile.html
  - Strips all the classes and IDs from myfile.html and returns it to stdout
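If you're curious how flags like these could translate into attribute names, here is a rough sketch. The function attributesForFlags and its lookup table are my own illustration of the idea, not the code in the download, which may be organized differently.

```php
<?php
// Hypothetical mapping from option letters to attribute names.
function attributesForFlags(string $flags): array
{
    $map = ['i' => 'id', 'c' => 'class', 's' => 'style'];
    $attributes = [];

    // Drop a leading dash so both "-ic" and "ic" are accepted.
    foreach (str_split(ltrim($flags, '-')) as $flag) {
        // Letters without an attribute (such as "m") are simply skipped.
        if (isset($map[$flag])) {
            $attributes[] = $map[$flag];
        }
    }

    return $attributes;
}

echo implode(', ', attributesForFlags('-ic')), "\n"; // prints id, class
```

Each name in the resulting list can then be dropped into the pattern as $attribute, one pass per attribute.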
You can download it here:
http://pbtg.com/uploads/stripattributes.php.zip
Just run that from the command line with whatever options you want and it should be OK. I would refrain from using the -m option, since it will overwrite your original file.
Please let me know if you find any obvious instances where this doesn't work.
Thanks!





