So there I was, taking forever to fix content from a WordPerfect document that had been converted to Word and then pasted as HTML. I spent a while removing all the bullshit non-ASCII characters by hand before I said to myself: Wes, you idiot, you are a programmer, do this the right way and filter them out. So I did, and here is a simple way of doing it with regular expressions and PHP.
<?php
//Your nasty string of Word and non-ASCII chars
$Contentz = "These are shitty chars „ and we don't like them nor want them.";
//Array of content I want to turn into a plain space
//(assumption: the entry here was a non-breaking space, which looks like a space but is not one)
$badContent = array("\xC2\xA0", "&nbsp;");
//Replace the bad content with a space
$Contentz = trim(str_replace($badContent," ",$Contentz));
//Specific replacements for the ellipsis, etc. that you want replaced rather than removed
$theBad = array("“","”","‘","’","…","—","–");
$theGood = array("\"","\"","'","'","...","-","-");
$Contentz = str_replace($theBad,$theGood,$Contentz);
//Whatever might be left over...
//Remove all remaining non-ASCII chars (aka: bad Microsoft Word and WordPerfect shit)
//Keeps printable ASCII (\x20-\x7E) and newlines (\x0A); everything else is stripped
$Contentz = preg_replace('/[^\x20-\x7E\x0A]+/','', $Contentz);
echo $Contentz;
?>
$Contentz now prints with the non-ASCII characters removed.
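If you need this in more than one place, the same steps can be wrapped in a function. This is only a rough sketch: the name strip_non_ascii is made up for illustration, it assumes the input is UTF-8 and that you are on PHP 7 or later (for the type declarations), and it writes the fancy punctuation as byte escapes so the encoding of the PHP file itself cannot mangle them.

```php
<?php
// Hypothetical helper (not part of the original snippet): the same steps in one function.
// Assumes UTF-8 input and PHP 7+.
function strip_non_ascii(string $text): string
{
    // Map common "smart" punctuation to plain ASCII before stripping.
    // Keys are UTF-8 byte sequences written as escapes.
    $replacements = array(
        "\xE2\x80\x9C" => '"',   // left double quote
        "\xE2\x80\x9D" => '"',   // right double quote
        "\xE2\x80\x98" => "'",   // left single quote
        "\xE2\x80\x99" => "'",   // right single quote
        "\xE2\x80\xA6" => '...', // ellipsis
        "\xE2\x80\x94" => '-',   // em dash
        "\xE2\x80\x93" => '-',   // en dash
        "\xC2\xA0"     => ' ',   // non-breaking space
    );
    $text = strtr($text, $replacements);

    // Drop every remaining byte outside printable ASCII (0x20-0x7E) and newline (0x0A).
    $text = preg_replace('/[^\x20-\x7E\x0A]+/', '', $text);

    return trim($text);
}

// Input is: left quote, Hello, right quote, space, en dash, space, "cafe" with an accented e, ellipsis.
echo strip_non_ascii("\xE2\x80\x9CHello\xE2\x80\x9D \xE2\x80\x93 caf\xC3\xA9\xE2\x80\xA6"), "\n";
// prints: "Hello" - caf...
```

Note that the accented letter is dropped entirely rather than turned into a plain "e", so this is only a good fit when losing non-English characters is acceptable.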
Cheers,
Wes S. Ray
Wow…I’ve needed something like this before. Way to go, and thanks!
I don’t understand why you haven’t made a million dollars in book sales yet!
why do you keep sharing company secrets?
1 minute: Why are there non-ASCII characters in your target document? Could some formatting information have been lost while converting from the old WordPerfect document? Or was there French or Arabic text in it?
For your solution simply use
> strings mydoc.wpf > mydoc.txt
Good night …
Christian, look at the title (with PHP and regular expressions). Why do people create huge PHP classes to zip files when you can use “zip” from a CLI? This is just a solution for people who are new to programming and paste text into an IDE like Dreamweaver, leaving characters that look like spaces but parse as ╤⌂Ñ weird shit.
[...] I have been looking for a great function to do this for awhile and I finally found one… does the job perfectly! I only need to concern myself with English so am not worried about losing non-ascii characters that might make up Arabic or some other language. Kudos to Wes for originally writing and posting this on his blog here: http://www.wessray.com/php/strip-and-remove-non-ascii-characters-using-php-regular-expressions/ [...]