Web Hosting Talk







View Full Version : ms word-doc processing


Angelo
05-27-2006, 07:30 PM
Hello,

I need a tool that accepts formatted text pasted directly from Microsoft Word (copy & paste) and converts it into an HTML file. What I currently use is

http://www.dynamicdrive.com/dynamicindex16/richtexteditor/index.htm

However, it inserts so many <xml>-style tags that (I think) they are corrupting the view. I am replacing the xml tags with empty strings, but that does not give a clean view either. Is there a smooth way of doing this: pasting formatted content from a doc into a form area and processing it to save as an HTML file to be included in my dynamic publisher script?

Thanks

mwatkins
05-28-2006, 12:20 AM
HTML Tidy... install it if not already on your system. Call it from your script. To see what options it supports: tidy -help-config

You'll be somewhat interested in the "word-2000" options and a few others.

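In Python I'd write the contents of your form to a file or tempfile, and open tidy like so (see os.popen in this snippet of code):
import os

options = {}
# we want to force these defaults regardless
options['quiet'] = 1
options['force_output'] = 1
options['tidy_mark'] = 0
options['markup'] = 1
# encoding is set in the "other stuff" not shown here; default when unset
if encoding:
    options['char_encoding'] = encoding
else:
    encoding = 'utf8'  # tidy spells the name without a hyphen
    options['char_encoding'] = encoding

# ... other stuff
# tidy takes each setting as "--option-name value" on the command line,
# so build that string instead of interpolating the dict itself
args = " ".join(["--%s %s" % (name.replace('_', '-'), value)
                 for name, value in options.items()])
# htmlfile is the file or tempfile the form contents were written to
fp = os.popen("/usr/bin/env tidy %s '%s' 2>&1" % (args, htmlfile.name), "r")
# ... other stuff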

There's a bit more to it than that, but the point I'm making here is that Tidy is useful for cleaning cruft out of cut-and-pasted HTML, including from Word, and you can find a way to integrate that capability into your app with your language of choice.
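For anyone who wants something closer to a complete call, here is a minimal sketch of the same idea using subprocess and tempfile. The function names build_tidy_args and tidy_html are made up for this example, and it assumes a tidy binary is on the PATH and that the pasted text can be handled as UTF-8. It is one way to wire it up, not the only one.

```python
import os
import shutil
import subprocess
import tempfile


def build_tidy_args(options):
    """Turn {'force_output': 1} into ['--force-output', 'yes'] for the tidy CLI."""
    args = []
    for name, value in sorted(options.items()):
        if isinstance(value, (bool, int)):
            value = "yes" if value else "no"
        args += ["--" + name.replace("_", "-"), str(value)]
    return args


def tidy_html(html, word_2000=True):
    """Run pasted HTML through tidy and return the cleaned markup as a string."""
    if shutil.which("tidy") is None:
        raise RuntimeError("tidy is not installed or not on PATH")
    options = {
        "quiet": 1,
        "force_output": 1,
        "tidy_mark": 0,
        "markup": 1,
        "char_encoding": "utf8",
        "word_2000": word_2000,
    }
    # Write the form contents to a temp file that tidy can read.
    with tempfile.NamedTemporaryFile("w", suffix=".html", encoding="utf-8",
                                     delete=False) as tmp:
        tmp.write(html)
    try:
        # tidy exits with 1 on warnings and 2 on errors, so no check=True here.
        result = subprocess.run(["tidy"] + build_tidy_args(options) + [tmp.name],
                                stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    finally:
        os.unlink(tmp.name)
    return result.stdout.decode("utf-8", errors="replace")
```

Passing the arguments as a list means the temp file name never goes through a shell, so there is no quoting to worry about. Calling tidy_html(form_text) would then return the cleaned page, where form_text stands for whatever your form handler received.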

Angelo
05-28-2006, 05:54 AM
I do not have system-level access to the server my application runs on. Does Tidy keep the formatting as it is, such as table backgrounds, fonts, and colors from MS Word?

Burhan
05-28-2006, 06:04 AM
Yes, it just tidies up the HTML. To be safe, try it out first.

mwatkins
05-28-2006, 09:33 AM
Visit http://tidy.sourceforge.net/

If you don't have a unix machine to test on, someone has kindly made Windows builds. Whatever you do, you'll find that it's a partial measure, but at least tidy will take you a good part of the distance.
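If tidy cannot be installed on the host at all, a rough pure-Python pre-clean is another partial measure. The sketch below is an assumption about what the pasted Word markup typically looks like (conditional comments, <xml> blocks, namespaced tags such as <o:p>, and mso-* style declarations); the name strip_word_markup is made up for this example. Replacing only the tags with empty strings leaves the contents of the <xml> blocks behind, which may be why the page still looks wrong, so this removes those blocks whole while leaving ordinary fonts and colors alone.

```python
import re

# Comments, including Word's <!--[if gte mso 9]> ... <![endif]--> blocks.
_COMMENTS = re.compile(r"<!--.*?-->", re.S)
# Whole <xml>...</xml> blocks; their contents are Word settings, not page text.
_XML_BLOCKS = re.compile(r"<(xml)\b[^>]*>.*?</\1>", re.I | re.S)
# Namespaced tags such as <o:p> or </w:WordDocument>; only the tag is dropped.
_NS_TAGS = re.compile(r"</?\w+:\w+\b[^>]*>")
# mso-* declarations, normally found inside style="..." attributes.
# This is a plain text match, so body text that looks like one would go too.
_MSO_DECL = re.compile(r"\s*mso-[^:;\"']+:[^;\"']*;?", re.I)


def strip_word_markup(html):
    """Remove Word-specific markup while keeping ordinary tags and styles."""
    html = _COMMENTS.sub("", html)
    html = _XML_BLOCKS.sub("", html)
    html = _NS_TAGS.sub("", html)
    html = _MSO_DECL.sub("", html)
    return html


sample = ('<p class="MsoNormal" style="color:red;mso-fareast-language:EN-US">'
          'Hi<o:p></o:p></p>'
          '<!--[if gte mso 9]><xml><w:WordDocument>x</w:WordDocument></xml><![endif]-->')
print(strip_word_markup(sample))
# prints: <p class="MsoNormal" style="color:red;">Hi</p>
```

Regular expressions are not a real HTML parser, so treat this as a first pass and check the result in a browser.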