This may be a silly question, but it's something that's bothered me ever since S...

natrius · on July 12, 2007

My first guess would be Abiword. In the course of putting together an open source word processor that can handle a couple of different closed file formats, they've spun off their code into libraries. The wvWare library handles Word files. http://abiword.com/projects/

As far as I know, the whole Abiword project is GPL. That shouldn't matter much for server-side code, unless you're letting your customers host the service themselves, as Versionate seems to be planning. In that case, I guess you could just run a thin wrapper around the library as its own program and pipe its output to the rest of your code.
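A rough sketch of the "pipe the output from a thin wrapper" idea in Python, assuming the converter is a separate command-line program that writes its result to stdout. The function name `convert_via_subprocess` is hypothetical, and no particular converter command is assumed: you pass in whatever command your wrapper actually is. The point is just that your code talks to the converter over a pipe instead of linking against the library.

```python
import subprocess
import sys
from typing import Sequence


def convert_via_subprocess(cmd: Sequence[str], timeout: float = 60.0) -> str:
    """Run an external converter command and return what it writes to stdout.

    `cmd` is a hypothetical placeholder: it should be the argument list for
    your own thin wrapper around the conversion library.
    """
    result = subprocess.run(
        list(cmd),
        capture_output=True,  # collect the converter's stdout/stderr
        text=True,            # decode output to str
        timeout=timeout,      # don't let a stuck conversion hang the backend
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return result.stdout


if __name__ == "__main__":
    # Stand-in "converter": a tiny Python one-liner that prints some text.
    print(convert_via_subprocess([sys.executable, "-c", "print('converted text')"]))
```

Running the converter as a child process also means a crash or hang on a malformed document kills only that process, and the timeout keeps it from tying up the rest of the app.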

martin · on July 12, 2007

It could be done that way, but frankly, Abiword's Word importer isn't very good. OpenOffice's, while not perfect, is much better. Unfortunately, unlike Abiword, OOo doesn't come with a convenient command-line utility for these conversions. OOo does have a VBA-like macro language for automating tasks like this, but it's much better suited to "interactive" use than to running as part of another app's backend.

Another thing to note is that some of Scribd's backend is written in C#. If your app is Windows-based, maybe there are Office API calls that let you do this kind of conversion. Just a guess, though.