OpenOffice.org Forum at OOoForum.org
 

Use the .doc reader and .ps writer code as standalone

 
OOoForum.org Forum Index -> OpenOffice.org Macros and API
hthb
General User


Joined: 11 Feb 2004
Posts: 6
Location: Iceland

Posted: Wed Feb 11, 2004 9:05 am    Post subject: Use the .doc reader and .ps writer code as standalone

I have this website, http://www.doc2pdf.net, in which I utilise OpenOffice to convert .doc files to .pdf for free. The PHP/bash code is an ugly hack: it uses xvfb-run (to create a virtual framebuffer for OOo) and runs OpenOffice from the command line as a special user (using sudo) with the -p option:
Code:

sudo -H -u openofficeuser xvfb-run -a --server-num=2 -e /var/www/doc2pdf/convert/logs/xvfb/xvfb.error.log ooffice -p /var/www/doc2pdf/convert/uploads/$1

Needless to say, this is quite heavy on my system. Each conversion takes a minimum of 15 seconds. When many people try to convert at once, I run out of virtual displays in X. Still, the service mostly does what is intended.
I would like to use the Java code which reads and parses .doc files standalone, without running the whole OpenOffice suite. I would like it to be more like a web service. Can someone point me in the right direction? Thanks a lot.
DannyB
Moderator


Joined: 02 Apr 2003
Posts: 3991
Location: Lawrence, Kansas, USA

Posted: Wed Feb 11, 2004 10:56 am

I really believe that OpenOffice.org is going to offer you one of the very best ways to convert DOC to PDF.

What you have done so far is impressive, at least in terms of the knowledge and effort expended.

May I suggest another architecture?

Write the web page in Java, using servlets or JavaServer Pages (JSP). This is really not particularly difficult to do in Tomcat.

In your Java code, accept the uploaded file and place a "job" structure onto a queue. A single Java thread services all jobs in the queue, processing them in turn. The page waits until its entry in the queue has been processed, streams the resulting PDF file back to the client browser, and then deletes the temp files containing the DOC and PDF.

The service thread just takes entries off the queue and processes them one at a time. (This way, you are never trying to make OOo do more than one conversion at a time. Judging from past discussions here, OOo is apparently not thread safe.)

The service thread connects to OOo, does the conversion (which is trivial through the API), and then records the outcome, such as status flags and the pathname of the resulting PDF, in the job structure on the queue.
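The queue and single service thread described above could be sketched in Java roughly as follows. This is only an illustration under assumptions: the names ConversionQueue, submit and the converter function are hypothetical, and the converter is a stand-in that just computes an output file name instead of talking to OOo. A single-thread executor plays the role of the job queue plus service thread, and the returned future plays the role of the "job" structure the page waits on.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

/** Hypothetical name: serialises all conversions onto one service thread. */
final class ConversionQueue implements AutoCloseable {
    // A single-thread executor is a FIFO job queue with exactly one worker thread.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    // Placeholder for the real "connect to OOo and convert" step: input path -> PDF path.
    private final Function<Path, Path> converter;

    ConversionQueue(Function<Path, Path> converter) {
        this.converter = converter;
    }

    /** Called from request threads; the returned future is the job the page waits on. */
    CompletableFuture<Path> submit(Path input) {
        return CompletableFuture.supplyAsync(() -> converter.apply(input), worker);
    }

    @Override
    public void close() {
        // Lets already queued jobs finish, then stops the service thread.
        worker.shutdown();
    }

    public static void main(String[] args) {
        // Stand-in converter (assumption): only derives the output name, no OOo involved.
        Function<Path, Path> fake =
                in -> Paths.get(in.toString().replaceAll("\\.doc$", ".pdf"));
        try (ConversionQueue queue = new ConversionQueue(fake)) {
            CompletableFuture<Path> job = queue.submit(Paths.get("report.doc"));
            // join() blocks the calling (request) thread until the worker has finished.
            System.out.println(job.join()); // prints report.pdf
        }
    }
}
```

In a servlet, each request would call submit() and block on join() (or a timed get()) before streaming the file back, so any number of request threads can wait while only one conversion runs at a time.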

OOo would need to be running continuously. Alternatively, the service thread could launch OOo if it is not already running and process however many jobs are in the queue. If the service thread notices that there have been no jobs to process for a while, it could connect to OOo via the API and call the terminate() method on the Desktop service to quit OOo.
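The launch-on-demand and quit-when-idle behaviour could look something like the sketch below. Everything named here is an assumption for illustration: OfficeBackend is a hypothetical interface standing in for "launch OOo", "convert one file" and "call terminate() on the Desktop service", and the idle timeout value is arbitrary. The only technique shown is a timed wait on the queue: while the backend is up the worker waits a bounded time for the next job, and a timeout is treated as "idle".

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical service thread body: starts the backend lazily, stops it when idle. */
final class IdleAwareWorker implements Runnable {
    private final BlockingQueue<String> jobs = new LinkedBlockingQueue<>();
    private final OfficeBackend office;
    private final long idleMillis;
    private boolean running = false; // touched only by the service thread

    IdleAwareWorker(OfficeBackend office, long idleMillis) {
        this.office = office;
        this.idleMillis = idleMillis;
    }

    /** Called by request threads to hand over an input path. */
    void enqueue(String inputPath) {
        jobs.add(inputPath);
    }

    @Override
    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                // Backend up: wait at most idleMillis. Backend down: block until work arrives.
                String job = running ? jobs.poll(idleMillis, TimeUnit.MILLISECONDS) : jobs.take();
                if (job == null) { // idle timeout expired with nothing to do
                    office.stop();
                    running = false;
                    continue;
                }
                if (!running) {
                    office.start();
                    running = true;
                }
                office.convert(job);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        } finally {
            if (running) {
                office.stop();
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        OfficeBackend printing = new OfficeBackend() {
            public void start() { System.out.println("start office"); }
            public void stop() { System.out.println("stop office"); }
            public void convert(String path) { System.out.println("convert " + path); }
        };
        IdleAwareWorker worker = new IdleAwareWorker(printing, 200);
        Thread thread = new Thread(worker, "ooo-service");
        thread.start();
        worker.enqueue("a.doc");
        Thread.sleep(1000);  // long enough for the idle timeout to fire
        thread.interrupt();  // shut the service thread down
        thread.join();
    }
}

/** Hypothetical stand-in for launching OOo, converting a file, and terminating OOo. */
interface OfficeBackend {
    void start();
    void stop();
    void convert(String inputPath);
}
```

Because start() and stop() are only ever called from the service thread, no extra locking is needed around the backend's lifecycle.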

You only need one copy of OOo and one virtual X framebuffer at any given time. The design is thread safe, because only the service thread ever talks to OOo. It can process as many requests as you want. It could even do multiple types of conversions:

Excel --> PDF
Word --> PDF
PowerPoint --> PDF
PowerPoint --> Flash
etc., etc.

Finally, you could write a "web service" front end to the same service thread, rather than a normal web page.
Guest






Posted: Wed Feb 11, 2004 11:48 am    Post subject: I would rather like to use the power of the GPL...

... and just rip the code out of OOo and use it standalone, if that's possible.
DannyB
Moderator


Joined: 02 Apr 2003
Posts: 3991
Location: Lawrence, Kansas, USA

Posted: Wed Feb 11, 2004 1:52 pm    Post subject: Re: I would rather like to use the power of the GPL...

Anonymous wrote:
... and just rip the code out of OOo and use it standalone, if that's possible.


As long as you obey the license, this would not be a problem.

Finding the relevant code is perhaps the first obstacle. (I've never looked at the OOo source.) You're after the code for the import and export filters: the import filter for Word and the export filter for PDF.

I suspect that the Word import filter simply generates the necessary XML or internal DOM structure for an OOo document. The export filter conversely takes the DOM structure, or a stream of XML events (SAX), and generates a PDF (possibly as PostScript first?).

By the time you implement all of the infrastructure to support both filters, you have implemented much of the document model of an OOo Writer document. That is, if you want to support ALL of the import features that Writer's Word Import filter already supports. And if you want to support ALL of the current Export features. You pretty much need an intermediate form that represents everything that a Writer document can have.

Is this really an easier implementation than what I suggested above? I don't think so, but of course it would be interesting to see someone do it.
Guest






Posted: Fri Feb 13, 2004 3:42 am    Post subject: probably harder work

It's probably harder work, but then I could run the service without needing to load X and the other OO stuff into memory, cutting a lot of overhead. As I am going to move my site to a hosting company [linnode.com] which lets you have a virtual server with root access but not much memory (64 megs cheapest, max 256 megs), it would probably be a good idea. Thanks for the tips! I appreciate it.
ken e
Guest





Posted: Tue Apr 06, 2004 7:27 am

Has anyone made any progress with this approach? I would like to be able to convert a PowerPoint file to HTML (ideally, with the animations intact). I'd like to leverage the filters OpenOffice provides to incorporate this functionality into an existing product. Thus, I'd like to be able to simply link in the modules required. Any idea how difficult this would be?
ouppsss
Newbie


Joined: 09 Aug 2004
Posts: 3

Posted: Mon Aug 09, 2004 4:39 am

DannyB wrote:
I really believe that OpenOffice.org is going to offer you one of the very best ways to convert DOC to PDF.

What you have done so far is impressive, at least in terms of the knowledge and effort expended.

May I suggest another architecture?

Write the web page in Java, using servlets or JavaServer Pages (JSP). This is really not particularly difficult to do in Tomcat.

In your Java code, accept the uploaded file and place a "job" structure onto a queue. A single Java thread services all jobs in the queue, processing them in turn. The page waits until its entry in the queue has been processed, streams the resulting PDF file back to the client browser, and then deletes the temp files containing the DOC and PDF.

The service thread just takes entries off the queue and processes them one at a time. (This way, you are never trying to make OOo do more than one conversion at a time. Judging from past discussions here, OOo is apparently not thread safe.)

The service thread connects to OOo, does the conversion (which is trivial through the API), and then records the outcome, such as status flags and the pathname of the resulting PDF, in the job structure on the queue.

OOo would need to be running continuously. Alternatively, the service thread could launch OOo if it is not already running and process however many jobs are in the queue. If the service thread notices that there have been no jobs to process for a while, it could connect to OOo via the API and call the terminate() method on the Desktop service to quit OOo.

You only need one copy of OOo and one virtual X framebuffer at any given time. The design is thread safe, because only the service thread ever talks to OOo. It can process as many requests as you want. It could even do multiple types of conversions:

Excel --> PDF
Word --> PDF
PowerPoint --> PDF
PowerPoint --> Flash
etc., etc.

Finally, you could write a "web service" front end to the same service thread, rather than a normal web page.



Has anyone already done this in Java? Does it work with good performance?
AndrewZ
Moderator


Joined: 21 Jun 2004
Posts: 4140
Location: Colorado, USA

Posted: Wed Feb 27, 2008 3:39 pm    Post subject: Re: Use the .doc reader and .ps writer code as standalone

Quote:
I have this website http://www.doc2pdf.net


How is your site doing? I see it is still up!

hthb wrote:
The php/bash code is an ugly hack, utilising xvfb-run (to create a virtual framebuffer for OOo) and runs Openoffice from the command line as a special user (using


This is an old post, so the following is just for the record. People still read this stuff. :)

OpenOffice.org 2.3.0+ on Linux no longer requires an X server (not even Xvfb) if you install the headless RPM package and use the -headless option. IMO, -headless is easier than Xvfb.
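For readers driving OOo from Java as suggested earlier in the thread, a headless instance that also listens for API connections might be started with a command line like the one built below. This is a sketch under assumptions: only -headless comes from this post; the -accept option is the usual OOo 2.x way to open a UNO socket, and the executable name soffice, the host 127.0.0.1 and the port 8100 are illustrative choices, not values taken from the thread.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that assembles the command line for a headless OOo instance. */
final class HeadlessOffice {

    /** Executable name and port are assumptions; adjust them for the local install. */
    static List<String> command(String executable, int port) {
        return Arrays.asList(
                executable,
                "-headless", // the option named above: no X display needed
                // Assumed listener option: accept UNO API connections on a local socket.
                "-accept=socket,host=127.0.0.1,port=" + port + ";urp;");
    }

    public static void main(String[] args) {
        // Only prints the command; nothing is launched here.
        System.out.println(String.join(" ", command("soffice", 8100)));
    }
}
```

To actually launch it, the list can be passed to `new ProcessBuilder(HeadlessOffice.command("soffice", 8100)).start()`; passing the arguments as a list avoids shell quoting problems with the semicolons in the -accept string.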

Quote:
I would like to use the Java code which reads and parses .doc files standalone, without running the whole OpenOffice suite. I would like it to be more like a web service. Can someone point me in the right direction? Thanks a lot.


http://www.artofsolving.com/opensource/jodconverter is such a web service

Quote:
I would like to be able to convert a powerpoint file to html (ideally, with the animations intact).


Here is a simple, light-weight approach:
Batch command line file conversion with PyODConverter
All times are GMT - 8 Hours