hthb General User

Joined: 11 Feb 2004 Posts: 6 Location: Iceland
|
Posted: Wed Feb 11, 2004 9:05 am Post subject: Use the .doc reader and .ps writer code as standalone |
|
|
I run the website http://www.doc2pdf.net, which uses OpenOffice to convert .doc files to .pdf for free. The PHP/bash code is an ugly hack: it uses xvfb-run to create a virtual framebuffer for OOo and runs OpenOffice from the command line as a special user (via sudo) with the -p option:
| Code: |
sudo -H -u openofficeuser xvfb-run -a --server-num=2 -e /var/www/doc2pdf/convert/logs/xvfb/xvfb.error.log ooffice -p "/var/www/doc2pdf/convert/uploads/$1"
| .
Needless to say, this is quite heavy on my system. Each conversion takes a minimum of 15 seconds. When many people try to convert at once, I run out of virtual displays in X. Still, the service mostly does what it is meant to do.
I would like to convert the Java code which reads and parses .doc files without running the whole OpenOffice suite. I would like it to be more like a web service. Can someone point me in the right direction? Thanks a lot. _________________ Look at www.doc2pdf.net for instant, free, no-registration conversion of .doc to .pdf files. |
 |
DannyB Moderator


Joined: 02 Apr 2003 Posts: 3991 Location: Lawrence, Kansas, USA
|
Posted: Wed Feb 11, 2004 10:56 am Post subject: |
|
|
I really believe that OpenOffice.org is going to offer you one of the very best ways to convert DOC to PDF.
What you have done so far is impressive, at least in terms of the knowledge and effort expended.
May I suggest another architecture?
Write a web page in Java, using servlets or JavaServer Pages (JSP). This is really not particularly difficult to do in Tomcat.
From your Java code, accept the uploaded file. Now place a "job" structure onto a queue. A single Java thread services all jobs in the queue and processes them in turn. The page waits until its entry in the queue has been processed, streams the resulting PDF file back to the client browser, and then deletes the temp files containing the DOC and PDF.
The service thread just takes entries off the queue and processes them one at a time. (This way, you are never trying to make OOo do more than one conversion at a time. Judging from past discussions here, OOo is apparently not thread safe.)
The service thread connects to OOo, does the conversion (which is trivial), and then sets the appropriate flags, the resulting PDF pathname, etc. in the job structure on the queue.
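To make the queue idea concrete, here is a minimal sketch in plain Java. It assumes a single-threaded executor stands in for the "service thread" and its queue, and that the actual OOo call is passed in as a function. The names ConversionService, submit and converter are made up for illustration and are not part of any OOo API.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical names throughout; the real OOo conversion is supplied as 'converter'.
class ConversionService {
    // One worker thread with an internal FIFO queue, so at most one
    // conversion is ever in flight.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final Function<Path, Path> converter;

    ConversionService(Function<Path, Path> converter) {
        this.converter = converter;
    }

    // Called by each web request: enqueue the job and return a handle to wait on.
    Future<Path> submit(Path docFile) {
        Callable<Path> job = () -> converter.apply(docFile);
        return worker.submit(job);
    }

    // Stop accepting jobs; already queued jobs still run.
    void shutdown() {
        worker.shutdown();
    }
}

class ConversionServiceDemo {
    public static void main(String[] args) throws Exception {
        // Stand-in converter that only renames the path; a real one would call OOo.
        ConversionService service = new ConversionService(
                doc -> Paths.get(doc.toString().replace(".doc", ".pdf")));
        Future<Path> pending = service.submit(Paths.get("report.doc"));
        // The page blocks here until its job has been processed.
        System.out.println(pending.get());
        service.shutdown();
    }
}
```

In a servlet, the request handler would call submit(...).get(), stream the returned file to the browser, and then delete the temp files.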
OOo would need to be running continuously. Alternatively, the service thread could launch OOo if it is not already running and process however many jobs are in the job queue. If the service thread notices that there have been no jobs to process for a while, it could connect to OOo via the API and call the terminate() method on the Desktop service to quit OOo.
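The start-on-demand and quit-when-idle behaviour could look roughly like this. It is only a sketch: OfficeProcess is a hypothetical interface, and a real implementation would launch OOo in ensureRunning() and call terminate() on the Desktop service in its own terminate(). The idle timeout is an assumed parameter.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical hook for the office process; not an OOo API.
interface OfficeProcess {
    void ensureRunning();
    void convert(String docPath);
    void terminate();
}

class IdleAwareWorker implements Runnable {
    private final BlockingQueue<String> jobs;
    private final OfficeProcess office;
    private final long idleMillis;

    IdleAwareWorker(BlockingQueue<String> jobs, OfficeProcess office, long idleMillis) {
        this.jobs = jobs;
        this.office = office;
        this.idleMillis = idleMillis;
    }

    @Override
    public void run() {
        boolean running = false;
        try {
            while (!Thread.currentThread().isInterrupted()) {
                // Wait up to idleMillis for the next job.
                String job = jobs.poll(idleMillis, TimeUnit.MILLISECONDS);
                if (job == null) {
                    // Nothing arrived in time: quit the office process to free memory.
                    if (running) {
                        office.terminate();
                        running = false;
                    }
                    continue;
                }
                // Launch the office process lazily, only when there is work.
                if (!running) {
                    office.ensureRunning();
                    running = true;
                }
                office.convert(job);
            }
        } catch (InterruptedException e) {
            // Restore the flag so callers can see the interruption.
            Thread.currentThread().interrupt();
        } finally {
            if (running) {
                office.terminate();
            }
        }
    }
}
```

The web layer would put paths onto a LinkedBlockingQueue and start one thread running this worker. The trade-off is that the first job after an idle period pays the OOo startup cost again.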
You only need one copy of OOo and one virtual X framebuffer at any given time. The design is thread safe, because only the service thread ever talks to OOo. It can process as many requests as you want. It could even do multiple types of conversions:
Excel --> PDF
Word --> PDF
PowerPoint --> PDF
PowerPoint --> Flash
etc., etc.
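On the OOo side, each of these conversions mostly comes down to choosing an export filter name when storing the document. Here is a small lookup sketch. The filter names are the ones I believe OOo uses, so please check them against your installation. The ExportFilters class and its "source->target" keys are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper; the filter names are assumptions to verify against
// the filter list of your OOo version.
class ExportFilters {
    private static final Map<String, String> FILTERS = new LinkedHashMap<>();
    static {
        FILTERS.put("doc->pdf", "writer_pdf_Export");     // Word --> PDF
        FILTERS.put("xls->pdf", "calc_pdf_Export");       // Excel --> PDF
        FILTERS.put("ppt->pdf", "impress_pdf_Export");    // PowerPoint --> PDF
        FILTERS.put("ppt->swf", "impress_flash_Export");  // PowerPoint --> Flash
    }

    // Returns the export filter name for a source/target extension pair.
    static String filterFor(String sourceExt, String targetExt) {
        String key = sourceExt.toLowerCase(Locale.ROOT) + "->" + targetExt.toLowerCase(Locale.ROOT);
        String name = FILTERS.get(key);
        if (name == null) {
            throw new IllegalArgumentException("No export filter known for " + key);
        }
        return name;
    }
}
```

The service thread would pass the returned name as the FilterName property when it asks OOo to store the loaded document.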
Finally, you could write a "web service" front end to the same service thread, rather than a normal web page. _________________ Want to make OOo Drawings like the colored flower design to the left? |
 |
Guest
|
Posted: Wed Feb 11, 2004 11:48 am Post subject: I would rather like to use the power of the GPL... |
|
|
| ... and just rip the code out of OOo and use it standalone, if that's possible, that is. |
 |
DannyB Moderator


Joined: 02 Apr 2003 Posts: 3991 Location: Lawrence, Kansas, USA
|
Posted: Wed Feb 11, 2004 1:52 pm Post subject: Re: I would rather like to use the power of the GPL... |
|
|
| Anonymous wrote: | | ... and just rip the code out of OOo and use it standalone, if that's possible, that is. |
As long as you obey the license, this would not be a problem.
Finding the relevant code is perhaps the first obstacle. (I've never looked at the OOo source.) You're after the code for the import and export filters: the import filter for Word and the export filter for PDF.
I suspect that the Word import filter simply generates the necessary XML or internal DOM structure for an OOo document. The export filter, conversely, takes the DOM structure or a stream of XML (SAX) events and generates a PDF (possibly as PostScript first?).
By the time you implement all of the infrastructure to support both filters, you have implemented much of the document model of an OOo Writer document, at least if you want to support ALL of the features that Writer's Word import filter already handles and ALL of the current export features. You pretty much need an intermediate form that represents everything a Writer document can contain.
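A toy sketch of that shape may help show where the work goes. Every name here is hypothetical and nothing is taken from the OOo source. The "document model" only knows about paragraphs, so anything else an importer might understand (tables, fonts, images) has nowhere to live and is lost before the exporter ever sees it.

```java
import java.util.ArrayList;
import java.util.List;

// Toy intermediate form: a real one would need to cover everything a
// Writer document can contain.
class DocumentModel {
    final List<String> paragraphs = new ArrayList<>();
}

interface ImportFilter {
    DocumentModel read(String input);
}

interface ExportFilter {
    String write(DocumentModel doc);
}

// Stand-in for the Word import filter: blank-line-separated text blocks.
class PlainTextImport implements ImportFilter {
    @Override
    public DocumentModel read(String input) {
        DocumentModel doc = new DocumentModel();
        for (String block : input.split("\n\n")) {
            String text = block.trim();
            if (!text.isEmpty()) {
                doc.paragraphs.add(text);
            }
        }
        return doc;
    }
}

// Stand-in for the PDF export filter: emits simple HTML, with no escaping.
class HtmlExport implements ExportFilter {
    @Override
    public String write(DocumentModel doc) {
        StringBuilder out = new StringBuilder();
        for (String paragraph : doc.paragraphs) {
            out.append("<p>").append(paragraph).append("</p>\n");
        }
        return out.toString();
    }
}
```

Each feature added to either filter forces a matching addition to DocumentModel, which is why the model ends up being most of the job.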
Is this really an easier implementation than what I suggested above? I don't think so, but of course it would be interesting to see someone do it. |
 |
Guest
|
Posted: Fri Feb 13, 2004 3:42 am Post subject: probably harder work |
|
|
| It's probably harder work, but then I could run the service without needing to load X and the other OO stuff into memory, cutting a lot of overhead. As I am going to move my site to a hosting company [linnode.com] which lets you have a virtual server with root access but not much memory (64 megs on the cheapest plan, 256 megs max), it would probably be a good idea. Thanks for the tips! I appreciate it. |
 |
ken e Guest
|
Posted: Tue Apr 06, 2004 7:27 am Post subject: |
|
|
| Has anyone made any progress with this approach? I would like to be able to convert a PowerPoint file to HTML (ideally, with the animations intact). I'd like to leverage the filters OpenOffice provides to incorporate this functionality into an existing product. Thus, I'd like to be able to simply link in the modules required. Any idea how difficult this would be? |
 |
ouppsss Newbie

Joined: 09 Aug 2004 Posts: 3
|
Posted: Mon Aug 09, 2004 4:39 am Post subject: |
|
|
| DannyB wrote: | I really believe that OpenOffice.org is going to offer you one of the very best ways to convert DOC to PDF. [...] Finally, you could write a "web service" front end to the same service thread, rather than a normal web page. |
Has anyone already done this in Java? Does it perform well? |
 |
AndrewZ Moderator


Joined: 21 Jun 2004 Posts: 4140 Location: Colorado, USA
|
Posted: Wed Feb 27, 2008 3:39 pm Post subject: Re: Use the .doc reader and .ps writer code as standalone |
|
|
How is your site doing? I see it is still up!
| hthb wrote: | | The php/bash code is an ugly hack, utilising xvfb-run (to create a virtual framebuffer for OOo) and runs Openoffice from the command line as a special user (using |
This is an old post, so the following is just for the record. People still read this stuff.
OpenOffice.org 2.3.0+ on Linux no longer requires an X server (not even Xvfb) if you install the headless RPM package and use the -headless option. IMO, -headless is easier than Xvfb.
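For anyone launching it from Java, a headless start could be sketched like this. Only -headless comes from the note above. The -accept and -nofirststartwizard options, the soffice binary name and port 8100 are my assumptions about a typical OOo 2.x setup, so adjust them to your installation.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for starting OOo without an X server.
class HeadlessOffice {
    // Builds the command line; binary and port are assumptions supplied by the caller.
    static List<String> command(String binary, int port) {
        return Arrays.asList(
                binary,
                "-headless",
                // Assumed option: listen for API connections on a local socket.
                "-accept=socket,host=127.0.0.1,port=" + port + ";urp;",
                // Assumed option: skip the first-run registration wizard.
                "-nofirststartwizard");
    }

    // Starts the process; the caller is responsible for stopping it later.
    static Process start(String binary, int port) throws IOException {
        return new ProcessBuilder(command(binary, port)).inheritIO().start();
    }
}
```

A call such as HeadlessOffice.start("soffice", 8100) would replace the whole sudo/xvfb-run wrapper, provided the headless package is installed.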
| Quote: | | I would like to convert the Java code which reads and parses .doc files without running the whole OpenOffice suite. I would like it to be more like a web service. Can someone point me in the right direction? Thanks a lot. |
http://www.artofsolving.com/opensource/jodconverter is such a web service.
| Quote: | | I would like to be able to convert a powerpoint file to html (ideally, with the animations intact). |
Here is a simple, light-weight approach:
Batch command line file conversion with PyODConverter _________________ <signature>
* Did you solve your problem? Do others a favor: Post the solution
* OpenOffice.org Ninja
* BleachBit
</signature> |
 |