| Author |
Message |
mantera Guest
|
Posted: Mon Sep 15, 2003 6:40 am Post subject: convert PDF to OOo formats? |
|
|
| does anyone know a way to convert a PDF file to a format that OOo can edit? |
|
DannyB Moderator


Joined: 02 Apr 2003 Posts: 3991 Location: Lawrence, Kansas, USA
|
Posted: Mon Sep 15, 2003 6:57 am Post subject: |
|
|
I do not know of any way.
I have been daydreaming for a long time about building a program that would read and parse a PDF, and then create an OOo drawing of what is in the PDF. Each page of the PDF would end up as a separate page of the drawing: text, pictures, geometric shapes, etc. You could then edit the drawing and re-export it as a PDF.
But like I said, daydreaming. I've read the PDF document spec before. There are still some aspects of OOo Draw that I need to learn better before I would begin a project like this (such as Bézier curve shapes). _________________ Want to make OOo Drawings like the colored flower design to the left? |
|
carl Super User


Joined: 21 Apr 2003 Posts: 920 Location: Germany
|
Posted: Mon Sep 15, 2003 8:55 am Post subject: |
|
|
The Adobe product allows you to save as text. _________________ carl
Using OpenOffice.org 2 on XP sp2 |
|
KirkJobSluder Power User

Joined: 25 Apr 2003 Posts: 73
|
Posted: Mon Sep 15, 2003 10:40 am Post subject: |
|
|
A problem with PDF is that you never know how the text is encoded internally. It is popular within my department to scan entire articles and transmit PDFs which are basically a stack of bitmap files.
Depending on the internal coding, there are a couple of PDF->text utilities for both unix and windows. pdftotext is the one I use.
PDF is a difficult format to translate from because it is highly optimized for consistent printing (in fact much of it is wrapped postscript). |
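As a rough illustration of the pdftotext route mentioned above, here is a minimal Python sketch that builds and runs the command. It assumes the `pdftotext` binary (from Xpdf or Poppler) is on the PATH; the function names and file names are made up for this example. For a scanned, bitmap-only PDF it will produce little or no usable text, since there is no text layer to extract.

```python
import shutil
import subprocess


def pdftotext_command(pdf_path, txt_path, first_page=None, last_page=None,
                      keep_layout=False):
    """Build the argument list for pdftotext (hypothetical helper)."""
    cmd = ["pdftotext"]
    if first_page is not None:
        cmd += ["-f", str(first_page)]   # first page to convert
    if last_page is not None:
        cmd += ["-l", str(last_page)]    # last page to convert
    if keep_layout:
        cmd.append("-layout")            # try to keep the physical layout
    cmd += [pdf_path, txt_path]
    return cmd


def extract_text(pdf_path, txt_path, **options):
    """Run pdftotext if it is installed; return True if it was run."""
    if shutil.which("pdftotext") is None:
        return False                     # tool not available on this machine
    subprocess.run(pdftotext_command(pdf_path, txt_path, **options), check=True)
    return True
```

The resulting plain-text file can then be opened in Writer; all formatting and pictures are lost on the way.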
|
ftack Moderator


Joined: 27 Jan 2003 Posts: 3102 Location: Belgium
|
Posted: Mon Sep 15, 2003 11:43 pm Post subject: |
|
|
Yes, it is a format designed for printing and viewing. A PDF is an "end product". You should have access to the source document and recreate the PDF to change it. The source document can be anything from a text document or LaTeX code to a Writer or Word document.
Adobe Acrobat lets you edit a PDF to some extent, but this is limited to correcting some typos and changes that do not affect the layout. We should not expect that Writer will at some point be able to open a PDF for editing, because it's really not designed for that. |
|
Guest
|
Posted: Wed Oct 29, 2003 2:50 pm Post subject: Why I Want Writer to Open a pdf or Postscript File |
|
|
| ftack wrote: | Yes, it is a format designed for printing and viewing. A PDF is an "end product". You should have access to the source document and recreate the PDF to change it. The source document can be anything from a text document or LaTeX code to a Writer or Word document.
Adobe Acrobat lets you edit a PDF to some extent, but this is limited to correcting some typos and changes that do not affect the layout. We should not expect that Writer will at some point be able to open a PDF for editing, because it's really not designed for that. |
But there are instances when one might want to open a pdf in Open Office. For example, I have a 500 page pdf and I want to send only page 320 to someone. Or I have a very detailed roadmap in pdf but it was created like a poster. When I try to print to letter size paper all of the detail is mashed into this small size. I want to print a small portion at 500%. I don't see these examples as editing-proper, but I need to open them in some type of editor to produce the results I want. |
|
DannyB Moderator


Joined: 02 Apr 2003 Posts: 3991 Location: Lawrence, Kansas, USA
|
Posted: Wed Oct 29, 2003 3:14 pm Post subject: |
|
|
I would also point out that PDF is not a bitmap format.
Yes it can contain bitmaps, because obviously, a page can contain pictures. Some software, such as scanning software, generates PDFs whose pages are nothing but giant bitmaps.
But really, fundamentally, PDF is a vector graphics format. Some of the "objects" on a page, can be bitmaps. Even a single bitmap per page which takes up the entire page.
In any event, it should be possible to parse a PDF and generate a Draw document whose pages mimic the PDF contents, even if the PDF is just a large collection of scanned bitmaps in some cases. But in most cases, whenever a PDF did NOT come from a scanner, the PDF is vector graphics that are scalable.
I've skimmed through the PDF specification before. This is what got me to thinking many months ago about the possibility to import a PDF into Draw. |
|
ftack Moderator


Joined: 27 Jan 2003 Posts: 3102 Location: Belgium
|
Posted: Thu Oct 30, 2003 4:32 am Post subject: |
|
|
As we hear from DannyB, there is a future for OOo Draw reading PDF. In the meantime:
<quote>For example, I have a 500 page pdf and I want to send only page 320 to someone. </quote>
I'd load the PDF in Acrobat Viewer and print one single page to PDF using my Ghostscript/RedMon PDF writer. Or you could open it in Ghostview and save one page directly from Ghostview to PDF.
<quote>Or I have a very detailed roadmap in pdf but it was created like a poster. ... I want to print a small portion at 500%. </quote>
This one's more difficult. I can think of displaying the file full screen, zooming in to the portion of interest and making a screenshot. Perhaps one could create an EPS from the map (using a PostScript printer driver and printing to file), read it into Draw and enlarge it so that the portion of interest fills the page. Otherwise, print the map to a large bitmap, again using the Ghostscript/RedMon combo, and crop the selection of interest. The problem would probably be the very large intermediate graphic. |
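For the "send only page 320" case, Ghostscript can also be driven directly from the command line instead of through a printer driver. The sketch below only builds the argument list; it assumes the executable is called `gs` (on Windows it is typically `gswin32c` or similar), and the helper name and file names are invented for illustration.

```python
def gs_extract_page_command(src_pdf, dst_pdf, page, gs_executable="gs"):
    """Build a Ghostscript command that writes a single page to a new PDF."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return [
        gs_executable,
        "-q",                          # suppress the startup banner
        "-dNOPAUSE", "-dBATCH",        # run non-interactively, exit when done
        "-sDEVICE=pdfwrite",           # output device that produces PDF
        f"-dFirstPage={page}",
        f"-dLastPage={page}",
        f"-sOutputFile={dst_pdf}",
        src_pdf,
    ]


if __name__ == "__main__":
    # Print the command so it can be pasted into a shell.
    print(" ".join(gs_extract_page_command("big.pdf", "page320.pdf", 320)))
```

Passing the list to `subprocess.run(..., check=True)` would run it, provided Ghostscript is installed.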
|
Guest
|
Posted: Thu Oct 30, 2003 4:54 am Post subject: depends... |
|
|
If the pdf file contains text and embedded pictures, the pdf import plugin for koffice under Linux does a good job. For plain text it does anyway;-)
You'd end up with a koffice file that you can export as a RTF file which you could open in OpenOffice. All embedded pictures are saved as png files if I remember right.
What you can always do is print into a PostScript file and then edit that PostScript file after opening it at a high enough resolution in tools like Photoshop or the GIMP under Unix.
For text-only files, pdf2txt does a quick and good job.
If you search freshmeat.net you will find lots of conversion tools. Some create a PNG file per page, which is basically a screenshot of each page of the PDF file; others combine HTML with embedded pictures.
Overall, one has to say that the quality of the conversion tools is often lousy. Try lots of them and pick the one that suits your needs best.
Juergen |
|
DannyB Moderator


Joined: 02 Apr 2003 Posts: 3991 Location: Lawrence, Kansas, USA
|
Posted: Thu Oct 30, 2003 6:14 am Post subject: Re: depends... |
|
|
| Anonymous wrote: | If the pdf file contains text and embedded pictures, the pdf import plugin for koffice under Linux does a good job. For plain text it does anyway;-)
You'd end up with a koffice file that you can export as a RTF file which you could open in OpenOffice. All embedded pictures are saved as png files if I remember right. |
It sounds like KOffice imports a PDF as a "word processing" document not as a "drawing" document. I believe that OOo's Draw and NOT Writer is the correct destination for an imported PDF.
PDF is a language for placing black (or color) marks onto paper. These marks can consist of commands such as:
* draw text "FooBar" at position such and so in SansSerif 15 pt.
* draw a triangle at position such and so, fill with puke green
* draw a dashed line over there
Essentially, a vector graphics language.
Now just imagine how you would "render" each of these commands as a draw object, instead of into an array of pixels.
A PDF import module would definitely have to parse the PDF doc, but the "rendering" would not consist of needing a graphics engine to draw pixels, it would instead consist of generating the most appropriate Draw shape for each PDF markup command. |
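To make the idea of "rendering to shapes instead of pixels" concrete, here is a toy Python sketch. It interprets a tiny subset of real PDF content-stream operators (`BT`, `Tf`, `Td`, `Tj`, `m`, `l`, `re`, `h`, `rg`, `S`, `f`) and emits plain dictionaries standing in for Draw shapes. Everything else is an assumption made for illustration: the dictionary layout is invented, real streams are usually compressed and use transformation matrices, font encodings, curves and nested strings, and no actual OOo API is called.

```python
import re

# String literal (no nesting), name such as /F1, or any other bare token.
TOKEN = re.compile(r"\((?:\\.|[^\\()])*\)|/[^\s()/]+|[^\s()]+")


def parse_operand(tok):
    """Return the operand value, or None if the token is an operator."""
    if tok.startswith("("):
        return tok[1:-1]          # string literal without the parentheses
    if tok.startswith("/"):
        return tok                # name object, e.g. a font resource
    try:
        return float(tok)
    except ValueError:
        return None


def content_stream_to_shapes(stream):
    """Turn a simplified content stream into a list of shape records."""
    shapes, operands = [], []
    path, closed = [], False
    font, size, text_pos, fill_rgb = None, None, (0.0, 0.0), (0.0, 0.0, 0.0)
    for tok in TOKEN.findall(stream):
        value = parse_operand(tok)
        if value is not None:
            operands.append(value)
            continue
        if tok == "BT":                               # begin text object
            text_pos = (0.0, 0.0)
        elif tok == "Tf":                             # font name and size
            font, size = operands[-2], operands[-1]
        elif tok == "Td":                             # move text position
            text_pos = (text_pos[0] + operands[-2], text_pos[1] + operands[-1])
        elif tok == "Tj":                             # show a text string
            shapes.append({"kind": "text", "text": operands[-1],
                           "at": text_pos, "font": font, "size": size})
        elif tok == "rg":                             # fill colour (RGB)
            fill_rgb = tuple(operands[-3:])
        elif tok == "m":                              # start a new subpath
            path, closed = [(operands[-2], operands[-1])], False
        elif tok == "l":                              # straight line segment
            path.append((operands[-2], operands[-1]))
        elif tok == "re":                             # rectangle x y w h
            x, y, w, h = operands[-4:]
            path = [(x, y), (x + w, y), (x + w, y + h), (x, y + h)]
            closed = True
        elif tok == "h":                              # close the subpath
            closed = True
        elif tok in ("S", "f"):                       # stroke or fill the path
            kind = "polygon" if closed or tok == "f" else "polyline"
            shapes.append({"kind": kind, "points": path,
                           "fill": fill_rgb if tok == "f" else None})
            path, closed = [], False
        operands.clear()                              # operators consume operands
    return shapes


if __name__ == "__main__":
    sample = ("BT /F1 15 Tf 72 700 Td (FooBar) Tj ET\n"
              "0.6 0.7 0.2 rg 100 100 m 200 100 l 150 180 l h f\n"
              "50 50 m 300 50 l S")
    for shape in content_stream_to_shapes(sample):
        print(shape)
```

Each record would then be mapped to the closest Draw object (text box, polygon, line), which is the part that needs real knowledge of the Draw API.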
|
Guest
|
Posted: Thu Oct 30, 2003 9:27 pm Post subject: OCR the PDF |
|
|
ScanSoft's most recent OmniPage claims to OCR PDF files directly, no print and scan required. I doubt they support OOo native formats, but .DOC and RTF are supported.
Of course, you may be able to simply print the PDF files to a 200 or 300 DPI image format (e.g. JPG) and then feed them to any OCR app.
It's the only solution I have ever seen. |
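To get a feel for what 200 or 300 DPI means in practice, the arithmetic is simple: PDF page sizes are given in points, and the default unit is 1/72 inch. The short Python sketch below works out the pixel dimensions and the uncompressed RGB size of the rendered image; the function names are made up, and US Letter (612 x 792 points) is used only as an example page size.

```python
POINTS_PER_INCH = 72  # default PDF user-space unit is 1/72 inch


def raster_size(width_pt, height_pt, dpi):
    """Pixel dimensions of a page rendered at the given resolution."""
    width_px = round(width_pt * dpi / POINTS_PER_INCH)
    height_px = round(height_pt * dpi / POINTS_PER_INCH)
    return width_px, height_px


def raw_rgb_bytes(width_px, height_px):
    """Uncompressed size of a 24-bit RGB image, in bytes."""
    return width_px * height_px * 3


if __name__ == "__main__":
    for dpi in (200, 300):
        w, h = raster_size(612, 792, dpi)   # US Letter page
        print(dpi, "DPI:", w, "x", h, "pixels,", raw_rgb_bytes(w, h), "bytes raw")
```

The same calculation shows why rasterizing a poster-sized map at high resolution produces a very large intermediate image.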
|
dorpm General User

Joined: 13 Oct 2003 Posts: 29
|
|
chuck General User

Joined: 30 Nov 2003 Posts: 10
|
Posted: Sun Nov 30, 2003 7:14 am Post subject: maps |
|
|
The Macromedia draw program Freehand will do exactly what you want with your map and PDF pages. You can zoom into the map and print the zoomed section perfectly.
This is beyond the reach of anything Linux has or ever will! |
|
Lee Guest
|
Posted: Thu Jan 01, 2004 11:44 am Post subject: Editing PDFs |
|
|
Note that one can open a PDF file in the GIMP, the imaging software which is standard in pretty well all Linux distributions. It treats the PDF as an image, and you can then make changes to it using the tools available there. You can zoom in on a particular part of the image, clip out and copy the parts you want, and so on. You can also add text. If you have a PDF converter set up, you can then print back to PDF format.
Not perfect, but probably the best short term solution. |
|
DannyB Moderator


Joined: 02 Apr 2003 Posts: 3991 Location: Lawrence, Kansas, USA
|
Posted: Thu Jan 01, 2004 12:41 pm Post subject: |
|
|
When the GIMP imports a PDF, won't it rasterize it into pixels?
Therefore, any editing you do is really just editing pixels. The PDF you then save from the GIMP is just more pixels. It has lost any notion of text, fonts, lines, vectors, tables, shapes, etc. |
|