OpenOffice.org Forum at OOoForum.org
 

Missing office.dtd when processing unzipped content.xml

 
OOoForum.org Forum Index -> OpenOffice.org Macros and API
Piet
Guest





Posted: Wed Jul 30, 2003 7:00 am    Post subject: Missing office.dtd when processing unzipped content.xml

Hi there!
I would like to transform OOo files into a specific, "real" XML format (or rather, XML dialect) without using the XML export filters, because I want to leave the original OOo files intact. For that purpose I extracted the content.xml files and tried to process them with the Java version of Xalan 2.5.1, but the stylesheet processor cannot find office.dtd. My XML editor (Bonfire 1.4) has a similar problem and cannot display the file in tree view unless the DTD part at the beginning is commented out. The reason for this behaviour is that the reference to the DTD does not include a path.
My question: is there a way to tell OOo to write the DTD location as an absolute reference, including the complete path? Or maybe there is a Xalan user among you who knows how to tell Xalan the complete path ....
Many thanks for any help!
Piet
DannyB
Moderator

Joined: 02 Apr 2003
Posts: 3991
Location: Lawrence, Kansas, USA

Posted: Wed Jul 30, 2003 4:33 pm

I have encountered and then solved this very same problem.

I wrote a program in Java that reads an OOo document and displays an unfoldable tree of the contents of the ZIP file. If you click on content.xml, you can continue unfolding the tree to inspect the nodes of the XML files within the ZIP file.

The program is not accessible from the Internet, so I cannot give you a URL to it at the moment. It is also in a different physical location from me right now. Tomorrow I can probably post another reply with more useful information, including the source code of my program.

Here is the general idea of how I solved the problem. I am recalling this from memory, so don't quote me.

First, I took all of the OpenOffice.org DTD files and combined them into a single DTD file. Then I embedded this DTD file directly in my JAR file. If you have text files, graphics, icons, etc. in your JAR file, you can get a read-only input stream to any of them through the class loader of any class that came from the JAR file. For instance, MyClass.class gets you the Class object. From the Class object, get the ClassLoader. You can then ask the ClassLoader for the resource with getResource() or something like that. This lets you access any "file" that is embedded within the JAR file of your application.
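
The class-loader lookup described in this step can be sketched as follows. This is a minimal illustration, not DannyB's code: the class name `BundledResource` and the resource name `office-combined.dtd` are hypothetical, and the sketch uses `Class.getResourceAsStream`, which returns `null` when the resource does not exist.

```java
import java.io.InputStream;

/** Sketch: open a file that was packaged inside the application's JAR. */
class BundledResource {

    /**
     * Looks up a resource relative to the given class.
     * A name starting with "/" is resolved from the root of the classpath,
     * any other name relative to the package of the anchor class.
     * Returns null if no such resource exists.
     */
    static InputStream open(Class<?> anchor, String name) {
        return anchor.getResourceAsStream(name);
    }

    /** Hypothetical usage: a combined DTD stored at the root of the JAR. */
    static InputStream openCombinedDtd() {
        return open(BundledResource.class, "/office-combined.dtd");
    }
}
```

Callers should check the result for `null` before wrapping it, since a missing resource is reported that way and not by an exception.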

The second part of the solution involves writing an "entity resolver" or something like that, and then plugging your entity resolver into the XML parser that you are using. Whenever the entity resolver is asked for certain DTDs, I made it return the entire content of the combined DTD file described in the previous step.

Once I did this, I was able to read and validate an OOo document without errors.

Sorry for the lack of greater detail. I'll try to post another message pointing you to a working Java program, which is also useful as a "document inspector".
Piet
Guest





Posted: Fri Aug 01, 2003 8:31 am    Post subject: Sounds interesting...

Thank you for your hints!
M-m-m-m-m-maaaaaaybe I should have told you that I am not a programmer (not yet... it sounds as if I will have to become one).
Anyway, I think that when I read your reply for the 20th time I might be able to imagine what is going on.
Keep on hacking ooorg!
Piet
DannyB
Moderator

Joined: 02 Apr 2003
Posts: 3991
Location: Lawrence, Kansas, USA

Posted: Sat Aug 02, 2003 5:45 am

Sorry I haven't posted a reply with a working example yet. I expect to do so in a few hours from now.
JohnV
Administrator

Joined: 07 Mar 2003
Posts: 9183
Location: Lexington, Kentucky, USA

Posted: Sat Aug 02, 2003 7:29 am

Hi Danny,

Lurking in the background, I have been waiting for this one! I hope you will also provide any tricks for using it. I haven't used Java directly for some time.

I haven't tried the new calendar program yet but expect to today.

JohnV
DannyB
Moderator

Joined: 02 Apr 2003
Posts: 3991
Location: Lawrence, Kansas, USA

Posted: Sat Aug 02, 2003 1:12 pm

Sorry Piet and JohnV for the delay.

As promised, here is what leads to the answer to your question.

See this....

http://kosh.datateamsys.com/~danny/OOo/Java-OOo/

This is where I keep two Java programs for OOo that I've worked on.

I wrote both of these in NetBeans.org, but I don't think that is relevant.

You might have real fun with the Maze Builder, but the Doc Explorer is the one you really are interested in here.

Download the OOoDocExplorer. Unzip it.

At the top level is the program Jar file. This is a useful utility in itself.

You launch it. It prompts you for a file. Pick any OOo document. It then presents a crude GUI window that allows you to explore the internal structure of the OOo Zip, including the XML files, as an unfoldable tree of nodes.
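
Since an OOo document is an ordinary ZIP archive, the top level of such a tree can be obtained with `java.util.zip`. The following is a minimal sketch and is not taken from the Doc Explorer source: `OooContainer` and its methods are hypothetical names, and the sample archive is a stand-in built in memory so that the example runs without a real document.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Sketch: list the entries of an OOo document, which is a ZIP archive. */
class OooContainer {

    /** Returns the entry names in the order they are stored in the archive. */
    static List<String> entryNames(InputStream zipped) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(zipped)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    /** Builds a tiny stand-in archive in memory (not a real OOo document). */
    static byte[] sampleArchive() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (String name : new String[] {"content.xml", "meta.xml"}) {
                zip.putNextEntry(new ZipEntry(name));
                zip.write("<doc/>".getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    /** Lists the entries of the stand-in archive. */
    static List<String> demo() {
        try {
            return entryNames(new ByteArrayInputStream(sampleArchive()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For a document on disk, `entryNames(new FileInputStream(path))` would take the place of the in-memory stand-in.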

Source code included. (License is LGPL.) Inside the source file is a long README file that explains the theory of operation.

In a very small nutshell, it works like this. I create an XMLReader using SAX. I create a SaxHandler using JDOM which will receive the SAX events and build a JDOM tree as those events are fired. I plug the sax handler into the xmlReader.

Then, here is the real magic. I create a custom EntityResolver, and plug this into the xmlReader. No matter which XML parser you use (I was just using Sun's Crimson because it was built into JDK 1.4), when it needs to see a DTD, it will call your custom EntityResolver.

My custom EntityResolver uses a static array of strings, which are the file names of the various OOo DTD files. When one of these is requested, I return an InputSource that simply wraps a java.io.InputStream object. The InputStream reads a single large DTD file embedded within my JAR file, and I obtain it through the class loader. This is how I managed to package the entire program as a single JAR file.
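
A resolver along these lines might look roughly like the sketch below. It is written under several assumptions and is not the Doc Explorer code: the list of DTD file names is illustrative, a plain SAX `DefaultHandler` stands in for the JDOM handler, the DTD is supplied from memory in place of a file in the JAR, and the names `BundledDtdResolver`, `BundledDtdDemo`, and the `file:///virtual/` base are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Sketch: answer requests for known DTD file names with one bundled DTD. */
class BundledDtdResolver implements EntityResolver {

    // Illustrative list only; the real program's array of names is not shown in the thread.
    private static final List<String> KNOWN_DTDS = Arrays.asList("office.dtd", "Manifest.dtd");

    private final Supplier<InputStream> combinedDtd;

    BundledDtdResolver(Supplier<InputStream> combinedDtd) {
        this.combinedDtd = combinedDtd;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        if (systemId != null) {
            // The parser may pass an absolute URI, so compare the last path segment only.
            String fileName = systemId.substring(systemId.lastIndexOf('/') + 1);
            if (KNOWN_DTDS.contains(fileName)) {
                return new InputSource(combinedDtd.get());
            }
        }
        return null; // anything else: let the parser use its default behaviour
    }
}

class BundledDtdDemo {

    /** Parses the XML text with the resolver installed and returns the root element name. */
    static String rootElement(String xml) {
        final String[] root = new String[1];
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            // In-memory stand-in for a DTD that would normally be read from the JAR.
            reader.setEntityResolver(new BundledDtdResolver(
                    () -> new ByteArrayInputStream(
                            "<!ELEMENT doc ANY>".getBytes(StandardCharsets.UTF_8))));
            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName, String qName, Attributes atts) {
                    if (root[0] == null) {
                        root[0] = qName; // remember the first element seen
                    }
                }
            });
            InputSource in = new InputSource(new StringReader(xml));
            in.setSystemId("file:///virtual/content.xml"); // hypothetical base for relative references
            reader.parse(in);
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
        return root[0];
    }
}
```

In a packaged program the supplier would be something like `() -> MyClass.class.getResourceAsStream("/office-combined.dtd")`, where both names are again hypothetical.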

A more detailed explanation is included in the program's source code, in a README file.

I manually copied a bunch of source files out of my NetBeans project. I hope I got them all. I did not try to compile the source that I put here to see if anything was missing.
openmind
OOo Enthusiast

Joined: 28 Jun 2003
Posts: 106
Location: Switzerland

Posted: Sun Nov 30, 2003 6:35 am

Hi all,

I encountered a similar problem. In my case it was solved more directly, without implementing an EntityResolver:

Problem:
-------------------------
Parsing meta.xml in order to query it with XPath throws this exception:

org.xml.sax.SAXParseException: Relative URI "office.dtd"; can not be resolved without a base URI


Solution:
---------------
Set the system ID of the SAX InputSource

Code:

InputSource in = new InputSource(
            new FileInputStream( "/path/to/Meta/maybe/direct/from/zipstream/meta.xml" ) );

in.setSystemId( "file:///path/to/ooo/share/dtd/office-document/1.0/" );   



does the trick.
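
Why the directory path with its trailing slash works can be illustrated with `java.net.URI`. This is only a sketch of the general rules for resolving a relative reference against a base; a given parser performs its own resolution and may differ in details. `BaseUriDemo` is a hypothetical name.

```java
import java.net.URI;

/** Sketch: how a relative DTD reference combines with a base system ID. */
class BaseUriDemo {

    /** Resolves a relative reference such as "office.dtd" against a base URI. */
    static URI resolve(String baseSystemId, String reference) {
        return URI.create(baseSystemId).resolve(reference);
    }
}
```

With the base `file:///path/to/ooo/share/dtd/office-document/1.0/`, the reference `office.dtd` resolves to a path inside the `1.0/` directory. Without the trailing slash, the last segment `1.0` is replaced, and the result points at `office-document/office.dtd` instead.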

bye
DannyB
Moderator

Joined: 02 Apr 2003
Posts: 3991
Location: Lawrence, Kansas, USA

Posted: Sun Nov 30, 2003 9:06 am

You are providing a path to the OOo DTDs, which are outside the program.

I needed a solution where I could incorporate the DTD within my JAR file and distribute it. I did not want the user to need the DTD file, or even to have OOo installed. Hence I used the entity resolver. It solved my problem.

Your solution is simpler if you can make the assumptions that I could not (actually, they are end-user requirements that I did not want to impose).
Namor
Guest





Posted: Wed Dec 03, 2003 4:45 am

Hello!

For everyone who isn't interested in validating the XML files, there is a way to ignore the DTD when parsing:

Code:

    // Needs: java.io.ByteArrayInputStream, javax.xml.parsers.*,
    //        org.w3c.dom.Document, org.xml.sax.*
    Document OOoDocument = null;
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    // Deactivate validation of the OOo document
    factory.setValidating(false);
    factory.setNamespaceAware(true);
    try
    {
      DocumentBuilder builder = factory.newDocumentBuilder();
      // The following part prevents the DTD referenced by the OOo XML document from being loaded
      // found in:
      // http://forum.java.sun.com/thread.jsp?forum=34&thread=284209
      builder.setEntityResolver(new EntityResolver()
      {
        public InputSource resolveEntity(java.lang.String publicId, java.lang.String systemId)
        throws SAXException, java.io.IOException
        {
          // Constant first, so that a null publicId cannot cause a NullPointerException
          if ("-//OpenOffice.org//DTD OfficeDocument 1.0//EN".equals(publicId))
          {
            // Hand the parser a DTD that contains nothing but a text declaration
            return new InputSource(new ByteArrayInputStream("<?xml version='1.0' encoding='UTF-8'?>".getBytes()));
          }
          else
          {
            System.out.println(publicId);
            return null;
          }
        }
      });
      // Parse the OOo document (getOOoFileByName is the poster's own helper, not shown here)
      OOoDocument = builder.parse(this.getOOoFileByName(name));
    }
    catch (ParserConfigurationException | SAXException | java.io.IOException e)
    {
      System.out.println("Could not parse the OOo document: " + e.getMessage());
    }


I hope that helps someone...

CU Roman
smaath
Guest





Posted: Mon Feb 23, 2004 12:34 am    Post subject: Finding the dtd

Hello,

To let the parser find the DTD, set the systemId of the InputSource to the path where the application can find it, like this:

inputSource.setSystemId(Constants.OODTDPATH);

I hope it will be helpful

Mathieu
Piet
General User

Joined: 22 Jan 2004
Posts: 6

Posted: Tue May 10, 2005 11:49 pm    Post subject: Re: resolveEntity not called

Hello all,
time to revitalize this thread.
I want to create drawings by programmatically generating and modifying OOo Draw files, so I started to write a Java app that receives an OOo file, extracts it, and reads the different XML files. I stumbled over the same problem as others before me: the parser throws an exception saying that it cannot resolve the relative URI of the DTD without a base URI. Since I want to skip the entire validation process, I followed Namor's hint and used my own EntityResolver, but I still got the same error. By placing some System.out.println statements inside my EntityResolver, I found that my resolveEntity method is apparently never invoked. In the Java documentation (http://java.sun.com/j2se/1.4.2/docs/api/org/xml/sax/EntityResolver.html), I found the following hints.
From the interface description: "The XML reader will then allow the application to intercept any external entities (including the external DTD subset and external parameter entities, if any) before including them."
From the description of the resolveEntity method:
"Allow the application to resolve external entities.
The Parser will call this method before opening any external entity except the top-level document entity (including the external DTD subset, external entities referenced within the DTD, and external entities referenced within the document element): the application may request that the parser resolve the entity itself, that it use an alternative URI, or that it use an entirely different input source."
So what does that mean? From this description, I would assume that even if I implement my own EntityResolver, the parser will ignore it when it is handling the DTD declaration. That would explain my finding. But maybe this interpretation is wrong and something else is going on? In the original thread on forum.java.sun.com mentioned by Namor, someone asked why simply calling DocumentBuilderFactory.newInstance().setValidating(false) does not completely inhibit the validation process. I still feel a little unsatisfied about the question of how to skip DTD validation when the DTD is not available. However,
    InputSource in = new InputSource(
        new FileInputStream( "/path/to/Meta/maybe/direct/from/zipstream/meta.xml" ) );
    in.setSystemId( "file:///path/to/ooo/share/dtd/office-document/1.0/" );
worked for me, but the original plan (skipping the entire validation process) is still unresolved.
Are there any more hints about this (very basic) topic?
Best regards
Piet
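
On the Javadoc passage quoted in this post, the parenthesised list is most plausibly read as naming the entities for which resolveEntity is called, not as exceptions to it. One further option, which does not appear in the thread: parsers derived from Apache Xerces, including the one bundled with the JDK from Java 5 onward, recognise a feature that stops a non-validating parser from loading the external DTD at all. The sketch below assumes such a parser. `SkipExternalDtd` is a hypothetical class name, and a parser that does not know the feature would reject it with a ParserConfigurationException (wrapped here in an IllegalStateException).

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Sketch: parse without loading the external DTD (assumes a Xerces-derived parser). */
class SkipExternalDtd {

    // Feature name defined by Apache Xerces; other parsers may not recognise it.
    static final String LOAD_EXTERNAL_DTD =
            "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    /** Parses XML text and returns the DOM document; the DOCTYPE's DTD is not fetched. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setValidating(false);
            dbf.setNamespaceAware(true);
            dbf.setFeature(LOAD_EXTERNAL_DTD, false);
            return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }
}
```

Note that entities declared only in the skipped DTD remain unknown to the parser, so this approach suits documents that do not rely on them.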
Piet
General User

Joined: 22 Jan 2004
Posts: 6

Posted: Sat Jul 02, 2005 1:23 am    Post subject: Re: Ignore DTD while parsing ooo documents with JAXP solved

Hello,
here is the solution I have been looking for. It is not perfect (since docs are generated without DTD declaration) but at least, I can now parse my ooorg documents. Please have a look at the following code:

import java.io.*;
import javax.xml.parsers.*;
import org.w3c.dom.*;
import org.xml.sax.*;

public class IgnoreDTDTester {
    public final String XMLFILE = "<?xml version='1.0'?><!DOCTYPE root PUBLIC '-//OpenOffice.org//DTD OfficeDocument 1.0//EN' 'office.dtd'><_/>";
    private DocumentBuilderFactory dbf;
    private DocumentBuilder db;
    private Document contentDoc;
    private Node contentRoot;

    public IgnoreDTDTester() throws java.io.IOException {
        dbf = DocumentBuilderFactory.newInstance();
        dbf.setValidating(false);
        dbf.setNamespaceAware(true);
        try {
            db = dbf.newDocumentBuilder();
            db.setEntityResolver(new DTDIgnoringEntityResolver());
            InputSource source = new InputSource(new ByteArrayInputStream(XMLFILE.getBytes()));
            // Explicitly set the system ID of the input source
            source.setSystemId("some arbitrary stuff");
            this.contentDoc = db.parse(source);
            contentRoot = contentDoc.getDocumentElement();
            // Printed only after a successful parse, so contentRoot cannot be null here
            System.out.println("Content root: " + contentRoot.getNodeName());
        }
        catch (SAXException saxe) {
            System.out.println("Error on parsing: " + saxe.getMessage());
        }
        catch (ParserConfigurationException pce) {
            System.out.println("Error on creating parser: " + pce.getMessage());
        }
    }

    public static void main(String[] args) throws java.io.IOException {
        IgnoreDTDTester dtdignorer = new IgnoreDTDTester();
    }
}

class DTDIgnoringEntityResolver implements EntityResolver {
    public DTDIgnoringEntityResolver() {
        System.out.println("Setting entity resolver");
    }

    public InputSource resolveEntity(java.lang.String publicID, java.lang.String systemID) throws SAXException, IOException {
        // String concatenation copes with a null ID; calling toString() on it would not
        System.out.println("Public-ID: " + publicID);
        System.out.println("System-ID: " + systemID);
        // Return a stand-in DTD that contains only a text declaration
        return new InputSource(new ByteArrayInputStream("<?xml version='1.0' encoding='UTF-8'?>".getBytes()));
    }
}
At first, I did not have the line source.setSystemId("some arbitrary stuff") in my code at all. As a result, the resolveEntity method of my EntityResolver was never called. However, after I explicitly set the system ID of my InputSource, my resolveEntity method is called, for some reason I have not yet understood. As an additional mystery, the resolveEntity method reports the original system ID from my DTD declaration, although it looks as if I have explicitly overwritten it. Maybe these findings are helpful for somebody with the same problem, and maybe somebody can explain this behaviour of JAXP, which I don't understand at this point.
Best wishes
Piet
elehenaff
Power User

Joined: 16 Apr 2004
Posts: 76
Location: paris - france

Posted: Fri Mar 17, 2006 6:39 am

Another way to solve the problem is to create an empty file named office.dtd in the same folder as content.xml.
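
This workaround can also be scripted. The sketch below is only an illustration under assumptions: `EmptyDtdWorkaround` is a hypothetical name, the document is a stand-in written to a temporary folder, and the example relies on the parser deriving the base URI from the file it is given, so that the relative reference `office.dtd` points at the sibling file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

/** Sketch: place an empty office.dtd beside content.xml, then parse the file. */
class EmptyDtdWorkaround {

    /** Creates an empty office.dtd in the same folder unless one already exists. */
    static Path ensureEmptyDtd(Path contentXml) throws IOException {
        Path dtd = contentXml.resolveSibling("office.dtd");
        if (Files.notExists(dtd)) {
            Files.createFile(dtd);
        }
        return dtd;
    }

    /** Parses the file; parsing from a File gives the parser a base URI for "office.dtd". */
    static Document parse(Path contentXml) throws Exception {
        ensureEmptyDtd(contentXml);
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        return dbf.newDocumentBuilder().parse(contentXml.toFile());
    }

    /** Runs the workaround on a stand-in document in a temporary folder. */
    static String demo() {
        try {
            Path dir = Files.createTempDirectory("ooo-demo");
            Path content = dir.resolve("content.xml");
            Files.write(content,
                    "<!DOCTYPE doc SYSTEM \"office.dtd\"><doc/>".getBytes(StandardCharsets.UTF_8));
            return parse(content).getDocumentElement().getNodeName();
        } catch (Exception e) {
            throw new IllegalStateException("workaround demo failed", e);
        }
    }
}
```

The empty file keeps the parser from failing on the missing DTD, but it also means nothing is validated against the real DTD.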
All times are GMT - 8 Hours