OpenOffice.org Forum at OOoForum.org
 

Help Required: donate your OpenDocument Text files...

 
tjwood
Newbie


Joined: 13 Nov 2005
Posts: 3

Posted: Mon Dec 18, 2006 10:07 am    Post subject: Help Required: donate your OpenDocument Text files...

Hello

As part of my project for my degree in Computer Science at the University of Bristol, England, I am developing an application to identify differences in formatting and content between two OpenDocument Text files. For instance, if you had two versions of a report or a letter (maybe you sent it to someone else and they made some changes and sent it back to you, or maybe you saved a backup copy of a file before you made some changes), my application would allow you to quickly identify the differences between the two files.

To develop my application so that it works well with “real world” documents, I need to build a diverse set of OpenDocument Text (*.odt) files that I can use for testing. More specifically, I need pairs of files, one of which is a modified version of the other. The changes can be major or minor, in formatting, content, or both, but it's important that there is at least some similarity between the two files in each pair.

If you have any real versioned files that you wouldn't mind me using to test my application, I'd really appreciate it. Letters, memos, reports, notes, whatever – all I need is pairs of ODT files, one of which is a modified version of the other (so, for instance, the first and second drafts of something would be ideal). Please send any pairs of files you think would be suitable to:

oddiff@gmail.com *

Thank you so much for helping with this project.

Tom Wood
Computer Science undergraduate
University of Bristol, England

* Anything you do send may be used to test my programs and algorithms, and I may read the files and let other people see them in connection with my project, but I won't post them on the Internet, include them in my report, or otherwise make them public without your explicit permission, and you will retain the copyright in them. Obviously don't send anything that is private, confidential, commercially sensitive, etc. or anything illegal or that contains “adult” content. Also please don't send files larger than 1 MB in size. Anyone who sends a file that I choose to use will receive an acknowledgement in my project report if they give me their name (first name and surname). If you don't want to be acknowledged, please let me know.

Any questions, send me a PM or email the above address.
rotomano
OOo Enthusiast


Joined: 13 Dec 2006
Posts: 198
Location: Greece

Posted: Mon Dec 18, 2006 2:09 pm    Post subject:

Can we send files in languages other than English?
9point9
Moderator


Joined: 31 Aug 2004
Posts: 3875
Location: UK

Posted: Tue Dec 19, 2006 1:08 am    Post subject:

If you want lots of files then a good idea would be to browse the CVS repository for documentation sources. There have been many revisions of this so you can take pairs. All authors should have commented their changes which may help you. Have a browse:
http://documentation.openoffice.org/source/browse/documentation/www/setup_guide2/2.x/en/

If you are doing a simple diff then you might be better off using a tool like xmllint to rewrite the XML in a more compact form first. Otherwise the whitespace will overcomplicate it. I've done some odd things with compacting OpenDocument that might give you some ideas:
http://www.oooforum.org/forum/viewtopic.phtml?t=27339&highlight=
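As a rough illustration of the normalize-then-diff idea, here is a minimal Python sketch. It is not the xmllint workflow itself: it assumes the document text lives in the `content.xml` member of the ODT zip package, and it swaps in the standard library's `xml.dom.minidom` to re-indent the XML so that a line-based diff sees one node per line. The function names are made up for this example, and re-indenting can alter whitespace inside mixed-content paragraphs, so treat the output as a comparison aid only.

```python
import difflib
import io
import xml.dom.minidom
import zipfile


def normalized_content_lines(odt_bytes):
    """Return content.xml from an ODT package, re-indented one node per line."""
    with zipfile.ZipFile(io.BytesIO(odt_bytes)) as package:
        raw = package.read("content.xml")
    pretty = xml.dom.minidom.parseString(raw).toprettyxml(indent=" ")
    # Drop whitespace-only lines so layout noise never shows up as a change.
    return [line for line in pretty.splitlines() if line.strip()]


def diff_odt(old_bytes, new_bytes):
    """Line-based unified diff of the normalized content.xml of two ODT files."""
    return list(difflib.unified_diff(
        normalized_content_lines(old_bytes),
        normalized_content_lines(new_bytes),
        fromfile="old/content.xml",
        tofile="new/content.xml",
        lineterm="",
    ))
```

To try it on real files, read each one in binary mode (for example `open("draft1.odt", "rb").read()`, where the file name is a placeholder) and print the lines returned by `diff_odt`.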
_________________
Arch Linux
OOo 3.2.0

OOoSVN, change control for OOo documents:
http://sourceforge.net/projects/ooosvn/
tjwood
Newbie


Joined: 13 Nov 2005
Posts: 3

Posted: Wed Dec 20, 2006 2:36 am    Post subject:

I guess there's no reason why the documents have to be English.


Thanks 9point9, I was wondering where I could find some public repositories containing ODT files :) It will be a lot more advanced than a "simple" diff, which returns pretty much useless results on XML or other structured data. As a starting point it will compare on a paragraph-by-paragraph basis, but hopefully it will also detect changes within the paragraphs and elsewhere in the structure of the document.
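To make the paragraph-by-paragraph idea concrete, here is a minimal Python sketch. It is not the project's actual algorithm, only one possible starting point: it assumes paragraphs are the `text:p` and `text:h` elements of `content.xml`, aligns the two paragraph lists with the standard library's `difflib.SequenceMatcher`, and then runs a word-level comparison inside paragraphs that were replaced one-for-one. All function names are invented for this example, formatting changes are ignored, and paragraphs nested inside other paragraphs (for instance in frames) would be listed twice.

```python
import difflib
import io
import xml.etree.ElementTree as ET
import zipfile

# Namespace of the OpenDocument text: prefix.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def read_paragraphs(odt_bytes):
    """Return the plain text of every text:p and text:h element, in order."""
    with zipfile.ZipFile(io.BytesIO(odt_bytes)) as package:
        root = ET.fromstring(package.read("content.xml"))
    wanted = {"{%s}p" % TEXT_NS, "{%s}h" % TEXT_NS}
    return ["".join(el.itertext()) for el in root.iter() if el.tag in wanted]


def word_changes(before, after):
    """List (operation, old words, new words) for each differing word run."""
    a, b = before.split(), after.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (op, " ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"
    ]


def compare_paragraphs(old, new):
    """Align two paragraph lists and describe what changed between them."""
    changes = []
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            continue
        if op == "replace" and (i2 - i1) == (j2 - j1):
            # Same number of paragraphs on both sides: pair them up and
            # look for changes inside each pair.
            for before, after in zip(old[i1:i2], new[j1:j2]):
                changes.append(("changed", word_changes(before, after)))
        else:
            changes.extend(("deleted", p) for p in old[i1:i2])
            changes.extend(("inserted", p) for p in new[j1:j2])
    return changes
```

Calling `compare_paragraphs(read_paragraphs(old_bytes), read_paragraphs(new_bytes))` on the raw bytes of two drafts gives a list of inserted, deleted, and changed paragraphs.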
All times are GMT - 8 Hours