OOoForum.org: The OpenOffice.org Forum

Optimising OpenDocument file sizes

9point9
Moderator

Joined: 31 Aug 2004
Posts: 3875
Location: UK

PostPosted: Sat Nov 19, 2005 11:46 am    Post subject: Optimising OpenDocument file sizes

As the OpenDocument format is simply a ZIP archive of files, it is easy enough to extract the contents, modify them and recompress them. This can result in smaller file sizes, which, while not an issue for local files, matters a lot for files posted on the Internet or sent by email. This is a test that I've done which may be of use:

Original file
I downloaded version 1.23 of the OOo 2.0 setup guide from CVS:
http://documentation.openoffice.org/source/browse/documentation/www/setup_guide2/2.x/en/Attic/
File size: 868141
And when uncompressed...
Contents size: 1111415

Recompressing on maximum
It's widely known that OOo does not compress with the "Maximum" setting, so the simplest thing to do is to recompress the extracted contents with a ZIP utility set to maximum compression. Doing this with Ontrack PowerDesk 5.0 resulted in the following:
File size: 735189

That's a 15% reduction. A couple of notes on this (a sketch of the manual steps follows the list):
1. Don't use Windows XP's built-in compression tools.
2. Don't let your OS insert thumbnails, indexing files or anything like that into the archive (you may not see them because they are hidden, but they will still be packed in).
3. Remember to enable paths in the ZIP file.
4. You must include everything!
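
If you want to try the manual recompression yourself, here is a rough sketch of the steps under Linux (the file names are placeholders; I've handled the mimetype entry separately because the ODF packaging rules expect it to be stored first and uncompressed, even though OOo itself seems fairly tolerant):
Code:
# unpack the document (document.odt is just a placeholder name)
mkdir contents && cd contents
unzip ../document.odt
# repack: mimetype first, stored uncompressed, then everything else at maximum compression
zip -X -0 ../document-small.odt mimetype
zip -r -X -9 ../document-small.odt * -x mimetype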

Optimising images
Many images are stored in the Pictures subdirectory of the OpenDocument file in PNG format. OOo does not produce optimal PNG files (nothing does), so the file sizes can be reduced without any loss of quality. This has been discussed recently:
http://www.oooforum.org/forum/viewtopic.phtml?p=106214#106214
What I tried with the original file was to run PNGCrush on the Pictures and Thumbnails subdirectories within the OpenDocument file:
http://pmt.sourceforge.net/pngcrush/

Running the DOS MMX build of PNGCrush with
Code:
pngcrush -d [directory] -brute *.png

took half an hour on the old Pentium MMX that I was using. It would take far less on a modern machine, and a quicker optimisation setting could be used. Then again, file size is what matters here, not time!

The results were:
Uncompressed contents: 1057988
File size: 686523

That's 79% of the original size. The saving would be larger for files containing more images.

Loss of compression when editing
As soon as you edit the file and save it, the ZIP compression is redone at the standard OOo level, so the file gets bigger again. The PNG images aren't affected unless they were edited. After doing a Save As on the optimised file, it became:
File size: 815476
The uncompressed size is the same as after running PNGCrush.

Other ideas
The thumbnail is not essential, but I'm not sure whether removing it would break the OpenDocument standard. That's the last thing I'd want to do. Something to look into further.
Some tags could also be removed, but that too might break OpenDocument. Again, something to look into.
Document versioning could be completely removed to save space.

PNGCrush and zlib are both open source, so it would be possible to implement these optimisations in OOo. This could not be run on every save because of the extra time; it would have to be some kind of 'optimise file size' option in a menu. That would be very useful for people emailing documents and posting them on the web.

If anyone has any further file size optimisations, particularly with different ZIP compression tools and PNG optimisers, I would be interested.

Moderators: If you don't want this sticky I apologise.
_________________
Arch Linux
OOo 3.2.0

OOoSVN, change control for OOo documents:
http://sourceforge.net/projects/ooosvn/


Last edited by 9point9 on Tue Sep 11, 2007 12:33 pm; edited 1 time in total

9point9
Moderator

PostPosted: Sun Nov 20, 2005 3:14 pm

As an improvement I tried re-zipping under Linux from the command line, running this from inside the extracted directory (the output file name is just a placeholder):
Code:
zip -r -9 ../rezipped.odt *

File size: 733832

I then optimised the images with pngcrush and recompressed under Linux to give the best file size so far.
File size: 686020

That's slightly smaller than before, showing that the Linux zip tool compresses best so far. I've also tried Ken Silverman's PNGOUT and KZIP tools, but neither has done better.

The JPEG files are now the largest files in the archive, so I'll have a go at those next.

9point9
Moderator

PostPosted: Mon Nov 21, 2005 9:41 am

I've now used jpegoptim to reduce the size of the JPEG files losslessly. This isn't effective on all JPEG files but can give over 10% reduction on some.

I used it on the original file with:
Code:
jpegoptim *.jpg

This has further reduced the size of the file.
Uncompressed contents: 1015595
File size: 677642

That's a 21.943% reduction in file size with no loss of quality.

9point9
Moderator

PostPosted: Mon Nov 21, 2005 1:26 pm

I have now posted a Linux shell script that accomplishes all this in the Code Snippets Forum:
http://www.oooforum.org/forum/viewtopic.phtml?t=27452&highlight=

This gives the same output size: 677642

A DOS/Windows batch file to accomplish this would also be possible.
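
For anyone who can't follow the link, the rough shape of the script is below. This is only a sketch along the same lines, not the exact script from the Code Snippets post, and it assumes unzip, zip, pngcrush and jpegoptim are installed:
Code:
#!/bin/sh
# sketch: ./shrink-odf.sh document.odt  writes document-small.odt alongside it
IN="$1"
OUT="${IN%.odt}-small.odt"
TMP=$(mktemp -d)

unzip -q "$IN" -d "$TMP"

# losslessly recompress the embedded images
for f in "$TMP"/Pictures/*.png "$TMP"/Thumbnails/*.png; do
    [ -f "$f" ] && pngcrush -brute "$f" "$f.new" && mv "$f.new" "$f"
done
for f in "$TMP"/Pictures/*.jpg; do
    [ -f "$f" ] && jpegoptim "$f"
done

# repack: mimetype stored first and uncompressed, everything else at -9
( cd "$TMP" && zip -q -X -0 "$OLDPWD/$OUT" mimetype \
            && zip -q -r -X -9 "$OLDPWD/$OUT" * -x mimetype )
rm -rf "$TMP"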

oiaohm
General User

Joined: 10 May 2005
Posts: 32

PostPosted: Mon Jan 23, 2006 10:41 pm

Under Linux or Windows you can use AdvanceCOMP to repack the ZIP: http://advancemame.sourceforge.net/comp-readme.html

It uses the 7-Zip deflate implementation, which compresses better inside ZIP files than even zip -r -9.

I don't know how AdvanceCOMP's PNG recompression compares to pngcrush.

I would love to see the 7-Zip deflate implementation in OpenOffice; it is far better.
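
If I'm reading the readme right, advzip can also recompress an existing document in place without unpacking it at all. Treat the exact flags as an assumption on my part and check advzip --help:
Code:
# -z recompresses the entries of an existing ZIP; -3 is the strongest (7-Zip based) deflate level
advzip -z -3 document.odt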

ace_dent
General User

Joined: 09 Feb 2006
Posts: 6

PostPosted: Fri Feb 10, 2006 12:25 am    Post subject: PNG optimization

For some quite geeky information on crushing every last byte out of PNGs, I have written a guide (with batch scripts available). I noticed that pngcrush was still being used here; I would recommend at least switching to its modern replacement, OptiPNG.

Regards,
Andrew

zero0w
Power User

Joined: 05 Oct 2003
Posts: 58
Location: Hong Kong

PostPosted: Sun Mar 12, 2006 4:16 am

Curiously, I recently found that the file size of ODT files saved by OOo 2.0.2 is 40% smaller than that of files saved by OOo 1.9 m125 (2.0 Beta 2).

It looks like some optimisation work has gone on between these versions.

9point9
Moderator

PostPosted: Sun Mar 12, 2006 6:23 am

I've had a go at using OptiPNG as suggested. This gets the draft setup guide file used previously down to 669518 bytes. I've incorporated it into my shell script too; check the link in my signature.

I can probably push it further by brute-forcing the zlib compression window. Then it might take all day!

Quote:
Curiously, I recently found that the file size of ODT files saved by OOo 2.0.2 is 40% smaller than that of files saved by OOo 1.9 m125 (2.0 Beta 2).

It looks like some optimisation work has gone on between these versions.

I've noticed some differences too, but not very consistently. I had a presentation that seemed to change size quite a bit across different edits in different versions.

9point9
Moderator

PostPosted: Sun Mar 12, 2006 1:48 pm

AdvanceCOMP does give quite an improvement. Running advpng before optipng, with advzip at the end, gives the best compression so far. The setup guide file now compacts to 661082 bytes. That's a 23.874% reduction.
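
For the record, that ordering looks roughly like this on the unpacked contents (a sketch only; double-check the advpng/optipng options against their help output):
Code:
# inside the extracted document directory:
advpng -z -3 Pictures/*.png Thumbnails/*.png   # AdvanceCOMP's PNG pass first
optipng -o7 Pictures/*.png Thumbnails/*.png    # then OptiPNG, which only keeps smaller results
# repack the ZIP as in the earlier sketches, then squeeze the archive itself:
advzip -z -3 ../document-small.odt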

9point9
Moderator

PostPosted: Mon Mar 13, 2006 11:04 am

I've now used OptiPNG with different zlib window sizes, trying all 7 available between 512 bytes and 32k. This gets the setup file down to 660810 bytes. That's a 23.905% reduction in size from the original.

The downside of this is time: it now takes 2 hours on this complex file. This is all for the purpose of experimentation, though, so that perhaps something useful can come out of it in the long run.
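
The window-size pass can be scripted as a loop. This is only a sketch and assumes OptiPNG's -zw option takes one window size per run (worth checking against optipng -h); it relies on OptiPNG never keeping a result that is larger than its input:
Code:
for w in 512 1k 2k 4k 8k 16k 32k; do
    optipng -o7 -zw $w Pictures/*.png Thumbnails/*.png
done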

9point9
Moderator

PostPosted: Wed Mar 15, 2006 1:36 pm

A major improvement can be made by using progressive compression in JPEGs. Progressive JPEGs are supported by any modern web browser, and any decent graphics package or office suite will display them.

When you see an image on a web page increase in quality incrementally as the page loads, that is progressive compression at work. It takes a little more processing power to decode, but the file can be slightly smaller, and the early low-quality passes are helpful on web pages.

By default it seems most JPEGs don't use progressive compression, I think because many programs either don't support it or don't enable it by default. It is available as an option in the GIMP, for instance.

When converting a JPEG to progressive compression, a non-progressive optimiser can then be run over it as well, and whichever result is smaller is kept. I have seen around 3% come off JPEG files this way.
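
I won't claim this is exactly what I ran, but one way to do that round trip losslessly is with jpegtran from the libjpeg tools, keeping whichever output is smaller:
Code:
# build an optimised baseline and an optimised progressive version of each image,
# then keep whichever is smaller; -copy none also drops comment/EXIF blocks
for f in Pictures/*.jpg; do
    jpegtran -optimize -copy none "$f" > base.jpg
    jpegtran -optimize -progressive -copy none "$f" > prog.jpg
    if [ "$(stat -c%s prog.jpg)" -lt "$(stat -c%s base.jpg)" ]; then
        mv prog.jpg "$f"
    else
        mv base.jpg "$f"
    fi
    rm -f base.jpg prog.jpg
done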

By applying this non-progressive > progressive > non-progressive processing to the JPEGs in the sample file, I have been able to knock another 18k off the file size, again losslessly. The file size is now 642484 bytes, a 26.016% reduction compared to the original file.

ace_dent
General User

PostPosted: Tue Mar 21, 2006 4:08 pm

Re-saving a JPEG as you describe results in lossy compression. This accounts for the savings you are seeing. Progressive JPEGs aren't saved by default in most programs, as there is extra file overhead involved. Although you will probably get a similar file size (normal vs progressive), the extra bytes taken mean your image is lower quality. Try doing some visual comparisons.

Regards,
Andrew

9point9
Moderator

PostPosted: Tue Mar 21, 2006 5:10 pm

ace_dent wrote:
Re-saving a JPEG as you describe results in lossy compression.

To be precise, any JPEG operation is lossy in the sense that the format itself is lossy, so any transform will end up encoded differently.
ace_dent wrote:
This accounts for the savings you are seeing. Progressive JPEGs aren't saved by default in most programs, as there is extra file overhead involved. Although you will probably get a similar file size (normal vs progressive), the extra bytes taken mean your image is lower quality.

No. The savings come from the way progressive encoding reorders the image data into multiple scans, which the entropy coder can often pack more tightly. The only real downside of progressive JPEG is that decoding takes more processor power, since the image is rendered in several passes, each at a higher quality. Some websites say this is a bad thing, but they tend to be over a decade old; nowadays it's insignificant. The file sizes are smaller for the same quality. A number of sources on JPEG suggest that progressive JPEGs are suitable even for high-quality images.
ace_dent wrote:
Try doing some visual comparisons.

I have done. Even on screenshots (borderline JPEG/PNG material, which should show artifacts more readily) I cannot tell the difference in blind tests (not the best term here).

I have thought of doing some statistical analysis of images processed in this way. I would expect to see far less deviation from the input image than the input image shows from its original lossless source. I can pick out minor artifacts in the input image, and the same artifacts are visible, in the same form, in the output image; if progressive encoding were significantly lossy, those artifacts would be noticeably amplified.
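
ImageMagick's compare would be one quick way to put a number on it (just a suggestion, not something I've run on these particular images):
Code:
# prints the PSNR between the two JPEGs (the figure goes to stderr); higher means closer
compare -metric PSNR original.jpg processed.jpg null: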

If you don't believe me, which of these is the original image? I did not create this image, by the way, so I have no idea how the creator originally encoded it. Notice that the file sizes are different too.

[the comparison images attached to this post are not preserved here]

pitonyak
Administrator

Joined: 09 Mar 2004
Posts: 3655
Location: Columbus, Ohio, USA

PostPosted: Sat Aug 05, 2006 12:18 pm

Luckily, spending more time on better compression only affects writing the file, not the speed of reading it.
_________________
--
Andrew Pitonyak
http://www.pitonyak.org/oo.php

9point9
Moderator

PostPosted: Sat Aug 05, 2006 1:43 pm

pitonyak wrote:
Luckily, spending more time on better compression only affects writing the file, not the speed of reading it.

It can affect read time, by improving it. Processors and memory are fast; disks and network connections are slow. I have used similar methods to cut 20 MB off Nexuiz (an open source game, definitely one I'd recommend: www.nexuiz.com), and it can have a positive effect on loading time since the pk3 file (essentially a ZIP archive of game data) has to be read from disk.