9point9 Moderator

Joined: 31 Aug 2004 Posts: 3875 Location: UK
Posted: Sat Nov 19, 2005 11:46 am Post subject: Optimising OpenDocument file sizes
The OpenDocument format is simply a set of files in a ZIP archive, so it is easy to extract the files, modify them and recompress them. This may result in smaller files which, while not an issue for local copies, matters a lot for files on the Internet or sent by email. This is a test that I've done which may be of use:
Original file
I downloaded version 1.23 of the OOo 2.0 setup guide from CVS:
http://documentation.openoffice.org/source/browse/documentation/www/setup_guide2/2.x/en/Attic/
File size: 868141
And when uncompressed...
Contents size: 1111415
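The two figures above (size of the file on disk and total size of its contents once extracted) can be read without unpacking anything. A minimal Python sketch, not part of the original test; the function name `odf_sizes` is made up for illustration:

```python
import os
import zipfile


def odf_sizes(path):
    """Return (file size on disk, total uncompressed size of all entries) in bytes."""
    with zipfile.ZipFile(path) as zf:
        contents = sum(info.file_size for info in zf.infolist())
    return os.path.getsize(path), contents
```

Calling `odf_sizes("setup_guide.odt")` (a hypothetical file name) returns the pair of numbers reported as "File size" and "Contents size" in this thread.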
Recompressing on maximum
It's widely known that OOo does not compress with the "Maximum" setting so the simplest thing to do is to recompress the extracted contents with a ZIP utility setting it to maximum. Doing this with Ontrack Powerdesk 5.0 resulted in the following:
File size: 735189
That's a 15% reduction. A couple of notes on this:
1. Don't use Windows XP's built-in compression tools
2. Don't allow your OS to insert thumbnails, index files or anything like that (you may not see them as they may be hidden, but they'll end up in the archive)
3. Remember to enable paths in the ZIP file so the directory structure is kept
4. You must include everything!
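The notes above can also be followed in a script. This is a minimal Python sketch (3.7 or later, for `compresslevel`), not the tool used in this test; `repack_odf` is a hypothetical helper. It copies the entries straight from one archive to another, so nothing from the OS can sneak in, paths are kept and every entry is included. It also assumes, going by the OpenDocument packaging rules, that the `mimetype` entry should come first and stay uncompressed.

```python
import zipfile


def repack_odf(src, dst):
    """Copy every entry of an ODF package into a new archive at deflate level 9."""
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        names = zin.namelist()
        # Assumption: ODF readers expect 'mimetype' first and stored, not deflated.
        if "mimetype" in names:
            zout.writestr("mimetype", zin.read("mimetype"),
                          compress_type=zipfile.ZIP_STORED)
        for name in names:
            if name == "mimetype":
                continue
            # Everything else is recompressed at the maximum deflate level.
            zout.writestr(name, zin.read(name),
                          compress_type=zipfile.ZIP_DEFLATED, compresslevel=9)
```

Something like `repack_odf("setup_guide.odt", "setup_guide_max.odt")` (file names are placeholders) would then write the recompressed copy next to the original. How much it saves depends on the document.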
Optimising images
Many images are stored in the OpenDocument pictures subdirectory in PNG format. OOo does not produce optimal PNG files (nothing does) and the file sizes can be optimised without loss of quality. This has been discussed recently:
http://www.oooforum.org/forum/viewtopic.phtml?p=106214#106214
What I tried with the original file was to run PNGCrush on the pictures and thumbnails subdirectories within the OpenDocument file:
http://pmt.sourceforge.net/pngcrush/
Running the DOS MMX build of PNGCrush with
Code: pngcrush -d [directory] -brute *.png
took half an hour on the old Pentium MMX that I was using. It could have taken a lot less on a modern machine and could have been run with quicker optimisation. Then again, file size is important here, not time!
The results were:
Uncompressed contents: 1057988
File size: 686523
That's 79% of the original size. The saving would be bigger with more images in the file.
Loss of compression when editing
As soon as you edit the file and save it, the ZIP compression is run at the standard OOo level so the file gets bigger again. The PNG images aren't affected unless they were edited. After a Save As, the file became:
File size: 815476
The uncompressed size is the same as after running PNGCrush.
Other ideas
The thumbnail is not essential but I'm not too sure if removing it would break OpenDocument standards. That's the last thing I'd want to do. Something to look into further.
Some tags could be removed but this also might break OpenDocument. Again, something to look into.
Document versioning could be completely removed to save space.
PNGCrush (and of course zlib) are open source so it would be possible to implement these optimisations in OOo. They could not be run on every save due to the extra time, so it would have to be some kind of 'optimise file size' option in a menu. This would be very useful for people emailing and posting files on the web.
If anyone has any further file size optimisations, particularly with different ZIP compression tools and PNG optimisers, I would be interested.
Moderators: If you don't want this sticky I apologise. _________________ Arch Linux
OOo 3.2.0
OOoSVN, change control for OOo documents:
http://sourceforge.net/projects/ooosvn/
Last edited by 9point9 on Tue Sep 11, 2007 12:33 pm; edited 1 time in total

9point9 Moderator
Posted: Sun Nov 20, 2005 3:14 pm
As an improvement I tried rezipping under Linux from the command line with:
File size: 733832
I then optimised the images with pngcrush and recompressed under Linux to give the best file size so far.
File size: 686020
That's slightly smaller than before showing that zlib under Linux compresses best so far. I've also tried Ken Silverman's PNGout and kzip tools though neither has been better.
The JPG files are the biggest files at the moment so I'll have a go at them.

9point9 Moderator
Posted: Mon Nov 21, 2005 9:41 am
I've now used jpegoptim to reduce the size of the JPEG files losslessly. This isn't effective on all JPEG files but can give over 10% reduction on some.
I used it on the original file with:
This has further reduced the size of the file.
Uncompressed contents: 1015595
File size: 677642
That's a 21.943% reduction in file size with no loss of quality.

oiaohm General User

Joined: 10 May 2005 Posts: 32
Posted: Mon Jan 23, 2006 10:41 pm
Under Linux or Windows, use AdvanceCOMP to repack the ZIP: http://advancemame.sourceforge.net/comp-readme.html
It uses 7-Zip's code, which does the compression inside ZIP files better than even zip -r -9.*
I don't know how AdvanceCOMP's PNG compression compares to pngcrush.
I would love to see the deflate implementation from 7-Zip in OpenOffice; it is far better.

ace_dent General User

Joined: 09 Feb 2006 Posts: 6
Posted: Fri Feb 10, 2006 12:25 am Post subject: PNG optimization
For some quite geeky information on crushing every last byte out of PNGs, I have written this guide (with batch scripts available). I noticed that 'pngcrush' was still being used and would recommend at least switching to the modern replacement OptiPNG.
Regards,
Andrew

zero0w Power User


Joined: 05 Oct 2003 Posts: 58 Location: Hong Kong
Posted: Sun Mar 12, 2006 4:16 am
Curiously, I found that ODT files saved by OOo 2.0.2 are 40% smaller than those saved by OOo 1.9 m125 (2.0 Beta 2).
It looks like some optimisation work went on between these versions.

9point9 Moderator
Posted: Sun Mar 12, 2006 6:23 am
I've had a go at using OptiPNG as suggested. This gets the draft setup guide file used previously as small as 669518 bytes. I've incorporated this into my shell script too; check the link in my siggy.
I can probably push it further by brute forcing the zlib compression window. Then it might take all day!
Quote: "Curiously, I found that recently the file size of ODT files saved by OOo 2.0.2 is 40% smaller than that of OOo 1.9 m125 (2.0 Beta 2). It looks like there are some optimization works going on between these versions."
I've noticed some differences too, but not very consistently. I had a presentation whose size changed quite a few times with different edits in different versions.

9point9 Moderator
Posted: Sun Mar 12, 2006 1:48 pm
AdvanceCOMP does give quite an improvement. Running advpng before optipng, and advzip at the end, gives the best compression so far. The setup guide file now compacts to 661082 bytes. That's a 23.874% reduction.

9point9 Moderator
Posted: Mon Mar 13, 2006 11:04 am
I've now used OptiPNG with different sizes of zlib window. I've used all 7 available between 512 and 32k. This means I've now got the setup file size down to 660810 bytes. That's a 23.905% reduction in size from the original.
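To show what trying different zlib window sizes means, here is a rough Python sketch using the standard `zlib` module rather than OptiPNG itself. The seven sizes from 512 bytes to 32k correspond to `wbits` values 9 to 15. `best_window` is a hypothetical helper, and the search only illustrates the idea; it is not a description of what OptiPNG does internally.

```python
import zlib


def best_window(data, level=9):
    """Compress data once per window size and return (wbits, smallest stream)."""
    best_wbits, best_blob = None, None
    for wbits in range(9, 16):  # 2**9 = 512 bytes up to 2**15 = 32k
        comp = zlib.compressobj(level, zlib.DEFLATED, wbits)
        blob = comp.compress(data) + comp.flush()
        if best_blob is None or len(blob) < len(best_blob):
            best_wbits, best_blob = wbits, blob
    return best_wbits, best_blob


sample = b"OpenDocument " * 2000
wbits, blob = best_window(sample)
print(wbits, len(blob), "bytes from", len(sample))
```

Each extra setting means another full compression pass over every image, which is where the extra time goes.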
The downside of this is time. It now takes 2 hours on this complex file. This is all for the purpose of experimentation, though, so that perhaps something useful can come out of it in the long run.

9point9 Moderator
Posted: Wed Mar 15, 2006 1:36 pm
A major improvement can be made by using progressive compression in JPEGs. Any modern web browser, decent graphics package or office suite will display progressive JPEGs.
When you see an image on a web page incrementally increase in quality as the page loads, that is because it uses progressive compression. Decoding takes more processing power, but the file can be slightly smaller, and a low-quality version of the image appears first and then sharpens, which helps on web pages.
By default it seems like most JPEGs don't use progressive compression, I think because many output programs don't offer it or don't use it by default. It is available in GIMP as an option, for instance.
After converting a JPEG to progressive compression, we can then run a non-progressive program over it to try to improve the size, and keep whichever result is smaller. I have seen ~3% coming off JPEG files this way.
By applying non-progressive > progressive > non-progressive processing to the JPEGs in the sample file, I have been able to knock another 18k off the file size, losslessly again. The file size is now 642484 bytes. A 26.016% reduction compared to the original file.

ace_dent General User
Posted: Tue Mar 21, 2006 4:08 pm
Re-saving a JPEG as you describe results in lossy compression. This accounts for the savings you are seeing. Progressive JPEGs aren't saved by default in most programs, as there is extra file overhead for this. Although you will probably get a similar file size (normal vs progressive), the extra bytes taken means your image is lower quality. Try doing some visual comparisons.
Regards,
Andrew

9point9 Moderator
Posted: Tue Mar 21, 2006 5:10 pm
ace_dent wrote: "Re-saving a JPEG as you describe results in lossy compression."
To be precise, any JPEG operation is lossy as the format itself is lossy, hence any transform will be encoded differently.
ace_dent wrote: "This accounts for the savings you are seeing. Progressive JPEGs aren't saved by default in most programs, as there is extra file overhead for this. Although you will probably get a similar file size (normal vs progressive), the extra bytes taken means your image is lower quality."
No. The savings are because multiple rendering passes are required to decode the image, each pass giving a higher quality. Mathematically, it is easy to see that this will take less space. The only downside of progressive JPEG is that more processor power is required. Some websites say this is a bad thing but then they tend to be over a decade old. Nowadays it's insignificant. The file sizes are smaller for the same quality. A number of sources on JPEG suggest that progressive JPEGs are suitable for use on high quality
ace_dent wrote: "Try doing some visual comparisons."
I have done. Even on screenshots (something that is borderline JPEG/PNG so should show artifacts better) I cannot tell the difference in blind (not the best term here) tests.
I have thought of doing some statistical analysis of images processed in this way. I would expect to see far less deviation from the input image than the input image would have from the original lossless source. This is because I can pick out minor artifacts in the input image and the same ones are visible in the output image, in the same form. Artifacts in the input image would be significantly enhanced if progressive encoding were significantly lossy.
If you don't believe me, which of these is the original image? I did not create this image by the way, so I have no idea how the creator originally encoded it. Notice the file sizes are different too.
 _________________ Arch Linux
OOo 3.2.0
OOoSVN, change control for OOo documents:
http://sourceforge.net/projects/ooosvn/ |
|
| Back to top |
|
 |
pitonyak Administrator


Joined: 09 Mar 2004 Posts: 3618 Location: Columbus, Ohio, USA
Posted: Sat Aug 05, 2006 12:18 pm
Luckily, spending more time creating better compression only affects the creation, and not the reading of the file (for speed).
_________________
--
Andrew Pitonyak
http://www.pitonyak.org/oo.php

9point9 Moderator
Posted: Sat Aug 05, 2006 1:43 pm
pitonyak wrote: "Luckily, spending more time creating better compression only affects the creation, and not the reading of the file (for speed)."
It can affect the read time by improving it. Processors and memory are fast; disk and connection speeds are slow. I have used similar methods to cut 20 MB off Nexuiz (an open source game, definitely one I'd recommend: www.nexuiz.com) and it can have a positive effect on loading time, as the pk3 file (essentially a ZIP archive of game data) needs to be read from disk.