OpenOffice.org Forum at OOoForum.org

Forum has corrupted non-ascii characters

 
acknak
Moderator

Joined: 13 Aug 2004
Posts: 4295
Location: ~ 40°N,75°W

Posted: Thu Nov 15, 2007 1:08 pm    Post subject: Forum has corrupted non-ascii characters

Something has munged my signature.

Suddenly all my old posts have a corrupted signature, where I had used U+00B0 DEGREE SIGN. The characters have now been munged to the nonsense byte sequence ef bf bd, encoding U+FFFD.
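
One way such a byte sequence can arise, sketched in Python. The assumption here (not confirmed anywhere in this thread) is that the degree sign was stored as a single ISO-8859-1 byte and later decoded as UTF-8 by something that replaces invalid bytes instead of failing:

```python
# Assumption: the degree sign was stored as the single Latin-1 byte 0xB0.
stored = "\u00b0".encode("iso-8859-1")

# A lone 0xB0 is not valid UTF-8, so a lenient decoder substitutes U+FFFD.
decoded = stored.decode("utf-8", errors="replace")

# Re-encoding the replacement character as UTF-8 yields the bytes reported above.
munged = decoded.encode("utf-8")

print(stored.hex(), repr(decoded), munged.hex(" "))
```

Once the data has been through a round trip like this, the original character cannot be recovered from the stored bytes alone.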

My signature is obviously of little importance, but I assume other non-ascii characters are botched as well, which could mean some more significant information loss from the forum archives.

PS: Oops, sorry. It seems that the problem was only in my profile. I fixed it there and that takes care of the obvious problem. A couple of spot checks show that characters in the messages seem to be ok, so maybe it's just my signature itself that got fried.

Any ideas how the profile got corrupted?

noranthon
Super User

Joined: 07 Jul 2005
Posts: 3318

Posted: Thu Nov 15, 2007 8:31 pm

Hi, acknak

Popped in to see whether I could find out something and read your post. Are you saying that your profile needed editing? The most likely explanation is that you bungled it yourself somehow, isn't it? It's your location you're talking about BTW, not your sig.

Another user reported a few months ago that a change to his sig did not stick. There have been other anomalies in the performance of the site. It might be just a case of overdue maintenance on an old database.
_________________
search forum by month

acknak
Moderator

Joined: 13 Aug 2004
Posts: 4295
Location: ~ 40°N,75°W

Posted: Thu Nov 15, 2007 11:54 pm

Oh, duh. Yes it's in my location text, not the sig. Thanks.

I'm quite sure I didn't change it, at least not by me visiting the profile and editing something. I first noticed it yesterday (15 Nov) at, oh about 2am EST. I was doing an OS upgrade and I thought at first that the problem was on my end.

After I realized what had happened, I just deleted the strange characters and re-entered the correct degree symbols.

noranthon
Super User

Joined: 07 Jul 2005
Posts: 3318

Posted: Sun Nov 18, 2007 7:26 pm

I suppose the site may have changed to a different locale setting (one of those new-fangled UTF ones). No, you're saying the code changed somehow. One for the log under 'strange sightings'.
_________________
search forum by month

draude
Administrator

Joined: 10 Dec 2002
Posts: 353
Location: San Francisco

Posted: Tue Nov 20, 2007 1:25 am

Yeah, I'm not really sure what could have happened here. There shouldn't be any way to modify the profile of a user unless either the user or a mod/admin makes the change.

It may be that you originally entered the data using a different character set, which looked fine on your computer until you upgraded the OS. Perhaps the OS upgrade changed your default character set to something else.

acknak
Moderator

Joined: 13 Aug 2004
Posts: 4295
Location: ~ 40°N,75°W

Posted: Tue Nov 20, 2007 8:28 am

Hmmm... well, my OS has used Unicode as the default encoding for about the last 5 releases, so I don't think it's a character set issue.

One part may be a font thing: the default fonts look different to me; it may be that the new fonts display missing glyphs differently. I now see black diamonds with a question mark in the middle. That tends to catch my eye, so I may be noticing it more than I did before.

However, I don't see how something on my end would change the character data from the forum. Change the way it's displayed, sure, but change the data? I don't think so. Unfortunately, I changed my profile, so I can't go back and look now to try some different access paths.

I have seen the "missing glyph" character in several older threads, but I didn't stop to investigate.

huwg
Super User

Joined: 14 Feb 2007
Posts: 890

Posted: Tue Nov 20, 2007 8:45 am

If it helps any, acknak, your profile has always shown the degrees symbol correctly for me.

I did notice this post the other day - Regular expression combine with conditional - showing squares for kanji, but I think that is because I don't have a suitable font installed.

acknak
Moderator

Joined: 13 Aug 2004
Posts: 4295
Location: ~ 40°N,75°W

Posted: Tue Nov 20, 2007 10:37 am

Ok, I'm not sure what this means, but here's one problem: the forum contains UTF-8 data, but it serves pages with the charset 8859-1:
Quote:
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

If I look at my profile page with an application that follows what the HTML says the charset is, then it displays the degree symbols wrong.

The browser, OTOH, seems to look at the data stream and decide the charset setting is wrong and uses UTF-8 despite what the HTML says.
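
A small Python sketch of the two behaviours described above. The sample text and the `looks_like_utf8` helper are made up for illustration; real browsers use more elaborate detection than this:

```python
# Assumed payload: the location text, sent by the forum as UTF-8 bytes.
utf8_bytes = "40\u00b0N".encode("utf-8")

# An application that trusts the declared charset decodes each byte on its own,
# so the two-byte degree sign turns into two separate characters.
as_declared = utf8_bytes.decode("iso-8859-1")

def looks_like_utf8(data: bytes) -> bool:
    """Rough check: do the bytes decode cleanly as UTF-8? (hypothetical helper)"""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

print(as_declared)                  # 40Â°N
print(looks_like_utf8(utf8_bytes))  # True
```

Note that plain ASCII also passes this check, so it only tells the two charsets apart when non-ASCII bytes are present.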

Some posts, e.g. this one: http://www.oooforum.org/forum/viewtopic.phtml?p=27881#27881 seem to use some other character set, probably whatever was on the poster's system when the message was posted. For me, the apostrophes in that message show as black diamonds in my browser.
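
To illustrate how an apostrophe can end up as a black diamond, here is a sketch that assumes the post was typed on a Windows-1252 system (a guess, the thread does not say) and is then read as UTF-8:

```python
# Assumption: the poster's system used Windows-1252, where byte 0x92 is
# the right single quotation mark often used as a "curly" apostrophe.
raw = b"don\x92t"

as_cp1252 = raw.decode("cp1252")                  # the text the poster meant
as_utf8 = raw.decode("utf-8", errors="replace")   # what a UTF-8 reader sees

print(as_cp1252)       # don’t
print(repr(as_utf8))   # 'don\ufffdt', U+FFFD is drawn as a black diamond
```

In this case the stored byte is still intact, so the text would display correctly if decoded with the right charset.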

draude
Administrator

Joined: 10 Dec 2002
Posts: 353
Location: San Francisco

Posted: Tue Nov 20, 2007 1:52 pm

acknak wrote:
Ok, I'm not sure what this means, but here's one problem: the forum contains UTF-8 data, but it serves pages with the charset 8859-1

I believe your browser will use whatever the default character set is for your OS. I'm not sure of this but I believe this is the behavior that I see. On my Debian desktop (which I'm using now), Firefox defaults to UTF-8 because that's what I have as my OS default. The forum defaults to iso-8859-1 as identified in the META tag, as you note.

At some point, it makes sense to switch to UTF-8. That will probably happen with the next set of upgrades to the forum.

Ed

DrewJensen
Super User

Joined: 06 Jul 2005
Posts: 2616
Location: Cumberland, MD

Posted: Tue Nov 20, 2007 2:10 pm

Hi draude,

Just as an interesting aside: the next release of the forum software all but mandates the use of UTF-8, for both the front end and the back end.

Example of the default output from the 3.0 release software:

Code:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" dir="ltr" lang="en-gb" xml:lang="en-gb">
<head>

<meta http-equiv="content-type" content="text/html; charset=UTF-8" />

_________________
Blog - http://baseanswers.spaces.live.com/

TerryE
Super User

Joined: 16 Jul 2006
Posts: 550
Location: UK

Posted: Tue Nov 27, 2007 3:17 pm

Draude, I suspect that some settings got changed when you migrated the site onto the new box. I checked an old post of mine where I used Arabic characters, and it is now being munged. You often get this where UTF-8 encoding is used within a page which does not force the charset to UTF-8 in the content meta tag. At the moment it is set to charset=iso-8859-1.

If we spit out raw UTF-8 then we should force the charset. An alternative possibility is that this munging occurred during the export/import of the D/B during migration.
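
The two possibilities can be told apart by looking at the stored bytes rather than the rendered page. A minimal Python sketch, where `diagnose` is a hypothetical helper and not part of the forum software:

```python
REPLACEMENT = "\ufffd".encode("utf-8")  # b"\xef\xbf\xbd"

def diagnose(stored: bytes) -> str:
    """Classify a stored field (hypothetical helper for illustration)."""
    if REPLACEMENT in stored:
        # The original character was already replaced in the data itself,
        # which points at the export/import rather than the page charset.
        return "lost"
    try:
        text = stored.decode("utf-8")
    except UnicodeDecodeError:
        # Not UTF-8 at all; probably some 8-bit charset from the poster.
        return "legacy"
    # Valid UTF-8: non-ASCII text here only needs the right charset declared.
    return "ascii" if text.isascii() else "utf-8"

print(diagnose(b"40\xef\xbf\xbdN"))          # lost
print(diagnose("40\u00b0N".encode("utf-8")))  # utf-8
print(diagnose(b"40\xb0N"))                  # legacy
```

Fields that come back as "utf-8" would be fixed by forcing the charset; fields that come back as "lost" need the original data from somewhere else.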
_________________
Terry
WinXPSP3, OOo 2.4.1, Ubuntu 8.04 for development
Also try the Official OOo Community Forum where I mainly post now.

TerryE
Super User

Joined: 16 Jul 2006
Posts: 550
Location: UK

Posted: Mon Dec 10, 2007 9:53 am

Ed, I see that the forum still contains corrupted content after the migration. Have you got a back-up of the old database so that we can do an intelligent selective restore? For the general dialogue this is a bit of a nuisance, but for the code fragments this is really something that we should sort out.

If you want I would be willing to support you on this. //Terry
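
A rough idea of what a selective restore could look like, sketched in Python. The dictionaries keyed by post id are an assumption made for illustration; the real database schema is not described in this thread:

```python
def selective_restore(current, backup):
    """Return a copy of `current` where posts containing U+FFFD are replaced
    by their backup text, if the backup copy is clean (hypothetical sketch)."""
    restored = dict(current)
    for post_id, text in current.items():
        old = backup.get(post_id)
        if "\ufffd" in text and old is not None and "\ufffd" not in old:
            restored[post_id] = old
    return restored

# Made-up sample data: post 1 was damaged, post 2 is fine, post 3 has no backup.
current = {1: "40\ufffdN", 2: "plain text", 3: "also \ufffd damaged"}
backup = {1: "40\u00b0N", 2: "older plain text"}

print(selective_restore(current, backup))
```

Posts that were edited after the back-up was taken would lose those edits, so anything touched since the migration would need a manual check.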

All times are GMT - 8 Hours