Encoding problem?
(14 posts, started )
Hi,

Ever since I started programming I have always had one issue, which is text encoding...

Now once again I have the same problem with JSON.

I'm using the lfs hostprogress json data sheet (http://www.lfsworld.net/pubsta ... S.C+%95+Aston+Cruise+%231 ) to use real-time serverinfo in a .net program.

Though I have one big issue, which is that when I read the hostname, I can't get the "special" characters to work.

Json:
{"hostinfo":{"host":"^6WS^7.^3C^7 \u0095 Aston Cruise #1","host_parsed":"<font color=\"#00FFFF\">WS<\/font><font color=\"#FFFFFF\">.<\/font><font color=\"#FFFF00\">C<\/font><font color=\"#FFFFFF\"> • Aston Cruise #1<\/font>","host_stripped":"WS.C \u0095 Aston Cruise....

The "\u0095" should be a •

So well all that is clear to me. But then I want to use JSON.NET to read data from the json file:

JObject Jhostprogress = JObject.Parse(strings[1]); // strings is received with WebRequest & WebResponse
JObject Jhostinfo = (JObject)Jhostprogress["hostinfo"];
string hostname = (string)Jhostinfo["host"];

So now I would think hostname should work perfectly... But it doesn't; it just has a space where the • should be.

Another thing I found is that
string testBull = "•";

isn't the same result as
string testBull = "\u0095";

So is there anyone who can help me fix this silly issue?
"\u0095" is a Unicode escape for the code point U+0095, so it's not really a silly problem; it's just that the text is in a different encoding from the one you expect. U+0095 is a control character, while the bullet you are after is U+2022; byte 0x95 only means a bullet in Windows-1252. See this essay on Unicode and this document on Unicode in .NET for more information.
I know that putting a header etc. on a website sometimes fixes it...

Maybe I'm just being dense, but I still don't know how to fix it in my program...

Tried:
Encoding targetEncoding;
byte[] encodedChars;
// Gets the encoding for the specified code page.
targetEncoding = Encoding.GetEncoding("utf-32");

// Gets the byte representation of the specified string.
encodedChars = targetEncoding.GetBytes(strings[1]);
string test = targetEncoding.GetString(encodedChars);

and I still get the same result...
I get the same result in my InSim application...
If I try to add an extra symbol, a star for example, it only shows up as garbage in the InSim code and in LFS, like: é!^°
or something similar.
Both of your problems come down to this:

READ text in some codepage
and then
DISPLAY it in another codepage

If the read and display codepages are not the same (e.g. you read text as Unicode but display it as ISO-8859-1), characters may not be displayed properly.
If that's the case, you always have to convert the incoming text to whatever encoding you use when displaying the text.

As for the InSim case, you may be creating text in Unicode and sending that Unicode text to LFS. It will not understand that. LFS only understands single-byte codepages (see this codepages post for example, or an example conversion function in PHP).
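
To illustrate the read/display mismatch described above, here is a minimal C# sketch. The class and method names are made up for this example; it simply decodes the same UTF-8 bytes once with the matching encoding and once with ISO-8859-1:

```csharp
using System;
using System.Text;

// Hypothetical demo class, not part of LFS or any library.
public static class CodepageMismatchDemo
{
    // Decodes the given bytes using the named encoding.
    public static string DecodeAs(byte[] bytes, string encodingName)
    {
        return Encoding.GetEncoding(encodingName).GetString(bytes);
    }

    public static void Show()
    {
        // "é" (U+00E9) encoded as UTF-8 is the two bytes C3 A9.
        byte[] utf8Bytes = Encoding.UTF8.GetBytes("\u00E9");

        // Read back with the encoding the bytes were written in.
        Console.WriteLine(DecodeAs(utf8Bytes, "utf-8"));

        // Read back with a different single-byte codepage.
        Console.WriteLine(DecodeAs(utf8Bytes, "iso-8859-1"));
    }
}
```

Decoding with "iso-8859-1" turns the single é into the two characters Ã©, which is the typical kind of garbage you see when the two codepages do not match.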
Thanks, now I understand it.

LFS has nice, FAST and friendly support!!
Hey,

I've been pretty dense; it seems that:


WebClient test = new WebClient();
byte[] data = test.DownloadData("http://www.lfsworld.net/pubstat/hostprogress.php?host=" + webname);
string tests = Encoding.GetEncoding("ISO-8859-1").GetString(data);

Just reads the special character as "\\u0095".. instead of "\u0095"..

So I took the easy and lame solution, since I still haven't found how to convert Unicode to... whatever C# forms use...

.Replace("\\u0095", "•")

Though I'm going to take a look at your converter class, Victor; perhaps that can help me out a bit more.

Ps: Here is the result now:
Attached images
Naamloos.jpg
#8 - amp88
Quote from G. Dierckx :Just reads the special character as "\\u0095".. instead of "\u0095"..

The backslash in "\u0095" is a special character. Since it's a special character it needs to be "escaped" with another backslash. The extra backslash (the escape character) basically says to your PC "look out, here comes a special character". Try using .Replace("\\", "\") and see if that helps. Also, you should probably be using ReplaceAll rather than just Replace in case there are multiple special characters.
There is no ReplaceAll() in C#; the standard Replace() already replaces all occurrences.
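
A small sketch to back this up, and to show why the text looks like "\\u0095" in the first place. In C# source a doubled backslash is one literal backslash, so "\\u0095" is the six-character JSON escape sequence, whereas "\u0095" is a single character. The names ReplaceDemo and FixBullets are hypothetical, and substituting U+2022 is my assumption about which character is wanted:

```csharp
using System;

// Hypothetical helper for illustration only.
public static class ReplaceDemo
{
    // Six characters: a backslash followed by "u0095".
    // This is the JSON escape as it appears in the raw, unparsed text.
    public const string JsonEscape = "\\u0095";

    // string.Replace substitutes every occurrence, not just the first.
    public static string FixBullets(string rawJson)
    {
        return rawJson.Replace(JsonEscape, "\u2022");
    }

    public static void Show()
    {
        Console.WriteLine(JsonEscape.Length); // length of the escape sequence
        Console.WriteLine("\u0095".Length);   // length of the real character
        Console.WriteLine(FixBullets("a \\u0095 b \\u0095 c"));
    }
}
```

Note that a JSON parser normally turns the escape sequence into the single character for you, so the manual Replace is only needed when working on the raw text.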
Just wondering if setting the encoding on the WebRequest stream would help, something like this. Bear in mind I've not tried this, just a random thought.

string url = "http://www.lfs.net/etc/";

HttpWebRequest request = null;
HttpWebResponse response = null;
StreamReader reader = null;
try
{
    request = (HttpWebRequest)WebRequest.Create(url);
    response = (HttpWebResponse)request.GetResponse();

    // Create reader with specific encoding.
    reader = new StreamReader(response.GetResponseStream(), Encoding.GetEncoding("utf-8"));

    // Read stream.
    string data = reader.ReadToEnd();
}
catch (Exception ex)
{
    // Error.
}
finally
{
    if (reader != null) reader.Close();
    if (response != null) response.Close();
}

Quote from amp88 :The backslash in "\u0095" is a special character. Since it's a special character it needs to be "escaped" with another backslash. The extra backslash (the escape character) basically says to your PC "look out, here comes a special character". Try using .Replace("\\", "\") and see if that helps. Also, you should probably be using ReplaceAll rather than just Replace in case there are multiple special characters.

.Replace("\\", "\") doesn't work, it would just escape the " character...

Quote from DarkTimes :Just wondering if setting the encoding on the WebRequest stream would help, something like this. Bear in mind I've not tried this, just a random thought.


Tried, still the same result:
"{\"hostinfo\":{\"host\":\"^6WS^7.^4R^7 \\u0095 Training\",\...

Solution
This thread is over 1 year old, but I have found a working .NET solution for the bullet issue.

To summarize the problem:

Data is received in ISO-8859-1 format. This format has special characters that do not display correctly when the display charset is UTF-8 (an encoding of Unicode, not a subset of it).

When converting from ISO-8859-1 to UTF-8, the characters are converted correctly but still end up not displaying correctly.


Encoding iso8859 = Encoding.GetEncoding("iso-8859-1");
Encoding unicode = Encoding.Unicode;
byte[] srcTextBytes = iso8859.GetBytes(textToConvert);
byte[] destTextBytes = Encoding.Convert(iso8859,unicode, srcTextBytes);
char[] destChars = new char[unicode.GetCharCount(destTextBytes, 0, destTextBytes.Length)];
unicode.GetChars(destTextBytes, 0, destTextBytes.Length, destChars, 0);

This code will convert the bullet to unicode.

As pointed out by G. Dierckx:
Quote :ps: I also found something (which isn't totally related to the problem) but the "bullet" in lfs..

I retrieve from hostprogress
unicode 0095 -> http://www.fileformat.info/info/unic...0095/index.htm

While a "real" bullet should be: 2022 -> http://www.fileformat.info/info/unic...2022/index.htm

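\u0095 (decimal 149) is the Windows-1252 position of the bullet. It is not the proper Unicode bullet (that is U+2022), but browsers treat the numeric reference 149 as a bullet anyway.

The HTML entity is &#149; (•)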
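
To make the 0095 / 149 / 2022 numbers concrete, here is a minimal sketch that reports the code point a character really holds, in hex and decimal. CodePointDemo and Describe are names invented for this example:

```csharp
using System;

// Hypothetical helper, not part of any library.
public static class CodePointDemo
{
    // Formats the first UTF-16 code unit of s as "U+XXXX (decimal N)".
    public static string Describe(string s)
    {
        int value = (int)s[0];
        return "U+" + value.ToString("X4") + " (decimal " + value + ")";
    }

    public static void Show()
    {
        Console.WriteLine(Describe("\u0095")); // what hostprogress delivers
        Console.WriteLine(Describe("\u2022")); // the typographic bullet
    }
}
```

The two values are different characters as far as .NET is concerned, which is why comparing "\u0095" with a typed "•" never matches.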

Using a snippet from another board on manually converting special characters to html_entities I grabbed:

StringBuilder result = new StringBuilder(textToConvert.Length + (int)(textToConvert.Length * 0.1));

foreach (char c in destChars)
{
    int value = Convert.ToInt32(c);
    if (value > 127)
        result.AppendFormat("&#{0};", value);
    else
        result.Append(c);
}

return result.ToString();

Which gave me this function:
// Requires: using System; using System.Text;
public static string iso8859ToUnicode(string textToConvert)
{
    Encoding iso8859 = Encoding.GetEncoding("iso-8859-1");
    Encoding unicode = Encoding.Unicode;
    byte[] srcTextBytes = iso8859.GetBytes(textToConvert);
    byte[] destTextBytes = Encoding.Convert(iso8859, unicode, srcTextBytes);
    char[] destChars = new char[unicode.GetCharCount(destTextBytes, 0, destTextBytes.Length)];
    unicode.GetChars(destTextBytes, 0, destTextBytes.Length, destChars, 0);

    StringBuilder result = new StringBuilder(textToConvert.Length + (int)(textToConvert.Length * 0.1));

    // Emit every non-ASCII character as a numeric HTML character reference.
    foreach (char c in destChars)
    {
        int value = Convert.ToInt32(c);
        if (value > 127)
            result.AppendFormat("&#{0};", value);
        else
            result.Append(c);
    }

    return result.ToString();
}

This successfully converted my • in ISO-8859-1 to &#149; for displaying in UTF-8.

I have not done extensive testing yet so I do not know if this will work for all cases.
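
For anyone who needs the real character in a .NET string rather than an HTML entity, a possible alternative is to remap the range U+0080 to U+009F through the Windows-1252 table. This is only a sketch under assumptions: it assumes the host name uses the Latin (Windows-1252) codepage, it does not handle the other LFS codepages, and C1Remapper and RemapC1 are hypothetical names:

```csharp
using System;
using System.Text;

// Hypothetical helper; assumes the text is really Windows-1252.
public static class C1Remapper
{
    // Windows-1252 meanings of bytes 0x80..0x9F.
    // '\0' marks positions that Windows-1252 leaves undefined.
    private static readonly char[] Cp1252High =
    {
        '\u20AC', '\0',     '\u201A', '\u0192', '\u201E', '\u2026', '\u2020', '\u2021',
        '\u02C6', '\u2030', '\u0160', '\u2039', '\u0152', '\0',     '\u017D', '\0',
        '\0',     '\u2018', '\u2019', '\u201C', '\u201D', '\u2022', '\u2013', '\u2014',
        '\u02DC', '\u2122', '\u0161', '\u203A', '\u0153', '\0',     '\u017E', '\u0178'
    };

    // Replaces characters in U+0080..U+009F with their Windows-1252
    // equivalents; everything else is copied unchanged.
    public static string RemapC1(string text)
    {
        StringBuilder result = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            bool isC1 = c >= '\u0080' && c <= '\u009F';
            char mapped = isC1 ? Cp1252High[c - '\u0080'] : c;
            // Keep the original character where the table has no entry.
            result.Append(mapped == '\0' ? c : mapped);
        }
        return result.ToString();
    }
}
```

With this, something like C1Remapper.RemapC1(hostname) on the parsed host name should give a string containing U+2022, which a Unicode-aware control can display directly.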
