LFS Forum - Insim String Encoding Help

#1 - Bass-Driver

Insim String Encoding Help

Sat 11 Nov 2017, 22:32

Hello programmers,

Since I started updating the LFSLapper InSim library, I want to fix some old bugs.
One of them is a character bug.

Japanese characters are not displayed when sending a string to LFS.
After some research on Google, I found that these Japanese characters use double bytes.

After looking into the code page source code, I do not quite understand what the previous developer (Gai-Luron) did with it, and I have no idea how to edit the code to support double-byte characters.

So my idea is to start from scratch, and I want to ask you guys how to encode strings properly.

Just a small note: my C# experience is limited.

Thanks in advance

Attached images

#2 - K0Z3L_43V3R

Sun 12 Nov 2017, 14:02

Hey, I'm working on a PHP InSim tracker; here is my implementation of it:
https://bitbucket.org/K0Z3L43V3R/kingtracker-v2/src/10fd383a511ee47b35e532f5007bec612131d4b9/src/AppBundle/Component/LFSTranslator.php?at=master&fileviewer=file-view-default#LFSTranslator.php-97

Even though it's in PHP, it should be similar in C#. Let's say you have a Japanese string in your application encoded in UTF-8. For Japanese, LFS expects the CP932 code page. All you have to do is convert the string from one encoding to the other and prefix it with "^J".
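
A minimal C# sketch of the conversion described above, assuming the whole string fits the Japanese code page. The class and method names are made up for illustration and are not part of LFSLapper or any InSim library.

```csharp
using System;
using System.Text;

public static class JapaneseTextSketch
{
    static JapaneseTextSketch()
    {
        // Needed on .NET Core / .NET 5+ to make code page 932 available.
        // On .NET Framework the code page is built in and this line can be dropped.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Returns the ASCII prefix "^J" followed by the CP932 encoding of the text.
    public static byte[] ToLfsJapanese(string text)
    {
        byte[] body = Encoding.GetEncoding(932).GetBytes(text);

        byte[] result = new byte[2 + body.Length];
        result[0] = (byte)'^';
        result[1] = (byte)'J';
        Array.Copy(body, 0, result, 2, body.Length);
        return result;
    }
}
```

For example, `JapaneseTextSketch.ToLfsJapanese("\u2606")` (a white star) yields the four bytes 5E 4A 81 99. A real encoder would also have to handle strings that mix several code pages.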

#3 - Bass-Driver

Sat 18 Nov 2017, 12:36

OK, thanks for the feedback, I will look into it.

For anyone who is interested in the code page source:
sending an MTC packet looks like this.



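public byte[] MTC(int UCID, int PLID, string msg)
{
    // Encode first: a double-byte character takes two bytes, so the limit
    // has to be applied to the encoded bytes, not to msg.Length.
    // (Assumes InSim.CodePage derives from System.Text.Encoding.)
    byte[] text = InSim.CodePage.GetBytes(msg);

    // At most 127 bytes of text, so the last byte of the packet stays zero.
    // Note: a cut at 127 can still split a double-byte character in half.
    int textLen = System.Math.Min(127, text.Length);

    byte[] packet = new byte[136];
    packet[0] = 136;                      // packet size
    packet[1] = (byte)TypePack.ISP_MTC;   // packet type
    packet[2] = 0;
    packet[3] = 0;
    packet[4] = (byte)UCID;
    packet[5] = (byte)PLID;
    packet[6] = 0;
    packet[7] = 0;
    System.Array.Copy(text, 0, packet, 8, textLen);   // text starts at offset 8
    return packet;
}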

The codepage source can be found in the attachment below.

Attached files

#4 - Bass-Driver

Mon 27 Nov 2017, 21:20

Well, after some code editing (I even used the encoding code of InsimDotNet, JUST FOR TESTING!), I still cannot get the Japanese characters working.

I'll explain what I already did.

First I set a Japanese character as my nickname.

Then I typed a command to display my nickname.
In Lapper you use the following function to send a private message to yourself or someone else.
In this case I sent a message to myself.

privmsg(GetPlayerVar($userName,"NickName"));

In the Lapper console I get the following output, within the yellow square:
The first line is the sent message, right after 'Message:'.
I don't know if the output '^J T' is correct.

Lastly, I show you the code after I used InsimDotNet's encoding code.

So if someone has an idea how to fix this, feel free to post.

Thanks in advance.

Attached images

#5 - FreeScirocco

Tue 28 Nov 2017, 00:25

Why don't you dump the content of the InSim packet from a working setup (InsimDotNet?) and from your own project with a tcpdump-like program, then compare the two to see what you are doing wrong?

#6 - expr

Tue 28 Nov 2017, 16:39

Quote from Bass-Driver: In the Lapper console I get the following output, within the yellow square:
The first line is the sent message, right after 'Message:'.
I don't know if the output '^J T' is correct.

I don't think it is. After the first two bytes, 94 and 74 (^J for Japanese), bytes 129 and 153 should follow for the encoded star; however, your message seems to have a terminator right after the first two.

LFS encodings are a messy business, a product of their time, unfortunately. For example (on the decoding side), consider the following bytes (ASCII representations below where applicable):


case 1:  5e 4a 94 5e 4c cf cf cf cf
         ^  J     ^  L
(decoded: 膿Lﾏﾏﾏﾏ)

case 2:  5e 4a cf 5e 4c cf cf cf cf 
         ^  J     ^  L
(decoded: ﾏÏÏÏÏ)

The first two bytes in each case, once again, mark that what follows is in the Japanese code page. However, only in the latter case do the bytes ^L mean a shift back to Windows-1252; in the first case the "control character" is actually the trailing byte of a multibyte character, and L is just a regular character after that (as would any other following characters be until the next change of code page). Even though the last bytes are identical in both, they stand for different characters altogether in each case: you definitely need incremental decoding here to keep track of the code pages in use. Now contrast this code page mess with the simple world of Unicode only.

Edit: speaking of difficulties; checking through my logs, I think even AIRIO managed to mangle someone's nick once.
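
The incremental decoding needed for the two cases above could be sketched like this in C#. It is a simplified illustration, not how LFS or any InSim library actually does it: it only knows the 'L' (Windows-1252) and 'J' (CP932) pages, ignores colours and escapes such as ^^, and all names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class LfsDecodeSketch
{
    // Only two of the LFS code pages, to keep the sketch short.
    static readonly Dictionary<char, int> Pages = new Dictionary<char, int>
    {
        { 'L', 1252 }, { 'J', 932 }
    };

    static LfsDecodeSketch()
    {
        // Needed on .NET Core / .NET 5+; can be dropped on .NET Framework.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // CP932 lead bytes: the next byte belongs to the same character.
    static bool IsLeadByte(byte b)
    {
        return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
    }

    public static string Decode(byte[] data)
    {
        var result = new StringBuilder();
        int page = 1252;              // start in the Latin page
        int i = 0;
        while (i < data.Length && data[i] != 0)
        {
            // "^X" switches page only when '^' is NOT a trailing byte;
            // trailing bytes are consumed together with their lead byte below.
            if (data[i] == (byte)'^' && i + 1 < data.Length
                && Pages.ContainsKey((char)data[i + 1]))
            {
                page = Pages[(char)data[i + 1]];
                i += 2;
                continue;
            }

            int len = (page == 932 && IsLeadByte(data[i]) && i + 1 < data.Length) ? 2 : 1;
            result.Append(Encoding.GetEncoding(page).GetString(data, i, len));
            i += len;
        }
        return result.ToString();
    }
}
```

Fed with the bytes of case 1, `Decode` consumes 94 5e as one character and keeps decoding in the Japanese page; fed with case 2, the 5e 4c pair is seen as a real ^L and the remaining bytes are decoded as Windows-1252.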

#7 - Bass-Driver

Sat 2 Dec 2017, 13:36

Sorry for the late response.

I have tried encoding it first to UTF-8, Unicode and Windows-1251, and none of them works.
I also dug deeper into the Lapper source code that encodes the nicknames. Same result.

I searched the web for the hexadecimal and decimal values of the "White Star". It came up with this.

//0x8199(HEX) 0x2606(DEC) #WHITE STAR

But I couldn't find out how to convert this into bytes. I might be overthinking this :S

I think '81' is one byte and '99' is the other one. But neither of them matches what I can see in my Lapper console.

At the moment I have no clue what to do.

See the latest packet code below.

I know I'm new to this and you guys might think I'm stupid. But this is a big learning process for me and I enjoy it.

Thanks in advance.

Attached images

#8 - FreeScirocco

Sat 2 Dec 2017, 17:31

OK, I'm going to take a risk here. This is part of the Airio code. Does it help?


{
    using System;
    using System.Collections;
    using System.Text;

    public static class Strings
    {
        private static Hashtable encs = new Hashtable();
        private static string pages = "LEBGCTJSKH";

        public static string CharsLFSToStandard(string lts)
        {
            if (string.IsNullOrEmpty(lts))
            {
                return "";
            }
            StringBuilder builder = new StringBuilder(lts);
            return builder.Replace("^l", "<").Replace("^r", ">").Replace("^v", "|").Replace("^s", "/").Replace("^d", @"\").Replace("^a", "*").Replace("^q", "?").Replace("^c", ":").Replace("^t", "\"").Replace("^h", "#").Replace("^^", "^").ToString();
        }

        public static string CharsStandardToLFS(string stl)
        {
            if (string.IsNullOrEmpty(stl))
            {
                return "";
            }
            StringBuilder builder = new StringBuilder(stl);
            return builder.Replace("^", "^^").Replace("<", "^l").Replace(">", "^r").Replace("|", "^v").Replace("/", "^s").Replace(@"\", "^d").Replace("*", "^a").Replace("?", "^q").Replace(":", "^c").Replace("\"", "^t").Replace("#", "^h").ToString();
        }

        public static byte[] GetBytes(string str)
        {
            if (encs.Count == 0)
            {
                InitEncodings();
            }
            byte[] destinationArray = new byte[str.Length * 4];
            byte[] sourceArray = new byte[0];
            char ch = 'L';
            Encoding encoding = (Encoding) encs[ch];
            int destinationIndex = 0;
            bool flag = false;
            int num2 = -1;
            char[] chars = str.ToCharArray();
            for (int i = 0; i < chars.Length; i++)
            {
                if (((chars[i] == '^') && (i < (chars.Length - 1))) && (chars[i + 1] == '8'))
                {
                    destinationArray[destinationIndex++] = (byte) chars[i++];
                    destinationArray[destinationIndex++] = (byte) chars[i];
                    ch = 'L';
                    encoding = (Encoding) encs[ch];
                }
                else
                {
                    try
                    {
                        sourceArray = encoding.GetBytes(chars, i, 1);
                    }
                    catch (EncoderFallbackException exception)
                    {
                        if (++num2 < pages.Length)
                        {
                            ch = pages[num2];
                            encoding = (Encoding) encs[ch];
                            flag = true;
                            i--;
                            goto Label_0169;
                        }
                        Helpers.OnStaticMessage("WARNING : Unknown character to encode - " + exception.CharUnknown);
                        destinationArray[destinationIndex++] = (byte) chars[i];
                    }
                    if (flag)
                    {
                        destinationArray[destinationIndex++] = 0x5e;
                        destinationArray[destinationIndex++] = (byte) ch;
                        flag = false;
                    }
                    Array.Copy(sourceArray, 0, destinationArray, destinationIndex, sourceArray.Length);
                    destinationIndex += sourceArray.Length;
                    num2 = -1;
                Label_0169:;
                }
            }
            Array.Resize<byte>(ref destinationArray, destinationIndex);
            return destinationArray;
        }

        public static byte[] GetRawBytes(string str)
        {
            if (encs.Count == 0)
            {
                InitEncodings();
            }
            byte[] array = new byte[str.Length * 4];
            int newSize = 0;
            for (int i = 0; i < str.Length; i++)
            {
                array[newSize++] = (byte) str[i];
            }
            Array.Resize<byte>(ref array, newSize);
            return array;
        }

        public static string GetRawString(byte[] pack)
        {
            if (encs.Count == 0)
            {
                InitEncodings();
            }
            StringBuilder builder = new StringBuilder("");
            for (int i = 0; i < pack.Length; i++)
            {
                if (pack[i] == 0)
                {
                    break;
                }
                builder.Append((char) pack[i]);
            }
            return builder.ToString();
        }

        public static string GetString(byte[] pack)
        {
            if (encs.Count == 0)
            {
                InitEncodings();
            }
            StringBuilder builder = new StringBuilder("");
            Encoding encoding = (Encoding) encs['E'];
            Encoding encoding2 = (Encoding) encs['L'];
            int index = 0;
            bool flag = false;
            for (int i = 0; i < pack.Length; i++)
            {
                if ((encoding != encoding2) || flag)
                {
                    index = i;
                    encoding = encoding2;
                    flag = false;
                }
                if ((pack[i] == 0x5e) && (i < (pack.Length - 1)))
                {
                    if (encs.ContainsKey((char) pack[i + 1]))
                    {
                        encoding2 = (Encoding) encs[(char) pack[i + 1]];
                        flag = true;
                    }
                    if (pack[i + 1] == 0x38)
                    {
                        encoding2 = (Encoding) encs['L'];
                    }
                    if (pack[i + 1] == 0x5e)
                    {
                        i++;
                    }
                }
                if (((encoding != encoding2) || (pack[i] == 0)) || ((i == (pack.Length - 1)) || flag))
                {
                    try
                    {
                        if ((encoding != encoding2) || flag)
                        {
                            builder.Append(encoding.GetChars(pack, index, i++ - index));
                        }
                        else
                        {
                            if (pack[i] == 0)
                            {
                                builder.Append(encoding2.GetChars(pack, index, i - index));
                                break;
                            }
                            builder.Append(encoding2.GetChars(pack, index, pack.Length - index));
                        }
                    }
                    catch (DecoderFallbackException exception)
                    {
                        Helpers.OnStaticMessage(string.Concat(new object[] { "WARNING : Unknown bytes to decode - ", exception.BytesUnknown[0], (exception.BytesUnknown.Length > 1) ? (" " + exception.BytesUnknown[1]) : "", (exception.BytesUnknown.Length > 2) ? (" " + exception.BytesUnknown[2]) : "", (exception.BytesUnknown.Length > 3) ? (" " + exception.BytesUnknown[3]) : "" }));
                        builder.Append('?');
                        index = (index + exception.Index) + exception.BytesUnknown.Length;
                        i = index - 1;
                    }
                }
            }
            return builder.ToString();
        }

        private static void InitEncodings()
        {
            encs.Add('L', Encoding.GetEncoding(0x4e4, new EncoderExceptionFallback(), new DecoderExceptionFallback()));
            encs.Add('E', Encoding.GetEncoding(0x4e2, new EncoderExceptionFallback(), new DecoderExceptionFallback()));
            try
            {
                encs.Add('B', Encoding.GetEncoding(0x6fbb, new EncoderExceptionFallback(), new DecoderExceptionFallback()));
            }
            catch
            {
                encs.Add('B', Encoding.GetEncoding(0x4e9, new EncoderExceptionFallback(), new DecoderExceptionFallback()));
                Helpers.OnStaticMessage("INFO : Using CP01257 instead of CP28603");
            }
            encs.Add('G', Encoding.GetEncoding(0x6fb5, new EncoderExceptionFallback(), new DecoderExceptionFallback()));
            encs.Add('C', Encoding.GetEncoding(0x4e3, new EncoderExceptionFallback(), new DecoderExceptionFallback()));
            encs.Add('T', Encoding.GetEncoding(0x6fb7, new EncoderExceptionFallback(), new DecoderExceptionFallback()));
            encs.Add('J', Encoding.GetEncoding(0x3a4, new EncoderExceptionFallback(), new DecoderExceptionFallback()));
            encs.Add('S', Encoding.GetEncoding(0x3a8, new EncoderExceptionFallback(), new DecoderExceptionFallback()));
            encs.Add('K', Encoding.GetEncoding(0x3b5, new EncoderExceptionFallback(), new DecoderExceptionFallback()));
            encs.Add('H', Encoding.GetEncoding(950, new EncoderExceptionFallback(), new DecoderExceptionFallback()));
        }

        public static string RawToEncoded(string url)
        {
            StringBuilder builder = new StringBuilder("");
            for (int i = 0; i < url.Length; i++)
            {
                if (((byte) url[i]) > 0x7f)
                {
                    builder.Append('%' + ((byte) url[i]).ToString("X"));
                }
                else
                {
                    char ch = url[i];
                    builder.Append(Uri.EscapeDataString(ch.ToString()));
                }
            }
            return builder.ToString();
        }

        public static string RawToUTF(string raw)
        {
            return GetString(GetRawBytes(raw)).Trim();
        }

        public static string RemoveColor(string ncl)
        {
            if (string.IsNullOrEmpty(ncl))
            {
                return "";
            }
            StringBuilder builder = new StringBuilder(ncl);
            return builder.Replace("^0", "").Replace("^1", "").Replace("^2", "").Replace("^3", "").Replace("^4", "").Replace("^5", "").Replace("^6", "").Replace("^7", "").Replace("^8", "").Replace("^9", "").ToString();
        }

        public static string UTFToRaw(string utf)
        {
            return GetRawString(GetBytes(utf.Trim()));
        }
    }
}

#9 - expr

Sat 2 Dec 2017, 23:28

Quote from Bass-Driver: I searched the web for the hexadecimal and decimal values of the "White Star". It came up with this.

//0x8199(HEX) 0x2606(DEC) #WHITE STAR

But I couldn't find out how to convert this into bytes. I might be overthinking this :S

I think '81' is one byte and '99' is the other one. But neither of them matches what I can see in my Lapper console.

Actually, you already have the latter there, but you're not seeing it because you print the byte values in decimal, whereas your search results have them in hexadecimal (0x99 = 153). Your message is mangled for some reason, though.

LFS encoded message for the white star would be as follows (4 bytes):


0x5e 0x4a 0x81 0x99

The first two bytes (in ASCII, "^J") once again tell that the following sequence (in this case the remaining 2 bytes) is in the Japanese code page. (I had a typo in my earlier post regarding the byte value of 'J'.)
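
To make the hexadecimal/decimal mix-up easier to spot, here is a small sketch that prints the same bytes both ways. The helper names are hypothetical. Note that in the search result quoted above both values are hexadecimal: 0x8199 is the CP932 byte pair, and 0x2606 is the Unicode code point (U+2606) of the same character.

```csharp
using System;
using System.Linq;

public static class ByteDumpSketch
{
    // Two-digit lowercase hex, as in the mapping tables found on the web.
    public static string ToHex(byte[] bytes)
    {
        return string.Join(" ", bytes.Select(b => b.ToString("x2")));
    }

    // Plain decimal, which is what a console shows when bytes are printed as numbers.
    public static string ToDecimal(byte[] bytes)
    {
        return string.Join(" ", bytes.Select(b => b.ToString()));
    }
}
```

For the four-byte message above, `ToHex` gives "5e 4a 81 99" and `ToDecimal` gives "94 74 129 153", so a console line containing 129 and 153 already holds the right star bytes.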

#10 - Bass-Driver

Sun 3 Dec 2017, 12:32

OK, thanks.
But the weird thing is that I get those "weird bytes" before the text is encoded.

So the problem may be deeper, if I'm right.
So much fun.

So I think it goes wrong in one of these places:
getting the data from the Lapper scripts,
saving the nickname of a new connection,
or creating the packet.



// Plain text from the source code
SendMsgToConnection(currInfoPlayer.UCID, "Blah Blah");

// Player info that was saved earlier
SendMsgToConnection(currInfoPlayer.UCID, currInfoPlayer.nickName);

void SendMsgToConnection(int UCID, string msg)
{
    // Skip empty messages and invalid connection ids
    if (msg.Length > 0 && UCID != -1)
    {
        byte[] outMsg = myEncoder.MTC(UCID, 0, msg);
        insimConnection.Send(outMsg, outMsg.Length);
    }
}

What I want to do now is a quick test in the source code. But I need to know how to add a 'White Star' as text, because I cannot work with bytes there; it must look like normal text.

Thanks

#11 - K0Z3L_43V3R

Sun 3 Dec 2017, 14:35

I can't tell from the screenshots: are you calculating the packet length correctly and adding the trailing zero bytes?

From docs:
byte Size = 8 + TEXT_SIZE (TEXT_SIZE = 4, 8, 12... 128)
char Text[TEXT_SIZE]; // up to 128 characters of text - last byte must be zero

So the minimal packet data (16 bytes) will look like this (^J☆) in hex:
10 0e 00 00 ff 00 00 00 5e 4a 81 99 00 00 00 00

Let computers do the hard work with bytes.


Encoding codepage = Encoding.GetEncoding(932);
Encoding unicode = Encoding.Unicode;

byte[] unicodeBytes = Encoding.Unicode.GetBytes("^J☆");
byte[] codepageBytes = Encoding.Convert(unicode, codepage, unicodeBytes);

#12 - Bass-Driver

Tue 5 Dec 2017, 18:11

Well, after some tests, it must be a problem between the scripts and the Lapper source code.

I'm still using the packet code from post #7 of this topic.

=====================================================
TEST 01:
=====================================================
I created a new function and sent an MTC packet with "^J☆" as text, and that works as it should.
See the code below.

In the Lapper script I use 'testfunction();' as the command.



public void TestFunction(GLScript.unionVal val, ArrayList args)
{
    infoPlayer currInfoPlayer = newCfg.getCurrInfoPlayer();
    string ident = val.nameFunction;
    string text = "^J☆";

    SendMsgToConnection(currInfoPlayer.UCID, text);
    //SendMsg(args[0].ToString());
}

void SendMsgToConnection(int UCID, string msg)
{
    if (msg.Length > 0)
    {
        if (UCID != -1)
        {
            byte[] outMsg = myEncoder.MTC(UCID, 0, msg);
            insimConnection.Send(outMsg, outMsg.Length);
        }
    }
}

=====================================================
TEST 02:
=====================================================

On to the next test, because the problem must be somewhere else.

When I copy/paste a 'white star' into the script, I get a weird "HOP" text. And when sending the message, I receive a square and a question mark. The script is converted to UTF-8.

In the source code I use the following line as the string:
string text = args[0].ToString();

=====================================================
TEST 03:
=====================================================

In the script I used a parameter to be sent to the test function.

In game I type: !tf ☆
testfunction($argv);

$argv = ☆

But now it doesn't work at all, like in the beginning.

In the console it looks like this:

So, could it be a conversion problem between the UTF-8 script and the source code (where can I find this)?
Or should I convert the script to another encoding?
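
One way to test that hypothesis, sketched under the assumption that the script is stored as UTF-8 but read with a single-byte code page somewhere in Lapper (which this thread does not confirm): the helper below, with a hypothetical name, shows what the star turns into in that situation.

```csharp
using System.Text;

public static class MojibakeSketch
{
    static MojibakeSketch()
    {
        // Needed on .NET Core / .NET 5+; can be dropped on .NET Framework.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Decodes the UTF-8 bytes of 'text' as if they were Windows-1252,
    // which is what happens when a UTF-8 file is read with the wrong encoding.
    public static string ReadUtf8AsWindows1252(string text)
    {
        byte[] utf8 = Encoding.UTF8.GetBytes(text);
        return Encoding.GetEncoding(1252).GetString(utf8);
    }
}
```

A single "\u2606" comes back as three unrelated characters, because its UTF-8 form is three bytes long. If the string that reaches the test function looks like that, the script is probably being read with the wrong encoding; reading it with an explicit encoding, for example `File.ReadAllText(path, Encoding.UTF8)`, would be the thing to try.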

Attached images