Shocker

2-byte characters may cause a crash in GL.ShaderSource

Project: The Open Toolkit library
Version: 0.9.9-3
Component: Code
Category: bug report
Priority: normal
Assigned: Unassigned
Status: confirmed
Description

When I called GL.ShaderSource with shader code that includes a 2-byte character, the program crashed.
The error message is below.

System.Runtime.InteropServices.COMException (0x8007007A)
The data area passed to a system call is too small.

I did some experiments.
- I wrote some shaders that include 2-byte characters in the shader GUIs "RenderMonkey" and "Shader Maker". There was no problem.
- I added a comment containing 2-byte characters to the shader in OpenTK.Examples - OpenGL - 3.0 - HelloGL3.cs (0.9.9-3). The crash appeared.


Comments

the Fiddler

#1

Thanks for the bug report.

Could you please attach a short test case that reproduces the issue? I don't have an IME-enabled OS and I haven't been able to reproduce this.

My guess is that the shader is being corrupted somewhere during the process of loading from disk, marshaling it through .Net and passing it to OpenGL. This could be something as "simple" as an encoding issue (encoding issues are never simple!)

Shocker

#2

OK.
I attached a program based on HelloGL3.cs.
Its "vertexShaderSource" includes a 2-byte character at line 3.

....
        string vertexShaderSource = @"
#version 130
 
precision highp float;//2バイト文字 2byte Character
 
uniform mat4 projection_matrix;
uniform mat4 modelview_matrix;
 
in vec3 in_position;
in vec3 in_normal;
....
Attachment: HelloGL3.cs (8.01 KB)
the Fiddler

#3

Status: open » confirmed

Thanks, using your code I've managed to reproduce this. I had to change my "language for non-Unicode programs" to Japanese for the issue to appear.

Searching for solutions.

the Fiddler

#4

Unfortunately, the only solution as far as I can see is to replace the .Net string marshaling code with our own. ShaderSource takes an array of strings and the framework doesn't allow us to alter the marshaling behavior in this case (it always treats them as an array of LPStr, that is "A pointer to a null-terminated array of ANSI characters.")

The manual marshaling solution would look like this:

  • change signatures of all string-receiving methods to byte*
  • provide string and string[] overloads that match the current signatures
  • use System.Text.Encoding inside those overloads to convert the strings into byte arrays with the correct encoding
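
The conversion step in the last bullet could be sketched roughly as follows. This is only an illustration, not OpenTK code: the class name ShaderStringMarshaling and the helper ToNullTerminatedBytes are hypothetical, and which encoding counts as "correct" for a given driver is left to the caller.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of OpenTK: shows how string and string[]
// overloads could turn managed strings into null-terminated byte arrays
// before handing them to a byte*-based entry point.
public static class ShaderStringMarshaling
{
    // Encodes a single string and appends a terminating zero byte.
    public static byte[] ToNullTerminatedBytes(string s, Encoding encoding)
    {
        // The extra element is left at 0 and acts as the terminator.
        byte[] bytes = new byte[encoding.GetByteCount(s) + 1];
        encoding.GetBytes(s, 0, s.Length, bytes, 0);
        return bytes;
    }

    // Encodes every string in an array, as a string[] overload would need.
    public static byte[][] ToNullTerminatedBytes(string[] sources, Encoding encoding)
    {
        byte[][] result = new byte[sources.Length][];
        for (int i = 0; i < sources.Length; i++)
        {
            result[i] = ToNullTerminatedBytes(sources[i], encoding);
        }
        return result;
    }
}
```

The resulting arrays would still have to be pinned (for example with fixed or GCHandle) before a pointer to them is passed to native code.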

This is too complex a change to implement inside the 1.0 timeframe but it can be implemented in time for 1.1. As a temporary workaround, you can use this code to remove multibyte characters from shaders:

        /// <summary>
        /// Converts the specified string to ASCII. Characters that cannot be represented in ASCII
        /// are converted to '?'.
        /// </summary>
        /// <param name="s">The string to convert.</param>
        /// <returns>A new string instance, containing only ASCII characters.</returns>
        public static string ConvertToAscii(string s)
        {
            // Requires "using System.Text;" at the top of the file.
            return Encoding.ASCII.GetString(Encoding.ASCII.GetBytes(s));
        }

This is a destructive change because you cannot get the original string back from the converted one. However, it allows you to use multibyte characters in comments - but not as variable names!
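
To make the effect concrete, here is a small sketch that applies the same conversion to the comment line from the attached shader. The class name AsciiWorkaroundDemo and the Example method are made up for illustration; the Japanese characters are written as \u escapes so the source file itself stays ASCII.

```csharp
using System;
using System.Text;

// Illustrative only: the class and the Example method are hypothetical.
public static class AsciiWorkaroundDemo
{
    // Same conversion as the workaround: non-ASCII characters become '?'.
    public static string ConvertToAscii(string s)
    {
        return new string(Encoding.ASCII.GetChars(Encoding.ASCII.GetBytes(s)));
    }

    // Converts the shader line that carries the 2-byte comment.
    public static string Example()
    {
        string line = "precision highp float;//2\u30D0\u30A4\u30C8\u6587\u5B57 2byte Character";
        // Result: "precision highp float;//2????? 2byte Character"
        return ConvertToAscii(line);
    }
}
```

The statement before the comment is untouched, so the shader still compiles; only the five Japanese characters are lost.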

Edit: fixed html tags.

Shocker

#5

Thank you for all your work.
I'm going to use the workaround.

I'm looking forward to a corrected version of OpenTK.

ebnf

#6

Even if you replaced the marshaling code you'd still hit this issue, as the OpenGL string format isn't defined to support other languages, just an ANSI GLchar* pointing to a null-terminated string.

I've seen some issues in older nVidia drivers where newlines were required to be the platform's newline characters, and if they weren't you'd get some strange tokenization errors.

As a rule, all OpenGL input should be in the server's preferred format.

So on x86 Windows it's little endian (LE), ANSI null terminated (cstr) with CR+LF (U+000D, U+000A) for newlines
On x86 Linux it's LE, cstr, LF (U+000A)
On x86 Mac it's LE, cstr, LF
On PPC Mac it's BE, cstr, LF on Mac OS X (CR on classic Mac OS)
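
If a driver really is picky about newlines, one possible precaution is to normalise them before passing the source on. The following is a minimal sketch under that assumption; the class ShaderNewlines and the method NormalizeNewlines are hypothetical names, not an OpenTK API.

```csharp
using System;

// Hypothetical helper, not part of OpenTK.
public static class ShaderNewlines
{
    // Rewrites CR+LF, lone CR and lone LF to the requested newline sequence.
    public static string NormalizeNewlines(string source, string newline)
    {
        // First collapse everything to LF, then expand to the target.
        string lf = source.Replace("\r\n", "\n").Replace("\r", "\n");
        return newline == "\n" ? lf : lf.Replace("\n", newline);
    }

    // Convenience overload that targets the current platform's newline.
    public static string NormalizeNewlines(string source)
    {
        return NormalizeNewlines(source, Environment.NewLine);
    }
}
```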

the Fiddler

#7

Makes sense. Then again, why does RenderMonkey work when OpenTK crashes? Something else seems to be at play here.