Shocker

2-byte characters may cause a bug in GL.ShaderSource

Project: The Open Toolkit library
Version: 1.1.0-stable
Component: Code
Category: bug report
Priority: normal
Assigned: Unassigned
Status: closed
Description

When I called GL.ShaderSource with shader code containing 2-byte characters, the program crashed.
The error message is below.

System.Runtime.InteropServices.COMException (0x8007007A)
The data area passed to a system call is too small.

I ran some experiments:
- Using two shader GUIs, "RenderMonkey" and "Shader Maker", I wrote some shaders containing 2-byte characters. There was no problem.
- I added a comment with 2-byte characters to the shader in OpenTK.Examples > OpenGL > 3.0 > HelloGL3.cs (0.9.9-3). The crash appeared.


Comments

the Fiddler

#1

Thanks for the bug report.

Could you please attach a short test case that reproduces the issue? I don't have an IME-enabled OS and I haven't been able to reproduce this.

My guess is that the shader is being corrupted somewhere between loading it from disk, marshaling it through .Net and passing it to OpenGL. This could be something as "simple" as an encoding issue (encoding issues are never simple!).

Shocker

#2

OK.
I have attached a program based on HelloGL3.cs.
Its "vertexShaderSource" string contains 2-byte characters on line 3.

....
        string vertexShaderSource = @"
#version 130
 
precision highp float;//2バイト文字 2byte Character
 
uniform mat4 projection_matrix;
uniform mat4 modelview_matrix;
 
in vec3 in_position;
in vec3 in_normal;
....
Attachment: HelloGL3.cs (8.01 KB)
the Fiddler

#3

Status: open » confirmed

Thanks. Using your code, I've managed to reproduce this. I had to change my "language for non-Unicode programs" to Japanese for the issue to appear.

Searching for solutions.

the Fiddler

#4

Unfortunately, the only solution as far as I can see is to replace the .Net string marshaling code with our own. ShaderSource takes an array of strings and the framework doesn't allow us to alter the marshaling behavior in this case (it always treats them as an array of LPStr, that is "A pointer to a null-terminated array of ANSI characters.")

The manual marshaling solution would look like this:

  • change signatures of all string-receiving methods to byte*
  • provide string and string[] overloads that match the current signatures
  • use System.Text.Encoding inside those overloads to convert the strings into byte arrays with the correct encoding
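The three steps above can be sketched roughly as follows, assuming UTF-8 as the target encoding. `ShaderSourceMarshaler`, `Encode` and `WithPinned` are hypothetical names, not OpenTK API, and the native `glShaderSource` call itself is left to the caller: it receives one pointer per string (the `const GLchar**` argument) and the matching byte lengths.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

// Hypothetical helper, not part of OpenTK: encodes managed strings so they can be
// passed to glShaderSource as an array of byte pointers plus byte lengths.
static class ShaderSourceMarshaler
{
    // Encodes each string as null-terminated UTF-8 (assumed target encoding).
    // lengths[i] is the byte count without the terminator, i.e. what the
    // length array of glShaderSource should contain.
    public static byte[][] Encode(string[] sources, out int[] lengths)
    {
        var buffers = new byte[sources.Length][];
        lengths = new int[sources.Length];
        for (int i = 0; i < sources.Length; i++)
        {
            int byteCount = Encoding.UTF8.GetByteCount(sources[i]);
            var buffer = new byte[byteCount + 1]; // last byte stays 0 (terminator)
            Encoding.UTF8.GetBytes(sources[i], 0, sources[i].Length, buffer, 0);
            buffers[i] = buffer;
            lengths[i] = byteCount;
        }
        return buffers;
    }

    // Pins the encoded buffers while 'call' runs, handing it one pointer per
    // string and the byte lengths; everything is unpinned afterwards.
    public static void WithPinned(string[] sources, Action<IntPtr[], int[]> call)
    {
        byte[][] buffers = Encode(sources, out int[] lengths);
        var handles = new GCHandle[buffers.Length];
        var pointers = new IntPtr[buffers.Length];
        try
        {
            for (int i = 0; i < buffers.Length; i++)
            {
                handles[i] = GCHandle.Alloc(buffers[i], GCHandleType.Pinned);
                pointers[i] = handles[i].AddrOfPinnedObject();
            }
            call(pointers, lengths);
        }
        finally
        {
            for (int i = 0; i < handles.Length; i++)
            {
                if (handles[i].IsAllocated) handles[i].Free();
            }
        }
    }
}
```

A binding would forward `pointers` and `lengths` to the native entry point inside the callback. Allocating a fresh array per string keeps the code simple at the cost of some garbage on every call.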

This is too complex a change to implement within the 1.0 timeframe, but it can be implemented in time for 1.1. As a temporary workaround, you can use this code to remove multibyte characters from shaders:

        // Needs 'using System.Text;' at the top of the file.

        /// <summary>
        /// Converts the specified string to ASCII. Characters that cannot be represented in ASCII
        /// are converted to '?'.
        /// </summary>
        /// <param name="s">The string to convert.</param>
        /// <returns>A new string instance, containing only ASCII characters.</returns>
        public static string ConvertToAscii(string s)
        {
            // Encoding.ASCII substitutes '?' for every character outside the 7-bit range,
            // so the result marshals to exactly one byte per char.
            return Encoding.ASCII.GetString(Encoding.ASCII.GetBytes(s));
        }

This is a destructive change because you cannot get the original string back from the converted one. However, it allows you to use multibyte characters in comments - but not as variable names!
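For a quick check that the workaround does what it claims, here is a self-contained copy (the class name `AsciiWorkaround` and the `IsPureAscii` helper are hypothetical additions). Once every non-ASCII character is a '?', each char marshals to exactly one byte in any ANSI code page, so the string length and the buffer length agree again.

```csharp
using System;
using System.Text;

// Self-contained copy of the ConvertToAscii workaround, plus a check that a
// string can safely be measured in chars (AsciiWorkaround is hypothetical).
static class AsciiWorkaround
{
    // Encoding.ASCII replaces every character it cannot encode with '?'.
    public static string ConvertToAscii(string s)
    {
        return Encoding.ASCII.GetString(Encoding.ASCII.GetBytes(s));
    }

    // True when every char is 7-bit ASCII, i.e. one byte per char in any ANSI
    // code page, so string.Length equals the marshaled byte length.
    public static bool IsPureAscii(string s)
    {
        foreach (char c in s)
        {
            if (c > '\u007F') return false;
        }
        return true;
    }
}
```

For example, `ConvertToAscii("precision highp float;//2バイト文字")` returns `"precision highp float;//2?????"`: one '?' per Japanese character.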


Shocker

#5

Thank you for all your work.
I'll use the workaround for now.

I'm looking forward to a fixed version of OpenTK.

ebnf

#6

Even if you replaced the marshaling code, you'd still hit this issue, as the OpenGL string format isn't defined to support other languages: glShaderSource just takes ANSI GLchar* pointers to null-terminated strings.

I've seen issues in older nVidia drivers where the newline characters had to be the platform's newline characters; if they weren't, you'd get strange tokenization errors.

As a rule, all OpenGL input should be in the server's preferred format.

So on x86 Windows it's little endian (LE), ANSI null-terminated (cstr), with CR+LF (U+000D, U+000A) for newlines
On x86 Linux it's LE, cstr, LF (U+000A)
On x86 Mac it's LE, cstr, LF
On PPC Mac it's BE, cstr, LF (classic Mac OS used CR)
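If a driver really does insist on one newline convention, as ebnf describes for older nVidia drivers, line endings can be normalized before the source reaches GL.ShaderSource. A minimal sketch; `NewlineNormalizer` is a hypothetical helper, and which sequence to target (for example `Environment.NewLine`) is an assumption about the driver.

```csharp
using System;

// Hypothetical helper: rewrites CR+LF, lone CR and lone LF to one chosen
// newline sequence before the source is handed to GL.ShaderSource.
static class NewlineNormalizer
{
    public static string Normalize(string source, string newline)
    {
        // Collapse everything to LF first so CR+LF does not become two newlines.
        string lf = source.Replace("\r\n", "\n").Replace('\r', '\n');
        return newline == "\n" ? lf : lf.Replace("\n", newline);
    }
}
```

Calling `NewlineNormalizer.Normalize(shaderSource, Environment.NewLine)` would give the host platform's convention.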

the Fiddler

#7

Makes sense. Then again, why does RenderMonkey work when OpenTK crashes? Something else seems to be at play here.

ganaware

#8

Version: 0.9.9-3 » 1.1.0-2013-12-15

Hi,

I encountered a similar problem yesterday.
The workaround in comment #4 still worked.
Thanks.

The Problem

(How do I attach a file?)
I used source similar to HelloGL3.cs from #2.
It had a Japanese comment in the vertexShaderSource string.
I also added GL.GetShaderInfoLog(...) after GL.CompileShader(vertexShaderHandle);

The log said: (0) : error C0000: syntax error, unexpected $end, expecting ',' or ')' at token "<EOF>"

Why EOF?

OpenTK passes vertexShaderSource.Length and a Marshal.StringToHGlobalAnsi()'ed copy of vertexShaderSource to glShaderSource() (is that right?).
But each character of the Japanese comment becomes 2 bytes after marshaling (with the Japanese ANSI code page 932),
so the buffer produced by Marshal.StringToHGlobalAnsi() is 5 bytes longer than vertexShaderSource.Length.
As a result, the last 5 characters are truncated, which produces the error above.
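The arithmetic can be reproduced without OpenGL. This sketch assumes UTF-8 as the byte encoding, since it is always available in .NET; with code page 932 each Japanese character takes 2 bytes instead of 3, so the overshoot is smaller but the effect is the same. `TruncationDemo` and `ReadAsDriver` are hypothetical names standing in for a consumer that trusts the length it is given.

```csharp
using System;
using System.Text;

static class TruncationDemo
{
    // Decodes only the first 'lengthPassed' bytes, as a driver that trusts the
    // length argument would.
    public static string ReadAsDriver(byte[] buffer, int lengthPassed)
    {
        return Encoding.UTF8.GetString(buffer, 0, lengthPassed);
    }

    public static void Run()
    {
        // "//2バイト文字" (written as escapes) followed by a tiny shader body.
        string source = "//2\u30D0\u30A4\u30C8\u6587\u5B57\nvoid main() { }";
        byte[] bytes = Encoding.UTF8.GetBytes(source);

        int charCount = source.Length; // 24 UTF-16 code units
        int byteCount = bytes.Length;  // 34 bytes: the 5 Japanese characters take 3 bytes each

        // Passing the char count drops the last 10 bytes, so "main() { }" never arrives.
        Console.WriteLine(ReadAsDriver(bytes, charCount));
        // Passing the byte count delivers the whole source (prints True).
        Console.WriteLine(ReadAsDriver(bytes, byteCount) == source);
    }
}
```

Passing the byte count (`bytes.Length`) instead of `string.Length` is what removes the truncation.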

Encoding

In §3.1 of the OpenGL® Shading Language 4.40 specification:
"The source character set used for the OpenGL shading languages is Unicode in the UTF-8 encoding scheme."
So I think UTF-8 strings, not ANSI strings, should be passed to glShaderSource().

the Fiddler

#9

Khronos, in their infinite wisdom, has decided to change the semantics of ShaderSource to accept UTF8 characters in comments (not in code). This appears to have happened in OpenGL 4.2.

Sigh.
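If non-ASCII characters are only acceptable inside comments, a shader can be screened before upload. A minimal sketch; `GlslSourceCheck` is hypothetical, and it only understands `//` and `/* */` comments (no backslash line continuations, no `#if 0` blocks).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: lists the UTF-16 indices of non-ASCII characters that
// appear outside // and /* */ comments in a GLSL source string.
static class GlslSourceCheck
{
    public static List<int> FindNonAsciiOutsideComments(string source)
    {
        var hits = new List<int>();
        int i = 0;
        while (i < source.Length)
        {
            bool slashWithNext = source[i] == '/' && i + 1 < source.Length;
            if (slashWithNext && source[i + 1] == '/')
            {
                // Line comment: skip up to (not including) the newline.
                while (i < source.Length && source[i] != '\n') i++;
            }
            else if (slashWithNext && source[i + 1] == '*')
            {
                // Block comment: jump past the closing */, or to the end if unterminated.
                int end = source.IndexOf("*/", i + 2, StringComparison.Ordinal);
                i = end < 0 ? source.Length : end + 2;
            }
            else
            {
                if (source[i] > '\u007F') hits.Add(i); // non-ASCII in code
                i++;
            }
        }
        return hits;
    }
}
```

An empty result means the ConvertToAscii workaround from #4 would only ever touch comments.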

The good news is that OpenTK 1.1-beta3 switched to manual marshaling of strings, so we actually have a chance of fixing that. The bad news is that there's no way to marshal UTF8 strings without allocating memory.

The relevant code is here: https://github.com/opentk/opentk/blob/develop/Source/OpenTK/BindingsBase...
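One way to keep that unavoidable allocation cheap, offered as an assumption rather than a description of the linked OpenTK code, is to rent the buffers from `System.Buffers.ArrayPool<byte>` (built into .NET Core; a NuGet package on .NET Framework). `PooledUtf8.WithUtf8` is a hypothetical helper.

```csharp
using System;
using System.Buffers;
using System.Text;

// Hypothetical helper, not OpenTK's implementation: encodes a string as
// null-terminated UTF-8 into a pooled buffer so repeated calls reuse arrays.
static class PooledUtf8
{
    public static void WithUtf8(string s, Action<byte[], int> use)
    {
        int byteCount = Encoding.UTF8.GetByteCount(s);
        // Rent may return an array larger than requested.
        byte[] buffer = ArrayPool<byte>.Shared.Rent(byteCount + 1);
        try
        {
            Encoding.UTF8.GetBytes(s, 0, s.Length, buffer, 0);
            buffer[byteCount] = 0; // rented arrays are not cleared, so terminate explicitly
            use(buffer, byteCount);
        }
        finally
        {
            ArrayPool<byte>.Shared.Return(buffer);
        }
    }
}
```

A `byte[]` passed to a P/Invoke parameter is pinned for the duration of the call, so the single-string case can hand `buffer` and `byteCount` straight to the native function; the array must not be kept after `use` returns, because it goes back to the pool.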

Edit: can you please upload the shader file at https://github.com/opentk/opentk/issues/18? Thanks

the Fiddler

#10

Status: confirmed » in progress (review)

Potential fix here: https://github.com/opentk/opentk/commit/94b04c02ca14f8e6611e895163c09fd8...

Needs testing.