Inertia

PQ Torus Knots

I had the idea for this when Fiddler mentioned triangle strips in connection with vertex caches. For procedurally generated geometry, a triangle strip has some useful properties for examining the vertex cache's behaviour, which is quite difficult to do with indexed triangle lists.
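A small simulation makes the idea concrete. The Python sketch below is not taken from the app. It assumes a hypothetical layout in which each pair of neighbouring rings is joined by one strip segment that alternates between the two rings (a ring's last vertex is taken to duplicate its first, as a texture seam would), and a plain FIFO post-transform cache that keeps its contents between segments. `strip_indices` and `fifo_misses` are illustrative names.

```python
from collections import deque


def strip_indices(verts_per_ring, rings):
    """Index order of a hypothetical ring-to-ring strip: one segment per ring pair,
    alternating between the current ring and the next one."""
    order = []
    for ring in range(rings - 1):
        for k in range(verts_per_ring):
            order.append(ring * verts_per_ring + k)        # vertex k on the current ring
            order.append((ring + 1) * verts_per_ring + k)  # vertex k on the next ring
    return order


def fifo_misses(verts_per_ring, cache_size, rings):
    """Count cache misses (vertices that must be processed again)
    for a FIFO cache that does not refresh entries on a hit."""
    queue = deque()
    resident = set()
    misses = 0
    for index in strip_indices(verts_per_ring, rings):
        if index in resident:
            continue  # cache hit: the transformed vertex is reused
        misses += 1
        queue.append(index)
        resident.add(index)
        if len(queue) > cache_size:
            resident.discard(queue.popleft())  # evict the oldest entry
    return misses


if __name__ == "__main__":
    rings = 100
    for n in range(8, 17):
        per_vertex = fifo_misses(n, 24, rings) / (n * rings)
        print(f"{n:2d} verts per ring: {per_vertex:.2f} misses per vertex")
```

With a 24-entry cache this prints 1.00 misses per vertex for 8 to 12 verts per ring and 1.98 from 13 on: as soon as the two rings of a segment no longer fit, every vertex is processed twice, which is the kind of step the profiling looks for.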

Because of the way the torus knot is built, it can be used for profiling to help identify the size of the graphics card's vertex cache. To my knowledge there is no GL variable one can query to retrieve the cache size (there is, however, a DX tool: http://www.clootie.ru/delphi/dxtools.html). This functionality could be handy for a setup/config tool accompanying a main application, which could examine the graphics card (and maybe convert all meshes to the ideal cache layout for the current client).
Possible vertex cache sizes between 8 and 50 are examined, and the results are stored in the Logs/ folder (which the app creates if it is missing). The app makes no changes to the system outside this folder.

This program does nothing but generate a torus knot from the given parameters and offer 3 modes (Interactive, Turntable and Profiling) to put the mesh to some use. It was written from scratch in 2 days, in the process of going through the OpenTK examples and experimenting with some stuff.

After some alpha testing, the profiling now produces rather useful results. However, some graphics cards produce weird results, which could be related to multiple parallel vertex processing units inserting vertices into the cache.

Measurements like the following are also problematic. The examined graphics card has a cache of 24, and 12 verts per ring would be perfect with the given tri-strip layout.

11 Vertices per Ring. 3,796ms averaged per draw.
12 Vertices per Ring. 3,821ms averaged per draw.
13 Vertices per Ring. 6,515ms averaged per draw.
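Reading a cache size off numbers like these can be automated. The following Python sketch is a hypothetical helper, not part of the app: it parses lines in the log format quoted in this thread, finds the largest relative slowdown between consecutive ring sizes, and doubles the last "cheap" ring size, since each strip segment references the vertices of two rings. It assumes a decimal comma or point and no thousands separators.

```python
import re

# Matches both "12 Vertices per Ring. 3,821ms" and "12 Verts per Ring. 1,713ms".
LOG_LINE = re.compile(r"(\d+) Vert(?:ice)?s per Ring\. ([\d.,]+)ms")


def parse_log(text):
    """Map verts-per-ring -> averaged ms per draw (decimal comma or point,
    no thousands separators assumed)."""
    timings = {}
    for match in LOG_LINE.finditer(text):
        timings[int(match.group(1))] = float(match.group(2).replace(",", "."))
    return timings


def estimate_cache_size(timings):
    """Heuristic: the largest relative slowdown between consecutive ring sizes
    marks the point where two rings stop fitting into the cache."""
    sizes = sorted(timings)
    if len(sizes) < 2:
        raise ValueError("need timings for at least two ring sizes")
    ratio, last_cheap = max(
        (timings[b] / timings[a], a) for a, b in zip(sizes, sizes[1:])
    )
    # Each strip segment references two rings, so the effective cache is 2 * last_cheap.
    return 2 * last_cheap, ratio


if __name__ == "__main__":
    log = """11 Vertices per Ring. 3,796ms averaged per draw.
12 Vertices per Ring. 3,821ms averaged per draw.
13 Vertices per Ring. 6,515ms averaged per draw."""
    size, ratio = estimate_cache_size(parse_log(log))
    print(f"estimated vertex cache: {size} (slowdown x{ratio:.2f})")
```

On the three lines above it returns 24 with a slowdown factor of about 1.7. A single noisy sample can shift the largest jump, so comparing repeated runs is worthwhile.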

To get a clearer picture of this problem, I need your help. Please run the application and start a profiling run by pressing "P". This takes a few seconds, after which a new file is created in the Logs/ folder. If you want, you can run multiple tests with different P/Q values or with Texture2D disabled; however, a single benchmark with the default settings is perfectly sufficient. Please attach or copy & paste that text file into this thread. This will take you less than 2 minutes, unless you start toying around in interactive mode ;)

Make sure you have OpenTK.dll available to the app.

Thanks!

Edit: As promised, here is the source code. There is only little documentation, since most of it is trivial. Use at your own risk! :P

The Torusknot.cs class itself handles the mesh, from generating vertices and triangles up to the VBO. A torus knot is specified like this:

Create( uint pathsteps, uint shapevertices, float radius, int p, int q )

where pathsteps is the number of rings in the knot and shapevertices defines the number of vertices per ring.
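For experimenting without the C# source, here is a Python sketch of what a function like Create() might compute for the vertex positions. It uses one common (p, q) torus-knot parametrization, r = cos(q*phi) + 2, x = r*cos(p*phi), y = r*sin(p*phi), z = -sin(q*phi); this is an assumption, not necessarily the formula in Torusknot.cs, and sources differ on which of p and q counts the windings around the axis. Each ring is oriented by the offset from the torus' core circle to the path point, which is perpendicular to the tangent for this parametrization. `create` and `knot_point` are hypothetical names.

```python
import math


def knot_point(phi, p, q):
    """Point on a (p, q) torus knot; one common parametrization, assumed here."""
    r = math.cos(q * phi) + 2.0
    return (r * math.cos(p * phi), r * math.sin(p * phi), -math.sin(q * phi))


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _scale(a, s):
    return tuple(x * s for x in a)


def _normalize(a):
    return _scale(a, 1.0 / math.sqrt(_dot(a, a)))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def create(pathsteps, shapevertices, radius, p, q):
    """Hypothetical analogue of Create(): returns pathsteps rings of
    shapevertices positions each, stored ring by ring."""
    vertices = []
    eps = 1e-4
    for i in range(pathsteps):
        phi = 2.0 * math.pi * i / pathsteps
        center = knot_point(phi, p, q)
        # Central-difference tangent along the path.
        tangent = _normalize(_sub(knot_point(phi + eps, p, q), knot_point(phi - eps, p, q)))
        # Offset from the torus core circle (radius 2 in the xy-plane). It is already
        # perpendicular to the exact tangent; the projection only removes
        # finite-difference error before normalizing.
        core = (2.0 * math.cos(p * phi), 2.0 * math.sin(p * phi), 0.0)
        outward = _sub(center, core)
        normal = _normalize(_sub(outward, _scale(tangent, _dot(outward, tangent))))
        binormal = _cross(tangent, normal)
        for k in range(shapevertices):
            theta = 2.0 * math.pi * k / shapevertices
            c, s = radius * math.cos(theta), radius * math.sin(theta)
            vertices.append(tuple(center[j] + c * normal[j] + s * binormal[j] for j in range(3)))
    return vertices


if __name__ == "__main__":
    verts = create(pathsteps=64, shapevertices=12, radius=0.15, p=6, q=1)
    print(len(verts), "vertices")
```

Because vertices are stored ring by ring, index `ring * shapevertices + k` addresses vertex k of a ring, which is the numbering a ring-to-ring strip would use.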

The other files aren't really interesting and are only included so you can build the app.

---------------------------------------------------------------------------

The included solution was created with VC# Express 2008. In case you cannot load it:

-Create a new project (console application).
-Add the *.cs and *.jpg files from the compressed archive. Set the "Copy to Output Directory" property of logo-dark.jpg so that it is copied when building.
-Optionally add OpenTK.dll.
-Add System, System.Drawing and OpenTK as references.

Attachment: PQTorusKnots Source Code (OpenTK 0.9.0), 37.83 KB

Comments

the Fiddler

Tested on Vista x64 with the following results:

Profiling Log for Radeon X1950 Pro (2.0.6956 Release)
 
Window Size 1400 x 1000
Max. allowed Verts: 100000 Radius: 0,15 P: 6 Q: 1
 
4 Verts per Ring. 1,727ms averaged per draw.
5 Verts per Ring. 1,626ms averaged per draw.
6 Verts per Ring. 1,598ms averaged per draw.
7 Verts per Ring. 1,812ms averaged per draw.
8 Verts per Ring. 1,745ms averaged per draw.
9 Verts per Ring. 1,736ms averaged per draw.
10 Verts per Ring. 1,719ms averaged per draw.
11 Verts per Ring. 1,728ms averaged per draw.
12 Verts per Ring. 1,713ms averaged per draw.
13 Verts per Ring. 1,767ms averaged per draw.
14 Verts per Ring. 1,715ms averaged per draw.
15 Verts per Ring. 1,704ms averaged per draw.
16 Verts per Ring. 1,712ms averaged per draw.
17 Verts per Ring. 1,711ms averaged per draw.
18 Verts per Ring. 1,700ms averaged per draw.
19 Verts per Ring. 1,766ms averaged per draw.
20 Verts per Ring. 1,690ms averaged per draw.
21 Verts per Ring. 1,695ms averaged per draw.
22 Verts per Ring. 1,695ms averaged per draw.
23 Verts per Ring. 1,687ms averaged per draw.
24 Verts per Ring. 1,704ms averaged per draw.
25 Verts per Ring. 1,752ms averaged per draw.

I'm not sure what to make of the results; they look rather random, and further testing didn't show anything different. Any ideas?

I'd love to take a look at the source. Also, would you mind if I took some screenshots and used them as a favicon for this site?

Inertia

You resized the window to fullscreen, which made fillrate a limiting factor as well. The scene is simply 1 light with fixed-function Gouraud shading and texture mapping. Unless I'm timing the GL.Finish() call wrong, the measured time should reflect exactly the time it took to GL.DrawElements() the VBO. Overdraw and culling affect the result as well, which is why the model isn't moving during profiling.
According to these results your vertex cache would be estimated as 12, which is unlikely to be true ;)
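The timing described above can be sketched as follows. This is a hypothetical Python helper, not the app's code: `draw` and `finish` stand in for wrappers around GL.DrawElements and GL.Finish, and time.perf_counter plays the role of System.Diagnostics.Stopwatch. The key point is that GL calls may return before the GPU is done, so the clock is only read after a blocking finish.

```python
import time


def average_draw_ms(draw, finish, repeats=100):
    """Average wall-clock milliseconds per draw call.

    `draw` and `finish` are placeholder callables (e.g. wrappers around
    GL.DrawElements and GL.Finish)."""
    finish()  # drain earlier GPU work so it is not billed to this measurement
    start = time.perf_counter()
    for _ in range(repeats):
        draw()
    finish()  # block until all queued draws have actually completed
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / repeats
```

Keeping the model still while this runs, as the profiling mode does, keeps overdraw and culling constant across ring sizes, so the remaining differences come mostly from vertex processing.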

Use them any way you like, but be careful that you don't attract sceners, or there will be spikeballs all over the place ><

Stevo14

My test results:

Profiling Log for MOBILITY RADEON X300 x86/SSE2 (2.0.5698 WinXP Release)
 
Window Size 512 x 512
Max. allowed Verts: 100000 Radius: 0.15 P: 6 Q: 1
 
4 Verts per Ring. 3.604ms averaged per draw.
5 Verts per Ring. 3.564ms averaged per draw.
6 Verts per Ring. 3.550ms averaged per draw.
7 Verts per Ring. 5.961ms averaged per draw.
8 Verts per Ring. 5.893ms averaged per draw.
9 Verts per Ring. 5.867ms averaged per draw.
10 Verts per Ring. 5.841ms averaged per draw.
11 Verts per Ring. 5.817ms averaged per draw.
12 Verts per Ring. 5.911ms averaged per draw.
13 Verts per Ring. 5.790ms averaged per draw.
14 Verts per Ring. 5.778ms averaged per draw.
15 Verts per Ring. 5.765ms averaged per draw.
16 Verts per Ring. 5.756ms averaged per draw.
17 Verts per Ring. 5.749ms averaged per draw.
18 Verts per Ring. 5.740ms averaged per draw.
19 Verts per Ring. 5.741ms averaged per draw.
20 Verts per Ring. 5.729ms averaged per draw.
21 Verts per Ring. 5.723ms averaged per draw.
22 Verts per Ring. 5.718ms averaged per draw.
23 Verts per Ring. 5.721ms averaged per draw.
24 Verts per Ring. 5.711ms averaged per draw.
25 Verts per Ring. 5.708ms averaged per draw.

There seems to be a noticeable increase right at 7 vertices. Does this mean that my vertex cache is 6 vertices?

P.S. First post in the forums.

Inertia

Welcome Stevo,

and thank you for posting the results. Like Fiddler's ATi card, your card's effective vertex cache would be 12 as well: each strip segment references the vertices of two rings, so the jump after 6 verts per ring corresponds to 12 cached vertices.
This could mean that your true vertex cache size is 32, but 20 vertex units insert new vertices into the cache in parallel, decreasing its effective size. This is not necessarily a bad thing: especially when rendering objects that share very few or no vertices (e.g. particle systems), your graphics card will probably outperform cards that rely on the vertex cache.

I've looked this up, and it seems that ATi cards use the same memory for the L1 texture cache and the vertex cache. Would you please do another profiling run with Texture2D disabled (hotkey Q)? Also make sure the driver does not force anti-aliasing, anisotropic filtering or TruForm; especially the last could be responsible for these low values, since it inserts new vertices that the profiling does not account for.

Thanks!

the Fiddler

Welcome Stevo14 :)

I reran the tests with the default window and, sure enough, the results became a little clearer.
With textures:

Profiling Log for Radeon X1950 Pro (2.0.6956 Release)
 
Window Size 512 x 512
Max. allowed Verts: 100000 Radius: 0,15 P: 6 Q: 1
 
4 Verts per Ring. 1,038ms averaged per draw.
5 Verts per Ring. 1,002ms averaged per draw.
6 Verts per Ring. 1,035ms averaged per draw.
7 Verts per Ring. 1,268ms averaged per draw.
8 Verts per Ring. 1,267ms averaged per draw.
9 Verts per Ring. 1,264ms averaged per draw.
10 Verts per Ring. 1,248ms averaged per draw.
[...]

Without:

Profiling Log for Radeon X1950 Pro (2.0.6956 Release)
 
Window Size 512 x 512
Max. allowed Verts: 100000 Radius: 0,15 P: 6 Q: 1
 
4 Verts per Ring. 0,809ms averaged per draw.
5 Verts per Ring. 0,760ms averaged per draw.
6 Verts per Ring. 0,735ms averaged per draw.
7 Verts per Ring. 0,895ms averaged per draw.
8 Verts per Ring. 0,898ms averaged per draw.
9 Verts per Ring. 0,895ms averaged per draw.
10 Verts per Ring. 0,899ms averaged per draw.
[...]

Disabling textures doesn't seem to affect the effective size of the cache. I'll run the test on a couple of nv40 and g70 cards, to have something to compare against.

Inertia

Thank you, this at least clarifies the connection between the vertex and texture caches. It seems your card's effective cache really is 12: drawing with 7 vertices per ring takes about 122% of the time needed with 6 verts.

If you look at the first benchmark, though, the result would be a vertex cache of 10, while the second results in 12. I'm fairly sure these discrepancies are caused by the OS performing actions in the background while the app is running; a Diagnostics.Stopwatch has enough resolution to be affected by this.

What I had in mind as a backup solution was binding an "expensive" vertex shader to draw the knot. It would perform useless calculations that the compiler cannot optimize out, which should greatly increase the cost of processing a vertex. The ms/draw should then increase more sharply once vertex cache hits no longer occur.

Stevo14

Looks like about the same result the second time with textures off:

Profiling Log for MOBILITY RADEON X300 x86/SSE2 (2.0.5698 WinXP Release)
 
Window Size 512 x 512
Max. allowed Verts: 100000 Radius: 0.15 P: 6 Q: 1
 
4 Verts per Ring. 4.075ms averaged per draw.
5 Verts per Ring. 4.170ms averaged per draw.
6 Verts per Ring. 4.081ms averaged per draw.
7 Verts per Ring. 7.161ms averaged per draw.
8 Verts per Ring. 7.103ms averaged per draw.
9 Verts per Ring. 7.068ms averaged per draw.
10 Verts per Ring. 7.077ms averaged per draw.
11 Verts per Ring. 7.009ms averaged per draw.
12 Verts per Ring. 6.994ms averaged per draw.
13 Verts per Ring. 6.967ms averaged per draw.
14 Verts per Ring. 6.964ms averaged per draw.
15 Verts per Ring. 6.949ms averaged per draw.
16 Verts per Ring. 6.944ms averaged per draw.
17 Verts per Ring. 6.922ms averaged per draw.
18 Verts per Ring. 6.913ms averaged per draw.
19 Verts per Ring. 6.907ms averaged per draw.
20 Verts per Ring. 6.901ms averaged per draw.
21 Verts per Ring. 6.892ms averaged per draw.
22 Verts per Ring. 6.889ms averaged per draw.
23 Verts per Ring. 6.876ms averaged per draw.
24 Verts per Ring. 6.878ms averaged per draw.
25 Verts per Ring. 6.877ms averaged per draw.

I find it curious that it was slower this time with the textures off. Of course, it could just be that something running in the background slowed things down.

Inertia

Of course other processes affect the results, although the thread priority is already set to highest. The texture is trilinear filtered and might have caused the vertex cache to make room for the L1 texture cache; I just wanted to verify that this is not the case.

Here's one benchmark clearly indicating a vertex cache size of 24: the jump comes after 12 verts per ring (at 12, all vertices from the previous ring are still cached and come for free).

Profiling Log for GeForce FX 5600/AGP/SSE/3DNOW! (2.1.1)
 
Window Size 512 x 512
Max. allowed Verts: 100000 Radius: 0,15 P: 6 Q: 1
 
3 Verts per Ring. 4,734ms averaged per draw.
4 Verts per Ring. 4,471ms averaged per draw.
5 Verts per Ring. 4,188ms averaged per draw.
6 Verts per Ring. 3,997ms averaged per draw.
7 Verts per Ring. 3,858ms averaged per draw.
8 Verts per Ring. 3,709ms averaged per draw.
9 Verts per Ring. 3,593ms averaged per draw.
10 Verts per Ring. 3,572ms averaged per draw.
11 Verts per Ring. 3,538ms averaged per draw.
12 Verts per Ring. 3,391ms averaged per draw.
13 Verts per Ring. 6,114ms averaged per draw.
14 Verts per Ring. 6,128ms averaged per draw.
15 Verts per Ring. 6,092ms averaged per draw.
16 Verts per Ring. 6,085ms averaged per draw.
17 Verts per Ring. 6,111ms averaged per draw.
18 Verts per Ring. 6,091ms averaged per draw.
19 Verts per Ring. 6,074ms averaged per draw.
20 Verts per Ring. 6,092ms averaged per draw.
21 Verts per Ring. 6,002ms averaged per draw.
22 Verts per Ring. 6,098ms averaged per draw.
23 Verts per Ring. 5,994ms averaged per draw.
24 Verts per Ring. 6,027ms averaged per draw.
25 Verts per Ring. 5,996ms averaged per draw.

Edit: Source Code added.

the Fiddler

Thanks for the source. I will be running tests on a couple of other systems to see what comes up.

Ah, one small thing: you can register for KeyDown and KeyUp events in the GameWindow.Keyboard class, which can simplify the keyboard handling logic (if I understand how the KeyStrokeManager works). Documentation...

Inertia

I did a couple of profiling runs on other people's laptops, but a vertex cache of 24 was the highest result so far. One Intel chipset only cached the last 8 vertices; I think that's the absolute minimum an OpenGL driver must provide? I'm also starting to suspect that there may be no standard on whether the cache must use FIFO or LRU logic to decide which entry gets discarded.
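Whether the replacement policy matters for this test can be checked in simulation. The Python sketch below uses hypothetical helper names and the same assumed layout as before (one strip segment per pair of neighbouring rings, alternating between them). It counts misses for a FIFO cache, which ignores hits, and an LRU cache, which refreshes an entry on every hit.

```python
from collections import OrderedDict


def ring_strip(verts_per_ring, rings):
    """Hypothetical ring-pair strip order: alternate between ring i and ring i + 1."""
    for ring in range(rings - 1):
        for k in range(verts_per_ring):
            yield ring * verts_per_ring + k
            yield (ring + 1) * verts_per_ring + k


def count_misses(verts_per_ring, cache_size, rings, policy):
    """Simulate a post-transform cache with 'fifo' or 'lru' replacement."""
    if policy not in ("fifo", "lru"):
        raise ValueError("policy must be 'fifo' or 'lru'")
    cache = OrderedDict()  # oldest (or least recently used) entry first
    misses = 0
    for index in ring_strip(verts_per_ring, rings):
        if index in cache:
            if policy == "lru":
                cache.move_to_end(index)  # LRU: a hit makes the entry most recent
            continue  # FIFO: hits do not change the order
        misses += 1
        cache[index] = True
        if len(cache) > cache_size:
            cache.popitem(last=False)  # evict the oldest / least recently used entry
    return misses


if __name__ == "__main__":
    for n in (8, 12, 13, 16):
        print(n, count_misses(n, 24, 10, "fifo"), count_misses(n, 24, 10, "lru"))
```

With 10 rings and a 24-entry cache, both policies give 120 misses at 12 verts per ring and 234 at 13. On this strip each vertex is reused at most once, right in the next segment, so refreshing it on a hit gains nothing. This suggests that a ring-size sweep with this layout may not reveal which policy a card uses.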

Well, I just had trouble with key repeat and wanted to get this done quickly. For a game this kind of behaviour from the input class is great; I didn't really look into the events since this was just a quickstart template. I built this app to get my mind off porting the MS3D loader to OpenTK.Math, and kind of to prove that the math library isn't the problem ;)