Inertia

PQ Torus Knots

I got the idea for this when Fiddler mentioned triangle strips in connection with vertex caches. For procedurally generated geometry, a triangle strip has some useful properties for examining the vertex cache's behaviour, which is quite difficult to do with indexed triangle lists.

Because of the way the torus knot is built, it can be used for profiling to help identify the size of the graphics card's vertex cache. To my knowledge there is no GL variable that can be queried for the cache size (there is, however, a DX tool: http://www.clootie.ru/delphi/dxtools.html). This functionality could be handy in a setup/config tool for a main application, which could examine the graphics card (and perhaps convert all meshes to the ideal cache layout for the current client).
Vertex cache sizes between 8 and 50 are examined, and the results are stored in the Logs/ folder (which the app creates if it is missing). The app does not modify anything on the system outside this folder.
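Here is a minimal sketch of why stepping the number of vertices per ring probes the cache size. It assumes a hypothetical layout (vertex id = ring * verts_per_ring + i, one zig-zag strip segment per pair of rings that repeats vertex 0 of both rings to close the loop) and an LRU replacement policy. The actual ordering in Torusknot.cs and the policy used by real hardware (FIFO is another common model) may differ, and the helper names are made up for illustration.

```python
from collections import OrderedDict


def ring_strip_indices(rings, verts_per_ring):
    """Indices of a tube drawn as one zig-zag segment per pair of rings.

    Hypothetical layout: vertex id = ring * verts_per_ring + i. Each segment
    alternates ring r / ring r+1 and repeats vertex 0 of both rings at the
    end to close the loop around the tube.
    """
    indices = []
    for r in range(rings - 1):
        for i in range(verts_per_ring + 1):
            k = i % verts_per_ring
            indices.append(r * verts_per_ring + k)
            indices.append((r + 1) * verts_per_ring + k)
    return indices


def count_misses_lru(indices, cache_size):
    """Count post-transform cache misses, assuming LRU replacement."""
    cache = OrderedDict()
    misses = 0
    for idx in indices:
        if idx in cache:
            cache.move_to_end(idx)  # hit: mark as most recently used
        else:
            misses += 1  # miss: the vertex has to be transformed again
            cache[idx] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict the least recently used entry
    return misses


strip = ring_strip_indices(50, 12)
for size in (23, 24):
    print(size, count_misses_lru(strip, size))  # 23 -> 1226 misses, 24 -> 600 misses
```

With 50 rings of 12 vertices, a 24-entry cache transforms every vertex exactly once (600 misses), while any cache of 23 entries or fewer roughly doubles the work. Under these assumptions the step appears exactly when two rings no longer fit, which is the kind of jump the profiling mode looks for.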

This program does nothing but generate a torus knot from the given parameters and offer 3 modes (Interactive, Turntable and Profiling) that put the mesh to some use. It was written from scratch in 2 days, in the process of going through the OpenTK examples and experimenting with some stuff.

After some alpha testing, the profiling now produces fairly useful results. However, some graphics cards produce odd results, which could be related to multiple parallel vertex processing units inserting vertices into the cache.

Measurements like the following are also problematic. The examined graphics card has a cache of 24 entries, so 12 vertices per ring would be ideal with the given triangle-strip layout:

11 Vertices per Ring. 3,796ms averaged per draw.
12 Vertices per Ring. 3,821ms averaged per draw.
13 Vertices per Ring. 6,515ms averaged per draw.

To get a clearer picture of this problem, I need your help. Please run the application and do a profiling run by pressing "P". This takes a few seconds, after which a new file is created in the Logs/ folder. If you want, you can run multiple tests with different P/Q values or with Texture2D disabled, but a single benchmark with the default settings is perfectly sufficient. Please attach or copy and paste that text file into this thread. This will take you less than 2 minutes, unless you start toying around in interactive mode ;)
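If you collect several logs, a small script can turn them into numbers. This is a hedged sketch that assumes the line format of the logs posted in this thread (e.g. "12 Verts per Ring. 1,261ms averaged per draw."). Because the decimal separator follows the machine's locale, both ',' and '.' are accepted. parse_profiling_log is a hypothetical helper, not part of the app.

```python
import re

# Matches lines such as "12 Verts per Ring. 1,261ms averaged per draw."
# and the longer "11 Vertices per Ring. ..." form; header lines are skipped.
LINE_RE = re.compile(r"^\s*(\d+)\s+Vert(?:ice)?s per Ring\.\s+(\d+[.,]\d+)\s*ms")


def parse_profiling_log(text):
    """Return {verts_per_ring: ms_per_draw} parsed from a profiling log."""
    timings = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            # The log uses the locale's decimal separator, so normalise ',' to '.'.
            timings[int(m.group(1))] = float(m.group(2).replace(",", "."))
    return timings


sample = """Window Size 512 x 512
Max. allowed Verts: 100000 Radius: 0,15 P: 6 Q: 1
12 Verts per Ring. 1,261ms averaged per draw.
13 Verts per Ring. 1,987ms averaged per draw."""
print(parse_profiling_log(sample))  # {12: 1.261, 13: 1.987}
```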

Make sure OpenTK.dll is available to the app.

Thanks!

Edit: As promised, here is the source code. There is little documentation, since most of it is trivial. Use at your own risk! :P

The Torusknot.cs class handles the mesh, from generating the vertices and triangles up to building the VBO. A torus knot is specified like this:

Create( uint pathsteps, uint shapevertices, float radius, int p, int q )

where pathsteps is the number of rings in the knot and shapevertices is the number of vertices per ring.

The other files aren't really interesting and are only included so you can build the app.

---------------------------------------------------------------------------

The included solution was created with Visual C# Express 2008. In case you cannot load it:

- Create a new console application project.
- Add the *.cs and *.jpg files from the archive. Set the properties of logo-dark.jpg so that it is copied to the output directory when building.
- Optionally add OpenTK.dll.
- Add System, System.Drawing and OpenTK as references.

Attachment: PQTorusKnots Source Code (OpenTK 0.9.0), 37.83 KB

Comments

Inertia

Please keep posting results. I couldn't get my hands on a state-of-the-art GeForce 8 or Radeon 3xxx card so far and would really like to see the trends there. According to a paper (link?) there is supposed to be a barrier at a vertex cache size of 32, beyond which the mesh area covered by the cache is so large that cache misses become inevitable, since all neighbours of the cached vertices have already been drawn. If the trend on high-end hardware is to neglect cache optimizations in favour of parallel processing, it might be best to optimize meshes for a very low cache size (8-12) to increase the chance of finding a vertex in the cache at all.

Again, please do a profiling run. It will take less than 2 minutes. Thank you!

Edit:

Profiling Log for GeForce 6800 GT/AGP/SSE2/3DNOW! (2.1.1)
 
Window Size 512 x 512
Max. allowed Verts: 100000 Radius: 0,15 P: 6 Q: 1
 
4 Verts per Ring. 1,567ms averaged per draw.
5 Verts per Ring. 1,461ms averaged per draw.
6 Verts per Ring. 1,434ms averaged per draw.
7 Verts per Ring. 1,359ms averaged per draw.
8 Verts per Ring. 1,308ms averaged per draw.
9 Verts per Ring. 1,284ms averaged per draw.
10 Verts per Ring. 1,252ms averaged per draw.
11 Verts per Ring. 1,265ms averaged per draw.
12 Verts per Ring. 1,261ms averaged per draw.
13 Verts per Ring. 1,987ms averaged per draw.
14 Verts per Ring. 1,952ms averaged per draw.
15 Verts per Ring. 1,903ms averaged per draw.
16 Verts per Ring. 1,915ms averaged per draw.
17 Verts per Ring. 1,948ms averaged per draw.
18 Verts per Ring. 1,951ms averaged per draw.
19 Verts per Ring. 1,893ms averaged per draw.
20 Verts per Ring. 1,885ms averaged per draw.
21 Verts per Ring. 1,944ms averaged per draw.
22 Verts per Ring. 1,937ms averaged per draw.
23 Verts per Ring. 1,943ms averaged per draw.
24 Verts per Ring. 1,883ms averaged per draw.
25 Verts per Ring. 1,889ms averaged per draw.

Inertia

:|

...and I thought it would be quicker to get results over the net. Won't be my last mistake.
Do I have to wrap it into a Setup.exe and implement some function that e-mails me the results, or what is the problem? It really takes less than 2 minutes if you have an IDE, .NET and OpenTK installed, so X-mas is not a valid excuse.

athiniar

Profiling Log for Quadro NVS 135M/PCI/SSE2 (2.1.2)
 
Window Size 512 x 512
Max. allowed Verts: 100000 Radius: 0,15 P: 6 Q: 1
 
4 Verts per Ring. 3,041ms averaged per draw.
5 Verts per Ring. 3,035ms averaged per draw.
6 Verts per Ring. 2,960ms averaged per draw.
7 Verts per Ring. 3,051ms averaged per draw.
8 Verts per Ring. 3,564ms averaged per draw.
9 Verts per Ring. 3,570ms averaged per draw.
10 Verts per Ring. 3,438ms averaged per draw.
11 Verts per Ring. 3,552ms averaged per draw.
12 Verts per Ring. 3,577ms averaged per draw.
13 Verts per Ring. 3,519ms averaged per draw.
14 Verts per Ring. 3,732ms averaged per draw.
15 Verts per Ring. 5,506ms averaged per draw.
16 Verts per Ring. 5,575ms averaged per draw.
17 Verts per Ring. 5,628ms averaged per draw.
18 Verts per Ring. 5,606ms averaged per draw.
19 Verts per Ring. 5,455ms averaged per draw.
20 Verts per Ring. 5,580ms averaged per draw.
21 Verts per Ring. 5,539ms averaged per draw.
22 Verts per Ring. 5,513ms averaged per draw.
23 Verts per Ring. 5,622ms averaged per draw.
24 Verts per Ring. 5,569ms averaged per draw.
25 Verts per Ring. 5,513ms averaged per draw.

Inertia

Thanks, hun! Very interesting results, thanks again for posting them :)

athiniar

Here are the results from my second computer (if you still need them for your cooking!)

Profiling Log for RADEON 9600 SERIES x86/SSE2 (2.0.6645 WinXP Release)
 
Window Size 512 x 512
Max. allowed Verts: 100000 Radius: 0,15 P: 6 Q: 1
 
4 Verts per Ring. 2,208ms averaged per draw.
5 Verts per Ring. 2,177ms averaged per draw.
6 Verts per Ring. 2,153ms averaged per draw.
7 Verts per Ring. 3,975ms averaged per draw.
8 Verts per Ring. 3,949ms averaged per draw.
9 Verts per Ring. 3,942ms averaged per draw.
10 Verts per Ring. 3,910ms averaged per draw.
11 Verts per Ring. 3,903ms averaged per draw.
12 Verts per Ring. 3,914ms averaged per draw.
13 Verts per Ring. 3,908ms averaged per draw.
14 Verts per Ring. 3,859ms averaged per draw.
15 Verts per Ring. 3,857ms averaged per draw.
16 Verts per Ring. 3,843ms averaged per draw.
17 Verts per Ring. 3,833ms averaged per draw.
18 Verts per Ring. 3,868ms averaged per draw.
19 Verts per Ring. 3,831ms averaged per draw.
20 Verts per Ring. 3,827ms averaged per draw.
21 Verts per Ring. 3,814ms averaged per draw.
22 Verts per Ring. 3,815ms averaged per draw.
23 Verts per Ring. 3,849ms averaged per draw.
24 Verts per Ring. 3,813ms averaged per draw.
25 Verts per Ring. 3,846ms averaged per draw.

Inertia

Thanks again, very much appreciated :)

Every single result helps to build a better picture of how graphics cards are designed under the hood, keep them coming :D

objarni

Here's mine! I just ran the .exe and pressed P, without zooming, rotating or anything else. As you already know, I have a lousy card, so don't be surprised by the numbers :).

It seems I don't have a vertex cache at all, or how should I interpret the result?

Profiling Log for GeForce 7300 SE/7200 GS/PCI/SSE2/3DNOW! (2.1.1)

Window Size 512 x 512
Max. allowed Verts: 100000 Radius: 0,15 P: 6 Q: 1

4 Verts per Ring. 6,038ms averaged per draw.
5 Verts per Ring. 6,084ms averaged per draw.
6 Verts per Ring. 6,379ms averaged per draw.
7 Verts per Ring. 6,241ms averaged per draw.
8 Verts per Ring. 6,226ms averaged per draw.
9 Verts per Ring. 6,243ms averaged per draw.
10 Verts per Ring. 6,252ms averaged per draw.
11 Verts per Ring. 6,226ms averaged per draw.
12 Verts per Ring. 6,217ms averaged per draw.
13 Verts per Ring. 6,205ms averaged per draw.
14 Verts per Ring. 6,185ms averaged per draw.
15 Verts per Ring. 6,185ms averaged per draw.
16 Verts per Ring. 6,189ms averaged per draw.
17 Verts per Ring. 6,190ms averaged per draw.
18 Verts per Ring. 6,179ms averaged per draw.
19 Verts per Ring. 6,185ms averaged per draw.
20 Verts per Ring. 6,172ms averaged per draw.
21 Verts per Ring. 6,194ms averaged per draw.
22 Verts per Ring. 6,183ms averaged per draw.
23 Verts per Ring. 6,157ms averaged per draw.
24 Verts per Ring. 6,150ms averaged per draw.
25 Verts per Ring. 6,149ms averaged per draw.

Inertia

I'd say the cache size is 10: the times stay low up to 5 verts per ring and jump at 6 verts per ring. The slight difference between 4 and 5 verts is due to timer accuracy, which is unfortunately a common problem.

Due to the triangle strip's zig-zag pattern, 2 complete rings must fit in the vertex cache to get the speed boost, so the cache size is calculated by multiplying the last fast "Verts per Ring" value by 2.
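The heuristic above can be written down as a small function. This is a sketch that assumes the largest relative jump in draw time marks the point where two rings stop fitting in the cache; noisy logs (timer accuracy) or hardware with parallel vertex units may need a more careful rule. estimate_cache_size is a hypothetical helper.

```python
def estimate_cache_size(timings):
    """Estimate the vertex cache size from {verts_per_ring: ms_per_draw}.

    Assumed heuristic: the largest relative increase between consecutive
    ring sizes marks where two rings no longer fit in the cache, so the
    cache holds twice the last ring size before that jump. Returns None if
    the draw time never increases.
    """
    sizes = sorted(timings)
    best_ratio, best_n = 1.0, None
    for smaller, larger in zip(sizes, sizes[1:]):
        ratio = timings[larger] / timings[smaller]
        if ratio > best_ratio:
            best_ratio, best_n = ratio, smaller
    return None if best_n is None else 2 * best_n


# Measurements quoted at the top of the thread (11, 12, 13 vertices per ring).
print(estimate_cache_size({11: 3.796, 12: 3.821, 13: 6.515}))  # prints 24
```

Fed with the GeForce 7300 SE log above, the same function returns 10, matching the reading by eye; flatter curves such as the 8600M GT log are where a single "largest jump" rule becomes unreliable.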

Thank you for posting, all results help :)

Mincus

I noticed you were asking for newer cards. I've got an 8600M GT, so I hope it's new enough! (Not sure if it makes a difference, but this was run under Vista.)

Profiling Log for GeForce 8600M GT/PCI/SSE2 (2.1.2)

Window Size 512 x 512
Max. allowed Verts: 100000 Radius: 0.15 P: 6 Q: 1

3 Verts per Ring. 1.593ms averaged per draw.
4 Verts per Ring. 1.465ms averaged per draw.
5 Verts per Ring. 1.420ms averaged per draw.
6 Verts per Ring. 1.403ms averaged per draw.
7 Verts per Ring. 1.387ms averaged per draw.
8 Verts per Ring. 1.375ms averaged per draw.
9 Verts per Ring. 1.377ms averaged per draw.
10 Verts per Ring. 1.359ms averaged per draw.
11 Verts per Ring. 1.342ms averaged per draw.
12 Verts per Ring. 1.370ms averaged per draw.
13 Verts per Ring. 1.352ms averaged per draw.
14 Verts per Ring. 1.354ms averaged per draw.
15 Verts per Ring. 1.308ms averaged per draw.
16 Verts per Ring. 1.346ms averaged per draw.
17 Verts per Ring. 1.336ms averaged per draw.
18 Verts per Ring. 1.296ms averaged per draw.
19 Verts per Ring. 1.342ms averaged per draw.
20 Verts per Ring. 1.346ms averaged per draw.
21 Verts per Ring. 1.345ms averaged per draw.
22 Verts per Ring. 1.349ms averaged per draw.
23 Verts per Ring. 1.339ms averaged per draw.
24 Verts per Ring. 1.332ms averaged per draw.
25 Verts per Ring. 1.344ms averaged per draw.

Inertia

Thank you very much. This confirms my suspicion that state-of-the-art cards need a more expensive vertex shader program to give useful test results. From the log you posted, the effective size could be either 30 or 36 verts.
I will investigate and post an update to the app.