
PQ Torus Knots
Posted Thursday, 6 December, 2007 - 22:19 by Inertia
I had the idea for this when Fiddler mentioned triangle strips with vertex caches. For procedurally generated geometry, a triangle strip has some beneficial properties that can be used to examine the vertex cache's behaviour, something that is quite problematic with indexed triangle lists.
Because of the way the torus knot is built, it can be used for profiling to help identify the size of the graphics card's vertex cache. To my knowledge there is no GL variable one can query to retrieve the cache size (there is, however, a DX tool: http://www.clootie.ru/delphi/dxtools.html ). This functionality could be handy for a setup/config tool accompanying a main application, which examines the graphics card (and maybe converts all meshes to the ideal cache layout for the current client).
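A minimal sketch of how a profiling sweep might be turned into a cache-size estimate. It assumes (this is not taken from Torusknot.cs) that each ring is joined to the next by one triangle strip, so a FIFO cache of C entries can keep the previous ring alive while the next strip is drawn as long as 2 * vertsPerRing <= C. The first big jump in draw time then suggests C is about twice the last fast ring size. `EstimateCacheSize` and the 1.5x jump threshold are hypothetical choices; the sample numbers are the 11/12/13 vertices-per-ring timings quoted further down in this post.

```csharp
using System;

// Hypothetical heuristic, not the app's actual analysis:
// with one strip between neighbouring rings, a FIFO cache of size C keeps the
// previous ring cached while 2 * vertsPerRing <= C. Once that stops holding,
// ring vertices are re-transformed and the draw time jumps.
static int EstimateCacheSize(int[] vertsPerRing, double[] msPerDraw, double jumpFactor = 1.5)
{
    for (int i = 1; i < msPerDraw.Length; i++)
    {
        // First ring size whose time exceeds the previous one by jumpFactor.
        if (msPerDraw[i] > msPerDraw[i - 1] * jumpFactor)
            return 2 * vertsPerRing[i - 1];
    }
    return -1; // no clear jump found in the sweep
}

int estimate = EstimateCacheSize(new[] { 11, 12, 13 }, new[] { 3.796, 3.821, 6.515 });
Console.WriteLine(estimate); // 24
```

A fixed threshold is the simplest choice; noisy results like some of the ones in this thread would need a more careful test.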
Possible vertex cache sizes between 8 and 50 are examined, and the results are stored in the Logs/ folder (which the app creates if it is missing). The app makes no changes to the system outside this folder.
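A minimal sketch of the logging side, assuming the line format seen in the logs posted in this thread. `Directory.CreateDirectory` creates the folder (and any missing parents) and does nothing if it already exists, which covers the "create when missing" behaviour. Writing with `CultureInfo.InvariantCulture` always produces a '.' decimal separator, regardless of the user's locale. `WriteProfilingLog` and the timestamped file name are hypothetical, not the app's actual code.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;

// Hypothetical log writer mirroring the line format of the posted logs.
static string WriteProfilingLog(string logDir, string renderer,
    IEnumerable<(int VertsPerRing, double Ms)> results)
{
    // Creates the folder if missing; no-op if it already exists.
    Directory.CreateDirectory(logDir);

    // Timestamped file name (an assumed naming scheme).
    string path = Path.Combine(logDir,
        "profile-" + DateTime.Now.ToString("yyyyMMdd-HHmmss", CultureInfo.InvariantCulture) + ".txt");

    using var writer = new StreamWriter(path);
    writer.WriteLine("Profiling Log for " + renderer);
    foreach (var (verts, ms) in results)
    {
        // InvariantCulture: "3.796", never "3,796".
        writer.WriteLine(string.Format(CultureInfo.InvariantCulture,
            "{0} Verts per Ring. {1:F3}ms averaged per draw.", verts, ms));
    }
    return path;
}

string logFile = WriteProfilingLog("Logs", "Example Renderer",
    new[] { (11, 3.796), (12, 3.821), (13, 6.515) });
Console.WriteLine(File.ReadAllText(logFile));
```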
This program does nothing but generate a torus knot from the given parameters and offer 3 modes (Interactive, Turntable and Profiling) to put the mesh to some use. It was written from scratch in 2 days, in the process of going through the OpenTK examples and experimenting with some stuff.
After some alpha testing, the profiling now produces rather useful results. However, some graphics cards produce weird results, which could be caused by multiple parallel vertex processing units inserting vertices into the cache.
Measurements like this one are also problematic. The examined graphics card has a cache of 24, and 12 vertices per ring would be perfect with the given triangle-strip layout.
11 Vertices per Ring. 3,796ms averaged per draw.
12 Vertices per Ring. 3,821ms averaged per draw.
13 Vertices per Ring. 6,515ms averaged per draw.
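To see why 13 vertices per ring can be so much slower than 12 on a 24-entry cache, here is a small simulation. It is a sketch under stated assumptions: a FIFO post-transform cache (hits don't reorder entries), one strip per pair of neighbouring rings that zig-zags between them and repeats the first pair to close the tube, and a cache that persists across strips. `RingStrip`, `CountMisses` and `MissesPerVertex` are hypothetical helpers, not part of the posted source. The result is vertex transforms (cache misses) per unique vertex: 1.0 is ideal, about 2.0 means every ring vertex is transformed once per strip.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Assumed layout: strip joining ring a and ring b (n vertices each),
// alternating between the rings and repeating index 0 to close the tube.
static int[] RingStrip(int a, int b, int n)
{
    var idx = new int[2 * (n + 1)];
    for (int k = 0; k <= n; k++)
    {
        idx[2 * k] = a * n + k % n;
        idx[2 * k + 1] = b * n + k % n;
    }
    return idx;
}

// FIFO cache model: a hit changes nothing, a miss pushes the vertex
// and evicts the oldest entry once the cache is full.
static int CountMisses(IEnumerable<int[]> strips, int cacheSize)
{
    var fifo = new Queue<int>();
    var inCache = new HashSet<int>();
    int misses = 0;
    foreach (var strip in strips)
        foreach (int v in strip)
        {
            if (inCache.Contains(v)) continue; // cache hit
            misses++;
            fifo.Enqueue(v);
            inCache.Add(v);
            if (fifo.Count > cacheSize)
                inCache.Remove(fifo.Dequeue()); // evict oldest entry
        }
    return misses;
}

// Open tube with `rings` rings: one strip per neighbouring pair.
static double MissesPerVertex(int rings, int n, int cacheSize)
{
    var strips = new List<int[]>();
    for (int r = 0; r + 1 < rings; r++)
        strips.Add(RingStrip(r, r + 1, n));
    return (double)CountMisses(strips, cacheSize) / (rings * n);
}

Console.WriteLine(MissesPerVertex(10, 12, 24).ToString("F3", CultureInfo.InvariantCulture)); // 1.067
Console.WriteLine(MissesPerVertex(10, 13, 24).ToString("F3", CultureInfo.InvariantCulture)); // 1.877
```

Under these assumptions, 12 vertices per ring costs about one transform per vertex, while 13 makes the FIFO evict each ring vertex shortly before it is reused, which nearly doubles the vertex work. That is in line with the 3.8 ms vs. 6.5 ms measurement above.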
To get a clearer picture of this problem, I need your help. Please run the application and start a profiling run by pressing "P". This takes a few seconds, after which a new file is created in the Logs/ folder. If you want, you can run multiple tests with different P/Q values or with Texture2D disabled; however, a single benchmark with the default settings is perfectly sufficient. Please attach or copy & paste that text file into this thread. This will take you less than 2 minutes, unless you start toying around in interactive mode ;)
Make sure you have OpenTK.dll available to the app.
Thanks!
Edit: As promised, here is the source code. There is only a little documentation, since most of it is trivial. Use at your own risk! :P
The Torusknot.cs class handles the mesh, from generating the vertices and triangles up to filling the VBO. A torus knot is specified like this:
Create( uint pathsteps, uint shapevertices, float radius, int p, int q )
where pathsteps is the number of rings in the knot and shapevertices is the number of vertices per ring.
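For readers who want the gist without downloading the archive, here is a minimal sketch of building the ring vertices. It is not the code from Torusknot.cs: it uses System.Numerics instead of OpenTK, int instead of uint, and a common torus-knot parameterisation (r = 2 + cos(q t), P(t) = (r cos(p t), r sin(p t), -sin(q t))) that may assign p and q differently from the app. Each of the pathSteps rings is a circle of the given radius around the curve, spanned by a frame built from cross products. `KnotPoint` and `CreateKnotVertices` are hypothetical names.

```csharp
using System;
using System.Numerics;

// Assumed parameterisation of a (p, q) torus knot centre line.
static Vector3 KnotPoint(float t, int p, int q)
{
    float r = 2f + MathF.Cos(q * t);
    return new Vector3(r * MathF.Cos(p * t), r * MathF.Sin(p * t), -MathF.Sin(q * t));
}

// pathSteps rings of shapeVertices vertices each, stored ring after ring.
static Vector3[] CreateKnotVertices(int pathSteps, int shapeVertices, float radius, int p, int q)
{
    var verts = new Vector3[pathSteps * shapeVertices];
    for (int i = 0; i < pathSteps; i++)
    {
        float t = 2f * MathF.PI * i / pathSteps;
        Vector3 p1 = KnotPoint(t, p, q);
        Vector3 p2 = KnotPoint(t + 0.01f, p, q); // point slightly ahead on the curve
        Vector3 tangent = p2 - p1;

        // Orthonormal frame perpendicular to the tangent.
        Vector3 binormal = Vector3.Normalize(Vector3.Cross(tangent, p2 + p1));
        Vector3 normal = Vector3.Normalize(Vector3.Cross(binormal, tangent));

        for (int j = 0; j < shapeVertices; j++)
        {
            float a = 2f * MathF.PI * j / shapeVertices;
            // Circle of the given radius in the normal/binormal plane.
            verts[i * shapeVertices + j] =
                p1 + radius * (MathF.Cos(a) * normal + MathF.Sin(a) * binormal);
        }
    }
    return verts;
}

int ringSize = 12;
int ringCount = 100000 / ringSize; // 8333 rings under a 100k vertex budget
var mesh = CreateKnotVertices(ringCount, ringSize, 0.15f, 6, 1);
Console.WriteLine(mesh.Length); // 99996
```

The ring count is derived from the "Max. allowed Verts: 100000" value in the log posted further down; whether the app computes pathsteps this way is an assumption.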
The other files aren't really interesting and are only included so you can build the app.
---------------------------------------------------------------------------
The included solution was created with VC# Express 2008. In case you cannot load it:
-Create a new project of type console application.
-Add the *.cs and *.jpg files from the compressed archive. Set the properties of logo-dark.jpg so that it is copied to the output directory when building.
-Optionally add OpenTK.dll.
-Add System, System.Drawing and OpenTK as references.
| Attachment | Size |
|---|---|
| PQTorusKnots Source Code (OpenTK 0.9.0) | 37.83 KB |


Comments
Re: PQ Torus Knots
Radeon 3870: Cache Size 14
Edit: DXtool reports a cache size of 0 (wtf?)
Re: PQ Torus Knots
Re: PQ Torus Knots
Thanks for the log. Did the laptop have any power-saving settings enabled, or was vsync forced on? There's a low at 5 and 16 verts, and imho a GeForce 7 should be able to process 100k vertices faster than a GeForce 5 (~15ms vs. ~6ms). Kinda suspicious :P
Re: PQ Torus Knots
Argh, I found the magic slider in the nVidia settings :)
Re: PQ Torus Knots
Thank you :) Confusing results, though: there's a low at 13 verts, but it's more likely that no vertex caching takes place at all (cache hits should roughly halve the measured draw time). If you've got a spare minute, would you please try the DX tool (linked in the initial post) and see if it can detect a cache size? It would be good to have a "2nd opinion" ;)
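The "roughly halve" estimate can be made explicit. Assuming (as a sketch, not taken from the source code) that each pair of neighbouring rings is joined by one zig-zag strip that repeats its first pair of vertices to close the tube, a knot with R rings of n vertices issues R - 1 strips of 2(n + 1) indices. Without any reuse every index costs one vertex transform; with an ideal cache each of the nR vertices is transformed exactly once:

$$
\frac{T_{\text{no cache}}}{T_{\text{ideal}}} = \frac{2(n+1)(R-1)}{nR} \approx 2 \qquad (n, R \gg 1)
$$

So if the draw is vertex-bound (itself an assumption), working cache hits should cut the measured time to roughly half.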
Re: PQ Torus Knots
The program detected a cache size of 37.
Re: PQ Torus Knots Bench.
After modifying it for 1.9.1 (just renaming the namespace) AND commenting out the OpenGL version detection (which fails on the Intel GMA950, which is specified as 1.4 plus the ARB_vertex_buffer_object and EXT_shadow_funcs extensions plus TexEnv shader caching), the application now runs. The object rotation seems smooth, yet the results seem horrible.
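The profiling log in this post reports "1.3 Mesa 7.0.3-rc2" as the GL version. If the app's detection requires OpenGL 1.5 (an assumption; its exact test isn't shown here), it rejects the GMA950 even though, according to this post, the driver offers VBOs through the extension. A more forgiving check could accept either: VBOs are core since OpenGL 1.5 and otherwise available through GL_ARB_vertex_buffer_object. `SupportsVbo` is a hypothetical helper that takes the strings returned by glGetString(GL_VERSION) and glGetString(GL_EXTENSIONS).

```csharp
using System;
using System.Linq;

// Hypothetical capability check: core VBOs (GL >= 1.5) or the ARB extension.
static bool SupportsVbo(string versionString, string extensions)
{
    // GL_VERSION starts with "major.minor[.release]", optionally followed
    // by vendor-specific text, e.g. "1.3 Mesa 7.0.3-rc2".
    string[] numbers = versionString.Split(' ')[0].Split('.');
    if (numbers.Length >= 2 &&
        int.TryParse(numbers[0], out int major) &&
        int.TryParse(numbers[1], out int minor) &&
        (major > 1 || (major == 1 && minor >= 5)))
        return true;

    // GL_EXTENSIONS is space-separated; compare whole tokens, not substrings.
    return extensions.Split(' ', StringSplitOptions.RemoveEmptyEntries)
                     .Contains("GL_ARB_vertex_buffer_object");
}

Console.WriteLine(SupportsVbo("1.3 Mesa 7.0.3-rc2",
    "GL_ARB_multitexture GL_ARB_vertex_buffer_object")); // True
```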
(I'm a bit bewildered to see the timing results written with either a decimal point or a comma, which is usually used to separate thousands. Moreover, I wonder how precise my results are.)
Is there any way that repeating the benchmark after some modifications would yield better results?
The two major things that might be affecting it are X and compiz (which it might have been better to turn off).
It seems weird that the best timing is for the largest number of vertices per ring.
- Darian.
==================
Profiling Log for Mesa DRI Intel(R) 945G 20061017 x86/MMX/SSE2 (1.3 Mesa 7.0.3-rc2)
Window Size 512 x 512
Max. allowed Verts: 100000 Radius: 0.15 P: 6 Q: 1
3 Verts per Ring. 122.782ms averaged per draw.
4 Verts per Ring. 115.521ms averaged per draw.
5 Verts per Ring. 111.450ms averaged per draw.
6 Verts per Ring. 109.288ms averaged per draw.
7 Verts per Ring. 106.695ms averaged per draw.
8 Verts per Ring. 105.226ms averaged per draw.
9 Verts per Ring. 108.604ms averaged per draw.
10 Verts per Ring. 106.859ms averaged per draw.
11 Verts per Ring. 111.982ms averaged per draw.
12 Verts per Ring. 110.323ms averaged per draw.
13 Verts per Ring. 109.019ms averaged per draw.
14 Verts per Ring. 108.363ms averaged per draw.
15 Verts per Ring. 107.661ms averaged per draw.
16 Verts per Ring. 109.275ms averaged per draw.
17 Verts per Ring. 123.947ms averaged per draw.
18 Verts per Ring. 120.290ms averaged per draw.
19 Verts per Ring. 121.464ms averaged per draw.
20 Verts per Ring. 120.476ms averaged per draw.
21 Verts per Ring. 119.660ms averaged per draw.
22 Verts per Ring. 120.158ms averaged per draw.
23 Verts per Ring. 120.615ms averaged per draw.
24 Verts per Ring. 113.487ms averaged per draw.
25 Verts per Ring. 85.760ms averaged per draw.
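The logs in this thread mix decimal commas ("3,796ms", presumably produced under a locale that uses a decimal comma) and decimal points ("122.782ms"). A small parser for collecting the posted results could normalize both, assuming the single separator is always a decimal separator (timings have three decimals and stay well below 1000 ms, so no thousands separators appear). `ParseLogLine` is a hypothetical helper; its pattern accepts both "Verts per Ring" and "Vertices per Ring", as both appear in this thread.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical parser for one result line; returns null for other lines.
static (int VertsPerRing, double Ms)? ParseLogLine(string line)
{
    var m = Regex.Match(line,
        @"^(\d+) Vert(?:ice)?s per Ring\. ([0-9]+[.,][0-9]+)ms averaged per draw\.$");
    if (!m.Success) return null;

    int verts = int.Parse(m.Groups[1].Value, CultureInfo.InvariantCulture);
    // Treat ',' as a decimal separator, then parse culture-independently.
    double ms = double.Parse(m.Groups[2].Value.Replace(',', '.'), CultureInfo.InvariantCulture);
    return (verts, ms);
}

foreach (var sample in new[] { "11 Vertices per Ring. 3,796ms averaged per draw.",
                               "25 Verts per Ring. 85.760ms averaged per draw." })
{
    var r = ParseLogLine(sample);
    if (r.HasValue)
        Console.WriteLine($"{r.Value.VertsPerRing}: {r.Value.Ms.ToString(CultureInfo.InvariantCulture)} ms");
}
// 11: 3.796 ms
// 25: 85.76 ms
```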
Re: PQ Torus Knots
If I remember correctly, the GMA950 does not have a T&L engine (vertices are processed on the CPU), which might explain the results.
It is likely that compiz plays a role too.
Re: PQ Torus Knots
Since the application only runs at ~10 fps, it's quite safe to assume that no hardware acceleration is happening at all. The slightly decreasing time per frame can probably be explained by backface culling being able to reject more and more faces.
The major problem with this application is that the draw call is timed quite precisely, so any process running in the background (such as network traffic or input events) affects the result notably. That makes it quite hard to write a benchmark that reliably outputs a useful result.
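One common way to make such timings less sensitive to background activity, offered here as a suggestion rather than what the app does, is to repeat the measurement and report the median instead of the mean: a few disturbed runs then barely move the result. `Median` and `MedianMilliseconds` are hypothetical helpers. When timing OpenGL draws with a CPU stopwatch, the timed work would also need to end with something like glFinish(); otherwise only command submission is measured.

```csharp
using System;
using System.Diagnostics;
using System.Globalization;
using System.Linq;

// Median of a sample set (mean of the two middle values for even counts).
static double Median(double[] values)
{
    double[] s = values.OrderBy(x => x).ToArray();
    int mid = s.Length / 2;
    return s.Length % 2 == 1 ? s[mid] : (s[mid - 1] + s[mid]) / 2.0;
}

// Runs `work` repeatedly and returns the median wall-clock time in ms.
static double MedianMilliseconds(Action work, int runs)
{
    var samples = new double[runs];
    var sw = new Stopwatch();
    for (int i = 0; i < runs; i++)
    {
        sw.Restart();
        work(); // e.g. the draw call followed by glFinish()
        sw.Stop();
        samples[i] = sw.Elapsed.TotalMilliseconds;
    }
    return Median(samples);
}

// One outlier (a background hiccup) would pull the mean up to about 8 ms.
double[] noisy = { 3.80, 3.79, 3.82, 3.81, 25.0 };
Console.WriteLine(Median(noisy).ToString(CultureInfo.InvariantCulture)); // 3.81
```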
For my own use, I'm currently just optimizing meshes for a vertex cache size of 12, but at some point I plan to write a function that implements a weightless algorithm and compare the results.
I'm not working on this project anymore, and it is now pretty much only a demo of how to generate a mesh procedurally and stuff it into a VBO. Should it probably be moved to the archives?
Re: PQ Torus Knots
Thank you both, Fiddler and Inertia.
I tried this as an example to test the VBO capabilities of my GMA 950 and to see whether it yields an error when using unsupported extensions.
I still don't have an answer regarding this issue. I don't want to go off topic here, so I'll keep it short.
I need to display between 2000 and 10000 spheres and am looking for the best way to do it, with one important catch: I need the updated vertex data back in the program's logic when I rotate/translate objects.
I tried this VBO example to get a grasp of a proper method.
- Darian