Inertia

PQ Torus Knots

I had the idea for this when Fiddler mentioned triangle strips in connection with vertex caches. For procedurally generated geometry, a triangle strip has some useful properties for examining the vertex cache's behaviour, which is quite problematic to do with indexed triangle lists.
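To make the cache argument concrete, here is a minimal Python sketch that counts vertex transforms for a ring-based triangle-strip layout under a FIFO post-transform cache. The layout (one strip per pair of adjacent rings, zig-zagging between them and repeating vertex 0 to close the loop), the cache surviving from one strip to the next, and the pure FIFO replacement policy are all assumptions for illustration; `ring_strip_indices` and `count_transforms` are hypothetical helpers, not part of Torusknot.cs. In this model the cost jumps once a single ring no longer fits in the cache, while the measurements in this post (a 24-entry cache with the cliff after 12 vertices per ring) suggest the real strip layout or hardware cache behaves differently, so treat it as an illustration of the mechanism only.

```python
from collections import deque


def ring_strip_indices(rings, verts_per_ring):
    """Hypothetical layout: one triangle strip per pair of adjacent rings,
    zig-zagging between ring r and ring r + 1 and repeating vertex 0 at the
    end to close the loop."""
    strips = []
    for r in range(rings - 1):
        strip = []
        for k in range(verts_per_ring + 1):
            j = k % verts_per_ring  # k == verts_per_ring wraps back to 0
            strip.append(r * verts_per_ring + j)
            strip.append((r + 1) * verts_per_ring + j)
        strips.append(strip)
    return strips


def count_transforms(strips, cache_size):
    """Count vertex shader invocations with a FIFO post-transform cache
    that is shared by all strips (i.e. not flushed between them)."""
    fifo = deque()
    resident = set()
    transforms = 0
    for strip in strips:
        for index in strip:
            if index in resident:
                continue  # hit: FIFO order is not refreshed on a hit
            transforms += 1
            fifo.append(index)
            resident.add(index)
            if len(fifo) > cache_size:
                resident.discard(fifo.popleft())
    return transforms
```

For example, with 50 rings and a 24-entry cache, 8 vertices per ring gives exactly one transform per vertex (400 transforms for 400 vertices), whereas 30 vertices per ring needs close to two transforms per vertex.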

Because of the way the torus knot is built, it can be used for profiling to help identify the size of the graphics card's vertex cache. To my knowledge there is no GL variable one can query to retrieve the cache size (there is, however, a DX tool: http://www.clootie.ru/delphi/dxtools.html). This functionality could be handy for a setup/config tool for a main application, which could examine the graphics card (and perhaps convert all meshes to the ideal cache layout for the current client).
Possible vertex cache sizes between 8 and 50 are examined, and the results are stored in the Logs/ folder (which the app creates if it is missing). Nothing on the system is modified outside this folder.

This program does nothing but generate a torus knot from the given parameters and offer three modes (Interactive, Turntable and Profiling) to put the mesh to some use. It was written from scratch in two days, while I was going through the OpenTK examples and experimenting.

After some alpha testing, the profiling now produces fairly useful results. However, some graphics cards produce odd results, which could be related to multiple parallel vertex processing units inserting vertices into the cache.

Measurements like the following are also problematic. The examined graphics card has a cache size of 24, so 12 vertices per ring would be ideal with the given triangle-strip layout.

11 Vertices per Ring. 3,796ms averaged per draw.
12 Vertices per Ring. 3,821ms averaged per draw.
13 Vertices per Ring. 6,515ms averaged per draw.
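A minimal Python sketch for reading such numbers out of a log and locating the cliff. The heuristic (largest slowdown between consecutive ring sizes, cache size taken as twice the last "fast" ring size, and the 1.3 threshold) is an assumption modelled on the example above (cache 24, cliff after 12), not the app's own logic; `parse_log` and `estimate_cache_size` are hypothetical names.

```python
import re

# Matches log lines such as "7 Verts per Ring. 0,347ms averaged per draw."
# and "11 Vertices per Ring. 3,796ms averaged per draw."
LINE = re.compile(r"(\d+) Vert(?:ice)?s per Ring\. ([\d.,]+)ms averaged per draw")


def parse_log(text):
    """Return {vertices_per_ring: milliseconds}.

    The decimal separator in the logs appears to follow the system locale
    ("0,347" vs. "122.782"), so both ',' and '.' are accepted.
    """
    return {int(m.group(1)): float(m.group(2).replace(",", "."))
            for m in LINE.finditer(text)}


def estimate_cache_size(results, min_ratio=1.3):
    """Hypothetical heuristic: find the largest slowdown between consecutive
    ring sizes and assume the cache holds twice the last fast ring size.
    Returns None if no step exceeds min_ratio (no clear cliff)."""
    sizes = sorted(results)
    best_ratio, last_fast = min_ratio, None
    for smaller, larger in zip(sizes, sizes[1:]):
        ratio = results[larger] / results[smaller]
        if ratio > best_ratio:
            best_ratio, last_fast = ratio, smaller
    return None if last_fast is None else 2 * last_fast
```

On the three lines above this returns 24. On the Radeon HD 3870 log posted further down it returns 14 (cliff between 7 and 8), while on the GeForce 8600 GTS log the biggest step is only about 11% and the function returns None, which is the point of the threshold.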

To get a clearer picture of this problem, I need your help. Please run the application and do a profiling run by pressing "P". This takes a few seconds, after which a new file is created in the Logs/ folder. If you want, you can run multiple tests with different P/Q values or with Texture2D disabled, but a single benchmark with the default settings is perfectly sufficient. Please attach or copy & paste that text file into this thread; this will take you less than 2 minutes, unless you start toying around in interactive mode ;)

Make sure you have OpenTK.dll available to the app.

Thanks!

Edit: As promised, here is the source code. It has only a little documentation; most of it is trivial. Use at your own risk! :P

The Torusknot.cs class handles the mesh, from generating the vertices and triangles up to the VBO. A torus knot is specified like this:
Create( uint pathsteps, uint shapevertices, float radius, int p, int q )
where pathsteps is the number of rings in the knot and shapevertices is the number of vertices per ring.
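Since Torusknot.cs is not reproduced here, the following is only a Python sketch of how vertex positions for such a Create call could be generated: `pathsteps` rings spaced evenly along a (p, q) torus knot, each made of `shapevertices` points on a circle of size `radius` perpendicular to the curve. The torus dimensions (major radius 2, minor radius 1), treating `radius` as the tube radius, and the choice of frame are assumptions, and `knot_point`, `knot_frame` and `create_rings` are hypothetical names. Normals, texture coordinates, strip indices and the VBO upload are left out.

```python
import math


def knot_point(phi, p, q):
    """Point on a (p, q) torus knot lying on a torus with major radius 2
    and minor radius 1 (the common textbook parametrisation)."""
    r = 2.0 + math.cos(q * phi)
    return (r * math.cos(p * phi), r * math.sin(p * phi), -math.sin(q * phi))


def knot_frame(phi, p, q):
    """Unit tangent T, normal N and binormal B = T x N at parameter phi.

    N points from the torus' core circle to the knot; since the knot lies on
    the torus surface, N is the surface normal and hence perpendicular to T.
    """
    cq, sq = math.cos(q * phi), math.sin(q * phi)
    cp, sp = math.cos(p * phi), math.sin(p * phi)
    r = 2.0 + cq
    # Analytic derivative of knot_point with respect to phi.
    d = (-q * sq * cp - p * r * sp, -q * sq * sp + p * r * cp, -q * cq)
    length = math.sqrt(sum(c * c for c in d))
    t = tuple(c / length for c in d)
    n = (cq * cp, cq * sp, -sq)  # already unit length
    b = (t[1] * n[2] - t[2] * n[1],
         t[2] * n[0] - t[0] * n[2],
         t[0] * n[1] - t[1] * n[0])
    return t, n, b


def create_rings(pathsteps, shapevertices, radius, p, q):
    """Return pathsteps * shapevertices vertex positions, ring by ring."""
    vertices = []
    for i in range(pathsteps):
        phi = 2.0 * math.pi * i / pathsteps
        center = knot_point(phi, p, q)
        _, n, b = knot_frame(phi, p, q)
        for k in range(shapevertices):
            theta = 2.0 * math.pi * k / shapevertices
            c, s = math.cos(theta), math.sin(theta)
            vertices.append(tuple(center[j] + radius * (c * n[j] + s * b[j])
                                  for j in range(3)))
    return vertices
```

For example, `create_rings(16, 8, 0.15, 6, 1)` yields 128 positions, and every vertex of a ring lies at distance 0.15 from its ring's centre on the curve.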

The other files aren't really interesting and are only included so you can build the app.

---------------------------------------------------------------------------

The included solution was created with VC# Express 2008. In case you cannot load it:

- Create a new project (console application).
- Add the *.cs and *.jpg files from the compressed archive. Set the properties of logo-dark.jpg so that it is copied to the output directory when building.
- Optionally add OpenTK.dll.
- Add System, System.Drawing and OpenTK as references.

Attachment: PQTorusKnots Source Code (OpenTK 0.9.0), 37.83 KB

Comments

Inertia

Radeon 3870: Cache Size 14
Edit: DXtool reports a cache size of 0 (wtf?)

Profiling Log for ATI Radeon HD 3800 Series (2.1.7278 Release)
 
Window Size 512 x 512
Max. allowed Verts: 100000 Radius: 0,15 P: 6 Q: 1
 
3 Verts per Ring. 0,410ms averaged per draw.
4 Verts per Ring. 0,382ms averaged per draw.
5 Verts per Ring. 0,367ms averaged per draw.
6 Verts per Ring. 0,354ms averaged per draw.
7 Verts per Ring. 0,347ms averaged per draw.
8 Verts per Ring. 0,666ms averaged per draw.
9 Verts per Ring. 0,619ms averaged per draw.
10 Verts per Ring. 0,657ms averaged per draw.
11 Verts per Ring. 0,619ms averaged per draw.
12 Verts per Ring. 0,651ms averaged per draw.
13 Verts per Ring. 0,619ms averaged per draw.
14 Verts per Ring. 0,646ms averaged per draw.
15 Verts per Ring. 0,619ms averaged per draw.
16 Verts per Ring. 0,643ms averaged per draw.
17 Verts per Ring. 0,619ms averaged per draw.
18 Verts per Ring. 0,640ms averaged per draw.
19 Verts per Ring. 0,619ms averaged per draw.
20 Verts per Ring. 0,638ms averaged per draw.
21 Verts per Ring. 0,619ms averaged per draw.
22 Verts per Ring. 0,636ms averaged per draw.
23 Verts per Ring. 0,619ms averaged per draw.
24 Verts per Ring. 0,635ms averaged per draw.
25 Verts per Ring. 0,619ms averaged per draw.
 
-----------------------------------------------------------------------------
 
Profiling Log for GeForce 8600 GTS/PCI/SSE2 (2.1.2)
 
Window Size 512 x 512
Max. allowed Verts: 100000 Radius: 0,15 P: 6 Q: 1
 
3 Verts per Ring. 0,880ms averaged per draw.
4 Verts per Ring. 0,787ms averaged per draw.
5 Verts per Ring. 0,766ms averaged per draw.
6 Verts per Ring. 0,755ms averaged per draw.
7 Verts per Ring. 0,730ms averaged per draw.
8 Verts per Ring. 0,705ms averaged per draw.
9 Verts per Ring. 0,739ms averaged per draw.
10 Verts per Ring. 0,722ms averaged per draw.
11 Verts per Ring. 0,800ms averaged per draw.
12 Verts per Ring. 0,704ms averaged per draw.
13 Verts per Ring. 0,713ms averaged per draw.
14 Verts per Ring. 0,759ms averaged per draw.
15 Verts per Ring. 0,696ms averaged per draw.
16 Verts per Ring. 0,689ms averaged per draw.
17 Verts per Ring. 0,719ms averaged per draw.
18 Verts per Ring. 0,692ms averaged per draw.
19 Verts per Ring. 0,697ms averaged per draw.
20 Verts per Ring. 0,696ms averaged per draw.
21 Verts per Ring. 0,690ms averaged per draw.
22 Verts per Ring. 0,675ms averaged per draw.
23 Verts per Ring. 0,669ms averaged per draw.
24 Verts per Ring. 0,666ms averaged per draw.
25 Verts per Ring. 0,688ms averaged per draw.
lubos
Profiling Log for GeForce Go 7600/PCI/SSE2/3DNOW! (2.0.1)
 
Window Size 512 x 512
Max. allowed Verts: 100000 Radius: 0,15 P: 6 Q: 1
 
3 Verts per Ring. 16,292ms averaged per draw.
4 Verts per Ring. 15,455ms averaged per draw.
5 Verts per Ring. 13,625ms averaged per draw.
6 Verts per Ring. 15,465ms averaged per draw.
7 Verts per Ring. 16,302ms averaged per draw.
8 Verts per Ring. 15,392ms averaged per draw.
9 Verts per Ring. 16,109ms averaged per draw.
10 Verts per Ring. 15,453ms averaged per draw.
11 Verts per Ring. 15,468ms averaged per draw.
12 Verts per Ring. 15,418ms averaged per draw.
13 Verts per Ring. 15,420ms averaged per draw.
14 Verts per Ring. 16,005ms averaged per draw.
15 Verts per Ring. 15,114ms averaged per draw.
16 Verts per Ring. 14,231ms averaged per draw.
17 Verts per Ring. 16,006ms averaged per draw.
18 Verts per Ring. 15,147ms averaged per draw.
19 Verts per Ring. 15,312ms averaged per draw.
20 Verts per Ring. 15,171ms averaged per draw.
21 Verts per Ring. 16,028ms averaged per draw.
22 Verts per Ring. 15,171ms averaged per draw.
23 Verts per Ring. 16,043ms averaged per draw.
24 Verts per Ring. 15,159ms averaged per draw.
25 Verts per Ring. 15,927ms averaged per draw.
Inertia

Thanks for the log. Did the laptop have any power-saving settings enabled, or was vsync being enforced? There's a low at 5 and 16 verts, and IMHO a GeForce 7 should be able to process 100k vertices faster than a GeForce 5 (~15 ms vs. ~6 ms). Kinda suspicious :P
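As a rough sanity check, assuming each profiled draw submits about the 100000 vertices allowed by the default settings, the two timings correspond to these vertex throughputs:

```latex
\frac{10^{5}\ \text{vertices}}{15\ \text{ms}} \approx 6.7 \times 10^{6}\ \text{vertices/s},
\qquad
\frac{10^{5}\ \text{vertices}}{6\ \text{ms}} \approx 1.7 \times 10^{7}\ \text{vertices/s}
```

So the GeForce Go 7600 run was processing vertices at well under half the rate of the older card.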

lubos

Argh, I found the magic slider in the nVidia settings :)

Profiling Log for GeForce Go 7600/PCI/SSE2/3DNOW! (2.0.1)
 
Window Size 512 x 512
Max. allowed Verts: 100000 Radius: 0,15 P: 6 Q: 1
 
3 Verts per Ring. 1,594ms averaged per draw.
4 Verts per Ring. 1,574ms averaged per draw.
5 Verts per Ring. 1,569ms averaged per draw.
6 Verts per Ring. 1,544ms averaged per draw.
7 Verts per Ring. 1,541ms averaged per draw.
8 Verts per Ring. 1,533ms averaged per draw.
9 Verts per Ring. 1,528ms averaged per draw.
10 Verts per Ring. 1,539ms averaged per draw.
11 Verts per Ring. 1,544ms averaged per draw.
12 Verts per Ring. 1,526ms averaged per draw.
13 Verts per Ring. 1,519ms averaged per draw.
14 Verts per Ring. 1,626ms averaged per draw.
15 Verts per Ring. 1,617ms averaged per draw.
16 Verts per Ring. 1,668ms averaged per draw.
17 Verts per Ring. 1,736ms averaged per draw.
18 Verts per Ring. 1,698ms averaged per draw.
19 Verts per Ring. 1,624ms averaged per draw.
20 Verts per Ring. 1,621ms averaged per draw.
21 Verts per Ring. 1,628ms averaged per draw.
22 Verts per Ring. 1,612ms averaged per draw.
23 Verts per Ring. 1,577ms averaged per draw.
24 Verts per Ring. 1,558ms averaged per draw.
25 Verts per Ring. 1,624ms averaged per draw.
Inertia

Thank you :) Confusing results, though: there's a low at 13 verts, but it's more likely that no vertex cache strategy is used at all (cache hits should roughly halve the measured draw time). If you have a spare minute, would you please try the DX tool (linked in the first post) and see whether it can detect a cache size? It would be good to have a second opinion ;)

lubos

The program detected a cache size of 37.

Darian

I modified it for 1.9.1 (just renaming the namespace) and commented out the OpenGL version detection, which returns false on the Intel GMA950 (specified as OpenGL 1.4 plus the ARB_vertex_buffer_object and EXT_shadow_funcs extensions and TexEnv shader caching).

The application now runs and the object rotation seems smooth, yet the results look horrible.
(I'm a bit bewildered that the timing results use either a decimal point or a comma, which is usually used to separate thousands. I also wonder how precise my results are.)

Is there any modification that would make a repeated benchmark yield better results?
The two major things that might be affecting it are X and compiz (which might have been better left off).

It seems weird that the best timing is for the largest number of vertices.

- Darian.

==================

Profiling Log for Mesa DRI Intel(R) 945G 20061017 x86/MMX/SSE2 (1.3 Mesa 7.0.3-rc2)

Window Size 512 x 512
Max. allowed Verts: 100000 Radius: 0.15 P: 6 Q: 1

3 Verts per Ring. 122.782ms averaged per draw.
4 Verts per Ring. 115.521ms averaged per draw.
5 Verts per Ring. 111.450ms averaged per draw.
6 Verts per Ring. 109.288ms averaged per draw.
7 Verts per Ring. 106.695ms averaged per draw.
8 Verts per Ring. 105.226ms averaged per draw.
9 Verts per Ring. 108.604ms averaged per draw.
10 Verts per Ring. 106.859ms averaged per draw.
11 Verts per Ring. 111.982ms averaged per draw.
12 Verts per Ring. 110.323ms averaged per draw.
13 Verts per Ring. 109.019ms averaged per draw.
14 Verts per Ring. 108.363ms averaged per draw.
15 Verts per Ring. 107.661ms averaged per draw.
16 Verts per Ring. 109.275ms averaged per draw.
17 Verts per Ring. 123.947ms averaged per draw.
18 Verts per Ring. 120.290ms averaged per draw.
19 Verts per Ring. 121.464ms averaged per draw.
20 Verts per Ring. 120.476ms averaged per draw.
21 Verts per Ring. 119.660ms averaged per draw.
22 Verts per Ring. 120.158ms averaged per draw.
23 Verts per Ring. 120.615ms averaged per draw.
24 Verts per Ring. 113.487ms averaged per draw.
25 Verts per Ring. 85.760ms averaged per draw.

the Fiddler

If I remember correctly, the GMA950 does not have a T&L engine (vertices are processed on the CPU), which might explain the results.

It is likely that compiz plays a role too.

Inertia

Since the application only runs at ~10 fps, it's quite safe to assume that no hardware acceleration is happening at all. The slightly decreasing time per frame can probably be explained by backface culling being able to reject more and more faces.

The major problem with this application is that the timing of the draw call is very fine-grained, so any process running in the background (such as network traffic or input events) noticeably affects the result. That makes it quite hard to write a benchmark that reliably produces useful results.
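One common way to make such measurements less sensitive to background activity (the app itself reports an average per draw) is to record many individual draw timings and look at the median alongside the mean. A minimal Python sketch, in which `draw` stands for a hypothetical blocking draw callable and the run counts are arbitrary assumptions:

```python
import statistics
import time


def summarize(samples_ms):
    """Mean and median of per-draw timings in milliseconds.

    The median is barely affected by a few draws that were slowed down by
    background activity, whereas the mean absorbs every such spike."""
    return statistics.mean(samples_ms), statistics.median(samples_ms)


def time_draws(draw, runs=200, warmup=10):
    """Time a blocking callable `draw` repeatedly and summarize the results.

    `draw` is a placeholder: for OpenGL it would have to issue the draw call
    and then wait for completion (e.g. via glFinish), because draw calls
    return before the GPU has finished the work."""
    for _ in range(warmup):  # warm-up runs are executed but not recorded
        draw()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        draw()
        samples.append((time.perf_counter() - start) * 1000.0)
    return summarize(samples)
```

With the five samples 1, 1, 1, 1 and 11 ms, `summarize` returns a mean of 3.0 ms but a median of 1.0 ms.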

For my own use, I'm simply optimizing meshes for a vertex cache size of 12 right now, but at some point I plan to write a function that implements a weightless algorithm and compare the results.

I'm not working on this project anymore; it is now pretty much just a demo of how to generate a mesh procedurally and put it into a VBO. Should it be moved to the archives?

Darian

Thank you both, Fiddler and Inertia,

I tried this as an example to test my GMA 950's VBO capabilities and to see whether it yields an error when using unsupported extensions.

I still have no answer regarding this issue. I don't want to go off topic, so I'll keep it short.

I need to display between 2000 and 10000 spheres and am looking for the best way to do it, with an important catch: I need the updated vertex data back in the program's logic when I rotate/translate objects.

I tried this VBO example to get a grasp of a proper method.

- Darian