
PQ Torus Knots
Posted Thursday, 6 December, 2007 - 22:19 by Inertia

I had the idea for this when Fiddler mentioned triangle strips and vertex caches. For procedurally generated geometry, a triangle strip has some useful properties for examining the Vertex Cache's behaviour, which is quite difficult to do with indexed triangle lists.
Because of the attributes used when building the Torusknot, it can be used in profiling to help identify the size of the graphics card's Vertex Cache. To my knowledge there is no GL variable one can query to retrieve the cache size (however, a DX tool exists: http://www.clootie.ru/delphi/dxtools.html ). This functionality could be handy for a setup/config tool for a main application that examines the graphics card (and maybe converts all meshes to the ideal cache layout for the current client).
Possible Vertex Cache sizes from 8 to 50 are examined, and the results are stored in the Logs/ folder (which the app will create if it is missing). The app changes nothing on the system outside this folder.
This program does nothing but generate a Torusknot from given parameters and offer 3 modes (Interactive, Turntable and Profiling) to put the mesh to some use. It was written from scratch in 2 days, in the process of going through the OpenTK examples and experimenting with some stuff.
After some alpha testing, the profiling now produces rather useful results. However, some graphics cards produce weird results, which could be connected to multiple parallel vertex processing units inserting vertices into the cache.
Measurements like the following are also problematic. The examined graphics card has a cache of 24 entries, so 12 vertices per ring would be ideal with the given tri-strip layout.
11 Vertices per Ring. 3,796ms averaged per draw.
12 Vertices per Ring. 3,821ms averaged per draw.
13 Vertices per Ring. 6,515ms averaged per draw.
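To make the reasoning behind these numbers concrete, here is a minimal Python sketch (not taken from the app's source) of how a cache size could be read off such a log. The function name `estimate_cache_size` and the 1.2 threshold are made up for illustration, and the "cache = 2 x ring size" rule is simply the assumption stated above for this tri-strip layout.

```python
def estimate_cache_size(timings_ms, jump_ratio=1.2):
    """Guess the vertex cache size from per-ring-size draw timings.

    timings_ms maps "vertices per ring" to the averaged ms per draw.
    jump_ratio is a hypothetical threshold for "noticeably slower".
    Returns None when no such slowdown is found.
    """
    sizes = sorted(timings_ms)
    for prev, cur in zip(sizes, sizes[1:]):
        if timings_ms[cur] > timings_ms[prev] * jump_ratio:
            # Assumption from the post: the strip between two rings is only
            # cheap while both rings fit into the cache, so cache = 2 * ring.
            return 2 * prev
    return None


# The three measurements quoted above (decimal comma written as a point).
sample = {11: 3.796, 12: 3.821, 13: 6.515}
estimate = estimate_cache_size(sample)
print(estimate)
```

A single noisy measurement can trigger or hide the jump, so in practice the threshold would have to be tuned against several runs.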
To get a clearer picture of this problem, I need your help. Please run the application and do a profiling run by pressing "P". This will take a few seconds, then a new file is created in the Logs/ folder. If you want, you can run multiple tests with different P/Q or with Texture2D disabled, but a single benchmark with the default settings will be perfectly sufficient. Please attach that text file or copy and paste it into this thread. This will take you less than 2 minutes, if you don't start toying around in interactive mode ;)
Make sure you have OpenTK.dll available to the app.
Thanks!
Edit: As promised, the source code. Only little documentation, most of it is trivial. Use at your own risk! :P
The Torusknot.cs class itself handles the mesh, from generating vertices and triangles up to the VBO. A Torus Knot is specified like this:
Create( uint pathsteps, uint shapevertices, float radius, int p, int q )
where pathsteps is the number of rings in the knot and shapevertices is the number of vertices per ring.
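For readers who want to see what these parameters mean geometrically, here is a rough Python sketch. It uses the textbook (p,q) torus knot parametrization, which is not necessarily what Torusknot.cs does. The names `torus_knot_path`, `ring_strip_indices`, `major_radius` and `minor_radius` are made up for this example and do not correspond to the `radius` parameter of Create().

```python
import math


def torus_knot_path(pathsteps, p, q, major_radius=2.0, minor_radius=1.0):
    """Centre points of the rings along a (p,q) torus knot.

    Textbook curve on a torus: p turns around the torus axis and q turns
    around the tube (conventions differ between sources). p and q should be
    coprime to get a single closed knot.
    """
    points = []
    for i in range(pathsteps):
        t = 2.0 * math.pi * i / pathsteps
        r = major_radius + minor_radius * math.cos(q * t)
        points.append((r * math.cos(p * t),
                       r * math.sin(p * t),
                       minor_radius * math.sin(q * t)))
    return points


def ring_strip_indices(pathsteps, shapevertices):
    """One triangle strip per pair of neighbouring rings (closed loop).

    Vertex j of ring i is assumed to be stored at index i * shapevertices + j.
    """
    strips = []
    for i in range(pathsteps):
        nxt = (i + 1) % pathsteps
        strip = []
        for j in range(shapevertices + 1):  # +1 closes the ring
            k = j % shapevertices
            strip.append(i * shapevertices + k)
            strip.append(nxt * shapevertices + k)
        strips.append(strip)
    return strips


path = torus_knot_path(pathsteps=64, p=2, q=3)
strips = ring_strip_indices(pathsteps=64, shapevertices=12)
```

With this layout every strip touches exactly two rings, which is why the number of vertices per ring is the knob that stresses the vertex cache.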
The other files aren't really interesting and only included so you can build the app.
---------------------------------------------------------------------------
The included solution was created with VC# Express 2008. In case you cannot load it:
- Create a new project, console application.
- Add *.cs *.jpg from the compressed archive. Set the properties of logo-dark.jpg so that it'll be copied when building.
- Optionally add OpenTK.dll.
- Add System, System.Drawing and OpenTK as references.
| Attachment | Size |
|---|---|
| PQTorusKnots Source Code (OpenTK 0.9.0) | 37.83 KB |


Comments
Tested on Vista x64 with the following results:
I'm not sure what to make of the results. They look rather random, and further testing didn't show anything different. Any ideas?
I'd love to take a look at the source. Also, would you mind if I took some screenshots and used them as a favicon for this site?
You resized the window to fullscreen, which made the fillrate a limiting factor too. This is simply one light with fixed-function Gouraud shading and texture mapping. Unless I'm timing the GL.Finish() wrong, the time should reflect exactly how long it took to GL.DrawElements() the VBO. Overdraw and culling affect the result as well, which is why the model isn't moving during profiling.
According to these results your vertex cache would be estimated as 12, which is unlikely to be true ;)
Use them in any way you like to, but be careful that you don't attract Sceners or there will be spikeballs all over the place ><
My test results:
There seems to be a noticeable increase right at 7 vertices. Does this mean that my vertex cache is 6 vertices?
ps. first post in forums.
Welcome Stevo,
and thank you for posting the result. Like Fiddler's ATi card, your effective Vertex Cache would be 12 too.
This could mean that your true Vertex Cache size is 32, but 20 vertex units are inserting new vertices into the cache in parallel, which decreases the effective size. This is not a bad thing: especially when rendering objects that share very few or no vertices (e.g. particle systems), your graphics card will probably outperform any other card that relies on the Vertex Cache.
I've looked this up, and it seems like ATi cards are using the same memory for L1 Texture Cache and Vertex Cache. Would you please make another profile run with Texture2D disabled? (Hotkey Q). Also make sure the driver does not enforce Anti-Alias/Anisotropy/Tru-form. Especially the last could be responsible for these low values (it inserts new vertices that aren't considered by the profiling).
Thanks!
Welcome Stevo14 :)
I reran the tests with the default window and, sure enough, the results became a little clearer.
With textures:
Without:
Disabling textures doesn't seem to affect the effective size of the cache. I'll run the test on a couple of nv40 and g70 cards, to have something to compare against.
Thank you, this clarifies at least the connection between vertex and texture cache. It seems like your card's effective cache is really 12: drawing a ring with 7 vertices takes 120% of the time compared to 6 verts.
If you take a look at the first benchmark, the result would be a Vertex Cache of 10 though, while the second results in 12. I'm rather sure these discrepancies are related to the OS performing actions in the background while the app is running. A Diagnostics.Stopwatch has a fine enough resolution to be affected by this.
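One common way to make such timings less sensitive to background activity is to repeat the measurement and take the median instead of the mean. The Python sketch below only illustrates that idea. It is not what the profiler in the attached source does, and `time_draws`, `robust_ms` and the `draw` callable are hypothetical names.

```python
import statistics
import time


def time_draws(draw, repeats=50):
    """Call draw() repeatedly and return the individual timings in ms.

    draw is a placeholder for "issue the draw call and wait until the GPU
    has finished", whatever that looks like in the real application.
    """
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        draw()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def robust_ms(samples):
    """Median of the samples: a single slow outlier barely moves it."""
    return statistics.median(samples)


# One run disturbed by a background task (12.0 ms) does not shift the result.
print(robust_ms([3.8, 3.8, 3.9, 12.0, 3.8]))
```

The mean of the same five samples would be pulled well above the typical draw time by the single outlier, which is the kind of discrepancy described above.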
What I had in mind as a backup solution was binding an "expensive" vertex shader to draw the knot. It would perform useless calculations that the compiler doesn't optimize away, which should increase the cost of processing a vertex a lot. The ms/draw should then rise more sharply when no vertex cache hits are made.
Looks like about the same result the second time with textures off:
I find it curious that it was slower this time with the textures off. Of course, it could just be the fact that something was running in the background slowing things down.
Of course other processes affect the results, although the thread priority is already set to highest. The texture is trilinear filtered and might have caused the Vertex Cache to make room for the L1 Texture Cache. I just wanted to verify that's not the case.
Here's one benchmark clearly indicating a Vertex Cache size of 24, at 12 vertices per ring (all vertices from the previous ring are free).
Edit: Source Code added.
Thanks for the source. I will be running tests on a couple of other systems to see what comes up.
Ah, one small thing: you can register for KeyDown and KeyUp events in the GameWindow.Keyboard class, which can simplify the keyboard handling logic (if I understand how the KeyStrokeManager works). Documentation...
I had a couple of profiling runs on other people's laptops, but a Vertex Cache of 24 was the highest result so far. One Intel chipset only cached the last 8 vertices; I think that's the absolute minimum an OpenGL driver must provide? I'm also getting the suspicion that there may be no standard for whether the cache must use FIFO or LRU logic to decide which entry gets discarded.
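To get a feeling for how much the replacement policy could matter, one can simulate both policies on the index stream of a ring-by-ring tri-strip. The Python sketch below is a simplified software model only. Real hardware may behave differently, and `count_misses` and `tube_strip_indices` are names invented for this example.

```python
from collections import OrderedDict


def count_misses(indices, capacity, policy="fifo"):
    """Count cache misses for an index stream under FIFO or LRU eviction."""
    cache = OrderedDict()
    misses = 0
    for idx in indices:
        if idx in cache:
            if policy == "lru":
                cache.move_to_end(idx)  # a hit refreshes the entry under LRU
            continue                    # a hit changes nothing under FIFO
        misses += 1
        cache[idx] = True
        if len(cache) > capacity:
            cache.popitem(last=False)   # evict the oldest entry
    return misses


def tube_strip_indices(rings, verts_per_ring):
    """Index stream of the strips joining consecutive rings of an open tube."""
    out = []
    for i in range(rings - 1):
        for j in range(verts_per_ring + 1):  # +1 closes each ring
            k = j % verts_per_ring
            out.append(i * verts_per_ring + k)
            out.append((i + 1) * verts_per_ring + k)
    return out


# Sweep the ring size for a modelled cache of 24 entries, both policies.
for verts in range(8, 30):
    stream = tube_strip_indices(rings=50, verts_per_ring=verts)
    fifo = count_misses(stream, 24, "fifo")
    lru = count_misses(stream, 24, "lru")
    print(verts, fifo, lru)
```

In this model the ring size at which misses start to climb may differ between the two policies, so the same timing log could point to different cache sizes depending on which policy the hardware uses.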
Well, I just had trouble with the repeating and wanted to get this done quickly. For a game this kind of behaviour from the input class is great, I didn't really look into the events as this was just a Quickstart template. I just built this app to get my mind off porting the MS3D Loader to OpenTK.Math, and kinda proving that the mathlib isn't the Problem factor ;)