The hunt for HV's FIFO/Push buffer...
my plans.
I would like to wait a few days. I will probably describe 3D initialization on the wiki.
After that (and after getting stable svn hosting) I want to port some parts of mesa-nouveau (fragment shader compiler, state setup, textures, buffers). I want to write some kind of small GL library. I do not want to write memory and resource managers; this library will work in exclusive mode. I want the fragment shader compiler and the DXT texture compressor to be standalone utilities, not the core of the library.
black rendering issue
I was told that rendering is only black on PS3 (nouveau has the same issue on PPC, so it is probably endianness).
I will try to fix that. It is a showstopper bug, of course.
It is a very funny bug; I have tamed it a bit.
Using this shader (gray output):
static nv_pshader_t nv30_fp = {
.num_regs = 2,
.size = (2*4),
.data = {
/* MOV R0, ( 0.5f, 0.5f, 0.5f, 0.5f ) */
0x01403e81, 0x1c9dc802, 0x0001c800, 0x3fe1c800,
0x3f000000, 0x3f000000, 0x3f000000, 0x3f000000,
}
};
With the endianness of the 3D class set to 0x0, I was able to get non-black (gray, as expected) rendering. Check the svn repo.
The problem is that RGBA shows up visually as ABGR on the screen (there is probably some workaround for that). Very funny bug.
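One possible reading of the RGBA/ABGR symptom, offered as a guess rather than a diagnosis: if a 32-bit colour word is byte-swapped somewhere between the big-endian PPC and the GPU, the four channels come out in reverse order. A minimal C sketch; the helper names `swap32` and `pack_rgba` and the R-in-top-byte layout are made up for illustration and are not taken from the svn repo:

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the byte order of a 32-bit word (big endian <-> little endian). */
static uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000ffu) << 24) |
           ((v & 0x0000ff00u) <<  8) |
           ((v & 0x00ff0000u) >>  8) |
           ((v & 0xff000000u) >> 24);
}

/* Pack four 8-bit channels with R in the top byte (hypothetical layout). */
static uint32_t pack_rgba(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
    return ((uint32_t)r << 24) | ((uint32_t)g << 16) |
           ((uint32_t)b <<  8) |  (uint32_t)a;
}
```

Under these assumptions, `swap32(pack_rgba(r, g, b, a))` equals `pack_rgba(a, b, g, r)`. A gray pixel with four equal channels looks identical either way, so a gray test cannot reveal the channel order. Separately, `swap32(0x3f000000)` (the 0.5f constant from the shader) gives 0x0000003f, which read as an IEEE-754 float is a denormal very close to zero. That might be related to the black output, but the thread does not confirm it.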
IronPeter wrote:The problem is RGBA is converting visually as ABGR on the screen ( probably there is some workaround with that ). Very funny bug.
From reading on the web it seems that most nVidia cards can be driven as either little or big endian. A random dump I found by googling for endianness on nVidia cards seems to suggest that endianness can be specified on a per-object basis for any object that has ENGINE=GRAPHICS in the context: http://people.freedesktop.org/~kmeyer/r ... put.txt.gz
Perhaps the context the hypervisor has created is little endian for some reason?
endian issue resolved
Check the svn repo and the wiki page.
I made some grammar edits to the wiki. There were some pieces that were unclear to me however, and could use some clarification:
IronPeter, did you create a 3d class from scratch? This is what I understand from this thread and the source code. However the way it is worded on the wiki leads me to believe you found an existing 3d object, or modified an existing object to be a 3d object. Which statement is correct?
Also, the wiki says the Hypervisor makes objects in RAMIN. Is it possible to make objects anywhere other than RAMIN, or is this just a convention of the Hypervisor? I admit ignorance here as to how NVIDIA cards work.
Finally, in the FIFO workaround section where it says "So the hack consists in either patching the last operation with a NOP, or changing the FIFO write pointer to stop earlier." -- shouldn't it be changed to "changing the FIFO write pointer to stop later" since you'll (presumably) be adding commands to the end of the FIFO?
Thanks for all the hard work! I hope to contribute soon as well (once I get ps3toolchain to compile under cygwin.. grrr..)
>IronPeter, did you create a 3d class from scratch
Yes, I created the 3D class instance from scratch. There is no 3D class instance registered by the HV in RAMHT. There is probably some HV call that does that, but we did not find it.
> Is it possible to make objects anywhere other than RAMIN, or is this just a convention of the Hypervisor?
RAMIN is shorthand for "the place where graphics objects are stored". This memory has a strict format. The format seems to be independent of the environment (HV on PS3 or video driver on PC).
cypherpunks wrote:Finally, in the FIFO workaround section where it says "So the hack consists in either patching the last operation with a NOP, or changing the FIFO write pointer to stop earlier." -- shouldn't it be changed to "changing the FIFO write pointer to stop later" since you'll (presumably) be adding commands to the end of the FIFO?
No, it is correct as is.
Think of it as:
rptr: opA
opB
opC
x: END //stop processing, wait for hypervisor to restart GPU
eptr:
So, now that you understand that: the hack is to prevent execution of the END instruction, because restarting the GPU is a privileged hypervisor operation. Either of the two techniques (setting wptr to x, or replacing the END with a NOP) has the same effect: the GPU never executes the END instruction, and so it keeps waiting for wptr to change again before it reads more instructions. Because the list of instructions never ends, the GPU is always waiting for us and we never need the hypervisor to kick-start the process again.
To add our own instructions to the command queue, we build a block of instructions and write it at the next available position in the queue. Then we update wptr to point just past the last instruction. The GPU notices that it can continue processing and executes up to the last one.
The next question is what happens when we get to the end of this buffer. After all, it's only 64k long. The answer is that the GPU has a JMP/branch instruction, just like a CPU. So, when we're close to running out of space, we jump back to the start of the FIFO buffer and repeat the process. There are GPU pre-fetch bugs that mean you need to target the jump into a block of NOPs, but this isn't a major problem.
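The submit-and-wrap scheme described above can be sketched as a small simulation in C. Everything here is an assumption made for illustration: the `fifo_t` type, the size of the NOP pad, and the `CMD_NOP` / `CMD_JUMP` encodings are placeholders, and on real hardware wptr is a GPU register rather than a struct field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FIFO_WORDS   (64 * 1024 / 4)  /* 64k buffer as 32-bit words */
#define LANDING_NOPS 8                /* NOP pad at the start; size is a guess */
#define CMD_NOP      0x00000000u      /* placeholder encoding */
#define CMD_JUMP     0x20000000u      /* placeholder encoding: jump | offset */

typedef struct {
    uint32_t buf[FIFO_WORDS];  /* stands in for the mapped FIFO memory */
    uint32_t wptr;             /* word index; stands in for the write pointer */
} fifo_t;

/* Fill the buffer with NOPs and start writing after the landing pad. */
static void fifo_init(fifo_t *f)
{
    for (size_t i = 0; i < FIFO_WORDS; i++)
        f->buf[i] = CMD_NOP;
    f->wptr = LANDING_NOPS;
}

/* Append n command words, then publish them by moving wptr past the last one. */
static void fifo_submit(fifo_t *f, const uint32_t *cmds, uint32_t n)
{
    /* The block must fit after the pad, leaving one word free for a jump. */
    assert(n + 1 + LANDING_NOPS < FIFO_WORDS);

    if (f->wptr + n + 1 > FIFO_WORDS) {
        /* Not enough room: jump back to offset 0, into the NOP pad. */
        f->buf[f->wptr] = CMD_JUMP | 0u;
        f->wptr = LANDING_NOPS;
    }
    for (uint32_t i = 0; i < n; i++)
        f->buf[f->wptr + i] = cmds[i];
    f->wptr += n;  /* the GPU would now run up to, but not past, wptr */
}
```

One word is always kept free so the jump itself fits. A real implementation would also have to check the GPU read pointer before overwriting the start of the buffer; this sketch leaves that out.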
Hope that helps you make sense of this!
There is, technically, a way to branch to sub fifo buffers (of any size).
But that requires knowing what services the interrupt handler associated with the GPU interrupt provides. Once again, I will describe what happens with the nv2A on xbox 1.
In the main fifo sequence you just put a fire-interrupt command with a data code that says "please remember this address+n (return address)".
Then you put a jump command after that, to the sub fifo buffer (usually already filled with pre-calculated, insanely long commands to achieve top speed). At the end of the sub fifo buffer (let's consider it a kind of subprogram or procedure), you put a fire-interrupt command with a data code that says "return", followed by a jump (whose address will be updated). So the interrupt handler has 2 data codes: one to remember the return address, and one to set the jump address to the remembered return address (in order to effectively do the "return").
Plenty of magnificent tricks (like the "fences" mechanism, a kind of sync between CPU and GPU for many specific purposes) can be done once you know (or better, can edit) the interrupt handler. Not the case on PS3 of course... But maybe someday someone will be able to just "read" the HV code and tell us what services the current interrupt handler already offers. Needless to say, it's also very useful to be able to understand the error reports made by the GPU (reported through the same interrupt handler) when you write a wrong sequence of commands somewhere in the fifo buffer (better than a black screen saying nothing).
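The two-data-code call/return idea above can be modelled in a few lines of C. This is only a sketch of the description in this post: the data codes, the `irq_state_t` type, the `CMD_JUMP` encoding, and the choice of n = 2 (one interrupt word plus one jump word) are all invented for illustration, and nothing here is known to match the real xbox or PS3 handler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CMD_JUMP 0x20000000u  /* placeholder encoding: jump | address */

enum { IRQ_REMEMBER = 1, IRQ_RETURN = 2 };  /* made-up data codes */

typedef struct {
    uint32_t saved_return;  /* return address remembered by the handler */
} irq_state_t;

/*
 * Model of the handler. irq_at is the word index of the fire-interrupt
 * command that raised the interrupt; the word right after it is a jump.
 */
static void irq_handler(irq_state_t *s, uint32_t *fifo,
                        uint32_t irq_at, uint32_t code)
{
    assert(s != NULL && fifo != NULL);

    if (code == IRQ_REMEMBER) {
        /* Return to the word after the interrupt and the jump (n = 2). */
        s->saved_return = irq_at + 2;
    } else if (code == IRQ_RETURN) {
        /* Patch the jump that follows so it goes back to the caller. */
        fifo[irq_at + 1] = CMD_JUMP | s->saved_return;
    }
}
```

With a single saved address, sub buffers cannot nest; a stack of return addresses would be the obvious extension.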
small update
A textured triangle is in the repo. Just 10 minutes of work.
another small feature
Depth buffer setup. Just for fun, this buffer is mapped into the visible screen area.
ps2devman, thanks a lot.
>Shaders running on PS3!!!
Also there are textures and a working depth test :). I've updated the demo; it now shows 3 Z-overlapping triangles.
So at this moment we have
1.) working shaders, both pixel and vertex
2.) working textures
3.) working Z test.
We need
1.) Render states like blend, different Z modes, alpha test. - easy
2.) Index and vertex buffers. - a bit harder.
3.) Texture support with different formats, mips, swizzling. - harder
4.) Some shader compiler ( microcode is very hard to maintain ). - hard.
I want to set up the ps2rsx project soon. Probably this weekend.
I will try to give a hand with 4).
The idea is to do something similar to the function pcode2mcode in pbkit.
It translates standard shaders written in pseudo code into native code.
That way the public compiler Cg.exe could be used: you just include the binary (pseudo code) result in your code, and pcode2mcode does all the translation work automatically.
Dunno if Cg.exe is good enough for the level of shader models we need.
I also need to look again at all the files of the Nouveau project... They surely ran into this trouble already and may have found the best solution.
I would like to talk about performance (since we are talking about shaders).
There are persistent rumours that the 360 Xenos is faster than RSX. Of course, we are now getting closer to the answer, since we will be able to count the number of vertices per frame we can enqueue (once the vertex buffer mechanism works).
Thanks to tmbinc, we could see that, currently, homebrew on 360 can expect at least 3.900.000 v/f at 60 fps with a minimal shader (no lighting, just simple texture projection) and 3.100.000 v/f at 60 fps with gouraud lighting (1 source). The same kind of performance loss has been seen with other gpu's, even if they are slower (xb1 -330.000-, ps2 -250.000-).
I think the goal would be to have higher performance on PS3, since the machine costs more. If RSX alone is slower, then we have to use the SPU's to get a smart solution. Actually the key, I think, is the shaders. The more sophisticated they are, the more performance we lose.
So, since we are thinking about compiling shaders, here is a paradox:
Maybe we shouldn't spend too much time working on sophisticated shaders. What we may try is to have the SPU's do the preliminary calculation work and feed the data to the shaders, in order to have minimal, fastest-possible shaders running on the RSX... It's certainly an unusual strategy (but it reminds me of vu1 working for the GS on ps2).
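To put the quoted 360 figures side by side, here is the arithmetic as a short C sketch. It reads "v/f" as vertices per frame, which is an interpretation and is not spelled out in the thread; the function names are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Vertices per second from a per-frame budget and a frame rate. */
static uint64_t verts_per_second(uint64_t verts_per_frame, uint32_t fps)
{
    return verts_per_frame * fps;
}

/* Throughput lost going from a fast shader to a slower one, in whole percent. */
static uint32_t percent_lost(uint64_t fast, uint64_t slow)
{
    assert(fast > 0 && slow <= fast);
    return (uint32_t)(((fast - slow) * 100u) / fast);
}
```

Under that reading, 3.900.000 per frame at 60 fps is 234.000.000 vertices per second, and dropping to 3.100.000 with one gouraud light is a loss of roughly 20 percent.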
As long as you don't really know the performance characteristics, it's no good to make assumptions already about ways to improve performance. What if the basic performance on PS3 is bad, but it loses less performance on more sophisticated shaders?
I think that even with homebrew going on it will take quite a long time to figure out how the performance really is and why.
There is an official geometry processing tool from Sony called Edge. You can google for the Edge specs ( it is open info ).
This tool does vertex processing on the SPU: skeletal animation, even back face culling. I tried to write some back face culling code. It is pretty fast on a single SPU. One single SPU can supply RSX with geometry. Two SPUs can flood the graphics chip.
RSX has two memory channels - DDR and XDR. XDR memory contains the push buffer and is good for dynamic SPU-generated geometry. DDR memory is for render targets and textures.
The main performance ( just performance, not core functionality ) issue, for me, is the TILE and ZCOMP setup. You can refer to pbkit ( thanks to ps2devman ) or the nouveau project for details.
I do not know a way to set it up from the FIFO interface. pbkit and Nouveau do this setup via mmio regs. I have no idea how we can access the global GPU mmio regs.
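The back face culling mentioned above can be sketched in plain scalar C. This is not Edge's code and not SPU-optimised (a real SPU version would use SIMD); the `vec2_t` layout and the convention that counter-clockwise triangles in a y-up screen space are front-facing are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y; } vec2_t;  /* projected position (hypothetical) */

/* Twice the signed area of a triangle; the sign gives the winding order. */
static float signed_area2(vec2_t a, vec2_t b, vec2_t c)
{
    return (b.x - a.x) * (c.y - a.y) - (c.x - a.x) * (b.y - a.y);
}

/*
 * Copy only the front-facing triangles from in to out.
 * in and out hold 3 vertices per triangle; returns the number kept.
 */
static size_t cull_backfaces(const vec2_t *in, size_t tri_count, vec2_t *out)
{
    size_t kept = 0;

    assert(in != NULL && out != NULL);
    for (size_t t = 0; t < tri_count; t++) {
        const vec2_t *tri = &in[3 * t];

        if (signed_area2(tri[0], tri[1], tri[2]) > 0.0f) {
            out[3 * kept + 0] = tri[0];
            out[3 * kept + 1] = tri[1];
            out[3 * kept + 2] = tri[2];
            kept++;
        }
    }
    return kept;
}
```

Triangles with zero area are dropped along with the back faces, since they cover no pixels anyway.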
Don't worry, we will find a way, even if it takes months to find it.
If we can get vertex buffers running, that will already be heaven.
I have the feeling the interrupt handler is the key. Since it's used to report gpu errors, I can't believe it doesn't exist in the HV, and I'm pretty sure Sony engineers were too lazy to strip unused/dangerous services from it.
We can try to inspect existing shaders for any unusual command that would be an mmio access request from the fifo. On nv2A (xb1) this is used, for example, to disable/enable the noise flag for compressed textures, right from within the command sequence in the fifo (push buffer).
ps2devman, yes. The TILE and ZCOMP setup is not a critical task for us.
I have working vertex buffers, with some issues.
The problem is the index buffer. I was unable to find any info about index buffers in the nouveau docs. I do not have a PC with an NV40 and linux installed to make a fifo dump of a glDrawElements call :(.
It would be great if somebody could do that dump.
Edit: Vertex buffers work fine, in both XDR and DDR memory; the issues were resolved.
I'm speechless... It's heaven. Thanks a lot IronPeter!
For the index buffer, I don't know nv40 well enough yet, but you can see how it is done on nv20 by looking at pbkit Demo 04. There are constants to define in the source in order to render by index buffer instead of vertex buffer. Also, by looking at the name of the nv20 constant, you may discover the name of the nv40 constant that does the same thing.
However, for me, it's heaven... I plan to have the same homebrew game sources compile for ps3, 360, xb1 and ps2. Since ps2 doesn't support index buffers at all, I planned to use vertex buffers only anyway.
Anyway, I will send an e-mail to the Nouveau project leader, to be sure he knows what point you have reached. He should be able to give us nice info.
(And Nouveau project members often have an nv40 card and a Linux dumper)
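Until index buffers are understood, an indexed mesh can be flattened on the CPU into a plain vertex buffer, which is also what the vertex-buffers-only plan above implies. A minimal sketch in C; the `vertex_t` layout and the function name are hypothetical and do not come from pbkit or the ps3rsx repo:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical vertex layout: position plus one texture coordinate. */
typedef struct { float x, y, z; float u, v; } vertex_t;

/*
 * Expand an indexed triangle list into a flat one.
 * out must have room for index_count vertices; returns the number written.
 */
static size_t expand_indexed(const vertex_t *verts, size_t vert_count,
                             const uint16_t *indices, size_t index_count,
                             vertex_t *out)
{
    for (size_t i = 0; i < index_count; i++) {
        assert(indices[i] < vert_count);  /* reject out-of-range indices */
        out[i] = verts[indices[i]];
    }
    return index_count;
}
```

The cost is memory and bandwidth: a quad stored as 4 vertices and 6 indices becomes 6 full vertices, and shared vertices are transformed once per use.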
SVN repo on ps2dev
I've created the ps3rsx project. Excuse the delay.
For now there is only one project inside this repo - a slightly modified example with 3 triangles: z buffering, textures, vertex and pixel processing, vertex buffers.
I want to have a full-scale 3D library. The src folder is empty for now :).
The project will have an MIT license.
SVN repo:
http://svn.ps2dev.org/listing.php?repna ... rev=0&sc=0
Feel free to commit.
IronPeter wrote:The problem is index buffer. I was unable to find any info about index buffer in the nouveau docs. I have not PC with NV40 and linux installed to make fifo dump with glDrawElements call :(.
It is great if somebody can do that dump.
I have a friend with a GeForce 6800 that I could borrow to complete a FIFO dump. Only, I don't know how :( Let me know if I can help.