The hunt for HV's FIFO/Push buffer...

IronPeter
Posts: 207
Joined: Mon Aug 06, 2007 12:46 am
Contact:

Post by IronPeter »

First, check out this repo:

cvs -z3 -d:pserver:anonymous@nouveau.cvs.sourceforge.net:/cvsroot/nouveau co -P renouveau

Second, set up a VBO for indices and replace glDrawArrays with glDrawElements.

Third, build the dumper and run it.

Dumper homepage and docs: http://nouveau.freedesktop.org/wiki/REnouveau
ps2devman
Posts: 259
Joined: Mon Oct 09, 2006 3:56 pm

Post by ps2devman »

The Nouveau project leader answered my request for help. Here is his reply:
Feel free to explore these dumps:
http://people.freedesktop.org/~kmeyer/renouveau_dumps/
Try to find the "test display_list", which uses index buffers.

(I don't know these dumps, so I can't give you more details.)
IronPeter
Posts: 207
Joined: Mon Aug 06, 2007 12:46 am
Contact:

Post by IronPeter »

OK, these dumps use NV40TCL_VB_ELEMENT_U16 inside the begin/end block. Yes, that is one way to send indexed primitives to the GPU, but embedding indices in your push buffer is a very bad idea. Very bad.

Of course, display lists work in that odd way.

It is better to make dumps from glDrawElements. Index buffers are first-class citizens on NV40-class hardware.
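To make the cost of inline indices concrete, here is a minimal C sketch of what embedding 16-bit indices in a command stream involves. The packing layout (two indices per 32-bit word, first index in the low half) and all function names are assumptions for illustration, to be checked against the dumps; the point is only that every index travels through the push buffer on every draw.

```c
#include <stddef.h>
#include <stdint.h>

/* Pack two 16-bit indices into one 32-bit word.
 * ASSUMPTION: first index in the low half; verify against real dumps. */
uint32_t pack_index_pair(uint16_t lo, uint16_t hi)
{
    return ((uint32_t)hi << 16) | lo;
}

/* Number of 32-bit data words needed to inline index_count indices. */
size_t inline_index_words(size_t index_count)
{
    return (index_count + 1) / 2;
}

/* Write the packed indices to out and return the number of words written.
 * A trailing odd index goes alone into the low half of the last word
 * (again a hypothetical convention). */
size_t pack_u16_indices(const uint16_t *idx, size_t count, uint32_t *out)
{
    size_t words = 0;
    for (size_t i = 0; i + 1 < count; i += 2)
        out[words++] = pack_index_pair(idx[i], idx[i + 1]);
    if (count & 1)
        out[words++] = idx[count - 1];
    return words;
}
```

A mesh with 30000 indices would need 15000 such data words in the push buffer each time it is drawn, whereas an index buffer stays in memory and is only referenced.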
ps2devman
Posts: 259
Joined: Mon Oct 09, 2006 3:56 pm

Post by ps2devman »

I've forwarded your request. Keep faith.
ps2devman
Posts: 259
Joined: Mon Oct 09, 2006 3:56 pm

Post by ps2devman »

Let's dream again, about... TILE...

For those who haven't taken a close look at the pbkit source (plenty of comments there), here is an explanation of the TILE concept:

When you declare a TILE, you declare a memory area. Its most spectacular use is for the depth/stencil buffer. On nv20 you could declare 8 tiles. Each tile has a massive internal GPU cache associated with it.
The depth/stencil buffer is read and written very often when many triangles are drawn at the same screen location. Usually, you HAVE TO clear the depth/stencil buffer at the beginning of each frame (Z to max, stencil to 0), then draw from the closest distance to the farthest, in order to take full advantage of the automatic compression and data caching that the TILE declaration enables.
On xb1, in pbkit Demo 04, one of the controller buttons switches the display to the depth/stencil buffer so you can look at it. Triangles at the same depth (more or less the same distance to the camera) have the same color (color = depth). But if automatic compression is active, you will only see maybe 1 pixel out of every 4 horizontally and vertically, i.e. you will see groups of 4x4 pixels where only the first pixel, in the top left corner of the group, is lit. That is automatic compression: with smart coding, you can get a 1:4, 1:8 or 1:16 compression rate, i.e. the GPU doesn't need to read/write more than 4, 2 or 1 dwords for each group of 4x4 pixels (16 dwords).
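The arithmetic behind those compression rates can be sketched in a few lines of C. This is only a back-of-the-envelope model built from the numbers above (4x4 groups, one dword per pixel); the function names are made up, and it assumes the width and height are multiples of 4.

```c
#include <stdint.h>

enum { TILE_GROUP_DWORDS = 16 }; /* 4x4 pixels, one dword each */

/* Dwords touched per 4x4 group at a compression rate of 1:ratio.
 * ratio is expected to be 1 (uncompressed), 4, 8 or 16. */
uint32_t group_dwords(uint32_t ratio)
{
    return TILE_GROUP_DWORDS / ratio;
}

/* Dwords needed to read or write the whole depth/stencil buffer once.
 * Assumes width and height are multiples of 4. */
uint64_t depth_pass_dwords(uint32_t width, uint32_t height, uint32_t ratio)
{
    uint64_t groups = (uint64_t)(width / 4) * (height / 4);
    return groups * group_dwords(ratio);
}
```

For a 1280x720 buffer, one full pass touches 921600 dwords uncompressed and 57600 dwords at 1:16.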

So... if you manage to keep an eye on the content of the depth/stencil buffer and try to move it around in memory, maybe, with luck, you will see that automatic compression is active. That would mean a previous program (a game?) declared a tile but didn't trash it before quitting...

Ok, another naive dream... But since it has been reported that some traces were left in RAMIN after a game launch in game OS... Maybe...

Anyway that's for a 30% performance gain. Not absolutely necessary.
IronPeter
Posts: 207
Joined: Mon Aug 06, 2007 12:46 am
Contact:

Post by IronPeter »

ps2devman, thanks for your help.

It is better to dig a bit into the hypervisor interfaces for TILE setup.

For example, here is the Nvidia MMIO register database:

http://gitweb.freedesktop.org/?p=mesa/d ... veau_reg.h

Compare with http://wiki.ps2dev.org/ps3:hypervisor:lv1_gpu_attribute :

ret64 = lv1_gpu_attribute(0x100, 0x007, val, 0, 0);

This is the interrupt handler setup. Here 0x100 is definitely an MMIO register index.
IronPeter
Posts: 207
Joined: Mon Aug 06, 2007 12:46 am
Contact:

Post by IronPeter »

It is probably worth experimenting with the parameters of lv1_gpu_memory_allocate:
parameter 0 is just the memory size
parameter 1 is the amount of some resource, up to 0x80000
parameter 2 is the amount of some resource, up to 0x300000
parameter 3 is the amount of some resource, up to 0xf (tiles?)
parameter 4 is the amount of some resource, up to 0x8

These look like ZCOMP and TILE definitions. It would be great if somebody could test these parameters and note any side effects.
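For anyone who wants to sweep those parameters, a small range check helps keep test values inside the observed limits. The sketch below encodes only the upper bounds listed above; the function and constant names are hypothetical, the meaning of each resource is still unknown, and nothing here calls the hypervisor.

```c
#include <stdbool.h>
#include <stdint.h>

/* Observed upper bounds for parameters 1..4 of lv1_gpu_memory_allocate.
 * The names are placeholders: what each resource is remains a guess. */
#define GPU_ALLOC_MAX_P1 0x80000u
#define GPU_ALLOC_MAX_P2 0x300000u
#define GPU_ALLOC_MAX_P3 0xfu   /* tiles? */
#define GPU_ALLOC_MAX_P4 0x8u

/* Return true if all four resource amounts are within the observed limits. */
bool gpu_alloc_params_in_range(uint64_t p1, uint64_t p2,
                               uint64_t p3, uint64_t p4)
{
    return p1 <= GPU_ALLOC_MAX_P1 &&
           p2 <= GPU_ALLOC_MAX_P2 &&
           p3 <= GPU_ALLOC_MAX_P3 &&
           p4 <= GPU_ALLOC_MAX_P4;
}
```

A test loop could step one parameter at a time from 0 up to its limit while holding the others at 0, so that any side effect can be attributed to a single parameter.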
IronPeter
Posts: 207
Joined: Mon Aug 06, 2007 12:46 am
Contact:

Post by IronPeter »

Stephane Marchesin, leader of the Nouveau project, granted me write access to the renouveau repository. I can commit my tests, and people will run the dumper on nv40 and submit dumps. Iterations are not fast, but we have a lot of time.

Peter.
tgnard
Posts: 2
Joined: Tue Nov 06, 2007 9:15 am

FIFO workaround with firmware 2.0.0

Post by tgnard »

I just wanted to confirm that the FIFO workaround (and Xv acceleration) is still valid with firmware 2.0.0
IronPeter
Posts: 207
Joined: Mon Aug 06, 2007 12:46 am
Contact:

Post by IronPeter »

This workaround is hard to block. The only way would be to unmap the FIFO buffer from kernel memory.

Are upper memory access rights fixed?
dom
Posts: 29
Joined: Tue Oct 05, 2004 7:20 pm

Post by dom »

Hello,

I updated to firmware 2.0 today.
I tried the svn libps3rsx example; for a few seconds it shows a kind of blue textured triangle (with a hole) on an orange screen.
I guess this is the right behavior.
@+
dom
IronPeter
Posts: 207
Joined: Mon Aug 06, 2007 12:46 am
Contact:

Post by IronPeter »

It is not one triangle with a hole; there are 3 overlapping triangles in the demo :).

Good news.
dom
Posts: 29
Joined: Tue Oct 05, 2004 7:20 pm

Post by dom »

IronPeter wrote:It is not one triangle with hole, there are 3 overlapped triangles in the demo :).

Good news.
Yes indeed,
I was far away from my screen and my binoculars are dirty and old (my eyes too by the way) ;-)
@+
dom
IronPeter
Posts: 207
Joined: Mon Aug 06, 2007 12:46 am
Contact:

ps3rsx task list

Post by IronPeter »

OK, I want development to be public. There are many tasks to do. I want to divide the work into small parts that are easy and fun to do.

The first task is DXT texture support. DXT compression can be handled by an open source library like http://www.sjbrown.co.uk/?code=squish

I committed the file textures.h with a simple interface. Anybody is welcome to implement this interface. The implementation (with your copyright) will be placed in the repository. After that you will be granted write access to the repo.
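One piece any DXT implementation needs is the size of a compressed image. DXT formats store each 4x4 pixel block in a fixed number of bytes: 8 for DXT1 and 16 for DXT3/DXT5. The helper below is a minimal sketch with made-up names; it is not taken from textures.h.

```c
#include <stddef.h>

enum {
    DXT1_BLOCK_BYTES = 8,   /* 4x4 pixels in 8 bytes  */
    DXT5_BLOCK_BYTES = 16   /* 4x4 pixels in 16 bytes (same for DXT3) */
};

/* Size in bytes of one DXT-compressed image (or mipmap level).
 * Dimensions are rounded up to whole 4x4 blocks. */
size_t dxt_level_size(unsigned width, unsigned height, size_t block_bytes)
{
    size_t blocks_wide = (width + 3) / 4;
    size_t blocks_high = (height + 3) / 4;
    return blocks_wide * blocks_high * block_bytes;
}
```

A 256x256 texture takes 32768 bytes as DXT1 and 65536 bytes as DXT5, compared with 262144 bytes as uncompressed 32-bit RGBA.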

If you want to contribute - email me. Feel free.

Peter.
ArtVandelae
Posts: 3
Joined: Thu Nov 08, 2007 11:39 pm

Post by ArtVandelae »

If I may ask, what exactly is the roadmap for this project? Is the plan to turn the library that is currently being developed into a full-featured 3D library by itself, or is the goal to make a basic RSX interface framework library that can be used to write a driver for something like Mesa?
IronPeter
Posts: 207
Joined: Mon Aug 06, 2007 12:46 am
Contact:

Post by IronPeter »

The first milestone is a working low-level API. This API will work in exclusive fullscreen mode, but it will be full-featured and will run in user mode.

With textures, buffers and sync with RSX, it will take ~1 month of development.

The shader compiler will also take ~1 month.

It is possible to build GL-like interfaces on top of this low-level, console-style library.

Porting MesaGL is more complicated. The main problem is resource management: many months to develop and debug, and many more months to support the old-style T&L pipeline. We can discuss a Mesa port only after the first milestone.
majic12
Posts: 1
Joined: Sun Nov 11, 2007 5:57 am

Post by majic12 »

IronPeter, you are the king :)

I'd like to help: I can write a shader compiler if you want.
Where can I find information on the shader instructions and opcodes?
RobertW
Posts: 3
Joined: Sun Nov 11, 2007 6:58 am

Post by RobertW »

@IronPeter

About Mesa, you might want to contact Ian Romanick. He made an announcement a few months back about porting Mesa to Cell, although I don't know the status at the moment.

http://www.nabble.com/Mesa-on-Cell-plan-t4202805.html
IronPeter
Posts: 207
Joined: Mon Aug 06, 2007 12:46 am
Contact:

shader tokens

Post by IronPeter »

This repo contains a basic shader compiler.

http://gitweb.freedesktop.org/?p=mesa/m ... ri/nouveau

The Nouveau project has many branches. Other branches are probably more suitable. See the "user section" at http://gitweb.freedesktop.org/

Development is relatively easy because the binary layer is very close to the assembler:

http://www.opengl.org/registry/specs/NV ... rogram.txt
http://www.opengl.org/registry/specs/NV ... rogram.txt

Guys, why does everybody want to write a shader compiler :)? Write some basic stuff like DXT texture support as your first task.

If you want to write a shader compiler, write a small working demo with basic shader assembling.
IronPeter
Posts: 207
Joined: Mon Aug 06, 2007 12:46 am
Contact:

index buffers

Post by IronPeter »

With the new nouveau dumps (thanks to marcheu) I was able to use index buffers on RSX.

Check SVN, the triangle demo.
dom
Posts: 29
Joined: Tue Oct 05, 2004 7:20 pm

Re: shader tokens

Post by dom »

IronPeter wrote: If you want to write shader compiler - write a small working demo with basic shader assembling.
You can even write some complex shaders and output them as ARB programs.

Though I never really used it, the Nvidia SDK Cg toolkit contains some support for various inputs and outputs. I don't know how to get from the ARB code to the machine code. That is far beyond my knowledge.

Code: Select all

usage: cgc [-quiet] [-nocode] [-nostdlib] [-[no]fx] [-longprogs] [-v] [-strict] [-oglsl]
           [-glslWerror] [-Dmacro[=value]] [-Iinclude_dir] [-profile id]
           [-entry id | -noentry] [-profileopts opt1,opt2,...] [-o ofile] [-l lfile]
           [-[no]fastmath] [-[no]fastprecision] [-bestprecision]
           [-unroll (all|none|count=N)] [-ifcvt (all|none|count=N)]
           [-inline (all|none|count=N)]
           [-type <type definition>} [-typefile <file>} [-M<...>]
           {file.cg}
supported profiles and their supported profileopts:
    glslf     profileopts:
    glslv     profileopts:
    ps_1_3    profileopts:
        MaxPixelShaderValue=<val>
    ps_1_2    profileopts:
        MaxPixelShaderValue=<val>
    ps_1_1    profileopts:
        MaxPixelShaderValue=<val>
    dx8ps     profileopts:
        MaxPixelShaderValue=<val>
    fp20      profileopts:
    generic   profileopts:
    ps_3_0    profileopts:
    fp40unlimited profileopts:
    fp40      profileopts:
        NumTemps=<val>
        NumInstructionSlots=<val>
        OutColorPrec=<val>
        MaxLocalParams=<val>
    vs_3_0    profileopts:
        MaxLocalParams=<n>
        MaxInstructions=<n>
    vp40      profileopts:
        NumTemps=<val>
        NumInstructionSlots=<val>
        MaxLocalParams=<val>
    arbfp1    profileopts:
        NumTemps=<val>
        NumInstructionSlots=<val>
        NoDependentReadLimit=<val>
        NumTexInstructionSlots=<val>
        NumMathInstructionSlots=<val>
        MaxTexIndirections=<val>
        MaxDrawBuffers=<val>
        MaxLocalParams=<val>
    ps_2_x    profileopts:
        NumTemps=<val>
        NumInstructionSlots=<val>
        Predication=<val>
        ArbitrarySwizzle=<val>
        GradientInstructions=<val>
        NoDependentReadLimit=<val>
        NoTexInstructionLimit=<val>
    ps_2_0    profileopts:
    dx9ps2    profileopts:
    fp30unlimited profileopts:
    fp30      profileopts:
        NumInstructionSlots=<val>
        NumTemps=<val>
    vs_2_x    profileopts:
        DynamicFlowControlDepth=<0 or 24>
        NumTemps=<12 to 32>
        MaxLocalParams=<n>
    vs_2_0    profileopts:
        MaxLocalParams=<n>
    dxvs2     profileopts:
        MaxLocalParams=<n>
    arbvp1    profileopts:
        NumTemps=<12 to 32>
        MaxInstructions=<n>
        MaxLocalParams=<n>
    vs_1_1    profileopts:
        dcls
        MaxLocalParams=<n>
    dx8vs     profileopts:
        dcls
        MaxLocalParams=<n>
    vp20      profileopts:
    vp30      profileopts:
For example, if you want to get the ARB code for this vertex shader:

Code: Select all

attribute vec4 testattrib;

void
main ( void ) {

  gl_Position = ftransform ( );

  gl_FrontColor = testattrib;

  return;
}
you can use :

Code: Select all

cgc -oglsl filename
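and you get this output (note that the header reports profile vs_1_1, since no -profile was given; the usage above lists arbvp1 for ARB vertex programs):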

Code: Select all

vattrib.vert
18 lines, 0 errors.
vs_1_1
// cgc version 1.5.0014, build date Sep 18 2006 21:56:59
// command line args: -oglsl
// source file: vattrib.vert
//vendor NVIDIA Corporation
//version 1.5.0.14
//profile vs_1_1
//program main
//semantic gl_ModelViewProjectionMatrixTranspose : STATE.MATRIX.MVP
//var float4 gl_Position : $vout.POSITION : HPOS : -1 : 1
//var float4 gl_Vertex : $vin.POSITION : ATTR0 : -1 : 1
//var float4 gl_FrontColor : $vout.COLOR0 : COL0 : -1 : 1
//var float4x4 gl_ModelViewProjectionMatrixTranspose : STATE.MATRIX.MVP : c[0], 4 : -1 : 1
//var float4 testattrib : $vin.ATTR1 : ATTR1 : -1 : 1
mov oD0, v1
dp4 oPos.w, v0, c3
dp4 oPos.z, v0, c2
dp4 oPos.y, v0, c1
dp4 oPos.x, v0, c0
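For readers new to this assembly, the five instructions above can be restated in plain C: one mov copies the attribute to the color output, and each dp4 computes one component of the output position as a 4-component dot product of the input position v0 with one constant register. This is only an illustrative sketch; the struct and function names are made up.

```c
struct vec4 { float x, y, z, w; };

/* dp4: 4-component dot product, as in "dp4 dst, a, b". */
float dp4(struct vec4 a, struct vec4 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
}

/* Equivalent of the four dp4 instructions: oPos = (v0.c0, v0.c1, v0.c2, v0.c3). */
struct vec4 transform_position(const struct vec4 c[4], struct vec4 v0)
{
    struct vec4 o_pos;
    o_pos.x = dp4(v0, c[0]);
    o_pos.y = dp4(v0, c[1]);
    o_pos.z = dp4(v0, c[2]);
    o_pos.w = dp4(v0, c[3]);
    return o_pos;
}

/* Usage example: with identity constants the position passes through unchanged. */
struct vec4 demo_identity_transform(void)
{
    const struct vec4 c[4] = {
        {1, 0, 0, 0}, {0, 1, 0, 0}, {0, 0, 1, 0}, {0, 0, 0, 1}
    };
    struct vec4 v0 = {2.0f, 3.0f, 4.0f, 1.0f};
    return transform_position(c, v0);
}
```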
@+
dom
IronPeter
Posts: 207
Joined: Mon Aug 06, 2007 12:46 am
Contact:

Post by IronPeter »

dom, you still need a binary assembler. The cgc compiler output is in text form.

An NV_fragment_program assembler would be great for us.

The main problem with an assembler is register compaction. We need that to reduce the number of temp registers.
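One common way to reduce the number of temporaries is a greedy scan over live ranges: each temporary is live from its first write to its last read, and a hardware register can be reused once its previous occupant is dead. The C sketch below shows that general technique only; it is not how cgc, Nouveau or libps3rsx does it, and the names and the 32-register limit are assumptions for illustration.

```c
#include <stddef.h>

#define MAX_HW_TEMPS 32 /* hypothetical limit, for illustration only */

/* Live range of one virtual temporary, in instruction indices. */
struct live_range {
    int first_def;  /* instruction that first writes the temporary */
    int last_use;   /* last instruction that reads it */
};

/* Assign each virtual temporary a hardware register, reusing registers
 * whose previous occupant is dead. ranges must be sorted by first_def.
 * Returns the number of hardware registers used, or -1 if more than
 * MAX_HW_TEMPS would be needed. */
int compact_registers(const struct live_range *ranges, size_t n, int *assignment)
{
    int busy_until[MAX_HW_TEMPS];
    int used = 0;

    for (size_t i = 0; i < n; i++) {
        int reg = -1;
        /* Conservative rule: reuse only if the old value died strictly
         * before this temporary is first written. */
        for (int r = 0; r < used; r++) {
            if (busy_until[r] < ranges[i].first_def) {
                reg = r;
                break;
            }
        }
        if (reg < 0) {
            if (used == MAX_HW_TEMPS)
                return -1;
            reg = used++;
        }
        busy_until[reg] = ranges[i].last_use;
        assignment[i] = reg;
    }
    return used;
}

/* Usage example: four virtual temporaries fit into two hardware registers. */
int demo_compact(void)
{
    const struct live_range ranges[4] = { {0, 2}, {1, 3}, {3, 5}, {4, 6} };
    int assignment[4];
    return compact_registers(ranges, 4, assignment);
}
```

Because the ranges are processed in order of first definition, this greedy choice uses as few registers as the maximum number of temporaries live at the same time.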
ps2devman
Posts: 259
Joined: Mon Oct 09, 2006 3:56 pm

Post by ps2devman »

Congrats on the index buffer breakthrough!

I can't help now because I'm short of free time, but I will describe how native shader assembly can be done for nv2A, on xbox1:
- First, the shader model must be identified. For nv2A it's SM 1.1.
- Cgc.exe (from NVidia SDK 9.5) translates the high-level text language into low-level assembly text.
- vsa.exe and psa.exe (from an earlier Nvidia SDK or the DirectX SDK) translate the low-level assembly text into binary pseudo code (standard DirectX8 pseudo code).
- In pbkit.c, the function pcode2mcode translates pseudo code into native code
(done by studying a lot of binary samples and comparing the binary native code with the matching pseudo code).

Studying the Nouveau code is probably a good way to start. I can't help more for now. If something public, similar to vsa.exe and psa.exe, could be found, it might do all the register optimizations for us. Then the pseudo-to-native translation should be simple, assuming the native code encoding is understood. I'm not experienced with DirectX9 yet, but there might be tools already available (maybe more recent versions of vsa.exe and psa.exe, I don't know). Do we assume we are targeting SM 3.0?
IronPeter
Posts: 207
Joined: Mon Aug 06, 2007 12:46 am
Contact:

Post by IronPeter »

ps2devman, our target is NV_fragment_program, not SM3.0. NV_fragment_program is very close to the hardware and has many unique features (like pack/unpack).

We must avoid some PS3.0 core features, such as dynamic branches. That branching is very slow on NV40-class hardware (I have a lot of PC experience with that).
Glaurung
Posts: 49
Joined: Thu Oct 11, 2007 4:54 am

xorg driver with blending

Post by Glaurung »

Hi,

For those interested, I've found some time to update my experimental xorg driver, based on the work of IronPeter and the nouveau team. It now supports many more Composite operations, including alpha blending, through the 3D engine. That means accelerated translucent windows, and it works with Xv too (so you can have accelerated translucent video over your desktop, with windows dropping shadows, etc.).
Still, there are some nasty artifacts in standard rendering (e.g. moving a standard window around without xcompmgr running leads to serious artifacts) and solid fills are not accelerated, so it is hardly usable for everyday use. Moreover, the code is a big patchwork and needs a lot of cleanup. I now plan to accelerate solid fills with the 3D engine too, and to get rid of the remaining artifacts. This experimental driver is only a proof of concept, to check that we have everything we need for accelerated X on PS3. Once the driver is functional (usable for everyday use), I plan to find a way to merge it back into nouveau, probably by writing a drm driver.

Code is available here:
http://mandos.homelinux.org/~glaurung/g ... eo-ps3.git

IronPeter, concerning the 3D side, did you check Gallium?
http://www.tungstengraphics.com/wiki/in ... /Gallium3D
I think writing a driver for it shares some common goals with libps3rsx. In particular, it assumes availability of pixel and vertex shaders, and is supposed to be independent from OpenGL.

Final note: I'm using firmware 2.0 now.
IronPeter
Posts: 207
Joined: Mon Aug 06, 2007 12:46 am
Contact:

Post by IronPeter »

Glaurung, I'll check Gallium, thanks. At first glance it is an ugly abstraction layer.

Do you have any ideas about this topic:
http://forums.ps2dev.org/viewtopic.php?t=9317 ?
gigi
Posts: 10
Joined: Sat Nov 03, 2007 8:10 am

Post by gigi »

Just to add a probable meaning to the l33t b33f l33t cod3, I found this reference while googling:

http://www.artima.com/insidejvm/whyCAFEBABE.html

Feel free to remove the post if you think it's off topic.

ciao
gigi
unsolo
Posts: 155
Joined: Mon Apr 16, 2007 2:39 am
Location: OSLO Norway

Post by unsolo »

Glaurung

I finished my SPU solid fill (coming in spu medialib) yesterday (the last bug is gone, I think). It has no alignment restrictions, though it's probably useless for you.

I'm going to work on a copy today, so although it is not "GPU", it will hopefully serve as a more permanent SPU driver for non-GPU Cell/SPU solutions.

After I finish the copy, I would like to start an X driver from scratch using these functions. If anyone wants to assist, any help is appreciated.

cheers
Don't do it alone.
Falcon
Posts: 4
Joined: Mon Nov 12, 2007 5:23 pm

Post by Falcon »

Just to be sure I did everything right:

After installing kernel 2.6.23 and applying Glaurung's kernel GPU patch, I tried to run IronPeter's simple-triangle test on a non-HD screen, but nothing shows up on the screen.

Can this test only run on an HD(-ready) screen?
IronPeter
Posts: 207
Joined: Mon Aug 06, 2007 12:46 am
Contact:

Post by IronPeter »

Falcon, that is just my bad coding. I hardcoded a 1280x1024 (not 720) resolution. Just change those hardcoded constants.

Of course, it is better to determine the screen resolution from ps3fb. I'll fix that.
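As a rough idea of what that fix could look like, the sketch below reads the visible resolution through the standard Linux framebuffer ioctl FBIOGET_VSCREENINFO. It assumes ps3fb is exposed as an ordinary fbdev node such as /dev/fb0; the struct and function names are made up, and this is not the code in the repository.

```c
#include <fcntl.h>
#include <linux/fb.h>
#include <sys/ioctl.h>
#include <unistd.h>

struct fb_size {
    unsigned width;
    unsigned height;
};

/* Query the visible resolution of a framebuffer device.
 * Returns {0, 0} if the device cannot be opened or queried. */
struct fb_size query_fb_size(const char *device)
{
    struct fb_size size = {0, 0};
    struct fb_var_screeninfo vinfo;

    int fd = open(device, O_RDONLY);
    if (fd < 0)
        return size;

    if (ioctl(fd, FBIOGET_VSCREENINFO, &vinfo) == 0) {
        size.width = vinfo.xres;
        size.height = vinfo.yres;
    }
    close(fd);
    return size;
}
```

The demo could call query_fb_size("/dev/fb0") at startup and fall back to the hardcoded constants when it returns a zero width.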