remotejoy improvement idea

Discuss the development of new homebrew software, tools and libraries.

Moderators: cheriff, TyRaNiD

jean
Posts: 489
Joined: Sat Jan 05, 2008 2:44 am

Post by jean »

Sorry if I post this BEFORE taking a look at the psplink/remotejoy sources, but I think that, quite obviously, the remote screen feature is implemented by continuously sending the content of the entire VRAM to the host. So this project (http://www.psp-ita.com/?module=news&id= ... ew_reply=1) basically uses OpenGL to blit images and nothing else, I guess (feel free to stop me if I'm saying too much bulls****). I was wondering if this could be accelerated by handling graphics functions on the host. I mean: let's hook the sceGu* functions with something that sends over USB or WiFi just the command to call the corresponding graphics primitive on the host. sceGuTexImage, for instance, would also have to upload the texture data to the host (which could cache it, too). This could speed things up a lot if the PSP binary being "redirected" does not do expensive per-pixel operations. The question is (if a question has to be asked):
Am I right? How are things implemented right now? Is the video uncompressed, as I suspect? If you are versed in this kind of thing and like to discuss it, then I will exploit your knowledge; otherwise, please don't flame me by telling me to search and look into the sources: I surely will, but I wanted to share my ideas as soon as possible without several weeks of in-depth research.
Raphael
Posts: 646
Joined: Tue Jan 17, 2006 4:54 pm
Location: Germany

Post by Raphael »

Yes, remotejoy sends the whole framebuffer to the host each frame (and, if set, reduces it to 16bpp and/or half size), and the OpenGL remotejoy mod improves the host application by rendering the received framebuffer through OpenGL (hence allowing hardware filtering through pixel shaders to be applied).
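To make the half-size option concrete, here is a minimal C sketch of halving a 32bpp frame in both directions. It uses plain point sampling (keep every second pixel of every second line); that choice and the function name are assumptions for illustration, and remotejoy may well average neighbouring pixels instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: halves a 32bpp frame in both directions by keeping
   every second pixel of every second line (point sampling).
   stride is the source line length in pixels (512 for the PSP framebuffer);
   dst is written packed, w / 2 pixels per line. */
static void downsample_half(uint32_t *dst, const uint32_t *src,
                            size_t w, size_t h, size_t stride)
{
    for (size_t y = 0; y < h / 2; y++)
        for (size_t x = 0; x < w / 2; x++)
            dst[y * (w / 2) + x] = src[(2 * y) * stride + 2 * x];
}
```

For the 480x272 visible area this keeps 32,640 of 130,560 pixels; combined with the 16bpp reduction, the payload is one eighth of a full 32bpp frame.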
However, it wouldn't be plausible to hook all sceGu* functions to send just the commands since, as you already noticed, you'd also have to send all the textures and geometry data, which very likely turns out to be much, much larger than the actual rendered frame. If you wanted to apply "caching" on the host as you suggested, you'd need to write a protocol for identifying used models/textures through an ID; then, if the host doesn't have that ID cached, it needs to send a request back to the PSP to transmit the resource.
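The ID-based caching described above could look roughly like this on the host side. This is a sketch, not anything remotejoy implements: the FNV-1a hash as resource ID, the fixed-size open-addressed table and all names are assumptions chosen for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* FNV-1a (32-bit) as a stand-in resource ID. A real protocol might rather use
   texture address + size, or a stronger hash, to avoid collisions. */
static uint32_t resource_id(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t h = 2166136261u;          /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u;                /* FNV prime */
    }
    return h;
}

/* Hypothetical host-side cache: a tiny open-addressed set of known IDs. */
#define CACHE_SLOTS 256
typedef struct {
    uint32_t ids[CACHE_SLOTS];
    bool used[CACHE_SLOTS];
} host_cache;

/* Returns true on a hit (host already has the data). On a miss it records
   the ID, as if the data were about to be requested and received, and
   returns false: the host must then ask the PSP to transmit the resource. */
static bool host_cache_lookup_or_insert(host_cache *c, uint32_t id)
{
    for (size_t n = 0; n < CACHE_SLOTS; n++) {
        size_t slot = (id + n) % CACHE_SLOTS;
        if (!c->used[slot]) {
            c->used[slot] = true;
            c->ids[slot] = id;
            return false;              /* miss: request the data */
        }
        if (c->ids[slot] == id)
            return true;               /* hit: reuse the cached copy */
    }
    return false;                      /* table full: miss, no eviction here */
}
```

Every miss costs a round trip back to the PSP, which is exactly the source of the irregular transfer peaks mentioned in the next paragraph.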
Apart from the fact that you need a lot of logic for this on the PSP side (which probably decreases game performance more than a simple buffer copy, due to branches etc. possibly killing icache performance), the transmissions would occupy a large share of the transfer bandwidth and generate a lot of peaks that would again make the game lag inconsistently.
With all that comes the additional problem of a working sceGu emulator/interpreter on the host side. It's nowhere near impossible to do, but it surely won't be easy to achieve in a highly compatible way. And what about games that do not use sceGu functions for some things they do?

In the end, you'd just get a less compatible, laggier version of remotejoy, and that's surely not something we want. After all, Tyranid chose a clever method for the matter, and the half-size and 16bpp downsample options should already reduce the performance issues.
The only other method worth discussing would probably be implementing an additional form of compression on the framebuffer data, but that was already discussed before the release of remotejoy. Some (including me) argued that it wouldn't be very plausible because of the required CPU power on the PSP side. Tyranid proved us wrong (at least halfway) with the 32bit->16bit transform, which already is a form of compression, though not a very complex one, and is easily achieved in a fast way by using VFPU commands.
<Don't push the river, it flows.>
http://wordpress.fx-world.org - my devblog
http://wiki.fx-world.org - VFPU documentation wiki

Alexander Berl
TyRaNiD
Posts: 907
Joined: Sun Jan 18, 2004 12:23 am

Post by TyRaNiD »

I agree with that: emulating, in effect, the front end of the GE would be difficult indeed, if not impossible, and probably not worth trying, though I could be proved wrong. Of course you also have to take into account weirdness in the VRAM (for example swizzling), and you would have to do weird things with the commands...
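As an example of that VRAM weirdness: a host-side interpreter receiving swizzled textures would have to undo the swizzle before uploading them to OpenGL. The sketch below assumes the layout commonly used in PSP homebrew texture code (blocks 16 bytes wide and 8 rows tall, each stored as 128 contiguous bytes, blocks in row-major order); treat that layout and the helper name as assumptions to check against the GE documentation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copies between the linear layout and the assumed
   16-byte x 8-row block (swizzled) layout. width_bytes must be a multiple
   of 16 and height a multiple of 8. to_swizzled selects the direction. */
static void swizzle_copy(uint8_t *dst, const uint8_t *src,
                         size_t width_bytes, size_t height, int to_swizzled)
{
    size_t blocks_x = width_bytes / 16;
    for (size_t by = 0; by < height / 8; by++)
        for (size_t bx = 0; bx < blocks_x; bx++)
            for (size_t row = 0; row < 8; row++) {
                /* offset of this 16-byte row inside the swizzled data */
                size_t swz = (by * blocks_x + bx) * 128 + row * 16;
                /* offset of the same 16 bytes in the linear image */
                size_t lin = (by * 8 + row) * width_bytes + bx * 16;
                if (to_swizzled)
                    memcpy(dst + swz, src + lin, 16);
                else
                    memcpy(dst + lin, src + swz, 16);
            }
}
```

Calling it with to_swizzled = 0 unswizzles; the round trip through both directions gives back the original image.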

As for compression, yeah, I don't think there is much you can do; the 16-bit downsample was in effect almost free (after all, I had to copy the buffer anyway through some means). A real compression algorithm would waste too many CPU cycles, and simple block elimination might only be useful on static images rather than actual game data, somewhat defeating the point.
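For reference, "simple block elimination" could be as small as the following C sketch: compare the current frame with the last one sent, tile by tile, and only transmit tiles that changed. The 16-pixel tile size and the function name are arbitrary assumptions. It also shows the catch: the PSP must keep a copy of the previous frame (more memory) and compare every pixel each frame, and on moving game scenes nearly every tile ends up dirty anyway.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TILE 16  /* tile edge in pixels, an arbitrary choice for this sketch */

/* Hypothetical helper: marks dirty[t] = 1 for every TILE x TILE tile of a
   16bpp frame that differs from the previously sent frame, and returns the
   number of dirty tiles. stride is in pixels; w and h are assumed to be
   multiples of TILE. */
static size_t find_dirty_tiles(const uint16_t *cur, const uint16_t *prev,
                               size_t w, size_t h, size_t stride,
                               uint8_t *dirty)
{
    size_t tiles_x = w / TILE, count = 0;
    for (size_t ty = 0; ty < h / TILE; ty++)
        for (size_t tx = 0; tx < tiles_x; tx++) {
            int changed = 0;
            /* compare the tile line by line, stopping at the first change */
            for (size_t y = 0; y < TILE && !changed; y++) {
                size_t off = (ty * TILE + y) * stride + tx * TILE;
                changed = memcmp(cur + off, prev + off,
                                 TILE * sizeof *cur) != 0;
            }
            dirty[ty * tiles_x + tx] = (uint8_t)changed;
            count += (size_t)changed;
        }
    return count;
}
```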
hlide
Posts: 739
Joined: Sun Sep 10, 2006 2:31 am

Post by hlide »

You may try these copy functions:

# void transfer8888_64(void *target, const void *source, int n /* 64-byte units */)
    .set    noreorder
    .set    noat
    .text
    .global transfer8888_64
    .ent    transfer8888_64
transfer8888_64:

    li      $at, 0x40000000
    cache   0x1e, 0($a1)        # fill cache line
    or      $a0, $a0, $at       # use the uncached mirror of target
0:  addiu   $a2, $a2, -1
    lv.q    c000, 0($a1)        # load 64 bytes
    lv.q    c010, 16($a1)
    lv.q    c020, 32($a1)
    lv.q    c030, 48($a1)
    sv.q    c000, 0($a0), wb    # store 64 bytes
    sv.q    c010, 16($a0), wb
    sv.q    c020, 32($a0), wb
    sv.q    c030, 48($a0), wb
    cache   0x1e, 64($a1)       # fill next cache line
    addiu   $a1, $a1, 64
    addiu   $a0, $a0, 64
    mfvc    $zr, $255
    bgtz    $a2, 0b             # loop again ?
    vnop                        # keep this one just after the last sv.q and before any other VFPU instruction !
    j       $ra
    nop

    .end    transfer8888_64

# void convert8888_to_5650_64(void *target, const void *source, int n /* 64-byte units */)
    .set    noreorder
    .set    noat
    .text
    .global convert8888_to_5650_64
    .ent    convert8888_to_5650_64
convert8888_to_5650_64:

    li      $at, 0x40000000
    cache   0x1e, 0($a1)        # fill cache line
    or      $a0, $a0, $at       # use the uncached mirror of target
0:  addiu   $a2, $a2, -1
    lv.q    c000, 0($a1)        # load 64 bytes (16 pixels)
    lv.q    c010, 16($a1)
    lv.q    c020, 32($a1)
    lv.q    c030, 48($a1)
    vt5650.q c000, c000         # 4 pixels -> 2 words of 5650
    vt5650.q c002, c010
    vt5650.q c010, c020
    vt5650.q c012, c030
    sv.q    c000, 0($a0), wb    # store 32 bytes (16 pixels)
    sv.q    c010, 16($a0), wb
    cache   0x1e, 64($a1)       # fill next cache line
    addiu   $a1, $a1, 64
    addiu   $a0, $a0, 32
    mfvc    $zr, $255
    bgtz    $a2, 0b             # loop again ?
    vnop                        # keep this one just after the last sv.q and before any other VFPU instruction !
    j       $ra
    nop

    .end    convert8888_to_5650_64
kurokaze
Posts: 1
Joined: Thu Aug 07, 2008 4:32 pm

Post by kurokaze »

I was at one point considering the possibility of rewriting the RJ transfer to transfer frames in bits and pieces rather than all at once, so as to reduce PSP memory overhead. PSP memory is currently the thing that is killing stability. I decided it was too big a project for a novice like myself though, so I settled for the 16-bit limitation and multiple binaries with different memory locations and thus different crashes.

I guess that's what the asm above is doing? Transferring in 64 byte chunks? Or is it just another method for doing what RJ already does that might be better in some other way? I certainly can't read asm, so I have no idea. Implementing the above is definitely way beyond me at the moment.

Regarding the original idea... I think there's some potential in that direction. Certainly not with HLE of Gu functions or anything of that nature, and possibly/probably with a performance hit that makes it unusable for anything but fooling around, but couldn't we, for example, grab the depth buffer (assuming there is one; I'm not versed in the PSP's graphics) and pipe that to RJGL for use in shaders? There are some interesting shaders out there that work on render-to-texture + depth buffer.
jean
Posts: 489
Joined: Sat Jan 05, 2008 2:44 am

Post by jean »

Well, thank you all for "rationalizing" my "stream of consciousness", to put it as Joyce would... I recognize the intrinsic difficulty of such a work, and I don't think I could ever escape the intrinsic complexity of the PSP graphics hardware on my own, BUT I still think that an idea along the lines of what I first wrote could be fun. Useless for those of you who own a Slim with a video-out cable, but fun. Listen, there are many people working on "emulators" who still need a good GU emulation layer... When you want to code something but worry about its usefulness, the keyword is "reusability": following the specs included in that txt file in the pspsdk source pack, it's not impossible to code a _coarse_ OpenGL-based GU emulator that could be used in a project like this as well. I'll write a first foundation of code when I have the time.
Raphael
Posts: 646
Joined: Tue Jan 17, 2006 4:54 pm
Location: Germany

Post by Raphael »

If you do, consider writing a direct interpreter for the actual GE commands (the ones from the gu/doc text file) rather than the sceGu* functions. It's surely easier to hook the Ge list and transfer the raw commands rather than hooking all single sceGu* functions.
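A host-side interpreter for raw GE lists would start by splitting each 32-bit command word into an 8-bit command (top byte) and a 24-bit argument. The sketch below only counts PRIM commands until END and does not follow JUMP/CALL; the command numbers are taken from the pspsdk gu command list as I remember it and should be double-checked there, and the walker itself is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* A few GE command numbers (assumed from the pspsdk gu doc; verify there). */
enum {
    GE_CMD_NOP    = 0x00,
    GE_CMD_PRIM   = 0x04,
    GE_CMD_END    = 0x0C,
    GE_CMD_FINISH = 0x0F
};

/* Each GE command is one 32-bit word: command in the top 8 bits,
   argument in the low 24 bits. */
static inline uint8_t ge_cmd(uint32_t word) { return (uint8_t)(word >> 24); }
static inline uint32_t ge_arg(uint32_t word) { return word & 0xFFFFFFu; }

/* Hypothetical walker over a list already transferred to the host.
   Stops at END and counts PRIM commands; JUMP/CALL are not followed
   (a real interpreter would have to resolve those addresses). */
static size_t count_prims(const uint32_t *list, size_t max_words)
{
    size_t prims = 0;
    for (size_t i = 0; i < max_words; i++) {
        switch (ge_cmd(list[i])) {
        case GE_CMD_PRIM: prims++; break;
        case GE_CMD_END:  return prims;
        default:          break;  /* state commands would update GL state */
        }
    }
    return prims;
}
```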
TyRaNiD
Posts: 907
Joined: Sun Jan 18, 2004 12:23 am

Post by TyRaNiD »

Ultimately the PSP (at least the old one) doesn't have an awful lot of free memory to randomly stick stuff in. You might be able to reduce memory usage by sending only small chunks, but if I recall correctly you take a fair hit on performance; of course you might be able to do something about that by double buffering and making usbhostfs more asynchronous, which _is_ a lot of work :P
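Here is a rough C sketch of the chunked, double-buffered idea: the frame is packed into small staging slices, and while one slice is on the wire the next one is filled. The transport hooks are hypothetical placeholders and are not the usbhostfs API; making the real transfer asynchronous is the hard part TyRaNiD mentions. The slice height of 16 rows is an arbitrary assumption.

```c
#include <stddef.h>
#include <stdint.h>

#define SLICE_ROWS 16  /* rows per chunk; arbitrary for this sketch */

/* Hypothetical transport hooks: begin_send() starts an asynchronous transfer,
   wait_send() blocks until the transfer started last has completed. */
typedef struct {
    void (*begin_send)(const uint16_t *buf, size_t bytes, void *ctx);
    void (*wait_send)(void *ctx);
    void *ctx;
} transport;

/* Sends a 16bpp frame in SLICE_ROWS-row chunks through two staging buffers
   (ping-pong). stage must hold 2 * SLICE_ROWS * w pixels. Only one transfer
   is in flight at a time, so the buffer being filled is never on the wire. */
static void send_frame_chunked(const uint16_t *fb, size_t w, size_t h,
                               size_t stride, uint16_t *stage,
                               const transport *t)
{
    int in_flight = 0;
    for (size_t y0 = 0, n = 0; y0 < h; y0 += SLICE_ROWS, n++) {
        uint16_t *buf = stage + (n % 2) * SLICE_ROWS * w;
        size_t rows = (h - y0 < SLICE_ROWS) ? h - y0 : SLICE_ROWS;
        /* fill this slice (dropping stride padding) while the other is sent */
        for (size_t r = 0; r < rows; r++)
            for (size_t x = 0; x < w; x++)
                buf[r * w + x] = fb[(y0 + r) * stride + x];
        if (in_flight)
            t->wait_send(t->ctx);  /* previous chunk must finish first */
        t->begin_send(buf, rows * w * sizeof *buf, t->ctx);
        in_flight = 1;
    }
    if (in_flight)
        t->wait_send(t->ctx);
}

/* Trivial synchronous stub that only counts, for trying the logic on a PC. */
typedef struct { size_t bytes, sends, waits; } send_stats;
static void stats_begin(const uint16_t *buf, size_t bytes, void *ctx)
{
    (void)buf;
    send_stats *s = ctx;
    s->bytes += bytes;
    s->sends++;
}
static void stats_wait(void *ctx) { ((send_stats *)ctx)->waits++; }
```

With 480-pixel lines the two staging slices take 2 x 16 x 480 x 2 = 30,720 bytes, versus 261,120 bytes for a whole 16bpp frame.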
hlide
Posts: 739
Joined: Sun Sep 10, 2006 2:31 am

Post by hlide »

kurokaze wrote:I guess that's what the asm above is doing? Transferring in 64 byte chunks? Or is it just another method for doing what RJ already does that might be better in some other way? I certainly can't read asm, so I have no idea. Implementing the above is definitely way beyond me at the moment.
The idea was to prefetch a cache line, load it through the VFPU, and store its contents to another place through the write buffer. The first function is for transferring RGBA8888 colors (but you need 64-byte aligned addresses). The second converts RGBA8888 into RGBA5650 while transferring. The second may also be modified to transfer 8 RGBA8888 into 4 RGBA5650 instead of 4 RGBA8888 into 2 RGBA5650 per loop.

But apparently "cache 0x1e, 64($a1)" on the next cache line stalls any load/store operation (~70 cycles), as the Allegrex cache sadly has no "hit under miss", so it is not a good idea to keep this "cache" instruction in that code.
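For readers who don't read assembly: functionally, the two routines above are a fast copy and a fast 32bpp to 16bpp conversion, not a chunked transfer. Below is a plain-C sketch of what they compute. It assumes the usual PSP pixel order (a 32-bit pixel read as 0xAABBGGRR, red in the low byte; 5650 with red in bits 0-4, green in bits 5-10, blue in bits 11-15) and that vt5650.q simply truncates the low bits; the `_c` function names are made up. The uncached-address trick (OR 0x40000000 on the target) is left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Plain-C equivalent of transfer8888_64: copies n units of 64 bytes. */
static void transfer8888_64_c(void *target, const void *source, int n)
{
    memcpy(target, source, (size_t)n * 64);
}

/* Packs one 0xAABBGGRR pixel into 5650 by truncation; alpha is dropped. */
static uint16_t pack5650(uint32_t c)
{
    uint32_t r = (c >> 3) & 0x1F;   /* top 5 bits of red   (bits 3-7)   */
    uint32_t g = (c >> 10) & 0x3F;  /* top 6 bits of green (bits 10-15) */
    uint32_t b = (c >> 19) & 0x1F;  /* top 5 bits of blue  (bits 19-23) */
    return (uint16_t)(r | (g << 5) | (b << 11));
}

/* Plain-C equivalent of convert8888_to_5650_64: each 64-byte unit holds
   16 source pixels and produces 32 bytes (16 pixels) of output. */
static void convert8888_to_5650_64_c(uint16_t *target,
                                     const uint32_t *source, int n)
{
    for (size_t i = 0; i < (size_t)n * 16; i++)
        target[i] = pack5650(source[i]);
}
```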
jean
Posts: 489
Joined: Sat Jan 05, 2008 2:44 am

Post by jean »

Raphael wrote: If you do, consider writing a direct interpreter for the actual GE commands (the ones from the gu/doc text file) rather than the sceGu* functions. It's surely easier to hook the Ge list and transfer the raw commands rather than hooking all single sceGu* functions.
...exactly what I was thinking about...
TyRaNiD wrote: Ultimately the PSP (at least old one) doesn't have an awful lot of free memory to randomly stick stuff, you might be able to reduce memory usage by sending only small chunks but if I recall you take a fair hit on performance, of course you might be able to do something about that by double buffering and making usbhostfs more asynchronous, which _is_ a lot of work :P
...uhm... never thought of this. I think the first attempt should be standalone rather than coded into the existing remotejoy system. Oh, and I think the first attempt will use WiFi (= little or no work on the PC side for data exchange). Has anyone stress-tested the PSP's effective bandwidths? Is WiFi a real 54 Mbps? Does USB reach 480 Mbps? (I guess not, so the next question is: how far does it push?) I think I will do some benchmarking, even if, for such a method, bandwidth is not the biggest of the problems...
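A minimal benchmarking harness in C, as a starting point: it pushes the same buffer through a send callback a number of times and reports bytes per second. The callback type and the discard stub are hypothetical placeholders for whatever WiFi or USB path is being tested; the timer is POSIX clock_gettime, so on the PSP itself it would have to be replaced with the system's own timer.

```c
#define _POSIX_C_SOURCE 199309L
#include <stddef.h>
#include <time.h>

/* Hypothetical transport hook: sends len bytes over the link under test and
   returns the number of bytes sent, or a negative value on error. */
typedef long (*send_fn)(const void *buf, size_t len, void *ctx);

/* Stand-in transport that discards the data, only useful to try the harness. */
static long discard_send(const void *buf, size_t len, void *ctx)
{
    (void)buf;
    (void)ctx;
    return (long)len;
}

static double seconds_now(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Sends buf `rounds` times and returns bytes per second, or -1.0 on a send
   error or when the elapsed time is too small to measure. */
static double measure_throughput(send_fn send_cb, void *ctx,
                                 const void *buf, size_t len, int rounds)
{
    double start = seconds_now();
    size_t total = 0;
    for (int i = 0; i < rounds; i++) {
        long sent = send_cb(buf, len, ctx);
        if (sent < 0)
            return -1.0;
        total += (size_t)sent;
    }
    double elapsed = seconds_now() - start;
    return elapsed > 0.0 ? (double)total / elapsed : -1.0;
}
```

Sending frame-sized buffers (for example 65,280 bytes for a half-size 16bpp frame) keeps the measurement close to the real workload.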
@hlide: thanks for sharing your code
Cpasjuste
Posts: 214
Joined: Sun May 29, 2005 8:28 am

Post by Cpasjuste »

When working on PSP ftpd code, I was not able to get a transfer speed better than approximately 400 kBytes/sec, with the PSP running at 333 MHz (I don't know why, but the bandwidth seems to improve when overclocked).
pspZorba
Posts: 156
Joined: Sat Sep 22, 2007 11:45 am
Location: NY

Post by pspZorba »

my two cents:

The PSP's WiFi is not 802.11g but 802.11b, with a bandwidth of only 11 Mbps.
I have the same measurements as Cpasjuste.
(But these measurements are hard to rely on, since they depend on lots of external things such as the router and the PC.)
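Putting these numbers against remotejoy's frame sizes gives a best-case frame rate. The sketch assumes the visible 480x272 area is sent with no headers or padding and treats "400 kBytes/sec" as 400,000 bytes/s; latency, protocol overhead and retransmissions are ignored, so real figures would be lower.

```c
#include <stddef.h>

/* Bytes in one 480x272 frame at bpp bits per pixel, optionally half size. */
static size_t frame_size(unsigned bpp, int half)
{
    size_t w = half ? 240 : 480, h = half ? 136 : 272;
    return w * h * bpp / 8;
}

/* Best-case frames per second over a link sustaining bytes_per_sec. */
static double max_fps(double bytes_per_sec, size_t frame_bytes)
{
    return bytes_per_sec / (double)frame_bytes;
}
```

At about 400,000 bytes/s, a half-size 16bpp frame (65,280 bytes) allows roughly 6 fps and a full-size 16bpp frame (261,120 bytes) about 1.5 fps; even the raw 11 Mbit/s (1,375,000 bytes/s) would only reach about 21 fps at half size.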

Anyway, I feel that the kernel software part on the PSP side is not as good as it should be.
--pspZorba--
NO to K1.5 !