ASM in C program? Include it externally? Some info plz?

sg57 · Post by **sg57** » Sat Aug 12, 2006 4:14 am

Just for fun, I want to learn the basics of ASM and have gotten down variable manipulations and thought I'd make a small calculator application to test my knowledge on it.

I don't know enough to make an entire PSP application out of ASM (don't know how to include, start main, program, loop, etc.) so I was just going to use something like:

Code: Select all

while(1) {

     extern "ASM" {
          mul $product,$factor,$factor; 
          // multiply the factor, return into product register
     }
}

I don't know 'all' the functions such as exponents, but I'm learning.

Tell me if this will work or not? Thanks!

(oh and any links to good ASM teaching tutorials would be awesome ;) Raphael, you have any?)

Raphael · Post by **Raphael** » Sat Aug 12, 2006 4:31 am

If you want to know if this works, why not try to compile it? The compiler will tell you if it understands what you want it to do (I'd say definitely NO if I were the compiler, though...)
Well, even if it compiles, I wouldn't run it... you know... infinite loop and such...

Also I'm not sure you're ready for assembly yet; it's somewhat more complex than C or C++, and seeing how you're still lacking there...
but well, maybe reading the first few chapters of the following wouldn't hurt either: http://chortle.ccsu.edu/AssemblyTutoria ... tents.html

Kojima · Post by **Kojima** » Sat Aug 12, 2006 4:37 am

I'm not sure if this applies to C as well (I use C++ exclusively), but this is how you include asm within a C++ program.

Code: Select all


void AddTest() 
{ 
    int num1 = 5; 
    int num2 = 10; 
    int out = 0; 
    int to = 5;
    float r17, r16;
    r16 = 0;
    r17 = 0;
    /* C version: add num1*num2 to the running total, `to` times */
    for (int i = 0; i < to; i++)
    {
        float res = num1 * num2;
        r16 = res;
        r17 = r16 + r17;
    }
    printf("C Res:%f\n", r17);

    num1 = 5;
    num2 = 10;
    out = 0;
    to = 5;
    /* Same loop in inline asm: $5 is the counter, $17 the running total */
    asm __volatile__ (
        "ori $5,$0,0\n"
        "ori $17,$0,0\n"
        "1:\n"
        "mult %1,%2\n"          /* product goes to LO for the mflo below */
        "mflo $16\n"
        "addu $19,$16,$17\n"
        "move $17,$19\n"
        "addiu $5,$5,1\n"
        "bne $5,%3,1b\n"
        "nop\n"
        "move %0,$17\n"
        : "=r"(out)
        : "r"(num1), "r"(num2), "r"(to)
        : "$5", "$16", "$17", "$19", "hi", "lo"); /* registers the asm overwrites */

    printf("Asm Result:%d \n", out); 
}

siberianstar · Post by **siberianstar** » Sat Aug 12, 2006 4:41 am

You never have to write an entire application in assembly,
use inline assembly:

asm (
"instruction 1\n"
"instruction 2\n"
: gcc output constraints : gcc input constraints );

for example

float output, op1, op2;
asm (
"mul.s %0, %1, %2\n"
: "=f" (output) : "f" (op1), "f" (op2) );

note that allegrex doesn't have double registers/instructions
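
A minimal sketch of how the `mul.s` snippet above could be wrapped in a function. The name `mul_float` is made up for illustration, and the `__mips__` / `__mips_hard_float` checks rely on GCC's predefined macros for hard-float MIPS targets; on any other target the function falls back to plain C so it still builds on a PC.

```c
/* mul_float is a hypothetical helper name, not part of the PSPSDK. */
static float mul_float(float op1, float op2)
{
#if defined(__GNUC__) && defined(__mips__) && defined(__mips_hard_float)
    float output;
    /* "=f" and "f" ask GCC to place the operands in FPU registers. */
    __asm__ ("mul.s %0, %1, %2"
             : "=f" (output)
             : "f" (op1), "f" (op2));
    return output;
#else
    /* Fallback for non-MIPS builds: let the compiler emit the multiply. */
    return op1 * op2;
#endif
}
```

A compiler targeting hard-float MIPS should already emit `mul.s` for `op1 * op2`, so this is mostly useful as a template for instructions the compiler will not generate by itself.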

sg57 · Post by **sg57** » Sat Aug 12, 2006 5:28 am

That was what I was aiming for really... I just want to know it in case I want/need to use it somewhere where I'm stuck.

Is there any real place to use ASM rather than C/C++, as in a more efficient place?

Post by **TyRaNiD** » Sat Aug 12, 2006 7:49 am

For the most part, other than utilising instructions not normally utilised by the C compiler, or instructions it just cannot utilise like the VFPU, it is rarely necessary to use straight asm.

Take for example byte swapping such as doing htonl in the network code, you can use the wsbw instruction (or the __builtin_allegrex_wsbw gcc instruction) which gives you endian swapping in one instruction rather than doing all the shifts.
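
For reference, "all the shifts" look roughly like the sketch below. The function name `swap_bytes32` is made up for illustration; on psp-gcc the `__builtin_allegrex_wsbw` builtin mentioned above would stand in for the whole body, but its exact signature is not shown in this thread, so it is not used here.

```c
#include <stdint.h>

/* swap_bytes32 is a hypothetical name; it reverses the four bytes of a
   32-bit word, which is what htonl has to do on a little-endian CPU. */
static uint32_t swap_bytes32(uint32_t x)
{
    return (x >> 24)                  /* byte 3 -> byte 0 */
         | ((x >> 8) & 0x0000FF00u)   /* byte 2 -> byte 1 */
         | ((x << 8) & 0x00FF0000u)   /* byte 1 -> byte 2 */
         | (x << 24);                 /* byte 0 -> byte 3 */
}
```

Written out like this it costs several shifts, masks and ORs, which is the work a single byte-swap instruction replaces.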

The only other reason to use asm tends to be for size coding, but most people don't care about doing that :)

sg57 · Post by **sg57** » Sun Aug 13, 2006 4:01 am

So, there's no real need to learn all of ASM, but only in those areas you specified? (I don't care about sizing either)

Kojima · Post by **Kojima** » Sun Aug 13, 2006 4:31 am

Hmm, what about speed? Surely a maths intensive, logic intensive loop can be done at a faster pace in asm if you know what you're doing? (I don't :) )

dsn · Post by **dsn** » Sun Aug 13, 2006 3:33 pm

Modern compilers are better than most programmers at optimizing for speed. There's some debate on this of course--and a number of counter-examples--but the level of skill required to write faster assembly than a compiler is beyond the reach of most programmers. Learning assembly can give you a better understanding of how computer systems work, but the days of using it to write entire programs are long gone.

sg57 · Post by **sg57** » Sun Aug 13, 2006 6:43 pm

I can see how. Assembly is pretty Low-level, right?

C is a pretty medium-level language, while Python and Lua are way up there in the high-level class.

I still can't believe that is true <astonished> How long ago were developers sitting there, looking at a bunch of symbols and numbers, a few real strings here and there, then all of a sudden, someone decided to just create a new language to replicate (?) ASM but more 'readable' if you will...

Seems like I need a history lesson in computer programming <sobbing at lack of knowledge>

Anyone know of some good 'history' lessons out there? I'd love to learn more about this so-called 'programming whole programs in ASM is long gone'. Just seems a little creepy how fast this society's technology is growing...

(I guess that hover board coming out in 2016 should come out a tad sooner, eh? :))

Djakku · Post by **Djakku** » Mon Aug 14, 2006 12:49 am

AFAIK, asm is platform dependent, meaning the program you write has to be understood by the processor, and porting an ASM app is a real pain. On the other hand it is faster..
Most of the demos in the PSP scene were coded in assembly. I've always thought that ASM was harder to learn but more efficient and stable than C, but since everyone now recommends C, I'm confused. Either way I'm not a coder (yet.. :) ) so I could be wrong..
Good luck with your app dude :)

groepaz · Post by **groepaz** » Mon Aug 14, 2006 7:42 am

i doubt most of the demos are written in asm :=P

skistovel · Post by **skistovel** » Mon Aug 14, 2006 10:41 am

Assembly still has a place, but it is usually inline in a C/C++ program or in the demo scene!

Usually, after you finish debugging and are near a release version, you can start optimizing your code by replacing often-used functions with assembly. Of course you do that after you have checked how your compiler converts them to assembly and whether it took a "longer" approach.
On the other hand, coding it right in the first place avoids such problems, for example breaking an algorithm into 3 parts rather than doing it all in one line. The problem is that there are different optimizations you can perform, but they vary with the compiler you use and the hardware architecture you are coding for..

My advice? Stick to C/C++, master it and then start looking for "extra" optimizations!

Post by **TyRaNiD** » Tue Aug 15, 2006 3:29 am

Just to prove a point that ASM coding isn't totally dead i've uploaded my mini fire demo to the main ps2dev.org page. Have fun working out my weirdo asm code :)

Saotome · Post by **Saotome** » Tue Aug 15, 2006 4:43 am

Very nice. Good example how to make eboots as small as possible, i like that :)

Not a good solution for 2.00+ firmwares, because of the hardcoded syscalls, but cool anyway ;)

Post by **TyRaNiD** » Tue Aug 15, 2006 6:10 am

True, but I think the technique would work for up to 2.5 as I think 2.6 was where they started randomising syscall numbers, could be wrong mind.

If people really wanted it I could add proper syscalls to the proceedings, not a hard task, just would add 200 bytes or so to the proceeding which for what I was originally writing the code for seems a waste ;)

Fanjita · Post by **Fanjita** » Tue Aug 15, 2006 8:48 am

TyRaNiD wrote:True, but I think the technique would work for up to 2.5 as I think 2.6 was where they started randomising syscall numbers, could be wrong mind.

For the record, they started at 2.5, although the effect was less obvious - perhaps the size of the random constant was a little smaller or something.

bradskins · Post by **bradskins** » Thu Sep 14, 2006 5:28 pm

what are the thoughts on solving this randomization thing? Or is psp assembly officially dead at 2.5?

EDIT: nm I have a working solution

hlide · Post by **hlide** » Thu Sep 14, 2006 7:16 pm

TyRaNiD wrote:For the most part, other than utilising instructions not normally utilised by the C compiler, or instructions it just cannot utilise like the VFPU, it is rarely necessary to use straight asm.

Take for example byte swapping such as doing htonl in the network code, you can use the wsbw instruction (or the __builtin_allegrex_wsbw gcc instruction) which gives you endian swapping in one instruction rather than doing all the shifts.

The only other reason to use asm tends to be for size coding but most people dont care about doing that :)

1) Do you know where I could find a complete list of these __builtin_allegrex_* builtins? I'm on Windows and have nowhere to install Linux (a real one or a VMware one), and it is quite a nightmare to find the source of psp-gcc.

2) Do these intrinsics include VFPU ones? I suppose not, since it looks as if GCC doesn't clobber VFPU registers.

3) Where can I find details of the asm constraints gcc supports for allegrex?

Yes, I know I ask a lot, but up to now I have been unable to find those answers on the internet :(((

hlide · Post by **hlide** » Thu Sep 14, 2006 8:05 pm

siberianstar wrote: note that allegrex doesn't have double registers/instructions

Are you speaking about them ?

Table 1-10 Extensions to the ISA: Load and Store Instructions

OpCode Description
LD Load Doubleword
LDL Load Doubleword Left
LDR Load Doubleword Right
LL Load Linked
LLD Load Linked Doubleword
LWU Load Word Unsigned
SC Store Conditional
SCD Store Conditional Doubleword
SD Store Doubleword
SDL Store Doubleword Left
SDR Store Doubleword Right
SYNC Sync

Table 1-11 Extensions to the ISA: Arithmetic Instructions (ALU Immediate)

OpCode Description
DADDI Doubleword Add Immediate
DADDIU Doubleword Add Immediate Unsigned

Table 1-12 Extensions to the ISA: Multiply and Divide Instructions

OpCode Description
DMULT Doubleword Multiply
DMULTU Doubleword Multiply Unsigned
DDIV Doubleword Divide
DDIVU Doubleword Divide Unsigned

etc.

By the way, does someone know if ll/sc are available on allegrex ?

EDIT:

I tried this one:

Code: Select all

void enter_critical_section(int *r1)
{
  __asm__ volatile (
    "loop1:\n"
    "ll $v1, (%0)\n"
    "blez $v1, loop1\n"
    //"nop\n" // gcc seems to add it itself
    "addi $v0, $v1, -1\n"
    "sc $v0, (%0)\n"
    "beq $v0, 0, loop1\n"
    //"nop\n" // gcc seems to add it itself
    : : "r"(r1) : "v0", "v1", "memory"); // "memory": the asm writes *r1
}

and added in cellshading.c (pspsdk sample for Gu):

Code: Select all


int cs = 1;

int main(int argc, char* argv[]) {
	/* Setup Homebutton Callbacks */
	setupCallbacks();

	enter_critical_section(&cs);

	/* Generate Torus */
...

After compiling and checking the assembly code to see that enter_critical_section contains the ll/sc pair and is called by main, I ran it successfully on my PSP.

I wonder if LD/LDL/LDR/LLD/LWU/SCD/SD/SDL/SDR/etc. work too.
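
For comparison, the logic of the ll/sc loop in enter_critical_section (wait until the counter is positive, then decrement it atomically) can be sketched in portable C11. This assumes a toolchain that provides `<stdatomic.h>`, which the thread does not confirm for psp-gcc, and the function name is made up for illustration.

```c
#include <stdatomic.h>

/* Hypothetical portable counterpart of enter_critical_section. */
static void enter_critical_section_c11(atomic_int *count)
{
    int seen = atomic_load(count);
    for (;;) {
        if (seen <= 0) {
            /* Like the blez branch: nothing to take yet, read again. */
            seen = atomic_load(count);
            continue;
        }
        /* Like sc: store seen - 1 only if nobody changed the value.
           On failure, seen is refreshed with the current value. */
        if (atomic_compare_exchange_weak(count, &seen, seen - 1))
            return;
    }
}
```

With a counter that starts at 1, the first caller gets through and leaves it at 0; any later caller spins until something sets it back to a positive value.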

cheriff · Post by **cheriff** » Thu Sep 14, 2006 11:13 pm

The difference is that all those instructions refer to doublewords, which is simply twice the length of a word (probably 64 bits since PSP is a 32b machine)
On ps2 there are instructions dealing with quadwords - 128bit quantities.

Without looking anything up, I'd hazard a guess that the add, multiply and so on instructions listed above are all integer operations on what would effectively be a long int. (Feel free to add a grain of salt to that one, however)

What neither of the machines support are double precision floats and associated arithmetic. So if you define two doubles and try to add them, the compiler will include all this extra code to deal with the floating point stuff. If you instead added two floats (ie single precision) then this can be done in a single instruction.
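
A small sketch of how easily a double sneaks into float code in C. The macro and function names are made up for illustration, and `_Generic` needs a C11 compiler. An unsuffixed constant such as `0.5` has type double, so the whole expression is evaluated as a double; the `f` suffix keeps it in single precision.

```c
/* Reports which floating-point type an expression has (C11 _Generic). */
#define FP_TYPE_NAME(expr) _Generic((expr), \
    float: "float", double: "double", default: "other")

/* x is promoted to double, multiplied, then converted back to float. */
static float halve_double(float x) { return x * 0.5; }

/* Everything stays in single precision. */
static float halve_single(float x) { return x * 0.5f; }
```

Both functions return the same value here; the difference is only in the type of the intermediate arithmetic, which is where the extra code mentioned above would come from on a machine without double-precision hardware.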

EDIT: After thinking a bit and looking something up, I see they seem to apply to machines with 64 bits in each register (I was assuming you could maybe fuse r2 and r3 together to make 64 bits, like elsewhere). So I guess it all comes down to how wide each GPR is in the PSP. I really shouldn't be posting after 11pm ;)

hlide · Post by **hlide** » Fri Sep 15, 2006 3:55 am

cheriff wrote:The difference is that all those instructions refer to doublewords, which is simply twice the length of a word (probably 64 bits since PSP is a 32b machine)
On ps2 there are instructions dealing with quadwords - 128bit quantities.

Without looking anything up, I'd hazard a guess that the add, multiply and so on instructions listed above are all integer operations on what would effectively be a long int. (Feel free to add a grain of salt to that one, however)

What neither of the machines support are double precision floats and associated arithmetic. So if you define two doubles and try to add them, the compiler will include all this extra code to deal with the floating point stuff. If you instead added two floats (ie single precision) then this can be done in a single instruction.

EDIT: After thinking a bit and looking something up, I see they seem to apply to machines with 64 bits in each register (I was assuming you could maybe fuse r2 and r3 together to make 64 bits, like elsewhere). So I guess it all comes down to how wide each GPR is in the PSP. I really shouldn't be posting after 11pm ;)

Well, trying to compile

Code: Select all

asm ("ld $v0,0($a0)")

gives me, to my surprise, after disassembling the produced .o file

Code: Select all

lw v0,0(a0)
lw v1,4(a0)

instead of an expected instruction exception :/

adrahil · Post by **adrahil** » Fri Sep 15, 2006 4:31 am

Yeah, it's normal since it loads the doubleword into two registers, which each contain a word :)
v0 now contains the 4 upper, and v1 the 4 lower...

cheriff · Post by **cheriff** » Fri Sep 15, 2006 9:05 am

cool, so it does split the 64 bits over two 32b registers, I wasn't sure if that was the case... My MIPS has unfortunately been slipping recently and I'm out of practice :(

hlide · Post by **hlide** » Sat Sep 16, 2006 8:23 am

adrahil wrote:Yeah, it's normal since it loads the doubleword into two registers, which each contain a word :)
v0 now contains the 4 upper, and v1 the 4 lower...

well, I thought in MIPS64, v0 is 64-bit wide and not 32-bit wide. So a "ld v0,(a0)" was not exactly equivalent to "lw v0,0(a0);lw v1,4(a0)" for me, because v1 should not be modified, being another 64-bit register. Just imagine my x86 assembler transforming "mov eax,[di]" into "mov ax,[di + 0]; mov dx,[di + 2]" because I want to generate a binary for an 8086; it would be weird and wrong!!! I have the same reaction here!

but if you tell me that in fact a 64-bit register in mips64 is spanning two registers (here, v0 and v1) the same way as "ax" is spanning "ah" and "al", well I can understand its transformation in mips32.

cheriff · Post by **cheriff** » Sat Sep 16, 2006 10:41 am

hlide wrote: well, I thought in MIPS64, v0 is 64-bit wide and not 32-bit wide.

This is in fact true on a 64-bit platform. From some random Google search on MIPS64:

LD rt, offset(base) Load doubleword. The contents of the 64-bit doubleword at the memory location specified by the effective address are loaded as a signed value and stored in GPR rt.

However, on MIPS32:

ld Rdest, address Load Double-Word Load the 64-bit quantity at address into registers Rdest and Rdest + 1

That is the architectural definition of what the instruction does. Now presumably the PSP doesn't support the LD opcode (or at least not as far as the assembler is aware), so the assembler instead treats it as a pseudo-op and generates equivalent instructions.

It is assumed that the programmer is aware that LD into v0 will also overwrite v1, as per the definition.
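
In C terms, what the two generated lw instructions do can be sketched like this. The struct and function names are made up for illustration; `first` stands for Rdest and `second` for Rdest + 1. Which of the two ends up holding the upper half of the 64-bit value depends on the byte order of the machine.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the register pair Rdest / Rdest + 1. */
typedef struct {
    uint32_t first;   /* word loaded from offset 0 */
    uint32_t second;  /* word loaded from offset 4 */
} reg_pair;

/* Mimics "ld Rdest, 0(base)" expanded into two 32-bit loads. */
static reg_pair load_doubleword(const void *base)
{
    const unsigned char *p = (const unsigned char *)base;
    reg_pair r;
    memcpy(&r.first,  p + 0, sizeof r.first);   /* lw Rdest,   0(base) */
    memcpy(&r.second, p + 4, sizeof r.second);  /* lw Rdest+1, 4(base) */
    return r;
}
```

On a little-endian machine `first` receives the low 32 bits of the doubleword and `second` the high 32 bits; on a big-endian machine it is the other way round.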

Hope this helps somewhat :)

hlide · Post by **hlide** » Sat Sep 16, 2006 10:43 am

a lot :)

Raphael · Post by **Raphael** » Sun Sep 17, 2006 8:25 am

The PSP is a MIPS32 architecture actually. That's why 64bit ops will be aliased to two 32bit instructions.