Try using aligned addresses for source/dest. (I'd start with page-aligned addresses, then reduce the alignment until it crashes again, so you can find out what the minimum required alignment is.)
int sceDmacMemcpy(void *dest, const void *source, unsigned int size);
int sceDmacTryMemcpy(void *dest, const void *source, unsigned int size);
I used sceDmacMemcpy only once, in an experiment to see if it would speed up transferring data from RAM to VRAM (or VRAM to VRAM; I don't have the source code of those tests anymore). Transferring an array of 512 bytes worked fine for me, and an array of 272*228*2 (124032) bytes worked too.
If the source data was placed in RAM, either I got lucky with the memory alignment or alignment didn't matter.
If I did place the source data in VRAM after all, the start address would've been 0x44100000.
The destination address in all cases was at pixel 0 of a display line in VRAM (in 15-bit color mode).
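To see why those addresses were probably "lucky", here is a small sketch of the arithmetic. The VRAM base of 0x44000000 and the 512-pixel line stride are my assumptions (they're the usual values, but they aren't stated above), and the helper names are made up for illustration.

```c
#include <stdint.h>

/* Assumptions, not taken from the post: uncached VRAM base address and a
   512-pixel line stride, with 2 bytes per pixel for 15/16-bit color. */
#define VRAM_BASE_UNCACHED 0x44000000u
#define LINE_STRIDE_PIXELS 512u
#define BYTES_PER_PIXEL    2u

/* Address of pixel 0 of display line `line` (hypothetical helper). */
static uint32_t line_start_address(uint32_t line)
{
    return VRAM_BASE_UNCACHED + line * LINE_STRIDE_PIXELS * BYTES_PER_PIXEL;
}

/* Largest power of two that divides `addr`, i.e. its natural alignment.
   Returns 0 for addr == 0. */
static uint32_t alignment_of(uint32_t addr)
{
    return addr & (~addr + 1u);
}
```

Under those assumptions every line start is a multiple of 1024 bytes, and 0x44100000 is aligned to 1 MiB, so neither address says anything about what happens with small alignments.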
I was probably just lucky that my parameters met the alignment requirements, but at least you should be able to create something that works now. After that you can try breaking it to find out how many bytes of alignment are needed (my guess would be 2 or 4 bytes).
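A rough sketch of how that "break it on purpose" test could look. Everything here except the sceDmacMemcpy signature is hypothetical: the helper names are invented, I'm assuming a negative return value means failure, and fake_copy_needs_4 is only a stand-in so the logic can be tried off the device. On the PSP you'd pass sceDmacMemcpy instead.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same shape as sceDmacMemcpy, so either can be passed in. */
typedef int (*copy_fn)(void *dest, const void *source, unsigned int size);

/* Round p up to a multiple of `alignment` (a power of two), then add
   `offset` bytes to get a deliberately less-aligned address. */
static uint8_t *align_up_plus(uint8_t *p, size_t alignment, size_t offset)
{
    uintptr_t a = ((uintptr_t)p + alignment - 1) & ~(uintptr_t)(alignment - 1);
    return (uint8_t *)(a + offset);
}

/* Try alignments start_alignment, start_alignment/2, ..., 1 and return the
   smallest one for which the copy succeeded and the data matched (0 if even
   the first attempt failed). Each pool must hold at least
   size + 3 * start_alignment bytes; start_alignment must be a power of two. */
static size_t smallest_working_alignment(copy_fn copy, uint8_t *src_pool,
                                         uint8_t *dst_pool,
                                         size_t start_alignment,
                                         unsigned int size)
{
    size_t best = 0;
    for (size_t a = start_alignment; a >= 1; a /= 2) {
        /* Aligned to exactly `a`: a multiple of a, but not of 2*a. */
        uint8_t *src = align_up_plus(src_pool, 2 * start_alignment, a);
        uint8_t *dst = align_up_plus(dst_pool, 2 * start_alignment, a);
        for (unsigned int i = 0; i < size; i++)
            src[i] = (uint8_t)(i * 7u + 1u);   /* recognizable pattern */
        memset(dst, 0, size);
        if (copy(dst, src, size) < 0 || memcmp(dst, src, size) != 0)
            break;
        best = a;
    }
    return best;
}

/* Stand-in for testing off the device: pretends 4-byte alignment is
   required. This is NOT how the real DMA controller is known to behave. */
static int fake_copy_needs_4(void *dest, const void *source, unsigned int size)
{
    if ((((uintptr_t)dest | (uintptr_t)source) & 3u) != 0)
        return -1;
    memcpy(dest, source, size);
    return 0;
}
```

On real hardware a bad alignment may hang or crash instead of returning an error, so you'd want to print the current alignment before each attempt. You will probably also need to write back the data cache before the copy when the source is in cached RAM, otherwise the comparison can fail for reasons that have nothing to do with alignment.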