What I've come up with is to create an extra 8-bit framebuffer, palettized to a grayscale palette. For each light I render a light texture (which is basically a sphere gradient) additively into this buffer. Afterwards I render a fullscreen quad with the light buffer bound as the texture and the texture function set to GU_TFX_MODULATE.
example light texture (point light)

Problem: you can only render to a 16- or 32-bit target, not an 8-bit one.
Could I still do this by treating the light buffer as 32-bit? Each color component (R, G, B, A) would then hold a different light pixel, so one 32-bit light-buffer pixel would actually store 4 grayscale pixels, and the point-light texture would be packed the same way (see pic).
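The packing itself is straightforward bit manipulation. A sketch, assuming a little-endian layout where R is the low byte (which is how the PSP's 32-bit ABGR format sits in memory, though treat that as an assumption):

```c
#include <stdint.h>

/* Pack 4 adjacent 8-bit grayscale light pixels into one 32-bit word,
   one pixel per color component. p0 lands in the low byte (R). */
static uint32_t pack4(uint8_t p0, uint8_t p1, uint8_t p2, uint8_t p3)
{
    return (uint32_t)p0 |
           ((uint32_t)p1 << 8) |
           ((uint32_t)p2 << 16) |
           ((uint32_t)p3 << 24);
}

/* Pull pixel i (0..3) back out of a packed word. */
static uint8_t unpack(uint32_t packed, int i)
{
    return (uint8_t)(packed >> (8 * i));
}
```

So a row of the 8-bit light buffer maps onto a quarter-width row of the 32-bit target, pixel 3 ending up in what the hardware calls the alpha component.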
The main thing I don't know about this setup is whether the alpha component of the light buffer gets proper additive blending. In other words, is each component treated the same during blending, or is alpha a special case?
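Concretely, what the trick depends on is a saturating add applied identically to all four byte lanes, alpha included. A software model of that behavior (whether the hardware actually does this for the alpha lane is exactly the open question):

```c
#include <stdint.h>

/* Per-channel saturating add over a packed 32-bit pixel: each of the
   four byte lanes (R, G, B, A) is added and clamped to 255
   independently. This is the blending behavior the packed-light-buffer
   scheme requires. */
static uint32_t add_sat_rgba(uint32_t dst, uint32_t src)
{
    uint32_t out = 0;
    for (int i = 0; i < 4; ++i) {
        uint32_t a = (dst >> (8 * i)) & 0xFFu;
        uint32_t b = (src >> (8 * i)) & 0xFFu;
        uint32_t s = a + b;
        if (s > 255u) s = 255u; /* clamp, no carry into the next lane */
        out |= s << (8 * i);
    }
    return out;
}
```

If the hardware treats alpha this way, every fourth light pixel blends correctly; if alpha is special-cased (e.g. replaced rather than blended), those pixels would come out wrong.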