Just quickly: would it not be quicker to use the Extension register as a fixed offset into 256-byte pages? It can be used with both pointers, and then you only need to change the high bytes to move through pages.
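Roughly what I have in mind, as an untested sketch rather than working code: SRC, DST and COUNT are made-up names, and COUNT is assumed to be between 1 and 128 because E is treated as a signed displacement. A displacement of -128 tells the SC/MP to use E in its place; your assembler may want that spelled differently.
Code:
LDI /SRC
XPAH P1
LDI #SRC&255
XPAL P1 ;P1 = source base (hypothetical label)
LDI /DST
XPAH P2
LDI #DST&255
XPAL P2 ;P2 = destination base (hypothetical label)
LDI #0
XAE ;E = 0, the offset shared by both pointers
COPY:
LD -128(P1) ;AC = byte at P1+E
ST -128(P2) ;Store it at P2+E
LDE ;AC = E
CCL ;ADI adds the carry in, so clear it first
ADI #1
XAE ;E = E+1
LDE ;AC = new offset
XRI #COUNT ;All bytes copied?
JNZ COPY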
In my SCRUMPI serial bootstrap I use a fixed page endpoint (rather than creating a general COPY as you did), but you could make that part of a pointer. Here is an instance of a similar approach; PUTCHR is the serial transmit routine.
Code:
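LDI /PUTCHR-1 ;High byte of PUTCHR-1
XPAH P3
LDI #PUTCHR-1&255 ;Low byte of PUTCHR-1
XPAL P3 ;P3 = PUTCHR-1, since the PC is incremented before each fetch
REPEAT:
LDI /MESSAGE-1
XPAH P2
LDI #MESSAGE-1&255
XPAL P2 ;P2 = MESSAGE-1
LOOP:
LD @1(P2) ;Fetch the byte at P2, then P2 = P2+1
XPPC P3 ;Call PUTCHR to transmit it
LD -1(P2) ;Reload the byte just sent (presumably PUTCHR changes AC)
XRI #0x04 ;Is it EOT
JNZ LOOP ;No, send the next byte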