Pretty smart, but those jumps to get the exchanges right are hard to follow. I like how you use E to save the value. I've never had as much trouble with forks in the data flow as I do on the SC/MP.
In the end, though, the core loop doesn't look any faster to me. I unrolled the 256-byte copy loop by two, and now the speed is good enough for me:
Code:
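copyloop1:
        ccl                 ; clear CY for the byte counter
        ldi 2               ; counter starts at 2: 127 passes x 2 bytes = 254 bytes
copypage1:
        xae                 ; E = counter
        ld @1(p1)           ; copy from source to destination
        st @1(p2)
        ld @1(p1)           ; copy from source to destination
        st @1(p2)
        lde
        adi 2               ; counter += 2, wraps to 0 after 127 passes
        jnz copypage1
        ld @1(p1)           ; copy next to last byte of page (p1/p2 now at byte 254)
        st @1(p2)
        ld 0(p1)            ; copy last byte of page, no auto-increment
        st 0(p2)
        ccl                 ; 16 bit increment of p1
        xpal p1             ; (auto-indexing only carries within the low 12 bits)
        adi 1
        xpal p1
        xpah p1
        adi 0
        xpah p1
        ccl                 ; 16 bit increment of p2
        xpal p2
        adi 1
        xpal p2
        xpah p2
        adi 0
        xpah p2
        ld @-1(p3)          ; decrement p3 (page counter); the loaded byte is not used
        xpal p3             ; AC = low byte of p3
        xae                 ; keep it in E
        lde
        xpal p3             ; put the low byte back into p3
        lde                 ; reload it to test for zero
        jnz copyloop1       ; next page until the low byte of p3 reaches 0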
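To double-check the byte accounting of the unrolled loop, a quick model in Python helps. It is only a sketch, not SC/MP code: the names `ptr_add`, `inner_passes` and `copy_pages` are made up, it assumes P1 and P2 point at the first byte of 256-byte-aligned pages, and it models the SC/MP rule (as I understand the INS8060) that auto-indexed pointer updates only carry within the low 12 bits. In this model the tail copies byte 254 with one more auto-increment, byte 255 without one, and then does a full 16-bit +1.

```python
MASK12 = 0x0FFF  # assumed SC/MP rule: pointer adds carry only within the low 12 bits


def ptr_add(ptr, disp):
    """Auto-indexed pointer update: bits 12-15 are left untouched."""
    return (ptr & 0xF000) | ((ptr + disp) & MASK12)


def inner_passes(start=2, step=2):
    """Passes of the 'lde / adi step / jnz' counter until the 8-bit sum wraps to 0."""
    e, passes = start, 0
    while True:
        passes += 1
        e = (e + step) & 0xFF
        if e == 0:
            return passes


def copy_pages(mem, src, dst, pages):
    """Copy `pages` 256-byte pages; assumes src and dst are 256-byte aligned."""
    p1, p2 = src, dst
    for _ in range(pages):
        # Unrolled part: inner_passes() passes, 2 bytes each, auto-incrementing.
        for _ in range(inner_passes() * 2):
            mem[p2] = mem[p1]
            p1, p2 = ptr_add(p1, 1), ptr_add(p2, 1)
        # Next to last byte: one more auto-increment stays inside the 4K page.
        mem[p2] = mem[p1]
        p1, p2 = ptr_add(p1, 1), ptr_add(p2, 1)
        # Last byte: plain indexed access, pointers unchanged.
        mem[p2] = mem[p1]
        # Full 16-bit increment, so a 4K boundary is crossed correctly.
        p1, p2 = (p1 + 1) & 0xFFFF, (p2 + 1) & 0xFFFF
    return p1, p2


if __name__ == "__main__":
    mem = bytearray(0x10000)
    for i in range(0x200):
        mem[0x0F00 + i] = (i * 7 + 3) & 0xFF
    end = copy_pages(mem, 0x0F00, 0x2F00, 2)  # crosses the 0x1000 and 0x3000 boundaries
    print(inner_passes(), [hex(p) for p in end])  # prints: 127 ['0x1100', '0x3100']
    print(mem[0x2F00:0x3100] == mem[0x0F00:0x1100])  # prints: True
```

With `start=0`, `inner_passes` returns 128, so all 256 bytes would go through auto-increment and the final increment from 0x0FFF would wrap to 0x0000 instead of reaching 0x1000. Starting the counter at 2 leaves the tail for the explicit 16-bit add.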
I should be well on my way with my boot ROM now. Thanks a lot!
Michael