Hello Heinrich,
1) There are two reasons for the delay on power up. One is that the AT3 module itself has to initialize and begin transmitting (this takes from 200 to 500ms). Then on the receiver side (the USB stick), the module has to find the transmitting signal via search mode. Search mode can vary in how fast it will find the signal.
There is not much you can do about the AT3 initialization. Because the AT3 has two chips, it does take a bit longer to initialize than the AP1 or AP2, but of course you cannot run sensRcore on those other chips.
On the receiver side, make sure you disable low priority search (under advanced, set the timeout to 0). You can easily test how long it takes to acquire a 32 Hz signal by setting up two devices in ANTware and counting how many messages go by before the receiver finds the signal. I did a few runs of this and it appears to take as long as 500 ms.
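To turn a message count from that kind of test into a time, divide the count by the message rate. Here is a minimal sketch in Python. The function name is made up for illustration and is not part of any ANT tool.

```python
def acquisition_time_ms(messages_missed, message_rate_hz):
    """Time that passed while the receiver was still searching.

    messages_missed: messages the transmitter sent before the
                     receiver locked on (counted by hand in the test).
    message_rate_hz: transmitter message rate, e.g. 32 for 32 Hz.
    """
    return messages_missed * 1000.0 / message_rate_hz


# 16 missed messages at 32 Hz corresponds to half a second.
print(acquisition_time_ms(16, 32))
```

The resolution of this method is one message interval, so at 32 Hz you can only measure to about 31 ms.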
So with your setup you are looking at about 1 second total time to find your signal, which is about what you are seeing. The only way to reduce the search time would be to raise the message rate above 32 Hz, that is, to shorten the channel period (the higher the message rate, the faster the search). Of course power usage goes up as you raise the message rate, but this might not be a problem for you.
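For reference, here is a rough sketch of that arithmetic in Python. The channel period is set as a count of 1/32768 s ticks, so the message rate is 32768 divided by the count. The function names are only for illustration, and the 500 ms search figure is just what I saw in my few runs, not a guaranteed value.

```python
ANT_TICKS_PER_SECOND = 32768  # channel period is counted in 1/32768 s ticks


def channel_period_counts(message_rate_hz):
    """Channel period value that gives the requested message rate."""
    return round(ANT_TICKS_PER_SECOND / message_rate_hz)


def power_up_delay_ms(init_ms, search_ms):
    """Total delay: module initialization plus receiver search."""
    return init_ms + search_ms


# A shorter channel period (smaller count) means a higher message rate.
for rate_hz in (4, 32, 64):
    print(rate_hz, "Hz ->", channel_period_counts(rate_hz), "counts")

# AT3 initialization of 200 to 500 ms plus roughly 500 ms of search
# (the 500 ms is an observed figure from a few runs at 32 Hz).
print("best case :", power_up_delay_ms(200, 500), "ms")
print("worst case:", power_up_delay_ms(500, 500), "ms")
```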
2) I can't say exactly what you should do, but I think you are on the right track. To reduce search times you need a very high message rate (a very short channel period). There is no way around this. I think you can measure delay more precisely by counting messages. Being able to monitor both sides is key, and I would suggest doing some simulations with ANTware or even simple custom programs first.
3) If you made a custom module using AP2, you might be able to reduce the delay a bit since AP2 is faster to initialize. The AT3 module is not optimized for speed, so if you did your own design you might be able to make the initialization faster. You do need to understand that a target of 0.2 s end to end is definitely pushing the limits of what ANT is capable of.