Would it then be correct to oversimplify things by saying that the source and destination referred to in the NAT/Mangle etc. settings are from the perspective of the point of origin (whichever side starts the conversation)?
I guess you could say that, but it is really oversimplified and thus a bit ambiguous.
In NAT, the decision whether and how to translate any addresses is made ONLY on the first packet of the connection. The decision is then saved in the conntrack entry and applied to every subsequent packet. Because of that, NAT rules never see replies or any other subsequent packets (so your statement is true here).
In the firewall filter, USUALLY only the first packet is taken into account for filtering, while replies and subsequent packets are USUALLY handled by an "accept established/related" rule, which simply accepts any packet that belongs to the same connection, no matter whether it goes from A to B or from B to A. (This applies to usual configurations, and in that case your statement is again true. There may be situations where it does not apply.)
In RAW, every packet is processed (unless the whole connection is fasttracked), so I wouldn't dare simplify it the way you did because it becomes imprecise. RAW looks purely at individual packets and runs before each packet is assigned to its connection (very simply said).
In Mangle, it depends on the user's needs. Very simply said, it may be considered a mix of filter and RAW: it is often used to process each packet separately, but unlike RAW it is aware of connections, so it really depends on how people set it up.
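If it helps to see the difference between "per packet" and "per connection" in action, here is a very small Python sketch. It is NOT RouterOS code and not how the router is implemented; it is only a toy model of the order of decisions described above. The class ToyRouter, its counters, the "only the LAN may open new connections" rule and all addresses are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str


class ToyRouter:
    """Toy model only: RAW sees every packet, NAT rules see only the first one."""

    def __init__(self, public_ip, lan_prefix):
        self.public_ip = public_ip      # hypothetical WAN address used for src-nat
        self.lan_prefix = lan_prefix    # hypothetical LAN prefix, compared as a string
        self.conntrack = {}             # (src, dst) as seen on arrival -> rewritten packet
        self.raw_seen = 0               # packets that passed the RAW stage
        self.new_conn_filter_checks = 0 # times the "new connection" filter rules ran
        self.nat_rule_lookups = 0       # times the NAT rules were consulted

    def process(self, packet):
        # RAW: runs for every single packet, before any connection is known.
        self.raw_seen += 1

        # Conntrack lookup: does this packet belong to a known connection?
        key = (packet.src, packet.dst)
        if key in self.conntrack:
            # "accept established/related": direction does not matter, and the
            # saved NAT decision is applied without consulting the NAT rules.
            return self.conntrack[key]

        # New connection: the filter rules for new connections run once.
        self.new_conn_filter_checks += 1
        if not packet.src.startswith(self.lan_prefix):
            return None  # dropped: only the LAN may open connections here

        # NAT rules are consulted once, for this first packet only.
        self.nat_rule_lookups += 1
        translated = Packet(self.public_ip, packet.dst)

        # Save the decision for both directions of the connection.
        self.conntrack[key] = translated
        self.conntrack[(packet.dst, self.public_ip)] = Packet(packet.dst, packet.src)
        return translated


router = ToyRouter("198.51.100.1", "192.168.88.")
print(router.process(Packet("192.168.88.10", "203.0.113.5")))  # first packet, gets src-nat
print(router.process(Packet("203.0.113.5", "198.51.100.1")))   # reply, translated back
print(router.raw_seen, router.nat_rule_lookups)                # RAW ran twice, NAT rules once
```

The point of the counters: RAW ran for both packets, while the NAT rules were looked at only once, and the reply was translated back purely from the saved entry. A real router tracks ports, protocols, timeouts and much more, so take this only as a picture of the idea.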
In regard to your second question: you seem to be a curious soul! I like that.
I would definitely recommend that you go through the following article:
If you really understand this in every detail, it can be used to explain practically every process that happens in your router. There are also some examples at the bottom, including VPN, tunnels and VLANs.
Hopefully that helps.