My only additional consideration would be that most encryption algorithms in use have 128-bit block sizes, so software crypto on 64-bit architectures is significantly faster.
Typically, encryption/decryption routines use SIMD instructions, where the data size can be anything, or other contemporary instructions. Think about the table of HW-accelerated encryption algorithms … it has no direct relation to either CPU bitness or CPU architecture; what matters is whether the accelerators (instructions) are available or not.
Think of floating-point arithmetic: numbers can be single precision (32 bits per number), double precision (64 bits), and some units also featured extended precision (80 bits). That has nothing to do with the bitness of a CPU: these formats were already available in the Intel 80287 (the FP coprocessor for the 80286, which was a 16-bit processor) and the 8087 (the coprocessor for the 16-bit 8086).
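As a small illustration of the point that the width of a floating-point format is independent of CPU bitness, here is a sketch using Python's standard `struct` module. The variable names are just for this example, and the 80-bit extended format is left out because `struct` has no format code for it.

```python
import struct

# "<f" is IEEE 754 single precision, "<d" is double precision.
# The packed sizes are fixed by the format, not by the CPU word size.
single = struct.pack("<f", 0.1)
double = struct.pack("<d", 0.1)

single_bits = len(single) * 8
double_bits = len(double) * 8

# Round-tripping shows the precision difference between the two formats.
single_back = struct.unpack("<f", single)[0]
double_back = struct.unpack("<d", double)[0]

print(single_bits, double_bits)
print(single_back == 0.1, double_back == 0.1)
```

The sizes come out the same on a 32-bit and a 64-bit build of the interpreter, which is the whole point: the data format and the CPU word size are separate things.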
Yes, a 64-bit CPU enables native integer arithmetic on 64-bit integers, but 64 bits is not enough for most contemporary algorithms. But you’re right: compound arithmetic is faster when the large numbers can be split into fewer components, so even 128-bit integer arithmetic will take fewer CPU cycles on a 64-bit CPU than on a 32-bit CPU.
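To make the "fewer components" argument concrete, here is a rough sketch that adds two 128-bit numbers limb by limb and counts the add-with-carry steps. The function `add_limbs` and its parameters are made up for this illustration; it models only the number of limb operations, not real cycle counts, which depend on the actual CPU.

```python
def add_limbs(a, b, total_bits=128, limb_bits=64):
    """Add two unsigned integers limb by limb.

    Returns (sum modulo 2**total_bits, number of limb additions).
    """
    mask = (1 << limb_bits) - 1
    carry = 0
    result = 0
    steps = 0
    for shift in range(0, total_bits, limb_bits):
        limb_a = (a >> shift) & mask
        limb_b = (b >> shift) & mask
        s = limb_a + limb_b + carry      # one add-with-carry per limb
        result |= (s & mask) << shift    # keep the low limb_bits bits
        carry = s >> limb_bits           # propagate the carry upward
        steps += 1
    return result, steps


a = (1 << 127) + 0xFFFFFFFFFFFFFFFF
b = 1

sum64, steps64 = add_limbs(a, b, limb_bits=64)  # as on a 64-bit CPU
sum32, steps32 = add_limbs(a, b, limb_bits=32)  # as on a 32-bit CPU

print(steps64, steps32)
```

Both calls give the same sum, but the 64-bit split needs half as many limb additions as the 32-bit split. For multiplication the gap is wider, since schoolbook multiplication grows with the square of the limb count.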