Flex Logix has developed an alternative RAM architecture that can speed up neural networks significantly by providing very fast, parallelised access to edge weights and their multiply-accumulate operations: https://spectrum.ieee.org/tech-talk/semiconductors/processors/flex-logix-says-its-solved-deep-learnings-dram-problem
Intuitively, this doesn’t seem too far off from doing the same for Cuckoo Cycle’s edge counting/flagging? But then I only know the high-level idea of how Cuckoo Cycle works, so maybe it doesn’t apply at all?
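For context, the edge counting/flagging I mean is Cuckoo Cycle’s trimming step: count how many edges touch each node, then discard edges with a degree-1 endpoint, since those can’t lie on a cycle. Here’s a toy sketch of that step, under my own assumptions (a SHA-256 stand-in for the real siphash-based edge generation, and a tiny graph size):

```python
import hashlib

N_EDGES = 1 << 10   # toy graph size; real parameters are far larger
N_NODES = N_EDGES   # nodes on each side of the bipartite graph

def edge(nonce: int) -> tuple[int, int]:
    """Derive a pseudo-random bipartite edge from an edge index.
    (Toy hash: real Cuckoo Cycle uses siphash keyed by the header.)"""
    h = hashlib.sha256(nonce.to_bytes(8, "little")).digest()
    u = int.from_bytes(h[:4], "little") % N_NODES
    v = int.from_bytes(h[4:8], "little") % N_NODES
    return u, v

def trim_round(edges):
    """One trimming round: count edges incident to each node, then
    drop any edge whose u- or v-endpoint has degree 1."""
    deg_u = [0] * N_NODES
    deg_v = [0] * N_NODES
    for u, v in edges:
        deg_u[u] += 1
        deg_v[v] += 1
    return [(u, v) for u, v in edges if deg_u[u] > 1 and deg_v[v] > 1]

edges = [edge(i) for i in range(N_EDGES)]
for _ in range(4):
    before = len(edges)
    edges = trim_round(edges)
    print(f"{before} -> {len(edges)} edges after trimming")
```

The counting pass is exactly the memory-bandwidth-bound part (random reads/writes of tiny counters across a huge array), which is why an architecture built for massively parallel small accesses seemed relevant to me.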