Again, go to iHop, crazy calories per dollar
To be clear, these snapshots work because every snapshot on a given node guarantees that the state on that node is derived only from the messages that came before the snapshot barrier on each of its input queues. Because Flink can keep multiple copies of state, the consumer can keep going when it has received some, but not all, of its barriers. Ultimately, the checkpointed state will only reflect the messages up to the barriers, so after a failure every consumer can restore that state and replay messages starting right after the barrier. Hope this makes sense.
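If it helps to see the idea concretely, here is a toy sketch in plain Python. It is not Flink's API or its actual implementation. The names `BARRIER`, `run_consumer`, and `recover` are all made up for illustration, and it assumes a consumer that just sums integers from two input queues, with exactly one barrier per queue. The consumer keeps two copies of state: a live total that sees every message, and a snapshot total that only sees messages that arrived before the barrier on their own queue.

```python
# Toy model of barrier-based snapshots. Hypothetical names, not Flink's API.
BARRIER = "BARRIER"  # assumed marker that the checkpoint coordinator injects


def run_consumer(channels):
    """Read the input queues round-robin and sum every integer message.

    channels maps a queue name to a list of ints containing one BARRIER.
    Returns the live total and a checkpoint (snapshot total plus the
    position in each queue from which to replay after a failure).
    """
    live_total = 0        # copy of state that keeps moving
    snapshot_total = 0    # copy of state frozen per queue at its barrier
    replay_from = {}      # queue name -> index of first post-barrier message
    positions = {name: 0 for name in channels}

    while any(positions[n] < len(channels[n]) for n in channels):
        for name, messages in channels.items():
            i = positions[name]
            if i >= len(messages):
                continue  # this queue is drained
            item = messages[i]
            positions[name] = i + 1
            if item == BARRIER:
                replay_from[name] = i + 1
                continue
            live_total += item
            if name not in replay_from:
                # Barrier not seen on this queue yet, so the message
                # belongs in the snapshot as well.
                snapshot_total += item

    checkpoint = {"total": snapshot_total, "replay_from": replay_from}
    return live_total, checkpoint


def recover(channels, checkpoint):
    """Restore the snapshot, then replay each queue from after its barrier."""
    total = checkpoint["total"]
    for name, messages in channels.items():
        total += sum(messages[checkpoint["replay_from"][name]:])
    return total


channels = {
    "a": [1, 2, 3, BARRIER, 4],
    "b": [10, BARRIER, 20, 30],
}
live_total, checkpoint = run_consumer(channels)
recovered_total = recover(channels, checkpoint)
```

In this run the consumer processes 20 from queue `b` (which is after `b`'s barrier) before the barrier on queue `a` has arrived, yet the snapshot only contains the pre-barrier messages from both queues. Restoring the snapshot and replaying from the recorded positions gets back to the same total the live state had, which is the whole point of the barrier.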