Alexa is listening for a keyword, the wake word, which alerts the system that you want to interact with it in some fashion. That information is not locally stored. There's nothing stored locally on the device. Once the keyword is recognized, it follows that. There's a light on the device that tells you that the device is now active, and the subsequent sound in the room is then streamed to the cloud.
The first thing the cloud does is to double-check the wake word. The software on the device often isn't sophisticated, so it occasionally makes mistakes. The cloud will then recognize that it wasn't a wake word, and then it will shut off the stream. However, if the cloud confirms that the wake word was used, that stream is then taken through a natural language processing system, which essentially produces a text output of it. From there, the systems take the next action that the user was asking for.