This illustrates the basic architecture of the Voice Ordering Framework ecosystem, which comprises cloud services, cloud storage files, a database, and clients. At its center is the Chatbot Service, whose natural language understanding (NLU) pipeline processes incoming requests from the customers of service providers (e.g. a customer at a pizza shop). An example customer request is “give me a pizza”. The Chatbot Service also contains the logic that decides what to do next during a given turn, referred to as “dialogue control flow management”. Turn-specific tasks include processing the user input, which may be a response to a question the chatbot asked during the previous turn; internal management actions, such as storing newly acquired information and state; determining the next action, which is usually to create a prompt or response for the user; and generating the natural language text of that prompt or response (e.g. “what size pizza do you want?”).
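The per-turn tasks above can be illustrated with a small slot-filling loop, which is one common way to structure dialogue control flow. This is a minimal sketch under stated assumptions, not the framework's implementation: the names `DialogueState`, `understand`, and `handle_turn`, the `item`/`size` slots, and the keyword matching that stands in for the NLU pipeline are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical order schema; the framework's real slots are not specified.
REQUIRED_SLOTS = ["item", "size"]
PROMPTS = {
    "item": "What would you like to order?",
    "size": "What size pizza do you want?",
}
SIZES = ("small", "medium", "large")


@dataclass
class DialogueState:
    slots: dict = field(default_factory=dict)
    pending_slot: Optional[str] = None  # slot the chatbot asked about last turn


def understand(text: str, pending_slot: Optional[str]) -> dict:
    """Toy keyword matcher standing in for the NLU pipeline (hypothetical)."""
    words = [w.strip(".,!?") for w in text.lower().split()]
    found = {}
    if "pizza" in words:
        found["item"] = "pizza"
    for size in SIZES:
        if size in words:
            found["size"] = size
    # A bare answer to last turn's question fills the slot that was asked about.
    if pending_slot == "item" and not found and words:
        found["item"] = " ".join(words)
    return found


def handle_turn(text: str, state: DialogueState) -> str:
    # 1. Process user input, using last turn's question as context.
    found = understand(text, state.pending_slot)
    # 2. Internal management: store newly acquired information in the state.
    state.slots.update(found)
    # 3. Determine the next action: ask for the first missing slot, else confirm.
    missing = [s for s in REQUIRED_SLOTS if s not in state.slots]
    if missing:
        state.pending_slot = missing[0]
        # 4. Generate the natural language text of the prompt.
        return PROMPTS[missing[0]]
    state.pending_slot = None
    return f"OK, one {state.slots['size']} {state.slots['item']}."


if __name__ == "__main__":
    state = DialogueState()
    for utterance in ["give me a pizza", "large"]:
        print(handle_turn(utterance, state))
```

Here the first turn (“give me a pizza”) fills `item` and produces the size prompt; the second turn (“large”) is interpreted as the answer and completes the order. Keeping `pending_slot` in the state is what lets a later turn be read as a response to the chatbot's previous question.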

Clients must perform speech recognition and speech generation (speech synthesis). To establish a dialogue session for a customer, a client calls the CreateDialogueSession method of the DialogueSession object. Once the session has been created and is in progress, the client repeatedly calls the HandleTurn API, passing the user input for the current turn to the Chatbot Service, and must process the chatbot's response within that same turn.
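The client side of this exchange can be outlined as a session loop. This is a hedged sketch only: the Python methods `create_dialogue_session` and `handle_turn`, the `TurnResponse` type with its `session_done` flag, and the in-memory `FakeService` are hypothetical placeholders for CreateDialogueSession and HandleTurn, whose actual signatures are not described here. Speech recognition and speech generation are passed in as plain callables so that any engine could be plugged in.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


@dataclass
class TurnResponse:
    text: str           # chatbot prompt or response for this turn
    session_done: bool  # hypothetical flag: the dialogue session has ended


class ChatbotService(Protocol):
    # Hypothetical stand-ins for CreateDialogueSession and HandleTurn.
    def create_dialogue_session(self, customer_id: str) -> str: ...
    def handle_turn(self, session_id: str, user_input: str) -> TurnResponse: ...


def run_client(
    service: ChatbotService,
    customer_id: str,
    recognize_speech: Callable[[], str],  # client-side speech recognition
    speak: Callable[[str], None],         # client-side speech generation
) -> None:
    # Establish the dialogue session once, then loop over turns.
    session_id = service.create_dialogue_session(customer_id)
    while True:
        user_input = recognize_speech()
        response = service.handle_turn(session_id, user_input)
        # Process the chatbot's response within the same turn.
        speak(response.text)
        if response.session_done:
            break


class FakeService:
    """Hypothetical in-memory service used only to exercise run_client."""

    def __init__(self) -> None:
        self.turns: dict = {}

    def create_dialogue_session(self, customer_id: str) -> str:
        session_id = f"session-{customer_id}"
        self.turns[session_id] = 0
        return session_id

    def handle_turn(self, session_id: str, user_input: str) -> TurnResponse:
        self.turns[session_id] += 1
        if self.turns[session_id] == 1:
            return TurnResponse("What size pizza do you want?", False)
        return TurnResponse(f"OK, {user_input}.", True)


if __name__ == "__main__":
    utterances = iter(["give me a pizza", "large"])
    run_client(FakeService(), "customer-1", lambda: next(utterances), print)
```

Typing `service` as a `Protocol` keeps the loop independent of how the Chatbot Service is actually reached (for example, a remote API client or a local fake), which is convenient for testing the client without the cloud services.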