External Actions allow Vocode agents to take actions outside the realm of a phone call. In particular, Vocode agents can decide to push information to external systems via an API request, and pull information from the API response in order to:

  1. change the agent’s behavior based on the pulled information
  2. give the agent context to inform the rest of the phone call

How it Works

Configuring the External Action

After each turn of conversation, the Vocode Agent determines whether it is the right time to call the External API, based primarily on the configured External Action’s description and input_schema.

input_schema Field

The input_schema field is a JSON Schema object that instructs how to properly form a payload to send to the External API.

For example, the Meeting Assistant Example below uses the following JSON Schema:

{
  "type": "object",
  "properties": {
    "length": {
      "type": "string",
      "enum": ["30m", "1hr"]
    },
    "time": {
      "type": "string",
      "pattern": "^\\d{2}:\\d0[ap]m$"
    }
  }
}

This states that the External API expects:

  • Two fields
    • length (string): either “30m” or “1hr”
    • time (string): must match a regex for a two-digit hour and a minute value ending in zero, followed by am or pm, e.g. 10:30am
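
To make these constraints concrete, here is a minimal sketch that checks a candidate payload using only the Python standard library. The helper name matches_meeting_schema is hypothetical and not part of Vocode, and it is stricter than the schema in that it requires both fields to be present; a full JSON Schema validator would be the more general choice.

```python
import re

# Mirrors the two properties of the example input_schema.
ALLOWED_LENGTHS = {"30m", "1hr"}
TIME_PATTERN = re.compile(r"^\d{2}:\d0[ap]m$")


def matches_meeting_schema(payload: dict) -> bool:
    """Return True if payload satisfies the example schema's constraints.

    Hypothetical helper for illustration only. It requires both fields,
    which the schema itself does not (it has no "required" list).
    """
    length = payload.get("length")
    time = payload.get("time")
    # "length" must be one of the enum values.
    if not isinstance(length, str) or length not in ALLOWED_LENGTHS:
        return False
    # "time" must match the regex: two-digit hour, minutes ending in 0, am/pm.
    if not isinstance(time, str) or TIME_PATTERN.search(time) is None:
        return False
    return True
```

Under this pattern, "10:30am" and "09:30am" are accepted, while "10:35am" (minutes do not end in zero) and "9:30am" (single-digit hour) are rejected.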

💡 Note

If this looks very similar to OpenAI function calling, that is because it is. The Vocode API treats OpenAI LLMs as first-class and uses the function calling API when the agent uses an OpenAI LLM.

The one difference is that the top-level input_schema JSON Schema must be an object, so that the parameters can be sent to the user’s API as JSON.

description Field

The description is best used to describe your External Action’s purpose. Because it is passed directly to the LLM, it is the best way to convey instructions to the underlying Vocode Agent.

For example, in the Meeting Assistant Example below we want to book a meeting for 30 minutes or an hour, so we set the description to “Book a meeting for a 30 minute or 1 hour call.”

💡 Note

The description field is passed through to the LLM and heavily affects when the agent decides to call the action, so we recommend treating it the same way you would a prompt to an LLM.

Other Fields to Determine Agent Behavior

  • speak_on_send: if True, the underlying LLM generates a message to be spoken into the phone call while the API request is being sent.

  • url: the API request is sent to this URL, in the format defined below in Responding to External Action API Requests.

  • speak_on_receive: if True, then the Vocode Agent will invoke the underlying LLM to respond based on the result from the API Response or the Error encountered.

Responding to External Action API Requests

Once an External Action has been created, the Vocode Agent will issue API requests to the defined url during the course of a phone call, based on the configuration noted above. The Vocode API will wait a maximum of 10 seconds before timing out the request.

In particular, Vocode will issue a POST request to url with a JSON body whose payload field matches input_schema. Using the Meeting Assistant Example below, the request looks like:

POST url HTTP/1.1
Accept: application/json
Content-Type: application/json
x-vocode-signature: <encoded_signature>

{
  "call_id": "<UUID>",
  "payload": {
    "length": "30m",
    "time": "10:30am"
  }
}

Signature Validation

A cryptographically signed signature of the request body and a randomly generated byte hash is included as a header (under x-vocode-signature) in the outbound request so that the user’s API can validate the identity of the incoming request.

The signature secret is contained in the External Action’s API object and can be found when creating an object (as noted below in the Meeting Assistant Example), or by getting the API object via the /v1/actions?id=ACTION_ID endpoint:

curl --request GET \
  --url 'https://api.vocode.dev/v1/actions?id=<EXAMPLE_ACTION_ID>' \
  --header 'Content-Type: application/json' \
  --header 'Authorization: Bearer <API_KEY>'

Use the following code snippet to check the signature in an inbound request:

import base64
import hashlib
import hmac

async def test_requester_encodes_signature(
    request_signature_value: str, signature_secret: str, payload: bytes
):
    """
    Asynchronous function to check if the request signature is encoded correctly.

    Args:
        request_signature_value (str): The base64-encoded x-vocode-signature header value.
        signature_secret (str): The base64-encoded signature secret of the External Action.
        payload (bytes): The raw request body, exactly as received, used for digest calculation.

    Returns:
        None

    Raises:
        AssertionError: If the signature does not match.
    """
    # Both the secret and the received signature are base64-encoded.
    signature_secret_as_bytes = base64.b64decode(signature_secret)
    decoded_digest = base64.b64decode(request_signature_value)
    # hmac.new requires a bytes-like message, so pass the raw body bytes rather than a parsed dict.
    calculated_digest = hmac.new(signature_secret_as_bytes, payload, hashlib.sha256).digest()
    # compare_digest performs a constant-time comparison.
    assert hmac.compare_digest(decoded_digest, calculated_digest) is True
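
In a request handler it is often more convenient to get a boolean back than to rely on an assertion. The sketch below assumes the scheme implied by the snippet above: a base64-encoded secret, a base64-encoded signature, and HMAC-SHA256 computed over the raw request body. The function names sign_body and is_valid_signature are hypothetical; sign_body exists only so the check can be exercised locally and is not a description of how Vocode signs requests internally.

```python
import base64
import hashlib
import hmac


def sign_body(signature_secret: str, body: bytes) -> str:
    """Compute a base64 HMAC-SHA256 signature (assumed scheme, for local tests)."""
    key = base64.b64decode(signature_secret)
    digest = hmac.new(key, body, hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


def is_valid_signature(header_value: str, signature_secret: str, body: bytes) -> bool:
    """Return True if header_value matches the HMAC of the raw body."""
    try:
        # validate=True rejects characters outside the base64 alphabet.
        received = base64.b64decode(header_value, validate=True)
    except (ValueError, TypeError):
        return False
    expected = base64.b64decode(sign_body(signature_secret, body))
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(received, expected)
```

The body passed in should be the bytes exactly as received. Parsing the JSON and serializing it again can change whitespace or key order, which would change the digest.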

Response Formatting

Vocode expects responses from the user’s API in JSON in the following format:

Response {
	result: Any
	agent_message: Optional[str] = None
}
  • result is a payload containing the result of the action on the user’s side, and can be in any format
  • agent_message optionally contains a message that will be synthesized into audio and sent back to the phone call (see Configuring the External Action above for more info)

In the Meeting Assistant Example below, the user’s API could return a JSON response like:

{
  "result": {
    "success": true
  },
  "agent_message": "I've set up a calendar appointment at 10:30am tomorrow for 30 minutes"
}
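
As a rough illustration of the request and response shapes described above, the following sketch parses an incoming request body and builds a response. Both function names are hypothetical, the booking step is a placeholder, and omitting agent_message when it is None is an assumption based on the field being optional.

```python
import json
from typing import Any, Optional


def build_action_response(result: Any, agent_message: Optional[str] = None) -> str:
    """Serialize a response body with result and an optional agent_message."""
    body = {"result": result}
    # Assumption: the optional field can simply be left out when unset.
    if agent_message is not None:
        body["agent_message"] = agent_message
    return json.dumps(body)


def handle_booking(raw_body: bytes) -> str:
    """Hypothetical handler: parse the request and confirm the booking."""
    request = json.loads(raw_body)
    payload = request["payload"]
    # A real implementation would call a calendar system here.
    message = f"I've booked a {payload['length']} meeting at {payload['time']}."
    return build_action_response({"success": True}, message)
```

The handler should return well within the 10 second limit mentioned earlier, since the request is timed out after that.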

Meeting Assistant Example

This is an example of a Meeting Assistant that will attempt to book a meeting for 30 minutes or an hour at any time ending in a zero (e.g. 10:30am is okay but 10:35am is not).

# vocode_client is assumed to be an already initialized Vocode API client.
vocode_client.actions.create_action(
    request={
        "type": "action_external",
        "config": {
            "name": "Meeting_Booking_Assistant",
            "description": ("Book a meeting for a 30 minute or 1 hour call."),
            "url": "http://example.com/booking",
            "speak_on_send": True,
            "speak_on_receive": True,
            "input_schema": {
                "type": "object",
                "properties": {
                    "length": {
                        "type": "string",
                        "enum": ["30m", "1hr"],
                    },
                    "time": {
                        "type": "string",
                        # Raw string so that \d is not treated as a Python escape sequence.
                        "pattern": r"^\d{2}:\d0[ap]m$",
                    },
                },
            },
        },
    },
)