Many practical use cases require giving bots more information than fits in the prompt, because
context windows are limited. To address this, Vocode lets you plug into vector databases that
contain embeddings.
Each time the bot receives a message, it can query the database for the most similar embeddings; the text
associated with those embeddings is shown to the agent to guide its responses.
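To make the retrieval step concrete, here is a minimal, self-contained sketch of similarity search over a handful of in-memory vectors. It is not Vocode's implementation: the tiny 2-dimensional vectors, the `records` list, and the `cosine_similarity` and `top_k` helpers are all made up for illustration, and a real setup would use an embedding model and a vector database instead.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vector, records, k=2):
    """Return the k records whose vectors are most similar to the query."""
    ranked = sorted(
        records,
        key=lambda record: cosine_similarity(query_vector, record["values"]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical stored documents; real embeddings have hundreds of dimensions.
records = [
    {"id": "a", "values": [1.0, 0.0], "text": "We are open 9am to 5pm."},
    {"id": "b", "values": [0.0, 1.0], "text": "Refunds take five business days."},
    {"id": "c", "values": [0.9, 0.1], "text": "We are closed on public holidays."},
]

# Pretend this is the embedding of the incoming message.
query_vector = [1.0, 0.0]

matches = top_k(query_vector, records, k=2)

# The text of the best matches is what would be passed along to the agent.
context = "\n".join(match["text"] for match in matches)
print(context)
```

In this toy setup, the two opening-hours snippets rank above the refund snippet because their vectors point in nearly the same direction as the query.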
Currently, we support Pinecone. Under the hood, we use an approach similar
to LangChain to store the documents in Pinecone. Each vector in Pinecone must have two pieces of metadata
to be compatible with Vocode:
text: The text that will be shown to the agent.
source: The name of the document where the text comes from. This could be the title of an article,
for example.
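As an illustration of the metadata requirement, the sketch below builds one vector record and checks that both keys are present. The `id` / `values` / `metadata` dictionary layout follows, to our understanding, the shape the Pinecone client accepts for upserts; the `make_record` and `missing_metadata` helpers, the ID, and the short placeholder vector are hypothetical and only for illustration.

```python
REQUIRED_METADATA_KEYS = ("text", "source")


def make_record(vector_id, values, text, source):
    """Build a vector record carrying the two metadata fields described above."""
    return {
        "id": vector_id,
        "values": list(values),
        "metadata": {"text": text, "source": source},
    }


def missing_metadata(record):
    """Return the required metadata keys that are absent or empty."""
    metadata = record.get("metadata", {})
    return [key for key in REQUIRED_METADATA_KEYS if not metadata.get(key)]


# Placeholder vector; a real one would come from an embedding model and
# must match the dimension of your index.
record = make_record(
    vector_id="faq-0001",
    values=[0.12, -0.03, 0.88],
    text="Our support line is open on weekdays.",
    source="Support FAQ",
)

problems = missing_metadata(record)
if problems:
    raise ValueError(f"Record {record['id']} is missing metadata: {problems}")
```

Running a check like this before uploading makes it easier to catch records that the agent would not be able to use.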
You can manually add documents to Pinecone any way you like, as long as you include the required metadata.
If you have a folder of PDFs, docx files, text files, etc. that you want to add to Pinecone, you can use
the script below, which uses Unstructured to parse
many kinds of file types, extract the text, and add it to Pinecone.
The script was tested with these package versions: