On our journey to GenAI-enabled Enterprise Knowledge we have been using Amazon Q Business to build a proof of concept. In the spirit of “starting small”, we focused on the Sales Persona and Sales Knowledge. In this article, we look into using the power of Generative AI for Sales Knowledge. And once again we will build our solution with Amazon Q.
Ideally, using our Sales Knowledge proof of concept, Sales people can get immediate answers to questions like:
- “What is the most recent proposal we submitted to Client X?”
- “Find me examples of sales templates for AWS Cloud Migration projects”
- “What were the meeting notes and key actions from the ABC project kickoff in the week commencing 9 June 2024?”
Current working practice requires Sales people to use standard search tools and to canvass colleagues for this information. Finding the right information can take time, often hours. Asking colleagues for help also disturbs them, leading to context switching and a drain on productivity. We wanted to significantly improve efficiency and productivity for our Sales people.
Types of Sales Documents
The goal was to ingest all of our UK Sales documents into Amazon Q so they could be used for RAG (Retrieval Augmented Generation). The documents had some structure in terms of folder and file naming conventions and versioning, but on the whole could be considered unstructured data. The size of this data was approximately 200 gigabytes (GB).
We quickly discovered two things:
- ingesting this amount of data is slow (5 hours for a full sync), computationally expensive and possibly financially expensive. This would not scale to all data for all countries within our organisation, which is orders of magnitude larger: hundreds of terabytes (TB).
- much of the data in the Sales folder was neither suitable nor needed for our AI model.
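To get a feel for the scaling problem, here is a rough back-of-the-envelope sketch in Python. It assumes that sync time grows linearly with data volume, which we have not verified, and it takes 100 TB as a stand-in for the low end of "100s of TBs". The function name and the decimal GB-to-TB conversion are illustrative choices, not anything provided by Amazon Q.

```python
GB_PER_TB = 1000  # decimal units; an assumption made for this estimate


def estimated_sync_hours(data_gb, observed_gb=200.0, observed_hours=5.0):
    """Linearly extrapolate full-sync time from the one measurement we have.

    The defaults are the figures from our UK Sales drive: 200 GB in 5 hours.
    Linear scaling is an assumption; real ingestion may behave differently.
    """
    throughput_gb_per_hour = observed_gb / observed_hours
    return data_gb / throughput_gb_per_hour


hours = estimated_sync_hours(100 * GB_PER_TB)  # 100 TB, a hypothetical lower bound
print(f"{hours:.0f} hours (~{hours / 24:.0f} days)")  # prints "2500 hours (~104 days)"
```

Under those assumptions, a single full sync at the low end of our global data volume would run for more than three months, which is why ingesting everything was never a realistic option.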
Our Sales documents are organised around clients, as you might expect. Within each client’s main folder there are sub-folders based on opportunities (or work packages) that we have sold or are trying to sell to that client.
A typical opportunity folder contains these types of documents:
- Statements of Work (SoW)
- Purchase Orders (POs)
- Working at Risk (WAR) Requests and Approvals
- Job Descriptions (JDs) of the roles needed for the opportunity
- Curriculum Vitae (CVs) of candidates for the opportunity
- Pricing spreadsheets
- Resource planning spreadsheets
- Other supporting documents
What to do with our data?
Note that this may not be the best way to organise Sales documentation, but it highlights issues that we face with any ingestion of enterprise data into an AI model – much of the data is not needed and not relevant.
- How do we deal with low or no value information?
- How do we feed the AI with only relevant data?
We came up with several options, each with its own pros and cons.
1. Manually clean the data
Pros:
- good AI responses
- fast for a proof of concept
Cons:
- labour intensive at scale
- error prone
- deals with past data, but not future data
2. Automatically clean the data
Pros:
- good AI responses
- works at scale
- works for past and future data
Cons:
- requires data retention rules
- requires automation engineering
- due to the above, it was a blocker for our proof of concept
3. Leave the data as is, and ingest everything
Pros:
- easiest option
- no need to worry about data cleaning
Cons:
- poor responses from the AI model
- does not address the storage of no-value data
- computationally and financially the most expensive option
- may not scale
- paying for no-value data twice – once to store, and again to ingest in an AI model
4. Leave the data as is, but only ingest a subset of data based on labels, tags or naming conventions
Pros:
- good AI responses
- no need to worry about data cleaning
Cons:
- does not address the storage of no-value data
- labelling can be time consuming, possibly requires manual effort
- Amazon Q does not yet support ingestion based on labelling
In my view, it is always best to start with option 1: manually clean a small subset of the data. It is only by manually trawling through the many folders of sales documents that we learn what is actually being stored in the folders, and what data is relevant for GenAI. Through this experience of manual investigation and cleaning we can then come up with data retention rules: what data do we really need, and for how long do we need it?
The ideal option in my opinion is option 2: automatically clean the data. But we cannot automate the cleaning until we have data retention rules that define what should stay and what can be deleted.
Option 3 did not work for us: as mentioned above, a full sync of the UK data alone took 5 hours. This option is unlikely to scale to 100s of TBs of data using Amazon Q.
Option 4 is definitely a viable option if your AI ingestion mechanisms support labelled data. At present, Amazon Q does not, so option 4 was not available to us.
Data Retention Rules
Through manual investigation and cleaning of documents in our Sales folders, we came up with the following data retention rules.
Of all the documentation we store in our Sales drives, in the long term we only need two documents per work package for our Generative AI for Sales solution: a countersigned Statement of Work and a countersigned Purchase Order. None of the other versions of these documents, nor any of the other internal-only documents, are needed once the work is finished. Thus about 80% of the data is of no value to an AI model!
Once we understand and create our data retention rules, we can then create automated scripts to clean our data based on these rules. Also note that many people say that it is impossible to clean enterprise data. “There is too much”. “We don’t even know where to start.” But using the above rules we could probably clean the entire UK Sales drive manually in a couple of days. So while the work may be manual and tedious, it is definitely not impossible.
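As an illustration of what such a script might look like, here is a minimal Python sketch of the retention rule above. It assumes a hypothetical naming convention in which the file name contains "countersigned" plus either "SoW" / "Statement of Work" or "PO" / "Purchase Order". Our real conventions are not described in this article, so treat the markers, function names and example paths as placeholders.

```python
import re
from pathlib import PurePosixPath

# Hypothetical file-name markers; adjust to your own naming conventions.
DOC_TYPE_MARKERS = {
    "sow": ("sow", "statement of work"),
    "po": ("po", "purchase order"),
}
SIGNED_MARKER = "countersigned"


def _normalise(path):
    """Lower-case the file stem and turn separators into single spaces."""
    stem = PurePosixPath(path).stem.lower()
    return " " + re.sub(r"[^a-z0-9]+", " ", stem).strip() + " "


def retained_doc_type(path):
    """Return "sow" or "po" if the file should be kept, otherwise None."""
    name = _normalise(path)
    if f" {SIGNED_MARKER} " not in name:
        return None
    for doc_type, markers in DOC_TYPE_MARKERS.items():
        if any(f" {marker} " in name for marker in markers):
            return doc_type
    return None


def split_by_retention(paths):
    """Partition paths into (keep, discard) lists according to the rule."""
    keep, discard = [], []
    for path in paths:
        (keep if retained_doc_type(path) else discard).append(path)
    return keep, discard


# Made-up example paths for one opportunity folder.
example = [
    "ClientX/Opp1/ClientX_SoW_v3_countersigned.pdf",
    "ClientX/Opp1/ClientX_SoW_v2_draft.docx",
    "ClientX/Opp1/Purchase Order 1234 - Countersigned.pdf",
    "ClientX/Opp1/CV_Jane_Doe.pdf",
    "ClientX/Opp1/pricing.xlsx",
]
keep, discard = split_by_retention(example)
```

In this made-up folder, two of the five files are kept. A real script would also need to handle folders with several countersigned versions and should list deletions for review rather than deleting anything directly.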
Given the above investigation we can now come up with an Implementation Plan.
Implementation Plan
Phase 1: Proof of Concept
1. Manual Cleaning:
- Select a small subset of clients.
- Apply data retention rules to clean the data manually.
2. Data Ingestion:
- Use Amazon Q to ingest the cleaned subset of data.
- Configure RAG to improve the relevance of retrieved information.
3. Testing and Validation:
- Test the system with typical sales queries.
- Validate the accuracy and relevance of responses.
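The testing and validation step can be made repeatable with a very small harness. The Python sketch below is one possible approach, not the method we used. It assumes you have a list of typical sales questions, each paired with a snippet (for example a file name) that a good answer should mention. The `ask` callable is a hypothetical stand-in for however you query your assistant; it is not an Amazon Q API.

```python
from typing import Callable, Iterable, Tuple


def hit_rate(cases: Iterable[Tuple[str, str]], ask: Callable[[str], str]) -> float:
    """Fraction of questions whose answer mentions the expected snippet.

    cases: (question, expected_snippet) pairs prepared by the tester.
    ask:   hypothetical callable that sends a question to the assistant
           and returns its answer as plain text.
    """
    cases = list(cases)
    if not cases:
        return 0.0
    hits = sum(
        1
        for question, expected in cases
        if expected.lower() in ask(question).lower()
    )
    return hits / len(cases)
```

A substring check like this is crude, since it only tells you whether the right document was mentioned, but it gives a baseline number that can be compared before and after each round of data cleaning.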
Phase 2: Scaling Up
1. Define and Automate Data Retention Rules:
- Based on insights from the proof of concept, define comprehensive data retention rules.
- Develop scripts or automation tools to clean data according to these rules.
2. Automated Data Cleaning:
- Implement automated data cleaning across the UK Sales drive.
- Prepare for scaling to other regions/countries.
3. Continuous Ingestion and Updates:
- Set up automated ingestion pipelines to continuously update Amazon Q with cleaned data.
- Ensure the system handles new data as it arrives.
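One building block for handling new data as it arrives is finding the files that have changed since the last sync, so that only those go through cleaning and ingestion. The Python sketch below uses file modification times on a local or mounted drive. That is an assumption about where the documents live, and the function name is illustrative.

```python
import os
from pathlib import Path


def files_changed_since(root, last_sync_epoch):
    """Return files under root modified after last_sync_epoch, sorted by path.

    last_sync_epoch is a Unix timestamp (seconds) recorded at the previous
    sync; how and where that timestamp is stored is left to the pipeline.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            if path.stat().st_mtime > last_sync_epoch:
                changed.append(path)
    return sorted(changed)
```

The resulting list can be passed through the same retention rules used for the initial clean-up before anything is handed to the ingestion step.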
Summary and Further Steps
Hopefully, this deep dive into ingesting unstructured data into a GenAI model is instructive and will help with utilizing Generative AI for Sales.
What we learned:
- About 80% of our unstructured Sales data is not needed for our GenAI models
- Cleaning the data is not as hard as it may appear
First steps to a working Enterprise Knowledge solution:
- Manually clean a subset of data to build the proof of concept.
- Use insights to create data retention rules for automated cleaning.
- Implement automated data cleaning and ingestion processes for scalability.
Future Steps
- Expand to other regions after successful implementation in the UK.
- Continuously refine AI models based on user feedback and changing business needs.
By following this structured approach, you can build a robust Sales Knowledge product that leverages Amazon Q to enhance productivity and efficiency for your sales team.
Unlock Generative AI for Sales Knowledge
Building a GenAI-powered Sales Knowledge product with Amazon Q Business is achievable and can significantly enhance sales productivity. By starting small, focusing on high-value data, and establishing robust data management practices, organizations can overcome the challenges of unstructured data and unlock the full potential of generative AI for sales teams.
If you are interested in exploring other GenAI solutions, follow the last part of our Enterprise Knowledge journey to see a quick comparison between Amazon Q Business, Google Gemini and Overlayer.
This is Part 5 of our AI for Enterprise Knowledge series. Start with the introduction in Part 1, see best practices for creating an Enterprise Knowledge AI assistant in Part 2, learn about the importance of data management in Part 3, and explore cost management in Part 4.