AI Allstate NI

Allstate

Nominated Award:
Best Application of AI in a Large Enterprise

Website of Company:
https://www.allstateni.com

Company Background and Market Position
The Allstate Corporation is the largest publicly listed personal lines property and casualty (P&C) insurer in the United States of America. Allstate Northern Ireland (ANI) was established in Belfast in 1999 to provide high-quality software development services and business solutions in support of our U.S. parent’s global operations. Allstate operates predominantly in the USA and Canada, and also in Europe and Australia. Allstate is one of the top 100 companies on the Fortune 500 list in the USA, with revenues of more than $53Bn, assets of more than $99Bn, 54k employees and 190M policies in force. ANI is the largest information technology employer in Northern Ireland, with a staff of ~2,400 dedicated to supporting the Allstate Corporation.

Product and Services
Automotive insurance is a major product line for Allstate. It provides the opportunity for cross-selling other product lines such as Home, Renters, and other personal lines. In addition to this business, Allstate provides products and services in adjacent markets, such as:
• Allstate Roadside Services
• Leisure boats and Recreational Vehicles
• Protection Services – Phones, Laptops, domestic appliances
• Automotive vehicle extended warranties
• Identity Protection and Recovery

Additionally, Allstate has established a company focused on automotive telematics, called Arity, headquartered in Belfast. This company grew out of our innovations in using mobile telematics to score driving behaviour.

Our Data Analytics and AI Journey
The last six years have seen large growth in our AI and machine learning capabilities, which are now embedded in the majority of our product lines.

Example applications that have been developed by these teams include:
• Natural language processing to make sense of speech and text, for example in customer service applications, including sentiment and tonal analysis.
• Development of driver scores for our Telematics business.
• Discovery of digital footprint
• Claims processing including the use of OCR technology to scan and assess claims.
• Vision systems supported by AI to assess vehicle damage, capable of detecting total loss scenarios quickly and reducing the claims handling cycle time.

The mission of the Data & Intelligence System team at Allstate is to revolutionize how we operate as a company by ensuring every business decision made by 2027 is driven by machine learning and analytics. To tackle this huge undertaking, our strategy is to solve the right problems the right way, leveraging cutting-edge tools and techniques as and when required.

Reason for Nomination:

Over the past 18 months, in partnership with Allstate’s Law & Regulation team, we have been rolling out our Legal Automation Toolkit. The toolkit automates the gathering of relevant data, the generation and completion of legal documents, and the correct placement of the documents in the legal department’s existing document management platform. We took inspiration for this project after meeting with legal teams on the ground who highlighted their frustration at some of the repetitive and manual tasks they had to complete, which often took time away from more complex legal matters.

As a result, our prioritization criteria were made up of three factors:
• Impact on our legal team’s job satisfaction.
• Potential time savings.
• Complexity (could it be solved by a vendor or another team?)

Our playbook for developing the tools followed these steps:
1. Understand Business Problem
2. Understand Customer’s Needs & Outcomes
3. Determine Assumptions, Risks & Data Requirements
4. Perform Research
5. Complete PoC
6. Gather Feedback
7. Run a Pilot
8. Gather Feedback
9. Production Rollout
10. Continuous Feedback and Improvement

From our research we knew a challenge would be that each jurisdiction/state in the US had different legal processes. To overcome this, we engaged heavily with our legal partners throughout the process. From a technical standpoint we overcame this by building the initial PoC in a way where most of the components would be reusable as we rolled out countrywide.

To date we have rolled out the Claim File Summary Tool countrywide and launched our Legal Answers Tool in 9 jurisdictions, and we are on track to deliver in at least 11 jurisdictions by the end of the year. Analysis shows that the toolkit saves around 75k hours per annum, which equates to millions of dollars in time savings. The feedback we have gathered to date from our legal teams has been fantastic, and the toolkit is making a huge impact on the thousands of legal professionals that Allstate employs.

Feature 1 – Claim File Summary:

Overview & Objective:
It is a requirement of the court to provide a claim summary. Traditionally this was done by an attorney or paralegal who read and searched hundreds of pages of file notes, police reports, witness statements and more. This data was spread over multiple systems and in varying formats (which could differ by jurisdiction). The process can take anywhere from minutes to days, but on average takes around one hour.

The goal of this project was to automate the gathering of this data into a central location and provide a summary in the form of a Word document consisting of around 30 fields in a standardized format across every jurisdiction. The project was a collaboration between the R&D ML team, the RPA team and, of course, the legal team. (Example output in attachment CFS_Output.)

RPA Component:
As the document management platform is a vended product, document placement could not be done programmatically. The RPA bot overcame this challenge by navigating through the user interface and uploading the documents to the correct folder for the correct case in seconds. The RPA bot also coordinated the gathering of data and called the ML API to generate the responses.
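The orchestration described above can be sketched as follows. This is a minimal illustration only: the data-gathering step (fetch_case_data), the UI-automation helper (upload_via_ui) and the ML API endpoint name are hypothetical placeholders, not the production implementation.

import requests

ML_API_URL = "https://ml-api.example.internal/claim-summary"  # illustrative endpoint


def fetch_case_data(case_id: str) -> dict:
    """Hypothetical: gather notes, reports and statements from the source systems."""
    raise NotImplementedError


def upload_via_ui(case_id: str, document_path: str) -> None:
    """Hypothetical: RPA navigation of the vended document management UI."""
    raise NotImplementedError


def process_case(case_id: str) -> None:
    # 1. Gather the raw claim data from the various source systems.
    case_data = fetch_case_data(case_id)

    # 2. Call the ML API to populate the ~30 summary fields.
    response = requests.post(ML_API_URL, json=case_data, timeout=60)
    response.raise_for_status()
    summary_path = response.json()["document_path"]

    # 3. Hand the generated document back to the bot for placement
    #    in the correct folder of the document management platform.
    upload_via_ui(case_id, summary_path)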

ML Component:
The method of retrieving the relevant information differed by field, for example (a minimal dispatch sketch follows this list):
1. Name – DB query
2. Photos present – Q&A techniques
3. Injuries & Treatments – word2vec/regex

Defining the injuries and treatments was a significant challenge, as there are tens of thousands of injuries and treatments and the claims handlers often use their own shorthand when documenting them. We used a word2vec model to solve this problem. It allowed us to take a list of a few hundred injury terms provided by an attorney and expand it to 10,000 similar terms. The custom model was trained on 60k files and the outputs were checked and evaluated by our legal partners.
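A minimal sketch of that term-expansion step, assuming a word2vec model already trained on the tokenised claim-file notes; the model file name and seed terms below are illustrative. Each seed term from the attorney’s list is expanded with its nearest neighbours in the embedding space.

from gensim.models import Word2Vec

# Assumes a model trained on tokenised claim-file notes (file name illustrative).
model = Word2Vec.load("claims_word2vec.model")

seed_terms = ["whiplash", "concussion", "physiotherapy"]  # illustrative seed list

expanded = set(seed_terms)
for term in seed_terms:
    if term in model.wv:
        # Pull in the closest terms by cosine similarity, capturing
        # claims handlers' shorthand and spelling variants.
        expanded.update(word for word, _score in model.wv.most_similar(term, topn=50))

print(sorted(expanded))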

Feature 2 – Legal Answers Generation:

At the beginning of a case, Allstate is served with a summons & complaint petition. This outlines the case being made and contains a list of allegations in the form of paragraphs. In the traditional process the legal team has to address each allegation and format a response document to submit to the court. The formats and typical responses vary by jurisdiction and case type, e.g. home, auto, dog bite etc.

Using AI/ML tools, we deployed a solution that pulls in the relevant documents, predicts the attorney’s response to each allegation, and generates the final, correctly formatted response document.

This reduced the end-to-end processing time from 50 minutes to 5 minutes per case.

High Level Flow
1. Petition files sent to the ML API in PDF format.
2. ML service leverages AWS Textract for OCR.
3. Response cleaned & processed; relevant data is also extracted ready for modelling:
    a. Regex
    b. Rules
    c. Named entity recognition
4. Case type and individual paragraph answers generated (NN & XGBoost).
5. Formatted Word document returned from the ML service for final review (see attachments GIA_Output & GIA_arch). A simplified code sketch of this flow follows below.
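The sketch below is a simplified version of this flow, assuming the petition PDF has already been uploaded to S3; the cleaning, paragraph-splitting and answer-generation steps are illustrative stubs standing in for the production rules and models, and the bucket/key names are not from the production system.

import re
import time

import boto3


def ocr_petition(bucket: str, key: str) -> str:
    """Run asynchronous Textract text detection on a PDF stored in S3."""
    textract = boto3.client("textract")
    job = textract.start_document_text_detection(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    while True:
        result = textract.get_document_text_detection(JobId=job["JobId"])
        if result["JobStatus"] in ("SUCCEEDED", "FAILED"):
            break
        time.sleep(2)
    lines = [b["Text"] for b in result.get("Blocks", []) if b["BlockType"] == "LINE"]
    return "\n".join(lines)


def clean_text(text: str) -> str:
    # Simplified cleaning: collapse whitespace and strip the result.
    return re.sub(r"\s+", " ", text).strip()


def split_paragraphs(text: str) -> list[str]:
    # Illustrative: split on numbered allegations such as "12. The defendant ...".
    return [p for p in re.split(r"(?=\b\d{1,3}\.\s)", text) if p.strip()]


def generate_answers(paragraphs: list[str]) -> list[str]:
    # Stand-in for the trained case-type and paragraph-answer models.
    return ["DENIED" for _ in paragraphs]  # placeholder output only


def process_petition(bucket: str, key: str) -> list[str]:
    text = clean_text(ocr_petition(bucket, key))
    paragraphs = split_paragraphs(text)
    return generate_answers(paragraphs)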

The main challenge came from the variety of input and expected output formats and the answers to each paragraph. This required new training data, models, scrapers, and output formatting for each jurisdiction and case type. Fortunately, we had tens of thousands of historical petition & answer pairs to learn from.

We evaluated the model responses based on a holdout set from the historical cases and also during a parallel test involving multiple attorneys. The models’ predictions on case type and paragraph answers had human-level accuracy.

None of the individual tools and techniques we used is revolutionary, but the way we pulled them together to solve a real business problem was. The project highlights the importance of using the right tool for the job, from simple rules and regex to transformer models. We believe the key to success was the collaboration between the technology teams (ML & RPA) and the invaluable feedback the legal teams provided. Using a combination of RPA, off-the-shelf AI and custom ML models allowed us to build a solution that made their jobs that little bit easier.

Additional Information:

Legal Answers ML models.

Creation of Training Set:
Input data was created using the output of the OCR on historical records. As the input data could potentially be over 20 pages of text, it was difficult to train a model because there was a lot of noise. To solve this problem we engaged with attorneys to find which sections of the documents were most relevant when returning a response for specific questions. These sections could then be extracted using rules and regex. At this stage multiple OCR tools were evaluated, including Azure, AWS and a Textract implementation.
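The rules-and-regex extraction of relevant sections could look something like the sketch below; the section headings and patterns are hypothetical examples, not the production rules agreed with the attorneys.

import re

# Hypothetical patterns marking sections the attorneys identified as relevant.
SECTION_PATTERNS = {
    "parties": re.compile(r"PARTIES(.*?)(?=JURISDICTION|ALLEGATIONS|$)", re.S | re.I),
    "allegations": re.compile(r"ALLEGATIONS(.*?)(?=PRAYER FOR RELIEF|$)", re.S | re.I),
}


def extract_sections(ocr_text: str) -> dict:
    """Reduce a 20+ page OCR dump to only the sections used for training."""
    sections = {}
    for name, pattern in SECTION_PATTERNS.items():
        match = pattern.search(ocr_text)
        if match:
            sections[name] = match.group(1).strip()
    return sections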

Case Type:
In each jurisdiction we tested multiple approaches, including a Hugging Face zero-shot classifier, XGBoost using word embeddings as features, and fastai leveraging language model pretraining. The final approach depends on the requirements and complexity of the jurisdiction.
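As an illustration of the zero-shot approach, the Hugging Face pipeline can score a petition excerpt against candidate case types without jurisdiction-specific training data; the model choice, excerpt and label set below are illustrative, not the production configuration.

from transformers import pipeline

# Zero-shot classification via NLI; model choice is illustrative.
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

petition_excerpt = (
    "Plaintiff alleges that the defendant's vehicle struck the rear of "
    "plaintiff's car at a stop light, causing injury."
)
candidate_case_types = ["auto", "home", "dog bite"]  # illustrative label set

result = classifier(petition_excerpt, candidate_labels=candidate_case_types)
print(result["labels"][0], result["scores"][0])  # highest-scoring case type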

Paragraph Predictions:
Each petition can have anywhere from 10 to 50 paragraphs that need to be reviewed and answered. We created our features using TF-IDF over unigrams and bigrams and trained an XGBoost model. It was critical that we had fast training and inference times, as we needed to build multiple models.
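A minimal version of that feature pipeline is sketched below using scikit-learn and XGBoost; the toy training pairs, answer classes and parameter values are illustrative only.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder
from xgboost import XGBClassifier

# Toy examples: allegation paragraph -> attorney answer class (illustrative).
paragraphs = [
    "Defendant was negligent in the operation of the vehicle.",
    "Plaintiff resides in the county where the incident occurred.",
    "Defendant failed to maintain a proper lookout.",
    "Venue is proper in this court.",
]
answers = ["denied", "admitted", "denied", "admitted"]

label_encoder = LabelEncoder()
y = label_encoder.fit_transform(answers)  # XGBoost expects integer class labels

model = Pipeline([
    # Unigram/bigram TF-IDF features, as used for the paragraph models.
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), min_df=1)),
    # Gradient-boosted trees keep training and inference fast, which matters
    # when one model is built per jurisdiction and case type.
    ("clf", XGBClassifier(n_estimators=100, max_depth=4, eval_metric="logloss")),
])

model.fit(paragraphs, y)
pred = model.predict(["Defendant denies negligence in operating the vehicle."])
print(label_encoder.inverse_transform(pred))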

Claim File Summary Tool: word2vec model overview
Data: ~60k claim files
Standard processing: lower casing, removal of non-alphanumeric characters, stopword removal, bigram creation (gensim Phraser)
Tuned parameters: min_count (filters irrelevant terms), size (dimensionality of the model), window (context)
Evaluation: there is no easy way to quantitatively evaluate the model, so we engaged our legal team for feedback and checking.
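The preprocessing and training steps described above could be sketched along the following lines, assuming the claim files are already available as plain-text notes. The parameter values and toy data are illustrative, and note that in gensim 4.x the "size" parameter in the overview is called vector_size.

import re

from gensim.models import Word2Vec
from gensim.models.phrases import Phrases, Phraser
from gensim.parsing.preprocessing import STOPWORDS


def preprocess(note: str) -> list[str]:
    # Lower-case, strip non-alphanumeric characters, remove stopwords.
    note = re.sub(r"[^a-z0-9\s]", " ", note.lower())
    return [tok for tok in note.split() if tok not in STOPWORDS]


# claim_notes would be the text of the ~60k claim files; toy data shown here.
claim_notes = [
    "Claimant reports whiplash injury and attended physio twice.",
    "Soft tissue injury to neck, physio treatment ongoing.",
]
sentences = [preprocess(note) for note in claim_notes]

# Bigram creation so multi-word terms (e.g. "soft_tissue") become single tokens.
bigram = Phraser(Phrases(sentences, min_count=1, threshold=1))
sentences = [bigram[s] for s in sentences]

model = Word2Vec(
    sentences,
    vector_size=100,  # "size" in the overview above (older gensim naming)
    window=5,         # context window
    min_count=1,      # raise on the real corpus to drop irrelevant/rare terms
)
model.save("claims_word2vec.model")  # reused by the term-expansion sketch earlier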