Better Questions, Better Outputs: The Impact of Prompt Engineering in GenAI and Tax



Unlock the full potential of AI in tax with prompt engineering.

Contribution by Bas Pelk


In brief:

  • Prompt engineering can enhance GenAI output quality in tax by 14% on average.
  • Techniques like Few Shot and Persona prompts lead to more accurate tax content.
  • Even without prompt engineering, GPT-4 shows strong performance in tax tasks.

In the rapidly evolving landscape of tax, innovation is key to staying ahead. One of the latest tools in our arsenal as tax specialists is Generative AI (GenAI), specifically models like GPT-4, which have shown promise in various professional domains. However, the effectiveness of these models often hinges on a crucial aspect: how users engineer their prompts. But what exactly is prompt engineering, and why does it matter in the world of tax?

Understanding Prompt Engineering in Tax

Prompt engineering is the process of crafting specific, nuanced instructions (or prompts) to elicit the best possible responses from GenAI models like GPT-4. This is particularly relevant in the tax profession, where accuracy, precision, and traceability are paramount. By refining how we communicate tasks to AI, we can significantly enhance the quality of the outputs, making them more useful for tax professionals across various domains.
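
To make this concrete, below is a minimal sketch in Python of how a plain request differs from an engineered prompt that adds a Persona, an Audience and explicit output instructions. The VAT question, the wording and the variable names are our own illustration, not the prompts used in the experiment described in this article.

    # Illustrative sketch only: a plain request versus an engineered prompt.
    # The VAT question and the wording are hypothetical examples.

    plain_prompt = "Is the supply of e-books subject to the reduced VAT rate?"

    engineered_prompt = (
        "You are an experienced EU indirect tax advisor.\n"                      # Persona
        "Write the answer for an in-house tax manager, not a tax lawyer.\n\n"    # Audience
        "Question: Is the supply of e-books subject to the reduced VAT rate?\n\n"
        "Answer in at most 150 words, structured as:\n"                          # Output/Instruction
        "1. Short answer\n"
        "2. Key reasoning, with the rules you rely on\n"
        "3. Open points to verify with a specialist\n"
    )

    # Both strings would then be sent to whichever GenAI solution is in use;
    # only the instructions wrapped around the question change.
    print(plain_prompt)
    print(engineered_prompt)

The question itself is identical in both cases; the engineered version simply tells the model who it is, who it is writing for, and what shape the answer should take.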

The Experiment: Testing the Impact of Prompt Engineering

To quantify the impact of prompt engineering on tax-related tasks, we conducted an experiment involving specialists from various tax departments: direct tax, indirect tax, payroll tax, international tax, tax compliance, and transfer pricing. These specialists provided us with standard tasks typical of their practice areas. The tasks were aligned with five key GenAI use cases: generating tax outputs, summarizing tax text, classifying items for tax purposes, translating tax outputs, and conducting tax research.

We initially executed these tasks using a secure GenAI solution based on GPT-4, asking the specialists to rate the outputs according to an ISO standard for data quality. Following this, we repeated the tasks but incorporated various prompt engineering techniques, including Few Shot, Persona, Audience, Output/Instruction, Template, and Chain-of-Thought techniques.

Our Findings: The Power of Prompt Engineering

The results of our experiment were clear: prompt engineering significantly improved the quality of outputs across all tax tasks, with an average increase in output quality scores of 14%. In terms of the 'report grades' our colleagues assigned, the answers went from a 7.4 to almost an 8.5.

When breaking down the results by use case, we observed that prompt engineering had the most significant impact on tax research, with a remarkable 28% improvement in output quality. This suggests that prompt engineering is particularly effective in scenarios requiring complex reasoning and the synthesis of information. On the other hand, the impact on summarization tasks was more modest, with only a 1% improvement, indicating that GenAI is already quite adept at condensing information in a useful way.

Prompt engineering had the greatest impact on tax research, where output quality improved by 28%.

In terms of data quality metrics, the most significant improvement was seen in traceability, with a 50% increase. This means that the GenAI solution was much better at explaining how it arrived at certain conclusions, a critical factor in tax work where the reasoning behind decisions is as important as the decisions themselves. However, the least improvement was noted in correctness and precision, with a 6% increase, suggesting that while GenAI can follow instructions better with prompt engineering, its inherent ability to generate accurate and precise content is already strong.
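
Asking for traceable reasoning can be done explicitly in the prompt, in the spirit of the Chain-of-Thought and Output/Instruction techniques mentioned above. The sketch below is our own illustration of such an instruction; the permanent establishment question and the wording are hypothetical, not taken from the experiment.

    # Illustrative sketch: explicitly requesting step-by-step, traceable reasoning.
    # The question and the instruction wording are hypothetical examples.

    question = "Does a foreign board member create a permanent establishment?"

    traceable_prompt = (
        f"Question: {question}\n\n"
        "Before giving your conclusion, work through the question step by step:\n"
        "1. State the facts you are relying on.\n"
        "2. Identify the rules or criteria you apply.\n"
        "3. Show how each criterion is or is not met.\n"
        "4. Only then give your conclusion, and flag any assumptions.\n"
    )

    print(traceable_prompt)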

The effectiveness of specific prompt engineering techniques was also noteworthy. Few Shot prompting yielded the highest gains in output quality scores, with a 23% improvement, closely followed by Persona prompts, which resulted in a 22% increase. These techniques, which give the model a small number of worked examples (Few Shot) or instruct it to respond as a specific type of expert (Persona), clearly enhance the relevance and accuracy of the generated content.
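
A Few Shot prompt for one of the use cases, classifying items for tax purposes, could look like the sketch below. The expense items, labels and variable names are hypothetical examples we use for illustration, not data from the experiment.

    # Illustrative Few Shot prompt for a tax classification task.
    # The expense items and labels are hypothetical examples.

    examples = [
        ("Client dinner after contract signing", "Entertainment - partially deductible"),
        ("Laptop for a new employee", "Business asset - capitalise and depreciate"),
        ("Speeding fine incurred by a delivery driver", "Penalty - non-deductible"),
    ]

    new_item = "Annual subscription to a professional tax journal"

    # Few Shot prompting: show the model a handful of solved examples,
    # then ask it to classify the new item in the same format.
    few_shot_prompt = "Classify each expense for corporate income tax purposes.\n\n"
    for description, label in examples:
        few_shot_prompt += f"Expense: {description}\nClassification: {label}\n\n"
    few_shot_prompt += f"Expense: {new_item}\nClassification:"

    print(few_shot_prompt)

The handful of worked examples anchors both the format and the level of detail of the answer, which is exactly where we saw the technique pay off in the experiment.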

The Surprising Adequacy of GenAI

Despite the clear benefits of prompt engineering, it's important to note that the overall effect was somewhat less pronounced than we initially expected. This suggests that large language models like GPT-4 are already fairly competent in tax-related work when provided with the same instructions a human tax consultant would receive, even without additional prompt engineering. This finding is encouraging, as it highlights the readiness of these models to be integrated into the tax profession with minimal additional training.

By embracing prompt engineering, we can unlock the full potential of GenAI in the tax profession, driving efficiency and accuracy to new heights.


Summary

Prompt engineering is key in leveraging Generative AI for tax tasks, improving output quality and traceability. Our experiment with tax specialists revealed a 14% average increase in quality, with the most significant gains in tax research. Despite GenAI's inherent capabilities, prompt engineering techniques like Few Shot and Persona prompts can further refine AI performance, making it a vital tool for tax professionals.
