We need an evaluation framework to test how well the system works for different types of tasks. We should either develop such a framework or adopt an existing one that supports the following (a rough data-model sketch follows the list):
- List(s) of tasks/prompts for different types of tasks
- Example content to index as context for these tasks
- Storing the LLM's answers
- Automated evaluation of the answers by another LLM
- Manual evaluation by a human
- Storing evaluation results
- Generating visualizations of how well the LLM performed on the different task types
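
As a starting point, the sketch below shows one possible data model covering these requirements: tasks with indexed context, stored answers, evaluation results from both an LLM judge and a human, and a grouping helper that could feed a per-task-type visualization. All class and field names here are assumptions for illustration, not a decided design.

```python
# Minimal sketch of a possible evaluation data model (names are illustrative).
from dataclasses import dataclass, field
from enum import Enum


class TaskType(Enum):
    SUMMARIZATION = "summarization"
    QA = "qa"
    CODE = "code"


@dataclass
class Task:
    task_id: str
    task_type: TaskType
    prompt: str
    # Example content that would be indexed as context for this task
    context_documents: list[str] = field(default_factory=list)


@dataclass
class Answer:
    task_id: str
    model_name: str
    text: str


@dataclass
class Evaluation:
    task_id: str
    evaluator: str   # e.g. "llm-judge" or "human:<name>"
    score: float     # normalized score in [0.0, 1.0]
    rationale: str = ""


@dataclass
class EvaluationRun:
    tasks: dict[str, Task] = field(default_factory=dict)
    answers: list[Answer] = field(default_factory=list)
    evaluations: list[Evaluation] = field(default_factory=list)

    def scores_by_task_type(self) -> dict[TaskType, list[float]]:
        """Group scores by task type, e.g. as input for a bar chart."""
        grouped: dict[TaskType, list[float]] = {}
        for ev in self.evaluations:
            task = self.tasks[ev.task_id]
            grouped.setdefault(task.task_type, []).append(ev.score)
        return grouped
```

Whether results are stored as plain files, a database, or via an existing eval framework is still open; the point of the sketch is only that answers and evaluations reference tasks by ID, so automated and manual evaluations can be compared per task type.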