AB Testing
AB testing lets you compare different versions of your app to see which one performs better.
What is AB testing
AB testing is a method used to compare two versions of a product to determine which performs better.
Comparing on a single criterion is hard, especially for LLM apps, because the performance of a product can be measured in many ways.
In phospho, AB testing is done by comparing the analytics distributions of two versions: the candidate and the control.
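To make "comparing the analytics distribution" concrete, here is a minimal sketch in plain Python. It does not use the phospho SDK and is not how the platform computes its results. The function names and the sample data are made up for illustration. It takes the classifier label assigned to each message in the control and candidate versions and reports how the share of each label shifts.

```python
from collections import Counter


def label_distribution(labels):
    """Return the share of each label as a dict of label -> fraction."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: count / total for label, count in counts.items()}


def distribution_shift(control, candidate):
    """Return (candidate share - control share) for every label seen."""
    control_dist = label_distribution(control)
    candidate_dist = label_distribution(candidate)
    all_labels = sorted(set(control_dist) | set(candidate_dist))
    return {
        label: candidate_dist.get(label, 0.0) - control_dist.get(label, 0.0)
        for label in all_labels
    }


# Hypothetical user-intent labels for each version (illustrative data only).
control = ["buy", "complain", "complain", "ask for help"]
candidate = ["buy", "buy", "complain", "ask for help"]

shift = distribution_shift(control, candidate)
for label, delta in shift.items():
    print(f"{label}: {delta:+.2f}")
```

A positive value means the label is more frequent in the candidate than in the control. Whether that counts as an improvement depends on the label: more "buy" is probably good, more "complain" probably is not.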
Prerequisites to run an AB test
You need to have set up event detection in your project. This runs the analytics that measure the performance of your app:
- Tags: e.g. topic of the conversation
- Scores: e.g. sentiment of the conversation (between 1 and 5)
- Classifiers: e.g. user intent ("buy", "ask for help", "complain")
Run an AB test from the platform
- Click on the button "Create an AB test" on the phospho platform. If you want, customize the version_id, which is the name of the test.
- Send data to the platform by using an SDK, an integration, a file, or more. All new incoming messages will be tagged with the version_id.
Alternative: Specify the version_id in your code
Alternatively, you can specify the version_id in your code. This will override the version_id set in the platform.
When logging to phospho, add a field version_id with the name of your version in metadata. See the example below:
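A minimal sketch of what such a log entry might look like, written in plain Python so that it runs without the phospho SDK. The helper build_log_payload and the sample values are hypothetical. The input and output keys are assumptions about a typical log call. Only the version_id field inside metadata comes from the description above.

```python
def build_log_payload(user_input, output, version_id):
    """Build the fields of a log entry, with version_id stored in metadata.

    Hypothetical helper for illustration; not part of the phospho SDK.
    """
    if not version_id:
        raise ValueError("version_id must be a non-empty string")
    return {
        "input": user_input,
        "output": output,
        # Setting version_id here overrides the one set in the platform.
        "metadata": {"version_id": version_id},
    }


payload = build_log_payload(
    user_input="Where is my order?",
    output="Let me check that for you.",
    version_id="candidate-v2",  # illustrative version name
)
print(payload)

# Assumption: with the phospho SDK installed and initialized, these fields
# would be passed to its logging call, for example phospho.log(**payload).
# Check the SDK reference for the exact signature.
```

Keeping the version name in one place, such as a constant or an environment variable, avoids logging part of a test under a misspelled version_id.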
Run offline tests
If you want to run offline tests, you can use the phospho command line interface. Results of the offline tests are also available in the AB test tab.
- phospho CLI: Learn more about the phospho command line interface