A/B Test Dashboard – Search UX Evaluation

A/B Testing

notebook

python

Author: Atila Madai

Published: July 1, 2025

🧪 Overview

This post summarizes an A/B test comparing two variants of a search experience using real usage logs. The goal was to evaluate:

Click engagement
User dwell time
Positive feedback rate
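As a rough illustration of how these three metrics could be derived from session logs, here is a minimal sketch. The log schema (`variant`, `clicked`, `dwell_seconds`, `feedback_positive`) and the sample rows are assumptions made up for the example, not the real data; only the output column names are taken from the summary file used later in this post.

```python
import pandas as pd

# Hypothetical session log; column names and values are assumptions,
# not the schema or data used in the actual experiment.
sessions = pd.DataFrame({
    "variant": ["A", "A", "A", "B", "B", "B"],
    "clicked": [1, 0, 0, 1, 1, 0],
    "dwell_seconds": [30.0, 12.5, 41.0, 38.0, 27.5, 33.0],
    "feedback_positive": [0, 0, 1, 1, 0, 1],
})

# One row per variant: the mean of a 0/1 flag is a rate,
# the mean of dwell_seconds is the average dwell time.
metrics = sessions.groupby("variant").agg(
    click_rate=("clicked", "mean"),
    avg_dwell_time=("dwell_seconds", "mean"),
    feedback_positive_rate=("feedback_positive", "mean"),
)
print(metrics)
```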

The full analysis is available via an interactive dashboard.

🔗 Interactive Dashboard

📊 The dashboard is hosted on Streamlit Cloud, where the test results can be explored interactively.

🧮 Experiment Design

We compared:

Variant A: Control – current production layout
Variant B: Test – new LLM-enhanced formatting

Session-level metrics were computed and compared using independent t-tests and Mann–Whitney U tests.
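The two tests named above can be sketched with `scipy.stats`. The data below is synthetic and exists only to make the example runnable. The choice of Welch's variant (`equal_var=False`) and a two-sided alternative are assumptions, since the post does not say which options were used.

```python
import numpy as np
from scipy import stats

# Synthetic session-level dwell times in seconds; NOT the experiment's data.
rng = np.random.default_rng(seed=0)
dwell_a = rng.exponential(scale=32.0, size=500)  # hypothetical Variant A sessions
dwell_b = rng.exponential(scale=35.0, size=500)  # hypothetical Variant B sessions


def compare_variants(a, b):
    """Run both tests on two independent samples and return statistics and p-values."""
    # Independent two-sample t-test (Welch's form: no equal-variance assumption).
    t_stat, t_p = stats.ttest_ind(a, b, equal_var=False)
    # Mann-Whitney U test: rank-based, does not assume normally distributed data.
    u_stat, u_p = stats.mannwhitneyu(a, b, alternative="two-sided")
    return {
        "t_stat": float(t_stat),
        "t_p": float(t_p),
        "u_stat": float(u_stat),
        "u_p": float(u_p),
    }


result = compare_variants(dwell_a, dwell_b)
print(result)
```

Running both tests is a common cross-check: the t-test compares means, while the Mann–Whitney U test is less sensitive to the skew that dwell-time data often shows.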

📈 Metric Summary

import pandas as pd

# Load previously exported summary from the notebook
df = pd.read_csv("data/metric_summary.csv")

# Rename and organize columns for presentation
df = df.rename(columns={
    "variant": "Variant",
    "description": "Description",
    "click_rate": "Click Rate",
    "avg_dwell_time": "Dwell Time (s)",
    "feedback_positive_rate": "Positive Feedback",
    "click_rate_p_value": "Click Rate p",
    "dwell_time_p_value": "Dwell Time p",
    "feedback_score_p_value": "Feedback p"
})

# Format for presentation
df.set_index("Variant").style.format({
    "Click Rate": "{:.2%}",
    "Dwell Time (s)": "{:.1f}",
    "Positive Feedback": "{:.2%}",
    "Click Rate p": "{:.2e}",
    "Dwell Time p": "{:.2e}",
    "Feedback p": "{:.2e}"
})

Variant	Description	Click Rate	Dwell Time (s)	Positive Feedback	Click Rate p	Dwell Time p	Feedback p
A	Control – current experience	45.00%	32.4	21.00%	1.20e-02	8.90e-02	4.00e-03
B	Test – improved LLM format	52.00%	35.1	33.00%	1.00e-03	8.90e-02	4.00e-03

🔍 Observations

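Click Rate rose from 45.00% (A) to 52.00% (B). The table lists p = 1.20e-02 on the A row and 1.00e-03 on the B row; both are below the conventional 0.05 threshold.
Dwell Time increased slightly, from 32.4 s to 35.1 s, but the difference was not significant at that threshold (p = 8.90e-02).
Positive Feedback rate increased from 21.00% to 33.00%, and the difference was statistically significant (p = 4.00e-03).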
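To put the observations on a common scale, the absolute and relative changes can be worked out directly from the summary table. This is a small arithmetic sketch; the `lift` helper is a name introduced here for illustration, and the inputs are copied from the table above.

```python
# Values copied from the metric summary table: (Variant A, Variant B).
summary = {
    "Click Rate": (0.45, 0.52),
    "Dwell Time (s)": (32.4, 35.1),
    "Positive Feedback": (0.21, 0.33),
}


def lift(control, test):
    """Return (absolute difference, relative lift) of test over control."""
    absolute = test - control
    return absolute, absolute / control


lifts = {name: lift(a, b) for name, (a, b) in summary.items()}
for name, (absolute, relative) in lifts.items():
    print(f"{name}: absolute {absolute:+.3f}, relative {relative:+.1%}")
```

On these figures the relative lift is roughly 15.6% for click rate, 8.3% for dwell time, and 57.1% for positive feedback.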

🛠 Reproducibility

The dashboard was built using:

streamlit, pandas, plotly, scipy
Supports data uploads in .csv, .xlsx, .parquet, .db
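One way to handle the four upload formats is to dispatch on the file extension. The sketch below is an assumption about how such a loader might look, not the dashboard's actual code: `load_table` and its `sql_table` parameter are hypothetical names, and `.db` is assumed to mean a SQLite file.

```python
import sqlite3
from contextlib import closing
from pathlib import Path

import pandas as pd


def load_table(path, sql_table=None):
    """Load a data file into a DataFrame based on its extension (hypothetical helper)."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".csv":
        return pd.read_csv(path)
    if suffix == ".xlsx":
        return pd.read_excel(path)  # requires an Excel engine such as openpyxl
    if suffix == ".parquet":
        return pd.read_parquet(path)  # requires pyarrow or fastparquet
    if suffix == ".db":
        # Assumption: .db is a SQLite database, so a table name must be supplied.
        if sql_table is None:
            raise ValueError("sql_table is required for .db files")
        with closing(sqlite3.connect(path)) as conn:
            return pd.read_sql_query(f'SELECT * FROM "{sql_table}"', conn)
    raise ValueError(f"Unsupported file type: {suffix}")
```

A call such as `load_table("data/metric_summary.csv")` would return the same DataFrame as the `pd.read_csv` call earlier in this post. An upload widget typically hands over a file-like object rather than a path, so a real dashboard would likely need to adapt this.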

The source code is available on GitHub.

📌 Future Work

JSON & API integration for real-time experiment ingestion
Support for multiple test variants
Trend analysis across time windows