Best Distilled Model Report

Detailed analysis of the optimal student model

{{ model.model_type }} | Temperature: {{ model.temperature }} | Alpha: {{ model.alpha }}

Executive Summary

This report analyzes the best-performing distilled (student) model. The {{ model.model_type }} model was trained using knowledge distillation with a temperature of {{ model.temperature }} and an alpha of {{ model.alpha }}. The metrics below summarize its predictive performance and how closely its predicted probabilities match the teacher model's distribution.

Model Type: {{ model.model_type }}

Temperature: {{ model.temperature }}

Alpha: {{ model.alpha }}

Accuracy: {{ "%.3f"|format(metrics.accuracy.value) if 'accuracy' in metrics else 'N/A' }} {% if 'accuracy' in metrics and metrics.accuracy.retention is defined %}({{ "%.1f"|format(metrics.accuracy.retention) }}% of teacher){% endif %}

F1 Score: {{ "%.3f"|format(metrics.f1.value) if 'f1' in metrics else 'N/A' }} {% if 'f1' in metrics and metrics.f1.retention is defined %}({{ "%.1f"|format(metrics.f1.retention) }}% of teacher){% endif %}

AUC-ROC: {{ "%.3f"|format(metrics.auc_roc.value) if 'auc_roc' in metrics else 'N/A' }} {% if 'auc_roc' in metrics and metrics.auc_roc.retention is defined %}({{ "%.1f"|format(metrics.auc_roc.retention) }}% of teacher){% endif %}

KL Divergence: {{ "%.3f"|format(metrics.kl_divergence.value) if 'kl_divergence' in metrics else 'N/A' }}

KS Statistic: {{ "%.3f"|format(metrics.ks_statistic.value) if 'ks_statistic' in metrics else 'N/A' }}

R² Score: {{ "%.3f"|format(metrics.r2_score.value) if 'r2_score' in metrics else 'N/A' }}
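
The KL divergence reported above compares the teacher and student probability distributions. How the report's pipeline computes it is not shown here; the following is a minimal sketch that assumes both distributions are summarized as histogram counts over the same bins, uses the direction KL(teacher || student), and adds a small smoothing constant. The helper name `klDivergence` and the `eps` parameter are hypothetical.

```javascript
// Hypothetical helper: KL(P || Q) between two discrete distributions
// given as arrays of non-negative counts over the same bins.
function klDivergence(pCounts, qCounts, eps = 1e-10) {
  if (pCounts.length !== qCounts.length) {
    throw new Error("Both histograms must use the same bins");
  }
  const normalize = (counts) => {
    // Add eps to every bin so empty bins never produce log(0) or a division by zero.
    const smoothed = counts.map((c) => c + eps);
    const total = smoothed.reduce((a, b) => a + b, 0);
    return smoothed.map((c) => c / total);
  };
  const p = normalize(pCounts);
  const q = normalize(qCounts);
  // Sum of p_i * ln(p_i / q_i); zero when the two distributions are identical.
  return p.reduce((sum, pi, i) => sum + pi * Math.log(pi / q[i]), 0);
}
```

A call such as `klDivergence(teacherCounts, studentCounts)` returns a value in nats; lower values mean the student's distribution is closer to the teacher's.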

Distribution Analysis

Probability Distribution Comparison

Legend: Teacher Model, Student Model
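
The chart script for this comparison reads histogram entries with `x` and `count` fields (`teacherHist`, `studentHist`), but the code that builds them is not part of this section. A minimal sketch of one way to produce such entries, assuming equal-width bins over [0, 1] and `x` set to the bin center; `buildHistogram` and `numBins` are hypothetical names:

```javascript
// Hypothetical helper: bin probabilities in [0, 1] into equal-width bins.
// Returns one entry per bin with the bin center as x and the number of values as count.
function buildHistogram(probabilities, numBins = 20) {
  const counts = new Array(numBins).fill(0);
  for (const p of probabilities) {
    // Skip NaN and values outside the valid probability range.
    if (!(p >= 0 && p <= 1)) continue;
    // A probability of exactly 1 belongs to the last bin.
    const idx = Math.min(Math.floor(p * numBins), numBins - 1);
    counts[idx] += 1;
  }
  return counts.map((count, i) => ({ x: (i + 0.5) / numBins, count }));
}
```

Using the same `numBins` for the teacher and the student keeps the two curves directly comparable.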

Cumulative Distribution Comparison

Legend: Teacher Model, Student Model
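
The KS statistic is the largest vertical gap between the two cumulative distribution curves. The chart script in this template approximates that gap by probing a 101-point grid; the sketch below instead walks both sorted samples to find the exact maximum. It is an illustration under the assumption that raw probability arrays are available, and `ksStatistic` is a hypothetical name.

```javascript
// Hypothetical helper: exact two-sample Kolmogorov-Smirnov statistic,
// i.e. the maximum absolute difference between the two empirical CDFs.
function ksStatistic(sampleA, sampleB) {
  if (sampleA.length === 0 || sampleB.length === 0) {
    throw new Error("Both samples must be non-empty");
  }
  const a = [...sampleA].sort((u, v) => u - v);
  const b = [...sampleB].sort((u, v) => u - v);
  let i = 0;
  let j = 0;
  let maxDiff = 0;
  while (i < a.length && j < b.length) {
    const value = Math.min(a[i], b[j]);
    // Advance past every observation <= value in both samples (handles ties).
    while (i < a.length && a[i] <= value) i++;
    while (j < b.length && b[j] <= value) j++;
    // i / a.length and j / b.length are the empirical CDFs evaluated at value.
    maxDiff = Math.max(maxDiff, Math.abs(i / a.length - j / b.length));
  }
  // Once one sample is exhausted its CDF is 1 and the gap can only shrink.
  return maxDiff;
}
```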

Q-Q Plot (Quantile Comparison)

Points that fall on a straight diagonal line indicate identical distributions. {% if 'r2_score' in metrics %}The R² score of {{ "%.3f"|format(metrics.r2_score.value) }} summarizes how closely the student's distribution follows the teacher's; values near 1 indicate a close match.{% endif %}
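
To make the Q-Q comparison concrete, the sketch below pairs teacher and student quantiles at matching probability levels and scores them against the diagonal. How the report's R² is actually computed is not shown in this section; treating it as the coefficient of determination of student quantiles against the line y = x is an assumption, and all function names are hypothetical.

```javascript
// Hypothetical helper: quantile of an ascending-sorted array using linear interpolation.
function quantile(sorted, q) {
  const pos = (sorted.length - 1) * q;
  const lo = Math.floor(pos);
  const hi = Math.ceil(pos);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo);
}

// Pair teacher and student quantiles at numPoints evenly spaced levels (numPoints >= 2).
function qqPoints(teacher, student, numPoints = 101) {
  const t = [...teacher].sort((a, b) => a - b);
  const s = [...student].sort((a, b) => a - b);
  const points = [];
  for (let k = 0; k < numPoints; k++) {
    const q = k / (numPoints - 1);
    points.push({ teacher: quantile(t, q), student: quantile(s, q) });
  }
  return points;
}

// Assumed definition: R² of the student quantiles measured against the identity line.
// Not meaningful when all student quantiles are equal (total variance is zero).
function r2AgainstIdentity(points) {
  const mean = points.reduce((sum, p) => sum + p.student, 0) / points.length;
  const ssRes = points.reduce((sum, p) => sum + (p.student - p.teacher) ** 2, 0);
  const ssTot = points.reduce((sum, p) => sum + (p.student - mean) ** 2, 0);
  return 1 - ssRes / ssTot;
}
```

Under this definition a value of 1 means every point lies on the diagonal, and the value drops as the student's quantiles drift from the teacher's.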

Error Distribution Comparison

Legend: Teacher Model Errors, Student Model Errors

Performance Metrics

{% set key_metrics = ['accuracy', 'f1', 'auc_roc', 'r2_score'] %} {% for metric_name in key_metrics %} {% if metric_name in metrics %}
{{ metrics[metric_name].display_name }}
{{ "%.3f"|format(metrics[metric_name].value) }}
Teacher: {{ "%.3f"|format(metrics[metric_name].teacher_value) if metrics[metric_name].teacher_value is not none else 'N/A' }}
{% if metrics[metric_name].difference is defined and metrics[metric_name].teacher_value is not none %}
{{ "%.3f"|format(metrics[metric_name].difference) }} {% if metrics[metric_name].retention is defined %}({{ "%.1f"|format(metrics[metric_name].retention) }}%){% endif %}
{% endif %}
{% endif %} {% endfor %}
Metric | Teacher Model | Student Model | Difference | Retention %
{% for metric_name, metric in metrics.items() %}
{{ metric.display_name }}{% if metric_name in ['kl_divergence', 'ks_statistic'] %} (lower is better){% endif %} | {{ "%.3f"|format(metric.teacher_value) if metric.teacher_value is not none else 'N/A' }} | {{ "%.3f"|format(metric.value) }} | {% if metric.difference is defined and metric.teacher_value is not none %}{% if metric_name in ['kl_divergence', 'ks_statistic'] %}{{ "+%.3f"|format(metric.difference) if metric.difference > 0 else "%.3f"|format(metric.difference) }}{% else %}{{ "%.3f"|format(metric.difference) }}{% endif %}{% else %}N/A{% endif %} | {% if metric.retention is defined %}{{ "%.1f"|format(metric.retention) }}%{% else %}N/A{% endif %}
{% endfor %}
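
The Difference and Retention % columns are supplied to the template precomputed. A minimal sketch of how they could be derived, assuming difference means student minus teacher and retention means the student value as a percentage of the teacher value (the pipeline's actual definitions are not shown here; `compareMetric` is a hypothetical name):

```javascript
// Hypothetical helper: compare one student metric value against the teacher's.
// Returns nulls when no teacher value exists or retention would divide by zero.
function compareMetric(studentValue, teacherValue) {
  if (teacherValue === null || teacherValue === undefined) {
    return { difference: null, retention: null };
  }
  const difference = studentValue - teacherValue;
  const retention =
    teacherValue !== 0 ? (studentValue / teacherValue) * 100 : null;
  return { difference, retention };
}
```

Retention is only meaningful for metrics where higher is better, which is consistent with the table showing N/A for KL divergence and the KS statistic.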

Model Parameters

{{ model.model_type }} Hyperparameters

Knowledge Distillation Parameters
  • Temperature: {{ model.temperature }}
    Controls the softness of probability distributions
  • Alpha: {{ model.alpha }}
    Weights the teacher (distillation) loss ({{ model.alpha }}) against the ground-truth loss ({{ (1 - model.alpha)|round(2) }})
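
The two parameters above can be illustrated with a common formulation of the distillation objective. This is a sketch, not the loss used to train the reported model, which is not shown in this section: it assumes logit inputs, a KL term on temperature-softened distributions, and a cross-entropy term on the true label, and it omits the T² scaling that some formulations apply to the soft term. All function names are hypothetical.

```javascript
// Softmax with temperature: higher temperature gives a softer (flatter) distribution.
function softmax(logits, temperature = 1) {
  const scaled = logits.map((z) => z / temperature);
  const maxVal = Math.max(...scaled); // subtract the max for numerical stability
  const exps = scaled.map((z) => Math.exp(z - maxVal));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / total);
}

// Assumed objective: alpha * KL(teacher || student) at the given temperature
// plus (1 - alpha) * cross-entropy against the true class at temperature 1.
function distillationLoss(studentLogits, teacherLogits, trueIndex, temperature, alpha) {
  const pTeacher = softmax(teacherLogits, temperature);
  const pStudent = softmax(studentLogits, temperature);
  const soft = pTeacher.reduce(
    (sum, pt, i) => (pt > 0 ? sum + pt * Math.log(pt / pStudent[i]) : sum),
    0
  );
  const hard = -Math.log(softmax(studentLogits, 1)[trueIndex]);
  return alpha * soft + (1 - alpha) * hard;
}
```

With alpha = 1 the student only imitates the teacher, and with alpha = 0 it only learns from the labels; values in between blend the two signals.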
Model Specific Hyperparameters
{% if model.parsed_params %}{% for param, value in model.parsed_params.items() %}
  • {{ param }}: {{ value }}
{% endfor %}{% else %}
  • No additional hyperparameters available
{% endif %}

Feature Importance

Top features by importance in the {{ model.model_type }} model. Feature importance represents the relative contribution of each feature to the model's predictions.
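
A minimal sketch of how a "top features" list could be prepared for display, assuming the raw importances arrive as an object mapping feature names to scores. Normalizing by the sum of absolute values so the shares add up to 1 is an assumption, and `topFeatures` is a hypothetical name.

```javascript
// Hypothetical helper: normalize raw importances to relative shares and keep the top N.
function topFeatures(importances, topN = 10) {
  const entries = Object.entries(importances);
  const total = entries.reduce((sum, [, value]) => sum + Math.abs(value), 0);
  return entries
    .map(([feature, value]) => ({
      feature,
      // Relative contribution; 0 when every raw importance is 0.
      importance: total > 0 ? Math.abs(value) / total : 0,
    }))
    .sort((a, b) => b.importance - a.importance)
    .slice(0, topN);
}
```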

Conclusion and Recommendations

Key Findings

  • {% if 'accuracy' in metrics and metrics.accuracy.retention is defined %}The {{ model.model_type }} student model retains {{ "%.1f"|format(metrics.accuracy.retention) }}% of the teacher's accuracy while being more efficient{% else %}The {{ model.model_type }} student model is more efficient than the teacher; its accuracy retention relative to the teacher is not available{% endif %}
  • Distribution similarity metrics measure the alignment between teacher and student {% if 'r2_score' in metrics %}(R² Score: {{ "%.3f"|format(metrics.r2_score.value) }}; values near 1 indicate a close match){% endif %}
  • {% if model.temperature > 1 %}A temperature above 1 ({{ model.temperature }}) softened the teacher's probability distribution, giving the student more information about the teacher's relative class preferences{% else %}A temperature of {{ model.temperature }} was used for the best-performing student{% endif %}
  • The alpha value of {{ model.alpha }} gave the best balance, among the configurations evaluated, between mimicking the teacher and learning from the ground truth

Recommendations

  1. Deployment: If the retention figures above meet your requirements, this distilled model is a candidate for production deployment in place of the teacher model.
  2. Runtime Efficiency: The student model is intended to reduce inference time while preserving the teacher's decision boundaries; measure latency on the target hardware to confirm the gain.
  3. Parameter Tuning: For future distillation tasks on similar data, start with temperature ≈ {{ model.temperature }} and alpha ≈ {{ model.alpha }}, then tune from there.
  4. Model Selection: {{ model.model_type }} was the best student model found for this dataset, matching the teacher's distribution more closely than the other candidates evaluated.

append("g")
    .attr("transform", `translate(${margin.left},${margin.top})`);

  // X scale
  const x = d3.scaleLinear()
    .domain([0, 1])
    .range([0, width]);

  // Y scale
  const y = d3.scaleLinear()
    .domain([0, Math.max(
      d3.max(teacherHist, d => d.count),
      d3.max(studentHist, d => d.count)
    )])
    .nice()
    .range([height, 0]);

  // Line generators
  const teacherLine = d3.line()
    .curve(d3.curveBasis)
    .x(d => x(d.x))
    .y(d => y(d.count));

  const studentLine = d3.line()
    .curve(d3.curveBasis)
    .x(d => x(d.x))
    .y(d => y(d.count));

  // Add X axis
  g.append("g")
    .attr("transform", `translate(0,${height})`)
    .call(d3.axisBottom(x));

  // Add Y axis
  g.append("g")
    .call(d3.axisLeft(y));

  // Add X axis label
  g.append("text")
    .attr("text-anchor", "middle")
    .attr("x", width/2)
    .attr("y", height + margin.top + 15)
    .text("Probability Value");

  // Draw teacher line
  g.append("path")
    .datum(teacherHist)
    .attr("fill", "none")
    .attr("stroke", "#2b5876")
    .attr("stroke-width", 2)
    .attr("d", teacherLine);

  // Draw student line
  g.append("path")
    .datum(studentHist)
    .attr("fill", "none")
    .attr("stroke", "#4e4376")
    .attr("stroke-width", 2)
    .attr("d", studentLine);

  // Draw teacher area
  g.append("path")
    .datum(teacherHist)
    .attr("fill", "#2b5876")
    .attr("fill-opacity", 0.3)
    .attr("d", d3.area()
      .curve(d3.curveBasis)
      .x(d => x(d.x))
      .y0(height)
      .y1(d => y(d.count))
    );

  // Draw student area
  g.append("path")
    .datum(studentHist)
    .attr("fill", "#4e4376")
    .attr("fill-opacity", 0.3)
    .attr("d", d3.area()
      .curve(d3.curveBasis)
      .x(d => x(d.x))
      .y0(height)
      .y1(d => y(d.count))
    );
}

function createCumulativeDistChart() {
  const svg = d3.select("#cumulative-dist-chart")
    .append("svg")
    .attr("width", "100%")
    .attr("height", "100%")
    .attr("viewBox", "0 0 600 300");

  const margin = {top: 20, right: 30, bottom: 40, left: 40};
  const width = 600 - margin.left - margin.right;
  const height = 300 - margin.top - margin.bottom;

  const g = svg.append("g")
    .attr("transform", `translate(${margin.left},${margin.top})`);

  // Sort and calculate CDF
  const teacherSorted = [...teacherProbabilities].sort((a, b) => a - b);
  const studentSorted = [...studentProbabilities].sort((a, b) => a - b);

  const teacherCdf = teacherSorted.map((val, idx) => ({
    x: val,
    y: (idx + 1) / teacherSorted.length
  }));

  const studentCdf = studentSorted.map((val, idx) => ({
    x: val,
    y: (idx + 1) / studentSorted.length
  }));

  // X scale
  const x = d3.scaleLinear()
    .domain([0, 1])
    .range([0, width]);

  // Y scale
  const y = d3.scaleLinear()
    .domain([0, 1])
    .range([height, 0]);

  // Line generators
  const teacherLine = d3.line()
    .x(d => x(d.x))
    .y(d => y(d.y));

  const studentLine = d3.line()
    .x(d => x(d.x))
    .y(d => y(d.y));

  // Add X axis
  g.append("g")
    .attr("transform", `translate(0,${height})`)
    .call(d3.axisBottom(x));

  // Add Y axis
  g.append("g")
    .call(d3.axisLeft(y));

  // Add X axis label
  g.append("text")
    .attr("text-anchor", "middle")
    .attr("x", width/2)
    .attr("y", height + margin.top + 15)
    .text("Probability Value");

  // Add Y axis label
  g.append("text")
    .attr("text-anchor", "middle")
    .attr("transform", "rotate(-90)")
    .attr("y", -margin.left + 15)
    .attr("x", -height/2)
    .text("Cumulative Probability");

  // Draw teacher line
  g.append("path")
    .datum(teacherCdf)
    .attr("fill", "none")
    .attr("stroke", "#2b5876")
    .attr("stroke-width", 2)
    .attr("d", teacherLine);

  // Draw student line
  g.append("path")
    .datum(studentCdf)
    .attr("fill", "none")
    .attr("stroke", "#4e4376")
    .attr("stroke-width", 2)
    .attr("d", studentLine);

  // Visualize KS statistic (max distance)
  // Find max difference
  let maxDiff = 0;
  let maxDiffX = 0;
  let teacherY = 0;
  let studentY = 0;

  for (let i = 0; i < 101; i++) {
    const x = i / 100;

    // Find closest points in each CDF
    const teacherPoint = teacherCdf.reduce((prev, curr) =>
      Math.abs(curr.x - x) < Math.abs(prev.x - x) ? curr : prev, {x: 0, y: 0});

    const studentPoint = studentCdf.reduce((prev, curr) =>
      Math.abs(curr.x - x) < Math.abs(prev.x - x) ? curr : prev, {x: 0, y: 0});

    const diff = Math.abs(teacherPoint.y - studentPoint.y);

    if (diff > maxDiff) {
      maxDiff = diff;
      maxDiffX = x;
      teacherY = teacherPoint.y;
      studentY = studentPoint.y;
    }
  }

  // Draw KS statistic line
  g.append("line")
    .attr("x1", x(maxDiffX))
    .attr("y1", y(teacherY))
    .attr("x2", x(maxDiffX))
    .attr("y2", y(studentY))
    .attr("stroke", "red")
    .attr("stroke-width", 2)
    .attr("stroke-dasharray", "4");

  g.append("text")
    .attr("x", x(maxDiffX) + 5)
    .attr("y", y((teacherY + studentY) / 2))
    .attr("text-anchor", "start")
    .attr("font-size", "12px")
    .text(`KS: ${maxDiff.toFixed(3)}`);
}

function createQQPlot() {
  const svg = d3.select("#qq-plot")
    .append("svg")
    .attr("width", "100%")
    .attr("height", "100%")
    .attr("viewBox", "0 0 600 300");

  const margin = {top: 20, right: 30, bottom: 40, left: 40};
  const width = 600 - margin.left - margin.right;
  const height = 300 - margin.top - margin.bottom;

  const g = svg.