Commit b31be61 · Alex committed · Parent: 982b341

space updated

Files changed:
- SUBMISSION_EXAMPLE.md (+266, -0)
- src/display/css_html_js.py (+9, -8)
- src/display/formatting.py (+6, -6)
SUBMISSION_EXAMPLE.md
ADDED
@@ -0,0 +1,266 @@
# Model Submission Example

This guide shows you exactly how to submit your code review model to the leaderboard.

## Step-by-Step Submission Process

### 1. **Access the Submission Form**

- Open the CodeReview Leaderboard in your browser
- Navigate to the **Submit Model** tab
- Click the "Submit New Model Results" accordion to expand the form

### 2. **Fill in Basic Information**

#### **Model Name**

```
Example: microsoft/CodeT5-base
Format: organization/model-name
```

#### **Programming Language**

```
Select: Python
(or Java, JavaScript, C++, Go, Rust, etc.)
```

#### **Comment Language**

```
Select: English
(or Chinese, Spanish, French, German, etc.)
```

#### **Taxonomy Category**

```
Select: Bug Detection
(or Security, Performance, Code Style, etc.)
```

### 3. **Performance Scores** (0.0 - 1.0)

#### **BLEU Score**

```
Example: 0.742
Range: 0.0 to 1.0
Description: Measures similarity between generated and reference reviews
```
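
The leaderboard does not prescribe a particular BLEU implementation. As a rough sketch, a corpus-level score in the expected 0.0-1.0 range could be computed with `sacrebleu`; the file path and JSON field names below are illustrative, not part of the leaderboard.

```python
# Sketch only: corpus-level BLEU in the 0.0-1.0 range the form expects.
# Assumes one generated review and one reference per example; the JSONL path
# and the "generated"/"reference" field names are hypothetical.
import json
import sacrebleu

hypotheses, references = [], []
with open("eval_outputs.jsonl") as f:
    for line in f:
        row = json.loads(line)
        hypotheses.append(row["generated"])
        references.append(row["reference"])

# sacrebleu reports BLEU on a 0-100 scale; divide by 100 for the form.
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"BLEU Score: {bleu.score / 100:.3f}")
```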

#### **Pass@1**

```
Example: 0.685
Range: 0.0 to 1.0
Description: Success rate when the model gets 1 attempt
```

#### **Pass@5**

```
Example: 0.834
Range: 0.0 to 1.0
Description: Success rate when the model gets 5 attempts
```

#### **Pass@10**

```
Example: 0.901
Range: 0.0 to 1.0
Description: Success rate when the model gets 10 attempts
```
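
The guide does not mandate a specific Pass@k estimator. A common convention, used for example in HumanEval-style evaluations, is the unbiased estimator over n sampled attempts per problem; the sketch below assumes that convention and uses made-up counts.

```python
# Sketch only: the commonly used unbiased pass@k estimator, assuming you drew
# n samples per problem and c of them were judged correct. This is one
# convention; the leaderboard does not require this exact estimator.
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k samples drawn from n is correct."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical counts of correct samples per problem, with n = 20 samples each.
per_problem_correct = [12, 5, 0, 20]
n = 20
for k in (1, 5, 10):
    score = sum(pass_at_k(n, c, k) for c in per_problem_correct) / len(per_problem_correct)
    print(f"Pass@{k}: {score:.3f}")
```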

### 4. **Quality Metrics** (0 - 10)

Rate your model across these 10 dimensions:

#### **Readability: 8**

```
How clear and readable are the generated code reviews?
Scale: 0 (unreadable) to 10 (very clear)
```

#### **Relevance: 7**

```
How relevant are the reviews to the actual code changes?
Scale: 0 (irrelevant) to 10 (highly relevant)
```

#### **Explanation Clarity: 8**

```
How well does the model explain identified issues?
Scale: 0 (unclear) to 10 (very clear explanations)
```

#### **Problem Identification: 7**

```
How effectively does it identify real code problems?
Scale: 0 (misses issues) to 10 (finds all problems)
```

#### **Actionability: 6**

```
How actionable and useful are the suggestions?
Scale: 0 (not actionable) to 10 (very actionable)
```

#### **Completeness: 7**

```
How thorough and complete are the reviews?
Scale: 0 (incomplete) to 10 (comprehensive)
```

#### **Specificity: 6**

```
How specific are the comments and suggestions?
Scale: 0 (too generic) to 10 (very specific)
```

#### **Contextual Adequacy: 7**

```
How well does it understand the code context?
Scale: 0 (ignores context) to 10 (perfect context understanding)
```

#### **Consistency: 6**

```
How consistent is the model across different code reviews?
Scale: 0 (inconsistent) to 10 (very consistent)
```

#### **Brevity: 5**

```
How concise are the reviews without losing important information?
Scale: 0 (too verbose/too brief) to 10 (perfect length)
```
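
The form collects these ten ratings one by one. Purely as a local bookkeeping aid (nothing here talks to the leaderboard), you might keep them in a small structure and sanity-check the 0-10 range before filling the form:

```python
# Sketch only: local bookkeeping for the ten quality ratings.
# The keys mirror the form fields; the values are the example ratings above.
quality_metrics = {
    "Readability": 8,
    "Relevance": 7,
    "Explanation Clarity": 8,
    "Problem Identification": 7,
    "Actionability": 6,
    "Completeness": 7,
    "Specificity": 6,
    "Contextual Adequacy": 7,
    "Consistency": 6,
    "Brevity": 5,
}

assert all(0 <= v <= 10 for v in quality_metrics.values()), "ratings must be 0-10"
average = sum(quality_metrics.values()) / len(quality_metrics)
print(f"Average quality rating: {average:.1f}")
```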

### 5. **Submit Your Model**

- Click the **Submit Model** button
- Wait for validation and processing
- Check for the success or error message

## Complete Example Submission

Here's a complete example of submitting the CodeT5-base model:

```yaml
Model Information:
  Model Name: "microsoft/CodeT5-base"
  Programming Language: "Python"
  Comment Language: "English"
  Taxonomy Category: "Bug Detection"

Performance Scores:
  BLEU Score: 0.742
  Pass@1: 0.685
  Pass@5: 0.834
  Pass@10: 0.901

Quality Metrics:
  Readability: 8
  Relevance: 7
  Explanation Clarity: 8
  Problem Identification: 7
  Actionability: 6
  Completeness: 7
  Specificity: 6
  Contextual Adequacy: 7
  Consistency: 6
  Brevity: 5
```

## Security & Rate Limiting

### **IP-based Rate Limiting**

- **5 submissions per IP address per 24 hours**
- Submissions are tracked by your IP address
- The rate limit resets every 24 hours
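
The Space's own limiter isn't shown here. As a rough illustration of the policy above (5 submissions per IP in a rolling 24-hour window), a minimal in-memory version might look like this; the real implementation may differ, for example by using persistent storage.

```python
# Sketch only: a minimal in-memory, per-IP sliding-window rate limiter
# illustrating the "5 submissions per 24 hours" policy described above.
import time
from collections import defaultdict

WINDOW_SECONDS = 24 * 60 * 60
MAX_SUBMISSIONS = 5

_submissions: dict[str, list[float]] = defaultdict(list)

def allow_submission(ip: str, now: float | None = None) -> bool:
    """Return True if this IP may submit now, recording the attempt if allowed."""
    now = time.time() if now is None else now
    # Keep only attempts that are still inside the rolling window.
    recent = [t for t in _submissions[ip] if now - t < WINDOW_SECONDS]
    if len(recent) >= MAX_SUBMISSIONS:
        _submissions[ip] = recent
        return False
    recent.append(now)
    _submissions[ip] = recent
    return True
```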

### **Validation Rules**

- Model name must follow the `organization/model` format
- All performance scores must be between 0.0 and 1.0
- All quality metrics must be between 0 and 10
- Pass@1 ≤ Pass@5 ≤ Pass@10 (logical consistency)
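
The exact server-side checks aren't published; as a sketch, the rules above could be verified locally before submitting (the function name and dictionaries here are hypothetical, not leaderboard code):

```python
# Sketch only: local pre-submission checks mirroring the rules listed above.
def check_submission(scores: dict[str, float], metrics: dict[str, int]) -> list[str]:
    """Return a list of problems; an empty list means the values look consistent."""
    problems = []
    for name, value in scores.items():
        if not 0.0 <= value <= 1.0:
            problems.append(f"{name} out of range: {value} (must be between 0 and 1)")
    for name, value in metrics.items():
        if not 0 <= value <= 10:
            problems.append(f"{name} out of range: {value} (must be between 0 and 10)")
    if not scores["Pass@1"] <= scores["Pass@5"] <= scores["Pass@10"]:
        problems.append("Pass@k scores must satisfy Pass@1 <= Pass@5 <= Pass@10")
    return problems

issues = check_submission(
    {"BLEU Score": 0.742, "Pass@1": 0.685, "Pass@5": 0.834, "Pass@10": 0.901},
    {"Readability": 8, "Relevance": 7},
)
print(issues or "Submission values look consistent.")
```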

## After Submission

### **Immediate Feedback**

You'll see one of these messages:

#### **Success**

```
✅ Submission recorded successfully!
```

#### **Error Examples**

```
❌ Rate limit exceeded: 5/5 submissions in 24 hours
❌ Model name contains invalid characters
❌ Pass@1 score cannot be higher than Pass@5
❌ Score BLEU out of range: 1.2 (must be between 0 and 1)
```

### **View Your Results**

- Your model will appear in the **Leaderboard** tab
- Use filters to find your specific submission
- Check the **Analytics** tab for submission history

## Tips for Better Submissions

### **Model Naming**

```
✅ Good: "microsoft/CodeT5-base"
✅ Good: "facebook/bart-large"
✅ Good: "my-org/custom-model-v2"
❌ Bad: "my model"
❌ Bad: "[email protected]"
```
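
The exact naming pattern the form enforces isn't documented; a simple check consistent with the good and bad examples above might be a regex like the following (purely illustrative, the real pattern may be stricter or looser):

```python
# Sketch only: an illustrative "organization/model" name check consistent with
# the examples above; not the leaderboard's actual validation pattern.
import re

NAME_PATTERN = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*/[A-Za-z0-9][A-Za-z0-9._-]*$")

for name in ["microsoft/CodeT5-base", "my-org/custom-model-v2", "my model"]:
    verdict = "ok" if NAME_PATTERN.match(name) else "invalid"
    print(f"{name!r}: {verdict}")
```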

### **Performance Scores**

- Be honest and accurate with your evaluations
- Use proper evaluation methodology
- Ensure Pass@k scores are logically consistent
- Document your evaluation process

### **Quality Metrics**

- Rate based on actual model performance
- Consider multiple test cases
- Be objective in your assessment
- Document your rating criteria

## Need Help?

If you encounter issues:

1. Check the error message for specific guidance
2. Verify all fields are filled correctly
3. Ensure you haven't exceeded rate limits
4. Contact maintainers if problems persist

---

**Ready to submit your model? Head to the Submit Model tab and follow this guide!**

src/display/css_html_js.py
CHANGED

@@ -12,8 +12,8 @@ DARK_THEME_CSS = """
     --text-primary: #e6edf3;
     --text-secondary: #7d8590;
     --border-color: #30363d;
-    --accent-color: #
-    --accent-hover: #
+    --accent-color: #ffffff;
+    --accent-hover: #f0f0f0;
     --danger-color: #da3633;
     --warning-color: #d29922;
     --info-color: #1f6feb;

@@ -101,14 +101,14 @@ DARK_THEME_CSS = """

 .gradio-container input:focus, .gradio-container select:focus, .gradio-container textarea:focus {
     border-color: var(--accent-color) !important;
-    box-shadow: 0 0 0 2px rgba(
+    box-shadow: 0 0 0 2px rgba(255, 255, 255, 0.2) !important;
 }

 /* Buttons */
 .gradio-container button {
     background: var(--accent-color) !important;
-    color:
-    border:
+    color: var(--bg-primary) !important;
+    border: 1px solid var(--border-color) !important;
     border-radius: 6px !important;
     padding: 8px 16px !important;
     font-weight: 500 !important;

@@ -118,6 +118,7 @@ DARK_THEME_CSS = """
 .gradio-container button:hover {
     background: var(--accent-hover) !important;
     transform: translateY(-1px) !important;
+    color: var(--bg-primary) !important;
 }

 .gradio-container button:active {

@@ -158,7 +159,7 @@ DARK_THEME_CSS = """

 .gradio-container .slider input[type="range"]::-webkit-slider-thumb {
     background: var(--accent-color) !important;
-    border: 2px solid var(--bg-
+    border: 2px solid var(--bg-primary) !important;
     border-radius: 50% !important;
     width: 18px !important;
     height: 18px !important;

@@ -193,8 +194,8 @@ DARK_THEME_CSS = """

 /* Status messages */
 .gradio-container .success {
-    background: rgba(
-    color: var(--
+    background: rgba(255, 255, 255, 0.1) !important;
+    color: var(--text-primary) !important;
     border: 1px solid var(--accent-color) !important;
     border-radius: 6px !important;
     padding: 12px 16px !important;

src/display/formatting.py
CHANGED

@@ -53,13 +53,13 @@ def format_metric_score(score: int, metric_name: str) -> str:

     # Color coding based on score
     if score >= 8:
-        color = "#
+        color = "#ffffff"  # White
     elif score >= 6:
-        color = "#
+        color = "#d0d0d0"  # Light gray
     elif score >= 4:
-        color = "#
+        color = "#a0a0a0"  # Gray
     else:
-        color = "#
+        color = "#707070"  # Dark gray

     return f"<span style='color: {color}; font-weight: 600;'>{score}</span>"

@@ -101,9 +101,9 @@ def format_taxonomy_badge(category: str) -> str:
         "Code Style": "#6f42c1",
         "Performance": "#fd7e14",
         "Security": "#e83e8c",
-        "Maintainability": "#
+        "Maintainability": "#ffffff",
         "Documentation": "#17a2b8",
-        "Testing": "#
+        "Testing": "#ffffff",
         "Architecture": "#6c757d",
         "Best Practices": "#007bff",
         "Refactoring": "#ffc107"