🛡️ Risk Management
Identification, assessment and mitigation of risks in the Evaluator.gr AI system
📋 Purpose of this Document
This document records the known risks associated with the operation of the Evaluator.gr AI system, the mitigation strategies that have been implemented, as well as the monitoring and incident response protocols. It is prepared in accordance with the principles of Article 9 of the EU AI Act as a voluntary best practice, given that Evaluator.gr is classified as a Limited Risk system (Article 52).
📊 Risk Register
Position Bias
The model assigns different weight to criteria depending on the order in which they appear in the prompt.
Impact
Inaccurate scoring of criteria that appear later in the prompt, leading to an unrepresentative evaluation.
Mitigation
Consistent and fixed ordering of criteria in every evaluation. Triple evaluator system for Pitch Decks (academic, investor, communications) to balance perspectives.
AI Hallucination
The model generates information not grounded in the submitted document.
Impact
User receives inaccurate feedback based on fabricated data, potentially leading to incorrect business decisions.
Mitigation
Explicit instructions in every prompt prohibiting fabrication of data. Requirement for each evaluation to be documented with reference to specific points in the document. If something is not mentioned, the system is guided to state this explicitly.
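The grounding instructions above can be sketched as a prompt-assembly helper. This is an illustrative sketch only: the function name, rule wording, and structure are assumptions, not the production Evaluator.gr prompts (which are part of a PHP application and not shown in this document).

```python
# Hypothetical sketch of anti-fabrication prompt assembly; wording is illustrative.
GROUNDING_RULES = (
    "Base every statement strictly on the submitted document.\n"
    "Do not invent figures, names, or claims.\n"
    "Support each assessment with a reference to a specific passage.\n"
    "If the document does not mention something, state this explicitly "
    "instead of guessing."
)

def build_evaluation_prompt(document_text: str, criteria: list[str]) -> str:
    """Assemble an evaluation prompt with anti-fabrication instructions
    and a fixed, numbered ordering of criteria."""
    criteria_block = "\n".join(f"{i + 1}. {c}" for i, c in enumerate(criteria))
    return (
        f"{GROUNDING_RULES}\n\n"
        f"Evaluation criteria:\n{criteria_block}\n\n"
        f"Document:\n{document_text}"
    )
```

Fixing the criteria ordering in the template also supports the position-bias mitigation described above: every run sees the criteria in the same sequence.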
Over-reliance
Users adopt recommendations without critical assessment, treating the system as an authority.
Impact
Application of inappropriate recommendations without verification, potentially harming the user's business trajectory.
Mitigation
Clear advisory disclaimers on every output. Transparency pages (transparency.php, system-card.php). Planned: verification checklist for high-importance results.
Heterogeneous Outcomes
The quality of feedback varies depending on the completeness of the submitted document.
Impact
Users with incomplete documents receive less useful feedback, creating inequality in service value.
Mitigation
Document submission guidelines provided. Planned: input quality indicator before evaluation, with suggestions to complete missing information.
Language Bias
Reduced output quality for documents in languages other than Greek.
Impact
Lower evaluation quality for non-Greek-speaking users or English-language documents.
Mitigation
The system has been optimised for both Greek and English with a dedicated business terminology glossary. Clear user communication regarding language optimisation.
Output Inconsistency
The same document may receive slightly different scores across different evaluation runs.
Impact
Users who re-run an evaluation receive different results, reducing the system's reliability.
Mitigation
Low temperature setting (0.3) to reduce run-to-run variation. Structured prompts with a strictly defined output format.
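A minimal sketch of how these request parameters might be assembled. The parameter names follow the Anthropic Messages API, but the model identifier, token limit, and prompts here are placeholders, not the production configuration:

```python
# Illustrative sketch: parameter names follow the Anthropic Messages API;
# the model id and values below are placeholders, not Evaluator.gr's config.
def build_request(system_prompt: str, user_prompt: str) -> dict:
    """Assemble request parameters tuned for consistent evaluation output."""
    return {
        "model": "claude-example-model",  # placeholder; actual version is logged per run
        "max_tokens": 2048,
        "temperature": 0.3,               # low temperature for more consistent scores
        "system": system_prompt,          # fixed evaluation instructions and output format
        "messages": [{"role": "user", "content": user_prompt}],
    }
```

Note that a low temperature reduces but does not eliminate variation, which is why the risk is mitigated rather than removed.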
Model Version Change
A change in the underlying Claude model may affect the quality and consistency of outputs.
Impact
Non-comparable results across different time periods, making academic reproducibility difficult.
Mitigation
Recording of model version with every evaluation (audit trail). Versioning system that documents each model change with date and rationale.
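The audit-trail entry described above could take a shape like the following. The field names and serialisation format are assumptions for illustration, not the actual schema:

```python
import json
from datetime import datetime, timezone

# Illustrative sketch: field names are assumptions, not the actual audit schema.
def audit_record(evaluation_id: str, model_version: str, change_note: str = "") -> str:
    """Serialise an audit-trail entry pairing an evaluation with the model
    version that produced it, timestamped in UTC."""
    return json.dumps({
        "evaluation_id": evaluation_id,
        "model_version": model_version,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "change_note": change_note,  # rationale recorded when the model version changes
    })
```

Storing the version alongside each result is what makes cross-period comparisons, and hence academic reproducibility claims, auditable.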
Industry Bias
The system may evaluate tech startups more favourably than traditional industries.
Impact
Unrepresentative evaluations for businesses outside the technology sector, potentially discouraging non-tech entrepreneurs.
Mitigation
Stage-adaptive evaluation that adjusts by industry. Planned: systematic bias testing by industry with documentation of findings.
Data Privacy Risk
Sensitive business data submitted by users may be exposed.
Impact
Exposure of confidential business information, GDPR violation, legal consequences.
Mitigation
Encrypted MySQL connections. Secure API calls. Data not accessible by third parties without consent. Sharing with investors only with explicit user consent.
API Downtime
An Anthropic API outage affects the availability of the service.
Impact
Inability to use the platform and potential loss of in-progress data.
Mitigation
Graceful error handling with user-friendly error messages. User progress saving. User notification of temporary unavailability.
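The graceful-handling pattern above can be sketched as a retry wrapper with a user-friendly fallback. The retry policy, exception type, and message wording here are assumptions, not the production implementation:

```python
import time

# Illustrative sketch: retry counts, backoff, and message wording are assumptions.
def call_with_fallback(request_fn, retries: int = 2, delay: float = 1.0):
    """Retry a flaky API call with exponential backoff; on repeated failure,
    return a user-friendly unavailability notice instead of a raw error."""
    for attempt in range(retries + 1):
        try:
            return {"ok": True, "result": request_fn()}
        except ConnectionError:
            if attempt < retries:
                time.sleep(delay * (2 ** attempt))  # back off before retrying
    return {
        "ok": False,
        "message": "The evaluation service is temporarily unavailable. "
                   "Your progress has been saved; please try again shortly.",
    }
```

The fallback branch is where saved user progress and the unavailability notification mentioned above would be surfaced.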
Search Result Quality
Real-time web search results used by the 8 evaluation modules may be inaccurate or outdated.
Impact
Evaluation based on incorrect market data, potentially misleading for the user.
Mitigation
Clear indication in results that market data comes from web search. Planned: date/source indicator for search results.
👁️ Monitoring Protocols
Continuous Monitoring
- Real-time API error logging
- Completion rate monitoring per module
- Response time logging
Periodic Review
- Monthly error log review
- Quarterly output quality assessment
- Review of this document every six months
Incident Reporting
- Users can report inaccurate evaluations
- Feedback collection for prompt improvement
- Documentation of significant deviations
🚨 Incident Response Protocol
Identification
Detection of incident via error logs, user reports or periodic review.
Assessment
Severity classification (low/medium/high) and estimation of the number of affected users.
Response
Immediate measures based on severity: module deactivation, user notification, temporary suspension.
Correction
Implementation of a permanent solution: prompt modification, methodology update, code improvement.
Documentation
Recording of incident, actions taken and update of this document.
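The severity-to-response mapping in the protocol above can be sketched as a small lookup. The specific action lists are assumptions derived from the Response step, not a documented runbook:

```python
# Illustrative sketch: the action lists are assumptions based on the
# Response step above, not a formally documented runbook.
SEVERITY_ACTIONS = {
    "low": ["log incident", "schedule fix in next periodic review"],
    "medium": ["notify affected users", "prioritise prompt or code correction"],
    "high": ["deactivate affected module", "notify users", "suspend service if needed"],
}

def response_plan(severity: str) -> list[str]:
    """Return the immediate actions for a classified incident severity."""
    if severity not in SEVERITY_ACTIONS:
        raise ValueError(f"unknown severity: {severity}")
    return SEVERITY_ACTIONS[severity]
```

Rejecting unknown severity labels forces every incident through the Assessment step before a response is chosen.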