Risk Management - Evaluator.gr

📋 Purpose of this Document

This document records the known risks associated with the operation of the Evaluator.gr AI system, the mitigation strategies that have been implemented, as well as the monitoring and incident response protocols. It is prepared in accordance with the principles of Article 9 of the EU AI Act as a voluntary best practice, given that Evaluator.gr is classified as a Limited Risk system (Article 52).

Assessment Legend

Likelihood High Medium Low

Severity High Medium Low

Status Implemented Partial Planned

📊 Risk Register

R01

Position Bias

The model assigns different weight to criteria depending on the order in which they appear in the prompt.

LikelihoodMedium

SeverityMedium

StatusImplemented

Impact

Inaccurate scoring of criteria that appear later in the prompt, leading to an unrepresentative evaluation.

Mitigation

Consistent and fixed ordering of criteria in every evaluation. Triple evaluator system for Pitch Decks (academic, investor, communications) to balance perspectives.

R02

AI Hallucination

The model generates information not grounded in the submitted document.

LikelihoodMedium

SeverityHigh

StatusImplemented

Impact

User receives inaccurate feedback based on fabricated data, potentially leading to incorrect business decisions.

Mitigation

Explicit instructions in every prompt prohibiting fabrication of data. Requirement for each evaluation to be documented with reference to specific points in the document. If something is not mentioned, the system is guided to state this explicitly.

R03

Over-reliance

Users adopt recommendations without critical assessment, treating the system as an authority.

LikelihoodHigh

SeverityHigh

StatusPartial

Impact

Application of inappropriate recommendations without verification, potentially harming the user's business trajectory.

Mitigation

Clear advisory disclaimers on every output. Transparency pages (transparency.php, system-card.php). Planned: verification checklist in high-importance results.

R04

Heterogeneous Outcomes

The quality of feedback varies depending on the completeness of the submitted document.

LikelihoodHigh

SeverityMedium

StatusPartial

Impact

Users with incomplete documents receive less useful feedback, creating inequality in service value.

Mitigation

Document submission guidelines provided. Planned: input quality indicator before evaluation, with suggestions to complete missing information.

R05

Language Bias

Reduced output quality for documents in languages other than Greek.

LikelihoodLow

SeverityLow

StatusImplemented

Impact

Lower evaluation quality for non-Greek-speaking users or English-language documents.

Mitigation

The system has been optimised for both Greek and English with a dedicated business terminology glossary. Clear user communication regarding language optimisation.

R06

Output Inconsistency

The same document may receive slightly different scores across different evaluation runs.

LikelihoodMedium

SeverityMedium

StatusImplemented

Impact

Users who re-run an evaluation receive different results, reducing the system's reliability.

Mitigation

Temperature set to 0.3 (low) for maximum output consistency. Structured prompts with a strictly defined output format.

R07

Model Version Change

A change in the underlying Claude model may affect the quality and consistency of outputs.

LikelihoodLow

SeverityMedium

StatusImplemented

Impact

Non-comparable results across different time periods, making academic reproducibility difficult.

Mitigation

Recording of model version with every evaluation (audit trail). Versioning system that documents each model change with date and rationale.

R08

Industry Bias

The system may evaluate tech startups more favourably than traditional industries.

LikelihoodMedium

SeverityMedium

StatusPlanned

Impact

Unrepresentative evaluations for businesses outside the technology sector, potentially discouraging non-tech entrepreneurs.

Mitigation

Stage-adaptive evaluation that adjusts by industry. Planned: systematic bias testing by industry with documentation of findings.

R09

Data Privacy Risk

Sensitive business data submitted by users may be exposed.

LikelihoodLow

SeverityHigh

StatusImplemented

Impact

Exposure of confidential business information, GDPR violation, legal consequences.

Mitigation

Encrypted MySQL connections. Secure API calls. Data not accessible by third parties without consent. Sharing with investors only with explicit user consent.

R10

API Downtime

An Anthropic API outage affects the availability of the service.

LikelihoodLow

SeverityMedium

StatusImplemented

Impact

Inability to use the platform, loss of in-progress data.

Mitigation

Graceful error handling with user-friendly error messages. User progress saving. User notification of temporary unavailability.

R11

Search Result Quality

Real-time web search results used by the 8 evaluation modules may be inaccurate or outdated.

LikelihoodMedium

SeverityMedium

StatusPartial

Impact

Evaluation based on incorrect market data, potentially misleading for the user.

Mitigation

Clear indication in results that market data comes from web search. Planned: date/source indicator for search results.

👁️ Monitoring Protocols

📊

Continuous Monitoring

Real-time API error logging
Completion rate monitoring per module
Response time logging

📅

Periodic Review

Monthly error log review
Quarterly output quality assessment
Bi-annual review of this document

🔔

Incident Reporting

Users can report inaccurate evaluations
Feedback collection for prompt improvement
Documentation of significant deviations

🚨 Incident Response Protocol

Identification

Detection of incident via error logs, user reports or periodic review.

Assessment

Severity classification (low/medium/high) and estimation of the number of affected users.

Response

Immediate measures based on severity: module deactivation, user notification, temporary suspension.

Correction

Implementation of a permanent solution: prompt modification, methodology update, code improvement.

Documentation

Recording of incident, actions taken and update of this document.

🛡️ Risk Management

📋 Purpose of this Document

Assessment Legend

📊 Risk Register

Position Bias

Impact

Mitigation

AI Hallucination

Impact

Mitigation

Over-reliance

Impact

Mitigation

Heterogeneous Outcomes

Impact

Mitigation

Language Bias

Impact

Mitigation

Output Inconsistency

Impact

Mitigation

Model Version Change

Impact

Mitigation

Industry Bias

Impact

Mitigation

Data Privacy Risk

Impact

Mitigation

API Downtime

Impact

Mitigation

Search Result Quality

Impact

Mitigation

👁️ Monitoring Protocols

Continuous Monitoring

Periodic Review

Incident Reporting

🚨 Incident Response Protocol

Identification

Assessment

Response

Correction

Documentation