📋 Purpose of this Document

This document records the known risks associated with the operation of the Evaluator.gr AI system, the mitigation strategies that have been implemented, as well as the monitoring and incident response protocols. It is prepared in accordance with the principles of Article 9 of the EU AI Act as a voluntary best practice, given that Evaluator.gr is classified as a Limited Risk system (Article 52).

Assessment Legend

Likelihood High Medium Low
Severity High Medium Low
Status Implemented Partial Planned

📊 Risk Register

R01

Position Bias

The model assigns different weight to criteria depending on the order in which they appear in the prompt.

LikelihoodMedium
SeverityMedium
StatusImplemented

Impact

Inaccurate scoring of criteria that appear later in the prompt, leading to an unrepresentative evaluation.

Mitigation

Consistent and fixed ordering of criteria in every evaluation. Triple evaluator system for Pitch Decks (academic, investor, communications) to balance perspectives.

R02

AI Hallucination

The model generates information not grounded in the submitted document.

LikelihoodMedium
SeverityHigh
StatusImplemented

Impact

User receives inaccurate feedback based on fabricated data, potentially leading to incorrect business decisions.

Mitigation

Explicit instructions in every prompt prohibiting fabrication of data. Requirement for each evaluation to be documented with reference to specific points in the document. If something is not mentioned, the system is guided to state this explicitly.

R03

Over-reliance

Users adopt recommendations without critical assessment, treating the system as an authority.

LikelihoodHigh
SeverityHigh
StatusPartial

Impact

Application of inappropriate recommendations without verification, potentially harming the user's business trajectory.

Mitigation

Clear advisory disclaimers on every output. Transparency pages (transparency.php, system-card.php). Planned: verification checklist in high-importance results.

R04

Heterogeneous Outcomes

The quality of feedback varies depending on the completeness of the submitted document.

LikelihoodHigh
SeverityMedium
StatusPartial

Impact

Users with incomplete documents receive less useful feedback, creating inequality in service value.

Mitigation

Document submission guidelines provided. Planned: input quality indicator before evaluation, with suggestions to complete missing information.

R05

Language Bias

Reduced output quality for documents in languages other than Greek.

LikelihoodLow
SeverityLow
StatusImplemented

Impact

Lower evaluation quality for non-Greek-speaking users or English-language documents.

Mitigation

The system has been optimised for both Greek and English with a dedicated business terminology glossary. Clear user communication regarding language optimisation.

R06

Output Inconsistency

The same document may receive slightly different scores across different evaluation runs.

LikelihoodMedium
SeverityMedium
StatusImplemented

Impact

Users who re-run an evaluation receive different results, reducing the system's reliability.

Mitigation

Temperature set to 0.3 (low) for maximum output consistency. Structured prompts with a strictly defined output format.

R07

Model Version Change

A change in the underlying Claude model may affect the quality and consistency of outputs.

LikelihoodLow
SeverityMedium
StatusImplemented

Impact

Non-comparable results across different time periods, making academic reproducibility difficult.

Mitigation

Recording of model version with every evaluation (audit trail). Versioning system that documents each model change with date and rationale.

R08

Industry Bias

The system may evaluate tech startups more favourably than traditional industries.

LikelihoodMedium
SeverityMedium
StatusPlanned

Impact

Unrepresentative evaluations for businesses outside the technology sector, potentially discouraging non-tech entrepreneurs.

Mitigation

Stage-adaptive evaluation that adjusts by industry. Planned: systematic bias testing by industry with documentation of findings.

R09

Data Privacy Risk

Sensitive business data submitted by users may be exposed.

LikelihoodLow
SeverityHigh
StatusImplemented

Impact

Exposure of confidential business information, GDPR violation, legal consequences.

Mitigation

Encrypted MySQL connections. Secure API calls. Data not accessible by third parties without consent. Sharing with investors only with explicit user consent.

R10

API Downtime

An Anthropic API outage affects the availability of the service.

LikelihoodLow
SeverityMedium
StatusImplemented

Impact

Inability to use the platform, loss of in-progress data.

Mitigation

Graceful error handling with user-friendly error messages. User progress saving. User notification of temporary unavailability.

R11

Search Result Quality

Real-time web search results used by the 8 evaluation modules may be inaccurate or outdated.

LikelihoodMedium
SeverityMedium
StatusPartial

Impact

Evaluation based on incorrect market data, potentially misleading for the user.

Mitigation

Clear indication in results that market data comes from web search. Planned: date/source indicator for search results.

👁️ Monitoring Protocols

📊

Continuous Monitoring

  • Real-time API error logging
  • Completion rate monitoring per module
  • Response time logging
📅

Periodic Review

  • Monthly error log review
  • Quarterly output quality assessment
  • Bi-annual review of this document
🔔

Incident Reporting

  • Users can report inaccurate evaluations
  • Feedback collection for prompt improvement
  • Documentation of significant deviations

🚨 Incident Response Protocol

1

Identification

Detection of incident via error logs, user reports or periodic review.

2

Assessment

Severity classification (low/medium/high) and estimation of the number of affected users.

3

Response

Immediate measures based on severity: module deactivation, user notification, temporary suspension.

4

Correction

Implementation of a permanent solution: prompt modification, methodology update, code improvement.

5

Documentation

Recording of incident, actions taken and update of this document.