naikmubashir/Document-Scanner

Document Scanner

A document processing system designed to extract and analyze information from Greek government documents using OCR and NLP technologies.

Features

  • OCR processing with support for Greek and English text
  • Intelligent metadata extraction using NLP
  • Document type classification
  • REST API for document management
  • Authentication and authorization
  • MongoDB database integration

Prerequisites

  • Node.js 18+
  • MongoDB
  • Tesseract OCR with Greek and English language support

Getting Started

  1. Clone the repository
  2. Install dependencies:
    npm install
  3. Set up environment variables by copying .env.example to .env and configuring:
    PORT=3000
    MONGODB_URI=mongodb://localhost:27017/govdoc-scanner
    JWT_SECRET=your-secret-key
    API_KEY=your-external-api-key
    OCR_LANG=ell,eng
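
As an illustration of how these variables might be read and validated at startup, here is a minimal sketch. The `AppConfig` interface and the `loadConfig` and `requireVar` helpers are hypothetical names, not taken from this repository, and the fallback values for `PORT` and `OCR_LANG` are assumptions based on the example above.

```typescript
// Hypothetical config shape mirroring the variables in .env.example.
interface AppConfig {
  port: number;
  mongodbUri: string;
  jwtSecret: string;
  apiKey: string;
  ocrLangs: string[];
}

type Env = Record<string, string | undefined>;

// Returns the variable's value or throws if it is missing or blank.
function requireVar(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Builds a validated config object from an environment-like map.
function loadConfig(env: Env): AppConfig {
  const rawPort = env["PORT"];
  const port = Number(rawPort === undefined ? "3000" : rawPort); // assumed default
  if (!Number.isInteger(port) || port <= 0 || port > 65535) {
    throw new Error(`Invalid PORT: ${rawPort}`);
  }
  const rawLangs = env["OCR_LANG"];
  return {
    port,
    mongodbUri: requireVar(env, "MONGODB_URI"),
    jwtSecret: requireVar(env, "JWT_SECRET"),
    apiKey: requireVar(env, "API_KEY"),
    // Comma-separated Tesseract language codes: "ell" is Greek, "eng" is English.
    ocrLangs: (rawLangs === undefined ? "ell,eng" : rawLangs)
      .split(",")
      .map((lang) => lang.trim())
      .filter((lang) => lang.length > 0),
  };
}
```

In a Node.js process this would typically be called once as `loadConfig(process.env)`, so that a missing secret fails fast at startup rather than on the first request.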
    

Development

Start the development server:

npm run dev

Run tests:

npm test

API Endpoints

Documents

  • GET /api/documents - List all documents
  • GET /api/documents/:id - Get a specific document
  • POST /api/documents - Upload and process a new document
  • PATCH /api/documents/:id - Update document metadata
  • DELETE /api/documents/:id - Delete a document
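
The routes above could be wrapped in a small typed client helper. The sketch below only builds request descriptions and sends nothing. `DocumentAction`, `RequestSpec`, and `buildDocumentRequest` are hypothetical names, and passing the JWT as an `Authorization: Bearer` header is an assumption, since the README does not say how the token is transmitted. The `POST` upload route is left out because the expected upload format is not documented here.

```typescript
// Hypothetical description of the read/update/delete calls listed above.
type DocumentAction =
  | { kind: "list" }
  | { kind: "get"; id: string }
  | { kind: "update"; id: string; metadata: Record<string, unknown> }
  | { kind: "delete"; id: string };

interface RequestSpec {
  method: "GET" | "PATCH" | "DELETE";
  url: string;
  headers: Record<string, string>;
  body?: string;
}

// Maps an action to the HTTP method, URL, headers, and body for that route.
function buildDocumentRequest(
  baseUrl: string,
  token: string,
  action: DocumentAction
): RequestSpec {
  // Strip trailing slashes so the path is joined with exactly one "/".
  const root = `${baseUrl.replace(/\/+$/, "")}/api/documents`;
  // Assumption: the JWT travels in a Bearer authorization header.
  const headers: Record<string, string> = { Authorization: `Bearer ${token}` };

  switch (action.kind) {
    case "list":
      return { method: "GET", url: root, headers };
    case "get":
      return { method: "GET", url: `${root}/${encodeURIComponent(action.id)}`, headers };
    case "update":
      return {
        method: "PATCH",
        url: `${root}/${encodeURIComponent(action.id)}`,
        headers: { ...headers, "Content-Type": "application/json" },
        body: JSON.stringify(action.metadata),
      };
    case "delete":
      return { method: "DELETE", url: `${root}/${encodeURIComponent(action.id)}`, headers };
  }
}
```

A returned spec can be handed to `fetch(spec.url, spec)`, because `method`, `headers`, and `body` match the fields that `fetch` accepts in its options object.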

Document Processing

The system processes documents in the following steps:

  1. OCR Processing: Extracts text from document images using Tesseract OCR
  2. NLP Analysis: Analyzes the extracted text to identify:
    • Company names
    • Legal representatives
    • Board members
    • Important dates
    • Document type
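
To make the two steps concrete, here is a heavily simplified sketch of the flow. It is not the repository's NLP implementation: regular expressions and a keyword table stand in for the real analysis, only dates and document type are covered, and the OCR step is injected as a function so that no Tesseract binding is assumed. All names (`OcrFn`, `extractDates`, `classifyDocument`, `analyzeText`, `processDocument`) and the keyword list are hypothetical.

```typescript
interface ExtractedMetadata {
  dates: string[]; // ISO-style YYYY-MM-DD strings
  documentType: string | null;
}

// Step 1 is injected: any function that turns an image path into text.
type OcrFn = (imagePath: string) => Promise<string>;

function pad2(n: number): string {
  return n < 10 ? "0" + n : String(n);
}

// Finds dates written as DD/MM/YYYY, DD.MM.YYYY, or DD-MM-YYYY.
function extractDates(text: string): string[] {
  const pattern = /\b(\d{1,2})[\/.-](\d{1,2})[\/.-](\d{4})\b/g;
  const dates: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    const day = Number(match[1]);
    const month = Number(match[2]);
    // Coarse range check only; it does not reject dates such as 31/02.
    if (day >= 1 && day <= 31 && month >= 1 && month <= 12) {
      dates.push(`${match[3]}-${pad2(month)}-${pad2(day)}`);
    }
  }
  return dates;
}

// Hypothetical keyword table; a real classifier would be far richer.
const TYPE_KEYWORDS: Array<[string, string]> = [
  ["ΚΑΤΑΣΤΑΤΙΚΟ", "articles_of_association"],
  ["ΠΡΑΚΤΙΚΟ", "minutes"],
];

function classifyDocument(text: string): string | null {
  // Strip Greek accents, then uppercase, so "Καταστατικό" matches "ΚΑΤΑΣΤΑΤΙΚΟ".
  const normalized = text.normalize("NFD").replace(/[\u0300-\u036f]/g, "").toUpperCase();
  for (const [keyword, type] of TYPE_KEYWORDS) {
    if (normalized.indexOf(keyword) !== -1) return type;
  }
  return null;
}

// Step 2: analyze text that OCR has already extracted.
function analyzeText(text: string): ExtractedMetadata {
  return { dates: extractDates(text), documentType: classifyDocument(text) };
}

// Full pipeline: OCR first, then analysis of the recognized text.
async function processDocument(imagePath: string, ocr: OcrFn): Promise<ExtractedMetadata> {
  const text = await ocr(imagePath);
  return analyzeText(text);
}
```

Injecting the OCR function keeps the analysis step testable with plain strings, independent of how Tesseract is actually invoked.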

Security

  • JWT-based authentication
  • Role-based access control (User/Admin)
  • API key protection for external services
  • Request validation and sanitization
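
Role-based access control can be pictured as a lookup from role to permitted actions. The sketch below is illustrative only: the README names the User and Admin roles, but the `Action` list and the permission matrix are assumptions, and `isAllowed` and `authorize` are hypothetical helpers.

```typescript
type Role = "user" | "admin";
type Action = "read" | "upload" | "update" | "delete";

// Assumed permission matrix; the real rules live in the application code.
const PERMISSIONS: Record<Role, ReadonlyArray<Action>> = {
  user: ["read", "upload"],
  admin: ["read", "upload", "update", "delete"],
};

// True if the given role may perform the action.
function isAllowed(role: Role, action: Action): boolean {
  return PERMISSIONS[role].indexOf(action) !== -1;
}

// Throws when the role lacks permission, for use before a handler runs.
function authorize(role: Role, action: Action): void {
  if (!isAllowed(role, action)) {
    throw new Error(`Role "${role}" is not permitted to ${action} documents`);
  }
}
```

In a request pipeline, the role would come from the verified JWT payload, and `authorize` would run before the route handler touches the database.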

Error Handling

The system implements comprehensive error handling with:

  • Custom error classes
  • Standardized error responses
  • Detailed logging in development mode
  • Generic error messages in production
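
The pattern described above might look roughly like the following sketch. `AppError`, `NotFoundError`, `ErrorResponse`, and `toErrorResponse` are hypothetical names, and the response shape is an assumption, not the format this API actually returns.

```typescript
// Base class for errors the application raises deliberately.
class AppError extends Error {
  readonly statusCode: number;
  readonly code: string;

  constructor(statusCode: number, code: string, message: string) {
    super(message);
    this.name = "AppError";
    this.statusCode = statusCode;
    this.code = code;
    // Keeps instanceof working when compiled to older JavaScript targets.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

class NotFoundError extends AppError {
  constructor(resource: string) {
    super(404, "NOT_FOUND", `${resource} not found`);
    this.name = "NotFoundError";
  }
}

// Assumed standardized response envelope.
interface ErrorResponse {
  status: number;
  body: { error: { code: string; message: string; stack?: string } };
}

// Converts any thrown value into the standardized response.
function toErrorResponse(err: unknown, isProduction: boolean): ErrorResponse {
  if (err instanceof AppError) {
    // Known errors carry messages that are safe to show to clients.
    return { status: err.statusCode, body: { error: { code: err.code, message: err.message } } };
  }
  if (!isProduction && err instanceof Error) {
    // Development: expose details of unexpected errors to aid debugging.
    return {
      status: 500,
      body: { error: { code: "INTERNAL_ERROR", message: err.message, stack: err.stack } },
    };
  }
  // Production: never leak internals of unexpected errors.
  return { status: 500, body: { error: { code: "INTERNAL_ERROR", message: "Internal server error" } } };
}
```

Separating known `AppError` instances from unexpected errors is what allows production responses to stay generic while development responses remain detailed.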

License

MIT License

About

A Node.js and TypeScript-based application for scanning and processing documents efficiently.
