Document Scanner

A document processing system designed to extract and analyze information from Greek government documents using OCR and NLP technologies.

Features

OCR processing with support for Greek and English text
Intelligent metadata extraction using NLP
Document type classification
REST API for document management
Authentication and authorization
MongoDB database integration

Prerequisites

Node.js 18+
MongoDB
Tesseract OCR with Greek and English language support

Getting Started

Clone the repository
Install dependencies:
```
npm install
```

Set up environment variables by copying .env.example to .env and configuring:

```
PORT=3000
MONGODB_URI=mongodb://localhost:27017/govdoc-scanner
JWT_SECRET=your-secret-key
API_KEY=your-external-api-key
OCR_LANG=ell,eng
```
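
The variables above could be read and validated at startup so that a missing value fails fast. The following is a minimal sketch, not the project's actual loader: `AppConfig`, `requireVar`, and `loadConfig` are hypothetical names, and the defaults for `PORT` and `OCR_LANG` are assumptions taken from the example values.

```typescript
// Hypothetical config shape; fields mirror the variables in .env.example.
interface AppConfig {
  port: number;
  mongodbUri: string;
  jwtSecret: string;
  apiKey: string;
  ocrLangs: string[];
}

type Env = Record<string, string | undefined>;

// Read a required variable and fail with a clear message if it is missing or blank.
function requireVar(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

function loadConfig(env: Env): AppConfig {
  // Assumption: fall back to 3000 when PORT is not set.
  const port = Number(env["PORT"] ?? "3000");
  if (!Number.isInteger(port) || port <= 0 || port > 65535) {
    throw new Error(`PORT must be an integer between 1 and 65535, got: ${env["PORT"]}`);
  }
  return {
    port,
    mongodbUri: requireVar(env, "MONGODB_URI"),
    jwtSecret: requireVar(env, "JWT_SECRET"),
    apiKey: requireVar(env, "API_KEY"),
    // OCR_LANG is a comma-separated list of Tesseract language codes ("ell" = Greek, "eng" = English).
    ocrLangs: (env["OCR_LANG"] ?? "ell,eng")
      .split(",")
      .map((lang) => lang.trim())
      .filter((lang) => lang.length > 0),
  };
}
```

In a Node.js process this would typically be called once as `loadConfig(process.env)`.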

Development

Start the development server:

```
npm run dev
```

Run tests:

```
npm test
```

API Endpoints

Documents

GET /api/documents - List all documents
GET /api/documents/:id - Get a specific document
POST /api/documents - Upload and process a new document
PATCH /api/documents/:id - Update document metadata
DELETE /api/documents/:id - Delete a document
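
As a rough illustration of how a client might address these endpoints, the sketch below builds a request description for each route. `documentRequest` and `RequestSpec` are hypothetical helpers, and sending the JWT as an `Authorization: Bearer` header is an assumption; the README does not state how the token is transmitted.

```typescript
type HttpMethod = "GET" | "POST" | "PATCH" | "DELETE";

// Hypothetical description of a request; not a type from this project.
interface RequestSpec {
  method: HttpMethod;
  url: string;
  headers: Record<string, string>;
}

function documentRequest(
  baseUrl: string,
  token: string,
  method: HttpMethod,
  id?: string,
): RequestSpec {
  // PATCH and DELETE address a single document, so they need an id.
  if ((method === "PATCH" || method === "DELETE") && id === undefined) {
    throw new Error(`${method} requires a document id`);
  }
  // POST creates a new document on the collection route.
  if (method === "POST" && id !== undefined) {
    throw new Error("POST does not take a document id");
  }
  const base = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  const path =
    id === undefined ? "/api/documents" : `/api/documents/${encodeURIComponent(id)}`;
  return {
    method,
    url: base + path,
    // Assumption: the JWT is sent as a Bearer token.
    headers: { Authorization: `Bearer ${token}` },
  };
}
```

With the built-in `fetch` in Node.js 18+, a spec could be sent as `fetch(spec.url, { method: spec.method, headers: spec.headers })`. Request bodies for POST and PATCH are left out because the README does not describe them.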

Document Processing

The system processes documents in the following steps:

OCR Processing: Extracts text from document images using Tesseract OCR
NLP Analysis: Analyzes the extracted text to identify:
- Company names
- Legal representatives
- Board members
- Important dates
- Document type
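
The two steps above can be sketched as a small pipeline. This is an illustration under stated assumptions, not the project's implementation: the OCR step is passed in as a function so the sketch does not depend on any particular Tesseract binding, and a regular expression for numeric dates stands in for the NLP analysis. `ExtractedMetadata`, `OcrStep`, `extractDates`, and `processDocument` are hypothetical names.

```typescript
// Hypothetical result shape covering the fields listed above.
interface ExtractedMetadata {
  companyNames: string[];
  legalRepresentatives: string[];
  boardMembers: string[];
  dates: string[]; // ISO-style YYYY-MM-DD strings
  documentType: string | null;
}

// Step 1 is injected: any function that turns image bytes into text.
type OcrStep = (image: Uint8Array) => Promise<string>;

function pad2(n: number): string {
  return (n < 10 ? "0" : "") + String(n);
}

// Stand-in for the NLP step: finds dates written as DD/MM/YYYY, DD.MM.YYYY or DD-MM-YYYY.
function extractDates(text: string): string[] {
  const pattern = /\b(\d{1,2})[\/.-](\d{1,2})[\/.-](\d{4})\b/g;
  const found: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    const day = Number(match[1]);
    const month = Number(match[2]);
    // Skip values that cannot be a calendar day or month.
    if (day >= 1 && day <= 31 && month >= 1 && month <= 12) {
      found.push(`${match[3]}-${pad2(month)}-${pad2(day)}`);
    }
  }
  return found;
}

async function processDocument(image: Uint8Array, ocr: OcrStep): Promise<ExtractedMetadata> {
  const text = await ocr(image); // step 1: OCR
  return {
    // Step 2: only dates are extracted here; names and document type
    // would need real NLP and are left empty in this sketch.
    companyNames: [],
    legalRepresentatives: [],
    boardMembers: [],
    dates: extractDates(text),
    documentType: null,
  };
}
```

Injecting the OCR function also makes the pipeline easy to test with a stub that returns fixed text.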

Security

JWT-based authentication
Role-based access control (User/Admin)
API key protection for external services
Request validation and sanitization
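
The role check behind role-based access control could look like the sketch below. It assumes the two roles named above form a simple hierarchy in which Admin includes everything User can do; the README does not say which endpoints require which role, so no endpoint policy is shown. `AuthenticatedUser`, `hasRole`, and `assertRole` are hypothetical names.

```typescript
type Role = "user" | "admin";

// Hypothetical shape of the user decoded from a verified JWT.
interface AuthenticatedUser {
  id: string;
  role: Role;
}

// Assumption: roles are ordered, and a higher rank includes lower ones.
const roleRank: Record<Role, number> = { user: 0, admin: 1 };

function hasRole(user: AuthenticatedUser, required: Role): boolean {
  return roleRank[user.role] >= roleRank[required];
}

// Throws when the user lacks the required role; a route handler
// could call this before doing any work.
function assertRole(user: AuthenticatedUser, required: Role): void {
  if (!hasRole(user, required)) {
    throw new Error(`Forbidden: requires role "${required}"`);
  }
}
```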

Error Handling

Error handling is built from the following pieces:

Custom error classes
Standardized error responses
Detailed logging in development mode
Generic error messages in production
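
One way these pieces could fit together is sketched below: a base error class, one subclass, and a function that maps any thrown value to a standardized response, hiding details in production. The class names, the response fields (`status`, `code`, `message`, `stack`), and the `isProduction` flag are assumptions for illustration, not the project's actual types.

```typescript
// Hypothetical base class for errors that are safe to show to clients.
class AppError extends Error {
  readonly statusCode: number;
  readonly code: string;

  constructor(message: string, statusCode: number, code: string) {
    super(message);
    // Keep instanceof working even when compiled to older JavaScript targets.
    Object.setPrototypeOf(this, new.target.prototype);
    this.name = new.target.name;
    this.statusCode = statusCode;
    this.code = code;
  }
}

class NotFoundError extends AppError {
  constructor(resource: string) {
    super(`${resource} not found`, 404, "NOT_FOUND");
  }
}

// Hypothetical standardized response body.
interface ErrorBody {
  status: "error";
  code: string;
  message: string;
  stack?: string;
}

function toErrorResponse(
  err: unknown,
  isProduction: boolean,
): { statusCode: number; body: ErrorBody } {
  let statusCode = 500;
  let body: ErrorBody;
  if (err instanceof AppError) {
    // Known errors carry their own status code and a client-safe message.
    statusCode = err.statusCode;
    body = { status: "error", code: err.code, message: err.message };
  } else {
    // Unknown errors get a generic message in production.
    const detail = err instanceof Error ? err.message : String(err);
    body = {
      status: "error",
      code: "INTERNAL_ERROR",
      message: isProduction ? "Internal server error" : detail,
    };
  }
  // Stack traces are only included outside production.
  if (!isProduction && err instanceof Error && err.stack) {
    body.stack = err.stack;
  }
  return { statusCode, body };
}
```

A caller would usually derive `isProduction` from the environment, for example `process.env.NODE_ENV === "production"`.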

License

MIT License
