DeepDocGen

🌟 Welcome to the DeepDoc Generator World 🌟


This repository hosts all the code used to build our cutting-edge synthetic document pipeline. Dive into the world of synthetic document generation and object detection powered by deep learning!

📝 Paper Title

Generation and Detection of Objects in Documents by Deep Learning Neural Network Models (DeepDocGen)
📢 Our paper is coming soon! Stay tuned for the official release.

📖 About the Project

DeepDocGen is a pipeline designed to overcome the scarcity of annotated datasets in document analysis. By generating synthetic documents, DeepDocGen enables researchers and developers to train and fine-tune object detection models, with the goal of improving their accuracy and robustness.

🛠️ Technologies Used

DeepDocGen leverages the powerful Mesh-Candidate Bestfit, an automated pipeline designed to synthesize large-scale, high-quality, and visually coherent document layout detection datasets.
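To give a feel for what a best-fit layout step does, here is a minimal Python sketch of greedy best-area-fit placement with guillotine splitting. It is a simplified stand-in, not the Mesh-Candidate Bestfit algorithm itself: the `Rect` and `Candidate` types, the `best_fit_layout` function, pixel units, and reusing candidates from the pool any number of times are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x: int
    y: int
    w: int
    h: int

    @property
    def area(self) -> int:
        return self.w * self.h


@dataclass(frozen=True)
class Candidate:
    # Hypothetical element template: a layout label plus a fixed size in pixels.
    label: str
    w: int
    h: int


def best_fit_layout(page_w: int, page_h: int, pool: list[Candidate],
                    max_elements: int = 50) -> list[tuple[str, Rect]]:
    """Greedy best-area-fit placement (simplified illustration only).

    At every step, all (free region, candidate) pairs are scored and the pair
    whose candidate fills the largest fraction of the free region is placed.
    Candidates may be reused any number of times.
    """
    free = [Rect(0, 0, page_w, page_h)]  # disjoint free regions of the page
    placed: list[tuple[str, Rect]] = []
    while len(placed) < max_elements:
        best = None  # (fill_ratio, free_index, candidate)
        for i, region in enumerate(free):
            for cand in pool:
                if cand.w <= region.w and cand.h <= region.h:
                    ratio = (cand.w * cand.h) / region.area
                    if best is None or ratio > best[0]:
                        best = (ratio, i, cand)
        if best is None:
            break  # no candidate fits in any free region
        _, i, cand = best
        region = free.pop(i)
        # Place the element in the top-left corner of the chosen region.
        placed.append((cand.label, Rect(region.x, region.y, cand.w, cand.h)))
        # Guillotine split: the strip to the right of the element and the strip
        # below it; together with the element they exactly cover the region.
        right = Rect(region.x + cand.w, region.y, region.w - cand.w, cand.h)
        below = Rect(region.x, region.y + cand.h, region.w, region.h - cand.h)
        free.extend(r for r in (right, below) if r.w > 0 and r.h > 0)
    return placed


# Example usage with a hypothetical candidate pool on a 600x800 page.
pool = [Candidate("title", 600, 60), Candidate("text", 290, 200),
        Candidate("table", 600, 250), Candidate("figure", 290, 290)]
for label, box in best_fit_layout(600, 800, pool):
    print(label, box)
```

Because the used region is always split into a right strip and a bottom strip, the free regions stay disjoint, so placed boxes can never overlap; the fill-ratio score favors elements that fit snugly into the remaining space.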

In addition, DeepDocGen integrates the Content Generator, which fills each bounding box with content of the matching type, such as titles, text, tables, lists, and more.
Because every element is rendered into a known box, the exact position and type of each element in the generated layout is known in advance, which yields precise annotations.
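The point that placement itself provides the annotation can be sketched in a few lines. This minimal Python example assumes the annotations are exported in COCO format, which this section does not state; the `fill_and_annotate` function, the `Box` type, the `CATEGORIES` list, and the file name are hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    x: int
    y: int
    w: int
    h: int


# Hypothetical label set; DeepDocGen's actual categories may differ.
CATEGORIES = ["title", "text", "table", "list", "figure"]


def fill_and_annotate(image_id: int, file_name: str, page_w: int, page_h: int,
                      layout: list[tuple[str, Box]]) -> dict:
    """Build a COCO-style record for one synthetic page.

    `layout` is a list of (label, Box) pairs. Since each piece of content is
    rendered inside its box, the box itself serves as the ground-truth label.
    Annotation ids are only unique within this single page.
    """
    cat_ids = {name: i + 1 for i, name in enumerate(CATEGORIES)}
    annotations = []
    for ann_id, (label, box) in enumerate(layout, start=1):
        # Rendering of the actual content (text, table, ...) would happen here.
        annotations.append({
            "id": ann_id,
            "image_id": image_id,
            "category_id": cat_ids[label],
            "bbox": [box.x, box.y, box.w, box.h],  # COCO order: x, y, width, height
            "area": box.w * box.h,
            "iscrowd": 0,
        })
    return {
        "images": [{"id": image_id, "file_name": file_name,
                    "width": page_w, "height": page_h}],
        "annotations": annotations,
        "categories": [{"id": i, "name": n} for n, i in cat_ids.items()],
    }


record = fill_and_annotate(1, "page_0001.png", 600, 800,
                           [("title", Box(0, 0, 600, 60)),
                            ("table", Box(0, 80, 600, 250))])
print(json.dumps(record, indent=2))
```

No detector or human labeler is involved: the label and coordinates are copied straight from the layout that drove the rendering.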

Depending on the label associated with each bounding box, the Content Generator applies specific formatting rules:

These strategies ensure that the synthetic documents generated by DeepDocGen maintain a realistic and structured layout, enhancing the quality of datasets for document layout detection tasks.
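The specific formatting rules are not listed in this section. As a purely illustrative sketch of label-based dispatch, the Python example below assumes hypothetical rules and measures each box in character columns and text rows; `format_content`, `FORMATTERS`, and every rule in it are invented for the example and are not DeepDocGen's actual rules.

```python
import textwrap
from typing import Callable

# Hypothetical rules: a box is measured in character columns (cols) and text rows (rows).


def format_title(content: str, cols: int, rows: int) -> list[str]:
    # Titles: a single line, cut to the box width.
    return [content.strip()[:cols]]


def format_text(content: str, cols: int, rows: int) -> list[str]:
    # Paragraphs: word-wrap to the box width, drop lines beyond the box height.
    return textwrap.wrap(content, width=cols)[:rows]


def format_list(content: str, cols: int, rows: int) -> list[str]:
    # Lists: one bullet per input line; wrapped lines are indented under the text.
    lines: list[str] = []
    for item in content.splitlines():
        lines.extend(textwrap.wrap(item, width=cols,
                                   initial_indent="- ", subsequent_indent="  "))
    return lines[:rows]


FORMATTERS: dict[str, Callable[[str, int, int], list[str]]] = {
    "title": format_title,
    "text": format_text,
    "list": format_list,
}


def format_content(label: str, content: str, cols: int, rows: int) -> list[str]:
    """Pick the formatting rule for a box based on its label."""
    try:
        formatter = FORMATTERS[label]
    except KeyError:
        raise ValueError(f"no formatting rule for label {label!r}") from None
    return formatter(content, cols, rows)


print(format_content("list", "alpha beta\ngamma", 20, 5))
```

Keeping the rules in a dictionary keyed by label makes it easy to add a formatter for another label, such as tables, without touching the dispatch logic, and an unknown label fails loudly instead of silently producing unformatted content.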

🎨 Visual Overview

🛠️ DeepDocGen Workflow

[Figure: DeepDocGen workflow]

📊🎥 Sample Generated Documents

[Figure: DeepDocGen demo]


💡 How to Get Started

  1. Clone this repository:
    git clone https://github.com/Geo99pro/DeepDocGen.git