Repository for the Search and Displace ingest module, which takes ODF, DOCX and PDF files and converts them to Markdown (.md) for use in search and displace operations.
Orzu Ionut f42d01b3a4 Improve README. Cleanup 2 years ago


Search and Displace Ingest

🌀 Server Requirements:

Built with:

  • Laravel Framework ^6.2

🚀 Installation

Ubuntu Packages

# LibreOffice
apt-get install python-software-properties
apt-add-repository ppa:libreoffice/ppa
apt-get update
apt-get install libreoffice

# Python
apt-get update
apt-get install software-properties-common
add-apt-repository ppa:deadsnakes/ppa
apt-get install supervisor python3.8 python3.8-dev

# Redis
apt-get install redis-server

# PDF Converter
apt-get install libpoppler-cpp-dev
apt-get install poppler-utils

# Tesseract OCR
add-apt-repository ppa:alex-p/tesseract-ocr-devel
apt-get update
apt-get install tesseract-ocr

# Unpaper
apt-get install unpaper

# DOCX to PDF Converter
apt-get install unoconv
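
After installing the packages above, it can help to confirm that the expected binaries are on the PATH. The following is a minimal sketch, not part of the project: the helper name `missing_commands` is hypothetical, and the binary names (`soffice`, `supervisord`, and so on) are assumptions about what each package installs on a typical Ubuntu system.

```shell
#!/bin/sh
# missing_commands CMD...: print each command name that is not found on PATH.
# (Hypothetical helper, not shipped with this repository.)
missing_commands() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

# Assumed binary names for the packages listed in this section.
missing=$(missing_commands soffice python3.8 redis-server pdftotext \
    tesseract unpaper unoconv supervisord)

if [ -n "$missing" ]; then
    printf 'Missing commands:\n%s\n' "$missing"
else
    echo "All required commands found."
fi
```

If a binary is reported as missing even though its package is installed, the package may use a different command name on your Ubuntu release; adjust the list accordingly.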

Library Packages

# Pip
curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
python get-pip.py
rm -rf get-pip.py
pip install --upgrade pip

# Pdftotext
pip install pdftotext

# Supervisor
pip install supervisor
systemctl enable supervisor
mkdir /var/log/amqp
mkdir /var/log/queue

# Deskew
cd DESKEW_INSTALLATION_DIRECTORY
cd Bin
./deskew INPUT OUTPUT

# Dewarp
pip3 install opencv-python

cd DEWARP_INSTALLATION_DIRECTORY
pip3 install -r requirements.txt
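
To check that the Python libraries installed above can be imported, a small sketch such as the following can be used. The helper name `missing_modules` and the `PYTHON` variable are hypothetical; the module names `pdftotext` and `cv2` are the import names of the `pdftotext` and `opencv-python` packages. The interpreter defaults to `python3`, which is an assumption; set `PYTHON` to the interpreter you actually installed the packages for.

```shell
#!/bin/sh
# Interpreter to test against; override with e.g. PYTHON=python3.8 (assumption).
PYTHON=${PYTHON:-python3}

# missing_modules MODULE...: print each module that fails to import.
# (Hypothetical helper, not shipped with this repository.)
missing_modules() {
    for mod in "$@"; do
        "$PYTHON" -c "import $mod" >/dev/null 2>&1 || printf '%s\n' "$mod"
    done
}

missing=$(missing_modules pdftotext cv2)

if [ -n "$missing" ]; then
    printf 'Python modules that failed to import:\n%s\n' "$missing"
else
    echo "All Python modules import correctly."
fi
```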

Install app

# Generate environment file
cp .env.example .env

# Install backend packages
composer install

# Generate app key
php artisan key:generate

# Set QUEUE_CONNECTION to redis in .env, if it is not set already

# Deploy supervisor
php artisan queue:deploy-supervisor

supervisorctl start all
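
The QUEUE_CONNECTION step can also be scripted instead of edited by hand. The sketch below is one possible approach, not part of the project: `set_env_value` is a hypothetical helper that replaces an existing `KEY=` line or appends one if it is missing. It assumes a plain `KEY=value` file and values that contain no `|` or `&` characters.

```shell
#!/bin/sh
# set_env_value FILE KEY VALUE: replace KEY's line in FILE, or append it if absent.
# (Hypothetical helper; assumes VALUE contains no '|' or '&'.)
set_env_value() {
    file=$1
    key=$2
    value=$3
    if grep -q "^${key}=" "$file"; then
        # Write to a temporary file first; `sed -i` differs between sed variants.
        sed "s|^${key}=.*|${key}=${value}|" "$file" > "${file}.tmp" &&
            mv "${file}.tmp" "$file"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Only touch .env if it exists in the current directory.
if [ -f .env ]; then
    set_env_value .env QUEUE_CONNECTION redis
    grep '^QUEUE_CONNECTION=' .env
fi
```

Run it from the application root after `cp .env.example .env`, and before deploying the supervisor configuration.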

Search and Displace Core Setup

  • Install the Search and Displace Core app, found at https://git.law/newroco/searchanddisplace-core
  • Set the WEBHOOK_CORE_URL variable in .env to the URL of the Search and Displace Core app
  • Set WEBHOOK_CORE_SECRET in .env to the same value as WEBHOOK_CLIENT_SECRET in the Search and Displace Core app's .env file
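
Because the two secrets must match exactly, a quick check can catch a mismatch before any file is processed. The following is a rough sketch under stated assumptions: `env_value` and `secrets_match` are hypothetical helpers, `CORE_ENV` is a hypothetical variable pointing at the Core app's .env file, and both files are assumed to use plain, unquoted `KEY=value` lines.

```shell
#!/bin/sh
# env_value FILE KEY: print the value of KEY from a dotenv-style FILE (empty if unset).
env_value() {
    sed -n "s/^$2=//p" "$1" | tail -n 1
}

# secrets_match INGEST_ENV CORE_ENV: succeed only if WEBHOOK_CORE_SECRET is
# non-empty and equals WEBHOOK_CLIENT_SECRET in the Core app's file.
secrets_match() {
    [ -n "$(env_value "$1" WEBHOOK_CORE_SECRET)" ] &&
        [ "$(env_value "$1" WEBHOOK_CORE_SECRET)" = "$(env_value "$2" WEBHOOK_CLIENT_SECRET)" ]
}

# CORE_ENV is a hypothetical path; point it at the Core app's .env file.
CORE_ENV=${CORE_ENV:-../searchanddisplace-core/.env}

if [ -f .env ] && [ -f "$CORE_ENV" ]; then
    [ -n "$(env_value .env WEBHOOK_CORE_URL)" ] || echo "WEBHOOK_CORE_URL is not set in .env"
    if secrets_match .env "$CORE_ENV"; then
        echo "Webhook secrets match."
    else
        echo "Webhook secrets are missing or do not match."
    fi
fi
```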

PHP Packages