Search and Displace Ingest
🌀 Server Requirements:
- php7.4 [https://www.php.net] LICENSE
- apache [https://httpd.apache.org] LICENSE
- python 3.8 [https://www.python.org/] LICENSE
- composer [https://getcomposer.org/] LICENSE
⚡ Built with:
- Laravel Framework ^6.2
🚀 Installation
NOTE
The installation steps below were tested on an Ubuntu 20.04 LTS machine. All commands assume sudo is used unless specified otherwise, and should be adapted to each specific environment.
Disk size for this service should be at least 10GB.
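A quick way to check the disk requirement before installing is sketched below. It is only an illustration: the `meets_min_gb` helper is hypothetical, the script checks free space rather than total disk size, and it assumes the app will live under /var/www/html as in the steps that follow.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the available space (in KiB)
# is at least the given minimum (in GiB).
meets_min_gb() {
    avail_kb=$1
    min_gb=$2
    [ "$avail_kb" -ge $((min_gb * 1024 * 1024)) ]
}

# Assumption: the app is installed under /var/www/html.
# Fall back to / if that directory does not exist yet.
target=/var/www/html
[ -d "$target" ] || target=/

# -P keeps each filesystem on one line; column 4 is the available space in KiB.
avail_kb=$(df -Pk "$target" | awk 'NR==2 {print $4}')

if meets_min_gb "$avail_kb" 10; then
    echo "OK: at least 10GB available on $target"
else
    echo "WARNING: less than 10GB available on $target"
fi
```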
Update package repository
apt-get update -y
Install Apache2
apt-get -y install \
apache2 \
apache2-doc \
apache2-utils \
libapache2-mod-fcgid
Install PHP and the required extensions
apt-get -y install software-properties-common && \
add-apt-repository ppa:ondrej/php -y && \
apt-get update -y && \
apt-get -y install \
php7.4 \
php7.4-common \
php7.4-fpm \
php7.4-mbstring \
php7.4-sqlite3 \
php7.4-xml \
php7.4-zip
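After the packages are installed, it can help to confirm that the extensions are actually loaded. The sketch below filters the output of `php -m`; the `missing_php_modules` helper is hypothetical, and the module names (mbstring, sqlite3, xml, zip) are assumed from the package names above.

```shell
#!/bin/sh
# Hypothetical helper: reads a `php -m` listing on stdin and prints
# each expected module that is not in it.
# Assumption: module names match the php7.4-* package suffixes.
missing_php_modules() {
    modules=$(cat)
    for m in mbstring sqlite3 xml zip; do
        # -i: ignore case, -x: match the whole line only
        printf '%s\n' "$modules" | grep -qix "$m" || echo "$m"
    done
}

# Usage (requires php on PATH):
#   php -m | missing_php_modules
```

An empty result means all four modules were found in the listing.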
Configure Apache2 and PHP
a2enmod \
rewrite \
actions \
fcgid \
alias \
proxy_fcgi \
remoteip && \
sed -i "s/DocumentRoot \/var\/www\/html/DocumentRoot \/var\/www\/html\/searchanddisplace-ingest\/public/g" /etc/apache2/sites-available/000-default.conf && \
sed -i "/^[[:blank:]]ErrorLog/i\ <FilesMatch \.php\$>" /etc/apache2/sites-available/000-default.conf && \
sed -i "/^[[:blank:]]ErrorLog/i\ SetHandler \"proxy:unix:\/var\/run\/php\/php7.4-fpm.sock|fcgi:\/\/localhost\"" /etc/apache2/sites-available/000-default.conf && \
sed -i "/^[[:blank:]]ErrorLog/i\ <\/FilesMatch>" /etc/apache2/sites-available/000-default.conf && \
bash -c 'echo "RemoteIPHeader X-Forwarded-For" >> /etc/apache2/apache2.conf' && \
sed -i "s/LogFormat \"%v:%p %h/LogFormat \"%v:%p %a/g" /etc/apache2/apache2.conf && \
sed -i "s/LogFormat \"%h/LogFormat \"%a/g" /etc/apache2/apache2.conf && \
chown -R www-data /var/www/html && \
chmod -R 755 /var/www/html && \
sed -i "s/AllowOverride None/AllowOverride All/g" /etc/apache2/apache2.conf && \
systemctl restart apache2
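Because the sed edits above change the virtual host file in place, a quick check of the result can catch a missed substitution. The following is a minimal sketch, not part of the original setup: `check_vhost` is a hypothetical helper that only looks for the document root and the PHP-FPM handler line used in this guide.

```shell
#!/bin/sh
# Hypothetical check: succeeds when the vhost file contains both the
# document root and the PHP-FPM handler configured in this guide.
check_vhost() {
    conf=$1
    # -F treats the patterns as fixed strings, so "." and "|" are literal.
    grep -qF 'DocumentRoot /var/www/html/searchanddisplace-ingest/public' "$conf" &&
    grep -qF 'SetHandler "proxy:unix:/var/run/php/php7.4-fpm.sock|fcgi://localhost"' "$conf"
}

conf=/etc/apache2/sites-available/000-default.conf
if [ -f "$conf" ]; then
    if check_vhost "$conf"; then
        echo "vhost looks configured"
    else
        echo "vhost is missing expected directives"
    fi
fi
```

Running `apache2ctl configtest` afterwards will additionally report syntax errors in the Apache configuration.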
Install Composer
apt-get -y install composer
Ubuntu Packages
# LibreOffice
apt-get update -y && \
apt-add-repository -y ppa:libreoffice/ppa && \
apt-get update -y && \
apt-get install -y libreoffice
# Python
apt-get update -y && \
apt-get install -y software-properties-common && \
add-apt-repository -y ppa:deadsnakes/ppa && \
apt-get install -y \
build-essential \
libpoppler-cpp-dev \
pkg-config \
supervisor \
python3 \
python3-dev
# Redis
apt-get install -y redis-server
# PDF Converter
apt-get install -y \
libpoppler-cpp-dev \
poppler-utils
# Tesseract OCR
add-apt-repository -y ppa:alex-p/tesseract-ocr-devel && \
apt-get update -y && \
apt-get install -y tesseract-ocr
# Unpaper
apt-get install -y unpaper
# DOCX to PDF Converter
apt-get install -y unoconv
# Pandoc
apt-get install -y pandoc
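Once the packages above are installed, a short check can confirm that the command-line tools are on the PATH. This is a sketch only: `missing_commands` is a hypothetical helper, and the binary names passed to it are assumptions about what each package installs.

```shell
#!/bin/sh
# Hypothetical helper: prints each given command name that is not on PATH.
missing_commands() {
    for c in "$@"; do
        command -v "$c" >/dev/null 2>&1 || echo "$c"
    done
}

# Assumed binary names for the packages installed above.
missing_commands soffice redis-server pdftotext tesseract unpaper unoconv pandoc
```

No output means every listed tool was found.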
Library Packages
# Pip
curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py && \
python3 get-pip.py && \
rm -rf get-pip.py && \
pip3 install --upgrade pip
# Pdftotext
pip3 install pdftotext
# Supervisor
pip3 install supervisor && \
systemctl enable supervisor && \
mkdir -p /var/log/amqp && \
mkdir -p /var/log/queue
Queues Supervisor config
Config file path: /etc/supervisor/conf.d/queue-worker-search-and-displace-ingest-production.conf
[program:queue-worker-search-and-displace-ingest-production]
process_name=%(program_name)s_%(process_num)02d
command=php /var/www/html/searchanddisplace-ingest/artisan queue:listen --queue=sd_ingest,default --tries=2 --timeout=180
autostart=true
autorestart=true
user=www-data
numprocs=3
redirect_stderr=true
stdout_logfile=/var/log/queue/queue-worker-search-and-displace-ingest-production.log
The value for the command key should reflect the app path (in the example above the app's path is /var/www/html/searchanddisplace-ingest).
The stdout_logfile value is the path of the log file. All of its parent directories must already exist.
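If the app lives somewhere other than /var/www/html/searchanddisplace-ingest, the config can be generated from the two paths instead of edited by hand. The sketch below assumes exactly the settings shown above; `render_queue_conf` is a hypothetical helper, not something shipped with the app.

```shell
#!/bin/sh
# Hypothetical helper: prints the Supervisor program section for a given
# app path and log directory, using the same settings as the example above.
render_queue_conf() {
    app_path=$1
    log_dir=$2
    name=queue-worker-search-and-displace-ingest-production
    cat <<EOF
[program:$name]
process_name=%(program_name)s_%(process_num)02d
command=php $app_path/artisan queue:listen --queue=sd_ingest,default --tries=2 --timeout=180
autostart=true
autorestart=true
user=www-data
numprocs=3
redirect_stderr=true
stdout_logfile=$log_dir/$name.log
EOF
}

# Print the config for the default paths used in this guide.
render_queue_conf /var/www/html/searchanddisplace-ingest /var/log/queue
```

To install it, redirect the output to the config file path given above and make sure the log directory exists first (for example with `mkdir -p /var/log/queue`).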
Install app
- Download app
cd /var/www/html && \
git clone https://git.law/newroco/searchanddisplace-ingest.git && \
chown -R www-data:www-data searchanddisplace-ingest && \
cd searchanddisplace-ingest
- Install Dewarp
# Dewarp
cd /var/www/html/searchanddisplace-ingest/resources/python/dewarp && \
pip3 install opencv-python
- Install and configure app
# Generate environment file
cp .env.example .env
# Install backend packages
composer install
# Generate app key
php artisan key:generate
# In .env, set QUEUE_CONNECTION to redis if it is not already set
# Deploy supervisor
supervisorctl reread
supervisorctl update
supervisorctl start all
- Check the queue worker is running
supervisorctl status
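With numprocs=3, the status output lists three worker processes, and each should report RUNNING. For scripted checks, something like the following could be used. It is a sketch that assumes the usual `supervisorctl status` layout, where the second column is the process state; `all_running` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: reads `supervisorctl status` output on stdin and
# succeeds only if there is at least one process and all are RUNNING.
# Assumption: the state is the second whitespace-separated column.
all_running() {
    awk 'NF { total++; if ($2 == "RUNNING") ok++ }
         END { exit !(total > 0 && ok == total) }'
}

# Usage:
#   supervisorctl status | all_running && echo "all workers running"
```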
Search and Displace Core Setup
- Install the Search and Displace Core app, found here: https://git.law/newroco/searchanddisplace-core
- Get the URL of the Search and Displace Core app and add it to the WEBHOOK_CORE_URL variable in .env
- Add the WEBHOOK_CORE_SECRET value in .env; it must be the same value as WEBHOOK_CLIENT_SECRET in the Search and Displace Core app's .env file
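The .env values mentioned in this README (QUEUE_CONNECTION, WEBHOOK_CORE_URL, WEBHOOK_CORE_SECRET) can be edited by hand, or set from a script. The following is a minimal sketch: `set_env_var` is a hypothetical helper, it assumes simple values without spaces or quotes, and the URL in the usage comment is a placeholder.

```shell
#!/bin/sh
# Hypothetical helper: sets KEY=VALUE in an env file, replacing an
# existing assignment of KEY or appending one if none exists.
# Assumption: values contain no spaces or quotes.
set_env_var() {
    file=$1 key=$2 value=$3
    tmp=$(mktemp) || return 1
    # Copy every line except an existing assignment of KEY ...
    grep -v "^$key=" "$file" > "$tmp" || true
    # ... then add the new assignment at the end.
    printf '%s=%s\n' "$key" "$value" >> "$tmp"
    # Write back through the original file to keep its owner and permissions.
    cat "$tmp" > "$file" && rm -f "$tmp"
}

# Usage, run from the app directory (placeholder URL):
#   set_env_var .env QUEUE_CONNECTION redis
#   set_env_var .env WEBHOOK_CORE_URL https://core.example.org
```

The updated key moves to the end of the file, which does not affect how the variables are read.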