Search and Displace Ingest
🌀 Server Requirements:
- php7.4 [https://www.php.net] LICENSE
- apache [https://httpd.apache.org] LICENSE
- python 3.8 [https://www.python.org/] LICENSE
- composer [https://getcomposer.org/] LICENSE
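A quick way to see which of the requirements above are still missing is to look for their binaries on the PATH. The sketch below is a hypothetical helper, not part of the project; the binary names in the usage line (php, apache2, python3, composer) are assumptions based on the Ubuntu packages and may differ on other systems.

```shell
# missing_tools NAME...: print every command name that is not found on PATH.
# (Hypothetical helper for illustration; it is not shipped with the app.)
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Example (binary names are assumptions for Ubuntu):
#   missing_tools php apache2 python3 composer
```

The function prints nothing when every listed command is available, so an empty result suggests the base requirements are in place.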
⚡ Built with:
- Laravel Framework ^6.2
🚀 Installation
NOTE
The installation steps below were tested on an Ubuntu 20.04 LTS machine and should be adapted for each specific environment.
Update package repository
sudo apt-get update -y
Install Apache2
apt-get -y install \
apache2 \
apache2-doc \
apache2-utils \
libapache2-mod-fcgid
Install PHP and the required extensions
apt-get -y install software-properties-common && \
add-apt-repository ppa:ondrej/php -y && \
apt-get update -y && \
apt-get -y install \
php7.4 \
php7.4-common \
php7.4-fpm \
php7.4-mbstring \
php7.4-sqlite3 \
php7.4-xml \
php7.4-zip
Configure Apache2 and PHP
a2enmod \
rewrite \
actions \
fcgid \
alias \
proxy_fcgi \
remoteip && \
sed -i "/^[[:blank:]]ErrorLog/i\ <FilesMatch \.php\$>" /etc/apache2/sites-available/000-default.conf && \
sed -i "/^[[:blank:]]ErrorLog/i\ SetHandler \"proxy:unix:\/var\/run\/php\/php7.4-fpm.sock|fcgi:\/\/localhost\"" /etc/apache2/sites-available/000-default.conf && \
sed -i "/^[[:blank:]]ErrorLog/i\ </FilesMatch>" /etc/apache2/sites-available/000-default.conf && \
bash -c 'echo "RemoteIPHeader X-Forwarded-For" >> /etc/apache2/apache2.conf' && \
sed -i "s/LogFormat \"%v:%p %h/LogFormat \"%v:%p %a/g" /etc/apache2/apache2.conf && \
sed -i "s/LogFormat \"%h/LogFormat \"%a/g" /etc/apache2/apache2.conf && \
chown -R www-data /var/www/html && \
chmod -R 755 /var/www/html && \
sed -i "s/AllowOverride None/AllowOverride All/g" /etc/apache2/apache2.conf && \
systemctl restart apache2
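After the commands above have run, it can be worth confirming that the PHP-FPM handler actually reached the virtual host file. The following is a minimal sketch under the assumption that the sed commands wrote the SetHandler line exactly as given; has_fpm_handler is a hypothetical helper name.

```shell
# has_fpm_handler FILE: succeed if FILE contains the PHP-FPM SetHandler line
# that the sed commands are expected to insert. (Hypothetical helper.)
has_fpm_handler() {
  # -F treats the pattern as a fixed string, so "|" and "." are matched literally.
  grep -qF 'SetHandler "proxy:unix:/var/run/php/php7.4-fpm.sock|fcgi://localhost"' "$1"
}

# Example:
#   has_fpm_handler /etc/apache2/sites-available/000-default.conf \
#     && echo "PHP-FPM handler present"
```

Running apache2ctl configtest afterwards is a reasonable extra check that the edited configuration still parses.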
Install Composer
apt-get -y install composer
Ubuntu Packages
# LibreOffice
apt-get update -y && \
apt-add-repository -y ppa:libreoffice/ppa && \
apt-get update -y && \
apt-get install -y libreoffice
# Python
apt-get update -y && \
apt-get install -y software-properties-common && \
add-apt-repository -y ppa:deadsnakes/ppa && \
apt-get install -y \
build-essential \
libpoppler-cpp-dev \
pkg-config \
supervisor \
python3 \
python3-dev
# Redis
apt-get install -y redis-server
# PDF Converter
apt-get install -y \
libpoppler-cpp-dev \
poppler-utils
# Tesseract OCR
add-apt-repository -y ppa:alex-p/tesseract-ocr-devel && \
apt-get update -y && \
apt-get install -y tesseract-ocr
# Unpaper
apt-get install -y unpaper
# DOCX to PDF Converter
apt-get install -y unoconv
# Pandoc
apt-get install -y pandoc
Library Packages
# Pip
curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py && \
python3 get-pip.py && \
rm -rf get-pip.py && \
pip3 install --upgrade pip
# Pdftotext
pip3 install pdftotext
# Supervisor
pip3 install supervisor && \
systemctl enable supervisor && \
mkdir -p /var/log/amqp /var/log/queue
Queues Supervisor config
Config file path: /etc/supervisor/conf.d/queue-worker-search-and-displace-ingest-production.conf
[program:queue-worker-search-and-displace-ingest-production]
process_name=%(program_name)s_%(process_num)02d
command=php /var/www/html/searchanddisplace-ingest/artisan queue:listen --queue=sd_ingest,default --tries=2 --timeout=180
autostart=true
autorestart=true
user=www-data
numprocs=3
redirect_stderr=true
stdout_logfile=/var/log/queue/queue-worker-search-and-displace-ingest-production.log
The command value must point to the app's artisan file; in the example above the app is installed at /var/www/html/searchanddisplace-ingest.
The stdout_logfile value is the path of the workers' log file. All of its parent directories must already exist.
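Because only the app path and the log file change between environments, the config can be generated from those two values. This is a sketch, not the project's own tooling: render_worker_conf is a hypothetical helper, and every other key is copied unchanged from the example config.

```shell
# render_worker_conf APP_PATH LOG_FILE
# Print a Supervisor program section for the ingest queue worker.
# (Hypothetical helper; keys and values mirror the example config.)
render_worker_conf() {
  app_path=$1
  log_file=$2
  cat <<EOF
[program:queue-worker-search-and-displace-ingest-production]
process_name=%(program_name)s_%(process_num)02d
command=php ${app_path}/artisan queue:listen --queue=sd_ingest,default --tries=2 --timeout=180
autostart=true
autorestart=true
user=www-data
numprocs=3
redirect_stderr=true
stdout_logfile=${log_file}
EOF
}

# Example: create the log directory first, then write the config file.
#   log=/var/log/queue/queue-worker-search-and-displace-ingest-production.log
#   mkdir -p "$(dirname "$log")"
#   render_worker_conf /var/www/html/searchanddisplace-ingest "$log" \
#     > /etc/supervisor/conf.d/queue-worker-search-and-displace-ingest-production.conf
```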
Install app
# Generate environment file
cp .env.example .env
# Install backend packages
composer install
# Generate app key
php artisan key:generate
# In .env, set QUEUE_CONNECTION to redis, if it is not set already
# Deploy supervisor (load the new config, then start the workers)
supervisorctl reread && \
supervisorctl update && \
supervisorctl start all
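The QUEUE_CONNECTION change can also be scripted rather than edited by hand. The sketch below assumes a plain KEY=VALUE .env file; set_env_var is a hypothetical helper that drops any existing line for the key and appends the new one, so the key moves to the end of the file.

```shell
# set_env_var FILE KEY VALUE
# Ensure FILE contains exactly one KEY=VALUE line. (Hypothetical helper.)
set_env_var() {
  file=$1
  key=$2
  value=$3
  tmp=$(mktemp)
  # Keep every line except existing KEY=... lines; grep exits non-zero
  # when nothing is left, which is not an error here.
  grep -v "^${key}=" "$file" > "$tmp" || true
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  # Write back with cat so the original file keeps its owner and mode.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Example, run from the app directory:
#   set_env_var .env QUEUE_CONNECTION redis
```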
Search and Displace Core Setup
- Install the Search and Displace Core app, found here https://git.law/newroco/searchanddisplace-core
- Get the URL of the Search and Displace Core app and add it to the WEBHOOK_CORE_URL variable in .env
- Add in .env the WEBHOOK_CORE_SECRET value, which needs to be the same value as the WEBHOOK_CLIENT_SECRET in the Search and Displace Core app's .env file
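Since the two secrets live in different .env files, a small comparison can catch a mismatch before the webhook is used. This is a minimal sketch assuming both files use plain KEY=VALUE lines with the values written identically (for example, both unquoted); env_value and secrets_match are hypothetical helpers.

```shell
# env_value FILE KEY: print the value of the last KEY=... line in FILE.
env_value() {
  sed -n "s/^$2=//p" "$1" | tail -n 1
}

# secrets_match INGEST_ENV CORE_ENV
# Succeed if WEBHOOK_CORE_SECRET in the ingest app's .env is non-empty and
# equals WEBHOOK_CLIENT_SECRET in the core app's .env. (Hypothetical helper.)
secrets_match() {
  ingest_secret=$(env_value "$1" WEBHOOK_CORE_SECRET)
  core_secret=$(env_value "$2" WEBHOOK_CLIENT_SECRET)
  [ -n "$ingest_secret" ] && [ "$ingest_secret" = "$core_secret" ]
}

# Example (the core app path is an assumption):
#   secrets_match /var/www/html/searchanddisplace-ingest/.env \
#                 /var/www/html/searchanddisplace-core/.env \
#     && echo "webhook secrets match"
```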