## Search and Displace Ingest

## :cyclone: Server Requirements:

- php7.4 [https://www.php.net] [LICENSE](https://www.php.net/license/index.php)
- apache [https://httpd.apache.org] [LICENSE](https://www.apache.org/licenses/LICENSE-2.0)
- python 3.8 [https://www.python.org/] [LICENSE](https://docs.python.org/3/license.html)
- composer [https://getcomposer.org/] [LICENSE](https://github.com/composer/composer/blob/main/LICENSE)

## :zap: Built with:

- Laravel Framework ^6.2

## :rocket: Installation

**NOTE** The installation steps below were tested on an Ubuntu 20.04 LTS machine. All commands assume `sudo` is used unless specified otherwise, and they should be adapted to each specific environment. The disk for this service should be at least 10GB.

---

### Update package repository

```
apt-get update -y
```

### Install Apache2

```
apt-get -y install \
    apache2 \
    apache2-doc \
    apache2-utils \
    libapache2-mod-fcgid
```

### Install PHP and the required extensions

```
apt-get -y install software-properties-common && \
add-apt-repository ppa:ondrej/php -y && \
apt-get update -y && \
apt-get -y install \
    php7.4 \
    php7.4-common \
    php7.4-fpm \
    php7.4-mbstring \
    php7.4-sqlite3 \
    php7.4-xml \
    php7.4-zip
```

### Configure Apache2 and PHP

```
a2enmod \
    rewrite \
    actions \
    fcgid \
    alias \
    proxy_fcgi \
    remoteip && \
sed -i "s/DocumentRoot \/var\/www\/html/DocumentRoot \/var\/www\/html\/searchanddisplace-ingest\/public/g" /etc/apache2/sites-available/000-default.conf && \
sed -i "/^[[:blank:]]ErrorLog/i\ " /etc/apache2/sites-available/000-default.conf && \
sed -i "/^[[:blank:]]ErrorLog/i\ SetHandler \"proxy:unix:\/var\/run\/php\/php7.4-fpm.sock|fcgi:\/\/localhost\"" /etc/apache2/sites-available/000-default.conf && \
sed -i "/^[[:blank:]]ErrorLog/i\ " /etc/apache2/sites-available/000-default.conf && \
bash -c 'echo "RemoteIPHeader X-Forwarded-For" >> /etc/apache2/apache2.conf' && \
sed -i "s/LogFormat \"%v:%p %h/LogFormat \"%v:%p %a/g" /etc/apache2/apache2.conf && \
sed -i "s/LogFormat \"%h/LogFormat \"%a/g" /etc/apache2/apache2.conf && \
chown -R www-data /var/www/html && \
chmod -R 755 /var/www/html && \
sed -i "s/AllowOverride None/AllowOverride All/g" /etc/apache2/apache2.conf && \
systemctl restart apache2
```

### Install Composer

`apt-get -y install composer`

### Ubuntu Packages

```
# LibreOffice
apt-get update -y && \
apt-add-repository -y ppa:libreoffice/ppa && \
apt-get update -y && \
apt-get install -y libreoffice
```

```
# Python
apt-get update -y && \
apt-get install -y software-properties-common && \
add-apt-repository -y ppa:deadsnakes/ppa && \
apt-get install -y \
    build-essential \
    libpoppler-cpp-dev \
    pkg-config \
    supervisor \
    python3 \
    python3-dev
```

```
# Redis
apt-get install -y redis-server
```

```
# PDF Converter
apt-get install -y \
    libpoppler-cpp-dev \
    poppler-utils
```

```
# Tesseract OCR
add-apt-repository -y ppa:alex-p/tesseract-ocr-devel && \
apt-get update -y && \
apt-get install -y tesseract-ocr
```

```
# Unpaper
apt-get install -y unpaper
```

```
# DOCX to PDF Converter
apt-get install -y unoconv
```

```
# Pandoc
apt-get install -y pandoc
```

### Library Packages

```
# Pip
curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py && \
python3 get-pip.py && \
rm -rf get-pip.py && \
pip3 install --upgrade pip
```

```
# Pdftotext
pip3 install pdftotext
```

```
# Supervisor
pip3 install supervisor && \
systemctl enable supervisor && \
mkdir /var/log/amqp && \
mkdir /var/log/queue
```

### Queue Supervisor config

Config file path: **/etc/supervisor/conf.d/queue-worker-search-and-displace-ingest-production.conf**

```ini
[program:queue-worker-search-and-displace-ingest-production]
process_name=%(program_name)s_%(process_num)02d
command=php /var/www/html/searchanddisplace-ingest/artisan queue:listen --queue=sd_ingest,default --tries=2 --timeout=180
autostart=true
autorestart=true
user=www-data
numprocs=3
redirect_stderr=true
stdout_logfile=/var/log/queue/queue-worker-search-and-displace-ingest-production.log
```

The value
for the **command** key should reflect the app path (in the example above, the app's path is **/var/www/html/searchanddisplace-ingest**). The **stdout_logfile** value is the path of the log file; its parent directory (**/var/log/queue** in the example, created in the Supervisor step above) must already exist.

### Install app

- Download app

```
cd /var/www/html && \
git clone https://git.law/newroco/searchanddisplace-ingest.git && \
chown -R www-data:www-data searchanddisplace-ingest && \
cd searchanddisplace-ingest
```

- Install Dewarp

```
# Dewarp
cd /var/www/html/searchanddisplace-ingest/resources/python/dewarp && \
pip3 install opencv-python
```

- Install and configure app

```bash
# Run the following from the app root (the Dewarp step above changes directory)
cd /var/www/html/searchanddisplace-ingest

# Generate environment file
cp .env.example .env

# Install backend packages
composer install

# Generate app key
php artisan key:generate

# Set QUEUE_CONNECTION to redis in .env, if it is not set already
# (this only rewrites an existing QUEUE_CONNECTION line)
sed -i "s/^QUEUE_CONNECTION=.*/QUEUE_CONNECTION=redis/" .env

# Deploy supervisor
supervisorctl reread
supervisorctl update
supervisorctl start all
```

- Check that the queue workers are running

```
supervisorctl status
```

### Search and Displace Core Setup

- Install the `Search and Displace Core` app, found at https://git.law/newroco/searchanddisplace-core
- Set `WEBHOOK_CORE_URL` in `.env` to the URL of the `Search and Displace Core` app
- Set `WEBHOOK_CORE_SECRET` in `.env` to the same value as `WEBHOOK_CLIENT_SECRET` in the `Search and Displace Core` app's `.env` file

## PHP Packages

- cebe/markdown [LICENSE](https://github.com/cebe/markdown/blob/master/LICENSE)
- fideloper/proxy [LICENSE](https://github.com/fideloper/TrustedProxy/blob/master/LICENSE.md)
- laravel/framework [LICENSE](https://github.com/laravel/framework/blob/7.x/LICENSE.md)
- laravel/tinker [LICENSE](https://github.com/laravel/tinker/blob/2.x/LICENSE.md)
- league/html-to-markdown [LICENSE](https://github.com/thephpleague/html-to-markdown/blob/master/LICENSE)
- phpoffice/phpword [LICENSE](https://github.com/PHPOffice/PHPWord/blob/0.17.0/LICENSE)
- predis/predis [LICENSE](https://github.com/php-enqueue/amqp-bunny/blob/master/LICENSE)
- spatie/laravel-webhook-server [LICENSE](https://github.com/spatie/laravel-webhook-server/blob/master/LICENSE.md)
- spatie/pdf-to-text [LICENSE](https://github.com/spatie/pdf-to-text/blob/main/LICENSE.md)
- thiagoalessio/tesseract_ocr [LICENSE](https://github.com/thiagoalessio/tesseract-ocr-for-php/blob/main/MIT-LICENSE)
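Once the installation steps are complete, a quick post-install check can catch a missing binary or `.env` value before the queue workers start failing. The script below is a minimal sketch and is not shipped with the app. The command names it looks for (for example `libreoffice` from the LibreOffice package and `pdftotext` from `poppler-utils`) and the `APP_DIR` default are assumptions based on the steps in this README, and the functions `check_commands`, `check_env_keys` and `check_queue_connection` are hypothetical helpers.

```bash
#!/usr/bin/env bash
# Post-install sanity check for Search and Displace Ingest (a sketch, not part of the app).
# Assumption: the app lives in APP_DIR; override it with APP_DIR=/other/path bash check-install.sh
APP_DIR="${APP_DIR:-/var/www/html/searchanddisplace-ingest}"

# Print each command that is not on PATH; return 1 if any is missing.
check_commands() {
    local missing=0 cmd
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing command: $cmd"
            missing=1
        fi
    done
    return "$missing"
}

# Print each KEY that is absent or empty in ENV_FILE; return 1 if any is.
# Usage: check_env_keys ENV_FILE KEY...
check_env_keys() {
    local env_file="$1" missing=0 key
    shift
    if [ ! -f "$env_file" ]; then
        echo "missing file: $env_file"
        return 1
    fi
    for key in "$@"; do
        # A key counts as set only if a non-space character follows "KEY=".
        if ! grep -Eq "^${key}=[^[:space:]]" "$env_file"; then
            echo "missing or empty key: $key"
            missing=1
        fi
    done
    return "$missing"
}

# The supervisor queue workers described above expect the redis queue driver.
check_queue_connection() {
    [ -f "$1" ] && grep -Eq '^QUEUE_CONNECTION=redis[[:space:]]*$' "$1"
}

status=0
check_commands php composer python3 pip3 redis-cli supervisorctl \
    libreoffice tesseract unpaper unoconv pandoc pdftotext || status=1
check_env_keys "$APP_DIR/.env" APP_KEY QUEUE_CONNECTION \
    WEBHOOK_CORE_URL WEBHOOK_CORE_SECRET || status=1
if ! check_queue_connection "$APP_DIR/.env"; then
    echo "QUEUE_CONNECTION is not set to redis"
    status=1
fi

if [ "$status" -eq 0 ]; then
    echo "all checks passed"
else
    echo "some checks failed"
fi
```

Saved under a name of your choice (for example the hypothetical `check-install.sh`), it can be run with `bash check-install.sh` after the Supervisor step. It only checks that the commands and keys are present; it does not verify versions (such as PHP 7.4 or Python 3.8) or that the Core app is reachable at `WEBHOOK_CORE_URL`.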