# Search and Displace Core

---

**NOTE**

The installation steps below were tested on an Ubuntu 20.04 LTS machine. All commands assume they are run with sudo unless specified otherwise, and should be adapted to each specific environment.

Disk size for this service should be at least 15GB.

---

## Install

### Update package repository

```
apt-get update -y
```

### Install Apache2

```
apt-get -y install \
    apache2 \
    apache2-doc \
    apache2-utils \
    libapache2-mod-fcgid
```

### Install PHP and the required extensions

```
apt-get -y install software-properties-common && \
add-apt-repository ppa:ondrej/php -y && \
apt-get update -y && \
apt-get -y install \
    php7.4 \
    php7.4-calendar \
    php7.4-common \
    php7.4-fileinfo \
    php7.4-ftp \
    php7.4-fpm \
    php7.4-gettext \
    php7.4-iconv \
    php7.4-json \
    php7.4-mbstring \
    php7.4-opcache \
    php7.4-pdo \
    php7.4-phar \
    php7.4-posix \
    php7.4-readline \
    php7.4-sockets \
    php7.4-sqlite3 \
    php7.4-tokenizer \
    php7.4-xml
```

### Configure Apache2 and PHP

```
a2enmod \
    rewrite \
    actions \
    fcgid \
    alias \
    proxy_fcgi \
    remoteip && \
sed -i "s/DocumentRoot \/var\/www\/html/DocumentRoot \/var\/www\/html\/searchanddisplace-core\/public/g" /etc/apache2/sites-available/000-default.conf && \
sed -i "/^[[:blank:]]ErrorLog/i\ " /etc/apache2/sites-available/000-default.conf && \
sed -i "/^[[:blank:]]ErrorLog/i\ SetHandler \"proxy:unix:\/var\/run\/php\/php7.4-fpm.sock|fcgi:\/\/localhost\"" /etc/apache2/sites-available/000-default.conf && \
sed -i "/^[[:blank:]]ErrorLog/i\ " /etc/apache2/sites-available/000-default.conf && \
bash -c 'echo "RemoteIPHeader X-Forwarded-For" >> /etc/apache2/apache2.conf' && \
sed -i "s/LogFormat \"%v:%p %h/LogFormat \"%v:%p %a/g" /etc/apache2/apache2.conf && \
sed -i "s/LogFormat \"%h/LogFormat \"%a/g" /etc/apache2/apache2.conf && \
chown -R www-data /var/www/html && \
chmod -R 755 /var/www/html && \
sed -i "s/AllowOverride None/AllowOverride All/g" /etc/apache2/apache2.conf && \
systemctl restart apache2
```

### Install Composer

```
apt-get -y install composer
```

### Install NodeJS 16 LTS and npm

```
curl -s https://deb.nodesource.com/setup_16.x | sudo bash
```

```
apt-get -y install \
    nodejs \
    yarn
```

### Install and Configure the app

- Download the app

```
cd /var/www/html && \
git clone https://git.law/newroco/searchanddisplace-core.git && \
chown -R www-data:www-data searchanddisplace-core && \
cd searchanddisplace-core
```

- Create the `.env` file by copying the contents of the `.env.example` file.

`cp .env.example .env`

- For the `QUEUE_CONNECTION` variable in `.env` you can use either `sync` or `redis` (recommended). If you choose `redis`, make sure it is installed on your machine.

```
apt-get -y install redis-server
```

- Install the `Search and Displace Ingest` app, found here: https://git.law/newroco/searchanddisplace-ingest

- Get the URL of the `Search and Displace Ingest` app and add it to the `SD_INGEST_URL` variable in `.env`

- Add the `WEBHOOK_CLIENT_SECRET` value to `.env`. It must be the same value as `WEBHOOK_CORE_SECRET` in the `.env` file of the `Search and Displace Ingest` app.

- Add the `SD_DUCKLING_URL` value to `.env`. The default is `http://0.0.0.0:8000/parse`. Details about installing Facebook Duckling are in a section below.
- Install composer dependencies

`composer install`

- Install npm dependencies

```
rm -rf node_modules && \
npm install
```

- Compile frontend assets

`npm run production`

- Generate the app key by running the following command:

`php artisan key:generate`

- Migrate DB tables

```
touch ./database/database.sqlite
chown www-data:www-data ./database/database.sqlite
php artisan migrate
```

### Queues Supervisor config

- Install supervisor

`apt-get install supervisor -y`

- Config file path: **/etc/supervisor/conf.d/queue-worker-search-and-displace-core-production.conf**

```ini
[program:queue-worker-search-and-displace-core-production]
process_name=%(program_name)s_%(process_num)02d
command=php /var/www/html/searchanddisplace-core/artisan queue:listen --queue=sd_core,default --tries=2 --timeout=180
autostart=true
autorestart=true
user=www-data
numprocs=3
redirect_stderr=true
stdout_logfile=/var/log/queue/queue-worker-search-and-displace-core-production.log
```

The value for the **command** key should reflect the app path (in the example above the app's path is **/var/www/html/searchanddisplace-core**).

The **stdout_logfile** value is the log file. All parent directories must already exist.
`mkdir -p /var/log/queue`

- Start Supervisor (after adding the Supervisor configs detailed below)

`supervisorctl start all`

- (Optional) Restart Supervisor after a config file update

```
supervisorctl reread
supervisorctl update
supervisorctl restart all
```

### Facebook Duckling

```
apt-get -y install \
    libpcre3 \
    libpcre3-dev \
    pkg-config && \
cd /var/www/html && \
git clone https://github.com/facebook/duckling.git fb-duckling && \
cd fb-duckling && \
curl -sSL https://get.haskellstack.org/ | sh && \
stack build && \
stack exec duckling-example-exe && \
stack test
```

### Facebook Duckling Supervisor config

Config file path: **/etc/supervisor/conf.d/duckling-worker-search-and-displace-core-production.conf**

```ini
[program:duckling-worker-search-and-displace-core-production]
process_name=%(program_name)s_%(process_num)02d
directory=/var/www/html/fb-duckling
command=sudo -S stack exec duckling-example-exe
autostart=true
autorestart=true
user=root
numprocs=1
redirect_stderr=true
stdout_logfile=/var/log/queue/duckling-worker-search-and-displace-core-production.log
```

The value for the **directory** key should reflect the Facebook Duckling app path (in the example above the path is **/var/www/html/fb-duckling**).

The **stdout_logfile** value is the log file. All parent directories must already exist.

### Start the queue worker and Facebook Duckling with Supervisor

```
supervisorctl reread
supervisorctl update
supervisorctl start all
```

- Check they are running

```
supervisorctl status
```

### Converting documents

```
# LibreOffice
apt-get install -y software-properties-common && \
apt-add-repository ppa:libreoffice/ppa && \
apt-get update && \
apt-get install -y libreoffice libreoffice-writer2xhtml
```

# Searchers

There are 2 types of searchers: basic and compounded.

## Basic searcher

There are 2 types of basic searchers: native and custom.

### Native basic searcher

Searchers of this type are included in the app by default and cannot be edited or deleted.
- Amount of Money
- Credit Card Number
- Distance
- Duration
- Email
- Numeral
- Ordinal
- Phone Numbers
- Quantity
- Temperature
- Time
- Url
- Volume

### Custom basic searcher

You can add a custom basic searcher by clicking the 'Add regex' button found in the navbar. This searcher is a regular expression.

Example: `\d{4}-\d{3}-\d{3}` finds every text string in the document that consists of 4 digits, a dash, 3 digits, a dash, and finally 3 digits; 1234-123-123 is a valid match.

## Compounded searcher

A compounded searcher contains one or more searchers, each of which can be either basic or compounded.

The searchers can be arranged in two ways: in rows and in columns. Each column in a row extends the search criteria, and each row filters the results of the previous row.

Take the following searcher as an example. The first row has 2 columns: the first column holds the 'Email' native basic searcher and the second column holds a custom basic searcher that finds text strings with a leading '#' character. The second row has a single column, holding a custom basic searcher that finds text strings containing the '@' character.

When the Search&Displace is executed, the first row is applied to the initial document content and finds all email addresses and all text strings with a leading '#' character. The searchers in the first row are applied independently, each column extending the search criteria. The second row is then applied to the results of the first row, that is, to the email addresses and the text strings with a leading '#' character. In this way each row filters the results of the previous row.

# Demo Version

A demo version is available here: https://demo.searchanddisplace.com/

No authentication is required.
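The row-and-column behaviour of compounded searchers, described in the Searchers section, can be approximated locally before trying the demo. The following is a minimal sketch using `grep`, not the app's implementation: the regular expressions and the function names (`row_one`, `row_two`, `compound_search`) are hypothetical, and it assumes that a row "filters" by keeping whole results from the previous row that contain a match.

```shell
# Illustrative patterns only; assumptions, not the patterns used by the app.
EMAIL_RE='[[:alnum:]._%+-]+@[[:alnum:].-]+\.[[:alpha:]]{2,}'  # stand-in for the 'Email' searcher
HASH_RE='#[^[:space:]]+'                                      # strings with a leading '#'
AT_RE='@'                                                     # strings containing '@'

# Row 1: two columns applied independently to the same input;
# their results are combined (each column extends the criteria).
row_one() (
    text=$(cat)
    printf '%s\n' "$text" | grep -oE "$EMAIL_RE"
    printf '%s\n' "$text" | grep -oE "$HASH_RE"
)

# Row 2: one column applied to the results of row 1;
# only results containing '@' are kept.
row_two() {
    grep -E "$AT_RE"
}

# Apply the rows in order, each one filtering the previous row's results.
compound_search() {
    row_one | row_two
}

printf '%s\n' 'Contact jane@example.com about #invoice42 or #ops@night today' | compound_search
```

In this sketch the first row finds `jane@example.com`, `#invoice42` and `#ops@night`; the second row then drops `#invoice42` because it contains no '@' character.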
# Demo Steps

- Select and upload a document file (supported files: .docx, .pdf, .odt, .txt)
- After the file is uploaded and processed you will see its contents on the page
- Select searchers by clicking the 'List' button on the right. For each searcher you can input a replace value. For example, if you select the 'Email' searcher and input the replace value 'EMAIL', then all email addresses found in the document will be replaced with the text EMAIL
- When you are done selecting searchers you can hide the panel by clicking the 'List' button again
- Execute the Search&Displace by clicking the 'Run filters' button
- After the processing is done you will see the resulting document in the right panel, side by side with the initial document
- You can highlight the found and replaced items by toggling the 'Highlight differences' button
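The replacement described in the steps above (the 'Email' searcher with the replace value 'EMAIL') can be approximated on plain text with `sed`. This is a minimal sketch under stated assumptions, not how the app works internally: the email pattern is deliberately simplified and the function name `displace_emails` is hypothetical.

```shell
# Simplified email pattern (an assumption for illustration only).
EMAIL_RE='[[:alnum:]._%+-]+@[[:alnum:].-]+\.[[:alpha:]]{2,}'

# Replace every match of EMAIL_RE read from stdin with the given replace value.
# The replace value is used verbatim in a sed expression, so it must not
# contain '/', '&' or backslashes in this sketch.
displace_emails() {
    replacement=$1
    sed -E "s/${EMAIL_RE}/${replacement}/g"
}

printf '%s\n' 'Write to jane@example.com or bob@example.org.' | displace_emails 'EMAIL'
# prints: Write to EMAIL or EMAIL.
```

The sketch only handles text input; the app itself also accepts .docx, .pdf and .odt files, which this example does not attempt to cover.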