Repository for the Search and Displace core module, including the interface for selecting files and choosing the search and displace operations to run on them. https://searchanddisplace.com
Alex Puiu ad998d37db Improve document content preview 2 years ago
app Improve document content preview 2 years ago
bootstrap Initial commit 3 years ago
config Initial commit 3 years ago
database Improve README. Cleanup 2 years ago
demo-cli Apply SD on original document from CLI command. Add performance analyzer for CLI ran operations. 3 years ago
public update package.json 2 years ago
resources Improve document content preview 2 years ago
routes Seperate index action from AJAX action, fix #30 2 years ago
storage Initial commit 3 years ago
tests Initial commit 3 years ago
.editorconfig Initial commit 3 years ago
.env.example update nix and env.example 2 years ago
.gitattributes Initial commit 3 years ago
.gitignore composer update 2 years ago
.styleci.yml Initial commit 3 years ago
README.md refactor. 2 years ago
artisan Initial commit 3 years ago
composer-env.nix Added nix composer equivalent nix files 2 years ago
composer.json update package.json 2 years ago
default.nix update nix and env.example 2 years ago
node-env.nix added node dependencies to nix files 2 years ago
node.nix added node dependencies to nix files 2 years ago
package-lock.json Improve document content preview 2 years ago
package.json Improve document content preview 2 years ago
php-packages.nix Added nix composer equivalent nix files 2 years ago
phpunit.xml Initial commit 3 years ago
registry.nix added node dependencies to nix files 2 years ago
sandd-core.nix composer update 2 years ago
server.php Initial commit 3 years ago
tsconfig.json Minor UI updates 3 years ago
webpack.mix.js Merge 3 years ago

README.md

Search and Displace Core


NOTE

The installation steps below were tested on an Ubuntu machine and should be adapted for each specific environment.


Install

  • Create the .env file by copying the contents from the .env.example file.

cp .env.example .env

  • Install the 'sqlite' driver for your PHP version if it is not already installed.

  • For the 'QUEUE_CONNECTION' variable in .env you can use either sync or redis (recommended). If you choose redis, make sure it is installed on your machine:

apt update

apt install redis-server

  • Install the Search and Displace Ingest app, available at https://git.law/newroco/searchanddisplace-ingest

  • Set the SD_INGEST_URL variable in .env to the URL of the Search and Displace Ingest app.

  • Set the WEBHOOK_CLIENT_SECRET variable in .env. It must have the same value as WEBHOOK_CORE_SECRET in the .env file of the Search and Displace Ingest app.

  • Set the SD_DUCKLING_URL variable in .env. The default value is http://0.0.0.0:8000/parse. Details about installing Facebook Duckling are in the 'Facebook Duckling' section.

  • Install Composer

  • Install the Composer dependencies

composer install

  • Install NodeJS and npm
  • Install npm dependencies

npm install

  • Compile frontend assets

npm run prod

  • Generate the app key by running the following command:

php artisan key:generate

  • Migrate DB tables

php artisan migrate

  • Start Supervisor (after adding the Supervisor configs detailed below)

supervisorctl start all

  • (Optional) Restart Supervisor after a config file update

supervisorctl reread

supervisorctl update

supervisorctl restart <name>
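
The .env values named in the install steps can be checked before starting the app. The following is a minimal sketch, not part of the app: the helper names (parse_env, missing_keys) are hypothetical, and it assumes the .env file only uses plain KEY=VALUE lines.

```python
from pathlib import Path

# Keys named in the install steps; the check itself is only an illustration.
REQUIRED_KEYS = [
    "QUEUE_CONNECTION",
    "SD_INGEST_URL",
    "WEBHOOK_CLIENT_SECRET",
    "SD_DUCKLING_URL",
]


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes, which .env files commonly allow.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env):
    """Return the required keys that are absent or empty, in a fixed order."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]


def check_env_file(path=".env"):
    """Read the given .env file and report which required keys still need a value."""
    return missing_keys(parse_env(Path(path).read_text()))
```

Calling check_env_file() from the app directory returns an empty list once all four variables have a value.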

Queues Supervisor config

Add a new Supervisor config file in /etc/supervisor/conf.d, as in the example below:

Config file path: /etc/supervisor/conf.d/queue-worker-search-and-displace-core-production.conf

[program:queue-worker-search-and-displace-core-production]
process_name=%(program_name)s_%(process_num)02d
command=php /var/www/html/searchanddisplace-core/artisan queue:listen --queue=sd_core,default --tries=2 --timeout=180
autostart=true
autorestart=true
user=www-data
numprocs=3
redirect_stderr=true
stdout_logfile=/var/log/queue/queue-worker-search-and-displace-core-production.log

The value of the 'command' key must reflect the app path (in the example above the app's path is "/var/www/html/searchanddisplace-core").

The 'stdout_logfile' value is the path of the log file. All of its parent directories (here /var/log/queue) must already exist.
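
Because the log file's parent directories must already exist, it can help to create them from the config file itself. The sketch below is an assumption about how one might do that, using Python's standard configparser. The function names are hypothetical, and writing under /var/log usually requires root.

```python
import configparser
from pathlib import Path


def log_directories(conf_text):
    """Return the parent directory of every stdout_logfile in a Supervisor config."""
    # interpolation=None keeps Supervisor's %(program_name)s placeholders literal
    # instead of letting configparser try to expand them.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    return [
        Path(parser[section]["stdout_logfile"]).parent
        for section in parser.sections()
        if "stdout_logfile" in parser[section]
    ]


def ensure_log_directories(conf_path):
    """Create any missing log directories named in the given config file."""
    for directory in log_directories(Path(conf_path).read_text()):
        directory.mkdir(parents=True, exist_ok=True)
```

For the config above, ensure_log_directories("/etc/supervisor/conf.d/queue-worker-search-and-displace-core-production.conf") would create /var/log/queue if it is missing.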

Facebook Duckling

  • $ apt-get install libpcre3-dev
  • Go to the directory in which you want to deploy the app (e.g. /var/www/html)
  • $ git clone https://github.com/facebook/duckling.git
  • $ cd duckling
  • $ curl -sSL https://get.haskellstack.org/ | sh
  • $ stack build
  • $ stack exec duckling-example-exe
  • $ stack test
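
Once duckling-example-exe is running, you can check that it answers at the default SD_DUCKLING_URL. The following sketch assumes the example server's form-encoded POST interface with the 'locale' and 'text' fields. The function names are hypothetical and the locale is only an example value.

```python
import json
import urllib.parse
import urllib.request

# Default SD_DUCKLING_URL from the install steps.
DUCKLING_URL = "http://0.0.0.0:8000/parse"


def build_parse_request(text, locale="en_GB"):
    """Build the form-encoded POST request for the Duckling example server."""
    body = urllib.parse.urlencode({"locale": locale, "text": text}).encode("utf-8")
    return urllib.request.Request(DUCKLING_URL, data=body, method="POST")


def parse(text, locale="en_GB"):
    """Send the text to a running Duckling server and return the decoded JSON reply."""
    request = build_parse_request(text, locale)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, parse("tomorrow at eight") should return a JSON list describing the entities Duckling found. If the connection is refused, the server is not listening at that address.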

Facebook Duckling Supervisor config

Add a new Supervisor config file in /etc/supervisor/conf.d, as in the example below:

Config file path: /etc/supervisor/conf.d/duckling-worker-search-and-displace-core-production.conf

[program:duckling-worker-search-and-displace-core-production]
process_name=%(program_name)s_%(process_num)02d
directory=/var/www/html/fb-duckling
command=sudo -S stack exec duckling-example-exe
autostart=true
autorestart=true
user=root
numprocs=1
redirect_stderr=true
stdout_logfile=/var/log/queue/duckling-worker-search-and-displace-core-production.log

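The value of the 'directory' key must reflect the Facebook Duckling app path (in the example above the path is "/var/www/html/fb-duckling"). Note that a plain git clone of the Duckling repository creates a directory named "duckling", so adjust the path to match your checkout.

The 'stdout_logfile' value is the path of the log file. All of its parent directories (here /var/log/queue) must already exist.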

Converting documents

# LibreOffice
apt-get install software-properties-common
apt-add-repository ppa:libreoffice/ppa
apt-get update
apt-get install libreoffice

Searchers

There are two types of searchers: basic and compounded.

Basic searcher

There are two types of basic searchers: native and custom.

Native basic searcher

Searchers of this type are included in the app by default and cannot be edited or deleted.

  • Amount of Money
  • Credit Card Number
  • Distance
  • Duration
  • Email
  • Numeral
  • Ordinal
  • Phone Numbers
  • Quantity
  • Temperature
  • Time
  • Url
  • Volume

Custom basic searcher

You can add a custom basic searcher by clicking the 'Add regex' button found in the navbar.

This searcher is a regular expression.

Example: [\d]{4}-[\d]{3}-[\d]{3} matches every text string in the document made of 4 digits, a dash, 3 digits, a dash, and finally 3 digits; for example, 1234-123-123 is a match.
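
The pattern can be tried outside the app. The sketch below writes the example with the \d digit class and runs it with Python's re module. The app may use a different regex engine, so this only illustrates what the pattern matches. The sample text is made up.

```python
import re

# The example pattern: 4 digits, a dash, 3 digits, a dash, 3 digits.
PATTERN = re.compile(r"[\d]{4}-[\d]{3}-[\d]{3}")

sample = "Ref 1234-123-123 is valid, 12-123-123 is not, and 9876-543-210 is valid too."

# findall returns every non-overlapping match, in document order.
matches = PATTERN.findall(sample)
print(matches)  # ['1234-123-123', '9876-543-210']
```

The pattern has no word boundaries, so it also matches inside longer digit runs. Wrap it in \b...\b if that is not wanted.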

Compounded searcher

A compounded searcher contains one or more searchers, each of which can be either basic or compounded.

The searchers are arranged in rows and columns. Each column in a row extends the searching criteria, and each row filters the results of the previous row.

As an example, consider a searcher with two rows. The first row has two columns: the first column holds the 'Email' native basic searcher and the second column holds a custom basic searcher that matches text strings with a leading '#' character. The second row has a single column, holding a custom basic searcher that matches text strings containing the '@' character.

When the Search&Displace is executed, the first row is applied to the initial document content. Its searchers run independently, so the row finds all email addresses and all text strings with a leading '#' character; each column extends the searching criteria. The second row is then applied to the results of the first row, that is, to the email addresses and the text strings with a leading '#' character. In short, each row filters the results of the previous row.
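
The row and column semantics can be sketched in a few lines. This is one reading of the description above, not the app's implementation. The three regular expressions are hypothetical stand-ins (the native 'Email' searcher is not necessarily a regex), and run_compound is an illustrative name.

```python
import re

# Hypothetical stand-ins for the searchers in the example.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
HASH_PREFIXED = re.compile(r"#\w+")
CONTAINS_AT = re.compile(r"\S*@\S*")


def run_compound(rows, text):
    """Apply rows in order: columns are unioned, each row searches the previous results."""
    candidates = [text]
    for row in rows:
        found = []
        for candidate in candidates:
            # Every column in the row searches the same input independently.
            for searcher in row:
                found.extend(match.group(0) for match in searcher.finditer(candidate))
        # The next row only sees what this row found.
        candidates = found
    return candidates


text = "Contact alice@example.com or bob@example.org about #invoice and #2024."
first_row = run_compound([[EMAIL, HASH_PREFIXED]], text)
both_rows = run_compound([[EMAIL, HASH_PREFIXED], [CONTAINS_AT]], text)
print(first_row)  # ['alice@example.com', 'bob@example.org', '#invoice', '#2024']
print(both_rows)  # ['alice@example.com', 'bob@example.org']
```

The first row widens the result set to emails plus '#' strings, and the second row narrows it to the items containing '@'.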

Demo Version

A demo is available at https://demo.searchanddisplace.com/. No authentication is required.

Demo Steps

  • Select and upload a document file (supported files: .docx, .pdf, .odt, .txt)
  • After the file is uploaded and processed you will see its contents on the page
  • Select searchers by clicking the 'List' button on the right. For each searcher you can enter a replace value; for example, if you select the 'Email' searcher and enter 'EMAIL' as the replace value, every email address found in the document is replaced with the text EMAIL
  • When you have finished selecting searchers, you can hide the panel by clicking the 'List' button again
  • You can execute the Search&Displace by clicking on the 'Run filters' button
  • After the processing is done you will see the resulting document in the right panel, side by side with the initial document
  • You can highlight the found and replaced items by toggling the 'Highlight differences' button
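
The replace value from the 'Email' example in the steps above behaves like a plain substitution. The sketch below illustrates that behaviour under assumptions: the email pattern is a hypothetical stand-in for the native searcher, and displace is an illustrative name, not a function of the app.

```python
import re

# Hypothetical email pattern, used only for illustration.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def displace(text, searcher, replace_value):
    """Replace every match of the searcher with the literal replace value."""
    # A function replacement keeps backslashes in replace_value from being
    # interpreted as regex escapes.
    return searcher.sub(lambda _match: replace_value, text)


original = "Write to alice@example.com or bob@example.org."
result = displace(original, EMAIL, "EMAIL")
print(result)  # Write to EMAIL or EMAIL.
```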