Repo for the search and displace core module including the interface to select files and search and displace operations to run on them. https://searchanddisplace.com

# Search and Displace Core

---

**NOTE**

The installation steps below were tested on an Ubuntu machine and should be adapted for each specific environment.

---

## Install

- Create the `.env` file by copying the contents of the `.env.example` file.
  `cp .env.example .env`
- Install the SQLite driver for your PHP version if it is not already installed.
- For the `QUEUE_CONNECTION` variable in `.env` you can use either `sync` or `redis` (recommended). If you choose `redis`, make sure that Redis is installed on your machine.
  `apt update`
  `apt install redis-server`
- Install the `Search and Displace Ingest` app, found at https://git.law/newroco/searchanddisplace-ingest
- Get the URL of the `Search and Displace Ingest` app and add it to the `SD_INGEST_URL` variable in `.env`.
- In `.env`, set `WEBHOOK_CLIENT_SECRET` to the same value as `WEBHOOK_CORE_SECRET` in the `.env` file of the `Search and Displace Ingest` app.
- In `.env`, set `SD_DUCKLING_URL`, which by default is `http://0.0.0.0:8000/parse`. Details about installing Facebook Duckling are in the Facebook Duckling section below.
- Install Composer.
- Install the Composer dependencies.
  `composer install`
- Install Node.js and npm.
- Install the npm dependencies.
  `npm install`
- Compile the frontend assets.
  `npm run prod`
- Generate the app key by running the following command:
  `php artisan key:generate`
- Run the database migrations.
  `php artisan migrate`
- Start Supervisor (after adding the Supervisor configs detailed below).
  `supervisorctl start all`
- (Optional) Restart Supervisor after a config file update, where `<name>` is the program name from the config file.
  `supervisorctl reread`
  `supervisorctl update`
  `supervisorctl restart <name>`
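
Before starting the workers it can help to confirm that the `.env` variables named in the steps above are actually filled in. The following is a minimal sketch, not part of the app: the function name `check_env_file` is hypothetical, and it assumes each variable sits on its own line in plain `NAME=value` form.

```bash
# Report which of the variables named in the install steps are missing
# or empty in an env file (hypothetical helper, not shipped with the app).
check_env_file() {
    env_file="${1:-.env}"
    missing=0
    for name in QUEUE_CONNECTION SD_INGEST_URL WEBHOOK_CLIENT_SECRET SD_DUCKLING_URL; do
        # A variable counts as set when a line reads NAME=<non-empty value>.
        if ! grep -Eq "^${name}=.+" "$env_file"; then
            echo "missing or empty: ${name}"
            missing=1
        fi
    done
    return "$missing"
}

# Usage, from the app directory:
#   check_env_file .env && echo ".env looks complete"
```

The check only looks at whether a value is present; it does not verify that the URLs are reachable or that the two webhook secrets match.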
### Queues Supervisor config

Add a new Supervisor config file under `/etc/supervisor/conf.d`, as in the example below.

Config file path: `/etc/supervisor/conf.d/queue-worker-search-and-displace-core-production.conf`

```ini
[program:queue-worker-search-and-displace-core-production]
process_name=%(program_name)s_%(process_num)02d
command=php /var/www/html/searchanddisplace-core/artisan queue:listen --queue=sd_core,default --tries=2 --timeout=180
autostart=true
autorestart=true
user=www-data
numprocs=3
redirect_stderr=true
stdout_logfile=/var/log/queue/queue-worker-search-and-displace-core-production.log
```

The value of the `command` key should reflect the app path (in the example above, the app's path is `/var/www/html/searchanddisplace-core`).

The `stdout_logfile` value is the path of the log file. All of its parent directories must already exist.
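
Because the log file's parent directories must exist before Supervisor starts the program, one option is to read the directory out of the config file and create it. This is a rough sketch under the assumption that the config uses the `stdout_logfile=/path` form shown above, with no spaces around `=`; `logfile_dir` is a hypothetical helper name.

```bash
# Print the directory that must exist for a Supervisor program's stdout_logfile.
logfile_dir() {
    conf_file="$1"
    # Take the first stdout_logfile=... line and strip the key.
    logfile=$(sed -n 's/^stdout_logfile=//p' "$conf_file" | head -n 1)
    # Fail if the config has no stdout_logfile entry.
    [ -n "$logfile" ] || return 1
    dirname "$logfile"
}

# Usage (as root), before `supervisorctl start all`:
#   mkdir -p "$(logfile_dir /etc/supervisor/conf.d/queue-worker-search-and-displace-core-production.conf)"
```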
### Facebook Duckling

- `$ apt-get install libpcre3-dev`
- Go to the directory in which you want to deploy the app (e.g. `/var/www/html`)
- `$ git clone https://github.com/facebook/duckling.git`
- `$ cd duckling`
- `$ curl -sSL https://get.haskellstack.org/ | sh`
- `$ stack build`
- `$ stack exec duckling-example-exe`
- `$ stack test`
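
Once `stack exec duckling-example-exe` is running, a quick request against the parse endpoint confirms that the URL you put in `SD_DUCKLING_URL` answers. The sketch below assumes the Duckling example server's form parameters `locale` and `text`; the function name `duckling_parse` and the `en_GB` default are choices made for this illustration, not something the app defines.

```bash
# Send a text to the Duckling example server and print the JSON it returns.
duckling_parse() {
    text="$1"
    locale="${2:-en_GB}"
    # Fall back to the default endpoint named in the install steps.
    url="${SD_DUCKLING_URL:-http://0.0.0.0:8000/parse}"
    curl -s -X POST "$url" \
        --data-urlencode "locale=${locale}" \
        --data-urlencode "text=${text}"
}

# Usage, from a second terminal while the server is running:
#   duckling_parse "tomorrow at eight"
```

An empty response or a connection error usually means the server is not running or is listening on a different address than the one configured.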
### Facebook Duckling Supervisor config

Add a new Supervisor config file under `/etc/supervisor/conf.d`, as in the example below.

Config file path: `/etc/supervisor/conf.d/duckling-worker-search-and-displace-core-production.conf`

```ini
[program:duckling-worker-search-and-displace-core-production]
process_name=%(program_name)s_%(process_num)02d
directory=/var/www/html/fb-duckling
command=sudo -S stack exec duckling-example-exe
autostart=true
autorestart=true
user=root
numprocs=1
redirect_stderr=true
stdout_logfile=/var/log/queue/duckling-worker-search-and-displace-core-production.log
```

The value of the `directory` key should reflect the Facebook Duckling app path (in the example above, the path is `/var/www/html/fb-duckling`). Note that a plain `git clone` of the Duckling repository creates a directory named `duckling`, so adjust the path to match your checkout.

The `stdout_logfile` value is the path of the log file. All of its parent directories must already exist.
### Converting documents

```bash
# LibreOffice
apt-get install software-properties-common
apt-add-repository ppa:libreoffice/ppa
apt-get update
apt-get install libreoffice
```
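
To confirm that the LibreOffice install can convert documents without a display, you can run a headless conversion by hand. This is only a sketch for checking the installation: how the app itself invokes LibreOffice is not described in this README, and `convert_to_pdf` is a hypothetical helper name.

```bash
# Convert one document to PDF with LibreOffice in headless mode.
convert_to_pdf() {
    input_file="$1"
    # Write the result to the current directory unless told otherwise.
    out_dir="${2:-.}"
    soffice --headless --convert-to pdf --outdir "$out_dir" "$input_file"
}

# Usage:
#   convert_to_pdf sample.docx /tmp/converted
```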
# Searchers

There are 2 types of searchers: basic and compounded.

## Basic searcher

There are 2 types of basic searchers: native and custom.

### Native basic searcher

Searchers of this type are included in the app by default and cannot be edited or deleted.

- Amount of Money
- Credit Card Number
- Distance
- Duration
- Email
- Numeral
- Ordinal
- Phone Numbers
- Quantity
- Temperature
- Time
- URL
- Volume

### Custom basic searcher

You can add a custom basic searcher by clicking the 'Add regex' button found in the navbar.
This searcher is a regular expression.

Example: `[\d]{4}-[\d]{3}-[\d]{3}` finds every text string in the document that consists of 4 digits, a dash, 3 digits, a dash, and finally 3 digits; `1234-123-123` is a valid match.
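
To try a pattern like this outside the app, `grep` gives a rough approximation. The sketch below is an assumption-laden stand-in, not the app's regex engine: POSIX extended regular expressions have no `\d`, so the digit class is written as `[0-9]`, and `find_codes` is a hypothetical name.

```bash
# Print every substring made of 4 digits, a dash, 3 digits, a dash, 3 digits.
find_codes() {
    # -o prints only the matched text, one match per line; -E enables {n} counts.
    grep -oE '[0-9]{4}-[0-9]{3}-[0-9]{3}'
}

printf '%s\n' 'Ref 1234-123-123 is valid, 12-3456 is not' | find_codes
# prints: 1234-123-123
```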
## Compounded searcher

A compounded searcher contains one or more searchers, each of which can be either basic or compounded.
The searchers are arranged in rows and columns. Each column in a row extends the search criteria, and each row filters the results of the previous row.

Take the following searcher as an example. The first row has 2 columns: the first column holds the 'Email' native basic searcher and the second column holds a custom basic searcher that matches text strings with a leading '#' character. The second row has a single column, which holds a custom basic searcher that matches text strings containing the '@' character.

When the Search&Displace is executed, the first row of the searcher is applied to the initial document content and finds all email addresses and all text strings with a leading '#' character. The searchers in the first row are applied independently, each column extending the search criteria.

The second row is then applied to the results of the first row, that is, to the email addresses and to the text strings with a leading '#' character. In other words, each row filters the results of the previous row.
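
The row and column behaviour described above can be imitated with a small pipeline: the columns of a row run independently over the same input and their results are combined, and each following row filters what the previous row produced. This is a simplified sketch, not the app's implementation. The email pattern is a crude stand-in for the native 'Email' searcher, the '#' pattern and all function names are assumptions made for the illustration, and the second row here keeps whole first-row results that contain '@', which may differ from how the app reports matches.

```bash
# Stand-in patterns (assumptions, not the app's own definitions).
email_re='[[:alnum:]._%+-]+@[[:alnum:].-]+\.[[:alpha:]]{2,}'
hash_re='#[[:alnum:]_@.-]+'

# Row 1: two columns applied independently to the same text, results combined.
row1() {
    input=$(cat)
    printf '%s\n' "$input" | grep -oE "$email_re"   # column 1: email addresses
    printf '%s\n' "$input" | grep -oE "$hash_re"    # column 2: leading '#'
}

# Row 2: one column that filters row 1's results, keeping those containing '@'.
row2() {
    grep -F '@'
}

run_searcher() {
    row1 | row2
}

printf '%s\n' 'Contact jane@example.com or #support, ticket #bob@desk today' | run_searcher
# prints:
# jane@example.com
# #bob@desk
```

In this run the first row finds `jane@example.com`, `#support` and `#bob@desk`; the second row drops `#support` because it contains no '@'.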
# Demo Version

A demo is available at https://demo.searchanddisplace.com/

No authentication is required.

# Demo Steps

- Select and upload a document file (supported file types: .docx, .pdf, .odt, .txt).
- After the file is uploaded and processed, you will see its contents on the page.
- Select searchers by clicking the 'List' button on the right. For each searcher you can enter a replace value. For example, if you select the 'Email' searcher and enter 'EMAIL' as the replace value, all email addresses found in the document will be replaced with the text EMAIL.
- When you are done selecting searchers, you can hide the panel by clicking the 'List' button again.
- Execute the Search&Displace by clicking the 'Run filters' button.
- When the processing is done, you will see the resulting document in the right panel, side by side with the initial document.
- You can highlight the found and replaced items by toggling the 'Highlight differences' button.
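
The effect of a replace value, as in the 'Email' example in the steps above, can be approximated on plain text with `sed`. This is a minimal sketch under stated assumptions: the email pattern is a simplified stand-in for the app's native 'Email' searcher, and `displace_emails` is a hypothetical name.

```bash
# Simplified email pattern (an assumption, not the app's own definition).
email_re='[[:alnum:]._%+-]+@[[:alnum:].-]+\.[[:alpha:]]{2,}'

# Replace every email address on standard input with the text EMAIL.
displace_emails() {
    sed -E "s/${email_re}/EMAIL/g"
}

printf '%s\n' 'Write to jane@example.com or joe@example.org today' | displace_emails
# prints: Write to EMAIL or EMAIL today
```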