
What server specification you need will depend very much on your use case: the number of documents you want to process at once, their size and content type (an OCR requirement in particular adds considerable processing time at the ingest step), and the number of operations you want to run on those documents. The testing described below should give an indication of the specification you might need, balancing budget constraints against desired performance.

The S&D demo servers are modestly specified KVM VMs running in the same datacentre on multi-core Xeon hosts: the ingest server has 2 vCPUs and 1 GB RAM, and the frontend server has 2 vCPUs and 2 GB RAM. In our bulk-operation tests (run via the CLI tool, as documented at https://git.law/newroco/searchanddisplace-core/src/branch/master/demo-cli/README.md) using those demo servers we got the following results.

Processing 380 documents, a mix of DOCX, ODT and PDF varying in size up to ~5 MB and 60+ pages of dense text (contracts!), with a single searcher run on each document, took 306 seconds to ingest all the documents and complete the search and displace action on them. Timing was done by logging the start and end timestamps in Redis.
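For rough capacity planning, the figures above work out as follows (a back-of-envelope calculation, using only the numbers reported from the test run):

```python
# Throughput from the demo run: 380 documents processed in 306 seconds.
DOCS = 380
ELAPSED_S = 306

per_doc = ELAPSED_S / DOCS           # average seconds per document
per_hour = DOCS / ELAPSED_S * 3600   # documents per hour at this rate

print(f"{per_doc:.2f} s/document, ~{per_hour:.0f} documents/hour")
# → 0.81 s/document, ~4471 documents/hour
```

Note this is an average across mixed document types on the low-spec demo VMs; large OCR-heavy PDFs will sit well above the mean.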

Both RAM and CPU were maxed out during the tests. The software architecture uses queues, so there is considerable room to improve bulk-processing performance (overall speed) simply by adding CPU or RAM, alongside raising the number of processes allowed per queue in supervisor.
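As a sketch of the supervisor adjustment mentioned above, raising `numprocs` is the usual lever for running more workers per queue. The program name, command and queue name below are illustrative assumptions, not the project's shipped configuration:

```ini
; Hypothetical supervisor section for a queue worker; names and paths
; are assumptions for illustration only.
[program:sd-queue-worker]
command=php artisan queue:work redis --queue=ingest
numprocs=4              ; raise alongside added vCPUs/RAM
process_name=%(program_name)s_%(process_num)02d
autostart=true
autorestart=true
```

Each increment of `numprocs` adds a concurrent worker, so the value should track the CPU and RAM headroom available on the ingest server.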