Sizing

Commit 2dbaa9207a by putt1ck, 3 years ago: "Updating with more notes".
What specification of servers is required will depend very much on your use case: the number of documents you want to process at once, their size and content type (an OCR requirement in particular will add considerable processing time to the ingest step), and the number of operations you want to run on those documents. The testing commented on below should suffice to give an indication.
The S&D demo servers are low-specification KVM VMs running in the same datacentre, with the Ingest server on 2 vCPUs and 1 GB RAM and the frontend server on 2 vCPUs and 2 GB RAM; the host servers run multi-core Xeons. In our tests of bulk operation (via the CLI tool documented at https://git.law/newroco/searchanddisplace-core/src/branch/master/demo-cli/README.md) using those demo servers we got the following results.
Processing 380 documents, a mix of DOCX, ODT and PDF varying in size up to ~5 MB and 60+ pages of dense text (contracts!), with a single searcher run on each document, took 306 seconds to ingest the documents and complete the search and displace action on them all, i.e. roughly 0.8 seconds per document. Timing was done by logging the start and end timestamps from Redis.
Both RAM and CPU were maxed out during the tests. The software architecture utilises queues, so there is considerable room for improving bulk processing performance (overall speed) simply by adding CPU or RAM, alongside adjusting the number of worker processes allowed per queue in the supervisor.
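The supervisor adjustment mentioned above can be sketched as a supervisord program block. This is an illustrative assumption, not taken from the S&D repository: the program name, paths, and worker command shown here are hypothetical and would need to match your actual deployment.

```ini
; Illustrative supervisord program block for scaling queue workers.
; The program name, install path and worker command below are
; assumptions for the sketch, not S&D's documented configuration.
[program:sd-queue-worker]
command=php /var/www/searchanddisplace-core/artisan queue:work --sleep=3 --tries=3
; numprocs is the knob referred to in the text: raise it (alongside
; extra CPU/RAM) to run more queue workers in parallel
numprocs=4
process_name=%(program_name)s_%(process_num)02d
autostart=true
autorestart=true
user=www-data
```

After changing the config, `supervisorctl reread` followed by `supervisorctl update` applies the new process count.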