From 2dbaa9207aa9e04f66653bc21fe6957d38b5dfff Mon Sep 17 00:00:00 2001
From: putt1ck
Date: Tue, 21 Sep 2021 15:16:53 +0000
Subject: [PATCH] Updating with more notes

---
 Sizing | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/Sizing b/Sizing
index 655cedb..0242abb 100644
--- a/Sizing
+++ b/Sizing
@@ -1,7 +1,7 @@
-What size server is required will depend very much on your use case. The numbers of documents you want to process at once, their size and content type
+What specification servers are required will depend very much on your use case. The number of documents you want to process at once, their size and content type
 (an OCR requirement in particular will add considerable processing time for the ingest step) and the number of operations you want to run on those documents. The testing we've done that is commented on below should suffice to give an indication
 
-The S&D demo servers are low level KVM VMs, with the Ingest server running 2 vCPUs and 1G RAM and the frontend server running 2vCPUs and 2G RAM. In our tests of bulk operation (which is via the CLI app https://git.law/newroco/searchanddisplace-core/src/branch/master/demo-cli/README.md) using those demo servers we got the following results.
+The S&D demo servers are low-level KVM VMs running in the same datacentre on multi-core Xeon hosts, with the Ingest server running 2 vCPUs and 1G RAM and the frontend server running 2 vCPUs and 2G RAM. In our tests of bulk operation (via the CLI tool documented at https://git.law/newroco/searchanddisplace-core/src/branch/master/demo-cli/README.md) using those demo servers we got the following results.
 
-Using 380 documents, which were a mix of DOCX, ODT and PDF and vary in size up to ~5M and 60+ pages of dense text (contracts!), running a single searcher on each document took 306 seconds to ingest the documents and complete the search and displace action.
+Processing 380 documents, which were a mix of DOCX, ODT and PDF and varied in size up to ~5M and 60+ pages of dense text (contracts!), running a single searcher on each document, a run took 306 seconds to ingest all the documents and complete the search and displace action on them. Timing was done by logging the start and end timestamps from Redis.
-It was noted that both RAM and CPU were being maxed out in the tests. The software architecture utilises queues so there is considerable room for improving that performance just by adding CPU or RAM
\ No newline at end of file
+It was noted that both RAM and CPU were being maxed out in the tests. The software architecture utilises queues, so there is considerable room for improving bulk processing performance (overall speed) simply by adding CPU or RAM, alongside adjusting the number of processes allowed per queue in the supervisor.
\ No newline at end of file
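The patch says timing was done by logging start and end timestamps from Redis. As a minimal sketch of the arithmetic only (the timestamp values and format below are assumptions for illustration, not taken from the S&D code; only the 380-document / 306-second figures come from the notes above), turning two logged timestamps into a runtime and per-document throughput looks like:

```python
from datetime import datetime

def elapsed_seconds(start_ts: str, end_ts: str) -> float:
    """Wall-clock seconds between two logged timestamps.

    The "%Y-%m-%d %H:%M:%S" format is an assumption; adjust it to
    match whatever format the timestamps are actually logged in.
    """
    fmt = "%Y-%m-%d %H:%M:%S"
    start = datetime.strptime(start_ts, fmt)
    end = datetime.strptime(end_ts, fmt)
    return (end - start).total_seconds()

# Hypothetical timestamps chosen to reproduce the 306-second run
# reported above for the 380-document batch.
run_seconds = elapsed_seconds("2021-09-21 14:00:00", "2021-09-21 14:05:06")
docs = 380

print(run_seconds)                    # 306.0
print(round(run_seconds / docs, 2))   # 0.81 seconds per document
```

At roughly 0.8 seconds per document on a 2 vCPU / 1G RAM ingest VM, and with both CPU and RAM maxed out during the run, this per-document figure is the number one would expect to drop as resources (or queue worker processes) are added.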