Caching server for a unique problem

The setup:

  • a website written in PHP lets you find and book ping-pong tables at different venues
  • you fill in standard search criteria and get a really fast response with search results
  • the search uses a 3rd-party hosted search engine (HSE)
  • about 4000 venues, each with around 5 tables, come from another 3rd-party provider which
    exposes a SOAP API (presumably others can book these venues as well)
  • the venues were previously fetched from there and pushed to the hosted search engine
  • to know which tables are available, you need availability information, which you can get from
    the SOAP API; however, there is a problem

The problem:
The SOAP API allows querying 1 venue for 1 date at a time, and each request takes about 1 s to resolve,
sometimes more. Availability for the next 60 days across 4000 venues means 240,000 requests, i.e. about
240k seconds, or nearly 3 days of constant (one-per-second) requests to the 3rd-party server, which they
presumably wouldn't like very much.
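The arithmetic behind that estimate can be sketched in a few lines (the figures are the ones given above: 4000 venues, 60 days ahead, roughly 1 s per SOAP call):

```javascript
// Back-of-the-envelope cost of a full availability sweep.
const venues = 4000;
const days = 60;
const secondsPerCall = 1; // the SOAP API resolves in ~1 s, sometimes more

const totalCalls = venues * days;              // 240000 requests
const totalSeconds = totalCalls * secondsPerCall;
const totalDays = totalSeconds / 86400;        // seconds in a day

console.log(totalCalls, totalDays.toFixed(1)); // prints: 240000 2.8
```

So a strictly sequential crawl is close to 3 days of non-stop requests, which is what motivates the async batching below.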

The requirement for a solution:

  • fast and scalable
  • availability information that is as up to date as possible
  • as few connections to the venue provider as possible

The solution:

new server

  • a VPS at the same hosting company as the 3rd-party venue provider (the one with the slow SOAP API), to shave off some milliseconds of latency
  • fixed ip
  • node.js
  • redis
  • lots of fast memory

when a user fills in where/venue and date/time on the main website's homepage (HP), an ajax call via the new server checks the availability
of all venues in a town, or of a particular venue, on the selected date/time

new server workflow

  • new server issues 240000 SOAP requests (4000 venues x 60 days)
  • that is a little less than 3 days of consecutive 1 s calls using apache/nginx and PHP
  • calls should be non-blocking (async), so node.js may be a better fit, although there are async PHP solutions
  • in practice node.js greatly outperforms apache/PHP here, so let's just recommend node
  • at 3 async calls/sec, all the work finishes within 1 day (240,000 / 3 = 80,000 s, about 22 hours)
  • results are processed, cached using redis and sent to HSE
  • once a day redis should persist to disk as a backup; if memory gets wiped, reload from disk
  • a full refresh only once a week would leave the other 6 days stale
  • on the 3rd day we could do a secondary run, and exclude completely booked venues
  • a request for a venue for a whole day returns all available tables for any part of that day,
    so there shouldn't be many exclusions
  • to exclude further, we could do statistical analysis to get a distribution of how far ahead of 'now' people usually book,
    and include/exclude on an 80/20 split of bookings; in terms of days there would be many dates we rarely need to check, so we could skip them
  • as the new server completes each part of the run (divided into 1-hour segments), the results are cached in the local redis,
    and the ones that differ are sent to HSE
  • a note on building the request list:
    • select a venue/date only if its cache entry is not from today (for the main run; not from the past 3 days for the secondary run). As we'll see, other caching might come from the main website
    • we should run requests in batches, for example 3 async requests x 3600 s = 10,800 requests in one 1-hour batch.
      The responses bubble up, and when all are in, we send the results to the other servers and build a new list.
      Why build the next batch only then? Because there might have been activity from the main website in the meantime: if some
      venues were already checked earlier today, there is no need to put them in the list.
      It also means we don't wait a whole day before sending results to HSE
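The batched crawl with same-day deduplication could look roughly like this. The SOAP client and the HSE push are stubbed with in-memory fakes here, and names like `checkAvailability` and `pushToHSE` are made up for the sketch; only the concurrency-limit and skip-if-fresh logic is the point:

```javascript
// In-memory stand-in for the local redis cache: "venueId:date" -> entry.
const cache = new Map();

// Stub for the 1 s SOAP request (instant here so the sketch runs fast).
async function checkAvailability(venueId, date) {
  return { venueId, date, freeTables: Math.floor(Math.random() * 5) };
}

// Stub for forwarding changed entries to the hosted search engine.
const sentToHSE = [];
function pushToHSE(entries) { sentToHSE.push(...entries); }

// Was this venue/date already checked today (e.g. by main-website traffic)?
function isFreshToday(key) {
  const hit = cache.get(key);
  if (!hit) return false;
  return hit.fetchedAt === new Date().toISOString().slice(0, 10);
}

// Run `tasks` with at most `limit` SOAP requests in flight at once.
async function runBatch(tasks, limit) {
  const results = [];
  let next = 0;
  async function worker() {
    while (next < tasks.length) {
      const { venueId, date } = tasks[next++];
      const key = `${venueId}:${date}`;
      if (isFreshToday(key)) continue; // already checked today, skip it
      const result = await checkAvailability(venueId, date);
      cache.set(key, { result, fetchedAt: new Date().toISOString().slice(0, 10) });
      results.push(result);
    }
  }
  await Promise.all(Array.from({ length: limit }, worker));
  return results;
}

// Example batch: 10 venues x 3 dates, 3 concurrent calls, then push to HSE.
(async () => {
  const tasks = [];
  for (let v = 1; v <= 10; v++)
    for (const date of ['2024-06-01', '2024-06-02', '2024-06-03'])
      tasks.push({ venueId: v, date });
  const fresh = await runBatch(tasks, 3);
  pushToHSE(fresh);
  console.log(fresh.length); // 30 pairs fetched on a cold cache; 0 on a same-day rerun
})();
```

Building the next task list only after a batch completes is what lets entries cached in the meantime (by the main-website bridge) drop out of the list.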

Main website workflow (on day 3 and until end of week)

  • user comes to the HP, and fills in the search where/venue and date/time
  • as the time is selected, an ajax call to the new server is issued, to preload the availability data
  • the new server acts as a bridge: a script on it checks in redis whether the venues are booked and whether this information was cached earlier today, yesterday, or at most two days ago. If it was, we're done; if not, it checks availability asynchronously. The results
    are cached and, if they differ, sent to HSE (thanks to async this should be over in ~1-2 s even if there were 70 SOAP requests)
    and also to the main website, where they are stored in redis
  • we also store on the main website because, with many requests/responses in flight, a temporary holdup could mean the HSE server is not
    updated in time and returns stale results. If the responses do get through but the result page is already rendered,
    we could mark those entries 'booked just now' and hide them as the user browses
  • when a user books, we call the same new server script that we call on the HP after selecting time
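A minimal sketch of that bridge lookup, assuming an in-memory stand-in for redis and a stubbed SOAP call (`soapCheck`, `bridgeLookup`, and `MAX_AGE_DAYS` are illustrative names, not any real API):

```javascript
// "venueId:date" -> { freeTables, fetchedAt } (ms epoch); stand-in for redis.
const store = new Map();
const MAX_AGE_DAYS = 2; // serve from cache if checked within the last two days

// Stub for the slow SOAP availability call (deterministic for the sketch).
async function soapCheck(venueId, date) {
  return { freeTables: (venueId + date.length) % 5 };
}

async function bridgeLookup(venueIds, date, now = Date.now()) {
  const maxAge = MAX_AGE_DAYS * 86400 * 1000;
  const changed = []; // keys whose availability differs -> forward to HSE
  const results = await Promise.all(venueIds.map(async (venueId) => {
    const key = `${venueId}:${date}`;
    const hit = store.get(key);
    if (hit && now - hit.fetchedAt <= maxAge) return { venueId, ...hit }; // fresh enough
    const fresh = await soapCheck(venueId, date);
    if (!hit || hit.freeTables !== fresh.freeTables) changed.push(key);
    store.set(key, { freeTables: fresh.freeTables, fetchedAt: now });
    return { venueId, ...fresh };
  }));
  return { results, changed };
}
```

Because the stale venues are checked in parallel via `Promise.all`, the whole lookup costs roughly one SOAP round-trip rather than one per venue, which is where the ~1-2 s estimate for 70 requests comes from.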

So basically we'd have

  • 1st day check everything
  • 2nd day using cached values
  • 3rd day check venues that we know were not booked and only for major booking days
    + check venues in a set location for an exact date the users at the main website are searching for, while preventing duplicate checking
  • 4th day using cached values
  • 5th day using cached values
  • 6th and 7th day check venues in a location for an exact date the users at the main website are searching for (but I suspect there are
    two days a week when traffic is slower, so day 1 doesn't have to be Monday)
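The weekly cadence above can be expressed as a tiny lookup; "day 1" is whichever weekday the full sweep starts on, and the mode names are made up for this sketch:

```javascript
// One entry per day of the 7-day cycle described above.
const WEEKLY_PLAN = {
  1: 'full-sweep',      // check everything
  2: 'cache-only',
  3: 'secondary-sweep', // unbooked venues, popular booking dates, live searches
  4: 'cache-only',
  5: 'cache-only',
  6: 'on-demand',       // only what users actually search for
  7: 'on-demand',
};

// Map any day count onto the repeating 7-day cycle.
function modeForDay(dayOfCycle) {
  return WEEKLY_PLAN[((dayOfCycle - 1) % 7) + 1];
}

console.log(modeForDay(3)); // prints: secondary-sweep
```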

Also

  • benchmark gzip request compression, and use it if tests show an improvement
  • when calling the new server from the main website, call the IP directly to skip the DNS overhead
  • see if we can call the SOAP with IP instead of domain name
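Calling the IP directly while keeping the hostname in the Host header is one way to skip the per-request DNS lookup without breaking virtual-host routing on the new server. A sketch, where the IP and hostname are placeholders:

```javascript
// Build node.js http request options that target the fixed IP directly but
// still carry the hostname, so name-based virtual hosting keeps working.
// 203.0.113.10 and cache.example.com are example values, not real endpoints.
function directOptions(path) {
  return {
    host: '203.0.113.10', // fixed IP of the new server -> no DNS lookup
    port: 80,
    path,
    headers: { Host: 'cache.example.com' }, // routes to the right vhost
  };
}

// Usage (not executed here):
//   const http = require('http');
//   http.get(directOptions('/availability?town=berlin'), (res) => { /* ... */ });
```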
