3

We have come across a sporadic inconsistency when doing memcache lookups from different modules (aka "services") within a single AppEngine app (Python). The scenario is as such:

  • Module "portal" creates and later increments integer values for a certain memcache key:

    @ndb.tasklet
    def increment_total_async(meter):
        total = yield memcache.Client().incr_async(
            key="some-key",
            namespace=TOTALS_NAMESPACE,
            delta=meter.quantity)
    
        if total is not None:
            raise ndb.Return(total)
    
        raise ndb.Return(None)
    
  • Module "dashboard" later looks up the current value using the same key and namespace:

    @ndb.transactional_tasklet
    def get_total_async(subscription, metric):
        total = memcache.get(
            key="some-key",
            namespace=TOTALS_NAMESPACE)
    
        if total is not None:
            raise ndb.Return(total)
    
        raise ndb.Return(None)
    

Interestingly, get_total_async sometimes (rarely) returns an integer value of 0 (not None) when called from the "dashboard" module, while when called from the "portal" module it returns the (correct) value >0.

This is somewhat surprising and makes us believe that there may be a race condition in the memcache service - possibly only when one looks up the same key from different modules (now called "services" by Google).

Has anyone come across a similar phenomenon? This is a "Heisenbug", i.e., difficult to reproduce.

5
  • We were also wondering if the fact that we use "incr_async" (and not the synchronous "memcache.incr" call) may play a role.
    – Soeren
    Feb 20, 2017 at 23:44
  • The memcache values can dissapear at anytime. Can you show the code restoring the values in such case? Feb 22, 2017 at 16:39
  • 2
    Is total in increment_total_async() intended to be a numeric value or the UserRPC instance returned by async functions whos get_result() methods return a value? It is indeed strange that get_total_async() sometimes returns 0 but knowing how and when the value is set would be helpful. As you don't specify an initial_value, the incr_async() call should return None unless the value is set elsewhere. Lastly, during the instances where you saw this behavior, did it occur within a fairly tight time frame and is that why you suspect this may be a race condition?
    – Nicholas
    Feb 22, 2017 at 18:28
  • 2
    Is the counter reset after it returns 0 or does it restore itself on a subsequent read? Does "portal" start reporting the "bad" value after "dashboard" does? How does your code handle the case when a key is missing (eg. does it happen to reset it to 0)? Are Memcache accesses between "portal" and "dashboard" synchronized in some way (eg. is "dashboard" guaranteed to read only after "portal" writes)?
    – Adam
    Mar 25, 2017 at 22:48
  • 2
    There are a lot of unknowns which prevent any sound conclusions from being made, but the chances are the issue is not within the Memcache service as this isn't behavior that has been commonly observed elsewhere.
    – Adam
    Mar 25, 2017 at 22:48

0

You must log in to answer this question.

Browse other questions tagged .