Grafana Loki with Extstore
You may have seen this excellent blog post: https://grafana.com/blog/2023/08/23/how-we-scaled-grafana-cloud-logs-memcached-cluster-to-50tb-and-improved-reliability/
… and are now attempting to make use of this knowledge, but something isn’t working quite right. This document will give you a quick start in tuning Loki and Extstore to work well together.
TLDR
We assume your Loki chunk storage size is 1.5MB.
For memcached, add at least the following tuning options:
-I 2m -o ext_wbuf_size=32,ext_threads=10,ext_max_sleep=10000,slab_automove_freeratio=0.10,ext_recache_rate=0
For example, your full start line may look like:
memcached -m 6000 -I 2m -o ext_path=/disk/extstore:500G,ext_wbuf_size=32,ext_threads=10,ext_max_sleep=10000,slab_automove_freeratio=0.10,ext_recache_rate=0
Please set -m and ext_path appropriately for your system. Leave some RAM free for the rest of the system and a little disk space as overhead.
Please use version 1.6.21 or newer as it improves the extstore write speed and fixes some related bugs.
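If you are unsure what version a running instance is, one quick way to check, assuming a local netcat and the default port, is to ask it over the memcached text protocol:
printf 'version\nquit\n' | nc 127.0.0.1 11211
This should print a line such as VERSION 1.6.21.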
In your Loki configuration:
chunk_store_config:
  chunk_cache_config:
    memcached:
      batch_size: 3
      parallelism: 2
    memcached_client:
      addresses: 127.0.0.1:11211
      timeout: 60s
    background:
      writeback_goroutines: 1
      writeback_buffer: 1000
      writeback_size_limit: 500MB
NOTE: batch_size can be set to 2x the number of memcached servers you have. So if you have 3 servers, 6 should work. Keep parallelism as low as possible, but increase this value if you are not maxing out network usage on memcached.
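As a sketch of that rule of thumb for a three-server pool (the host names below are placeholders, and the rest of the cache config stays the same as above):
chunk_store_config:
  chunk_cache_config:
    memcached:
      batch_size: 6   # 2x the three servers in this example
      parallelism: 2
    memcached_client:
      addresses: cache-1:11211,cache-2:11211,cache-3:11211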
Loki’s default configuration is very aggressive, which is normally fine for memory-backed memcached. However, extstore needs a little more time to fetch from or write to disk.
Finally, please check that your memcached instances and Loki instances aren’t swapping (out of RAM) or starved for CPU, as either can lengthen query times and cause timeouts.
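A quick spot check, assuming Linux hosts with the usual procps tools installed, is to watch swap and CPU pressure while a heavy query runs:
# si/so (swap in/out) should stay at or near 0, and the id (idle CPU) column should not sit at 0
vmstat 1 5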
Why we have to tune these settings
Loki’s defaults assume both A) a RAM-backed cluster, and B) a potentially large cluster made of tens to hundreds of cache nodes. Many users are instead running a small number of memcached nodes (1-3) with extstore.
When Loki fetches keys from a pool of memcached servers, it will fetch a batch_size of keys from the entire pool all at once. If you have a batch_size of 500 and 40 memcached servers, each memcached will receive 12-14 keys at the same time, as the batch is split across them. If you are fetching 500 keys against 1 server, that is a much larger batch of keys hitting a single server. There are other issues with this, but this document will not discuss them for now.
Memcached tuning discussion
The defaults for extstore are fairly conservative. Most of the performance improvement you will see is from raising ext_threads, which allows it to fully utilize an SSD. If you have a particularly fast SSD, the thread count can be raised further, to 20 or 30.
The number of memcached worker threads is specified with -t and defaults to 4; do not set this higher than the number of CPUs your server has.
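For example, on a host with a fast NVMe drive and at least 8 cores, a start line along these lines could be used (the memory size, path, and thread counts here are illustrative, not recommendations for every system):
memcached -m 6000 -I 2m -t 8 \
  -o ext_path=/disk/extstore:500G,ext_wbuf_size=32,ext_threads=20,ext_max_sleep=10000,slab_automove_freeratio=0.10,ext_recache_rate=0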
The rest of the tunings help provide minor speedups.
Loki tuning discussion
Above we discussed the batch_size problem. You can also see that timeout is set very high. We run into issues for a few reasons:
- Loki’s memcached client timeout measures the amount of time needed to fetch, read, and process the entire batch of keys from each host.
If you are fetching 2 keys from one host (3MB of data), the 100ms default might seem okay. However, retrieving 500 keys of 1.5MB each over the network from SSD on one host might take quite a while. If you have fewer memcached hosts, or your Loki server does not have a lot of CPU to process results quickly, this timeout will need to be set very high.
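As a rough back-of-the-envelope example: a batch of 500 keys at 1.5MB each is about 750MB of chunk data. Even streaming that over a 10Gbit link at full speed takes on the order of 600ms for the transfer alone, before adding SSD read time and Loki-side processing, so the 100ms default has no chance of succeeding and 60s is less extreme than it first looks.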
We will update this document if this changes.
writeback_goroutines: 1
This defaults to 10. Loki will aggressively write all of the data it fetches from a backing store back to memcached as each query runs. Memcached keeps some memory reserved as a buffer to give it time to flush data to disk. If it cannot write to disk fast enough, you will see the evictions counter increase.
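One way to watch for this, assuming the memcached text protocol on the default port and a netcat that exits once the server closes the connection:
printf 'stats\nquit\n' | nc 127.0.0.1 11211 | grep -E 'evictions|extstore'
If evictions climbs steadily while queries run, extstore is not flushing to disk fast enough.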
There are not a lot of good options at the moment, but setting this value to 1 will help minimize the impact. If you still have trouble with evictions, you may need to scale up to a faster memcached instance or add more instances.
We will update this document if anything changes.