Walking the line between programming and design, Matan has been writing code since the age of 12. When he’s not working on web apps, you’ll find him scouring the web for new projects. From screen printing to woodwork, there’s always something in the works. Matan holds a B.Sc. in Computer Science from Tel-Aviv University.

Tech Resources

This is the second of a 2-part post about how we improved query performance on our analytics dashboard by over 7000x, just by moving some of our data from MySQL to Redis. Part 1 was a technical explanation of the setup, while part 2 shows the benchmarks we saw when comparing fetching data from both systems.


We use Redis a lot. It is fast, stable, effective and awesome! This time, we found Redis useful in solving a painful problem: counting unique users for multiple different filters.

We recently found a new feature in Redis (new for us, at least): HyperLogLog. HyperLogLog is a probabilistic data structure that makes estimating the number of distinct objects in a set very fast (actually, more like blazing fast), but with a minor standard error (you can read more about it here). The moment we read about HyperLogLog we knew there was something to it, and now that Redis has made it so easy to use, our testing started almost immediately.
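In Redis, HyperLogLog is exposed through the PFADD and PFCOUNT commands. To give a feel for the trick underneath, here is a toy Python sketch of the core idea: hash every item and let each register remember the longest run of leading zero bits it has seen. This is an illustrative simplification only (the hash function, bias corrections and range corrections in Redis's real implementation differ):

```python
import hashlib

def hash64(value: str) -> int:
    """64-bit hash derived from SHA-1 (illustrative; Redis uses a different hash)."""
    return int.from_bytes(hashlib.sha1(value.encode()).digest()[:8], "big")

class TinyHyperLogLog:
    """Toy HyperLogLog: m registers, each remembering the longest run of
    leading zero bits seen in its share of the hash space."""

    def __init__(self, p: int = 12):
        self.p = p               # first p bits of the hash pick a register
        self.m = 1 << p          # number of registers
        self.registers = [0] * self.m

    def add(self, value: str) -> None:
        h = hash64(value)
        idx = h >> (64 - self.p)                       # register index
        rest = h & ((1 << (64 - self.p)) - 1)          # remaining bits
        rank = (64 - self.p) - rest.bit_length() + 1   # leading zeros + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self) -> float:
        # Raw HyperLogLog estimate: bias constant * m^2 / harmonic sum.
        # (Real implementations also apply small/large-range corrections.)
        alpha = 0.7213 / (1 + 1.079 / self.m)
        harmonic_sum = sum(2.0 ** -r for r in self.registers)
        return alpha * self.m * self.m / harmonic_sum

hll = TinyHyperLogLog(p=12)
for i in range(50_000):
    hll.add(f"user-{i}")
print(int(hll.count()))   # close to 50,000, within a few percent
```

The memory win is the whole point: 4096 small registers estimate the cardinality of any number of items, instead of storing every distinct id.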

We Want Real-Time Data

Until now, we kept all data about unique users in MySQL. The data was saved in different variations, ready for filtering (country, day, …). As time went by, our queries became slower and slower, and it was a pretty grim situation when all our different MySQL optimizations showed there was no real solution there. Several approaches were suggested to us, using Redshift, Hadoop or ElasticSearch, but we didn’t want to present data to our users with any delay. We wanted complete, real-time data in our dashboard, instantly updated by our background workers.

Redis to The Rescue

Once we had Redis running and migrated the MySQL data in, the results were astonishing. We had been tweaking MySQL for a couple of months to make distinct counting faster, and the results were mediocre at best (not MySQL’s fault; we were counting cardinality in tables of 10 million+ rows), but Redis was FAST. Speed wasn’t the only thing we had to benchmark, though: we weren’t sure how well the 0.8% error deviation Redis promises for HyperLogLog would hold up when we ran queries on our data.

MySQL is Underperforming

To get us started, here is a benchmark of some of the many ways we tried tweaking MySQL specifically for COUNT DISTINCT:

[Chart: MySQL performance for different COUNT DISTINCT query variations]

We tried different query and index structures. These are the conclusions we drew from the process:

  • SELECT COUNT(*) FROM (SELECT * GROUP BY id) consistently performed better than SELECT COUNT(DISTINCT id).
  • MySQL Workbench is awesome.
  • With 10M+ rows and growing every day, MySQL just wasn’t the tool for counting the cardinality of our user data.
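To sanity-check that the two query shapes in the first bullet really count the same thing, here is a small demo using Python’s built-in sqlite3 (chosen only because it is self-contained; it says nothing about the MySQL performance gap, and the events table and its rows are made up for the example):

```python
import sqlite3

# In-memory table with duplicate user ids (hypothetical schema for the demo).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, country TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, "US"), (1, "US"), (2, "IL"), (3, "US"), (3, "IL")],
)

# The two query shapes from the benchmark above:
direct = conn.execute("SELECT COUNT(DISTINCT id) FROM events").fetchone()[0]
grouped = conn.execute(
    "SELECT COUNT(*) FROM (SELECT id FROM events GROUP BY id)"
).fetchone()[0]
print(direct, grouped)  # 3 3
```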

Revelation of Goodness

Once we migrated all of our MySQL data into Redis keys, we saw Redis zip past MySQL in the blink of an eye.

[Chart: MySQL vs. Redis query performance on the same axis]

There’s no mistake in that graph. We charted the performance times of both MySQL and Redis on the same graph, but you probably can’t see Redis’s values there. Here’s a close-up of Redis’s performance times.

[Chart: close-up of Redis query performance times]

Amazing!

The Fly in The Ointment

It couldn’t all be that good. HyperLogLog only gives an estimate, so it was time to compare the estimates to the actual MySQL counts. For most queries, the difference was much smaller than the promised 0.8% error deviation (the smallest was 0.03%), but after benchmarking many different queries we also had two that reached errors of 1.1% and 1.7%.
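For reference, the “error deviation” here is just the relative difference between the HyperLogLog estimate and the exact count. A quick sketch (the numbers are made up for illustration):

```python
def relative_error(estimate: int, actual: int) -> float:
    """Relative difference between an estimated and an exact count."""
    return abs(estimate - actual) / actual

# Hypothetical example: HLL reports 100,830 users, MySQL counts exactly 100,000.
print(f"{relative_error(100_830, 100_000):.2%}")  # 0.83%
```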

[Chart: difference between Redis HyperLogLog estimates and actual MySQL counts]

In the end, these error deviations were acceptable for some of our use cases. We’re still saving exact counts outside of Redis … Just in case.

HyperLogLog is a very powerful tool for counting unique entities. You should definitely use it if you’re willing to accept its minor standard error.


This is the first of a 2-part post about how we improved query performance on our analytics dashboard by over 7000x, just by moving some of our data from MySQL to Redis. Part 1 will be a technical explanation of the setup, while part 2 will show the benchmarks we saw when comparing fetching data from both systems.


 

GROW is SOOMLA’s new user intelligence service, presented through a brand new Analytics Dashboard. We wanted to provide mobile gaming studios with various ways to investigate their games using informative and important data metrics. The problem was that queries were slow and the user experience was bad. Most of the slowness stemmed from the fact that we used MySQL to calculate unique users under multiple different filters, which was a bad choice for real-time uniqueness calculations. We tried to figure out ways to improve that until we stumbled upon a new method for calculating unique users with Redis.

We were already using Redis at this point, but only for internal purposes, not for serving data to our web app, so we decided to set up dedicated servers that would only serve data for the dashboard. We looked at different options (clusters coming in Redis 3, which has officially been released since this blog post was written; Redis Sentinel, which requires at least 3 different servers) but decided that for our usage, a simple master-slave pair would be enough. Failover is handled semi-manually instead of with the overhead of sentinels: once a crash is identified, we run a script that promotes the slave to master and switches the IP on all servers that connect to Redis (we based some of our approach on great advice by the awesome @jondot).

When looking to set up a few Redis servers, we saw 2 major options:

  • Setting up our own machines and running Redis off them
  • Using a cloud Redis service such as Amazon Elasticache/Azure Cache/redislabs

After considering pricing and our specific needs, we decided to manage our own machine, which came to a third of the price of the managed cloud services.

Here is our process for setting up 2 Redis machines as master/slave.

We start with a fresh instance on Amazon EC2, running Ubuntu 14.04.


EC2 was just our choice for testing purposes… you can absolutely select your preferred cloud service provider.


Setup

SSH in and add all necessary keys to ~/.ssh/authorized_keys

Install the latest version of Redis via Chris Lea’s PPA:

sudo add-apt-repository ppa:chris-lea/redis-server
sudo apt-get update
sudo apt-get install redis-server

redis-server should be running now

Run redis-benchmark -q -n 100000 -c 50 -P 12 to make sure everything is running ok

Config

Open /etc/redis/redis.conf and change the following settings:

  • tcp-keepalive 60

  • comment out the bind directive
    This makes the machine accessible to anyone on the web, so the password below is essential

  • requirepass <an extremely secure password>
    The extreme speed of Redis is a double-edged sword when it comes to password protection:
    an attacker can try as many as 150,000 passwords per second, so make that password secure.

  • maxmemory-policy noeviction
    For our needs, no key can ever be deleted

  • appendonly yes
    We will be using both AOF and RDB backups

  • appendfilename redis-[prod/stg]-s[1/2]-ao.aof

If configuring a slave

This part is just for our slave

  • slaveof <master-ip> <port>
  • masterauth <master password>
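Put together, the relevant redis.conf fragment looks roughly like this (the placeholder values are yours to fill in, and the AOF filename is one concrete instance of the naming pattern above):

```conf
tcp-keepalive 60
# bind 127.0.0.1        <- commented out, so the machine is reachable externally
requirepass <extremely-secure-password>
maxmemory-policy noeviction
appendonly yes
appendfilename "redis-prod-s1-ao.aof"

# slave only:
slaveof <master-ip> <port>
masterauth <master-password>
```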

Save and exit.

Restart redis with sudo service redis-server restart.

You should now be able to connect to Redis via

redis-cli -h 127.0.0.1 -p [your port]
AUTH <your password>

Machine Config

Install Git:

sudo apt-get update
sudo apt-get install git

Install Node.js + npm:
We will migrate our data to Redis with some Node scripts

sudo apt-get install nodejs
sudo apt-get install npm
sudo ln -s /usr/bin/nodejs /usr/sbin/node

Setting up semi-auto failover

We will set up failover so that the Redis server IP is an environment variable. On failure we will receive a notification and run a script that will:

  • switch the environment variable to be the IP of the slave machine
  • send the slave a SLAVEOF NO ONE command
  • update the master’s configuration to be slave of the original slave machine

After that we can take our time and figure out why the master server crashed.
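The three steps above can be sketched as a tiny planning helper. This is a hypothetical illustration (the IPs, the helper itself and the wording of the non-Redis steps are made up; SLAVEOF NO ONE is the actual Redis command for promoting a slave, and REDIS_SERVER is the environment variable from our setup):

```python
def plan_failover(master_ip: str, slave_ip: str, port: int = 6379) -> list[str]:
    """Return the manual promotion steps as human-readable commands.
    Hypothetical helper for illustration, not a real tool."""
    return [
        # 1. repoint every app server at the new master
        f"update REDIS_SERVER={slave_ip} on all app servers and reload",
        # 2. promote the surviving slave to master
        f"redis-cli -h {slave_ip} -p {port} SLAVEOF NO ONE",
        # 3. once the old master is back, demote it under the new master
        f"redis-cli -h {master_ip} -p {port} SLAVEOF {slave_ip} {port}",
    ]

for step in plan_failover("10.0.0.1", "10.0.0.2"):
    print(step)
```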

To set up the environment variable

Create a new file in /etc/profile.d with a .sh extension; the file’s content should be

export REDIS_SERVER=

To make the change take effect immediately, run

source .sh

And make sure everything is set by running

printenv REDIS_SERVER

Now, in your server’s environment configuration, set the Redis server URL to be

server: process.env.REDIS_SERVER

Now your server should successfully connect to Redis via the environment variable.

If you’re running your server as a service

In this case the daemon service will not recognise your environment variables, therefore you should inject the .sh file into your daemon script in /etc/init.d/yourservice

The injection should look like this

source /etc/profile.d/.sh

inserted before the start/stop/restart functions.

That’s it for setting up our server. Stay tuned for the 2nd part: how using Redis HyperLogLog made our queries 7000x faster.


Hi Everyone, I’m Matan, the newest addition to the team. For my first week at Soomla, I’ll be sharing the stages a new member goes through to install the full stack for our product.

I’ll be using OS X, so the instructions are written accordingly.

Go through these stages to install the full stack

Step 1: Xcode installation

Using the OS X App Store, install the latest version of Xcode and run it for the first time

1.1: Command Line Tools

Next, paste the following code in a new Terminal window

xcode-select --install 

[Screenshot: Xcode installation. Setting up the developer tools is a critical step for any new employee.]

Step 2: Homebrew setup

Paste the following code in a Terminal window

ruby -e "$(curl -fsSL https://raw.github.com/Homebrew/homebrew/go/install)"

  • Update via the brew update command
  • Verify successful installation via the brew doctor command

Now we’ll install various programs using brew. For each program, enter brew install X, replacing X with the program’s name each time.
You can easily verify each installation by running X --version afterwards, where X is the installed name.

The programs we’ll install using brew: git, openssl, node, redis and mongodb.

Step 3: Getting RVM up and running

In a new Terminal window, enter \curl -sSL https://get.rvm.io | bash -s stable. It’s important not to install RVM under sudo.

  • Verify with rvm --version

Step 4: Downloading JDK 1.6

Download and install JDK 1.6 for OS X here

Step 5: Installing the Android SDK

5.1: SDK

Enter brew install android-sdk into Terminal

  • Run the android command from Terminal and follow the instructions to continue installing

[Screenshot: installing the Android SDK from the terminal]

5.2: NDK

Enter brew install android-ndk into Terminal to install

Step 6: Setting up your IDEs

These are the IDEs we work with at Soomla

6.1: IntelliJ IDEA Community Edition

Download and install here

6.2: Android Studio

Download and install here

6.3: RubyMine

Download and install here

Step 7: Additional utilities you will need

7.1: Robomongo

MongoDB Management

Download and install here

7.2: Skitch

Image Annotation

Download and install here

7.3: Mou

Markdown Editor

Download and install here

Step 8: Online Services Sign-Up

Every team member should be signed up for these services

Done

Now you should be up and running, the Soomla way.

