

Instagram Powers

Source: What Powers Instagram: Hundreds of Instances, Dozens of Technologies

One of the questions we always get asked at meet-ups and in conversations with other engineers is, “what’s your stack?” We thought it would be fun to give a high-level sense of all the systems that power Instagram; you can look forward to more in-depth descriptions of some of these systems in the future. This is how our system has evolved in the just-over-1-year that we’ve been live, and while there are parts we’re always re-working, this is a glimpse of how a startup with a small engineering team can scale to our 14 million+ users in a little over a year. Our core principles when choosing a system are:

  • Keep it very simple
  • Don’t re-invent the wheel
  • Go with proven and solid technologies when you can

We’ll go from top to bottom:

OS / Hosting

We run Ubuntu Linux 11.04 (“Natty Narwhal”) on Amazon EC2. We’ve found previous versions of Ubuntu had all sorts of unpredictable freezing episodes on EC2 under high traffic, but Natty has been solid. We’ve only got 3 engineers, and our needs are still evolving, so self-hosting isn’t an option we’ve explored too deeply yet, though it is something we may revisit in the future given the unparalleled growth in usage.

Load Balancing

Every request to Instagram servers goes through load-balancing machines; we used to run 2 nginx machines and DNS round-robin between them. The downside of this approach is the time it takes for DNS to update in case one of the machines needs to get decommissioned. Recently, we moved to using Amazon’s Elastic Load Balancer, with 3 nginx instances behind it that can be swapped in and out (and are automatically taken out of rotation if they fail a health check). We also terminate our SSL at the ELB level, which lessens the CPU load on nginx. We use Amazon’s Route53 for DNS, for which they’ve recently added a pretty good GUI tool in the AWS console.
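To make the "taken out of rotation if they fail a health check" behaviour concrete, here is a minimal sketch of round-robin selection that skips unhealthy backends. It only illustrates the idea and is not how ELB is implemented; the `Rotation` class and the backend names are hypothetical.

```python
class Rotation:
    """Round-robin over backends, skipping any marked unhealthy."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._next = 0

    def mark(self, backend, ok):
        # Record the result of the latest health check for one backend.
        if ok:
            self.healthy.add(backend)
        else:
            self.healthy.discard(backend)

    def pick(self):
        # Try each backend at most once per call, starting where we left off.
        for _ in range(len(self.backends)):
            backend = self.backends[self._next % len(self.backends)]
            self._next += 1
            if backend in self.healthy:
                return backend
        raise RuntimeError("no healthy backends")


# Hypothetical backend names; the second one has failed its health check.
rotation = Rotation(["nginx-1", "nginx-2", "nginx-3"])
rotation.mark("nginx-2", ok=False)
picks = [rotation.pick() for _ in range(4)]
```

Once `nginx-2` passes a check again, calling `rotation.mark("nginx-2", ok=True)` puts it back into rotation without touching DNS.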

Application Servers

Next up are the application servers that handle our requests. We run Django on Amazon High-CPU Extra-Large machines, and as our usage grows we’ve gone from just a few of these machines to over 25 of them (luckily, this is one area that’s easy to horizontally scale as they are stateless). We’ve found that our particular work-load is very CPU-bound rather than memory-bound, so the High-CPU Extra-Large instance type provides the right balance of memory and CPU.

We use Gunicorn (http://gunicorn.org/) as our WSGI server; we used to use mod_wsgi and Apache, but found Gunicorn was much easier to configure, and less CPU-intensive. To run commands on many instances at once (like deploying code), we use Fabric, which recently added a useful parallel mode so that deploys take a matter of seconds.
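For a sense of how little configuration Gunicorn needs, here is an illustrative `gunicorn.conf.py`. The values are assumptions, not the settings described in this post; the worker count follows the rule of thumb from the Gunicorn documentation, which suits a CPU-bound workload as a starting point.

```python
# gunicorn.conf.py: illustrative settings only, not taken from the post.
import multiprocessing

# Address the app server listens on (assumed; nginx would proxy to it).
bind = "127.0.0.1:8000"

# Rule of thumb from the Gunicorn docs: (2 x cores) + 1 worker processes.
workers = multiprocessing.cpu_count() * 2 + 1

# Seconds before a silent worker is killed and restarted.
timeout = 30
```

It would be loaded with something like `gunicorn -c gunicorn.conf.py myproject.wsgi:application`, where `myproject` is a placeholder for the Django project name.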

Data storage

Most of our data (users, photo metadata, tags, etc.) lives in PostgreSQL; we’ve previously written about how we shard across our different Postgres instances. Our main shard cluster involves 12 Quadruple Extra-Large memory instances (and 12 replicas in a different zone).
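The sharding scheme itself is described in a separate post, so the following is only a generic sketch of one common approach: map each user to one of many logical shards, then map contiguous ranges of logical shards onto the physical machines. The logical shard count and host names are assumptions; only the figure of 12 machines comes from the text above.

```python
# Assumed number of logical shards; far more than physical machines so that
# shards can later be moved between machines without re-hashing users.
LOGICAL_SHARDS = 4096

# 12 primaries as mentioned above; the host names are hypothetical.
PHYSICAL_HOSTS = [f"pg-shard-{i:02d}" for i in range(12)]


def logical_shard(user_id: int) -> int:
    """Pick a logical shard for a user with a simple modulo."""
    return user_id % LOGICAL_SHARDS


def host_for(user_id: int) -> str:
    """Map the logical shard onto a physical host by contiguous ranges."""
    index = logical_shard(user_id) * len(PHYSICAL_HOSTS) // LOGICAL_SHARDS
    return PHYSICAL_HOSTS[index]
```

All data for a given user then lives on `host_for(user_id)`, so a per-user query touches a single database.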

We’ve found that Amazon’s network disk system (EBS) doesn’t support enough disk seeks per second, so having all of our working set in memory is extremely important. To get reasonable IO performance, we set up our EBS drives in a software RAID using mdadm.

As a quick tip, we’ve found that vmtouch is a fantastic tool for managing what data is in memory, especially when failing over from one machine to another where there is no active memory profile already. Here is the script we use to parse the output of a vmtouch run on one machine and print out the corresponding vmtouch command to run on another system to match its current memory status.

All of our PostgreSQL instances run in a master-replica setup using Streaming Replication, and we use EBS snapshotting to take frequent backups of our systems. We use XFS as our file system, which lets us freeze & unfreeze the RAID arrays when snapshotting, in order to guarantee a consistent snapshot (our original inspiration came from ec2-consistent-snapshot). To get streaming replication started, our favorite tool is repmgr by the folks at 2ndQuadrant.

To connect to our databases from our app servers, one choice we made early on that had a huge impact on performance was using Pgbouncer to pool our connections to PostgreSQL. We found Christophe Pettus’s blog to be a great resource for Django, PostgreSQL and Pgbouncer tips.
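The reason pooling helps is that opening a PostgreSQL connection is expensive, while handing out an already-open one is cheap. Pgbouncer does this as a separate process between the app and the database; the sketch below shows the same idea in-process, with a hypothetical `ConnectionPool` class and a stand-in `connect` callable rather than a real database driver.

```python
import queue
from contextlib import contextmanager


class ConnectionPool:
    """Open a fixed number of connections once and reuse them."""

    def __init__(self, connect, size):
        self._idle = queue.LifoQueue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())

    @contextmanager
    def connection(self, timeout=None):
        # Block until a connection is free, then lend it to the caller.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            # Always return the connection, even if the caller raised.
            self._idle.put(conn)


# Stand-in for a real driver's connect(); counts how often it is called.
opened = []


def fake_connect():
    opened.append(object())
    return opened[-1]


pool = ConnectionPool(fake_connect, size=2)
for _ in range(10):
    with pool.connection() as conn:
        pass  # a real caller would run queries here
```

Ten requests are served by the two connections opened up front, which is the effect that matters when many app-server workers share a database.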

The photos themselves go straight to Amazon S3, which currently stores several terabytes of photo data for us. We use Amazon CloudFront as our CDN, which helps with image load times from users around the world (like in Japan, our second most-popular country).

We also use Redis extensively; it powers our main feed, our activity feed, our sessions system (here’s our Django session backend), and other related systems. All of Redis’ data needs to fit in memory, so we end up running several Quadruple Extra-Large Memory instances for Redis, too, and occasionally shard across a few Redis instances for any given subsystem. We run Redis in a master-replica setup, and have the replicas constantly saving the DB out to disk, and finally use EBS snapshots to back up those DB dumps (we found that dumping the DB on the master was too taxing). Since Redis allows writes to its replicas, it makes for very easy online failover to a new Redis machine, without requiring any downtime.

For our geo-search API, we used PostgreSQL for many months, but once our Media entries were sharded, we moved over to using Apache Solr. It has a simple JSON interface, so as far as our application is concerned, it’s just another API to consume.

Finally, like any modern Web service, we use Memcached for caching, and currently have 6 Memcached instances, which we connect to using pylibmc & libmemcached. Amazon has recently launched its ElastiCache service, but it’s not any cheaper than running our own instances, so we haven’t pushed ourselves to switch quite yet.

Task Queue & Push Notifications

When a user decides to share out an Instagram photo to Twitter or Facebook, or when we need to notify one of our Real-time subscribers of a new photo posted, we push that task into Gearman, a task queue system originally written at Danga. Doing it asynchronously through the task queue means that media uploads can finish quickly, while the ‘heavy lifting’ can run in the background. We have about 200 workers (all written in Python) consuming the task queue at any given time, split between the services we share to. We also do our feed fan-out in Gearman, so posting is as responsive for a new user as it is for a user with many followers.
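Feed fan-out means that posting a photo writes its ID into each follower's feed ahead of time, so reading a feed is a single lookup. The sketch below shows that shape with in-memory structures standing in for Redis, and a plain function standing in for a Gearman worker task; the names and the feed length are assumptions.

```python
from collections import defaultdict, deque

# Assumed cap on feed length; kept tiny here so the trimming is visible.
FEED_LENGTH = 3

# followee -> set of followers
followers = defaultdict(set)
# user -> newest-first feed of photo IDs, trimmed to FEED_LENGTH
feeds = defaultdict(lambda: deque(maxlen=FEED_LENGTH))


def follow(follower, followee):
    followers[followee].add(follower)


def fan_out(author, photo_id):
    """Work a background worker would do after an upload has returned."""
    for follower in followers[author]:
        # appendleft on a full deque drops the oldest entry at the right.
        feeds[follower].appendleft(photo_id)


follow("bob", "alice")
follow("carol", "alice")
for photo_id in [1, 2, 3, 4]:
    fan_out("alice", photo_id)
```

Because the loop over followers runs in the background, the upload request takes the same time whether the author has two followers or two million.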

For doing push notifications, the most cost-effective solution we found was pyapns (https://github.com/samuraisam/pyapns), an open-source Twisted service that has handled over a billion push notifications for us, and has been rock-solid.

Monitoring

With 100+ instances, it’s important to keep on top of what’s going on across the board. We use Munin to graph metrics across all of our systems, and also to alert us if anything is outside of its normal range. We write a lot of custom Munin plugins, building on top of Python-Munin, to graph metrics that aren’t system-level (for example, signups per minute, photos posted per second, etc). We use Pingdom for external monitoring of the service, and PagerDuty for handling notifications and incidents.
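A Munin plugin is just an executable: called with the argument `config` it describes the graph, and called with no argument it prints the current values. The sketch below writes a "signups per minute" plugin against that plain-text protocol directly rather than through Python-Munin. The `signups_last_minute` function is a hypothetical placeholder; a real plugin would query the application's own data.

```python
import sys


def signups_last_minute():
    """Hypothetical placeholder; a real plugin would query the app's data."""
    return 42


def render(argv):
    """Return the text Munin expects for this invocation."""
    if len(argv) > 1 and argv[1] == "config":
        # Graph metadata, requested by munin-node with the "config" argument.
        return "\n".join([
            "graph_title Signups per minute",
            "graph_vlabel signups",
            "graph_category instagram",
            "signups.label signups",
        ])
    # Normal run: one "<field>.value <number>" line per data series.
    return f"signups.value {signups_last_minute()}"


if __name__ == "__main__":
    print(render(sys.argv))
```

Keeping the output in a function that returns a string makes the plugin easy to test without a Munin installation.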

For Python error reporting, we use Sentry, an awesome open-source Django app written by the folks at Disqus. At any given time, we can sign-on and see what errors are happening across our system, in real time.

You?

If this description of our systems interests you, or if you’re hopping up and down ready to tell us all the things you’d change in the system, we’d love to hear from you. We’re looking for a DevOps person to join us and help us tame our EC2 instance herd.

Posted in A - Z.


Facebook’s Architecture

Source: What is Facebook’s architecture?

From various readings and conversations I had, my understanding of Facebook’s current architecture is:

  • Web front-end written in PHP. Facebook’s HipHop Compiler [1] then converts it to C++ and compiles it using g++, thus providing a high performance templating and Web logic execution layer.
  • Because of the limitations of relying entirely on static compilation, Facebook has started work on a HipHop Interpreter [2] as well as a HipHop Virtual Machine, which translates PHP code to HipHop ByteCode [3].
  • Business logic is exposed as services using Thrift [4]. Some of these services are implemented in PHP, C++ or Java depending on service requirements (some other languages are probably used…)
  • Services implemented in Java don’t use any of the usual enterprise application servers but rather Facebook’s custom application server. At first this can look like reinventing the wheel, but as these services are exposed and consumed only (or mostly) using Thrift, the overhead of Tomcat, or even Jetty, was probably too high with no significant added value for their needs.
  • Persistence is done using MySQL, Memcached [5], Hadoop’s HBase [6]. Memcached is used as a cache for MySQL as well as a general purpose cache.
  • Offline processing is done using Hadoop and Hive.
  • Data such as logging, clicks and feeds transit using Scribe [7] and are aggregated and stored in HDFS using Scribe-HDFS [8], thus allowing extended analysis using MapReduce.
  • BigPipe [9] is their custom technology to accelerate page rendering using a pipelining logic
  • Varnish Cache [10] is used for HTTP proxying. They’ve preferred it for its high performance and efficiency [11].
  • The storage of the billions of photos posted by the users is handled by Haystack, an ad-hoc storage solution developed by Facebook which brings low level optimizations and append-only writes [12].
  • Facebook Messages uses its own architecture, which is notably based on infrastructure sharding and dynamic cluster management. Business logic and persistence are encapsulated in so-called ‘Cells’. Each Cell handles a subset of users; new Cells can be added as popularity grows [13]. Persistence is achieved using HBase [14].
  • Facebook Messages’ search engine is built with an inverted index stored in HBase [15]
  • Facebook Search Engine’s implementation details are unknown as far as I know
  • The typeahead search uses a custom storage and retrieval logic [16]
  • Chat is based on an Epoll server developed in Erlang and accessed using Thrift [17]
  • They’ve built an automated system that responds to monitoring alerts by launching the appropriate repair workflow, or escalating to humans if the outage can’t be overcome [18].

About the resources provisioned for each of these components, some information and numbers are known:

  • Facebook is estimated to own more than 60,000 servers [18]. Their recent datacenter in Prineville, Oregon is based on entirely self-designed hardware [19] that was recently unveiled as Open Compute Project [20].
  • 300 TB of data is stored in Memcached processes [21]
  • Their Hadoop and Hive cluster is made of 3000 servers, each with 8 cores, 32 GB RAM and 12 TB of disk, for a total of 24k cores, 96 TB RAM and 36 PB of disk [22]
  • 100 billion hits per day, 50 billion photos, 3 trillion objects cached, 130 TB of logs per day as of July 2010 [22]
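The cluster totals quoted for the Hadoop and Hive cluster follow directly from the per-server figures. A quick check, using decimal units (1 TB = 1000 GB, 1 PB = 1000 TB):

```python
servers = 3000
cores_per_server = 8
ram_gb_per_server = 32
disk_tb_per_server = 12

total_cores = servers * cores_per_server            # 24,000 cores
total_ram_tb = servers * ram_gb_per_server // 1000  # 96,000 GB = 96 TB
total_disk_pb = servers * disk_tb_per_server // 1000  # 36,000 TB = 36 PB

print(total_cores, total_ram_tb, total_disk_pb)
```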

[1] HipHop for PHP: http://developers.facebook.com/b…
[2] Making HPHPi Faster: http://www.facebook.com/note.php…
[3] The HipHop Virtual Machine: http://www.facebook.com/note.php…
[4] Thrift: http://thrift.apache.org/
[5] Memcached: http://memcached.org/
[6] HBase: http://hbase.apache.org/
[7] Scribe: https://github.com/facebook/scribe
[8] Scribe-HDFS: http://hadoopblog.blogspot.com/2…
[9] BigPipe: http://www.facebook.com/notes/fa…
[10] Varnish Cache: http://www.varnish-cache.org/
[11] Facebook goes for Varnish: http://www.varnish-software.com/…
[12] Needle in a haystack: efficient storage of billions of photos: http://www.facebook.com/note.php…
[13] Scaling the Messages Application Back End: http://www.facebook.com/note.php…
[14] The Underlying Technology of Messages: https://www.facebook.com/note.ph…
[15] The Underlying Technology of Messages Tech Talk: http://www.facebook.com/video/vi…
[16] Facebook’s typeahead search architecture: http://www.facebook.com/video/vi…
[17] Facebook Chat: http://www.facebook.com/note.php…
[18] Who has the most Web Servers?: http://www.datacenterknowledge.c…
[19] Building Efficient Data Centers with the Open Compute Project: http://www.facebook.com/note.php…
[20] Open Compute Project: http://opencompute.org/
[21] Facebook’s architecture presentation at Devoxx 2010: http://www.devoxx.com
[22] Scaling Facebook to 500 millions users and beyond: http://www.facebook.com/note.php…

Posted in A - Z.


Displaying CPU Information

When configuring a server, we sometimes need detailed information about the machine's CPU so that the configuration can make the most of it.

There are several ways to display CPU information. The first:

server# sysctl -a | egrep -i 'hw.machine|hw.model|hw.ncpu'

Output:

hw.machine: amd64
hw.model: Intel(R) Xeon(R) CPU           X3430  @ 2.40GHz
hw.ncpu: 4
hw.machine_arch: amd64
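When the same information is needed inside a script, for example to size a worker pool, the Python standard library exposes rough equivalents of these three sysctl values. This is a minimal sketch; `cpu_summary` is a hypothetical helper name, and `platform.processor()` may return an empty string on some systems.

```python
import os
import platform


def cpu_summary():
    """Collect rough equivalents of hw.machine, hw.model and hw.ncpu."""
    return {
        "machine": platform.machine(),    # e.g. "amd64"
        "model": platform.processor(),    # may be empty on some platforms
        "ncpu": os.cpu_count() or 1,      # cpu_count() can return None
    }


print(cpu_summary())
```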

Getting CPU information from dmesg:

server# dmesg | grep -i cpu

or

server# grep -i cpu /var/run/dmesg.boot

Output:

CPU: Intel(R) Xeon(R) CPU           X3430  @ 2.40GHz (2394.00-MHz K8-class CPU)
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
cpu0 (BSP): APIC ID:  0
cpu1 (AP): APIC ID:  2
cpu2 (AP): APIC ID:  4
cpu3 (AP): APIC ID:  6
cpu0: <ACPI CPU> on acpi0
cpu1: <ACPI CPU> on acpi0
cpu2: <ACPI CPU> on acpi0
cpu3: <ACPI CPU> on acpi0
est0: <Enhanced SpeedStep Frequency Control> on cpu0
est: CPU supports Enhanced Speedstep, but is not recognized.
est: cpu_vendor GenuineIntel, msr 13
p4tcc0: <CPU Frequency Thermal Control> on cpu0
est1: <Enhanced SpeedStep Frequency Control> on cpu1
est: CPU supports Enhanced Speedstep, but is not recognized.
est: cpu_vendor GenuineIntel, msr 13
p4tcc1: <CPU Frequency Thermal Control> on cpu1
est2: <Enhanced SpeedStep Frequency Control> on cpu2
est: CPU supports Enhanced Speedstep, but is not recognized.
est: cpu_vendor GenuineIntel, msr 13
p4tcc2: <CPU Frequency Thermal Control> on cpu2
est3: <Enhanced SpeedStep Frequency Control> on cpu3
est: CPU supports Enhanced Speedstep, but is not recognized.
est: cpu_vendor GenuineIntel, msr 13
p4tcc3: <CPU Frequency Thermal Control> on cpu3
SMP: AP CPU #2 Launched!
SMP: AP CPU #1 Launched!
SMP: AP CPU #3 Launched!

Alternatively:

server# sysctl -a | grep -i cpu | less

Posted in FreeBSD.


Problem Running make for Perl 5.12

When running make in /usr/ports/lang/perl5.12/, you may encounter an error message like this:

License not correctly defined: multiple licenses in LICENSE, but LICENSE_COMB is set to single (or undefined)

The solution is to add the following line to /etc/make.conf.
Create the file if it does not exist yet.

LICENSE_COMB=multi

Problem solved.

Updated 06/11/2010

The setting above causes a problem when compiling Libtool-2.2.
The workaround is to comment out the line added above before compiling Libtool.

Posted in FreeBSD.


Autoconf Problem with PHP

FreeBSD version: 8.1-RELEASE
Autoconf version: 2.68
PHP version: 5.3.3_2

The problem appears when adding a PHP extension:

Cannot find autoconf. Please check your autoconf installation and the $PHP_AUTOCONF environment variable is set correctly and then rerun this script.

server# export PHP_AUTOCONF="/usr/local/bin/autoconf"

Cannot find autoconf. Please check your autoconf installation and the $PHP_AUTOHEADER environment variable is set correctly and then rerun this script.

server# export PHP_AUTOHEADER="/usr/local/bin/autoheader"

Run the two commands above when you hit this problem.
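If the extension build is driven from a script, the same two variables can be filled in automatically by looking the tools up on the PATH. This is a minimal sketch under that assumption; `php_build_env` is a hypothetical helper, and values that are already set are left untouched.

```python
import os
import shutil


def php_build_env(environ=None):
    """Return a copy of the environment with PHP_AUTOCONF/PHP_AUTOHEADER set."""
    env = dict(os.environ if environ is None else environ)
    for var, tool in (("PHP_AUTOCONF", "autoconf"),
                      ("PHP_AUTOHEADER", "autoheader")):
        path = shutil.which(tool)  # None if the tool is not on the PATH
        if path and var not in env:
            env[var] = path
    return env
```

The returned dictionary can be passed as the `env` argument of `subprocess.run` when invoking the build.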

Posted in FreeBSD.


My We Rule Kingdom

Does anyone play We Rule?

Posted in A - Z.


iPad Wallpaper: Wintery Night

Wintery Night

Posted in iPad Wallpaper.


iPad Wallpaper: Unido

Unido

Posted in iPad Wallpaper.


iPad Wallpaper: Teja

Teja

Posted in iPad Wallpaper.


iPad Wallpaper: Teja Red

Teja Red

Posted in iPad Wallpaper.


The On Demand Global Workforce - oDesk