Beyond MapReduce

Google announced that they are not using MapReduce anymore: "Google dumps MapReduce favor new hyper scale analytics system". MapReduce has been a simple abstraction that has made large scale data processing easier, scalable, and fault-tolerant. However, MapReduce paradigm does not work well for many use cases such as stream processing, iterative computation, graph processing, real-time analytics, etc. Here is another blog post on this announcement: "The elephant was a trojan horse on the death of map reduce at Google".


On leap second bug

Last Saturday "leap second" adjustment caused issues with many online sites: "Leap second bug wreaks havoc across web".

Google's SRE team posted a nice blog post on how they fixed leap second issue: "Time, technology and leaping seconds" by "leap smear", where they change duration of each second reported by NTP depending on "leap second" is added or subtracted. Wondering whether similar techniques can be applied to Stratum 0/1 NTP servers so that the rest of the people don't have to worry about leap seconds in future?

Summary of Haystack system paper from Facebook


This week in our reading group at LinkedIn we read and discussed this paper: Finding a needle in Haystack: Facebook's photo storage.

As the name suggests the Haystack system is designed and built by Facebook guys to store and serve photos.

The basic goals of Haystack system are as follows: first, high throughput and low latency, store all metadata in main memory, and save on multiple disk I/Os. They are able to achieve 4x speedup than the earlier system by doing this. Second, fault-tolerance. Replicate photos across multiple data centers. Third, cost-effectiveness. They claim to be 28% cheaper than the earlier system.

In the earlier NAS (Network Attached Storage) photo storage system, the main bottleneck was 3 disk accesses for fetching each photo: one for fetching the location of corresponding inode, second for reading inode, and third for fetching the file data.

The main idea of the Haystack system is the design of Haystack store that lead to fetching of a photo in a single disk access. In the Haystack store, they keep multiple photos in a single large file. They need only the offset of the photo in the file and size of the photo to access a photo. This way they are able to reduce the amount of metadata needed to access a photo, and keep all metadata for photos in memory. Hence, no disk access is needed to fetch the metadata for a photo and the photo can be fetched from the disk in a single disk access.

Overall, impressed by the scale of Haystack photo storage system that stores more than 20 Petabytes in form of 260 billions of photos and serves millions of photos per second at the peak.

Reference:
  • D. Beaver, S. Kumar, H. C. Li, J. Sobel, and P. Vajgel. Finding a needle in Haystack: Facebook’s photo storage. In OSDI ’10