Wednesday, December 30, 2009
Greenstone Carbon Management - Services & Solutions - Functional Architecture
continue...
Tuesday, December 29, 2009
Highscalability.com - eBay Architecture
Update: eBay Serves 5 Billion API Calls Each Month. Aren't we seeing more and more traffic driven by mashups composed on top of open APIs? APIs are no longer a bolt on, they are your application. Architecturally that argues for implementing your own application around the same APIs developers and users employ.
continue...
Monday, December 28, 2009
Eclipsesource.com - Persistent Trees in git, Clojure and CouchDB
continue...
Facebook Architecture
High Performance at Massive Scale – Lessons learned at Facebook
Jeff Rothschild, Vice President of Technology at Facebook, gave a great presentation at UC San Diego on our favorite subject: "High Performance at Massive Scale – Lessons learned at Facebook". The abstract for the talk is:
continue...
Facebook's Memcached Multiget Hole: More machines != More Capacity
When you are on the bleeding edge of scale like Facebook is, you run into some interesting problems. As of 2008 Facebook had over 800 memcached servers supplying over 28 terabytes of cache. With those staggering numbers it's a fair bet to think they've seen their share of Dr. House worthy memcached problems.
continue...
Why are Facebook, Digg, and Twitter so hard to scale?
Real-time social graphs (connectivity between people, places, and things). That's why scaling Facebook is hard, says Jeff Rothschild, Vice President of Technology at Facebook. Social networking sites like Facebook, Digg, and Twitter are simply harder than traditional websites to scale. Why is that? Why would social networking sites be any more difficult to scale than traditional web sites? Let's find out.
continue...
Product: Facebook's Cassandra - A Massive Distributed Store
Update 2: Presentation from the NoSQL conference: slides, video.
Update: Why you won't be building your killer app on a distributed hash table by Jonathan Ellis. Why I think Cassandra is the most promising of the open-source distributed databases: you get a relatively rich data model and a distribution model that supports efficient range queries. These are not things that can be grafted on top of a simpler DHT foundation, so Cassandra will be useful for a wider variety of applications.
continue...
Hive - A Petabyte Scale Data Warehouse using Hadoop
This post about using Hive and Hadoop for analytics comes straight from Facebook engineers.
Scalable analysis on large data sets has been core to the functions of a number of teams at Facebook - both engineering and non-engineering. Apart from ad hoc analysis and business intelligence applications used by analysts across the company, a number of Facebook products are also based on analytics.
continue...
Facebook, Hadoop, and Hive
Facebook has the second largest installation of Hadoop (a software platform that lets one easily write and run applications that process vast amounts of data), Yahoo being the first.
continue...
Strategy: Facebook Tweaks to Handle 6 Times as Many Memcached Requests
Our latest strategy is taken from a great post by Paul Saab of Facebook, detailing how with changes Facebook has made to memcached they have:
continue...
Product: Scribe - Facebook's Scalable Logging System
In Log Everything All the Time I advocate applications shouldn't bother logging at all. Why waste all that time and code? No, wait, that's not right. I preach logging everything all the time. Doh. Facebook obviously feels similarly, which is why they open sourced Scribe, their internal logging system, capable of logging tens of billions of messages per day. These messages include access logs, performance statistics, actions that went to News Feed, and many others.
continue...
Some Facebook Secrets to Better Operations
Kim Nash in an interview with Jonathan Heiliger, Facebook VP of technical operations, provides some juicy details on how Facebook handles operations. Operations is one of those departments everyone runs differently as it is usually an ontogeny recapitulates phylogeny situation. With 2,000 databases, 25 terabytes of cache, 90 million active users, and 10,000 servers you know Facebook has some serious operational issues. What are some of Facebook's secrets to better operations?
continue...
Highscalability.com - Stack Overflow Architecture
Stack Overflow is a much loved programmer question and answer site written by two guys nobody has ever heard of before. Well, not exactly. The site was created by top programmer and blog stars Jeff Atwood and Joel Spolsky. In that sense Stack Overflow is like a celebrity owned restaurant, only it should be around for a while. Joel estimates 1/3 of all the programmers in the world have used the site so they must be serving up something good.
continue...
Highscalability.com - Amazon Architecture
Amazon grew from a tiny online bookstore to one of the largest stores on earth. They did it while pioneering new and interesting ways to rate, review, and recommend products. Greg Linden shared his version of Amazon's birth pangs in a series of blog articles.
continue...
Asynchronous Architectures 4
Availability & Consistency
Highscalability.com - Flickr Architecture
Flickr is both my favorite bird and the web's leading photo sharing site. Flickr has an amazing challenge: they must handle a vast sea of ever expanding new content, ever increasing legions of users, and a constant stream of new features, all while providing excellent performance. How do they do it?
continue...
Highscalability.com - Scaling Twitter: Making Twitter 10000 Percent Faster
Update 5: Twitter on Scala. A Conversation with Steve Jenson, Alex Payne, and Robey Pointer by Bill Venners. A fascinating discussion of why Twitter moved to the Java JVM for their server infrastructure (long lived processes) and why they moved to Scala to program against it (high level language, static typing, functional). Ruby is used on the front-end but wasn't performant or reliable enough for the back-end.
continue...
Highscalability.com - Google Architecture
Update: Greg Linden points to a new Google article MapReduce: simplified data processing on large clusters. Some interesting stats: 100k MapReduce jobs are executed each day; more than 20 petabytes of data are processed per day; more than 10k MapReduce programs have been implemented; machines are dual processor with gigabit ethernet and 4-8 GB of memory.
continue...
Highscalability.com - PlentyOfFish Architecture
Update 3: POF now has 200 million images and serves 10,000 images per second. They'll be moving to a 250,000 IOPS RamSan to handle the load. They also upgraded to a core database machine with 512 GB of RAM, 32 CPUs, SQLServer 2008 and Windows 2008.
continue...
Highscalability.com - YouTube Architecture
Update 2: YouTube Reaches One Billion Views Per Day. That’s at least 11,574 views per second, 694,444 views per minute, and 41,666,667 views per hour.
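The per-second, per-minute, and per-hour figures follow directly from dividing one billion daily views evenly across a day. A minimal sketch of that arithmetic, assuming a perfectly uniform load (real traffic peaks well above the average); the function name `average_rates` is hypothetical and only for illustration:

```python
SECONDS_PER_DAY = 24 * 60 * 60
MINUTES_PER_DAY = 24 * 60
HOURS_PER_DAY = 24


def average_rates(views_per_day):
    """Return average view rates, assuming load is spread evenly over the day."""
    return {
        "per_second": views_per_day / SECONDS_PER_DAY,
        "per_minute": views_per_day / MINUTES_PER_DAY,
        "per_hour": views_per_day / HOURS_PER_DAY,
    }


rates = average_rates(1_000_000_000)
for unit, value in rates.items():
    # Prints 11,574 per second, 694,444 per minute, 41,666,667 per hour.
    print(f"{unit}: {value:,.0f}")
```

These are averages, which is why the post says "at least": any daily peak has to be served at a higher rate than this.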
Update: YouTube: The Platform. YouTube adds a new rich set of APIs in order to become your video platform leader--all for free. Upload, edit, watch, search, and comment on video from your own site without visiting YouTube. Compose your site internally from APIs because you'll need to expose them later anyway.
continue...
Asynchronous Architectures 4
This is the fourth in a series of posts presenting arguments for asynchronous architectures as the optimal way to build high-performance, scalable systems for a distributed environment.
In a QCon conference presentation on Availability and Consistency or how the CAP theorem ruins it all, Werner Vogels, Amazon CTO, examines the tension between availability & consistency in large-scale distributed systems, and presents a model for reasoning about the trade-offs between different solutions.
Asynchronous Architectures 3
continue...
Asynchronous Architectures 2
continue...
Asynchronous Architectures 1
continue...
Five Scalability Principles
continue...
Brewer's CAP Theorem
continue...
Tuesday, December 22, 2009
Tuesday, December 15, 2009
Project Lambda: Straw-Man Proposal
This is a straw-man proposal to add first-class functions, function types, and lambda expressions (informally, “closures”) to Java.
This sketch is incomplete and, most likely, inconsistent and unimplementable in its present form. That’s acceptable: It’s intended to be a starting point for discussions on the Project Lambda mailing list which will, hopefully, lead to the formulation of a detailed proposal and a prototype implementation. This document is written in a tutorial style so as to be easily approachable by non-experts, and also to stress its informal nature.
Monday, December 14, 2009
Wednesday, December 9, 2009
Thursday, November 5, 2009
140 Google Interview Questions
continue...
Friday, August 28, 2009
Starting a Business as an Open Source Consultant
continue...
Learning Job Interview Skills from Actors
continue...
Wednesday, August 26, 2009
Mash that trash -- Incremental compaction in the IBM JDK Garbage Collector
Java applications do not have memory management issues, because the garbage collector of the Java Virtual Machine (JVM) takes care of all the storage issues. The garbage collector in the IBM JVM is based on the mark-sweep-compact (MSC) algorithm, where garbage collection (GC) takes place in three phases. At the end of the mark and sweep phases, free space is available, but there is a possibility of heap fragmentation. The compact phase alleviates the fragmentation problem by moving chunks of allocated space towards the lower end of the heap, helping create contiguous free memory at the other end.
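The effect of the compact phase can be illustrated with a toy model. The sketch below is not the IBM JVM's implementation and does not model the incremental part; it assumes a heap represented as a Python list in which `None` marks a free slot, and the names `compact` and `largest_free_run` are hypothetical:

```python
def largest_free_run(heap):
    """Length of the longest run of contiguous free (None) slots."""
    best = run = 0
    for slot in heap:
        run = run + 1 if slot is None else 0
        best = max(best, run)
    return best


def compact(heap):
    """Slide live objects toward the low end of the heap.

    Returns the compacted heap and a forwarding table mapping each
    object's old index to its new index, which a real collector would
    use to fix up references.
    """
    forwarding = {}
    compacted = []
    for old_index, slot in enumerate(heap):
        if slot is not None:
            forwarding[old_index] = len(compacted)
            compacted.append(slot)
    # All free space ends up contiguous at the high end.
    compacted.extend([None] * (len(heap) - len(compacted)))
    return compacted, forwarding


# State after mark and sweep: free space exists but is fragmented.
heap = ["a", None, "b", None, None, "c", None]
compacted, forwarding = compact(heap)
print(largest_free_run(heap), largest_free_run(compacted))
```

In this toy heap the total free space is the same before and after (four slots), but the largest contiguous free run grows from two slots to four, which is what lets a larger allocation succeed.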
continue...
Thursday, August 20, 2009
Java Memory Problems
continue...