Google reveals how it transfers 1.2M terabytes of data daily
Google has detailed the inner workings of its proprietary data transfer tool, Effingo. The company uses this tool to move an average of 1.2 exabytes of data every day, which is equal to 1.2 million terabytes. The details were shared in a paper that describes managed data transfer as "an unsung hero of large-scale, globally-distributed systems," because it "reduces the network latency from across-globe hundreds to in-continent dozens of milliseconds."
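For a sense of scale, the conversion is plain decimal arithmetic; this quick Python sketch (illustrative only, not from the paper) shows the unit math and the implied average throughput:

```python
# SI storage units: each step up is a factor of 1,000.
TB = 10**12   # bytes in a terabyte
EB = 10**18   # bytes in an exabyte

daily_bytes = 1.2 * EB
print(daily_bytes / TB)       # 1200000.0 -> 1.2 million terabytes per day
print(daily_bytes / 86_400)   # ~1.39e13 bytes/s, i.e. roughly 13.9 TB every second
```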
Optimization for Google's Colossus filesystem
Effingo is optimized for Google's in-house Colossus filesystem and is typically deployed across clusters comprising thousands of machines. Each cluster is connected to every other: links within a datacenter run over a "low-latency, high-bandwidth CLOS" network, while links between datacenters rely on WAN connections built on a mix of Google-owned and third-party infrastructure. The tool comprises a control plane, which manages the lifecycle of a copy, and a data plane, which transfers bytes and reports status.
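The paper doesn't publish Effingo's interfaces, but the control-plane/data-plane split it describes is a common pattern. Here is a minimal Python sketch of that separation, with all names and types invented for illustration:

```python
from dataclasses import dataclass
from enum import Enum, auto

class CopyState(Enum):
    PENDING = auto()
    TRANSFERRING = auto()
    DONE = auto()

@dataclass
class CopyJob:
    """Control-plane record tracking the lifecycle of one copy."""
    src_cluster: str
    dst_cluster: str
    files: list[str]
    state: CopyState = CopyState.PENDING
    bytes_copied: int = 0

class ControlPlane:
    """Manages copy lifecycles; never touches file bytes itself."""
    def __init__(self) -> None:
        self.jobs: list[CopyJob] = []

    def submit(self, job: CopyJob) -> None:
        self.jobs.append(job)

    def report(self, job: CopyJob, n_bytes: int) -> None:
        # Status callback from the data plane.
        job.bytes_copied += n_bytes

class DataPlane:
    """Moves bytes between clusters and reports progress upward."""
    def __init__(self, control: ControlPlane) -> None:
        self.control = control

    def run(self, job: CopyJob) -> None:
        job.state = CopyState.TRANSFERRING
        for path in job.files:
            n = self._transfer(path, job.src_cluster, job.dst_cluster)
            self.control.report(job, n)
        job.state = CopyState.DONE

    def _transfer(self, path: str, src: str, dst: str) -> int:
        return 0  # placeholder: a real data plane would stream bytes here
```

The design intent of such a split is that lifecycle bookkeeping stays cheap and centralized, while the byte-moving loop remains the small, hot path.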
Resource consumption and traffic
The paper states that "the code and resource consumption are uneven: the data plane uses 99% of CPU but is less than 7% of lines of code." When a user initiates a transfer, Effingo requests a traffic allocation from another Google project, Bandwidth Enforcer (BWe), which allocates capacity based on service priority and the value a service derives from additional bandwidth.
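BWe's internals aren't described in the article, but allocating a fixed link capacity first by priority and then by the value of extra bandwidth can be modeled simply. A hypothetical Python sketch (none of these names or numbers are Google's):

```python
from dataclasses import dataclass

@dataclass
class FlowRequest:
    service: str
    priority: int          # lower number = more important
    demand_gbps: float
    value_per_gbps: float  # value the service derives from each extra Gbps

def allocate(link_gbps: float, requests: list[FlowRequest]) -> dict[str, float]:
    """Grant bandwidth by priority tier, breaking ties by value of extra bandwidth."""
    grants = {r.service: 0.0 for r in requests}
    remaining = link_gbps
    for r in sorted(requests, key=lambda r: (r.priority, -r.value_per_gbps)):
        grant = min(r.demand_gbps, remaining)
        grants[r.service] = grant
        remaining -= grant
        if remaining <= 0:
            break
    return grants

print(allocate(100.0, [
    FlowRequest("effingo-batch", priority=2, demand_gbps=80, value_per_gbps=1.0),
    FlowRequest("serving", priority=1, demand_gbps=60, value_per_gbps=5.0),
]))
# {'effingo-batch': 40.0, 'serving': 60.0} -- higher-priority traffic is served first
```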
Quota management and resource allocation
Effingo can bid for quota-defined resources for workloads that need guaranteed network performance, or rely on best-effort resources for less critical flows. Quotas are budgeted upfront in a central planning system, while best-effort resources are typically harvested from underused quotas and shared equally. Despite its efficiency, Effingo carries a mean global backlog of 12 million files, usually amounting to about eight petabytes (one petabyte is equal to 1,000 terabytes).
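A sketch of that two-tier choice, assuming (as the article suggests but does not specify) that best-effort capacity is simply the unused portion of quotas divided equally among claimants:

```python
def choose_tier(needs_guarantees: bool) -> str:
    # Critical copies bid for quota-defined resources budgeted upfront;
    # everything else rides on best-effort capacity.
    return "quota" if needs_guarantees else "best-effort"

def best_effort_share(quota_gbps: float, quota_used_gbps: float,
                      claimants: int) -> float:
    """Best-effort capacity is harvested from underused quotas
    and shared equally among the flows asking for it."""
    spare = max(quota_gbps - quota_used_gbps, 0.0)
    return spare / max(claimants, 1)

print(choose_tier(needs_guarantees=True))   # quota
print(best_effort_share(100.0, 70.0, 3))    # 10.0 -> each best-effort flow gets 10 Gbps
```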
Google's future plans for Effingo
Even on its best day, around two million files remain queued in Effingo, and backlogs spike by around 12 petabytes and nine million files when the service's top 10 users initiate new transfers. The paper also reveals that Google plans to improve Effingo's integration with resource management systems and reduce CPU usage during cross-datacenter transfers. Enhancements to the control loop, so that transfers scale out faster, are also in the pipeline.