
How DTCC Leverages Snowflake's Snowgrid Technology and AWS for Data Resiliency

By Arjun宫, Executive Director, Snowflake and Data Platforms Team Lead, and Sudha Gullapalli, DTCC Associate Director of Data Cloud Engineering | 5 min read | October 10, 2023

Business continuity remains a top priority for global companies, given that disruptions caused by natural disasters, regional network and power outages, cyberattacks and breaches, and user error (just to name a few) are not a matter of if but of when.


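For a company like the Depository Trust & Clearing Corporation (DTCC), the case for business continuity is especially compelling. DTCC is designated a systemically important financial market utility (SIFMU), a status enacted by the U.S. Congress in recognition that the disruption or failure of such an organization could destabilize the financial markets. That is why DTCC is committed to providing the world's most efficient and resilient post-trade financial market infrastructure. Snowflake on AWS supports our business resiliency program, enabling us to meet and scale our disaster recovery needs with operational efficiency and confidence.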


Before we go further into our Snowflake and AWS story, here's a bit more about DTCC to help you understand what's at stake. We settle the majority of securities transactions in the U.S., including $4.5 trillion per day in U.S. government securities and a monthly average of $8.35 trillion in mortgage-backed securities. You get the picture: business continuity is essential for us, whether we are settling securities transactions or running internal reports, which is why our IT strategy rests on three fundamental pillars: security, resilience, and stability.

Building Resiliency into Every Element with Snowgrid

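At DTCC, the notion of resiliency is built into all our initiatives, whether we are clearing securities or giving clients the ability to perform data analytics, and that includes how we go about modernizing our applications. Each application has a disaster recovery plan, including what we call a runbook, which details failover and failback patterns as well as the targets for the two main criteria in disaster recovery: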

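  • Recovery point objective (RPO): The maximum amount of data loss you can accept, expressed as how far back in time the recovered data may be from the point of disruption.
  • Recovery time objective (RTO): The maximum length of time you can tolerate an application being unavailable in the event of a disaster.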
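To make the two criteria concrete, here is a minimal Python sketch of how a DR drill could be scored against them. It assumes the drill records three timestamps (when the last replication refresh completed, when the outage began, and when the application was serving again); the function names and this measurement model are illustrative assumptions, not DTCC's actual runbook.

```python
from datetime import datetime, timedelta


def achieved_rpo(last_refresh_completed: datetime, outage_start: datetime) -> timedelta:
    """Data written after the last completed refresh is at risk, so the
    achieved RPO is the gap between that refresh and the outage."""
    return max(outage_start - last_refresh_completed, timedelta(0))


def achieved_rto(outage_start: datetime, service_restored: datetime) -> timedelta:
    """Downtime the application actually experienced."""
    if service_restored < outage_start:
        raise ValueError("service cannot be restored before the outage starts")
    return service_restored - outage_start


def drill_passed(
    last_refresh_completed: datetime,
    outage_start: datetime,
    service_restored: datetime,
    rpo_target: timedelta,
    rto_target: timedelta,
) -> bool:
    """A drill passes only if both objectives are met."""
    return (
        achieved_rpo(last_refresh_completed, outage_start) <= rpo_target
        and achieved_rto(outage_start, service_restored) <= rto_target
    )


if __name__ == "__main__":
    outage = datetime(2023, 10, 10, 12, 0)
    ok = drill_passed(
        last_refresh_completed=outage - timedelta(minutes=4),
        outage_start=outage,
        service_restored=outage + timedelta(minutes=15),
        rpo_target=timedelta(minutes=10),
        rto_target=timedelta(minutes=30),
    )
    print(ok)  # True: 4 minutes of potential loss, 15 minutes of downtime
```

Keeping RPO and RTO as separate checks mirrors the runbook view: a drill can restore service quickly and still fail because too much data was at risk, or the reverse.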

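Since implementing Snowflake on AWS for our risk and data analytics in June 2020, our organization has been incident-free. One of the reasons for this resiliency success is Snowflake's Snowgrid capability. Among other things, Snowgrid enables customers to replicate data across regions and clouds, unlocking greater resiliency and minimizing business disruption.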

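We have conducted at least 15 disaster recovery exercises with Snowgrid to ensure business continuity. Our Snowflake instance handles more than 700,000 queries per day across 15 applications and supports more than 400 users, and with Snowflake's account replication capabilities we have been able to achieve near-zero data loss and near-zero RTO.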

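Snowflake's built-in redundancy is a major benefit for DTCC: there is triple redundancy for all critical services and automatic retries for failed parts of any query. At the regional level, Snowflake uses AWS availability zones and also offers cross-region replication and failover, which helped us meet our business continuity goals of near-zero data loss and near-zero recovery time. With Snowflake's Time Travel feature we can query and retrieve deleted data for up to 90 days, and the Fail-safe feature provides an additional 7 days beyond the Time Travel retention period.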
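The retention arithmetic above can be sketched in Python: with a 90-day Time Travel retention, a historical version is self-service recoverable for 90 days and then sits in Fail-safe until day 97. This sketch assumes a permanent table (transient and temporary tables have no Fail-safe period) and an edition that allows 90-day retention; the function name and the whole-day model are illustrative.

```python
TIME_TRAVEL_MAX_DAYS = 90  # upper bound cited above; actual retention is set per object
FAIL_SAFE_DAYS = 7         # Fail-safe period that follows Time Travel (permanent tables)


def recovery_path(days_since_change: float, time_travel_days: int = TIME_TRAVEL_MAX_DAYS) -> str:
    """Classify how (or whether) a historical version of the data can be recovered.

    Boundaries are treated as inclusive for illustration only.
    """
    if days_since_change < 0:
        raise ValueError("days_since_change must be non-negative")
    if not 0 <= time_travel_days <= TIME_TRAVEL_MAX_DAYS:
        raise ValueError("time_travel_days must be between 0 and 90")
    if days_since_change <= time_travel_days:
        return "time_travel"    # self-service, e.g. AT/BEFORE queries or UNDROP
    if days_since_change <= time_travel_days + FAIL_SAFE_DAYS:
        return "fail_safe"      # recovery goes through Snowflake Support
    return "unrecoverable"


if __name__ == "__main__":
    for days in (30, 95, 98):
        print(days, recovery_path(days))  # time_travel, fail_safe, unrecoverable
```

The practical difference between the two windows is who performs the recovery: Time Travel is something your own team can script into a runbook, while Fail-safe is a last resort.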

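Snowgrid's account replication allows each account to have one or more failover groups, so we can segregate apps by line of business. This lends a lot of flexibility to our disaster recovery process design, including the ability to fail over with an application's connection URL intact, so an application fails over together with its connection (and can fail back together, too). We also gain the ability to fail over applications independently without affecting one another.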
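Because each line of business gets its own failover group and client connection, the setup and failover statements can be generated from a small description of that line of business. The Python sketch below only builds Snowflake SQL strings (CREATE FAILOVER GROUP, CREATE/ALTER CONNECTION, ALTER ... PRIMARY); the object names, account identifiers, and 10-minute schedule are hypothetical, and executing the statements (for example through a Snowflake driver) is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineOfBusiness:
    failover_group: str         # hypothetical failover group name
    connection: str             # hypothetical connection object behind the app's URL
    databases: tuple[str, ...]  # databases owned by this line of business


def primary_setup_sql(lob: LineOfBusiness, secondary_account: str,
                      schedule_minutes: int = 10) -> list[str]:
    """Statements run in the primary account; secondary_account is 'org.account'."""
    return [
        f"CREATE FAILOVER GROUP IF NOT EXISTS {lob.failover_group} "
        f"OBJECT_TYPES = DATABASES "
        f"ALLOWED_DATABASES = {', '.join(lob.databases)} "
        f"ALLOWED_ACCOUNTS = {secondary_account} "
        f"REPLICATION_SCHEDULE = '{schedule_minutes} MINUTE'",
        f"CREATE CONNECTION IF NOT EXISTS {lob.connection}",
        f"ALTER CONNECTION {lob.connection} ENABLE FAILOVER TO ACCOUNTS {secondary_account}",
    ]


def secondary_setup_sql(lob: LineOfBusiness, primary_account: str) -> list[str]:
    """Statements run in the secondary account to create the replicas."""
    return [
        f"CREATE FAILOVER GROUP {lob.failover_group} "
        f"AS REPLICA OF {primary_account}.{lob.failover_group}",
        f"CREATE CONNECTION {lob.connection} "
        f"AS REPLICA OF {primary_account}.{lob.connection}",
    ]


def failover_sql(lob: LineOfBusiness) -> list[str]:
    """Run in the secondary account: promote the data and the client redirect together."""
    return [
        f"ALTER FAILOVER GROUP {lob.failover_group} PRIMARY",
        f"ALTER CONNECTION {lob.connection} PRIMARY",
    ]


if __name__ == "__main__":
    risk = LineOfBusiness("risk_fg", "risk_conn", ("risk_db", "risk_ref_db"))
    for stmt in primary_setup_sql(risk, "myorg.dr_account"):
        print(stmt + ";")
```

Pairing each failover group with its own connection is what lets one application fail over, URL and all, without touching the groups that belong to other lines of business.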

Reaping the Benefits

We always strive for an RTO of zero. Snowflake supports this effort with many of its key features, including multi-cloud support, on-demand scalability, SOC 1 and SOC 2 compliance, replication, and failover. Over the past nine-plus months we have done resiliency (chaos) testing, stress testing, and testing of 99th-percentile (P99) replication lag; we feel we have put Snowflake replication through thorough testing, with good results.
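P99 lag is the replication lag that 99% of observed refreshes stay at or below. Assuming lag is sampled once per refresh (for example, the time from a commit on the primary until it is visible on the secondary), the Python sketch below computes it with the nearest-rank definition; other percentile definitions interpolate between samples and give slightly different values, and the sample data is hypothetical.

```python
import math


def percentile_nearest_rank(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


if __name__ == "__main__":
    # Hypothetical lag measurements in seconds, one per replication refresh.
    lags = [42.0, 38.5, 40.1, 95.7, 41.3, 39.9, 44.0, 43.2, 40.8, 61.5]
    print(percentile_nearest_rank(lags, 99))  # 95.7: with 10 samples, P99 is the maximum
```

With fewer than 100 samples, the nearest-rank P99 is simply the worst observation, which is one reason lag testing needs long runs before the number says much about tail behavior.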

At DTCC, the benefits of Snowgrid replication and failover include consistency, speed, and cost savings.

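Consistency: Automatic synchronization across primary and secondary accounts and cloud providers eliminates manual migration tasks and improves operational efficiency. Each application has a runbook for the global disaster recovery (DR) process, which means there is a single code base to centrally manage and perform replication. We can use the same code base and process for the U.S. and the EU, saving effort.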

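Speed: An application can be DR-enabled, tested, and equipped with its runbook (a detailed DR plan) in less than three days. The simple, elegant design makes it fast to work with Snowflake DR.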

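Cost savings: Snowflake replication is inexpensive. Our previous on-premises replication solution doubled our costs because it required paying twice for hardware and licenses.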

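With Snowflake's separation of compute and storage, highly compressed micro-partitions are replicated, which improves storage efficiency and data freshness at the replica site. Paired with the ability to spin up compute resources instantly, this lets us recover quickly while paying for compute only when we need it. Avoiding the need to load and transfer data (ETL) twice has helped us achieve roughly 30% savings.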


Four Tips for Business Continuity Success

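DTCC's partnership with the Snowflake team has given us a tight, continuous feedback loop and the opportunity to try new features in private preview. Together, we have been able to move large, complex capabilities such as System for Cross-domain Identity Management (SCIM) provisioning and user replication.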

As you take on (or continue) your own business continuity planning, we highly recommend Snowflake as the foundation and offer this advice:

  • Make sure you understand your company's assets, and determine the acceptable data loss or downtime, if any, for each application.
  • Test constantly and look for edge cases.
  • Automate, automate, automate: it's the only way to achieve the scale and efficiency that mission-critical applications require.
  • Keep measuring for continuous improvement.
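The first and third tips connect directly: once each application's acceptable loss is agreed, its replication schedule can be derived automatically rather than chosen by hand. The Python sketch below assumes a rough model in which worst-case loss is about one schedule interval plus one refresh; the application names and numbers are hypothetical, and real refresh durations should come from measurements such as the lag testing described earlier.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class AppTier:
    name: str                       # hypothetical application name
    rpo_minutes: float              # acceptable data loss, agreed with the business
    typical_refresh_minutes: float  # observed duration of one replication refresh


def replication_interval_minutes(app: AppTier) -> int:
    """Rough model: worst-case loss is about one full interval plus one refresh,
    so the interval must fit inside the RPO after subtracting the refresh time."""
    budget = app.rpo_minutes - app.typical_refresh_minutes
    if budget < 1:
        raise ValueError(f"{app.name}: RPO is too tight for the observed refresh time")
    return math.floor(budget)  # schedules are expressed in whole minutes


def schedule_clause(app: AppTier) -> str:
    """Render the interval as a REPLICATION_SCHEDULE clause."""
    return f"REPLICATION_SCHEDULE = '{replication_interval_minutes(app)} MINUTE'"


if __name__ == "__main__":
    apps = [AppTier("risk_reporting", 15, 3), AppTier("internal_dashboards", 240, 10)]
    for app in apps:
        print(app.name, schedule_clause(app))
```

Raising an error instead of silently clamping makes an unachievable RPO visible during planning, which is the point of deciding acceptable loss per application up front.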

At DTCC, we take pride in having designed a resilient IT strategy from the start.

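With Snowflake and the cross-cloud capabilities of Snowgrid, we know the security and operational aspects of our architecture are covered, so we can focus on optimizing the user experience and adding value to our business.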

Curious about Snowgrid? Read the Operate at Global Scale with Snowgrid solution brief.
