Traditional on-premises data processing solutions have led to a complex and expensive set of data silos in which IT spends more time managing infrastructure than extracting value from the data. Big data architectures have attempted to solve the problem with large pools of cost-effective storage, but in doing so have often created their own on-premises management and administration challenges. The difficulty of acquiring, installing and maintaining large clusters of computing resources gave rise to cloud-based implementations as an alternative. The public cloud is becoming the new center for data as organizations migrate from static on-premises IT architectures to global, dynamic and multi-cloud architectures.
Snowflake is a data cloud company that offers a range of data management and processing services. What started out in 2012 as a cloud data warehouse can now handle a variety of data workloads, including data warehouse, data lake, data science and data marketplace. The company went public in September 2020 in what was the largest-ever IPO for a software company. In part because of its popularity, the company also has a rich ecosystem of technology and services partners that customers can draw on as they adopt Snowflake’s platform.
Snowflake’s services were designed from the ground up to leverage the scalability of the cloud. This has enabled the company to offer its users benefits such as no software to install, no servers to maintain and the ability to scale workloads up and down in seconds. The company’s Data Cloud is a new type of cloud-based data service that allows customers to instantly share live data in a governed and secured fashion. Organizations can publish or provide access to their data without copying or replicating it, simply by designating data for sharing and granting the appropriate permissions. Managing unstructured data in Snowflake means that customers will be able to avoid accessing and managing multiple systems, apply fine-grained governance to unstructured files and metadata, and gain more complete insights.
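To make the "designate and grant" sharing model concrete, here is a minimal sketch in Python that assembles the kind of SQL a data provider might run to share tables in place. The statement forms (CREATE SHARE, GRANT ... TO SHARE, ALTER SHARE ... ADD ACCOUNTS) follow Snowflake's secure data sharing commands as we understand them. The function name and the share, database, schema, table and account names are hypothetical, chosen only for illustration, and the sketch only builds the statements rather than connecting to Snowflake.

```python
import re

# Simple guard so that only plain unquoted identifiers end up in the SQL text.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_$]*$")


def _check(name):
    """Reject anything that is not a plain unquoted identifier."""
    if not _IDENTIFIER.match(name):
        raise ValueError(f"not a simple identifier: {name!r}")
    return name


def build_share_statements(share, database, schema, tables, consumer_accounts):
    """Return SQL statements that expose existing tables through a share.

    Hypothetical helper: no data is copied; the statements only grant a
    share read access to objects that already exist in the provider account.
    """
    share, database, schema = _check(share), _check(database), _check(schema)
    statements = [
        f"CREATE SHARE {share};",
        # The share needs usage on the containing database and schema ...
        f"GRANT USAGE ON DATABASE {database} TO SHARE {share};",
        f"GRANT USAGE ON SCHEMA {database}.{schema} TO SHARE {share};",
    ]
    # ... and read access on each table designated for sharing.
    for table in tables:
        statements.append(
            f"GRANT SELECT ON TABLE {database}.{schema}.{_check(table)} "
            f"TO SHARE {share};"
        )
    # Finally, name the consumer accounts that are allowed to see the share.
    accounts = ", ".join(_check(account) for account in consumer_accounts)
    statements.append(f"ALTER SHARE {share} ADD ACCOUNTS = {accounts};")
    return statements


if __name__ == "__main__":
    # All names below are made up for the example.
    for statement in build_share_statements(
        "sales_share", "sales_db", "public", ["orders", "customers"], ["ab12345"]
    ):
        print(statement)
```

The point of the sketch is that sharing is expressed entirely as grants on existing objects: there is no export, copy or load step anywhere in the sequence, which is what allows consumers to query live data.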
Snowflake also introduced an array of new capabilities for its cloud data warehouse, including a developer tool called Snowpark that enables organizations to deploy custom data wrangling workflows on the platform. With Snowpark, developers can program data processes in their language of choice and then execute extract, load and transform (ELT) and extract, transform and load (ETL) processes, data modeling, data preparation and analytics on Snowflake. It features native support for Java, Scala and Python. Recently, the company announced support for unstructured data such as audio, video, PDFs and images. This feature is in private preview, and the company has not said whether it will be delivered through a secondary NoSQL database technology or within the RDBMS itself.
Snowflake entered the cloud data warehouse market early, and it has established itself as one of the leaders in this market. We assert that by 2022, more than one-half of all organizations will use cloud-based technology as their primary data lake platform. While Snowflake was initially launched as a data warehouse, these enhancements will expand the use cases the product can support. The unstructured data capabilities coupled with Snowpark will support more data lake-style workloads and more data science workloads. The Data Cloud capabilities will make it easier to share data within or outside the organization.
If your organization is considering cloud data warehouse or cloud data lake solutions, we recommend you evaluate Snowflake to see whether it meets your needs.