If you are reading this, you are probably looking for ways to transfer data from Salesforce Cloud to Snowflake. In this article, you will learn different techniques to extract data from Salesforce and load it into Snowflake.
What is Salesforce?
Salesforce, a cloud-based software-as-a-service platform, is the most popular CRM application in use today. Salesforce is amazingly customizable, has tons of integration functionality, and includes almost too many bells and whistles to count. Companies can use it to do everything from managing account planning to time management and team collaboration.
What is Snowflake?
Snowflake is a cloud-based data warehouse that's fast, flexible, and easy to work with. It runs on Amazon Web Services, using EC2 for compute and S3 for storage, and separates compute and storage resources, enabling users to scale the two independently and pay only for the resources used. Snowflake can natively load and optimize both structured and semi-structured data and make it available via SQL. It provides native support for JSON, Avro, XML, and Parquet data, and can provide access to the same data for multiple workgroups or workloads simultaneously with no contention roadblocks or performance degradation.
Extract Data out of Salesforce
Salesforce provides many APIs for its products that can deliver data on accounts, leads, tasks, and more. You can find a list of APIs on one of the company’s help docs posts with some direction on when and how to use each API. By looking through that post, you can get an idea of which API makes the most sense for your use case.
For our purposes, we’ll use the REST API with SOQL (Salesforce Object Query Language), but the same data is available using other protocols, including streaming for real-time receipt of data.
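As a quick illustration, below is a minimal Python sketch of a SOQL query against the REST API. It assumes you already have an OAuth access token and your instance URL; the API version and all credential values are placeholders, not a definitive setup:

```python
# Minimal sketch: run a SOQL query via the Salesforce REST API.
# INSTANCE_URL and ACCESS_TOKEN are placeholders; obtain a real token
# through one of Salesforce's OAuth flows before running this.
import requests

INSTANCE_URL = "https://yourInstance.my.salesforce.com"  # placeholder
ACCESS_TOKEN = "<your-oauth-access-token>"               # placeholder

def run_soql(query):
    """Run a SOQL query and page through all result batches."""
    url = f"{INSTANCE_URL}/services/data/v52.0/query"
    headers = {"Authorization": f"Bearer {ACCESS_TOKEN}"}
    params = {"q": query}
    records = []
    while True:
        resp = requests.get(url, headers=headers, params=params)
        resp.raise_for_status()
        body = resp.json()
        records.extend(body["records"])
        if body.get("done"):
            return records
        # Salesforce returns a relative URL for the next batch of records.
        url = INSTANCE_URL + body["nextRecordsUrl"]
        params = None

accounts = run_soql("SELECT Id, Name FROM Account")
print(f"Fetched {len(accounts)} accounts")
```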
Different ways to Migrate data from Salesforce to Snowflake
1. Python Scripts
Yes, we can use Python scripts to fetch the data from Salesforce using the REST API. Within the script, we can connect to AWS S3 and load the API's JSON response directly into an S3 bucket. There are several ways to implement this solution, such as running the Python script on an AWS EC2 instance and using Apache Airflow or AWS Lambda for orchestration.
In a recent project, we had a requirement to migrate data from Salesforce to Snowflake. We used an AWS EC2 instance and created the Python script inside it. In the script, we used various open-source libraries to connect with AWS S3 and load the data into S3.
Below are the steps I followed to load the data from the API to S3:
- Create an AWS EC2 instance. You can follow the steps mentioned in the AWS EC2 Documentation; for testing purposes, the default configuration is fine.
- On the EC2 instance, create a Python script and import the requests, json, and boto3 libraries.
- Write code to fetch the data using the requests.get() method, connect to AWS S3 using the boto3.client() method, and load the data into S3 with the client's put_object() method. You can follow the boto3 docs if required.
- Once the data is loaded into AWS S3, you can load it into Snowflake from S3 using an external stage and the COPY INTO command, as shown in the sketch below.
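Continuing from the REST sketch above, here is a rough Python sketch of steps 2 to 4. The bucket, stage, table, and credential values are placeholders, and it assumes an external stage named @salesforce_stage and a single-VARIANT-column table ACCOUNTS_RAW already exist in Snowflake:

```python
# Sketch: write the fetched records to S3 as JSON, then COPY them into
# Snowflake. All names and credentials below are placeholders.
import json

import boto3
import snowflake.connector  # pip install snowflake-connector-python

# `run_soql` comes from the REST API sketch earlier in this article.
records = run_soql("SELECT Id, Name FROM Account")

# Upload the raw JSON to S3; boto3 picks up credentials from the EC2
# instance role or your local AWS configuration.
s3 = boto3.client("s3")
s3.put_object(
    Bucket="my-salesforce-extracts",       # placeholder bucket
    Key="salesforce/accounts.json",
    Body=json.dumps(records),
)

# Load from S3 into Snowflake through an external stage. This assumes
# @salesforce_stage points at the bucket above and ACCOUNTS_RAW has a
# single VARIANT column.
conn = snowflake.connector.connect(
    user="SVC_LOADER",                     # placeholder credentials
    password="***",
    account="xy12345.us-east-1",
    warehouse="LOAD_WH",
    database="RAW",
    schema="SALESFORCE",
)
try:
    conn.cursor().execute("""
        COPY INTO ACCOUNTS_RAW
        FROM @salesforce_stage/salesforce/accounts.json
        FILE_FORMAT = (TYPE = 'JSON' STRIP_OUTER_ARRAY = TRUE)
    """)
finally:
    conn.close()
```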
2. Apache Airflow
Apache Airflow is an open-source workflow management platform. It is primarily used for orchestration, but there are various open-source Airflow plugins we can use to extract the data from Salesforce and load it into Snowflake.
Apache Airflow is a powerful ETL scheduler, organizer, and manager, but it doesn't process or stream data itself. For that, you need a plugin like Airflow Plugin — Salesforce to act as a data pipeline. The plugin's Salesforce Hook authenticates your requests to Salesforce; you can then create connections to pull and save Salesforce data. To extract defined datasets, the plugin's SalesforceToS3Operator queries the Salesforce Bulk API with Salesforce Object Query Language (SOQL), a SQL-like query language for the Salesforce datastore.
Airflow Plugin — Salesforce is an Apache Airflow plugin developed by Astronomer, Inc. It is an open-source tool, and we can pair its Salesforce plugin with a Snowflake plugin to extract data from Salesforce and load it into Snowflake.
We can also use an AWS S3 plugin alongside the Salesforce plugin to land the data in AWS S3 first and then load it into Snowflake, as sketched below.
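To keep the example version-agnostic, here is a minimal DAG sketch that uses a plain PythonOperator for the extract (the exact arguments of the plugin's SalesforceToS3Operator depend on the plugin version) and the Snowflake provider's operator for the load. The connection IDs, credentials, bucket, stage, and table names are placeholders:

```python
# Minimal Airflow DAG sketch: extract Salesforce data to S3, then COPY it
# into Snowflake. All credentials, buckets, and object names are placeholders.
import json
from datetime import datetime

import boto3
from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.snowflake.operators.snowflake import SnowflakeOperator
from simple_salesforce import Salesforce  # pip install simple-salesforce

def extract_accounts_to_s3():
    """Query Salesforce with SOQL and land the JSON in S3."""
    sf = Salesforce(
        username="user@example.com",   # placeholders; in practice read
        password="***",                # these from an Airflow Connection
        security_token="***",
    )
    result = sf.query_all("SELECT Id, Name FROM Account")
    boto3.client("s3").put_object(
        Bucket="my-salesforce-extracts",
        Key="salesforce/accounts.json",
        Body=json.dumps(result["records"]),
    )

with DAG(
    dag_id="salesforce_to_snowflake",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = PythonOperator(
        task_id="extract_accounts_to_s3",
        python_callable=extract_accounts_to_s3,
    )
    load = SnowflakeOperator(
        task_id="copy_into_snowflake",
        snowflake_conn_id="snowflake_default",
        sql="""
            COPY INTO ACCOUNTS_RAW
            FROM @salesforce_stage/salesforce/accounts.json
            FILE_FORMAT = (TYPE = 'JSON' STRIP_OUTER_ARRAY = TRUE)
        """,
    )
    extract >> load
```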
3. Sync Out for Snowflake — Einstein Analytics
Sync Out for Snowflake exports your raw local Salesforce data via Tableau CRM to Snowflake using the Tableau CRM output connector for Snowflake. With Sync Out for Snowflake, keep your Salesforce data in Snowflake up to date using scheduled Data Sync without the need for a third-party ETL tool. Fresh Salesforce data is vital if you maintain a central Snowflake data lake for processing, analysis, business automation, or storage. For example, give your shipping logistics team the freshest data by merging your account data from your system of record, Salesforce, with your ERP’s shipping data in your data lake.
Enable Sync Out
- From Setup, enter Analytics in the Quick Find box.
- Select Settings under Tableau CRM.
- Select Enable Snowflake output Connection and Save.
- Turn on and configure the Snowflake output connector as described in Snowflake Output Connection. The output connection is used to link Tableau CRM to Snowflake. You don’t add or update a Data Prep recipe output node to use Sync Out, as the push happens during Data Sync.
4. Data Loader
Salesforce's own ETL product dataloader.io is the company's supported method for extracting data from Salesforce data stores. If you have a Salesforce account, you can sign in to dataloader.io with the same credentials, but dataloader.io isn't free with Salesforce; it's a separate paid service. This cloud data pipeline is powered by the Salesforce MuleSoft Anypoint Platform and goes beyond what MuleSoft can do as an ETL. dataloader.io can save time by grouping your related objects into a single pull, so you don't have to export multiple datasets individually and then recombine them in Excel or another database system. The dataloader.io ETL also has connectors to pull data from Box, Dropbox, FTP, and SFTP repos into Salesforce.
dataloader.io has a clean, user-friendly UI with features like search filters and keyboard shortcuts. You can schedule automatic data extractions from Salesforce as frequently as once an hour, but dataloader.io won't refresh your Salesforce-connected apps and BI tools in real time.
Some helpful links:
https://docs.aws.amazon.com/AmazonS3/latest/gsg/SigningUpforS3.html
https://boto3.amazonaws.com/v1/documentation/api/latest/index.html
There are various other paid tools out there that can migrate data from Salesforce to Snowflake or make a direct connection between the two. If you have the budget, you can go with a paid tool; otherwise, I recommend using Python scripts with an open-source orchestration tool such as Airflow to transfer the data from Salesforce Cloud to Snowflake.
If you have any queries, feel free to comment or mail me at imharmanbhatia@gmail.com.