Snowflake & Einstein Analytics
In this blog, I am focusing on one of the key areas ‘Data Acquisition’ on Einstein Analytics. For the demo purpose, I have decided to use our OOTB EA ‘Snowflake Connector’ & highlighting another great feature ‘Direct Data – Live Queries’ which was showcased in our last Dreamforce19 with Snowflake and is currently running in Beta in Spring 20 release. Like always, my focus is to leverage as much declarative side to execute this use-case. Feel free to try this out and let me know your feedback
The Importance of a “Data Integration First” Strategy
While the Fourth Industrial Revolution is set to change society like never before, it is paving the way for transformative changes in the way we live and radically disrupting almost every business sector. It’s all happening at an unprecedented, whirlwind pace. And in our fast-changing world where businesses data should underpin every decision. And it can be the most valuable enterprise asset if it is processed, stored, managed, and shared in ways that the employees can find, use, and interpret it. To do this, we need a robust data integration strategy which is a key part of successful digital transformation.
As the company’s business needs grow, our data needs will grow as well. We need to make sure that our strategy takes that growth into account. This is why data integration isn’t only a matter of selecting the right software or even the right project; it’s a holistic business strategy that impacts the company’s capacity to innovate and grow as a data-driven organization. Below are the four key factors which should be taken into consideration while developing our data integration strategy –
- Q1 -> Beyond the data integration project, what are the long term goals for the company?
- Q2 -> How can we embrace emerging technologies (even the ones we haven’t heard of yet)?
- Q3 -> What quantifiable business value is this data integration strategy supposed to bring?
- Q4 -> Lastly, do we have the people and technical resources to affect the change we are seeking in the organization?
“The world’s most valuable resource is no longer oil, but data”
– The Economist – 2017
Einstein Analytics Data Acquisition :
A well-defined data integration strategy is one of the crucial factors to deliver successful Einstein Analytics projects. We will need to have an integration strategy that is flexible enough to handle those changes easily and deliver quality EA artifacts. Don’t position your EA platform to dump your all data and build something which has very low business value for your end-users or very limited to few personas. Always think of your implementation key objectives with your business personas, so you can turn those analytics into actionable insights for them, and that’s where the product core strength is.
Einstein Analytics Data Connectors:
Einstein Analytics has several out of the box data connectors on the platform. Not only these can be scheduled individually but further down you can also prepare or transformed in your pre-post dataflow & data prep operations to simplify it for your end-users. Below are the two Snowflake connector types I will be using for my demo (Snapshot Below)
1) EA Database Connector
(Note – Analytics connectors are enabled when you enable data sync. If you haven’t enabled data sync, you can do it from Setup. Enter Analytics in the Quick Find box, then click Settings. Select Enable Data Sync and Connections, and then click Save)
2) Direct Data – Snowflake (Live Connections): “Query your Data without Moving it”
EA Direct Data lets users explore external data sources directly without having to move the data into EA. These explorations can also be used in Dashboards to power live insights.
General Business Context :
With the increasing volume of information collected through a variety of channels, businesses face a segmented approach to data. In order to remove those data silos and to have a single source of truth, the company has decided to build a cloud-based data-warehousing solution on Snowflake. Via this integration business also wanted to integrate their external data points (ERP) for their Sales & Service reps in Salesforce Analytics Cloud.
Lastly, they also have some frequent changing data points which they like to integrate with EA dashboards to provide some ‘Live’ KPIs. So the business can stay up to date on those metrics on those changes and take appropriate actions.
Solution Approach :
- EA Snowflake Connector: I will be using EA OOTB Snowflake Connector to ingest the external data into EA
- Snowflake Live Connections: For a few fast-changing reporting KPIs I have decided to use the EA live connections feature.
My Usecase : (Coronavirus: COVID-19) :
We are living in unprecedented times as COVID-19 crisis is impacting everyone around the world and every aspect of our daily lives. So while I am working remotely, and of course thinking of some use cases around this topic. Or maybe to build something interesting for industry verticals esp. for travel & tourism customers in our ecosystem in order to provide EA powered analytics or actionable insights.
- Data points I am using to depict my COVID-19 EA KPIs are derived from “John Hopkins University of Medicine – Cornovirus Resource Center” publicly available data (Link)
- Number projections vs actual people impacted may be different in a real scenario than the projections displayed here in this use case dashboard, and it has to be verified with the respective health authorities only.
Why Einstein Analytics?
- If you ask me, this is a perfect use-case for Tableau depicting heterogeneous data points/real-time streaming and the full spectrum of wide analytics capabilities.
- Possible Use Case 1 – Einstein Analytics is all about actionable insights & intelligence which is tightly embedded within your Salesforce CRM application for your business end-users. But think about customers or companies in tourist/travel verticals, where they would like to embed or explore this information on their transactional records, or maybe relevant for them prior to offer any future travel services/plans to their customers.
- Possible Use Case 2 – Maybe relevant to your global security team where they have your travel information already stored in Salesforce. This info can be handy for them to validate your travel request to avoid any future risks to your safety.
Mentioned above use cases I am trying to highlight or showcasing Einstein Analytics’ strong ext. data integration platform capabilities in this article.
EA Sample Dashboard (COVID-19) :
Below is the short demo showcasing the EA Snowflake Data Integration capabilities in the context of COVID-19 use-case
Join our EA Success Community & Campfire Sessions:
If you would like to deep dive more on to this, then I would highly recommend you all to join our EA Camp series or EA success community. Additionally, you can read some good content from our own EA Evangelists from ACE team Anirudh Sridhar & Chris Ames on the public forums.
Below are the key components I am using for my use-case
- Snowflake Trial Account – Link ( Note – Make sure to understand the usage & cost implications of different tier types beforehand)
- Free Salesforce Development Environment: EA enabled – Link
What is Snowflake :
Snowflake’s cloud data platform offers an analytic data warehouse solution as Software-as-a-Service (SaaS). It runs completely on public cloud infrastructure. It is built on top of multiple clouds(GCP, AWS, and Azure) for analytics workload. It uses virtual compute instances for its compute needs and storage service for persistent storage of data. Its unique architecture consists of three key layers –
- Database Storage
- Query Processing
- Cloud Services
Database Storage :
- When data is loaded into Snowflake, it reorganizes that data into its internal optimized, compressed, columnar format. Then it stores this optimized data in cloud storage.
- Snowflake manages all aspects of how this data is stored — the organization, file size, structure, compression, metadata, statistics, and other aspects of data storage are handled by Snowflake.
- The data objects stored by Snowflake are not directly visible nor accessible by customers; they are only accessible through SQL query operations run using Snowflake.
Query Processing :
- Query execution is performed in the processing layer. Snowflake processes query using “virtual warehouses”.
- Warehouses are required for queries, as well as all DML operations, including loading data into tables. A warehouse is defined by its size, as well as the other properties that can be set to help control and automate warehouse activity
Cloud Services :
The cloud services layer is a collection of services that coordinate activities across Snowflake. These services tie together all of the different components of Snowflake in order to process user requests, from login to query dispatch.
View (SQL) :
This is another topic I would like touch before I will deep dive into my use-case, It’s very essential to build the right Snowflake view types (Materialised vs. Regular views) in order to improve your query performance for different workloads.
In a database, a view is the result set of a stored query on the data, which the database users can query just as they would in a persistent database collection object. Other advantages –
- It can join and simplify multiple tables into a single virtual table.
- It can act as aggregated tables
- It can hide the complexity of data.
- It takes very little space to store, as the database contains only the definition of a view, not a copy of all the data that it presents.
Views in Snowflake :
Snowflake supports two types of views:
- Non-materialized views (usually simply referred to as “views”)
- Materialized views.
Views are automatically updated based on the underlying table’s data changing. If the underlying tables have schema changes (column names are changed, for example), then the view will need to be recreated. Otherwise, views are dynamically updated.
Meanwhile, materialized views have the same characteristics of a view in the above scenarios. The only difference is the view has been materialized. If the tables are updated frequently, then you are using a lot of computing to update the materialized view. In Direct Data (EA Live Connections), it is highly recommended to use a materialized view, as the data is pre-computed and helps your queries to run faster.
Implementation Steps Overview :
Implementation Steps :
Snowflake EA Connector: Phase 1
- Step 1: Trial Account Sign up
- Step 2: Create Snowflake Objects
- Creating a Virtual Warehouse
- Creating a Database
- Creating a Schema & Table
- Step 3: Load the data into the table
- Step 4: Query the Loaded Data
- Step 5: Grant Permissions & Roles
- Step 6: Allocate Resources to your table (Warehouse)
Salesforce Einstein Analytics: Phase 2
- Step 7: EA Snowflake Database Connection Creation
- Step 8: Snowflake Data Pull & Extraction
- Step 9: Snowflake Data Preview & ELT jobs
- Step 10: Final Snowflake based Dataset creation
(Note: Phase 2 is not relevant for ‘Snowflake – Live Connections’. EA Direct Data lets users explore external data sources directly without having to move the data into EA. These explorations can also be used in Dashboards to power live insights.)
Let’s Deep Dive :
Step 1: Snowflake Sign up
If you are new to this integration. You can sign up for the free 30 days trial account (400 Credits) to test your use-case. Based upon your country region you can select your hosting instance (AWS or Azure) in this process.
Step 2: Create Snowflake Objects
a) Virtual Warehouse Creation
Create a Snowflake Virtual Warehouse, in my case – EAIamTB (Snapshot Below)
b) Snowflake Database Creation
Now we will create the Snowflake database – (EAIamTB_db) (Snapshot Below)
Now navigate it to this Database and create your SCHEMA – (EAIAMTB_SCHEMA) (Snapshot Below) and will be using this schema to expose our table into EA
Below is the sample CSV table I will be creating for this demo.
Now navigate it to the TABLE tab and create your table columns and assign our earlier created schema to this table.
Now we will load data into this table via the “Load Table” Interface.
Now assign your earlier created WAREHOUSE to this table and upload the source file. If you any specific data format for your table which you can also create from file format options in the UI
Now we will load our table data from our local machine
Optionally if you have data in any other cloud staging environment such as AWS S3 / GCP/ Azure you can also load it to your table.
Now we will create the file format for our upload. Snowflake’s default options match a standard CSV well; however, we may need to change a few options. We name our file format, verify that our column separator is indeed a comma and set Header lines to skip to the value of 1. We do this as the first line of data in our file contains the column names, which we do not want to import into our table in Snowflake – EAIAMTB_FORMAT (Snapshot Below)
Now simply navigate to the worksheet tab and open your table. In this tab, you can write your SQL queries to query your table data. And don’t forget to assign your created attributes (Database/Schema/Warehouse) to your table. By default, your role is SYSADMIN in your trial account and we can leave this as it is for now. (Snapshot Below)
Lets Recap –
Before going into the next steps lets recap what we have done so far –
We were able to –
- Successfully signed up for Snowflake free tier account
- Created our Snowflake Objects.
- Loaded our Demo CSV Data.
- Allocate the appropriate resources to our snowflake demo table
Phase 2 :
Step 7: EA Data Connection Creation
In this step, we will create the Snowflake EA connection from Einstein Analytics. In order to fetch our table data and then store it into DATA SYNC cache in Einstein Analytics.
Now go to your Salesforce Environment and then Analytics Studio. Then you need to press ‘Data Manager’ at the very right corner. (snapshot below).
Then in the connect tab, you need to press ‘Connect to Data’ and select Snowflake computing connector and fill your snowflake credentials. Then Press Save & Test (Snapshot Below)
EA-Snowflake Credentials Overview :
- Connection Name: Any meaningful Name
- Developer Name: Any meaningful Name
- Description: Any meaningful description of the connection
- Schema: Your Snowflake schema name
- Password: Your Password for your Snowflake account
- Database: Your Snowflake database name
- Warehouse: Your Snowflake warehouse name.
- Username: Your user name for the Snowflake account.
- Account: Check your Snowflake account URL. If your login URL is ‘eai8797.eu-central-1.snowflakecomputing.com’ Then your account name is eai8797.eu-central-1
- Role (Optional): Your Snowflake Integration user role that you are using to connect
- JDBC URL (Optional): Set as – (parameter1=value¶meter2=value¶meter3=value)
You can see below our connection is Successful (Snapshot Below)
Now Press Continue and you will see my Snowflake connection established and display in Einstein Analytics instance. (Snapshot Below)
Step 8: Snowflake Data Pull & Extraction
Now go back to your connection. You will see your Snowflake Table in Einstein Analytics. In my case EAIAMTB_TABLE. (Snapshot Attached)
Now select the table fields you intended to bring it into EA. Then press continue and save (Snapshot Below)
After pressing continue then you can see your table data preview in Einstein Analytics successfully
At this point, we are able to establish the EA-Snowflake connection successfully. (Snapshot Below)
Step 9: Snowflake Data Preview & ELT jobs
Now we will simply run the data sync to bring our table data into EA Data Sync cache and monitor the job (Snapshot Below)
Step 10: Final Snowflake Based Dataset creation
We have our data in the Einstein Analytics data cache now. We can further take this ext data in our EA Dataflow to register our final snowflake based dataset. In order to make it simple, I will only create a simple Data flow and register my one Dataset in Einstein Analytics named as ‘snowflake_ea_dataset’
Simply go on a dataflow tab from the data manager to create your dataflow
Voila, now you can see our dataflow and Snowflake based dataset registered in EA successfully : (Snapshot Below)
EA Direct Data (Live Connections) :
Step 1 – Feature Activation
Under Settings → Administer → Analytics → Settings → select ->Enable Einstein Analytics Direct Data for external data sources” (Snapshot Below)
Step 2 – Provide users with permission to explore live connections
Assign the below permission set which will allow you to explore live connections and view widgets in EA Dashboards (Snapshot below)
Step 3 – Using EA Direct Data – (Create Live Connections)
Now we can create live connections to our external data source from the data manager in EA.
- Navigate it to – Data Manager → Connect → Connect to Data
- We can use the same above credentials to set up Snowflake live connections
- Additionally, you need to set up the folder path where you want to add this connection (Snapshot Below)
Step 4 – Explore Snowflake Live Data
Now you will be able to see & explore all live data that you have access to from Analytics Studio. Just click on a live data connection to start exploring.
Connector Limits & Considerations :
EA Computing Connector
- This connector can sync up to 100 million rows or 50 GB per object, whichever limit it reaches first.
- A Snowflake warehouse can be set to automatically resume or suspend. Review the Snowflake documentation for information about the optimal use of these features
- Exclude unnecessary or sensitive data from syncing to Analytics with data sync filters. Filters run on the source object and speed up data sync by pulling only the data you need into Analytics.
- This connector only supports full uploads
Direct Data – Snowflake Live Connections -(Performance & Optimisation)
- Direct Queries apply towards your EA limits – Link
- The performance of your EA dashboards using direct queries primarily depends on query performance of the data source. Please make sure that your source tables are optimized for analytical queries.
- Materialized Views: Think of using Snowflake Materialized Views for your commonly used Views. Because the data is pre-computed, querying performance would be faster than executing the original query. Most data warehouses can automatically refresh materialized views based on a schedule or data change.
- It’s a good practice to avoid grouping high-cardinality categorical fields. If you need to do so, then apply filters. Avoid using a contains a filter or try using ‘equals’ instead
- EA Dashboard Pages: Distribute your direct queries across different dashboard pages overbuilding on a single page.
- Think ahead in positioning your use-case to leverage direct queries on EA dashboards. It may be worth putting some fast-changing KPIs on your dashboards ONLY. It is highly recommended to build your queries against EA datasets where possible.
- High Volume Data: Think of leveraging Snowflake connector to bring in billions of rows of data and to get benefit from scale & performance offered by the platform. Don’t construct expensive queries that can cause dashboard perf issues.
Summary & Conclusion:
- In the above scenario, I have focused on unidirectional flow to ingest data into EA from Snowflake Datawarehouse.
- I have used the fairly straightforward sample use case in order to showcase the dataset creation via this connector and leveraging direct queries. You can extend this integration with further Snowflake products or Partners to bring your on-prem. data more closely to your end-user & CRM operational data. Without storing in core Salesforce (Sales & Service Cloud).
- Lastly, think ahead about designing the data schemas for your datasets and your large queries. Eventually, the data will be incrementally increased over time. You don’t want to put yourself in a position where you will be having bottlenecks in your architecture and have to re-engineer your schemas from scratch.
For the new audience on this topic, I hope it gave you a fair idea about the Snowflake integration capabilities from the EA platform standpoint. Let me know your feedback.