-0.4 C
New York
Friday, January 10, 2025

Ingest knowledge from Google Analytics 4 and Google Sheets to Amazon Redshift utilizing Amazon AppFlow


Google Analytics 4 (GA4) supplies priceless insights into person conduct throughout web sites and apps. However what if it is advisable to mix GA4 knowledge with different sources or carry out deeper evaluation? That’s the place Amazon Redshift and Amazon AppFlow are available. Amazon AppFlow bridges the hole between Google purposes and Amazon Redshift, empowering organizations to unlock deeper insights and drive data-informed selections. On this publish, we present you easy methods to set up the information ingestion pipeline between Google Analytics 4, Google Sheets, and an Amazon Redshift Serverless workgroup.

Amazon AppFlow is a totally managed integration service that you should utilize to securely switch knowledge from software program as a service (SaaS) purposes, corresponding to Google BigQuery, Salesforce, SAP, HubSpot, and ServiceNow, to Amazon Net Providers (AWS) providers corresponding to Amazon Easy Storage Service (Amazon S3) and Amazon Redshift, in only a few clicks. With Amazon AppFlow, you possibly can run knowledge flows at practically any scale and on the frequency you select—on a schedule, in response to a enterprise occasion, or on demand. You may configure knowledge transformation capabilities corresponding to filtering and validation to generate wealthy, ready-to-use knowledge as a part of the circulation itself, with out extra steps. Amazon AppFlow mechanically encrypts knowledge in movement, and permits you to limit knowledge from flowing over the general public web for SaaS purposes which are built-in with AWS PrivateLink, lowering publicity to safety threats.

Amazon Redshift is a quick, scalable, and totally managed cloud knowledge warehouse that permits you to course of and run your advanced SQL analytics workloads on structured and semi-structured knowledge. It additionally helps you securely entry your knowledge in operational databases, knowledge lakes, or third-party datasets with minimal motion or copying of information. Tens of 1000’s of consumers use Amazon Redshift to course of giant quantities of information, modernize their knowledge analytics workloads, and supply insights for his or her enterprise customers.

Stipulations

Earlier than beginning this walkthrough, it is advisable to have the next conditions in place:

  • An AWS account.
  • In your Google Cloud undertaking, you’ve enabled the next APIs:
    • Google Analytics API
    • Google Analytics Admin API
    • Google Analytics Information API
    • Google Sheets API
    • Google Drive API

For extra data, check with Amazon AppFlow help for Google Sheets.

For the steps to allow these APIs, see Allow and disable APIs on the API Console Assist for Google Cloud Platform.

Structure overview

The next structure exhibits how Amazon AppFlow can rework and transfer knowledge from SaaS purposes to processing and storage locations. Three sections seem from left to proper within the diagram: Supply, Transfer, Goal. These sections are described within the following part.

  • Supply – The leftmost part on the diagram represents totally different purposes performing as a supply, together with Google Analytics, Google Sheets, and Google BigQuery.
  • Transfer – The center part is labeled Amazon AppFlow. The part accommodates containers that symbolize Amazon AppFlow operations corresponding to Masks Fields, Map Fields, Merge Fields, Filter Information, and others. On this publish, we deal with organising the information motion utilizing Amazon AppFlow and filtering knowledge based mostly on begin date. The opposite transformation operations corresponding to mapping, masking, and merging fields aren’t coated on this publish.
  • Vacation spot – The part on the appropriate of the diagram is labeled Vacation spot and represents targets corresponding to Amazon Redshift and Amazon S3. On this psot, we primarily deal with Amazon Redshift because the vacation spot.

This publish has two elements. The first half covers integrating from Google Analytics. The second half focuses on connecting with Google Sheets.

Software configuration in Google Cloud Platform

Amazon AppFlow requires OAuth 2.0 for authentication. You should create an OAuth 2.0 consumer ID, which Amazon AppFlow makes use of when requesting an OAuth 2.0 entry token. To create an OAuth 2.0 consumer ID within the Google Cloud Platform console, comply with these steps:

  1. On the Google Cloud Platform Console, from the initiatives listing, choose a undertaking or create a brand new one.
  2. If the APIs & Providers web page isn’t already open, select the menu icon on the higher left and choose APIs & Providers.
  3. Within the navigation pane, select Credentials.
  4. Select CREATE CREDENTIALS, then select OAuth consumer ID, as proven within the following screenshot.

  1. Choose the appliance kind Net utility, enter the title demo-google-aws, and supply URIs for Licensed JavaScript origins https://console.aws.amazon.com. For Licensed redirect URIs, add https://us-east-1.console.aws.amazon.com/appflow/oauth. Select SAVE, as proven within the following screenshot.

  1. The OAuth consumer ID is now created. Choose demo-google-aws.

  1. Beneath Extra data, as proven within the following screenshot, observe down the Consumer ID and Consumer secret.

Information ingestion from Google Analytics 4 to Amazon Redshift

On this part, you configure Amazon AppFlow to arrange a connection between Google Analytics 4 and Amazon Redshift for knowledge migration. This process might be categorized into the next steps:

  1. Create a connection to Google Analytics 4 in Amazon AppFlow
  2. Create an IAM function for Amazon AppFlow integration with Amazon Redshift
  3. Arrange Amazon AppFlow connection for Amazon Redshift
  4. Arrange desk and permission in Amazon Redshift
  5. Create knowledge circulation in Amazon AppFlow

Create a connection to Google Analytics 4 in Amazon AppFlow

To create a connection to Google Analytics 4 in Amazon AppFlow, comply with these steps:

  1. Check in to the AWS Administration Console and open Amazon AppFlow.
  2. Within the navigation pane on the left, select Connections.
  3. On the Handle connections web page, for Connectors, select Google Analytics 4.
  4. Select Create connection.
  5. Within the Connect with Google Analytics 4 window, enter the next data. For Consumer ID, enter the consumer ID of the OAuth 2.0 consumer ID in your Google Cloud undertaking created within the earlier part. For Consumer secret, enter the consumer secret of the OAuth 2.0 consumer ID in your Google Cloud undertaking created within the earlier part.
  6. (Elective) underneath Information encryption, select Customise encryption settings (superior) if you wish to encrypt your knowledge with a buyer managed key in AWS Key Administration Service (AWS KMS). By default, Amazon AppFlow encrypts your knowledge with an AWS KMS key that AWS creates, makes use of, and manages for you. Select this selection if you wish to encrypt your knowledge with your personal AWS KMS key as an alternative.

The next screenshot exhibits the Connect with Google Analytics 4 window.

Amazon AppFlow encrypts your knowledge throughout transit and at relaxation. For extra data, see Information safety in Amazon AppFlow.

If you wish to use an AWS KMS key from the present AWS account, choose this key underneath Select an AWS KMS key. If you wish to use an AWS KMS key from a unique AWS account, enter the Amazon Useful resource Title (ARN) for that key:

  1. For Connection title, enter a reputation on your connection
  2. Select Proceed
  3. Within the window that seems, register to your Google account and grant entry to Amazon AppFlow

On the Handle connections web page, your new connection seems within the Connections desk. Whenever you create a circulation that makes use of Google Analytics 4 as the information supply, you possibly can choose this connection.

Create an IAM function for Amazon AppFlow integration with Amazon Redshift

You should utilize Amazon AppFlow to switch knowledge from supported sources into your Amazon Redshift databases. You want an IAM function as a result of Amazon AppFlow wants authorization to entry Amazon Redshift utilizing an Amazon Redshift Information API.

  1. Check in to the AWS Administration Console, ideally as admin person, and within the navigation pane of the IAM dashboard, select Insurance policies.
  2. Select Create coverage.
  3. Choose the JSON tab and paste within the following coverage. Amazon AppFlow wants the next permissions to achieve entry and run SQL statements with the Amazon Redshift database.
{
  "Model": "2012-10-17",
  "Assertion": [
    {
      "Sid": "DataAPIPermissions",
      "Effect": "Allow",
      "Action": [
        "redshift-data:ExecuteStatement",
        "redshift-data:GetStatementResult",
        "redshift-data:DescribeStatement"
      ],
      "Useful resource": "*"
    },
    {
      "Sid": "GetCredentialsForAPIUser",
      "Impact": "Enable",
      "Motion": "redshift:GetClusterCredentials",
      "Useful resource": [
        "arn:aws:redshift:*:*:dbname:*/*",
        "arn:aws:redshift:*:*:dbuser:*/*"
      ]
    },
    {
      "Sid": "GetCredentialsForServerless",
      "Impact": "Enable",
      "Motion": "redshift-serverless:GetCredentials",
      "Useful resource": "*"
    },
    {
      "Sid": "DenyCreateAPIUser",
      "Impact": "Deny",
      "Motion": "redshift:CreateClusterUser",
      "Useful resource": [
        "arn:aws:redshift:*:*:dbuser:*/*"
      ]
    },
    {
      "Sid": "ServiceLinkedRole",
      "Impact": "Enable",
      "Motion": "iam:CreateServiceLinkedRole",
      "Useful resource": "arn:aws:iam::*:function/aws-service-role/redshift-data.amazonaws.com/AWSServiceRoleForRedshift",
      "Situation": {
        "StringLike": {
          "iam:AWSServiceName": "redshift-data.amazonaws.com"
        }
      }
    }
  ]
}

  1. Select Subsequent, present the Coverage title as appflow-redshift-policy, Description as appflow redshift coverage, and select Create coverage.

  1. Within the navigation pane, select Roles and Create function. Select Customized belief coverage and paste within the following. Select Subsequent. This belief coverage grants Amazon AppFlow the flexibility to imagine the function for Amazon AppFlow to entry and course of knowledge.
{
  "Model": "2012-10-17",
  "Assertion": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "appflow.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

  1. Seek for coverage appflow-redshift-policy, test the field subsequent to it, and select Subsequent.

  1. Present the function title appflow-redshift-access-role and Description and select Create function.

Arrange Amazon AppFlow connection for Amazon Redshift

To arrange an Amazon AppFlow connection for Amazon Redshift, comply with these steps:

  1. On the Amazon AppFlow console, within the navigation pane, select Connectors, choose Amazon Redshift, and select Create connection.

  1. Enter the connection title appflow-redshift-connection. You may both use Amazon Redshift provisioned or Amazon Redshift Serverless, however on this instance we’re utilizing Amazon Redshift Serverless. Choose Amazon Redshift Serverless and enter the workgroup title and database title.
  2. Select the S3 bucket and enter the bucket prefix.

  1. For Amazon S3 entry, choose the IAM function hooked up to the Redshift cluster or namespace throughout the creation of the Redshift cluster. Moreover, for the Amazon Redshift Information API, select the IAM function appflow-redshift-access-role created within the earlier part after which select

Arrange a desk and permission in Amazon Redshift

To arrange desk and permission in Amazon Redshift, comply with these steps:

  1. On the Amazon Redshift console, select Question editor v2 in Explorer.
  2. Connect with your present Redshift cluster or Amazon Redshift Serverless workgroup.
  3. Create a desk with the next Information Definition Language (DDL).
create desk public.stg_ga4_daily_summary

(

event_date date,

area varchar(255),

nation varchar(255),

metropolis varchar(255),

deviceCategory varchar(255),

deviceModel varchar(255),

browser varchar(255),

active_users INTEGER,

new_users integer,

total_revenue  NUMERIC(18,2)

);

The next screenshot exhibits the profitable creation of this desk in Amazon Redshift:

The next step is simply relevant to Amazon Redshift Serverless. If you’re utilizing a Redshift provisioned cluster, you possibly can skip this step.

  1. Grant the permissions on the desk to the IAM person utilized by Amazon AppFlow to load knowledge into Amazon Redshift Serverless, for instance, appflow-redshift-access-role.
GRANT INSERT ON TABLE public.stg_ga4_daily_summary TO "IAMR:appflow-redshift-access-role";

Create knowledge circulation in Amazon AppFlow

To create an information circulation in Amazon AppFlow, comply with these steps:

  1. On the Amazon AppFlow console, select Flows and choose Amazon Redshift. Select Create circulation and enter the circulation title and the circulation description, as proven within the following screenshot.

  1. In Supply title, select Google Analytics 4. Select the Google Analytics 4 connection.
  2. Choose the Google Analytics 4 object, then select Amazon Redshift because the vacation spot, choosing the public schema and stg_ga4_daily_summary desk in your Redshift occasion.

  1. For Movement set off, select Run on demand and select Subsequent, as proven within the following screenshot.

You may run the circulation on schedule to drag both full or incremental knowledge refresh. For extra data, see Schedule-triggered flows.

  1. Choose Manually map fields. From the Supply area title dropdown menu, choose the attribute date, and from the Vacation spot area title, choose event_date and select Map fields, as proven within the following screenshot.

  1. Repeat the earlier step (step 5) for the next attributes after which select Subsequent. The next screenshot exhibits the mapping.
Dimension:browser --> browser
Dimension:area --> area
Dimension:nation --> nation
Dimension:metropolis --> metropolis
Dimension:deviceCategory --> devicecategory
Dimension:deviceModel --> devicemodel
Metric:activeUsers --> active_users
Metric:newUsers --> new_users
Metric: totalRevenue --> total_revenue
Dimension:date --> event_date

The Google Analytics API supplies varied dimensions and metrics for reporting functions. Confer with API Dimensions & Metrics for particulars.

  1. In Discipline title, enter the filter start_end_date and select Subsequent, as proven within the following screenshot. The Amazon AppFlow date filter helps each a begin date (criteria1) and an finish date (criteria2) to outline the specified date vary for knowledge switch. We’re utilizing the date vary as a result of we now have pattern knowledge created for this vary.

  1. Overview the configurations and select Create circulation.
  2. Select Run circulation, as proven within the following screenshot, and anticipate the circulation execution to be accomplished.

  1. On the Amazon Redshift console, select Question editor v2 in Explorer.
  2. Connect with your present Redshift cluster or Amazon Redshift Serverless workgroup.
  3. Enter the next SQL to confirm the information in Amazon Redshift.
choose * from public.stg_ga4_daily_summary

The screenshot beneath exhibits the outcomes loaded into the stg_ga4_daily_summary desk.

Information ingestion from Google Sheets to Amazon Redshift

Ingesting knowledge from Google Sheets to Amazon Redshift utilizing Amazon AppFlow streamlines analytics, enabling seamless switch and deeper insights. On this part, we exhibit how enterprise customers can preserve their enterprise glossary in Google Sheets and combine that utilizing Amazon AppFlow with Amazon Redshift and get significant insights.

For this demo, you possibly can add the Nation Market phase file to your Google sheet earlier than continuing to the following steps. These steps present easy methods to configure Amazon AppFlow to arrange a connection between Google Sheets and Amazon Redshift for knowledge migration. This process might be categorized into the next steps:

  1. Create Google Sheets connection in Amazon AppFlow
  2. Arrange desk and permission in Amazon Redshift
  3. Create knowledge circulation in Amazon AppFlow

Create Google Sheets connection in Amazon AppFlow

To create a Google Sheets connection in Amazon AppFlow, comply with these steps:

  1. On the Amazon AppFlow console, select Connectors, choose Google Sheets, then select Create connection.
  2. Within the Connect with Google Sheets window, enter the next data. For Consumer ID, enter the consumer ID of the OAuth 2.0 consumer ID in your Google Sheets undertaking. For Consumer secret, enter the consumer secret of the OAuth 2.0 consumer ID in your Google Sheets undertaking.
  3. For Connection title, enter a reputation on your connection.
  4. (Elective) Beneath Information encryption, select Customise encryption settings (superior) if you wish to encrypt your knowledge with a buyer managed key in AWS KMS. By default, Amazon AppFlow encrypts your knowledge with an AWS KMS key that AWS creates, makes use of, and manages for you. Select this selection if you wish to encrypt your knowledge with your personal AWS KMS key as an alternative.
  5. Select Join.
  6. Within the window that seems, register to your Google account and grant entry to Amazon AppFlow.

Arrange desk and permission in Amazon Redshift

To arrange a desk and permission in Amazon Redshift, comply with these steps:

  1. On the Amazon Redshift console, select Question editor v2 in Explorer
  2. Connect with your present Redshift cluster or Amazon Redshift Serverless workgroup
  3. Create a desk with the next DDL
create desk public.stg_nation_market_segment(
n_nationkey int4 not null,
n_name char(25) not null ,
n_regionkey int4 not null,
n_comment varchar(152) not null,
n_marketsegment varchar(255),
Main Key(N_NATIONKEY)
) distkey(n_nationkey) sortkey(n_nationkey);

he following steps are solely relevant to Amazon Redshift Serverless. If you’re utilizing a Redshift provisioned cluster, you possibly can skip this step.

  1. Grant the permissions on the desk to the IAM person utilized by Amazon AppFlow to load knowledge into Amazon Redshift Serverless, for instance, appflow-redshift-access-role
GRANT INSERT ON TABLE public.stg_nation_market_segment TO "IAMR:appflow-redshift-access-role";

Create knowledge circulation in Amazon AppFlow

  1. On the Amazon AppFlow console, select Flows and choose Google Sheets. Select Create circulation, enter the circulation title and circulation description, and select Subsequent.
  2. Choose Google Sheets in Supply title and select the Google Sheets connection.
  3. Choose the Google Sheets object nation_market_segment#Sheet1.
  4. Select the Vacation spot title as Amazon Redshift, then choose stg_nation_market_segment as your Amazon Redshift object, as proven within the following screenshot.

  1. For Movement set off, choose On demand and select Subsequent.

You may run the circulation on schedule to drag full or incremental knowledge refresh. Learn extra at Schedule-triggered flows.

  1. Choose Manually map fields. From the Supply area title dropdown menu, choose Map all fields instantly. When a dialog field pops up, select the respective attribute values and select Map fields, as proven within the following screenshot. Select Subsequent.

The next screenshot exhibits the mapping.

  1. On the Add Filters web page, select Subsequent.
  2. On the Overview and create web page, select Create circulation.
  3. Select Run circulation and anticipate the circulation execution to complete.

The screenshot beneath exhibits the execution particulars of the circulation job.

  1. On the Amazon Redshift console, select Question editor v2 in Explorer.
  2. Connect with your present Redshift cluster or Amazon Redshift Serverless workgroup.
  3. Run the next SQL to confirm the information in Amazon Redshift.
choose * from public.stg_nation_market_segment

The screenshot beneath exhibits the outcomes loaded into the stg_nation_market_segment desk.

  1. Run the next SQL to organize a pattern dataset in Amazon Redshift.
create desk public.buyer (
c_custkey int8 not null ,
c_name varchar(25) not null,
c_address varchar(40) not null,
c_nationkey int4 not null,
c_phone char(15) not null,
c_acctbal numeric(12,2) not null,
c_mktsegment char(10) not null,
c_comment varchar(117) not null,
Main Key(C_CUSTKEY)
) distkey(c_custkey) sortkey(c_custkey);

create desk public.lineitem (
l_orderkey int8 not null ,
l_partkey int8 not null,
l_suppkey int4 not null,
l_linenumber int4 not null,
l_quantity numeric(12,2) not null,
l_extendedprice numeric(12,2) not null,
l_discount numeric(12,2) not null,
l_tax numeric(12,2) not null,
l_returnflag char(1) not null,
l_linestatus char(1) not null,
l_shipdate date not null ,
l_commitdate date not null,
l_receiptdate date not null,
l_shipinstruct char(25) not null,
l_shipmode char(10) not null,
l_comment varchar(44) not null,
Main Key(L_ORDERKEY, L_LINENUMBER)
) distkey(l_orderkey) sortkey(l_shipdate,l_orderkey)  ;

create desk public.orders (
o_orderkey int8 not null,
o_custkey int8 not null,
o_orderstatus char(1) not null,
o_totalprice numeric(12,2) not null,
o_orderdate date not null,
o_orderpriority char(15) not null,
o_clerk char(15) not null,
o_shippriority int4 not null,
o_comment varchar(79) not null,
Main Key(O_ORDERKEY)
) distkey(o_orderkey) sortkey(o_orderdate, o_orderkey) ;
copy lineitem from 's3://redshift-downloads/TPC-H/2.18/10GB/lineitem.tbl' iam_role default delimiter '|' area 'us-east-1';
copy orders from 's3://redshift-downloads/TPC-H/2.18/10GB/orders.tbl' iam_role default delimiter '|' area 'us-east-1';
copy buyer from 's3://redshift-downloads/TPC-H/2.18/10GB/buyer.tbl' iam_role default delimiter '|' area 'us-east-1';

  1. Run the next SQL to do the information analytics utilizing Google Sheets enterprise knowledge classification within the Amazon Redshift dataset.
choose
n_marketsegment,
sum(l_extendedprice * (1 - l_discount)) as income
from
public.buyer,
public.orders,
public.lineitem,
public.stg_nation_market_segment
the place
c_custkey = o_custkey
and l_orderkey = o_orderkey
and c_nationkey = n_nationkey
group by
1
order by
income desc;

The screenshot beneath exhibits the outcomes from the aggregated question in Amazon Redshift from knowledge loaded utilizing Amazon Appflow.

Clear up

To keep away from incurring costs, clear up the sources in your AWS account by finishing the next steps:

  1. On the Amazon AppFlow console, within the navigation pane, select Flows.
  2. From the listing of flows, choose the circulation title created and delete it.
  3. Enter “delete” to delete the circulation.
  4. Delete the Amazon Redshift workgroup.
  5. Clear up sources in your Google account by deleting the undertaking that accommodates the Google BigQuery sources. Observe the documentation to clear up the Google sources.

Conclusion

On this publish, we walked you thru the method of utilizing Amazon AppFlow to combine knowledge from Google Advertisements and Google Sheets. We demonstrated how the complexities of information integration are minimized so you possibly can deal with deriving actionable insights out of your knowledge. Whether or not you’re archiving historic knowledge, performing advanced analytics, or making ready knowledge for machine studying, this connector streamlines the method, making it accessible to a broader vary of information professionals.

For extra data, check with Amazon AppFlow help for Google Sheets and Google Advertisements.


In regards to the authors

Ritesh Kumar Sinha is an Analytics Specialist Options Architect based mostly out of San Francisco. He has helped clients construct scalable knowledge warehousing and massive knowledge options for over 16 years. He likes to design and construct environment friendly end-to-end options on AWS. In his spare time, he loves studying, strolling, and doing yoga.

Tahir Aziz is an Analytics Answer Architect at AWS. He has labored with constructing knowledge warehouses and massive knowledge options for over 13 years. He loves to assist clients design end-to-end analytics options on AWS. Exterior of labor, he enjoys touring and cooking.

Raza Hafeez is a Senior Product Supervisor at Amazon Redshift. He has over 13 years {of professional} expertise constructing and optimizing enterprise knowledge warehouses and is captivated with enabling clients to understand the facility of their knowledge. He focuses on migrating enterprise knowledge warehouses to AWS Fashionable Information Structure.

Amit Ghodke is an Analytics Specialist Options Architect based mostly out of Austin. He has labored with databases, knowledge warehouses and analytical purposes for the previous 16 years. He loves to assist clients implement analytical options at scale to derive most enterprise worth.

Related Articles

LEAVE A REPLY

Please enter your comment!
Please enter your name here

Latest Articles