Introducing the HubSpot connector for AWS Glue

Most companies have adopted a diverse set of software as a service (SaaS) platforms to support various applications. This rapid adoption has enabled them to quickly streamline operations, enhance collaboration, and gain more accessible, scalable solutions for managing their critical data and workflows.

More companies have realized there is an opportunity to integrate, enhance, and present this SaaS data to improve internal operations and gain valuable insights on their data. Using AWS Glue, a serverless data integration service, companies can streamline this process, integrating data from internal and external sources into a centralized AWS data lake. From there, they can perform meaningful analytics, gain valuable insights, and optionally push enriched data back to external SaaS platforms.

This post introduces the new HubSpot managed connector for AWS Glue, and demonstrates how you can integrate HubSpot data into your existing data lake on AWS. By consolidating HubSpot data with data from your AWS accounts and from other SaaS services, you can enhance, analyze, and optionally write the data back to HubSpot, creating a seamless and integrated data experience.

Solution overview

In this example, we use AWS Glue to extract, transform, and load (ETL) data from your HubSpot account into a transactional data lake on Amazon Simple Storage Service (Amazon S3), using Apache Iceberg format. We register the schema in the AWS Glue Data Catalog to make your data discoverable. Subsequently, we use Amazon Athena to validate that the HubSpot data has been successfully loaded to Amazon S3. The following diagram illustrates the solution architecture.

The following are key components and steps in the integration:

Configure your HubSpot account and app to enable access to your HubSpot data.
Prepare for data movement by securely storing your HubSpot OAuth credentials in AWS Secrets Manager, creating an S3 bucket to store your ingested data, and creating an AWS Identity and Access Management (IAM) role for AWS Glue.
Create an AWS Glue job to extract and load data from HubSpot to Amazon S3. AWS Glue establishes a secure connection to HubSpot using OAuth for authorization and TLS for data encryption in transit. AWS Glue also supports applying complex data transformations, enabling efficient data integration and preparation to meet your needs.
Schema and other metadata will be registered in the AWS Glue Data Catalog, a centralized metadata repository for all your data assets. This helps simplify schema management, and also makes the data discoverable by other services.
Run the AWS Glue job to extract data from HubSpot and write it to Amazon S3 using Iceberg format. Apache Iceberg is an open source, high-performance open table format designed for large-scale analytics, providing transactional consistency and seamless schema evolution. Although we use Iceberg in this example, AWS Glue offers robust support for various data formats, including other transactional formats such as Apache Hudi and Delta Lake.
The data loaded to Amazon S3 will be organized into partitioned folders to optimize for query performance and management. Amazon S3 will also store the AWS Glue scripts, logs, and other temporary data required during the ETL process.
Finally, Amazon Athena will be used to query the data loaded from HubSpot to Amazon S3, validating that all changes in the source system have been captured successfully.
Optionally, you can schedule the AWS Glue job to regularly synchronize HubSpot data to Amazon S3 and analyze data updates over time.

Set up your HubSpot account

This example requires you to create a HubSpot public app for AWS Glue in a HubSpot developer account, and connect it to an associated HubSpot account. A HubSpot public app is a type of integration that can be installed in your HubSpot accounts or listed in the HubSpot Marketplace. In this example, you create a HubSpot app for the AWS Glue integration, and install it in a new test account. Although HubSpot calls it a public app, it will not be listed in their Marketplace and will only have access to your test account.

If you don’t already have one, sign up for a free HubSpot developer account.
Log in to your HubSpot developer account, where you’ll see options to create apps and test accounts.
Choose Create a test account and follow the instructions.

HubSpot test accounts have Enterprise versions of the HubSpot Marketing, Sales, and Service Hubs along with sample data, so you can test most HubSpot tools, create CRM data, and access it through APIs with AWS Glue. For more information about creating a test account, refer to Create a developer test account.

Create a HubSpot app

Complete the following steps to create a HubSpot app:

Switch back to your HubSpot developer account, and choose Create an app.
Fill in the App Info section with the name AWS Glue and a brief description.
Choose the Auth tab.
For Redirect URLs, enter the redirect URL for AWS Glue in the form: https://<region>.console.aws.amazon.com/gluestudio/oauth.

Be sure to replace <region> with the AWS Region where you run AWS Glue. For instance, the code for the US East (N. Virginia) Region is us-east-1, so the AWS Glue redirect URL is https://us-east-1.console.aws.amazon.com/gluestudio/oauth.
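If you work in more than one Region, a small helper can keep the redirect URL consistent. The following is a minimal sketch, not part of any AWS SDK: the function name glue_redirect_url and the loose Region-code check are our own assumptions for illustration.

```python
import re

# URL form taken from this post; <region> is replaced with the Region code.
REDIRECT_TEMPLATE = "https://{region}.console.aws.amazon.com/gluestudio/oauth"


def glue_redirect_url(region: str) -> str:
    """Hypothetical helper: build the AWS Glue redirect URL for a Region code."""
    # Loose shape check for codes such as "us-east-1" (an assumption, not an
    # authoritative list of valid Regions).
    if not re.fullmatch(r"[a-z]{2}(-[a-z]+)+-\d", region):
        raise ValueError(f"unexpected Region code: {region!r}")
    return REDIRECT_TEMPLATE.format(region=region)


print(glue_redirect_url("us-east-1"))
```

The shape check only catches obvious typos such as pasting a Region display name instead of its code.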

In the Scopes section, choose Add new scope and select the following permissions:
- automation
- content
- crm.lists.read
- crm.lists.write
- crm.objects.companies.read
- crm.objects.companies.write
- crm.objects.contacts.read
- crm.objects.contacts.write
- crm.objects.custom.read
- crm.objects.custom.write
- crm.objects.deals.read
- crm.objects.deals.write
- crm.objects.owners.read
- crm.schemas.custom.read
- e-commerce
- forms
- oauth
- sales-email-read
- tickets
Review the Scopes and Redirect URL settings, then choose Create app.
Navigate back to your app's Auth tab.
Take note of the values for Client ID, Client secret, and Install URL (OAuth). You will need these later to connect your AWS Glue instance.

Select or create an Amazon S3 bucket where your HubSpot data will reside

Select an existing Amazon S3 bucket in your account, or create a new bucket to store your HubSpot data, as well as scripts, logs, and so on. For this example, the bucket name follows the format aws-glue-hubspot-<account>-<region>, where <account> is the AWS account number and <region> is the operating Region. The bucket is configured with all defaults: public access disabled, versioning disabled, and server-side encryption with Amazon S3 managed keys (SSE-S3).

If you use AWSGlueServiceRole in your IAM role as shown in this example, it provides access to S3 buckets with names starting with aws-glue-.
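The naming convention above can be sketched as a pair of small functions. Both function names are hypothetical, and the account number used here is a placeholder; the only facts taken from this post are the name format and the aws-glue- prefix.

```python
def hubspot_bucket_name(account_id: str, region: str) -> str:
    """Hypothetical helper: build aws-glue-hubspot-<account>-<region>."""
    name = f"aws-glue-hubspot-{account_id}-{region}"
    # S3 bucket names must be between 3 and 63 characters long.
    if not 3 <= len(name) <= 63:
        raise ValueError(f"bucket name has invalid length: {name!r}")
    return name


def matches_glue_prefix(bucket_name: str) -> bool:
    """True if the name starts with the aws-glue- prefix mentioned above."""
    return bucket_name.startswith("aws-glue-")


bucket = hubspot_bucket_name("123456789012", "us-east-1")  # placeholder account
print(bucket, matches_glue_prefix(bucket))
```

If you reuse an existing bucket whose name does not start with aws-glue-, the check returns False, which is a hint that the role needs additional S3 permissions.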

Create an IAM role for AWS Glue

Create an IAM role with permissions for the AWS Glue job. AWS Glue will assume this role when calling other services on your behalf.

On the IAM console, choose Roles in the navigation pane.
Choose Create role.
For Trusted entity type, choose AWS service.
For Use case, choose Glue.
Add the following AWS managed policies to the role:
1. AWSGlueServiceRole for accessing related services such as Amazon S3, Amazon Elastic Compute Cloud (Amazon EC2), Amazon CloudWatch, and IAM. This policy enables access to S3 buckets with names starting with aws-glue-.
2. SecretsManagerReadWrite for read/write access to AWS Secrets Manager.
Give the role a name, for example AWSGlueServiceRole_blog.

For more information, see Getting started with AWS Glue and Create an IAM role for AWS Glue.
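If you prefer to script the role instead of using the console, the steps above could look roughly like the following. This is a sketch under assumptions: create_glue_role is a hypothetical helper, and it expects an IAM client (for example, one returned by boto3.client("iam")) to be passed in, so nothing here calls AWS on import.

```python
import json

# Trust policy that lets the AWS Glue service assume the role.
GLUE_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "glue.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# ARNs of the two AWS managed policies named in this post.
MANAGED_POLICY_ARNS = [
    "arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole",
    "arn:aws:iam::aws:policy/SecretsManagerReadWrite",
]


def create_glue_role(iam_client, role_name: str) -> None:
    """Hypothetical helper: create the role and attach both managed policies."""
    iam_client.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(GLUE_TRUST_POLICY),
    )
    for arn in MANAGED_POLICY_ARNS:
        iam_client.attach_role_policy(RoleName=role_name, PolicyArn=arn)
```

To use it, you would pass a real client and the role name from this post, for example create_glue_role(boto3.client("iam"), "AWSGlueServiceRole_blog").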

Create an AWS Secrets Manager secret

AWS Secrets Manager is used to securely store your HubSpot OAuth credentials. Complete the following steps to create a secret:

On the AWS Secrets Manager console, choose Secrets in the navigation pane.
Choose Store a new secret.
For Secret type, select Other type of secret.
Under Key/value pairs, enter the HubSpot client secret with the key USER_MANAGED_CLIENT_APPLICATION_CLIENT_SECRET.
Choose Next.

Enter the secret name, such as HubSpot-Blog, a description, and continue.
Leave the secret rotation as default, and choose Next.
Review the secret configuration, and choose Store.
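The console steps above store a single key/value pair as a JSON document. The following sketch shows the equivalent payload; build_secret_string and store_hubspot_secret are hypothetical helper names, and the Secrets Manager client (for example, boto3.client("secretsmanager")) is passed in as a parameter rather than created here.

```python
import json

# Key name required by the connector, as given in the steps above.
SECRET_KEY = "USER_MANAGED_CLIENT_APPLICATION_CLIENT_SECRET"


def build_secret_string(client_secret: str) -> str:
    """Serialize the HubSpot client secret as a one-key JSON document."""
    return json.dumps({SECRET_KEY: client_secret})


def store_hubspot_secret(secrets_client, name: str, client_secret: str,
                         description: str = ""):
    """Hypothetical helper: store the secret through a supplied client."""
    kwargs = {"Name": name, "SecretString": build_secret_string(client_secret)}
    if description:
        # Only send a description when one was provided.
        kwargs["Description"] = description
    return secrets_client.create_secret(**kwargs)
```

Avoid hardcoding the client secret in source files; read it from a prompt or an environment variable when you call the helper.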

Create an AWS Glue connection

Complete the following steps to create an AWS Glue connection to your HubSpot account:

On the AWS Glue console, choose Data connections in the navigation pane.
Choose Create connection.
For Data sources, search for and select HubSpot.
Choose Next.

On the Configure connection page, fill in the required information:
1. For IAM service role, choose the service role created previously. In this example, we use the role AWSGlueServiceRole_blog.
2. For Authentication URL, leave as default.
3. For User Managed Client Application ClientId, enter the OAuth client ID from HubSpot.
4. For AWS Secret, choose the OAuth client secret name configured previously in AWS Secrets Manager.
5. Choose Next.

Choose Test Connection to validate the connection to HubSpot.
This will bring up a new HubSpot connection window. Be sure to select your HubSpot test account (not your developer account) to test the connection.
If this is your first connection attempt, you will be redirected to another page where you’re asked to confirm the access level granted to AWS Glue. Choose Connect App.

If successful, the HubSpot window will close and your AWS connection window will say Connection test successful.

Under Set properties, for Name, enter a name (for example, HubSpot_Connection_blog).
Choose Next.
Under Review and create, review your settings and then create the connection.

Create a database in AWS Glue Data Catalog

Complete the following steps to create a database in AWS Glue Data Catalog to organize your HubSpot data:

On the AWS Glue console, choose Databases in the navigation pane.
Create a new database.
Enter a name (for example, hubspot).
You can leave the location field blank.
Choose Create database.

Create an AWS Glue ETL job

Now that you have an AWS Glue data connection to your HubSpot account, you can create an AWS Glue ETL job to ingest HubSpot data into your AWS data lake. AWS Glue provides both visual and code-based interfaces to simplify data integration, depending on your expertise. In this example, we use the Script interface to ingest HubSpot data into the Amazon S3 location. Complete the following steps:

On the AWS Glue console, choose ETL jobs in the navigation pane.
Choose the Script editor.
Choose Spark as the engine, and add the job script.

The AWS Glue Spark job reads the HubSpot data and merges it into the S3 bucket in Iceberg format.
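The two building blocks of such a job, the read options and the merge statement, can be sketched in plain Python. This is an illustration under several assumptions, not the script used for this post: the option keys connectionName, ENTITY_NAME, and API_VERSION reflect our reading of the AWS Glue connector documentation, the key column id and the staging view name hubspot_updates are hypothetical, and both helper functions are our own.

```python
def hubspot_connection_options(connection_name: str, entity: str) -> dict:
    """Options for reading one HubSpot entity (key names are assumptions)."""
    return {
        "connectionName": connection_name,
        "ENTITY_NAME": entity,
        "API_VERSION": "v3",
    }


def iceberg_merge_sql(db_name: str, table_name: str,
                      key: str = "id",
                      staging_view: str = "hubspot_updates") -> str:
    """Build an Iceberg MERGE INTO statement that upserts staged rows."""
    target = f"glue_catalog.{db_name}.{table_name}"
    return (
        f"MERGE INTO {target} AS t "
        f"USING {staging_view} AS s "
        f"ON t.{key} = s.{key} "
        "WHEN MATCHED THEN UPDATE SET * "
        "WHEN NOT MATCHED THEN INSERT *"
    )


# Inside a Glue job, these would typically be used along these lines:
#   frame = glueContext.create_dynamic_frame.from_options(
#       connection_type="hubspot",
#       connection_options=hubspot_connection_options(conn, table))
#   frame.toDF().createOrReplaceTempView("hubspot_updates")
#   spark.sql(iceberg_merge_sql(db, table))
print(iceberg_merge_sql("hubspot", "company"))
```

This upsert covers inserted and changed rows only; handling rows deleted in HubSpot would need an additional step that this sketch does not include.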

On the Job details tab, provide the following information:
For Name, enter a name, such as HubSpot_to_S3_blog.
For Description, enter a meaningful description of the job.
For IAM Role, choose the IAM role you created previously (for this post, AWSGlueServiceRole_blog).

Expand Advanced properties.
Under Connections, enter your HubSpot connection from the previous section (for this post, HubSpot_Connection_blog).

Under Job parameters, enter the following parameters:

- For --conf, enter spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions --conf spark.sql.catalog.glue_catalog=org.apache.iceberg.spark.SparkCatalog --conf spark.sql.catalog.glue_catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog --conf spark.sql.catalog.glue_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO --conf spark.sql.catalog.glue_catalog.warehouse=file:///tmp/spark-warehouse
- For --datalake-formats, enter iceberg
- For --db_name, enter the AWS Glue database to store your data lake (for this post, hubspot)
- For --table_name, enter the HubSpot table to be ingested (for this post, company)
- For --s3_bucket_name, enter where the ingested Iceberg table is stored, in this case aws-glue-hubspot-<account>-<region>
- For --connection_name, enter the AWS Glue connection name created, in this case HubSpot_Connection_blog
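The --conf value is easy to mistype because several Spark settings are chained inside one parameter. The following sketch assembles the parameters listed above from a dictionary; the helper names are hypothetical, and the bucket name uses a placeholder account number.

```python
# Spark settings for Iceberg, copied from the --conf value above.
ICEBERG_SETTINGS = {
    "spark.sql.extensions":
        "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
    "spark.sql.catalog.glue_catalog": "org.apache.iceberg.spark.SparkCatalog",
    "spark.sql.catalog.glue_catalog.catalog-impl":
        "org.apache.iceberg.aws.glue.GlueCatalog",
    "spark.sql.catalog.glue_catalog.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
    "spark.sql.catalog.glue_catalog.warehouse": "file:///tmp/spark-warehouse",
}


def conf_value(settings: dict) -> str:
    """Join settings into one --conf value.

    The first pair has no leading "--conf" because the job parameter key
    itself is --conf; every later pair is chained with " --conf ".
    """
    return " --conf ".join(f"{key}={value}" for key, value in settings.items())


def job_parameters(db_name: str, table_name: str, bucket: str,
                   connection_name: str) -> dict:
    """Hypothetical helper: collect all job parameters from this section."""
    return {
        "--conf": conf_value(ICEBERG_SETTINGS),
        "--datalake-formats": "iceberg",
        "--db_name": db_name,
        "--table_name": table_name,
        "--s3_bucket_name": bucket,
        "--connection_name": connection_name,
    }


params = job_parameters("hubspot", "company",
                        "aws-glue-hubspot-123456789012-us-east-1",  # placeholder
                        "HubSpot_Connection_blog")
for key, value in params.items():
    print(key, "=", value)
```

You can paste the printed values into the Job parameters fields, or pass the dictionary as default arguments if you create the job through the AWS SDK.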

Choose Save to save the job, then choose Run.

Depending on the amount of data in your HubSpot account, the job can take a few minutes to complete. After a successful job run, you can choose Run details to see the job specifications and logs.

Use Athena to query data

Athena is an interactive, serverless query service that makes it easy to analyze data directly in Amazon S3 using standard SQL. In this example, we query the results of the HubSpot data ingested into Amazon S3.

On the Athena console, choose Query editor.
For Database, choose hubspot, and you should see your company table.
Select entries from the hubspot.company table to view the data captured from HubSpot.

You can try various queries on the HubSpot data, such as:

-- get sample of dataset
SELECT * FROM "hubspot"."company" LIMIT 10;

-- get companies with revenue
SELECT * FROM "hubspot"."company" A
WHERE A.annualrevenue IS NOT NULL;

-- get number of companies with revenue
SELECT COUNT(*) AS companies_count FROM "hubspot"."company" A
WHERE A.annualrevenue IS NOT NULL;

Over time, your HubSpot data might change. You can rerun your ETL job periodically, and the Iceberg data lake table will effectively capture your changes. You can verify this by adding, removing, and changing companies in your HubSpot database, and then rerunning the ETL job. Your data lake should match your latest HubSpot data. With this capability, you can schedule the ETL job to run as often as you need.

Extending the HubSpot connector with AWS services

The HubSpot connector for AWS Glue provides a powerful foundation for building comprehensive data pipelines and analytics workflows. By integrating HubSpot data into your AWS environment, you can use additional services like Amazon Redshift, Amazon QuickSight, and Amazon SageMaker to further process, transform, and analyze the data. This lets you construct sophisticated, end-to-end data architectures that unlock the full value of your HubSpot data, without the need to manage complex infrastructure. The seamless integration between these AWS services makes it easy to build scalable analytics pipelines tailored to your specific requirements.

Considerations

You can set up AWS Glue job triggers to run the ETL jobs on a schedule, so that the data is regularly synchronized between HubSpot and Amazon S3. You can also integrate the ETL jobs with other AWS services, including AWS Step Functions, Amazon Managed Workflows for Apache Airflow (Amazon MWAA), AWS Lambda, Amazon EventBridge, and Amazon Bedrock, to create a more advanced data processing pipeline.
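As an illustration of the scheduling option, the following sketch builds the request for a scheduled AWS Glue trigger. The helper name, the default trigger name, and the daily 02:00 UTC schedule are assumptions chosen for the example; the resulting dictionary is meant to be passed to the Glue client's create_trigger call, for example boto3.client("glue").create_trigger(**request).

```python
def scheduled_trigger_request(job_name: str, cron_fields: str,
                              trigger_name: str = "") -> dict:
    """Hypothetical helper: build a request for a scheduled Glue trigger."""
    # AWS Glue schedules use six cron fields:
    # minutes, hours, day-of-month, month, day-of-week, year.
    if len(cron_fields.split()) != 6:
        raise ValueError("expected six cron fields, for example '0 2 * * ? *'")
    return {
        "Name": trigger_name or f"{job_name}_schedule",
        "Type": "SCHEDULED",
        "Schedule": f"cron({cron_fields})",
        "Actions": [{"JobName": job_name}],
        "StartOnCreation": True,
    }


# Example: run the job from this post every day at 02:00 UTC.
request = scheduled_trigger_request("HubSpot_to_S3_blog", "0 2 * * ? *")
print(request)
```

Building the request separately from the API call makes it easy to review the schedule before anything is created in your account.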

By default, the HubSpot connector doesn’t import deleted records. However, you can set the IMPORT_DELETED_RECORDS option to true to import all records, including the deleted ones.

Clean up

To avoid incurring charges, clean up the resources used in this post from your AWS account, including the AWS Glue jobs, HubSpot connection, AWS Secrets Manager secret, IAM role, and Amazon S3 bucket.

Conclusion

With the introduction of the AWS Glue connector for HubSpot, integrating HubSpot data with information from other data sources has become more streamlined than ever. This feature enables you to set up ongoing data integration from HubSpot to AWS, providing a unified view of data from across platforms and enabling more comprehensive analytics. The serverless nature of AWS Glue means there is no infrastructure management required, and you only pay for the resources consumed. By following the steps outlined in this post, you can make sure that up-to-date data from HubSpot is captured in your data lake, allowing teams to make faster data-driven decisions and uncover complex insights from across data sources.

To learn more about the AWS Glue connector for HubSpot, refer to Connecting to HubSpot in AWS Glue. This guide walks through the entire process, from setting up the connection to running the data transfer flow. For more information on AWS Glue, visit AWS Glue.

About the Authors

Eric Bomarsi is a Senior Solutions Architect in the ISV team at AWS, where he focuses on building scalable solutions for large customers. As a member of the AWS analytics community, he helps customers get strategic insights from their data. Outside of work, he enjoys playing ice hockey and traveling with his family.

Annie Nelson is a Senior Solutions Architect at AWS. She is a data enthusiast who enjoys problem solving and tackling complex architectural challenges with customers.

Kartikay Khator is a Solutions Architect within Global Life Sciences at AWS, where he dedicates his efforts to developing innovative and scalable solutions that cater to the evolving needs of customers. His expertise lies in harnessing the capabilities of AWS analytics services. Extending beyond his professional pursuits, he finds joy and fulfillment in the world of running and hiking. Having already completed multiple marathons, he is currently preparing for his next marathon challenge.

Kamen Sharlandjiev is a Sr. Big Data and ETL Solutions Architect, Amazon MWAA and AWS Glue ETL expert. He’s on a mission to make life easier for customers who are facing complex data integration and orchestration challenges. His secret weapon? Fully managed AWS services that can get the job done with minimal effort. Follow Kamen on LinkedIn to keep up to date with the latest Amazon MWAA and AWS Glue features and news!