We were setting up Contentful as our CMS. Whenever content changes, we need to clear the CDN cache and call some internal GCP services so the changes are reflected. We chose GCP Cloud Functions for these tasks, and the plan was to have Contentful's webhook call all the relevant cloud functions on every change. The stumbling block was cloud function authentication. All of these are secure (authenticated) HTTP cloud functions, and Google's recommended way to authenticate in this case is OAuth 2.0, i.e., sending a Google-signed ID token in the Authorization header, which Contentful's webhook implementation cannot do at the moment. What is the way around it?
Google HTTP(S) Load Balancer to the rescue. According to this blog post from Google, the Google HTTP(S) Load Balancer has supported serverless compute backends since mid-2020. With an HTTP(S) Load Balancer in place, we can deploy all the Cloud Functions as unauthenticated, internal-only cloud functions that sit behind a single load-balanced endpoint. We will use Cloud Armor to allow only traffic from Contentful's IP range to reach the load balancer's public IP, and for authentication we will build basic auth into our cloud functions. Let's visualize it from the infrastructure point of view:
This solution has several advantages. It secures your cloud functions outside the function code itself and adds edge security, i.e., protection from external threats such as DDoS attacks. You can also treat it as an added layer of security, and as an alternative to Google's recommended authentication when the caller, as in this very case, has no OAuth client.
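The basic-auth check itself lives in your function code, in whatever runtime you deploy. As a runtime-neutral sketch, the shell function below shows what that check compares: the request's Authorization header against "Basic " followed by the base64 encoding of user:password (HTTP Basic auth). The name check_basic_auth and the credentials are hypothetical; a real function should read its credentials from a secret store and compare in constant time.

```shell
# Hypothetical helper illustrating an HTTP Basic auth check.
# Usage: check_basic_auth "<Authorization header value>" <user> <password>
# Returns 0 if the header carries exactly these credentials, 1 otherwise.
check_basic_auth() {
  local header="$1" user="$2" password="$3"
  local expected
  # Basic auth sends base64("user:password"); printf avoids a trailing newline,
  # and tr strips any line wrapping that base64 may add to long values.
  expected="Basic $(printf '%s:%s' "$user" "$password" | base64 | tr -d '\n')"
  [ "$header" = "$expected" ]
}
```

For example, `curl -u alice:secret ...` sends the header `Authorization: Basic YWxpY2U6c2VjcmV0`, so `check_basic_auth "Basic YWxpY2U6c2VjcmV0" alice secret` succeeds.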
Deploy your cloud function with the options below:
Set Allow unauthenticated invocations under Trigger in the UI. If you deploy this cloud function with gcloud, the equivalent flag is --allow-unauthenticated.
Set Allow internal traffic and traffic from Cloud Load Balancing under VARIABLES, NETWORKS AND ADVANCED SETTINGS -> CONNECTIONS. The corresponding gcloud functions flag is --ingress-settings=internal-and-gclb.
If you want your Cloud Function to talk to a service hosted in your private network, set up a VPC Connector under Egress Settings. The corresponding gcloud functions options are --vpc-connector and (no prize for guessing this) --egress-settings.
At this stage, the Cloud Function requires no authentication, but it is not reachable from outside your network.
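The deployment options above can also be combined into a single gcloud call. The sketch below wraps it in a hypothetical function; the function name, region, runtime, entry point and connector name are placeholders, and the egress value private-ranges-only is an assumption (use all if every outbound request should go through the connector).

```shell
# Hypothetical wrapper around `gcloud functions deploy`.
# Usage: deploy_webhook_function NAME REGION RUNTIME ENTRY_POINT [VPC_CONNECTOR]
deploy_webhook_function() {
  local name="$1" region="$2" runtime="$3" entry_point="$4" connector="${5:-}"
  local args=(
    --region="$region"
    --runtime="$runtime"
    --entry-point="$entry_point"
    --trigger-http
    --allow-unauthenticated                 # no IAM check on invocation
    --ingress-settings=internal-and-gclb    # only VPC and load balancer traffic
  )
  # Only route egress through a VPC connector when one is given.
  if [ -n "$connector" ]; then
    args+=(--vpc-connector="$connector" --egress-settings=private-ranges-only)
  fi
  gcloud functions deploy "$name" "${args[@]}"
}
```

Usage, run from the function's source directory: `deploy_webhook_function purge-cdn us-central1 python311 handler my-connector`.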
Create an Internet-facing HTTP(S) Load Balancer by going to Network Services -> Load Balancing.
In the Backend configuration of the new HTTP(S) load balancer, create a backend service of type Serverless network endpoint group (#1 in the picture). Also create a new Network Endpoint Group (NEG) and point it to the Cloud Function built above (#2 & #3 in the picture below).
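The backend part of this setup can be sketched with gcloud as follows: a serverless NEG pointing at the function, a global backend service, and a URL map whose default route sends every host and path to that backend. All resource names are hypothetical placeholders, and the frontend pieces (target proxy, SSL certificate, forwarding rule) are not shown.

```shell
# Hypothetical helper: wire an existing Cloud Function into a global
# external HTTP(S) load balancer backend.
# Usage: create_serverless_backend FUNCTION REGION NEG BACKEND URL_MAP
create_serverless_backend() {
  local function_name="$1" region="$2" neg="$3" backend="$4" url_map="$5"

  # Serverless NEG that points at the existing Cloud Function.
  gcloud compute network-endpoint-groups create "$neg" \
    --region="$region" \
    --network-endpoint-type=serverless \
    --cloud-function-name="$function_name" || return

  # Global backend service for an external HTTP(S) load balancer.
  gcloud compute backend-services create "$backend" \
    --load-balancing-scheme=EXTERNAL \
    --global || return

  # Attach the serverless NEG to the backend service.
  gcloud compute backend-services add-backend "$backend" \
    --global \
    --network-endpoint-group="$neg" \
    --network-endpoint-group-region="$region" || return

  # URL map whose default route (all hosts and paths) goes to the backend.
  gcloud compute url-maps create "$url_map" --default-service="$backend"
}
```

Each step stops the function on failure, so a missing NEG never leaves you with a half-configured backend service silently attached.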
Next, configure the Host and path rules. At this point, the Cloud Function(s) are open to the whole world. To allow traffic only from Contentful's IP range, go to Network Security -> Cloud Armor and set up a new security policy.
Under Configure policy, set a default policy that denies everything (#1 in the image below).
Under Add more rules, set the Action to Allow and add the IPs listed in this Contentful doc in the Match section (#2 in the image below).
Under Apply policy to target, select the backend(s) created while setting up the load balancer from the Target dropdown, and hit save.
This will allow traffic only from the IP ranges mentioned above, effectively blocking the endpoint to the rest of the world.
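The same Cloud Armor setup can be sketched with gcloud. The policy name, backend name and the priority 1000 for the allow rule are hypothetical choices; 2147483647 is the priority of the default rule, which allows all traffic when a policy is created. A basic-mode rule accepts at most 10 source ranges, which is why the ranges argument is limited.

```shell
# Hypothetical helper: deny-by-default Cloud Armor policy with one allow rule.
# Usage: create_allowlist_policy POLICY BACKEND "cidr1,cidr2,..."  (max 10 CIDRs)
create_allowlist_policy() {
  local policy="$1" backend="$2" ranges="$3"

  # Create the policy; its default rule initially allows all traffic.
  gcloud compute security-policies create "$policy" \
    --description="Allow only Contentful webhook traffic" || return

  # Turn the default rule (lowest priority, 2147483647) into deny-everything.
  gcloud compute security-policies rules update 2147483647 \
    --security-policy="$policy" \
    --action=deny-403 || return

  # Allow the given source ranges.
  gcloud compute security-policies rules create 1000 \
    --security-policy="$policy" \
    --src-ip-ranges="$ranges" \
    --action=allow || return

  # Attach the policy to the load balancer's backend service.
  gcloud compute backend-services update "$backend" \
    --security-policy="$policy" \
    --global
}
```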
If you are not an enterprise customer of Contentful, determining the source IP range is tricky. According to Contentful's documentation, in that case Contentful does not guarantee the source IPs; it only states that its webhook infrastructure is hosted in the AWS us-east-1 region.
Here is how I figured this out:
AWS publishes the IP ranges of all its services across all regions in a JSON document (https://ip-ranges.amazonaws.com/ip-ranges.json). Save this file as ip-ranges.json. Using jq, you can count how many IP ranges AWS uses in us-east-1:
$ jq -r '.prefixes[] | select(.region=="us-east-1") | .ip_prefix' < ip-ranges.json | wc -l
454
Next, let's find the unique services hosted in us-east-1:
$ jq -r '.prefixes[] | select(.region=="us-east-1") | .service' < ip-ranges.json | grep -o -E '\w+' | sort -u -f
AMAZON
AMAZON_APPFLOW
AMAZON_CONNECT
API_GATEWAY
CHIME_VOICECONNECTOR
CLOUD9
CLOUDFRONT
CODEBUILD
DYNAMODB
EC2
EC2_INSTANCE_CONNECT
GLOBALACCELERATOR
ROUTE53_HEALTHCHECKS
ROUTE53_HEALTHCHECKS_PUBLISHING
S3
WORKSPACES_GATEWAYS
From the above list, let’s single out the services which may host Contentful’s webhook. Here is my list:
Candidate services for hosting Contentful's webhooks:
AMAZON ---------------------------- No
AMAZON_APPFLOW -------------------- No
AMAZON_CONNECT -------------------- No
API_GATEWAY ----------------------- Yes
CHIME_VOICECONNECTOR -------------- No
CLOUD9 ---------------------------- No
CLOUDFRONT ------------------------ Yes
CODEBUILD ------------------------- No
DYNAMODB -------------------------- No
EC2 ------------------------------- Yes
EC2_INSTANCE_CONNECT -------------- No
GLOBALACCELERATOR ----------------- No
ROUTE53_HEALTHCHECKS -------------- No
ROUTE53_HEALTHCHECKS_PUBLISHING --- No
S3 -------------------------------- Yes
WORKSPACES_GATEWAYS --------------- Yes
Using the above list, let's dump all the IP ranges associated with these services into a file:
$ jq -r '.prefixes[] | select(.region=="us-east-1") | select(.service=="API_GATEWAY" or .service=="CLOUDFRONT" or .service=="EC2" or .service=="S3" or .service=="WORKSPACES_GATEWAYS") | .ip_prefix' < ip-ranges.json > ranges
This gives you 108 IP ranges. We need to split them into lists of 10 IP ranges, because the Match field under Add more rules accepts at most 10 IP ranges per rule.
$ split -l 10 ranges
$ ls
ranges xaa xab xac xad xae xaf xag xah xai xaj xak
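The command below gives you a single file named final with the 108 IP ranges divided into 11 lines, each containing up to 10 IP ranges. The x?? glob matches only the chunks produced by split, so the original ranges file is not joined into final as well:
$ for f in ./x??; do
    paste -s -d ',' "${f}" >> final
  done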
I'm sure there are better and shorter ways to split a long list with the above criteria in mind. Do let me know.
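One shorter option, sketched under the assumption that ranges holds one CIDR per line with no spaces: xargs -n 10 groups the lines into rows of 10 words, and tr turns the separating spaces into commas, so neither split nor the loop is needed. The second helper, also hypothetical, feeds each resulting line into its own Cloud Armor allow rule, with priorities starting at an arbitrarily chosen 1000.

```shell
# Join a file with one CIDR per line into lines of at most N (default 10)
# comma-separated CIDRs. Usage: chunk_ranges ranges > final
chunk_ranges() {
  xargs -n "${2:-10}" < "$1" | tr ' ' ','
}

# Hypothetical helper: create one allow rule per line of a chunk file.
# Usage: add_allow_rules POLICY final
add_allow_rules() {
  local policy="$1" chunk_file="$2" priority=1000 line
  # Read from fd 3 so that gcloud cannot consume the loop's input on stdin.
  while IFS= read -r line <&3; do
    [ -n "$line" ] || continue
    gcloud compute security-policies rules create "$priority" \
      --security-policy="$policy" \
      --src-ip-ranges="$line" \
      --action=allow || return
    priority=$((priority + 1))
  done 3< "$chunk_file"
}
```

With the 108 ranges above, `chunk_ranges ranges > final` produces the same 11 lines as the split-and-paste approach, and `add_allow_rules my-policy final` would create rules 1000 through 1010.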
Here are two alternatives for a similarly secure deployment, though both rely on API-key-based authentication, and neither lets you filter traffic by source IP: