An Innocuous Change
Earlier this week, we shipped a change that added a :
into one of our S3 URL paths. A URL-encoded colon is a perfectly valid character sequence to exist in a URL path. Our customers don't even parse the URL at all. They just GET the URL to download a video from an S3 bucket, and these new URLs are perfectly valid.
There's no way this could be a breaking change. Right?
Once we deployed this change, we immediately started getting reports of failed video downloads. We dug into this immediately, and discovered that this only happened for a small subset of our customers, but it happened for 100% of their requests. No other customers were affected at all.
How could this be?
From debugging with the affected customers, we realized they were all using the same aiohttp
http client.
It turns out that aiohttp
depends on yarl
for URL parsing.
We discovered yarl
attempts to normalize URLs as part of its parsing, by default.
Specifically it url-decodes "safe" characters in the URL path before making the request.
This means that the request made to S3 now has a different path than the one that was signed, so S3 sees a different signature and concludes the request is invalid, rejecting it with a 403.
It turns out we are far from the first to be surprised by this behaviour. There are several issues that have reported this.
So we concluded this wasn't an issue in our code -- it was actually caused by an edge-case behavior in a library which a small number of our customers used. We had never encountered this before because our URLs had never contained a safe URL-encoded character before this change.
So what is the fix?
We attempted to see if we could work-around this behaviour in yarl
by crafting a URL that would decode to the correct string after yarl
parsed it.
However, we could not find a way to do this, and considering only a small subnet of customers were affected we decided to notifiy and ask them to disable the yarl
normalization behaviour.
Fortunately, this is a simple change, requiring only a single line of code:
import yarl
yarl.URL(url, encoded=True)
This will prevent yarl
from url-decoding the path before making the request.
Hyrum's Law states:
With a sufficient number of users of an API, it does not matter what you promise in the contract: all observable behaviors of your system will be depended on by somebody.
--Hyrum Wright, 2012
Earlier this week, Hyrum was proved right again.
We are hiring!
Does this sound like a fun problem to solve?
We are hiring for a number of roles, including Backend Engineer #4 to join our close-knit team.