While learning Golang and working on some first experiments, I started a little side-project: a tool to download the HTTP headers of the most popular webpages, store them in a database, and do some analysis work. The list of domains was found on the following gist.
I simply assumed that each address should work with the https:// scheme as a prefix. If the “www.” prefix is needed, DNS or HTTPS redirects would take care of it. This was true for the majority of sites (although I had to ignore various certificate issues); any aborted connections were simply ignored.
Of course, a webpage may generate different HTTP headers (or their values) depending on the context: Am I already logged in? Am I being redirected to another page? Am I using an API? Am I downloading some resource? Even: am I sending the right headers in the request? For now, all that was recorded was the basic response to a HEAD request. Nothing fancy. No AI or machine learning.
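The collection step described above could look roughly like the sketch below. It is not the actual tool: the function names (targetURL, newClient, fetchHeaders) and the 10-second timeout are assumptions made for illustration, and a local TLS test server with a self-signed certificate stands in for a real site so the example runs without network access.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// targetURL applies the assumption from the text: every domain is tried over https://.
func targetURL(domain string) string {
	return "https://" + domain
}

// newClient returns a client that skips certificate verification.
// That is tolerable for a header survey, never for production traffic.
// The timeout value is an arbitrary choice.
func newClient() *http.Client {
	return &http.Client{
		Timeout: 10 * time.Second,
		Transport: &http.Transport{
			TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
		},
	}
}

// fetchHeaders sends a HEAD request and returns the response headers.
// Redirects are followed by the client's default policy.
func fetchHeaders(client *http.Client, url string) (http.Header, error) {
	resp, err := client.Head(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	return resp.Header, nil
}

func main() {
	// Stand-in for a real site: a local HTTPS server with a self-signed certificate.
	srv := httptest.NewTLSServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("X-Frame-Options", "DENY")
	}))
	defer srv.Close()

	headers, err := fetchHeaders(newClient(), srv.URL)
	if err != nil {
		fmt.Println("skipped:", err) // aborted connections are simply ignored
		return
	}
	fmt.Println("X-Frame-Options:", headers.Get("X-Frame-Options"))
	fmt.Println("a real run would request:", targetURL("example.com"))
}
```

In a real run, fetchHeaders would be called with targetURL(domain) for every domain on the list.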
At first, my main goal was to identify interesting non-standard headers, but this is a bit trickier than just looking for headers with the “X-” prefix. In fact, this wouldn’t work at all, as explained in RFC 6648:
Historically, designers and implementers of application protocols have often distinguished between standardized and unstandardized parameters by prefixing the names of unstandardized parameters with the string “X-” or similar constructs. In practice, that convention causes more problems than it solves. Therefore, this document deprecates the convention for newly defined parameters with textual (as opposed to numerical) names in application protocols.
So, if “X-” isn’t a good indicator, then what is? How to tell if a header is non-standard? MDN points at the IANA registry. The list is just full of exotic inventions. Ever heard of the Hobareg authentication header? The If-Schedule-Tag-Match conditional header? Default-Style?
Neither had I. They aren’t necessarily supported by modern browsers either, but the mere fact that they are produced may leak some interesting information. Plus, they are standardized in some form, so an incorrect configuration is also an important metric. Non-standard headers can be just as telling: e.g. the Cf-Ray header is non-standard, but it was observed in over 10% of pages. It’s there whenever Cloudflare is involved. Any header-processing tool should know how to make use of that information and possibly return some findings.
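One way to make that distinction in code is a lookup against the registry plus a table of well-known vendor headers. The sketch below is only an illustration: the registered map is a tiny hand-picked sample standing in for the full IANA list, and the names classify and vendorHints are made up for this example.

```go
package main

import (
	"fmt"
	"net/http"
)

// registered is a small sample standing in for the IANA registry.
// Assumption: a real tool would load the complete list instead.
var registered = map[string]bool{
	"Content-Type":              true,
	"Strict-Transport-Security": true,
	"Hobareg":                   true,
	"If-Schedule-Tag-Match":     true,
	"Default-Style":             true,
}

// vendorHints maps unregistered headers to what their presence suggests.
var vendorHints = map[string]string{
	"Cf-Ray": "Cloudflare",
}

// classify normalizes the header name and reports whether it is registered.
func classify(name string) string {
	key := http.CanonicalHeaderKey(name) // e.g. "cf-ray" becomes "Cf-Ray"
	if registered[key] {
		return "registered"
	}
	if vendor, ok := vendorHints[key]; ok {
		return "unregistered, hints at " + vendor
	}
	return "unregistered"
}

func main() {
	for _, name := range []string{"hobareg", "CF-RAY", "X-Made-Up"} {
		fmt.Printf("%s: %s\n", name, classify(name))
	}
}
```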
Without any distinction between standard and non-standard headers… these are the most popular HTTP headers observed:
Some bad names
Very likely, some of the observed headers were simply invalid. Several were already obsolete: they shouldn’t be used anymore, and most browsers will just ignore them. Sometimes, obsolete headers were sent in addition to valid ones - perhaps to guarantee that old clients are still supported. Some examples:
- note typo, Secure instead of Security
On the list, I also found minor typos that will render a header useless:
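Near-misses of this kind can be flagged mechanically by comparing each observed name against known header names with an edit distance. The following is a minimal sketch under stated assumptions: the knownNames list is a short sample, the threshold of 3 is an arbitrary choice, and the misspelled name in main is hypothetical rather than one of the observed headers.

```go
package main

import (
	"fmt"
	"strings"
)

// knownNames is a short sample of valid header names (lowercase).
var knownNames = []string{
	"strict-transport-security",
	"content-security-policy",
	"x-frame-options",
	"x-content-type-options",
}

func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// editDistance computes the Levenshtein distance between two ASCII strings.
func editDistance(a, b string) int {
	prev := make([]int, len(b)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(a); i++ {
		curr := make([]int, len(b)+1)
		curr[0] = i
		for j := 1; j <= len(b); j++ {
			cost := 1
			if a[i-1] == b[j-1] {
				cost = 0
			}
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev = curr
	}
	return prev[len(b)]
}

// suggest returns the closest known name if the input looks like a typo of it.
// Exact matches and names far from every known header return false.
func suggest(name string) (string, bool) {
	lower := strings.ToLower(name)
	best, bestDist := "", 4 // only distances of 3 or less count as typos
	for _, known := range knownNames {
		d := editDistance(lower, known)
		if d == 0 {
			return "", false // valid name, nothing to suggest
		}
		if d < bestDist {
			best, bestDist = known, d
		}
	}
	return best, best != ""
}

func main() {
	// Hypothetical misspelling, used only for illustration.
	if fix, ok := suggest("X-Frame-Option"); ok {
		fmt.Println("did you mean:", fix)
	}
}
```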
Some bad values
Even if header names were correct, the values were sometimes a bit off. Certain headers are well-defined and can use only a few specific values. A good example is the X-Frame-Options (XFO) header, which offers simple protection against so-called clickjacking. It is recommended to configure this header properly alongside the much more powerful frame-ancestors directive of CSP. XFO originally tried to offer similar allow-listing via the ALLOW-FROM parameter, but that parameter was never widely deployed and is now obsolete.
If used with ALLOW-FROM, the header is simply ignored by modern browsers and treated as if no XFO was defined. This may open the door to clickjacking, hence such a configuration should never be used. Yet… I observed three interesting variants:
X-Frame-Options: ALLOW-FROM <some-url>
- someone should look up CSP
- non-standard, not mentioned in RFC 7034, but because it will turn off XFO it somehow manages to do exactly that: allows all 🤪
- empty, not expected by anyone
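A check for these cases fits in a few lines. The sketch below assumes that only DENY and SAMEORIGIN count as usable values; the function name checkXFO and the result labels are made up for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// checkXFO classifies an X-Frame-Options value.
// Only DENY and SAMEORIGIN are treated as effective; matching is case-insensitive.
func checkXFO(value string) string {
	v := strings.ToUpper(strings.TrimSpace(value))
	switch {
	case v == "":
		return "empty"
	case v == "DENY" || v == "SAMEORIGIN":
		return "ok"
	case strings.HasPrefix(v, "ALLOW-FROM"):
		return "obsolete" // ignored by modern browsers; use CSP frame-ancestors instead
	default:
		return "invalid"
	}
}

func main() {
	for _, v := range []string{"DENY", "sameorigin", "ALLOW-FROM <some-url>", ""} {
		fmt.Printf("%q: %s\n", v, checkXFO(v))
	}
}
```

Anything other than "ok" means the page effectively has no XFO protection.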
Another on the list is the X-XSS-Protection header. Since XSS auditor support was dropped from Chrome and Edge, a typo in its value doesn’t matter anymore, but I still want to mention it here because it did matter in the past. The directives within the value should be separated by semicolons; however, a plain comma was found in the value returned by one of the pages:
- note comma
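The separator mistake is easy to detect. The sketch below assumes the commonly documented syntax (0 or 1, optionally followed by mode=block or report=<uri>); the function name checkXSSProtection and the result labels are illustrative only.

```go
package main

import (
	"fmt"
	"strings"
)

// checkXSSProtection validates an X-XSS-Protection value.
// Directives must be separated by semicolons; a comma is reported separately.
func checkXSSProtection(value string) string {
	parts := strings.Split(value, ";")
	first := strings.TrimSpace(parts[0])
	if first != "0" && first != "1" {
		if strings.Contains(first, ",") {
			return "comma-separator"
		}
		return "invalid"
	}
	for _, p := range parts[1:] {
		d := strings.ToLower(strings.TrimSpace(p))
		switch {
		case d == "", d == "mode=block", strings.HasPrefix(d, "report="):
			// accepted directive, or a harmless trailing semicolon
		case strings.Contains(d, ","):
			return "comma-separator"
		default:
			return "invalid"
		}
	}
	return "ok"
}

func main() {
	for _, v := range []string{"1; mode=block", "1, mode=block", "2"} {
		fmt.Printf("%q: %s\n", v, checkXSSProtection(v))
	}
}
```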
Access-Control-* headers are meant to inform a browser how a given API can be used. One of them - *Access-Control-Allow-Methods* - specifies which HTTP verbs may be used to access the endpoint. One page misused it and returned a header name in reply:
More than one page used a non-standard way of declaring which origins are accepted. These mixed host and wildcard syntax, but also used unsupported escaped patterns:
- mix of wildcards and… not sure what
- hostname cannot be mixed with wildcard
Access-Control-Allow-Origin: *first*, *second*, *third
- only a single origin can be specified
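These mistakes can be told apart with a simple classifier. The sketch below assumes the header may only carry a lone *, the literal null, or exactly one origin made of scheme and host; the function name checkACAO and its labels are invented for this example. The literal null is reported on its own because it deserves separate attention.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// checkACAO classifies an Access-Control-Allow-Origin value.
func checkACAO(value string) string {
	v := strings.TrimSpace(value)
	switch {
	case v == "":
		return "invalid"
	case v == "*":
		return "wildcard"
	case v == "null":
		return "null-origin"
	case strings.ContainsAny(v, ", "):
		return "multiple-values" // only a single origin can be specified
	case strings.Contains(v, "*"):
		return "partial-wildcard" // a hostname cannot be mixed with a wildcard
	}
	// A valid origin has a scheme and a host, and no path.
	u, err := url.Parse(v)
	if err != nil || u.Scheme == "" || u.Host == "" || u.Path != "" {
		return "invalid"
	}
	return "single-origin"
}

func main() {
	values := []string{
		"https://example.com",
		"https://*.example.com",
		"*first*, *second*, *third",
		"null",
	}
	for _, v := range values {
		fmt.Printf("%q: %s\n", v, checkACAO(v))
	}
}
```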
The null origin is also on the list. This makes the API accessible from sandboxed iframes or data: URLs, which have null as their origin:
Several pages use the “preload” directive in their Strict-Transport-Security header. To be properly preloaded, the header needs to follow a few more restrictions, though. Besides that, the site needs to be submitted to the proper registry and pass validation, which in turn requires a correct header configuration.
The requirements for preloading are described on hstspreload.org. HSTS must specify the includeSubDomains directive. 2% of sites did not meet this requirement while having the preload directive in the header. A couple of websites also didn’t meet the requirement of a max-age of at least 1 year (31536000 seconds). Two sites had this value set to 0.
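The two header requirements mentioned above can be verified with a small parser. This is a sketch, not the validation that hstspreload.org itself performs: it checks only includeSubDomains and a max-age of at least 31536000 seconds, and the function name preloadProblems is made up for this example.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

const oneYear = 31536000 // seconds

// preloadProblems lists why a Strict-Transport-Security value that asks for
// preloading would not qualify. It returns nil if preload is not requested.
func preloadProblems(value string) []string {
	maxAge := -1
	includeSub, preload := false, false
	for _, part := range strings.Split(value, ";") {
		d := strings.ToLower(strings.TrimSpace(part))
		switch {
		case d == "includesubdomains":
			includeSub = true
		case d == "preload":
			preload = true
		case strings.HasPrefix(d, "max-age="):
			raw := strings.Trim(strings.TrimPrefix(d, "max-age="), `"`)
			if n, err := strconv.Atoi(raw); err == nil {
				maxAge = n
			}
		}
	}
	if !preload {
		return nil
	}
	var problems []string
	if !includeSub {
		problems = append(problems, "missing includeSubDomains")
	}
	if maxAge < oneYear {
		problems = append(problems, "max-age below one year")
	}
	return problems
}

func main() {
	fmt.Println(preloadProblems("max-age=31536000; includeSubDomains; preload"))
	fmt.Println(preloadProblems("max-age=0; preload"))
}
```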
Some fresh stuff
Only two pages returned the Permissions-Policy header, which provides a mechanism to allow and deny the use of browser features. A few more (8) used the older name of this header - Feature-Policy. Should they change it right away? Not necessarily, since according to Can I use, Permissions-Policy isn’t recognized by default by any browser.
Cross-Origin-Embedder-Policy wasn’t used anywhere; however, a single site used Cross-Origin-Embedder-Policy-Report-Only for testing. That’s good!
Cross-Origin-Opener-Policy had more love and was used by four sites.
Cookies with the __Host- prefix apparently aren’t that popular - only a single instance was observed, but at least it was properly configured.
That’s still better than the __Secure- prefix - no website used it.
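Whether a prefixed cookie is “properly configured” can be checked from its attributes. The sketch below assumes the usual prefix rules: __Secure- requires the Secure attribute, and __Host- additionally requires Path=/ and no Domain attribute. The function names are invented for this example, and prefix matching is kept case-sensitive for simplicity.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// prefixProblem returns a description of what is wrong with a prefixed cookie,
// or an empty string if the cookie is fine or carries no special prefix.
func prefixProblem(c *http.Cookie) string {
	switch {
	case strings.HasPrefix(c.Name, "__Host-"):
		if !c.Secure || c.Path != "/" || c.Domain != "" {
			return "__Host- needs Secure, Path=/ and no Domain"
		}
	case strings.HasPrefix(c.Name, "__Secure-"):
		if !c.Secure {
			return "__Secure- needs Secure"
		}
	}
	return ""
}

// checkSetCookie parses a single Set-Cookie line with the standard library
// and applies the prefix rules to it.
func checkSetCookie(line string) string {
	resp := http.Response{Header: http.Header{"Set-Cookie": {line}}}
	cookies := resp.Cookies()
	if len(cookies) == 0 {
		return "unparseable cookie"
	}
	return prefixProblem(cookies[0])
}

func main() {
	lines := []string{
		"__Host-id=abc; Secure; Path=/",
		"__Host-id=abc; Secure; Path=/; Domain=example.com",
		"__Secure-id=abc",
	}
	for _, l := range lines {
		fmt.Printf("%q: %q\n", l, checkSetCookie(l))
	}
}
```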
This post was meant to summarize some of the quirks spotted in HTTP headers from the most popular websites. If they get it wrong, so will others. At least I have lots of ideas to implement in my little headers analyzer project.