Please don’t use externally hosted JavaScript libraries

2014-06-30 security 3 mins 16 comments

A few days ago I outlined that the Reuters website relies on 40 external parties for its security. What particularly struck me was the use of external code hosting services, e.g. loading the jQuery library directly from the jQuery website and the GSAP library from cdnjs. It seems that in this particular case Reuters isn’t the one to blame: they don’t seem to include these scripts directly, rather some of the other scripts they are using are doing this.

Why would one use externally hosted libraries as opposed to just uploading them to your own server? I can imagine three possible reasons:

Simplicity: No need to upload the script, you simply add a <script> tag to your page and forget about it. Not that uploading the scripts to your webspace is an effort worth mentioning, especially compared to the effort of actually developing the website.
Performance: External code hosting typically uses a content distribution network with the claim to provide superior performance by being closer to the clients. However, whether there is really a performance advantage when using persistent connections (that’s the typical scenario these days) is doubtful to say the least.
Scalability: Your server no longer needs to serve the static files and has more reserves for something else. This is no longer much of an argument for modern websites: serving static files typically produces negligible CPU load compared to generating the pages that use these files.

On the other side there are clear disadvantages:

Lack of control: Decisions like whether the particular version of the JavaScript library you are relying on is still hosted are no longer yours to make. The worst-case scenario is that it is removed from the CDN and you only notice because your website is broken.
Stability: With externally hosted code your website won’t just be down when your webserver is down, it will also be down whenever that external code hosting is down. And there will always be users who can reach your website but not your code hosting for some reason; these users will be able to visit your website but it will appear broken.
Privacy: You are hopefully respecting the privacy of your users when they visit your website and don’t collect unnecessary data. But what about the code hosting service and their content delivery network? You now have two more parties that can collect data about your site visitors. Do you have any idea what they are doing with that data?
Security: You are betting the security of your website on the security of the external code hosting and their content delivery network. If either of them is compromised, they can do anything to your website. As mentioned in the previous blog post, you could suddenly discover propaganda messages on your website, or it could start distributing malware to your visitors.

It’s especially the last two points that make externally hosted JavaScript libraries a risk not worth taking in my opinion. The advantages are really minor and definitely not worth giving external parties control over your website.

Comments

Felix 2014-06-30 08:33

+1! I would add that on the modern Internet latency is a much bigger problem than bandwidth, especially when transferring the <100K of jQuery. Making a whole new TCP connection simply takes longer. So by loading JS libraries from a 3rd-party server, your site just became slower… the opposite of what was supposed to happen.
Alarmist 2014-06-30 09:01

On this topic … I recently had the idea that a Firefox extension could provide at least jQuery and jQuery UI in their various versions from a local source. Shouldn’t this be pretty simple and avoid lots of CDN-based tracking? (Yes, the extension might be a few megabytes large in the end, because there are so many possible jQuery versions… I wouldn’t care. Actually, I would love to get around Google Fonts this way too, even though that would result in a one-time several hundred megabyte download.)

Wladimir Palant

Yes, this kind of thing is simpler to implement with Chrome APIs but a Firefox extension should be possible as well. Of course, it wouldn’t be done with jQuery, e.g. Google Hosted Libraries provides eleven different libraries in dozens of versions – and that’s only “the most popular libraries” according to cdnjs which hosts more than 800 libraries.
Giorgio Maone 2014-06-30 09:48

@Alarmist: you might be interested in this “local replacement of known JavaScript libraries” discussion/solution: http://forums.informaction.com/viewtopic.php?p=69882#p69882
pd 2014-06-30 10:07

Web developers are frequently hammered with some tip on how to do the next best trendy thing. Central hosting was yet another one of these tips that I took with a grain of salt for many months. Eventually I realized it’s about caching, not just distributed hosting enabling faster downloads. I could be wrong, but if a browser sees jslibrary.js being requested from egdomain.com, and some other site the user has visited already retrieved that file from the same domain, the browser will use the cached version.

So any library file that is near ubiquitous, like jQuery, will often be loaded from super fast local cache if two sites both load exactly the same domain and filename.

The issue here is really that browser developers are ignorant of the caching ball aches that developers face. Ubiquitous libraries should be periodically downloaded by the browser, in the background, like any other update, to a central cache all developers can refer to. It would save a significant amount of traffic on the net and speed it up as well, especially on mobiles.

As a developer, why shouldn’t I be able to confidently load ubiquitous libraries from browser cache every time?

Whilst browser developers continue to innovate or iterate their JS engines and endless meetings are held on standardization, … in the real world, the only standards or engines that matter in the present tense are those ‘set’ by compatibility libraries. Until we live in a magical world where no compatibility libraries are ever required, browser developers should take the responsibility for ‘hosting’ common libraries away from the CDNs and networks.

Wladimir Palant

Yes, so much for the theory. I also noticed that jQuery advertises their CDN solution with the promise of having the library cached. And jQuery is certainly ubiquitous. Problem is: there isn’t just one version of jQuery. Each website uses a different jQuery version and they won’t use the “latest and greatest” because that means redoing the compatibility testing (don’t forget plugin compatibility). For example, if you look only at the Reuters website, it loads both jQuery 1.6.1 and 1.10.1 (both outdated). Stack Overflow and Mozilla Add-ons use jQuery 1.7.1 (via Google and Verizon CDNs respectively). Looking at this, what are the chances that you visit two different websites both using the same version of jQuery from the same CDN within the cache expiration interval?
memyselfandi 2014-06-30 12:50

You’re forgetting the biggest benefit of “shared” hosting – local cache/storage. If my browser fetches lib A from cdn.com and both x.com and y.com reference it I won’t waste my bandwidth twice for the same bits, not to mention the speed of local vs network.
About security – very few sites require higher level than thousands of sites with thousands of eyes …

Wladimir Palant

I’m not forgetting it, I just didn’t think it was worth noting in addition to what I’ve already written about performance. See also my reply to the comment above yours. Also, I don’t think I can understand your point on security.
pd 2014-06-30 13:31

I don’t know that the versioning would be a problem if the browsers ripped down ubiquitous libraries like any other update. If anything, the versioning problem is more evidence of why browsers should store ubiquitous libraries on a user’s machine for developers to access. I doubt that many users are going to complain about browsers using an extra 10 MB or so of storage space on their drives when the Firefox cache is already huge (half a gig?).

If Mozilla is serious about the open web on mobiles, surely this move is imperative? Why should developers trying to support the open web on mobiles have slower access to all the JavaScript-built functionality than developers of native apps have access to functionality immediately available in the phone’s software stack?

I can just imagine developer A, who is trying to convince some manager to support open web apps on mobile phones instead of native apps, trying to explain that the open web app will run slower because it has to wait for a bunch of libraries to download before it activates the UI. Meanwhile developer B says “well, a native app initializes faster and we can get marketing cred through the app stores”. Which developer do you think the manager is going to go with? It’s already hard enough to be an open web advocate. Sure, it got easier with HTML5, but then along came ‘smart’ phones and ‘apps’ and it got hard again. Don’t make it any harder for open web advocates to promote the open web! Make compatibility libraries local/native in the browser!

Wladimir Palant

I was mostly replying to your claims that CDNs in their current state help with caching. Of course the browsers could pull some tricks to improve the caching here: by merging all the various download URLs for the libraries into one, by increasing the expiration intervals, by downloading the libraries from a trusted source etc. Also, which libraries? It’s most likely more about various jQuery plugins than about jQuery itself. That’s a non-trivial effort on the side of the browser vendors, did anybody ever write a proper proposal on that?
Paul 2014-06-30 15:16

I’ve felt this way since I first saw this idea 7 years ago. Back then libraries were much smaller, and the possibility of getting client calls at 2:00 AM because the CDN was doing scheduled maintenance kept me from drinking the caching kool-aid. Not to mention this is only initial page-load, after that the library is cached from your domain.

It’s also worth mentioning here that using the mobile cache as a reason to link out to libraries is fear mongering. Most mobile browsers open with a cleared cache (when you open the browser it re-loads the page you had opened last and it is not instantaneous). There’s no substitute for mobile-first architecture. If the framework is too heavy to load on mobile, look for alternatives, don’t rely on cache.
Colin Dellow 2014-06-30 15:59

The privacy concern is real, but the real-world scope of it should be mitigated by the CDN serving a 1-year expiry. In theory, the CDN will miss some large percentage (> 99%, I’d think) of opportunities to track the user due to it being in the cache. YMMV based on cache sizes, version mismatches, etc.

Wladimir Palant

True, this is something I didn’t consider. I verified that cdnjs sets a 1 year expiration interval, code.jquery.com even 10 years. So while in theory a CDN might decide to show a lower expiration interval to a particular user and track his way through the web, it normally won’t do so, at least not on a large scale.
Pete Duncanson 2014-06-30 16:40
I have to disagree with just about all your points :(

CDN’s have their place and for the right content (common libraries used a lot by other sites) you should use them. Caching as mentioned is the biggest gain and reason alone for using it. Few sites are using the latest versions of various libraries, but they are using the one that works for their site and their code and you know what…that is exactly the right one to use for their needs. In short CDN’s potentially save a 100K download (and the TCP connection and DNS lookup time as they are common domain names, google et al).

I’ve a post saying just about the opposite of this one from a few years back https://offroadcode.com/blog/1501/content-delivery-networks-are-your-friend-and-here-is-why/

Regarding security, which is a new one: I mainly focus on using common libraries (let’s just say jQuery and be done with it) hosted on Google’s CDN, built for the job. If something happens there then most of the web for most people is in trouble; I believe that is what memyselfandi above was on about. If you use Google at all then you are already giving them plenty more info than they can get from a CDN download.
Wladimir Palant
As outlined in reply to the comments above, a caching hit with a CDN isn’t terribly more likely than a caching hit for a library you host yourself. That’s because there is a number of different CDNs and lots of different library versions with each website relying on another one. And cache sizes aren’t unlimited, even with a lengthy expiration interval some files are bound to be evicted from cache. And even when the library is cached, it only saves you the TCP connection and DNS lookup you normally wouldn’t have had in the first place.

Your points in the blog post are:

Bandwidth: From what I can tell, most traffic tends to be generated by images and custom code, not standard libraries. So I would consider the scenario where outsourcing downloads of standard libraries makes a cost difference an edge case.

Faster downloads due to different domain names: fortunately, this trick is increasingly losing its importance. All browsers increased their connection limits by now, and this technique is even considered counter-productive with SPDY.

CDN uptime: Even if CDN uptime is really 99.999%, that’s 0.001% of additional downtime for your website.

Less data is being sent over the wire due to less headers/better compression: As long as all of the request still fits into one Ethernet frame (typical scenario, even with many cookie headers), there should be no performance difference by making that frame smaller. Requiring one more TCP (and SSL?) handshake in order to connect to the CDN outweighs this advantage by far. As to better compression, the compression of jQuery 2.1.0 minified on code.jquery.org is actually very bad for some reason (34151 bytes from CDN whereas regular gzip command produces 29344 bytes). While cdnjs doesn’t fail quite as badly, it has a 29726 bytes download which is still too large. Only Google CDN manages to do better than my naive approach, it has a download that is 43 bytes smaller. Yes, totally worth it :-)

Cross-site caching: see above.
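To put the CDN uptime point above into numbers, here is a rough sketch. It assumes that the page only works when both your own server and the CDN are reachable and that the two fail independently; the 99.9% figure for your own server is an assumption chosen for illustration, not a measurement.

```javascript
// Availability of a page that needs ALL of the given services at once,
// assuming their failures are independent (a simplification).
function combinedAvailability(...availabilities) {
  return availabilities.reduce((product, a) => product * a, 1);
}

// Expected downtime in minutes over a 365-day year.
function downtimeMinutesPerYear(availability) {
  return (1 - availability) * 365 * 24 * 60;
}

const ownServer = 0.999;   // assumed availability of your own server
const cdn = 0.99999;       // the 99.999% figure discussed above

console.log('CDN alone, minutes/year:', downtimeMinutesPerYear(cdn).toFixed(2));
console.log('own server alone:       ', downtimeMinutesPerYear(ownServer).toFixed(2));
console.log('both required:          ',
  downtimeMinutesPerYear(combinedAvailability(ownServer, cdn)).toFixed(2));
```

The extra downtime is small when the CDN really delivers five nines, but it is never negative: every additional required host can only lower the combined availability.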

Your argumentation on security essentially boils down to: “everybody is doing it, so I’ll do it as well.” Sure, in case of Google CDN the risk is low. But still, Google CDN is a very interesting target for hackers (and governments), exactly because it is used by so many websites. If eventually your website turns out compromised through Google CDN, will seeing other websites suffer the same fate give you comfort? And wouldn’t it be better to avoid an unnecessary risk here in the first place?
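The compression comparison in this reply can be repeated with a few lines of Node.js. This is only a sketch: `gzipSize` is a helper name made up here, and the sample string is a stand-in for a real minified library, so the byte counts it prints will not match the figures quoted above.

```javascript
const zlib = require('zlib');

// Size in bytes of the gzip-compressed form of `content` at the given level (1-9).
function gzipSize(content, level = 9) {
  return zlib.gzipSync(Buffer.from(content), { level }).length;
}

// Stand-in for a minified library. A real check would read the file from disk,
// e.g. fs.readFileSync('jquery.min.js') (hypothetical path), and compare the
// result with the Content-Length the CDN reports for its compressed response.
const sample = 'function add(a,b){return a+b;}'.repeat(500);

console.log('raw bytes:   ', Buffer.byteLength(sample));
console.log('gzip level 1:', gzipSize(sample, 1));
console.log('gzip level 9:', gzipSize(sample, 9));
```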
olivvv 2014-07-01 14:06

JS libs hosted on CDNs have been a cargo cult from the beginning (no proof of benefits but proof of real issues), and then it takes years to get rid of it. We should make bets on what the next golden calf in FE development will be that we will have to burn.
Ian Feather 2014-07-01 17:03

There’s a peculiar amount of misinformation within this post amongst some other more valid points.

We shouldn’t kid ourselves that CDNs are there for anything other than performance gains. Simplicity and scalability are misnomers in your Advantages list which feel like you put them there just to be kind :)

This line in the performance bullet is a strange one:

“However, whether there is really a performance advantage when using persistent connections (that’s the typical scenario these days) is doubtful to say the least.”

In the era of mobile connections we are definitely not in the persistent connection world. If you have any mobile users (and you do) you will definitely be serving your site over a non-persistent connection at some point as well as over a connection with potentially higher latency and more network collision. CDNs are ultimately for putting your assets closer to the user. They are for reducing latency between the time that the browser requests your asset and the time it is evaluated. If you host your site in the US, a user in Australia will incur a latency of 100s of milliseconds (regardless of vendor and bound by the speed of light). Using a CDN will place your static assets closer to each user and reduce latency.

For the disadvantages you mention a lack of control. This can be mitigated by placing a CDN in front of your own asset servers or a third party static server with good uptime. With cdns and asset servers providing SLAs of 99.9%+, are you really going to be able to guarantee you can do better with your own hardware?

And as for stability, fundamentally your site shouldn’t depend on javascript libraries being available. Transfer failures and downtime will happen whether it’s your servers or a third party one. This is often not to do with your hardware but again with the network (something CDNs can help with reducing e.g. timeouts). You should include javascript in a resilient way which doesn’t stop the user from using your application.
oever 2014-07-01 22:09

A superior solution to CDN is the Magnet URI which includes a checksum on the content and a fallback location to retrieve the file.
http://en.wikipedia.org/wiki/Magnet_URI_scheme

for example:
magnet:?as=https%3A%2F%2Fnews.ycombinator.com%2Ffavicon.ico&xt=urn:sha1:RFJGKU7TT4ZMU7RUB7SR7SW3BINM5JKL

‘as’ gives the URL to retrieve the file from if it is not present locally
‘xt’ gives the base32 of the sha1 of the file contents.
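A link of this shape can be built with a short script. The following is a minimal sketch, assuming Node.js; `base32` and `magnetUri` are names invented for this example, and the fallback URL is a placeholder.

```javascript
const crypto = require('crypto');

const ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ234567';

// RFC 4648 base32 encoding without padding.
function base32(bytes) {
  let bits = 0;   // number of buffered bits not yet emitted
  let value = 0;  // buffered bits, kept in the low `bits` positions
  let out = '';
  for (const byte of bytes) {
    value = (value << 8) | byte;
    bits += 8;
    while (bits >= 5) {
      out += ALPHABET[(value >>> (bits - 5)) & 31];
      bits -= 5;
    }
    value &= (1 << bits) - 1; // drop the bits already emitted
  }
  if (bits > 0) out += ALPHABET[(value << (5 - bits)) & 31];
  return out;
}

// Build a magnet link with 'as' (fallback URL) and 'xt' (SHA-1 of the content).
function magnetUri(content, fallbackUrl) {
  const sha1 = crypto.createHash('sha1').update(content).digest();
  return `magnet:?as=${encodeURIComponent(fallbackUrl)}&xt=urn:sha1:${base32(sha1)}`;
}

console.log(magnetUri('console.log("hypothetical library");',
                      'https://example.com/js/lib.js'));
```

A client that understands the link can check any copy it already has against the hash and only fall back to the URL when no matching copy exists.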
Stephen 2014-07-03 09:19

People in China will fail to read any site that depends on the Google CDN for scripts — because China regularly blocks all access to google.com.

Wladimir Palant

It’s actually ajax.googleapis.com for the Google CDN but judging by https://meta.stackoverflow.com/q/258288/785541 it is indeed an issue. Exactly my point – there is a number of conditions where a user will be able to access your website but not the CDN you are using.
Stephen 2014-07-04 04:02

Wladimir: I totally agree with you. When I am traveling in China, I constantly get frustrated by websites not working. Most of the time the site can be reached, but the Google CDN can’t.

In many cases, it was just Google Analytics, not even a library. But still, it blocks the entire site from being accessed in China.
Vincent Kleijnendorst 2014-07-04 08:33

HTML5 boilerplate uses

<script src="//ajax.googleapis.com/ajax/libs/jquery/1.11.1/jquery.min.js"></script>
<script>window.jQuery || document.write('<script src="js/vendor/jquery-1.11.1.min.js"><\/script>')</script>

If, for any reason, the CDN is not loading the local version will be used.

Wladimir Palant

Note that this “HTML5 boilerplate” has nothing to do with HTML5.

This approach solves the availability issues, albeit in a very ugly way. It doesn’t solve the other issues, and it increases the complexity of the solution significantly.
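For comparison, the same fallback idea can be written without document.write. This is a sketch, not what HTML5 Boilerplate ships: `loadScriptWithFallback` and its `doc` parameter are names made up here, and the URLs in the usage comment are placeholders. Note that scripts inserted this way load asynchronously, so code depending on the library has to wait for the callback.

```javascript
// Try each URL in order. Calls done(url) with the first URL that loads,
// or done(null) if every candidate fails.
function loadScriptWithFallback(doc, urls, done) {
  if (urls.length === 0) {
    done(null);
    return;
  }
  const [first, ...rest] = urls;
  const script = doc.createElement('script');
  script.src = first;
  script.onload = () => done(first);
  // On a network error, move on to the next candidate (e.g. the local copy).
  script.onerror = () => loadScriptWithFallback(doc, rest, done);
  doc.head.appendChild(script);
}

// In a browser this would be called roughly like this:
//   loadScriptWithFallback(document,
//     ['https://cdn.example/jquery.min.js', 'js/vendor/jquery.min.js'],
//     (url) => console.log(url ? 'loaded from ' + url : 'all sources failed'));
```

Like the original snippet, this only addresses availability; the privacy and security concerns stay exactly the same.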
Hash 2022-02-07 11:31

I've just come across this thread in 2022 ... and had stumbled upon a few addons like Decentraleyes and LocalCDN. Of the two, LocalCDN is much more powerful and hopefully resolves (at an individual user level) the issues mentioned above...

Wladimir Palant 2022-02-07 11:35

In 2022, Subresource Integrity also exists and allows websites to solve this issue without bothering individual users.
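As a rough illustration of how Subresource Integrity is applied: the site computes a hash of the exact file it expects and puts it into the `integrity` attribute, and the browser refuses to run a script whose content doesn't match. The sketch below assumes Node.js; `sriHash` and `scriptTag` are helper names made up for this example, and the URL and library body are placeholders. It covers tampering only, not the privacy or availability concerns.

```javascript
const crypto = require('crypto');

// Compute an SRI value of the form "<algorithm>-<base64 digest>".
function sriHash(content, algorithm = 'sha384') {
  const digest = crypto.createHash(algorithm).update(content).digest('base64');
  return `${algorithm}-${digest}`;
}

// Build a script tag with an integrity check. crossorigin="anonymous" is
// needed so the browser can verify a script fetched from another origin.
function scriptTag(url, content) {
  return `<script src="${url}" integrity="${sriHash(content)}" ` +
         `crossorigin="anonymous"></script>`;
}

// Placeholder for the bytes of the library version you tested against.
const libraryBody = 'console.log("hypothetical library");';
console.log(scriptTag('https://cdn.example/lib.min.js', libraryBody));
```

The hash has to be regenerated whenever the referenced file changes, which is one more reason to pin an exact library version.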
