PHP vs BOM – Rik Lewis

I’ve just fixed a little head-scratcher, and as a reward to myself, I’m going to write about it. Well, I’m sat on a train, so methods of celebration are a little limited.

Anyway, what’s that title about?

PHP is a popular general-purpose scripting language that is especially suited to web development. And a BOM (or Byte Order Mark) is a special Unicode character, U+FEFF, placed at the very start of a file to signal the byte order and, in practice, the encoding used in the file. In UTF-8 it is stored as the three bytes EF BB BF.
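To make that concrete, here is a minimal sketch of what a UTF-8 BOM looks like from PHP's side. The constant UTF8_BOM and the helper starts_with_bom() are names I've made up for illustration; they are not part of PHP.

```php
<?php
// The UTF-8 encoding of U+FEFF is the three bytes EF BB BF.
const UTF8_BOM = "\xEF\xBB\xBF";

// Hypothetical helper: true if the given string begins with a UTF-8 BOM.
function starts_with_bom(string $contents): bool
{
    return strncmp($contents, UTF8_BOM, 3) === 0;
}

echo bin2hex(UTF8_BOM), "\n";                              // efbbbf
var_dump(starts_with_bom(UTF8_BOM . "<?php return [];"));  // bool(true)
var_dump(starts_with_bom("<?php return [];"));             // bool(false)
```

Comparing raw bytes with strncmp() avoids any dependence on the mbstring extension, since the check only cares about the first three bytes.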

To give a little context, my problem surfaced when I was including some generated PHP files into my main PHP file, called via AJAX. The generated PHP files contained data held in arrays, and I only wanted to load in the data at the right time, instead of loading it all in up front. The main PHP file was then using die to exit out, returning a JSON string. This works nicely when called using AJAX.

However, once I began including the files, \uFEFF characters started appearing at the beginning of the response, which meant the JSON decode in the browser was failing due to an invalid character at position 0. What I should have realised sooner was that there was one of these \uFEFF characters for each file that I was including, and they were the BOM characters.
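The symptom can be reproduced in a few lines. This is only a sketch, assuming a default PHP configuration: the temporary file stands in for one of my generated data files, and output buffering is used here purely to capture what the include leaks, which is not how my original script worked.

```php
<?php
// Stand-in for a generated data file, written with a leading UTF-8 BOM.
$path = tempnam(sys_get_temp_dir(), 'bom');
file_put_contents($path, "\xEF\xBB\xBF" . "<?php return ['a' => 1];");

// Capture whatever the include sends to the output stream.
ob_start();
$data = include $path;
$leaked = ob_get_clean();
unlink($path);

// The array loads fine, but three stray bytes were emitted as output.
echo bin2hex($leaked), "\n"; // efbbbf

// Prefix the JSON with the leaked bytes, as happened in the real response.
$response = $leaked . json_encode($data);
var_dump(json_decode($response)); // NULL: the BOM makes the JSON invalid
```

PHP's own json_decode() rejects the BOM-prefixed string in the same way the browser's JSON parser did, which makes it a handy check on the server side.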

So what’s going on?

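The PHP files I had generated were Unicode encoded (UTF-8 to be precise) and included the BOM. PHP doesn’t like this: it treats anything before the opening <?php tag as output, so each file’s BOM was sent straight into the response. PHP source files should never include a BOM. Once I re-generated the files using ASCII encoding, everything worked as expected; UTF-8 without a BOM would have done just as well.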
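If re-generating the files isn't an option, an alternative is to strip the BOM from files that already exist. The following is a rough sketch rather than what I actually did; strip_bom_from_file() is a hypothetical helper, and it rewrites the file in place, so try it on a copy first.

```php
<?php
// Hypothetical helper: remove a leading UTF-8 BOM from a file in place.
// Returns true only if a BOM was found and the file was rewritten.
function strip_bom_from_file(string $path): bool
{
    $contents = file_get_contents($path);
    if ($contents === false || strncmp($contents, "\xEF\xBB\xBF", 3) !== 0) {
        return false; // unreadable, or no BOM to remove
    }
    // Drop the first three bytes and write the rest back.
    return file_put_contents($path, substr($contents, 3)) !== false;
}

// Usage with a throwaway file standing in for a generated one.
$demo = tempnam(sys_get_temp_dir(), 'bom');
file_put_contents($demo, "\xEF\xBB\xBF" . "<?php return ['a' => 1];");

var_dump(strip_bom_from_file($demo)); // bool(true): BOM removed
var_dump(strip_bom_from_file($demo)); // bool(false): nothing left to strip
unlink($demo);
```

Fixing the generator so it never writes the BOM is still the cleaner option; this is more of a safety net for files that come from somewhere else.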