Last Friday, I was surprised to notice that the front page of the Neue Zürcher Zeitung's daily edition was covered with nothing but zeros and ones. At first, I thought this had to be some kind of printing defect, similar to the illegible raw output of a PDF. But then I realized that the gothic letters in the title were also affected by the anomaly; this couldn't be a mistake.
Sure enough, this was just a bogus cover page in front of the real title page to advertise the newspaper's new online archive. After manually decoding the first value 01001110 (N) together with the following 01011010 (Z) occurring twice, I knew the rest of the page couldn't be random, so I decided to unveil the hidden text...
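For illustration, the manual decoding of those first three values can be reproduced in a few lines of Ruby. This is only a sketch: the helper name `decode_byte` is made up for this example, and it assumes each group of 8 bits is one ASCII character.

```ruby
# Hypothetical helper: turn one 8-bit binary string into its ASCII character.
def decode_byte(bits)
  bits.to_i(2).chr   # parse as base 2, then map the code point to a character
end

# The first three values from the cover page.
header = %w[01001110 01011010 01011010]
puts header.map { |b| decode_byte(b) }.join   # prints "NZZ"
```

01001110 is 78 in decimal, the ASCII code for N, and 01011010 is 90, the code for Z.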
The first step was to get a digitized version of that newspaper issue. Scanning it wasn't necessary since it's available online. I then used tesseract, an OCR engine, to retrieve the strings from the image file (note: this isn't strictly necessary either, since one can just copy-paste the contents from the PDF, but it's more fun). Finally, I hacked together the following Ruby script to convert the binary data into characters:
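    #!/usr/bin/ruby
    puts "converting file #{ARGV[0]}"
    # Keep only the binary digits; whitespace and OCR noise are dropped.
    str = IO.read(ARGV[0]).gsub(/[^01]/, '')
    puts str
    # Decode each complete group of 8 bits into one character.
    # The exclusive range (...) stops after the last complete byte.
    for i in 0...str.length/8
      print str[i*8, 8].to_i(2).chr
    end
    puts

The result: the encoded texts correspond to the articles of the real front page. Of course, they had to be shortened to fit. I was a bit disappointed, though, not to find any Easter eggs :( Still, it was a good distraction :)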
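As a variation on the script above, the same conversion can be written as a small reusable method. This is a minimal sketch, not the script I actually ran: the name `decode_bits` is hypothetical, and it assumes, like the original, that every character is encoded as exactly 8 bits. Any leftover bits at the end (for example after an OCR slip) are simply ignored.

```ruby
# Hypothetical helper: decode a string of 0s and 1s into text.
def decode_bits(text)
  bits = text.gsub(/[^01]/, '')   # keep only the binary digits
  # scan returns only complete 8-digit groups; a trailing partial group is dropped
  bits.scan(/[01]{8}/).map { |byte| byte.to_i(2).chr }.join
end

# Spaces, newlines and the incomplete final group are all tolerated.
puts decode_bits("01001110 01011010\n01011010 0101")   # prints "NZZ"
```

Using `scan` instead of index arithmetic means there is no loop bound to get wrong, at the cost of building an intermediate array of byte strings.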
Nice one :)
Interesting read