Semester A Progress Report 2012

==Aidan Duffy==
===June 9, 2012===
===Executive Summary===
Our project applies modern cryptographic and computational techniques to a cold case more than 60 years old, in an attempt to solve the mystery of the Somerton Man. After 12 weeks, reasonable progress has been made in extending previous years' work on the language of the code, whilst efforts to create a 3D model have met with unforeseen delays that we expect to overcome shortly.
===Introduction===
This progress report records the work done by Thomas Stratfold and myself during the first semester of our Final Year Honours Project in Electrical Engineering, under the supervision of Derek Abbott and Matthew Berryman. The ultimate aim of our project is to solve the mystery of the 1948 Somerton Man case. To do this, we have been tasked with investigating, and attempting to unravel, a code that was found on the body: determining the language the code is in, what cipher (if any) was used to encode it, or whether the "code" is in fact simply a series of random letters from the mind of a man possibly under the influence of poison.
We have also been tasked with producing a 3-dimensional model of the victim's head from a bust that was made at the time, in order to provide a means of identifying the unknown man.
Whilst the techniques and programs developed in this project are being used to try to solve the mystery of the Somerton Man, they are designed to have broader applications beyond this one case.
====Background====
At 6.30am on the 1st December 1948, a dead man was discovered on Somerton Beach, South Australia, resting against the seawall. No identification was found on the body, the only items he carried being some chewing gum, cigarettes, a comb, an unused train ticket, and a used bus ticket - for a bus stop just 250 metres from where his body was found.
The pathologist performing the autopsy found that the man's stomach and kidneys were deeply congested and the liver contained excess blood. He suggested the victim died from poisoning, but was unable to identify the specific poison used. A review of the case in 1994 concluded that it was likely the man died from digitalis poisoning.
As is evident from the preceding paragraphs, and from the goals of this project, the identity of the man remains a mystery to this day. He was described as Eastern European in appearance, in his mid-40s and in top physical condition, with hands that showed no signs of physical labour. He was clean-shaven, dressed in a fashionable European suit and wearing polished boots, but all the name tags had been removed from his clothing, and neither his fingerprints nor his dental structure could be matched to any record in international registries. By February 1949, there had been eight different "positive" identifications of the body by members of the public.
A month and a half after the discovery of the body, a brown suitcase believed to belong to the man was found at Adelaide Railway Station. It contained various items of clothing (again with no name tags), shaving items, and tools such as scissors, a screwdriver and stencilling equipment. The only identifying marks were "T. Keane" on a tie, "Keane" on a laundry bag and "Kean" on a singlet, along with three dry-cleaning marks; these have never been successfully linked to anyone.
Many theories have been put forward as to the identity of the mystery man. One of the more popular is that he was a spy, with the lack of identification and the mysterious poisoning pointed to as evidence for this theory.
====The Code====
Around the time of the inquest, a tiny piece of paper was found deep in a fob pocket sewn within the dead man's trouser pocket. Printed on it were the words "Tamam Shud", which public library officials identified as meaning "ended" or "finished"; the phrase appears on the last page of a book called The Rubaiyat of Omar Khayyam. A nationwide search was then conducted for a matching copy of the book, without success until a man revealed that he had found a rare first edition of the translation by Edward FitzGerald on the back seat of his unlocked car on Jetty Road on the night of the Somerton Man's death. This copy was missing the words "Tamam Shud" from its last page, and microscopic tests indicated that the piece of paper found on the body had been torn from this book.
In the back of the book were found faint pencil markings of five lines of capital letters, with the second line crossed out. The similarity of this line to the fourth suggests that a mistake was made, and adds to the likelihood that the lines are a code.

WRGOABABD

MLIAOI

WTBIMPANETP

MLIABOAIAQC

ITTMTSAMSTGAB

However, there is some debate over several of the letters: it is unclear whether the first and third lines begin with an 'M' or a 'W'; the stroke through the second line could be an attempt to underline it; and the 'I' of the last line could possibly be a very narrow 'V'.
Code experts brought in to analyse the lines in 1978 concluded:
* There are insufficient symbols to provide a pattern
* The symbols could be a complex substitute code or the meaningless response to a disturbed mind
* It is not possible to provide a satisfactory answer
Our aim is to provide more conclusive answers than these, through the power of modern technology and the vast swathes of data available on the World Wide Web.

===Previous Years' Work===
This project is now into its fourth year under Derek Abbott and Matthew Berryman's supervision, and the three previous groups to attempt to solve the code have all provided valuable insights for us to build upon. The group from 2009 were able to establish that:
* The letters are not random - they mean something; they contain information
* The code is not a transposition cipher - the letters are not simply rearranged in position
* The results are consistent with an English initialism - the letter distribution is consistent with that of the first letters of English words
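To make the letter distribution discussed above concrete, the minimal sketch below tallies and ranks the letters of the code as transcribed in this report. It is an illustration only, not the 2009 group's software: it assumes the 'W'/'M' and 'I' readings printed above and leaves out the struck-through second line (MLIAOI), both of which are choices that could reasonably be made differently. `CodeProfile` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Tallies and ranks the letters of the code as transcribed in this report (illustrative sketch). */
public class CodeProfile {

    // Lines 1, 3, 4 and 5 as printed above. Assumption: the struck-through
    // second line (MLIAOI) is excluded; pass it in as well to include it.
    public static final String[] CODE_LINES = {"WRGOABABD", "WTBIMPANETP", "MLIABOAIAQC", "ITTMTSAMSTGAB"};

    /** Counts occurrences of each letter A-Z across the given lines. */
    public static int[] counts(String... lines) {
        int[] counts = new int[26];
        for (String line : lines) {
            for (char c : line.toCharArray()) {
                if (c >= 'A' && c <= 'Z') {
                    counts[c - 'A']++;
                }
            }
        }
        return counts;
    }

    /** Letters that occur at least once, most frequent first; the stable sort keeps ties alphabetical. */
    public static List<Character> ranked(int[] counts) {
        List<Character> letters = new ArrayList<>();
        for (char c = 'A'; c <= 'Z'; c++) {
            if (counts[c - 'A'] > 0) {
                letters.add(c);
            }
        }
        letters.sort((x, y) -> counts[y - 'A'] - counts[x - 'A']);
        return letters;
    }

    public static void main(String[] args) {
        int[] counts = counts(CODE_LINES);
        int total = 0;
        for (int n : counts) {
            total += n;
        }
        // Print each letter with its count and its share of all letters in the code.
        for (char c : ranked(counts)) {
            System.out.printf("%c %d (%.1f%%)%n", c, counts[c - 'A'], 100.0 * counts[c - 'A'] / total);
        }
    }
}
```

Run as written, it finds 44 letters, with A (8 occurrences), T (6) and B (5) the most common. A profile of this kind is what is set against the first-letter frequencies of a reference text when judging whether the code resembles an English initialism.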

In 2010, the group compared the code's letter distribution to a particular text. Whilst they were unsuccessful in their endeavour, they did generate a large amount of pattern-matching data, and also discovered, somewhat surprisingly, that The Rubaiyat contained few, if any, matches. They also sought to harness the huge amount of data on the internet by developing and running a simple web application and pattern matcher to download the contents of websites and screen them for patterns.
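The report does not describe how the 2010 pattern matcher worked internally. Assuming that a "pattern" means a run of consecutive words whose initials spell part of the code (in line with the 2009 initialism finding), screening a page of downloaded text could look like the hypothetical sketch below; `InitialismMatcher`, `initials` and `findMatches` are illustrative names, not the 2010 group's code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: locate runs of words whose initials spell a target string. */
public class InitialismMatcher {

    /** Upper-case first letter of every word, where a word is a maximal run of ASCII letters. */
    public static String initials(String text) {
        StringBuilder sb = new StringBuilder();
        for (String word : text.split("[^A-Za-z]+")) {
            if (!word.isEmpty()) { // split yields an empty token if the text starts with a non-letter
                sb.append(Character.toUpperCase(word.charAt(0)));
            }
        }
        return sb.toString();
    }

    /** Word indices at which a run of words with the given initials begins (overlapping runs allowed). */
    public static List<Integer> findMatches(String text, String target) {
        List<Integer> starts = new ArrayList<>();
        if (target.isEmpty()) {
            return starts;
        }
        // One initial per word, so a position in this string is also a word index.
        String init = initials(text);
        String wanted = target.toUpperCase();
        for (int i = init.indexOf(wanted); i >= 0; i = init.indexOf(wanted, i + 1)) {
            starts.add(i);
        }
        return starts;
    }

    public static void main(String[] args) {
        String sample = "There were many lamps in a building nearby.";
        System.out.println(findMatches(sample, "MLIAB")); // prints [2]
    }
}
```

In the sample, "many lamps in a building" spells MLIAB, so the match starts at word index 2. This tokeniser is deliberately simple: an apostrophe splits a word ("don't" becomes two words), which a real screen over web pages would need to handle.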
The 2011 group focused on two main aspects - expanding upon the functionality of the web crawler and pattern matcher, and investigating various cryptographic methodologies that could have been used to generate the code, then determining whether they were possible based on a comparison with the code itself. Both aspects were broadened so that they were applicable beyond the scope of the Somerton Man case.
All three previous groups have contributed to a "Cipher Cross-off List", which eliminates potential encoding methods available at the time by comparing the frequency distribution of the letters each method produces with that of the letters in the code. Currently, more than 30 possible encryption schemes have been ruled out, with the One-Time Pad a notable exception due to its (virtually) infinite number of possible permutations.
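The report does not state which similarity measure the cross-off list uses. As an assumption for illustration, the sketch below compares per-letter relative frequencies using total variation distance (half the sum of absolute differences), which is 0 for identical profiles and 1 for profiles with no letters in common. It also contrasts a trivial transposition (reversing the text), which leaves the profile unchanged, with a Caesar shift, a simple substitution that moves each letter's frequency onto a different letter. `DistributionDistance` and its methods are hypothetical names.

```java
/** Hypothetical sketch of a letter-distribution comparison for a "cross-off" style test. */
public class DistributionDistance {

    /** Relative frequency of each letter A-Z in s; non-letters are ignored. */
    public static double[] profile(String s) {
        double[] p = new double[26];
        int total = 0;
        for (char raw : s.toCharArray()) {
            char c = Character.toUpperCase(raw);
            if (c >= 'A' && c <= 'Z') {
                p[c - 'A']++;
                total++;
            }
        }
        if (total > 0) {
            for (int i = 0; i < 26; i++) {
                p[i] /= total;
            }
        }
        return p;
    }

    /** Total variation distance between the letter profiles of a and b (0 = identical, 1 = disjoint). */
    public static double distance(String a, String b) {
        double[] pa = profile(a);
        double[] pb = profile(b);
        double sum = 0;
        for (int i = 0; i < 26; i++) {
            sum += Math.abs(pa[i] - pb[i]);
        }
        return sum / 2;
    }

    /** Caesar shift by k places (k may be negative); non-letters pass through unchanged. */
    public static String caesar(String s, int k) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toUpperCase().toCharArray()) {
            if (c >= 'A' && c <= 'Z') {
                out.append((char) ('A' + Math.floorMod(c - 'A' + k, 26)));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String code = "WRGOABABD WTBIMPANETP MLIABOAIAQC ITTMTSAMSTGAB";
        String reversed = new StringBuilder(code).reverse().toString(); // a trivial transposition
        System.out.println("transposition: " + distance(code, reversed));
        System.out.println("Caesar +3:     " + distance(code, caesar(code, 3)));
    }
}
```

To screen a candidate scheme in this way, one would encrypt a sample of English text with it and compare the resulting profile with the code's. The threshold at which a scheme counts as "crossed off" is not given in this report and would have to be chosen.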

===This Year's Progress===
The focus of this year's project is on identifying the victim through a 3D reconstruction of a bust that was made after his death. This involves investigating techniques for scanning the bust to create a computer model, undistorting the model to correct for any changes that may have occurred after death and in the 60-plus years since the bust was made, and then colourising the model to portray accurately what the man may have looked like when he was alive.
Early in the project, time was spent looking into various means of creating the initial 3D model: different software programs were tested and 3D scanning cameras were discussed. The available budget limited the options we could afford, so we settled on a low-cost "DAVID Laserscanner" package, which seemed to offer the equipment and software we required in one easy-to-use kit. However, this kit uses a line laser, which is potentially harmful due to its relatively high power (approximately 5 mW). To be approved to use the kit, we were required to discuss and produce a Risk Assessment and Standard Operating Procedure applicable to our project.
After meeting with one of the Engineering School's Research Engineers, Henry Ho, it fell to me to draft the initial Risk Assessment and Standard Operating Procedure. Once this was done, we met with Henry again to refine the documents, then submitted them for approval so that we could begin scanning the bust. As of today we are still waiting for approval and for the kit to be released to us, and this unforeseen delay has pushed the scanning back to the second half of the year. However, we are confident that we will not be held up much longer, and we may be able to make a start during the mid-year break.
We have also sought to expand the range of languages considered, as the original investigation was limited primarily to Western European languages. To speed up this process, I wrote a simple Java program that took in a text file and output a list of the letters of the alphabet, each alongside the number of times that letter occurred in the file. The aim was then to write a Windows "batch file" to process a large number of languages in one go, and then to analyse the frequency distribution of each language for comparison with the code. Tom then adapted this to run entirely within Java using a driver program, with the output imported into Microsoft Excel for statistical analysis. I also modified the program to count only the first letter of each word, since the group from 2009 had suggested that first-letter distributions are more similar to the code.
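The counting program itself is not reproduced in this report. As a minimal sketch of the approach described above, not the program we actually used, the class below counts letters either across the whole text or only at the start of each word, and a small driver processes several files in one run, printing comma-separated rows that Excel can import. The `--first` flag, the CSV layout and the accent-stripping step are assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.text.Normalizer;
import java.util.Arrays;

/**
 * Hypothetical sketch of a letter-frequency counter with a driver that
 * processes several text files in one run and prints CSV for a spreadsheet.
 */
public class LetterFrequency {

    /** Counts A-Z in text; with firstLettersOnly, counts only the letter that begins each word. */
    public static int[] count(String text, boolean firstLettersOnly) {
        // Assumption: accents are stripped (e.g. "é" becomes "e") so that Latin-script
        // languages map onto A-Z; letters outside A-Z are not counted.
        String plain = Normalizer.normalize(text, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
        int[] counts = new int[26];
        boolean inWord = false;
        for (char raw : plain.toCharArray()) {
            char c = Character.toUpperCase(raw);
            if (c >= 'A' && c <= 'Z' && (!firstLettersOnly || !inWord)) {
                counts[c - 'A']++;
            }
            // Letters, and apostrophes inside a word (as in "don't"), continue the current word.
            inWord = Character.isLetter(c) || (inWord && c == '\'');
        }
        return counts;
    }

    /** Usage: java LetterFrequency [--first] file1.txt file2.txt ... */
    public static void main(String[] args) throws IOException {
        boolean firstOnly = Arrays.asList(args).contains("--first");
        StringBuilder header = new StringBuilder("file");
        for (char c = 'A'; c <= 'Z'; c++) {
            header.append(',').append(c);
        }
        System.out.println(header);
        for (String arg : args) {
            if (arg.equals("--first")) {
                continue;
            }
            Path path = Paths.get(arg);
            String text = new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
            // One CSV row per file: the file name followed by the 26 letter counts.
            StringBuilder row = new StringBuilder(path.getFileName().toString());
            for (int n : count(text, firstOnly)) {
                row.append(',').append(n);
            }
            System.out.println(row);
        }
    }
}
```

For example, `java LetterFrequency --first sample1.txt sample2.txt` (hypothetical file names) would print a header row followed by one row of first-letter counts per file. Keeping the per-file loop inside Java, rather than in a batch file, means the same code runs unchanged on any operating system.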