Semester A Progress Report 2012

===2012 Progress===
This year the project has two main focuses. The first is identifying the victim through a 3D reconstruction of a bust taken months after his death. The second is verifying and improving on the work of previous years, which has been done by expanding the Web Crawler and testing the code against more languages.


====3D Reconstruction====
The 3D reconstruction is a new aspect of the project. We have been exploring it in the hope of producing a reconstructed image of the victim's face that could finally lead to his identification.

This aspect of the project involves scanning the bust, creating a model on the computer, and modifying the model to remove distortion and correct any changes to the face caused by the post-mortem. After this, colour is added to the model so that it looks more realistic and makes the victim easier to identify.

The first part of the semester involved determining how we would create the model. We began by testing a few readily available 3D modelling software packages, such as 123D and PM Scanner. Neither was very efficient or easy to use, and the example models did not appear to be of very high quality. As a result we decided instead to use a low-cost “David Laserscanner”, an easy-to-use kit that allows for 3D modelling. The kit uses a 5mW line laser, which has the potential to cause harm if handled incorrectly, so we were required to write a Risk Assessment and a Standard Operating Procedure (both included in the appendix) before we were allowed to purchase the kit.

My contribution to this aspect of the project was to meet, together with Aidan, with one of the school's research engineers, Henry Ho, and to create the risk assessment that will be used in conjunction with the laser scanner.

We are still waiting for the David Laserscanner kit to arrive, so we have been unable to make any further progress in this area.


====Web Crawler====
The web crawler created in 2011 is already functional; however, it takes a very long time to scan web pages, and as a result it would take months to check even a fraction of the Internet. The idea behind this aspect of the project was to modify the crawler so that it uses pre-indexed data, the same data that search engines use to speed up searches.

Aidan did the majority of the work in this area.

====Language Verification====
In previous years there was little focus on which language the code could be written in. In 2009 the project group tested 10 languages and concluded that English fit best. This year we decided to verify this conclusion by testing more languages and performing further analysis. This has been my main contribution to the project.

The analysis began with a review of the previous years' work. Previously, languages were tested by performing a frequency analysis of the most used letters and comparing this with the code. To test this, we first went through the dictionary and counted the number of words that begin with each letter of the alphabet. These counts were then converted into frequencies and compared to the Somerton code. The results were not very promising, as there was very little agreement between the two sets of frequencies. The results are shown in figure 1 of the appendix.

Since previous years had mainly focused on Western European languages, and records had shown the victim was most likely of Eastern European descent, we decided to expand the set of languages considered. I found a web site that had the Tower of Babel Bible passage translated into over a hundred different languages. Using this site, I obtained texts in 85 different languages, each translation containing approximately 1000 characters. This allowed frequency analysis to be performed by running a Java program, which I had modified to count the number of times each letter appeared in the texts. The analysis was then repeated, this time counting the number of times each letter was the initial letter of a word.
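The two counts described above can be sketched in a few lines. The Java program used for the actual analysis is not reproduced in this report, so the Python below is only a minimal illustration of the idea; the function names `letter_frequencies` and `initial_letter_frequencies` are hypothetical and the tokenisation rule (split on whitespace, take the first alphabetic character) is an assumption.

```python
from collections import Counter


def letter_frequencies(text):
    """Relative frequency of every letter in the text (case-insensitive)."""
    letters = [ch for ch in text.upper() if ch.isalpha()]
    total = len(letters)
    if total == 0:
        return {}
    return {ch: n / total for ch, n in Counter(letters).items()}


def initial_letter_frequencies(text):
    """Relative frequency of each letter as the first letter of a word.

    Assumption: words are separated by whitespace, and leading
    punctuation or digits are skipped to find the first letter.
    """
    initials = []
    for token in text.upper().split():
        first = next((ch for ch in token if ch.isalpha()), None)
        if first is not None:  # ignore tokens with no letters, e.g. "42"
            initials.append(first)
    total = len(initials)
    if total == 0:
        return {}
    return {ch: n / total for ch, n in Counter(initials).items()}
```

Running both functions over each of the translated passages would give one overall-letter table and one initial-letter table per language, which is the form of data the comparison with the Somerton code needs.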

The results were quite interesting. We compared the initial letter frequency results with the Somerton code by computing the difference in frequency and the standard deviation. English was the third closest match, behind two Philippine languages, Ilocano and Tagalog. This helps to support the theory that the code is in English, since its initial letter frequencies match those of the code so closely. A table of these results is included in the appendix.
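The report does not state the exact formula used for the comparison, so the following is only one plausible reading, given as a sketch. It assumes the per-letter differences between a language and the code are summarised by their mean absolute value and their population standard deviation, and that languages are ranked by the former. The names `compare` and `rank_languages` are hypothetical, and the comparison is restricted to the 26 Latin letters, which is a simplification for languages with other alphabets.

```python
import string
from statistics import pstdev


def compare(code_freq, lang_freq, alphabet=string.ascii_uppercase):
    """Return (mean absolute difference, standard deviation of differences).

    Both arguments map letters to relative frequencies; letters that
    are missing from a mapping are treated as frequency 0.
    """
    diffs = [lang_freq.get(ch, 0.0) - code_freq.get(ch, 0.0) for ch in alphabet]
    mean_abs = sum(abs(d) for d in diffs) / len(diffs)
    return mean_abs, pstdev(diffs)


def rank_languages(code_freq, freqs_by_language):
    """Sort languages from closest to furthest from the code (assumed metric)."""
    scored = {name: compare(code_freq, f) for name, f in freqs_by_language.items()}
    return sorted(scored.items(), key=lambda item: item[1])


# Made-up toy frequencies for illustration only; not the project's data.
toy_code = {"A": 0.5, "B": 0.5}
toy_languages = {
    "Language X": {"A": 0.5, "B": 0.5},
    "Language Y": {"A": 1.0},
}
ranking = rank_languages(toy_code, toy_languages)
```

With real inputs, `code_freq` would hold the initial letter frequencies of the Somerton code and `freqs_by_language` the corresponding table for each of the 85 languages.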

Further analysis needs to be done in this area. The best next step would be to take the top 20 languages and perform more frequency analysis using longer texts to refine the results. Using the data we have collected, we will construct a Language Cross-Off List, similar to the cipher version, that will explain why we have discounted each of the languages tested.