Critical design review 2009: Who killed the Somerton man?

From Derek


Honours students

Due date

This assignment is due by Semester 1, Week 8, 2009.


This project investigates different possibilities for discovering the meaning behind the obscure string of letters associated with the unknown man found dead on Somerton beach in 1948. The project is structured so as to rule out possible meanings of the code selectively, in an effort to get a better idea of what the code may be. In particular, many different cipher schemes are being investigated, with particular focus on the possibility that the intended language is English. The possibility that the string of letters has no (cryptographic) meaning at all is also being investigated.

Background to the case

The dead body and its circumstances

Perhaps one of South Australia's most bizarre and longest-standing mysteries is that of the Somerton Man. His body was found resting against the rock wall on Somerton beach, opposite the Crippled Children's Home, shortly before 6:45am on the 1st of December, 1948. According to the coroner's inquest, the officer in charge, Constable John Moss, "looked to see if there was any disturbance of the sand and the body, and was sure there had not been." According to John Matthew Dwyer, who performed the postmortem, "the death (of the man) could not have been natural" (Coronial Inquest[1] page 11) due to severe congestion in many of the organs, most notably the liver and spleen. Such congestion is consistent with poisoning, yet no trace of poison was found. When he was found, there was an unlit cigarette just above his ear and a partly smoked cigarette on the right collar of his coat. "There were cigarettes on the body, which were in a packet" (Coronial Inquest[1] page 4), along with a quarter-full box of matches. There was also a metal comb and some chewing gum. Also in his possession were "a railway ticket to Henley Beach, also a bus ticket, a tramway bus ticket" and a slip of paper bearing the words "Tamám Shud".

The railway ticket was for the 10:45am train to Henley Beach on the 30th of November, but had not been used. The tramways bus ticket, on the other hand, had been used and, according to the Claims Officer's Assistant, was purchased "somewhere between the railway station on North Terrace and the intersection of West Tce and South Tce while the bus was en route to St. Leonard's departing from the Railway Station at 11.15a.m. ... There may have been a Somerton bus before this, but this would be the first St. Leonard's bus to leave after the 10.45 a.m. train to Henley Beach" (Coronial Inquest[1] page 5).

Before leaving the railway station, the man deposited a suitcase in the cloakroom. This case contained various items of clothing, some of which bore the name "T. Keane", while others followed the pattern of the clothes the man was wearing: they bore no name, many having had the name tag torn out. There were several other items in the suitcase, including a stencilling brush, scissors, a knife in a sheath and three pencils. However, these items, like the name T. Keane, yielded no useful information.

Figure: Photo of the original code, found in 1949 in the back of a copy of the Rubaiyat of Omar Khayyam.

The code

In early 1949, a small scrap of paper bearing the words "Tamám Shud", which means "finished", was found in the coat pocket of the deceased man. The scrap was identified as coming from the last page of the book of poems called "The Rubaiyat", written by the famous Persian poet Omar Khayyam.

The police put out an announcement through the media that they were searching for a copy of The Rubaiyat with the "Tamám Shud" phrase (if not the entire last page) removed. Shortly afterwards, a local Somerton man came forward with a copy of the book, claiming that it had been tossed onto the back seat of his car on the 30th of November the previous year.

Two things of significance were quickly noticed about this particular copy of The Rubaiyat. Firstly, there was a phone number pencilled on the back cover, which led police to a nurse known as Jestyn (her real name is withheld). Secondly, pencilled into the book was the short code that can be seen below. So far, this code has never been deciphered. It is unknown whether it is indeed a code or simply a meaningless array of letters, and the original language of the message is also unknown.

The Inquest

Due to the suspicious nature of the case, the coroner was called to hold an inquest. The inquest examined the body itself and the circumstances surrounding it, both in the lead-up to the death and in its aftermath. The inquest report contains detailed information about the state of the body, from witness reports and from the post mortem examination conducted by John Dwyer, as well as information about the suitcase left at the Adelaide Railway Station cloakroom.

One of the most intriguing things about the body was that "The heart was of normal size, and normal in every way", yet "small vessels not commonly observed in the brain were easily discernible with congestion. There was congestion of the pharynx, and the gullet was covered with whitening of superficial layers of the mucosa with a patch of ulceration in the middle of it. The stomach was deeply congested...There was congestion in the 2nd half of the duodenum". "...acute gastritis haemorrhage, extensive congestion of the liver and spleen, and the congestion to the brain." This analysis, conducted by Dr Dwyer, led him to conclude "I am quite convinced the death could not have been natural" and to say "the poison I suggested was a barbiturate or a soluble hypnotic", while also noting that "There are other poisons which do come into the picture which would have decomposed very early after death" (Coronial Inquest[1] pages 12-13).

However, this seems to conflict with the findings of Dr Robert Cowan, who analysed in detail a section of the stomach (and its contents), a section of the liver, a section of muscle, and samples of blood and urine taken from the body. In his report, Dr Cowan states:

"I feel quite satisfied that if the death were caused by any common poison, my examination would have revealed its nature. If he did die from poison, I think it would have been a vary rare poison. ... I think that the death is more likely to have been due to natural causes than poisoning."

This contradiction still baffles experts: as of the last significant review of the case in 1978, no poison had been identified[2] that fitted the descriptions given by the two doctors.

Also of interest to the case, the clothing the man was wearing had the labels removed, so that no name was present on him, nor did he possess any other form of identification. The suitcase found to belong to the deceased man did contain some labels "bearing the name 'T Keane'" (Coronial Inquest[1] page 22), while on other items the labels had been torn out. This name, along with fingerprints and photographs of the deceased, was sent around the world to all States in the Commonwealth and New Zealand, and also to the important fingerprint bureaus overseas. The reply was "the person is not known to us" (Coronial Inquest[1] page 25).

This is odd: no one anywhere seemed to know who the man was, yet according to the Inside Story report[2] of 1978, someone kept leaving fresh flowers on his grave every spring.

Operation Venona: a brief history

During the Second World War, in 1943, an operation was launched by the US and UK to spy on the Soviet Union, whose later formation of the Soviet Bloc led to widespread fear of a Third World War and to the Cold War. This operation, known as Operation Venona, was a code-breaking operation targeting encrypted Soviet diplomatic communications. It revealed that sensitive Australian government information had been leaked to the Soviet Union from an Australian source, and it uncovered a spy-ring in Australia being run from the Soviet Embassy. Over the next few years, during the Cold War, the UK Security Service made numerous investigative trips to Australia and reported to the Australian Government on the security situation. Was the Somerton Man a member of that spy-ring, or a member of the UK Security Service? Was he an unsuspecting victim of the Cold War because of his Eastern European appearance? Was it all coincidental, or is there something more behind it all? [3]

Previous attempts at cracking the code

Previous professional attempts to gain insight into this code were limited because they were carried out long ago, without the benefit of modern techniques and databases. Another problem is that they appear to have made fixed assumptions about the characters in the code, without taking into account that some of the symbols are ambiguous. For example, it is almost impossible to discern whether the first character on either the first or the third line is an M or a W (or indeed something else entirely), and the same is true of the fifth line's first character (is it an "I", a "V" or something else?). Also, the second line has been omitted entirely in previous cracking attempts, as it is assumed to be an error that was crossed out, but there is no definite reason to believe that it is indeed an error. And what of the "X" above the "O" in the fourth line?

Another problem with amateur attempts by members of the public and astrologers is that, as has been demonstrated statistically, anyone who makes the mistake of 'cherry picking' can read virtually anything they want into a sequence of letters. A good example of this is the Bible code controversy[4]. The Bible code is a simple 'skip code'.

"Skip codes involve a different way to read a text than we normally do, usually we read one letter at a time, the first letter, the second letter and the third letter and so forth. But with a skip code we might start with the third letter and then skip ahead ten to the thirteenth letter and then to the twenty third letter and so forth and maybe that would spell out a new word jumping ten letters at a time. Here’s an example, there’s a sentence here that says my way of showing a skip code is encrypted in the very words I put down here. Let’s take the first letter and jump every 14 and make that red. It’s very hard to read as it is written here you can see the letter m-a-r-y–h-a-d, it’s not so legible. So to make it look nicer and to understand it better, what we usually do is we break the line before each red letter. So now we have each line starting with the red letter and now the red letters are in a column and it’s very easy to read. Mary had a little lamb."[5]

The 'code' itself is whatever such a skip code produces. Skip codes run on the Bible have been credited with predicting various events in the recent past, such as the global economic collapse supposedly set to start in 2002 (which, according to [5], did come true), and it is claimed that by reading such codes it would be possible to discern and prepare for the future.
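A minimal Python sketch of the skip-code reading described in the quotation above. The helper names (`skip_sequence`, `embed`, `find_word`) are hypothetical illustrations, not part of any Bible code software: `embed` plants a message every 14 letters in filler text, and `find_word` shows why searching many (start, skip) pairs amounts to cherry picking.

```python
def letters_only(text):
    """Upper-case letters of `text`; spaces and punctuation are dropped."""
    return "".join(c.upper() for c in text if c.isalpha())


def skip_sequence(text, start, skip, length=None):
    """Read every `skip`-th letter, beginning at 0-based letter index `start`."""
    picked = letters_only(text)[start::skip]
    return picked if length is None else picked[:length]


def embed(message, filler, skip):
    """Hypothetical helper: overwrite every `skip`-th letter of `filler` with `message`."""
    msg, letters = letters_only(message), list(letters_only(filler))
    if len(letters) < (len(msg) - 1) * skip + 1:
        raise ValueError("filler is too short for this message and skip")
    for k, ch in enumerate(msg):
        letters[k * skip] = ch
    return "".join(letters)


def find_word(text, word, max_skip):
    """Try every (start, skip) pair up to `max_skip` and report where `word` appears.

    Searching this many combinations is the 'cherry picking' problem: in a long
    enough text, some short words will turn up purely by chance.
    """
    letters, word = letters_only(text), letters_only(word)
    hits = []
    for skip in range(1, max_skip + 1):
        for start in range(len(letters)):
            if letters[start::skip][:len(word)] == word:
                hits.append((start, skip))
    return hits


hidden = embed("Mary had a little lamb", "ABCDEFGHIJKLMNOPQRSTUVWXYZ" * 10, skip=14)
print(skip_sequence(hidden, start=0, skip=14, length=18))  # MARYHADALITTLELAMB
print((0, 14) in find_word(hidden, "MARY", max_skip=20))  # True
```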

However, there is substantial evidence against the Bible code concept. One article in particular, "Solving the Bible Code Puzzle"[4], states:

"A paper of Witztum, Rips and Rosenberg in this journal in 1994 made the extraordinary claim that the Hebrew text of the Book of Genesis encodes events which did not occur until millennia after the text was written. In reply, we argue that Witztum, Rips and Rosenberg's case is fatally defective, indeed that their result merely reflects on the choices made in designing their experiment and collecting the data for it. We present extensive evidence in support of that conclusion. We also report on many new experiments of our own, all of which failed to detect the alleged phenomenon."

"One appellation (out of 102) is so influential that it contributes a factor of 10 to the result by itself. Removing the five most influential appellations hurts the result by a factor of 860. Again, these appellations are not more common or more important than others in the list in any previously recognized sense. It should be obvious from these facts that a small change in the data definition (or in the judgment or diligence of the data collector) might have a dramatic effect. More generally, the result of the experiment is extraordinarily sensitive to many apparently minor aspects of the experiment design, ... These properties of the experiment make it exceptionally susceptible to systematic bias. ... there appears to be good reason for this concern."

The article gives many more examples of why this type of code analysis is flawed.

Several 'decryptions' were produced by amateur code-breakers nationwide after the Somerton Man's code was released and published in newspapers.

"Results ranged from, Go & wait by PO. Box L1 1am T TG to Wm. Regrets. Going off alone. B.A.B. decieved me too. But I've made peace and now expect to pay. My life is a bitter cross over nothing. Also I'm quite confident this time I've made Tamam Shud a mystery. St. G.A.B."[6]

Simon Singh, an expert on codes, commented that the Somerton code "looks simple".

Background to cryptographic ciphers

There are two main categories of classical cipher: transposition and substitution[7]. In transposition, the letters themselves are unchanged but are (systematically) reordered, creating an anagram of the original text. In substitution, the letters of the original text are systematically replaced by others, often using a keyword. (Note: the word cipher technically refers to a substitution but is often also used in reference to a transposition.)
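The practical difference between the two categories, which the frequency-based tests later in this project rely on, can be shown with a short Python sketch. Reversing the text stands in for a transposition and a Caesar shift (a fixed shift of 3, chosen arbitrarily) stands in for a simple substitution; both are illustrative choices, not schemes suggested by the code itself.

```python
from collections import Counter


def letters_only(text):
    return "".join(c for c in text.upper() if "A" <= c <= "Z")


def letter_counts(text):
    return Counter(letters_only(text))


def reverse_transposition(text):
    # Simplest possible transposition: the same letters, in reverse order.
    return letters_only(text)[::-1]


def caesar_substitution(text, shift=3):
    # Simplest possible substitution: every letter moves `shift` places along the alphabet.
    return "".join(chr((ord(c) - 65 + shift) % 26 + 65) for c in letters_only(text))


plain = "THIS TEXT IS SECRET"
# A transposition keeps the exact letter counts of the plaintext...
print(letter_counts(reverse_transposition(plain)) == letter_counts(plain))  # True
# ...while a simple substitution keeps only the shape of the distribution.
shifted = letter_counts(caesar_substitution(plain))
print(shifted == letter_counts(plain))  # False
print(sorted(shifted.values()) == sorted(letter_counts(plain).values()))  # True
```

This is why a transposed English text should still show English letter frequencies, whereas a simple substitution preserves them only up to a relabelling of the letters.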

Transposition Schemes

There are many types of transposition scheme, such as rail fence, route, columnar and double transposition, among others. These are to be investigated subject to the results of certain tasks of this project (if investigations indicate that a simple transposition was not used, we will not need to investigate further). More detail is given in section 8 below.
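As an illustration of the simplest of these, a rail fence transposition writes the text in a zigzag over a number of 'rails' and reads it off rail by rail. The Python sketch below is a generic textbook version (letters only, spaces dropped), not a claim about how the Somerton code might have been produced; the test phrase is the standard textbook example with three rails.

```python
def rail_pattern(length, rails):
    """Rail index (0 = top) visited by each position of the zigzag."""
    if rails < 2:
        return [0] * length
    cycle = 2 * (rails - 1)
    pattern = []
    for i in range(length):
        r = i % cycle
        pattern.append(r if r < rails else cycle - r)
    return pattern


def rail_fence_encipher(text, rails):
    letters = "".join(c.upper() for c in text if c.isalpha())
    pattern = rail_pattern(len(letters), rails)
    # Read the zigzag off rail by rail, top to bottom, left to right.
    return "".join(ch for r in range(rails) for ch, p in zip(letters, pattern) if p == r)


def rail_fence_decipher(cipher, rails):
    pattern = rail_pattern(len(cipher), rails)
    # Plaintext positions in the order the enciphering step visited them.
    order = sorted(range(len(cipher)), key=lambda i: (pattern[i], i))
    plain = [""] * len(cipher)
    for ch, pos in zip(cipher, order):
        plain[pos] = ch
    return "".join(plain)


print(rail_fence_encipher("WE ARE DISCOVERED FLEE AT ONCE", 3))  # WECRLTEERDSOEEFEAOCAIVDEN
```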

Substitution Schemes (Ciphers)

Vigenere Cipher

The Vigenere Cipher was invented in 1553. It is a substitution cipher scheme that uses a variable letter shift based on a key word[8]. Each letter is shifted along the alphabet and substituted with the corresponding letter. For example, with a shift of 5, the letter C would be substituted with the letter H.

The shift value is calculated from the letters of the keyword (A-Z meaning 0-25 respectively). The keyword is repeated until it is the same length as the plaintext, and each consecutive letter of the repeated keyword gives the shift for the corresponding letter of the plaintext.

For example, to encipher the phrase THIS TEXT IS SECRET with the keyword LEMON, we shift the first letter by 11 (L) to get E, the second by 4 (E) to get L, and so on. The full resulting ciphertext is ELUGGPBFWFDIOFRE.
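The worked example can be checked with a minimal Python sketch of the Vigenere cipher, using the same A=0 ... Z=25 convention and ignoring spaces. This is an independent illustration, not the project's own Java modules, and the function names are illustrative.

```python
def only_letters(text):
    """Upper-case letters A-Z of `text`; spaces and punctuation are dropped."""
    return [c for c in text.upper() if "A" <= c <= "Z"]


def vigenere(text, keyword, decipher=False):
    """Shift letter i of the text by keyword letter (i mod keyword length), A=0 ... Z=25."""
    shifts = [ord(k) - ord("A") for k in only_letters(keyword)]
    sign = -1 if decipher else 1
    return "".join(
        chr((ord(c) - ord("A") + sign * shifts[i % len(shifts)]) % 26 + ord("A"))
        for i, c in enumerate(only_letters(text))
    )


cipher = vigenere("THIS TEXT IS SECRET", "LEMON")
print(cipher)                                    # ELUGGPBFWFDIOFRE
print(vigenere(cipher, "LEMON", decipher=True))  # THISTEXTISSECRET
```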


Playfair Cipher

The Playfair cipher was invented in 1854 by Charles Wheatstone and promoted by Lord Playfair. It is a cipher that uses a 5-by-5 grid set up using a keyword usually containing 5 different letters (for example using the words death or plain). This keyword is placed along the top row of the 5-by-5 grid (and continued to subsequent rows if necessary and omitting repeat letters in the keyword) with the rest of the alphabet (omitting one letter of choice not in the phrase to be encrypted) filling up the remaining spaces in alphabetical order (left to right, then top to bottom).

For example, suppose we use the same phrase THISTEXTISSECRET as before, and the same keyword LEMON, and omit the letter Q. The 5-by-5 grid will be as follows:

    L E M O N
    A B C D F
    G H I J K
    P R S T U
    V W X Y Z

The Playfair cipher is then applied to each successive pair of letters. If there is an odd number of letters, a random letter is added to the end of the phrase, and if both letters of a pair are the same, a filler letter (usually X) must be inserted between them. Each pair of letters defines a rectangle on the grid, and each letter is replaced by the letter at the opposite corner of the rectangle in its own row.

For example: The first pair in the above example is "TH":


So the letters "TH" are replaced by the letters "RJ".

If a pair of letters falls in the same column, each letter is replaced by the letter immediately below it; if the pair falls in the same row, each letter is replaced by the letter immediately to its right. If a letter is on the edge of the grid, the letter on the opposite side is used (wrapping around).

For example, the next pair of letters in the above example is "IS". These two letters fall in the same column, so no rectangle can be formed (its "opposite corners" would simply be the original letters); instead, each letter is replaced by the letter immediately below it in the grid. Hence, the pair "IS" is replaced with the pair "SX".

Implementing this process on the entire phrase THISTEXTISSECRET yields RJSXROYSSXRMBSOR.
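A minimal Python sketch of the Playfair rules as described above, which reproduces this result. It assumes, as in the example, that Q is the omitted letter and that X is the filler for doubled letters; it also pads an odd-length message with X rather than a random letter, so that results are reproducible. A letter that is not in the grid (such as the omitted Q) raises a KeyError. The function names are illustrative.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def build_grid(keyword, omit="Q"):
    """5x5 grid: keyword letters first (repeats dropped), then the rest of the alphabet."""
    order = []
    for c in keyword.upper() + ALPHABET:
        if c in ALPHABET and c != omit and c not in order:
            order.append(c)
    return [order[r * 5:(r + 1) * 5] for r in range(5)]


def digraphs(text, filler="X"):
    """Split the text into letter pairs, inserting `filler` for doubles and odd endings."""
    letters = [c for c in text.upper() if c in ALPHABET]
    pairs, i = [], 0
    while i < len(letters):
        a = letters[i]
        b = letters[i + 1] if i + 1 < len(letters) else None
        if b is None or a == b:
            pairs.append((a, filler))
            i += 1
        else:
            pairs.append((a, b))
            i += 2
    return pairs


def playfair_encipher(text, keyword, omit="Q"):
    grid = build_grid(keyword, omit)
    where = {grid[r][c]: (r, c) for r in range(5) for c in range(5)}
    out = []
    for a, b in digraphs(text):
        (ra, ca), (rb, cb) = where[a], where[b]
        if ra == rb:    # same row: letter to the right, wrapping around
            out += [grid[ra][(ca + 1) % 5], grid[rb][(cb + 1) % 5]]
        elif ca == cb:  # same column: letter below, wrapping around
            out += [grid[(ra + 1) % 5][ca], grid[(rb + 1) % 5][cb]]
        else:           # rectangle: each letter takes the corner in its own row
            out += [grid[ra][cb], grid[rb][ca]]
    return "".join(out)


print(playfair_encipher("THIS TEXT IS SECRET", "LEMON"))  # RJSXROYSSXRMBSOR
```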

One Time Pad

A one time pad is conceptually identical to the Vigenere cipher, except that the key is exactly as long as the plaintext. This is achieved by using a pad as the key. A pad can be a book, or any sufficiently long text that is accessible to both the sender and the receiver of the ciphertext. The one time pad system is theoretically unbreakable provided that the key (pad) is a random sequence (each letter independent and identically distributed). If the key is a uniformly random sequence (or close enough to one), the resulting ciphertext should have an equal distribution of letters, whatever the plaintext. A pad must only be used once: reusing it reduces the security of the message (if the two ciphertexts were concatenated, the result would effectively be a Vigenere cipher with a repeating key). Effectively, the only way to break the code is to know the key.
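A minimal Python sketch of the one time pad, assuming a pad of capital letters and the same A=0 ... Z=25 shift convention as the Vigenere example. `make_pad` draws each pad letter independently and uniformly with Python's `secrets` module, which is the condition under which the ciphertext distribution is flat; a book or poem used as a pad would not satisfy it. The function names are illustrative.

```python
import secrets
import string


def only_letters(text):
    return "".join(c for c in text.upper() if "A" <= c <= "Z")


def make_pad(length):
    """Random pad: each letter drawn independently and uniformly from A-Z."""
    return "".join(secrets.choice(string.ascii_uppercase) for _ in range(length))


def pad_cipher(text, pad, decipher=False):
    """Shift each letter by the matching pad letter (A=0 ... Z=25), as in Vigenere."""
    letters, key = only_letters(text), only_letters(pad)
    if len(key) < len(letters):
        raise ValueError("the pad must be at least as long as the message")
    sign = -1 if decipher else 1
    return "".join(chr((ord(c) - 65 + sign * (ord(k) - 65)) % 26 + 65)
                   for c, k in zip(letters, key))


pad = make_pad(16)
cipher = pad_cipher("THIS TEXT IS SECRET", pad)
print(pad_cipher(cipher, pad, decipher=True))  # THISTEXTISSECRET
# A repeating keyword written out to full length is just the Vigenere cipher again:
print(pad_cipher("THIS TEXT IS SECRET", "LEMON" * 4))  # ELUGGPBFWFDIOFRE
```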

The one time pad is a real possibility for the scheme used in the Somerton Man's code. Since the code was found in a rare copy of The Rubaiyat of Omar Khayyam, an obvious idea is that a certain poem in that edition of the book was used as the pad. However, the letter frequency distribution of the Somerton Man's code is far from the flat distribution that a one time pad encryption would be expected to give (although if the pad is not a random sequence, as with a poem, the ciphertext need not have an even letter distribution either). Ultimately, though, there is no way of assessing whether a one time pad was used short of figuring out the pad (key) by some means unrelated to the text itself. Clues from the case, such as the names of key people, will be looked into with regard to the possibility of a one time pad. However, we do not reasonably expect to be able to determine whether or not a one time pad system was used.

Other Cipher Schemes

The following table lists other famous cipher (substitution) schemes in rough chronological order. We intend to investigate each of these schemes further (along with any others that may come to light). Note that all of them were invented before 1948, when the Somerton Man was found; any cipher scheme invented afterwards can be ruled out from the start.

Name                             Year invented
Book cipher                      Ancient
Null cipher                      Ancient
Affine cipher                    1466
Alberti cipher                   1400s
Alphabetum Kaldeorum             1400s
Autokey cipher                   1500s
Great Cipher                     1600s
Arnold cipher                    1700s
Four-square/two-square cipher    1800s
Nihilist cipher                  1880
Trifid cipher                    1901
Bifid cipher                     1901
ADFGVX cipher                    1918
M-94                             1922
Reservehandverfahren             World War II
VIC cipher                       World War II

Information Theory (Shannon Entropy)

In information theory, entropy (often referred to as Shannon entropy) is a measure of the information content, or uncertainty, of a random variable: the more uncertain a random variable is, the more information its outcome conveys[9].

The formula[9] for the Shannon entropy of a discrete random variable [math]X[/math] with sample space [math]\{x_1, \ldots, x_n\}[/math] is as follows, where [math]p(x_i)[/math] is the probability of [math]x_i[/math] occurring.

[math]H(X) = -\sum_{i=1}^n {p(x_i) \log_b p(x_i)},[/math]

Note that the base of the logarithm determines the units of the result. In this project we will use [math]b=2,[/math] so our results will be in bits per character. The entropy is also the theoretical lower limit on the average number of bits per character achievable by lossless compression, if the text is treated as a sequence of independent characters drawn from this distribution.

In this project, we will be estimating the entropy (per character) of different texts and of the Somerton Man's code by taking the letter frequency distribution of the text and using this as an estimate of the probabilities of each letter's occurrence.

Preliminary work and results carried out so far

The Somerton Man's Code

The text (shown below) is subject to some interpretation. For example:

  • The first letter looks as if it could be either an M or a W.
  • There is a line that is most likely crossed out, but possibly not, and it may mean something else entirely.
  • There is a cross on top of the O.
  • The fifth-to-last letter looks like an S but with a line through the middle. This is inconsistent with the other S.
  • The third-to-last letter is either a G or a C.

Figure: Photo of the original code, found in 1949 in the back of a copy of the Rubaiyat of Omar Khayyam.
Figure: Letter frequency distribution plot of the most likely intended sequence of letters.

We consider the most likely intended sequence of characters to be as follows (the graph shown above is a letter frequency plot of this sequence): MRGOABABD MTBIMPANETP MLIABOAIAQC ITTMTSAMSTGAB

The Shannon entropy (see section 6.3) of this character sequence (the most likely interpretation) is 3.58 bits per letter, estimating the probability mass function from the frequency of letters in the code.
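The 3.58 figure can be reproduced with a few lines of Python that plug letter counts into the Shannon entropy formula above (b = 2). This is an independent sketch, not the project's Java modules; it counts letters only, so the spaces between the lines of the code are ignored.

```python
import math
from collections import Counter


def shannon_entropy(text, base=2):
    """Entropy per letter, using each letter's relative frequency as its probability."""
    letters = [c for c in text.upper() if "A" <= c <= "Z"]
    n = len(letters)
    counts = Counter(letters)
    return -sum((k / n) * math.log(k / n, base) for k in counts.values())


CODE = "MRGOABABD MTBIMPANETP MLIABOAIAQC ITTMTSAMSTGAB"
print(f"{shannon_entropy(CODE):.2f} bits per letter")  # 3.58 bits per letter
```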

Sampling of Random Letters

The graph below shows the distribution of letters (as a percentage) in 11 samples of 50 random English letters, collected from various people:


Of particular interest in the above graph is the frequency of the letter "R", which occurs noticeably more often than any other letter. This, combined with the fact that "R" appears only once in the Somerton Man's code, suggests that the code is not a meaningless random set of letters; however, as only 11 samples have been collected to date, this is not yet conclusive.

Letter Frequency Plots

The letter frequency distributions of selected categorical lists and of the eBook "1984" by George Orwell were calculated using custom Java-based software modules (see appendix A). The resulting frequency plots are shown below. This data will be used in many sections of this project.

Figures: Letter frequency plots of "1984" by George Orwell, the list of chemical elements and the list of Australian cities.
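A sketch of the calculation behind these plots, in Python rather than the project's Java modules (appendix A), which are not reproduced here; the function name is hypothetical. It reports each letter's share of the text as a percentage, ignoring case, spaces and punctuation.

```python
from collections import Counter
from string import ascii_uppercase


def letter_frequencies(text):
    """Percentage of each letter A-Z in `text` (case, spaces and punctuation ignored)."""
    letters = [c for c in text.upper() if c in ascii_uppercase]
    counts = Counter(letters)
    total = len(letters)
    if total == 0:
        return {ch: 0.0 for ch in ascii_uppercase}
    return {ch: 100 * counts[ch] / total for ch in ascii_uppercase}


CODE = "MRGOABABD MTBIMPANETP MLIABOAIAQC ITTMTSAMSTGAB"
freqs = letter_frequencies(CODE)
print(round(freqs["A"], 1))  # 18.2 (8 of the 44 letters)
```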

Initial Letter Frequency Plots

The following are letter frequency plots of the initial letters of certain categorical lists and, once again, of "1984" by George Orwell. These will also be used for many tasks as described in section 8 (below).

Figures: Initial-letter frequency plots of "1984" by George Orwell, the list of chemical elements and the list of Australian cities.
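The initial-letter variant only changes what is counted: the first letter of each entry in a list. A hypothetical Python helper is sketched below, shown with the eight Australian capital cities as a stand-in list (the project's actual city list is not reproduced here).

```python
from collections import Counter
from string import ascii_uppercase


def initial_letter_frequencies(entries):
    """Percentage of list entries whose first letter is each of A-Z."""
    initials = [e.strip()[0].upper() for e in entries if e.strip()]
    initials = [c for c in initials if c in ascii_uppercase]
    counts = Counter(initials)
    total = len(initials) or 1  # avoid dividing by zero for an empty list
    return {ch: 100 * counts[ch] / total for ch in ascii_uppercase}


capitals = ["Adelaide", "Brisbane", "Canberra", "Darwin",
            "Hobart", "Melbourne", "Perth", "Sydney"]
freqs = initial_letter_frequencies(capitals)
print(freqs["A"], freqs["M"], freqs["E"])  # 12.5 12.5 0.0
```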

Preliminary Coding of Selected Cipher Schemes

Software modules for enciphering and deciphering the Vigenere cipher have been created, and an eBook (once again "1984" by George Orwell) was enciphered using the keyword "LEMON". The letter frequency distribution (or probability mass function, from an information theory perspective) was found using the same modules used in section 7.3 (above). The result is shown below. When compared with the distribution of letters in the Somerton Man's code in section 7.1 (above), the differences far outweigh the similarities. It is therefore likely that further investigation (which depends on further sampling for statistical inference) will show that the Vigenere cipher scheme was not used.

Figure: Letter frequency plot of "1984" by George Orwell enciphered with the Vigenere cipher (keyword LEMON).

Similar preliminary work is in progress using the Playfair cipher.

Approach and methodology for remaining part of project

We have chosen to investigate 6 hypotheses on the meaning of the code. They are:

  1. The code is meaningless
  2. The code is in English and the letters have been substituted
  3. The code is in English and the letters have not been substituted
  4. The code is in a foreign language and the letters have not been substituted
  5. The code is in a foreign language and the letters have been substituted
  6. Certain structural features can give information useful for hypotheses 1-5

Using this set of hypotheses, we are effectively testing the code for both transposition and substitution cipher schemes in both English and foreign languages, with hypothesis 1 acting as a "catch-all", since we define meaningless to mean that the string of letters does not represent meaningful information. We consider hypotheses 1-5 to be exhaustive (exactly one must be true), provided that the approaches we take for hypotheses 2 and 5 do not assume the absence of transposition.

Hypothesis 1: The Code is Meaningless

One important hypothesis that we will investigate is that the string of letters is meaningless. By meaningless, we mean that it does not hold any information. Under this hypothesis we consider the following possible scenarios for how the string of letters was written:

  1. The letters were deliberately intended to be random (either intended to confuse people or otherwise)
  2. The Somerton Man (or anyone else) was intoxicated and the letters are a result of delusion.

To test this hypothesis, we will survey as many people as possible, asking each for 50 random letters (task 1A). We will then repeat the experiment with intoxicated people, and compare both the difference between the two groups and the distributions themselves with the Somerton Man's code. Some preliminary results are shown in section 7.2 (above).

It is important to note that this investigation will be carried out with English-speaking people, and hence assumes that the random string of letters (as we are assuming it to be) was invented by an English-speaking person. This could be investigated further with non-English-speaking people, but this will not be done in this project.

Hypothesis 2: English Code With Substituted Letters

This hypothesis will be given the most attention as there are a few different approaches that can (and will) be taken. As shown in section 6.2, there is a long list of possible popular cipher schemes that could have been used.

The following process will be attempted for each cipher scheme, in an attempt to show that it was not used (and so narrow down the list of likely cipher schemes). This is task 2A.

  1. Software modules to encipher with the scheme will be created
  2. An English eBook will be enciphered using the software modules
  3. Samples of contiguous text (length 50, approximately that of the Somerton Man's code) will be taken
  4. These samples will be used to determine the likelihood that the Somerton Man's code has been enciphered with the cipher scheme being tested. If this likelihood is small enough (say, less than 5%), we will infer that the particular cipher scheme was not used.
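Step 4 leaves the exact statistic open. One possible choice is sketched below in Python, under stated assumptions: the statistic is a chi-squared distance between a sample's letter counts and the enciphered eBook's overall letter frequencies (with add-one smoothing), and the likelihood is estimated as the fraction of randomly placed contiguous windows, each the same length as the code, that are at least as unusual as the code. The statistic, the smoothing and the function names are assumptions, not the project's chosen method. In use, `enciphered_corpus` would be an eBook enciphered with the scheme under test.

```python
import random
from collections import Counter
from string import ascii_uppercase


def letters_of(text):
    return "".join(c for c in text.upper() if c in ascii_uppercase)


def reference_probabilities(corpus_letters):
    """Letter probabilities of the enciphered eBook, with add-one smoothing (an assumption)."""
    counts = Counter(corpus_letters)
    total = len(corpus_letters) + 26
    return {ch: (counts[ch] + 1) / total for ch in ascii_uppercase}


def chi_square(sample, probs):
    """Chi-squared distance between a sample's letter counts and the expected counts."""
    n = len(sample)
    counts = Counter(sample)
    return sum((counts[ch] - n * p) ** 2 / (n * p) for ch, p in probs.items())


def monte_carlo_p_value(code, enciphered_corpus, trials=2000, seed=0):
    """Fraction of random contiguous corpus windows at least as unusual as the code.

    A small value (say below 0.05) would suggest the code does not look like
    text enciphered with this scheme.
    """
    corpus, sample = letters_of(enciphered_corpus), letters_of(code)
    if len(corpus) < len(sample):
        raise ValueError("corpus is shorter than the code")
    probs = reference_probabilities(corpus)
    observed = chi_square(sample, probs)
    rng = random.Random(seed)
    extreme = 0
    for _ in range(trials):
        start = rng.randrange(len(corpus) - len(sample) + 1)
        if chi_square(corpus[start:start + len(sample)], probs) >= observed:
            extreme += 1
    return extreme / trials
```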

With the (hopefully) reduced list of possible cipher schemes, deeper-level statistics (using the same enciphered eBooks) will be investigated, such as the probability of a specific letter preceding another specific letter (the transition probability), and compared with the Somerton Man's code in an attempt to reduce this list further.
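A minimal Python sketch of estimating transition probabilities, P(next letter = b | current letter = a), from adjacent letter pairs. It assumes the code is read as one continuous string, so pairs spanning the line breaks are counted; the function name is illustrative.

```python
from collections import Counter


def transition_probabilities(text):
    """Estimate P(next letter = b | current letter = a) from adjacent letter pairs."""
    letters = [c for c in text.upper() if "A" <= c <= "Z"]
    pair_counts = Counter(zip(letters, letters[1:]))
    from_counts = Counter(letters[:-1])  # how often each letter has a successor
    return {(a, b): n / from_counts[a] for (a, b), n in pair_counts.items()}


CODE = "MRGOABABD MTBIMPANETP MLIABOAIAQC ITTMTSAMSTGAB"
probs = transition_probabilities(CODE)
print(probs[("A", "B")])  # 0.5: four of the eight As are followed by B
```

The same function applied to samples of an enciphered eBook would give the reference values to compare against.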

Due to the nature of some cipher schemes, there may be other methods of ruling them out, or of determining a key length or other useful information. An example of this is Kasiski examination[10], which is used to determine the key length of a ciphertext enciphered under the Vigenere cipher. Should the Vigenere cipher not be ruled out easily, Kasiski examination will be explored, as will similar techniques specific to other cipher schemes.
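The core of Kasiski examination can be sketched in Python as follows: find repeated letter groups (trigrams here) in the ciphertext and record the distances between their occurrences, since the key length probably divides those distances. The test plaintext and the helper names are made up for illustration; "THE" is placed every 10 letters, a multiple of the 5-letter keyword LEMON, so it is enciphered identically each time.

```python
from collections import defaultdict
from functools import reduce
from math import gcd


def vigenere_encipher(text, keyword):
    """Plain Vigenere (A=0 ... Z=25), used here only to make a test ciphertext."""
    letters = [c for c in text.upper() if "A" <= c <= "Z"]
    shifts = [ord(k) - 65 for k in keyword.upper()]
    return "".join(chr((ord(c) - 65 + shifts[i % len(shifts)]) % 26 + 65)
                   for i, c in enumerate(letters))


def repeat_distances(ciphertext, n=3):
    """Distances between successive occurrences of each repeated n-letter group."""
    letters = "".join(c for c in ciphertext.upper() if "A" <= c <= "Z")
    positions = defaultdict(list)
    for i in range(len(letters) - n + 1):
        positions[letters[i:i + n]].append(i)
    return [b - a for pos in positions.values() for a, b in zip(pos, pos[1:])]


cipher = vigenere_encipher("THE ABCDEFG THE HIJKLMN THE", "LEMON")
distances = repeat_distances(cipher)
print(cipher)                  # ELQOONHQTTELQVVUOXAAELQ
print(distances)               # [10, 10]
print(reduce(gcd, distances))  # 10: the key length probably divides this
```

Here the candidate key lengths are 2, 5 and 10, one of which (5) is the actual keyword length. With a ciphertext as short as the Somerton code, there may be too few repeats for the examination to be decisive.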

The possibility that the cipher scheme is a one time pad has been discussed in the One Time Pad section above. This possibility comes under this hypothesis but, as mentioned there, the likelihood of determining whether or not a one time pad was used is small and depends mostly on clues from the case. This is task 2B.

Hypothesis 3: English Code Without Substituted Letters

This hypothesis assumes that each letter is as intended, but that the letters may be in a different order. To assess this hypothesis, we are investigating the following possibilities:

  1. The string of letters is an anagram (in English)
  2. Each letter represents the first letter of a list of words. This list could be periodic elements, place names, names of people, train station names, etc.

To test these possibilities, eBooks and selected lists (of any plausible category) will be used to generate frequency plots and will be sampled in sets of 50; statistical analysis will then be used to infer the likelihood of the code representing these specific categories of information (task 3A).

The anagram hypothesis can be ruled out easily by comparing the frequency of letters in the Somerton Man's code with the frequency of letters in the English language (both shown below). Most notably, the Somerton Man's code has very few Es and Ns and too many As compared with this English eBook. This comparison will be done more formally (statistically), but given the large differences, the outcome is not in doubt.

Figure: Frequency of letters in an English eBook compared with the frequency of letters in the code. Note the large differences.

If a list category is found to have large similarity with the Somerton Man's code then transposition schemes will be investigated in an attempt to decode the text (task 3B).

Hypothesis 4: Foreign Language Code Without Substituted Letters

This hypothesis is that the letters have not been substituted (but may have changed order) and that the original text was not in English.

To test this hypothesis, the Shannon entropy (see section 6.3) of the Somerton Man's code will be assessed for its information content and compared with entropy values from selected eBooks in as many foreign languages as practical (initial letters will also be investigated). If entropy values differ significantly across languages, we may get an indication of which language the text is in. This is task 4A.

Should the previous test yield little, a secondary test will investigate the compression of the code when concatenated with pieces of text (if possible, the same text in translation) from different languages, under a known compression algorithm[11] (task 4B). Given a suitably chosen compression algorithm, this will give a better comparison of the entropy of the code with that of particular languages. We may then be able to infer statistically which language is likely for the original text.
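A minimal Python sketch of the concatenation idea, using zlib (DEFLATE) purely as a stand-in for whichever "known compression algorithm" is finally chosen. It measures how many extra compressed bytes appending the code to a reference text costs; the fewer extra bytes, the more the code resembles patterns the compressor has already seen. The pangram reference is a placeholder for letters-only text in each candidate language, and the function names are hypothetical. With a 44-letter code, differences will be only a few bytes, so repeated trials against random-letter baselines would be needed before drawing any conclusion.

```python
import random
import string
import zlib


def compressed_size(text):
    """Size in bytes of `text` after zlib (DEFLATE) compression at level 9."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def extra_bytes(reference, code):
    """Bytes added by appending `code` to `reference` before compressing."""
    return compressed_size(reference + code) - compressed_size(reference)


def random_letters(n, seed=0):
    """Baseline: n uniformly random capital letters (reproducible via `seed`)."""
    rng = random.Random(seed)
    return "".join(rng.choice(string.ascii_uppercase) for _ in range(n))


CODE = "MRGOABABDMTBIMPANETPMLIABOAIAQCITTMTSAMSTGAB"
# Placeholder reference text; in practice, a letters-only sample of each language.
reference = "THEQUICKBROWNFOXJUMPSOVERTHELAZYDOG" * 20
print(extra_bytes(reference, CODE), extra_bytes(reference, random_letters(len(CODE))))
```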

Hypothesis 5: Foreign Language Code With Substituted Letters

This hypothesis is that the Somerton Man's code represents text in a language that is not English and that the letters have been substituted.

Given the fixed time frame of this project, this hypothesis is not intended to be investigated (but should be noted as a possibility). It may, however, be looked into if time permits or if other hypotheses quickly prove false.

Hypothesis 6: Structural Features May Yield Information

The final hypothesis that we will test is that certain structural features of the code may yield some information. It is important to note that this hypothesis is not mutually exclusive with hypotheses 2-5, i.e. finding this hypothesis to be true does not logically prove hypotheses 2-5 false.

The following are examples of characteristics of the Somerton Man's code that will be investigated in regards to this hypothesis:

  • Repeated pairs, such as AB (many times) and ABAB (once)
  • Runs of consecutive non-repeating characters, such as BIMPANET
  • Palindromic triplets, such as TMT and AIA
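A minimal Python sketch of how these three features could be located. It assumes the code is split into its four most likely lines (the crossed-out line excluded) and that features do not span line breaks; the function names are hypothetical. The sliding-window search reports TBIMPANE, a run of eight distinct letters overlapping the BIMPANET run noted above and of the same length. The same functions could be run over eBooks and categorical lists for task 6A.

```python
from collections import Counter

CODE_LINES = ["MRGOABABD", "MTBIMPANETP", "MLIABOAIAQC", "ITTMTSAMSTGAB"]


def repeated_pairs(lines):
    """Adjacent letter pairs occurring more than once (within lines only)."""
    counts = Counter(line[i:i + 2] for line in lines for i in range(len(line) - 1))
    return {pair: n for pair, n in counts.items() if n > 1}


def palindromic_triplets(lines):
    """Triplets of the form xyx with y different from x, such as TMT."""
    return [line[i:i + 3] for line in lines for i in range(len(line) - 2)
            if line[i] == line[i + 2] != line[i + 1]]


def longest_distinct_run(line):
    """Longest run of consecutive letters with no repeats (sliding window)."""
    best, start, last_seen = "", 0, {}
    for i, ch in enumerate(line):
        if last_seen.get(ch, -1) >= start:
            start = last_seen[ch] + 1
        last_seen[ch] = i
        if i - start + 1 > len(best):
            best = line[start:i + 1]
    return best


print(repeated_pairs(CODE_LINES))
print(palindromic_triplets(CODE_LINES))       # ['ABA', 'BAB', 'AIA', 'TMT']
print(longest_distinct_run("MTBIMPANETP"))    # TBIMPANE
```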

Features including, but not limited to, these will be searched for in selected eBooks, categorical lists and other internet resources to gauge how common each feature is in different languages and categories (task 6A). Depending on the results of this investigation, further study will most likely focus on features of the code that are inherently more common in certain categories or languages. The structure of this follow-up task (6B) will be determined from the results of task 6A.
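For task 6A, one possible way to quantify how common a feature is in a given language is to sample random windows from an eBook, each the same length as the code, and record the fraction that contain at least as many occurrences of the feature as the code does. The Java sketch below does this for palindromic triplets. The window length, threshold, number of samples and random seed are all illustrative parameters, and the approach is an assumption rather than the project's stated method.

```java
import java.util.Random;

/** Hypothetical sketch: how often does a feature occur in random windows of a reference text? */
public class FeatureCommonality {

    /** Number of positions i where s[i] == s[i + 2]. */
    static int countPalindromeTriplets(CharSequence s) {
        int n = 0;
        for (int i = 0; i + 2 < s.length(); i++) {
            if (s.charAt(i) == s.charAt(i + 2)) {
                n++;
            }
        }
        return n;
    }

    /** Keeps only A-Z (assumption: the code is compared as a string of letters only). */
    static String lettersOnly(String text) {
        return text.toUpperCase().replaceAll("[^A-Z]", "");
    }

    /** Fraction of random windows containing at least minCount palindromic triplets. */
    public static double fractionAtLeast(String reference, int windowLength,
                                         int minCount, int samples, long seed) {
        String letters = lettersOnly(reference);
        if (letters.length() < windowLength) {
            throw new IllegalArgumentException("reference text is shorter than the window");
        }
        Random rng = new Random(seed); // fixed seed so runs are repeatable
        int hits = 0;
        for (int k = 0; k < samples; k++) {
            int start = rng.nextInt(letters.length() - windowLength + 1);
            String window = letters.substring(start, start + windowLength);
            if (countPalindromeTriplets(window) >= minCount) {
                hits++;
            }
        }
        return (double) hits / samples;
    }
}
```

A fraction close to zero for a particular language would suggest that the feature is rare in that language, which is the kind of result that would shape task 6B.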

Project management

Process Structure, Task Allocation and Schedule

The block diagram below shows the individual tasks under each hypothesis, which have been allocated roughly evenly among the group members. Note that the outcomes of tasks 6A and 6B may change the process structure, as their results may indicate, for example, the original language or a cipher key.

[Figure: Task allocation block diagram]

The following Gantt chart shows the approximate time allocation of each task. Note the dependencies: the final report depends on most other tasks, but some have not been treated as dependencies because they may prove unnecessary given the outcomes of other tasks. Task 2A is by far the largest, as it is essentially a separate task for each cipher scheme. Splitting this task between the group members allows greater flexibility: if some cipher schemes quickly lead to positive or negative results, the division of schemes under task 2A can be altered to rebalance the workload.

[Figure: CDR Gantt chart]

Risk management

There is a considerable amount of uncertainty in this project's structure, since it depends heavily on the results of specific tasks. This introduces an inherent level of risk into the process structure. To manage this, the process has been structured so that the workload can be rebalanced between group members, with some discretion in task 2A as to who does what. At the same time, each group member has a series of tasks of their own, which minimises the risk of overlap in the work done by group members.

This project carries minimal occupational health and safety risk, as most of the tasks involve developing software modules and collecting data.

Budget

  • Cake for police historians so that they will talk to us - $24.95
  • DVD of 1978 ABC documentary - $88.00
  • Retrieval from the National Archives - $30.00
  • Coronial Inquest - $0.00 (it was free)

Conclusion

Ultimately, the best outcome of this project would be to discover which of our hypotheses is correct and to decipher the text. However, the project is (deliberately) not structured in a way that is likely to produce this result. Realistically, the project aims to be a broad preliminary investigation into the possible meanings of the code, which can hopefully guide future researchers in the right direction. There is some inherent uncertainty in the project's structure, as some tasks depend on the results of others; nevertheless, we have done our best to estimate its likely course.


Appendix A: Software modules

The software being developed to assist in the examinations is divided into modules, each corresponding to one logical task; different combinations of these modules will be used depending on the task at hand. The following modules have been or will be created for this project:

  • One module for each cipher scheme that can encipher and decipher a text file (such as an eBook)
  • One module to read every letter of a text file (such as an eBook) and return the letter frequencies of the text
  • One module to read a text file and return the frequency distribution of the initial letters of its words
  • One module to read a random contiguous set of 50 letters from a text file
  • Others as they become necessary
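The letter-frequency, initial-letter and random-sample modules listed above might look something like the Java sketch below. It assumes English-style text in which a word is a run of the letters A to Z, and an apostrophe inside a word does not start a new word. Accented letters are simply ignored, which would need revisiting for foreign-language eBooks. All class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Random;

/** Hypothetical sketch of the text-reading modules (index 0 = A, ..., 25 = Z). */
public class TextModules {

    /** Frequency of each letter A-Z in the text; other characters are ignored. */
    public static int[] letterFrequency(String text) {
        int[] counts = new int[26];
        for (char c : text.toUpperCase().toCharArray()) {
            if (c >= 'A' && c <= 'Z') {
                counts[c - 'A']++;
            }
        }
        return counts;
    }

    /** Frequency of the first letter of each word. */
    public static int[] initialLetterFrequency(String text) {
        int[] counts = new int[26];
        boolean inWord = false;
        for (char c : text.toUpperCase().toCharArray()) {
            if (c == '\'' && inWord) {
                continue; // apostrophe inside a word (e.g. DON'T) does not end the word
            }
            boolean isLetter = c >= 'A' && c <= 'Z';
            if (isLetter && !inWord) {
                counts[c - 'A']++; // first letter of a new word
            }
            inWord = isLetter;
        }
        return counts;
    }

    /** A random contiguous run of `length` letters, after removing non-letters. */
    public static String randomLetters(String text, int length, Random rng) {
        String letters = text.toUpperCase().replaceAll("[^A-Z]", "");
        if (letters.length() < length) {
            throw new IllegalArgumentException("text has fewer than " + length + " letters");
        }
        int start = rng.nextInt(letters.length() - length + 1);
        return letters.substring(start, start + length);
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java TextModules <ebook.txt>");
            return;
        }
        String book = Files.readString(Path.of(args[0]), StandardCharsets.UTF_8);
        int[] initials = initialLetterFrequency(book);
        for (int i = 0; i < 26; i++) {
            System.out.println((char) ('A' + i) + " " + initials[i]);
        }
        System.out.println(randomLetters(book, 50, new Random()));
    }
}
```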

These modules will be implemented in the Java programming language.
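To allow the cipher modules to be swapped freely, each could implement a common Java interface. The sketch below assumes a hypothetical `CipherModule` interface and shows one implementation, the Vigenère cipher[8], which works over A to Z and discards non-letters. It is illustrative only and is not one of the project's actual modules.

```java
/** Hypothetical common interface for the per-cipher modules. */
interface CipherModule {
    String encipher(String plaintext);
    String decipher(String ciphertext);
}

/** Example module: Vigenère cipher over A-Z; non-letters are dropped (an assumption). */
class VigenereModule implements CipherModule {
    private final String key;

    VigenereModule(String key) {
        this.key = key.toUpperCase().replaceAll("[^A-Z]", "");
        if (this.key.isEmpty()) {
            throw new IllegalArgumentException("key must contain at least one letter");
        }
    }

    /** Shifts each letter by the matching key letter: +1 to encipher, -1 to decipher. */
    private String shift(String text, int direction) {
        String letters = text.toUpperCase().replaceAll("[^A-Z]", "");
        StringBuilder out = new StringBuilder(letters.length());
        for (int i = 0; i < letters.length(); i++) {
            int p = letters.charAt(i) - 'A';
            int k = key.charAt(i % key.length()) - 'A'; // key repeats over the text
            out.append((char) ('A' + Math.floorMod(p + direction * k, 26)));
        }
        return out.toString();
    }

    @Override
    public String encipher(String plaintext) {
        return shift(plaintext, 1);
    }

    @Override
    public String decipher(String ciphertext) {
        return shift(ciphertext, -1);
    }
}

public class CipherDemo {
    public static void main(String[] args) {
        CipherModule v = new VigenereModule("LEMON");
        String c = v.encipher("attack at dawn");
        System.out.println(c);             // LXFOPVEFRNHR
        System.out.println(v.decipher(c)); // ATTACKATDAWN
    }
}
```

With the key LEMON, the plaintext "attack at dawn" enciphers to LXFOPVEFRNHR and deciphers back to ATTACKATDAWN. Other schemes would then only need to provide their own `encipher` and `decipher`.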

References

  1. Inquest Into the Death of a Body Located at Somerton on 1st December 1948, State Records of South Australia, GX/0A/0000/1016/0B, 17th & 21st June 1949.
  2. S. Littlemore, "The Somerton Beach Mystery" (documentary), Inside Story, 24 August 1978.
  3. "About ASIO. Significant Events in ASIO's History," 10 May 2009, <>
  4. B. McKay, D. Bar-Natan, M. Bar-Hillel, and G. Kalai, "Solving the Bible code puzzle," Statistical Science, vol. 14, no. 2, pp. 150–173, 1999.
  5. "BBC: Horizon: The Bible Code - transcript," 11 May 2009, <>
  6. S. Orr, "Riddle of the end," The Sunday Mail, p. 76, 11 January 2009.
  7. S. Singh, The Code Book, ISBN 1-85702-889-9, p. 7, 1999.
  8. R. Morelli, "The Vigenere [sic] Cipher," Historical Cryptography Web Site, Trinity College, <> (6 September 2007).
  9. C. E. Shannon, "A Mathematical Theory of Communication," Bell System Technical Journal, vol. 27, pp. 379–423, 623–656, July, October 1948.
  10. "Friedrich W. Kasiski," Encyclopædia Britannica, 2009, Encyclopædia Britannica Online, 10 May 2009, <>
  11. T. Schürmann and P. Grassberger, "Entropy Estimation of Symbol Sequences," Chaos, vol. 6, no. 3, pp. 414–427, 1996.

See also