Authorship detection: 2011 group

== Supervisors ==
*[[Derek Abbott|Prof Derek Abbott]]
*[[Matthew Berryman|Dr Matthew Berryman]]
*[[Brian W.-H. Ng|Dr Brian Ng]]
*[[Maryam Ebrahimpour|Mrs Maryam Ebrahimpour]]
===Collaborators===
*[[Francois Huchet|François-Pierre Huchet]], ITII Pays de la Loire, Nantes, France.
*[[Talis Putnins]], BICEPS, Latvia.
*[[J. José Alviar]], University of Navarra, Spain


==2011 Students==
*[[Jie Dong]]
*[[Leng Tan]]
*[[Yan Xie]]
*[[Tien-en Phua]]
*[[Kai He]]
*[[Zhaokun Wang]]


== Weekly progress and questions ==

===Semester 2, Week 1===


====Jie Dong====
'''Progress and Status this week:'''
# First meeting with Derek, Brian and Maryam, together with the other group members Leng and Tien-en.
# Derek, Brian and Maryam introduced us to the basic idea of this data mining project.
# The idea of authorship detection was introduced.
# Several applications to which data mining techniques can be applied were mentioned.
# Research by past-year students was mentioned, and Maryam sent us several past-year research reports together with the code.

'''Plan and Goals for new week:'''
# Prepare for the proposal seminar.
# Read the research reports from past-year students.
# Understand the project handbook.

====Yan Xie====
'''Progress and Status this week:'''
# All team members had the first meeting with the supervisor Derek and co-supervisors Brian and Maryam.
# The basic idea and various applications were introduced by Derek.
# Discussed previous attempts and further exploration at the meeting.
# Researched the topic of authorship detection and data mining.
# Reviewed the research of past-year students.
# Researched the project, especially SVM and some algorithms.

'''Plan and Goals for new week:'''
# Further study of the past research.
# Search for suitable algorithms.
# Have a group meeting with the other members Kai and Zhaokun.


====Leng Tan====
'''Progress and Status This Week'''

# The first meeting for the final year project was held with the supervisor, Prof Derek Abbott, the co-supervisor, Dr Brian Ng, and Mrs Maryam, along with the team members.
# The initial project scope was introduced and the general aim of the project was discussed.
# The basic idea behind authorship detection techniques was shown as well.
# Several ideas for future applications of this project were highlighted.
# Some hints on getting started were given, namely to read Talis's final year report, which will be provided by Mrs Maryam.
# We were reminded of the first milestone of the project, the proposal seminar.

''' Plan and Goals for Next Week '''

# Fully read and understand Talis's report.
# Have a brief look at the code that will be supplied by Mrs Maryam.
# Do some research on the background of controversial attribution cases such as the works of William Shakespeare, the Federalist Papers and the Epistle to the Hebrews.
# Read through the 2010 project handbook to get a rough idea of all the project milestones, focusing on the proposal seminar.

====Kai He====
'''Progress and Status This Week'''

# Met with the supervisor Derek and co-supervisors Brian and Maryam.
# The supervisors introduced the concept of this project and discussed the outcomes from last year's project students.
# Research on authorship detection.
# Study of the previous algorithms.

''' Plan and Goals for Next Week '''

# Literature search training will be held next week.
# Have a meeting with team members.
# Research on various methods.
# Read papers on authorship detection.
 
====Zhaokun Wang====
'''Progress and Status This Week'''

# First meeting with Derek and Brian and the other group members Kai and Yan.
# Derek and Brian introduced the outline and background of this project.
# Based on previous years' research, Derek gave some suggestions for the follow-on research.
# Derek passed the previous research resources to us.

''' Plan and Goals for Next Week '''

# Read through and understand the previous research reports.
# Research on the controversies.
# Research on various methods.
# Prepare the proposal seminar.

====Tien-en Phua====
'''Progress and Status this week:'''
# Met up with the project supervisor, Prof Derek Abbott, the co-supervisor, Dr Brian Ng, and Mrs Maryam.
# Derek discussed the concept behind authorship detection.
# Derek explained the use of multi-dimensional graphs to link a disputed text to a known author.
# Discussed possible future applications. Brian suggested code plagiarism and possibly music.
# Was provided by Maryam with other students' projects and started to go through the report by Talis.
# Went through the FYP project handbook.

'''Plan and Goals for new week:'''
# Identify the methods Talis used in his report.
# Research on various methods.
# Read up on past works regarding authorship detection.
# Research on the controversies.


===Semester 2, Week 2===


====Jie Dong====
'''Progress and Status this week:'''
# Three methods are chosen for this project: word frequency, word recurrence interval, and the trigram Markov model
# Reading material on SVM (SVM tutorial)
# Experimenting with the SVM software in Matlab
# Prepare slides for the proposal seminar presentation on the project aim, background, and part of the project process

'''Plan and Goals for new week:'''
# Combine slides with the other group members and do some modification
# Send the slide draft to the supervisors for feedback
# Do more modification
# Presentation on Thursday

====Yan Xie====
'''Progress and Status this week:'''
# Review last year's three methods: word frequency, word recurrence interval and the trigram Markov model
# Ongoing research
# Attend a literature search training session with the other members
# Discuss the algorithms chosen for this project at the meeting
# Prepare for the proposal seminar in week 3

'''Plan and Goals for new week:'''
# Modify the slides and send them to the supervisors
# Prepare the presentation
# Analyse the chosen algorithms
# Discuss the project management with the other members at the next meeting


====Leng Tan====
====Kai He====
'''Progress and Status This Week'''


# identified 3 methods that was mentioned by Talis.
# Attend the literature search training
# have a brief knowledge and information of the controversial issue.
# Identify the algorithms used in this project
# have a brief idea on the upcoming propose seminar.
# Prepare the proposal algorithms and complete the slides for presentation
# Further reading on research papers
 
''' Plan and Goals for Next Week '''


'''Plan and Goals for Next Week '''
# Set up the Work Breakdown Structure, Milestones, Gantt Chart and Project Budget
# Send the presentation slides to supervisors
# Prepare the presentation of proposal seminar next week
# Analyse the proposal algorithms used in this project
# Research and discuss the classifier
# Have a team meeting with the other members


# research on SVM.
====Zhaokun Wang====
# research on the background history of the project
'''Progress and Status This Week'''
# research on the different techniques used before in history
# prepare project proposal


====Tien-en Phua====
# Abstract on proposal seminar.
'''Progress and Status this week:'''
# Allocate seminar role for each group member.
# Identify the three methods that Talis applied in his project, namely Word Frequency, Word Recurrence Interval and the Trigram Markov model
# Prepare outline PowerPoint slides.
# Briefly understand how the three methods work
# Identify the brief idea on the project.
# Identify the past works done by other researchers.
# Identify the three main controversies, namely the Federalist Papers, the Shakespeare plays and the Epistle to the Hebrews


'''Plan and Goals for new week:'''  
 
# Prepare for Project Proposal
''' Plan and Goals for Next Week '''
# Develop Gantt chart, project budget and risk analysis
# Present proposal seminar.
# Identify major milestones in the project
# Identify the methods for the project.
# Write up on the controversies
# Identify classifiers on project.
# Further research on three methods


===Semester 2, Week 3===


====Jie Dong====
====Yan Xie====
'''Progress and Status this week:'''
# We were introduced to Matthew and François-Pierre Huchet who are also participating in this project in Monday's meeting.
# Complete the Gantt Chart, Work Breakdown Structure, Milestones, Budget and risk analysis with the other team members
# Came up with draft(first whole draft) of proposal presentation slides. Discuss about the role of each person.
# Modifications on the slides of presentation
# Send slides to Brian and Matthew for feedback
# Prepare the presentation
# modify our slides
# Introduce the Common N-grams
# Presentation on Thursday


'''Plan and Goals for new week:'''
# Do more researches for three methods and SVM
# Research on SVM classifier for the algorithm Common N-grams used
# Prepare for stage 1 design document
# Start to design the Common N-grams
# Make stage one progress report template


====Leng Tan====
====Kai He====
'''Progress and Status This Week'''


# Modify the slides after getting a feedback from Brian
# Prepare the presentation this week
# Identity classifiers used with the algorithms
# Plan the upcoming goal for the proposal algorithms
# Start to design the method: Maximal Frequent Word Sequence
''' Plan and Goals for Next Week '''
# Have a detail review on the method of Maximal Frequent Word Sequence
# Understand the classifier of Naïve Bayes
# Prepare the stage one progress report
====Zhaokun Wang====
'''Progress and Status This Week'''


# rough draft slides on the past research have been done for the propose seminar.
# Discuss about proposal slides with Brian.
# a comparison list of the different technique is done.
# Modify the slides.
# start research on SVM that is to be added in the slides with the different technique
# Present proposal seminar.
# had a meeting with supervisors, and was introduced to Dr Matthew.
# focus 100% on the propose seminar.


'''Plan and Goals for Next Week'''


# have a more detailed review on the 3 methods.
''' Plan and Goals for Next Week '''
# read the criteria for the stage 1 design document.
 
====Tien-en Phua====
'''Progress and Status this week:'''  
# Prepare for project proposal
# Developed the Gantt chart, project budget and risk analysis
# Developed slides for the milestones and controversies
# Research on SVM (Support Vector Machine)
# Gained a better understanding of Word Frequency, WRI and the Trigram Markov model


'''Plan and Goals for new week:'''
# Further researches about methods.
# Proceed to develop Stage 1 Design Document
# Prepare for stage one report
# Understand SVM
# Develop Work Breakdown Structure
# Delegate task to individual members
# Read up on the other 4 reports


===Semester 2, Week 4===


====Jie Dong====
====Yan Xie====
'''Progress and Status this week:'''
# Work on the method of Common N-grams by using Java
# Fully read paper of the algorithm and classifier
# Discuss the design of Common N-grams with the other members
# Delegate tasks of the stage one progress report to individual members


'''Plan and Goals for new week:'''
# Complete parts of Executive Summary, Previous Studies, Coding Requirements and Tasks on Stage Two Report on the stage one progress report
# Modify Work Breakdown Structure, Risk Assessment, Milestones, Monitoring Scheme and Proposed Budget
# Complete writing on Common N-grams and SVM
# Write up the draft of the stage one progress report and send it to supervisors for feedback
# Modification on stage one progress report until deadline
====Kai He====
'''Progress and Status This Week'''
# In this project, we plan to have each person working on one method -- I am working on Trigram Markov model
# Read past reports for trigram Markov information
# Make stage 1 design document template
# Write project aim, background, and project approach in design document


'''Plan and Goals for Next Week'''
# Researches on the method of Maximal Frequent Word Sequence have completed
# Modify the design document draft
# Coding on Maximal Frequent Word Sequence
# Send to supervisors for feedback
# Have a meeting with the other members to delegate the tasks of the stage one progress report
# More modification
# Write Project Background and Significance, Technical Background, Motivations and Key Requirements of the stage one progress report
# Prepare a tutorial on SVM for other group members
# Modify the stage one report with the criteria
# Grammar checking
 


====Leng Tan====
''' Plan and Goals for Next Week '''


'''Progress and Status This Week'''
# Coding on Maximal Frequent Word Sequence
# Complete my tasks on stage one report
# Send the draft to supervisors
# Modify and format


# research on the 3 methods have completed.
====Zhaokun Wang====
# fully read and understood the criteria for stage 1 design document.
'''Progress and Status This Week'''
# have a brief meeting with group members to delegate the tasks in preparing the stage 1 design document.


# Test previous methods.
# Compared with previous research, clarify and identify the methods and classifiers we use.
# Processing stage one report.


'''Plan and Goals for Next Week'''
''' Plan and Goals for Next Week '''


# do a rough draft on the tasks that is allocated.
# Finish stage one report.
# do a layout design for the document.
# Allocate the report roles for each group members.


====Tien-en Phua====
===Semester 2, Week 5===
'''Progress and Status this week:'''
# Develop Work Breakdown Structure
# Identify tasks required for the Stage 1 Design Document
# Broke down the tasks and assigned them to each member
# In the process of development of Stage 1 Design Document
# Further research on SVM and Word Frequency


'''Plan and Goals for new week:'''  
====Yan Xie====
# Complete write up on Word Frequency and SVM
'''Progress and Status this week:'''
# Complete Stage 1 Design Document
# Done my allocated parts of the stage one report
# Coding and further research on Word Frequency
# Attend a group weekly meeting within the team and discuss uncompleted sections
# Read up on the other 4 reports
# Help formatting
# Send the report draft to supervisors
# Modify the report after getting feedback from supervisors


===Semester 2, Week 5===
'''Plan and Goals for new week:'''
# Develop the method of Common N-grams
# Read papers
# Learn to use SVM


====Jie Dong====
====Kai He====
'''Progress and Status This Week'''


'''Progress and Status this week:'''
# Finish Project Background and Significance, Technical Background, Motivations and Key Requirements
# Write Input and Output Specifications, and Testing and Verification
# Help to write the part of Project Management
# Grammar checking and formatting
# Modification on the stage one progress report after getting feedback from supervisors
# Done the final version of the stage one progress report and submit
# Coding on Maximal Frequent Word Sequence


# Done abstract, project aim, background and significance
# Done description of data extraction part for Trigram Markov model in design document
# Feedback from supervisors on design document
# Final modification on design document
# Format the design document on wiki


'''Plan and Goals for Next Week:'''
''' Plan and Goals for Next Week '''
# Design on Trigram Markov model
# learn to use SVM
# a bit coding on trigram Markov model


====Leng Tan====
# Coding on Maximal Frequent Word Sequence
# Have a meeting with the other members discussing the upcoming goals
# Review papers


'''Progress and Status this week:'''
====Zhaokun Wang====
'''Progress and Status This Week'''


# Done Literature Review of design document
# Allocate stage one-report roles.
# Done description of data extraction part for WRI in design document
#Allocate research method: common N-gram for me.
# Done project approach and milestone for design document
# Allocate classifier method: dissimilarity calculation for me.
# added modified WBS in appendix
# Modify stage one report after feedback.
# done initial check and compilation of Design document


'''Plan and Goals for Next Week:'''


# start do rough design for WRI of data extraction in java
''' Plan and Goals for Next Week '''
# read SVM


==== Tien-en Phua ====
# Coding and developing N-gram
'''Progress and Status this week:'''
# Researching on dissimilarity
# Completed design document
#* Project Requirements
#* Description of data extraction of Function Word Frequency analysis
#* Project Budget
#* Background and Significance of Hebrews
#* Edited Gantt Chart, WBS to synchronise
#* Edited and grammar check etc
# Basic layout of software design for data extraction algorithm
# Wiki page
'''Plan and Goals for Next Week:'''
# Commence programming of algorithm using Java
# Read up on SVM


===Semester 2, Week 6===


====Yan Xie====
'''Progress and Status this week:'''
# Read the papers on the Common N-grams algorithm
# Sketch the overall structure of the Common N-grams program
# Review a paper on SVM
# The SVM classifier: still considering how to use the produced output text file as the input to the SVM
# Participate in the group meeting
# Programming the text file input and exception handling in Java

'''Plan and Goals for new week:'''
# Discuss the code with the team
# Coding on Common N-grams
# Design SVM

====Jie Dong====
'''Progress and Status this week:'''
# Research on the trigram Markov model
# Two models are proposed (see the sketch below):
#* Simple trigram Markov model: only considers the effect of trigrams in the text
#* Potential problem with the first model: sparse data; a new trigram appearing in the test text leads to poor cross-entropy
#* Second model: hidden Markov model on trigrams: not only trigram counts, but also unigram and bigram effects are taken into consideration; the transition probability is composed of all three probabilities
# The existence of punctuation and uppercase letters should be considered for text written in English

'''Plan and Goals for new week:'''
# Discuss the models with the supervisor
# SVM problem
# Programming the first model
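The second of the two models above combines unigram, bigram and trigram statistics into one transition probability. A minimal sketch of such an interpolated trigram probability is given below; the class name, tokenisation and weight values are illustrative assumptions rather than the group's actual Java implementation.

<syntaxhighlight lang="java">
import java.util.HashMap;
import java.util.Map;

/** Illustrative interpolated trigram model: P(w3 | w1 w2) is a weighted mix of
 *  trigram, bigram and unigram relative frequencies, so a trigram never seen in
 *  training no longer forces the probability (and the cross-entropy) to blow up. */
public class InterpolatedTrigramModel {
    private final Map<String, Integer> uni = new HashMap<>();
    private final Map<String, Integer> bi  = new HashMap<>();
    private final Map<String, Integer> tri = new HashMap<>();
    private int totalWords = 0;
    // Interpolation weights (assumed values; they must sum to 1).
    private final double lambdaTri = 0.6, lambdaBi = 0.3, lambdaUni = 0.1;

    /** Count unigrams, bigrams and trigrams from a whitespace-tokenised training text. */
    public void train(String text) {
        String[] w = text.toLowerCase().trim().split("\\s+");
        totalWords += w.length;
        for (int i = 0; i < w.length; i++) {
            uni.merge(w[i], 1, Integer::sum);
            if (i >= 1) bi.merge(w[i - 1] + " " + w[i], 1, Integer::sum);
            if (i >= 2) tri.merge(w[i - 2] + " " + w[i - 1] + " " + w[i], 1, Integer::sum);
        }
    }

    /** Interpolated probability of w3 following the bigram (w1, w2). */
    public double probability(String w1, String w2, String w3) {
        double pTri = ratio(tri.get(w1 + " " + w2 + " " + w3), bi.get(w1 + " " + w2));
        double pBi  = ratio(bi.get(w2 + " " + w3), uni.get(w2));
        double pUni = ratio(uni.get(w3), totalWords);
        return lambdaTri * pTri + lambdaBi * pBi + lambdaUni * pUni;
    }

    private double ratio(Integer count, Integer base) {
        return (count == null || base == null || base == 0) ? 0.0 : (double) count / base;
    }

    public static void main(String[] args) {
        InterpolatedTrigramModel m = new InterpolatedTrigramModel();
        m.train("today is a good day and today is a fine day");
        System.out.println(m.probability("today", "is", "a"));
    }
}
</syntaxhighlight>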


====Leng Tan====
====Kai He====
'''Progress and Status This Week'''


'''Progress and Status this week:'''
# Research on Maximal Frequent Word Sequence
# Done a design for the WRI code after discussion with group members.
# Develop the programing on Maximal Frequent Word Sequence
# written about 50% of the code for data extraction using WRI.
# Debugging
# read a bit on SVM but still don't understand it.
# Help the other members coding
 
''' Plan and Goals for Next Week '''
 
# Complete about 30% - 40% of the code for data extraction using Maximal Frequent Word Sequence
# Discuss classifiers
 
====Zhaokun Wang====
'''Progress and Status This Week'''
 
# Learning and coding on N-gram
# Debugging
 
''' Plan and Goals for Next Week '''


'''Plan and Goals for new week:'''
# Discussing within team about coding  
# finish the coding for WRI.
# Design classifier method
# try to get help for SVM.


====Tien-en Phua====
===Semester 2, Week 7===


====Yan Xie====
'''Progress and Status this week:'''
# Finish the design algorithm code in java for word function frequency (pseudo - code).
# Discuss the Common N-grams problems with the other members
# Start implementing the algorithm code.
# Finish about 50% of the code for data extraction using Common N-grams
# Code have been halfway done.
# Have a group meeting with the other two members reporting my current progress of extraction method of Common N-grams
# Introduce the stage two report


'''Plan and Goals for new week:'''
# Finish coding.
# Continue coding of Common N-grams
# Discuss about SVM problems.
# Participate the meeting about stage two report with the other members
# Try to figure out how to use SVM function in MATLAB


===Semester 2, Week 7===
====Kai He====
'''Progress and Status This Week'''
 
# Review the papers:
#* Algorithm for Maximal Frequent Sequences in Document Clustering
#* Experimenting with Maximal Frequent Sequences for Multi-Document Summarization
#* Discovery of Frequent Word Sequences in Text
# Done 30% of the code for data extraction using Maximal Frequent Word Sequence
# Review the paper of Augmenting Naïve Bayes Classifiers with Statistical Language Models
# Review the criteria of stage two report


====Jie Dong====
''' Plan and Goals for Next Week '''


'''Progress and Status this week:'''
# Coding and Debugging
# Reading chapter about Hidden Markov Chain of "Statistical language learning"
# Discuss implementation of output of data from Maximal Frequent Word Sequence to Naïve Bayes Classifiers
# Came up with my own test text to verify my code is working properly
# Prepare the stage two report
# Meeting with Brian discuss my current work, the current approach does not work efficiently


'''Plan and Goals for new week:'''
====Zhaokun Wang====
# The previous algorithm only considers the effect of trigram words. The result for a test paragraph contains a lot of useless information: about 70% of the trigrams appear only once, and only about 10% of the information is worth using in classification. After extracting the common trigrams from several test texts, few of them are left. Hence, an enhanced model, in which unigrams and bigrams are also taken into consideration, will be tested in the following week.
'''Progress and Status This Week'''
# SVM will also be used to test the results in the coming week. Investigating how to use the SVM functions in MATLAB, svmtrain and svmclassify (Bioinformatics Toolbox).
# Peer review assessment
====Leng Tan====


'''Progress and Status this week:'''
# Group meeting, discussing with other team members.
# Finish the Java coding for WRI technique in data extraction algorithm.
# Coding on N-gram
# Tested and verified that the code is working properly using a small test file. (text file with only few sentences)
# Structuring the stage two report
# Have a meeting with Brian discussing on the SVM input and output.


'''Plan and Goals for new week:'''
# Figure out SVM.
# Test and try out SVM on matlab using small test files.


====Tien-en Phua====
''' Plan and Goals for Next Week '''
'''Progress and Status this week:'''
# Completed coding for data extraction algorithm (DEA)
# Discuss implementation of output of data from DEA to SVM
# Analyse how other researchers analyse their data


'''Plan and Goals for new week:'''
# Keep coding N-gram
# Modification and refining of DEA code
# Group meeting about stage two report
# Continue analysis of how other researchers used this DEA for authorship attribution
# Begin to coding dissimilarity classifier
# Try applying data to SVM


===Semester 2, Week 8===
====Jie Dong====
 
====Yan Xie====
'''Progress and Status this week:'''
# Peer review assessment on the design document on "Audio assisted vision system"
# Coding of Common N-grams
# Discuss the project management
# Investigation on SVM in MATLAB
# Working on modified trigram model
# Participate a meeting discuss how to apply the generate data to the classifier


'''Plan and Goals for new week:'''
# Test my result of java program with SVM
# Complete software coding v1.0 at the end of Week 11
# Start to write the stage two report
# Review SVM from previous attempt
 
====Kai He====
'''Progress and Status This Week'''
 
# Coding and Debugging on Maximal Frequent Word Sequence
# Further research on Naïve Bayes
# Discuss the Naïve Bayes Classifier with the other members
 
''' Plan and Goals for Next Week '''
 
# Write the project management of the stage two report
# Continue coding and debugging
# Weekly meeting with the other team members
 
====Zhaokun Wang====
'''Progress and Status This Week'''
 
# Coding N-gram
# Group meeting about stage two report
# Try to begin coding dissimilarity classifier
# Researching on dissimilarity classifier
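The exact dissimilarity measure is not spelled out here; one relative-difference measure often used with Common N-gram author profiles is sketched below, under the assumption that each profile maps an n-gram to its normalised frequency. The class name and profile format are illustrative, not the group's code.

<syntaxhighlight lang="java">
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Illustrative dissimilarity between two author profiles, where a profile maps
 *  each n-gram to its normalised frequency in that author's training texts.
 *  Smaller values mean the profiles are more alike. */
public class ProfileDissimilarity {

    public static double dissimilarity(Map<String, Double> p1, Map<String, Double> p2) {
        Set<String> grams = new HashSet<>(p1.keySet());
        grams.addAll(p2.keySet());
        double d = 0.0;
        for (String g : grams) {
            double f1 = p1.getOrDefault(g, 0.0);
            double f2 = p2.getOrDefault(g, 0.0);
            double term = 2.0 * (f1 - f2) / (f1 + f2);   // relative difference, in [-2, 2]
            d += term * term;
        }
        return d;
    }

    public static void main(String[] args) {
        Map<String, Double> author = new HashMap<>();
        author.put("th", 0.04); author.put("he", 0.03); author.put("an", 0.02);
        Map<String, Double> disputed = new HashMap<>();
        disputed.put("th", 0.05); disputed.put("an", 0.02); disputed.put("in", 0.01);
        // The disputed text would be attributed to the author profile with the
        // smallest dissimilarity score.
        System.out.println(dissimilarity(author, disputed));
    }
}
</syntaxhighlight>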
 
 
''' Plan and Goals for Next Week '''


====Leng Tan====
# Write stage two report
# Group meeting
 
===Semester 2, Week 9===


====Yan Xie====
'''Progress and Status this week:'''
# Receive a stage 1 design document on "Audio Assisted Vision System for Visually Impaired People".
# Add some new classes on the code of Common N-grams
# The document was fully read and notes were taken on the presentation and various other perspectives.
# Code modification
# The document was reviewed and a formal peer review report was produced.
# Weekly meeting with the other team members to report the progress of Common N-grams coding
# Investigation on Matlab for SVM was halted for a moment due to the peer review report.
# Write parts of Project Objectives, Background, Algorithm Programming and Project Management on the stage two report
# Get feedback of the stage one progress report from Brian


'''Plan and Goals for new week:'''
# Figure out SVM.
# Complete software coding v1.0 at the end of Week 11
# Test and try out SVM on matlab using small test files.
# Continue code modification
# Testing
 
====Kai He====
'''Progress and Status This Week'''


====Tien-en Phua====
# Done 60% of the code for data extraction using Maximal Frequent Word Sequence
# Help debugging the code of the common N-grams
# Report the code progress so far in the team meeting
# Set up the upcoming goals: Software Coding V1.0, Stage 2 Report Due, Software Testing V1.0 and Software Coding V2.0
# Start to design the training process and classification process using Naïve Bayes Classifier
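How the Naïve Bayes training and classification steps might look is sketched below: a minimal multinomial Naïve Bayes over extracted word-sequence features, with add-one smoothing. The string-per-feature representation and class name are assumptions for illustration, not the group's actual design.

<syntaxhighlight lang="java">
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Minimal multinomial Naive Bayes over string-valued features (e.g. extracted
 *  word sequences). Uses add-one smoothing and log-probabilities. */
public class NaiveBayesSketch {
    private final Map<String, Map<String, Integer>> counts = new HashMap<>(); // author -> feature -> count
    private final Map<String, Integer> totals = new HashMap<>();              // author -> total feature count
    private final Map<String, Integer> docs = new HashMap<>();                // author -> number of training texts
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    /** Training step: accumulate feature counts for one text of a known author. */
    public void train(String author, List<String> features) {
        Map<String, Integer> c = counts.computeIfAbsent(author, a -> new HashMap<>());
        for (String f : features) {
            c.merge(f, 1, Integer::sum);
            totals.merge(author, 1, Integer::sum);
            vocabulary.add(f);
        }
        docs.merge(author, 1, Integer::sum);
        totalDocs++;
    }

    /** Classification step: return the author with the highest posterior log-probability. */
    public String classify(List<String> features) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String author : counts.keySet()) {
            double score = Math.log((double) docs.get(author) / totalDocs); // prior
            Map<String, Integer> c = counts.get(author);
            int total = totals.get(author);
            for (String f : features) {
                int n = c.getOrDefault(f, 0);
                score += Math.log((n + 1.0) / (total + vocabulary.size())); // add-one smoothing
            }
            if (score > bestScore) { bestScore = score; best = author; }
        }
        return best;
    }

    public static void main(String[] args) {
        NaiveBayesSketch nb = new NaiveBayesSketch();
        nb.train("Hamilton", List.of("executive power", "national bank", "executive power"));
        nb.train("Madison", List.of("federal convention", "state legislatures"));
        System.out.println(nb.classify(List.of("executive power", "national bank")));
    }
}
</syntaxhighlight>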
 
''' Plan and Goals for Next Week '''
 
# Write the parts of Introduction, Objectives, Background, Algorithm Definition, Work Breakdown Structure, Milestones and Budgets on the stage two report
# Choose some simple text files to test
# Further research on the classifier of Naïve Bayes
 
====Zhaokun Wang====
'''Progress and Status This Week'''
 
# Coding and debugging N-gram
# Writing stage two report
# Made a small adjustment to the schedule
# Group meeting with group member and report the stages up to now
 
 
''' Plan and Goals for Next Week '''
 
# Writing the stage two report
# Developing on dissimilarity classifier
# Testing
 
===Semester 2, Week 10===
 
====Yan Xie====
'''Progress and Status this week:'''
# Complete the coding of Data Extraction Algorithm. Able to load file, remove punctuations, create a new output file for Support Vector Machine input
# Done most code of the common N-grams
# Review Peer Document and did some research on the principles of echolocation performed by bats to understand the document
# Delete the unused inner classes
# Completed Peer Review on Audio Assisted Vision System For Visually Impair People
# Discuss SVM with the other team members


'''Plan and Goals for new week:'''
# Apply the generated data by the data extraction algorithm to Support vector machine
# Complete software coding v1.0 at the end of Week 11
# Determine progress of project and review schedule.
# Figure out SVM
# Try to test the code using some simple text file
# Write the stage two report


===Semester 2, Week 9===
====Kai He====
====Jie Dong====
'''Progress and Status This Week'''
'''Progress and Status this week:'''
 
# The hidden Markov model is implemented in Java, and the program produces a table containing probability information for the common trigrams found in the input texts. The current problem is that, because all the words appearing in the texts are fed into the program, there are few common trigrams across a given number of input texts. For example, with a total of 20 input texts from two authors, the number of trigrams they have in common is just one. I also set the program to allow only part of the texts to share a common trigram, assigning zero probability for the rest, but the result is still not effective.
# Write the stage two report
# Read through Talis's trigram description and code; I found that he simplified the method and extracted the key information by deleting the non-key words. Testing his idea in Java, I found it does extract a lot more information than my approach; however, a question it raises is whether it would reduce classification accuracy, since it changes the original text into a different one. This simplification needs to be validated.
# Complete the coding of Maximal Frequent Word Sequence
# The results produced by the extraction algorithm are fed into the MATLAB SVM functions (svmtrain and svmclassify), and this shows my extraction algorithm is not working properly: the predicted author for the chosen texts is sometimes correct and sometimes not. As for the SVM itself, it only supports two-class classification, and multi-class classification produces an error; in addition, it can only plot the SVM structure for two-dimensional data. Hence, more capable SVM toolboxes should be studied.
# Working on modified Maximal Frequent Word Sequence
# Test efficiency using different input texts


'''Plan and Goals for next week:'''
# GUI design
# Test efficiency using different groups of input texts
# Try another SVM toolbox from: http://asi.insa-rouen.fr/enseignants/~arakotom/toolbox/index.html


====Leng Tan====
''' Plan and Goals for Next Week '''
'''Progress and Status this week:'''
# A basic SVM code which receives a text file input has been produced.
# The SVM code needs two training data groups and a number of test data groups.
# The standardised format for the input to the SVM was decided by the team members (see the sketch below).
# The input format is an MxN matrix where the first column is the author and the subsequent columns are the data (in my case, standard deviations).
# The initial data uses 20 standard deviation columns.
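The agreed M×N input format (numeric author label in the first column, 20 standard-deviation features in the remaining columns) could be written out for MATLAB roughly as below. The file name and feature source are placeholders, not the group's actual code.

<syntaxhighlight lang="java">
import java.io.IOException;
import java.io.PrintWriter;
import java.util.List;

/** Writes the agreed SVM input format: one row per text, the first column is a
 *  numeric author label and the remaining columns are the extracted features
 *  (here, 20 standard deviations per text). The matrix can then be loaded in
 *  MATLAB and split into labels and features for svmtrain/svmclassify. */
public class SvmMatrixWriter {

    public static void write(String path, List<Integer> authorLabels,
                             List<double[]> featureRows) throws IOException {
        try (PrintWriter out = new PrintWriter(path)) {
            for (int i = 0; i < featureRows.size(); i++) {
                StringBuilder row = new StringBuilder();
                row.append(authorLabels.get(i));             // first column: author label
                for (double v : featureRows.get(i)) {
                    row.append(',').append(v);               // subsequent columns: features
                }
                out.println(row);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Two hypothetical texts with 20 standard-deviation features each.
        double[] text1 = new double[20];
        double[] text2 = new double[20];
        for (int i = 0; i < 20; i++) { text1[i] = 1.0 + i * 0.1; text2[i] = 2.0 - i * 0.05; }
        write("svm_input.csv", List.of(1, 2), List.of(text1, text2));
    }
}
</syntaxhighlight>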


'''Plan and Goals for next week:'''
# Modify code of Maximal Frequent Word Sequence
# The SVM does predict the author wrongly at times, and this needs to be resolved.
# Design the Naïve Bayes classifier
# This might be due to insufficient training data.
# Report the progress in the team meeting
# Further testing is required.
# Continue writing the stage two report
# Might consider implementing a GUI.
# Need to have a meeting with supervisors on progress and GUI implementation (can the Java and Matlab GUIs be combined?)


====Tien-en Phua====
====Zhaokun Wang====
'''Progress and Status this week:'''
'''Progress and Status This Week'''
# Research for statistical software for obtaining the covariance of data [http://www.statgraphics.com/ StatGraphics]
# Downloaded and installed the chosen software and attempted to operate the program
# Research on a book discussing the possible author of Hebrews [http://orders.koorong.com/search/product/view.jhtml?code=9780805447149 Nacsbt: Lukan Authorship Of Hebrews]
'''Plan and Goals for next week:'''
# Obtain the covariance of the data
# Check to see if data extraction algorithm produce similar results as Talis
# Produce code to "chop" all text file to a specific length for analysis
# Input data to SVM and observe the outcome
# Combine functions for analysis
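The "chopping" step could be as simple as splitting each text into fixed-size word chunks so that all samples fed to the classifier have the same length. The sketch below is illustrative; the chunk size and file name are assumptions.

<syntaxhighlight lang="java">
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Chops a text into fixed-length chunks (measured in words) so that every
 *  sample given to the classifier has the same length. */
public class TextChopper {

    public static List<String> chop(String text, int wordsPerChunk) {
        String[] words = text.trim().split("\\s+");
        List<String> chunks = new ArrayList<>();
        for (int start = 0; start + wordsPerChunk <= words.length; start += wordsPerChunk) {
            chunks.add(String.join(" ", Arrays.copyOfRange(words, start, start + wordsPerChunk)));
        }
        return chunks; // any trailing words shorter than a full chunk are discarded
    }

    public static void main(String[] args) throws IOException {
        String text = Files.readString(Path.of("hebrews.txt")); // placeholder file name
        List<String> chunks = chop(text, 1000);                 // e.g. 1000-word samples
        System.out.println(chunks.size() + " chunks produced");
    }
}
</syntaxhighlight>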


===Semester 2, Week 10===
# Testing N-gram code and debugging
====Jie Dong====
# Writing stage two report
'''Progress and Status this week:'''
# Coding dissimilarity classifier
# Original JAVA program is re-built in a standard eclipse project
# Group meeting
# Delete Transition class, no longer used
# Change three classes (State, Gram, Record) to inner classes correspondingly
# Reduce original three main methods in separate class to only one in Driver class
# Move methods for User inputs to Driver class, including parameters and paths
# Add three header lines to Java program output: number of texts, number of disputed texts, number of trigram used
'''Plan and Goals for next week:'''
# Standardise three algorithms into one project folder
# Use same training data, unknown data to test three extraction algorithms
# Compare their accuracies in different situations(number of key words, number of texts,etc)


====Leng Tan====
'''Progress and Status this week:'''
# had a meeting with the supervisors and report on the progress of the project.
# SVM code is remain the same for the time being.
# A tabled results should be produced to compare the difference between each data extraction algorithm.
# the main idea of the progress report is discussed.


'''Plan and Goals for next week:'''
# A standardise template to combine all 3 data extraction algorithm was discussed.
# WRI code need to be slightly modified.
# need to plan the initial design for the GUI.


====Tien-en Phua====
''' Plan and Goals for Next Week '''
'''Progress and Status this week:'''
# Modify code to accept multiple inputs
# Extract out federalist papers for testing on support vector machine using function word analysis
# Meeting with supervisors on Wednesday for progress updates and guidance on next step
# Commencement of progress report


'''Plan and Goals for next week:'''
# Finish coding on N-gram
# Produce a table of result displaying the accuracy of the algorithm with SVM Kernel function
# Coding on dissimilarity classifier
# Complete progress report, project background, project specification, progress thus far and project management
# Writing report  
# Combine the three algorithm together into a single driver file
# Group meeting to report N-gram coding
# Discuss and design possible implementation of a GUI


===Semester 2, Week 11===
====Jie Dong====
 
====Yan Xie====
'''Progress and Status this week:'''
# Update progress report
# Complete software coding v1.0 of the Common N-grams
# JAVA program modification:
# Using my own text to verify this code is working properly
#* Sort list of files read in according to their name order
# Compare using a small test file with a large test file
#* Replace the manual parameter setup with automatically reading in the data, forming the training set and testing set according to the three header lines
# Begin by building large sets of training data and testing data by randomly collecting extracted features from Author Profiles on SVM
# Done the draft of the stage two report
 
'''Plan and Goals for new week:'''
# Modify the stage two report
# Submit the stage two report
# Use same training data, unknown data to test two extraction algorithms
 
====Kai He====
'''Progress and Status This Week'''
 
# Complete the draft of the stage two report
# Grammar checking and formatting
# The output of the Maximal Frequent Word Sequence code is not correct; modification is needed
 
 
''' Plan and Goals for Next Week '''
 
# Deliver the stage two report
# Complete the code of Maximal Frequent Word Sequence
# Test the output
# Have a meeting discuss the upcoming goals


'''Plan and Goals for next week:'''
====Zhaokun Wang====
# Write a standard document to combine our java extraction program together
'''Progress and Status This Week'''
# Complete Progress report


====Leng Tan====
# Modify and finish N-gram
'''Progress and Status this week:'''
# Testing N-gram code using training texts
# Do progress report.
#Coding dissimilarity classifier
#Working on stage two report


'''Plan and Goals for next week:'''
# catch up on assignments and prepare for exams.


====Tien-en Phua====
''' Plan and Goals for Next Week '''
'''Progress and Status this week:'''
# Update of progress report


'''Plan and Goals for next week:'''
# Modify stage two report
# Complete 4 upcoming assignment
# Using training data to test N-gram coding
# Prepare for power system quiz


===Semester 2, Week 12===
====Jie Dong====
 
====Yan Xie====
'''Progress and Status this week:'''
# Rough draft for java extraction program standard and send other group member the standard
# Submit the stage two report and send it to supervisors
# Modify progress report and upload to Wiki
# Report my individual work done so far
# Report the code of the common N-grams completed and tested
# Report the progress of SVM
# Discuss the upcoming goals with the other members


'''Plan and Goals for next week:'''
'''Plan and Goals for new week:'''
# Stop project for a period of time to prepare for exams
# Prepare for exams


====Leng Tan====
====Kai He====
'''Progress and Status this week:'''
'''Progress and Status This Week'''
# Assignments due this week are completed.
# Send my stage two report to supervisors
# Weekly meeting with the other team members to report the progress of the project
 
''' Plan and Goals for Next Week '''
 
# Stop project
# Work on exams
 
====Zhaokun Wang====
'''Progress and Status This Week'''


'''Plan and Goals for next week:'''
# Submit stage two report
# Stop project as exams are coming.
# Group meeting to report progress
# Coding on dissimilarity classifier


====Tien-en Phua====
''' Plan and Goals for Next Week '''
'''Progress and Status this week:'''
# Completed all assignments due this week


'''Plan and Goals for next week:'''
# None (prepare about final exam)
# Need to prepare for exams. SWOT week next week.
* Project will "pause" till after exam period, 20 Nov 2010, thereafter the team will be working individually back in their home country and update each other via email


===Semester 1, Week 1===


====Jie Dong====
====Yan Xie====
'''Progress and Status this week:'''
# Had a small discussion with the team members and work on SVM.
# Review two algorithms and three classifiers
# Modify SVM program to support multi-group classification function
# Group members present individual report so far on the group weekly meeting
# Test the accuracy of the whole classifying program with English texts
# Work on coding SVM program
# Generate an accuracy table with respect to three different variables: tolerance, number of key words and kernel function (linear, quadratic, RBF, polynomial)
# Check the Milestones for the upcoming goals
 
'''Plan and Goals for new week:'''
# Discuss with supervisor about the performance of current program and suggest ways to increase accuracy
# Email supervisors to have a meeting reporting the progress of the report
# Apply interface developed by Joel
# Discuss the performance of the current progress
# Modify the SVM program
# Prepare the project description and images for project exhibition
 
====Kai He====
'''Progress and Status This Week'''


====Leng Tan====
# Meet with the team members discussing the classifiers
'''Progress and Status this week:'''
# Simplify the code of Maximal Frequent Word Sequence
# Brief discussion with team members on the project.
# Work on the Naïve Bayes classifier
# The English texts are used to test the accuracy of the program.
# Do some testing
# Try different kernel functions of the SVM while testing the accuracy.


''' Plan and Goals for Next Week '''
# Organize a meeting with the supervisors for updates.
# discuss with joel for a constant text length.
# try to combine the code.


====Tien-en Phua====
# Arrange a time meeting with supervisors
'''Progress and Status this week:'''
# Discuss the key methods used in Naïve Bayes with the team
# Conduct a brief meeting with team members to further evaluate on SVM.
# Modified program from using function word objects to use of arrays and arraylist instead. Improve resource management and performance time
# Modified program to take in large amount of data as input instead of a single file
# Modified program to create a new folder to store all temporary (or modified) data. Reduce the clutter in the parent folder
# Test program using the federalist papers


'''Plan and Goals for new week:'''
====Zhaokun Wang====
# Have a meeting with supervisors showing the results.
'''Progress and Status This Week'''
# Further testing
 
# Group meeting to report progress of project during the summer break
# Keep coding on dissimilarity classifier
# Do testing on training data
 
 
''' Plan and Goals for Next Week '''
 
# Plan to meeting with supervisor
# Modify and coding dissimilarity classifier


===Semester 1, Week 2===
====Jie Dong====
 
====Yan Xie====
'''Progress and Status this week:'''
# Met up with supervisors
# Confirm a meeting time with supervisors
# Applied trigram model algorithm on the 170 English text and test the accuracy of SVM for trigram Markov Model
# Complete a project description and image, also email to Braden
# Number of key words used in the test are 5,10,15,20,25,30,35,40,45,50
# Discuss SVM with the team members
# Four different kernel functions were used: linear, quadratic, RBF and polynomial. The linear kernel has the best performance among the four; however, the accuracy is still very low, about 50%.
# Continue working on SVM
 
'''Plan and Goals for new week:'''
# The effect of punctuations in the text should be taken into consideration, such as "-" and "'"
# Meet up with supervisors
# Modified Trigram software
# Code modification
# Further testing
# Plan the upcoming goals within the team
====Leng Tan====
# Test programs using English text
# Start to prepare the exhibition and final seminar
 
====Kai He====
'''Progress and Status This Week'''
 
# Done half of the program of the Naïve Bayes
# Change the classes in the program
# Code modification
# Check the project description and image
# Have a brief meeting with the team members
 
''' Plan and Goals for Next Week '''
 
# Have a meeting with supervisors
# Develop software
# Prepare the exhibition and final seminar
 
====Zhaokun Wang====
'''Progress and Status This Week'''
 
# Group meeting within group
# Modify and coding dissimilarity classifier
# Working on project description and image
 
 
''' Plan and Goals for Next Week '''
 
# Meeting with supervisor
# Keep coding
# Prepare for the final seminar
 
===Semester 1, Week 3===
 
====Yan Xie====
'''Progress and Status this week:'''
# Met up with supervisor
# Get feedback from meeting with supervisors
# applied algorithm on the 170 english text and test the accuracy of the SVM for WRI
# Consider punctuation removal, lowercase conversion, space combination and word overlapping (see the sketch below)
# applied different kernel function and observe the different result
# Develop the java code of the Common N-gram
# develop on word count program for text
# Analyse the poor results from texts with chapter numbers and titles
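A sketch of the preprocessing mentioned here (punctuation removal, lowercase conversion, combining runs of whitespace) is given below; it illustrates the idea rather than reproducing the group's actual Java code.

<syntaxhighlight lang="java">
/** Normalises raw text before n-gram extraction: lowercase, strip punctuation,
 *  collapse runs of whitespace into single spaces. Chapter numbers and titles
 *  would still need to be removed separately. */
public class TextNormaliser {

    public static String normalise(String raw) {
        String text = raw.toLowerCase();
        // Keep letters, digits and spaces; drop punctuation such as . , ; " ' -
        text = text.replaceAll("[^\\p{L}\\p{Nd} ]", " ");
        // Combine consecutive spaces, tabs and newlines into one space.
        text = text.replaceAll("\\s+", " ").trim();
        return text;
    }

    public static void main(String[] args) {
        String raw = "CHAPTER 1.\nToday is a good day;  I want to go to picnic.";
        System.out.println(normalise(raw));
        // -> "chapter 1 today is a good day i want to go to picnic"
    }
}
</syntaxhighlight>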
 
'''Plan and Goals for new week:'''
# Modify the delete punctuation method in the interface (look at minutes report number 10th for specs)
# Complete the java code of the Common N-gram
# implement interface
# Test the 155 English text, 82 Federalist Paper and 27 Greek New Testament
# change the number of keywords (currently is 20, try 5,10,15,20,25 and observe the difference)
# start using the new testament as test data


====Tien-en Phua====
====Kai He====
'''Progress and Status this week:'''
'''Progress and Status This Week'''
# Had meeting with supervisors
 
# Applied algorithm to 170 English text
# Have a meeting with supervisors to discuss our project’s progress.
# Applied algorithm to 85 Federalist paper
# Consider how to realize overlapping detection using colors in Java.
# Monitor project progress and re-evaluate the project milestone and timeline
# Continue developing the Maximal Frequent Word Sequence Algorithm
# Develop software for chopping text
# Start preparing the final Seminar in week 6.
# Develop software to count total words of text and also the number of occurrence of each word for better text analysis
 
''' Plan and Goals for Next Week '''
 
# Finish coding the Maximal Frequent Word Sequence algorithm
# Have a draft for the final seminar.


'''Plan and Goals for new week:'''
====Zhaokun Wang====
# Identify the reason for incorrect classifications
'''Progress and Status This Week'''
# Further testing to ensure the correct operation
# Study Greek alphabets


===Semester 1, Week 3===
# Getting feedback from supervisor
====Jie Dong====
# Fixing on N-gram (suggestion from supervisors)
'''Progress and Status this week:'''
# Group meeting with team members
# Get rid of the concept of tolerance.
# Considering the meaning of punctuations appeared in the English texts, especially "-" and "'".
# Content which are not written by author should be removed before extraction, such as chapter number and title.
# Test the effect of above modification




'''Goals Next Week'''
''' Plan and Goals for Next Week '''
# Prepare test data using Federalist Paper
# Prepare test data using Greek text


====Leng Tan====
# Keep on dissimilarity classifier
'''Progress and Status this week:'''
# Finish fixing N-gram
# Developed a program to count the total number of words that contained "-" and "'"
# Implemented interface made by Joel
# Modified the WRI method and change the threshold of the number of keywords.
'''Plan and Goals for new week:'''
# Try to improve the accuracy.
====Tien-en Phua====
'''Progress and Status this week:'''
# Analyse the results for the Federalist and 170 English Text
# Continue developing auxiliary software (ie CountWord program, Punctuation program)
# Research on ways to balance the training data to SVM
'''Plan and Goals for new week:'''
# Continue testing on Federalist and 170 English Text
# Aim to achieve an 70% accuracy
# Standardize the training data to SVM


===Semester 1, Week 4===
====Jie Dong====
 
====Yan Xie====
'''Progress and Status this week:'''
# Run program modified last week on English data set
# Engage in removing all chapter numbers and titles
# Varying threshold and size of training data
# Add ranking method in the program
# Achieve a classification accuracy of around 80%
# Finish the code of Common N-gram
# Help group member to prepare for Federalist data set
# Run the completed program on 155 English text, 82 Federalist Paper and 27 Greek New Testament
# Draft the structure of the final seminar PPT


'''Plan and Goals for new week:'''
# Study the cause of unsatisfactory classification accuracy and try to improve it
# Analyse the output of the tested text and consider removing the tail and setting a threshold for the large training data set
# Perform similar tests on Federalist Paper
# Discuss the tested result with the group members
# Discuss results with other group member, and see their algorithm performance
# Prepare the slides of final seminar with the group members


====Leng Tan====
====Kai He====
'''Progress and Status This Week'''
 
# The Maximal Frequent Word Sequence code is completed to combine features for different thresholds n (a simplified sketch of the underlying idea follows below).
# Remove titles and redundant information from the allocated 150 English corpus.
# Generate extracted features from the text corpora.
# A first draft PowerPoint is completed for the final seminar.
# Research on the overlapping problem found that it cannot be done in Java alone, since the text corpora are plain text files and do not support colour highlighting.
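The full maximal frequent word sequence algorithm also prunes sequences contained in longer frequent ones; the sketch below shows only the simpler core idea, counting word sequences of length n across a corpus and keeping those whose document frequency reaches a threshold. It is an assumption-labelled illustration, not the group's implementation.

<syntaxhighlight lang="java">
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Simplified core of frequent word-sequence mining: count how many documents
 *  contain each word sequence of length n and keep sequences whose document
 *  frequency reaches the threshold. (Maximality pruning is not shown.) */
public class FrequentWordSequences {

    public static Set<String> frequent(List<String> documents, int n, int threshold) {
        Map<String, Integer> docFrequency = new HashMap<>();
        for (String doc : documents) {
            String[] w = doc.toLowerCase().split("\\s+");
            Set<String> seen = new HashSet<>();               // count each sequence once per document
            for (int i = 0; i + n <= w.length; i++) {
                seen.add(String.join(" ", Arrays.copyOfRange(w, i, i + n)));
            }
            for (String seq : seen) docFrequency.merge(seq, 1, Integer::sum);
        }
        Set<String> result = new HashSet<>();
        for (Map.Entry<String, Integer> e : docFrequency.entrySet()) {
            if (e.getValue() >= threshold) result.add(e.getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> docs = List.of(
            "in the beginning was the word",
            "the word was with god and the word was god",
            "in the beginning god created the heaven and the earth");
        System.out.println(frequent(docs, 2, 2)); // bigrams appearing in at least 2 documents
    }
}
</syntaxhighlight>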
 
''' Plan and Goals for Next Week '''
 
# Finish coding the Naïve Bayes classifier to take multiple input files.
# Assemble the PowerPoint and start practicing.
 
====Zhaokun Wang====
'''Progress and Status This Week'''
 
# Finish coding N-gram
#Removing unnecessary marks on the testing texts
# Run all texts using N-gram code
# Group meeting about final seminar
# Finalize dissimilarity classifier
 
''' Plan and Goals for Next Week '''
 
# Prepare for final seminar
# Done running on texts using N-gram
# Compared with training data, and analysis tested texts output
 
===Semester 1, Week 5===
 
====Yan Xie====
'''Progress and Status this week:'''
# Had a meeting with supervisors.
# Set threshold in the output of tested test
# English text achieves only around 25-30%.
# Analysis the input format of SVM
# Study the New Testament.
# Work on preparing final seminar
 
'''Plan and Goals for new week:'''
# Try and find the Greek file for the new Testament.
# Send the draft of PPT to Brian
# try use Federalist Text.
# PPT Slides modification
====Tien-en Phua====
# Prepare the presentation with the group members
'''Progress and Status this week:'''
 
# Develop a method of normalizing text.
====Kai He====
# Run test on 170 English Text. Obtained a 100% accuracy
'''Progress and Status This Week'''
# Run test on Federalist Text. Obtained a 91% accuracy
 
# Naïve Bayes classifier code 80% modified. Have bugs in the code.  
# Group meeting to prepare for the final seminar.  
# PowerPoint slides are added to one, roles and tasks are allocated for each member.  
 
 
''' Plan and Goals for Next Week '''


'''Plan and Goals for new week:'''
# Finish debugging.
# Obtain a full set of Greek text
# Send the completed PowerPoint to our supervisors for feedback.
# Chop Greek text accordingly
# Prepare the final seminar
# Require further testing and analysis
# Apply Greek text accordingly


===Semester 1, Week 5===
====Zhaokun Wang====
====Jie Dong====
'''Progress and Status This Week'''
'''Progress and Status this week:'''
# Algorithm update (see the sketch below):
#* The new version of the trigram extraction algorithm inserts a "#" before each sentence and a "$" after it. For example, the string "Today is a good day. I want to go to picnic." becomes, after the TextEditor class, "# Today is a good day $ # I want to go to picnic $".
#* The motivation for this modification is that, in an English text, each sentence is relatively independent of the others. In the example above, "...a good day. I want...", it is not necessary to calculate the probability of "I" appearing after the bigram "good day". Instead, it is more significant for characterising an author's writing habits to know the probability of "I" appearing at the start of a sentence, i.e. after the bigram "$ #". Likewise, the probability of a word appearing at the end of a sentence, e.g. "day $ #", is important to know. In addition, this method lets us discover how often a specific word is used within one sentence.
#* To determine the beginning and end of a sentence, the delimiter "." is used. In the future, with further study of English text characteristics, more delimiters might be added.
# Generate classification results based on the Federalist text.
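A minimal sketch of the sentence-marker step described above, in the spirit of the TextEditor class but not the actual code: each sentence, delimited by ".", is wrapped with a leading "#" and a trailing "$" before trigram extraction.

<syntaxhighlight lang="java">
/** Wraps each sentence (delimited by '.') with '#' at the start and '$' at the
 *  end, so trigram extraction can capture sentence-initial and sentence-final
 *  word statistics, e.g. the trigram "day $ #". */
public class SentenceMarker {

    public static String mark(String text) {
        StringBuilder out = new StringBuilder();
        for (String sentence : text.split("\\.")) {
            String s = sentence.trim();
            if (s.isEmpty()) continue;
            out.append("# ").append(s).append(" $ ");
        }
        return out.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(mark("Today is a good day. I want to go to picnic."));
        // -> "# Today is a good day $ # I want to go to picnic $"
    }
}
</syntaxhighlight>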


# Allocation the final seminar
# Finish dissimilarity classifier
# Fixing input format on dissimilarity classifier


'''Plan and Goals for new week:'''
# Perform more tests on different disputed texts
# Try another key words selection algorithm: based on occurring frequency


====Leng Tan====
''' Plan and Goals for Next Week '''
'''Progress and Status this week:'''
# Tried using the Federalist text.
# The best results give accuracy up to 70% when threshold = 10 and data dimension = 25; this might be due to the short text length of the Federalist papers.
# It is noted that WRI works better without normalisation.
# Found a Greek file for the New Testament but not sure if it is the right one.


'''Plan and Goals for new week:'''
# Modify PPT slides for final seminar
# Do the Federalist text again with a different disputed text.
# Preparing final seminar
# Try redoing the English text with normalisation.
====Tien-en Phua====
'''Progress and Status this week:'''
# Analysis of the Federalist results, as they are most similar in style to the New Testament texts.
# Namely, most of the Federalist papers were written by Hamilton, and likewise most of the New Testament epistles were written by Paul, with a few written by different authors such as Luke, John and Peter.
# Comparison of results with the other feature extraction algorithms.
# After comparing Function Word Analysis (FWA) and the frequency of occurrence of function words, FWA proves to be the better algorithm, as it produces more accurate results than frequency of occurrence.
# Using FWA reduces the need to chop text, allowing less data to be "chunked" out.
'''Plan and Goals for new week:'''
# According to Gantt Chart, the implementation of controversies should take place next week.
# Implement both FWA and frequency occurrence to the KJV text
# Frequency occurrence should produce consistent results to Talis.


===Semester 1, Week 6===
====Jie Dong====
'''Progress and Status this week:'''
# With last week's modification, I re-ran the test on the English data set.
# The classification accuracy increased to 85% – 90%. The highest was achieved when threshold = 30.
# A clear trend can be observed: as the size of the training data increases, accuracy increases, while as the threshold increases, accuracy first rises and then drops.
# Performed tests on the Federalist papers, but the accuracy is very low, about 35% on average.
# Discussed the result on the Federalist papers with the supervisor and group members.
# Since function word analysis achieves a good performance, it was suggested to combine part of it to enhance the algorithm.

'''Plan and Goals for new week:'''
# Implement the trigram Markov model to select trigrams containing "golden key words"
# Start to prepare for the final seminar
# Achieve test results on the King James Version

====Yan Xie====
'''Progress and Status this week:'''
# Classify all authors' output files after setting the threshold, for N from 2 to 10
# The Java code of the Common N-grams updated (see the sketch below):
#* e.g. in the 155 English texts, when n = 2, combine the six authors' features and create a master list.
#* From N=2 to N=10, this gives 9 master lists. Find each author's features with their frequency of occurrence in the master list and list only the frequencies as one part of the SVM input format.
# Also classify the output files of the Federalist Papers and the Greek New Testament
# Finish the input format of the SVM and write the Matlab code for the SVM

'''Plan and Goals for new week:'''
# Prepare the final report, which is due in week 11
# SVM code modification
# Do some testing
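A minimal sketch of the master-list idea described above, assuming character n-gram features with normalised frequencies: each author's n-grams are merged into one master list, and each author (or disputed text) is then represented by its frequency for every entry of that list, giving fixed-length rows for the SVM. Class names and the exact feature definition are assumptions, not the group's code.

<syntaxhighlight lang="java">
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Sketch of the Common N-grams "master list": merge the n-grams of every
 *  author profile into one list, then describe each text by its normalised
 *  frequency for every n-gram in that list (a fixed-length feature vector).
 *  In practice each profile would first be cut to its most frequent entries. */
public class CommonNGramProfiles {

    /** Normalised character n-gram frequencies of one text. */
    public static Map<String, Double> profile(String text, int n) {
        Map<String, Integer> counts = new HashMap<>();
        String t = text.toLowerCase();
        for (int i = 0; i + n <= t.length(); i++) {
            counts.merge(t.substring(i, i + n), 1, Integer::sum);
        }
        int total = Math.max(1, t.length() - n + 1);
        Map<String, Double> freq = new HashMap<>();
        counts.forEach((g, c) -> freq.put(g, (double) c / total));
        return freq;
    }

    /** Master list: the union of the n-grams seen in the author profiles. */
    public static List<String> masterList(List<Map<String, Double>> authorProfiles) {
        Set<String> union = new LinkedHashSet<>();
        for (Map<String, Double> p : authorProfiles) union.addAll(p.keySet());
        return new ArrayList<>(union);
    }

    /** Fixed-length feature vector for one profile against the master list. */
    public static double[] vector(Map<String, Double> profile, List<String> master) {
        double[] v = new double[master.size()];
        for (int i = 0; i < master.size(); i++) v[i] = profile.getOrDefault(master.get(i), 0.0);
        return v;
    }

    public static void main(String[] args) {
        Map<String, Double> a = profile("to be or not to be", 2);
        Map<String, Double> b = profile("ask not what your country can do", 2);
        List<String> master = masterList(List.of(a, b));
        System.out.println(master.size() + " n-grams in the master list, row length "
                + vector(a, master).length);
    }
}
</syntaxhighlight>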
====Leng Tan====
 
'''Progress and Status this week:'''
====Kai He====
# It was verified that the English text actually does not have any problem.
'''Progress and Status This Week'''
# Results shown were not favourable. The prediction was very inconsistent, achieving a low accuracy rate of 53% most of the time.

# It was suggested that the testing could be biased towards Madison, as only Madison's text was taken as testing data.
# Naïve Bayes classifier debugged. Now consider how to present the output results.
# Compared with the earlier results using the English text, which involved 100 training texts and 70 disputed texts, the accuracy and consistency were even lower.
# Have meeting with Brian to talk about our PowerPoint slides.  
# WRI might not be suitable for authorship detection.
# Finalize our PowerPoint.
# It was suggested to combine the Function Word Frequency developed by Joel to enhance the algorithm.
# More practice on the final seminar.  
# Did our final seminar on Friday.
 
''' Plan and Goals for Next Week '''
 
# Consider the structure of the final report.  
# Further test on the methods .
 
====Zhaokun Wang====
'''Progress and Status This Week'''


'''Plan and Goals for new week:'''
# Classify the output files of federalist paper and Greek New Testament
# Examine the WRI algorithm with further testing.
# Fixing problems about input format on dissimilarity classifier
# Implement enhanced version of the WRI by combining the algorithm with function word frequency.
# Classify all authors output files and setting N (2 to 10)
# Prepare the powerpoint slides for the seminar.
# Start to make the initial stage of the video.


====Tien-en Phua====
''' Plan and Goals for Next Week '''
'''Progress and Status this week:'''
# Analysis results of FWA on KJV
# Analysis results of frequency occurrence on KJV
# Frequency occurrence produces consistent results with Talis listing down Paul, Barnabas, Luke and Matthew as the possible authors
# FWA produces different results; the reasons why will be discussed.
# Discuss seminar structure with team
# Delegate task to team members for seminar
# Produce a uniform set of data for testing and results presentation


'''Plan and Goals for new week:'''
# Modify dissimilarity classifier
# Consolidate results from English text, Federalist text and King James Version
# Do testing
# Research on future improvement for FWA
# Conduct a detail literature review on the background of the new testaments
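
Leng Tan's results above suggest that WRI on its own classifies inconsistently and that combining it with Joel's function word frequencies may help. The page never defines WRI in code, so the sketch below only illustrates one plausible reading of it: the recurrence interval of a function word is the number of tokens between its successive occurrences, summarised here by its mean so it can sit alongside plain frequencies as a feature. The tokeniser, the word list and the choice of summary statistic are assumptions.

<syntaxhighlight lang="java">
import java.util.*;

// Sketch of word recurrence intervals (WRI) for a set of function words:
// the interval is the number of tokens between successive occurrences.
public class RecurrenceIntervals {

    static Map<String, Double> meanIntervals(String text, Set<String> functionWords) {
        String[] tokens = text.toLowerCase().split("[^a-z']+");
        Map<String, Integer> lastSeen = new HashMap<>();
        Map<String, List<Integer>> intervals = new HashMap<>();

        for (int pos = 0; pos < tokens.length; pos++) {
            String w = tokens[pos];
            if (!functionWords.contains(w)) continue;
            Integer prev = lastSeen.get(w);
            if (prev != null) {
                intervals.computeIfAbsent(w, k -> new ArrayList<>()).add(pos - prev);
            }
            lastSeen.put(w, pos);
        }

        Map<String, Double> means = new HashMap<>();
        for (var e : intervals.entrySet()) {
            double sum = 0;
            for (int gap : e.getValue()) sum += gap;
            means.put(e.getKey(), sum / e.getValue().size());
        }
        return means;   // e.g. {"the"=9.4, "of"=17.2, ...}
    }

    public static void main(String[] args) {
        Set<String> fw = Set.of("the", "of", "and", "to", "in", "that");
        System.out.println(meanIntervals("the cat and the dog sat in the hall by the door", fw));
    }
}
</syntaxhighlight>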


===Semester 1, Week 7===

====Yan Xie====
'''Progress and Status this week:'''
# Amended the SVM MATLAB code
# Tested the 155 English texts, 82 Federalist Papers and 27 Greek New Testament texts, and produced the output for the disputed texts
# Obtained the performance results and arrived at a conclusion (possible authors)
# Met with the other group members and discussed the results
# Built the structure of the final report

'''Plan and Goals for new week:'''
# Analyse the results of the Common N-gram method and, with the group members, compare its classification accuracy with the other algorithm, Maximal Frequent Word Sequence
# Give some suggestions on potential modifications
# Start working on some parts of the final report

====Jie Dong====
'''Progress and Status this week:'''
# Implemented the Trigram Markov model to select trigrams containing chosen function words
# Discovered that trigrams containing the chosen function words usually occur more than once; hence a selection threshold is applied to the trigrams, similarly to the function word selection method
# Made a draft of the PowerPoint slides for the SVM and trigram parts
# Ran a classification test on the King James Version of the New Testament
# Finalised the performance results for the English text, the Federalist Papers and the KJV

'''Plan and Goals for new week:'''
# Combine and modify the slides made by all group members; the slides should be finalised by next week
# Practice makes perfect!!! :)
# Discuss the results of our own extraction algorithms among the group members and make suggestions on potential modifications
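
Jie Dong's trigram step above, keeping only word trigrams that contain a chosen function word and that occur often enough, can be sketched roughly as follows. The tokenisation, the function word list and the threshold value are placeholders for illustration, not the actual implementation.

<syntaxhighlight lang="java">
import java.util.*;
import java.util.stream.Collectors;

// Sketch: collect word trigrams that contain at least one chosen function
// word and keep those whose count reaches a frequency threshold.
public class TrigramSelection {

    static Map<String, Integer> selectTrigrams(String text,
                                               Set<String> functionWords,
                                               int threshold) {
        String[] w = text.toLowerCase().split("[^a-z']+");
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + 2 < w.length; i++) {
            if (functionWords.contains(w[i]) || functionWords.contains(w[i + 1])
                    || functionWords.contains(w[i + 2])) {
                counts.merge(w[i] + " " + w[i + 1] + " " + w[i + 2], 1, Integer::sum);
            }
        }
        return counts.entrySet().stream()
                     .filter(e -> e.getValue() >= threshold)
                     .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }

    public static void main(String[] args) {
        Set<String> fw = Set.of("the", "of", "and", "in", "to");
        String sample = "in the beginning was the word and the word was with god and the word was god";
        System.out.println(selectTrigrams(sample, fw, 2));
    }
}
</syntaxhighlight>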
====Leng Tan====
'''Progress and Status this week:'''
# The enhanced version of WRI combined with function word frequency is done.
# Obtained the results and arrived at a conclusion.
# Prepared the PowerPoint slides for the final seminar.
# Started recording some video footage for the final year project video.

'''Plan and Goals for new week:'''
# Examine the WRI algorithm with further testing.
# Implement the enhanced version of WRI by combining the algorithm with function word frequency.
# Complete the PowerPoint slides for the seminar.
# Start to make the initial stage of the video.

====Kai He====
'''Progress and Status This Week'''
# Have a brief idea of how the final report will be structured.
# Captured test results for the final report.
# Meeting with the group.

'''Plan and Goals for Next Week'''
# Modify the output file for use with the SVM.
# Evaluate results.
# Plan to upload things to this wiki.

====Zhaokun Wang====
'''Progress and Status This Week'''
# Group meeting with the group members
# Ran tests using the dissimilarity method
# Tested the 132 English texts, the Federalist Papers and the Greek New Testament, and produced the output for the disputed texts
# Layout for the final report

'''Plan and Goals for Next Week'''
# Write the final report
# Analyse the accuracy of the two methods

====Tien-en Phua====
'''Progress and Status this week:'''
# Consolidated results from the English text, the Federalist text and the King James Version
# Researched future improvements for FWA
## Calculate the mean of a function word over a group of texts by an author
## Calculate the standard deviation of a function word over a group of texts by an author
## Consider estimating the probability of occurrence of a function word by feeding the above parameters into the SVM
# Possible authors of Hebrews, namely Apollos, Clement, Paul, Barnabas, Luke and Peter

'''Plan and Goals for new week:'''
# Complete the presentation slides
# Practise the presentation at least twice before the seminar
# Assist team members in analysing their results
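
Tien-en's sub-items above, the per-author mean and standard deviation of a function word's frequency that would later be fed into the SVM, would look roughly like the sketch below. The relative-frequency definition, the tokeniser and the example texts are assumptions made for illustration.

<syntaxhighlight lang="java">
import java.util.*;

// Sketch: per-author mean and standard deviation of a function word's
// relative frequency across that author's texts (candidate SVM features).
public class FunctionWordStats {

    static double relativeFrequency(String text, String word) {
        String[] tokens = text.toLowerCase().split("[^a-z']+");
        long hits = Arrays.stream(tokens).filter(word::equals).count();
        return tokens.length == 0 ? 0.0 : (double) hits / tokens.length;
    }

    static double[] meanAndStd(List<String> textsByAuthor, String word) {
        double[] f = textsByAuthor.stream()
                                  .mapToDouble(t -> relativeFrequency(t, word))
                                  .toArray();
        double mean = Arrays.stream(f).average().orElse(0.0);
        double var = Arrays.stream(f).map(x -> (x - mean) * (x - mean)).average().orElse(0.0);
        return new double[] { mean, Math.sqrt(var) };
    }

    public static void main(String[] args) {
        List<String> texts = List.of("for by grace you have been saved through faith",
                                     "for the wages of sin is death but the gift of god is life");
        double[] stats = meanAndStd(texts, "for");
        System.out.printf("mean %.4f, std %.4f%n", stats[0], stats[1]);
    }
}
</syntaxhighlight>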


===Semester 1, Week 8===

====Yan Xie====
'''Progress and Status this week:'''
# Summarised the results from the Common N-gram and Maximal Frequent Word Sequence algorithms
# Tested the other text files (the English New Testament) using the Common N-gram algorithm and SVM classification
# Wrote the Common N-gram part of the final report
# Had a meeting with the other group members to discuss the upcoming goals

'''Plan and Goals for new week:'''
# Analyse the English New Testament output obtained from the SVM classification and compare it with the results from the Maximal Frequent Word Sequence algorithm and the Naïve Bayes classification
# Write the final report

====Jie Dong====
'''Progress and Status this week:'''
# More practice for the final year seminar.
# Final seminar on Thursday

'''Plan and Goals for new week:'''
# Run tests on the Gospel of Luke and the Acts of the Apostles in the KJV, which were prepared by Joel.

====Leng Tan====
'''Progress and Status this week:'''
# Final year project seminar

'''Plan and Goals for new week:'''
# Further discuss the pre-processing of the texts before implementing the feature extraction algorithm.
# Run tests on the Gospel of Luke and the Acts of the Apostles in Koine Greek, which were prepared by Joel.
# Discuss automated function words with Joel.

====Kai He====
'''Progress and Status This Week'''
# Group meeting.
# Obtained test results from the Federalist Papers and the New Testament.
# Finished coding in order to use the SVM.
# Helped debug code from other group members.

'''Plan and Goals for Next Week'''
# More tests and writing
# Upload things to the wiki

====Zhaokun Wang====
'''Progress and Status This Week'''

'''Plan and Goals for Next Week'''

====Tien-en Phua====
'''Progress and Status this week:'''
# Project final seminar

'''Plan and Goals for new week:'''
# Discuss the techniques for pre-processing Koine Greek
# Run tests on the Gospel of Luke and the Acts of the Apostles in Koine Greek
# Obtain Koine Greek texts of the possible authors of the Letter to the Hebrews
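
Zhaokun Wang's dissimilarity classifier (used for the tests reported in Week 7 above, after the input-format fixes of Week 6) is not defined anywhere on this page. The sketch below assumes the relative-difference measure commonly paired with Common N-gram profiles, where the candidate author whose profile is closest to the disputed profile is chosen; the measure, the normalised frequencies and the author names in the example are assumptions, not the group's actual classifier.

<syntaxhighlight lang="java">
import java.util.*;

// Sketch of a profile dissimilarity in the style often used with Common
// N-gram profiles (assumed here; the group's exact measure isn't given):
// the sum of squared relative differences of normalised n-gram frequencies.
public class ProfileDissimilarity {

    static double dissimilarity(Map<String, Double> a, Map<String, Double> b) {
        Set<String> grams = new HashSet<>(a.keySet());
        grams.addAll(b.keySet());
        double d = 0;
        for (String g : grams) {
            double fa = a.getOrDefault(g, 0.0);
            double fb = b.getOrDefault(g, 0.0);
            double rel = (fa - fb) / ((fa + fb) / 2.0);
            d += rel * rel;
        }
        return d;
    }

    public static void main(String[] args) {
        Map<String, Double> disputed = Map.of("th", 0.031, "he", 0.025, "an", 0.018);
        Map<String, Double> paul     = Map.of("th", 0.029, "he", 0.024, "an", 0.020);
        Map<String, Double> luke     = Map.of("th", 0.035, "he", 0.019, "an", 0.014);
        System.out.println("Paul: " + dissimilarity(disputed, paul));
        System.out.println("Luke: " + dissimilarity(disputed, luke));   // larger = less alike
    }
}
</syntaxhighlight>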


===Semester 1, Week 9===

====Yan Xie====
'''Progress and Status this week:'''
# Found that a small part of the outputs generated from the text files by the Common N-gram code needs modifying, and wrote a few lines of code to do this, e.g. adding duplicate features
# All the text files, including the 155 English texts, 82 Federalist Papers and 27 Greek New Testament texts, need to be generated again, and the output processed into the SVM as input to estimate the possible authors
# Also tried testing the English version of the New Testament, which contains 27 texts
# Analysed the results obtained, compared them with the Maximal Frequent Word Sequence algorithm, and documented them
# Worked on writing the final report, as only two weeks are left

'''Plan and Goals for new week:'''
# Commence the SVM section of the final report
# Email the supervisors some queries about the final report
# Discuss the YouTube video and the poster coming up in the next three weeks

====Jie Dong====
'''Progress and Status this week:'''
# Meeting with the supervisor
# Ran a test on the English version of the Gospel of Luke
# Ran a test on the English version of the Acts of the Apostles

'''Plan and Goals for new week:'''
# Perform the same tests on the Koine Greek version of the New Testament
# Planning for the final report
# Think of ideas for the video

====Leng Tan====
'''Progress and Status this week:'''
# Meeting with the supervisor to discuss the final report
# Ran a test on the Gospel of Luke
# Ran a test on the Acts of the Apostles

'''Plan and Goals for new week:'''
# Start planning for the final report
# Discuss the video with the team

====Kai He====
'''Progress and Status This Week'''
# Compared results with the Common N-gram method.
# Uploaded and helped format the stage reports on the wiki page.
# Uploaded my weekly reports onto the wiki.
# Wrote the final report.
# Code modification for the methods.

'''Plan and Goals for Next Week'''
# Have a draft final report.

====Zhaokun Wang====
'''Progress and Status This Week'''

'''Plan and Goals for Next Week'''

====Tien-en Phua====
'''Progress and Status this week:'''
# Meeting with the supervisor
# Ran a test on Koine Greek + KJV to determine the author of the Gospel of Luke
# Ran a test on Koine Greek + KJV to determine the author of the Acts of the Apostles
# Analysed the results and discussed them with the team

'''Plan and Goals for new week:'''
# Commence final report writing and discussion
# Obtain a set of texts for Barnabas and Clement
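
Kai He's Naïve Bayes classifier was reported as debugged back in Week 6, with the open question of how to present its output, and Yan Xie's Week 8 plan compares the SVM results against the Naïve Bayes classification. One simple presentation is to rank the candidate authors by log-probability. The sketch below assumes a multinomial word-count model with add-one smoothing and uses made-up counts and author names; it is an illustration, not the group's classifier.

<syntaxhighlight lang="java">
import java.util.*;

// Sketch: score a disputed text against per-author word counts with a
// multinomial Naive Bayes model (add-one smoothing) and rank the authors.
public class NaiveBayesRanking {

    static double logScore(Map<String, Integer> disputed,
                           Map<String, Integer> authorCounts,
                           int vocabSize, double logPrior) {
        long total = authorCounts.values().stream().mapToLong(Integer::longValue).sum();
        double score = logPrior;
        for (var e : disputed.entrySet()) {
            int cAuthor = authorCounts.getOrDefault(e.getKey(), 0);
            double pWord = (cAuthor + 1.0) / (total + vocabSize);   // Laplace smoothing
            score += e.getValue() * Math.log(pWord);
        }
        return score;
    }

    public static void main(String[] args) {
        Map<String, Integer> disputed = Map.of("the", 40, "faith", 7, "therefore", 5);
        Map<String, Map<String, Integer>> authors = Map.of(
                "Paul", Map.of("the", 300, "faith", 90, "therefore", 40),
                "Luke", Map.of("the", 500, "faith", 20, "therefore", 15));
        int vocab = 10000;                       // placeholder vocabulary size
        double logPrior = Math.log(1.0 / authors.size());

        authors.entrySet().stream()
               .map(e -> Map.entry(e.getKey(),
                       logScore(disputed, e.getValue(), vocab, logPrior)))
               .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
               .forEach(e -> System.out.printf("%-6s log-probability %.1f%n",
                                               e.getKey(), e.getValue()));
    }
}
</syntaxhighlight>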


===Semester 1, Week 10===

====Yan Xie====
'''Progress and Status this week:'''
# Wrote up the SVM section of the final report
# Met up with the group about the final report
# Considered the video and poster

'''Plan and Goals for new week:'''
# Work on the final report
# Email the supervisors to arrange a time to run tests, report what we have done, and predict the potential authors of the Letter to the Hebrews
# Prepare the poster

====Jie Dong====
'''Progress and Status this week:'''
# Discussed the structure of the final report with the team
# Validated the Trigram Markov model using the Koine Greek version of the New Testament: Luke and Acts
# Predicted potential authors of the Letter to the Hebrews
# Wrote up the section on the Support Vector Machine
# Started to write on the Trigram Markov model

'''Plan and Goals for new week:'''
# Find people to do a brief proofread of what I have written
# Complete the report

====Leng Tan====
'''Progress and Status this week:'''
# Discussed the video with the team.
# Discussed the overall style of the report.
# Completed a template to use for the final report.
# Wrote on past research.
# Wrote on project management.
# Wrote on WRI.

'''Plan and Goals for new week:'''
# Proofread the report.
# Complete the report.
# Prepare for the final exhibition poster.

====Kai He====
'''Progress and Status This Week'''
# Group meeting for the poster and video.
# Wrote the final report.

'''Plan and Goals for Next Week'''
# Plan to have a meeting with the supervisors to report our progress.
# Finish the final report.
# Upload the rest of my weekly reports to the wiki.

====Zhaokun Wang====
'''Progress and Status This Week'''

'''Plan and Goals for Next Week'''

====Tien-en Phua====
'''Progress and Status this week:'''
# Discussed the structure of the final report with the team
# Wrote up the background of the Letter to the Hebrews
# Wrote up the background of the Bible
# Wrote up the project aim, approach and report structure
# Researched a standard corpus for the team to work on
# Obtained the texts of the Epistle of Barnabas and the First Epistle of Clement to the Corinthians in Koine Greek.
# Processed the Koine Greek text into Beta Code

'''Plan and Goals for new week:'''
# Complete the final report
# Commence planning for the exhibition
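
Tien-en's step of converting the Koine Greek texts to Beta Code is one of the few purely mechanical steps in this log. The sketch below shows a stripped-down version that drops breathings and accents (full Beta Code encodes these with extra symbols) and maps only the base letters; it is illustrative and is not the converter actually used for the corpus.

<syntaxhighlight lang="java">
import java.text.Normalizer;
import java.util.Map;

// Sketch: strip diacritics and map lower-case Greek letters to Beta Code.
// Real Beta Code also encodes breathings and accents; they are dropped here.
public class BetaCode {

    private static final Map<Character, String> MAP = Map.ofEntries(
            Map.entry('α', "a"), Map.entry('β', "b"), Map.entry('γ', "g"),
            Map.entry('δ', "d"), Map.entry('ε', "e"), Map.entry('ζ', "z"),
            Map.entry('η', "h"), Map.entry('θ', "q"), Map.entry('ι', "i"),
            Map.entry('κ', "k"), Map.entry('λ', "l"), Map.entry('μ', "m"),
            Map.entry('ν', "n"), Map.entry('ξ', "c"), Map.entry('ο', "o"),
            Map.entry('π', "p"), Map.entry('ρ', "r"), Map.entry('σ', "s"),
            Map.entry('ς', "s"), Map.entry('τ', "t"), Map.entry('υ', "u"),
            Map.entry('φ', "f"), Map.entry('χ', "x"), Map.entry('ψ', "y"),
            Map.entry('ω', "w"));

    static String toBetaCode(String greek) {
        // Decompose so that accents and breathings become separate marks, then drop them.
        String bare = Normalizer.normalize(greek.toLowerCase(), Normalizer.Form.NFD)
                                .replaceAll("\\p{M}", "");
        StringBuilder out = new StringBuilder();
        for (char ch : bare.toCharArray()) {
            out.append(MAP.getOrDefault(ch, String.valueOf(ch)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toBetaCode("ἐν ἀρχῇ ἦν ὁ λόγος"));   // -> "en arxh hn o logos"
    }
}
</syntaxhighlight>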


===Semester 1, Week 11===

====Yan Xie====
'''Progress and Status this week:'''
# Output data analysis and documentation
# Wrote the Common N-gram and SVM sections
# Completed the final report
# Prepared the poster
# Met with the supervisors and answered which potential author wrote the Letter to the Hebrews

'''Plan and Goals for new week:'''
# Send the poster to Braden
# Prepare for the project exhibition
# Start recording the video with the other group members

====Jie Dong====
'''Progress and Status this week:'''
# Applied the common set of data to the Trigram Markov model
# Completed the testing results for the Trigram Markov model to write up the results for the final report
# Wrote the section on the Trigram Markov model and edited the SVM part
# Prepared the appendix section
# Discussed the layout of the poster with Clement
# Made a draft of our poster
## Background, colour theme, layout and detailed section content
## Draft of the Introduction and Controversy sections
## Flow diagram of our project approach

'''Plan and Goals for new week:'''
# Complete our poster with the other members
# Prepare for the exhibition

====Leng Tan====
'''Progress and Status this week:'''
# Completed the results for WRI.
# Touched up the final report.
# Preliminary discussion of the poster.

'''Plan and Goals for new week:'''
# Prepare the poster.

====Kai He====
'''Progress and Status This Week'''
# Wrote the project final report
# Had a meeting with the supervisors to present the project's outcomes
# Prepared the poster and video for the exhibition

'''Plan and Goals for Next Week'''
# Finalise the poster and video
# Prepare for the exhibition

====Zhaokun Wang====
'''Progress and Status This Week'''

'''Plan and Goals for Next Week'''

====Tien-en Phua====
'''Progress and Status this week:'''
# Prepared a common set of data for the team to write up results for the final report
#* English text: 156 texts, 26 per author; 22 training, 4 disputed
#* The Federalist Papers: 82 texts; 65 training, 17 disputed
#* King James Version
#* Koine Greek, using Barnabas, Clement, John, Luke, Mark, Matthew, Paul and Peter
# Wrote up the results and discussion for the English text
# Wrote up the results and discussion for the Federalist Papers
# Wrote up the results and discussion for the King James Version
# Wrote up the results and discussion for the Koine Greek texts
# Wrote up the abstract
# Prepared the appendix

'''Plan and Goals for new week:'''
# Prepare for the project exhibition
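
Tien-en's common English data set above (156 texts, 26 per author, with 22 used for training and 4 treated as disputed) implies a simple per-author split. The sketch below shows one way such a split could be reproduced; the file names and the random seed are placeholders, and the actual selection used by the team is not recorded here.

<syntaxhighlight lang="java">
import java.util.*;

// Sketch: a 22-training / 4-held-out split per author for the common
// English data set (file names and the random seed are placeholders).
public class TrainTestSplit {

    static Map<String, List<String>> split(List<String> texts, int nTraining, long seed) {
        List<String> shuffled = new ArrayList<>(texts);
        Collections.shuffle(shuffled, new Random(seed));
        Map<String, List<String>> parts = new LinkedHashMap<>();
        parts.put("training", shuffled.subList(0, nTraining));
        parts.put("disputed", shuffled.subList(nTraining, shuffled.size()));
        return parts;
    }

    public static void main(String[] args) {
        List<String> texts = new ArrayList<>();
        for (int i = 1; i <= 26; i++) texts.add("author1_text" + i + ".txt");  // placeholder names
        Map<String, List<String>> parts = split(texts, 22, 2011L);
        System.out.println("training: " + parts.get("training").size()
                           + ", treated as disputed: " + parts.get("disputed"));
    }
}
</syntaxhighlight>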


===Semester 1, Week 12===

====Yan Xie====
'''Progress and Status this week:'''
# Finished the poster and sent it to Braden
# Discussed the structure of the video within the team
# Finished the video
# Presented the results at the project exhibition

'''Plan and Goals for new week:'''
# Upload the documents to the wiki page
# Project closeout

====Jie Dong====
'''Progress and Status this week:'''
# Completed the final exhibition poster
# Made flyers for the final year exhibition
# Started video editing for the introduction and the SVM process

'''Plan and Goals for new week:'''
# Complete the video

====Leng Tan====
'''Progress and Status this week:'''
# Completed the poster with the team
# Made the flyers
# Uploaded the final report in wiki format
# Started the video editing for the results and future applications

'''Plan and Goals for new week:'''
# Complete the video

====Kai He====
'''Progress and Status This Week'''
# Sent the poster to Braden
# Made the video
# Demonstrated the project's outcomes at the exhibition
# Project closeout

'''Plan and Goals for Next Week'''
# Upload the documents to the project wiki page

====Zhaokun Wang====
'''Progress and Status This Week'''

'''Plan and Goals for Next Week'''

====Tien-en Phua====
'''Progress and Status this week:'''
# Started the video editing for the SVM and the three algorithms
# Created a new YouTube account

'''Plan and Goals for new week:'''
# Complete the video and upload it to YouTube


==See also==
*[[Authorship detection: Who wrote the Letter to the Hebrews?]]
*[[Proposal Seminar 2011: Who wrote the Letter to the Hebrews?]]
*[[Final Seminar 2011: Who wrote the Letter to the Hebrews?]]
*[[Stage One Progress Report 2011: Who wrote the Letter to the Hebrews?]]
*[[Stage Two Progress Report 2011: Who wrote the Letter to the Hebrews?]]
*[[Final Report 2011: Who wrote the Letter to the Hebrews?]]
*[[Exhibition Poster 2011: Who wrote the Letter to the Hebrews?]]
*[[Youtube Video Presentation 2011: Who wrote the Letter to the Hebrews?]]
*[[Minutes of Meeting 2010: Who wrote the Letter to the Hebrews?]]
*[[Critical design review 2010: Who wrote the Letter to the Hebrews?]]
*[[Progress Report 2010: Who wrote the Letter to the Hebrews?]]
*[[Final report 2010: Who wrote the Letter to the Hebrews?]]
*[[Youtube Video Presentation 2010: Who wrote the Letter to the Hebrews?]]


==Back==
