# **GaAs Asynchronous Morphological Processor for Interactive Mobile Telemedicine**

S Nooshabadi

School of Electrical Engineering, Northern Territory University, Darwin, NT

D Abbott

Centre for Gallium Arsenide VLSI Technology, University of Adelaide, SA

K Eshraghian

Centre for Very High-Speed Microelectronic Systems, Edith Cowan University, Joondalup, WA

J A Montiel-Nelson

Centre for Applied Microelectronics, Universidad de Las Palmas de Gran Canaria, Spain

Personal communication systems of the future will augment the mobile phone concept to include multimedia services such as data, e-mail, digitised speech, paging, facsimile, GPS, still-image transmission, and eventually real-time video. For telemedicine applications, we have employed speed-independent asynchronous design techniques to implement a morphology processor for image coding in GaAs technology. For the implementation we introduce a modified version of the DCVSL family in order to achieve ultra-fast data rates.

## 1. Introduction

The development of an *Interactive Mobile Multimedia Personal Communicator* (IM<sup>3</sup>PC) requires the integration of many systems, including real-time image and signal processors, computer vision, telecommunications, and high-speed networks. Besides the realisation of a hand-held unit that can transmit video, other visions driving this technology are *interactive telebanking* (ITB), *interactive personal navigation* (IPN) and *interactive time keeping* (ITK).

With regard to video transmission, the NTSC/PAL specifications dictate that the processor must process each image pixel in less than 100 ns, which corresponds to a bit rate of 80 Mb/s for an 8-bit pixel dynamic range. All conventional compression standards employ the block-based Discrete Cosine Transform (DCT) as the core mathematical tool for coding the key features of a video sequence and removing spatial redundancy in the image data. This has two drawbacks: (1) in terms of the number of arithmetic operations required, the DCT is the most demanding part of the coding or decoding process, and (2) at high compression ratios transform coding introduces a visible distortion known as blocking (a consequence of the block-based processing), and the DCT additionally induces blurring and ringing (the Gibbs phenomenon).

In the DCT, an image is transformed from the pixel (spatial) domain into the frequency domain. Each frame is divided into $8\times8$ pixel blocks, and a two-dimensional DCT is applied to each block. For video conferencing applications based on CIF images ($352\times288$) at 30 frames per second, we need to perform 47520 $8\times8$ DCTs per second. This requires 73 million multiply/add operations per second.
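The workload figures above follow from simple arithmetic, which the short Python sketch below reproduces. The frame size, block size, frame rate, pixel period and the 73 million figure are taken from the text; the variable names are ours, and the per-block operation count is only what the quoted total implies, not a statement about any particular DCT algorithm.

```python
# Back-of-envelope check of the pixel rate and DCT workload quoted in the text.
CIF_WIDTH, CIF_HEIGHT = 352, 288   # CIF frame size in pixels
BLOCK = 8                          # DCT block edge in pixels
FRAME_RATE = 30                    # frames per second
BITS_PER_PIXEL = 8                 # dynamic range of one pixel
PIXEL_PERIOD_NS = 100              # time budget per pixel
QUOTED_OPS_PER_SECOND = 73_000_000 # multiply/add rate quoted in the text

blocks_per_frame = (CIF_WIDTH // BLOCK) * (CIF_HEIGHT // BLOCK)
blocks_per_second = blocks_per_frame * FRAME_RATE

# 8 bits every 100 ns, expressed in Mb/s (1 bit/ns = 1000 Mb/s).
pixel_bit_rate_mbps = BITS_PER_PIXEL * 1000 // PIXEL_PERIOD_NS

# Operations per 8x8 block implied by the quoted total (derived, not sourced).
implied_ops_per_block = QUOTED_OPS_PER_SECOND / blocks_per_second

print(blocks_per_frame, blocks_per_second, pixel_bit_rate_mbps)
print(round(implied_ops_per_block))
```

The 1584 blocks per frame computed here is the same per-frame DCT count used later when the DCT is compared with the morphological scheme.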

Morphological processors operate on the image through a succession of simple morphological operations (simple addition/subtraction). Morphological filters do not induce ringing or blurring. This feature is particularly critical in applications such as telemedicine, where image quality cannot be compromised. Moreover, they are attractive because of their low computational complexity and inherent parallelism. To address both the complexity and the speed requirements of real-time still-image and image-sequence coding and compression, a VLSI morphology processor based on GaAs technology was implemented.

## 2. Mathematical Morphology

Image morphology is a transformation that maps an original image into a processed image by means of a *Structuring Element* (SE) [2]. The four basic morphological transformations for gray-scale images are dilation, erosion, opening, and closing. Various morphological techniques have been suggested for image coding and compression. One such technique is *multiresolution analysis* [3]. Multiresolution analysis is a multistage decomposition of an image into several sub-images, each containing features of a different size. At each stage, the input image is first processed using an opening-closing operation pair. Next, the difference between the input and the output image, denoted as $G_i$, is calculated and stored. The output image then becomes the input to the next stage. The size of the SE for the opening-closing operation pair progressively increases with each successive stage. Figure 1 depicts the multiresolution morphological coding process. Each of the decomposed images $\{G_1, G_2, \ldots, G_{N-1}\}$ contains features of a particular diameter. Moreover, these features have arbitrary orientation. The hashed area in Figure 1 is the directional decomposer for $G_k$. We need $N-1$ such directional decomposers.
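The multistage decomposition described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the coding scheme of Figure 1: it uses a flat square SE (the processor itself supports SE coefficient values), truncates the window at the image border, grows the SE half-width by one pixel per stage, and omits the directional decomposers. All function names are ours.

```python
def _window(img, r, c, half):
    """Yield the in-bounds pixels of the (2*half+1)-square window centred on (r, c)."""
    rows, cols = len(img), len(img[0])
    for rr in range(max(0, r - half), min(rows, r + half + 1)):
        for cc in range(max(0, c - half), min(cols, c + half + 1)):
            yield img[rr][cc]


def dilate(img, half):
    """Gray-scale dilation with a flat square SE: local maximum."""
    return [[max(_window(img, r, c, half)) for c in range(len(img[0]))]
            for r in range(len(img))]


def erode(img, half):
    """Gray-scale erosion with a flat square SE: local minimum."""
    return [[min(_window(img, r, c, half)) for c in range(len(img[0]))]
            for r in range(len(img))]


def opening(img, half):
    """Erosion followed by dilation; removes bright features smaller than the SE."""
    return dilate(erode(img, half), half)


def closing(img, half):
    """Dilation followed by erosion; removes dark features smaller than the SE."""
    return erode(dilate(img, half), half)


def decompose(img, stages):
    """Return ([G_1, ..., G_stages], residual) for the multistage decomposition."""
    features, current = [], img
    for i in range(1, stages + 1):
        smoothed = closing(opening(current, i), i)   # SE half-width i grows per stage
        g_i = [[a - b for a, b in zip(row_in, row_out)]
               for row_in, row_out in zip(current, smoothed)]
        features.append(g_i)
        current = smoothed                           # output feeds the next stage
    return features, current


def reconstruct(features, residual):
    """Add the stored differences back onto the residual image."""
    out = [row[:] for row in residual]
    for g in features:
        out = [[a + b for a, b in zip(ro, rg)] for ro, rg in zip(out, g)]
    return out
```

Because each $G_i$ is the exact difference between a stage's input and output, summing all $G_i$ with the final residual recovers the original image, which is a convenient sanity check for any implementation of the scheme.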

Assuming $N=5$, the morphological image coding scheme in Figure 1 would perform approximately 190 basic morphological operations (dilation and erosion) on the full image frame. This is in direct contrast to the 1584 computationally demanding $8\times8$ DCTs required per CIF frame.



Figure 1. Morphological Multiresolution Directional Coding Scheme

Several morphological processor engines have been reported in the open literature to date. Although these general-purpose cellular machines are flexible because of their programmability, they are relatively slow and do not provide the throughput required for real-time image processing. *Systolic array* architectures for morphological processors provide a possible solution for real-time applications [4].

## 3. Speed-Independent Asynchronous Systems

High-end VLSI processors face one major problem: clock skew. The absence of a global clock reduces the design cycle time for layout generation and circuit simulation. Asynchronous design also provides design modularity, making system extensions possible without facing the difficult issues of global synchronisation [5]. In addition, GaAs is a promising technology for achieving ultra-fast processing speeds in high-end image processors. Therefore, implementing the morphological processor in GaAs technology using a dual-rail, speed-independent, 4-phase handshake protocol [6] appeared to be a logical path to pursue.

The high-level block diagram of an image processing system incorporating our asynchronous morphological processor is given in Figure 2. This system is connected to an external synchronous raster scanner and other image processing units. Synchronous units are connected to an external sampling signal.



Figure 2. An Asynchronous Morphological System.

Computation starts with a request signal and ends with a completion detection signal. Completion detection is generated using very fast Differential Cascode Voltage Switched Logic (DCVSL) [7, 8] adapted for GaAs.
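To make the idea of completion detection concrete, the Python sketch below models a dual-rail data word in software. It assumes the conventional dual-rail encoding, in which each bit is carried on a true rail and a complement rail, both rails low is the empty (spacer) state, and exactly one rail high is valid data. The 8-bit width matches the word length used in this paper; the function names and the LSB-first ordering are our own choices, and nothing here models the DCVSL circuit itself.

```python
NULL = (0, 0)  # spacer state: neither rail of the pair is asserted


def encode(value, width=8):
    """Encode an unsigned integer as `width` (true, complement) rail pairs, LSB first."""
    return [((value >> i) & 1, 1 - ((value >> i) & 1)) for i in range(width)]


def is_complete(word):
    """A word is valid once every bit has exactly one of its two rails high."""
    return all(t ^ f for t, f in word)


def is_null(word):
    """A word is fully reset once every rail pair has returned to the spacer state."""
    return all(pair == NULL for pair in word)


def decode(word):
    """Recover the integer from a complete dual-rail word."""
    if not is_complete(word):
        raise ValueError("word is not complete yet")
    return sum(t << i for i, (t, _) in enumerate(word))


word = encode(0xA5)
print(is_complete(word), decode(word))

# While one bit is still in the spacer state the word is neither complete nor null.
partial = word[:]
partial[3] = NULL
print(is_complete(partial), is_null(partial))
```

The point of the encoding is that validity is visible in the data itself, so a block can signal that its result is ready without reference to a clock or a worst-case delay.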

Figure 3 depicts the 4-phase request and acknowledge signalling used in the architecture.



Figure 3. The 4-Phase Protocol Convention.

In Figure 2 the synchronous units are controlled by a sampling signal. The inReq signal to the asynchronous processor indicates that data is valid at the output of the raster scanner. Provided that the sampling period is longer than the latency of the asynchronous processor for each sample, we do not need to connect outAck from the asynchronous processor back to the scanner.
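The 4-phase (return-to-zero) convention can be illustrated with a small behavioural model in Python. This is a sketch of the generic protocol, assuming the usual order of events (request up, acknowledge up, request down, acknowledge down); the names `run_handshake`, `req`, `ack` and `bus` are ours and do not correspond to specific nets in Figures 2 or 3.

```python
def run_handshake(samples):
    """Transfer each sample with one full 4-phase cycle; return (received, trace)."""
    req = ack = 0
    bus = None
    pending = list(samples)
    received, trace = [], []
    while pending or req or ack:
        if not req and not ack:
            # Phase 1: sender drives valid data, then raises the request.
            bus = pending.pop(0)
            req = 1
        elif req and not ack:
            # Phase 2: receiver latches the data and raises the acknowledge.
            received.append(bus)
            ack = 1
        elif req and ack:
            # Phase 3: sender sees the acknowledge and withdraws the request.
            req = 0
        else:
            # Phase 4: receiver sees the request drop and withdraws the acknowledge.
            ack = 0
        trace.append((req, ack))
    return received, trace


received, trace = run_handshake([10, 20])
print(received)
print(trace[:4])  # (req, ack) after each of the four phases of the first transfer
```

Both wires return to zero before the next transfer starts, so every data item costs four signal transitions. In the system of Figure 2 the acknowledge back to the scanner can be left unconnected only because the sampling period is assumed to exceed the processor latency.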

## 4. Asynchronous Morphological Architecture

Figure 4(a) illustrates the architecture of the flexible 2-D asynchronous morphological engine with its associated 4-phase handshake control. Using asynchronous pipelining, the design minimises the Cycle Time (CT). This wavefront array processor is configurable for dilation or erosion operations on the fly. In Figure 4(a), DEPE and PR represent the Dilation/Erosion Processing Element and the Pipeline Register, respectively. An 8-bit data word length is used for both the gray-scale image samples and the SE coefficient values. Since DCVSL is a differential logic family, both true and complement data lines are needed at the input to a DCVSL block. However, only the true DCVSL outputs are needed for transmission to the inputs of the registers.

Figures 4(b-e) show the details of a single PR stage with its associated handshake control circuitry. The PR consists of an *Image Register* (IR), a *Structuring element Register* (SR) and a *Max/Min Register* (MR), which hold the delayed image samples, the structuring element coefficients and the max/min outputs, respectively. To maintain the speed independence of the asynchronous processor, we have also included latching detection circuitry in the PR.

Figures 4(f-h) depict the internal details of a single DEPE. As shown, it consists of two computation blocks with a pipeline stage inserted in between them. The pipeline registers, IR and Adder Register (AR), hold the delayed samples of image pixels and adder sum results, respectively.
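As an illustration of the data path described above, the following Python sketch models one DEPE as an add/subtract step followed by a max/min step, and chains such steps over a window of samples. The split of work between the two computation blocks, the clamping of results to the 8-bit range, and the function names `depe` and `array_output` are assumptions made for illustration; they are not taken from the circuits of Figure 4.

```python
def depe(pixel, se_coeff, partial, erode=False):
    """One processing-element step on 8-bit operands.

    pixel    -- delayed image sample (the value held in the IR)
    se_coeff -- structuring element coefficient (the value held in the SR)
    partial  -- running max/min arriving from the previous stage (the MR)
    """
    if erode:
        # Assumed block I: subtract the SE coefficient, clamped at 0.
        total = max(0, pixel - se_coeff)
        # Assumed block II: keep the smaller of the running value and the new one.
        return min(partial, total)
    # Assumed block I: add the SE coefficient, clamped at 255.
    total = min(255, pixel + se_coeff)
    # Assumed block II: keep the larger of the running value and the new one.
    return max(partial, total)


def array_output(window, se, erode=False):
    """Chain one DEPE step per (sample, coefficient) pair of the SE template."""
    partial = 255 if erode else 0   # identity element for min / max on 8 bits
    for pixel, coeff in zip(window, se):
        partial = depe(pixel, coeff, partial, erode)
    return partial


window = [10, 200, 30]   # image samples already paired with their SE coefficients
se = [1, 2, 3]
print(array_output(window, se))              # dilation: largest sum
print(array_output(window, se, erode=True))  # erosion: smallest difference
```

Switching between dilation and erosion changes only the sign of the arithmetic and the direction of the comparison, which is consistent with the array being configurable for either operation on the fly.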

## 5. Physical Design

The HSPICE simulation results for the various functional cells in the processor are presented in Table 1. So that the computation blocks generate completion signals as well as perform their logical operations, we have employed a very fast, modified version of DCVSL [7, 8]. Figure 5 depicts the modified DCVSL logic gate. It is seen that the *Cycle Time* (CT) is 1.94 ns. This gives a throughput rate of more than 500 MHz, or a bit rate of more than 4 Gbits/s, for the processor.

Table 1. Propagation Delays Through the Data and Control Paths in H-GaAsIII Process.

| Delay path | Max. delay (ps) | Typ. delay (ps) | Min. delay (ps) |
|------------|-----------------|-----------------|-----------------|
| Muller-C   | 171             | 159             | 137             |
| PR         | 548             | 532             | 478             |
| Adder      | 440             | 378             | 349             |
| Mux        | 75              | 68              | 50              |
| CT         | 1989            | 1002            | 978             |



Figure 4. (a) 2-D Asynchronous Morphological Processor Architecture, (b) Pipeline Registers Circuitry, (c) Image Data Register with Latching Completion Circuitry, (d) Serial-to-Parallel Shift Register for Holding the SE Value, (e) Handshake Circuitry, (f) Dilation-Erosion Processing Element, (g) Computation Block I, (h) Computation Block II, and (i) IPR Delay Element.



Figure 5. Modified DCVSL Logic Tree

The transistor count of the core of a systolic array of $7 \times 7$ dilation/erosion processors, for a $7 \times 7$ square SE template, is 61000. The data bit rate is over 4 Gbits/s. The coding scheme of Figure 1 performs around 190 simple operations (dilation/erosion) on the image frame. Real-time coding of the multi-directional image features therefore requires 104 operations per pixel in 100 ns. Such high computation rates can be achieved with two of our 500 MHz (2 ns CT) array processors working in parallel (ignoring other computational overheads such as coarse image improvement).
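The rate figures quoted in this section can be checked with a few lines of arithmetic. The Python sketch below takes the 1.94 ns cycle time, the 8-bit word length and the 100 ns pixel period from the text, and assumes that each array completes one operation per cycle; the variable names are ours.

```python
CT_NS = 1.94            # cycle time quoted in the text
WORD_BITS = 8           # data word length
PIXEL_PERIOD_NS = 100   # real-time budget per pixel
ARRAYS = 2              # array processors working in parallel

# One result per cycle: 1 / CT, expressed in MHz (1 / ns = 1000 MHz).
throughput_mhz = 1e3 / CT_NS

# Eight bits per result, expressed in Gbits/s.
bit_rate_gbps = throughput_mhz * WORD_BITS / 1e3

# Operations that fit into one pixel period across all arrays
# (assumes one operation per cycle per array and no other overheads).
ops_per_pixel = ARRAYS * PIXEL_PERIOD_NS / CT_NS

print(f"throughput  : {throughput_mhz:.0f} MHz")
print(f"bit rate    : {bit_rate_gbps:.2f} Gbits/s")
print(f"ops / pixel : {ops_per_pixel:.0f}")
```

Under these assumptions two arrays deliver roughly 103 operations within one 100 ns pixel period, which is of the same order as the 104 operations per pixel quoted above.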

## 6. Conclusions

A 2-D speed-independent asynchronous morphological processor for image coding has been designed and implemented in GaAs technology. A throughput of 500 MHz and a bit rate of 4 Gbits/s have been achieved, which makes the processor suitable for high-speed image processing applications. The self-timed nature of the design alleviates the clock distribution problems present in its synchronous counterpart.

## 7. Acknowledgement

The partial support provided by the Australian Research Council (ARC) is gratefully acknowledged.

## References

- [1] J. A. Montiel-Nelson, S. Nooshabadi, and K. Eshraghian. Gallium Arsenide Based Fast Feed Through Logic (FTL). In *Proc. IEEE Int. Symp. on Circuits and Systems*, 1997.
- [2] E. R. Dougherty. Mathematical Morphology in Image Processing. Marcel Dekker, Inc., 1993.
- [3] Z. Zhou and A. N. Venetsanopoulos. Morphological Methods in Image Coding. In *IEEE ICASSP*, volume 3219, pages 481–484, 1992.
- [4] A. Chihoub, M. LaValva, J. Avins, and J. Turlip. A Field Programmable Gate Array Implementation of a Systolic Architecture for a Morphology Engine. In *Proc. of the SPIE- The Int. Soc. for Opt. Eng.*, volume 2064, pages 95–106, 1993.
- [5] I. E. Sutherland. Micropipelines. *Communications of the ACM*, 32(6):720–738, 1989.
- [6] G. Mago. Realisation Methods for Asynchronous Sequential Circuits. *IEEE Trans. on Comput.*, 20(3), 1971.
- [7] N. Weste and K. Eshraghian. Principles of CMOS VLSI Design. Addison-Wesley, 1985.
- [8] L. G. Heller and W. R. Griffin. Cascode Voltage Switch Logic: A Differential CMOS Logic Family. In *ISSCC Digest of Technical Papers, IEEE Int. Solid-State Circuits Conf.*, pages 16–17, 1984.