IJCSNS International Journal of Computer Science and Network Security, VOL.7 No.3, March 2007
Securing Telecommunication Based on Speaker Voice as the Public Key
Monther Rateb Enayah and Azman Samsudin,
Universiti Sains Malaysia, Penang, Malaysia
Summary
This paper proposes a technique to generate a public cryptographic key from the user's voice while speaking over a handheld device. The technique makes use of human intelligence to identify/authenticate the voice of the speaker and therefore uses the voice as the public key. The generated public key is used to encrypt the data transferred over the open communication channel. Implementing such a system on mobile phones resists any eavesdropping on phone calls, even from the service provider itself. The proposed protocol also eliminates the need for a trusted third party. This work first analyzes the impact of using RSA and Diffie-Hellman as public key cryptographic methods with the RC4 stream cipher in the proposed protocol. It then describes the processing steps for the speaker's signal, which is used to produce the public key. Finally, the study proposes the use of the RSA, Diffie-Hellman and RC4 algorithms in the proposed protocol to secure the communication between two mobile phones.
Key words: Cryptography, Secure Telecommunication, Voice Recognition.
1. Introduction
Most mobile phones were poor in security. One of the problems with these models is scanning, which means that third parties in the local area could intercept and eavesdrop on phone calls. This is especially true of analogue mobile phones, where it is easy to eavesdrop using radio scanners. More recent digital systems such as the Global System for Mobile Communications (GSM) have tried to resolve these fundamental issues; however, security problems still persist. GSM uses various cryptographic algorithms for communication security, such as A5/1 and A5/2. The A5/1 and A5/2 stream ciphers are used to provide over-the-air voice privacy in the cellular telephone standard. A5/1 was developed first and is used within Europe and the United States. A5/2 is weaker than A5/1 and is used in countries that may not be able to support the infrastructure necessary for A5/1. Both algorithms have recently been cracked [1].
This paper proposes a new solution to secure mobile phone conversations by performing the encryption and decryption on the caller and the callee sides, regardless of the cryptographic algorithm used by the local network. With this approach the user can prevent an attacker from listening to or intercepting voice calls. In this paper the RSA, Diffie-Hellman (DH) and RC4 cryptosystems are used to ensure the security of the communication channel. The keys here are divided into a public key and a private key. The public key is generated from the speaker's voice, and the corresponding private key is also used as the DH private key. A shared secret is then calculated to generate the input key for RC4. The RC4 algorithm generates a key-stream to complete the encryption and decryption process.
Using the speaker's speech to generate the public key, which is then used in encryption, is an area of great promise for security applications, but the implementation of such a system on mobile phones presents its own unique challenges. These challenges can be identified as environmental conditions, microphone variability and the computational limitations of mobile phones. While the task of securing the communication channels of mobile phones has been a topic of substantial research, much of the work has centered on securing the local network. This research departs from that work by securing the channel between users, generating the cryptographic public key as they speak.
This paper is organized as follows. In Section 2, we analyze the impact of RSA and DH as public key cryptographic methods and RC4 as a stream cipher. Section 3 then provides an overview of a basic technique for speaker recognition. Section 4 reviews software-based voice encryption systems. In Section 5, we describe the proposed framework, from capturing the speaker's utterance to generating the cryptographic keys and securing the communication channel. Finally, Sections 6 and 7 show our results and draw together concluding remarks on the research project, with the future work.
2. Cryptography
Cryptography is the science of using mathematics to encrypt and decrypt information. With cryptography, storing or transmitting sensitive information becomes safer across insecure networks like the Internet. Cryptographic techniques are divided into two generic types: symmetric key and asymmetric key. Symmetric key cryptography is the conventional type of cryptography, also known as secret key cryptography [2]. The same key is used for both encryption and decryption. Examples of symmetric key cryptosystems are the Data Encryption Standard (DES), Triple DES and the Advanced Encryption Standard (AES). Speed of computation is an advantage of symmetric key algorithms compared to asymmetric key algorithms. Asymmetric key algorithms, which give an alternative way of securing data, require a much larger amount of computation time for encryption and decryption. Public key cryptography was introduced by Whitfield Diffie and Martin Hellman in 1975 [3]. The term public key cryptography is a synonym for asymmetric key cryptography. Public key systems have two separate keys that are mathematically connected: a public key which encrypts data, and a related private key for decryption. The public key is published while the private key is kept secret. Some examples of public key cryptosystems are ElGamal, RSA, DH, and Elliptic Curve Cryptography [2].
2.1 RSA
This is probably the most recognizable asymmetric algorithm. RSA was created by Ron Rivest, Adi Shamir, and Leonard Adleman in 1977 [4]. To date, it is the only asymmetric algorithm in widespread use that is used for private/public key generation and encryption. The operation of RSA is described in full detail in [5]. Two mathematical problems play the important role for the RSA cryptosystem [6]: the problem of factoring very large numbers, and the RSA problem. The integer factorization problem is the problem of finding a non-trivial factor of a composite, almost-prime number. When these numbers are very large they become difficult to factorize, and until now no efficient algorithm is known for factoring these huge almost-prime numbers. The RSA problem is simply the task of taking $e$-th roots modulo a composite $n$, that is, trying to recover the plaintext $m$ such that $m^e \equiv c \pmod{n}$, where the RSA public key is $(e, n)$. An attacker needs to factor $n$ into $p$ and $q$ and compute $(p-1)(q-1)$, which allows the determination of $d$ from $e$.
Key distribution in RSA, as in other cipher algorithms, needs to be secured against the man-in-the-middle attack. The attacker can give a false identity to both sides if the attacker intercepts the transmissions between the caller and the callee. Neither party will be able to detect the attacker's presence. Defenses against such attacks are often based on digital certificates or other components of a public key infrastructure.
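The relationship between the public key $(e, n)$, the private exponent $d$ and the factoring attack can be illustrated with a toy example. The sketch below is only an illustration under assumed parameters: the primes 61 and 53, the exponent 17 and the message 65 are not from this paper, and the trial-division `factor` helper is hypothetical and only works for tiny moduli.

```python
from math import gcd

def rsa_keypair(p: int, q: int, e: int):
    """Return ((n, e), d) for primes p, q and public exponent e."""
    n = p * q
    phi = (p - 1) * (q - 1)
    if gcd(e, phi) != 1:
        raise ValueError("e must be coprime to (p-1)(q-1)")
    d = pow(e, -1, phi)  # modular inverse of e (Python 3.8+)
    return (n, e), d

def factor(n: int):
    """Naive trial division over odd numbers; feasible only for toy moduli."""
    f = 3
    while n % f:
        f += 2
    return f, n // f

# Toy parameters, far too small for real use.
(n, e), d = rsa_keypair(61, 53, 17)
m = 65
c = pow(m, e, n)          # encryption: c = m^e mod n
recovered = pow(c, d, n)  # decryption: m = c^d mod n

# Attack: factoring n reveals (p-1)(q-1), and from it d.
p, q = factor(n)
d_attacker = pow(e, -1, (p - 1) * (q - 1))
```

With a modulus of realistic size the `factor` step is the part that becomes infeasible, which is what the security argument above relies on.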
2.2 Diffie-Hellman Key Exchange
The concept of Diffie-Hellman key exchange is commonly known as DH. DH represents the last names of the inventors, Whitfield Diffie and Martin Hellman. The method was introduced in 1976, and it was the first practical method for agreeing on a shared secret key, based on a secure key-exchange protocol over an unsecured communications channel. DH is not an encryption method; rather, it is a key exchange protocol. Full details on how the DH concept works are presented in [6]. DH generates a secret number for just one transaction. This is called a session key or a symmetric key.
As mentioned before, all asymmetric key systems are considered slow. If only a small amount of data is being exchanged, the shared secret may be used to encrypt the actual data. But when a huge amount of data is to be passed between both sides, as in the case of a phone conversation, encryption requires a stream cipher system such as A5/1, A5/2, FISH, SEAL or RC4. RC4 is the most used stream cipher in such applications.
The security of the DH cryptosystem depends on the discrete logarithm problem. The protocol assumes that it is computationally infeasible to calculate the shared secret key, $K = g^{X_A X_B} \bmod n$, given the two public values $g^{X_A} \bmod n$ and $g^{X_B} \bmod n$, where $n$ is a sufficiently large prime. Breaking the DH protocol is equivalent to calculating discrete logarithms under certain assumptions, as Maurer has shown in [7]. The DH key exchange is vulnerable to a man-in-the-middle attack [6]. In this attack, an attacker is placed between Alice and Bob. Alice and Bob are used as the conventional names for the communicating parties in the cryptography field.
The attacker fools Bob by sending his own public key to Bob instead of Alice's public key. Bob then transmits his public key, whereupon the attacker replaces it with his own public key and sends it to Alice. At this stage, the attacker and Alice agree on one secret key, while the attacker and Bob agree on another secret key. After this exchange, the attacker simply decrypts any messages sent out by Alice or Bob, and is then able to read, insert and modify them before re-encrypting them with the key agreed with the other party. This vulnerability is present because DH does not authenticate the participants.
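The man-in-the-middle scenario described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the prime $2^{127}-1$, the base 3 and the attacker name Mallory are assumptions chosen for the example.

```python
import secrets

P = 2**127 - 1   # a known Mersenne prime, assumed here as the public modulus n
G = 3            # assumed public base g

def dh_keypair():
    """Return (private x, public g^x mod P) with x drawn from [2, P-2]."""
    x = secrets.randbelow(P - 3) + 2
    return x, pow(G, x, P)

def shared(their_public: int, my_private: int) -> int:
    """Complete the exchange: (their public value)^(my private value) mod P."""
    return pow(their_public, my_private, P)

a_priv, a_pub = dh_keypair()   # Alice
b_priv, b_pub = dh_keypair()   # Bob
m_priv, m_pub = dh_keypair()   # Mallory, the attacker

# Honest exchange: both sides derive the same key.
k_alice = shared(b_pub, a_priv)
k_bob = shared(a_pub, b_priv)

# Attack: Mallory replaces each public value in transit with her own.
k_alice_attacked = shared(m_pub, a_priv)   # Alice believes m_pub came from Bob
k_bob_attacked = shared(m_pub, b_priv)     # Bob believes m_pub came from Alice
k_mallory_alice = shared(a_pub, m_priv)    # equals Alice's key
k_mallory_bob = shared(b_pub, m_priv)      # equals Bob's key
```

Mallory ends up holding one key per victim, so she can decrypt traffic from one side and re-encrypt it for the other without either side noticing.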
2.3 RC4 Stream Cipher
RC4 is the most widely used stream cipher, designed in 1987 by Ron Rivest for RSA Security [6]. It is a variable key size stream cipher with byte-oriented operations. The algorithm is based on the use of random permutations. RC4 is a very fast stream cipher, and it is used in the SSL/TLS (Secure Sockets Layer/Transport Layer Security) standards and in the WEP (Wired Equivalent Privacy) protocol. Full detail on how RC4 works can be found in [6]. Encryption is done by applying the XOR operation to each byte of the plaintext with one byte of the key-stream. The decryption process is the same as the encryption process, except that the same key-stream byte is XORed with the ciphertext instead.
Many approaches have tried to attack RC4, but none of them is practical against RC4 with a large key length such as 128 bits or more [8]. Therefore, up to now RC4 is considered a secure stream cipher.
3. Speech Recognition
Biometrics uses biological information to verify the identity of a person [9]. Biometric recognition methods include fingerprint scan, retina scan, face scan, and voice recognition. Voice recognition is chosen over the others because most biometric techniques need complex equipment, and some of them, such as fingerprint and retina scanning, also need the physical presence of the person. Voice recognition, however, can be done remotely, as in the case of phones, giving more flexibility when dealing with the phone.
In order to understand how speaker recognition works, we need to understand how voice is produced [9]. Voice is simply created when air passes the larynx or other parts of the vocal tract. The vibration of the larynx creates an acoustic wave, especially the hum sound. This wave is modified by the motion of the palate, tongue and lips. There are also other sounds that are created by other parts of the vocal tract. The unique voice patterns produced by individuals depend on two factors: physiological and behavioral characteristics.
The digitizing process of the human voice begins from the voice produced by the human. The voice is an analogue signal, meaning that it takes continuous values within a time range. The analogue signal is converted into an electrical signal by a device such as a microphone. Next, the continuous signal is converted into discrete voltage values. This process is called sampling. Sampling measures the voltage of the signal at regular time intervals.
At the highest level, all speaker recognition systems contain two main modules: Template Matching and Feature Extraction [10]. Template Matching is the simpler, and the most accurate in some cases. It works by comparing the digitized voice with the digitized template, based on the amplitude of the voice signal over many frequencies at various times over the entire period of the identification process. But Template Matching does not distinguish between the speech and the background noise, so if the registration is done with noise, the recognition process must be done with the same background noise again.
Feature Extraction does not really use any characteristics of the speech; it takes the digitized signal and applies mathematical techniques to it to produce the results. These results do not describe the voice in physical terms, but they can be used to identify the speaker's voice. Feature Extraction is much better than Template Matching at identifying the speaker under weak signal strength and background noise, due to the mathematical techniques which isolate the mathematical features from the speech. Hence Feature Extraction is preferred over Template Matching in comparing voiceprints, and it is implemented in the majority of voice identification systems.
3.1 Speech Feature Extraction
The aim of this module is to take the speech waveform and convert it to some type of parametric representation for further analysis and processing. This process is called the signal-processing front end. A variety of possibilities can be chosen to perform Feature Extraction on the speech signal to recognize a speaker, such as Linear Prediction Coding (LPC) [11], Mel-Frequency Cepstrum Coefficients (MFCC) [11], and others. MFCC is the best known and most widely used.
3.2 Mel-frequency Cepstrum Coefficients
The main purpose of MFCC is to perform the same functionality as the human ear [11]. The structure of the MFCC processor is illustrated in Fig. 1 as a block diagram. Each step is discussed in [12]. Typically the speech recording is done at a sampling rate above 10000 Hz. This sampling frequency is chosen to reduce the effects of aliasing in the analog-to-digital conversion. Furthermore, signals sampled at this rate can capture all frequencies up to 5 kHz, which covers most of the energy of the sounds generated by humans.
Fig. 1 Block Diagram for MFCC [12].
4. Software Based Voice Encryption Systems
There are many voice encryption systems available. Most of these cryptosystems make PC-to-PC phone calls and are free to download, like PGPFone [13], Nautilus [14] and Speak Freely [15]. PGPFone has a user-friendly interface and uses a selection of encryption schemes, such as a 128-bit CAST key, a 168-bit Triple-DES key or a 192-bit Blowfish key. Nautilus depends mainly on DH key exchange. Speak Freely uses IDEA or DES. Other cryptosystems like Digital Voice Protection (DVP), STU-III (Secure Telephone Unit, Generation III) and STE are hardware-based encryption systems, and they are not designed for PDAs or advanced cellular phones.
There are cryptosystems that secure cellular telecommunication over the GSM network, such as Snapshield [16] and CryptoPhone [17]. Snapshield has developed Snapcell, a plug-in cellular encryption unit which secures end-to-end GSM communications. The problem with Snapcell is that it uses a hardware unit which must be installed on the mobile phone, and moreover this hardware unit supports only some models of Ericsson and Sony mobile phones. Snapshield uses DH key exchange with AES to secure the channel. Snapshield did not publish the blueprint of their design, so there might be back doors in their algorithm. Furthermore, the attached unit also shortens the mobile phone's battery life.
On the other hand, CryptoPhone was designed by the GSMK CryptoPhone company [17]. CryptoPhone takes advantage of the high processing performance of recently available PDAs and mobile phones to do real-time voice encryption. Unlike other products in the market, CryptoPhone gave the details of its inner workings. The problem with this design is that it can be attacked by the man-in-the-middle attack mentioned previously for DH. The attacker can give his public key to both sides and wait for them to send their public keys, which enables him to generate two shared secret keys. The attacker can now analyze the transmitted signal and may listen to the conversation. This problem occurs because the two sides cannot authenticate each other before they start transmitting. CryptoPhone solved this problem by showing a six-digit key on the caller's and the callee's mobile phones, whereby both sides need to speak three marked digits from the six digits shown. The caller speaks his three digits and the callee checks whether they match the digits shown on his mobile phone. The callee then speaks the other three digits and the caller checks whether they match the digits shown, in order to authenticate each other.
5. Methodology
Equipped with the background described in the previous sections, this section gives an overview of the methodology used to derive a cryptographic key from the user's speech. The proposed methodology begins with capturing the speaker's utterance, dividing the voice samples of the utterance into overlapping windows, and deriving the user's key from the cepstrum coefficients using a feature descriptor. This procedure enables the users to authenticate each other by hearing each other's voice and generating each other's public key.
The next goal after this processing is to construct a long enough cryptographic key for each speaker. This key is the speaker's public key. Our proposed method authenticates the user's public key values, which makes this methodology immune to the man-in-the-middle attack, unlike the currently available products discussed before. This methodology further discusses the generation of the user's private key and the generation of the session key, which is used as an input for producing a keystream to encrypt and decrypt information in real time. Generating the private key is done iteratively in this proposed methodology until a suitable private key is found. The following subsections cover these steps in detail.
5.1 Speech Processing
Our proposed methodology begins with capturing the speaker's utterance and turning it into a sequence of MFCCs (acoustic vectors) using the MFCC speech processing steps. The continuous speech signal is blocked into frames of $N$ samples, with adjacent frames being separated by $M$ samples ($M < N$). The number of samples taken per frame is 256 ($N = 256$), which is an appropriate number to avoid aliasing. The distance between frames is 100 ($M = 100$). After cutting the speech signal into overlapping frames, the outcome is a matrix where each column is a frame of $N$ samples from the original speech signal. Next we employ the Windowing and FFT processing steps to transform the signal from the time domain into the frequency domain. The result is called the signal's power spectrum. Together these processes can be referred to as the Windowed Fourier Transform (WFT).
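The frame blocking and WFT steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the MATLAB code used in this work: the Hamming window is an assumed choice (the text does not name the window), the transform is a direct DFT written with the standard library rather than an FFT, and the 1 kHz test tone sampled at 8 kHz is a hypothetical input.

```python
import cmath
import math

def frame_blocks(samples, n=256, m=100):
    """Split samples into frames of n samples whose start points are m apart."""
    return [samples[s:s + n] for s in range(0, len(samples) - n + 1, m)]

def hamming(n):
    """Hamming window of length n (assumed window type)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]

def power_spectrum(frame):
    """Squared magnitude of the DFT of one windowed frame, bins 0..n/2.

    A direct O(n^2) DFT is used for self-containment; an FFT yields the
    same values more quickly.
    """
    n = len(frame)
    windowed = [s * w for s, w in zip(frame, hamming(n))]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        spectrum.append(abs(acc) ** 2)
    return spectrum

# Hypothetical input: 1000 samples of a 1 kHz tone sampled at 8 kHz.
fs = 8000
signal = [math.sin(2 * math.pi * 1000 * t / fs) for t in range(1000)]
frames = frame_blocks(signal)            # N = 256, M = 100
spectrum = power_spectrum(frames[0])
peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
```

Each frame here is a list rather than a matrix column, but the content is the same: frame $j$ holds samples $jM$ to $jM + N - 1$ of the original signal.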
Finally, the power spectrum resulting from the WFT process is converted into mel-frequency cepstrum coefficients, using one filter bank for each desired mel-frequency component.
5.2 Mapping Frames to Features
Having derived the acoustic vectors from the $N$ frames, the main target now is to define features of these frames that are exactly the same when the same user speaks the same utterance. From these features, an $m$-bit feature descriptor is then derived. The approach introduced here isolates one feature in each data vector and generates one descriptor bit from each feature, so each data vector can be used as one bit ($N = m$), even though this may not be necessary, because multiple data vectors could be used to derive one bit, so that $N > m$.
The feature used to generate the feature descriptor $b$ from the data vectors $V(1) \ldots V(N)$ depends on the amplitude values of the data vectors. In this approach the $i$-th feature bit is $b(i) = 0$ if the amplitude value is negative, and $b(i) = 1$ otherwise. To map these features to a feature descriptor, we simply need to test whether each feature is positive or negative. See Eq. (1).
$$ b(i) = \begin{cases} 0 & \text{if amplitude value} < 0 \\ 1 & \text{otherwise} \end{cases} \qquad (1) $$
The value $b(i)$ represents the position relative to the origin plane, so a value of 0 can be interpreted as the data vector value falling under the origin plane, while $b(i) = 1$ indicates that the data vector falls on the plane itself or on the positive side of the plane.
At the end, the complete $b$ represents the public key in binary digits; it is recommended to have a very long key. If the public key is not a prime number, the next prime number is taken as the public key. This method has the flexibility to choose a variable key length. There are several other features that could be used to generate the feature descriptor.
5.3 Generating Keys, Encryption and Decryption Flow
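The sign-based feature descriptor of Eq. (1) and the move to the next prime, as described in Section 5.2, can be sketched as below. The amplitude values are hypothetical, the most-significant-bit-first ordering of the descriptor is an assumption, and the trial-division primality test is only adequate for short illustrative keys.

```python
def descriptor_bits(amplitudes):
    """Eq. (1): bit is 0 when the amplitude is negative, 1 otherwise."""
    return [0 if a < 0 else 1 for a in amplitudes]

def bits_to_int(bits):
    """Read the descriptor as a binary number, first bit most significant (assumed)."""
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value

def is_prime(n):
    """Trial division; fine for small examples, too slow for long keys."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    f = 3
    while f * f <= n:
        if n % f == 0:
            return False
        f += 2
    return True

def public_key_from_amplitudes(amplitudes):
    """Descriptor as an integer, moved up to the next prime if it is not prime."""
    candidate = bits_to_int(descriptor_bits(amplitudes))
    while not is_prime(candidate):
        candidate += 1
    return candidate

# Hypothetical amplitude values, one per data vector (N = m = 8).
amplitudes = [0.7, -0.2, 0.0, 1.3, -0.9, 0.4, -0.1, -0.5]
bits = descriptor_bits(amplitudes)
key = public_key_from_amplitudes(amplitudes)
```

Here the descriptor is 10110100 in binary (180), which is not prime, so the next prime, 181, is taken as the public key.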
At this stage we have derived the speaker's public key, denoted by $b(i)$. This public key is further used to generate the corresponding RSA private key. After that, the RSA private key is used as the DH private key. The corresponding DH public key is calculated and exchanged in a secure manner: the DH public key is encrypted using the RSA algorithm and transferred to the other side. At this step, each party has the other's DH public key after decrypting it using its own RSA private key.
Now both users can compute the secret key by completing the DH key exchange algorithm. The result is the same secret key for both sides; this secret key is used for only one communication session.
Next, having the shared secret key in hand, the key is used as the key for the RC4 stream cipher algorithm. The RC4 stream cipher starts to produce a keystream, which is the same for both sides. This keystream is used for two operations: encrypting the speaker's voice and messages, and decrypting the information received from the other speaker.
The process of generating these keys is discussed in detail next. Fig. 2 illustrates the encryption flow.
Two large random and distinct primes $p$ and $q$ need to be generated for each session; we compute $n = pq$ and the Euler totient function $\varphi(n) = (p-1)(q-1)$, and select a random integer $e$, $1 < e < \varphi(n)$, such that $\gcd(e, \varphi(n)) = 1$. We notice that $e$, which is the public key, is already known and is generated in a separate manner from generating $\varphi(n)$. To generate the RSA private key the following steps are performed:
(i) Generate two large random and distinct primes $p$ and $q$, each roughly the same size.
(ii) Compute $n = pq$ and the totient function $\varphi(n) = (p-1)(q-1)$.
(iii) Check the public key $e$ against $\varphi(n)$ such that $\gcd(e, \varphi(n)) = 1$. If this condition is achieved then move on to step (iv), else go back to step (i).
(iv) Use the extended Euclidean algorithm to compute the unique integer $d$, $1 < d < \varphi(n)$, such that $ed \equiv 1 \pmod{\varphi(n)}$.
(v) Share $(n, e)$, and keep the private key $d$.
This process is secure against attacks on private keys, because the key space $\varphi(n)$ and the private key $d$ are changed for each session, so it is hard to factorize $\varphi(n)$ or even to try to figure out $d$.
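Steps (i) to (v) can be sketched as a retry loop around a fixed public exponent. This is an illustrative sketch rather than the implementation used in this work: the Miller-Rabin test, the 64-bit prime size and the value 181 standing in for a voice-derived prime are all assumptions, and `pow(e, -1, phi)` is used in place of a hand-written extended Euclidean algorithm (it computes the same inverse).

```python
import secrets
from math import gcd

def is_probable_prime(n, rounds=20):
    """Miller-Rabin probabilistic primality test."""
    if n < 2:
        return False
    for small in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29):
        if n % small == 0:
            return n == small
    r, s = 0, n - 1
    while s % 2 == 0:
        r += 1
        s //= 2
    for _ in range(rounds):
        a = secrets.randbelow(n - 3) + 2      # random base in [2, n-2]
        x = pow(a, s, n)
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False                       # a is a witness: n is composite
    return True

def random_prime(bits):
    """Random odd number with the top bit set, retried until probably prime."""
    while True:
        candidate = secrets.randbits(bits) | (1 << (bits - 1)) | 1
        if is_probable_prime(candidate):
            return candidate

def rsa_private_key(e, prime_bits=64):
    """Return (n, d) for a public exponent e that is fixed in advance."""
    while True:
        p = random_prime(prime_bits)           # step (i)
        q = random_prime(prime_bits)
        if p == q:
            continue
        n = p * q                              # step (ii)
        phi = (p - 1) * (q - 1)
        if gcd(e, phi) != 1:                   # step (iii): retry with new primes
            continue
        d = pow(e, -1, phi)                    # step (iv): e*d = 1 (mod phi)
        return n, d                            # step (v): share (n, e), keep d

e = 181                                        # hypothetical voice-derived prime
n, d = rsa_private_key(e)
message = 123456789
round_trip = pow(pow(message, e, n), d, n)     # encrypt with e, decrypt with d
```

Because $e$ is fixed by the speaker's voice, the only freedom is in $p$ and $q$, which is why the procedure loops back to step (i) rather than choosing another $e$.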
[Figure: block diagram of the encryption flow. Legible block labels include MFCC, RSA and Shared Secret Key.]
Fig. 2 Encryption Flow.
Next, the RSA private key obtained above is taken as the DH private key, and another process starts to generate a secret key for both users as follows:
(i) The two global integers $n$ and $g$ are fixed and known by each user.
(ii) The user's DH private key here is the RSA private key which has already been computed. So Alice's private key is $X_A = d_A$, while Bob's is $X_B = d_B$.
(iii) From $n$, $g$ and $X$, each party computes its own DH public key: $Y_A = g^{X_A} \bmod n$ for Alice and $Y_B = g^{X_B} \bmod n$ for Bob.
(iv) Both users exchange their DH public keys after encrypting them with RSA public keys.
(v) In order to get the secret key or session key, each user decrypts the received DH public key. Here, Alice computes $K = Y_B^{X_A} \bmod n$ to get the secret key and Bob computes $K = Y_A^{X_B} \bmod n$ to get the same secret key.
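The five steps above can be traced with toy numbers. The sketch below rests on several assumptions that are not taken from this paper: Mersenne primes are used as RSA primes only because they are well-known primes, the exponents 65537 and 257 stand in for the two voice-derived public keys, the DH modulus is $2^{31}-1$ with base 7, and each DH public value is encrypted with the recipient's RSA public key.

```python
# Global DH parameters (step (i)); toy values, far too small for real use.
N_DH = 2**31 - 1     # assumed prime modulus n
G = 7                # assumed base g

def rsa_keys(p, q, e):
    """Return (modulus, e, d) for primes p, q; raises ValueError if e is not invertible."""
    phi = (p - 1) * (q - 1)
    return p * q, e, pow(e, -1, phi)

# e stands in for each speaker's voice-derived public key (hypothetical values).
n_a, e_a, d_a = rsa_keys(2**61 - 1, 2**89 - 1, 65537)     # Alice
n_b, e_b, d_b = rsa_keys(2**107 - 1, 2**127 - 1, 257)     # Bob

# Step (ii): the RSA private key doubles as the DH private key.
x_a, x_b = d_a, d_b

# Step (iii): DH public keys.
y_a = pow(G, x_a, N_DH)
y_b = pow(G, x_b, N_DH)

# Step (iv): each DH public key is RSA-encrypted for its recipient.
c_to_bob = pow(y_a, e_b, n_b)
c_to_alice = pow(y_b, e_a, n_a)

# Step (v): decrypt with the own RSA private key, then finish the DH exchange.
k_alice = pow(pow(c_to_alice, d_a, n_a), x_a, N_DH)
k_bob = pow(pow(c_to_bob, d_b, n_b), x_b, N_DH)
```

Both sides arrive at the same value $K = g^{X_A X_B} \bmod n$, which then serves as the session key.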
Finally, after both sides have set up the secret key, each user uses the secret key as the RC4 input key and starts generating a keystream. This keystream is used to encrypt the outgoing data by a simple XOR operation, and the same keystream is used to decrypt the incoming data on each side by the same XOR operation. Decryption is simply done by XORing the keystream with the encrypted data stream to get the original data stream, as illustrated in Fig. 3. The XOR operation is not computationally expensive for today's mobile phones. Now the communication channel is secured; even if the communication gets tapped, what an eavesdropper gets is nonsense.
The audio stream can go through pre-processing steps before encryption. That is, the audio stream is encoded before encryption and decoded after decryption to get the original audio stream. This process is analogous to putting the audio stream in a secure envelope.
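The keystream generation and XOR step can be sketched with the standard RC4 key-scheduling and output loops. This is a generic illustration rather than the code used in this work; the 8-byte encoding of the session key and the sample payload are hypothetical.

```python
def rc4_keystream(key: bytes):
    """Yield RC4 keystream bytes for the given key."""
    s = list(range(256))
    j = 0
    for i in range(256):                       # key-scheduling algorithm
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    i = j = 0
    while True:                                # pseudo-random generation
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        yield s[(s[i] + s[j]) % 256]

def rc4_xor(key: bytes, data: bytes) -> bytes:
    """XOR data with the keystream; the same call encrypts and decrypts."""
    return bytes(b ^ k for b, k in zip(data, rc4_keystream(key)))

# Hypothetical session key: a DH shared secret encoded as 8 big-endian bytes.
session_key = (1234567890123).to_bytes(8, "big")
audio_bytes = b"\x01\x02\x03 sample payload"
ciphertext = rc4_xor(session_key, audio_bytes)
plaintext = rc4_xor(session_key, ciphertext)
```

Since encryption and decryption are the same XOR, the per-byte cost is one keystream byte plus one XOR in either direction.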
[Figure: block diagram of the decryption flow. Legible block labels include MFCC, RSA and Keystream.]
6. Results
This section presents the results for the methodology proposed in Section 5 and discusses the results obtained from implementing the system. In order to implement such a system, one must go through several steps, which were described in detail in previous sections. The implementation of this simulated project is written in MATLAB. In addition, Maple 8 kernel access was used in programming.
Some tests were done to measure to what degree the final result can work on personal computers and, to some extent, on mobile phones (e.g. PC-to-PC, Mobile-to-Mobile and Mobile-to-PC). These tests were performed on the following configuration: Intel(R) Pentium(R) M processor at 1.73 GHz, 512 MB of RAM, Microsoft Windows XP Home Edition, with MATLAB as the programming language. Testing was also done on the same PC but with a different processor speed. The processor speed was reduced to 412 MHz, which is close to or less than that of recent mobile phones. Among the currently available mobile devices, the Dell Axim X30 is powered by an Intel 624 MHz XScale processor with 64 MB RAM and 64 MB ROM, and the Samsung ARM9-based S3C2440 is clocked at 533 MHz.
In this project, a standard microphone connected to the PC is used as the hardware for speech recording. The standard sound recorder program in Microsoft Windows is used to do the speech recording. The audio format settings are PCM 44.1 kHz, 16 bit, stereo. This is even more than what GSM uses now, which is 8 kHz, 8 bit, mono; GSM audio therefore needs less computation time than PCM 44.1 kHz, 16 bit, stereo. The speaker's speech is recorded and saved in wave format.
The timing results for this cryptosystem are computed and presented in the same order of execution. First of all, key setup starts from the MFCC processing. Table 1 shows the result of executing the MFCC algorithm on different audio speech lengths to get 64 acoustic vectors. The key generated from these acoustic vectors is 64 bits long. Note that these audio files were stored on the hard drive, and the MFCC function needs the same execution time for varying audio file lengths. Table 1 also shows the storage required to save the audio files. On the other hand, Table 2 shows the timing results for executing the MFCC algorithm for different acoustic vector sizes. A larger vector size needs more computation time. 64 acoustic vectors can generate from a 64-bit public key up to a 2048-bit public key.
Fig. 3 Decryption Flow.
Table 1: Time in seconds to generate 64 acoustic vectors from MFCC for different audio lengths
Audio length in seconds   Time for MFCC on P 1.7 GHz   Time for MFCC on P 412 MHz   Required storage
1                         0.0156                       0.0313                       180 KB
2                         0.0156                       0.0313                       356 KB
3                         0.0156                       0.0313                       527 KB
Table 2 shows the timings for different key sizes after applying the feature descriptor to the acoustic vectors resulting from the MFCC process. The process of generating the RSA private key and checking the primality of the public key comes next. Table 3 shows the average timing results for generating RSA keys for various key sizes. The average is taken over ten tests for each key size. From Table 3, it is obvious that using 64-bit long RSA keys is very fast. In the sense of security, a 64-bit key is secure enough to transfer the public DH keys.
After the RSA keys have been computed, the DH key exchange comes next. The DH function first computes the corresponding private key. Secondly, it computes the second user's public key. Finally, it exchanges the key using RSA and computes the shared secret key. Table 4 shows the timing results for generating the secret key.
When the secret key is known, the key is used as the RC4 key. Table 5 shows the time required to initialize RC4.
Table 5: Time needed to initialize RC4
Time in seconds on P 1.7 GHz   Time in seconds on P 412 MHz
0.0006                         0.0056
Finally, the encryption process starts. Encryption tests were done and the timing results were taken for either doing the XOR operation alone, or encoding and then doing the XOR operation over bytes of data. The timing tests include the time needed to generate the proper keystream and perform the XOR operation. The encoder used here is AN643 Adaptive Differential Pulse Code Modulation (ADPCM). Table 6 shows the timing results for generating the keystream and performing the encryption or decryption process. Each byte of data needs one byte of keystream. Note that the time needed to decrypt one byte is the same as the time needed to encrypt one byte, using either the simple XOR operation or the XOR operation followed by the decoder.
From these tests, the overall time needed to set up the system depends on the time needed to take the MFCC sample and to generate the keys, taking into consideration their lengths. Table 7 shows the total time needed to start the system. Note that the MFCC audio length is not included.
Table 4: Time needed to prepare the DH key, in seconds
DH secret key size   Time to prepare DH key on P 1.7 GHz   Time to prepare DH key on P 412 MHz
64-bit               0.015625                              0.21875
128-bit              0.0625                                0.3125
256-bit              0.09375                               0.4375
512-bit              0.15625                               0.828125
1024-bit             0.28125                               1.53437
Table 2: Time in seconds to generate different key sizes from MFCC
Key size   Time for MFCC on P 1.7 GHz   Time for MFCC on P 412 MHz
64-bit     0.0156                       0.0313
128-bit    0.0311                       0.1563
256-bit    0.3281                       1.1094
512-bit    1.1250                       3.5156
Table 3: Average time needed to generate RSA keys in seconds
RSA key size   Time to prepare RSA keys on P 1.7 GHz   Time to prepare RSA keys on P 412 MHz
64-bit         0.21875                                 0.515625
128-bit        0.40625                                 1.40625
256-bit        0.53125                                 2.359375
512-bit        0.78125                                 2.765625
1024-bit       1.328125                                6.09375
Table 6: Time needed to generate the keystream and encrypt or decrypt
Operation               Time in seconds on P 1.7 GHz   Time in seconds on P 412 MHz
XOR (1-byte)            0.00001                        0.00003
Encode & XOR (1-byte)   0.00007                        0.00015
XOR (2-byte)            0.00002                        [illegible]
Encode & XOR (2-byte)   0.00015                        0.0003
Table 7: Total time needed to start the system, in seconds
Key size   Time needed on P 1.7 GHz   Time needed on P 412 MHz
64-bit     0.266175                   0.802575
128-bit    0.51605                    1.91195
256-bit    0.9693                     3.943175
512-bit    2.0787                     7.14625
1024-bit   2.95?575                   12.58062
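The totals in Table 7 appear to be the sum of the individual stages reported in Tables 1 to 5. This decomposition is inferred from the numbers and is not stated explicitly in the text, so the sketch below should be read as a consistency check under that assumption. It uses the P 1.7 GHz columns for the key sizes where all five stage timings are given.

```python
# Stage timings in seconds on the P 1.7 GHz configuration.
MFCC_64_VECTORS = 0.0156                                                # Table 1
DESCRIPTOR = {64: 0.0156, 128: 0.0311, 256: 0.3281, 512: 1.1250}        # Table 2
RSA_KEYS = {64: 0.21875, 128: 0.40625, 256: 0.53125, 512: 0.78125}      # Table 3
DH_KEY = {64: 0.015625, 128: 0.0625, 256: 0.09375, 512: 0.15625}        # Table 4
RC4_INIT = 0.0006                                                       # Table 5

def setup_time(bits: int) -> float:
    """Assumed decomposition: MFCC + descriptor + RSA + DH + RC4 initialization."""
    return (MFCC_64_VECTORS + DESCRIPTOR[bits] + RSA_KEYS[bits]
            + DH_KEY[bits] + RC4_INIT)

totals = {bits: round(setup_time(bits), 6) for bits in (64, 128, 256, 512)}
```

Under this reading, RSA key generation dominates the setup time for the smaller key sizes, while the feature descriptor stage dominates at 512 bits.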
These results show that a 64-bit system requires less than one second to prepare the keys after storing the MFCC audio sample. A 128-bit system also requires less than one second on a normal PC and less than two seconds on the simulated mobile phone processor. The drawback of this protocol is that it needs a few seconds to take the user's voice and set up the cryptographic keys, but this can be solved by encrypting the first seconds with a fixed key known between the users until the system initializes. Note that the results obtained here use 44.1 kHz PCM with 16-bit stereo wave format. Meanwhile, GSM uses a much lower quality wave format, which can reduce the system setup time to half of these results, meaning that recent mobile phones can run the system much faster.
7. Conclusion
This paper discusses the development of a system combining audio feature extraction and public key cryptography. For feature extraction, Mel-Frequency Cepstrum Coefficients (MFCC) was selected because it is fast, gives good-quality results, and is applicable to any spoken word. MFCC performs these signal processing steps: frame blocking, windowing, Fast Fourier Transform, cepstrum and mel-frequency wrapping.
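The processing steps listed above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the implementation that was timed: the frame length of 256 samples, frame shift of 100 samples, 20 mel filters and 12 coefficients are illustrative defaults, a naive discrete Fourier transform stands in for the FFT (it yields the same spectrum, only more slowly), and mel-frequency wrapping is applied before the cepstrum step, which is the conventional order. All function names are hypothetical.

```python
import cmath
import math

def frame_blocking(signal, frame_len, hop):
    """Split the signal into overlapping frames of frame_len samples."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]

def hamming(frame):
    """Taper the frame edges with a Hamming window."""
    n = len(frame)
    return [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
            for i, x in enumerate(frame)]

def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0..N/2 (stand-in for an FFT)."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(frame))
        spec.append(abs(acc) ** 2)
    return spec

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Build triangular filters spaced evenly on the mel scale."""
    top = hz_to_mel(sample_rate / 2)
    edges_hz = [mel_to_hz(top * i / (n_filters + 1))
                for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * f / sample_rate)) for f in edges_hz]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = bins[j - 1], bins[j], bins[j + 1]
        filt = [0.0] * (n_fft // 2 + 1)
        for k in range(lo, mid):             # rising edge of the triangle
            filt[k] = (k - lo) / (mid - lo)
        for k in range(mid, hi):             # falling edge of the triangle
            filt[k] = (hi - k) / (hi - mid)
        bank.append(filt)
    return bank

def mfcc(signal, sample_rate, frame_len=256, hop=100, n_filters=20, n_ceps=12):
    """Return one list of n_ceps cepstral coefficients per frame."""
    bank = mel_filterbank(n_filters, frame_len, sample_rate)
    coeffs = []
    for frame in frame_blocking(signal, frame_len, hop):
        spec = power_spectrum(hamming(frame))
        # Mel-frequency wrapping: log energy in each triangular filter.
        log_e = [math.log(max(sum(w * s for w, s in zip(f, spec)), 1e-12))
                 for f in bank]
        # Cepstrum: DCT-II of the log mel energies, skipping coefficient 0.
        coeffs.append([
            sum(e * math.cos(math.pi * n * (m + 0.5) / n_filters)
                for m, e in enumerate(log_e))
            for n in range(1, n_ceps + 1)])
    return coeffs
```

For example, mfcc(samples, 8000) on a list of mono samples recorded at 8 kHz returns one 12-element vector for every 100-sample shift. The sketch stops at the coefficients and does not cover how they are turned into a key.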
In this paper, in order to perform authentication, MFCC was implemented to authenticate the other party by generating a key from the user's voice. Both RSA and DH are used to generate the shared secret key. This key is then used as an RC4 key. RC4 is a fast keystream generator and is suitable for real-time applications. The RSA algorithm generates public keys which are used to secure the exchange of the DH public keys. It is secure to exchange DH public keys without encryption, but exchanging them this way gives additional security for the proposed method. Overall, the results for this project are as follows:
(i) Key authentication is done implicitly and automatically using MFCC feature extraction.
(ii) The generated key is secure against the man-in-the-middle attack.
(iii) The protocol is based on the RSA, DH and RC4 cryptosystems.
(iv) Pseudo-random numbers are used for the private keys and the session keys, and a pseudorandom number generator is used for the key streams. These keys are non-deterministic.
The time required to set up the keys is fast enough for both PC and mobile phone usage.
Also, the protocol requires very little storage.
Monther Enayah received the B.S. degree in Computer Science from Applied Science University in Jordan in 20[?], and the M.S. degree in Computer Science from Universiti Sains Malaysia, Malaysia, in 2006. Currently he is a Ph.D. student at the Universiti Sains Malaysia in the area of Security and Cryptography.

Azman Samsudin received the B.S. degree in Interconnection Switching Networks from University of Rochester. He also received M.S. and Ph.D. degrees in Parallel Distributed Computing and Cryptography from University of Denver, respectively. Currently he is chairperson for Information Systems and deputy dean of Graduate Studies and Research.