### Transcription

Jurnal Teknologi Maklumat & Multimedia 10(2011): 1 - 11

Improvement Anomaly Intrusion Detection using Fuzzy-ART Based on K-means based on SNC Labeling

Zulaiha Ali Othman, Afaf Muftah Adabashi, Suhaila Zainudin, Saadat M. Al Hashmi

#### ABSTRACT

Intrusion detection has received a lot of attention from many researchers, and various techniques have been used to identify intrusions or attacks against computers and networks. Data mining is a well-known artificial intelligence technique for building network intrusion detection systems. Numerous data mining techniques have been successfully applied in this area to find intrusions hidden in large amounts of audit data through classification, clustering or association rules. Clustering is one of the promising techniques used in Anomaly Intrusion Detection (AID), especially when dealing with unknown patterns. This paper presents our work to improve the performance of anomaly intrusion detection using Fuzzy-ART based on the K-means algorithm. The K-means used here is a modified version of the standard K-means in which the value of K is initialized from the value obtained after mining the data with Fuzzy-ART and the SNC labeling technique. The results show that this algorithm increases the detection rate and reduces the false alarm rate compared with Fuzzy-ART.

Keywords: intrusion detection, anomaly detection, data mining, NSL-KDD dataset, Fuzzy-ART, K-means, labeling

#### ABSTRAK (translated from Malay)

Intrusion detection has attracted the attention of many researchers, and at the same time various techniques have been developed to detect intrusions or attacks on computers and networks. Data mining is an artificial intelligence technique that has been identified for developing network intrusion detection systems. Numerous data mining techniques have been successfully applied in this field to mine intrusions hidden in audit data through classification, clustering or association rules. Clustering is one of the promising techniques for Anomaly Intrusion Detection, especially when dealing with unknown patterns. This paper describes a study to improve the performance of anomaly intrusion detection using Fuzzy-ART based on the K-means algorithm. This K-means method is a version of the standard K-means that has been improved by using an initial value of K based on the value mined using Fuzzy-ART with the SNC labeling technique. The results show that this algorithm increases the detection rate and reduces the false alarm rate compared with using Fuzzy-ART alone.

Keywords: intrusion detection, anomaly detection, data mining, NSL-KDD data set, Fuzzy-ART, K-means, labeling

#### Introduction

Intrusion detection is defined as a process of observing any event occurring in computer systems or networks, and analyzing the events for any signs of intrusion (Bauer & Koblentz 1988). An Intrusion Detection System (IDS) is a tool that provides security in computers or the network environment by stopping unlawful access to the system resources and the information stored within the system. An IDS can be a software or hardware system, or a combination of both, that aims to monitor the different events occurring in the network.

IDS can be classified into two categories, namely misuse detection and anomaly detection (Wang & Megalooikonomu 2005). In the misuse detection approach, each instance in the training data is labeled as "normal" or "intrusion." It analyzes the gathered information and compares it to large databases of attack signatures.
In contrast, anomaly detection (behavior-based) methods build models of normal data and then try to detect deviation from the normal profile; any deviation is considered an anomaly. According to Chandola et al. (2007), an anomaly can be defined as a pattern in the data that is not consistent with the specific concept of normal behavior. Anomaly detection can make use of supervised or unsupervised methods to identify the anomalies. Unsupervised Anomaly Detection (UND) methods make some assumptions about the data that motivate the general approach (Portnoy et al. 2001). The first assumption is that the majority of the network connections are normal traffic, with only a small fraction of the traffic being anomalous. The second assumption is that the attack traffic is statistically different from normal traffic. Since the anomalies are different from the normal traffic, they will appear as outliers in the data, which can be detected.

Data mining techniques can deal with the large volume of data required by both IDS categories to find intrusions hidden in network data. Clustering is one of the data mining techniques that have been used in anomaly intrusion detection, especially when dealing with unlabeled data. Several clustering algorithms have been proposed by Guan et al. (2003), Abdul Samad et al. (2008) and Ngamwitthayanon et al. (2009) that aim to have a high detection rate and a low false alarm rate for intrusion detection. Among the clustering algorithms, Fuzzy-ART has been shown to be the best-performing clustering technique for intrusion detection. However, there is still much room for improvement. Some studies achieve a high detection rate with a low false alarm rate only by restricting themselves to specific types of attack.

This paper aims to improve the anomaly intrusion detection system using Fuzzy-ART based on the K-means clustering algorithm. K-means uses Fuzzy-ART to obtain its initial state, while Fuzzy-ART uses K-means to reassign the data instances in the obtained clusters in order to achieve a better result. We refer to the resulting modified algorithm as K-means.

#### Clustering

Clustering is a common method for data analysis, which is used in many fields including pattern recognition, machine learning, and data mining. Clustering is the separation of data into subsets of similar objects, each subset being called a cluster. A cluster consists of objects that are similar to one another and dissimilar to the objects in other clusters (Lappas & Pellechrinis 2007).

Clustering is a well-known data mining technique for unsupervised data.
There are many clustering methods and algorithms, which can be grouped into four categories: partitioning methods, hierarchical methods, density-based methods and grid-based methods (Leung & Leckie 2005).

Clustering techniques are beneficial in intrusion detection because they can cluster malicious activity together and separate it from non-malicious activity. Clustering techniques provide some considerable advantages compared with classification techniques, in that they do not require labeled data for training.

Applying clustering techniques in network intrusion detection has received much attention in the network academic community. Smith et al. (2002) applied the K-means, Self-Organizing Maps (SOM), and Expectation Maximization (EM) algorithms. The results show that SOM has the same complexity regardless of the data volume or clusters used. However, SOM can incorrectly classify input data that correspond to points not found in the training data. Smith et al. (2002) claimed that the K-means algorithm has a predictable performance. However, it handles high-dimensional data sets poorly. The EM algorithm can tolerate missing and unlabeled data and can offer information about how close a data point is to each cluster, since the data point has a varying membership in all clusters.

Guan et al. (2003) introduced a K-means based clustering algorithm, called Y-means, which overcomes the deficiencies of the traditional K-means algorithm. The Y-means algorithm has some advantages. It does not need the initial number of clusters to be determined in advance. Additionally, it can directly use the raw log data of information systems as training data without manual labeling. This advantage gives the Y-means algorithm the ability to detect known intrusions as well as unknown intrusions.

Abdul Samad et al. (2008) applied the Fuzzy Adaptive Resonance Theory (Fuzzy-ART) method to the KDD Cup '99 benchmark data set to build an anomaly intrusion detection classifier.
They also used Principal Component Analysis (PCA) for data reduction. Moreover, they proposed a new cluster labeling algorithm to label clusters as normal or abnormal (attack). The results show that this approach performs well in classifying the network connections.

Ngamwitthayanon et al. (2009) investigated a one-shot fast learning option of Fuzzy-ART. They used a subset of the KDD Cup '99 data set for experiments, which contains normal connection records and two attack categories, DoS and Probe. The results indicate that this option can be applied to provide a real-time detection system with a high detection rate and a low false alarm rate.

#### Fuzzy Adaptive Resonance Theory (Fuzzy-ART)

The ART neural networks were developed by Carpenter et al. (1987). ART neural networks can learn without forgetting past learning. The first version is ART-1, which can only deal with binary data.

Carpenter et al. (1987) developed Fuzzy-ART systems, which are an extension of ART-1. Fuzzy-ART incorporates computations from Fuzzy set theory into ART systems by replacing the crisp intersection operator ($\cap$) with the Fuzzy AND operator ($\wedge$) of Fuzzy set theory. Fuzzy-ART has several advantages over ART-1, such as lower implementation cost and processing time, and the capability of handling both binary and analog data. It has an unsupervised learning feature and does not require a predefined number of clusters. Figure 1 shows the Fuzzy-ART algorithm flowchart.

The Fuzzy-ART algorithm has been summarized by Carpenter et al. (1991) as follows:

Input vector: Every input vector $I$ is an $M$-dimensional vector, where each component of $I$ is in the interval $[0,1]$.

FIGURE 1. Fuzzy-ART algorithm flowchart. The flowchart steps are: input pattern I is presented to Fuzzy-ART; input pattern I is normalized and complement coded; the activation Tj of all saved categories is computed for input pattern I; the category wj that shows the highest activation value is selected; the vigilance test is run for the winning category wj with respect to pattern I. If wj passes the vigilance test, the category is updated. If it fails, wj is excluded from the search and the search starts over; once all categories have been tested for vigilance without success, a new category is created with pattern I as its first member.

Weight vector: Each cluster or category $j$ corresponds to a weight vector $w_j = (w_{j1}, \ldots, w_{jM})$ of adaptive weights. Initially, all weights are set to one and each category is said to be uncommitted:

$$w_{j1} = \cdots = w_{jM} = 1. \tag{1}$$

After a category is selected it becomes committed.

Parameters: Fuzzy-ART networks depend on these three main parameters:

1. Vigilance parameter $\rho \in [0,1]$.
2. Choice parameter $\alpha > 0$.
3. Learning rate parameter $\beta \in [0,1]$.

Category choice: For each input $I$ and category $j$, the choice function $T_j$ is defined by:

$$T_j(I) = \frac{|I \wedge w_j|}{\alpha + |w_j|}, \tag{2}$$

where $|\cdot|$ denotes the sum of the components of a vector and the operator $\wedge$ is the fuzzy AND, which is defined by:

$$(x \wedge y)_i = \min(x_i, y_i). \tag{3}$$

Activation function: $T_j(I)$ in Equation (2) measures the similarity between the input pattern $I$ and each of the saved categories. After evaluating $T_j$ for every category, the one with the largest value of the choice function is selected:

$$T_J = \max\{T_j : j = 1, 2, \ldots, N\}, \tag{4}$$

where $N$ is the number of categories. If more than one category has the same maximum value, the category with the smallest $j$ is chosen.

Resonance or reset: The category $w_J$ that is chosen by the activation function in Equation (4) is compared with the input pattern $I$:

$$\frac{|I \wedge w_J|}{|I|} \geq \rho. \tag{5}$$

If the category $w_J$ meets the vigilance parameter $\rho$, then the $J$th category is selected. Otherwise the $J$th category is reset. If no category can be selected, a new category is committed with weights equal to 1, and the input vector is assigned to it.

Learning: After selecting a category $j$, the weight vector $w_j$ is updated according to the following equation:

$$w_j^{(new)} = \beta \left(I \wedge w_j^{(old)}\right) + (1 - \beta)\, w_j^{(old)}. \tag{6}$$

Input normalization option: To avoid a category proliferation problem in a Fuzzy-ART network, the inputs must be in normalized form. An alternative normalization rule is called complement coding. The complement of the data $a$ is denoted by $a^c$, where:

$$a^c = 1 - a. \tag{7}$$

Then, the complement coded input vector $I$ is a $2M$-dimensional vector:

$$I = (a, a^c) = (a_1, \ldots, a_M, a_1^c, \ldots, a_M^c). \tag{8}$$

In the Fuzzy-ART algorithm, fast learning occurs when the learning rate is set to 1. Learning in Fuzzy-ART can be interpreted as the extension of the category region towards the input sample. Furthermore, the vigilance parameter is the most important parameter affecting the Fuzzy-ART clustering quality: it controls the categorization by serving as the criterion for the maximum tolerable difference between the input data pattern and the prototype of a category.

#### K-means

The K-means clustering algorithm is a method used to partition a dataset into groups (MacQueen 1967). This algorithm assigns objects to a predefined number of clusters, which is the k in its name. Each cluster is represented by the mean value of the objects in the cluster, which is known as the center of the cluster.

The K-means clustering algorithm is one of the simplest unsupervised learning algorithms and has been adapted to solve problems in many domains. The procedure of this algorithm follows a simple and easy way to partition a given data set into a certain number of clusters. The steps of the K-means algorithm are as follows:

1. Select k points randomly to be the initial centroids of k clusters.
2. Assign each object to the closest centroid.
3. Recalculate the centroid of each cluster, which is an average of all attribute values of the examples belonging to the same cluster.
4. Repeat steps 2 and 3 until the centroids do not change.

#### Fuzzy-ART based on K-means

Recent research has supported the use of neural networks such as Fuzzy-ART, due to their flexibility and stability. However, Fuzzy-ART networks have some limitations, such as the unrestricted growth of clusters, which is known as category proliferation. Fuzzy-ART networks often produce numerous clusters, each with only a small number of members.

In this paper, the Fuzzy-ART algorithm is used to find the groups of normal and attack instances, whereby each group shares common interests and behaviors. However, the groups or clusters that are obtained as a result of this algorithm still have many normal and attack instances in the same cluster. For this reason, this paper applies the K-means clustering algorithm to reassign the data instances in these clusters in order to achieve a better clustering quality.

Fuzzy-ART based on the K-means clustering algorithm consists of two phases (Figure 2). After the first phase generates clusters using Fuzzy-ART, depending on the vigilance parameter, K-means in the second phase takes these clusters as the initial clusters and reassigns each instance or object in these clusters to the cluster with the nearest centroid. That is, the cluster with the minimum Euclidean distance to an object takes this object as a member. After all instances have been reassigned to the nearest clusters, each cluster centroid is updated to the mean value of the objects in the cluster. This iteration is repeated until the centroids no longer change.

FIGURE 2. Architecture of Fuzzy-ART based on K-means clustering algorithm (blocks: normalized data, Fuzzy-ART, clusters, K-means, clusters)

One of the major problems of the K-means algorithm is that it may end with some empty clusters. Empty clusters are meaningless and they prolong the computation time. For these reasons, in this research the empty clusters are removed within each iteration. Figure 3 shows the flowchart of the Fuzzy-ART based on K-means algorithm.

#### Labeling Clusters

Labeling clusters is a technique to label the clusters as normal or attack after applying a clustering process. Portnoy et al. (2001) and Abdul Samad et al. (2008) proposed a labeling algorithm known as Normal Membership Factor (NMF), and this paper introduces a labeling algorithm known as Similarity Normal Clusters (SNC) (Provost et al. 2001).
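As a concrete illustration of the two clustering phases described before the labeling step, the following is a minimal Python sketch, not the authors' MATLAB implementation. It assumes inputs already scaled to [0, 1]; the function names (`complement_code`, `fuzzy_art`, `kmeans_refine`) are hypothetical, and taking the mean of each Fuzzy-ART cluster as the initial K-means centroid is an assumption, since the text only says that the Fuzzy-ART clusters serve as the initial clusters.

```python
import math
import random


def complement_code(a):
    """Complement coding, Equation (8): I = (a, 1 - a) for a already in [0, 1]."""
    return list(a) + [1.0 - x for x in a]


def fuzzy_and_norm(x, y):
    """|x ^ y|: sum of the component-wise minima (fuzzy AND, then component sum)."""
    return sum(min(xi, yi) for xi, yi in zip(x, y))


def fuzzy_art(inputs, rho=0.8, alpha=1e-6, beta=1.0, epochs=1):
    """Phase 1: cluster complement-coded vectors; returns one category index per input."""
    weights = []                       # weight vectors of the committed categories
    labels = [0] * len(inputs)
    for _ in range(epochs):
        for n, I in enumerate(inputs):
            # Choice function, Equation (2); the stable sort sends ties to the smallest j
            T = [fuzzy_and_norm(I, w) / (alpha + sum(w)) for w in weights]
            order = sorted(range(len(weights)), key=lambda j: -T[j])
            # First category, in activation order, that passes the vigilance test (5)
            winner = next((j for j in order
                           if fuzzy_and_norm(I, weights[j]) / sum(I) >= rho), None)
            if winner is None:
                weights.append([1.0] * len(I))   # commit a new category, weights = 1
                winner = len(weights) - 1
            w = weights[winner]
            # Learning rule, Equation (6)
            weights[winner] = [beta * min(i, wi) + (1 - beta) * wi
                               for i, wi in zip(I, w)]
            labels[n] = winner
    return labels


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans_refine(data, labels, max_iter=100):
    """Phase 2: start from the given clusters and reassign to the nearest centroid."""
    # Assumption: the initial centroid of each cluster is the mean of its members
    centroids = [mean([x for x, l in zip(data, labels) if l == k])
                 for k in sorted(set(labels))]
    for _ in range(max_iter):
        # Assign every instance to the centroid at minimum Euclidean distance
        nearest = [min(range(len(centroids)),
                       key=lambda k: math.dist(x, centroids[k])) for x in data]
        kept = sorted(set(nearest))                 # clusters that are not empty
        labels = [kept.index(k) for k in nearest]   # renumber without the empty ones
        updated = [mean([x for x, l in zip(data, labels) if l == k])
                   for k in range(len(kept))]
        if updated == centroids:                    # centroids no longer change
            break
        centroids = updated
    return centroids, labels


# Toy usage on two well-separated blobs (synthetic data, not NSL-KDD)
random.seed(0)
data = ([[random.uniform(0.0, 0.2), random.uniform(0.0, 0.2)] for _ in range(50)]
        + [[random.uniform(0.8, 1.0), random.uniform(0.8, 1.0)] for _ in range(50)])
art_labels = fuzzy_art([complement_code(x) for x in data], rho=0.6)
centroids, final_labels = kmeans_refine(data, art_labels)
print(len(set(art_labels)), "Fuzzy-ART clusters,", len(centroids), "after K-means")
```

In this sketch a larger `rho` yields more, tighter Fuzzy-ART categories, which in turn sets the starting number of clusters for the K-means phase; empty clusters are dropped inside each iteration, mirroring the removal step described in the text.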

FIGURE 3. Flowchart of Fuzzy-ART based on K-means algorithm. The flowchart steps are: the initial cluster centroids are obtained from Fuzzy-ART; the distance between each instance and all cluster centroids is calculated; each instance is assigned to the nearest cluster, that is, the one at minimum distance; if there is any empty cluster, the empty clusters are removed; the cluster centroids are updated by calculating the average of all instance values belonging to each cluster; if the cluster centroids have changed, the procedure repeats from the distance calculation, otherwise it ends.

#### Normal Membership Factor (NMF) Labeling Algorithm

The aim of the NMF labeling algorithm is to identify the other clusters that may contain normal patterns and gather them into a normal group in order to reduce the false alarm rate. This algorithm uses the concept of calculating a weighted degree of probability that a cluster belongs to the normal group. The weight of a cluster and the NMF of the cluster are calculated as follows:

1. Weight of clusters ($c_i$): the weight (size) of the cluster is considered in order to greatly reduce the effect of anomalies. The weight of a cluster is calculated as in Equation 9:

$$WC(c_i) = \frac{1}{d(\mathrm{Normal}, c_i)} \cdot \frac{\text{Number of instances in } c_i}{\text{Number of instances}}, \tag{9}$$

where $d(\mathrm{Normal}, c_i)$ is the distance between the normal cluster and the other cluster, and $i$ is the number (index) of the cluster.

2. Normal membership factor ($c_i$): the normal membership factor is calculated as in Equation 10:

$$NMF(c_i) = \frac{WC(c_i)}{\sum_{k} WC(c_k)}, \tag{10}$$

where the sum in the denominator adds up the weights of all the clusters.

Clusters whose NMF value is greater than 40 percent are gathered into the normal group. The percentage was determined from a previous experiment performed for this approach.

#### Similarity Normal Cluster Labeling Technique

The SNC is a cluster labeling algorithm which is built on the assumptions introduced by Portnoy et al. (2001).
SNC calculates the similarity percentage between the centroid of the normal cluster, which is the cluster with the largest size, and the other cluster centroids using the Euclidean distance. The aim of this algorithm is to identify the other normal clusters that may have a small number of instances. Figure 4 shows an example of five clusters. The labeling algorithm is illustrated as follows:

Step 1: Identify the largest cluster, the one with the largest number of instances, and label it as normal, as shown in Figure 4.

Step 2: Calculate the distances between the normal cluster centroid C1 and the other cluster centroids using the Euclidean distance, as shown in Figure 5. The Euclidean distance is given in Equation 11.

Step 3: If the distances are less than 20% (marked as 20% d max in Figure 5), then gather these clusters into the normal group. The 20% value was concluded to be the best percentage after several rounds of experiments for this proposed approach.

Step 4: Find the largest cluster among the remaining clusters.

Step 5: If the size of the largest cluster is larger than $\frac{1}{i}N$, then label this cluster as normal and go back to Step 2. Otherwise label this cluster and all the rest of the clusters as attack.

Here $i$ is the number of attack categories and $N$ is the number of attack instances in the dataset. The chosen size of the largest cluster is based on the analysis result.

In the example, the remaining clusters from the last step are C2, C4 and C5. The largest one is C5. Suppose that the size of this cluster is larger than the threshold; then C5 is labeled as normal and the algorithm goes back to Step 2.

FIGURE 4. Size of cluster notation (clusters C1 to C5, with the largest cluster labeled normal)

FIGURE 5. Area of selection of normal clusters (clusters C1 to C5, with the 20% d max region marked)

#### Similarity Measures

Similarity is a fundamental concept in clustering. It measures the similarity between two patterns drawn from the same feature space. One of the challenges in clustering is to choose a suitable similarity measurement technique based upon the feature type. The concept of similarity in most clustering techniques is based on distances. Some well-known distance functions include Euclidean distance, Manhattan distance, and cosine distance. The most widely used distance measure for calculating the similarity between two objects is Euclidean distance. Euclidean distance is sufficient to successfully group similar data objects.

Simply put, Euclidean distance is the geometric distance in multidimensional space. For two vectors $x$ and $y$ in an $n$-dimensional space, it is defined as the square root of the sum of the squared differences of the corresponding dimensions of the vectors. It is computed as:

$$d_{euc}(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}, \tag{11}$$

where $n$ is the number of dimensions in the data vector.

#### Experimental Setup

The experiments conducted follow the standard data mining process (Han & Kamber 2001), which consists of collecting the dataset, preprocessing, normalization and mining.

The experiments used NSL-KDD data collected from the website via the link (http://nsl.cs.unb.ca/NSL-KDD). NSL-KDD is considered a modified version of the KDD Cup '99 intrusion detection benchmark dataset (KDD Cup '99). This dataset was suggested by Tavallaee et al. (2009) to solve some of the inherent problems of the KDD Cup '99 dataset.

This research deals with the 20% NSL-KDD training dataset, which consists of 25,191 records and was reduced to 13,650 records with 41 attributes. The attacks were randomly selected (Abdul Samad et al. 2008) and reduced to about 1.48% of the complete dataset to match the assumption about the distribution of normal and attack instances in the dataset (Han & Kamber 2001). The experimental dataset contains all types of attack, each of which belongs to one of the four attack categories. For the DoS, Probe, U2R, and R2L attacks, there were 83, 70, 11, and 38 records, respectively. Three data sets were generated at random from the original data set as Data Set 1, Data Set 2 and Data Set 3. Table 1 shows the description of the datasets used in this research. The first row shows the description of the original data set; the other three are the descriptions of the other data sets, where the distribution of the number of records is the same in all experiments.

Each connection record in the experimental dataset is described by 41 attributes; some of these attributes are continuous and some are symbolic or discrete. Since most clustering algorithms require normalized data, these attribute values have been converted into a normalized form using Min-Max normalization (Zong et al. 2001) in order to feed these algorithms.

Before the normalization, all attribute values are discretized. Symbolic features like protocol-type, service, and flag are mapped manually to integer values ranging from 1 to N, where N is the number of symbols (Chimphlee et al. 2006). These attributes were numbered according to their distribution in the data set after sorting the attribute values in descending order (Shirazi 2009). Attributes like duration, source bytes, etc. are discretized using the ChiMerge discretization technique (Kerber 1992).

All experiments were carried out using MATLAB r2007b on a laptop with the Microsoft Windows Vista operating system. Finally, the evaluation of anomaly intrusion detection methods is done using two main measures, Detection Rate (DR) and False Alarm Rate (FAR). DR is defined as the percentage of attacks correctly identified by the system, while FAR is the percentage of normal instances wrongly identified as attacks by the system.

The experiments are run with 100 epochs, a learning rate equal to 1, a choice parameter of 0.000001, and different vigilance values. The results of the experiments recommended the range from 0.5 to 0.85 for the vigilance parameter. These experiments were conducted for Fuzzy-ART based on K-means versus Fuzzy-ART using the SNC labeling algorithm, and for Fuzzy-ART based on K-means versus Fuzzy-ART using the NMF labeling algorithm.

TABLE 1. Data set descriptions (columns: data, attack types, number of original records, number of records in Data Set 1, Data Set 2 and Data Set 3). The per-attack rows are not recoverable from this transcription; the Total row reads 125,973 original records and 13,650 records in each of the three data sets.
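The five SNC labeling steps described earlier can be sketched in Python as follows. This is a hedged illustration rather than the authors' implementation: the function name `snc_label` and its parameters are hypothetical, and two points the text leaves open are filled with stated assumptions. First, "less than 20%" is read as less than 20% of the largest distance from the current normal centroid to the still-unlabeled centroids (the "20% d max" of Figure 5). Second, when the algorithm returns to Step 2, distances are measured from the centroid of the cluster that was just labeled normal.

```python
import math


def snc_label(centroids, sizes, size_threshold, ratio=0.20):
    """Return a list of booleans, True where the cluster is labeled normal."""
    normal = [False] * len(sizes)
    remaining = set(range(len(sizes)))
    # Step 1: the largest cluster is labeled normal
    seed = max(remaining, key=lambda k: sizes[k])
    while True:
        normal[seed] = True
        remaining.discard(seed)
        if not remaining:
            break
        # Step 2: Euclidean distance from the current normal centroid to the others
        dist = {k: math.dist(centroids[seed], centroids[k]) for k in remaining}
        d_max = max(dist.values())          # assumption: d max over unlabeled clusters
        # Step 3: clusters closer than ratio * d_max join the normal group
        for k, d in dist.items():
            if d < ratio * d_max:
                normal[k] = True
                remaining.discard(k)
        if not remaining:
            break
        # Step 4: largest cluster among those still unlabeled
        seed = max(remaining, key=lambda k: sizes[k])
        # Step 5: continue only if it exceeds the size threshold (N / i in the text)
        if sizes[seed] <= size_threshold:
            break
    return normal                            # clusters left False are labeled attack


# Toy example with five made-up clusters (values are illustrative only)
centroids = [[0.0, 0.0], [1.0, 0.0], [0.1, 0.0], [0.9, 0.9], [0.5, 0.5]]
sizes = [1000, 10, 50, 5, 200]
print(snc_label(centroids, sizes, size_threshold=100))
```

In the toy example the third cluster joins the normal group because it lies close to the largest cluster, and the fifth cluster is labeled normal because its size exceeds the threshold; the two small, distant clusters are left as attack.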

#### Experiment Steps

The experimental steps follow the standard data mining process using clustering for anomaly detection proposed by Abdul Samad et al. (2008), which consists of data preprocessing, mining, labeling and evaluation. Figure 6 shows the experiment steps conducted in this research.

FIGURE 6. Experiment steps (blocks: data sets, data preprocessing, clustering with Fuzzy-ART based on K-means or Fuzzy-ART, labeling with SNC or NMF, performance evaluation)

#### Experimental Result

The vigilance parameter decides the number of categories (clusters) and the degree of similarity within each cluster. The network becomes more sensitive to small changes in the input patterns as the vigilance value increases, which in turn increases the number of resulting clusters. Table 2 shows the results of the Fuzzy-ART based on K-means algorithm with different vigilance values for Data Set 1.

The results show that the false alarm rate increased with the increase of the vigilance value because more clusters are classified as attack. At the same time, the number of clusters increases. The results also show that the best performance of Fuzzy-ART based on K-means with the SNC algorithm is obtained when the value of the vigilance parameter is equal to 0.8.

TABLE 2. Results of Fuzzy-ART based on K-means algorithm with different vigilance values for data set 1 (only the rows for ρ = 0.80 and ρ = 0.85 survive in this transcription)

| Vigilance ρ | Detection Rate % | False Alarm Rate % | No. of clusters |
|---|---|---|---|
| 0.80 | 85.64 | 8.96 | 71 |
| 0.85 | 89.11 | 14.19 | 97 |

To explain the performance of the Fuzzy-ART based on K-means algorithm in more detail, Table 3 presents the detection rate of each category. The results show that the algorithm with vigilance values of 0.8 and 0.85 has the best level of detection rate for DoS, Probe, U2R, and R2L. However, the detection rate for the normal category with these two vigilance values is lower than the detection rate with the other values of the vigilance parameter, which led to an increase in the false alarm rate.

TABLE 3. Detection rate of Fuzzy-ART based on K-means algorithm for each category (table body not recoverable from this transcription; only the values 95.71 and 90.90 survive)

Table 4 shows the comparison of Fuzzy-ART based on K-means versus the Fuzzy-ART algorithm. The results show that Fuzzy-ART based on K-means performs better than Fuzzy-ART in terms of both the detection rate and the false alarm rate. That means the K-means algorithm can enhance the performance of Fuzzy-ART.

TABLE 4. Comparison results of Fuzzy-ART based on K-means and Fuzzy-ART using SNC labeling technique

| Method | Detection Rate % | False Alarm Rate % |
|---|---|---|
| Fuzzy-ART | 81.68 | 9.59 |
| Fuzzy-ART based on K-means | 85.64 | 8.96 |

The performance of the algorithm was also measured using ROC curves. Figure 7 shows the comparison of Fuzzy-ART based on K-means versus the Fuzzy-ART algorithm using ROC curves. The figure plots the percentage detection rate against the percentage false alarm rate. According to Provost and Fawcett (2001), the upper left point (0, 1) represents the ideal IDS, which has a 100% detection rate and a 0% false alarm rate.

The figure shows that the curve of Fuzzy-ART based on K-means is higher than that of Fuzzy-ART. This means that the Fuzzy-ART based on K-means algorithm is more accurate and more effective than Fuzzy-ART alone. The figure also shows that Fuzzy-ART based on K-means has a detection rate of about 70% at a 5% false alarm rate, while Fuzzy-ART has a detection rate of about 30% at the same false alarm rate.
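The two measures reported in these tables, Detection Rate (DR) and False Alarm Rate (FAR), follow directly from the definitions given in the experimental setup. A minimal Python sketch is shown below; the function name `detection_metrics` and the toy labels are hypothetical and are not taken from the paper's data.

```python
def detection_metrics(is_attack, flagged):
    """Return (DR %, FAR %) from ground truth and system output.

    is_attack[i] is True when instance i is really an attack;
    flagged[i] is True when the system labeled instance i as attack.
    """
    attacks = sum(1 for t in is_attack if t)
    normals = len(is_attack) - attacks
    detected = sum(1 for t, p in zip(is_attack, flagged) if t and p)
    false_alarms = sum(1 for t, p in zip(is_attack, flagged) if not t and p)
    # DR: share of attacks correctly identified; FAR: share of normals flagged
    return 100.0 * detected / attacks, 100.0 * false_alarms / normals


# Toy example: 4 attacks (3 detected) and 6 normal instances (1 false alarm)
truth = [True, True, True, True, False, False, False, False, False, False]
output = [True, True, True, False, True, False, False, False, False, False]
dr, far = detection_metrics(truth, output)
print(f"DR = {dr:.2f}%, FAR = {far:.2f}%")
```

Sweeping the vigilance parameter and plotting each resulting (FAR, DR) pair gives the points of an ROC curve such as the one discussed above.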

FIGURE 7. ROC curves for Fuzzy-ART versus Fuzzy-ART based on K-means algorithm

The performance of the SNC labeling algorithm is then further evaluated by comparing it with the NMF labeling algorithm using the Fuzzy-ART based on K-means clustering algorithm. Table 5 and Table 6 show the mining results for the SNC and NMF labeling algorithms, respectively, for vigilance values from 0.50 to 0.85. The highlighted cells show the best result among the various vigilance parameters.

Table 5 and Table 6 show that the vigilance parameter controls the degree of similarity between the instances in each cluster. The results show that the false alarm rate increases with the increase of the vigilance parameter because there are a greater number
