machine learning - Log likelihood to implement Naive Bayes for Text Classification -


I am implementing the Naive Bayes algorithm for text classification. I have ~1000 documents for training and 400 documents for testing. I believe I have implemented the training correctly, but I am confused about the testing part. Here is what I have done:

In my training phase:

  vocabularySize = GetUniqueTermsInCollection(); // get all the unique words in the entire collection
  spamModelArray[vocabularySize];
  nonspamModelArray[vocabularySize];
  for each training_file {
      class = GetClassLabel();   // 0 for spam, 1 for non-spam documents
      document = GetDocumentID();
      counterTotalTrainingDocs++;
      if (class == 0) { counterTotalSpamTrainingDocs++; }
      for each word in the document {
          freq = GetTermFrequency(); // how often this word appears in this document
          id = GetTermID();          // the unique ID of the word
          if (class == 0) { // spam
              spamModelArray[id] += freq;
              totalNumberofSpamWords += freq; // total number of word occurrences in spam training docs
          } else {          // non-spam
              nonspamModelArray[id] += freq;
              totalNumberofNonSpamWords += freq; // total number of word occurrences in non-spam training docs
          }
      }
  }
  for i in vocabularySize {
      spamModelArray[i] = spamModelArray[i] / totalNumberofSpamWords;
      nonspamModelArray[i] = nonspamModelArray[i] / totalNumberofNonSpamWords;
  }
  priorProb = counterTotalSpamTrainingDocs / counterTotalTrainingDocs; // prior probability of spam documents
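The training step above might be sketched in Python as follows. The input format (a list of `(class_label, term_counts)` pairs, with `term_counts` mapping term ID to frequency) is my assumption, and I've added Laplace (add-one) smoothing, which the pseudocode doesn't have, so that words unseen in one class don't later produce `log(0)`:

```python
def train_naive_bayes(training_docs, vocabulary_size):
    """training_docs: list of (class_label, term_counts) pairs, where
    class_label is 0 for spam / 1 for non-spam and term_counts maps
    term_id -> frequency in that document (hypothetical input format)."""
    spam_counts = [0] * vocabulary_size
    nonspam_counts = [0] * vocabulary_size
    total_spam_words = total_nonspam_words = 0
    total_docs = spam_docs = 0

    for class_label, term_counts in training_docs:
        total_docs += 1
        if class_label == 0:
            spam_docs += 1
        for term_id, freq in term_counts.items():
            if class_label == 0:
                spam_counts[term_id] += freq
                total_spam_words += freq
            else:
                nonspam_counts[term_id] += freq
                total_nonspam_words += freq

    # Laplace (add-one) smoothing -- an addition to the original scheme,
    # which divides raw counts and therefore yields zero probabilities
    # for words never seen in a class.
    spam_model = [(c + 1) / (total_spam_words + vocabulary_size)
                  for c in spam_counts]
    nonspam_model = [(c + 1) / (total_nonspam_words + vocabulary_size)
                     for c in nonspam_counts]
    prior_spam = spam_docs / total_docs
    return spam_model, nonspam_model, prior_spam
```

With smoothing, every entry of the two model arrays is strictly positive, so taking logarithms at test time is always safe.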

I think I understand and have implemented the training correctly, but I'm not sure I am doing the testing part properly. Here, I am trying to go through each test document and calculate logP(spam|d) and logP(non-spam|d) for each document.

  vocabularySize = GetUniqueTermsInCollection(); // all the unique words in the entire collection
  for each test_file {
      document = GetDocumentID();
      logProbabilityofSpam = 0;
      logProbabilityofNonSpam = 0;
      for each word in the document {
          freq = GetTermFrequency(); // how often this word appears in this document
          id = GetTermID();          // the unique ID of the word
          // logP(w1 w2 .. wn) = sum_j c(wj) * logP(wj)
          logProbabilityofSpam += freq * log(spamModelArray[id]);
          logProbabilityofNonSpam += freq * log(nonspamModelArray[id]);
      }
      // now decide whether this document is spam:
      // argmax_k [ logP(d | Ck) + logP(Ck) ]
      if (logProbabilityofNonSpam + log(1 - priorProb) > logProbabilityofSpam + log(priorProb)) {
          newclass = 1; // non-spam
      } else {
          newclass = 0; // spam
      }
  }
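The same test loop can be sketched in Python. It assumes per-class word probability arrays and a spam prior are already available (the names `spam_model`, `nonspam_model`, `prior_spam` and the `term_counts` dict format are my assumptions, and the word probabilities must be nonzero for every term that appears in a test document):

```python
import math

def score_document(term_counts, spam_model, nonspam_model, prior_spam):
    """Return (logP(d, spam), logP(d, non-spam)) for one document,
    where term_counts maps term_id -> frequency (hypothetical format)."""
    log_spam = math.log(prior_spam)
    log_nonspam = math.log(1.0 - prior_spam)
    for term_id, freq in term_counts.items():
        log_spam += freq * math.log(spam_model[term_id])
        log_nonspam += freq * math.log(nonspam_model[term_id])
    return log_spam, log_nonspam

def classify(term_counts, spam_model, nonspam_model, prior_spam):
    log_spam, log_nonspam = score_document(term_counts, spam_model,
                                           nonspam_model, prior_spam)
    # argmax_k [ logP(d | C_k) + logP(C_k) ]
    return 0 if log_spam > log_nonspam else 1  # 0 = spam, 1 = non-spam
```

Note that the prior is folded into each log-score up front; this is equivalent to adding log(priorProb) and log(1 - priorProb) in the final comparison, as the pseudocode does.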

My problem is: I am getting exactly 1 and 0 (I want to return the probability of each class rather than a hard spam / non-spam decision). For example, I would like to see newclass = 0.8684212 so I can apply a threshold later. But I am confused here: how can I calculate a probability for each document? Can I use the log-probabilities to calculate it?

The probability of the data described by the feature set { F1, F2, ..., Fn } belonging to class C is, according to the naive Bayes model,

  P(C | F1, ..., Fn) = P(C) * ( P(F1 | C) * P(F2 | C) * ... * P(Fn | C) ) / P(F1, ..., Fn)

You have all of these terms (in logarithmic form) except the factor 1 / P(F1, ..., Fn), because that term is not used in the naive Bayes classifier you have implemented (strictly speaking, a maximum a posteriori classifier).

You would have to collect the feature frequencies as well, and compute from them

  P(F1, ..., Fn) = P(F1) * ... * P(Fn)
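Alternatively (this is my addition, not part of the answer above), for a two-class problem the denominator can be obtained by normalizing over the classes, since P(F1, ..., Fn) = P(F1, ..., Fn | spam)P(spam) + P(F1, ..., Fn | non-spam)P(non-spam). Given the two per-document log-joint scores that the test loop already computes, a minimal sketch (the function name `posterior_spam` is mine) that does the normalization in log space to avoid underflow:

```python
import math

def posterior_spam(log_score_spam, log_score_nonspam):
    """Convert logP(d, spam) and logP(d, non-spam) into P(spam | d).
    Subtracting the max first (log-sum-exp trick) keeps exp() from
    underflowing when both log-scores are large negative numbers."""
    m = max(log_score_spam, log_score_nonspam)
    p_spam = math.exp(log_score_spam - m)
    p_nonspam = math.exp(log_score_nonspam - m)
    return p_spam / (p_spam + p_nonspam)
```

This yields exactly the kind of value the question asks for, e.g. `posterior_spam(-100.0, -102.0)` gives a probability near 0.88 that can be compared against a threshold.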
