MASTER’S TALK

Title:  An Overview of the Vector Space Model for Text-based Information Retrieval

Abstract: The vector space model (VSM) for text-based information retrieval converts each document in a collection to a real-valued vector. Under this model, we arrive at a weighted term-document matrix that records word-based information from the set of documents. We then factor the term-document matrix with the singular value decomposition, which yields key information about the matrix, such as its rank and an orthonormal basis for its column space. We discuss VSM-based interpretations of these standard matrix components and give a glimpse of VSM updating techniques. Finally, we review examples and look at results generated from original code showing the VSM in action.
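
As a rough illustration of the first step described in the abstract, the sketch below builds a weighted term-document matrix and ranks documents against a query by cosine similarity. It is not the speaker's code: the whitespace tokenizer, the TF-IDF weighting, and the function names `build_term_document_matrix`, `cosine`, and `rank_documents` are all assumptions made for this example, and the SVD step is left out to keep the sketch dependency-free.

```python
import math
from collections import Counter


def build_term_document_matrix(docs):
    """Return (terms, matrix); matrix[i][j] is the TF-IDF weight of term i in document j.

    Assumption: whitespace tokenization and weight = count * log(N / df).
    """
    tokenized = [doc.lower().split() for doc in docs]
    terms = sorted({tok for toks in tokenized for tok in toks})
    counts = [Counter(toks) for toks in tokenized]
    n_docs = len(docs)
    matrix = []
    for term in terms:
        df = sum(1 for c in counts if term in c)  # number of documents containing the term
        idf = math.log(n_docs / df)
        matrix.append([c[term] * idf for c in counts])
    return terms, matrix


def cosine(u, v):
    """Cosine of the angle between two vectors; 0.0 if either vector is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_documents(query, docs):
    """Return (document indices sorted by decreasing score, list of scores)."""
    terms, matrix = build_term_document_matrix(docs)
    q_counts = Counter(query.lower().split())
    q_vec = [q_counts[t] for t in terms]   # raw term counts for the query
    columns = list(zip(*matrix))           # one column per document
    scores = [cosine(q_vec, col) for col in columns]
    order = sorted(range(len(docs)), key=lambda j: scores[j], reverse=True)
    return order, scores


docs = [
    "matrix factorization and singular values",
    "web search ranking with links",
    "singular value decomposition of a matrix",
]
order, scores = rank_documents("singular matrix", docs)
```

In a fuller treatment, the columns of this matrix would be replaced by their projections onto the leading singular vectors before the cosine scores are computed.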

Location:
745 Patterson Office Tower

The Class Number and Binary Quadratic Forms

Let F be a positive definite binary quadratic form. One may classify such forms with fixed discriminant Δ up to equivalence in GL₂(ℤ), or refine this further to proper equivalence in SL₂(ℤ). These classical results, developed by Lagrange and Gauss, lead to well-known statements about the class number h(Δ). In his paper “Über Bilineare Formen Mit Vier Variabeln”, Kronecker introduces the finer notion of complete equivalence, which is used to study the class number of positive definite forms with integer coefficients of the type ax² + 2bxy + cy². In this talk we will discuss Kronecker’s development of the class number via complete equivalence and compare it with the classical results of Lagrange and Gauss.
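
For orientation, the classical side of this comparison can be made concrete with a small sketch that counts proper equivalence classes by enumerating reduced forms. This illustrates only the Lagrange and Gauss reduction theory, not Kronecker's complete equivalence. It also assumes the convention ax² + bxy + cy² with discriminant b² - 4ac, which differs from the even-middle-coefficient forms in the abstract; the function names are hypothetical.

```python
import math


def reduced_forms(D):
    """List primitive reduced forms (a, b, c) with b*b - 4*a*c == D, for D < 0.

    Assumed convention: the form is a*x^2 + b*x*y + c*y^2, and "reduced" means
    -a < b <= a <= c, with b >= 0 whenever a == c.
    """
    if D >= 0 or D % 4 not in (0, 1):
        raise ValueError("D must be a negative integer congruent to 0 or 1 mod 4")
    forms = []
    a = 1
    # For a reduced form, |b| <= a <= c forces 3*a*a <= |D|.
    while 3 * a * a <= -D:
        for b in range(-a + 1, a + 1):
            if (b * b - D) % (4 * a) != 0:
                continue
            c = (b * b - D) // (4 * a)
            if c < a:
                continue
            if a == c and b < 0:
                continue
            if math.gcd(a, math.gcd(b, c)) != 1:
                continue  # keep primitive forms only
            forms.append((a, b, c))
        a += 1
    return forms


def class_number(D):
    """Number of proper equivalence classes of primitive forms of discriminant D < 0."""
    return len(reduced_forms(D))


print(reduced_forms(-23))  # [(1, 1, 6), (2, -1, 3), (2, 1, 3)]
print(class_number(-163))  # 1
```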

Location:
207 Whitehall Classroom Building

Deeper Inside PageRank

Google assigns each Web page a score, called its PageRank, based on the pages that link to it, and uses this score to determine the order in which search results are presented. The PageRank of every Web page is obtained by computing the dominant eigenvector of a (very large) probability matrix. Although this eigenvector can be computed with the Power Method, the problem can also be reformulated as the solution of a linear system. After introducing the "random surfer" model and defining the PageRank vector, I will develop an algorithm to solve the PageRank problem as a linear system. Finally, we will briefly discuss some possible modifications or alternative strategies and their numerical implications.
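
The two formulations mentioned above can be compared on a toy graph. The sketch below is not the speaker's algorithm; it assumes a damping factor α = 0.85, a uniform teleportation vector, and dangling pages that jump uniformly, and all function names are hypothetical. With a row-stochastic matrix S, the PageRank vector π satisfies π = α Sᵀπ + (1 - α)/n · 1, which rearranges to the linear system (I - α Sᵀ) π = (1 - α)/n · 1.

```python
def transition_matrix(links, n):
    """Row-stochastic matrix S from {page: [linked pages]}; dangling pages jump uniformly."""
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        out = links.get(i, [])
        if out:
            for j in out:
                S[i][j] += 1.0 / len(out)
        else:
            S[i] = [1.0 / n] * n
    return S


def pagerank_power(S, alpha=0.85, tol=1e-12, max_iter=1000):
    """Power method: iterate pi <- alpha * S^T pi + (1 - alpha)/n until the L1 change is below tol."""
    n = len(S)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        new = [(1 - alpha) / n + alpha * sum(pi[i] * S[i][j] for i in range(n))
               for j in range(n)]
        if sum(abs(x - y) for x, y in zip(new, pi)) < tol:
            return new
        pi = new
    return pi


def solve(A, b):
    """Dense Gaussian elimination with partial pivoting (adequate for tiny examples only)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (M[k][n] - sum(M[k][c] * x[c] for c in range(k + 1, n))) / M[k][k]
    return x


def pagerank_linear(S, alpha=0.85):
    """Solve (I - alpha * S^T) pi = (1 - alpha)/n * 1 directly."""
    n = len(S)
    A = [[(1.0 if i == j else 0.0) - alpha * S[j][i] for j in range(n)]
         for i in range(n)]
    b = [(1 - alpha) / n] * n
    return solve(A, b)


links = {0: [1, 2], 1: [2], 2: [0], 3: [2]}  # toy four-page web
S = transition_matrix(links, 4)
pi_power = pagerank_power(S)
pi_linear = pagerank_linear(S)
```

On this toy graph the two vectors agree to within the iteration tolerance. At Web scale the matrix is sparse and far too large for dense elimination, so an iterative solver would take the place of `solve`.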

Location:
745 Patterson Office Tower