Prof. Bracha Shapira
Associate Professor
Department :
Department of Software and Information Systems Engineering
Specialization in Computational Learning and BigData
Room :
Phone :
972-74-7795098
Email :
bshapira@bgu.ac.il
Office Hours :
Education
B.A. 1983 - 1986
Bar-Ilan University - Department of Computer Science
M.Sc. 1990 - 1994
Hebrew University - Department of Computer Science
Names of Advisors - Prof. J. Rosenshein, Dr. Uri Hanani
Title of thesis - Intelligent Diagnosis of Computer Hardware Using Frames
Ph.D. 1995 - 1999
Ben-Gurion University - Department of Information Systems Engineering
Name of advisor – Prof. Peretz Shoval
Title of thesis - Advanced Model of Information Filtering Based on Extended User Profiles and Stereotypes Using Cluster-Analysis
Research Interests
My main research interests are in the fields of Information Retrieval (IR) and Information Filtering (IF), and their integration with machine learning. I am interested in the following topics:
User Profiling - Personalization
I work with user profiles for individual users or groups of users (stereotypes) as a tool to improve the effectiveness and efficiency of systems and, accordingly, to increase users' satisfaction. My studies related to profiling include:
Development of algorithms to improve the personalization of news services (Shapira et al., IUI 2007). These algorithms are developed as part of the ePaper project that I manage at the Deutsche-Telekom Laboratories at BGU. The personalization algorithms include a time-based collaborative filtering algorithm that considers the time span of news items when predicting their relevancy to users. The study looks at different domains, assuming that the relevancy of items in different domains decays at different rates. We believe that time is an important factor in news personalization, where how up to date the items are is of high importance. Another personalization algorithm integrates a news ontology to improve a content-based personalization algorithm (Maidel et al., RecSys 2008). Each user profile is represented by ontology concepts, and the news items are automatically classified into the same ontology concepts using a multi-label classifier based on an N-gram language model. The relevancy of an item to a user is predicted using a function that measures the similarity (distance) between an item and a user profile by considering not only the co-occurring concepts but also the occurrences of neighboring (parent and child) concepts, according to the hierarchical ontology. During the study we ran many simulations and user experiments to calibrate the system and to combine both algorithms for the best results. These studies are performed with Prof. Peretz Shoval from BGU/ISE and the graduate students Veronica Maidel and Nimrod Steinbock.
Another issue being studied is the use of the Min-Hash and LSH dimensionality reduction methods to improve the scalability of user-based collaborative personalization. This study is conducted jointly with Prof. Peretz Shoval and Igor Dvorkin (a graduate student). We look at different parameters and at the inclusion of functions to optimize performance. Scalability seems to be one of the major drawbacks of user-based collaborative personalization algorithms.
Integration of social networks into recommender systems – this research, which is performed jointly with Dr. Ofer Arazy from The University of British Columbia in Canada and the graduate student Ibrahim Elsanae, looks at methods for integrating different social relations into the recommendation process in order to overcome the sparsity problem of the standard collaborative methods and to improve recommendation results (Arazy et al., WITS 2007).
Two-phase adaptation of group profiles – Development of a two-phase model for the adaptation of group profiles (stereotypes) for information filtering systems. The new model includes an on-going adaptation and a critical-point adaptation, and is based on cluster-analysis methods for the representation and adaptation of the group profiles. The model was implemented and evaluated with TREC data. This project, which was recently completed, was conducted jointly with Prof. Peretz Shoval from BGU/ISE and Mrs. Diana Lisinger, a graduate student.
Content-based intrusion detection – Examination of the feasibility and effect of using profiling techniques to detect intruders on the Web, based on the content of their information access rather than on their actions. This research is part of the information warfare effort, as it is intended to be used to detect terrorists on the Web. On this project I am working with Dr. Yuval Elovici and Dr. Mark Last from BGU/ISE, and Prof. Avrahan Kandell from the University of South Florida; also involved is Omer Zaafrany, a graduate student (Elovici et al., Journal of Information Warfare 2004).
Profiling as a privacy preservation tool – Development of a model for privacy preservation that will enable users to access information over the Web without exposing their interests. The model is aimed at preventing eavesdroppers from using identifiable users' tracks to construct an accurate user profile. It is assumed that the user may want or need to send his or her identification over the net but still wishes to keep his or her information needs and profile private. Our suggested model is designed to conceal the user profile from an eavesdropper on the path between the user and the surfed site on the Web by generating fake transactions that blur the user's actual interests (Shapira et al., JASIST 2005). This project ended recently. I worked on it jointly with Dr. Yuval Elovici from BGU/ISE and Adlay Meshiach, a graduate student.
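To illustrate the Min-Hash and LSH idea mentioned above for scaling user-based collaborative personalization, here is a minimal sketch. It is a generic textbook-style illustration, not the algorithm or parameters used in the study: the function names, the choice of `blake2b` as the hash family, the signature length and the band count are all assumptions made for the example. Each user is reduced to a short signature, and users are bucketed so that only users sharing a bucket need to be compared as candidate neighbors.

```python
import hashlib
from typing import Dict, List, Set, Tuple


def minhash_signature(items: Set[str], num_hashes: int = 64) -> List[int]:
    """Return a Min-Hash signature for a non-empty set of item ids.

    Each seed stands in for a different hash function (an assumption of
    this sketch); the signature keeps the minimum hash value per seed.
    """
    signature = []
    for seed in range(num_hashes):
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{item}".encode(), digest_size=8).digest(),
                "big",
            )
            for item in items
        ))
    return signature


def estimated_jaccard(sig_a: List[int], sig_b: List[int]) -> float:
    """Estimate Jaccard similarity as the fraction of matching positions."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def lsh_buckets(signatures: Dict[str, List[int]],
                bands: int) -> Dict[Tuple[int, Tuple[int, ...]], Set[str]]:
    """Group users whose signatures agree on every row of at least one band."""
    buckets: Dict[Tuple[int, Tuple[int, ...]], Set[str]] = {}
    for user, sig in signatures.items():
        rows = len(sig) // bands
        for band in range(bands):
            key = (band, tuple(sig[band * rows:(band + 1) * rows]))
            buckets.setdefault(key, set()).add(user)
    return buckets
```

In such a scheme, the number of hash functions and bands trades accuracy against the number of candidate pairs, which is the kind of parameter choice a scalability study would need to examine.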
IR Theory
Research related to this topic includes:
IR-Modeling – development of an IR model based on the Information Structure (IS) model taken from the decision science theories. The IS modeling enables a comparison between different IR systems when standard IR evaluation measures fail to indicate the preferred system (Elovici et al., IR journal 2003). On this project I am working with Prof. Paul Kantor from Rutgers University and Dr. Yuval Elovici from BGU/ISE.
Combination of IR systems to improve precision – A model for combining IR systems is being developed, based on the Information Structure model (Binun et al., forthcoming). We are working on a formal and empirical proof of the effect of combining systems on retrieval results. We assume that combined IR systems obtain better results than separately operated systems. On this project I am working with Prof. Paul Kantor from Rutgers University, Dr. Yuval Elovici from BGU/ISE and Alex Binun from BGU/ISE.
Evaluation of IR and IF systems – Definition of an evaluation framework for IR systems. The framework includes objective and subjective evaluations and a comparison between these two aspects. The framework consists of simulation runs and user studies. It deals with the "real value" of the information to the user, compared to the "perceived value" that is usually measured. I am also studying the assessment of experts' judgments as a basis for systems' evaluation compared to users' judgments.
Search Engines
The research related to search engines aims at improving users' satisfaction with search results. My research projects related to this topic include:
Social-based search engine – in this study we try to improve search engine results by personalizing them with information from the user's social network. The search engine results are ranked according to the rankings of the user's close "friends", as derived from the social context of the user. This research is performed with Boaz Tzabar, a graduate student.
Query expansion – development of a semi-automatic query expansion model in order to improve retrieval results. The model was tested in user studies with a tool that was developed for this purpose, and with TREC data (Nemeth et al., SIGIR 2004). The measures used to evaluate the retrieval results are traditional measures such as precision and recall, together with novel user-based measures that I am developing. On this project I worked with Yael Nemeth, a graduate student.
Collaborative systems – We are dealing with the well-known "free ride" problem in the collaborative systems domain, where users tend to use knowledge inferred from other users' knowledge but are not willing to contribute their own. We are trying to tackle the problem by developing an economic model for collaborative systems, in which users "buy" and "sell" knowledge. We intend to examine the feasibility of such a model and its effect on users' behavior in collaborative environments. On this project I am working with Prof. Paul Kantor from Rutgers University and Dr. Yuval Elovici from BGU/ISE; also involved is Dan Melamed, a graduate student (Melamed et al., IEEE Intelligent Systems 2007).
Question Answering (QA) search engines – We are working on the recently introduced type of search engine designed to return exact answers to users' questions rather than documents relevant to their queries. We are developing an algorithm intended to improve the indexing process of a QA search engine by adding information to the index in order to improve performance (search time). On this project I worked with Yelena Tenenbaum while she was a graduate student.
IR and Machine Learning
Privacy preserving data mining (PPDM) using K-anonymity – this study is part of joint research with TAU, Bar-Ilan, and Haifa University, and is funded by the Ministry of Science, Culture and Sports. This study is conducted with Dr. Yuval Elovici and Dr. Lior Rokach from BGU/ISE and the graduate student Slava Kisilevitz. During the study we developed several algorithms based on the K-anonymity technique to prepare (anonymize) DBs for data mining while ensuring that linking the DBs won't reveal the identity of the subjects recorded in them. We are running simulations to compare our method to other PPDM methods, and the results are encouraging (Kisilevitz et al., ISIPS 2008; Kisilevitz et al., forthcoming).
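For readers unfamiliar with the K-anonymity property itself, the following is a minimal sketch of how it can be checked. It illustrates only the general definition (every combination of quasi-identifier values must be shared by at least k records), not the algorithms developed in this study; the field names, the toy records and the age-generalization helper are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence

Record = Dict[str, str]


def is_k_anonymous(records: List[Record],
                   quasi_identifiers: Sequence[str],
                   k: int) -> bool:
    """True if every quasi-identifier combination occurs in at least k records."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())


def generalize_age(record: Record, width: int = 10) -> Record:
    """Replace an exact age with a range of the given width (e.g. 34 -> "30-39")."""
    low = (int(record["age"]) // width) * width
    return {**record, "age": f"{low}-{low + width - 1}"}


# Hypothetical toy table: "age" and "zip" are treated as quasi-identifiers,
# "diagnosis" as the sensitive attribute that is left untouched.
records = [
    {"age": "31", "zip": "84105", "diagnosis": "flu"},
    {"age": "34", "zip": "84105", "diagnosis": "asthma"},
    {"age": "47", "zip": "84105", "diagnosis": "flu"},
    {"age": "42", "zip": "84105", "diagnosis": "diabetes"},
]
generalized = [generalize_age(r) for r in records]
```

In this toy table, the exact ages make each record unique, while generalizing ages to decades places every record in a group of two, so the generalized table satisfies the property for k = 2 but not for k = 3.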
Multi-label classification – in this study, which stems from needs identified in the ePaper project, we develop a new method for multi-label classification (i.e., automatic classification of items into more than one class) that seems to be feasible, unlike existing methods that need to prepare a model for every combination of single classes. The new method learns latent relations between the single classes and prepares models only for combined classes with identified relations, so the number of models is dramatically decreased (Tenenbaum et al., ITA 2008). This research is conducted with Dr. Lior Rokach and the Ph.D. student Yelena Tenenbaum.
Graph-based information leakage prevention. In this study we deal with the problem of unintended leakage of confidential information from an organization, based on automatic content analysis of outgoing messages and their classification as "confidential" or "non-confidential". We develop a new method for identifying documents as "confidential" even if most of the document contains non-confidential information. Unlike standard classifiers, which would consider such "mixed" documents non-confidential, our method is designed to overcome this problem by tagging the training data using a special context graph that represents the context of the keywords related to the non-confidential data. The model is based on the context of the non-confidential content rather than on the appearance of non-confidential terms, and the classification is therefore more accurate. This research is conducted jointly with Dr. Yuval Elovici and Gilad Katz (an M.Sc. student).
Data protection for XML-based content using positive examples. This study is funded by the Ministry of Defense and is conducted with Dr. Yuval Elovici, Dr. Lior Rokach, and Eitan Menahen (a Ph.D. student). The research aims at blocking messages in a network by learning and building models for "good" and "bad" messages from positive examples only (as only positive examples of "good" messages exist). We develop a new method that uses clustering for positive learning. The research also deals with defining a firewall for XML-based content that considers the special features of XML content (i.e., its structure). We implement the method and run simulations to test and calibrate the algorithm.
Intention prediction using HMM. This research is related to the SmartMobile project that I manage at the Deutsche-Telekom Laboratories, in which we need to predict the intention of the user (her next step) from a sequence of her session data. For this we try to use a Hidden Markov Model and learn the probabilities with which different types of users perform different sequences of operations, in order to be able to track the user's current session and predict her next step. This research is performed with Prof. Lior Rokach and Liat Antwarg (an M.Sc. student).
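As a rough illustration of how a Hidden Markov Model can be used to predict a user's next step from her session so far, here is a minimal sketch based on the standard forward algorithm. It is not the model built in the SmartMobile project: the hidden states, the action names and all probabilities below are hypothetical toy values, and the sketch assumes the observed sequence has non-zero probability under the model.

```python
from typing import Dict, List

Dist = Dict[str, float]


def predict_next_action(observations: List[str],
                        states: List[str],
                        start_p: Dist,
                        trans_p: Dict[str, Dist],
                        emit_p: Dict[str, Dist]) -> Dist:
    """Return a probability distribution over the user's next action."""
    # Forward pass: belief[s] = P(current hidden state = s | actions so far).
    belief = {s: start_p[s] * emit_p[s][observations[0]] for s in states}
    total = sum(belief.values())
    belief = {s: p / total for s, p in belief.items()}
    for obs in observations[1:]:
        belief = {
            s: emit_p[s][obs] * sum(belief[prev] * trans_p[prev][s] for prev in states)
            for s in states
        }
        # Normalize at every step to avoid numerical underflow on long sessions.
        total = sum(belief.values())
        belief = {s: p / total for s, p in belief.items()}

    # Advance the belief one step, then marginalize over the hidden state.
    next_state = {
        s: sum(belief[prev] * trans_p[prev][s] for prev in states) for s in states
    }
    actions = emit_p[states[0]].keys()
    return {a: sum(next_state[s] * emit_p[s][a] for s in states) for a in actions}


# Hypothetical toy model: two user "intentions" and three observable actions.
states = ["browsing", "buying"]
start_p = {"browsing": 0.8, "buying": 0.2}
trans_p = {
    "browsing": {"browsing": 0.7, "buying": 0.3},
    "buying": {"browsing": 0.2, "buying": 0.8},
}
emit_p = {
    "browsing": {"search": 0.6, "view": 0.35, "checkout": 0.05},
    "buying": {"search": 0.1, "view": 0.3, "checkout": 0.6},
}

prediction = predict_next_action(["search", "view"], states, start_p, trans_p, emit_p)
most_likely = max(prediction, key=prediction.get)
```

In practice the transition and emission probabilities would be learned from recorded sessions rather than set by hand, possibly with a separate model per user type.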