LINGUIST List 21.1439

Wed Mar 24 2010

Calls: Computational Ling, Text/Corpus Ling/Switzerland

Editor for this issue: Kate Wu <kate@linguistlist.org>

        1.    Evgeniy Gabrilovich, ACM SIGIR Conference

Message 1: ACM SIGIR Conference
Date: 23-Mar-2010
From: Evgeniy Gabrilovich <gabr@yahoo-inc.com>
Subject: ACM SIGIR Conference

Full Title: ACM SIGIR Conference
Short Title: SIGIR

Date: 18-Jul-2010 - 23-Jul-2010
Location: Geneva, Switzerland
Contact Person: SIGIR 2010 Announce
Meeting Email: announce@sigir2010.org
Web Site: http://www.sigir2010.org

Linguistic Field(s): Computational Linguistics; Text/Corpus Linguistics

Call Deadline: 30-May-2010

Meeting Description:

Feature Generation and Selection for Information Retrieval
Workshop at the 33rd Annual ACM SIGIR Conference (SIGIR 2010)
July 23, 2010
Geneva, Switzerland

Submissions Due May 30, 2010

SIGIR is the major international forum for the presentation of new research
results and for the demonstration of new systems and techniques in the broad
field of information retrieval (IR).

Call for Papers

We solicit submissions for the Workshop on Feature Generation and Selection for
Information Retrieval, to be held on July 23, 2010, in Geneva, Switzerland, in
conjunction with the 33rd Annual International ACM SIGIR Conference on Research
and Development in Information Retrieval (SIGIR 2010). The workshop will bring
together researchers and practitioners from academia and industry to discuss the
latest developments in various aspects of feature generation and selection for
textual information retrieval.

Modern information retrieval systems facilitate information access at
unprecedented scale and level of sophistication. However, in many cases the
underlying representation of text remains quite simple, often limited to using a
weighted bag of words. Over the years, several approaches to automatic feature
generation have been proposed (such as Latent Semantic Indexing, Explicit
Semantic Analysis, Hashing, and Latent Dirichlet Allocation), yet their
application in large scale systems still remains the exception rather than the
rule. On the other hand, numerous studies in NLP and IR resort to manually
crafting features, which is a laborious and expensive process. Such studies
often focus on one specific problem, and consequently many features they define
are task- or domain-dependent. As a result, little knowledge transfer is
possible to other problem domains. This limits our understanding of how to
reliably construct informative features for new tasks.
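
As a point of reference for the "weighted bag of words" representation mentioned
above, here is a minimal sketch. It assumes whitespace tokenization and a plain
TF-IDF weighting; the function name and the toy documents are illustrative only
and are not taken from the workshop materials.

```python
import math
from collections import Counter


def tfidf_bag_of_words(docs):
    """Return one {term: weight} dict per document (illustrative helper)."""
    # Assumption: lowercase + whitespace split stands in for a real tokenizer.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term counts within this document
        vectors.append({
            term: count * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors


# Toy corpus (hypothetical): terms shared by every document get weight 0.
docs = ["feature selection for retrieval", "feature generation for text"]
vectors = tfidf_bag_of_words(docs)
```

Each document is reduced to independent term weights, so word order and any
relation between terms are lost; this is the limitation that generated features
are meant to address.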

An area of machine learning concerned with feature generation (or constructive
induction) studies methods that endow computers with the ability to modify or
enhance the representation language. Feature generation techniques search for
new features that describe the target concepts better than the attributes
supplied with the training instances. It is worthwhile to note that traditional
machine learning data sets, such as those available from the UCI data
repository, are only available as feature vectors, while their feature set is
essentially fixed. In fact, feature generation for specific UCI benchmark
datasets is frowned upon. On the other hand, textual data is almost always
available in its raw format (in some cases as structured data with sufficient
side information). Given the importance of text as a data format, it is well
worth designing text-specific feature generation algorithms. Complementary
to feature generation, the issue of feature selection arises. It aims to retain
only the most informative features, e.g., in order to reduce noise and to avoid
overfitting, and is essential when numerous features are automatically
constructed. Many such features are correlated, redundant, or uninformative,
and a principled selection process lets us prune them.
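
One simple form of selection is to score each feature by a dependency measure
against the class label and keep the best-scoring ones. The sketch below
assumes mutual information as that measure and binary presence/absence
features; the feature names, labels, and helper functions are hypothetical, and
this is only one of many possible selection criteria.

```python
import math
from collections import Counter


def mutual_information(feature_values, labels):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(labels)
    joint = Counter(zip(feature_values, labels))
    count_x = Counter(feature_values)
    count_y = Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written in terms of counts.
        mi += (count / n) * math.log(count * n / (count_x[x] * count_y[y]))
    return mi


def select_top_k(features, labels, k):
    """Keep the k feature names with the highest mutual information."""
    scores = {
        name: mutual_information(values, labels)
        for name, values in features.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy data (hypothetical): one binary column per feature, four documents.
features = {
    "contains_goal": [1, 1, 0, 0],  # matches the label exactly
    "contains_the": [1, 1, 1, 1],   # constant, hence uninformative
    "contains_win": [1, 0, 1, 0],   # statistically independent of the label
}
labels = ["sports", "sports", "politics", "politics"]
selected = select_top_k(features, labels, k=1)
```

Scoring features one at a time ignores redundancy between them: two perfectly
correlated features would both rank highly, which is why joint criteria are
also of interest.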

We believe that much can be done in the quest for automatic feature generation
for text processing, for example, using large-scale knowledge bases as well as
the sheer amount of textual data easily accessible today. We further believe the
time is ripe to bring together researchers from many related areas (including
information retrieval, machine learning, statistics, and natural language
processing) to address these issues and seek cross-pollination among the
different fields.

Papers from a rich set of empirical, experimental, and theoretical perspectives
are invited. Topics of interest for the workshop include but are not limited to:
- Identifying cases when new features should be constructed
- Knowledge-based methods (including identification of appropriate knowledge
- Efficiently utilizing human expertise (akin to active learning, assisted
feature construction)
- (Bayesian) nonparametric distribution models for text (e.g., LDA, hierarchical
Pitman-Yor model)
- Compression and autoencoder algorithms (e.g., information bottleneck, deep
belief networks)
- Feature selection (L1 programming, message passing, dependency measures,
- Cross-language methods for feature generation and selection
- New types of features, e.g., spatial features to support geographical IR
- Applications of feature generation in IR (e.g., constructing new features for
indexing, ranking)

The workshop will include invited talks as well as presentations of accepted
research contributions. The schedule will provide time for both organized and
open discussion. Registration will be open to all SIGIR 2010 attendees.

Submission Instructions
Submissions should report new (unpublished) research results or ongoing
research. Submissions can be up to 8 pages long for full papers, and up to 4
pages long for short papers. Papers should be formatted in double-column ACM SIG
proceedings format (http://www.acm.org/sigs/publications/proceedings-templates;
for LaTeX, use "Option 2"). Papers must be in English and must be submitted as
PDF files.

Papers should be submitted electronically using the EasyChair system at
http://www.easychair.org/conferences/?conf=fgsir10 no later than 23:59 Pacific
Standard Time, Sunday, May 30, 2010.
At least one author of each accepted paper will be expected to attend and
present their findings at the workshop.

Important Dates
Submission Deadline: May 30, 2010
Acceptance notification: June 25, 2010
Camera-ready submission: July 5, 2010
Workshop date: July 23, 2010

Invited Speakers
The workshop will feature a keynote talk by Dr. Kenneth Church, Chief Scientist
of the Human Language Technology Center of Excellence at the Johns Hopkins
University. Additional invited speakers are to be announced.

Organizing Committee
- Evgeniy Gabrilovich, Yahoo! Research, USA
- Alex Smola, Australian National University and Yahoo! Research, USA
- Naftali Tishby, Hebrew University of Jerusalem, Israel

Program Committee
- Francis Bach, INRIA, France
- Misha Bilenko, Microsoft Research, USA
- David Blei, Princeton, USA
- Karsten Borgwardt, Max Planck Institute, Germany
- Wray Buntine, NICTA, Australia
- Raman Chandrasekar, Microsoft Research, USA
- Kevyn Collins-Thompson, Microsoft Research, USA
- Silviu Cucerzan, Microsoft Research, USA
- Brian Davison, Lehigh University, USA
- Gideon Dror, Academic College of Tel-Aviv-Yaffo, Israel
- Wai Lam, CUHK, Hong Kong SAR, China
- Tie-Yan Liu, Microsoft Research Asia, China
- Shaul Markovitch, Technion, Israel
- Donald Metzler, Yahoo! Research, USA
- Daichi Mochihashi, NTT, Japan
- Filip Radlinski, Microsoft Research, United Kingdom
- Rajat Raina, Facebook, USA
- Pradeep Ravikumar, University of Texas at Austin, USA
- Mehran Sahami, Stanford, USA
- Le Song, CMU, USA
- Krysta Svore, Microsoft Research, USA
- Volker Tresp, Siemens, Germany
- Kai Yu, NEC, USA
- ChengXiang Zhai, UIUC, USA
- Jerry Zhu, University of Wisconsin, USA