LINGUIST List 16.1366
Fri Apr 29 2005
Sum: WebCorpus Counts
Editor for this issue: Jessica Boynton
Message 1: WebCorpus Counts
From: Jerry Kurjian <jkurjian@mail.sdsu.edu>
Subject: WebCorpus Counts
Regarding query: http://www.linguistlist.org/issues/16/16-1291.html#1
Below I summarize the comments of Andrew Kehoe and Antoinette Renouf
(5/27/2005), two of the creators of WebCorp, who kindly replied to my query
concerning WebCorp in thread 16.1291 and on the Corpora list (corpora AT uib.no):
Within a single webpage, WebCorp gathers every KWIC (keyword-in-context)
line the page contains, unless the "one hit per page" option is checked.
Across webpages, WebCorp gathers hits from at most 200 webpages. Getting
results from fewer than 200 pages may mean that you have chosen to filter
some features out, that some of the 200 webpages were inaccessible to
WebCorp or had changed, or that fewer than 200 pages contain the search term.
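The page and hit limits described above can be pictured with a small sketch. This is not WebCorp's code: the function name `kwic`, its parameters, and the 30-character context window are hypothetical, and pages are assumed to be already-fetched strings, with `None` standing in for a page that could not be retrieved.

```python
import re
from dataclasses import dataclass

MAX_PAGES = 200  # page cap reported in the summary above


@dataclass
class Concordance:
    lines: list       # formatted KWIC lines
    pages_used: int   # pages that actually contributed at least one hit


def kwic(pages, term, one_hit_per_page=False, width=30, max_pages=MAX_PAGES):
    """Hypothetical KWIC collector illustrating the per-page and per-search limits.

    pages: iterable of page texts; None models an inaccessible page.
    """
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    lines = []
    pages_used = 0
    # Only the first max_pages candidate pages are ever examined.
    for text in list(pages)[:max_pages]:
        if text is None:
            continue  # inaccessible page: contributes nothing
        hits = list(pattern.finditer(text))
        if not hits:
            continue  # page changed and no longer contains the term
        pages_used += 1
        if one_hit_per_page:
            hits = hits[:1]  # keep only the first occurrence on this page
        for m in hits:
            left = text[max(0, m.start() - width):m.start()]
            right = text[m.end():m.end() + width]
            lines.append(f"{left:>{width}} [{m.group(0)}] {right}")
    return Concordance(lines, pages_used)


pages = ["The cat sat. A cat ran.", None, "No match here.", "cat"]
print(kwic(pages, "cat").pages_used)                          # 2
print(len(kwic(pages, "cat").lines))                          # 3
print(len(kwic(pages, "cat", one_hit_per_page=True).lines))   # 2
```

In this toy run, only two of the four candidate pages yield hits (one is inaccessible, one lacks the term), which mirrors the reasons given above for seeing results from fewer pages than the cap allows.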
Finally, the authors say they are continuing to upgrade WebCorp, and in an
upcoming version plan to add frequency counts, type/token ratios,
collocation profiles, and "other statistics."
Linguistic Field(s): Text/Corpus Linguistics