
Summary Details


Query:   Spanish Frequency Counts
Author:  Harriet Bowden
Linguistic Field(s):   Applied Linguistics
Computational Linguistics
Language Documentation
Text/Corpus Linguistics

Summary:   REGARDING QUERY HTTP://WWW.LINGUISTLIST.ORG/ISSUES/15/15-3168.HTML#1

WE POSTED A QUERY IN OUR SEARCH FOR A FREQUENCY DICTIONARY/LIST FOR SPANISH
WITH THE FOLLOWING PREFERRED CRITERIA: (A) ON-LINE OR SEARCHABLE DATABASE,
(B) COUNTS THAT DISTINGUISH DIFFERENT PARTS OF SPEECH, (C) LEMMA (ROOT)
COUNTS AS WELL AS SURFACE FREQUENCY COUNTS AND (D) BASED ON A LARGE CORPUS
WITH A VARIETY OF INTERNATIONAL SOURCES, NOT JUST FROM ONE REGION.

HARRIET WOOD BOWDEN
MICHAEL ULLMAN
GEORGETOWN UNIVERSITY

****
FIRST, WE’D LIKE TO THANK EVERYONE WHO ANSWERED OUR QUERY, POSTED ON
CHILDES AND LINGUISTLIST (PLEASE FORGIVE ANY OMISSIONS):
DONNA JACKSON-MALDONADO
ADELINA ESTÉVEZ
PADRAIC MONAGHAN
MARIA R. BREA-SPAHN, M.S., CCC-SLP
MIQUEL SERRA I RAVENTOS
ANA CODESIDO
DAVID EDDINGTON
CAROLINA IRIBARREN
ADAM ALBRIGHT
SARAH CALLAHAN

SECOND, WE PROVIDE THE LINK TO THE SUMMARY OF RESPONSES TO A SIMILAR
QUESTION, ASKED IN DECEMBER 2003:
HTTP://LISTSERV.LINGUISTLIST.ORG/CGI-BIN/WA?A2=IND0312C&L=LINGUIST&P=R13243

FINALLY, HERE IS A SUMMARY OF THE RESPONSES WE RECEIVED.

1. THE LEXESP CORPUS* FROM THE UNIVERSITY OF BARCELONA MEETS SOME OF
THE LISTED REQUIREMENTS. IT CONTAINS APPROXIMATELY 120,000 WORDS. SYLLABLE
FREQUENCY IS AVAILABLE IN THE SOFTWARE PROGRAM (CD-ROM), WHICH MUST BE
ORDERED DIRECTLY FROM BARCELONA. FOR LEXESP YOU CAN GO
TO: HTTP://CLIC.FIL.UB.ES/ (ONCE THERE, GO TO 'DEMOS' > CORPUS TEXTUALES >
CONSULTA CORPUS). THE COMPLETE REFERENCE: SEBASTIÁN, N., MARTÍ, M. A.,
CARREIRAS, M., & CUETOS, F. (2000). LEXESP: UNA BASE DE DATOS INFORMATIZADA
DEL ESPAÑOL. BARCELONA: SERVICIO DE PUBLICACIONES DE LA UNIVERSITAT DE
BARCELONA.
HTTP://WWW.ELDA.ORG/CATALOGUE/EN/TEXT/L0042.HTML

2. THE ALAMEDA AND CUETOS CORPUS*. YOU CAN ACQUIRE THIS PROGRAM BY
E-MAILING DR. ALAMEDA. TO READ MORE ABOUT IT, GO TO:
HTTP://WWW.PSICO.UNIOVI.ES/REMA/CONTENT.HTML
(THE CD COSTS ~28 EUR)

*BOTH OF THESE DATABASES INVOLVE CASTILIAN SPANISH. IN USING THESE
DATABASES, YOU MUST ALSO BE AWARE THAT LEMMAS AND THEIR DERIVED FORMS
ARE INCLUDED ON THE SAME LIST. THEREFORE, IF YOUR INTENT IS TO USE THESE
DATABASES TO COMPUTE THE PROBABILITIES OF SUB-SYLLABIC COMPONENTS, YOU MUST
CLEAN THE DATABASE FIRST; OTHERWISE YOUR CALCULATIONS WILL BE INFLATED.
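
THE CLEANING STEP DESCRIBED IN THE NOTE ABOVE CAN BE SKETCHED IN PYTHON.
THIS IS A MINIMAL ILLUSTRATION UNDER STATED ASSUMPTIONS, NOT THE PROCEDURE
USED BY ANY RESPONDENT: THE ROWS ARE INVENTED, THE NAMES 'ROWS' AND
'bigram_counts' ARE HYPOTHETICAL, NEITHER DATABASE IS ASSUMED TO SHIP IN
THIS FORMAT, AND CHARACTER BIGRAMS STAND IN FOR SUB-SYLLABIC COMPONENTS.

```python
from collections import Counter

# Hypothetical rows: (surface form, lemma). Invented for illustration only;
# not taken from LEXESP or from the Alameda and Cuetos database.
ROWS = [
    ("casa", "casa"),
    ("casas", "casa"),
    ("cantar", "cantar"),
    ("canto", "cantar"),
    ("cantamos", "cantar"),
]


def bigram_counts(rows, lemmas_only):
    """Count character bigrams by type: each listed word contributes once."""
    counts = Counter()
    for surface, lemma in rows:
        if lemmas_only and surface != lemma:
            # Skip inflected or derived forms whose lemma is listed separately.
            continue
        counts.update(surface[i:i + 2] for i in range(len(surface) - 1))
    return counts


raw = bigram_counts(ROWS, lemmas_only=False)   # uncleaned list
clean = bigram_counts(ROWS, lemmas_only=True)  # lemmas only

print(raw["ca"], clean["ca"])  # prints: 5 2
```

WITH THE INFLECTED FORMS LEFT IN, THE BIGRAM 'ca' IS COUNTED FIVE TIMES;
AFTER FILTERING TO LEMMAS IT IS COUNTED TWICE, WHICH IS THE INFLATION THE
NOTE WARNS ABOUT.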

3. THE DICCIONARIO DEL ESPAÑOL DE MÉXICO, DIRECTED BY LUIS FERNANDO LARA AT
EL COLEGIO DE MÉXICO, HAS SUCH A DATABASE. YOU CAN CONTACT HIM THROUGH THEIR
WEB PAGE: HTTP://MEZCAL.COLMEX.MX/DEM/

4. L0042: PAROLE SPANISH LEXICON:
THE PAROLE SPANISH LEXICON FOLLOWS STANDARD PAROLE ARCHITECTURE WHICH
INCLUDES MORPHOLOGICAL AND SYNTACTIC LAYERS. IT INCLUDES THE MOST FREQUENT
WORDS FOUND IN A 1 MILLION WORD CORPUS, CODED ACCORDING TO THE PAROLE
SPECIFICATIONS.

THE LEXICON CONTAINS ABOUT 22,000 MORPHOLOGICAL UNITS, OF WHICH 12,209 ARE
COMMON NOUNS, 3,367 VERBS, AND 4,996 ADJECTIVES. CLOSED-CLASS CATEGORIES ARE
FULLY COVERED.

THE INFORMATION ASSOCIATED WITH EACH MORPHOLOGICAL UNIT CONCERNS
PART-OF-SPEECH AND SUBTYPE, INFLECTION PARADIGM (WITH MORPHOSYNTACTIC
INFORMATION FOR THE ENDINGS ORGANISED IN ABOUT 132 MODELS), POSSIBLE STEMS
IN RELATION TO THE RELEVANT ENDINGS, AND LINKING WITH THE SYNTACTIC LAYER. IN THE
SYNTACTIC LAYER, INFORMATION REGARDING SUBCATEGORISATION FOR VERBS AND
INSERTION CONTEXT FOR NOUNS IS ENCODED FOLLOWING THE PAROLE MODEL.
HTTP://WWW.ELDA.ORG/CATALOGUE/EN/TEXT/L0042.HTML

5. THE LDC WEBSITE MIGHT BE USEFUL, BUT I HAVEN'T FOUND ANYTHING MATCHING
YOUR NEEDS THERE FROM A QUICK BROWSE:
HTTP://WWW.LDC.UPENN.EDU/

6. A NEW FREQUENCY DICTIONARY OF SPANISH IS IN PRESS AT ROUTLEDGE. THE
AUTHOR IS MARK DAVIES (MARK_DAVIES@BYU.EDU). IT IS LEMMATIZED AND TAGGED.
THIS APPEARS TO BE THE LINK:
WWW.CORPUSDELESPANOL.ORG

7. AN OLD BUT GOOD REFERENCE IS A. JUILLAND AND E. CHANG-RODRÍGUEZ
(1964), FREQUENCY DICTIONARY OF SPANISH WORDS. IT HAS INDICES OF FREQUENCY
AND OF USE FOR TOKENS AND TYPES. IT ALSO LISTS THE MOST COMMON FORM OF THE
WORDS STUDIED. IT IS OLD AND NOT IN ELECTRONIC FORM, BUT IT MIGHT BE USEFUL.

LL Issue: 15.3326
Date Posted: 29-Nov-2004