In my previous post (about finishing the initial processing of the GallinaBlanca dictionary) I mentioned that verbs can be of some use in interpreting menus, possibly through derivatives of the infinitive form of the verb. So I’ve continued digging in this area and have a few results to share.
Anticipating that I’d be looking at verbs, and independently of extracting them from the GB dictionary, I used about nine online “lists” to compile an aggregate list. These verbs: a) may have nothing to do with cooking or cuisine, b) tend to be the more commonly used verbs, and c) may not be used (at all, or in the same way) in Spain. This is the list I’m calling C.
In the process of other searches I stumbled onto a culinary glossary. It has no connection with Spain, so its Spanish words might come from any part of the world. As I worked with it more extensively and carefully I observed many of the usual issues with online resources of unknown origin: a) misspellings (probably; I don’t want to jump to conclusions just because words seem to be misspelled), b) duplications, often including both the singular and plural forms, c) words that make no sense in a Spanish culinary dictionary (how did these drift in?), d) inconsistent formatting and thus inconsistent ordering (e.g. A la cazuela vs Cazadora, A la). In a previous iteration of my project I created a “glossary” by merging information from many sources and eventually it became a pisto (a hodgepodge, if I can use that word in a non-culinary sense), in particular losing any notion of whether the words applied to Spain or to some other Spanish-speaking area. So with these caveats I’ll call this list G.
And I have my list of verbs from the GallinaBlanca dictionary which I previously described. I’ll call this list D.
Put simply, it’s too much work to compare all three lists in their entirety, so I compared only the verbs starting with A, B, or C. While this may be a biased sample, it still reveals some interesting information.
I sorted the three lists together (with different fonts and colors for each list so I could tell them apart) and then manually consolidated like terms. As a result I ended up coding each entry with the letters GDC (with – in place of a letter if the verb is not in that list). So I generated the following table:
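The coding itself was done by hand, but the idea is simple enough to sketch in Python. This is only an illustration, not the process actually used: the function name `membership_code` and the three tiny sets are hypothetical stand-ins for the real lists. Only the placement of abrir and achicalar comes from the tables in this post; the placement of cocer is made up for the example, and a plain hyphen stands in for the placeholder character.

```python
def membership_code(verb, glossary, dictionary, common):
    """Return a three-character code in G, D, C order, with '-' where the verb is absent."""
    lists = (("G", glossary), ("D", dictionary), ("C", common))
    return "".join(letter if verb in members else "-" for letter, members in lists)


# Toy stand-ins for the real lists (hypothetical contents, see the note above).
G = {"achicalar", "cocer"}
D = {"cocer"}
C = {"abrir", "cocer"}

# Sort the merged vocabulary and print each verb with its code.
for verb in sorted(G | D | C):
    print(verb, membership_code(verb, G, D, C))
```

A real version would first have to normalize the entries (case, accents, stray whitespace), which is most of what the manual consolidation was doing.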
There are 126 verbs that appear in at least one of these lists. Only 5 verbs appear in all three lists. The list with the largest number of unique verbs is G (the glossary, with 44), which indicates it is potentially very useful, as it adds over 50% more verbs than I had previously found. The verbs in the C (common) list may have nothing to do with cooking or food (we’ll explore that later in the post), so this list may not add much. Only 5 verbs from the GallinaBlanca list don’t appear in the glossary list, so whoever compiled the glossary got most of the cooking verbs.
Looking at the verbs that are only in the C (common) list and in neither cooking-related list, we do see a few surprising omissions (I’m assuming these are SO common that no one bothers to include them):
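The “over 50%” figure follows from the two counts above: if 44 of the 126 verbs appear only in G, then 82 were already in D or C, and 44/82 is roughly 54%. A minimal sketch of that arithmetic, plus the set operations that would produce such counts from three lists, is below. The names `summarize` and `percent_added` are hypothetical, and the single-letter sets in the usage line are placeholders, not real data.

```python
def summarize(glossary, dictionary, common):
    """Count the union, the three-way overlap, and the verbs unique to each list."""
    return {
        "total": len(glossary | dictionary | common),
        "in_all_three": len(glossary & dictionary & common),
        "only_G": len(glossary - dictionary - common),
        "only_D": len(dictionary - glossary - common),
        "only_C": len(common - glossary - dictionary),
    }


def percent_added(total, only_in_new_list):
    """Percentage by which a new list grows the vocabulary found without it."""
    already_known = total - only_in_new_list
    return 100 * only_in_new_list / already_known


# Counts quoted in the post: 126 verbs overall, 44 of them only in G.
print(round(percent_added(126, 44), 1))

# Placeholder sets just to exercise the function.
print(summarize({"a", "b", "x"}, {"a", "x"}, {"c", "x"}))
```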
|abrir||–C||to open; to turn on; to whet (as in appetite)|
|calentar||–C||to heat, heat up, warm up; to inflame|
|combinar||–C||to combine, mix; to put together, match, coordinate|
|comer||–C||to eat; to have for lunch; [Latin America] to have for dinner|
|concinar||–C||not in any dictionary, probably misspelling of cocinar|
|convertir||–C||to turn into, convert into, change into, make|
|cortar||–C||to cut, cut off, carve, slice, cut out; to chop; to cut (dilute sense); …|
So out of the 35 verbs in the C (common) list, I’d probably include only these 11 in a general-purpose culinary list.
Now some of the verbs in the G (glossary) list don’t appear to be useful. Some have no definition in any of the dictionaries I routinely use, including the most authoritative dictionary of the Spanish language (which is NOT limited to Spain, so it could include verbs that don’t get used in Spain). Here are a few I’d consider dubious to include in a culinary glossary:
|achicalar||G–||[Mexico] to cover in honey; soak in honey|
|añejar||G–||to age; [vino] to mature; to get stale|
|apanar||G–||to coat in breadcrumbs (also EMPANAR or EMPANIZAR)|
|apuntillar||G–||to finish off (a toro); to round off|
|ataviar||G–||to dress up|
|blanchir||G–||(not in dict) Wiktionary has it as a French term for make white|
|bresear||G–||(from glossary) To cook to slow fire, during long time, with condiments (generally vegetables, wine, broth and spices). Clearly a spelling error since not found.|
|cantar||G–||to sing; to crow, chirp|
|caramerizar||G–||(not in dict), another spelling? [from glossary] Spread a mold with sugar honey.|
|castigar||G–||to punish; to ground, keep in; to damage, harm|
|cerner||G–||to sift, sieve (same as cernir, which is it?)|
|chapurrar||G–||to speak badly|
I wouldn’t include achicalar, as it doesn’t appear to be used in Spain, but this raises a good point about my goal here. If I wanted to know the Spanish word, used in Spain, for an English word, I wouldn’t include anything that may only be used outside Spain. But my goal is asymmetric: to translate Spanish (on menus) into English only (so I can choose), so including a word in my corpus (and eventually my app) that is not likely to be used in Spain is not a problem (though I do need metadata noting this for that term). If I never see the term, it does no harm that no lookup ever finds it. OTOH, it would be a problem if I were trying to translate English into Spanish, where the rule would be: don’t use a word not found in Spain. For instance, frijoles, well known to most people in the USA who visit Mexican restaurants, appears to be one such word: not commonly used in Spain, though quite possibly a Spaniard would know it. Using it might lead to a scene (from The Way) like “no tapas in Navarra, only pintxos”, and thus make you look foolish.
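One way to picture the metadata this asymmetry calls for is a small record per term. This is a sketch under assumptions, not a design I’ve settled on: the `Entry` class, its field names, and the two helper functions are all hypothetical, and treating an entry with no region information as usable in Spain is a simplifying assumption. The achicalar gloss and its Mexico tag are taken from the table above.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    term: str
    gloss: str
    regions: list = field(default_factory=list)  # where the term is reported in use
    notes: list = field(default_factory=list)    # free-form caveats


corpus = {
    "achicalar": Entry(
        term="achicalar",
        gloss="to cover in honey; soak in honey",
        regions=["Mexico"],
        notes=["does not appear to be used in Spain"],
    ),
}


def lookup_spanish(term):
    """Spanish -> English: return whatever is known, regional or not."""
    return corpus.get(term.lower())


def usable_in_spain(entry):
    """English -> Spanish would need this filter; menu reading does not.

    Assumption: an entry with no region information is treated as usable.
    """
    return not entry.regions or "Spain" in entry.regions


entry = lookup_spanish("Achicalar")
print(entry.gloss, entry.regions, usable_in_spain(entry))
```

The point of the split is that the menu-reading direction never calls `usable_in_spain`, so a regional term costs nothing by sitting in the corpus.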
blanchir (to make white, which isn’t exactly synonymous with blanch, but one might assume that’s what is meant) was interesting in that it did not occur in any dictionary but did have an entry in Wiktionary. The standard term for blanch is palidecer (purely in the sense of turning white), with escaldar or blanquear for the culinary sense. I suspect blanchir might be used somewhere (possibly Puerto Rico) where it is just a cognate of the English verb. But, again, in collecting the corpus I should not make judgments like this; I might instead add blanchir to the corpus with a metadata note on its entry and let the “big data” statistical analysis decide whether this is a word or not.
bresear really looks like a misspelling (more likely of brasear, to barbecue), but again it should go into the corpus with a metadata note rather than my passing judgment on it. IOW, only a real expert in Spanish should be deciding what to include or not in any translation dictionary. If I find only one instance of a misspelled word it will get washed out, since there are so few occurrences of it in the corpus; OTOH, maybe people do commonly misspell this word, so it needs to be in my app. caramerizar appears to be some variant of caramelizar, again perhaps used somewhere and not just a mistake. cerner has exactly the same definition (in the glossary itself, but also in spanishdict) as the more common spelling cernir, and both appear in the reverse lookup of ‘to sift’ in spanishdict (which is it, then? just a common confusion?). cernido is a possible term to see on a menu, so it matters that my dictionary can spot it as the past participle of cerner.
So again, all this goes to show how much work must be done to develop a truly accurate dictionary to drive my app for menu translation (or to be published as a carefully researched culinary glossary).
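Both problems in this paragraph, near-miss spellings and participles, can be roughed out with the Python standard library. The sketch below is illustrative only: `KNOWN_VERBS` is a tiny hypothetical stand-in for the real verb list, the function names are my own, the 0.8 similarity cutoff is an arbitrary assumption, and the participle rule covers only regular -ado/-ido forms (irregular ones such as frito, and accented -ído forms, would need their own handling).

```python
import difflib
import re

# Hypothetical stand-in for the consolidated verb list.
KNOWN_VERBS = ["brasear", "caramelizar", "cerner", "cernir", "cocinar"]


def candidate_infinitives(word):
    """Map a regular participle (-ado/-ada/-ados/-adas, -ido/...) to possible infinitives."""
    match = re.fullmatch(r"(.+)(ad|id)(o|a|os|as)", word.lower())
    if match is None:
        return []
    stem, marker = match.group(1), match.group(2)
    # -ado comes from -ar verbs; -ido is shared by -er and -ir verbs.
    endings = ["ar"] if marker == "ad" else ["er", "ir"]
    return [stem + ending for ending in endings]


def closest_known_verb(word, cutoff=0.8):
    """Return the most similar known verb, or None if nothing is close enough."""
    matches = difflib.get_close_matches(word.lower(), KNOWN_VERBS, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# cernido could come from either spelling, which is exactly the ambiguity above.
print([v for v in candidate_infinitives("cernido") if v in KNOWN_VERBS])

# Suspected misspellings from the lists, matched against the known verbs.
for suspect in ("bresear", "caramerizar", "concinar"):
    print(suspect, "->", closest_known_verb(suspect))
```

A match here is only a suggestion to record as metadata on the entry; it does not settle whether the odd spelling is an error or a regional variant.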