A few terms from ensaladas

I’m continuing to extract terms from a large set of recetas, having switched from postres (desserts) to ensaladas (salads). Thinking about salads, there is a lot more diversity than merely leafy green stuff with some dressing, so this is another lode of terms to find and add to my corpus. So here are a few fragments I’ve found:

Ensalada de verdinas con perdiz escabechada, receta fácil Salad of verdinas with pickled partridge, easy recipe

As usual, terms that Google Translate doesn’t translate or gives silly answers for catch my attention, so what are verdinas? Oxford has an entry that translates to ‘moss’ and it’s plausible a salad might include moss. But this is what makes this source so useful: it’s not just titles of dishes, but the full recipe (ingredients and instructions) and a photo of the dish. In this case the photo reveals the clue to verdinas, showing a bag of alubia verdina, which are called Verdinas De Nuestra Tierra in the ingredient list. IOW, since I’ve seen alubia often, this is just a specific type of bean (visible in the photo) described here.

So moving on:

Remojón granadino, receta fácil para el Verano Remodo granadino, easy recipe for summer

Why Google Translate translates remojón to ‘remodo’ remains a mystery as I can’t find any association. Oxford literally translates remojón to ‘soaking’ and granadino to ‘of Granada’ which doesn’t help much. This has no English equivalent; fortunately the source explains that it is

a specific recipe with oranges, cod, onions, tomatoes and olives, soaked in olive oil for at least four hours.

so an item like this has to be entered in my corpus with a “description” rather than a translation.

And finally:

Salpicón de bogavante con vinagreta de su coral Lobster salty with vinaigrette of its coral

So we have two mysteries here: 1) what is a ‘salty’ (presumably the translation of salpicón), and, 2) what is ‘its coral’ (untranslated from coral in the Spanish)?

Salpicón is the easier one since it’s a particular preparation of “chopped seafood or meat with onion, tomato and peppers” (described here), so ‘salty’ is a mysterious and inaccurate translation.

Salpicon (or salpicón, meaning “hodgepodge” or “medley” in Spanish) is a dish of one or more ingredients diced or minced and bound with a sauce or liquid.

But to figure out coral required looking at the recipe which fortunately describes it thusly:

the contents of the inside of the head (of the lobster) and the dark colored matter that is full of flavor

While I couldn’t find any English equivalent for coral (or any definition that matches the recipe) I believe this is a delicacy that some adventuresome foodies like. Now I’ve used the heads of shrimp and their shells to make stock so I suppose this is the same, but this sounds pretty yucky to me, which means if I had this salad and quite possibly enjoyed it I’d rather not know what coral is.

As the last tidbit the recipe text also includes two interesting terms:

  1. brutal bogavante, which Google translated to ‘brutal lobster’. What’s this, some lobster with monster claws that fights back? Actually Oxford did explain that brutal has a colloquial meaning of ‘incredible’ or ‘amazing’, which is a lot more appealing (and a reasonable guess at the translation).
  2. un platazo, which didn’t appear in any dictionary but was found by search in an obscure (scanned) old text as ‘great dish’, which does fit the rest of the context and so is also a likely translation.

These “guesses” I sometimes make have some amount of likelihood of being correct. I’m fairly certain of something like verdinas as a type of bean, but it is a guess and therefore has to be entered in my corpus with some uncertainty. And brutal and platazo have even less authoritative evidence and so would have higher uncertainty. The English that Google Translate gives for the Spanish also cannot be viewed as “certain”. Probably only translations appearing in one of the authoritative dictionaries can be entered as p=0.999 in the corpus. So getting as much volume as possible, so every term in the corpus has multiple instances, will be key to getting the best possible translation dataset.
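To picture how repeated sightings might raise confidence in a pair, here is a minimal sketch. It assumes each sighting carries its own probability of being right and that sightings are independent, then combines them with a "noisy-OR" rule. The function name, the sample numbers and the independence assumption are all mine for illustration; none of this is a settled design for the corpus.

```python
from math import prod

def combined_confidence(probabilities):
    """Noisy-OR: chance that at least one independent sighting is right."""
    return 1.0 - prod(1.0 - p for p in probabilities)

# Hypothetical confidences for two sightings of verdinas -> "a type of bean"
sightings = [0.7, 0.6]
print(round(combined_confidence(sightings), 2))  # 0.88
```

Two middling guesses that agree end up more believable than either alone, which is the reason for wanting multiple instances of every term.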

 


A blogging dilemma

I’m using this blog (partially) to “document” interesting tidbits I encounter while doing research for my anticipated smartphone app to translate menus in Spain. That app needs to have a comprehensive and accurate dataset to use in the translation, not just the equivalent English term (which doesn’t always exist) but also some description. For example, what is sobrasada? Yes, it’s ‘sausage’ but saying that (or even ‘spicy pork sausage’) doesn’t tell you very much.

So I’m using various sources to build up a “big data” corpus which will have translation errors and other errors. But algorithmically I can extract from that corpus what I’ll need to power the app. But I have to build that corpus manually, often exploring “puzzles” I find in trying to figure out a proper equivalent in English for some culinary item I find in Spain (btw, I am focusing on Iberian Spanish and trying to prevent terms only found (or used differently) in the New World from defocusing my corpus).

So I’m doing several things with these posts. First they are a kind of journal (or lab notebook) for various translation/description puzzles I try to solve. While I have many MSWord files with the raw work the blog posts highlight some interesting (at least to me) bits. Second by writing for potential readers I have to work a bit harder to try to have my posts accurate and at least somewhat coherent (instead of the real-time stream-of-consciousness in my raw material). This more careful writing makes the posts better but does have a real downside – it’s SLOW. It might not seem like it to you, Dear Reader, but I probably spend more time writing a post about something interesting in a menu than it took me to decipher the entire menu. So at some point the blogging gets in the way of my work.

But the real “dilemma” I have is that I just don’t get the posts done at anywhere near the rate I’m discovering the tidbits I want to write about. And days later when I go back over my raw data I often can’t recreate my thoughts, or discover I forgot to include links or definitions or whatever, and don’t much feel like repeating my work.

My posts are fairly long which is good and bad. It’s good because I try to weave multiple points into a post, often with some background research. It’s bad, because the posts are probably too long for most readers’ attention spans and because I don’t get them done.

So every now and then I’m tempted to do short posts, literally one for each situation I encounter, rather than trying to organize multiple examples into a single post.

For instance, I’ve started looking at a new source. Previously I’d used menus I could extract from restaurant websites along the course of the Camino de Santiago, and several online glossaries and dictionaries. But I’d also stumbled on many sites (focused on Spain and entirely in Spanish) for recetas (recipes). These are more tedious to process but often contain information I don’t find elsewhere and therefore can stuff in my corpus so potentially less frequently used (in menus) terms are still incorporated.

So I just started a small trial to look at this recipe site. Under its recetas tab it has 14 categories, and under Pasta y Arroz (pasta and rice) there are 15 webpages with about 12-16 recetas per page. IOW, this is a lot. And every receta is presented on the webpage as a caption (to a photo) where I can use Google Translate and then manually produce a side-by-side Spanish and English pair, such as:

Ñoquis de calabaza y boniato con salsa de gorgonzola Pumpkin and sweet potato gnocchi with gorgonzola sauce

For this I’d extract for my corpus ñoquis (gnocchi), calabaza (pumpkin), boniato (sweet potato), salsa (sauce), and gorgonzola (gorgonzola). If I double check these term associations by looking in the Oxford dictionary or the DLE (more authoritative, but harder to use than Oxford) I could add these associations to my corpus with higher confidence levels. IOW, mistakes are bound to get into the corpus without a lot of checking, but I’m also hoping the “big data” type filtering will eliminate the spurious pairs.

But what I just described as the process in this post took me quite a bit more time than it did for me to extract the side-by-side pair (still tedious but relatively quick) and do a quick visual parsing (really looking for any terms that require more research). Note that while I have no fluency in Spanish I do know a bit about the grammar and thus know how to spot parts-of-speech and change the word order used in Spanish to my normal English and thus find the term-by-term association. This entry was simple to do and the only (slightly) interesting part is that the original ‘gnocchi’ does have a different word in Spanish but ‘gorgonzola’ doesn’t (and as a somewhat interesting question, are these “Italian” words, or are they now so incorporated in English, at least by foodies, that we should consider them English words, known linguistically as ‘loanwords’?).

So of the first webpage of pastas this was the most interesting puzzle:

Escudella con sopa de galets, el plato estrella de la Navidad catalana Escudella (in Oxford as -dilla, but some searches appeared with this spelling; is it a typo? here? and on web?) with soup of galets (is this short for galettas?), the star dish of Catalan Christmas

but Oxford has it with a definition (didn’t have translation) in which case it was a specific dish

no, galets appears to be a type of pasta (shells) https://www.tienda.com/products/galets-nadal-pasta-sandro-desii-su-40.html

This is my raw entry. Since escudella and galets appear in the Google Translate output as the same word in English (i.e. not translated, or perhaps there is no translation) this is the type of thing I look for to do more research. When I merely asked Oxford for the translation of escudella it said that was missing. What it does show (helpfully) is close matches, and in this case I tried its suggestion of escudilla (which is ‘bowl’ and kinda seems to fit this recipe name). So you see the note I made to myself (in Oxford as -dilla, but some searches appeared with this spelling; is it a typo? here? and on web?) but that’s just a start. Since I’ve done this a lot I immediately used Oxford a different way; instead of asking for the translation I asked for the definition (of escudella) and it had this in Spanish (followed by Google’s English):

Plato que consiste en un caldo de carne y hortalizas, colado, en el que se cuece arroz, fideos u otro tipo de pasta; es un plato típico de Cataluña, comunidad autónoma de España. Plate consisting of a broth of meat and vegetables, strained, in which rice, noodles or other type of pasta are cooked; It is a typical dish of Catalonia, autonomous community of Spain.

Now I could immediately point out that Google’s translation of plato as ‘plate’ is not correct as plato also means ‘dish’ which fits better but that’s the typical kind of digression I get into that just makes posts take even longer.

Now meanwhile I thought I recognized galets. I did a previous post about the menu from a store selling cookies (as a bit of diversity from just restaurant menus). So I double checked by asking Oxford for the Spanish translation of ‘cookie’ (which it lists also as biscuit in British English) and it has galletas (as I thought I recalled). So I thought this might be some colloquial term for cookie.

But now my “translation” ‘bowl with soup of cookies’ is pretty obvious nonsense and so no better than the untranslated correspondence. So, since this is a new source and I’d already discovered I could click on each receta and get a full page explanation (intro to the dish, ingredients, preparation) I began to see the flaws in my attempt to unravel this puzzle. As the recipe page itself is entirely in Spanish I have the same kind of puzzle, i.e. Google again botched some of the translation. But there is enough text, and importantly a picture, that I could try some searching and I found galets as an item I can buy online (I’ve often used this source in this project). These look (in both the recipe picture and the tienda picture) like fairly ordinary pasta shells (I don’t see what’s special about them) but pasta shells are pasta shells (except maybe tiny details) so now I’d know what I am getting if I’d picked this off a menu in a restaurant.

So finally I know both these words don’t have English translations so I’d want a different kind of entry in my corpus of a short description and then potentially a longer one. Thus a diner using my app could learn about this dish.
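A rough sketch of what such an entry might look like in code is below. The class and field names are hypothetical (I haven't designed the corpus yet); the only point is that the English translation is optional and a description can stand in for it.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CorpusEntry:
    spanish: str
    english: Optional[str] = None   # None when there is no English equivalent
    short_description: str = ""     # a few words for the menu view
    long_description: str = ""      # fuller explanation for the curious diner

    def display(self) -> str:
        """Prefer the translation, then the short description, then the Spanish."""
        return self.english or self.short_description or self.spanish

galets = CorpusEntry(spanish="galets", short_description="shell-shaped pasta")
salsa = CorpusEntry(spanish="salsa", english="sauce")
print(galets.display())  # shell-shaped pasta
print(salsa.display())   # sauce
```

An entry for escudella would work the same way, with the Oxford definition as the starting point for its longer description.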

So there, you see what I mean. This post has taken me far longer than the original analysis. Yet it’s good (for my purposes, hopefully somewhat interesting to you, Dear Reader) to have this more complete explanation (I can re-read this post someday when I’ve completely forgotten this and have to resolve something in my app). But if I’d simply written this one item in the most brief form (to jog my memory later, plus at least some glue prose to make it read better than my raw notes) I would have gotten this done.

But it also means I’d probably have many more posts which is mixed benefit as well. So, IOW, there really isn’t a great answer.

So I have a solution. I can use categories to distinguish the posts that are really minimal and that I create almost immediately after doing the work for the corpus. These will really be post “fragments” but at least I get more recorded.

For instance, I was looking at a menu on Friday and its Menu del Dia was for Mother’s Day so I had in mind a post to create on the 5th. But instead I spent most of the day cooking for our Cinco de Mayo feast (and drinking a few too many margaritas). So I never did that post and now the “joke” of it is gone as its timeliness is past.

So I’ll continue to struggle with this, fragmentary and terse posts, or (sometimes too long) complete posts.

A few random bits

Rather than a focused post I’ll just catch up on a few disparate items.

First I’m recording another milestone along my virtual trek, which is arriving in Burgos. Burgos was one of the main locations in the movie The Way (where Tom’s pack was stolen) and its main feature is the cathedral. A virtual trek (i.e. actually exercising on a treadmill in the basement and transferring the accumulated miles onto a GPS trace of the Camino de Santiago) may seem silly but it serves two purposes for me: 1) walking on a treadmill is really boring so I need to have some goal and sense of accomplishment, since I need the treadmill exercise (esp. during the winter here) so I’m in shape to do some real outside walking, and, 2) the slow pace gives me a chance to fairly thoroughly investigate the route, using satellite views, Google StreetView (often available on the Camino and I see lots of peregrinos) and Points of Interest (so I look at photos of albergues and restaurants, plus sometimes find menus). It’s certainly not the same as the real thing but better than nothing.

Before reaching Burgos I’d not found any online menus in other small towns on my virtual trek since Logroño so I had begun to extract terms from a couple of glossaries I’d previously found. I’d already spent a long time (previously reported) on the GallinaBlanca online dictionary so I was also interested in seeing whether the two other lengthy lists I’d found would just be redundant. So that led me back to a bit of coding (haven’t done that for a while) in order to automate the comparison (each extract I’d done was in an incompatible format so first my code had to generate a canonical extract to compare). During that process one of my lists just disappeared (I was only about 1/4 done with it). That’s disappointing since it was a good list and had many terms I hadn’t previously found. Crunching through dictionaries or glossaries is very tedious and nowhere nearly as interesting as looking at menus (which is the purpose of my project here). But it’s a different way to get a sufficiently large corpus to feed into the menu translator I’m building.
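For the record, the comparison went roughly along the lines of the sketch below. This is a simplified reconstruction, not my actual code: the separators, the sample lines and the function names are made up, and the real extracts were messier.

```python
import unicodedata

def canonical(term: str) -> str:
    """Normalize a term: NFC Unicode form, lowercase, single spaces."""
    term = unicodedata.normalize("NFC", term)
    return " ".join(term.casefold().split())

def parse_lines(lines, separator):
    """Turn 'spanish<separator>english' lines into a {spanish: english} dict."""
    pairs = {}
    for line in lines:
        if separator not in line:
            continue  # skip headings and blank lines
        spanish, english = line.split(separator, 1)
        pairs[canonical(spanish)] = english.strip()
    return pairs

# Two made-up extracts in different layouts
list_a = parse_lines(["Bacalao = cod", "ATÚN = tuna"], "=")
list_b = parse_lines(["bacalao\tcod", "merluza\thake"], "\t")

shared = sorted(list_a.keys() & list_b.keys())
only_b = sorted(list_b.keys() - list_a.keys())
print(shared)  # ['bacalao']
print(only_b)  # ['merluza']
```

Accents are deliberately kept during normalization, since dropping them could merge terms that are genuinely different words.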

So with Burgos on the horizon I began, once again, to focus on restaurant menus. In the small towns I find the restaurants directly as Google Maps POIs which are clickable to get some info (esp. user contributed photos) and perhaps then linked to a website. Those with websites (fairly uncommon for the small places in small towns) might have a textual menu (many just have photos) and that allows me to generate side-by-side Spanish and English (usually translated by Google Translate, sometimes other ways) terms that I’ll feed into my corpus. Without all the fancy deep learning AI Google uses to train their translator I’ll be using a more algorithmic process to train mine, but mostly to spot Spanish terms that have multiple translations and try to determine the best (more on that below).

So for Burgos the area is quite large (you have to zoom in a lot on Google Maps for the POIs to appear) so I used a different approach. There are numerous rating services for restaurants (I only partly trust them here in USA, so no clue whether they work well in Spain) so just because it has a convenient format I used the Trip Advisor list, which has a total of 376 restaurants. I’ve only looked through the first 40 or so. Less than half of these have websites and probably only about half of those have text I can scrape off the website (often the menu is a photo or some other type of document where the browser can’t select any text that I can then paste in my working document). So with this vast amount of material I’ve been quite busy with menus, having now crunched through six already (with some stories to tell). And I’ve got enough more to finish to keep me busy, as in fact my virtual trek has already left Burgos.

But as a random tidbit, tied to the notion of producing entries for my corpus, is the variable translation of the term ración. And I do mean translation (not definition) and usually by Google. The simplest (and most frequent) literal translation is ‘ration’ but even seeing exactly the same word (although sometimes modified with 1/2) on the same page Google translates it differently and also as ‘portion’ or ‘serving’. That’s a bit of a mystery to me why there is the inconsistency but of course Google claims (in its limited online explanations of how Google Translate works) that it is “context-sensitive” in doing translations (IOW, Google also had a large corpus, mostly of translated material in the United Nations, that their AI analyzed to decide both the translation and the “context”). But within a single website, all about food, one would think the context would always be the same. But it’s not the webpage that represents “context” (I realized) it’s the source corpus where “context” is being deduced. So the notion of using “context” to improve translation doesn’t mean quite what one would think.
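Here is a small sketch of how the corpus code could at least notice this kind of inconsistency, by tallying every English rendering seen for each Spanish term. The sightings are made up for illustration and the function name is hypothetical.

```python
from collections import Counter, defaultdict

def tally_translations(sightings):
    """Count how often each English rendering appears for each Spanish term."""
    tallies = defaultdict(Counter)
    for spanish, english in sightings:
        tallies[spanish][english] += 1
    return tallies

# Made-up sightings, as if collected from several menus
sightings = [
    ("ración", "ration"), ("ración", "portion"), ("ración", "ration"),
    ("ración", "serving"), ("ternera", "beef"), ("ternera", "veal"),
    ("ternera", "beef"),
]

for spanish, counts in tally_translations(sightings).items():
    best, n = counts.most_common(1)[0]
    print(f"{spanish} -> {best} ({n} of {sum(counts.values())})")
# ración -> ration (2 of 4)
# ternera -> beef (2 of 3)
```

Note that raw frequency picks the literal ‘ration’ here, so a simple count can flag a term for review but can’t by itself decide the best translation.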

Now instead of translation here’s what Oxford has as definitions:

1 Cantidad de alimento que se da en una comida a una persona o animal. Amount of food that is given in a meal to a person or animal.
2 Porción unitaria de algo que puede dividirse en varias partes iguales. Unitary portion of something that can be divided into several equal parts.
3 Cantidad determinada de alimento que se toma como aperitivo entre varias personas o comida informal; suele tomarse como acompañamiento de una bebida en un establecimiento público. Quantity of food that is taken as an aperitif among several people or informal food; It is usually taken as an accompaniment to a drink in a public establishment.
4 Cantidad suficiente de algo, generalmente la que se consume en un solo día o a intervalos regulares por una persona o animal. Sufficient quantity of something, usually that which is consumed in a single day or at regular intervals by a person or animal.

Since porción is literally ‘portion’ it makes some sense to have that as a translation (along with ‘helping’ and ‘serving’). The part of the definition that seems to make the most sense in the context of a restaurant menu is #3 (also #2), more than the sense of the literal ‘ration’ (as in #1 or #4, more a military term). But it is also a quantity designation (more than a pincho) even if it is only consumed by one person. Deciding how much a 1/2 or 1/4 ración is presents yet another challenge, but it appears most restaurants price a 1/2 at more than 50% of the price of a whole, so if you want a whole, ordering it as two 1/2’s will cost a lot more. IOW, you probably need to be able to discuss this with your server, once again evidence that a menu translator (vs fluency in Spanish) is not going to be sufficient.

Finally, as yet another random tidbit, one dessert item that didn’t translate (as I’ve described before, it just is what it is) was mantecado. It wasn’t hard to find this (I thought it might be a brand but it’s just the name of a cookie) with an interesting description (here) where it is described as being similar to polvorón, which has its own Wikipedia page (here) that also mentions mantecados and says they are not the same as polvorón (you could fool me looking at the pictures in that page).

From that same menu (here), for the item espárragos cojonudos Google Translate doesn’t have English for cojonudos (espárragos is asparagus in case you’re wondering). Tracking down cojonudos with search quickly led to the connection to cojones, which is a term many Americans know as slang, but it’s not clear how ‘ballsy’ would apply to asparagus. But this article assures us the slang meaning is not the relevant one and the more respectable one is ‘awesome’ or ‘outstanding’. Furthermore a particular asparagus from Navarra chooses to label itself with cojonudos so I guess the connection to cojones doesn’t bother them (or maybe they’re not aware of the etymology of cojonudos).

 

Updated the Spanish Term Index page

You can see on the menu bar of this post “Spanish Term Index”. This is a “page” (not a post) in WordPress.com terminology. I just caught up a bit and have now indexed terms on the oldest 20 posts (of now 67). I only include terms that I discuss enough in a post to get a reasonable understanding of the term (casual mentions without definition I exclude).

So I have a lot of work to do to catch up with all the more recent posts.

Fortunately WordPress.com and Microsoft Word cooperate with each other. MSWord has a variety of tools to make updating the list easier. When I’m done I can copy the list from MSWord and just paste it into WordPress.com’s page. This page is going to get very long, once I catch up and then keep updating from new posts.

In addition to providing a guide to you, Dear Reader, this also provides me the opportunity to quickly see if I’ve done terms in previous posts and thus avoid a later post (unless I have new information) that would be redundant.

Note that the index does NOT provide the English translation (you’ll have to click on the links to see that). So I have another page, now out-of-date as well that will be my accumulated glossary, which I hope, someday, to be the most complete and accurate glossary you can find on the Net. Right now it’s more an experiment than my actual “authoritative” (i.e. researched) glossary but [eventually] it will be my glossary.

WordPress.com isn’t the best tool for compiling a glossary but it’s all I’ve got. OTOH, glossaries (or dictionaries) I’ve found elsewhere on the Net aren’t so great either (either the method of access or their content). Maybe if I get a really solid and very good list I’ll spring for a website and build some interactive code to be able to lookup these food terms from Spain. If not a smartphone app (that is, something a lot more portable that could work offline) at least you could come to my glossary, with the browser in your phone (if you have a connection) to get information about menus. That may have to do until I can figure out how to actually code an app and have a really good term list for it.

 

Small experiment

Most of the time I’ve spent on this project has involved looking at various source documents from Spain, then using multiple methods of doing translations. Ultimately the point of all this is to build a large corpus of “pairs” (words or phrases in Iberian Spanish and an English translation, or some kind of equivalent). Critically I also need to add some measure of how likely the pairs represent valid equivalents so the code (yet to be done) can attempt to establish the probability of the consolidated list of pairs being correct. And also it has to handle ambiguity, which is, for instance, very common with ternera (is this veal or beef or both? It often seems to be used for both). And the multiple, overlapping and contradictory terms for shrimp vs prawns vs langostines (the small rock lobster) are a strong example of confusion on menus.

So given I haven’t yet designed my corpus or the code to ingest new pairs into the corpus and then process the related pairs, I have to do experiments, by hand, on a smaller dataset to attempt to visualize the challenges I will face when this is all done with code on a much larger corpus.

So I recently processed an extensive menu from a single restaurant in Granada and just before that two restaurants in Santo Domingo de La Calzada, La Rioja. By process I mean the mostly mechanical work of getting entire sections of menu text side-by-side in original Spanish and then the translated English. Then I look for untranslated terms or silly translations to try to find other sources on the Net (often recetas) to determine the correct correspondence, for instance, manos de ministro is NOT minister’s hands but a colloquial version of the more common manitas de cerdo, or pig’s trotters (feet).

So having done this I’ll provide a few results. In total I ended up with 277 “pairs” with 50 of those on both lists (and thus likely to be very common food terms from menus – see list below). The two restaurants in Santo Domingo de La Calzada contributed 132 unique pairs and the Granada restaurant contributed 95 unique pairs. The various terms in the list are sometimes not that specific to food, for instance:

  1. blanco and negro, colors but used as qualifiers of chocolate in menus; rosada (pink as a color) ended up being quite a chase when it referred to a specific fish.
  2. aroma or chocolate, which are the same in Spanish and English, but I include them (and others like them) even though they are obvious loanwords, as a piece of code doesn’t just “know” this and has to be told.
  3. especialidad (specialties) or vinagreta (vinaigrette) or salmón (salmon): even though these are easy to guess, eventually an app doing translation still needs to recognize these terms.
  4. arroz, carne, dulce, huevo, leche, pan, pescado, pollo, queso, salsa and vino, which are used so much, not just in Mexican restaurant menus but even in TV ads, that we can effectively consider these loanwords into English now, but again, a computer program doesn’t know that and so still needs to have these in the corpus that will then be the key to its translation.
  5. I did try to consolidate terms that have alternate gender forms and/or singular/plural but didn’t do this as precisely and consistently as a really good corpus would require.
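On that last point, here is a hedged sketch of how code might expand the slash shorthand I use in my lists (for example casera/o caseras/caseros) into every written-out form, so all the forms can be filed under one entry. The function name and the "two letters or fewer means a replacement ending" rule are my own assumptions and would certainly miss irregular cases.

```python
from typing import List

def expand_forms(entry: str) -> List[str]:
    """Expand 'casera/o caseras/caseros' into every written-out form."""
    forms = []
    for chunk in entry.split():
        base, *alternatives = chunk.split("/")
        forms.append(base)
        for alt in alternatives:
            if not alt:
                continue  # ignore a stray trailing slash
            if len(alt) <= 2:
                # Short alternative: treat it as a replacement ending
                forms.append(base[: -len(alt)] + alt)
            else:
                # Long alternative: assume it is already a full word
                forms.append(alt)
    return forms

print(expand_forms("casera/o caseras/caseros"))
# ['casera', 'casero', 'caseras', 'caseros']
print(expand_forms("frita/o fritas"))
# ['frita', 'frito', 'fritas']
```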

While just finding lists of food/cooking terms is easy on the Net, whether they are correct or apply to Spain is more problematic. Even a source like a dictionary should be taken with a small dollop of skepticism. Certainly asking any of the various voice assistants is not going to have a very high accuracy rate. So it is necessary to try to focus on sources, and thus pairs, that are really for Spain and not somewhere in the western hemisphere (unless you, Dear Reader, are planning a trek in Bolivia, then do as you need).

So that was my experiment and I end with this list of 50 pairs that are so common you’re very likely to run into them BUT even this list is not 100% accurate as there are various issues with translation (see previous posts).

Cover up the right-hand column and see how many of these you know.

a la plancha grilled
aceite de oliva olive oil
anchoas anchovies
arroz rice
asados roasted
atún tuna
bacalao cod
blanco white
café coffee
Cantábricas/Cantábrico Cantabrian
caramelizados caramelized
carne meat
casera/o caseras/caseros homemade
cerdo pork
chocolate chocolate
comida meal
croquetas croquettes
deliciosa/o deliciosas delicious
dulce sweet
ensalada salad
frita/o fritas fried
guarnición garnish
helado ice cream
huevo egg
jamón ham
langostinos prawns
leche milk
lomo loin (generically; or cured meat specifically)
miel honey
pan bread
patata potato
pato duck
pechuga breast
pescado fish
pimientos peppers
plato dish
pollo chicken
postre dessert
pulpo octopus
queso cheese
revuelto scrambled
salsa sauce
solomillo tenderloin or fillet
tarta cake, also pie
ternera beef (alt: veal)
tomate tomato
tosta toast
vainilla vanilla
verdura vegetable
especialidad especialidades specialty

Mystery post – pez/peces or pescado

My title contains some bits of useful information. While I’m not absolutely certain, some sources say peces is the plural of pez. Of course in English the plural of fish is fish so peces seems relatively uncommon. pescado also translates to fish BUT the key difference is that pescado is the piece of fish on your plate and pez is the living animal.

I let Google Translate loose on my previous “mystery” post and it had three types of results: 1) a few of the words translated correctly, 2) some translated but to nonsense, and, 3) some were missed altogether. I’ve tracked a few of the latter.

My big list of words (with cognates or loanwords removed to avoid giving a clue) was a lengthy list of the names of fish, probably as they are called in Spain. I found two long lists on the Net with Latin (scientific names) as well as names in English, Spanish and some other languages. Both were European sources so less likely to include fish found primarily in South America, but who knows how lists get compiled.

Plants and animals from the natural world (versus cultivated plants/animals) are frequently misidentified and it is very tough to get accurate common names. Sometimes even the scientific names are in dispute or contradictory, so it’s no big surprise the more colloquial names are too. After all, who but ichthyologists, some fishermen and a few fishmongers actually know these names accurately and/or could, just by looking at a fish, decide what to call it?

So this is probably the toughest area to compose an accurate Iberian Spanish to English translation list. I’m going to have a third post in this series about the names I conclude are fairly likely but for now here’s a subset of the list from the mystery post that Google failed to translate at all.

alfonsino Golden eye perch
badexo Lythe or pollack
boga bogue
brama bream Pomfret
brotola de roca Greater forkbeard
calion Shark, porbeagle
callas Callas
capelan capelin
chicharro scad – also called horse mackerel
chincharro Horse mackerel or scad
choupa Black bream or porgy or seabream
chucla picarel
cigala crawfish Norway lobster – also called Dublin Bay prawn
colin Coley or saithe
côngrio conger eel conger eel – also called conger
coregono whitefish
escolano smelt – also called sparling
espadilla frostfish – also called silver scabbardfish
espadín sprat sprat – also called brisling
espárido sea bream
illiseria megrim
lanzon sandeel – also called sand lance
limanda dab
longeirón razor clam – also called razor shell
lucioperca pike-perch
lumpo lumpfish Lumpfish
maganto Dublin Bay prawn or langoustine or scampi
mendo Witch or Torbay sole
merlan whiting
mollera poor cod
muergo razor clam – also called razor shell
musola smooth hound – also called dogfish, flake, huss, rigg
pardete Grey mullet
pejerrey silver side, sand smelt argentine – also called silver smelt
pejesapo angler fish Anglerfish or monkfish
perlón Grey gurnard
pescadillo Hake
plegonero whiting
quisquilla shrimp prawn – also called shrimp
salton sandeel – also called sand lance
salvelino char
solla plaice

The left column is the Spanish (with at least one spelling error; I don’t know which of chicharro or chincharro is actually correct). The middle column is the translation for the few names that the Oxford dictionary recognizes. And the third column is from one of these two sources (here and here) which I originally used to compile the list (I found a third list with scientific (Latin) names but didn’t originally use it and haven’t (yet) processed it). I’m a bit surprised Google missed the names that are in Oxford as I’ve encountered some of these in other places.

Now note that even with some of the Spanish names “translated” there are bunches of fish on this list I don’t recognize and I suspect few people would. So probably only a small subset of this list (the names Google didn’t recognize, not the full list) would ever appear on menus.

The two longer lists, with scientific names, seemed to potentially be the most accurate lists but I’ve found others at some other websites. The trouble with these is the names may not relate to Spain and may be from other Spanish speaking areas. This is a very common problem trying to find and merge and consolidate lists from the Net. In addition what is the level of authority of anyone who provides a list – rarely is that known and I see enough mistakes in almost any list to shed some doubt on the accuracy of the information. But all that said I’ll be trying, in the next post, to produce the largest and most accurate list from the raw material I can find.

So stay tuned for the final result.

Mystery post

This is some work in progress. Guess based on any terms you recognize what the work may be.

abadejo, abadejo de Alaska, aguja, aguja azul, aguja azul del Indo Pac, aguja blanca, aguja negra, alacha, alfonsino, almeja, anguila, arenque, bacaladilla, badexo, barrilete, berberecho, bermejuela, bígaro, bocina, boga, bogavante, boquerón, brama, brema común, brosmio, brótola de fango, brotola de roca, caballa, cabezuda, cabracho, calandino, calion, callas, camarón, camaron tigre, cangrejo, cangrejo de rio, capelan, caramel, carbonero, carpa, centolla, chancha, chicharro, chincharro, choupa, chucla, cicloptero, cigala, colin, colin de Alaska, côngrio, coregono, croque, eglefino, emperador, eperlano, escolano, espadilla, espadín, espárido, estornino, esturión, esturion, esturion estrellado, falsa limanda, falsó lenguado, faneca, faneca noruega, fletán, fletan del Pacifico, gallineta, gallineta nórdica, gallo, gallo de San Pedro, galludo, gata, gato, golleta, granedero, hipogloso negro, husio, illiseria, jibia, jurel, lampuga, langosta, lanzon, lengua lisa, lenguado, libre de mar, limanda, limanda nórdica, lisa, listado, lobo, longeirón, lubina, lubricante, lucio, lucioperca, lumpo, maganto, maruca, maruca azul, mejillón, mendo, mendo limón, merlan, merluza, merluza de cola azul, merluzzo Francese, mero, mielga, mollera, muergo, mújol, musola, nécora, ostión, ostra, palero, pardete, pargos, pejerrey, pejesapo, perca, perlón, perro del norte, pescadillo, pez de plata, pez de San Pedro, pez espada, pez sable negro, pintarroja, platija, platija americana, plegonero, quimera, quisquilla, rabil, rascacio, raya, reloj anaranjado, reloj del Atlántico, rémol, rodaballo, rubio, salmonete, salmonete de roca, salton, salvelino, sapo, sierra, sierra del sur, solla, tenca, tiburón, tota, trucha, trucha arco iris, trucha arcoiris, vieira, volador, volandeira

Clue: Latin matters in figuring all this out.

 

 

Verbs again

In my previous post (about finishing initial processing of GallinaBlanca dictionary) I mentioned that verbs can be of some use in interpreting menus, possibly through derivatives of the infinitive form of the verb. So I’ve continued to do some digging in this area and have a few results to share.

Anticipating I’d be looking at verbs, independently of extracting them from the GB dictionary I used about nine online “lists” to compile an aggregate list. These verbs: a) may have nothing to do with cooking or cuisine, b) tend to be more commonly used verbs, and, c) may not be used (at all, or in same way) in Spain. So this is the list I’m calling C.

In the process of other searches I stumbled onto a culinary glossary. It has no connection with Spain and therefore the Spanish words might come from any part of the world. And as I worked with it more extensively and carefully I observed many of the issues with online resources of unknown origin: a) misspellings (probably; I don’t want to jump to conclusions just because words seem to be misspelled), b) duplications, often including the singular and plural form, c) words that make no sense appearing in a Spanish culinary dictionary (how did these drift in?), d) inconsistent formatting and thus order (e.g. A la cazuela vs Cazadora, A la). In a previous iteration of my project I created a “glossary” by merging information from many sources and eventually it became a pisto (hodgepodge, if I can use that word in a non-culinary sense), especially losing any notion of whether the words applied to Spain or some other Spanish-speaking area. So with these caveats I’ll call this list G.

And I have my list of verbs from the GallinaBlanca dictionary which I previously described. I’ll call this list D.

Now, simply, it’s too much work to compare the entirety of all three of these lists so I just did the subset (verbs only, of course) of verbs starting with A B or C. While this may be a biased sample it still reveals some interesting information.

After sorting the three lists together (with different fonts and colors for each list so I could distinguish them) I did manual processing to consolidate like terms. As a result I ended up coding each entry with GDC (with a - in any position where the verb is not in that list). So I generated the following table:

G-- 44
-D- 4
--C 35
GD- 28
-DC 1
G-C 9
GDC 5

There are 126 verbs that appear in at least one of these lists. Only 5 verbs appear in all three lists. The list with the largest number of unique verbs is G (glossary, 44), which indicates it is potentially very useful as it adds over 50% more verbs than the 82 I had found in the other two lists. The verbs in the C (common) list may have nothing to do with cooking or food (we’ll explore that later in the post) so this may not add much. Only 5 verbs from the GallinaBlanca list don’t appear in the glossary list so whoever compiled that got most of the cooking verbs.
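The coding and counting described above is easy to mechanize once each list is a set of verbs. The following is only a minimal sketch: the function names and the placeholder verbs (`verbo1` and so on) are hypothetical and are not the real lists. The second half just re-checks the arithmetic of the reported counts.

```python
from collections import Counter

def membership_code(verb, glossary, dictionary, common):
    """Return a code such as 'GD-' showing which of the three lists contain verb."""
    return (
        ("G" if verb in glossary else "-")
        + ("D" if verb in dictionary else "-")
        + ("C" if verb in common else "-")
    )

def tally_codes(glossary, dictionary, common):
    """Count how many verbs fall under each membership code."""
    all_verbs = set(glossary) | set(dictionary) | set(common)
    return Counter(membership_code(v, glossary, dictionary, common) for v in all_verbs)

# Toy sets with placeholder verbs (NOT the real lists), just to exercise the code.
toy_glossary = {"verbo1", "verbo2", "verbo3"}
toy_dictionary = {"verbo1", "verbo2", "verbo4"}
toy_common = {"verbo1", "verbo5"}
toy_counts = tally_codes(toy_glossary, toy_dictionary, toy_common)

# The counts reported in the post, re-checked arithmetically.
reported = {"G--": 44, "-D-": 4, "--C": 35, "GD-": 28, "-DC": 1, "G-C": 9, "GDC": 5}
total = sum(reported.values())                      # verbs in at least one list
known_before = total - reported["G--"]              # verbs found in D or C
added_by_glossary = reported["G--"] / known_before  # extra share contributed by G alone
print(total, known_before, round(added_by_glossary, 2))
```

With real data the three sets would simply be the A, B and C verbs of each list after the manual consolidation of like terms.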

So looking at the verbs that are only in the C (common) list and not in either cooking related list we do see a few surprising omissions (I’m assuming that these are SO common no one bothers to include them):

abrir --C to open; to turn on; to whet (as in appetite)
agregar --C to add
añadir --C to add
beber --C to drink
calentar --C to heat, heat up, warm up; to inflame
cocinar --C to cook
combinar --C to combine, mix; to put together, match, coordinate
comer --C to eat; to have for lunch; [Latin America] to have for dinner
concinar --C not in any dictionary, probably misspelling of cocinar
convertir --C to turn into, convert into, change into, make
cortar --C to cut, cut off, carve, slice, cut out; to chop; to cut (dilute sense); …

So out of the 35 verbs that appear only in the C (common) list, I’d probably include just these 11 in a general purpose culinary list.

Now some of the verbs in the G (glossary) list don’t appear to be useful. Some have no definition in any of the dictionaries I routinely use, including the most authoritative dictionary of the Spanish language (which is NOT limited to Spain so could include verbs that don’t get used in Spain). So here are a few I’d consider dubious to include in a culinary glossary:

achicalar G-- [Mexico] to cover in honey; soak in honey
añejar G-- to age; [vino] to mature; to get stale
apanar G-- to coat in breadcrumbs (also EMPANAR or EMPANIZAR)
apuntillar G-- to finish off (a toro); to round off
ataviar G-- to dress up
bardar G-- to thatch
blanchir G-- (not in dict) Wiktionary has it as a French term for make white
bresear G-- (from glossary) To cook to slow fire, during long time, with condiments (generally vegetables, wine, broth and spices). Clearly a spelling error since not found.
cantar G-- to sing; to crow, chirp
caramerizar G-- (not in dict), another spelling? [from glossary] Spread a mold with sugar honey.
castigar G-- to punish; to ground, keep in; to damage, harm
cerner G-- to sift, sieve (same as cernir, which is it?)
chapurrar G-- to speak badly

I wouldn’t include achicalar as it doesn’t appear to be used in Spain, but this raises a good point about my goal here. If I wanted to know the Spanish word, used in Spain, for an English word, I wouldn’t include anything that may only be used outside Spain. But my goal is asymmetric: to translate Spanish (on menus) into English only (so I can choose), so including a word in my corpus (and eventually my app) that is not likely to be used in Spain is not a problem (I do need metadata to note this for that term, however). If I never see the term, it does no harm that it is never found in any lookup. OTOH, it would be a problem if I were trying to translate English into Spanish, as in don’t use a word not found in Spain. It appears, for instance, that frijoles, which is well known to most people in the USA who visit Mexican restaurants, is one such word: not commonly used in Spain, although quite possibly a Spaniard would know the word. That might lead to a scene (from The Way) like no tapas in Navarra, only pintxos, and thus make you look foolish.

blanchir (to make white, which isn’t exactly synonymous with blanch but one might assume that’s what this means) was interesting in that it did not occur in any dictionary but did have an entry in Wiktionary. The standard term for blanch is palidecer (purely in the sense of turn white) and escaldar or blanquear for the culinary sense. I suspect blanchir might be used somewhere (possibly Puerto Rico) where it is just the cognate of the English verb. But, again, in collecting the corpus I should not make judgments like this, although I might add metatext to a blanchir entry, add it to the corpus anyway, and then let the “big data” statistical analysis decide if this is a word or not.

bresear really looks like a misspelling (more likely to be brasear, to barbecue) but again it should go into the corpus with a metadata notation rather than my passing judgment on it. IOW, only a real expert in Spanish should be deciding what to include or not in any translation dictionary, so if I find only one instance of a misspelled word it will get washed out since there are few occurrences of it in the corpus; OTOH, maybe people do commonly misspell this word so it needs to be in my app. caramerizar appears to be some variant of caramelizar, again perhaps used somewhere and not just a mistake. cerner has exactly the same definition (in the glossary itself, but also in spanishdict) as the more common spelling cernir, although both appear in the reverse lookup of ‘to sift’ in spanishdict (which is it, then? just a common confusion?). cernido is a possible term to see on a menu so it matters that my dictionary could spot this as the past participle of cerner.
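One way to attach that kind of metadata without passing judgment is to record, for each unknown word, the known headwords it most resembles. This is only a rough sketch using Python’s standard `difflib`; the `KNOWN` list is a tiny stand-in for a real dictionary headword list, and the 0.8 cutoff is an arbitrary assumption that would need tuning.

```python
import difflib

# Stand-in for a real headword list (assumption: taken from the verbs discussed above).
KNOWN = ["brasear", "caramelizar", "cernir", "blanquear", "escaldar"]

def suggest_spellings(word, known_words=KNOWN, cutoff=0.8):
    """Return up to three known words that closely resemble word, best match first."""
    return difflib.get_close_matches(word, known_words, n=3, cutoff=cutoff)

for suspect in ["bresear", "caramerizar", "cerner"]:
    # A suggestion is only a hint to store as metadata, not proof of a misspelling.
    print(suspect, "->", suggest_spellings(suspect))
```

A suggestion here says nothing about whether the unknown word is a mistake, a regional variant, or a legitimate alternate spelling; it just keeps the two entries linked in the corpus.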

So again all this goes to show the work that must be done to really develop a very accurate dictionary that drives my app for menu translation (or to be published as a carefully researched culinary glossary).

 

 

 

How to use collected menus

I use this blog to document a project I’m doing, which is to obtain an accurate and comprehensive set of terms (isolated words and phrases) to feed a smartphone app so I can “read” menus in Spain. To do this I am first collecting menus on my virtual “trek” (translating miles on a treadmill to a position on the Camino de Santiago), using Google Maps’ POIs to find restaurants, and then processing those that have websites with some form of menu I can just extract (I don’t want to be typing from images and making all those mistakes).

Most of the menus are in Spanish (rarely I can find one that is dual language, and even then: a) their translation may not be so great, and, b) the English menu may not be the same, so this can be tricky). So I use either Google Translate (if the menu is a standard HTML webpage) or some tedious copy-and-paste into spanishdict.com (really Microsoft) to translate. Of course these machine translations are often not that great (they both get terms wrong and miss many terms) and that is a big issue.

Doing this process is mechanical and fairly tedious but doing it slowly also gives me a chance to really observe what is going on (plus get a bit of drill on words; my short-term memory of some Spanish terms is increasing, but based on past projects I know I’ll retain little of that). And, as I’ve documented in some posts, occasionally menu items completely befuddle the machine translation, which sends me off trying to figure it out myself, an interesting challenge since I have next to zero fluency in Spanish.

Now it is important to note my goal. Learning to speak and hear Spanish is entirely different, especially if you want to have conversations about almost anything (even if still oriented toward travel). I just need to be able to read menus (at least for my limited goal) and choose what I want. And I don’t need to translate in the other direction, so knowing whether ‘mushroom’ is hongo or seta doesn’t matter as much as it would going the other way.

And, of course, this also does imply knowing something about cuisine in Spain (which can be quite different than what we might encounter in restaurants in USA that happen to use Spanish on their menus). And it is turning out to require knowing something about agriculture in general in Spain, especially in different regions. An ingredient, like chorizo is: a) quite different than the Mexican style chorizo I’d find in markets or restaurants here, and, b) somewhat different in different regions in Spain as each has its own traditional way of making something like chorizo.

So after extracting menus from websites with some sort of translation I end up with side-by-side menu items, like below:

| Spanish | English (machine translation) |
|---|---|
| Gambas a la Plancha | Prawns on the Plate |
| Setas a la Plancha | Grilled mushrooms |
| Espárragos Especiales “Dos Salsas” | Special Asparagus “Two Sauces” |
| Ensalada Templada con Gulas y Rape | Tempered Salad with Gulas and Rape |
| Cogollitos de Tudela con Anchoas y Salmón | Tudela with anchovies and salmon |
| Tabla de Ibéricos | Iberian Table |

I chose these particular items to make a couple of points:

  1. Notice that a la plancha occurs in two consecutive entries and, given gambas are prawns and setas are mushrooms, that means there are two different ways to both parse and assign a tentative meaning to a la plancha: either ‘grilled’ or ‘on the plate’ (more literal). So what does it really mean? The answer, btw, is that plancha is really “iron”, which means a cooking device, either a pan or the typical restaurant flattop, is used to “grill” the item.
  2. In the fourth item gulas appears (and didn’t get translated) and rape is quite ambiguous (is it the English word and therefore shouldn’t be translated or is it a Spanish word that means something entirely different?). gulas are baby eels (or possibly synthetic “worms”, like the fake crab) and rape is a type of fish with more than one translation (monkfish, anglerfish). So how can I use information like this?
  3. Cogollitos de Tudela got translated just to Tudela (the other words in this item are easy to match between the Spanish and English). This is actually a flaw (I believe) in Google’s translation process. Cogollitos, when looked up, gives “A small heart or flower of garden plant” (or sometimes, just ‘buds’) and Tudela doesn’t appear in any dictionary but turns out to be a town (really just a reference location) where a particular type of lettuce (looks like Romaine) is grown, and when served at a restaurant the inner leaves are used (often in a very attractive presentation). So this is a fairly classic ingredient and dish, especially in northeastern Spain, but translation isn’t going to help much. So, a) how certain am I that I’ve figured this out correctly (or even how would I put some certainty on it, like how many different sources I found that confirm my guess at what this is, versus any counter-evidence), and, b) how should I use this information in my corpus?
  4. And what is “Iberian Table”? (a valid literal translation but not helpful). Now doing even a little research on menus one quickly learns that Ibéricos almost certainly refers to a prized pig but how is it connected to Tabla? Sometimes one has to be careful here as I’ve already found an instance where silla (literally ‘chair’, but in the context, really ‘saddle’) refers to a cut of meat so maybe the same is true with tabla? IOW, there is quite a lot of uncertainty here BUT this could be an important item to know. I suspect, BTW, it’s just a plate with some ham or other cured pork, like an antipasto.
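Items like gulas, rape and Tudela share one mechanical signature: the same word shows up on both the Spanish and the English side of a pair. A small sketch of flagging those words for manual follow-up is below. The function names are hypothetical, and the check is crude: a word that is legitimately spelled the same in both languages would also be flagged.

```python
import re

def words(text):
    """Lowercased alphabetic words in text (accented letters are kept)."""
    return set(re.findall(r"[^\W\d_]+", text.lower()))

def untranslated_words(spanish, english):
    """Words that appear unchanged on both sides of a Spanish/English pair."""
    return words(spanish) & words(english)

pairs = [
    ("Setas a la Plancha", "Grilled mushrooms"),
    ("Ensalada Templada con Gulas y Rape", "Tempered Salad with Gulas and Rape"),
    ("Cogollitos de Tudela con Anchoas y Salmón", "Tudela with anchovies and salmon"),
]
for spanish, english in pairs:
    # Anything printed in the list is a candidate for research rather than translation.
    print(spanish, "->", sorted(untranslated_words(spanish, english)))
```

Note that Salmón and salmon are not flagged because the accent makes them different strings, which is the desired behavior here since that word really was translated.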

So there are several steps in studying menus:

  1. the mechanical part of getting the Spanish aligned with some sort of translation to English
  2. studying the results for what appears to be clear one-to-one correspondence in terms. But beware – on this single menu both hongos and setas translate to mushrooms. Why are there two different words? (Previously hongos had shown up as primarily used in Latin America, not Spain, but obviously this menu contradicts that.) And if there is a difference (i.e. they’re not just synonyms) what is it? I have vague evidence hongos refers to cultivated button mushrooms and setas to wild mushrooms (like shiitake or others). That is a big difference.
  3. Some items translate very little, so can I find other sources to determine what these items might be? (sometimes yes, sometimes no) And even if I figure out what a word (e.g. Cameros from yesterday’s post) or phrase (a la riojana from yesterday’s post) is, these are not literal translations so how do I mark these? For instance I believe Cameros refers to the mountains in southern Rioja and therefore potentially a breed (or just the husbandry) of sheep that would be recognized as distinctive (like Wagyu beef). If I figure this out: a) what confidence do I put on this information, and, b) how do I encode this information in my corpus?

Once a corpus is obtained the assumption is that a kind of “big data” analysis can help figure all this out (I haven’t quite figured out what code I’ll write for this; Google claims complex deep-learning AI as their method of training their translation and I don’t have the resources for that approach). But my assumption is that everything in my corpus will have multiple entries, and some a lot of entries. So, in conjunction with my placing some sort of “certainty” weight on each pair and matching up pairs across a large data space, some sort of overall certainty can be derived (probably with a lot of exceptions that have to be looked at by human evaluation, which Google says they never do, which also might explain some of their odd translations).
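To make the idea of combining weighted pairs a bit more concrete, here is one very simple possibility, and only a sketch: sum the weights for each (Spanish, English) pair and normalize per Spanish term. The function name is hypothetical, the weights in the sample are made-up placeholders rather than measured values, and real analysis would surely need something more refined.

```python
from collections import defaultdict

def aggregate_pairs(observations):
    """Combine weighted (spanish, english, weight) observations.

    Returns {spanish: {english: share}} where the shares for each Spanish
    term sum to 1. Weights are assumed to be positive.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for spanish, english, weight in observations:
        totals[spanish.lower()][english.lower()] += weight
    shares = {}
    for spanish, by_english in totals.items():
        total = sum(by_english.values())
        shares[spanish] = {eng: w / total for eng, w in by_english.items()}
    return shares

# Pairs as they might come from several menus; the weights are placeholders.
observations = [
    ("a la Plancha", "grilled", 0.75),
    ("a la Plancha", "on the plate", 0.25),
    ("a la Plancha", "grilled", 1.0),
    ("Setas", "mushrooms", 1.0),
    ("Hongos", "mushrooms", 0.5),
]
shares = aggregate_pairs(observations)
print(shares["a la plancha"])
```

A term whose shares stay split after many menus (like a la plancha here) is exactly the kind of exception that would be queued for a human look.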

So, just to finish this let me provide an example. From this single menu I extracted (manually, can’t quite imagine how to do this in code) the following table of “pairs” where I’m relatively certain these are correct. IOW, these are mostly just the terms derived via literal translation not the more complicated cases where a lot of guessing is required.

Note: more discussion after this table, please scroll down.

| Spanish | English | Spanish | English |
|---|---|---|---|
| a la Plancha | Grilled; on the Plate | Lechal | Baby lamb |
| a la Vinagreta | Vinaigrette | Lenguado | Sole |
| Agua | Water | Limón | Lemon |
| al Horno | Baked | Macarrones | Macaroni |
| Albóndigas | Meatballs | Menestra | Stew |
| Anchoas | Anchovies | Merluza | Hake |
| Arándanos | Blueberries | Milhojas | Fillets |
| Arroz | Rice | Mixta | Mixed |
| Asado | Roasted | Oveja | Sheep |
| Bacalao | Cod | Pan | Bread |
| Bebida | Drink | Patatas | Potatoes |
| Berenjena | Eggplant | Pato | Duck |
| Bistec de Ternera | Beef Steak | Pescados | Fish |
| Calabacín | Zucchini | Pimienta | Pepper |
| Calamares | Squid | Pimientos | Peppers |
| Carne | Meat | Postres | Desserts |
| Carrilleras | Cheek pieces | Precio | Price |
| Cerveza | Beer | Primeros Platos | First courses |
| Codillo | Knuckle | Puerros | Leek |
| compartir | share | Pulpo | Octopus |
| Cordero | Lamb | Queso | Cheese |
| Croqueta | Croquettes | Rape | Anglerfish |
| de la Abuela | Grandma’s | Rebozado | Coated |
| de la Casa | of the House | Refresco | Soda |
| elegir | choose | Rellenos | Stuffed |
| en su Tinta | in ink | reservas | reservations |
| Ensalada | Salad | Revuelto | Scrambled |
| Entrantes | Starters | Rojo | Red |
| Espárragos | Asparagus | Sabores | Flavors |
| Fresco | Fresh | Salsa | Sauce |
| Frutas | Fruit | Setas | Mushrooms |
| Gambas | Prawns | sobre | on |
| Gaseosa | Soda | Solomillo | Sirloin |
| Guisado | Stew; Stewed | Tarta de Queso | Cheesecake |
| Helado | Ice cream | Tomate | Tomato |
| Hongos | Mushroom | Trucha | Trout |
| Huevo | Egg | Verduras | Vegetable |
| Incluye | Includes | Vino | Wine |
| Jamón | Ham | Yogurt Griego | Greek Yogurt |
| Judías Verdes | Green Beans | | |

So a single menu provided a significant (about 80 items) source of raw material to feed into my corpus. Now I’ll just note a few things as to whether further processing should be applied to this list before adding it to a corpus (or, IOW, what metadata should also be embedded in the corpus).

  1. Judías Verdes ‘green beans’: Should there be an entry verdes as ‘green’ and judías as ‘beans’? Now in Spanish adjectives match their noun in both number and gender so verdes might not be the lookup dictionary form for ‘green’ (it’s not, the singular verde is). So that could introduce some confusion in the corpus. And ‘bean’ has multiple translations, with often one word being used for the dried beans (or the seeds in the bean pod) versus the whole bean, as in typical green beans.
  2. What about Guisado? This had two literal translations from Google: ‘stew’ and ‘stewed’. And in English those are not the same thing even though they’re related. guisado is the past participle of the verb guisar which can mean either just simple ‘cook’ or also ‘stew’. The contexts in this menu for the two uses of guisado are “Cordero Guisado” and “Cordero Guisado con Pimientos”, so why is Google convinced it’s ‘stew’ (the noun) and ‘stewed’ (the participle) in these two contexts? Is it right?
  3. Another thing I noticed is that often the English translation doesn’t match the Spanish in number. Figuring out plural and singular forms in a corpus analysis process could be interesting, so putting in an incorrect corresponding pair could be problematic.
  4. And, finally (for today) nouns probably fit into a literal translation mode easier than other parts of speech, or especially colloquial usage, so trucha as trout is fairly high certainty but what about mixta as ‘mixed’? It was used in the context of ensalada (salad) and that item appears to be a typical mixed salad (often “house” salad in US restaurants) but the literal translation of ‘mixed’ would be more likely  variado or diverso; mixta doesn’t occur in lookup dictionary at all, but mixto does in the sense of mixed of both sexes (i.e. a group of people), so why did the salad menu items decide to use feminine form or even mixto at all?
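The number and gender issues in the list above suggest that a lookup should try a few candidate dictionary forms before giving up on a word. What follows is a deliberately naive sketch, not real Spanish morphology: it only strips a plural -es/-s and swaps a final -a for -o, and the `HEADWORDS` set is a tiny hypothetical stand-in for a real dictionary.

```python
def candidate_lemmas(word):
    """Generate possible dictionary forms of word, most literal first."""
    word = word.lower()
    candidates = [word]
    if word.endswith("es"):
        candidates.append(word[:-2])      # e.g. a plural formed with -es
    if word.endswith("s"):
        candidates.append(word[:-1])      # e.g. verdes -> verde
    for form in list(candidates):
        if form.endswith("a"):
            candidates.append(form[:-1] + "o")  # feminine -> masculine guess
    return list(dict.fromkeys(candidates))      # drop duplicates, keep order

def lookup_form(word, headwords):
    """Return the first candidate that is a known headword, or None."""
    for candidate in candidate_lemmas(word):
        if candidate in headwords:
            return candidate
    return None

HEADWORDS = {"verde", "mixto", "pimiento"}  # hypothetical stand-in dictionary
for menu_word in ["Verdes", "Mixta", "Pimientos", "Trucha"]:
    print(menu_word, "->", lookup_form(menu_word, HEADWORDS))
```

Because wrong guesses (like “verd”) are only accepted if they happen to be headwords, the dictionary itself acts as the filter; the surface form seen on the menu should still be stored alongside whatever lookup form is found.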

So there are lots of challenges: extracting the raw data itself, assigning some metadata to the pairs to qualify how they should be treated in the corpus, and especially assigning some certainty value (i.e. like a probability, where 1.0 would probably never occur (there is always some ambiguity) and 0.0 is meaningless to even include). BUT maybe a single scalar value is insufficient since it’s possible to have highly incompatible, if not even mutually exclusive, interpretations.
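One possible shape for a corpus entry that allows several interpretations, each with its own certainty and notes, is sketched below. Everything here is an assumption for illustration: the class and field names are hypothetical and the certainty numbers for guisado are placeholders, not measured values.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Interpretation:
    english: str
    certainty: float            # strictly between 0 and 1, per the reasoning above
    kind: str = "translation"   # or "description" for items with no English equivalent
    notes: str = ""             # e.g. region, part of speech, source of the guess

    def __post_init__(self):
        if not 0.0 < self.certainty < 1.0:
            raise ValueError("certainty must be strictly between 0 and 1")

@dataclass
class CorpusEntry:
    spanish: str
    interpretations: List[Interpretation] = field(default_factory=list)

    def ranked(self):
        """Interpretations ordered from most to least certain."""
        return sorted(self.interpretations, key=lambda i: i.certainty, reverse=True)

# Placeholder certainties, only to show the structure.
guisado = CorpusEntry("guisado", [
    Interpretation("stew", 0.4, notes="noun"),
    Interpretation("stewed", 0.5, notes="past participle of guisar"),
])
print([i.english for i in guisado.ranked()])
```

Keeping a list of interpretations rather than one scalar means that conflicting readings can coexist in the corpus until there is enough evidence to prefer one.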

So all of that is a lot of design work to do and then probably an iterative process once I get some code that can crunch the corpus (thus far, I’ve done some by hand to look for design issues). And, fundamentally, is this even a process I can automate at all, or at most will the code just bring together related pairs for me to analyze with my own intelligence?

Who knows, time will tell.

p.s. [personal]. Doing this mechanical work (and some background study as I go along) and also writing these posts is definitely cramming some Spanish into my brain, but I also know that’s a short-term effect. A year from now I’m not going to remember guisado is the past participle of guisar or that it is related to stews/stewing (as a cooking process). So converting this work into: a) a more permanent and usable form (like a smartphone app to carry with me to Spain), and/or, b) some drill programs so I could “brush up” just before leaving, would have a more useful effect.

 

Finished the GallinaBlanca Diccionario

I’ll explain what “finished” means in a minute but first I am almost at another milestone in my journey: 1/2 mile outside Nájera, about 20 miles from Logroño and about 60 miles from Burgos, on my virtual camino trek. That is, since I’m stuck here in the cold midwest USA I do miles on my treadmill in the basement (training for the Camino, I wish!!!) and translate those boring miles onto a GPS track of the Camino de Santiago and then, most of the time, do a little “walking” courtesy of Google StreetView (the Camino is hardly a wilderness trail if a Google car is driving on it).

So what does it mean when I say I finished the GB dictionary? Well, it means the tedious part is over. Their dictionary is provided via Javascript popups, with one page for each letter of the alphabet, and thus: a) there is no way to easily grab all the terms out of the HTML, and, b) Google Translate doesn’t operate on the popups. So I have to manually click each term, use the mouse to get the text of its definition in Spanish, paste that into my MSWord document and into the spanishdict.com webpage, get the translation (which, it turns out, seems to actually be provided by Microsoft; I tried the translation built into MSWord itself and it was pretty ragged), mouse that translation and then paste it into the side-by-side table. Then I take the term and attempt to get a simple literal translation (more pasting, possibly into three different webpages).

Needless to say this is big-time tedious (and slow) and that’s what I’ve finished. It may be tedious but going slowly through the list means I take the time to study each result. Often even from the English translation of the definition of the term I really don’t know what the English word would be, which makes that lookup sometimes a surprise. Since this is a specialized vocabulary for cooking many of the terms are more obscure and thus missing in dictionary lookups so it’s off to doing searching and guessing and trial-and-error until I get a reasonable answer. Lots of work but a good learning experience.

So now I have that “done” (probably a few mistakes I’ll have to clean up). So I have pages of stuff like this:

HERVIR (literally boil) Cocer en líquido a una temperatura de 100º. Cook in liquid at a temperature of 100 º.
HORNEAR (literally bake) Cocer en el horno mediante calor seco. Cook in the oven with dry heat.
HUMEAR (literally smoke or steam, and one sense is exactly this definition? ahumar is the culinary verb) Se dice cuando el aceite desprende humo, indicando que está caliente, a punto. It is said when the oil emits smoke, indicating that it is hot, ready.
INCORPORAR (literally incorporate, add, include and mix in) Agregar, unir algo a otra cosa para que haga un todo con ella. Add, join something else to do a whole thing with it.
INSTILAR (literally instill) Echar poco a poco, gota a gota, un líquido en otra cosa. Slowly pouring, drop by drop, a liquid into something else.
LAMINAR (literally laminate) Cortar en láminas muy finas. Cut into very thin slices.

So what am I going to do with this now?

I deliberately picked a chunk of the dictionary that is all verbs because that’s my first attempt to create something derived from this list. There are a lot of verbs in this dictionary because it accompanies recetas (recipes) and these verbs (in some conjugated form) probably occur in the collection of all those recetas. So GallinaBlanca is nicely helping cooks read recetas that might contain a verb they don’t know. There are some fairly obscure verbs in the list.

Now what has this got to do with reading menus, which is the focus of my project? Rarely are the menus (at least the list of items you can order) going to have complete sentences explaining the food (perhaps a brief description, just a phrase). So verbs don’t much matter.

Or do they? A word you will frequently see on menus (even in the names of restaurants) is asado. This is grilled or roasted (as an adjective perhaps modifying some noun) or even just a noun in its own right, grill or roast. But this word has its root in a verb, that is asar (in the infinitive form, i.e. the typical word to look up in a dictionary). (Note: Online dictionaries are often smart enough to handle conjugated forms but typical non-interactive dictionaries (paper or smartphone) require you to see that this is a conjugation of a verb and deduce the infinitive form to do the lookup, which is not easy if you’re unfamiliar with Spanish.) asado is the past participle of asar and, as Spanish verbs are far more regular (with some exceptions) than English ones, there is almost an algorithmic rule to form the past participle from the infinitive verb (like ‘to bake’ and ‘baked’ as a regular case in English). So in a quick extract from my list here are a couple more examples: hervir (to boil) hervido (boiled), estofar (to stew) estofado (stewed), picar (to mince or chop) picado (minced).
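The regular rule is simple enough to sketch in a few lines: -ar verbs take -ado, and -er and -ir verbs take -ido. The reverse direction (from a word on a menu back to an infinitive to look up) is the more useful one for an app. The function names below are hypothetical, and irregular participles (frito, hecho and the like) and accented forms are deliberately not handled.

```python
def past_participle(infinitive):
    """Regular past participle: asar -> asado, hervir -> hervido."""
    stem, ending = infinitive[:-2], infinitive[-2:]
    if ending == "ar":
        return stem + "ado"
    if ending in ("er", "ir"):
        return stem + "ido"
    raise ValueError(f"{infinitive!r} does not look like an infinitive")

def candidate_infinitives(word):
    """Guess infinitives for a regular participle seen on a menu.

    Handles the -o/-a/-os/-as agreement endings. For -ido forms both -er
    and -ir are returned, since the participle alone cannot tell them apart.
    """
    word = word.lower()
    if word.endswith("s"):
        word = word[:-1]                 # rebozados -> rebozado
    stem, ending = word[:-3], word[-3:]
    if ending in ("ado", "ada"):
        return [stem + "ar"]
    if ending in ("ido", "ida"):
        return [stem + "er", stem + "ir"]
    return []                            # not a regular participle

for verb in ["asar", "hervir", "estofar", "picar"]:
    print(verb, "->", past_participle(verb))
for menu_word in ["Guisado", "Hervido", "Rebozadas", "Trucha"]:
    print(menu_word, "->", candidate_infinitives(menu_word))
```

The candidates would then be checked against the verb list, so a wrong guess like “herver” is simply discarded when it is not found.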

So knowing some cooking verbs could come in handy. Memorizing them all is probably a waste of time but as I intend to collect everything I’ll need this in my smart app that is going to translate menus (having all the conjugations is then easy as well).

But I don’t like to depend on a single source for literal translation (each verb to its most direct English equivalent). Plus some verbs have a ton of different meanings and they are not always labeled as being the culinary sense in every dictionary. And some verbs don’t have much connection, given GallinaBlanca’s definition to the standard (at least online) dictionary definitions. For instance, this tough one to figure out:

ALBARDAR (literally: to saddle, put a  packsaddle on)  Envolver piezas de carne con lonchas finas de tocino, para evitar que se sequen al cocinarlas. Wrap pieces of meat with thin slices of bacon to avoid drying when cooking.

I suppose one might deduce that wrapping meat with bacon is “saddling” it, but really the clue comes from this:

Saddle is a butchery term that refers to the meat that is at the animal’s back and hips. Think of it in terms of the meat that would be in more or less the same place as a saddle on a horse.

I’ve done a fair amount of cooking (and reading cookbooks) and ‘saddle’ as a cut of meat never registered. Or what about this one:

CINCELAR (literally chisel, carve, engrave) Hacer incisiones en una pieza (se utiliza sobre todo para pescados) para facilitar su proceso de cocción, generalmente en los asados. Make incisions in one piece (mainly used for fish) to facilitate their cooking process, usually in roasts.

I’ve done exactly this cooking fish (and more so bread) but I don’t think I’d use any of those literal English verb equivalents to describe the process.

So there is a lot to learn from these verbs. And as I said I don’t like single sources so I sometimes use a page here in this blog (test data) to paste some Spanish in, view that page, and then fire up Google Translate (maybe there is some simpler way but this works without too much hassle).

Now, from what I’ve read about Google Translate, context matters. So a pure list of verbs, especially in infinitive form, eliminates any possibility of a contextual AI-ish translation and thus leaves just a simple literal translation. For verbs with many meanings there is nothing to clue Google about which one to use.

So it was interesting to see how Google did on this translation. I found a total of 132 verbs in GallinaBlanca dictionary. Of these the following 44 had no Google translation:

ABARQUILLAR, ACARAMELAR, ACHICHARRAR, ACIDELAR, ALBARDAR, ALIÑAR, ALMIBARAR, ANISAR, ASAR, ATAR, BATIR, BRASEAR, CASCAR, CATAR, CHAFAR, CONFITAR, DECORAR, DESBABAR, DESBARDAR, DESCAMAR, DESLEÍR, DESMIGRAR, EMBORRACHAR, EMBRIDAR, ENHARINAR, ESCABECHAR, ESCALFAR, ESCAMAR, ESPECIAR, GUISAR, LAMINAR, LEVAR, MAJAR, MOREAR, NAPAR, PICAR, PINCHAR, POCHEAR, REBOZAR, REHOGAR, REMOVER, ROSTIR, TRUFAR, VOILER

Now Google can be forgiven (except it claims its AI does better than rule-based literal translation) for the verbs in RED since none of my dictionaries know what these are. For instance I actually think acidelar is just a typo, since the definition GB gives it, “Put lemon juice or vinegar in the water to cook poached eggs or vegetables, so that they do not blackened.”, is fairly similar to that of the known acidular, whose definition is “Sprinkle with an acidic liquid fruit, vegetables or vegetables so that they retain their whiteness or colour.” But the definitions are not exactly the same, and for me to declare acidelar a mistake is premature; after all it could be some alternate spelling or perhaps a regional difference from standard dictionary Spanish, or, worse, it might be the spelling used in Spain versus what is used elsewhere. I simply do not have enough data to decide.

So what about something like

MOREAR (not in any dictionary) Dar vuelta sobre el fuego bajo y con un poco de aceite en un sartén o cacerola a los alimentos, para que tomen color antes de añadirle salsa o caldo. Turn over the low heat and with a little oil in a frying pan or pan to the food, so that they take color before adding sauce or broth.

This comes up blank in all dictionaries and most web searches I’ve tried. So the question is, do I believe this is even a word (or perhaps it’s from some other language used in Spain)? It certainly sounds like sauté (cooking technique), but that is saltear, which GB defines as “Stir the food in butter or hot oil when frying in an uncovered skillet.”

Now for the words not in RED I did find literal translations, including for ASAR, which I’m surprised Google doesn’t know (this, as you recall, is the verb I used as the example above to explain why I’m investigating verbs, i.e. it is the infinitive root of asado, a very common word on menus). And I’m also surprised it didn’t know GUISAR (cook, stew; cook up) since I can recall from memory seeing that and especially its past participle guisado (which as a noun refers to casserole, stew, or, most generically, dish, and as an adjective means stewed). And I’ve seen rebozado (covered in batter or breadcrumbs) on numerous menus and it’s the past participle of REBOZAR (to coat in batter or breadcrumbs), which Google didn’t know. Now, OTOH, TRUFAR (try to guess before reading the translation) is probably sufficiently obscure that Google may not have seen it, but given the price of the item behind this word you’d want to know what it means if you saw it on a menu (it means to stuff with truffles).

Now as for the other verbs, where Google did have some translation, I’m going through a somewhat tedious process of digging out (again, but this time in a single consistent process) the literal translations so I can compare Google to other sources. And sources are going to matter. It’s hard to say with absolute certainty what an appropriate translation is going to be (I believe even fluent Spanish-speaking authorities might debate some verbs). So I need to do this comparison of various sources in a systematic way, not believing one source over another until I can potentially “confirm” a translation via some processing of a large corpus of translated food-related material, IOW, exactly what I’m building up now.
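To make "systematic" a bit more concrete, here is a minimal sketch of how the comparison might be recorded. Everything about the layout is my assumption: the `candidates` table, the `record` and `unconfirmed` helpers, and the placeholder source names `source_a` and `source_b` are all hypothetical. The BRIDAR and REBOZAR translations are the ones mentioned in this post. The idea is simply to keep every source's answer side by side and treat a verb as unconfirmed while the sources disagree.

```python
from collections import defaultdict

# Hypothetical layout: verb -> {source name: translation}
candidates = defaultdict(dict)

def record(verb, source, translation):
    """Store one source's translation, normalized so trivial
    differences in case or spacing don't count as disagreement."""
    candidates[verb.upper()][source] = translation.strip().lower()

def unconfirmed(table):
    """Return the verbs whose sources give more than one distinct translation."""
    return sorted(
        verb for verb, by_source in table.items()
        if len(set(by_source.values())) > 1
    )

# Sample entries; source_a and source_b are placeholder names.
record("BRIDAR", "google", "bridle")
record("BRIDAR", "spanishdict", "to tie or truss")
record("REBOZAR", "source_a", "to coat in batter or breadcrumbs")
record("REBOZAR", "source_b", "To coat in batter or breadcrumbs ")

print(unconfirmed(candidates))
```

No source gets to win here; a disagreement just stays flagged until the corpus (or a human) settles it.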

For the verbs Google did translate here are a few of the issues I’ve found thus far (not done with this analysis):

  1. Often Google chooses the present participle as the translation instead of the infinitive, e.g. for ADOBAR Google says marinating instead of to marinate. Not a big deal overall, but this might get into a corpus and create a statistical flaw later in the analysis.
  2. For AVIAR Google picked the most literal reading, namely the adjective ‘avian’, rather than to prepare as the root verb (multiple meanings, this one matches the GB definition, “Prepare birds for cooking. It consists of all pre-elaborations that must be made to a piece: cleaning, flamed, wicking, flanged, etc.”). Note: GB has defined this in a more specific way than spanishdict.com did, and given the Latin root for both the verb and the adjective the GB definition is definitely superior (plus being more useful to understand in the context of cooking).
  3. Google picks one of several literal translations, but not the culinary one (which is what I pick when looking at spanishdict.com, because I know culinary is the context), e.g. BRIDAR, which Google translates as ‘bridle’ (literally OK), but to tie or truss is much more useful in the cooking sense.
  4. Or something like DESPLUMAR, for which Google picks the present participle Fleecing, a plausible translation. But the GB definition is “Remove the feathers from the bird.”, which comes closer to an alternate definition, ‘to pluck’. Amazingly, fleecing is a colloquial usage in Spanish somewhat like the English one, where someone is taken advantage of and thus “fleeced”.
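The first issue in the list above is the kind of thing a crude mechanical check could flag before it gets into the corpus. This is only a sketch under my own assumptions: the helper names `looks_like_participle` and `flag_participles` are made up, and the test is nothing smarter than "ends in -ing and doesn't start with to". It would wrongly flag an English verb like "bring", so the output is a list to review by hand, not a list to fix automatically.

```python
def looks_like_participle(translation):
    """Rough heuristic: an English '-ing' form that is not written as 'to ...'."""
    text = translation.strip().lower()
    return text.endswith("ing") and not text.startswith("to ")

def flag_participles(google_output):
    """Return the Spanish verbs whose translation looks like a participle."""
    return sorted(
        verb for verb, translation in google_output.items()
        if looks_like_participle(translation)
    )

# Google's answers for four verbs, as quoted in this post.
sample = {
    "ADOBAR": "marinating",
    "DESPLUMAR": "Fleecing",
    "BRIDAR": "bridle",
    "AVIAR": "avian",
}

print(flag_participles(sample))
```

The other three issues (wrong part of speech, wrong sense) are not catchable this way; they need the side-by-side comparison against other sources.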

I’m sure there will be more as I finish grinding through but this post, already TMI, hopefully gives a sense of how I’m post-processing the pure mechanical part of my study to pound the raw data into a more usable form to then create my corpus (all preliminary to creating my AI-ish smart menu translator).