Additions to glossary

The glossary page in this blog, at the moment, has been compiled by hand. This is NOT the process I intend to use for my definitive glossary to embed in my translation app since hand compilation is subject to numerous errors, plus the source material may be incorrect. But I like having some result even before I manage to generate a definitive glossary where each entry is found in numerous sources and checked against authoritative guides.

In the past I’ve searched for glossaries all over the Net and manually consolidated them. The result was a mess because: a) the source glossaries often had mistakes made by whoever compiled them, b) the Spanish terms may not apply to Spain, which is my focus (for example, hongo is mushroom in most of Latin America but rarely used in Spain), and c) terms from a glossary may not overlap as I want with the terms actually found on menus in Spain.

All that said, I continue to make additions. In this case I was looking at some travel books and cookbooks I’d gotten during my previous fascination with Spain and realized I had the Langenscheidt Pocket Phrasebook (Spanish), 2006 edition, which includes a 1400-word dictionary. So I fairly quickly went through the dictionary and extracted words that relate to food or to restaurants. From that list I found which were not already in v3.2 of my glossary. Now I’ve updated my glossary page, but here I’ll show what kinds of words were missing (previously the glossary page had come entirely from extracts of menus). A few of these terms, I realized, should also be included in my restaurant terms page, so that has been updated as well.

abierto open
achicoria chicory
amarg{o|a} sour
aromáticas herb
asiento seat
avena oats
batido milkshake
boca mouth
bombilla bulb
botella bottle
brazo arm
brécol broccoli
bufet buffet
caballa mackerel
cabeza head
calle street
camarer{o|a} waiter/waitress
carajillo coffee with brandy
cartílago cartilage
cena dinner
centeno rye
cerebro brain
cereza cherry
cerrado closed
cervecería beer hall
cerveza de barril draft beer
cerveza rubia lager
champán champagne
cóctel cocktail
col cabbage
comer to eat
comestibles groceries
composición ingredients
coñac brandy
concha shell
condimentad{o|a} seasoned
confitería candy store
conserva canned food
cortado espresso with a dash of milk
crema de leche coffee creamer
cruasán croissant
cubiertos silverware
cuchillo knife
cuello neck
cuenta bill
cuerpo body
desnatada low-fat
destilería brewery
diente tooth
dinero money
endibia[s] endive, correct spelling as previous was wrong
entero whole
entrada entrance
erizo de mar sea urchin
especias spices
espeto skewered
espina fish bone
espumoso sparkling (in wine context)
estómago stomach
estragón tarragon
estrella star
factura bill
fruta del tiempo seasonal fruit
ginebra gin
gofres waffles
gratuito free of charge
guayaba guava
helada frost
hervid{o|a} boiled
hervid{o|a} cooked
hierba herb
higos fig
hornillo stove
hueso bone
infusión de hierbas herbal tea
jardín garden
jarra jug, pitcher
langosta lobster
lengua tongue
limonada soda
macedonia de frutas fruit salad
manzanilla chamomile tea
margarina margarine
menú menu
mojado wet
molino mill
músculo muscle
nectarina nectarine
número size
ocupad{o|a} taken
ojo eye
pan integral whole grain bread
panecillo roll
pez espada swordfish
pierna leg
poleo de menta peppermint tea
polvo powder
pomelo grapefruit
primavera spring
propina tip
raíz root
reservad{o|a} reserved
ron rum
rosado rosé
rosbif roast beef
sala hall, room
salami pepperoni
salida exit
sandía watermelon
sangre blood
sarro tartar
semana week
semiseco medium dry
sémola semolina
servicio restroom, service
servilleta napkin
suplemento surcharge
taberna bar
tenedor fork
terraza terrace
trucha trout
uva grape
vajilla tableware
ventanilla counter (window)



Blog note

After consolidating terms from numerous menus, plus the recent post about restaurant terms, I substantially updated the page under the tab RESTAURANT PHRASES. The main change was the addition of a list of phrases which I’ll include here for convenience. Enjoy!


In this list the notation {x|y} means the word occurs with either x or y in that position; usually this is gender in adjectives, so {a|o}. [x] means x is optional, most often [s] for a plural.

a elegir to choose [from]
a tu elección at your choice
acompañad{a|o}[s] accompanied
al centro in the center (of table, i.e. for sharing)
al estilo X in the style of X
al gusto to taste (doneness), i.e. cooked to order
al peso by weight
bebida[s] drinks
carta the a la carte menu
casa literally house, from this restaurant
caser{a|o} homemade
combinados combinations
degustación tasting/taste (often a separate menu)
del día of the day
diario daily (available item or open)
elaboración preparation
eliges tú los ingredientes you choose the ingredients
en temporada in season
entrantes starters (aka appetizers)
especialidad specialties
horario hours (as in when it is open)
incluid{a|o}[s] included
ingredientes ingredients
mesa table (different from tabla)
para acabar to finish (after main part of meal)
para comer to eat (main part of menu)
para compartir to share
para picar to nibble on (aka snacks or appetizers)
por encargo on request
postres desserts
precio[s] price
primeros [platos] (primer) first course
segundos [platos] second course
selección/seleccionado selection/selected
servido [con] served [with]
surtido assortment
tabla board/plank or platter (usually an assortment, often of ham)
unidad unit (abbreviation uds)
vari{e|a}d{a|o}[s] assorted, varied, variety
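The {x|y} and [x] notation used in this list can be expanded mechanically into every form a menu might actually show. Here is a minimal Python sketch of that idea; the function name expand is my own invention for illustration and it assumes the braces and brackets are never nested.

```python
import itertools
import re

# One token is either {a|b} (alternatives), [x] (optional), or literal text.
TOKEN = re.compile(r"\{([^{}]*)\}|\[([^\[\]]*)\]|([^{}\[\]]+)")


def expand(pattern):
    """Return every surface form described by a glossary pattern."""
    choices = []
    for alternatives, optional, literal in TOKEN.findall(pattern):
        if alternatives:
            choices.append(alternatives.split("|"))
        elif optional:
            choices.append(["", optional])
        else:
            choices.append([literal])
    forms = ("".join(parts) for parts in itertools.product(*choices))
    # Collapse whitespace left behind when an optional word is dropped.
    return [" ".join(form.split()) for form in forms]


print(expand("caser{a|o}"))     # ['casera', 'casero']
print(expand("servido [con]"))  # ['servido', 'servido con']
```

An entry like vari{e|a}d{a|o}[s] expands to eight forms, not all of which are real words, so the output is a set of candidates to match against menu text rather than a list of valid spellings.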

Quesos de España – A Great Source

I took a break from decoding menus from restaurants in Spain to look at cheeses that originate in Spain. I’ve done this type of investigation before (previously for Italy) and it’s a challenging task. Names of cheeses can be very inconsistent from different sources. Even with DOP names now more common there can still be inconsistencies.

And, of course, using any online source for raw material has the challenge that its author may be wrong, may have misspelled names or may have introduced other errors. Consolidating all the names found in different sources is difficult to automate, yet it is also a large quantity of information to collate mentally, especially when one is not conversant in the language.

I’ll explain my process below, but in case you just want the excellent source I found I’ll describe it first, even though I discovered it only after a lot of searching.

While it’s entirely in Spanish and as a PDF not subject to Google Translate when accessed through the web browser this is a very nice document: CATÁLOGO ELECTRÓNICO DE QUESOS DE ESPAÑA (slow to download but worth the wait).

It has pictures of the cheeses and even some of the animals for the milk plus standardized descriptions including items like: Zona de Elaboración (processing area), Ingredientes (ingredients), Tipo de Queso (cheese type), Aspecto Exterior (outward appearance) and Aspecto Interior (interior appearance).

And then even more helpful is this section, Características Organolépticas (organoleptic characteristics; I had to look up the English definition, which is “acting on or involving the use of the sense organs”), which then includes: Textura al Tacto (texture to touch), Olor (odor), Textura en Boca (texture in mouth), Aroma (aroma), Sabor (flavor), Otras Sensaciones (other sensations), Gusto Residual (residual taste), Persistencia (persistence). In case you’re not sure what Gusto Residual means, here it is for Gamonedo cheese (from Principado de Asturias):

El gusto después de ser tragado es: a avellana, con predominio suave de humo (The taste after being swallowed is: of hazelnut, with a mild predominance of smoke.)

And here is an example of Persistencia for Curado (cured/aged) Mahón-Menorca cheese:

Media-elevada, presencia de mantequilla fundida, aceite de oliva y caldo de carne. Entre quince y treinta segundos  (Medium-high, presence of melted butter, olive oil and meat broth. Between fifteen and thirty seconds)

In addition to this extensive, informative and attractive PDF there is another part of this site where you can filter the list of cheeses, i.e. Buscador de quesos (Cheese Finder (aka Search Engine)). The filters are: Seleccione (Select): Comunidad Autónoma (Autonomous Community), tipo de leche (milk type), calidad diferenciada, régimen de calidad (differentiated quality, quality regime).  So for example I did search for cow’s milk (leche de vaca) cheeses from Cantabria and all (todas) quality regimes and got:


(mark or brand) | Procedencia Leche (Origin of milk) | Comunidad Autónoma (Autonomous Community)
Picón-Bejes-Tresviso D.O.P. | Leche de vaca | CANTABRIA
Queso Nata de Cantabria D.O.P. | Leche de vaca | CANTABRIA
Queso Pasiego Sin figura de calidad comunitaria reconocida (No recognized community quality figure) | Leche de vaca | CANTABRIA

After finding the list you can click on the cheese name for the full information page equivalent to the CATÁLOGO pages. You could either use the search tool to find a cheese you might want to try (some Spanish cheeses can be obtained online) or browse the CATÁLOGO.

Back to my process for compiling a list of cheeses

Undaunted by these challenges, which I knew from past experience, I decided it was time to assemble a complete and accurate list. This matters only slightly for reading menus at restaurants and would more likely be useful for purchases at retail establishments, but again, knowing what you’re eating in another country is the inspiration for my project.

So I proceeded with the usual suspects, first doing several Google searches (to get the terms right to provide the best source materials) and then following several promising sources. As usual Wikipedia had a useful page List of Spanish cheeses with a fairly long list (fortunately tagged by region) with some links to pages for the more common cheeses. Having processed this list I immediately assumed the Spanish language version of Wikipedia would possibly have an even better list and it did – Quesos de España. Another seemingly authoritative source, Spanish Cheese Guide, covers all (?) of the DOP names.

From all these sources I generated a single list, which required picking a “canonical” name and then finding all the variations from the sources. For example this cheese, Arzúa-Ulloa, appeared in all my sources (compiled thus far) but, as you can see, under quite different names, even including a misspelling.

Queso Arzúa-Ulloa (P.D.O.) Galicia 1 link
Arzula Illoa 2 link
Arzúa Galicia 3
Arzúa-Ulloa Galicia 5 link
Arzúa-Ulloa Galicia 6 link
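Matching variants like these to one canonical name could be partly automated. The following is only a sketch under my own assumptions, not what I actually did (I did this by hand): it strips accents, tags like (P.D.O.) and a leading "Queso", then uses fuzzy matching from Python's standard difflib module. The function names and the 0.8 cutoff are hypothetical choices.

```python
import difflib
import re
import unicodedata


def normalize(name):
    """Lowercase, strip accents, drop '(P.D.O.)'-style tags and a leading 'queso'."""
    name = re.sub(r"\([^)]*\)", " ", name)
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    words = re.sub(r"[^a-z0-9]+", " ", no_accents.lower()).split()
    if words and words[0] == "queso":
        words = words[1:]
    return " ".join(words)


def match_canonical(variant, canonical_names, cutoff=0.8):
    """Return the canonical name closest to `variant`, or None if nothing is close."""
    by_key = {normalize(name): name for name in canonical_names}
    hits = difflib.get_close_matches(normalize(variant), list(by_key), n=1, cutoff=cutoff)
    return by_key[hits[0]] if hits else None


canonical = ["Arzúa-Ulloa", "Cabrales", "Manchego"]
for variant in ["Queso Arzúa-Ulloa (P.D.O.)", "Arzula Illoa", "Arzúa"]:
    print(variant, "->", match_canonical(variant, canonical))
```

With this cutoff the misspelled Arzula Illoa is matched, but the short form Arzúa is not, so shortened names would still need a manual decision or a lower (and riskier) cutoff.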

So after consolidating the list from five sources and choosing what appears to be the “standard” name (for those cheeses that appear on more than one list) here is what I believe is a fairly comprehensive list:

Abredo, Acehúche, Afuega’l Pitu, Ahumado de Pría, Alhama de Granada, Alpujarras, Andalucía de cabra, Ansó-Hecho, Aracena, Arribes de Salamanca, Arzúa-Ulloa, Babia y Laciana, Barros, Benasque, Beyos, Buelles, Burgos, Cabrales, Cáceres, Cádiz, Camerano, Campo Real, Campoo-Los Valles, Casín, Cassoleta, Castellano, Cebreiro, Colmenar Viejo, Flor de Guía, Fresnedillas de la Oliva, Gamonedo, Garrotxa, Gata-Hurdes, Gaztazarra, Genestoso, Gran Canaria, Grazalema, Guriezo, Herreño, Ibores, Idiazábal, L’alt Urgell y La Cerdanya, La Adrada, La Bureba, La Calahorra, La Gomera, La Montaña de León, La Nucía, La Peral, La Serena, La Siberia, La Sierra de Espadán, La Vera, Lanzarote, Letur, Los Montes de Toledo, Mahón-Menorca, Majorero, Málaga, Mallorquí, Manchego, Mató, Miraflores, Montsec, Murcia, Murcia al vino, Nata de Cantabria, Oropesa, Oscos, Ossera, Palmero, Pasiego, Pastor, Pata de mulo, Pedroches, Peñamellera, Picón Bejes-Tresviso, Pido, Quesaílla, Quesucos de Liébana, Requeixo, Roncal, San Simón da Costa, Serrat, Servilleta, Sierra Morena, Tenerife, Teruel, Tetilla, Tiétar, Torremocha del Jarama, Torta del Casar, Trapo, Tronchón, Tupí, Urbiés, Valdeón, Valle de Alcudia, Valle del Narcea, Vidiago, Villalón, Zamorano

There are around 30 more where I’ve found at least one mention but I’ll have to search for each of these individually (once I have the complete list) to see if these cheeses really exist (at least currently) or are just a spurious mention in some online list.

Small experiment

Most of the time I’ve spent on this project has involved looking at various source documents from Spain and then applying multiple methods of translation. Ultimately the point of all this is to build a large corpus of “pairs”: words or phrases in Iberian Spanish and their English translation (or some kind of equivalent). Critically, I also need to add some measure of how likely each pair is to represent valid equivalents, so the code (yet to be written) can attempt to establish the probability of the consolidated list of pairs being correct. It also has to handle ambiguity, which is very common, for instance, with ternera (is this veal or beef or both? It often seems to be used for both). And the multiple, overlapping and contradictory terms for shrimp vs prawns vs langoustines (the small rock lobster) are a strong example of confusion on menus.
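Since the corpus isn't designed yet, the following is only a guess at what a "pair" record might look like, written as a Python sketch. The class names Pair and Corpus, the source labels and the certainty values are all hypothetical. The point is just that one Spanish term can keep several English candidates side by side, as ternera needs.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Pair:
    spanish: str      # term as found in an Iberian Spanish source
    english: str      # proposed English equivalent
    source: str       # where the pair came from (menu, dictionary, ...)
    certainty: float  # hypothetical 0.0-1.0 score assigned at extraction time


class Corpus:
    def __init__(self):
        self._by_spanish = defaultdict(list)

    def add(self, pair):
        if not 0.0 <= pair.certainty <= 1.0:
            raise ValueError("certainty must be between 0.0 and 1.0")
        self._by_spanish[pair.spanish.lower()].append(pair)

    def candidates(self, spanish):
        """Return the distinct English candidates, most frequently attested first."""
        counts = defaultdict(int)
        for pair in self._by_spanish.get(spanish.lower(), []):
            counts[pair.english] += 1
        return sorted(counts, key=lambda english: (-counts[english], english))


corpus = Corpus()
corpus.add(Pair("ternera", "beef", "menu A", 0.6))       # made-up sample data
corpus.add(Pair("ternera", "veal", "dictionary", 0.7))   # made-up sample data
corpus.add(Pair("ternera", "beef", "menu B", 0.5))       # made-up sample data
print(corpus.candidates("ternera"))  # ['beef', 'veal']
```

Ranking by how often a translation is attested is only one possible choice; the stored certainty values are kept so a later step can weigh them as well.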

So given I haven’t yet designed my corpus or the code to ingest new pairs into the corpus and then process the related pairs, I have to do experiments, by hand, on a smaller dataset to attempt to visualize the challenges I will face when this is all done with code on a much larger corpus.

So I recently processed an extensive menu from a single restaurant in Granada and just before that two restaurants in Santo Domingo de La Calzada, La Rioja. By process I mean the mostly mechanical work of getting entire sections of menu text side-by-side in original Spanish and then the translated English. Then I look for untranslated terms or silly translations to try to find other sources on the Net (often recetas) to determine the correct correspondence, for instance, manos de ministro is NOT minister’s hands but a colloquial version of the more common manitas de cerdo, or pig’s trotters (feet).

So having done this I’ll provide a few results. In total I ended up with 277 “pairs” with 50 of those on both lists (and thus likely to be very common food terms from menus – see list below). The two restaurants in Santo Domingo de La Calzada contributed 132 unique pairs and the Granada restaurant contributed 95 unique pairs. The various terms in the list are sometimes not that specific to food, for instance:

  1. blanco and negro, colors but used as qualifiers of chocolate in menus; rosada (pink as a color) ended up being quite a chase when it referred to a specific fish.
  2. aroma or chocolate, which are the same in Spanish and English; I include them even though they (and others like them) are obvious loanwords, as a piece of code doesn’t just “know” this and has to be told.
  3. especialidad (specialties) or vinagreta (vinaigrette) or salmón (salmon); even though these are easy to guess, an app doing translation still needs to recognize these terms.
  4. arroz, carne, dulce, huevo, leche, pan, pescado, pollo, queso, salsa and vino, which are used so much, not just in Mexican restaurant menus but even in TV ads, that we can effectively consider these loanwords into English now; but again, a computer program doesn’t know that and so still needs to have these in the corpus that will then be the key to its translation.
  5. I did try to consolidate terms that have alternate gender forms and/or singular/plural but didn’t do this as precisely and consistently as a really good corpus would require
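The counts given before this list (277 pairs in total, 50 on both lists, 132 and 95 unique to each location) are just set arithmetic, which I did by hand. As a sketch of how code might do the same split, here is a small Python example; the function name compare and the two tiny sample menus are made up for illustration.

```python
def compare(pairs_a, pairs_b):
    """Split two collections of (spanish, english) pairs into shared and unique sets."""
    a, b = set(pairs_a), set(pairs_b)
    return {"both": a & b, "only_a": a - b, "only_b": b - a}


# Tiny made-up samples; the real lists are the ones described in this post.
menu_a = [("pan", "bread"), ("queso", "cheese"), ("pato", "duck")]
menu_b = [("pan", "bread"), ("queso", "cheese"), ("miel", "honey"), ("pulpo", "octopus")]

result = compare(menu_a, menu_b)
total = sum(len(group) for group in result.values())
print(len(result["both"]), len(result["only_a"]), len(result["only_b"]), total)  # 2 1 2 5

# The same identity holds for the numbers reported in this post.
print(50 + 132 + 95)  # 277
```

This only treats pairs as identical when both the Spanish and the English match exactly, so inconsistent handling of gender or plural forms would make the "both" group look smaller than it really is.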

While just finding lists of food/cooking terms is easy on the Net, whether they are correct or apply to Spain is more problematic. Even a source like a dictionary should be taken with a small dollop of skepticism. Certainly asking any of the various voice assistants is not going to have a very high accuracy rate. So it is necessary to try to focus on sources, and thus pairs, that are really for Spain and not somewhere in the western hemisphere (unless you, Dear Reader, are planning a trek in Bolivia, then do as you need).

So that was my experiment and I end with this list of 50 pairs that are so common you’re very likely to run into them BUT even this list is not 100% accurate as there are various issues with translation (see previous posts).

Cover up the right-hand column and see how many of these you know.

a la plancha grilled
aceite de oliva olive oil
anchoas anchovies
arroz rice
asados roasted
atún tuna
bacalao cod
blanco white
café coffee
Cantábricas/Cantábrico Cantabrian
caramelizados caramelized
carne meat
casera/o caseras/caseros homemade
cerdo pork
chocolate chocolate
comida meal
croquetas croquettes
deliciosa/o deliciosas delicious
dulce sweet
ensalada salad
frita/o fritas fried
guarnición garnish
helado ice cream
huevo egg
jamón ham
langostinos prawns
leche milk
lomo loin (generically; or cured meat specifically)
miel honey
pan bread
patata potato
pato duck
pechuga breast
pescado fish
pimientos peppers
plato dish
pollo chicken
postre dessert
pulpo octopus
queso cheese
revuelto scrambled
salsa sauce
solomillo tenderloin or fillet
tarta cake, also pie
ternera beef (alt: veal)
tomate tomato
tosta toast
vainilla vanilla
verdura vegetable
especialidad especialidades specialty

Mystery post – pez/peces or pescado

My title contains some bits of useful information. While I’m not absolutely certain, some sources say peces is the plural of pez. Of course in English the plural of fish is fish, so peces seems relatively uncommon. pescado also translates to fish BUT the key difference is that pescado is the piece of fish on your plate and pez is the living animal.

I let Google Translate loose on my previous “mystery” post and it had three types of results: 1) a few of the words translated correctly, 2) some translated but to nonsense, and, 3) some were missed altogether. I’ve tracked a few of the latter.

My big list of words (with cognates or loanwords removed to avoid giving a clue) was a lengthy list of the names of fish, probably as they are called in Spain. I found two long lists on the Net with Latin (scientific names) as well as names in English, Spanish and some other languages. Both were European sources so less likely to include fish found primarily in South America, but who knows how lists get compiled.

Plants and animals from the natural world (versus cultivated plants/animals) are frequently misidentified and it is very tough to get accurate common names. Sometimes even the scientific names are in dispute or contradictory, so it’s no big surprise the more colloquial names are too. After all, who but ichthyologists, some fishermen and a few fishmongers actually know these names accurately and/or could decide what to call a fish just by looking at it?

So this is probably the toughest area to compose an accurate Iberian Spanish to English translation list. I’m going to have a third post in this series about the names I conclude are fairly likely but for now here’s a subset of the list from the mystery post that Google failed to translate at all.

alfonsino Golden eye perch
badexo Lythe or pollack
boga bogue
brama bream Pomfret
brotola de roca Greater forkbeard
calion Shark, porbeagle
callas Callas
capelan capelin
chicharro scad – also called horse mackerel
chincharro Horse mackerel or scad
choupa Black bream or porgy or seabream
chucla picarel
cigala crawfish Norway lobster – also called Dublin Bay prawn
colin Coley or saithe
côngrio conger eel conger eel – also called conger
coregono whitefish
escolano smelt – also called sparling
espadilla frostfish – also called silver scabbardfish
espadín sprat sprat – also called brisling
espárido sea bream
illiseria megrim
lanzon sandeel – also called sand lance
limanda dab
longeirón razor clam – also called razor shell
lucioperca pike-perch
lumpo lumpfish Lumpfish
maganto Dublin Bay prawn or langoustine or scampi
mendo Witch or Torbay sole
merlan whiting
mollera poor cod
muergo razor clam – also called razor shell
musola smooth hound – also called dogfish, flake, huss, rigg
pardete Grey mullet
pejerrey silver side, sand smelt argentine – also called silver smelt
pejesapo angler fish Anglerfish or monkfish
perlón Grey gurnard
pescadillo Hake
plegonero whiting
quisquilla shrimp prawn – also called shrimp
salton sandeel – also called sand lance
salvelino char
solla plaice

The left column is the Spanish (with at least one spelling error, don’t know which (chicharro chincharro) is actually correct). The middle column is the few that the Oxford dictionary recognizes. And the third column is from one of these two sources (here and here) which I originally used to compile the list (I found a third list with scientific (Latin) names but didn’t originally use it and haven’t (yet) processed it). I’m a bit surprised Google missed the names that are in Oxford as I’ve encountered some of these in other places.

Now note that even with some of the Spanish names “translated” there are bunches of fish on this list I don’t recognize and I suspect few people would. So probably only a small subset of this list (the names Google didn’t recognize, not the full list) would ever appear on menus.

The two longer lists, with scientific names, seemed to potentially be the most accurate lists but I’ve found others at some other websites. The trouble with these is the names may not relate to Spain and may be from other Spanish speaking areas. This is a very common problem trying to find and merge and consolidate lists from the Net. In addition what is the level of authority of anyone who provides a list – rarely is that known and I see enough mistakes in almost any list to shed some doubt on the accuracy of the information. But all that said I’ll be trying, in the next post, to produce the largest and most accurate list from the raw material I can find.

So stay tuned for the final result.

Wine Terms

In my last post I mentioned I was using several websites (and pages within those sites) that had English translations to extract side-by-side human English translation of the (presumably) original Spanish. OK, done – so what? As I’ll be doing with all sources, I then begin an extraction process to add pairs (words or phrases) of translations to my corpus. A key part of that is also assigning some measure of “certainty” that the translation is correct; a probability-type measure (0.0…1.0) obviously fits. Then the corpus analysis program can find as many instances of the same pair as it can and compute a new certainty, i.e. something like: lots of identical pair instances, each possibly of low certainty, may be as good as a few instances of a pair with high certainty. An interesting question, then, is whether human translation (relatively rare) of websites (mostly menus) is a more reliable source of information than machine translation.
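One possible way to turn "lots of low-certainty instances may be as good as a few high-certainty ones" into a number is the so-called noisy-OR rule: the combined certainty is the chance that at least one sighting is right. This is my assumption for illustration, not a method I've settled on, and it treats every sighting as independent, which is doubtful when sources copy each other. The function name below is hypothetical.

```python
import math


def combined_certainty(certainties):
    """Noisy-OR: probability that at least one independent observation is right."""
    if any(not 0.0 <= c <= 1.0 for c in certainties):
        raise ValueError("each certainty must be between 0.0 and 1.0")
    return 1.0 - math.prod(1.0 - c for c in certainties)


many_weak = combined_certainty([0.3] * 5)   # five low-certainty sightings
one_strong = combined_certainty([0.8])      # a single high-certainty sighting
print(round(many_weak, 3), round(one_strong, 3))  # 0.832 0.8
```

Here five sightings at 0.3 come out slightly ahead of one sighting at 0.8, which matches the intuition above but would overstate confidence if the five sightings all trace back to the same machine translation.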

Of course the extraction process itself (which I do and therefore is subject to error) plays a role as well so I’ll use my small corpus of wine webpages to extract a set of pairs and then use any other sources of wine terminology to confirm/deny my pairs (just manually, so I understand the data, before trying to write code to do this). So here’s my result:    (scroll down past list for more of this post)

abierto 2 open
acerbo 2 acerbic
acidez 1 acidity
ácido 1 acid
aceite esencial 2 essential oils
afinamiento 1 refinement
afrutado[s] 1 fruity
agradables 1 nice, pleasant, agreeable
alegre 2 zingy
amoratado 2 inky
amplio 2 big
añada 2 vintage year
arcillo 1 clay
armónico 2 harmonious
aromas 1 aromas
aromática 1 aromatic
barrica Bordelesa 2 Bordeaux cask
barrica 1 cask or barrel
beber 1 to drink
blanco seco 3 dry white
blanco 2 white
boca 1 literally mouth, but can mean palate in wine tasting context
bodega 3 winery
bodeguero 3 winemaker
bota 2 butt
botella 1 bottle
Botritis 2 Botrytis
brillante 1 bright
brotaciones, brotación 1 [not found] budding ? (derivative of brotar)
brotar 1 to sprout, bud
calidad 1 quality
campaña 1 growing period, season
campo 1 field
canela 1 cinnamon
cánones del clasicismo Riojano 1 classic Rioja style (not literal)
capa 2 layer
cata 1 tasting (action of)
cereza 1 cherry
cerrado 2 closed
clarificación 2 fining
clásico de Rioja 1 Rioja classic
comarca 1 region, district
complejidad 1 complexity
complejo 2 complex
corcho cork
cosecha 1 harvest, crop; vintage
crianza en barrica 4 aging in barrel
crianza en madera 1 aged in wood (literally, cask colloquially)
crianza 1 aging
cuerpo 1 body
dejo 2 aftertaste
denso 2 dense
depósitos 4 deposits
dorado 2 golden
dulce 2 sweet
elaborado por 3 produced, matured by.
elegante 1 elegant
embotellado por 3 bottled by
embotellar 4 to bottle
en barrica 1 in cask or barrel
envejecimiento 1 aging (also laying down)
equilibrado 1 balanced
equilibrio 1 balance
especiado 2 spicy
espeso 2 thick
estructura 2 structure
evolucionado 2 evolved
expresivo 1 expressive
fermentación alcohólica 4 alcoholic fermentation
fermentación maloláctica 4 malolactic fermentation
fermentación 1 fermentation
final de boca 1 “finish” (literally end/finish of mouth)
final 1 after-taste
fino 1 fine
florals 1 floral
fresco 1 fresh
frescura 1 freshness
frutos cítricos 1 citrus fruits
fuerte 2 strong
graciano 1 red grape variety
grados 1 grade or degree (but alcohol by volume)
heces 2 sediment
hoja 4 leaf
hollejo 2 grape skin
joven 2 young (little or no aging)
jurado de cata 2 wine tasting panel
lágrimas 2 tears
levaduras 4 yeast
lías 2 lees
limpio 1 clean
maceración carbónica 2 carbonic maceration
maceración en frío 2 cold maceration
maceración 1 maceration
madera 1 wood
madura 1 ripe, mature
madurar 1 to mature
manchado 2 literally ‘stained’
manzana 1 apple
maridaje 1 literally marriage or combination; food matches/pairings
Mazuelo 1 red grape variety
mezcla 1 mixture, blend
mosto 1 must (grape juice)
nariz 1 nose (also aroma)
notas 1 notes
olores 1 smell (scents in corpus)
oro 2 gold
oxidación 2 oxidation
parámetros de calidad 1 quality indicators
pasa 2 raisin
pepita 4 seed
perfumado 2 perfumed
persistencia 2 persistence
pimienta 2 black pepper
postgusto (posgusto) 1 [not found] after-taste
prensa 4 press
prensado 1 pressing
pulidos 1 polished
rama 2 branch
recio 2 gutsy
redondo 2 rounded
refrescar 2 refresh
regaliz 2 liquorice
roble Americano 4 American oak
roble Francés 4 French oak
roble 1 oak (as in the barrels)
rojo 2 red
rosado 2 rosé
sabor 1 flavor, taste
sabroso 2 flavorsome
seco 2 dry
sedoso 1 silky
semidulce 2 semi-sweet
semiseco 2 semi-dry
sensación 1 sensation
suave 2 smooth
suelos 1 soils (also ground, floor, land)
tabaco 2 tobacco
tanino 1 tannin
temperatura controlada 1 controlled temperature
temperatura de servicio 1 serving temperature, aka, best served at
Tempranillo 1 grape variety
terciopelo 1 velvet
típico 2 typical
trasiegas 1 decant (rackings in corpus)
untuoso 1 literally greasy (aka unctuous), but nicer means ‘smooth’
uva 1 grape
vainilla 2 vanilla
valores 1 values (as in levels of an indicator)
variedad 1 variety or varietal
vendimia 1 vintage, grape harvest (whole process)
vid 4 vine
viña 3 vineyard.
viñedos 1 vineyard, vines
vino blanco 4 white wine
vino de calidad (Quality wine) 3 Must come from a DO or DE. Only wine made from the free-run or lightly pressed juice of ripe healthy grapes, which has undergone a temperature controlled fermentation, qualifies.
vino de cosecha, or vendimia 3 Wines of a particular vintage year. In special cases, if the purpose is to improve the quality of the wine, a maximum of 15% of wine of a previous year may be added.
vino espumoso 4 sparkling wine
vino Fino de Mesa 3 fine table wine.
vino generoso 3 Special aged dry or sweet wines of higher alcoholic strength than table wines. From the Latin term for excellence. Sherries are vinos generosos.
Vino rosado 4 Rosé wine
vino tinto 4 red wine
vino 1 wine
Viura 1 white grape variety
viveza 1 vividness, strength
vivo 2 lively
yema 2 yolk
zarzamora 2 blackberry

I combined four lists. In MSWord I can use different colors and fonts for each list so when I merge them I can easily see where any pair came from, but here in WordPress formatting is more limited so the middle column indicates the source. My extracted list (from all those webpages I processed from both bodegas and restaurants) is 1. I chose not to provide links for the other three sources, but 2 was certainly the largest.

I eliminated duplication and then used a simple notion of “certainty”. Items from list 1 that are shown here in bold had one or more identical (or almost identical) translations in one of the other lists. This isn’t a particularly robust definition of certainty but it will do for this proof of concept.

So of the 171 terms in the merged list (82 are from my manual extraction, the remainder from one of the other three lists) only 24 of my extracted terms get marked as “certain” due to occurring in other lists:

afrutado[s], barrica, botella, cata, cosecha, elegante, equilibrado, fermentación, final de boca, fresco, maceración, madura, mosto, postgusto (posgusto), roble, sabor, sedoso, tanino, untuoso, uva, variedad, vendimia, viñedos, vino
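The "certain" marking described above could be sketched in Python as follows. This is a simplification I'm assuming for illustration: it only catches translations that are identical after lowercasing, not the "almost identical" ones I judged by eye. The function name is hypothetical and the split of sample entries between the two lists is illustrative only.

```python
def confirmed_terms(own_list, other_lists):
    """Return terms from own_list whose translation also appears in another list.

    Each list maps a Spanish term to a set of English translations.
    """
    confirmed = []
    for term, translations in own_list.items():
        wanted = {t.lower() for t in translations}
        for other in other_lists:
            found = {t.lower() for t in other.get(term, set())}
            if wanted & found:
                confirmed.append(term)
                break
    return sorted(confirmed)


# Sample entries taken from the merged list above; which list each one
# sits in here is made up for the example.
list_1 = {"barrica": {"cask", "barrel"}, "sedoso": {"silky"}, "campo": {"field"}}
list_2 = {"barrica": {"barrel"}, "sedoso": {"silky"}, "seco": {"dry"}}
print(confirmed_terms(list_1, [list_2]))  # ['barrica', 'sedoso']
```

A term that appears in both lists with different translations is not confirmed by this check, which is probably the right default since such disagreements are exactly the ones worth looking at by hand.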

There could have been some more since I did not extract really obvious terms from my corpus, such as blanco or seco or dulce or uva. And two of the “confirmed” terms actually are in dispute. One source admits afrutado is used for ‘fruity’ but says this is actually wrong and the term should be frutal. The dictionary confirms afrutado does mean ‘fruity’ but this does not confirm it is the correct term to use in a wine context. Likewise it confirms frutal to be fruit or fruit tree but doesn’t mention how this would be a taste term for wine. So who knows? Which is right? Wine terminology (in English) sometimes contradicts the more common meanings of words since wine tasters understand a particular word in a particular context (and we amateurs just have to learn what they mean). So it’s certainly possible this source might be right BUT how would this ever be confirmed?

Likewise postgusto (clearly ‘after taste’ from context) doesn’t appear in any dictionary. And in the other lists it appears but is spelled posgusto. Now I’m not sure if this meets the definition of neologism, especially as ‘post’ can mean ‘after’ (in this context) in English but doesn’t occur in Spanish, whereas gusto is ‘taste’ or ‘flavor’, so does this word actually exist (or get used in wine documents) and which is the appropriate form?

There was also some conflict between viñedos and viña. Both are in the dictionary as vineyard but only viña is listed as vines. That is then potentially a flaw in my extraction of pairs since I saw viñedos clearly translated as ‘vines’ in a human translation, but, of course, that person may have confused these two terms.

The term I’m happy I was able to figure out (after lots of examination of text to reach my conclusion) is final de boca. This would literally translate to ‘end of mouth’, but it’s more accurate to translate it as ‘finish’, which is one of those terms whose usage in wine descriptions has quite a different meaning from its common meaning. And one of the lists pronounced that just final is sufficient for ‘finish’, which is one of its literal translations. OTOH boca itself has some ambiguity. It literally means ‘mouth’ but was commonly translated as ‘palate’ in the human translations. That’s not any of the literal translations of ‘palate’. But, again, palate is a word that has a different meaning in the wine tasting context than its more common meanings.

So, this is all human analysis, with a lot of trial-and-error, back-and-forth, looking in dictionaries and doing web searches. In this contest of John Henry and the machine I think man will win so I really wonder how effective any AI (or just statistical analysis) can be. OTOH, ‘man’ needs to be a fluent Spanish speaker who participates in Jurado de Cata (wine judging panel) and I fall way short of that. But, still, what is the chance I can still produce the best list of wine terms freely available on the Internet? Pretty good, I’d say (given few are even trying).


Verbs again

In my previous post (about finishing initial processing of GallinaBlanca dictionary) I mentioned that verbs can be of some use in interpreting menus, possibly through derivatives of the infinitive form of the verb. So I’ve continued to do some digging in this area and have a few results to share.

Anticipating I’d be looking at verbs, independently of extracting them from the GB dictionary I used about nine online “lists” to compile an aggregate list. These verbs: a) may have nothing to do with cooking or cuisine, b) tend to be more commonly used verbs, and, c) may not be used (at all, or in same way) in Spain. So this is the list I’m calling C.

In the process of other searches I stumbled onto a culinary glossary. It has no connection with Spain and therefore the Spanish words might come from any part of the world. And as I worked with it more extensively and carefully I observe many of the issues with online resources of unknown origin: a) misspellings (probably, don’t want to jump to conclusion just because words seem to be misspelled), b) duplications, often including the singular and plural form, c) words that make no sense appearing in Spanish culinary dictionary (how did these drift in), d) inconsistent formatting and thus order (e.g. A la cazuela vs Cazadora, A la). In a previous iteration of my project I created a “glossary” by merging information from many sources and eventually it became a pisto (hotchpodge, if I can use that word in a non-culinary sense), especially losing any notion of whether the words applied to Spain or some other Spanish speaking area. So with these caveats I’ll call this list G.

And I have my list of verbs from the GallinaBlanca dictionary which I previously described. I’ll call this list D.

Now, simply, it’s too much work to compare the entirety of all three of these lists so I just did the subset (verbs only, of course) of verbs starting with A B or C. While this may be a biased sample it still reveals some interesting information.

Sorting the three lists together (with different fonts and colors for each list so I can distinguish them), I then did manual processing to consolidate like terms together. As a result I ended up coding each entry with GDC (or - if not in that list). That generates the following table:

G-- 44
-D- 4
--C 35
GD- 28
-DC 1
G-C 9
GDC 5

There are 126 verbs that appear in at least one of these lists. Only 5 verbs appear in all three lists. The list with the largest number of unique verbs is G (glossary, 44), which indicates this is potentially very useful as it adds over 50% more verbs than I had previously found. The verbs in the C (common) list may have nothing to do with cooking or food (we’ll explore that later in the post) so this may not add much. Only 5 verbs from the GallinaBlanca list don’t appear in the glossary list, so whoever compiled that got most of the cooking verbs.
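The tally above was done by hand, but the same coding could be sketched in a few lines of Python. This is only an illustration: the function name membership_codes and the tiny sample lists are hypothetical (the memberships shown are made up, not the real lists), while the counts in post_counts are the ones reported in the table and text above.

```python
from collections import Counter

def membership_codes(glossary, dictionary, common):
    """Map each verb to a 3-character code such as 'G-C' ('-' = absent from that list)."""
    lists = (("G", set(glossary)), ("D", set(dictionary)), ("C", set(common)))
    all_verbs = set().union(*(members for _, members in lists))
    return {
        verb: "".join(letter if verb in members else "-" for letter, members in lists)
        for verb in sorted(all_verbs)
    }

# Toy sample only; these memberships are illustrative assumptions.
G = ["achicalar", "cantar", "asar"]
D = ["asar", "albardar"]
C = ["abrir", "beber", "asar"]

codes = membership_codes(G, D, C)
tally = Counter(codes.values())
print(tally)

# Sanity check on the counts reported in the post (5 verbs are in all three lists).
post_counts = {"G--": 44, "-D-": 4, "--C": 35, "GD-": 28, "-DC": 1, "G-C": 9, "GDC": 5}
total = sum(post_counts.values())                  # verbs in at least one list
d_not_in_g = post_counts["-D-"] + post_counts["-DC"]  # GallinaBlanca verbs missing from G
print(total, d_not_in_g)
```

The last two numbers reproduce the 126 verbs and the 5 GallinaBlanca verbs missing from the glossary list mentioned above.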

So looking at the verbs that are only in the C (common) list and not in either cooking related list we do see a few surprising omissions (I’m assuming that these are SO common no one bothers to include them):

abrir --C to open; to turn on; to whet (as in appetite)
agregar --C to add
añadir --C to add
beber --C to drink
calentar --C to heat, heat up, warm up; to inflame
cocinar --C to cook
combinar --C to combine, mix; to put together, match, coordinate
comer --C to eat; to have for lunch; [Latin America] to have for dinner
concinar --C not in any dictionary, probably misspelling of cocinar
convertir --C to turn into, convert into, change into, make
cortar --C to cut, cut off, carve, slice, cut out; to chop; to cut (dilute sense); …

So out of the 35 verbs found only in the C (common) list, I’d probably include just these 11 in a general purpose culinary list.

Now some of the verbs in the G (glossary) don’t appear to be useful. Some have no definition in any of the dictionaries I routinely use, including the most authoritative of the Spanish language (which is NOT limited to Spain so could include verbs that don’t get used in Spain).  So here are a few I’d consider dubious to include in a culinary glossary:

achicalar G-- [Mexico] to cover in honey; soak in honey
añejar G-- to age; [vino] to mature; to get stale
apanar G-- to coat in breadcrumbs (also EMPANAR or EMPANIZAR)
apuntillar G-- to finish off (a toro); to round off
ataviar G-- to dress up
bardar G-- to thatch
blanchir G-- (not in dict) Wiktionary has it as a French term for make white
bresear G-- (from glossary) To cook to slow fire, during long time, with condiments (generally vegetables, wine, broth and spices). Clearly a spelling error since not found.
cantar G-- to sing; to crow, chirp
caramerizar G-- (not in dict), another spelling? [from glossary] Spread a mold with sugar honey.
castigar G-- to punish; to ground, keep in; to damage, harm
cerner G-- to sift, sieve (same as cernir, which is it?)
chapurrar G-- to speak badly

I wouldn’t include achicalar as it doesn’t appear to be used in Spain, but this is a good point about my goal here. If I wanted to know the Spanish word, used in Spain, for an English word, I wouldn’t include anything that may be only used outside Spain. But my goal is asymmetric – to translate Spanish (on menus) only into English (so I can choose), so including a word in my corpus (and eventually my app) that is not likely to be used in Spain is not a problem (I do need metadata to note this, however, for that term). If I never see the term it does no harm to never have it found in any lookup. OTOH, it would be a problem if I’m trying to translate English into Spanish, as in don’t use a word not found in Spain. It appears, for instance, that frijoles, which is well-known to most in the USA who visit Mexican restaurants, is one such word: not commonly used in Spain, though quite possibly a Spaniard would know it. That might lead to a scene (from The Way) like no tapas in Navarra, only pintxos, and thus make you look foolish.

blanchir (to make white, which isn’t exactly synonymous with blanch but one might assume that’s what this means) was interesting in that it did not occur in any dictionary but did have an entry in Wiktionary. The standard term for blanch is palidecer (purely in the sense of turn white) and escaldar or blanquear for the culinary sense. I suspect blanchir might be used somewhere (possibly Puerto Rico) where it is just the cognate of the English verb. But, again, in collecting the corpus I should not make judgments like this, although I might add metatext to a blanchir entry and meanwhile add it to the corpus and then let the “big data” statistical analysis decide if this is a word or not.

bresear really looks like a misspelling (more likely to be brasear, to barbecue) but again it should go into the corpus with a metadata notation rather than my passing judgment on it (IOW, only a real expert in Spanish should be deciding what to include or not in any translation dictionary, so if I find only one instance of a misspelled word it will get washed out since there are few occurrences of it in the corpus; OTOH, maybe people do commonly misspell this word so it needs to be in my app). caramerizar appears to be some variant of caramelizar, again perhaps used somewhere and not just a mistake. cerner has exactly the same definition (in the glossary itself, but also spanishdict) as the more common spelling cernir, although both appear in reverse lookup of ‘to sift’ in spanishdict (which is it, then? just a common confusion?). cernido is a possible term to see on a menu so it matters that my dictionary could spot this as the past participle of cerner.
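One way to attach that kind of “possible misspelling” notation without passing judgment is to compute near matches against a list of verbs already confirmed in a dictionary. The sketch below uses difflib from the Python standard library; the function name suggest_spellings, the KNOWN list and the 0.8 cutoff are all assumptions chosen for illustration, and the suggestions are only hints for later review, not corrections.

```python
import difflib

def suggest_spellings(word, known_words, cutoff=0.8):
    """Return up to three known words that closely resemble `word` (best match first)."""
    return difflib.get_close_matches(word.lower(), known_words, n=3, cutoff=cutoff)

# Hypothetical stand-in for a list of verbs confirmed in a trusted dictionary.
KNOWN = ["brasear", "caramelizar", "cernir", "cocinar", "cortar"]

for suspect in ["bresear", "caramerizar", "concinar", "cerner", "beber"]:
    # An empty list means no near match, so there is no hint to record for that word.
    print(suspect, "->", suggest_spellings(suspect, KNOWN))
```

A word like cerner still matches cernir here even though both are real verbs, which is exactly why the result belongs in metadata rather than being applied automatically.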

So again all this goes to show the work that must be done to really develop a very accurate dictionary that drives my app for menu translation (or to be published as a carefully researched culinary glossary).




How to use collected menus

I use this blog to document a project I’m doing which is to obtain an accurate and comprehensive set of terms (isolated words and phrases) to feed a smartphone app so I can “read” menus in Spain. To do this I am first collecting menus on my virtual “trek” (translating miles on a treadmill to position on the Camino de Santiago) and using Google map’s POI to find restaurants and then process those that have websites with some form of menu I can just extract (don’t want to be typing from images and make all those mistakes).

Most of the menus are in Spanish (rarely I can find one that is dual language, and even then: a) their translation may not be so great, and, b) the English menu may not be the same, so this can be tricky). So I use either Google translate (if the menu is standard HTML webpage) or some tedious copy-and-paste to use (really Microsoft) to translate. Of course these machine translations are often not that great (both wrong and miss many terms) and that is a big issue.

Doing this process is fairly mechanically tedious but doing it slowly also gives me a chance to really observe what is going on (plus get a bit of drill on words; my short-term memory of some Spanish terms is increasing, but based on past projects I know I’ll retain little of that). And, as I’ve documented in some posts, occasionally menu items completely befuddle the machine translation, which sends me off trying to figure it out myself, an interesting challenge since I have next to zero fluency in Spanish.

Now it is important to note my goal. Learning to speak and hear Spanish is entirely different, especially if you want to have conversations about almost anything (even if still oriented toward travel). I just need to be able to read menus (at least for my limited goal) and choose what I want. And I don’t need to translate in the other direction, so knowing whether ‘mushroom’ is hongo or seta doesn’t matter as much as going the other way.

And, of course, this also does imply knowing something about cuisine in Spain (which can be quite different than what we might encounter in restaurants in USA that happen to use Spanish on their menus). And it is turning out to require knowing something about agriculture in general in Spain, especially in different regions. An ingredient, like chorizo is: a) quite different than the Mexican style chorizo I’d find in markets or restaurants here, and, b) somewhat different in different regions in Spain as each has its own traditional way of making something like chorizo.

So after extracting menus from websites with some sort of translation I end up with side-by-side menu items, like below:

Gambas a la Plancha | Prawns on the Plate
Setas a la Plancha | Grilled mushrooms
Espárragos Especiales “Dos Salsas” | Special Asparagus “Two Sauces”
Ensalada Templada con Gulas y Rape | Tempered Salad with Gulas and Rape
Cogollitos de Tudela con Anchoas y Salmón | Tudela with anchovies and salmon
Tabla de Ibéricos | Iberian Table

I choose these particular items to make a couple of points:

  1. Notice that a la plancha occurs in two consecutive entries and, given gambas are prawns and setas are mushrooms, that means there are two different ways to both parse and assign a tentative meaning to a la plancha: either ‘grilled’ or ‘on the plate’ (more literal). So what does it really mean? The answer, btw, is that plancha is really “iron”, which means a cooking device, either a pan or a typical restaurant flattop, is used to “grill” the item.
  2. In the fourth item gulas appears (and didn’t get translated) and rape is quite ambiguous (is it the English word and therefore shouldn’t be translated or is it a Spanish word that means something entirely different?). gulas are baby eels (or possibly synthetic “worms”, like the fake crab) and rape is a type of fish with more than one translation (monkfish, anglerfish). So how can I use information like this?
  3. Cogollitos de Tudela got translated just to Tudela (the other words in this item are easy to match the Spanish and English). This is actually a flaw (I believe) in Google translation process. Cogollitos is looked up to get “A small heart or flower of garden plant” (or sometimes, just ‘buds’) and Tudela doesn’t appear in any dictionary but turns out to be a town (really just a reference location) where a particular type of lettuce (looks like Romaine) is grown and when served at restaurant the inner leaves are used (often in very attractive presentation). So this is a fairly classic ingredient and dish, especially in northeastern Spain but translation isn’t going to help much. So, a) how certain am I that I’ve figured this out correctly (or even how would I put some certainty on it, like how many different sources I found that confirm my guess at what this is? versus any counter-evidence), and, b) how should I use this information in my corpus.
  4. And what is “Iberian Table”? (a valid literal translation but not helpful). Now doing even a little research on menus one quickly learns that Ibéricos almost certainly refers to a prized pig, but how is it connected to Tabla? Sometimes one has to be careful here as I’ve already found an instance where silla (literally ‘chair’, but in the context, really ‘saddle’) refers to a cut of meat, so maybe the same is true with tabla? IOW, there is quite a lot of uncertainty here BUT this could be an important item to know. I suspect, BTW, it’s just a plate with some ham or other cured pork, like an antipasto.

So there are several steps in studying menus:

  1. the mechanical part of getting the Spanish aligned with some sort of translation to English
  2. studying the results for what appears to be clear one-to-one correspondence in terms. But beware – on this single menu both hongos and setas translate to mushrooms. Why are there two different words? (Previously hongos had shown up as primarily used in Latin America, not Spain, but obviously this menu contradicts that.) And if there is a difference (i.e. they’re not just synonyms), what is it? I have vague evidence hongos refers to cultivated button mushrooms and setas to wild mushrooms (like shiitake or others). That is a big difference.
  3. Some items translate very little, so can I find other sources to determine what these items might be? (sometimes yes, sometimes no) And even if I figure out what a word (e.g. Cameros from yesterday’s post) or phrase (a la riojana from yesterday’s post) is, these are not literal translations, so how do I mark these? For instance I believe Cameros refers to the mountains in southern Rioja and therefore potentially a breed (or just the husbandry of) sheep that would be recognized as distinctive (like Wagyu beef). If I figure this out: a) what confidence do I put on this information, and, b) how do I encode this information in my corpus?

Once a corpus is obtained the assumption is a kind of “big data” can help figure all this out (I haven’t quite figured out what code I’ll write for this; Google claims complex deep-learning AI as their method of training their translation and I don’t have the resources for that approach). But my assumption is that everything in my corpus will have multiple entries, and some a lot of entries. So in conjunction with my placing some sort of “certainty” weight on each pair and matching up pairs across a large data space, some sort of overall certainty can be derived (probably with a lot of exceptions that have to be looked at by human evaluation, which Google says they never do, which also might explain some of their odd translations).
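To make the idea of deriving an overall certainty a bit more concrete, here is a minimal sketch, not the code this project will actually use (that is still undecided). It assumes each observation is a (Spanish, English, weight) triple with a hand-assigned weight between 0 and 1, and it combines repeated observations of the same pair with a “noisy-OR” rule, 1 - (1 - w1)(1 - w2)..., which treats the sources as independent. Both the rule and the sample weights are assumptions made purely for illustration.

```python
from collections import defaultdict

def combine(observations):
    """Merge (spanish, english, weight) triples into {spanish: {english: score}}."""
    miss = defaultdict(lambda: 1.0)  # probability that every observation of a pair is wrong
    for spanish, english, weight in observations:
        miss[(spanish.lower(), english.lower())] *= 1.0 - weight
    scores = defaultdict(dict)
    for (spanish, english), m in miss.items():
        scores[spanish][english] = round(1.0 - m, 3)
    return dict(scores)

def needs_review(scores, threshold=0.5):
    """Terms with more than one translation at or above the threshold (for a human to check)."""
    return sorted(
        term for term, senses in scores.items()
        if sum(score >= threshold for score in senses.values()) > 1
    )

# Invented weights, for illustration only.
observations = [
    ("a la plancha", "grilled", 0.6),
    ("a la plancha", "on the plate", 0.3),
    ("a la plancha", "grilled", 0.7),
    ("trucha", "trout", 0.9),
    ("guisado", "stew", 0.5),
    ("guisado", "stewed", 0.5),
]

scores = combine(observations)
print(scores)
print(needs_review(scores))
```

Repeated sightings of the same pair push its score up, while terms with competing translations of similar strength get flagged for the human evaluation mentioned above instead of being resolved automatically.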

So, just to finish this let me provide an example. From this single menu I extracted (manually, can’t quite imagine how to do this in code) the following table of “pairs” where I’m relatively certain these are correct. IOW, these are mostly just the terms derived via literal translation not the more complicated cases where a lot of guessing is required.

Note: more discussion after this table, please scroll down.

Spanish | English | Spanish | English
a la Plancha | Grilled; on the Plate | Lechal | Baby lamb
a la Vinagreta | Vinaigrette | Lenguado | Sole
Agua | Water | Limón | Lemon
al Horno | Baked | Macarrones | Macaroni
Albóndigas | Meatballs | Menestra | Stew
Anchoas | Anchovies | Merluza | Hake
Arándanos | Blueberries | Milhojas | Fillets
Arroz | Rice | Mixta | Mixed
Asado | Roasted | Oveja | Sheep
Bacalao | Cod | Pan | Bread
Bebida | Drink | Patatas | Potatoes
Berenjena | Eggplant | Pato | Duck
Bistec de Ternera | Beef Steak | Pescados | Fish
Calabacín | Zucchini | Pimienta | Pepper
Calamares | Squid | Pimientos | Peppers
Carne | Meat | Postres | Desserts
Carrilleras | Cheek pieces | Precio | Price
Cerveza | Beer | Primeros Platos | First courses
Codillo | Knuckle | Puerros | Leek
compartir | share | Pulpo | Octopus
Cordero | Lamb | Queso | Cheese
Croqueta | Croquettes | Rape | Anglerfish
de la Abuela | Grandma’s | Rebozado | Coated
de la Casa | of the House | Refresco | Soda
elegir | choose | Rellenos | Stuffed
en su Tinta | in ink | reservas | reservations
Ensalada | Salad | Revuelto | Scrambled
Entrantes | Starters | Rojo | Red
Espárragos | Asparagus | Sabores | Flavors
Fresco | Fresh | Salsa | Sauce
Frutas | Fruit | Setas | Mushrooms
Gambas | Prawns | sobre | on
Gaseosa | Soda | Solomillo | Sirloin
Guisado | Stew; Stewed | Tarta de Queso | Cheesecake
Helado | Ice cream | Tomate | Tomato
Hongos | Mushroom | Trucha | Trout
Huevo | Egg | Verduras | Vegetable
Incluye | Includes | Vino | Wine
Jamón | Ham | Yogurt Griego | Greek Yogurt
Judías Verdes | Green Beans | |

So a single menu provided a significant (about 80 items) source of raw material to feed into my corpus. Now I’ll just note a few things as to whether further processing should be applied to this list before adding it to a corpus (or, IOW, what metadata should also be embedded in the corpus).

  1. Judías Verdes ‘green beans’: Should there be an entry verdes as ‘green’ and judías as beans? Now in Spanish adjectives match their noun in both number and gender so verdes might not be the lookup dictionary form for ‘green’ (it’s not, the singular verde is). So that could introduce some confusion in the corpus. And ‘bean’ has multiple translations, with often one word being used for the dried beans (or the seeds in the bean pod) versus the whole bean, as in typical green beans.
  2. What about Guisado? This had two literal translations by Google: ‘stew’ and ‘stewed’. And in English those are not the same thing even though they’re related. guisado is the past participle of the verb guisar which can mean either just simple ‘cook’ or also ‘stew’. The contexts in this menu for the two uses of guisado are “Cordero Guisado” and “Cordero Guisado con Pimientos”, so why is Google convinced it’s ‘stew’ (the noun) and ‘stewed’ (the conjugated verb) in these two contexts? Is it right?
  3. Another thing I noticed is that often the English translation doesn’t match the Spanish in number. Figuring out plural and singular forms in a corpus analysis process could be interesting, so putting in an incorrect corresponding pair could be problematic.
  4. And, finally (for today) nouns probably fit into a literal translation mode easier than other parts of speech, or especially colloquial usage, so trucha as trout is fairly high certainty but what about mixta as ‘mixed’? It was used in the context of ensalada (salad) and that item appears to be a typical mixed salad (often “house” salad in US restaurants) but the literal translation of ‘mixed’ would be more likely  variado or diverso; mixta doesn’t occur in lookup dictionary at all, but mixto does in the sense of mixed of both sexes (i.e. a group of people), so why did the salad menu items decide to use feminine form or even mixto at all?
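The number and gender issue in points 1 and 4 can be illustrated with a very naive sketch that generates candidate dictionary forms for a word seen on a menu and tries each one in turn. The function names, the suffix rules and the three-entry dictionary are all hypothetical; real Spanish morphology has many more cases than this handles, so treat it as a starting point only.

```python
def candidate_lemmas(word):
    """Return possible dictionary forms for a menu word, the word itself first."""
    w = word.lower()
    singulars = [w]
    if w.endswith("es") and len(w) > 3:
        singulars += [w[:-2], w[:-1]]      # 'verdes' -> 'verd' (wrong guess), 'verde'
    elif w.endswith("s") and len(w) > 2:
        singulars.append(w[:-1])           # 'judías' -> 'judía'
    candidates = []
    for form in singulars:
        if form not in candidates:
            candidates.append(form)
        if form.endswith("a"):             # feminine -> masculine guess: 'mixta' -> 'mixto'
            masculine = form[:-1] + "o"
            if masculine not in candidates:
                candidates.append(masculine)
    return candidates

def lookup(word, dictionary):
    """Return (lemma, gloss) for the first candidate found in the dictionary, else None."""
    for candidate in candidate_lemmas(word):
        if candidate in dictionary:
            return candidate, dictionary[candidate]
    return None

# Tiny hypothetical dictionary keyed by the usual lookup forms.
TINY_DICT = {"verde": "green", "mixto": "mixed", "judía": "bean"}

for menu_word in ["Verdes", "Mixta", "Judías", "Trucha"]:
    print(menu_word, "->", lookup(menu_word, TINY_DICT))
```

Wrong guesses such as 'verd' are harmless because they simply fail the lookup, but recording which surface form actually appeared on the menu (verdes, mixta) alongside the lemma keeps the corpus from mixing the two.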

So there are lots of challenges: extracting the raw data itself, assigning some metadata to the pairs to qualify how they should be treated in the corpus, and especially assigning some certainty value (i.e. like a probability, where 1.0 would probably never occur (there is always some ambiguity) and 0.0 is meaningless to even include). BUT maybe a single scalar value is insufficient since it’s possible to have highly incompatible, if not even mutually exclusive, interpretations.

So all of that is a lot of design work to do and then probably an iterative process once I get some code that can crunch the corpus (thus far, I’ve done some by hand to look for design issues). And, fundamentally, is this even a process I can automate at all or at most the code just brings together related pairs for me to analyze with my intelligence.

Who knows, time will tell.

p.s. [personal]. Doing this mechanical work (and some background study as I go along) and also writing these posts is definitely cramming some Spanish into my brain, but I also know that’s a short-term effect. A year from now I’m not going to remember guisado is the past participle of guisar or that it is related to stews/stewing (as a cooking process). So converting this work into: a) a more permanent and usable form (like a smartphone app to carry with me to Spain), and/or, b) some drill programs so I could “brush up” just before leaving, has a more useful effect.


Finished the GallinaBlanca Diccionario

I’ll explain what “finished” means in a minute but first I am almost at another milestone in my journey, so 1/2 mile outside Nájera, about 20 miles from Logroño and about 60 miles to reach Burgos, on my virtual  camino trek. That is since I’m stuck here in the cold midwest USA I do miles on my treadmill in the basement (training for the Camino, I wish!!!) and translate those boring miles onto a GPS track of the Camino de Santiago and then, most of the time, do a little “walking” courtesy of Google StreetView (the Camino is hardly a wilderness trail if a Google car is driving on it).

So what does it mean that I say I finished the GB dictionary? Well, it means the tedious part is over. Their dictionary is provided via Javascript popups and one page for each letter of the alphabet and thus: a) there is no way to easily grab all the terms out of the HTML, and, b) Google Translate doesn’t operate on the popups. So I have to manually click each term, use the mouse to get the text of its definition in Spanish, paste that in my MSWord document and in the webpage, get the translation (which it turns out seems to actually be provided by Microsoft; I tried the translation built into MSWord itself and it was pretty ragged), mouse that translation and then paste it in the side-by-side table. Then I take the term and attempt to get a simple literal translation (more pasting, possibly into three different webpages).

Needless to say this is big-time tedious (and slow) and that’s what I’ve finished. It may be tedious but going slowly through the list means I take the time to study each result. Often even from the English translation of the definition of the term I really don’t know what the English word would be, which makes that lookup sometimes a surprise. Since this is a specialized vocabulary for cooking many of the terms are more obscure and thus missing in dictionary lookups so it’s off to doing searching and guessing and trial-and-error until I get a reasonable answer. Lots of work but a good learning experience.

So now I have that “done” (probably a few mistakes I’ll have to clean up). So I have pages of stuff like this:

HERVIR (literally boil) Cocer en líquido a una temperatura de 100º. Cook in liquid at a temperature of 100 º.
HORNEAR (literally bake) Cocer en el horno mediante calor seco. Cook in the oven with dry heat.
HUMEAR (literally smoke or steam, and one sense is exactly this definition? ahumar is the culinary verb) Se dice cuando el aceite desprende humo, indicando que está caliente, a punto. It is said when the oil emits smoke, indicating that it is hot, ready.
INCORPORAR (literally incorporate, add, include and mix in) Agregar, unir algo a otra cosa para que haga un todo con ella. Add, join something else to do a whole thing with it.
INSTILAR (literally instill) Echar poco a poco, gota a gota, un líquido en otra cosa. Slowly pouring, drop by drop, a liquid into something else.
LAMINAR (literally laminate) Cortar en láminas muy finas. Cut into very thin slices.

So what am I going to do with this now?

I deliberately picked a chunk of the dictionary that is all verbs because that’s my first attempt to create something derived from this list. There are a lot of verbs in this dictionary because it accompanies recetas (recipes) and these verbs (in some conjugated form) probably occur in the collection of all those recetas. So GallinaBlanca is nicely helping cooks read recetas that might contain a verb they don’t know. There are some fairly obscure verbs in the list.

Now what has this got to do with reading menus which is the focus of my project. Rarely are the menus (at least the list of items you can order) going to have complete sentences explaining the food (perhaps a brief, just a phrase, description). So verbs don’t much matter.

Or do they? A word you will frequently see on menus (even in names of restaurants) is asado. This is grilled or roasted (as an adjective perhaps modifying some noun) or even just a noun in its own right, grill or roast. But this word has its root in a verb, that is asar (in the infinitive form, i.e. the typical word to look up in a dictionary). Note: Online dictionaries are often smart enough to handle conjugated forms but typical non-interactive dictionaries (paper or smartphone) require you to see this is a conjugation of a verb and deduce the infinitive form to do the lookup – not easy if you’re unfamiliar with Spanish. asado is the past participle of asar and, as Spanish verbs are far more regular (some exceptions) than English, there is almost an algorithmic rule to form the past participle from the infinitive (like to bake and baked as a regular case in English). So in a quick extract from my list here are a couple more examples: hervir (to boil) hervido (boiled), estofar (to stew) estofado (stewed), picar (to mince or chop) picado (minced).
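As a rough sketch of that near-algorithmic rule (regular verbs only): -ar infinitives take -ado, and -er/-ir infinitives take -ido. The reverse direction, going from a word seen on a menu back to possible infinitives, is what a lookup would actually need. The function names are hypothetical, irregular participles such as frito (from freír) are not handled, and the reverse function only proposes candidates that still have to be checked against a real verb list.

```python
def past_participle(infinitive):
    """Regular past participle: asar -> asado, hervir -> hervido."""
    inf = infinitive.lower()
    if inf.endswith("ar"):
        return inf[:-2] + "ado"
    if inf.endswith(("er", "ir")):
        return inf[:-2] + "ido"
    raise ValueError(f"not a recognizable infinitive: {infinitive!r}")

def candidate_infinitives(word):
    """Guess infinitives for a participle-like word, allowing gender/number endings."""
    w = word.lower()
    for marker, endings in (("ad", ("ar",)), ("id", ("er", "ir"))):
        for tail in ("o", "a", "os", "as"):
            suffix = marker + tail
            if w.endswith(suffix):
                stem = w[: -len(suffix)]
                return [stem + ending for ending in endings]
    return []  # does not look like a regular participle

for verb in ["asar", "hervir", "estofar", "picar"]:
    print(verb, "->", past_participle(verb))

for menu_word in ["asado", "rebozado", "cernido", "patatas"]:
    print(menu_word, "->", candidate_infinitives(menu_word))
```

Because -er and -ir verbs share the -ido ending, a word like cernido yields two candidates and only a dictionary check can say which exists. Nouns that merely end in -ada or -ado will also produce bogus candidates, so the check against a verb list is not optional.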

So knowing some cooking verbs could come in handy. Memorizing them all is probably a waste of time but as I intend to collect everything I’ll need this in my smart app that is going to translate menus (having all the conjugations is then easy as well).

But I don’t like to depend on a single source for literal translation (each verb to its most direct English equivalent). Plus some verbs have a ton of different meanings and they are not always labeled as being the culinary sense in every dictionary. And some verbs don’t have much connection, given GallinaBlanca’s definition to the standard (at least online) dictionary definitions. For instance, this tough one to figure out:

ALBARDAR (literally: to saddle, put a  packsaddle on)  Envolver piezas de carne con lonchas finas de tocino, para evitar que se sequen al cocinarlas. Wrap pieces of meat with thin slices of bacon to avoid drying when cooking.

I suppose one might deduce that wrapping meat with bacon is “saddling” it, but really the clue comes from this:

Saddle is a butchery term that refers to the meat that is at the animal’s back and hips. Think of it in terms of the meat that would be in more or less the same place as a saddle on a horse.

I’ve done a fair amount of cooking (and reading cookbooks) and ‘saddle’ as a cut of meat never registered. Or what about this one:

CINCELAR (literally chisel, carve, engrave) Hacer incisiones en una pieza (se utiliza sobre todo para pescados) para facilitar su proceso de cocción, generalmente en los asados. Make incisions in one piece (mainly used for fish) to facilitate their cooking process, usually in roasts.

I’ve done exactly this cooking fish (and more so bread) but I don’t think I’d use any of those literal English verb equivalents to describe the process.

So there is a lot to learn from these verbs. And as I said I don’t like single sources so I sometimes use a page here in this blog (test data) to paste some Spanish in, view that page, and then fire up Google Translate (maybe there is some simpler way but this works without too much hassle).

Now from what I’ve read about Google Translate, context matters. So a pure list of verbs, especially in infinitive form, eliminates any possibility of a contextual AI-ish translation and thus gets just a simple literal translation. For verbs with many meanings there is nothing to clue Google about which one to use.

So it was interesting to see how Google did on this translation. I found a total of 132 verbs in GallinaBlanca dictionary. Of these the following 44 had no Google translation:


Now Google can be forgiven (except it claims its AI does better than rule-based literal translation) for the verbs in RED since none of my dictionaries know what these are. For instance I actually think acidelar is just a typo since the definition GB gives it, “Put lemon juice or vinegar in the water to cook poached eggs or vegetables, so that they do not blackened.”, is fairly similar to that of the known acidular, whose definition is “Sprinkle with an acidic liquid fruit, vegetables or vegetables so that they retain their whiteness or colour.” But the definitions are not exactly the same and for me to declare acidelar to be a mistake is premature; after all it could be some alternate spelling or perhaps a regional difference from the standard dictionary Spanish, or, worse, it might be the spelling used in Spain versus what is used elsewhere. I simply do not have enough data to decide.

So what about something like

MOREAR (not in any dictionary) Dar vuelta sobre el fuego bajo y con un poco de aceite en un sartén o cacerola a los alimentos, para que tomen color antes de añadirle salsa o caldo. Turn over the low heat and with a little oil in a frying pan or pan to the food, so that they take color before adding sauce or broth.

This comes up blank in all dictionaries and most web searches I’ve tried. So the question is do I believe this is even a word (or perhaps it’s from some other language used in Spain). It certainly sounds like sauté (cooking technique) but that is saltear, which GB defines as “Stir the food in butter or hot oil when frying in an uncovered skillet.”

Now for the words not in RED I did find literal translations of them, including ASAR, which I find surprising that Google doesn’t know (this, as you recall, is the verb I used as the example above to explain why I’m investigating verbs, i.e. it is the infinitive root for asado, a very common word on menus). And I’m also surprised it didn’t know GUISAR (cook, stew; cook up) since I can recall from memory seeing that and especially its past participle guisado (refers, as a noun, to casserole, stew, or, most generically, dish, and as an adjective to stewed). And I’ve seen rebozado (covered in batter or breadcrumbs) on numerous menus and it’s the past participle of REBOZAR (to coat in batter or breadcrumbs) that Google didn’t know. Now, OTOH, TRUFAR (try to guess before reading the translation) is probably sufficiently obscure that Google may not have seen it, but given the price of the item for this word you’d want to know what it means if you saw it on a menu (it means to stuff with truffles).

Now as for the other verbs, for which Google did have some translation, I’m going through a somewhat tedious process of digging out (again, but this time in a single consistent process) the literal translations so I can compare Google to other sources. And sources are going to matter. Not only is it hard to say with absolute certainty what an appropriate translation is going to be (I believe even fluent Spanish speaking authorities might debate some verbs), I also need to do this comparison of various sources in a systematic way, not believing one source over another until I can potentially “confirm” a translation via some processing of a large corpus of translated food related material, IOW, exactly what I’m building up now.

For the verbs Google did translate here are a few of the issues I’ve found thus far (not done with this analysis):

  1. Often Google chooses the present participle as the translation instead of the infinitive, e.g. for ADOBAR Google says marinating instead of to marinate. Not a big deal overall, but this might get into a corpus and create a statistical flaw later in the analysis.
  2. For AVIAR Google picked the most literal translation, namely the adjective ‘avian’, rather than to prepare as the root verb. The verb has multiple meanings; this one matches the GB definition, “Prepare birds for cooking. It consists of all pre-elaborations that must be made to a piece: cleaning, flamed, wicking, flanged, etc.” Note that GB has defined this in a more specific way than Google did and, given the Latin root for both the verb and the adjective, the GB definition is definitely superior (plus being more useful to understand in the context of cooking).
  3. Google picks one of several literal translations, but not the culinary sense (which I look for because I know culinary is the context), e.g. BRIDAR, which Google translates as ‘bridle’ (literally OK), but to tie or truss is much more useful in the cooking sense.
  4. Or something like DESPLUMAR, for which Google picks the present participle Fleecing, which is a plausible translation. But the GB definition is “Remove the feathers from the bird.” which comes closer to an alternate definition, ‘to pluck’. Amazingly, fleecing here is a colloquial usage somewhat like the English one, where someone is taken advantage of and thus “fleeced”.
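
The first issue in the list (a participle where an infinitive is expected) is the kind of thing that could be flagged mechanically before it gets into the corpus. Here is a minimal sketch in Python. The function name classify_gloss and the crude rule it uses (a gloss starting with "to" counts as an infinitive, a first word ending in "ing" counts as a participle) are assumptions for illustration only, and the rule will misfire on words like "bring". It only flags entries for a manual look and does not decide which translation is right.

```python
def classify_gloss(gloss):
    """Roughly classify an English gloss given for a Spanish infinitive."""
    words = gloss.strip().lower().split()
    if not words:
        return "missing"
    if words[0] == "to" and len(words) > 1:
        return "infinitive"      # e.g. "to marinate"
    if words[0].endswith("ing"):
        return "participle"      # e.g. "marinating" (crude suffix test)
    return "other"               # adjective, noun, bare verb, etc.


# Google's glosses for the four verbs discussed in the list above.
glosses = {
    "ADOBAR": "marinating",
    "AVIAR": "avian",
    "BRIDAR": "bridle",
    "DESPLUMAR": "Fleecing",
}

# Verb -> rough category, so non-infinitive glosses can be reviewed by hand.
flagged = {verb: classify_gloss(gloss) for verb, gloss in glosses.items()}
```

Anything not classified as "infinitive" would then go on a list to check against the other sources.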

I’m sure there will be more as I finish grinding through but this post, already TMI, hopefully gives a sense of how I’m post-processing the pure mechanical part of my study to pound the raw data into a more usable form to then create my corpus (all preliminary to creating my AI-ish smart menu translator).




Adventure in menu and receta

As I’ve mentioned I’m doing a “virtual” trek of the Camino by converting the miles I do on a treadmill in my basement to locations along the Camino (I found a detailed GPS track). It would be way more fun to be walking the real Camino but this is better than nothing, especially as the Google cars cover most of the route (or nearby) and so I can use StreetView to “see” my surroundings. Plus for restaurants in the towns usually there are many photos and sometimes websites with menus.

My progress in this virtual walk is much slower than I’d have to do (and could do) on a real walk but it’s fairly steady. So a few days ago I passed through Logroño and found as many online menus as I could (saved the URLs for later). Meanwhile the recipe dictionary I’ve been studying has taken most of my time. But my trek continues and I’m at the edge of Navarrete so I thought I should at least dig through one of the menus from Logroño.

So I’ve been looking at Tondeluna’s website. The carta actually has English translations BUT, unfortunately that document is in the form of a PDF that is locked and so I can’t extract any text from it to create my corpus entries. I’m not going to try to type it into my corpus because I’ll make too many mistakes and create bad data. But, the group menu, also a PDF, does permit me to extract its text but it has no translations so I have to do all the tedious manual mouse work to finally get side-by-side Spanish and English.

Right away I encountered this:

COCINA CLÁSICA PARA PICAR  AL CENTRO Y SEGUNDOS INDIVIDUALES Classic kitchen for center-chopping and individual seconds

Now “center-chopping” is one of those translations that immediately catches my attention (as wrong, even often silly) so I tried to figure it out. Starting with picar, whose first couple of translations (sting, bite, peck at) don’t make a lot of sense, I realized I’ve seen this before and the more useful translation (way down in spanishdict’s list) is the colloquial meaning (nibble on or snack). But what has centro got to do with it? In addition to the obvious cognate center, ‘middle’ is another translation, and I realized this must have something to do with putting the starters and appetizers of this menu in the middle of the table for all the diners to snack on. Then segundos individuales implies that those plates are served to each individual.

In trying to figure this out my various searches (about how meals get served in restaurants) didn’t directly answer my question but one article had a somewhat different description of ración than what I’d previously found. [btw: That article has lots of interesting information about restaurant meals] They claimed that, unlike how it sounds, a ración is too large for one person and so it is typically shared and, obviously, it would probably be placed in the middle of the table. OK, fine, but I still can’t quite decide how to interpret these entries (and haven’t found anything online to explain them):

LAS CROQUETAS que mi madre Marisa, nos enseñó a hacer (al centro 2 unidades por persona) The croquettes that my mother Marisa, taught us to do (to the center 2 units per person)
LA ENSALADILLA RUSA de Tondeluna con mahonesa aireada (1×3 al centro) The Russian salad of Tondeluna with aerated mayonnaise (1×3 to the center)

It’s this bit, (al centro 2 unidades por persona), that makes some sense. The ración will include two croquetas per person on a big serving platter al centro (this is just for pricing, so one person could eat one and another three, or whatever). But what about (1×3 al centro)? Given this is salad, not a discrete item like a croqueta, what are they saying about how much is included in the ración? Is it one “serving” for every three people or three servings per person or what? The other ENTRANTES FRÍOS on Menu1 (25€, the cheapest of the five different group menus) is

TARTAR ALIÑADO DE SALMÓN aliñado con lima y alga wakame (1×3 al centro) Seasoned tartare of salmon with lime and wakame seaweed (1×3 to the center)

and that is no more illuminating on this point. Scanning further in the more expensive menus reveals:

CARPACCIO DE GAMBA sobre tartar de tomate, dátiles, cebollino y ajo blanco (1×4 al centro) Prawn CARPACCIO on tomato tartare, dates, chives and white garlic (1×4 to the center)

so clearly this 1xN is some kind of notation indicating quantity that is put on the serving platter in the middle of the table, but I don’t get it. It doesn’t much matter to me since I wouldn’t be in the restaurant with a group so I’d just be ordering off the carta (which I can read online, some good items, but can’t (easily) add to my corpus).
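
Even without knowing what the notation means, it can at least be pulled out of the menu text so the numbers are available once the meaning is settled. The following is a minimal sketch in Python. The names ANNOTATION and find_share_notes are made up for illustration, and the pattern only covers the "(1×N al centro)" form seen in the entries above. It makes no attempt to interpret the two numbers.

```python
import re

# Matches "(1×3 al centro)" or "(1x4 al centro)"; accepts either the
# multiplication sign or a plain letter x between the two numbers.
ANNOTATION = re.compile(r"\((\d+)\s*[×x]\s*(\d+)\s+al centro\)", re.IGNORECASE)


def find_share_notes(line):
    """Return a list of (first, second) number pairs found in a menu line."""
    return [(int(a), int(b)) for a, b in ANNOTATION.findall(line)]


salad = "LA ENSALADILLA RUSA de Tondeluna con mahonesa aireada (1×3 al centro)"
carpaccio = ("CARPACCIO DE GAMBA sobre tartar de tomate, dátiles, "
             "cebollino y ajo blanco (1×4 al centro)")
croquetas = ("LAS CROQUETAS que mi madre Marisa, nos enseñó a hacer "
             "(al centro 2 unidades por persona)")

salad_notes = find_share_notes(salad)          # one pair: 1 and 3
carpaccio_notes = find_share_notes(carpaccio)  # one pair: 1 and 4
croqueta_notes = find_share_notes(croquetas)   # different wording, no match
```

The croquetas entry uses different wording, so it deliberately falls outside this pattern and would need its own rule.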

Moving on, this work led off in a different direction. One item on the group menu:

SAN JACOBO DE LENGUA  y queso de Cameros y salsa de champiñones (1/2 individual) SAN JACOBO de LENGUA y queso de cameros and mushroom sauce (1/2 individual)

It’s surprising to me that queso didn’t get the obvious literal translation so chasing down queso de cameros was my first quest which had a simple reference (a goat cheese originating in the Sierra de los Cameros in La Rioja) which makes sense given this is a restaurant in La Rioja. It wasn’t hard to get the literal of lengua (tongue) but otherwise this dish remained mostly a mystery. But as I’ve found before it’s likely the San Jacobo qualifier would lead to a fairly specific dish in a search and it did. There are photos and multiple links to recipes so I chose to look at this recipe.

Long story short I went through all the actual cooking instructions (not sure why but called Elaboración on this webpage) doing my side-by-side (Google translation) and then analyzing the entries. This one caught my attention:

Emplata a tu gusto los san jacobos con la ensalada y sirve . Emplata to your taste the san jacobos with the salad and serves.

Google couldn’t figure out emplata nor could spanishdict but interestingly wiktionary had an entry that made total sense (in the context of this step of the recipe) – to plate. So I realized I’d found a “cooking” verb I hadn’t encountered before (I have a running list of these). So I decided to find all the verbs in all the steps of the recipe and ended up with several new ones for my list (in some cases it’s the meaning, in this context, that is new as the verb had appeared before in some other sense). So just from this one recipe I found all these:

añadir to add
cocinar to cook
condimentar to season
cortar to cut
cubrir to cover
dejar to leave or let
desgranar to shell
emplatar to plate
enfriar to cool
escurrir to drain
freír to fry
introducir to insert
mezclar to mix
pasar to pass
pelar to peel
poner to put or add
repetir to repeat
servir to serve
subir to rise
trocear to cut up

Pretty nifty, eh, plus some practice reading Spanish. I know that repetition is key to learning a language, so every time I go off on one of these digressions a bit more sinks in.
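
The bookkeeping for the running verb list (which verbs are brand new, and which are already known but turn up with a new meaning) could be sketched like this in Python. The function merge_verbs and the starting contents of running are hypothetical. In particular the earlier sense "to happen" for pasar is a made-up placeholder, not an entry from my actual list.

```python
def merge_verbs(running, found):
    """Merge verbs found in one recipe into the running list (in place).

    running maps a verb to the set of glosses seen so far;
    found maps a verb to the gloss seen in this recipe.
    Returns (new_verbs, new_senses) as two lists of verbs.
    """
    new_verbs, new_senses = [], []
    for verb, gloss in found.items():
        known = running.setdefault(verb, set())
        if not known:
            new_verbs.append(verb)       # never seen this verb before
        elif gloss not in known:
            new_senses.append(verb)      # seen before, but not in this sense
        known.add(gloss)
    return new_verbs, new_senses


# Hypothetical running list before this recipe.
running = {"cocinar": {"to cook"}, "pasar": {"to happen"}}
# Three of the verbs found in the San Jacobo recipe.
found = {"cocinar": "to cook", "emplatar": "to plate", "pasar": "to pass"}

new_verbs, new_senses = merge_verbs(running, found)
```

Keeping a set of glosses per verb, rather than a single translation, fits the point made earlier about not believing one source or one sense over another too soon.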