De nada por usar mi glosario

Thanks for using my glossary.

I’ve noticed an uptick in hits on my glossary and verbs pages, and that’s fine with me. I’ve spent a lot of time searching online for clues about Spanish food words and have found a lot of material, so now that I’ve assembled my list I’m happy to give back. I hope no one extracts my glossary directly, not so much because I’m worried about IP (intellectual property) rights but because anything online should be considered of dubious authority.

In my glossary I tried hard to get reasonably accurate English equivalents, but: a) I have only a little Spanish fluency and thus might easily make mistakes, and, b) without thorough proofreading and review even correct information may have errors, especially typos.

Also, exactly where (geographically) a word is used matters. Once, when I blindly compiled and merged some glossaries, I didn’t realize that the same Spanish word doesn’t have the same meaning in Spain, Mexico, Puerto Rico and other Spanish-speaking countries, especially for food. If you think this is a minor issue, see what you get (being used to Western Hemisphere food) when you order a tortilla in Spain or a tostada in a Mexican restaurant in the USA.

I think I’ve found most of the online glossaries and dictionaries, and be assured I’ve found lots of mistakes and inconsistencies in them. So it would be a big surprise if mine didn’t also have numerous mistakes. I’d love it if readers commented with corrections or even disputes about word meanings. I’ve found a few websites where people do debate the meaning of Spanish words but, thus far, haven’t managed to bring that kind of discussion to my blog, which I’d really like. But oh well, at least someone is extracting something from my glossary.

I’d also note that both my glossary and my cooking verb list are ongoing projects, especially the verb list. I am spending as much time as I can immersed in Spanish, but recently mostly on actually learning Spanish (I gave in and finally signed up for a real course, not just my self-study). So I have less time to continue improving my glossary (I have a lot of source material I could compile, edit and add to it), and I really need to get back to my verb list (quite extensive, but not thoroughly researched or published).

I enjoy doing this project even without any sharing with anyone, but it’s even better if the work I’m doing somehow benefits someone else. But, again, while I don’t mind people extracting material from this site, just be aware its quality is somewhat suspect and you should not think this is some authoritative source. Use my material but use your own judgment about its accuracy.

Animales en el menú

Just in case you ever dine at a restaurant in a Spanish-speaking country where some more unusual animals might appear on the menu, I thought I’d give you a list.

I like making lists, and this was a fairly easy one as there were numerous sources with only a few contradictions. Of course, a constant challenge with picking up Spanish words from the Net is that some of these might be regional. And while I like crunching through lists (and I think this one is the largest you’ll find), it’s a lot more work (than I want to do now) to research these palabras (words) and see if they’re really 100% accurate, at least according to authoritative sources.

Note: While I list the source numbers (first column) in this table, just to show how frequently a word appears in the various lists, I don’t identify the sources themselves, so this is fully derivative work and not credited to the original sources. The words shown in bold are my choices for one word for each letter, usually something that was originally Spanish and was imported into English.

So enjoy, and beware when dining: if these appear on the menu, think about whether you actually want to order them.

1 5 abeja bee
1 áfido aphid
2 aguaviva jellyfish
1 3 4 5 águila eagle
4 aguilón large eagle
5 aguja del diablo dragonfly
1 alacrán scorpion
3 5 albatros albatross
1 4 alce elk
1 3 4 5 alce moose
4 alce de américa moose
1 5 almeja clam
1 alondra lark
1 alpaca alpaca
1 5 anchoa anchovy
1 5 anfibios amphibian
1 5 anguila eel
1 3 4 antílope antelope
1 2 5 araña spider
2 ardil squirrel
1 3 4 5 ardilla squirrel
3 4 ardilla listada chipmunk
3 ardilla voladora flying squirrel
1 5 arenque herring
1 3 4 armadillo armadillo
1 asno donkey
1 5 atún tuna
1 4 ave general class of birds (also a large bird)
1 3 4 5 avestruz ostrich
1 3 5 avispa wasp
3 babuino baboon
1 5 bacalao cod
1 3 bagre catfish
1 2 3 4 5 ballena whale
3 ballena jorobada humpback whale
1 barracuda barracuda
4 becerro calf
4 bicha snake
2 bichos bugs
1 3 4 bisonte bison
1 4 boa boa
4 borrego lamb
1 4 5 buey ox
1 3 4 5 búfalo buffalo
4 búfalo de agua water buffalo
1 2 4 5 búho owl
1 3 5 buitre vulture
4 burrito baby donkey
1 2 3 4 burro donkey
1 4 caballa mackerel
1 caballito de mar seahorse
1 2 3 4 5 caballo horse
3 caballo de mar seahorse
1 2 3 5 cabra goat
4 cabra montés mountain goat
4 cabrito baby goat
1 cacatúa cockatoo
2 3 4 cachorro puppy
1 3 4 5 caimán alligator
1 3 5 calamar squid
3 calamar gigante giant squid
1 3 camaleón chameleon
1 5 camarón shrimp
1 3 4 5 camello camel
1 5 canario canary
1 2 5 cangrejo crab
1 2 3 4 5 canguro kangaroo
1 capibara capybara
1 2 3 caracol snail
5 caribú caribou
4 carnero ram
1 3 4 5 castor beaver
1 2 3 4 5 cebra zebra
1 2 3 4 5 cerdo pig
4 cerdo salvaje wild hog
3 cerdo vietnamita pot-bellied pig
1 chacal jackal
1 3 4 5 chimpancé chimpanzee
1 5 chinche bedbug
5 chita cheetah
4 chivo goat
4 chucho dog
1 3 5 ciempiés centipede
2 3 4 ciervo deer
1 cigarra cicada
1 4 5 cigüeña stork
1 3 5 cisne swan
3 coala koala
1 2 cobaya guinea pig
1 4 cobra cobra
4 cochino pig  (usually live)
4 cocodrilo alligator
1 2 3 4 5 cocodrilo crocodile
1 codorniz quail
1 3 5 colibrí hummingbird
1 3 comadreja weasel
1 cóndor condor
3 conejillo de indias guinea pig
1 2 3 4 5 conejo rabbit
4 cordero lamb
4 coto monkey
4 cotorra parrot
4 couger cougar
1 4 coyote coyote
5 crustáceos crustacean
1 2 3 5 cucaracha cockroach
1 cuco cuckoo
1 5 cuervo crow/raven
4 5 culebra snake
1 danta tapir
1 2 3 4 5 delfín dolphin
1 demonio de tasmania Tasmanian devil
1 dingo dingo
3 dragón dragon
1 dragón de komodo Komodo dragon
1 dromedario dromedary
1 2 3 4 5 elefante elephant
3 elefante africano African elephant
3 elefante asiático Asian elephant
1 emú emu
1 3 erizo hedgehog
1 erizo de mar sea urchin
1 5 escarabajo beetle
1 3 5 escorpión scorpion
1 estornino starling
1 3 estrella de mar starfish
1 3 faisán pheasant
1 5 flamenco flamingo
1 2 3 4 5 foca seal
1 4 gacela gazelle
1 galápago freshwater tortoise
4 galbana sloth
1 2 3 4 gallina chicken (usually a hen)
4 gallito baby chicken
1 2 4 gallo rooster
1 gamba shrimp
1 2 4 5 ganso goose
1 garrapata tick
1 garza heron
4 gatito kitten
1 2 3 4 5 gato cat
3 gato montés bobcat
1 gato montés wildcat
1 gavilán buzzard
1 5 gaviota seagull/gull
1 gecko gecko
3 geco gecko
3 glotón wolverine
1 golondrina swallow
1 3 5 gorila gorilla
1 3 gorrión sparrow
1 5 grillo cricket
1 grulla crane
1 guacamayo macaw
1 guepardo cheetah
1 2 gusano worm
1 3 5 halcón falcon
5 halibut halibut
1 2 3 hámster hamster
1 3 4 5 hiena hyena
5 hipogloso halibut
1 2 3 4 5 hipopótamo hippopotamus
1 3 5 hormiga ant
3 hormigas rojas fire ants
5 huachinango red snapper
1 3 hurón ferret
1 4 iguana iguana
1 2 5 insectos insects
1 jabalí boar
1 4 jaguar jaguar
1 jerbo gerbil
1 jilguero goldfinch
1 2 3 4 5 jirafa giraffe
1 kiwi kiwi
1 5 koala koala
1 3 5 lagartija lizard
4 lagarto alligator
5 lagarto lizard
5 lamprea lamprey
1 3 5 langosta lobster
5 langostino crayfish
4 lechón pig    (usually cooked)
1 4 lechuza owl
1 5 lenguado sole
1 2 3 4 5 león lion
1 león marino sea lion
1 3 4 5 leopardo leopard
3 leopardo cazador cheetah
1 3 5 libélula dragonfly
1 liebre hare
4 lince bobcat
1 3 5 lince lynx
1 4 llama llama
1 2 3 4 5 lobo wolf
2 lobo marino sea lion
1 lombriz earthworm
1 3 4 loro parrot
1 3 luciérnaga firefly
4 macaco monkey
5 makerela mackerel
5 mamboretás praying mantis
1 3 5 manatí manatee
1 5 mandril baboon/mandrill
1 mangosta mongoose
1 3 mantarraya stingray
5 mantis praying mantis
1 mantis religiosa praying mantis
1 2 3 4 mapache raccoon
1 2 3 5 mariposa butterfly
3 mariposa monarca monarch butterfly
5 mariposa nocturna moth
1 5 mariquita lady beetle (lady bug)
1 marmota marmot/groundhog
5 marsopa porpoise
1 martín pescador kingfisher
2 mascotas pets
4 maza monkey
1 2 medusa jellyfish
1 mejillón mussel
5 mero bass
4 mico monkey
1 3 5 milpiés millipede
1 mirlo blackbird
1 4 5 mofeta skunk
5 moluscos mollusk
4 mongosta mongoose
1 mono ape
1 2 3 4 5 mono monkey
3 mono araña spider monkey
1 2 3 4 5 morsa walrus
4 morueco ram
1 2 3 5 mosca fly/housefly
3 mosca de la fruta fruit fly
1 5 mosquito mosquito
1 3 mula mule
1 3 4 5 murciélago bat
1 musaraña shrew
1 narval narwhal
1 3 4 5 nutria otter
1 4 ñandú rhea
1 ñu wildebeest
3 ocelote ocelot
4 oposum opossum
3 orangután orangutan
1 5 orca orca / killer whale
4 orco killer whale
1 3 ornitorrinco platypus
1 2 3 oruga caterpillar
1 2 3 4 5 oso bear
1 3 oso hormiguero anteater
3 oso negro black bear
1 3 oso panda panda bear
1 5 oso perezoso sloth
1 3 oso polar polar bear
5 ostiones oysters
1 5 ostra oyster
5 otaria sea lion
1 2 3 4 5 oveja sheep
2 3 4 5 pájaro bird
1 3 5 pájaro carpintero woodpecker
5 paloma dove
1 5 paloma pigeon
5 panda panda
1 panda rojo red panda
1 pangolín pangolin
1 pantera panther
1 4 5 papagayo parrot
3 pastor alemán German shepherd
1 4 5 pato duck
1 4 5 pavo turkey
1 3 5 pavo real peacock
1 2 3 4 5 peces/pez fish
1 3 4 5 pelícano pelican
5 perca perch
1 perdiz partridge
3 4 perezoso sloth
1 2 4 perico parakeet
1 4 periquito parakeet
4 perrito puppy
1 2 3 4 5 perro dog
4 pescado fish  (caught, usually cooked)
1 5 petirrojo robin
1 pez espada swordfish
3 pez león lionfish
1 2 3 5 pingüino penguin
1 pinzón finch
1 5 piojo lice / louse
1 3 piraña piranha
1 4 pitón python
2 polil moth
1 3 polilla moth
2 pollito chick/chicken
1 4 5 pollo chicken
1 4 potro colt/foal
2 4 puerco pig
3 puerco espín porcupine
1 5 puercoespín porcupine
1 pulga flea
1 pulgón aphid
1 3 5 pulpo octopus
1 3 4 5 puma cougar
1 3 5 rana frog
3 rana de árbol tree frog
5 rape monkfish
1 3 4 5 rata rat
1 2 3 4 5 ratón mouse
1 raya ray
1 reno reindeer
1 5 reptiles reptiles
5 rezaderas praying mantis
1 2 3 4 5 rinoceronte rhino
5 róbalo haddock
1 ruiseñor nightingale
1 3 5 salamandra salamander
1 5 salmón salmon
1 3 5 saltamontes grasshopper
1 sanguijuela leech
5 santa teresas praying mantis
1 4 5 sapo toad
1 5 sardina sardine
1 sepia cuttlefish
1 2 3 4 serpiente snake
3 serpiente de cascabel rattlesnake
1 serpiente de coral coral snake
1 suricata meerkat
1 tapir tapir
5 tecolote owl
1 4 tejón badger
1 tejón australiano wombat
1 5 termita termite
1 4 ternero calf
1 2 3 4 5 tiburón shark
3 tiburón martillo hammerhead shark
1 2 3 4 5 tigre tiger
3 tigre siberiano Siberian tiger
3 tijereta earwig
1 topo mole
1 2 4 5 toro bull
5 tortolita lady beetle (lady bug)
5 tortuga tortoise
2 3 4 5 tortuga turtle
3 tortuga baula leatherback turtle
1 tortuga de mar turtle
1 tortuga de tierra tortoise
3 tortuga marina sea turtle
1 tritón triton
1 5 trucha trout
1 5 tucán toucan
1 5 urraca magpie
1 2 3 4 5 vaca cow
1 4 5 venado deer
1 víbora adder
1 visón mink
1 wombat wombat
1 yegua mare
1 5 zancudo mosquito
3 4 zarigüeya opossum / possum
1 3 4 5 zorrillo skunk
2 3 4 5 zorro fox
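Since the first column is just the set of lists that mention each pair, a table like this can be built mechanically rather than by hand. A minimal Python sketch, assuming each source has already been reduced to (Spanish, English) pairs; `SOURCES`, `merge_sources` and the sample entries are hypothetical, not the actual five lists:

```python
import unicodedata
from collections import defaultdict

# Hypothetical per-source lists: source number -> (Spanish, English) pairs.
SOURCES = {
    1: [("águila", "eagle"), ("jirafa", "giraffe"), ("medusa", "jellyfish")],
    2: [("jirafa", "giraffe"), ("medusa", "jellyfish")],
    3: [("águila", "eagle"), ("jirafa", "giraffe")],
}


def sort_key(text):
    """Accent-insensitive key so 'águila' files under 'a' rather than after 'z'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if unicodedata.category(c) != "Mn")


def merge_sources(sources):
    """Return rows of (sorted source numbers, Spanish, English), one row per distinct pair."""
    seen = defaultdict(set)  # (Spanish, English) -> source numbers that list it
    for source_id, pairs in sources.items():
        for spanish, english in pairs:
            seen[(spanish.strip().lower(), english.strip().lower())].add(source_id)
    rows = [(sorted(ids), es, en) for (es, en), ids in seen.items()]
    rows.sort(key=lambda row: (sort_key(row[1]), row[2]))
    return rows


for ids, es, en in merge_sources(SOURCES):
    print(" ".join(str(i) for i in ids), es, en)
```

Identical pairs from different lists collapse into one row, while the same Spanish word with different English (cocodrilo as alligator or crocodile) stays as separate rows. One caveat: the accent-stripping sort key also turns ñ into n, whereas Spanish dictionaries file ñ after n.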

Additions to glossary

The glossary page in this blog, at the moment, has been compiled by hand. This is NOT the process I intend to use for my definitive glossary to embed in my translation app since hand compilation is subject to numerous errors, plus the source material may be incorrect. But I like having some result even before I manage to generate a definitive glossary where each entry is found in numerous sources and checked against authoritative guides.

In the past I’ve searched for glossaries all over the Net and manually consolidated them. The result was a mess because: a) the source glossary often had mistakes made by whoever compiled it, b) the Spanish terms may not apply to Spain, which is my focus (for example, hongo is mushroom in most of Latin America but rarely used in Spain), and, c) terms from a glossary may not overlap as much as I’d like with the words actually used on menus in Spain.

All that said, I continue to make additions. In this case I was looking at some travel books and cookbooks I’d gotten during my previous fascination with Spain and realized I had the Langenscheidt Pocket Phrasebook (Spanish), 2006 edition, which includes a 1,400-word dictionary. So I fairly quickly went through the dictionary and extracted the words that relate to food or restaurants. From that list I found the ones not already in v3.2 of my glossary. I’ve now updated my glossary page, but here I’ll show what kinds of words were missing (previously the glossary page had come entirely from extracts of menus). A few of these terms, I realized, should also be included in my restaurant terms page, so that has been updated as well.

abierto open
achicoria chicory
amarg{o|a} bitter
aromáticas herb
asiento seat
avena oats
batido milkshake
boca mouth
bombilla bulb
botella bottle
brazo arm
brécol broccoli
bufet buffet
caballa mackerel
cabeza head
calle street
camarer{o|a} waiter/waitress
carajillo coffee with brandy
cartílago cartilage
cena dinner
centeno rye
cerebro brain
cereza cherry
cerrado closed
cervecería beer hall
cerveza de barril draft beer
cerveza rubia lager
champán champagne
cóctel cocktail
col cabbage
comer to eat
comestibles groceries
composición ingredients
coñac brandy
concha shell
condimentad{o|a} seasoned
confitería candy store
conserva canned food
cortado espresso with a dash of milk
crema de leche coffee creamer
cruasán croissant
cubiertos silverware
cuchillo knife
cuello neck
cuenta bill
cuerpo body
desnatada low-fat
destilería distillery
diente tooth
dinero money
endibia[s] endive, correct spelling as previous was wrong
entero whole
entrada entrance
erizo de mar sea urchin
especias spices
espeto skewered
espina fish bone
espumoso sparkling (in wine context)
estómago stomach
estragón tarragon
estrella star
factura bill
fruta del tiempo seasonal fruit
ginebra gin
gofres waffles
gratuito free of charge
guayaba guava
helada frost
hervid{o|a} boiled
hervid{o|a} cooked
hierba herb
higos figs
hornillo stove
hueso bone
infusión de hierbas herbal tea
jardín garden
jarra jug, pitcher
langosta lobster
lengua tongue
limonada soda
macedonia de frutas fruit salad
manzanilla chamomile tea
margarina margarine
menú menu
mojado wet
molino mill
músculo muscle
nectarina nectarine
número size
ocupad{o|a} taken
ojo eye
pan integral whole grain bread
panecillo roll
pez espada swordfish
pierna leg
poleo de menta peppermint tea
polvo powder
pomelo grapefruit
primavera spring
propina tip
raíz root
reservad{o|a} reserved
ron rum
rosado rosé
rosbif roast beef
sala hall, room
salami pepperoni
salida exit
sandía watermelon
sangre blood
sarro tartar
semana week
semiseco medium dry
sémola semolina
servicio restroom, service
servilleta napkin
suplemento surcharge
taberna bar
té tea
tenedor fork
terraza terrace
trucha trout
uva grape
vajilla tableware
ventanilla counter (window)
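Finding which phrasebook words were not already in the glossary is a set difference, but a plain one would miss a match like cartilago vs. cartílago. A minimal sketch, assuming both lists are plain Spanish terms; `fold`, `missing_from_glossary` and the sample data are hypothetical, not the actual v3.2 glossary:

```python
import unicodedata


def fold(text):
    """Lowercase and drop accent marks, so 'cartilago' and 'cartílago' compare equal."""
    decomposed = unicodedata.normalize("NFD", text.strip().lower())
    return "".join(c for c in decomposed if unicodedata.category(c) != "Mn")


def missing_from_glossary(candidates, glossary):
    """Candidate terms whose folded form is not already in the glossary, in input order."""
    known = {fold(term) for term in glossary}
    return [term for term in candidates if fold(term) not in known]


# Illustrative data only, not the actual glossary or phrasebook.
glossary = ["cartílago", "cerveza", "trucha"]
phrasebook = ["cartilago", "cerveza rubia", "trucha", "uva"]
print(missing_from_glossary(phrasebook, glossary))  # ['cerveza rubia', 'uva']
```

The trade-off is that folding also merges words that differ only by an accent or tilde, such as te/té or año/ano, so a match on the folded form is a prompt to check by eye rather than proof the word is already there.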

 

Blog note

After consolidating terms from numerous menus, plus the recent post about restaurant terms, I substantially updated the page under the tab RESTAURANT PHRASES. The main change was the addition of a list of phrases which I’ll include here for convenience. Enjoy!

 

In this list the notation {x|y} means the word occurs with either x or y in this position; usually this is gender in adjectives, so {a|o}. [x] means optional, most often [s] for the plural.

a elegir to choose [from]
a tu elección of your choice
acompañad{a|o}[s] accompanied
al centro in the center (of table, i.e. for sharing)
al estilo X in the style of X
al gusto to taste (doneness), i.e. cooked to order
al peso by weight
bebida[s] drinks
carta the a la carte menu
casa literally house, from this restaurant
caser{a|o} homemade
combinados combinations
degustación tasting/taste (often a separate menu)
del día of the day
diario daily (available item or open)
elaboración preparation
eliges tú los ingredientes you choose the ingredients
en temporada in season
entrantes starters (aka appetizers)
especialidad specialties
horario hours (as in when it is open)
incluid{a|o}[s] included
ingredientes ingredients
mesa table (different from tabla)
para acabar to finish (after main part of meal)
para comer to eat (main part of menu)
para compartir to share
para picar to nibble on (aka snacks or appetizers)
por encargo on request
postres desserts
precio[s] price
primeros [platos] (primer) first course
segundos [platos] second course
selección/seleccionado selection/selected
servido [con] served [with]
surtido assortment
tabla board/plank or platter (usually an assortment, often of ham)
unidad unit (abbreviation uds)
vari{e|a}d{a|o}[s] assorted, varied, variety
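The {x|y} and [x] notation is compact to write, but a program matching menu text needs every surface form spelled out. A minimal sketch, assuming the notation never nests (no braces inside brackets); `expand` is a hypothetical helper name:

```python
import itertools
import re

# Each match is one element: {a|b} alternatives, [x] optional text, or plain text.
TOKEN = re.compile(r"\{([^{}]*)\}|\[([^\[\]]*)\]|([^{}\[\]]+)")


def expand(pattern):
    """Return every surface form of a glossary entry written with {x|y} and [x]."""
    choices = []
    for alternatives, optional, literal in TOKEN.findall(pattern):
        if literal:
            choices.append([literal])
        elif optional:
            choices.append(["", optional])  # the optional part is absent or present
        else:
            choices.append(alternatives.split("|"))
    forms = ("".join(parts) for parts in itertools.product(*choices))
    # Tidy spaces left behind when an optional word is dropped, e.g. "primeros [platos]".
    return sorted({" ".join(form.split()) for form in forms})


print(expand("acompañad{a|o}[s]"))  # ['acompañada', 'acompañadas', 'acompañado', 'acompañados']
print(expand("primeros [platos]"))  # ['primeros', 'primeros platos']
```

Expanding every entry also works as a cheap consistency check: vari{e|a}d{a|o}[s] yields forms such as varieda and variedo, which suggests variedad may need its own line.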

Quesos de España – A Great Source

I took a break from decoding menus from restaurants in Spain to look at cheeses that originate in Spain. I’ve done this type of investigation before (previously for Italy) and it’s a challenging task. Names of cheeses can be very inconsistent across sources. Even with DOP names now more common, there can still be inconsistencies.

And, of course, using any online source for raw material has the challenge that its author may be wrong, may have misspelled names or may have introduced other errors. Consolidating all the names found in different sources is difficult to automate, yet it is also a large quantity of information to collate mentally, especially when one is not conversant in the language.

I’ll explain my process below, but in case you just want the excellent source I found, I’ll describe it first, even though I only discovered it after a lot of searching.


While it’s entirely in Spanish and, as a PDF, not subject to Google Translate when accessed through the web browser, this is a very nice document: CATÁLOGO ELECTRÓNICO DE QUESOS DE ESPAÑA (slow to download but worth the wait).

It has pictures of the cheeses and even some of the animals that provide the milk, plus standardized descriptions including items like: Zona de Elaboración (processing area), Ingredientes (ingredients), Tipo de Queso (cheese type), Aspecto Exterior (outward appearance) and Aspecto Interior (interior appearance).

Even more helpful is this section, Características Organolépticas (organoleptic characteristics; I had to look up the English definition, which is “acting on or involving the use of the sense organs”), which includes: Textura al Tacto (texture to touch), Olor (odor), Textura en Boca (texture in mouth), Aroma (aroma), Sabor (flavor), Otras Sensaciones (other sensations), Gusto Residual (residual taste) and Persistencia (persistence). In case you’re not sure what Gusto Residual means, here it is for Gamonedo cheese (from Principado de Asturias):

El gusto después de ser tragado es: a avellana, con predominio suave de humo (The taste after being swallowed is: of hazelnut, with a mild predominance of smoke.)

And here is an example of Persistencia for Curado (cured/aged) Mahón-Menorca cheese:

Media-elevada, presencia de mantequilla fundida, aceite de oliva y caldo de carne. Entre quince y treinta segundos  (Medium-high, presence of melted butter, olive oil and meat broth. Between fifteen and thirty seconds)

In addition to this extensive, informative and attractive PDF, there is another part of this site where you can filter the list of cheeses: Buscador de quesos (Cheese Finder, aka search engine). The filters are: Seleccione (Select): Comunidad Autónoma (Autonomous Community), tipo de leche (milk type), calidad diferenciada, régimen de calidad (differentiated quality, quality regime). So, for example, I searched for cow’s milk (leche de vaca) cheeses from Cantabria and all (todas) quality regimes and got:

Marca (mark or brand) | Tipo (type) | Procedencia Leche (origin of milk) | Comunidad Autónoma (Autonomous Community)
Picón-Bejes-Tresviso | D.O.P. | Leche de vaca | CANTABRIA
Queso Nata de Cantabria | D.O.P. | Leche de vaca | CANTABRIA
Queso Pasiego | Sin figura de calidad comunitaria reconocida (no recognized community quality figure) | Leche de vaca | CANTABRIA

After finding the list you can click on the cheese name for the full information page equivalent to the CATÁLOGO pages. You could either use the search tool to find a cheese you might want to try (some Spanish cheeses can be obtained online) or browse the CATÁLOGO.


Back to my process for compiling a list of cheeses

But, undaunted by these challenges (familiar from past experience), I decided it was time to assemble a complete and accurate list. This matters only slightly for reading restaurant menus and would more likely be useful for purchases in shops but, again, knowing what you’re eating in another country is the inspiration for my project.

So I proceeded with the usual suspects, first doing several Google searches (to get the terms right to provide the best source materials) and then following several promising sources. As usual Wikipedia had a useful page List of Spanish cheeses with a fairly long list (fortunately tagged by region) with some links to pages for the more common cheeses. Having processed this list I immediately assumed the Spanish language version of Wikipedia would possibly have an even better list and it did – Quesos de España. Another seemingly authoritative source, Spanish Cheese Guide, covers all (?) of the DOP names.

From all these sources I generated a single list, which required picking a “canonical” name and then finding all the variations from the sources. For example, this cheese, Arzúa-Ulloa, appeared in all my sources (compiled thus far) but, as you can see, under quite different names, even including a misspelling.

Queso Arzúa-Ulloa (P.D.O.) Galicia 1 link
Arzula Illoa 2 link
Arzúa Galicia 3
Arzúa-Ulloa Galicia 5 link
Arzúa-Ulloa Galicia 6 link

So after consolidating the list from five sources and choosing what appears to be the “standard” name (for those cheeses that appear on more than one list), here is what I believe is a fairly comprehensive list:

Abredo, Acehúche, Afuega’l Pitu, Ahumado de Pría, Alhama de Granada, Alpujarras, Andalucía de cabra, Ansó-Hecho, Aracena, Arribes de Salamanca, Arzúa-Ulloa, Babia y Laciana, Barros, Benasque, Beyos, Buelles, Burgos, Cabrales, Cáceres, Cádiz, Camerano, Campo Real, Campoo-Los Valles, Casín, Cassoleta, Castellano, Cebreiro, Colmenar Viejo, Flor de Guía, Fresnedillas de la Oliva, Gamonedo, Garrotxa, Gata-Hurdes, Gaztazarra, Genestoso, Gran Canaria, Grazalema, Guriezo, Herreño, Ibores, Idiazábal, L’alt Urgell y La Cerdanya, La Adrada, La Bureba, La Calahorra, La Gomera, La Montaña de León, La Nucía, La Peral, La Serena, La Siberia, La Sierra de Espadán, La Vera, Lanzarote, Letur, Los Montes de Toledo, Mahón-Menorca, Majorero, Málaga, Mallorquí, Manchego, Mató, Miraflores, Montsec, Murcia, Murcia al vino, Nata de Cantabria, Oropesa, Oscos, Ossera, Palmero, Pasiego, Pastor, Pata de mulo, Pedroches, Peñamellera, Picón Bejes-Tresviso, Pido, Quesaílla, Quesucos de Liébana, Requeixo, Roncal, San Simón da Costa, Serrat, Servilleta, Sierra Morena, Tenerife, Teruel, Tetilla, Tiétar, Torremocha del Jarama, Torta del Casar, Trapo, Tronchón, Tupí, Urbiés, Valdeón, Valle de Alcudia, Valle del Narcea, Vidiago, Villalón, Zamorano

There are around 30 more where I’ve found at least one mention but I’ll have to search for each of these individually (once I have the complete list) to see if these cheeses really exist (at least currently) or are just a spurious mention in some online list.
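Picking a canonical name and spotting single-mention names could be partly automated. A minimal sketch, assuming each mention is recorded as (source number, name as written); the matching key (drop a leading “Queso”, parentheticals, accents, hyphens and case) and all function names are my own illustrative assumptions:

```python
import re
import unicodedata
from collections import Counter, defaultdict


def name_key(raw):
    """Crude match key: drop a leading 'Queso', parentheticals, accents, hyphens and case."""
    text = re.sub(r"\(.*?\)", " ", raw)  # e.g. "(P.D.O.)"
    text = re.sub(r"^\s*queso\s+", "", text, flags=re.IGNORECASE)
    text = unicodedata.normalize("NFD", text.lower())
    text = "".join(c for c in text if unicodedata.category(c) != "Mn")
    return " ".join(re.sub(r"[-'’]", " ", text).split())


def consolidate(mentions):
    """mentions: (source number, name as written). Returns {key: (canonical name, sources)}."""
    spellings = defaultdict(Counter)
    sources = defaultdict(set)
    for source_id, raw in mentions:
        key = name_key(raw)
        spellings[key][raw.strip()] += 1
        sources[key].add(source_id)
    # Canonical = the most frequent spelling as written (ties go to the first seen).
    return {key: (spellings[key].most_common(1)[0][0], sorted(sources[key]))
            for key in spellings}


def needs_checking(groups):
    """Names found in only one source, to be verified by hand."""
    return sorted(name for name, ids in groups.values() if len(ids) == 1)


# The Arzúa-Ulloa variants listed above, with their source numbers.
MENTIONS = [(1, "Queso Arzúa-Ulloa (P.D.O.)"), (2, "Arzula Illoa"),
            (3, "Arzúa"), (5, "Arzúa-Ulloa"), (6, "Arzúa-Ulloa")]
groups = consolidate(MENTIONS)
print(groups["arzua ulloa"])  # ('Arzúa-Ulloa', [1, 5, 6])
print(needs_checking(groups))
```

Note that “Arzúa” and the misspelled “Arzula Illoa” both land in the needs-checking pile: a simple key can’t know they are the same cheese, which is exactly why single-mention names still need a human look.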

Small experiment

Most of the time I’ve spent on this project has involved looking at various source documents from Spain and then using multiple methods of doing translations. Ultimately the point of all this is to build a large corpus of “pairs” (words or phrases in Iberian Spanish and their English translation, or some kind of equivalent). Critically, I also need to add some measure of how likely each pair is to represent a valid equivalent, so the code (yet to be written) can estimate the probability that the consolidated list of pairs is correct. It also has to handle ambiguity, which is very common, for instance, with ternera (is this veal or beef or both? It often seems to be used for both). And the multiple, overlapping and contradictory terms for shrimp vs prawns vs langostinos (the small rock lobster) are a strong example of confusion on menus.

So, given I haven’t yet designed my corpus or the code to ingest new pairs into it and then process the related pairs, I have to do experiments by hand on a smaller dataset to try to visualize the challenges I will face when this is all done with code on a much larger corpus.
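To keep an ambiguity like ternera (veal or beef) instead of forcing a single answer, the corpus could store every sighting of a pair with a certainty and report each candidate translation’s share. A minimal sketch; the certainty values, the `EVIDENCE` data and the summing rule are illustrative assumptions, not a settled design:

```python
from collections import defaultdict

# Hypothetical evidence: (Spanish, English, certainty 0.0-1.0), one tuple per sighting.
EVIDENCE = [
    ("ternera", "veal", 0.6),
    ("ternera", "beef", 0.7),
    ("ternera", "beef", 0.5),
    ("manos de ministro", "pig's trotters", 0.8),
]


def translation_shares(evidence):
    """Per Spanish term, sum certainty for each English candidate and normalize to 1."""
    totals = defaultdict(lambda: defaultdict(float))
    for spanish, english, certainty in evidence:
        totals[spanish][english] += certainty
    shares = {}
    for spanish, candidates in totals.items():
        grand_total = sum(candidates.values())
        if grand_total > 0:  # skip terms with only zero-certainty sightings
            shares[spanish] = {en: w / grand_total for en, w in candidates.items()}
    return shares


for term, candidates in translation_shares(EVIDENCE).items():
    print(term, {en: round(s, 2) for en, s in candidates.items()})
```

Here ternera comes out roughly one-third veal and two-thirds beef, so both translations survive with a weight, which matches how the word actually seems to be used on menus.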

So I recently processed an extensive menu from a single restaurant in Granada and, just before that, menus from two restaurants in Santo Domingo de La Calzada, La Rioja. By process I mean the mostly mechanical work of getting entire sections of menu text side by side in the original Spanish and the translated English. Then I look for untranslated terms or silly translations and try to find other sources on the Net (often recetas, recipes) to determine the correct correspondence; for instance, manos de ministro is NOT minister’s hands but a colloquial version of the more common manitas de cerdo, or pig’s trotters (feet).

So having done this I’ll provide a few results. In total I ended up with 277 “pairs” with 50 of those on both lists (and thus likely to be very common food terms from menus – see list below). The two restaurants in Santo Domingo de La Calzada contributed 132 unique pairs and the Granada restaurant contributed 95 unique pairs. The various terms in the list are sometimes not that specific to food, for instance:

  1. blanco and negro, colors but used as qualifiers of chocolate in menus; rosada (pink as a color) ended up being quite a chase when it referred to a specific fish.
  2. aroma or chocolate, which are the same in Spanish and English; I include them even though they (and others like them) are obvious loanwords, because a piece of code doesn’t just “know” this and has to be told.
  3. especialidad (specialties) or vinagreta (vinaigrette) or salmón (salmon): even though these are easy to guess, an app doing translation still needs to recognize these terms.
  4. arroz, carne, dulce, huevo, leche, pan, pescado, pollo, queso, salsa and vino, which are used so much, not just on Mexican restaurant menus but even in TV ads, that we can effectively consider them loanwords into English now; but again, a computer program doesn’t know that, so they still need to be in the corpus that will then be the key to its translation.
  5. I did try to consolidate terms that have alternate gender and/or singular/plural forms but didn’t do this as precisely and consistently as a really good corpus would require.
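Consolidating gender and singular/plural forms could start from a crude rule-based check. A minimal sketch, assuming only the common endings (-s, -es, -ces to -z, and -o/-a); `lemma_candidates` and `same_lemma` are hypothetical names, and a real corpus would more likely use a proper lemmatizer or list the forms explicitly:

```python
def lemma_candidates(word):
    """Crude set of possible base forms of a Spanish word (plural, then gender)."""
    w = word.strip().lower()
    forms = {w}
    if w.endswith("ces"):  # peces -> pez
        forms.add(w[:-3] + "z")
    if w.endswith("es"):  # especialidades -> especialidad
        forms.add(w[:-2])
    if w.endswith("s"):  # croquetas -> croqueta, dulces -> dulce
        forms.add(w[:-1])
    for form in list(forms):  # casero / casera -> shared stem "caser"
        if form.endswith(("o", "a")):
            forms.add(form[:-1])
    return forms


def same_lemma(a, b):
    """True when two words share at least one candidate base form."""
    return bool(lemma_candidates(a) & lemma_candidates(b))


print(same_lemma("casera", "caseros"), same_lemma("pez", "peces"))  # True True
print(same_lemma("pasta", "pasto"))  # True, but a false match (pasta vs. pasture)
```

The last line shows why this is only a first pass: rules that are loose enough to join casera and caseros will also join unrelated words that happen to differ only in -o/-a.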

While just finding lists of food/cooking terms on the Net is easy, whether they are correct or apply to Spain is more problematic. Even a source like a dictionary should be taken with a small dollop of skepticism, and asking any of the various voice assistants is certainly not going to have a very high accuracy rate. So it is necessary to focus on sources, and thus pairs, that are really for Spain and not somewhere in the Western Hemisphere (unless you, Dear Reader, are planning a trek in Bolivia, then do as you need).

So that was my experiment and I end with this list of 50 pairs that are so common you’re very likely to run into them BUT even this list is not 100% accurate as there are various issues with translation (see previous posts).

Cover up the right-hand column and see how many of these you know.

a la plancha grilled
aceite de oliva olive oil
anchoas anchovies
arroz rice
asados roasted
atún tuna
bacalao cod
blanco white
café coffee
Cantábricas/Cantábrico Cantabrian
caramelizados caramelized
carne meat
casera/o caseras/caseros homemade
cerdo pork
chocolate chocolate
comida meal
croquetas croquettes
deliciosa/o deliciosas delicious
dulce sweet
ensalada salad
frita/o fritas fried
guarnición garnish
helado ice cream
huevo egg
jamón ham
langostinos prawns
leche milk
lomo loin (generically; or cured meat specifically)
miel honey
pan bread
patata potato
pato duck
pechuga breast
pescado fish
pimientos peppers
plato dish
pollo chicken
postre dessert
pulpo octopus
queso cheese
revuelto scrambled
salsa sauce
solomillo tenderloin or fillet
tarta cake, also pie
ternera beef (alt: veal)
tomate tomato
tosta toast
vainilla vanilla
verdura vegetable
especialidad especialidades specialty
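The counts reported earlier fit together as simple set arithmetic: pairs unique to one source, pairs unique to the other, and pairs shared by both add up to the total, 132 + 95 + 50 = 277, with the 50 shared pairs above being the intersection. A minimal sketch with hypothetical toy data (not the real menu lists):

```python
def overlap_report(pairs_a, pairs_b):
    """Split (Spanish, English) pairs from two sources into unique and shared sets."""
    a, b = set(pairs_a), set(pairs_b)
    return {"only_a": a - b, "only_b": b - a, "both": a & b, "total": len(a | b)}


# Hypothetical toy data; the real lists came to 277 pairs in total.
menu_a = [("arroz", "rice"), ("lomo", "loin"), ("manos de ministro", "pig's trotters")]
menu_b = [("arroz", "rice"), ("pulpo", "octopus")]
report = overlap_report(menu_a, menu_b)
print(len(report["only_a"]), len(report["only_b"]), len(report["both"]), report["total"])  # 2 1 1 4
```

Because pairs are compared exactly, casera and caseras would count as different pairs unless forms are consolidated first.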

Mystery post – pez/peces or pescado

My title contains some bits of useful information. While I’m not absolutely certain, some sources say peces is the plural of pez. Of course in English the plural of fish is fish, so peces seems relatively uncommon. Pescado also translates to fish, BUT the key difference is that pescado is the piece of fish on your plate and pez is the living animal.

I let Google Translate loose on my previous “mystery” post and it had three types of results: 1) a few of the words translated correctly, 2) some translated but to nonsense, and, 3) some were missed altogether. I’ve tracked a few of the latter.

My big list of words (with cognates or loanwords removed to avoid giving a clue) was a lengthy list of the names of fish, probably as they are called in Spain. I found two long lists on the Net with Latin (scientific names) as well as names in English, Spanish and some other languages. Both were European sources so less likely to include fish found primarily in South America, but who knows how lists get compiled.

Plants and animals from the natural world (versus cultivated plants/animals) are frequently misidentified, and it is very tough to get accurate common names. Sometimes even the scientific names are in dispute or contradictory, so it’s no surprise the more colloquial names are too. After all, who but ichthyologists, some fishermen and a few fishmongers actually know these names accurately and/or could decide what to call a fish just by looking at it?

So this is probably the toughest area to compose an accurate Iberian Spanish to English translation list. I’m going to have a third post in this series about the names I conclude are fairly likely but for now here’s a subset of the list from the mystery post that Google failed to translate at all.

alfonsino Golden eye perch
badexo Lythe or pollack
boga bogue
brama bream Pomfret
brotola de roca Greater forkbeard
calion Shark, porbeagle
callas Callas
capelan capelin
chicharro scad – also called horse mackerel
chincharro Horse mackerel or scad
choupa Black bream or porgy or seabream
chucla picarel
cigala crawfish Norway lobster – also called Dublin Bay prawn
colin Coley or saithe
côngrio conger eel conger eel – also called conger
coregono whitefish
escolano smelt – also called sparling
espadilla frostfish – also called silver scabbardfish
espadín sprat sprat – also called brisling
espárido sea bream
illiseria megrim
lanzon sandeel – also called sand lance
limanda dab
longeirón razor clam – also called razor shell
lucioperca pike-perch
lumpo lumpfish Lumpfish
maganto Dublin Bay prawn or langoustine or scampi
mendo Witch or Torbay sole
merlan whiting
mollera poor cod
muergo razor clam – also called razor shell
musola smooth hound – also called dogfish, flake, huss, rigg
pardete Grey mullet
pejerrey silver side, sand smelt argentine – also called silver smelt
pejesapo angler fish Anglerfish or monkfish
perlón Grey gurnard
pescadillo Hake
plegonero whiting
quisquilla shrimp prawn – also called shrimp
salton sandeel – also called sand lance
salvelino char
solla plaice

The left column is the Spanish (with at least one spelling error: I don’t know which of chicharro and chincharro is actually correct). The middle column has the few names the Oxford dictionary recognizes. And the third column is from one of these two sources (here and here), which I originally used to compile the list (I found a third list with scientific (Latin) names but didn’t originally use it and haven’t (yet) processed it). I’m a bit surprised Google missed the names that are in Oxford, as I’ve encountered some of these in other places.

Note that even with some of the Spanish names “translated” there are many fish on this list I don’t recognize, and I suspect few people would. So probably only a small subset of this list (the names Google didn’t recognize, not the full list) would ever appear on menus.

The two longer lists, with scientific names, seemed potentially the most accurate, but I’ve found others on other websites. The trouble with these is that the names may not relate to Spain and may come from other Spanish-speaking areas. This is a very common problem when trying to find, merge and consolidate lists from the Net. In addition, what is the level of authority of anyone who provides a list? That is rarely known, and I see enough mistakes in almost any list to cast some doubt on the accuracy of the information. But, all that said, I’ll be trying in the next post to produce the largest and most accurate list I can from the raw material I can find.

So stay tuned for the final result.

Wine Terms

In my last post I mentioned I was using several websites (and pages within those sites) that had English translations, to extract side-by-side human English translations of the (presumably) original Spanish. OK, done. So what? As I’ll be doing with all sources, I then begin an extraction process to add pairs (words or phrases) of translations to my corpus. A key part of that has to be assigning some measure of “certainty” that the translation is correct; a probability-type measure (0.0…1.0) obviously fits. Then the corpus analysis program can find as many instances of the same pair as it can and evaluate a new certainty: something like, lots of identical pair instances, each possibly of low certainty, may be as good as a few instances of a pair with high certainty. An interesting question, then, is whether human translation of websites (relatively rare, and mostly menus) is a more reliable source of information than machine translation.
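To make the “many low-certainty pairs versus a few high-certainty pairs” idea concrete, here is a minimal Python sketch. It assumes each observed instance of a pair carries a certainty between 0.0 and 1.0 and treats the instances as independent evidence, combining them with a noisy-OR rule (one minus the product of the individual doubts). That rule is my illustrative choice, not something my corpus program already does, and instances copied from the same source aren’t really independent, so it would overstate certainty for those.

```python
from math import prod


def combined_certainty(certainties: list[float]) -> float:
    """Combine the certainties of every observed instance of one pair.

    Noisy-OR assumption: instances are independent evidence, so the pair is
    wrong only if every single instance is wrong. No instances gives 0.0.
    """
    for c in certainties:
        if not 0.0 <= c <= 1.0:
            raise ValueError(f"certainty out of range: {c}")
    return 1.0 - prod(1.0 - c for c in certainties)


# Made-up numbers: many weak sightings versus a couple of strong ones.
many_weak = combined_certainty([0.3] * 8)    # e.g. eight machine-translated menus
few_strong = combined_certainty([0.8, 0.8])  # e.g. two human-translated menus
print(f"{many_weak:.3f} {few_strong:.3f}")   # 0.942 0.960
```

With these made-up numbers, eight weak sightings (about 0.94) come close to two strong ones (0.96), which is the trade-off described above.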

Of course the extraction process itself (which I do by hand and which is therefore subject to error) plays a role as well, so I’ll use my small corpus of wine webpages to extract a set of pairs and then use any other sources of wine terminology to confirm or deny my pairs (just manually, so I understand the data, before trying to write code to do this). So here’s my result (scroll down past the list for more of this post):

abierto 2 open
acerb 2 acerbic
acidez 1 acidity
ácido 1 acid
aceite esencial 2 essential oils
afinamiento 1 refinement
afrutado[s] 1 fruity
agradables 1 nice, pleasant, agreeable
alegre 2 zingy
amoratado 2 inky
amplio 2 big
añada 2 vintage year
arcillo 1 clay
armónico 2 harmonious
aromas 1 aromas
aromática 1 aromatic
barrica Bordelesa 2 Bordeaux cask
barrica 1 cask or barrel
beber 1 to drink
blanco seco 3 dry white
blanco 2 white
boca 1 literally mouth, but can mean palate in wine tasting context
bodega 3 winery
bodeguero 3 winemaker
bota 2 butt
botella 1 bottle
Botritis 2 Botrytis
brillante 1 bright
brotaciones, brotación 1 [not found] budding ? (derivative of brotar)
brotar 1 to sprout, bud
calidad 1 quality
campaña 1 growing period, season
campo 1 field
canela 1 cinnamon
cánones del clasicismo Riojano 1 classic Rioja style (not literal)
capa 2 layer
cata 1 tasting (action of)
cereza 1 cherry
cerrado 2 closed
clarificación 2 fining
clásico de Rioja 1 Rioja classic
comarca 1 region, district
complejidad 1 complexity
complejo 2 complex
corcho cork
cosecha 1 harvest, crop; vintage
crianza en barrica 4 aging in barrel
crianza en madera 1 aged in wood (literally, cask colloquially)
crianza 1 aging
cuerpo 1 body
dejo 2 aftertaste
denso 2 dense
depósitos 4 deposits
dorado 2 golden
dulce 2 sweet
elaborado por 3 produced, matured by.
elegante 1 elegant
embotellado por 3 bottled by
embotellar 4 to bottle
en barrica 1 in cask or barrel
envejecimiento 1 aging (also laying down)
equilibrado 1 balanced
equilibrio 1 balance
especiado 2 spicy
espeso 2 thick
estructura 2 structure
evolucionado 2 evolved
expresivo 1 expressive
fermentación alcohólica 4 alcoholic fermentation
fermentación maloláctica 4 malolactic fermentation
fermentación 1 fermentation
final de boca 1 “finish” (literally end/finish of mouth)
final 1 after-taste
fino 1 fine
florales 1 floral
fresco 1 fresh
frescura 1 freshness
frutos cítricos 1 citrus fruits
fuerte 2 strong
graciano 1 red grape variety
grados 1 grade or degree (here meaning alcohol by volume)
heces 2 sediment
hoja 4 leaf
hollejo 2 grape skin
joven 2 young (little or no aging)
jurado de cata 2 wine tasting panel
lágrimas 2 tears
levaduras 4 yeast
lías 2 lees
limpio 1 clean
maceración carbónica 2 carbonic maceration
maceración en frío 2 cold maceration
maceración 1 maceration
madera 1 wood
madura 1 ripe, mature
madurar 1 to mature
manchado 2 literally ‘stained’
manzana 1 apple
maridaje 1 literally marriage or combination; food matches/pairings
Mazuelo 1 red grape variety
mezcla 1 mixture, blend
mosto 1 must (grape juice)
nariz 1 nose (also aroma)
notas 1 notes
olores 1 smell (scents in corpus)
oro 2 gold
oxidación 2 oxidation
parámetros de calidad 1 quality indicators
pasa 2 raisin
pepita 4 seed
perfumado 2 perfumed
persistencia 2 persistence
pimienta 2 black pepper
postgusto (posgusto) 1 [not found] after-taste
prensa 4 press
prensado 1 pressing
pulidos 1 polished
rama 2 branch
recio 2 gutsy
redondo 2 rounded
refrescar 2 refresh
regaliz 2 liquorice
roble Americano 4 American oak
roble Francés 4 French oak
roble 1 oak (as in the barrels)
rojo 2 red
rosado 2 rosé
sabor 1 flavor, taste
sabroso 2 flavorsome
seco 2 dry
sedoso 1 silky
semidulce 2 semi-sweet
semiseco 2 semi-dry
sensación 1 sensation
suave 2 smooth
suelos 1 soils (also ground, floor, land)
tabaco 2 tobacco
tanino 1 tannin
temperatura controlada 1 controlled temperature
temperatura de servicio 1 serving temperature, aka, best served at
Tempranillo 1 grape variety
terciopelo 1 velvet
típico 2 typical
trasiegas 1 decant (rackings in corpus)
untuoso 1 literally greasy (aka unctuous), but more kindly means ‘smooth’
uva 1 grape
vainilla 2 vanilla
valores 1 values (as in levels of an indicator)
variedad 1 variety or varietal
vendimia 1 vintage, grape harvest (whole process)
vid 4 vine
viña 3 vineyard
viñedos 1 vineyard, vines
vino blanco 4 white wine
vino de calidad (Quality wine) 3 Must come from a DO or DE. Only wine made from the free-run or lightly pressed juice of ripe healthy grapes, which has undergone a temperature controlled fermentation, qualifies.
vino de cosecha, or vendimia 3 Wines of a particular vintage year. In special cases, if the purpose is to improve the quality of the wine, a maximum of 15% of wine of a previous year may be added.
vino espumoso 4 sparkling wine
vino Fino de Mesa 3 fine table wine.
vino generoso 3 Special aged dry or sweet wines of higher alcoholic strength than table wines. From the Latin term for excellence. Sherries are vinos generosos.
Vino rosado 4 Rosé wine
vino tinto 4 red wine
vino 1 wine
Viura 1 white grape variety
viveza 1 vividness, strength
vivo 2 lively
yema 2 yolk
zarzamora 2 blackberry

I combined four lists. In MSWord I can use different colors and fonts for each list so that when I merge them I can easily see where any pair came from, but here in WordPress formatting is more limited, so the middle column indicates the source. My extracted list (from all those webpages I processed, from both bodegas and restaurants) is 1. I chose not to provide links for the other three sources, but 2 was certainly the largest.

I eliminated duplicates and then used a simple notion of “certainty”: an item from list 1 counts as confirmed if one or more of the other lists has an identical (or almost identical) translation for it (the confirmed terms are listed below). This isn’t a particularly robust definition of certainty, but it will do for this proof of concept.
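The merge-and-confirm step could be automated roughly like this minimal Python sketch. It assumes each list is a dictionary from a Spanish term to an English translation, keeps the source number with each translation (standing in for the colors and fonts I use in MSWord), and treats a list-1 pair as confirmed when another list has a translation whose similarity, as measured by the standard library’s `difflib.SequenceMatcher`, is at least 0.8. That threshold is an arbitrary stand-in for “almost identical”, and the entries are illustrative, loosely based on the wine list rather than copied from the actual sources.

```python
from difflib import SequenceMatcher


def merge_lists(lists: dict[int, dict[str, str]]) -> dict[str, dict[int, str]]:
    """Merge source lists into term -> {source number: translation}."""
    merged: dict[str, dict[int, str]] = {}
    for source, entries in lists.items():
        for term, translation in entries.items():
            merged.setdefault(term, {})[source] = translation
    return merged


def similar(a: str, b: str, threshold: float = 0.8) -> bool:
    """Case-insensitive fuzzy match standing in for 'almost identical'."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def confirmed_terms(merged: dict[str, dict[int, str]], own: int = 1) -> list[str]:
    """Terms from my own list that another list translates (almost) the same way."""
    result = []
    for term, by_source in merged.items():
        mine = by_source.get(own)
        if mine is None:
            continue  # term came only from the other lists
        if any(similar(mine, other) for src, other in by_source.items() if src != own):
            result.append(term)
    return sorted(result)


lists = {
    1: {"sedoso": "silky", "tanino": "tannin", "boca": "palate"},
    2: {"sedoso": "silky", "tanino": "tannins", "seco": "dry"},
    4: {"boca": "mouth"},
}
merged = merge_lists(lists)
print(confirmed_terms(merged))  # ['sedoso', 'tanino']
```

Note that boca stays unconfirmed here, which is the right outcome: ‘mouth’ and ‘palate’ are both defensible, but no string similarity will ever reconcile them.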

So, of the 171 terms in the merged list (82 from my manual extraction, the remainder from one of the other three lists), only 24 of my extracted terms get marked as “certain” by occurring in other lists:

afrutado[s], barrica, botella, cata, cosecha, elegante, equilibrado, fermentación, final de boca, fresco, maceración, madura, mosto, postgusto (posgusto), roble, sabor, sedoso, tanino, untuoso, uva, variedad, vendimia, viñedos, vino

There could have been more, since I did not extract really obvious terms from my corpus, such as blanco, seco or dulce. And two of the “confirmed” terms are actually in dispute. One source admits afrutado is used for ‘fruity’ but says this is actually wrong and the term should be frutal. The dictionary confirms afrutado does mean ‘fruity’, but that does not confirm it is the correct term to use in a wine context. Likewise it gives frutal as fruit or fruit tree but doesn’t mention how this would be a taste term for wine. So who knows which is right? Wine terminology (in English) sometimes contradicts the more common meanings of words, since wine tasters understand a particular word in a particular context (and we amateurs just have to learn what they mean). So it’s certainly possible this source is right, BUT how would that ever be confirmed?

Likewise postgusto (clearly ‘aftertaste’ from context) doesn’t appear in any dictionary, and in the other lists it appears but is spelled posgusto. I’m not sure whether this meets the definition of a neologism. ‘Post’ means ‘after’ in English, and Spanish uses the prefix pos- the same way (as in posguerra, ‘postwar’), while gusto is ‘taste’ or ‘flavor’. So does this word actually exist (or get used in wine documents), and which is the appropriate form?

There was also some conflict between viñedos and viña. Both are in the dictionary as ‘vineyard’, but only viña is listed as ‘vines’. That potentially points to a flaw in my extraction of pairs, since I saw viñedos clearly translated as ‘vines’ in a human translation; of course, that translator may have confused these two terms.

The term I’m happy I was able to figure out (it took a lot of examination of text to reach my conclusion) is final de boca. This would literally translate to ‘end of mouth’, but it’s more accurate to translate it as ‘finish’, which is one of those terms whose usage in wine descriptions has quite a different meaning from its common meaning. One of the lists asserted that final alone is sufficient for ‘finish’, which is itself one of the literal translations. OTOH boca itself has some ambiguity. It literally means ‘mouth’ but was commonly translated as ‘palate’ in the human translations. That isn’t one of the literal meanings of boca (the everyday word for ‘palate’ is paladar). But, again, palate is a word whose meaning in a wine tasting context differs from its more common meanings.

So this is all human analysis, with a lot of trial-and-error, back-and-forth, looking in dictionaries and doing web searches. In this contest of John Henry and the machine I think man will win, so I really wonder how effective any AI (or just statistical analysis) can be. OTOH, ‘man’ needs to be a fluent Spanish speaker who participates in a Jurado de Cata (wine judging panel), and I fall way short of that. But what is the chance I can still produce the best list of wine terms freely available on the Internet? Pretty good, I’d say (given that few are even trying).

 

Verbs again

In my previous post (about finishing the initial processing of the GallinaBlanca dictionary) I mentioned that verbs can be of some use in interpreting menus, possibly through forms derived from the infinitive. So I’ve continued to do some digging in this area and have a few results to share.

Anticipating that I’d be looking at verbs, and independently of extracting them from the GB dictionary, I used about nine online “lists” to compile an aggregate list. These verbs: a) may have nothing to do with cooking or cuisine, b) tend to be more commonly used verbs, and c) may not be used (at all, or in the same way) in Spain. This is the list I’m calling C.

In the course of other searches I stumbled onto a culinary glossary. It has no connection with Spain, so the Spanish words might come from any part of the world. And as I worked with it more extensively and carefully I observed many of the issues with online resources of unknown origin: a) misspellings (probably; I don’t want to jump to conclusions just because words seem to be misspelled), b) duplications, often including both the singular and plural forms, c) words that make no sense in a Spanish culinary dictionary (how did these drift in?), and d) inconsistent formatting and thus ordering (e.g. A la cazuela vs Cazadora, A la). In a previous iteration of my project I created a “glossary” by merging information from many sources, and eventually it became a pisto (hodgepodge, if I can use that word in a non-culinary sense), especially losing any notion of whether the words applied to Spain or some other Spanish-speaking area. So, with these caveats, I’ll call this list G.
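Two of those problems, inconsistent ordering and near-duplicate spellings, can at least be flagged mechanically. Below is a minimal Python sketch under two assumptions of mine: an inverted entry always has the form “Head, Prefix” with a single comma, and entries that differ only in capitalization, spacing, acute accents or diaeresis are probably duplicates. It deliberately keeps the tilde, so año and ano stay distinct, and it only groups entries for me to review; it doesn’t decide which spelling is right.

```python
import unicodedata

# Combining marks to ignore when comparing: acute accent and diaeresis.
# The combining tilde (U+0303) is kept so that ñ survives.
STRIP_MARKS = {"\u0301", "\u0308"}


def uninvert(entry: str) -> str:
    """Turn an inverted heading like 'Cazadora, A la' into 'a la cazadora'."""
    if "," in entry:
        head, _, prefix = entry.partition(",")
        entry = f"{prefix.strip()} {head.strip()}"
    return " ".join(entry.split()).lower()


def comparison_key(entry: str) -> str:
    """Key used only to spot likely duplicates; originals are kept elsewhere."""
    decomposed = unicodedata.normalize("NFD", uninvert(entry))
    kept = "".join(ch for ch in decomposed if ch not in STRIP_MARKS)
    return unicodedata.normalize("NFC", kept)


def group_duplicates(entries: list[str]) -> dict[str, list[str]]:
    """Group raw glossary entries by key, keeping the original spellings."""
    groups: dict[str, list[str]] = {}
    for entry in entries:
        groups.setdefault(comparison_key(entry), []).append(entry)
    return groups


raw = ["Cazadora, A la", "A la cazadora", "A la cazuela", "Albóndiga", "albondiga"]
for key, originals in group_duplicates(raw).items():
    print(key, originals)
```

Singular and plural duplicates need a lexicon to resolve properly, so I’d leave those to a separate pass.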

And I have my list of verbs from the GallinaBlanca dictionary which I previously described. I’ll call this list D.

Now, simply, it’s too much work to compare the entirety of all three lists, so I just did the subset of verbs starting with A, B or C. While this may be a biased sample, it still reveals some interesting information.

I sorted the three lists together (with different fonts and colors for each list so I could distinguish them) and then did manual processing to consolidate like terms. As a result I ended up coding each entry with G, D and C (or a dash in that position if the verb is not in that list), which gives the following table:

G-- 44
-D- 4
--C 35
GD- 28
-DC 1
G-C 9
GDC 5

There are 126 verbs that appear in at least one of these lists. Only 5 verbs appear in all three lists. The list with the largest number of unique verbs is G (the glossary, 44), which indicates it is potentially very useful, as it adds over 50% more verbs than I had previously found. The verbs in the C (common) list may have nothing to do with cooking or food (I’ll explore that later in the post), so this may not add much. Only 5 verbs from the GallinaBlanca list don’t appear in the glossary list, so whoever compiled that got most of the cooking verbs.
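The GDC coding is mechanical once the verbs have been consolidated, so it’s a natural first thing to automate. A minimal Python sketch, assuming each list has already been reduced to a set of infinitives (the consolidation of like terms is still the manual part); the three tiny lists are made up and stand in for G, D and C, and the function names are hypothetical:

```python
from collections import Counter


def membership_code(verb: str, lists: dict[str, set[str]]) -> str:
    """Return a code such as 'GD-': the list's letter if present, '-' if absent."""
    return "".join(letter if verb in verbs else "-" for letter, verbs in lists.items())


def tally(lists: dict[str, set[str]], initials: str = "abc") -> Counter:
    """Count membership codes for every verb starting with one of `initials`."""
    union = set().union(*lists.values())
    sample = sorted(v for v in union if v.startswith(tuple(initials)))
    return Counter(membership_code(v, lists) for v in sample)


lists = {  # dict order fixes the letter order of the code: G, then D, then C
    "G": {"asar", "batir", "cocer", "achicalar", "dorar"},
    "D": {"asar", "batir", "cocer", "blanquear"},
    "C": {"cocer", "abrir", "beber"},
}
print(sorted(tally(lists).items()))
# [('--C', 2), ('-D-', 1), ('G--', 1), ('GD-', 2), ('GDC', 1)]
```

dorar is left out because only verbs starting with A, B or C are tallied, mirroring the sample above.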

So looking at the verbs that are only in the C (common) list and not in either cooking related list we do see a few surprising omissions (I’m assuming that these are SO common no one bothers to include them):

abrir --C to open; to turn on; to whet (as in appetite)
agregar --C to add
añadir --C to add
beber --C to drink
calentar --C to heat, heat up, warm up; to inflame
cocinar --C to cook
combinar --C to combine, mix; to put together, match, coordinate
comer --C to eat; to have for lunch; [Latin America] to have for dinner
concinar --C not in any dictionary, probably misspelling of cocinar
convertir --C to turn into, convert into, change into, make
cortar --C to cut, cut off, carve, slice, cut out; to chop; to cut (dilute sense); …

So, of the 35 verbs found only in the C (common) list, I’d probably include just these 11 in a general-purpose culinary list.

Now some of the verbs in the G (glossary) list don’t appear to be useful. Some have no definition in any of the dictionaries I routinely use, including the most authoritative dictionary of the Spanish language (which is NOT limited to Spain, so it could include verbs that don’t get used in Spain). So here are a few I’d consider dubious to include in a culinary glossary:

achicalar G-- [Mexico] to cover in honey; soak in honey
añejar G-- to age; [vino] to mature; to get stale
apanar G-- to coat in breadcrumbs (also EMPANAR or EMPANIZAR)
apuntillar G-- to finish off (a toro); to round off
ataviar G-- to dress up
bardar G-- to thatch
blanchir G-- (not in dict) Wiktionary has it as a French term for make white
bresear G-- (from glossary) To cook to slow fire, during long time, with condiments (generally vegetables, wine, broth and spices). Clearly a spelling error since not found.
cantar G-- to sing; to crow, chirp
caramerizar G-- (not in dict), another spelling? [from glossary] Spread a mold with sugar honey.
castigar G-- to punish; to ground, keep in; to damage, harm
cerner G-- to sift, sieve (same as cernir, which is it?)
chapurrar G-- to speak badly

I wouldn’t include achicalar, as it doesn’t appear to be used in Spain, but this raises a good point about my goal here. If I wanted to know the Spanish word, used in Spain, for an English word, I wouldn’t include anything that may only be used outside Spain. But my goal is asymmetric: to translate Spanish (on menus) only into English (so I can choose), so including a word in my corpus (and eventually my app) that is not likely to be used in Spain is not a problem (I do, however, need metadata noting this for that term). If I never see the term, it does no harm to have it in the lookup. OTOH, it would be a problem if I were trying to translate English into Spanish, since I shouldn’t use a word not found in Spain. It appears, for instance, that frijoles, which is well known to most people in the USA who visit Mexican restaurants, is one such word: not commonly used in Spain, though a Spaniard would likely know it. That might lead to a scene like the one in The Way (no tapas in Navarra, only pintxos) and make you look foolish.
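That asymmetry can be written down directly. A minimal Python sketch, assuming each corpus entry carries a set of regions where the term is known to be used (empty meaning no information); the `Entry` class, the function names and the region tags are hypothetical, and the entries are illustrative:

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One hypothetical corpus entry with regional metadata."""
    spanish: str
    english: str
    regions: set[str] = field(default_factory=set)  # empty = no regional information


def spanish_to_english(entries: list[Entry], term: str) -> list[str]:
    """Menu direction: return every match; a term never seen in Spain does no harm."""
    return [e.english for e in entries if e.spanish == term]


def english_to_spanish(entries: list[Entry], term: str, region: str = "Spain") -> list[str]:
    """Other direction: skip terms marked as used only in other regions."""
    return [
        e.spanish
        for e in entries
        if e.english == term and (not e.regions or region in e.regions)
    ]


entries = [
    Entry("achicalar", "to soak in honey", {"Mexico"}),
    Entry("frijoles", "beans", {"Mexico"}),
    Entry("judías", "beans", {"Spain"}),
    Entry("alubias", "beans"),
]
print(spanish_to_english(entries, "achicalar"))  # ['to soak in honey']
print(english_to_spanish(entries, "beans"))      # ['judías', 'alubias']
```

The design choice is that untagged entries are let through in both directions; a stricter app could require an explicit Spain tag before suggesting a Spanish word.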

blanchir (to make white, which isn’t exactly synonymous with ‘blanch’, though one might assume that’s what it means here) was interesting in that it did not occur in any dictionary but did have an entry in Wiktionary. The standard term for ‘blanch’ is palidecer (purely in the sense of turning white), with escaldar or blanquear for the culinary sense. I suspect blanchir might be used somewhere (possibly Puerto Rico) as a cognate of the English verb. But, again, in collecting the corpus I should not make judgments like this; I might add metadata to a blanchir entry, add it to the corpus anyway, and then let the “big data” statistical analysis decide whether this is a word or not.

bresear really looks like a misspelling (more likely brasear, to barbecue), but again it should go into the corpus with a metadata note rather than my passing judgment on it. (IOW, only a real expert in Spanish should decide what to include or exclude in any translation dictionary. If I find only one instance of a misspelled word, it will get washed out because there are so few occurrences of it in the corpus; OTOH, maybe people do commonly misspell this word, in which case it needs to be in my app.) caramerizar appears to be some variant of caramelizar, again perhaps used somewhere and not just a mistake. cerner has exactly the same definition (in the glossary itself, and also in spanishdict) as the more common spelling cernir, and both appear in a reverse lookup of ‘to sift’ in spanishdict. Which is it, then? Just a common confusion? cernido is a plausible term to see on a menu, so it matters that my dictionary can recognize it as the past participle of cerner.
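Spotting cernido as a form of cerner is mostly a matter of undoing the regular participle endings and checking the result against a verb list. A minimal Python sketch, assuming a set of known infinitives is available (here a tiny hypothetical one) and handling only regular -ado/-ido participles and their feminine and plural forms; irregular participles such as hecho (hacer) or frito (freír) would need a separate table:

```python
# Regular past-participle endings and the infinitive endings they can come from.
PARTICIPLE_ENDINGS = {
    "ado": ["ar"],
    "ido": ["er", "ir"],
}


def infinitive_candidates(word: str, known_verbs: set[str]) -> list[str]:
    """Return the known infinitives a regular past participle could come from."""
    w = word.lower()
    if w.endswith("s"):
        w = w[:-1]           # plural: cernidos -> cernido
    if w.endswith("a"):
        w = w[:-1] + "o"     # feminine: cernida -> cernido
    for ending, infinitive_endings in PARTICIPLE_ENDINGS.items():
        if w.endswith(ending):
            stem = w[: -len(ending)]
            return [stem + inf for inf in infinitive_endings if stem + inf in known_verbs]
    return []


known = {"cerner", "cernir", "guisar", "asar"}
print(infinitive_candidates("cernido", known))  # ['cerner', 'cernir']
print(infinitive_candidates("Asadas", known))   # ['asar']
```

Because both cerner and cernir are in the sample list, cernido comes back with both candidates, which is exactly the ambiguity noted above; the corpus would have to keep both rather than pick one.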

So again all this goes to show the work that must be done to really develop a very accurate dictionary that drives my app for menu translation (or to be published as a carefully researched culinary glossary).

 

 

 

How to use collected menus

I use this blog to document a project: obtaining an accurate and comprehensive set of terms (isolated words and phrases) to feed a smartphone app so I can “read” menus in Spain. To do this I am first collecting menus on my virtual “trek” (translating miles on a treadmill into a position on the Camino de Santiago), using Google Maps POIs to find restaurants, and then processing those that have websites with some form of menu I can extract directly (I don’t want to be typing from images and making all those mistakes).

Most of the menus are in Spanish (rarely I can find one that is dual-language, and even then: a) the translation may not be so great, and b) the English menu may not match the Spanish one, so this can be tricky). So I use either Google Translate (if the menu is a standard HTML webpage) or some tedious copy-and-paste into spanishdict.com (really Microsoft) to translate. Of course these machine translations are often not that great (they get things wrong and miss many terms), and that is a big issue.

Doing this process is mechanically tedious, but doing it slowly also gives me a chance to really observe what is going on (plus get a bit of drill on words; my short-term memory of some Spanish terms is improving, but based on past projects I know I’ll retain little of that). And, as I’ve documented in some posts, occasionally menu items completely befuddle the machine translation, which sends me off trying to figure them out myself, an interesting challenge since I have next to zero fluency in Spanish.

Now it is important to note my goal. Learning to speak and hear Spanish is entirely different, especially if you want to have conversations about almost anything (even if still oriented toward travel). I just need to be able to read menus (at least for my limited goal) and choose what I want. And I don’t need to translate in the other direction, so knowing whether ‘mushroom’ is hongo or seta doesn’t matter as much as going the other way.

And, of course, this also implies knowing something about cuisine in Spain (which can be quite different from what we might encounter in restaurants in the USA that happen to use Spanish on their menus). It is also turning out to require knowing something about agriculture in Spain, especially in its different regions. An ingredient like chorizo is: a) quite different from the Mexican-style chorizo I’d find in markets or restaurants here, and b) somewhat different in different regions of Spain, as each has its own traditional way of making it.

So after extracting menus from websites with some sort of translation I end up with side-by-side menu items, like below:

Gambas a la Plancha Prawns on the Plate
Setas a la Plancha Grilled mushrooms
Espárragos Especiales “Dos Salsas” Special Asparagus “Two Sauces”
Ensalada Templada con Gulas y Rape Tempered Salad with Gulas and Rape
Cogollitos de Tudela con Anchoas y Salmón Tudela with anchovies and salmon
Tabla de Ibéricos Iberian Table

I choose these particular items to make a couple of points:

  1. Notice that a la plancha occurs in two consecutive entries, and given that gambas are prawns and setas are mushrooms, there are two different ways to parse it and assign a tentative meaning to a la plancha: either ‘grilled’ or the more literal ‘on the plate’. So what does it really mean? The answer, btw, is that plancha is really an “iron”, i.e. a flat cooking surface, either a pan or a typical restaurant flattop, used to “grill” the item.
  2. In the fourth item gulas appears (and didn’t get translated), and rape is quite ambiguous (is it the English word, which therefore shouldn’t be translated, or a Spanish word that means something entirely different?). gulas are imitation baby eels (a surimi product, like fake crab; the real baby eels are angulas), and rape is a type of fish with more than one translation (monkfish, anglerfish). So how can I use information like this?
  3. Cogollitos de Tudela got translated just as Tudela (the other words in this item are easy to match between the Spanish and English). This is actually a flaw (I believe) in Google’s translation process. Looking up cogollitos gives “A small heart or flower of garden plant” (or sometimes just ‘buds’), and Tudela doesn’t appear in any dictionary but turns out to be a town (really just a reference location) where a particular type of lettuce (it looks like Romaine) is grown; when it is served at a restaurant, the inner leaves are used (often in a very attractive presentation). So this is a fairly classic ingredient and dish, especially in northeastern Spain, but translation isn’t going to help much. So: a) how certain am I that I’ve figured this out correctly (and how would I even put a certainty on it, e.g. how many different sources confirm my guess versus any counter-evidence?), and b) how should I use this information in my corpus?
  4. And what is “Iberian Table”? (It’s a valid literal translation but not helpful.) Doing even a little research on menus, one quickly learns that Ibéricos almost certainly refers to a prized pig, but how is it connected to Tabla? One has to be careful here, as I’ve already found an instance where silla (literally ‘chair’, but in context really ‘saddle’) refers to a cut of meat, so maybe the same is true of tabla? IOW, there is quite a lot of uncertainty here, BUT this could be an important item to know. I suspect, BTW, it’s just a plate of ham or other cured pork, like an antipasto.
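The reasoning in the first point (I know gambas and setas, so whatever is left over in the English must be a la plancha) is simple enough to sketch. A minimal Python version, assuming a hypothetical word-level lexicon of terms I already trust and a naive one-word-to-one-word match; it produces candidate meanings, not a verdict, which is why the two items disagree:

```python
from typing import Optional


def candidate_for_phrase(spanish: str, english: str, phrase: str,
                         known: dict[str, str]) -> Optional[str]:
    """Guess how `phrase` was rendered in one side-by-side menu item.

    Assumes every other Spanish word in the item is in `known` (a hypothetical
    lowercase word-level lexicon) and that each maps to one English word.
    Returns None when the phrase is absent or the leftover can't be isolated.
    """
    spanish_lower = spanish.lower()
    phrase_lower = phrase.lower()
    if phrase_lower not in spanish_lower:
        return None
    other_words = spanish_lower.replace(phrase_lower, " ").split()
    if any(word not in known for word in other_words):
        return None  # an unknown word could account for any of the English
    accounted_for = {known[word] for word in other_words}
    leftover = [w for w in english.split() if w.lower() not in accounted_for]
    return " ".join(leftover) or None


known = {"gambas": "prawns", "setas": "mushrooms"}
items = [
    ("Gambas a la Plancha", "Prawns on the Plate"),
    ("Setas a la Plancha", "Grilled mushrooms"),
]
candidates = [candidate_for_phrase(es, en, "a la plancha", known) for es, en in items]
print(candidates)  # ['on the Plate', 'Grilled']
```

Both candidates would then go into the corpus as separate pairs for a la plancha, each with its own certainty, rather than one being chosen at this stage. An item like Cogollitos de Tudela returns None, because the sketch can’t tell which English words the unknown Spanish words account for.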

So there are several steps in studying menus:

  1. the mechanical part of getting the Spanish aligned with some sort of translation to English
  2. studying the results for what appears to be clear one-to-one correspondence between terms. But beware: on this single menu both hongos and setas translate to ‘mushrooms’. Why are there two different words? (Previously hongos had shown up as primarily used in Latin America, not Spain, but this menu obviously contradicts that.) And if there is a difference (i.e. they’re not just synonyms), what is it? I have vague evidence that hongos refers to cultivated button mushrooms and setas to wild mushrooms (like shiitake or others). That is a big difference.
  3. Some items translate very little, so can I find other sources to determine what these items might be? (Sometimes yes, sometimes no.) And even if I figure out what a word (e.g. Cameros from yesterday’s post) or phrase (a la riojana from yesterday’s post) is, these are not literal translations, so how do I mark them? For instance, I believe Cameros refers to the mountains in southern Rioja and therefore potentially to a breed of sheep (or just a way of raising them) that would be recognized as distinctive (like Wagyu beef). If I figure this out: a) what confidence do I put on this information, and b) how do I encode this information in my corpus?

Once a corpus is obtained, the assumption is that a kind of “big data” approach can help figure all this out (I haven’t quite figured out what code I’ll write for this; Google claims complex deep-learning AI as their method of training their translation, and I don’t have the resources for that approach). But my assumption is that everything in my corpus will have multiple entries, and some will have a lot of entries. So, in conjunction with my placing some sort of “certainty” weight on each pair and matching up pairs across a large data space, some sort of overall certainty can be derived (probably with a lot of exceptions that have to be looked at by human evaluation, which Google says they never do, which also might explain some of their odd translations).
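A first cut at that aggregation could treat each pair’s certainty as a weighted vote and rank the competing English renderings of each Spanish term by their share of the votes. A minimal Python sketch with made-up pairs and weights; the function name is hypothetical, and it deliberately keeps the losing candidates, since picking a single winner would hide genuine ambiguity:

```python
from collections import defaultdict


def translation_shares(pairs: list[tuple[str, str, float]]) -> dict[str, list[tuple[str, float]]]:
    """For each Spanish term, rank English candidates by share of total weight.

    Each pair is (spanish, english, certainty); certainty acts as a vote weight.
    Terms whose pairs all have certainty 0.0 are left out.
    """
    weights: dict[str, dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for spanish, english, certainty in pairs:
        weights[spanish.lower()][english.lower()] += certainty
    ranked = {}
    for spanish, candidates in weights.items():
        total = sum(candidates.values())
        if total == 0:
            continue
        ranked[spanish] = sorted(
            ((english, weight / total) for english, weight in candidates.items()),
            key=lambda item: item[1],
            reverse=True,
        )
    return ranked


pairs = [
    ("a la plancha", "grilled", 0.6),
    ("a la plancha", "on the plate", 0.3),
    ("a la plancha", "grilled", 0.5),
    ("hongos", "mushrooms", 0.7),
]
for english, share in translation_shares(pairs)["a la plancha"]:
    print(f"{english}: {share:.2f}")
```

With these numbers ‘grilled’ ends up with about 0.79 of the weight and ‘on the plate’ about 0.21, so both stay visible for a human to review.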

So, just to finish this, let me provide an example. From this single menu I extracted (manually; I can’t quite imagine how to do this in code) the following table of “pairs” that I’m relatively certain are correct. IOW, these are mostly just the terms derived via literal translation, not the more complicated cases where a lot of guessing is required.

Note: more discussion after this table, please scroll down.

a la Plancha Grilled; on the Plate Lechal Baby lamb
a la Vinagreta Vinaigrette Lenguado Sole
Agua Water Limón Lemon
al Horno Baked Macarrones Macaroni
Albóndigas Meatballs Menestra Stew
Anchoas Anchovies Merluza Hake
Arándanos Blueberries Milhojas Fillets
Arroz Rice Mixta Mixed
Asado Roasted Oveja Sheep
Bacalao Cod Pan Bread
Bebida Drink Patatas Potatoes
Berenjena Eggplant Pato Duck
Bistec de Ternera Beef Steak Pescados Fish
Calabacín Zucchini Pimienta Pepper
Calamares Squid Pimientos Peppers
Carne Meat Postres Desserts
Carrilleras Cheek pieces Precio Price
Cerveza Beer Primeros Platos First courses
Codillo Knuckle Puerros Leek
compartir share Pulpo Octopus
Cordero Lamb Queso Cheese
Croqueta Croquettes Rape Anglerfish
de la Abuela Grandma’s Rebozado Coated
de la Casa of the House Refresco Soda
elegir choose Rellenos Stuffed
en su Tinta in ink reservas reservations
Ensalada Salad Revuelto Scrambled
Entrantes Starters Rojo Red
Espárragos Asparagus Sabores Flavors
Fresco Fresh Salsa Sauce
Frutas Fruit Setas Mushrooms
Gambas Prawns sobre on
Gaseosa Soda Solomillo Sirloin
Guisado Stew; Stewed Tarta de Queso Cheesecake
Helado Ice cream Tomate Tomato
Hongos Mushroom Trucha Trout
Huevo Egg Verduras Vegetable
Incluye Includes Vino Wine
Jamón Ham Yogurt Griego Greek Yogurt
Judías Verdes Green Beans

So a single menu provided a significant amount of raw material (about 80 items) to feed into my corpus. Now I’ll note a few things about whether further processing should be applied to this list before adding it to a corpus (or, IOW, what metadata should also be embedded in the corpus).

  1. Judías Verdes ‘green beans’: should there be separate entries for verdes as ‘green’ and judías as ‘beans’? In Spanish, adjectives agree with their noun in both number and gender, so verdes is not the dictionary lookup form for ‘green’ (the singular verde is). That could introduce some confusion into the corpus. And ‘bean’ has multiple translations, often with one word used for the dried beans (or the seeds in the pod) and another for the whole bean, as in typical green beans.
  2. What about guisado? Google gave two literal translations: ‘stew’ and ‘stewed’. In English those are not the same thing, even though they’re related. guisado is the past participle of the verb guisar, which can mean either simply ‘cook’ or ‘stew’. The two uses of guisado on this menu are “Cordero Guisado” and “Cordero Guisado con Pimientos”, so why is Google convinced it’s ‘stew’ (the noun) in one and ‘stewed’ (the participle) in the other? Is it right?
  3. Another thing I noticed is that the English translation often doesn’t match the Spanish in number. Figuring out plural and singular forms in a corpus analysis process could be interesting, and entering an incorrectly matched pair could be problematic.
  4. And, finally (for today), nouns probably fit into a literal translation mode more easily than other parts of speech, or especially colloquial usage, so trucha as ‘trout’ is fairly high certainty, but what about mixta as ‘mixed’? It was used in the context of ensalada (salad), and that item appears to be a typical mixed salad (often the “house” salad in US restaurants), but the literal translation of ‘mixed’ would more likely be variado or diverso. mixta doesn’t occur in the lookup dictionary at all, but mixto does, in the sense of mixed sexes (i.e. a group of people). The feminine form is presumably just agreement with ensalada, but why did the menu use mixto at all?
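For the number problem in points 1 and 3, one option is to store each Spanish word under its dictionary form as well as the form seen on the menu. A minimal Python sketch, assuming a lexicon of dictionary (singular) forms is available (the one here is a tiny hypothetical sample). The rules are deliberately naive: they generate candidate singulars and let the lexicon choose, they don’t handle plurals that shift an accent (limón, limones), and, as the mixta example shows, they don’t touch gender:

```python
def singular_candidates(word: str) -> list[str]:
    """Possible singular forms of a Spanish noun or adjective (naive rules)."""
    w = word.lower()
    candidates = [w]                  # the word may already be singular
    if w.endswith("es"):
        candidates.append(w[:-2])     # consonant + es: flores -> flor
    if w.endswith("s"):
        candidates.append(w[:-1])     # vowel + s: verdes -> verde, judías -> judía
    return candidates


def lemma(word: str, lexicon: set[str]) -> str:
    """Pick the first candidate found in the lexicon, else the lowercased word."""
    for candidate in singular_candidates(word):
        if candidate in lexicon:
            return candidate
    return word.lower()


lexicon = {"verde", "judía", "pimiento", "flor"}
print([lemma(w, lexicon) for w in ["Judías", "Verdes", "Pimientos", "flores", "Mixta"]])
# ['judía', 'verde', 'pimiento', 'flor', 'mixta']
```

Keeping the menu form alongside the lemma means nothing is lost if the naive rules pick the wrong candidate.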

So there are lots of challenges: extracting the raw data itself, assigning some metadata to the pairs to qualify how they should be treated in the corpus, and especially assigning some certainty value (i.e. like a probability, where 1.0 would probably never occur, since there is always some ambiguity, and 0.0 would mean the pair isn’t worth including). BUT maybe a single scalar value is insufficient, since it’s possible to have highly likely yet incompatible, if not mutually exclusive, interpretations.

So all of that is a lot of design work, probably followed by an iterative process once I have some code that can crunch the corpus (thus far, I’ve done some by hand to look for design issues). And, fundamentally, is this even a process I can automate at all, or will the code at most just bring together related pairs for me to analyze with my own intelligence?

Who knows, time will tell.

p.s. [personal]. Doing this mechanical work (and some background study as I go along) and also writing these posts is definitely cramming some Spanish into my brain, but I also know that’s a short-term effect. A year from now I’m not going to remember that guisado is the past participle of guisar or that it is related to stews/stewing (as a cooking process). So converting this work into: a) a more permanent and usable form (like a smartphone app to carry with me to Spain), and/or b) some drill programs so I could “brush up” just before leaving would be more useful.