Back to work – lists

As I don’t have any more travel planned I can get back to work, perhaps with a renewed effort. So I returned to looking at lists; I’ve found at least three, with more to go. Lists come in several forms: plain translations of terms between English and Spanish, glossaries, and dictionaries, where dictionaries supply an actual definition and glossaries sometimes just provide a translation (where a literal one is possible) or a definition otherwise. The Net is full of these but using them can be a challenge. Also I’ve usually looked only at lists where the terms are Spanish but the translation or definition is in English. It’s more interesting, although more work, to get lists entirely in Spanish, and ideally ones that apply to Spain rather than anywhere Spanish is used.

So in my first attempt to build up a translation dictionary I only used lists I could find. It never dawned on me to use purely sources in Spanish and in particular menus, but of course machine translation has advanced a lot since my V1.0 attempt years ago so now sources entirely in Spanish and especially as applied to Spain are my primary sources.

But lists provide a lot of information in a hurry. And despite the issues they often provide terms that are unlikely to be found elsewhere. But the biggest issue is that whole thing of Spanish throughout the world versus Spanish gastronomy terms for Spain. As I’ve mentioned, tortilla is common in the western hemisphere but is something entirely different from what you’d get in Spain, even if the menu does say tortilla patatas. Now where lists include New World terms not used in Spain, processing them is just a waste of time, at least for my purpose. But when terms conflict in meaning between Spain and elsewhere, that is a problem.

So I’ve been crunching through three lists. Finding more lists is a lot easier (at least until I’ve found most of them) than processing the lists, especially when the lists are entirely in Spanish. Plus some types of webpages are hard to “mine” (also known as scraping when code is doing it). Web authors design pages to be most useful for their intended audience and not for someone accumulating a corpus. And even when I’ve processed lists I have to be careful with the whole copyright issue. If I published any substantial portion of any list I find (except in the fair use case, i.e. a small sample with attribution), that would be improper. But since my real notion is accumulating a large corpus from many sources and then basing my final translation vocabulary on a meta-analysis of those sources, I think I should be OK. Also, whenever I have a term translation from only a single source I need to be suspicious of its accuracy as well.

So thus far I’ve looked at: 1) the Gallina Blanca Diccionario, which is from the website of a food company in Spain that produces packaged products for the Spanish market and supplies the diccionario to aid users of the recetas it also provides; this has Spanish terms and definitions in Spanish but does not apply, at least not exclusively, to Spain; 2) Nitty Grits, a glossary with Spanish terms and English definitions, not exclusive to Spain, but as I learned after crunching through most of it, each term is clickable and often (not always) then indicates where the term is used; Nitty Grits is a large list and allows me to get fairly unambiguous definitions (since they’re in English) and avoid the often incorrect machine translations (such as occurred with Gallina Blanca); and, 3) ARecetas, a complex recipe website I’ve now returned to after doing some work on it back in May, which has multiple glossaries, especially the largest and most directly useful, Glosario de Alimentos. And there are more I’ve found but haven’t yet crunched through at all. Of these the ARecetas glosario is the hardest to process, so I only briefly looked at it in May and instead focused on Nitty Grits. But for several months Nitty Grits was not operational (at first I thought they might have blocked me but that was not the case).

Anyway now I have more issues, having finished two of these sources and resumed work on the third. First, the way I’ve extracted information (often a tedious process) is inconsistent between the three lists (meaning the tables I created manually in MSWord). Second, my notation system was inconsistent, i.e. I annotated much of what I found with nothing to mark what is original source text and what is my annotation. These issues mean I can’t possibly consolidate the three lists manually. So I had started some code to create a consistent format across all lists (in XML, which is more robust than just text in MSWord with a few fonts and colors). I was able to do Nitty Grits fairly easily but ARecetas and GallinaBlanca are tougher, i.e. it’s not just code I need; I have to go back to the manually compiled lists and use consistent inline markup so the code can parse all entries into the common XML I want for all three lists.
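To make the idea of "inline markup parsed into common XML" concrete, here is a minimal Python sketch. It is not the actual code from this project. The `term :: source text [[annotation]]` markup, the XML element names, the `sample` source name and the sample lines are all assumptions invented for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical inline markup (an assumption, not the real notation):
#   term :: original source text [[my own annotation]]
ENTRY_RE = re.compile(
    r"^(?P<term>.+?)\s*::\s*(?P<text>.*?)\s*(?:\[\[(?P<note>.*?)\]\])?\s*$"
)

def entry_to_xml(line, source):
    """Turn one marked-up line into an <entry> element."""
    m = ENTRY_RE.match(line)
    if m is None:
        raise ValueError(f"unparseable entry: {line!r}")
    entry = ET.Element("entry", {"source": source})
    ET.SubElement(entry, "term").text = m.group("term")
    # Source text and annotation are kept in separate elements so
    # they can never be confused again after conversion.
    ET.SubElement(entry, "original").text = m.group("text")
    if m.group("note"):
        ET.SubElement(entry, "annotation").text = m.group("note")
    return entry

def lines_to_xml(lines, source):
    """Convert all non-blank lines of one list into a <list> element."""
    root = ET.Element("list", {"source": source})
    for line in lines:
        if line.strip():
            root.append(entry_to_xml(line.strip(), source))
    return root

# Illustrative input only; these lines are not taken from any of the three lists.
sample = [
    "rancio :: Vino oxidado, licoroso y seco. [[sherry sense, not 'rancid']]",
    "aguja :: Vino con contenido carbónico perceptible al paladar.",
]
root = lines_to_xml(sample, "sample")
print(ET.tostring(root, encoding="unicode"))
```

The point of the design is simply that every list, whatever its original layout, ends up with the same three fields per entry, so one merge program can read them all.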

Now I need to finish ARecetas (and perhaps also some other smaller sites I found and also do a thorough job of searching) before moving on to the real world. Once I can convert each list, with my annotations and markup, to a consistent XML structure then I can attempt a “merge”. Once that is done I can then look for agreement or disagreement between the sources (as I processed them) and start fixing errors or doing more searching to get more accurate answers (although without wasting much time on non Spain terms).
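The "merge, then look for agreement or disagreement" step could be sketched roughly as follows. This is only an illustration under assumptions: the source names (`list_a`, `list_b`, `list_c`), the function names and the sample glosses are all made up, and real entries would need much fuzzier matching than an exact string comparison.

```python
from collections import defaultdict

def merge_sources(sources):
    """sources: {source name: {Spanish term: English gloss}} -> {term: {source: gloss}}."""
    merged = defaultdict(dict)
    for name, entries in sources.items():
        for term, gloss in entries.items():
            # Normalize case and whitespace so trivial differences do not count.
            merged[term.strip().lower()][name] = gloss.strip().lower()
    return dict(merged)

def classify(merged):
    """Sort terms into agreement, disagreement, or single-source buckets."""
    report = {"agree": [], "disagree": [], "single": []}
    for term, by_source in sorted(merged.items()):
        if len(by_source) == 1:
            report["single"].append(term)      # one source only: be suspicious
        elif len(set(by_source.values())) == 1:
            report["agree"].append(term)       # every source gives the same gloss
        else:
            report["disagree"].append(term)    # needs more research
    return report

# Invented sample data, for illustration only.
sample_sources = {
    "list_a": {"tortilla": "omelette", "calabaza": "pumpkin"},
    "list_b": {"Tortilla": "flatbread", "calabaza": "Pumpkin"},
    "list_c": {"boniato": "sweet potato"},
}
report = classify(merge_sources(sample_sources))
print(report)
```

The "disagree" bucket is where Spain versus New World conflicts such as tortilla would surface, and the "single" bucket is the list of terms that still rest on one source.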

People who compile lists usually have some other work. They usually want to get their list with minimal effort to achieve their purpose. Simply put, this means they make mistakes, sometimes even blatantly obvious to simple analysis, sometimes more subtle. I’m well familiar with this from my career, a concept of “good enough”. No compilation of information is ever perfect anyway so it’s more a question of how good does it need to be for the intended purpose versus how much work (usually measured as cost since some paid person is doing the work). So online lists have many flaws. And it’s not just online lists. I’ve bought a few books about food in Spain back in my V1.0 effort and these books have inconsistencies and errors (where error means they disagree with other sources). I’ve looked and I’ve never found a “best” or even highly accurate and comprehensive source.

And that’s part of why I’m even doing this project. Unlike the other people creating materials, either free on the Net or in for-sale published works I don’t have a cost issue with my work. As I’m retired and unlikely to ever even be a temporary consultant the marginal value of my time, measured in money, is zero. Therefore I can spend an infinite amount of it trying to be as accurate and comprehensive as I can be, even (and that would be fun) doing original field research, i.e. actually going to lots of restaurants in Spain with some consultant I could hire who’d be fluent in Spanish and cooking (then the bills do add up). So at least my “free” effort is just a question of how much work I wish to put in it.

So I do believe, despite my lack of fluency in the Spanish language, it is feasible that I could compile the best list, meaning the most comprehensive and accurate. Of course my list would have mistakes too but I think it could be better than any I’ve seen. AND, if I write good code that does the bulk of the work of consolidating the raw materials for my corpus and then extracting from it, I should have an easier time making corrections, especially as my targeted application is either machine-generated webpages or a smartphone app, i.e. updates should be possible once I actually get feedback (too many sites or apps fail to take advantage of the knowledge of their users, who can provide very valuable feedback to constantly improve the product, either its usability or its underlying database of Spain culinary terminology).

So I hope to get back into it and finishing these three lists would be a critical milestone because then I can really get down to designing my corpus and the code for importing and consolidating and proofing the information in the corpus.

cata de vinos

I’ve been spending a lot of (too much?) time trying to mine Spanish terms associated with wine. Discovering a large list of these is only somewhat useful for reading menus in Spain, which is the primary purpose of my project. But sometimes you look where the light is, not where your keys are (this is a cliche in the USA, perhaps not obvious to others).

Anyway cata de vinos is not quite what it says literally. The literal translation is simple – ‘wine tasting’, something rather obvious that any of us do when we drink wine, at a restaurant or at a party or wherever. BUT, there is a more formal meaning which is spelled out in this Spanish language Wikipedia article.  This is the kind of tasting “professionals” do to write all those articles (or a description of a particular wine on a menu) in all that wonderful (and frankly somewhat snobbish) wine jargon.

Any kind of tasting that involves comparative analysis requires training but also requires a vocabulary that can be fairly precisely defined and used by different tasters in the same way. We amateur wine “tasters” often don’t really know these terms.

I was surprised to find a number of fairly detailed sources, in Spanish (both the terms and definitions) covering “official” cata de vinos. While many of these terms would not have a precise (or sometimes any) meaning to us amateurs it’s still worthwhile to attempt to dig them out.

So I’ve been at this for a long time, since I found such rich and extensive, but difficult to process, sources. By now I’d hoped to provide a more complete post on this subject but I’m still not done, so this is just a fragment to demonstrate some of the issues of decoding vocabulary like this, especially for a non-Spanish speaker.

The source I’ll discuss here is Vocabulario del Vino, which is reached by the Glosario tab at a site © 2011-2017 Enominer. Try as I have, I can’t actually figure out who/what Enominer is! (no translation I can find) It is a web domain name as per https://www.enominer.com/ but it doesn’t have an About… to actually figure out what this is. I suspect it’s a publisher of magazines about wine but that’s just a guess. The page name containing the glossary is diccivino.html which, again I’m guessing, I think is just a contraction of diccionario and vino. And in the many searches I’ve done trying to expand on the definitions here I seem to have encountered very similar lists at other URLs, so despite the © at this site (no idea if it really is their copyrighted material or a copy from elsewhere) some/all of this glossary is published elsewhere on the web. Which, btw, doesn’t help me when my searches just find what I already have as text from this glossary. The sub-heading under the name at this site just says:

cultura del vino, desarrollo rural y ciencias de la tierra Wine culture, rural development and Earth sciences

As explanation of their glossary the webpage explains that it is presenting a formal terminology.

Toda ciencia o materia cuenta con un conjunto ordenado y sistemático de términos y de su correspondiente significado.

La viticultura y la enología no son una excepción.

Aún siendo comúnmente admitido que la cata de vinos es una acción de los sentidos que aprecian sensaciones de aromas y sabor con un contenido más subjetivo que objetivo,
no es menos cierto que hay un conjunto de normas y reglas no escritas que permiten traducir las apreciaciones sensoriales que influyen principalmente en la cata de un vino (vista, olfato y gusto) en valores que pueden comprobarse de una forma objetiva.

All science or matter has an ordered and systematic set of terms and their corresponding meaning.

Viticulture and winemaking are no exception.

Although it is commonly accepted that wine tasting is an action of the senses that appreciate sensations of aromas and flavor with a more subjective than objective content,
it is no less true that there is a set of rules and unwritten rules that allow the translation of sensory appreciations that influence mainly in the tasting of a wine (sight, smell and taste) in values that can be checked in an objective way.

They divide their glossary into four sets:

Términos relativos al color Color-related terms
Términos relativos al aroma. Terms related to the aroma
Términos relativos al sabor. Terms related to taste
Otros términos. Other terms

So I’ve been churning through these using both Google and Microsoft to do the translations. So as a fragment of this work here are a few terms (from the sabor/taste set under R):

rancio

Vino oxidado, licoroso y seco. Es un defecto en los vinos de mesa, pero no en los vinos generosos.

stale (Microsoft) Rancio (Google)

Rusty, dry and dried wine. It is a flaw in table wines, but not in generous wines. (Microsoft)

Oxidized wine, liqueur and dry. It is a defect in table wines, but not in generous wines. (Google)

Purple text is the Google Translation and black text is the Microsoft (inside MSWord translation). Note that Google doesn’t translate rancio to ANY English word. This has been common in analyzing the cata terms as many don’t seem to have a direct English equivalent and thus require a lot of research to make a guess. Microsoft picked ‘stale’. Looking at my usual two online dictionaries, spanishdict.com and Oxford I get a variety of English terms for rancio:  rancid (the obvious cognate), mellow (interesting this is the wine sense), ancient, long-established, stale (bread sense), antiquated, old-fashioned, sour and unpleasant. That’s a lot to choose from to decide what rancio means in the cata sense; IOW, how would a professional taster apply this term and if they were also fluent in English what English term would they use?

So we look at how it is defined. In the first phrase of the definition:

Vino oxidado, licoroso y seco.

Google and Microsoft differ significantly. MSFT translates oxidado as ‘rusty’ (a valid literal dictionary translation) but Google uses the more appropriate ‘oxidized’. Even a somewhat amateur taster like me is familiar with ‘oxidized’ as a flaw in wine; rust is also a product of chemical oxidation but ‘rusty’ is not likely to really apply in this case. Likewise for licoroso MSFT and Google disagree, and from my research I think both are wrong (although Google’s ‘liqueur’ is closer). licoroso is a concept that doesn’t really have a single English equivalent, only a definition, which is ‘strong; of high alcoholic content’.

So we still haven’t quite got this figured out but the critical clue lies in the next sentence and the words vinos generosos. Both Google and Microsoft translate this literally (generous wines) BUT in this case this is a very specific word pair that really means a type of sherry as explained in this source which indicates generoso is a regulated term of Consejo Regulador.

Now actually this issue (sherry versus table wines) has occurred many times in studying the cata vocabulary. I’ve learned that Spain is actually the leading wine producer (by volume) in the world, surpassing both France and Italy and also easily California (which, to me as a former citizen, is the US when it comes to wine). Simply put, the fortified sherry wines are quite different from the lower-alcohol table wines and thus taste, aroma (bouquet) and color attributes can be quite different.

So in this case this source is telling us that an acceptable (possibly desirable) taste in sherry is not attractive in table wines BUT it is hardly the same as rancid (I doubt even in sherry this is good) or oxidized or any of the other translations of rancio. So if I were forced to pick an English equivalent I would go with ‘mellow’/’ancient’. And this shows the problem – these words don’t really describe this taste but none of the other translations do either.

In short, especially trying to understand the specialized vocabulary of cata de vinos you really have to have experience tasting, in Spain, in the context of all the wines available in Spain. It’s basically not possible to translate this over to English.

And since rancio looks a lot like ‘rancid’, a non-Spanish speaker who saw it as a term describing a wine would be unlikely to try that wine, which, according to this, they shouldn’t if it is table wine but should if it’s sherry.

I had planned to discuss several other R taste terms but this post is already too long so I’ll merely mention one more:

retronasal

Es el aroma de menor intensidad que el olfato que se percibe por vía interna desde el paladar cuando respiramos por la boca con una pequeña cantidad de vino en la cavidad bucal.

Aftertaste Retronasal

It is the aroma of less intensity than the smell that is perceived by internal way from the palate when we breathe through the mouth with a small amount of wine in the oral cavity.

It is the aroma of less intensity than the smell that is perceived internally from the palate when we breathe through the mouth with a small amount of wine in the oral cavity.

Again the stuff in purple is Google’s Translation. Interestingly Microsoft actually picked a translated English word (aftertaste) for retronasal. But to my eye retronasal doesn’t even look Spanish at all and thus might be a loanword from English. In fact it is. But what does it mean? Actually finding a description of this in English wine tasting sources shows approximately the same thing as the translation (almost identical between Google and Microsoft) of the definition.

The funny thing is I didn’t know what retronasal meant BUT I’ve actually done exactly what its definition describes (if I was told this term I’ve forgotten, but I don’t believe I ever knew it). Not long after moving to California, and just as California was becoming a major player in wine (hard to believe it once was poorly regarded, decades ago), I took a course on California wines and how to do tasting at a community college in the Bay Area. We were actually taught how to do this: take a sip, hold the wine in your mouth, open your mouth slightly and breathe in. The sensation one gets is entirely different from just tasting (mouth closed) or the aftertaste (breathing in after swallowing). And if you’ve ever watched a professional tasting you see the tasters doing this (and of course, also spitting out the possibly very expensive wines they’re tasting).

Anyway this diversion in my project has taken a lot of time and hasn’t provided a great deal of material to put in my corpus for my menu translation app but it has certainly provided a lot of opportunity to see challenges in translation.

So I’ll leave you, Dear Reader, with a couple of quiz questions.

aguja

Vino con contenido carbónico perceptible al paladar y visiblemente observado al descorchar la botella. El gas carbónico procede de su propia fermentación y da sensación picante y agradable

needle

Wine with carbonic content perceptible to the palate and visibly observed when uncork the bottle. Carbon dioxide comes from its own fermentation and gives a pungent and pleasant feeling

quebrado

Vino alterado por las quiebras, que afectan al color.

broken

It was altered by bankruptcies which affect the color.

What English equivalent would you use for aguja and quebrado?

And there are about 50 more of these just in this source!

 

A few terms from ensaladas

I’m continuing to extract terms from a large set of recetas, having switched from postres (desserts) to ensaladas (salads).  Now thinking about salads there is a lot more diversity than merely leafy green stuff with some dressing so this is another lode of terms to find and add to my corpus. So here are a few fragments I’ve found:

Ensalada de verdinas con perdiz escabechada, receta fácil Salad of verdinas with pickled partridge, easy recipe

As usual terms that Google Translate doesn’t translate or has silly answers catch my attention, so what are verdinas? Oxford has an entry that translates to ‘moss’ and it’s plausible a salad might include moss. But this is what makes this source so useful, it’s not just titles of dishes, but the full recipe (ingredients and instructions) and a photo of the dish. In this case the photo reveals the clue to verdinas, showing a bag of alubia verdina which are called Verdinas De Nuestra Tierra in the ingredient list. IOW, since I’ve seen alubia often this is just a specific type of bean (visible in the photo) described here.

So moving on:

Remojón  granadino, receta fácil para el Verano Remodo  granadino, easy recipe for summer

Why Google Translate translates remojón to ‘remodo’ remains a mystery as I can’t find any association. Oxford literally translates remojón to ‘soaking’ and granadino to ‘of Granada’ which doesn’t help much. Fortunately this has no English equivalent but is

a specific recipe with oranges, cod, onions, tomatoes and olives, soaked in olive oil for at least four hours.

so an item like this has to be entered in my corpus with a “description” rather than a translation.

And finally:

Salpicón de bogavante con vinagreta de su coral Lobster salty  with vinaigrette of its coral

So we have two mysteries here: 1) what is a ‘salty’ (presumably the translation of salpicón), and, 2) what is ‘its coral’ (untranslated from coral in the Spanish)?

salpicón is the easier one since it’s a particular preparation of “chopped seafood or meat with onion, tomato and peppers” described here so ‘salty’ is a mysterious translation and inaccurate.

Salpicon (or salpicón, meaning “hodgepodge” or “medley” in Spanish) is a dish of one or more ingredients diced or minced and bound with a sauce or liquid.

But to figure out coral required looking at the recipe which fortunately describes it thusly:

the contents of the inside of the head (of the lobster) and the dark colored matter that is full of flavor

While I couldn’t find any English equivalent for coral (or any definition that matches the recipe) I believe this is a delicacy that some adventuresome foodies like. Now I’ve used the heads of shrimp and their shells to make stock so I suppose this is the same, but this sounds pretty yucky to me, which means if I had this salad and quite possibly enjoyed it I’d rather not know what coral is.

As the last tidbit the recipe text also includes two interesting terms:

  1. brutal bogavante which Google translated to ‘brutal lobster’. What’s this, some lobster with monster claws that fights back? Actually Oxford did explain that brutal has a colloquial meaning of ‘incredible’ or ‘amazing’ which is a lot more appealing (and reasonable guess at translation)
  2. and un platazo which didn’t appear in any dictionary but was found by search in an obscure (scanned) old text as ‘great dish’ which does fit the rest of the context so also is a likely translation.

These “guesses” I sometimes make have some amount of likelihood of being correct. I’m fairly certain of something like verdinas as a type of bean, but it is a guess and therefore has to be entered in my corpus with some uncertainty. And brutal and platazo have even less authoritative evidence and so would have higher uncertainty. The Spanish to English correspondences from Google Translate also cannot be viewed as “certain”. Probably only translations appearing in one of the authoritative dictionaries can be entered as p=0.999 in the corpus. So getting as much volume as possible, so that every term in the corpus has multiple instances, will be key to getting the best possible translation dataset.
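One way to picture how multiple instances could raise confidence is a small Python sketch like the one below. It is an illustration, not the method actually used for the corpus. It assumes each observation of a gloss carries its own probability of being right and that observations are independent (a strong assumption), and combines them with the "noisy-OR" rule 1 - (1 - p1)(1 - p2)... The function name and the probabilities in the example are invented.

```python
from collections import defaultdict
from math import prod

def best_gloss(evidence):
    """evidence: list of (English gloss, p) observations for one Spanish term.

    Returns the gloss with the highest combined confidence and that confidence.
    Assumes observations are independent, which real sources often are not.
    """
    by_gloss = defaultdict(list)
    for gloss, p in evidence:
        by_gloss[gloss].append(p)
    # Noisy-OR: the gloss is wrong only if every observation of it is wrong.
    scored = {g: 1.0 - prod(1.0 - p for p in ps) for g, ps in by_gloss.items()}
    winner = max(scored, key=scored.get)
    return winner, scored[winner]

# Invented probabilities: two moderately confident sightings of "type of bean"
# versus a single dictionary hit for "moss" that does not fit the recipe.
verdinas = [("type of bean", 0.7), ("type of bean", 0.6), ("moss", 0.5)]
gloss, p = best_gloss(verdinas)
print(gloss, round(p, 3))
```

With these made-up numbers, two independent guesses at 0.7 and 0.6 combine to about 0.88, which is why volume matters: several weak but consistent observations can outweigh a single stronger one.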

 

A blogging dilemma

I’m using this blog (partially) to “document” interesting tidbits I encounter while doing research for my anticipated smartphone app to translate menus in Spain. That app needs to have a comprehensive and accurate dataset to use in the translation, not just the equivalent English term (which doesn’t always exist) but also some description. For example, what is sobrasada? Yes, it’s ‘sausage’ but saying that (or even ‘spicy pork sausage’) doesn’t tell you very much.

So I’m using various sources to build up a “big data” corpus which will have translation errors and other errors. But algorithmically I can extract from that corpus what I’ll need to power the app. But I have to build that corpus manually, often exploring “puzzles” I find in trying to figure out a proper equivalent in English for some culinary item I find in Spain (btw, I am focusing on Iberian Spanish and trying to prevent terms only found (or used differently) in the New World from defocusing my corpus).

So I’m doing several things with these posts. First they are a kind of journal (or lab notebook) for various translation/description puzzles I try to solve. While I have many MSWord files with the raw work the blog posts highlight some interesting (at least to me) bits. Second by writing for potential readers I have to work a bit harder to try to have my posts accurate and at least somewhat coherent (instead of the real-time stream-of-consciousness in my raw material). This more careful writing makes the posts better but does have a real downside – it’s SLOW. It might not seem like it to you, Dear Reader, but I probably spend more time writing a post about something interesting in a menu than it took me to decipher the entire menu. So at some point the blogging gets in the way of my work.

But the real “dilemma” I have is that I just don’t get the posts done at anywhere near the rate I’m discovering the tidbits I want to write about. And days later when I go back over my raw data I often can’t recreate my thoughts, or I discover I forgot to include links or definitions or whatever, and I don’t much feel like repeating my work.

My posts are fairly long which is good and bad. It’s good because I try to weave multiple points into a post, often with some background research. It’s bad, because the posts are probably too long for most readers’ attention spans and because I don’t get them done.

So every now and then I’m tempted to do short posts, literally one for each situation I encounter, rather than trying to organize multiple examples into a single post.

For instance, I’ve started looking at a new source. Previously I’d used menus I could extract from restaurant websites along the course of the Camino de Santiago, and several online glossaries and dictionaries. But I’d also stumbled on many sites (focused on Spain and entirely in Spanish) for recetas (recipes). These are more tedious to process but often contain information I don’t find elsewhere and therefore can stuff in my corpus so potentially less frequently used (in menus) terms are still incorporated.

So I just started a small trial to look at this recipe site. Under its recetas tab it has 14 categories, and under Pasta y Arroz (pasta and rice) there are 15 webpages with about 12-16 recetas per page. IOW, this is a lot. And every receta is presented on the webpage as a caption (to a photo) where I can use Google Translate and then manually produce a side-by-side Spanish and English pair, such as:

Ñoquis de calabaza y boniato con salsa de gorgonzola Pumpkin and sweet potato gnocchi with gorgonzola sauce

For this I’d extract for my corpus ñoquis (gnocchi ), calabaza (pumpkin), boniato (sweet potato), salsa (sauce), and gorgonzola (gorgonzola). If I double check these term associations by looking in the Oxford dictionary or the DLE (more authoritative, but harder to use than Oxford) I  could add these associations to my corpus with higher confidence levels. IOW, mistakes are bound to get into the corpus without a lot of checking, but I’m also hoping the “big data” type filtering will eliminate the spurious pairs.
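The hoped-for “big data” filtering of spurious pairs might look something like this minimal Python sketch. It is an assumption about how such filtering could work, not the project's actual algorithm: it simply counts how often each Spanish/English pair was observed and keeps, for each Spanish term, the most frequent English term if it was seen at least `min_count` times. The function name, the threshold and the sample pairs are hypothetical.

```python
from collections import Counter

def filter_pairs(observed_pairs, min_count=2):
    """Keep the most frequent English term per Spanish term, if seen often enough.

    observed_pairs: iterable of (Spanish term, English term) tuples, one per
    observation, e.g. one per recipe title where the pair was extracted.
    """
    counts = Counter(observed_pairs)
    best = {}  # Spanish term -> (English term, count)
    for (es, en), n in counts.items():
        if n >= min_count and n > best.get(es, ("", 0))[1]:
            best[es] = (en, n)
    return {es: en for es, (en, n) in best.items()}

# Invented observations: 'salty' for salpicón is seen only once, so it is dropped.
observations = [
    ("calabaza", "pumpkin"), ("calabaza", "pumpkin"), ("calabaza", "squash"),
    ("boniato", "sweet potato"), ("boniato", "sweet potato"),
    ("salpicón", "salty"),
]
kept = filter_pairs(observations)
print(kept)
```

A pair checked against Oxford or the DLE could bypass the count threshold entirely, since that check is stronger evidence than repetition.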

But what I just described as the process in this post took me quite a bit more time than it did to extract the side-by-side pair (still tedious but relatively quick) and do a quick visual parsing (really looking for any terms that require more research). Note that while I have no fluency in Spanish I do know a bit about the grammar and thus know how to spot parts of speech and change the word order used in Spanish to my normal English and thus find the term-by-term association. This entry was simple to do and the only (slightly) interesting part is that the original ‘gnocchi’ does have a different word in Spanish but ‘gorgonzola’ doesn’t (and, as a somewhat interesting question, are these “Italian” words or are they now so incorporated in English, at least by foodies, that we should consider them English words, known linguistically as ‘loanwords’?).

So of the first webpage of pastas this was the most interesting puzzle:

Escudella con sopa de galets, el plato estrella de la Navidad catalana Escudella (in Oxford as -dilla, but some searches appeared with this spelling; is it a typo? here? and on web?) with soup of galets (is this short for galettas?), the star dish of Catalan Christmas

but Oxford has it with a definition (didn’t have translation) in which case it was a specific dish

no, galets appears to be a type of pasta (shells) https://www.tienda.com/products/galets-nadal-pasta-sandro-desii-su-40.html

This is my raw entry. Since escudella and galets appear in the Google Translate output as the same words in English (i.e. not translated, or perhaps there is no translation), this is the type of thing I look for to do more research. When I merely asked Oxford for the translation of escudella it said that was missing. What it does show (helpfully) is close matches, so in this case I tried its suggestion of escudilla (which is ‘bowl’ and kinda seems to fit this recipe name). So you see the note I made to myself (in Oxford as -dilla, but some searches appeared with this spelling; is it a typo? here? and on web?) but that’s just a start. Since I’ve done this a lot I immediately used Oxford a different way; instead of asking for a translation I asked for the definition (of escudella) and it had this in Spanish (followed here by Google’s English):

Plato que consiste en un caldo de carne y hortalizas, colado, en el que se cuece arroz, fideos u otro tipo de pasta; es un plato típico de Cataluña, comunidad autónoma de España. Plate consisting of a broth of meat and vegetables, strained, in which rice, noodles or other type of pasta are cooked; It is a typical dish of Catalonia, autonomous community of Spain.

Now I could immediately point out that Google’s translation of plato as ‘plate’ is not correct as plato also means ‘dish’ which fits better but that’s the typical kind of digression I get into that just makes posts take even longer.

Now meanwhile I thought I recognized galets. I did a previous post about the menu from a store selling cookies (as a bit of diversity from just restaurant menus). So I double checked by asking Oxford for the Spanish translation of ‘cookie’ (which it also lists as ‘biscuit’ in British English) and it has galletas (as I thought I recalled). So I thought this might be some colloquial term for cookie.

But now my “translation” ‘bowl with soup of cookies’ is pretty obvious nonsense and so no better than the untranslated correspondence. So, since this is a new source and I’d already discovered I could click on each receta and get a full page explanation (intro to the dish, ingredients, preparation), I began to see the flaws in my attempt to unravel this puzzle. As the recipe page itself is entirely in Spanish I have the same kind of puzzle, i.e. Google again botched some of the translation. But there is enough text, and importantly a picture, that I could try some searching, and I found galets as an item I can buy online (I’ve often used this source in this project). These look (in both the recipe picture and the tienda picture) like fairly ordinary pasta shells (I don’t see what’s special about them), but pasta shells are pasta shells (except maybe tiny details), so now I’d know what I am getting if I’d picked this off a menu in a restaurant.

So finally I know both these words don’t have English translations so I’d want a different kind of entry in my corpus of a short description and then potentially a longer one. Thus a diner using my app could learn about this dish.
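As a rough picture of that “different kind of entry”, here is a minimal Python sketch of a corpus record that holds either a translation or a short and long description. The class name, field names and display rule are assumptions for illustration only; the descriptions are paraphrased from the findings above.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CorpusEntry:
    """One Spanish term; hypothetical structure, not the real corpus schema."""
    term: str
    translation: Optional[str] = None        # direct English equivalent, if any
    short_description: Optional[str] = None  # a few words, shown inline
    long_description: Optional[str] = None   # shown when the diner asks for more

    def display(self):
        """Text an app might show next to the menu item."""
        if self.translation:
            return f"{self.term}: {self.translation}"
        if self.short_description:
            return f"{self.term} ({self.short_description})"
        return f"{self.term} (no translation yet)"

entries = [
    CorpusEntry("calabaza", translation="pumpkin"),
    CorpusEntry(
        "escudella",
        short_description="Catalan meat and vegetable broth with rice or pasta",
        long_description="A strained broth of meat and vegetables in which rice, "
                         "noodles or other pasta is cooked; typical of Catalonia.",
    ),
    CorpusEntry("galets", short_description="pasta shells"),
]
for entry in entries:
    print(entry.display())
```

Keeping the translation optional means a term like escudella is never forced into a misleading one-word English equivalent such as ‘bowl’.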

So there, you see what I mean. This post has taken me far longer than the original analysis. Yet it’s good (for my purposes, hopefully somewhat interesting to you, Dear Reader) to have this more complete explanation (I can re-read this post someday when I’ve completely forgotten this and have to resolve something in my app). But if I’d simply written this one item in the most brief form (to jog my memory later, plus at least some glue prose to make it read better than my raw notes) I would have gotten this done.

But it also means I’d probably have many more posts, which is a mixed benefit as well. So, IOW, there really isn’t a great answer.

So I have a solution. I can use categories to distinguish the posts that are really minimal and that I create almost immediately after doing the work for the corpus. These will really be post “fragments” but at least I get more recorded.

For instance, I was looking at a menu on Friday and its Menu del Dia was for Mother’s Day so I had in mind a post to create on the 5th. But instead I spent most of the day cooking for our Cinco de Mayo feast (and drinking a few too many margaritas). So I never did that post and now the “joke” of it is gone as its timeliness is past.

So I’ll continue to struggle with this, fragmentary and terse posts, or (sometimes too long) complete posts.

A few random bits

Rather than a focused post I’ll just catch up on a few disparate items.

First I’m recording another milestone along my virtual trek which is arriving in Burgos. Burgos was one of the main locations in the movie The Way (where Tom’s pack was stolen) and its main feature is the cathedral. A virtual trek (i.e. actually exercising on a treadmill in the basement and transferring the accumulated miles onto a GPS trace of the Camino de Santiago) may seem silly but it serves two purposes for me: 1) walking on a treadmill is really boring so I need to have some goal and sense of accomplishment, since I need the treadmill exercise (esp. during the winter here) so I’m in shape to do some real outside walking, and, 2) the slow pace gives me a chance to fairly thoroughly investigate the route (using satellite views, Google StreetView (often available on the Camino and I see lots of peregrinos) and Points of Interest (so I look at photos of albergues and restaurants, plus sometimes find menus)). It’s certainly not the same as the real thing but better than nothing.

Before reaching Burgos I’d not found any online menus in other small towns on my virtual trek since Logroño so I had begun to extract terms from a couple of glossaries I’d previously found. I’d already spent a long time (previously reported) on the GallinaBlanca online dictionary so I was also interested in seeing whether the two other lengthy lists I’d found would just be redundant. So that led me back to a bit of coding (haven’t done that for a while) in order to automate the comparison (each extract I’d done was in an incompatible format so first my code had to generate a canonical extract to compare). During that process one of my lists just disappeared from the Net (I was only about 1/4 done with it). That’s disappointing since it was a good list and had many terms I hadn’t previously found. Crunching through dictionaries or glossaries is very tedious and nowhere near as interesting as looking at menus (which is the purpose of my project here). But it’s a different way to get a sufficiently large corpus to feed into the menu translator I’m building.
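To make the "canonical extract" idea concrete, here is a minimal sketch of how two differently formatted term lists could be reduced to a common form and compared. It is only an illustration under stated assumptions: the function names (canonical, compare) and the sample terms are hypothetical, and stripping accents is a lossy shortcut that may or may not be wanted.

```python
import unicodedata


def canonical(term: str) -> str:
    """Reduce a term to a comparable form (hypothetical helper).

    Lowercases, collapses whitespace and strips accents. Assumption:
    accent-insensitive matching is acceptable; note this also turns
    'ñ' into 'n', which can merge words that are really different.
    """
    term = " ".join(term.lower().split())
    decomposed = unicodedata.normalize("NFD", term)
    # Drop combining marks (category Mn) left behind by NFD decomposition.
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def compare(list_a, list_b):
    """Return the canonical terms shared by both lists and unique to each."""
    a = {canonical(t) for t in list_a if t.strip()}
    b = {canonical(t) for t in list_b if t.strip()}
    return {"both": a & b, "only_a": a - b, "only_b": b - a}


# Made-up extracts standing in for two glossaries in different formats.
extract_one = ["Bacalao", "brótola de roca", "Merluza "]
extract_two = ["bacalao", "Brotola  de Roca", "rodaballo"]

result = compare(extract_one, extract_two)
for label in ("both", "only_a", "only_b"):
    print(label, sorted(result[label]))
```

If the "both" set turns out to be most of the smaller list, the second list is probably redundant; a large "only" set suggests it is worth the tedium of processing.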

So with Burgos on the horizon I began, once again, to focus on restaurant menus. In the small towns I find the restaurants directly as Google Maps POI’s which are clickable to get some info (esp. user contributed photos) and perhaps then linked to a website. Those with websites (fairly uncommon on the small places in small towns) might have a textual menu (many just have photos) and that allows me to generate side-by-side Spanish and English (usually translated by Google Translate, sometimes other ways) terms that I’ll feed into my corpus. Without all the fancy deep learning AI Google uses to train their translator I’ll be using a more algorithmic process to train mine, but mostly to spot Spanish terms that have multiple translations and try to determine the best (more on that below).

So for Burgos the area is quite large (you have to zoom in a lot on Google Maps for the POIs to appear) so I used a different approach. There are numerous rating services for restaurants (I only partly trust them here in USA, so no clue whether they work well in Spain) so just because it has a convenient format I used the Trip Advisor list, which has a total of 376 restaurants. I’ve only looked through the first 40 or so. Less than half of these have websites and probably only about half of those have text I can scrape off the website (often the menu is a photo or some other type of document where the browser can’t select any text that I can then paste in my working document). So with this vast amount of material I’ve been quite busy with menus, having now crunched through six already (with some stories to tell). And I’ve got enough more to finish to keep me busy as in fact my virtual trek has already left Burgos.

But as a random tidbit, tied to the notion of producing entries for my corpus, is the variable translation of the term ración. And I do mean translation (not definition) and usually by Google. The simplest (and most frequent) literal translation is ‘ration’ but even seeing exactly the same word (although sometimes modified with 1/2) on the same page Google translates it differently and also as ‘portion’ or ‘serving’. That’s a bit of a mystery to me why there is the inconsistency but of course Google claims (in its limited online explanations of how Google Translate works) that it is “context-sensitive” in doing translations (IOW, Google also had a large corpus, mostly of translated material in the United Nations, that their AI analyzed to decide both the translation and the “context”). But within a single website, all about food, one would think the context would always be the same. But it’s not the webpage that represents “context” (I realized) it’s the source corpus where “context” is being deduced. So the notion of using “context” to improve translation doesn’t mean quite what one would think.
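Whatever the cause, this kind of inconsistency is easy to spot mechanically once the pairs are collected. The sketch below groups pairs by Spanish term and flags any term that picked up more than one English rendering. The function names and the sample pairs are hypothetical, and the counts are invented for illustration, not measured from any menu.

```python
from collections import Counter, defaultdict


def translations_by_term(pairs):
    """Map each Spanish term to a Counter of the English renderings seen."""
    table = defaultdict(Counter)
    for spanish, english in pairs:
        table[spanish.strip().lower()][english.strip().lower()] += 1
    return table


def ambiguous_terms(table):
    """Return only the terms with more than one distinct translation,
    each with its renderings ordered from most to least frequent."""
    return {
        term: counts.most_common()
        for term, counts in table.items()
        if len(counts) > 1
    }


# Invented sample: the same word translated three ways on one site.
sample_pairs = [
    ("ración", "ration"),
    ("ración", "ration"),
    ("Ración", "portion"),
    ("ración", "serving"),
    ("pulpo", "octopus"),
]

table = translations_by_term(sample_pairs)
flagged = ambiguous_terms(table)
for term, renderings in flagged.items():
    print(term, renderings)
```

The most frequent rendering is not necessarily the best one (here the literal ‘ration’ would win), so the output is a list of terms needing a human decision rather than an answer.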

Now instead of translation here’s what Oxford has as definitions:

1 Cantidad de alimento que se da en una comida a una persona o animal. Amount of food that is given in a meal to a person or animal.
2 Porción unitaria de algo que puede dividirse en varias partes iguales. Unitary portion of something that can be divided into several equal parts.
3 Cantidad determinada de alimento que se toma como aperitivo entre varias personas o comida informal; suele tomarse como acompañamiento de una bebida en un establecimiento público. Quantity of food that is taken as an aperitif among several people or informal food; It is usually taken as an accompaniment to a drink in a public establishment.
4 Cantidad suficiente de algo, generalmente la que se consume en un solo día o a intervalos regulares por una persona o animal. Sufficient quantity of something, usually that which is consumed in a single day or at regular intervals by a person or animal.

Since porción is literally ‘portion’ it makes some sense to have that as a translation (along with ‘helping’ and ‘serving’). The part of the definition that seems to make the most sense in the context of a restaurant menu is #3 (also #2), more than the sense of the literal ‘ration’ (as in #1 or #4, more a military term). But it is also a quantity designation (more than pincho) even if it is only consumed by one person. Now deciding how much a 1/2 or 1/4 ración is is yet another challenge, but it appears most restaurants do price a 1/2 at more than 50% of the price of a whole, so if you want a whole, ordering it as two 1/2’s will cost a lot more. IOW, you probably need to be able to discuss this with your server, once again evidence that a menu translator (vs fluency in Spanish) is not going to be sufficient.

Finally as yet another random tidbit one dessert item that didn’t translate (as I’ve described before, it just is what it is) was mantecado. It wasn’t hard to find this (I thought it might be a brand but it’s just the name of a cookie) with an interesting description (here) where it is described as being similar to polvorón, which has its own Wikipedia page (here) that also mentions mantecados and says they are not the same as polvorón (you could fool me looking at the pictures in that page).

From that same menu (here) for the item espárragos cojonudos Google Translate doesn’t have English for cojonudos (espárragos is asparagus in case you’re wondering). Tracking down cojonudos with search quickly led to the connection to cojones which is a term many Americans know as part of slang but it’s not clear how ‘ballsy’ would apply to asparagus. But this article assures us the slang meaning is not the relevant one and the more respectable is ‘awesome’ or ‘outstanding’. Furthermore a particular asparagus from Navarra chooses to label itself with cojonudos so I guess the connection to cojones doesn’t bother them (or maybe they’re not aware of the etymology of cojonudos).

 

Updated the Spanish Term Index page

You can see on the menu bar of this post “Spanish Term Index”. This is a “page” (not a post) in WordPress.com terminology. I just caught up a bit and have now indexed terms on the oldest 20 posts (of now 67). I only include terms that I discuss enough in a post to get a reasonable understanding of the term (casual mentions without definition I exclude).

So I have a lot of work to do to catch up with all the more recent posts.

Fortunately WordPress.com and Microsoft Word cooperate with each other. MSWord has a variety of tools to make updating the list easier. When I’m done then I can copy the list from MSWord and just paste into WordPress.com’s page. This page is going to get very long, once I catch up and then keep updating from new posts.

In addition to providing a guide to you, Dear Reader, this also provides me the opportunity to quickly see if I’ve done terms in previous posts and thus avoid a later post (unless I have new information) that would be redundant.

Note that the index does NOT provide the English translation (you’ll have to click on the links to see that). So I have another page, now out-of-date as well that will be my accumulated glossary, which I hope, someday, to be the most complete and accurate glossary you can find on the Net. Right now it’s more an experiment than my actual “authoritative” (i.e. researched) glossary but [eventually] it will be my glossary.

WordPress.com isn’t the best tool for compiling a glossary but it’s all I’ve got. OTOH, glossaries (or dictionaries) I’ve found elsewhere on the Net aren’t so great either (either the method of access or their content). Maybe if I get a really solid and very good list I’ll spring for a website and build some interactive code to be able to look up these food terms from Spain. If not a smartphone app (that is, something a lot more portable that could work offline) at least you could come to my glossary, with the browser in your phone (if you have a connection) to get information about menus. That may have to do until I can figure out how to actually code an app and have a really good term list for it.

 

Small experiment

Most of the time I’ve spent on this project has involved looking at various source documents from Spain, then with multiple methods of doing translations. Ultimately the point of all this is to build a large corpus of “pairs” (words or phrases in Iberian Spanish and an English translation, or some kind of equivalent). Critically I also need to add some measure of how likely the pairs represent valid equivalents so the code (yet to be done) can attempt to establish the probability of the consolidated list of pairs being correct. And also it has to handle the ambiguity, for instance, very common with ternera (is this veal or beef or both? as it often seems to be used for both.) And the multiple and overlapping and contradictory terms for shrimp vs prawns vs langoustines (the small rock lobster) is a strong example of confusion on menus.
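Since neither the corpus nor the code exists yet, the following is only one possible shape for it, not a design decision. It assumes each pair is stored with the set of sources that support it, and it uses a deliberately naive score: the share of sources translating a term that agree on a given English rendering. The names add_pair and score, the source labels, and the sample evidence for ternera are all made up for illustration.

```python
from collections import defaultdict

# corpus[spanish][english] -> set of source names supporting that pair.
# (Assumed structure; a real corpus would also want region metadata.)
corpus = defaultdict(lambda: defaultdict(set))


def add_pair(corpus, spanish, english, source):
    """Record that `source` translated `spanish` as `english`."""
    corpus[spanish][english].add(source)


def score(corpus, spanish):
    """Naive confidence: for each English rendering, the fraction of all
    sources mentioning the Spanish term that support that rendering.
    Scores can sum to more than 1 when a source gives two renderings."""
    by_translation = corpus.get(spanish, {})
    all_sources = set()
    for sources in by_translation.values():
        all_sources |= sources
    if not all_sources:
        return {}
    return {
        english: len(sources) / len(all_sources)
        for english, sources in by_translation.items()
    }


# Invented evidence for an ambiguous term.
add_pair(corpus, "ternera", "beef", "menu_a")
add_pair(corpus, "ternera", "beef", "menu_b")
add_pair(corpus, "ternera", "beef", "glossary_x")
add_pair(corpus, "ternera", "veal", "menu_c")
add_pair(corpus, "ternera", "veal", "glossary_x")

ternera_scores = score(corpus, "ternera")
print(ternera_scores)
```

A pair supported by a single source gets no special treatment here, which is a weakness: a single-source pair should probably be marked as suspect rather than given a high score.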

So given I haven’t yet designed my corpus or the code to ingest new pairs into the corpus and then process the related pairs I have to do experiments, by hand, on a smaller dataset to attempt to visualize the challenges I will face when this is all done with code on a much larger corpus.

So I recently processed an extensive menu from a single restaurant in Granada and just before that two restaurants in Santo Domingo de La Calzada, La Rioja. By process I mean the mostly mechanical work of getting entire sections of menu text side-by-side in original Spanish and then the translated English. Then I look for untranslated terms or silly translations to try to find other sources on the Net (often recetas) to determine the correct correspondence, for instance, manos de ministro is NOT minister’s hands but a colloquial version of the more common manitas de cerdo, or pig’s trotters (feet).

So having done this I’ll provide a few results. In total I ended up with 277 “pairs” with 50 of those on both lists (and thus likely to be very common food terms from menus – see list below). The two restaurants in Santo Domingo de La Calzada contributed 132 unique pairs and the Granada restaurant contributed 95 unique pairs. The various terms in the list are sometimes not that specific to food, for instance:

  1. blanco and negro, colors but used as qualifiers of chocolate in menus; rosada (pink as a color) ended up being quite a chase when it referred to a specific fish.
  2. aroma or chocolate which are the same in Spanish and English but I include them even though it (and others like it) are obvious loanwords as a piece of code doesn’t just “know” this and has to be told.
  3. especialidad (specialties) or vinagreta (vinaigrette) or salmón (salmon) even though these are easy to guess, eventually an app doing translation still needs to recognize these terms.
  4. arroz, carne, dulce, huevo, leche, pan, pescado, pollo, queso, salsa and vino that are used so much, not just in Mexican restaurant menus but even in TV ads we can effectively consider these loanwords into English now, but again, a computer program doesn’t know that and so still needs to have this in the corpus that will then be the key to its translation.
  5. I did try to consolidate terms that have alternate gender forms and/or singular/plural but didn’t do this as precisely and consistently as a really good corpus would require
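The consolidation of gender and number forms mentioned in the last item can be roughed out in code. This is a crude heuristic, not real Spanish lemmatization: it assumes that dropping a final "s" and then a final "a", "o" or "e" gives a usable grouping key. The names crude_key and group_forms are hypothetical.

```python
from collections import defaultdict


def crude_key(word: str) -> str:
    """Very rough grouping key (an assumption, not a linguistic rule):
    strip a trailing 's', then a trailing 'a', 'o' or 'e'."""
    w = word.strip().lower()
    if w.endswith("s"):
        w = w[:-1]
    if w and w[-1] in "aoe":
        w = w[:-1]
    return w


def group_forms(words):
    """Group surface forms that share the same crude key."""
    groups = defaultdict(set)
    for word in words:
        groups[crude_key(word)].add(word.strip().lower())
    return dict(groups)


menu_words = [
    "casera", "casero", "caseras", "caseros",
    "frita", "fritos",
    "especialidad", "especialidades",
]

groups = group_forms(menu_words)
for key, forms in groups.items():
    print(key, sorted(forms))

# The heuristic also merges words that are NOT the same term:
# 'pato' (duck) and 'pata' (leg) end up with the same key.
print(crude_key("pato") == crude_key("pata"))
```

The pato/pata collision is exactly why doing this "precisely and consistently" takes more than a few string rules; the groups are candidates for a human to confirm, not finished entries.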

While just finding lists of food/cooking terms is easy on the Net, whether they are correct or apply to Spain is more problematic. Even a source like a dictionary should be taken with a small dollop of skepticism. Certainly asking any of the various voice assistants is not going to have a very high accuracy rate. So it is necessary to try to focus on sources and thus pairs that are really for Spain and not somewhere in western hemisphere (unless you, Dear Reader, are planning a trek in Bolivia, then do as you need).

So that was my experiment and I end with this list of 50 pairs that are so common you’re very likely to run into them BUT even this list is not 100% accurate as there are various issues with translation (see previous posts).

Cover up the right-hand column and see how many of these you know.

a la plancha grilled
aceite de oliva olive oil
anchoas anchovies
arroz rice
asados roasted
atún tuna
bacalao cod
blanco white
café coffee
Cantábricas/Cantábrico Cantabrian
caramelizados caramelized
carne meat
casera/o caseras/caseros homemade
cerdo pork
chocolate chocolate
comida meal
croquetas croquettes
deliciosa/o deliciosas delicious
dulce sweet
ensalada salad
frita/o fritas fried
guarnición garnish
helado ice cream
huevo egg
jamón ham
langostinos prawns
leche milk
lomo loin (generically; or cured meat specifically)
miel honey
pan bread
patata potato
pato duck
pechuga breast
pescado fish
pimientos peppers
plato dish
pollo chicken
postre dessert
pulpo octopus
queso cheese
revuelto scrambled
salsa sauce
solomillo tenderloin or fillet
tarta cake, also pie
ternera beef (alt: veal)
tomate tomato
tosta toast
vainilla vanilla
verdura vegetable
especialidad especialidades specialty

Mystery post – pez/peces or pescado

My title contains some bits of useful information. While I’m not absolutely certain, some sources say peces is the plural of pez. Of course in English the plural of fish is fish so peces seems relatively uncommon. pescado also translates to fish BUT the key difference is that pescado is the piece of fish on your plate and pez is the living animal.

I let Google Translate loose on my previous “mystery” post and it had three types of results: 1) a few of the words translated correctly, 2) some translated but to nonsense, and, 3) some were missed altogether. I’ve tracked a few of the latter.

My big list of words (with cognates or loanwords removed to avoid giving a clue) was a lengthy list of the names of fish, probably as they are called in Spain. I found two long lists on the Net with Latin (scientific names) as well as names in English, Spanish and some other languages. Both were European sources so less likely to include fish found primarily in South America, but who knows how lists get compiled.

Plants and animals from the natural world (versus cultivated plants/animals) are frequently misidentified and it’s very tough to get accurate common names. Sometimes even the scientific names are in dispute or contradictory so it’s no big surprise the more colloquial names are too. After all who but ichthyologists, some fishermen and a few fishmongers actually know these names accurately and/or could just by looking at a fish decide what to call it.

So this is probably the toughest area to compose an accurate Iberian Spanish to English translation list. I’m going to have a third post in this series about the names I conclude are fairly likely but for now here’s a subset of the list from the mystery post that Google failed to translate at all.

alfonsino Golden eye perch
badexo Lythe or pollack
boga bogue
brama bream Pomfret
brotola de roca Greater forkbeard
calion Shark, porbeagle
callas Callas
capelan capelin
chicharro scad – also called horse mackerel
chincharro Horse mackerel or scad
choupa Black bream or porgy or seabream
chucla picarel
cigala crawfish Norway lobster – also called Dublin Bay prawn
colin Coley or saithe
côngrio conger eel conger eel – also called conger
coregono whitefish
escolano smelt – also called sparling
espadilla frostfish – also called silver scabbardfish
espadín sprat sprat – also called brisling
espárido sea bream
illiseria megrim
lanzon sandeel – also called sand lance
limanda dab
longeirón razor clam – also called razor shell
lucioperca pike-perch
lumpo lumpfish Lumpfish
maganto Dublin Bay prawn or langoustine or scampi
mendo Witch or Torbay sole
merlan whiting
mollera poor cod
muergo razor clam – also called razor shell
musola smooth hound – also called dogfish, flake, huss, rigg
pardete Grey mullet
pejerrey silver side, sand smelt argentine – also called silver smelt
pejesapo angler fish Anglerfish or monkfish
perlón Grey gurnard
pescadillo Hake
plegonero whiting
quisquilla shrimp prawn – also called shrimp
salton sandeel – also called sand lance
salvelino char
solla plaice

The left column is the Spanish (with at least one spelling error, don’t know which (chicharro chincharro) is actually correct). The middle column is the few that the Oxford dictionary recognizes. And the third column is from one of these two sources (here and here) which I originally used to compile the list (I found a third list with scientific (Latin) names but didn’t originally use it and haven’t (yet) processed it). I’m a bit surprised Google missed the names that are in Oxford as I’ve encountered some of these in other places.

Now note that even with some of the Spanish names “translated” there are bunches of fish on this list I don’t recognize and I suspect few people would. So probably only a small subset of this list (the names Google didn’t recognize, not the full list) would ever appear on menus.

The two longer lists, with scientific names, seemed to potentially be the most accurate lists but I’ve found others at some other websites. The trouble with these is the names may not relate to Spain and may be from other Spanish speaking areas. This is a very common problem trying to find and merge and consolidate lists from the Net. In addition what is the level of authority of anyone who provides a list – rarely is that known and I see enough mistakes in almost any list to shed some doubt on the accuracy of the information. But all that said I’ll be trying, in the next post, to produce the largest and most accurate list from the raw material I can find.

So stay tuned for the final result.

Mystery post

This is some work in progress. Guess based on any terms you recognize what the work may be.

abadejo, abadejo de Alaska, aguja, aguja azul, aguja azul del Indo Pac, aguja blanca, aguja negra, alacha, alfonsino, almeja, anguila, arenque, bacaladilla, badexo, barrilete, berberecho, bermejuela, bígaro, bocina, boga, bogavante, boquerón, brama, brema común, brosmio, brótola de fango, brotola de roca, caballa, cabezuda, cabracho, calandino, calion, callas, camarón, camaron tigre, cangrejo, cangrejo de rio, capelan, caramel, carbonero, carpa, centolla, chancha, chicharro, chincharro, choupa, chucla, cicloptero, cigala, colin, colin de Alaska, côngrio, coregono, croque, eglefino, emperador, eperlano, escolano, espadilla, espadín, espárido, estornino, esturión, esturion, esturion estrellado, falsa limanda, falsó lenguado, faneca, faneca noruega, fletán, fletan del Pacifico, gallineta, gallineta nórdica, gallo, gallo de San Pedro, galludo, gata, gato, golleta, granedero, hipogloso negro, husio, illiseria, jibia, jurel, lampuga, langosta, lanzon, lengua lisa, lenguado, libre de mar, limanda, limanda nórdica, lisa, listado, lobo, longeirón, lubina, lubricante, lucio, lucioperca, lumpo, maganto, maruca, maruca azul, mejillón, mendo, mendo limón, merlan, merluza, merluza de cola azul, merluzzo Francese, mero, mielga, mollera, muergo, mújol, musola, nécora, ostión, ostra, palero, pardete, pargos, pejerrey, pejesapo, perca, perlón, perro del norte, pescadillo, pez de plata, pez de San Pedro, pez espada, pez sable negro, pintarroja, platija, platija americana, plegonero, quimera, quisquilla, rabil, rascacio, raya, reloj anaranjado, reloj del Atlántico, rémol, rodaballo, rubio, salmonete, salmonete de roca, salton, salvelino, sapo, sierra, sierra del sur, solla, tenca, tiburón, tota, trucha, trucha arco iris, trucha arcoiris, vieira, volador, volandeira

Clue, Latin matters in figuring all this out.

 

 

Verbs again

In my previous post (about finishing initial processing of GallinaBlanca dictionary) I mentioned that verbs can be of some use in interpreting menus, possibly through derivatives of the infinitive form of the verb. So I’ve continued to do some digging in this area and have a few results to share.

Anticipating I’d be looking at verbs, independently of extracting them from the GB dictionary I used about nine online “lists” to compile an aggregate list. These verbs: a) may have nothing to do with cooking or cuisine, b) tend to be more commonly used verbs, and, c) may not be used (at all, or in same way) in Spain. So this is the list I’m calling C.

In the process of other searches I stumbled onto a culinary glossary. It has no connection with Spain and therefore the Spanish words might come from any part of the world. And as I worked with it more extensively and carefully I observed many of the issues with online resources of unknown origin: a) misspellings (probably, don’t want to jump to conclusion just because words seem to be misspelled), b) duplications, often including the singular and plural form, c) words that make no sense appearing in a Spanish culinary dictionary (how did these drift in), d) inconsistent formatting and thus order (e.g. A la cazuela vs Cazadora, A la). In a previous iteration of my project I created a “glossary” by merging information from many sources and eventually it became a pisto (hodgepodge, if I can use that word in a non-culinary sense), especially losing any notion of whether the words applied to Spain or some other Spanish speaking area. So with these caveats I’ll call this list G.

And I have my list of verbs from the GallinaBlanca dictionary which I previously described. I’ll call this list D.

Now, simply, it’s too much work to compare the entirety of all three of these lists so I just did the subset (verbs only, of course) of verbs starting with A B or C. While this may be a biased sample it still reveals some interesting information.

Sorting the three lists together (with different fonts and colors for each list so I can distinguish) then I did manual processing to consolidate like terms together. As a result I ended up coding each entry with GDC (or - if not in that list). So I generated the following table:

G-- 44
-D- 4
--C 35
GD- 28
-DC 1
G-C 9
GDC 5

There are 126 verbs that appear in at least one of these lists. Only 5 verbs appear in all three lists. The list with the largest number of unique verbs is the G (glossary, 44), which thus indicates this is potentially very useful as it adds over 50% more verbs than I had previously found (44 on top of the 82 in the other two lists). The verbs in the C (common) list may have nothing to do with cooking or food (we’ll explore that later in the post) so this may not add much. Only 5 verbs from the GallinaBlanca list don’t appear in the glossary list so whoever compiled that got most of the cooking verbs.
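The coding above was done by hand, but the same tally is straightforward to automate once each list is a set of verbs. The sketch below builds the three-letter membership code for every verb and counts the codes, then re-checks the arithmetic of the table. The function names and the tiny sample sets are hypothetical (the sample memberships are invented); only the reported counts come from the table.

```python
from collections import Counter


def membership_code(verb, lists):
    """Build a code such as 'G-C': the list's letter if the verb is in
    that list, '-' otherwise, in the order the lists are given."""
    return "".join(
        label if verb in members else "-" for label, members in lists.items()
    )


def tally(lists):
    """Count how many verbs fall under each membership code."""
    all_verbs = set().union(*lists.values())
    return Counter(membership_code(verb, lists) for verb in all_verbs)


# Toy data: which list each verb belongs to is made up for the example.
lists = {
    "G": {"achicalar", "apanar", "asar"},
    "D": {"asar", "batir"},
    "C": {"cocinar", "asar", "beber"},
}
counts = tally(lists)
print(counts)

# Sanity check on the figures reported in the table.
reported = {"G--": 44, "-D-": 4, "--C": 35, "GD-": 28, "-DC": 1, "G-C": 9, "GDC": 5}
total = sum(reported.values())
# Verbs the glossary adds, relative to those already in D or C.
g_only_share = reported["G--"] / (total - reported["G--"])
print(total, round(g_only_share, 2))  # 126 verbs, share roughly 0.54
```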

So looking at the verbs that are only in the C (common) list and not in either cooking related list we do see a few surprising omissions (I’m assuming that these are SO common no one bothers to include them):

abrir --C to open; to turn on; to whet (as in appetite)
agregar --C to add
añadir --C to add
beber --C to drink
calentar --C to heat, heat up, warm up; to inflame
cocinar --C to cook
combinar --C to combine, mix; to put together, match, coordinate
comer --C to eat; to have for lunch; [Latin America] to have for dinner
concinar --C not in any dictionary, probably misspelling of cocinar
convertir --C to turn into, convert into, change into, make
cortar --C to cut, cut off, carve, slice, cut out; to chop; to cut (dilute sense); …

So out of the 35 verbs in the C (common) list I’d probably include only these 11 in a general purpose culinary list.

Now some of the verbs in the G (glossary) don’t appear to be useful. Some have no definition in any of the dictionaries I routinely use, including the most authoritative of the Spanish language (which is NOT limited to Spain so could include verbs that don’t get used in Spain).  So here are a few I’d consider dubious to include in a culinary glossary:

achicalar G-- [Mexico] to cover in honey; soak in honey
añejar G-- to age; [vino] to mature; to get stale
apanar G-- to coat in breadcrumbs (also EMPANAR or EMPANIZAR)
apuntillar G-- to finish off (a toro); to round off
ataviar G-- to dress up
bardar G-- to thatch
blanchir G-- (not in dict) Wiktionary has it as a French term for make white
bresear G-- (from glossary) To cook to slow fire, during long time, with condiments (generally vegetables, wine, broth and spices). Clearly a spelling error since not found.
cantar G-- to sing; to crow, chirp
caramerizar G-- (not in dict), another spelling? [from glossary] Spread a mold with sugar honey.
castigar G-- to punish; to ground, keep in; to damage, harm
cerner G-- to sift, sieve (same as cernir, which is it?)
chapurrar G-- to speak badly

I wouldn’t include achicalar as it doesn’t appear to be used in Spain but this is a good point about my goal here. If I wanted to know the Spanish word, used in Spain, for an English word, I wouldn’t include anything that may be only used outside Spain. But my goal is asymmetric – to translate Spanish (on menus) only into English (so I can choose) so including a word in my corpus (and eventually my app) that is not likely to be used in Spain is not a problem (I do need metadata to note this however, for that term). If I never see the term it does no harm to never have it found in any lookup. OTOH, it would be a problem if I’m trying to translate English into Spanish, as in don’t use a word not found in Spain. It appears, for instance, that frijoles, which is well-known to most in USA who visit Mexican restaurants, is one such word, not commonly used in Spain, although quite possibly a Spaniard would know the word. That might lead to a scene (from The Way) like no tapas in Navarra, only pintxos, and thus make you look foolish.

blanchir (to make white, which isn’t exactly synonymous with blanch but one might assume that’s what this means) was interesting in that it did not occur in any dictionary but did have an entry in Wiktionary. The standard term for blanch is palidecer (purely in the sense of turn white) and escaldar or blanquear for the culinary sense. I suspect blanchir might be used somewhere (possibly Puerto Rico) where it is just the cognate of the English verb. But, again, in collecting the corpus I should not make judgments like this although I might add metadata to a blanchir entry and meanwhile add it to the corpus and then let the “big data” statistical analysis decide if this is a word or not.

bresear really looks like a misspelling (more likely to be brasear, to barbecue) but again it should go into the corpus with a metadata notation rather than my passing a judgment on it (IOW, only a real expert in Spanish should be deciding what to include or not in any translation dictionary, so if I find only one instance of a misspelled word it will get washed out since there are few occurrences of it in the corpus; OTOH, maybe people do commonly misspell this word so it needs to be in my app). caramerizar appears to be some variant of caramelizar, again perhaps used somewhere and not just a mistake. cerner has exactly the same definition (in the glossary itself, but also spanishdict) as the more common spelling cernir, although both appear in reverse lookup of ‘to sift’ in spanishdict (which is it, then? just a common confusion?). cernido is a possible term to see on a menu so it matters that my dictionary could spot this as past participle of cerner.
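Rather than deciding by eye which entries are misspellings, the corpus could carry a "possibly a variant of" note produced automatically. Here is a minimal sketch using Python's standard difflib module; the function name suggest, the short list of known verbs and the 0.8 cutoff are all assumptions for illustration, and a close match is only a hint, not proof of a misspelling.

```python
import difflib

# Hypothetical reference vocabulary; in practice this would be the
# verbs already confirmed in a trusted dictionary.
known_verbs = ["brasear", "caramelizar", "cocinar", "cernir", "escaldar"]


def suggest(word, vocabulary, cutoff=0.8):
    """Return known words that closely resemble `word`, or an empty list
    if the word is already known or nothing is similar enough.
    The cutoff is an arbitrary choice and would need tuning."""
    if word in vocabulary:
        return []
    return difflib.get_close_matches(word, vocabulary, n=3, cutoff=cutoff)


for doubtful in ("bresear", "caramerizar", "concinar", "chapurrar"):
    print(doubtful, "->", suggest(doubtful, known_verbs))
```

The suggestion would be stored as metadata alongside the original entry rather than replacing it, so a form that turns out to be a genuine regional variant is not lost.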

So again all this goes to show the work that must be done to really develop a very accurate dictionary that drives my app for menu translation (or to be published as a carefully researched culinary glossary).