I’m using this blog (partially) to “document” interesting tidbits I encounter while doing research for my anticipated smartphone app to translate menus in Spain. That app needs to have a comprehensive and accurate dataset to use in the translation, not just the equivalent English term (which doesn’t always exist) but also some description. For example, what is sobrasada? Yes, it’s ‘sausage’ but saying that (or even ‘spicy pork sausage’) doesn’t tell you very much.
So I’m using various sources to build up a “big data” corpus which will have translation errors and other errors. But algorithmically I can extract from that corpus what I’ll need to power the app. But I have to build that corpus manually, often exploring “puzzles” I find in trying to figure out a proper equivalent in English for some culinary item I find in Spain (btw, I am focusing on Iberian Spanish and trying to prevent terms only found (or used differently) in the New World from defocusing my corpus).
So I’m doing several things with these posts. First they are a kind of journal (or lab notebook) for various translation/description puzzles I try to solve. While I have many MSWord files with the raw work the blog posts highlight some interesting (at least to me) bits. Second by writing for potential readers I have to work a bit harder to try to have my posts accurate and at least somewhat coherent (instead of the real-time stream-of-consciousness in my raw material). This more careful writing makes the posts better but does have a real downside – it’s SLOW. It might not seem like it to you, Dear Reader, but I probably spend more time writing a post about something interesting in a menu than it took me to decipher the entire menu. So at some point the blogging gets in the way of my work.
But the real “dilemma” I have is that I just don’t get the posts done at anywhere near the rate I’m discovering the tidbits I want to write about. And days later, when I go back over my raw data, I often can’t recreate my thoughts, or I discover I forgot to include links or definitions or whatever, and I don’t much feel like repeating my work.
My posts are fairly long which is good and bad. It’s good because I try to weave multiple points into a post, often with some background research. It’s bad, because the posts are probably too long for most readers’ attention spans and because I don’t get them done.
So every now and then I’m tempted to do short posts, literally one for each situation I encounter, rather than trying to organize multiple examples into a single post.
For instance, I’ve started looking at a new source. Previously I’d used menus I could extract from restaurant websites along the course of the Camino de Santiago, and several online glossaries and dictionaries. But I’d also stumbled on many sites (focused on Spain and entirely in Spanish) for recetas (recipes). These are more tedious to process but often contain information I don’t find elsewhere, so I can stuff it into my corpus and make sure that terms used less frequently in menus are still incorporated.
So I just started a small trial to look at this recipe site. Under its recetas tab it has 14 categories, and under Pasta y Arroz (pasta and rice) there are 15 webpages with about 12-16 recetas per page. IOW, this is a lot. And every receta is presented on the webpage as a caption (to a photo) where I can use Google Translate and then manually produce a side-by-side Spanish and English pair, such as:
|Ñoquis de calabaza y boniato con salsa de gorgonzola||Pumpkin and sweet potato gnocchi with gorgonzola sauce|
For this I’d extract for my corpus ñoquis (gnocchi), calabaza (pumpkin), boniato (sweet potato), salsa (sauce), and gorgonzola (gorgonzola). If I double check these term associations by looking in the Oxford dictionary or the DLE (more authoritative, but harder to use than Oxford) I could add these associations to my corpus with higher confidence levels. IOW, mistakes are bound to get into the corpus without a lot of checking, but I’m also hoping the “big data” type filtering will eliminate the spurious pairs.
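To make the confidence idea a bit more concrete, here is a minimal sketch of how such pairs might be stored and filtered. Everything in it is my own illustration, not the actual corpus format: the `PairCorpus` class, the source weights, and the `min_score` threshold are all assumed names and values chosen just for the example.

```python
from collections import defaultdict

# Hypothetical weights: a pair checked in a dictionary counts for more
# than a pair taken straight from machine translation.
SOURCE_WEIGHT = {"machine": 1.0, "dictionary": 3.0}


class PairCorpus:
    """Accumulates a score for each (Spanish, English) pair it is shown."""

    def __init__(self):
        self._scores = defaultdict(float)

    def add(self, spanish, english, source="machine"):
        # Normalize case and whitespace so repeated sightings add up.
        key = (spanish.strip().lower(), english.strip().lower())
        self._scores[key] += SOURCE_WEIGHT[source]

    def best(self, spanish, min_score=2.0):
        """Return the highest-scoring English term, or None if nothing
        for this Spanish term has reached min_score yet."""
        spanish = spanish.strip().lower()
        candidates = {
            en: score
            for (es, en), score in self._scores.items()
            if es == spanish and score >= min_score
        }
        if not candidates:
            return None
        return max(candidates, key=candidates.get)


corpus = PairCorpus()
pairs = [
    ("ñoquis", "gnocchi"),
    ("calabaza", "pumpkin"),
    ("boniato", "sweet potato"),
    ("salsa", "sauce"),
    ("gorgonzola", "gorgonzola"),
]
for es, en in pairs:
    corpus.add(es, en)  # one machine-translated sighting each

# A dictionary check raises the score of one pair.
corpus.add("calabaza", "pumpkin", source="dictionary")

print(corpus.best("calabaza"))  # has enough evidence to be returned
print(corpus.best("boniato"))   # one unverified sighting is below the threshold
```

The point of the threshold is that a pair seen once in a single machine translation stays in the corpus but isn't trusted until either more sightings or a dictionary check push its score up.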
But what I just described as the process in this post took me quite a bit more time to write than it took me to extract the side-by-side pair (still tedious but relatively quick) and do a quick visual parsing (really looking for any terms that require more research). Note that while I have no fluency in Spanish I do know a bit about the grammar and thus know how to spot parts of speech and change the word order used in Spanish to my normal English and thus find the term-by-term association. This entry was simple to do and the only (slightly) interesting part is that the original ‘gnocchi’ does have a different word in Spanish but ‘gorgonzola’ doesn’t (and as a somewhat interesting question, are these “Italian” words, or are they now so incorporated in English, at least by foodies, that we should consider them English words, known linguistically as ‘loanwords’?).
So on the first webpage of pastas this was the most interesting puzzle:
|Escudella con sopa de galets, el plato estrella de la Navidad catalana||Escudella (in Oxford as -dilla, but some searches appeared with this spelling; is it a typo? here? and on web?) with soup of galets (is this short for galettas?), the star dish of Catalan Christmas
but Oxford has it with a definition (didn’t have translation) in which case it was a specific dish
no, galets appears to be a type of pasta (shells) https://www.tienda.com/products/galets-nadal-pasta-sandro-desii-su-40.html
This is my raw entry. Since escudella and galets appear in the Google Translate output as the same words in English (i.e. not translated, or perhaps there is no translation), this is the type of thing I look for to do more research. When I merely asked Oxford for the translation of escudella it said that was missing. What it does show (helpfully) is close matches, and in this case I tried its suggestion of escudilla (which is ‘bowl’ and kinda seems to fit this recipe name). So you see the note I made to myself (in Oxford as -dilla, but some searches appeared with this spelling; is it a typo? here? and on web?) but that’s just a start. Since I’ve done this a lot I immediately used Oxford a different way; instead of asking for a translation I asked for the definition (of escudella) and it had this in Spanish (followed here by Google’s English):
|Plato que consiste en un caldo de carne y hortalizas, colado, en el que se cuece arroz, fideos u otro tipo de pasta; es un plato típico de Cataluña, comunidad autónoma de España.||Plate consisting of a broth of meat and vegetables, strained, in which rice, noodles or other type of pasta are cooked; It is a typical dish of Catalonia, autonomous community of Spain.|
Now I could immediately point out that Google’s translation of plato as ‘plate’ is not correct as plato also means ‘dish’ which fits better but that’s the typical kind of digression I get into that just makes posts take even longer.
Now meanwhile I thought I recognized galets. I did a previous post about the menu from a store selling cookies (as a bit of diversity from just restaurant menus). So I double checked by asking Oxford for the Spanish translation of ‘cookie’ (which it also lists as ‘biscuit’ in British English) and it has galletas (as I thought I recalled). So I thought galets might be some colloquial term for cookie.
But now my “translation” ‘bowl with soup of cookies’ is pretty obvious nonsense and so no better than the untranslated correspondence. So, since this is a new source and I’d already discovered I could click on each receta and get a full-page explanation (intro to the dish, ingredients, preparation), I began to see the flaws in my attempt to unravel this puzzle. As the recipe page itself is entirely in Spanish I have the same kind of puzzle, i.e. Google again botched some of the translation. But there is enough text, and importantly a picture, that I could try some searching, and I found galets as an item I can buy online (I’ve often used this source in this project). These look (in both the recipe picture and the tienda picture) like fairly ordinary pasta shells (I don’t see what’s special about them), but pasta shells are pasta shells (except maybe tiny details), so now I’d know what I am getting if I’d picked this off a menu in a restaurant.
So finally I know both these words don’t have English translations so I’d want a different kind of entry in my corpus of a short description and then potentially a longer one. Thus a diner using my app could learn about this dish.
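As a rough sketch of what that “different kind of entry” could look like, here is one possible record shape. The `CorpusEntry` name and its fields are assumptions made up for this illustration (the app’s real data format isn’t settled); the descriptions are just condensed from the Oxford definition and the tienda listing above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CorpusEntry:
    """Hypothetical record for one Spanish culinary term."""
    spanish: str
    english: Optional[str] = None   # None when there is no English equivalent
    short_description: str = ""     # a few words, shown inline on the menu
    long_description: str = ""      # shown if the diner asks for more detail

    def display(self) -> str:
        # Prefer a direct translation; otherwise fall back to the description.
        if self.english:
            return f"{self.spanish}: {self.english}"
        return f"{self.spanish}: {self.short_description}"


escudella = CorpusEntry(
    spanish="escudella",
    short_description="Catalan meat and vegetable broth with pasta or rice",
    long_description=(
        "A strained broth of meat and vegetables in which rice, noodles "
        "or another type of pasta is cooked; a typical dish of Catalonia."
    ),
)
galets = CorpusEntry(
    spanish="galets",
    short_description="shell-shaped pasta",
)
calabaza = CorpusEntry(spanish="calabaza", english="pumpkin")

for entry in (escudella, galets, calabaza):
    print(entry.display())
```

Terms with a straightforward equivalent (like calabaza) and terms without one (like escudella) can then live in the same corpus, and the app decides what to show based on which fields are filled in.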
So there, you see what I mean. This post has taken me far longer than the original analysis. Yet it’s good (for my purposes, hopefully somewhat interesting to you, Dear Reader) to have this more complete explanation (I can re-read this post someday when I’ve completely forgotten this and have to resolve something in my app). But if I’d simply written this one item in the most brief form (to jog my memory later, plus at least some glue prose to make it read better than my raw notes) I would have gotten this done.
But it also means I’d probably have many more posts, which is a mixed benefit as well. So, IOW, there really isn’t a great answer.
So I have a solution. I can use categories to distinguish the posts that are really minimal and that I create almost immediately after doing the work for the corpus. These will really be post “fragments” but at least I get more recorded.
For instance, I was looking at a menu on Friday and its Menu del Dia was for Mother’s Day so I had in mind a post to create on the 5th. But instead I spent most of the day cooking for our Cinco de Mayo feast (and drinking a few too many margaritas). So I never did that post and now the “joke” of it is gone as its timeliness is past.
So I’ll continue to struggle with this, fragmentary and terse posts, or (sometimes too long) complete posts.