Fixed duplicate label translations for secondary itemsets and use itextID for selects with choices that have media specified #467

DavisRayM · 2020-09-03T13:34:48Z

Closes #464, #463

Why is this the best possible solution? Were any other approaches considered?

What are the regression risks?

None.

Does this change require updates to documentation? If so, please file an issue here and include the link below.

No

Before submitting this PR, please make sure you have:

included test cases for core behavior and edge cases in tests_v1
run nosetests and verified all tests pass
run black pyxform to format code
verified that any code or assets from external sources are properly credited in comments

codecov-commenter · 2020-09-03T13:40:17Z

Codecov Report

Merging #467 into master will increase coverage by 0.14%.
The diff coverage is 97.61%.

@@            Coverage Diff             @@
##           master     #467      +/-   ##
==========================================
+ Coverage   83.39%   83.53%   +0.14%     
==========================================
  Files          25       25              
  Lines        3601     3633      +32     
  Branches      836      842       +6     
==========================================
+ Hits         3003     3035      +32     
  Misses        454      454              
  Partials      144      144

Impacted Files	Coverage Δ
pyxform/xform2json.py	`79.59% <88.88%> (+0.17%)`	⬆️
pyxform/question.py	`92.95% <100.00%> (+0.10%)`	⬆️
pyxform/survey.py	`91.88% <100.00%> (+0.36%)`	⬆️
pyxform/survey_element.py	`94.71% <100.00%> (+0.27%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 50921a8...f79ade9. Read the comment docs.

lognaturel

Thanks! It would be really helpful to get a high-level view of the decisions you made and some risk analysis. There are clearly ramifications if downstream tools use the itext ids in some way. I don't know how likely that is. Did you think through that?

If we're going to be changing the structure of the itext ids, we should consider alternatives. For example, we could use uuids for those identifiers instead of using the list and choice names. I don't really understand the static_instance prefix. If it's just noise, could we remove it?

lognaturel · 2020-09-08T21:22:55Z

pyxform/tests_v1/test_translations.py

@@ -75,3 +75,41 @@ def test_missing_media_itext(self):
            ],
            itext__excludes=['<value form="audio">jr://audio/-</value>'],
        )
+
+    def test_select_with_choice_filter_translation(self):


This test would pass on master, right? You want to make sure to have a test that would have failed so that it tests the new code paths. I think you can do that by removing the label::Latin column. I highly recommend using a test-driven approach in this code base -- typically it's possible to add tests without much refactor which is great! That way you can really go through the red-green-refactor cycle.

I would expect at the very least test_select_with_media_and_choice_filter_and_no_translations_generates_media and test_select_with_choice_filter_and_translations_generates_single_translation or something like that to capture the desired behavior for the two issues you're closing.

lognaturel · 2020-09-08T21:25:49Z

pyxform/survey.py

-            if multi_language:
-                itext_id = "-".join(["static_instance", list_name, str(idx)])
+            if multi_language or has_media:
+                name = choice.get("name").replace(" ", "_")


Why add this? Wouldn't a name with a space fail earlier because it's not a valid XML identifier? If it's possible to have a name with a space in it, please add a test to cover that case.

It seemed to be possible for this test, I'll need to confirm this though. I'll confirm and add a test if this is the case.

lognaturel · 2020-09-08T21:26:58Z

pyxform/survey.py

-                itext_id = "-".join(["static_instance", list_name, str(idx)])
+            if multi_language or has_media:
+                name = choice.get("name").replace(" ", "_")
+                itext_id = "-".join(["static_instance", list_name, name])


Is this change from index to name so that all identifiers have a consistent structure? Why is this important? Do you know what the static_instance prefix is supposed to mean (I don't). What are the implications of changing this for all translations?

It's to make the ID construction more predictable. Noticed that the itext entry creation happens in multiple areas

pyxform/pyxform/survey.py

Lines 594 to 610 in f79ade9

for element in self.iter_descendants():

for d in element.get_translations(self.default_language):

translation_path = d["path"]

form = "long"

if "guidance_hint" in d["path"]:

translation_path = d["path"].replace("guidance_hint", "hint")

form = "guidance"

self._translations[d["lang"]][translation_path] = self._translations[

d["lang"]

].get(translation_path, {})

self._translations[d["lang"]][translation_path].update(

{form: {"text": d["text"], "output_context": d["output_context"],}}

)

pyxform/pyxform/survey.py

Lines 613 to 629 in f79ade9

for list_name, choice_list in self.choices.items():

multi_language = isinstance(choice_list[0].get("label"), dict)

has_media = bool(choice_list[0].get("media"))

if not multi_language and not has_media:

continue

for choice in choice_list:

for name, choice_value in choice.items():

name = choice.get("name").replace(" ", "_")

itext_id = "-".join(["static_instance", list_name, name])

if isinstance(choice_value, dict):

_setup_choice_translations(name, choice_value, itext_id)

elif name == "label":

self._add_to_nested_dict(

self._translations,

[self.default_language, itext_id, "long"],

choice_value,

)

pyxform/pyxform/survey.py

Line 651 in f79ade9

def _setup_media(self):

using the name makes it a bit easier to recreate the IDs and stop duplicate entries from being created in other areas.

Not entirely sure what the static_instance prefix means... Changing this for all translations related to choices(with multiple languages / media) would mean that all itext entries will have the static_instance-<list_name>-<name> format which seemed a bit more consistent and predictable, I don't believe this would affect any other functionality

lognaturel · 2020-09-08T21:39:33Z

pyxform/xform2json.py

@@ -546,6 +547,15 @@ def _set_translations(self):
        assert "text" in self.translations[0]
        assert "lang" in self.translations[0]

+    def _get_list_name(self, itemset):


This doesn't look right. You should be able to get that instance name from the instance declaration.

Mhm I might have been looking at the wrong area, I might have to look at the functionality that converts the XML to a dict. By this point the list_nameinfo doesn't seem to have been retrieved... Seems to just be a breakdown of the various sections of the XLSForm, didn't contain the list_name within it's contents.

DavisRayM · 2020-09-09T01:19:36Z

Closing this in favor of #468. I didn't probably document my steps here or my reasoning for some of the changes. Most changes made here are actually not needed in order to fix the tagged issues. #468 is cleaner and I believe a bit more documented.

DavisRayM force-pushed the 464-select-single-language-images branch 7 times, most recently from b09cb7c to b4b0683 Compare September 7, 2020 14:25

Use itext to specify labels if media files present

480a0ea

DavisRayM force-pushed the 464-select-single-language-images branch 4 times, most recently from 782eb1e to 361834c Compare September 8, 2020 08:48

DavisRayM added 4 commits September 8, 2020 11:56

Setup translations for choices that have media files

8471127

Update test fixtures

d473471

Set list_name if retrievable from XML

6b2884a

Test that duplicate itext entries are not created

f79ade9

DavisRayM force-pushed the 464-select-single-language-images branch from 361834c to f79ade9 Compare September 8, 2020 08:56

DavisRayM changed the title ~~[WIP] Use itext if select has media~~ Fixed duplicate label translations for secondary itemsets and use itextID for selects with choices that have media specified Sep 8, 2020

DavisRayM marked this pull request as ready for review September 8, 2020 09:28

DavisRayM requested a review from lognaturel September 8, 2020 09:31

lognaturel requested changes Sep 8, 2020

View reviewed changes

DavisRayM mentioned this pull request Sep 9, 2020

Fixed duplicate label translations for secondary itemsets and use itextID for selects with choices that have media specified #468

Merged

4 tasks

DavisRayM closed this Sep 9, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fixed duplicate label translations for secondary itemsets and use itextID for selects with choices that have media specified #467

Fixed duplicate label translations for secondary itemsets and use itextID for selects with choices that have media specified #467

DavisRayM commented Sep 3, 2020 •

edited

Loading

codecov-commenter commented Sep 3, 2020 •

edited

Loading

lognaturel left a comment •

edited

Loading

lognaturel Sep 8, 2020

lognaturel Sep 8, 2020

DavisRayM Sep 8, 2020

lognaturel Sep 8, 2020

DavisRayM Sep 8, 2020

lognaturel Sep 8, 2020

DavisRayM Sep 8, 2020

DavisRayM commented Sep 9, 2020

	for element in self.iter_descendants():
	for d in element.get_translations(self.default_language):

	translation_path = d["path"]
	form = "long"

	if "guidance_hint" in d["path"]:
	translation_path = d["path"].replace("guidance_hint", "hint")
	form = "guidance"

	self._translations[d["lang"]][translation_path] = self._translations[
	d["lang"]
	].get(translation_path, {})

	self._translations[d["lang"]][translation_path].update(
	{form: {"text": d["text"], "output_context": d["output_context"],}}
	)

	for list_name, choice_list in self.choices.items():
	multi_language = isinstance(choice_list[0].get("label"), dict)
	has_media = bool(choice_list[0].get("media"))
	if not multi_language and not has_media:
	continue
	for choice in choice_list:
	for name, choice_value in choice.items():
	name = choice.get("name").replace(" ", "_")
	itext_id = "-".join(["static_instance", list_name, name])
	if isinstance(choice_value, dict):
	_setup_choice_translations(name, choice_value, itext_id)
	elif name == "label":
	self._add_to_nested_dict(
	self._translations,
	[self.default_language, itext_id, "long"],
	choice_value,
	)

Fixed duplicate label translations for secondary itemsets and use itextID for selects with choices that have media specified #467

Fixed duplicate label translations for secondary itemsets and use itextID for selects with choices that have media specified #467

Conversation

DavisRayM commented Sep 3, 2020 • edited Loading

Why is this the best possible solution? Were any other approaches considered?

What are the regression risks?

Does this change require updates to documentation? If so, please file an issue here and include the link below.

Before submitting this PR, please make sure you have:

codecov-commenter commented Sep 3, 2020 • edited Loading

Codecov Report

lognaturel left a comment • edited Loading

Choose a reason for hiding this comment

lognaturel Sep 8, 2020

Choose a reason for hiding this comment

lognaturel Sep 8, 2020

Choose a reason for hiding this comment

DavisRayM Sep 8, 2020

Choose a reason for hiding this comment

lognaturel Sep 8, 2020

Choose a reason for hiding this comment

DavisRayM Sep 8, 2020

Choose a reason for hiding this comment

lognaturel Sep 8, 2020

Choose a reason for hiding this comment

DavisRayM Sep 8, 2020

Choose a reason for hiding this comment

DavisRayM commented Sep 9, 2020

DavisRayM commented Sep 3, 2020 •

edited

Loading

codecov-commenter commented Sep 3, 2020 •

edited

Loading

lognaturel left a comment •

edited

Loading