Anonymised Variables should have consistent naming corresponding to their column #25

Open
DeNeutoy opened this issue Nov 12, 2018 · 4 comments
Labels
enhancement New feature or request

Comments

@DeNeutoy

It's a little annoying that the anonymised variable names sometimes, but not always, correspond to the table/column name they come from. E.g., in some datasets, such as academic, the variable name is derived from the column name:

        "sql": [
            "SELECT JOURNALalias0.HOMEPAGE FROM JOURNAL AS JOURNALalias0 WHERE JOURNALalias0.NAME = \"journal_name0\" ;"
        ],
        "variables": [
            {
                "example": "PVLDB",
                "location": "both",
                "name": "journal_name0",
                "type": "journal_name"
            }
        ]

whereas in geography, variables have generic names such as var0, so neither the name nor the type key tells you directly which column they come from:

        "sql": [
            "SELECT CITYalias0.CITY_NAME FROM CITY AS CITYalias0 WHERE CITYalias0.POPULATION = ( SELECT MAX( CITYalias1.POPULATION ) FROM CITY AS CITYalias1 WHERE CITYalias1.STATE_NAME = \"var0\" ) AND CITYalias0.STATE_NAME = \"var0\" ;"
        ],
        "variables": [
            {
                "example": "arizona",
                "location": "both",
                "name": "var0",
                "type": "state"
            }
        ]
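
As a rough illustration of the inconsistency, here is a minimal Python sketch that flags variables whose name is not simply their type plus a numeric index. The helper name `inconsistent_variables` is made up for this example, and it assumes entries shaped like the two snippets above; it is not code from the repository.

```python
import re


def inconsistent_variables(entry):
    """Return the names of variables whose name is not `<type><index>`.

    Hypothetical helper: assumes `entry` is a dict with a "variables" list
    whose items carry "name" and "type" keys, as in the snippets above.
    """
    flagged = []
    for var in entry.get("variables", []):
        # Drop the trailing numeric index, e.g. "journal_name0" -> "journal_name".
        base = re.sub(r"\d+$", "", var["name"])
        if base != var["type"]:
            flagged.append(var["name"])
    return flagged


# Entries copied from the two examples above (SQL omitted for brevity).
academic = {
    "variables": [
        {"example": "PVLDB", "location": "both",
         "name": "journal_name0", "type": "journal_name"}
    ]
}
geography = {
    "variables": [
        {"example": "arizona", "location": "both",
         "name": "var0", "type": "state"}
    ]
}

print(inconsistent_variables(academic))
print(inconsistent_variables(geography))
```

For these two entries, nothing is flagged in the academic example, while `var0` is flagged in the geography one.
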
@jkkummerfeld
Owner

I've worked on addressing this in #27. Have a look and let me know what you think!

@jkkummerfeld
Owner

Hm, though having something that directly maps to the DB for all cases is trickier. More thought required.

@DeNeutoy
Author

Hmm, yeah, I also found this once I dug into it more. E.g., the limit0 variables in the scholar dataset are really a function of the query rather than particular to the database. What you've done for geography looks like an improvement though!

@jkkummerfeld
Owner

I've merged that for now, but will keep this open as a reminder that this issue requires more work. My thinking is that I could do the following:

  • For all variables that clearly map to a single field, rename them, specifying both the table and field.
  • For variables that can be used in multiple ways, change them to have a different name in each case.
  • For variables that do not have a single mapping (e.g. they are used in multiple ways in the query), have a special case.

That would be an improvement over the current state, though it would also be a fair amount of work.
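
To make the first and last cases a bit more concrete, here is a rough Python sketch, not a worked-out design. It assumes variables only appear in comparisons of the form `ALIAS.COLUMN = "var"` and that aliases are declared as `TABLE AS ALIAS`, as in the geography query quoted earlier. The function names and the `table.column<index>` naming format are invented for illustration.

```python
import re

# Assumed patterns, based only on the query style shown in this thread.
ALIAS_RE = re.compile(r"(\w+) AS (\w+)")          # e.g. CITY AS CITYalias0
USE_RE = re.compile(r'(\w+)\.(\w+) = "(\w+)"')    # e.g. CITYalias0.STATE_NAME = "var0"


def fields_for_variables(sql):
    """Map each variable to the set of (table, column) pairs it is compared with."""
    aliases = {alias: table for table, alias in ALIAS_RE.findall(sql)}
    fields = {}
    for alias, column, var in USE_RE.findall(sql):
        table = aliases.get(alias, alias)
        fields.setdefault(var, set()).add((table, column))
    return fields


def proposed_names(sql, variables):
    """Propose a new name per variable, or None when there is no single mapping."""
    fields = fields_for_variables(sql)
    renames = {}
    for var in variables:
        name = var["name"]
        used = fields.get(name, set())
        if len(used) == 1:
            table, column = next(iter(used))
            index = re.search(r"\d+$", name)
            suffix = index.group() if index else "0"
            # Hypothetical naming format: lowercased table.column plus the index.
            renames[name] = f"{table}.{column}".lower() + suffix
        else:
            # No mapping or several mappings: left for the special case.
            renames[name] = None
    return renames


GEO_SQL = (
    'SELECT CITYalias0.CITY_NAME FROM CITY AS CITYalias0 '
    'WHERE CITYalias0.POPULATION = ( SELECT MAX( CITYalias1.POPULATION ) '
    'FROM CITY AS CITYalias1 WHERE CITYalias1.STATE_NAME = "var0" ) '
    'AND CITYalias0.STATE_NAME = "var0" ;'
)

print(proposed_names(GEO_SQL, [{"name": "var0"}]))
```

On the geography query, `var0` is only ever compared with `CITY.STATE_NAME`, so it gets a single proposed name. A variable that is never compared with a column, or is compared with several, comes back as `None` and would need the special handling.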

@jkkummerfeld jkkummerfeld added the enhancement New feature or request label Jun 11, 2020