Update README.md #53
Adding proper branding on first use ("Apache Parquet") and pointer to the project's overview rather than a vendor overview.
Thanks for the PR! The proper branding sounds great. For the second change, I'd like to keep at least a link to the vendor overview. I find it a much better explanation than the project's overview, and many GeoParquet users will not have heard of Parquet at all. We can have both links, though, and explain that the second is an explanation from a vendor.
@cholmes @jzb I think it is better to keep both links:
I would suggest the following change:
What do you think?
I'm fine with @cayetanobv's proposal. Could you add it as a commit suggestion in the current PR?
Hi all, sorry, responses were filtered. I'd strongly prefer that we link to the project as the definitive source. If the Parquet project needs to improve its definition, then let's do that; the authoritative word on "what is Parquet" should come from Parquet. The other link helps reinforce a vendor as the SEO-authoritative resource on the topic. That's undesirable.
Hi @jzb. I think you are right that we should link to the Parquet website. But it's true that the official explanation from the Parquet project is not very good as an introduction.
Added a suggestion to clarify, while also including the nice Databricks explanation.
@jzb, see my latest commit suggestion. I framed the Databricks link as a 'vendor explanation', so it doesn't boost their SEO further. But I believe we need to include a good explanation of Parquet for our users who have never heard of Parquet, Hadoop, or columnar data formats. The current 'overview' just says 'Apache Parquet is a columnar storage format available to any project in the Hadoop ecosystem, regardless of the choice of data processing framework, data model or programming language.' That doesn't explain much about Parquet or its advantages to our users. Once the official project has something that really explains it to new users, we'd be more than happy to link only to that.
Agreed that we should link to the official docs, and that we can help improve the explanation there. But as long as the explanation in the official docs is unclear (and actually just wrong: Parquet is not at all tied to Hadoop), I agree with the others that we need to keep some link to a better explanation. It's unfortunate that we need to link to a vendor, but if someone finds a more neutral post with a good explanation, we could use that instead. (When the original PR that added this text was made, I searched for alternatives but didn't find anything else that would be fitting to link to.)
Interestingly, the Parquet website was recently updated! It looks like the website had been virtually unchanged from 2015 until last month. Now the homepage doesn't reference Hadoop at all: https://parquet.apache.org
The site is now in this repo: https://github.com/apache/parquet-site
Ah, indeed, the main website has been updated with a much better explanation! The docs link (https://parquet.apache.org/docs/overview/) still has the old explanation, so I personally would not use that yet for a "what is Parquet" link. But maybe the link to the main website is sufficient then?
The main website is definitely much better now. The main thing I still find lacking on it is 'what is a column-oriented data file format?' I'd be happy to link to another definition of that, but right now the Databricks site does the best job I've seen of answering that question. I do agree the Apache Parquet docs overview isn't great; it could make sense to just leave it out.
I think it's fair to link to the Databricks site for a better definition of "what is a column-oriented..." and to the Parquet homepage for "what is Apache Parquet". Does that work?
OK, made an attempt to capture that.
That works for me, thank you very much!
Co-authored-by: Chris Holmes <chomie@gmail.com>
I think it's a good proposal. Thanks all!