PySpark

since 2022-01-27

Let's try reading the Parquet + Snappy files that an AWS Athena CTAS query wrote to S3.

from pyspark import SparkContext
from pyspark.sql import SparkSession

# Build a SparkSession on top of a SparkContext
# (SparkSession.builder.getOrCreate() is the more common idiom nowadays).
spark_context = SparkContext()
spark = SparkSession(spark_context)

# A Parquet (Snappy) file produced by the Athena CTAS query, downloaded locally
# (the object key is masked)
filename = "20220124_093108_00027_*****_********-****-****-****-********"

# Read the Parquet file into a Spark DataFrame
df = spark.read.parquet(filename)
 
type(df)
# => pyspark.sql.dataframe.DataFrame
 
# Show the first rows of the DataFrame
df.show()

# Register the DataFrame as a temporary view so it can be queried with SQL
df.createOrReplaceTempView("chap7_japan_ctas")

# Query the view with Spark SQL; collect() returns the result as a list of Row objects
spark.sql("select * from chap7_japan_ctas").collect()

# Number of rows in the DataFrame
df.count()

# Convert to a pandas DataFrame (this pulls all rows into driver memory)
df2 = df.toPandas()
 
type(df2)
# => pandas.core.frame.DataFrame
 
# Inspect the last 10 rows on the pandas side
df2.tail(10)
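
The file above was downloaded locally first. As a minimal sketch, the same data could be read straight from the CTAS output location on S3; the bucket name and prefix below are placeholders, and this assumes the hadoop-aws connector and AWS credentials are already configured for the Spark session.

# Sketch: read every Parquet file under the Athena CTAS output prefix on S3.
# "example-bucket" and the prefix are assumptions, not from the original note.
df_s3 = spark.read.parquet("s3a://example-bucket/chap7_japan_ctas/")
df_s3.printSchema()
df_s3.count()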