Apache Sedona™ is a spatial computing engine that lets developers process spatial data at any scale within modern cluster computing systems such as Apache Spark and Apache Flink. Developers can express their spatial data processing tasks in Spatial SQL, Spatial Python, or Spatial R. Internally, Sedona provides spatial data loading, indexing, partitioning, and query processing and optimization functionality that enables users to analyze large spatial datasets efficiently.
Some of the key features of Apache Sedona include:
These are some of the key features of Apache Sedona, but it may offer additional capabilities depending on the specific version and configuration.
Click and play the interactive Sedona Python Jupyter Notebook immediately!
Apache Sedona is a widely used framework for working with spatial data, and it has many different use cases and applications. Some of the main use cases for Apache Sedona include:
This example loads NYC taxi trip records and taxi zone information, stored as CSV files on AWS S3, into Sedona spatial dataframes. It then runs a spatial SQL query on the taxi trip dataset to keep only the records within the Manhattan area of New York. The example also shows a spatial join that matches taxi trip records to zones based on whether the trip's pickup point lies within the geographical extent of the zone. Finally, the last code snippet converts the Sedona output to GeoPandas and plots the spatial distribution of both datasets.
# Load the taxi trips and build a point geometry from each pickup longitude/latitude
taxidf = spark.read.format('csv').option("header","true").option("delimiter", ",").load("s3a://your-directory/data/nyc-taxi-data.csv")
taxidf = taxidf.selectExpr('ST_Point(CAST(Start_Lon AS Decimal(24,20)), CAST(Start_Lat AS Decimal(24,20))) AS pickup', 'Trip_Pickup_DateTime', 'Payment_Type', 'Fare_Amt')
# Load the zones; this file is read without a header, so Spark names the columns _c0, _c1, ...
zoneDf = spark.read.format('csv').option("delimiter", ",").load("s3a://your-directory/data/TIGER2018_ZCTA5.csv")
zoneDf = zoneDf.selectExpr('ST_GeomFromWKT(_c0) as zone', '_c1 as zipcode')
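# Keep only the pickups that fall inside a bounding box around Manhattan
taxidf_mhtn = taxidf.where('ST_Contains(ST_PolygonFromEnvelope(-74.01,40.73,-73.93,40.79), pickup)')
# Register both dataframes as temporary views so the SQL query can refer to them by name
zoneDf.createOrReplaceTempView('zoneDf')
taxidf.createOrReplaceTempView('taxiDf')
taxiVsZone = spark.sql('SELECT zone, zipcode, pickup, Fare_Amt FROM zoneDf, taxiDf WHERE ST_Contains(zone, pickup)')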
import geopandas as gpd
# Convert both Sedona dataframes to GeoPandas and draw the pickups on top of the zones
zoneGpd = gpd.GeoDataFrame(zoneDf.toPandas(), geometry="zone")
taxiGpd = gpd.GeoDataFrame(taxidf.toPandas(), geometry="pickup")
zone = zoneGpd.plot(color='yellow', edgecolor='black', zorder=1)
zone.set_xlabel('Longitude (degrees)')
zone.set_ylabel('Latitude (degrees)')
zone.set_xlim(-74.1, -73.8)
zone.set_ylim(40.65, 40.9)
taxi = taxiGpd.plot(ax=zone, alpha=0.01, color='red', zorder=3)
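To make the semantics of the two queries above concrete, here is a minimal plain-Python sketch of what the envelope filter and the `ST_Contains` join compute. It is only an illustration and not how Sedona evaluates these predicates; Sedona runs them in a distributed fashion with its own indexing and partitioning. The helper names (`in_envelope`, `in_polygon`, `naive_spatial_join`) and the sample coordinates are hypothetical, and polygons are assumed to be simple rings without holes.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def in_envelope(p: Point, min_x: float, min_y: float, max_x: float, max_y: float) -> bool:
    """True if p lies strictly inside the rectangle (boundary points are excluded)."""
    x, y = p
    return min_x < x < max_x and min_y < y < max_y


def in_polygon(p: Point, ring: List[Point]) -> bool:
    """Ray-casting test: count how many polygon edges a ray going right from p crosses."""
    x, y = p
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through p can be crossed
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def naive_spatial_join(zones: Dict[str, List[Point]], points: List[Point]) -> List[Tuple[str, Point]]:
    """Pair every zone with every point it contains (quadratic, for illustration only)."""
    return [(name, p) for name, ring in zones.items() for p in points if in_polygon(p, ring)]


# Hypothetical sample data, not taken from the NYC datasets
pickups = [(-73.98, 40.75), (-73.95, 40.78), (-73.80, 40.65)]
zones = {"zone_a": [(-74.00, 40.70), (-73.96, 40.70), (-73.96, 40.80), (-74.00, 40.80)]}

# Same bounding box as the ST_PolygonFromEnvelope call in the example
manhattan = [p for p in pickups if in_envelope(p, -74.01, 40.73, -73.93, 40.79)]
matches = naive_spatial_join(zones, pickups)
```

The naive join compares every zone with every point, which is exactly the cost that spatial partitioning and indexing are meant to avoid on large datasets.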
To install the Python package:
pip install apache-sedona
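After installation, a Spark session with Sedona's SQL functions registered is needed before `ST_*` expressions such as those in the example can be used. The following is a minimal sketch, assuming a Sedona release that provides `SedonaContext` in `sedona.spark` and that the matching Sedona jars are already available to Spark; older releases use a different registration call. The helper names `sedona_available` and `create_sedona_session`, and the application name, are hypothetical.

```python
import importlib.util


def sedona_available() -> bool:
    """Return True if the `sedona` module installed by apache-sedona can be imported."""
    return importlib.util.find_spec("sedona") is not None


def create_sedona_session():
    """Create a Spark session with Sedona's spatial SQL functions registered.

    Assumption: the installed Sedona release exposes SedonaContext, and the
    Sedona jars are on Spark's classpath.
    """
    if not sedona_available():
        raise RuntimeError("apache-sedona is not installed; run `pip install apache-sedona`")
    # Imported lazily so that this module loads even without Sedona installed
    from sedona.spark import SedonaContext

    config = SedonaContext.builder().appName("sedona-example").getOrCreate()
    return SedonaContext.create(config)
```

The returned session can then play the role of `spark` in the example above, for instance `spark = create_sedona_session()`.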
To compile the source code, please refer to the Sedona website.
Modules in the source code
Name | API | Introduction |
---|---|---|
Core | Scala/Java | Distributed Spatial Datasets and Query Operators |
SQL | Spark RDD/DataFrame in Scala/Java/SQL | Geospatial data processing on Apache Spark |
Flink | Flink DataStream/Table in Scala/Java/SQL | Geospatial data processing on Apache Flink |
Viz | Spark RDD/DataFrame in Scala/Java/SQL | Geospatial data visualization on Apache Spark |
Python | Spark RDD/DataFrame in Python | Python wrapper for Sedona |
R | Spark RDD/DataFrame in R | R wrapper for Sedona |
Zeppelin | Apache Zeppelin | Plugin for Apache Zeppelin 0.8.1+ |
Please visit the Apache Sedona website for detailed information.
Follow Sedona on Twitter for fresh news: Sedona@Twitter
Join the Sedona Discord community:
Sedona JIRA: Bugs, Pull Requests, and other similar issues
Sedona Mailing Lists: dev@sedona.apache.org: project development, general questions or tutorials.
Package Download Statistics:
Download statistics | Maven | PyPI | CRAN |
---|---|---|---|
Apache Sedona | 180k/month | | |
Archived GeoSpark releases | 10k/month | | |
Our users and code contributors are from: