[1080] Remove duplicated records based on a specific column in GeoPandas

Date: 2024-12-04 14:22:51
Tags: based, Point, column, 1080, keep, duplicates, ID

To remove duplicate records based on a specific column in GeoPandas, use the drop_duplicates method, which GeoDataFrame inherits from pandas. Here's how:

Example Script

import geopandas as gpd
from shapely.geometry import Point

# Sample GeoDataFrame
data = {
    'ID': [1, 2, 2, 3, 4, 4, 4],
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank', 'Grace'],
    'geometry': [Point(1, 2), Point(2, 3), Point(3, 4), Point(4, 5), Point(5, 6), Point(6, 7), Point(7, 8)]
}
gdf = gpd.GeoDataFrame(data, crs="EPSG:4326")

# Remove duplicated records based on the 'ID' column
gdf_cleaned = gdf.drop_duplicates(subset='ID', keep='first')

print(gdf_cleaned)

Explanation:

  • Import Libraries: Import geopandas and Point from shapely.geometry.
  • Create a Sample GeoDataFrame: Define a GeoDataFrame with a column (ID) that contains duplicate values.
  • Drop Duplicates: Call drop_duplicates with the subset parameter set to the column of interest (ID in this case). With keep='first', the first row for each ID value is retained and later rows with the same ID are dropped. Use keep='last' to retain the last row for each ID instead, or keep=False to drop every row whose ID appears more than once, so that only IDs occurring exactly once remain.
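
To make the difference between the three keep options concrete, here is a minimal sketch that applies each one to the same sample data as the script above. The variable names (kept_first, kept_last, kept_unique) are made up for illustration and are not part of the original script.

```python
import geopandas as gpd
from shapely.geometry import Point

# Same sample data as the example script: IDs 2 and 4 are duplicated
gdf = gpd.GeoDataFrame(
    {
        "ID": [1, 2, 2, 3, 4, 4, 4],
        "Name": ["Alice", "Bob", "Charlie", "David", "Eve", "Frank", "Grace"],
        "geometry": [Point(x, x + 1) for x in range(1, 8)],
    },
    crs="EPSG:4326",
)

# Keep the first row seen for each ID
kept_first = gdf.drop_duplicates(subset="ID", keep="first")
# Keep the last row seen for each ID
kept_last = gdf.drop_duplicates(subset="ID", keep="last")
# Drop every row whose ID occurs more than once
kept_unique = gdf.drop_duplicates(subset="ID", keep=False)

print(kept_first["Name"].tolist())   # ['Alice', 'Bob', 'David', 'Eve']
print(kept_last["Name"].tolist())    # ['Alice', 'Charlie', 'David', 'Grace']
print(kept_unique["Name"].tolist())  # ['Alice', 'David']
```

Note that keep=False removes IDs 2 and 4 entirely, which is useful when duplicated records should be treated as unreliable rather than collapsed to one row.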

Result:

The script prints a GeoDataFrame with one row per distinct ID value; the geometry column and CRS are carried over unchanged.

Example Output:

   ID     Name                 geometry
0   1    Alice  POINT (1.00000 2.00000)
1   2      Bob  POINT (2.00000 3.00000)
3   3    David  POINT (4.00000 5.00000)
4   4      Eve  POINT (5.00000 6.00000)

In this example, only the first occurrence of each ID is kept, and all subsequent duplicates are removed.
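
The output above keeps the original index labels (0, 1, 3, 4), so the index has gaps after deduplication. As a rough sketch, the snippet below shows two optional steps that are not in the original script: inspecting the affected rows with duplicated before dropping anything, and renumbering the index afterwards with reset_index. The names dup_mask and gdf_reset are hypothetical.

```python
import geopandas as gpd
from shapely.geometry import Point

# Same sample data as the example script
gdf = gpd.GeoDataFrame(
    {
        "ID": [1, 2, 2, 3, 4, 4, 4],
        "Name": ["Alice", "Bob", "Charlie", "David", "Eve", "Frank", "Grace"],
        "geometry": [Point(x, x + 1) for x in range(1, 8)],
    },
    crs="EPSG:4326",
)

# Boolean mask marking every row whose ID appears more than once
dup_mask = gdf.duplicated(subset="ID", keep=False)
print(gdf.loc[dup_mask, ["ID", "Name"]])

# Deduplicate; the surviving rows keep their original index labels
gdf_cleaned = gdf.drop_duplicates(subset="ID", keep="first")
print(gdf_cleaned.index.tolist())  # [0, 1, 3, 4]

# Optionally renumber the index from 0 without keeping the old labels
gdf_reset = gdf_cleaned.reset_index(drop=True)
print(gdf_reset.index.tolist())    # [0, 1, 2, 3]
```

Whether to reset the index depends on the workflow: keeping the old labels makes it easy to trace rows back to the source data, while a fresh index is tidier for positional access.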


From: https://www.cnblogs.com/alex-bn-lee/p/18586193
