java - Grouping similar items from CSV file column as primary key
I have a large CSV file with data similar to this:
user id  group
abc      group1
def      group2
abc      group3
ghi      group4
xyz      group2
uvw      group5
xyz      group1
abc      group1
def      group2
I need to group these items by counting the number of times each group value is repeated for a given user id, so that the result looks like this:
abc  group1 -> 2
abc  group3 -> 1
def  group2 -> 2
ghi  group4 -> 1
uvw  group5 -> 1
xyz  group2 -> 1
xyz  group1 -> 1
Is there a clustering algorithm for this?
Something like this, in case you don't want to store the whole file in memory:
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

import com.google.common.collect.Multiset;
import com.google.common.collect.TreeMultiset;

public class Tester {

    // Reads the CSV line by line and counts each "userId-group" pair in a Guava Multiset.
    public static Multiset<String> getMultisetFromCsv(String csvFileName, String lineDelimiter) throws IOException {
        Multiset<String> mapper = TreeMultiset.create();
        BufferedReader reader = null;
        try {
            reader = new BufferedReader(new FileReader(csvFileName));
            String line;
            String[] currLineSplitted;
            // readLine() returns null at the end of the file
            while ((line = reader.readLine()) != null) {
                currLineSplitted = line.split(lineDelimiter);
                mapper.add(currLineSplitted[0] + "-" + currLineSplitted[1]);
            }
            return mapper;
        } finally {
            if (reader != null) reader.close();
        }
    }

    public static void main(String[] args) throws IOException {
        Multiset<String> set = getMultisetFromCsv("csv", ",");
        for (String key : set.elementSet()) {
            System.out.println(key + " : " + set.count(key));
        }
    }
}
This way you can build the map easily. After that, for each key you can get the number of items associated with it using the count method.
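If Guava is not available, the same counting can be sketched with only the JDK, using a TreeMap and Map.merge in place of the Multiset. This is a minimal sketch, not the code from the answer above: the class name GroupCounter and the method countPairs are hypothetical, and it assumes a comma-separated file without a header row, with the user id in the first column and the group in the second.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper (not from the original answer): counts "userId-group"
// pairs with a plain TreeMap instead of a Guava Multiset.
final class GroupCounter {

    // Reads lines from the reader and counts how often each pair occurs.
    // Note: the delimiter is passed to String.split, so it is treated as a regex.
    static Map<String, Integer> countPairs(BufferedReader reader, String delimiter) throws IOException {
        Map<String, Integer> counts = new TreeMap<>(); // keys kept in sorted order
        String line;
        while ((line = reader.readLine()) != null) {
            String[] fields = line.split(delimiter);
            if (fields.length < 2) {
                continue; // skip blank or malformed lines
            }
            String key = fields[0].trim() + "-" + fields[1].trim();
            counts.merge(key, 1, Integer::sum); // insert 1, or add 1 to the existing count
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Sample rows from the question, assumed to be comma-separated.
        String sample = "abc,group1\ndef,group2\nabc,group3\nghi,group4\nxyz,group2\n"
                + "uvw,group5\nxyz,group1\nabc,group1\ndef,group2\n";
        try (BufferedReader reader = new BufferedReader(new StringReader(sample))) {
            for (Map.Entry<String, Integer> entry : countPairs(reader, ",").entrySet()) {
                System.out.println(entry.getKey() + " : " + entry.getValue());
            }
        }
    }
}
```

As in the Multiset version, only one counter per distinct pair is kept in memory, not the rows themselves. For a real file, the StringReader can be swapped for a FileReader.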